A study of COVID-19 epidemiological data using wavelet analysis

Research article
  • Stepan Alekseevich Elistratov, ORCID 0000-0002-7006-6879; Ivannikov Institute for System Programming, Moscow, Russian Federation; Shirshov Institute of Oceanology of the Russian Academy of Sciences, Moscow, Russian Federation; Sirius University of Science and Technology, Sochi, Russian Federation
DOI: https://doi.org/10.60797/BMED.2026.9.7
EDN: EYCODA
Submitted: 15.04.2026
Accepted: 03.06.2026
Published: 26.06.2026
Issue: No. 2 (9), 2026
Copyright: the authors. License: Attribution 4.0 International (CC BY 4.0)

Abstract

This paper presents a comprehensive study of the application of wavelet analysis to real pandemic data: daily new COVID-19 cases in Russia. The inherent complexity and multi-scale nature of infectious disease dynamics, compounded by factors such as viral evolution and the population's response, often challenge the effectiveness of traditional epidemiological approaches. Wavelet analysis, owing to its ability to decompose signals into frequency components localized in time, offers a powerful alternative for revealing hidden patterns and understanding phenomena that occur on different time scales.

Our study examines the dynamics of COVID-19 incidence in different regions of Russia. By comparing regional dynamics on different time scales, we seek to identify periods in which the behaviour is consistently similar, thereby exposing underlying common drivers or responses. This comparative analysis leads to a key conclusion about the specific time scales on which these similar behaviour patterns appear, and to a visualization of anomalies.

The results obtained with wavelet analysis are directly applicable to the development, generalization, and scaling of data-driven predictive models. By understanding the time scales that show robust inter-regional similarity, we can build more effective and adaptable predictive models. Models based on these scale-specific insights are better suited for broad application, reducing the need for extensive re-tuning for different geographical areas and, ultimately, improving our ability to forecast and manage epidemics.

1. Introduction

The COVID-19 pandemic has presented an unprecedented global challenge, characterized by a continuous surge of new infections and the emergence of distinct viral variants. The sheer volume and granularity of data collected during this period (daily case counts, geographical spread, demographic impact, and mortality rates) provide a rich resource for understanding the intricate dynamics of infectious disease transmission. However, the complexity inherent in these datasets, further amplified by the evolution of SARS-CoV-2 into various strains (including Alpha, Delta, and Omicron), demands analytical tools capable of dissecting processes occurring across multiple temporal scales.

Different viral strains exhibit varied transmissibility, virulence, and immune evasion properties, leading to the manifestation of epidemic processes at diverse scales. Short-term fluctuations might reflect localized outbreaks driven by a highly transmissible variant, while longer-term trends could be influenced by the introduction of new variants, the impact of public health interventions, or seasonal patterns. Traditional statistical methods, while valuable, often struggle to adequately capture these multi-scale dynamics, making it challenging to disentangle the contributions of various factors and identify critical turning points in the epidemic trajectory.

This is where wavelet analysis emerges as a particularly powerful tool. Wavelet transforms can decompose time-series data into constituent frequencies while preserving temporal localization. This allows for the identification of patterns and anomalies that may be masked by conventional approaches. For the analysis of COVID-19 dynamics, wavelet analysis provides multi-scale resolution: it can distinguish short-term fluctuations associated with specific outbreaks from slower, persistent trends indicative of broader epidemiological shifts, and, owing to its sensitivity to localized features in the data, it can identify and visualize abrupt changes and anomalies.

This paper explores the application of wavelet analysis to detailed COVID-19 epidemiological data, demonstrating its efficiency in uncovering intricate patterns and providing a deeper understanding of the multi-scale nature of the pandemic, particularly as influenced by the evolution of its constituent viral strains.

2. Research methods and principles

The wavelet transform is a mathematical tool that allows for the analysis of a signal in both the time and frequency domains simultaneously. Unlike traditional Fourier analysis, which represents a signal as a sum of sinusoids of infinite duration, the wavelet transform uses short, wave-like functions called wavelets. These wavelets are localized in both time and frequency, meaning they have a finite duration and a limited frequency band.

The core idea behind the wavelet transform is to decompose a signal into a set of basis functions (wavelets) that are translated and scaled versions of a single "mother wavelet". The output of the wavelet transform is a set of wavelet coefficients, also called the continuous wavelet transform field. These coefficients represent the degree of similarity between the signal and the chosen wavelet at a particular scale (inverse frequency) and time location. A large coefficient indicates that the wavelet at that specific scale and time is a good representation of the signal at that point.

The continuous wavelet transform of a given signal $f(t)$ is defined by the formula

$$W(\tau, t) = \frac{1}{\sqrt{\tau}} \int_{-\infty}^{\infty} f(t')\, \psi^{*}\!\left(\frac{t' - t}{\tau}\right) dt'.$$

Here $\psi$ is the mother wavelet, which is required to be a continuous function localized in both time and frequency, with zero mean and a finite $L^2$-norm. In this form the wavelet transform represents the signal's localized temporal behaviour, where $t$ denotes global time and $\tau$ is the time scale of the local process.
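
The transform defined above can be approximated on sampled data by a direct sum over the samples. The following is a minimal sketch under stated assumptions, not the implementation used in the study: the Mexican-hat (Ricker) wavelet is chosen only because it is real-valued and easy to write down, samples outside the record are treated as zero, and the function names `ricker` and `cwt` are hypothetical. It uses only the Python standard library.

```python
import math

def ricker(u):
    """Mexican-hat mother wavelet: real, zero-mean, finite L2 norm."""
    return (1.0 - u * u) * math.exp(-0.5 * u * u)

def cwt(signal, scales, dt=1.0):
    """Approximate W(tau, t) by a Riemann sum of the defining integral.

    Returns one row per scale. Samples outside the record are treated
    as zero, so values near the edges of each row are unreliable.
    """
    n = len(signal)
    rows = []
    for tau in scales:
        half = int(math.ceil(5.0 * tau / dt))  # wavelet is negligible beyond 5*tau
        kernel = [ricker(j * dt / tau) for j in range(-half, half + 1)]
        norm = dt / math.sqrt(tau)
        row = []
        for k in range(n):
            lo = max(0, k - half)
            hi = min(n, k + half + 1)
            acc = 0.0
            for idx in range(lo, hi):
                acc += signal[idx] * kernel[idx - k + half]
            row.append(norm * acc)
        rows.append(row)
    return rows

# Synthetic check: a pure oscillation with a 40-sample period.
period = 40.0
signal = [math.sin(2.0 * math.pi * k / period) for k in range(400)]
scales = [2.0, 5.0, 10.0, 20.0, 40.0]
W = cwt(signal, scales)

# Mean |W| over the central part of the record, where edge effects are small.
centre = range(180, 221)
strength = [sum(abs(row[k]) for k in centre) / len(centre) for row in W]
best_scale = scales[strength.index(max(strength))]
print(best_scale)
```

For this wavelet and normalization the strongest response falls on the scale of 10 samples, roughly a quarter of the 40-sample period: the scale is proportional to the period of the oscillation rather than equal to it, and the factor depends on the mother wavelet.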

The key advantages of the wavelet transform are time-frequency localization, multi-resolution analysis, adaptability to different signal types, and reliable detection of transient features even with poorly resolved digital sampling.

The wavelet transform is related to the spectrogram obtained with the windowed Fourier transform, an extension of the Fourier transform developed to analyze how the frequency content of a signal changes over time. It addresses a principal limitation of the standard Fourier transform, which provides frequency information for the entire signal duration but loses all temporal information. The method works by dividing the signal into many short, overlapping segments. A windowing function (e.g., Hanning, Hamming) is applied to each segment to smoothly taper its edges and reduce spectral leakage. Then, a standard Fourier transform is performed on each of these windowed segments. For a signal $f(t)$ this gives

$$S(\omega, t) = \int_{-\infty}^{\infty} f(t')\, w(t' - t)\, e^{-i \omega t'}\, dt',$$

where $w$ is a window function and $\omega$ is the angular frequency.
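
The segment-window-transform procedure described above can be sketched in a few lines. This is an illustrative sketch only: the segment length, hop size, and the names `hann` and `spectrogram` are assumptions made for the example, and a plain discrete Fourier sum is used instead of a fast transform to keep the code dependency-free.

```python
import cmath
import math

def hann(n_points):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_points) for n in range(n_points)]

def spectrogram(signal, seg_len, hop):
    """Squared magnitude of the windowed Fourier transform.

    Returns (frames, freqs): frames[m][q] is the power of segment m at
    frequency freqs[q], given in cycles per sample.
    """
    window = hann(seg_len)
    n_bins = seg_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - seg_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(seg_len)]
        power = []
        for q in range(n_bins):
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * q * n / seg_len)
                        for n in range(seg_len))
            power.append(abs(coeff) ** 2)
        frames.append(power)
    freqs = [q / seg_len for q in range(n_bins)]
    return frames, freqs

# A tone with a 16-sample period, analysed with 64-sample segments.
signal = [math.sin(2.0 * math.pi * k / 16.0) for k in range(256)]
frames, freqs = spectrogram(signal, seg_len=64, hop=32)
peak_bins = [frame.index(max(frame)) for frame in frames]
freq_resolution = freqs[1] - freqs[0]
```

The frequency bins are spaced `1 / seg_len` cycles per sample apart, so a longer segment sharpens the frequency axis while blurring the time axis. This is the trade-off that limits spectrograms of short, slowly varying records.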

Based on the wavelet transform, the wavelet scattering transform (WST) represents a sophisticated evolution of wavelet analysis, designed to extract robust, invariant, and hierarchical features from complex signals. It addresses limitations of traditional methods by systematically decomposing a signal across multiple scales and translations, while simultaneously ensuring invariance to deformations and translations, which are common in real-world data.

The WST operates through a series of cascaded layers, each employing a wavelet transform followed by a non-linear operation. This layered structure allows for the progressive capture of increasingly complex signal properties.

At the foundation of the WST lies the zero-order scattering coefficient $S_0$. This is essentially a low-pass filtered version of the input signal. It is obtained by applying a smoothing filter (primarily a Gaussian filter) or by averaging the wavelet coefficients over all scales and translations from the first wavelet transform. $S_0$ represents the coarse, low-frequency component of the signal, capturing its overall structure without fine details.

The first layer of the WST generates the first-order scattering coefficients ($S_1$). For each scale and translation of the initial wavelet transform, a set of coefficients is produced. These coefficients, which represent localized spectral information, are then subjected to a non-linear function (typically the modulus, optionally followed by a logarithm or a similar compression) and then low-pass filtered. This process captures information about the signal's spectral content at various scales, with a degree of translation invariance built in. The $S_1$ coefficients thus represent features that describe the presence and intensity of specific frequency components at different locations, robust to minor shifts.

A key principle underpinning the WST’s ability to capture multi-scale information is the exponential expansion of scales. The wavelet filters are applied at scales that increase exponentially (or according to a geometric progression). This ensures that the decomposition covers a wide range of temporal or spatial resolutions, from very fine details to very coarse structures. This exponential expansion is crucial for capturing the full spectrum of underlying phenomena, from rapid fluctuations to long-term trends, which is particularly relevant when analyzing signals with diverse temporal characteristics, such as epidemiological data influenced by different viral strains operating at different timescales.

Subsequent layers of the WST build upon the information captured in the previous layers. For instance, the second-order scattering coefficients ($S_2$) are derived by applying a further wavelet transform to the first-layer coefficients. This allows the WST to capture relationships and interactions between features detected in the first layer, providing information about the co-occurrence and combined patterns of spectral components. This hierarchical composition continues, creating a rich feature representation that summarizes the signal's structure across multiple levels of complexity and scales.
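
The first two layers of this cascade can be sketched as follows. This is a simplified illustration, not the transform used for the figures: it assumes a real Mexican-hat wavelet where scattering implementations normally use complex analytic wavelets, takes the modulus as the non-linearity, uses a Gaussian low-pass filter, and wraps the signal circularly at the edges. All function and parameter names are hypothetical.

```python
import math

def circular_filter(x, kernel, half):
    """Filter x with a symmetric kernel indexed from -half to +half (circular edges)."""
    n = len(x)
    return [sum(x[(k + j) % n] * kernel[j + half] for j in range(-half, half + 1))
            for k in range(n)]

def gaussian_kernel(sigma):
    """Low-pass filter phi: Gaussian weights normalised to sum to 1."""
    half = int(math.ceil(4.0 * sigma))
    weights = [math.exp(-0.5 * (j / sigma) ** 2) for j in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights], half

def ricker_kernel(tau):
    """Band-pass filter psi at time scale tau (Mexican-hat wavelet)."""
    half = int(math.ceil(5.0 * tau))
    kernel = [(1.0 - (j / tau) ** 2) * math.exp(-0.5 * (j / tau) ** 2) / math.sqrt(tau)
              for j in range(-half, half + 1)]
    return kernel, half

def scattering_order01(x, n_octaves, per_octave, sigma):
    """Return (s0, s1, scales) with s0 = x * phi and s1[i] = |x * psi_i| * phi."""
    phi, phi_half = gaussian_kernel(sigma)
    s0 = circular_filter(x, phi, phi_half)
    # Scales form a geometric progression: per_octave scales per doubling.
    scales = [2.0 ** (1.0 + j / per_octave) for j in range(n_octaves * per_octave)]
    s1 = []
    for tau in scales:
        psi, psi_half = ricker_kernel(tau)
        modulus = [abs(v) for v in circular_filter(x, psi, psi_half)]
        s1.append(circular_filter(modulus, phi, phi_half))
    return s0, s1, scales

# Synthetic input: an oscillation with a 32-sample period.
x = [math.sin(2.0 * math.pi * k / 32.0) for k in range(512)]
s0, s1, scales = scattering_order01(x, n_octaves=4, per_octave=2, sigma=16.0)
mean_s1 = [sum(row) / len(row) for row in s1]
dominant_scale = scales[mean_s1.index(max(mean_s1))]
```

For this input the zero-order output is close to zero, because the low-pass filter removes the oscillation, while the first-order rows are nearly constant in time and largest at the scale of 8 samples. That constancy is the translation invariance the text refers to: shifting the input shifts the oscillation but leaves the averaged modulus essentially unchanged.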

3. Main results

The data are the daily new cases reported for the regions of Russia. An example is shown in Figure 1. The series shows typical oscillations, with a sharp peak at the beginning of 2022 corresponding to the appearance of the Omicron strain.

Figure 1 - Overall behaviour of new cases in Moscow (example region)

The data considered contain about 1200 daily records. Although the records are frequent compared with other epidemiological data, they are not sufficient for a good-quality spectrogram (Figure 2). The spectrogram is drawn on time-frequency axes; the typical frequencies are relatively low, and the available amount of data does not allow the spectrogram to be resolved well in both frequency and time. As a result, the spectral lines are heavily broadened. Spectrograms are better suited to longer records (which are rare in epidemiology) or to other fields of science (e.g. physics).

Figure 2 - Spectrogram for Moscow data

The wavelet scalogram mitigates this problem. Although the principal limitation remains (the diagram cannot be sharply resolved along both axes at once), it turns out to be more representative, partly because it is expressed in time scales instead of frequencies and partly because the wavelet transform is more robust to the abrupt appearance of spectral components. In Figure 3 one can see typical time scales of ~200 days, with smaller scales, indicating faster processes, appearing in the vicinity of January 2022 in connection with the Omicron strain. The time scales on the wavelet scalogram are easier to read than the frequencies on the spectrogram, and the scalogram makes anomalies visible.

Figure 3 - Wavelet scalogram for Moscow

To compare the behaviour in different regions, the wavelet coherence was computed. The Pearson correlation coefficient between the wavelet coefficients at a given time scale was used as a measure of how close the behaviours are, on average, on that scale. The result of comparing every region with the Moscow data as a reference is shown in Figure 4. The main correspondence between the regions' behaviour is localized at time scales of 110-210 days. The deep burgundy line is Moscow itself, with correlation 1 at all scales, left on the diagram as a reference.

Figure 4 - Wavelet coherence with Moscow as the reference region
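
The scale-by-scale comparison described above amounts to correlating matching rows of two coefficient fields. The sketch below illustrates this on made-up numbers; the function names are hypothetical, and the text does not specify whether the correlation is taken on the raw coefficients or on their magnitudes, so the code simply correlates whatever rows it is given.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0.0 or var_b == 0.0:
        return float("nan")  # correlation is undefined for a constant row
    return cov / math.sqrt(var_a * var_b)

def scalewise_correlation(coeffs_ref, coeffs_region):
    """Correlate two coefficient fields row by row (one row per time scale)."""
    return [pearson(r, s) for r, s in zip(coeffs_ref, coeffs_region)]

# Toy coefficient fields with two scales and five time steps (made-up numbers).
reference = [[1.0, 2.0, 3.0, 2.0, 1.0],
             [0.0, 1.0, 0.0, -1.0, 0.0]]
region = [[2.0, 4.0, 6.0, 4.0, 2.0],     # same shape, doubled amplitude
          [0.0, -1.0, 0.0, 1.0, 0.0]]    # opposite phase
corr = scalewise_correlation(reference, region)
self_corr = scalewise_correlation(reference, reference)
```

Because the Pearson coefficient ignores amplitude, the first toy scale correlates perfectly despite the doubled case counts, while the second scale is perfectly anti-correlated. Correlating the reference with itself gives 1 at every scale, which is the reference line mentioned in the text.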

For a deeper look, let us consider the wavelet scattering transform (WST) applied to our data. It is a powerful feature extraction technique that generates a hierarchical and invariant representation of signals. Its effectiveness hinges on a set of defining parameters and the specific nature of the coefficients $S_0$ and $S_1$ it produces. The WST is governed by parameters such as $J$, representing the total number of scales (or decomposition levels) to be analyzed, and $Q$, which dictates the number of wavelets used to cover each frequency octave. A larger $J$ allows for the examination of features across a broader spectrum of resolutions, from fine to coarse, while a higher $Q$ provides finer discrimination within each frequency band. Together, these parameters define the grid of wavelet filters used, tailoring the analysis to the signal's expected characteristics and the desired level of feature detail. Note that the scales investigated grow exponentially.
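
The exponential growth of the scales can be made concrete with a small sketch. It assumes that the first parameter counts octaves (doublings of the time scale) and the second counts wavelets per octave; the exact convention of the software used for the figures is not stated in the text, and `scale_grid` is a hypothetical helper.

```python
def scale_grid(n_octaves, per_octave, smallest=1.0):
    """Geometric grid: per_octave scales in each of n_octaves doublings."""
    return [smallest * 2.0 ** (j / per_octave) for j in range(n_octaves * per_octave)]

coarse = scale_grid(n_octaves=3, per_octave=1)   # one wavelet per octave
fine = scale_grid(n_octaves=3, per_octave=4)     # four wavelets per octave
ratio = fine[1] / fine[0]                        # step between neighbouring scales
```

With one wavelet per octave the scales simply double; with four per octave the same range is covered by twelve scales whose neighbours differ by a factor of 2^(1/4), about 1.19.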

At the foundation of the WST lies the zero-order scattering coefficient $S_0$, which captures the signal's overall, low-frequency component, acting as a smooth, low-pass filtered version of the input. It represents the global, slowly varying aspects of the signal and is highly invariant to minor translations and deformations because it averages out localized variations. Figure 5 compares the action of $S_0$ with other smoothing methods, including window smoothing (SMA) with a 14-day Hanning window and EMD-based smoothing. The WST's $S_0$ turns out to be the heaviest filter, partially losing information in the intervals with sharp processes.

Figure 5 - Illustration of the action of S0 and of different smoothing methods for Moscow data
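
For comparison with the heavier low-pass filter, the window smoothing mentioned above can be sketched as a Hann-weighted moving average. This is a minimal illustration under assumptions: the handling of the series ends (renormalising over the available samples) and the function names are choices made here, not details given in the text.

```python
import math

def hann_weights(width):
    """Hann weights that are non-zero at all `width` points and sum to 1."""
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * (i + 1) / (width + 1))
           for i in range(width)]
    total = sum(raw)
    return [w / total for w in raw]

def hann_smooth(series, width=14):
    """Approximately centred weighted moving average.

    Near the ends only the available samples are used and the weights
    are renormalised, so a constant series stays constant everywhere.
    """
    weights = hann_weights(width)
    offset = width // 2
    out = []
    for k in range(len(series)):
        acc = 0.0
        wsum = 0.0
        for i, w in enumerate(weights):
            idx = k + i - offset
            if 0 <= idx < len(series):
                acc += w * series[idx]
                wsum += w
        out.append(acc / wsum)
    return out

# A constant series is unchanged; a one-day spike is spread over two weeks.
flat = hann_smooth([5.0] * 30)
spike = [0.0] * 40
spike[20] = 14.0
smoothed = hann_smooth(spike)
```

A single-day spike of 14 cases keeps its total but is spread over the 14-day window with a peak below 2, which shows how even this mild filter flattens sharp events; a wider low-pass filter does so more strongly.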

However, building upon $S_0$, the first-order scattering coefficients $S_1$, derived from the initial layer of the WST, can be useful for analysing the behaviour of the data on different scales. For each scale and translation of the wavelet transform applied to the input signal, a set of coefficients is generated. These localized spectral features are then processed through a non-linear function followed by a low-pass filter with a window characteristic of the current scale, a process repeated for all specified scales. This cascading approach ensures that the resulting $S_1$ coefficients are largely invariant to translations of the original signal. Each $S_1$ coefficient quantifies the presence and magnitude of a particular spectral characteristic (defined by its scale and translation) across different signal locations, with minimal sensitivity to their exact position. The ensemble of $S_1$ coefficients thus provides a translation-invariant description of the signal's spectral landscape across various resolutions.

The idea of the application is to compare the local behaviour on different time scales using the $S_1$ coefficients for different regions. As wavelet scattering is translation-invariant, it is expected to be more representative than the wavelet coherence. We continue to use the Pearson correlation as a measure between the current region's data and the reference one, but compute a set of coherence diagrams for different reference regions, after which the data are averaged over the reference regions to avoid artifacts. The result is shown in Figure 6. The small scales do not correlate at all; the maximum correlation is observed on scales of 110-210 days (as was seen on the wavelet-coherence diagram, Figure 4). The large-scale processes with high correlation are too global and may be non-representative on the scales of our problem.


Figure 6 - S1-coherence averaged by reference regions

For enhanced interpretability, we also averaged the S1-correlation across all analyzed regions. This process generates a consolidated function that illustrates the overall correlation strength across different time scales (Figure 7). The scales corresponding to the maximum value of this function represent those at which the collective behavior of the regional data is most congruent. This finding has significant implications for prognostic modeling. Specifically, models developed to capture dynamics at these scales of peak correlation are more likely to be generalizable across various geographical areas. In contrast, models focused on shorter time scales, which may capture more localized or transient phenomena, would typically necessitate refitting for each distinct region.
Figure 7 - S1-coherence averaged by reference and target regions
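
The two averaging steps (over reference regions, then over target regions) can be sketched as a single average over all pairs of regions at each scale. The region names, time scales, and numbers below are made up for illustration, and excluding the pairing of a region with itself is an assumption of this sketch rather than something stated in the text.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def mean_correlation_by_scale(features):
    """features[region][i] is the time series of the i-th scale for that region.

    Returns, for each scale, the Pearson correlation averaged over all
    ordered (reference, target) pairs of distinct regions.
    """
    regions = list(features)
    n_scales = len(features[regions[0]])
    averaged = []
    for i in range(n_scales):
        values = [pearson(features[ref][i], features[tgt][i])
                  for ref in regions for tgt in regions if ref != tgt]
        averaged.append(sum(values) / len(values))
    return averaged

# Hypothetical per-scale features for three regions and two time scales (days).
time_scales = [7.0, 150.0]
features = {
    "A": [[1.0, -1.0, 1.0, -1.0, 1.0, -1.0], [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]],
    "B": [[1.0, 1.0, -1.0, -1.0, 1.0, 1.0], [0.0, 2.0, 4.0, 6.0, 4.0, 2.0]],
    "C": [[-1.0, 1.0, 1.0, -1.0, -1.0, 1.0], [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]],
}
averaged = mean_correlation_by_scale(features)
best_scale = time_scales[averaged.index(max(averaged))]
```

In this toy data the short scale averages to a weak negative correlation, while the long scale correlates perfectly, so the long scale is selected. On real data the same argmax would pick out the band where a model is most likely to transfer between regions.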

Having previously discussed anomaly detection in the context of wavelet scalograms, it is appropriate to introduce an additional visualization tool for identifying non-regular behavior. Epidemiological data, when viewed as a complex dynamical system, evolve in a high-dimensional phase space that is inherently difficult to visualize. However, as demonstrated in the preceding analysis, different regions often exhibit similar long-term dynamics, suggesting that the evolution curves of these systems possess identifiable patterns.

To reconstruct the behavior of this multi-component system, we employ Principal Component Analysis (PCA) combined with dimension reduction based on Takens' embedding theorem. By projecting the system's trajectory onto the plane of the two primary principal components, we can represent the global dynamics compactly and identify deviations from typical patterns.
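
A minimal sketch of this reconstruction is given below: a delay embedding in the spirit of Takens' theorem followed by projection onto the two leading principal axes. The embedding dimension, the lag, the synthetic input, and all function names are assumptions made for illustration; the text does not give the values used for the figures. The principal axes are found by simple power iteration to keep the example dependency-free, whereas in practice a linear-algebra library would be used.

```python
import math

def delay_embed(series, dim, lag):
    """Delay vectors [x(t), x(t + lag), ..., x(t + (dim - 1) * lag)]."""
    n_vectors = len(series) - (dim - 1) * lag
    return [[series[t + m * lag] for m in range(dim)] for t in range(n_vectors)]

def covariance(points):
    """Sample covariance matrix and the mean-centred points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[c] for p in points) / n for c in range(d)]
    centred = [[p[c] - mean[c] for c in range(d)] for p in points]
    cov = [[sum(row[a] * row[b] for row in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centred

def leading_axes(cov, n_axes=2, n_iter=500):
    """Leading eigenvectors of a symmetric matrix by power iteration,
    orthogonalised (Gram-Schmidt) against the axes already found."""
    d = len(cov)
    axes = []
    for a in range(n_axes):
        v = [1.0 / (1.0 + a + c) for c in range(d)]  # arbitrary deterministic start
        for _ in range(n_iter):
            v = [sum(cov[r][c] * v[c] for c in range(d)) for r in range(d)]
            for u in axes:
                proj = sum(vi * ui for vi, ui in zip(v, u))
                v = [vi - proj * ui for vi, ui in zip(v, u)]
            norm = math.sqrt(sum(vi * vi for vi in v))
            v = [vi / norm for vi in v]
        axes.append(v)
    return axes

# Synthetic periodic series standing in for a smoothed case curve.
series = [math.sin(2.0 * math.pi * k / 40.0) for k in range(400)]
points = delay_embed(series, dim=3, lag=5)
cov, centred = covariance(points)
axes = leading_axes(cov)

# Trajectory in the (PC1, PC2) plane and the share of variance it retains.
coords = [[sum(x * a for x, a in zip(row, axis)) for axis in axes] for row in centred]
trace = sum(cov[c][c] for c in range(len(cov)))
explained = sum(sum(ax[r] * cov[r][c] * ax[c] for r in range(len(cov))
                    for c in range(len(cov))) for ax in axes) / trace
```

For a purely periodic input the delay vectors trace a closed loop that lies entirely in the plane of the first two components, so the projection loses essentially nothing. An outbreak that breaks the regular cycle would show up as an excursion away from this loop.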

Figure 8 illustrates the evolution of new case data for Moscow, with the trajectory colored by time. The Omicron outbreak is clearly visible as a distinct deviation from the primary trajectory (indicated in light green, corresponding to January 2022). Outside of this period, the system exhibits quasi-cyclic behavior, highlighting the stability of the long-term epidemic trends. Note that the axes PC1 and PC2 have no direct physical meaning, because they are the principal directions of the PCA onto which the phase space is projected. However, this does not interfere with the visualisation.

Figure 8 - PCA visualization of the system dynamics for Moscow

Figure 9 illustrates the overall evolution of the system. The first significant anomaly is observed during the first half of 2021, which coincides with the emergence of the Delta variant. Additionally, the Omicron outbreak is clearly identifiable as a distinct light-green loop in the phase space.

Figure 9 - PCA visualization of the overall system dynamics

4. Conclusion

This study applied wavelet analysis to the complex, multi-scale dynamics of COVID-19 epidemiological data. By applying wavelet transforms, we decomposed the time series of daily new cases into components localized in time and scale, revealing patterns that operate across distinct temporal scales.

A key contribution of this research lies in the application of wavelet coherence analysis. This technique allowed us not only to identify correlations between the case counts of different regions but also to pinpoint the ranges of temporal scales where these correlations are maximized (110-210 days). This finding is crucial, as it suggests that epidemiological processes, driven by factors such as the emergence and spread of specific viral strains, tend to manifest with similar characteristics at certain scales.

The identification of these “privileged” scales of correlation has significant implications for epidemiological modeling and intervention strategies. It suggests that models developed for understanding disease dynamics in one region may indeed be generalizable and transferable to other regions, provided they are applied at these identified scales. This is because the underlying drivers of epidemic spread that are most strongly correlated across different locations appear to operate within these specific frequency bands. Consequently, understanding the dynamics at these particular scales can offer a more robust foundation for predictive modeling and the development of effective public health responses that are less geographically constrained.

Additionally, a PCA projection of the dynamical system was used to visualize the system's behaviour. It is shown that, alongside the wavelet transform, it can be used successfully to detect anomalous behaviour in the data.
