A study of COVID-19 epidemiological data using wavelet analysis
Abstract
This work presents a comprehensive study of the application of wavelet analysis to real pandemic data: daily new COVID-19 cases in Russia. The inherent complexity and multi-scale nature of infectious disease dynamics, compounded by factors such as viral evolution and population response, often challenge the effectiveness of traditional epidemiological approaches. Wavelet analysis, with its ability to decompose signals into frequency components localized in time, offers a powerful alternative for revealing hidden patterns and understanding phenomena that occur on different time scales.
Our study examines the dynamics of COVID-19 incidence in different regions of Russia. By comparing regional dynamics across time scales, we seek to identify periods when the behaviour is consistently similar, thereby exposing the common underlying drivers or responses. This comparative analysis leads to a key conclusion about the specific time scales on which these similar behaviour patterns appear, and to a visualization of anomalies.
The results obtained with wavelet analysis are directly applicable to the development, generalization, and scaling of data-driven predictive models. By understanding which time scales show robust cross-regional similarity, we can build more effective and adaptable predictive models. Models based on these scale-specific insights are better suited for broad application, reducing the need for extensive retuning for different geographic areas and, ultimately, improving our ability to forecast and manage epidemics.
1. Introduction
The COVID-19 pandemic has presented an unprecedented global challenge, characterized by a continuous surge of new infections and the emergence of distinct viral variants. The sheer volume and granularity of data collected during this period, encompassing daily case counts, geographical spread, demographic impact, and mortality rates, provide a rich resource for understanding the intricate dynamics of infectious disease transmission. However, the complexity inherent in these datasets, further amplified by the evolution of SARS-CoV-2 into various strains (including Alpha, Delta, and Omicron), demands analytical tools capable of dissecting processes occurring across multiple temporal scales.
Different viral strains exhibit varied transmissibility, virulence, and immune evasion properties, leading to the manifestation of epidemic processes at diverse scales. Short-term fluctuations might reflect localized outbreaks driven by a highly transmissible variant, while longer-term trends could be influenced by the introduction of new variants, the impact of public health interventions, or seasonal patterns. Traditional statistical methods, while valuable, often struggle to adequately capture these multi-scale dynamics, making it challenging to disentangle the contributions of various factors and identify critical turning points in the epidemic trajectory.
This is where wavelet analysis emerges as a particularly powerful tool. Wavelet transforms decompose time-series data into its constituent frequencies while preserving temporal localization. This allows for the identification of patterns and anomalies that may be masked by conventional approaches. For the analysis of COVID-19 dynamics, wavelet analysis provides multi-scale resolution: it distinguishes short-term fluctuations associated with specific outbreaks from slower, persistent trends indicative of broader epidemiological shifts. Its sensitivity to localized features in the data also makes it possible to identify and visualize abrupt changes and anomalies.
This paper explores the application of wavelet analysis to detailed COVID-19 epidemiological data, demonstrating its efficiency in uncovering intricate patterns and providing a deeper understanding of the multi-scale nature of the pandemic, particularly as influenced by the evolution of its constituent viral strains.
2. Research methods and principles
The wavelet transform is a mathematical tool that allows for the analysis of a signal in both the time and frequency domains simultaneously. Unlike traditional Fourier analysis, which represents a signal as a sum of sinusoids of infinite duration, the wavelet transform uses short, wave-like functions called wavelets. These wavelets are localized in both time and frequency, meaning they have a finite duration and a limited frequency band.
The core idea behind the wavelet transform is to decompose a signal into a set of basis functions (wavelets) that are translated and scaled versions of a single “mother wavelet.” The output of the wavelet transform is a set of wavelet coefficients, also called the continuous wavelet transform field. These coefficients represent the degree of similarity between the signal and the chosen wavelet at a particular scale (inverse frequency) and time location. A large coefficient indicates that the wavelet at that specific scale and time is a good representation of the signal at that point.
Continuous wavelet transform for a given signal
Here
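For reference, the continuous wavelet transform is commonly written in the form below. The notation is an assumption made for this sketch and is not taken from the source: $x(t)$ is the signal, $\psi$ the mother wavelet ($\psi^{*}$ its complex conjugate), $a > 0$ the scale, and $b$ the time shift. The $1/\sqrt{a}$ factor is one common normalization among several.

```latex
W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt
```

Large values of $|W(a, b)|$ mark the scales $a$ and times $b$ at which the shifted, stretched wavelet resembles the signal.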
The key advantages of the wavelet transform are time-frequency localization, multi-resolution analysis, adaptability to different signal types, and successful detection of transient features even in poorly resolved digital sampling.
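The time-frequency localization described above can be illustrated with a minimal NumPy sketch. It is an illustration only, not the implementation used in the study: the complex Morlet wavelet, its centre frequency `w0 = 6`, the truncation at four scales on each side, and the synthetic burst signal are all assumptions.

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Continuous wavelet transform with a complex Morlet wavelet.

    Returns a complex field of shape (len(scales), len(signal)).
    The wavelet, w0 and the truncation width are assumed choices.
    """
    signal = np.asarray(signal, dtype=float)
    field = np.empty((len(scales), signal.size), dtype=complex)
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))              # truncate the wavelet at 4 scales per side
        u = np.arange(-half, half + 1) / s      # rescaled time axis
        psi = np.pi ** -0.25 * np.exp(1j * w0 * u - 0.5 * u ** 2) / np.sqrt(s)
        # For the Morlet wavelet psi(-u) = conj(psi(u)), so convolving with psi
        # equals correlating the signal with the conjugated, shifted wavelet.
        field[i] = np.convolve(signal, psi)[half:half + signal.size]
    return field

# Synthetic transient: a short burst with a 16-sample period centred at t = 200.
t = np.arange(512)
burst = np.exp(-((t - 200) / 20.0) ** 2) * np.sin(2 * np.pi * t / 16)
scales = [4, 8, 16, 32, 64]
power = np.abs(morlet_cwt(burst, scales)) ** 2
scale_idx, time_idx = np.unravel_index(np.argmax(power), power.shape)
```

In this example the largest coefficient falls at the scale matching the burst period and at the time of the burst, which is the localization property in both domains.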
The wavelet transform is related to the spectrogram computed with the windowed Fourier transform, an extension of the Fourier transform developed to analyze how the frequency content of a signal changes over time. It addresses a principal limitation of the standard Fourier transform, which provides frequency information for the entire signal duration but loses all temporal information. The method works by dividing the continuous signal into many small, overlapping segments. A windowing function (e.g., Hanning, Hamming) is applied to each segment to smoothly taper its edges and reduce spectral leakage. Then, a standard Fourier transform is performed on each of these windowed segments.
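The segment, window, and transform steps of the windowed Fourier transform can be sketched in a few lines of NumPy. The window length, hop size, and test signal are assumptions chosen for illustration, not values from the study.

```python
import numpy as np

def stft_spectrogram(signal, window_len=64, hop=16):
    """Power spectrogram from overlapping Hann-windowed segments."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(window_len)                       # tapers the segment edges
    starts = range(0, signal.size - window_len + 1, hop)  # overlapping segments
    frames = np.stack([signal[s:s + window_len] * window for s in starts])
    spectrum = np.fft.rfft(frames, axis=1)                # one FFT per segment
    return np.abs(spectrum).T ** 2                        # (frequency bins, time frames)

# Test signal whose period changes from 16 to 8 samples halfway through.
t = np.arange(512)
x = np.where(t < 256, np.sin(2 * np.pi * t / 16), np.sin(2 * np.pi * t / 8))
spec = stft_spectrogram(x)
```

The dominant frequency bin moves from the first frames to the last, which a single Fourier transform of the whole signal cannot show. The fixed window length is the main design trade-off: it sets one time resolution for all frequencies, whereas the wavelet transform adapts it to the scale.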
Building on the wavelet transform, the wavelet scattering transform (WST) is an evolution of wavelet analysis designed to extract robust, invariant, and hierarchical features from complex signals. It addresses limitations of traditional methods by systematically decomposing a signal across multiple scales and translations, while ensuring invariance to translations and stability under small deformations, both of which are common in real-world data.
The WST operates through a series of cascaded layers, each employing a wavelet transform followed by a non-linear operation. This layered structure allows for the progressive capture of increasingly complex signal properties.
At the foundation of the WST lies the zero-order scattering coefficient S0, the signal averaged by a low-pass filter.
The first layer of the WST generates the first-order scattering coefficients (S1).
A key principle underpinning the WST’s ability to capture multi-scale information is the exponential expansion of scales. The wavelet filters are applied at scales that increase exponentially (or according to a geometric progression). This ensures that the decomposition covers a wide range of temporal or spatial resolutions, from very fine details to very coarse structures. This exponential expansion is crucial for capturing the full spectrum of underlying phenomena, from rapid fluctuations to long-term trends, which is particularly relevant when analyzing signals with diverse temporal characteristics, such as epidemiological data influenced by different viral strains operating at different timescales.
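A geometric progression of scales can be generated as in the short sketch below. The function name, the range of 2 to 256 days, and the default of one scale per octave are assumptions for illustration.

```python
import numpy as np

def geometric_scales(min_scale, max_scale, per_octave=1):
    """Scales in geometric progression: min_scale * 2 ** (k / per_octave)."""
    n_octaves = np.log2(max_scale / min_scale)
    # The small epsilon guards against floating-point round-off in the log.
    count = int(np.floor(n_octaves * per_octave + 1e-9)) + 1
    return min_scale * 2.0 ** (np.arange(count) / per_octave)

# Daily data: scales of 2, 4, 8, ..., 256 days.
scales_days = geometric_scales(2, 256)
```

Eight scales are enough here to span two orders of magnitude in duration, from day-to-day fluctuations to trends lasting most of a year.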
Subsequent layers of the WST build upon the information captured in the previous layers. For instance, the second-order scattering coefficients (S2) are computed from the output of the first layer.
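The layered construction can be sketched using the standard scattering definitions: S0 = x * phi, S1 = |x * psi1| * phi, and S2 = ||x * psi1| * psi2| * phi, where * denotes convolution, phi is a low-pass filter, and psi are band-pass wavelets. The Gaussian low-pass filter, the Morlet band-pass filter, and all parameter values below are assumptions for this sketch, not the filters used in the study.

```python
import numpy as np

def _filter(x, kernel):
    """Centred convolution that keeps the length of x (zero padding at the edges)."""
    half = (kernel.size - 1) // 2
    return np.convolve(x, kernel)[half:half + x.size]

def lowpass(width):
    """Normalized Gaussian averaging filter phi (assumed choice)."""
    half = int(np.ceil(4 * width))
    t = np.arange(-half, half + 1)
    g = np.exp(-0.5 * (t / width) ** 2)
    return g / g.sum()

def bandpass(scale, w0=6.0):
    """Complex Morlet band-pass filter psi at a given scale (assumed choice)."""
    half = int(np.ceil(4 * scale))
    u = np.arange(-half, half + 1) / scale
    psi = np.exp(1j * w0 * u - 0.5 * u ** 2)
    psi = psi - psi.mean()              # zero mean, so constant signals are blocked
    return psi / np.abs(psi).sum()

def scattering(x, scales, avg_width):
    """Return (S0, S1, S2) for a 1-D signal; S2 is keyed by (first, second) scale."""
    x = np.asarray(x, dtype=float)
    phi = lowpass(avg_width)
    s0 = _filter(x, phi)                                     # zero order: local average
    u1 = [np.abs(_filter(x, bandpass(s))) for s in scales]   # wavelet modulus
    s1 = np.array([_filter(u, phi) for u in u1])             # first order
    s2 = {}
    for i, first in enumerate(scales):
        for second in scales:
            if second > first:   # envelopes vary more slowly than their carrier
                u2 = np.abs(_filter(u1[i], bandpass(second)))
                s2[(first, second)] = _filter(u2, phi)
    return s0, s1, s2

t = np.arange(512)
x = 5.0 + np.sin(2 * np.pi * t / 16)
s0, s1, s2 = scattering(x, scales=[8, 16, 32], avg_width=16)
```

In this toy example S0 recovers the baseline level of the signal, while S1 is largest at the scale closest to the oscillation period. Values near the edges are affected by the zero padding and should be read with care.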
3. Main results
The data are daily new cases by region of Russia. An example is shown in Figure 1. The series shows typical oscillations, with a sharp peak at the beginning of 2022 corresponding to the emergence of the Omicron strain.

Figure 1 - Overall behaviour of new cases in Moscow (example region)

Figure 2 - Spectrogram for Moscow data

Figure 3 - Wavelet scalogram for Moscow
To compare the behaviour in different regions, the wavelet coherence was computed. The Pearson correlation coefficient between the wavelet coefficients of two regions at a given time scale was used as a measure of how close their behaviour is on that scale. The result of comparing every region with the Moscow data as a reference is shown in Figure 4. The main agreement between regional behaviours is localized at time scales of 110-210 days. The deep burgundy line is Moscow itself, with correlation 1 at all scales, left on the diagram as a reference.
Figure 4 - Wavelet coherence with Moscow as the reference region
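The scale-wise comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it correlates the magnitudes of Morlet wavelet coefficients along time, separately for each scale, and uses two synthetic "regions" that share a slow wave but have fast bursts at different times.

```python
import numpy as np

def cwt_magnitude(x, scales, w0=6.0):
    """Magnitude of Morlet wavelet coefficients, shape (len(scales), len(x))."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    rows = []
    for s in scales:
        half = int(np.ceil(4 * s))
        u = np.arange(-half, half + 1) / s
        psi = np.exp(1j * w0 * u - 0.5 * u ** 2) / np.sqrt(s)
        rows.append(np.abs(np.convolve(x, psi)[half:half + x.size]))
    return np.array(rows)

def scalewise_pearson(field_a, field_b):
    """Pearson correlation along time, computed separately for every scale."""
    a = field_a - field_a.mean(axis=1, keepdims=True)
    b = field_b - field_b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

# Hypothetical regions: a shared slow wave plus region-specific fast bursts.
t = np.arange(600)
shared = np.exp(-((t - 300) / 80.0) ** 2) * np.sin(2 * np.pi * t / 64)
reference = shared + np.exp(-((t - 150) / 30.0) ** 2) * np.sin(2 * np.pi * t / 8)
target = shared + np.exp(-((t - 450) / 30.0) ** 2) * np.sin(2 * np.pi * t / 8)

scales = [8, 62]   # a short scale and one matched to the shared wave
corr = scalewise_pearson(cwt_magnitude(reference, scales),
                         cwt_magnitude(target, scales))
```

The correlation is close to 1 at the long scale and low at the short one, mirroring the kind of scale-dependent agreement shown in Figure 4. A region compared with itself gives 1 at every scale, as for the Moscow line.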
At the foundation of the WST lies the zero-order scattering coefficient S0, illustrated for the Moscow data in Figure 5.

Figure 5 - Illustration of how S0 works, with different smoothings of the Moscow data
The idea of the application is the following: to compare the local behaviour on different time scales using the first-order scattering coefficients S1 (Figures 6 and 7).

Figure 6 - S1-coherence averaged by reference regions

Figure 7 - S1-coherence averaged by reference and target regions
To reconstruct the behavior of this multi-component system, we employ Principal Component Analysis (PCA) combined with dimension reduction based on Takens’ embedding theorem. By projecting the system’s trajectory onto the plane of the first two principal components, we can represent the global dynamics compactly and identify deviations from typical patterns.
Figure 8 illustrates the evolution of new case data for Moscow, with the trajectory colored by time. The Omicron outbreak is clearly visible as a distinct deviation from the primary trajectory (shown in light green, corresponding to January 2022). Outside of this period, the system exhibits quasi-cyclic behavior, highlighting the stability of the long-term epidemic trends. Note that the axes PC1 and PC2 have no direct physical meaning: they are the principal directions onto which the phase space is projected. This does not hinder the visualisation.

Figure 8 - PCA visualization of the system dynamics for Moscow

Figure 9 - PCA visualization of the overall dynamics
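The combination of delay embedding and PCA used for these trajectory plots can be sketched as below. The embedding dimension, the lag, and the sinusoidal test series are assumptions for illustration; the values used in the study are not stated in the text.

```python
import numpy as np

def delay_embed(x, dim, lag):
    """Takens delay embedding: rows are (x[t], x[t + lag], ..., x[t + (dim - 1) * lag])."""
    x = np.asarray(x, dtype=float)
    n = x.size - (dim - 1) * lag
    return np.stack([x[i * lag:i * lag + n] for i in range(dim)], axis=1)

def pca_project(points, n_components=2):
    """Project points onto their leading principal components via SVD."""
    centered = points - points.mean(axis=0)
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    explained = sing ** 2 / np.sum(sing ** 2)      # variance share per component
    return centered @ vt[:n_components].T, explained[:n_components]

# A purely cyclic series: its embedded trajectory lies in a plane.
t = np.arange(500)
series = np.sin(2 * np.pi * t / 50)
embedded = delay_embed(series, dim=10, lag=5)
trajectory, explained = pca_project(embedded)
```

For this periodic series the two leading components capture essentially all of the variance and the trajectory is a closed loop. An outbreak that breaks the cycle would appear as an excursion away from that loop, which is how the Omicron wave stands out in Figure 8.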
4. Conclusion
This study applied wavelet analysis to the complex, multi-scale dynamics of COVID-19 epidemiological data. Using wavelet transforms, we decomposed the time series of daily new cases into their constituent frequency components, revealing patterns that operate across distinct temporal scales.
A key contribution of this research lies in the application of wavelet coherence analysis. This technique allowed us not only to identify correlations between different epidemiological signals (e.g., case counts across regions, or case counts and intervention timelines) but also to pinpoint the ranges of temporal scales where these correlations are strongest: 110-210 days when Moscow is taken as the reference. This finding is crucial, as it suggests that epidemiological processes, driven by factors such as the emergence and spread of specific viral strains, tend to manifest with similar characteristics at certain scales.
The identification of these “privileged” scales of correlation has significant implications for epidemiological modeling and intervention strategies. It suggests that models developed for understanding disease dynamics in one region may indeed be generalizable and transferable to other regions, provided they are applied at these identified scales. This is because the underlying drivers of epidemic spread that are most strongly correlated across different locations appear to operate within these specific frequency bands. Consequently, understanding the dynamics at these particular scales can offer a more robust foundation for predictive modeling and the development of effective public health responses that are less geographically constrained.
Additionally, a PCA projection of the dynamical system is used to visualize the system's behaviour. It can be used alongside the wavelet transform to detect anomalous behaviour in the data.
