In a recent study published in Nature Communications, researchers developed an approach that integrates genetic information, in the form of whole-genome viral sequences, with epidemiological information to estimate the serial interval (SI), especially where contact tracing data are inadequate.
In infectious disease control, serial intervals are a critical parameter, but estimating them requires information on individual exposures and contact tracing. Current approaches are best suited to small, restricted populations with high sampling; however, estimates from small early outbreaks are frequently reused in large-scale epidemiological analyses.
Although genomic epidemiology studies can influence public health action, budget constraints and privacy concerns limit widespread reporting and usage.
About the study
In the present study, researchers introduced an efficient alternative framework that uses virus sequences to estimate serial intervals, and applied it to obtain cluster-specific SI estimates for the first and second coronavirus disease 2019 (COVID-19) waves in Victoria, Australia.
The team concentrated on cluster-specific estimation of the SI, a fundamental parameter of infectious disease spread, defined as the time between symptom onset in a primary case and symptom onset in the secondary case it infects. To infer the SI distribution in incompletely sampled case clusters, virus sequences were used instead of direct data on infection pairs.
Uncertainty about who infected whom was accounted for by sampling plausible transmission networks consistent with the viral sequences and known symptom onset times.
Because an inferred transmission pair may not represent direct transmission, mixture modeling was used for SI estimation. The technique was designed for larger settings with lower levels of sampling and genetic diversity, where there may be insufficient evidence to reconstruct transmission patterns confidently.
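As a minimal illustration of the definition above, the sketch below computes serial intervals from symptom onset dates when the infector and infectee are known. The dates are hypothetical and are not taken from the study; the last pair shows that a serial interval can be negative when the secondary case develops symptoms first.

```python
from datetime import date
from statistics import mean

# Hypothetical symptom-onset dates for known (primary, secondary) case pairs.
pairs = [
    (date(2020, 3, 1), date(2020, 3, 5)),
    (date(2020, 3, 2), date(2020, 3, 9)),
    (date(2020, 3, 4), date(2020, 3, 7)),
    (date(2020, 3, 6), date(2020, 3, 5)),  # secondary case had symptoms first
]

def serial_intervals(onset_pairs):
    """Return the serial interval in days for each (primary, secondary) pair."""
    return [(secondary - primary).days for primary, secondary in onset_pairs]

si = serial_intervals(pairs)
print("Serial intervals (days):", si)
print("Mean serial interval (days):", mean(si))
```

In practice the pairs are rarely known with certainty, which is the gap the sequence-based approach is meant to fill.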
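The idea of sampling plausible transmission networks can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the toy sequences, the thresholds `MAX_SNP` and `MAX_GAP`, the requirement that the infector has strictly earlier onset, and the uniform choice among candidates are all assumptions, and the mixture-model step that handles indirect transmission is omitted.

```python
import random

# Hypothetical cases: id -> (symptom onset day, aligned toy sequence).
cases = {
    "A": (0, "ACGTACGT"),
    "B": (3, "ACGTACGA"),
    "C": (5, "ACGTACGA"),
    "D": (9, "ACGAACGA"),
}

MAX_SNP = 1   # assumed maximum genetic distance for a plausible pair
MAX_GAP = 14  # assumed maximum onset gap in days for a plausible pair

def snp_distance(a, b):
    """Count positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def candidate_infectors(case_id):
    """Cases with strictly earlier onset that are genetically close to case_id."""
    onset, seq = cases[case_id]
    return [
        other for other, (other_onset, other_seq) in cases.items()
        if other != case_id
        and 0 < onset - other_onset <= MAX_GAP
        and snp_distance(seq, other_seq) <= MAX_SNP
    ]

def sample_network(rng):
    """Pick one plausible infector per case, uniformly at random."""
    network = {}
    for case_id in cases:
        options = candidate_infectors(case_id)
        if options:  # cases with no candidate are left without an infector
            network[case_id] = rng.choice(options)
    return network

def sampled_serial_intervals(n_networks, seed=0):
    """Pool onset-to-onset gaps over many sampled networks."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_networks):
        for infectee, infector in sample_network(rng).items():
            intervals.append(cases[infectee][0] - cases[infector][0])
    return intervals

intervals = sampled_serial_intervals(100)
print("Pooled mean gap (days):", sum(intervals) / len(intervals))
```

Pooling over many sampled networks spreads the uncertainty about each infector across the resulting interval distribution, rather than committing to a single reconstructed transmission tree.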
The researchers examined severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) whole-genome sequences and the corresponding symptom onset dates from Victoria, Australia.
The team investigated how using cluster-specific serial interval estimates affected downstream estimates of the time-dependent reproduction number (Rt). Validation was performed using simulated influenza-like epidemic data with a known SI distribution.
In Victoria, serial intervals were estimated in transmission clusters from the first (January 6 to April 14, 2020) and second (June 1 to October 28, 2020) COVID-19 waves. The ribonucleic acid (RNA) of SARS-CoV-2 was isolated from nasopharyngeal swabs and identified using reverse transcription-polymerase chain reaction (RT-PCR).
Phylogenetic reconstructions were done after sequencing data were mapped to the original SARS-CoV-2 Wuhan-Hu-1 strain reference sequence.
Results
Even though no information on contact between patients was required and the data was incompletely sampled, the COVID-19 SI estimates were comparable to those observed in extensive contact investigations.
The estimates carry more uncertainty than many previously published ones, although most of those were based on small populations with documented contact pairs and did not account for probable underreporting.
Clusters that occurred at locations linked with long encounters, such as elderly care and healthcare, showed larger SI values than clusters that occurred at sites frequented for shorter periods, such as meat processing or packing industries.
The findings demonstrated that genomic data may provide a high-resolution perspective of transmission on a broad scale, where collecting contact-tracing data may be prohibitively expensive or impractical. SI estimates were shorter for schools and meat processing and packaging businesses than for healthcare institutions.
Viral sequences provided a feasible strategy for inferring cluster-specific estimates, and the approach could be used in larger settings, even in the absence of precise contact tracing data. Pathogen sequencing data acquired from infected individuals cannot directly identify the infector and the infected, but they can provide a high-resolution perspective of transmission.
The technique performed well in estimating the mean SI, although uncertainty increased as the proportion of sampled cases decreased. The results for the SI standard deviation were similar.
The estimation approach is not confined to cases where genetic data are used to identify potential transmission pairs. If contact information is available, it could be used to construct a set of possible transmission networks and estimate the SI distribution.
The strategy proved effective in conditions with little transmission divergence and was also suitable in settings with longer serial intervals, as long as the sequences had enough diversity. Some clusters had larger or smaller sample sizes; such data might be used to monitor potential epidemics, particularly if the method is incorporated into real-time genomic surveillance.
Using cluster-specific SIs rather than literature-based SI estimates to calculate Rt increased Rt values by 2- to 3-fold, especially in the initial period of outbreaks.
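To see why the assumed SI matters for reproduction number estimates, the sketch below uses the standard relationship between the epidemic growth rate and the reproduction number for a gamma-distributed interval, R = (1 + r·σ²/μ)^(μ²/σ²). This is not the time-dependent Rt estimator used in the study, and it treats the SI as a stand-in for the generation interval. The means of 2.6 and 6.7 days are the ends of the range reported for the major clusters; the growth rate and standard deviation are hypothetical.

```python
def reproduction_number(growth_rate, si_mean, si_sd):
    """Reproduction number implied by an exponential growth rate (per day)
    and a gamma-distributed interval with the given mean and SD (days)."""
    shape = (si_mean / si_sd) ** 2   # gamma shape k = mean^2 / sd^2
    scale = si_sd ** 2 / si_mean     # gamma scale theta = sd^2 / mean
    return (1 + growth_rate * scale) ** shape

GROWTH_RATE = 0.1  # hypothetical daily growth rate
SI_SD = 2.0        # hypothetical SI standard deviation in days

for si_mean in (2.6, 6.7):
    r_value = reproduction_number(GROWTH_RATE, si_mean, SI_SD)
    print(f"mean SI {si_mean} days -> R = {r_value:.2f}")
```

For the same observed growth rate, a longer assumed SI implies a larger reproduction number, so swapping a literature-based SI for a cluster-specific one can shift the estimate substantially.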
During the first COVID-19 wave, 1,242 samples from 1,075 SARS-CoV-2-positive individuals were sequenced, accounting for 81% of SARS-CoV-2 infections identified in Victoria in the period. Of the 903 samples that passed quality control, 312 belonged to one of 10 first-wave genetic clusters with ≥15 cases.
During the second wave, 15,665 specimens from 14,075 individuals were sequenced, accounting for 84% of all cases found in Victoria. Of the 5,745 cases that passed quality control, 3,875 were recognized as being part of a cluster of ≥15 cases.
In the major clusters, the mean SI estimate ranged between 2.6 and 6.7 days. The mean serial interval was estimated to be five days using the ‘all cluster’ contact-cluster-specific estimate.
Overall, the study presented a novel technique that combines genetic data with epidemiological data to analyze transmission. It can be integrated into real-time public health monitoring to compare SARS-CoV-2 transmission across settings and to investigate genomically defined sampling networks, providing an intermediate regime for genomic epidemiology.