Epidemiological surveillance can now be conducted in close to real time, owing to current viral genomic sequencing efforts. The challenge is to identify, within viral sequences, the variants that could pose a threat.

*Study: Motifs in SARS-CoV-2 evolution. Image Credit: softpixel/Shutterstock*

A new study posted on the bioRxiv* preprint server puts forward a bottom-up approach to quickly predict whether a lineage will dominate the viral population at a particular geographical location.

Background

Genomic surveillance is crucial in the fight against rapidly mutating RNA viruses. This has been most recently observed in the context of the coronavirus disease 2019 (COVID-19) pandemic. When the Omicron variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) reduced the efficacy of mRNA vaccines, the rapid recognition of critical adaptations became particularly important.

Genomic surveillance uses next-generation sequencing and phylogenetic methods to identify phenotypically or antigenically different variants. It involves four key steps: lineage assignment, mutation extraction, biological analysis, and declaration. This aids in early anticipation and effective control of potential viral outbreaks.

Mutations in viral genomes do not always appear in an isolated manner. In fact, positions in a molecule that share common constraints do not evolve independently. Therefore, they leave a signature in patterns of homologous sequences. Extracting such signals fosters a deeper understanding of the impact of mutations on viral fitness. This could also lead to the early detection of emerging variants.

About the study

The relational structure of a viral population, that is, the pattern of linkage among mutating sites, evolves over time. Dramatic changes in the relational structure can be suggestive of important events, such as the emergence of a new and fitter lineage. The current study illustrates this concept with a case study that tracks the evolution of relational structures.

The case study examined the relational structure of the SARS-CoV-2 population in the UK from the third week of October 2020 to the fourth week of October 2021. During this time frame, the Alpha variant (lineage B.1.1.7) emerged and became the dominant strain. Soon after, it was replaced by the Delta variant (lineage B.1.617.2).

The novelty of the study lies in its focus on the linkage between sites rather than on the specific mutation patterns observed in the data. This leads to the concept of motifs and relational structures. This framework should pave the way to a more systematic development of relational structures for aligned data more generally.

Findings

The key idea is to study patterns of mutations that co-occur across sites in order to quantify the evolution of a multiple sequence alignment (MSA). The relational structure captures maximal sets of sites that are jointly shaped by selection pressure. Changes in the relational structure within the MSA provide key information on the viral “heartbeat.” The current study shows that selection pressure gives rise to motifs that appear as distinguished patterns within the MSA. Focusing on these motifs significantly reduces the amount of data, and identifying the sites involved aids subsequent biological analysis.
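To make the idea of co-occurring mutations concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes a toy alignment, a hypothetical reference sequence, and a simple Jaccard overlap as the measure of how often two sites are mutated in the same sequences. The function names `mutated_rows` and `co_mutation_scores` are made up for this illustration.

```python
from itertools import combinations


def mutated_rows(msa, reference, site):
    """Indices of the sequences that differ from the reference at `site`."""
    return {i for i, seq in enumerate(msa) if seq[site] != reference[site]}


def co_mutation_scores(msa, reference):
    """Jaccard overlap of mutated sequences for every pair of variable sites.

    A score of 1.0 means two sites are always mutated together;
    0.0 means they are never mutated in the same sequence.
    (Assumed toy measure, not the statistic used in the preprint.)
    """
    mutated = {s: mutated_rows(msa, reference, s) for s in range(len(reference))}
    variable = [s for s in mutated if mutated[s]]  # sites with at least one mutation
    scores = {}
    for a, b in combinations(variable, 2):
        union = mutated[a] | mutated[b]
        scores[(a, b)] = len(mutated[a] & mutated[b]) / len(union)
    return scores


# Toy alignment: sites 0 and 5 mutate together, site 2 mutates on its own.
reference = "ACGTAC"
msa = [
    "ACGTAC",
    "TCGTAG",
    "TCGTAG",
    "ACCTAC",
    "TCGTAG",
]
scores = co_mutation_scores(msa, reference)
```

In this toy alignment, sites 0 and 5 are always mutated in the same sequences, so they would be candidates for a shared motif, while site 2 varies independently of both.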

Lineage designation and phylogenetic analysis study the evolutionary closeness of genomic sequences. In contrast, the relational structure focuses on the interaction among sites in the MSA. The authors showed that combining lineage information and relational structure could help in gaining deeper insights into how a lineage emerges and develops. Coordinating the two was shown to yield meaningful statistical signals, which can help in making timely predictions.

The relational structure approach and lineage information are not always directly compatible. In the current study, they have been linked by the characteristic mutations of each lineage, which were obtained from the Pango database. It must be noted that characteristic mutations are not always “characteristic” enough, especially when one lineage is a sub-lineage of another, and the same mutation is sometimes used to characterize multiple lineages. To counter this challenge, the researchers selected and merged lineages so that the sequences were better partitioned.

Concluding remarks

Below the level of species, there is no universal system of viral taxonomy that performs satisfactorily. Previous studies have proposed a dynamic nomenclature approach for SARS-CoV-2, but this approach has the problem of over-designation. Over 2,500 lineage/sub-lineage names were assigned, and very few were considered to be critical from an epidemiological perspective. The authors hypothesized that by incorporating relational structures, researchers would be able to detect key lineages more accurately and in a timely fashion.

The authors hope that this approach can be generalized to elucidate genetic diversity within a population and designate (sub)lineages. The key will be to infer the hierarchical (sub)lineage structure. This could be achieved by using the relational structure to construct non-overlapping sub-samples and then recursively computing the relational structure within each sub-sample. This would then yield (sub)lineages for the given sub-population. In this manner, future researchers could recover the hierarchical structure between lineages without specialized human input.
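The recursive idea described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions, not the procedure from the preprint: a "motif" is taken here to be the largest group of sites mutated in exactly the same subset of sequences, the reference sequence and toy data are hypothetical, and the names `find_motif` and `build_hierarchy` are invented for the example.

```python
def find_motif(seqs, reference):
    """Return (sites, carriers) for the largest group of sites that are
    mutated in exactly the same proper, non-empty subset of sequences.
    Returns None when no site splits the sample."""
    groups = {}
    for site in range(len(reference)):
        carriers = frozenset(
            i for i, s in enumerate(seqs) if s[site] != reference[site]
        )
        if 0 < len(carriers) < len(seqs):  # the site must actually split the sample
            groups.setdefault(carriers, []).append(site)
    if not groups:
        return None
    carriers, sites = max(groups.items(), key=lambda kv: (len(kv[1]), len(kv[0])))
    return sites, carriers


def build_hierarchy(seqs, reference, name="root"):
    """Split the sample into motif carriers and non-carriers, then recurse."""
    node = {"name": name, "size": len(seqs), "children": []}
    motif = find_motif(seqs, reference)
    if motif is None:
        return node  # nothing left to split: this is a leaf
    sites, carriers = motif
    node["motif_sites"] = sites
    with_motif = [s for i, s in enumerate(seqs) if i in carriers]
    without_motif = [s for i, s in enumerate(seqs) if i not in carriers]
    node["children"] = [
        build_hierarchy(with_motif, reference, name + ".1"),
        build_hierarchy(without_motif, reference, name + ".2"),
    ]
    return node


# Toy data: sites 0-1 define one group, sites 2-3 a sub-group inside it.
reference = "AAAAAA"
seqs = ["AAAAAA", "TTAAAA", "TTAAAA", "TTGGAA", "AAAAAA"]
tree = build_hierarchy(seqs, reference)
```

Each split produces two non-empty, non-overlapping sub-samples, so the recursion always terminates. On the toy data, the root splits on sites 0 and 1, and the carrier group splits again on sites 2 and 3, giving a two-level hierarchy without any manual labeling.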

*Important notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.