In a recent study published in Frontiers in Plant Science, researchers presented the assembly of the chia reference genome.

*Study: Reference genome of the nutrition-rich orphan crop chia (Salvia hispanica) and its implications for future breeding. Image Credit: New Africa/Shutterstock.com*

Background

Chia, a nutrient-rich food crop primarily grown in Southern Mexico and Central America, is crucial for long-term food and nutrition security. Global crop improvement programs have increased grain production and saved many lives, but hidden hunger remains a significant issue. Diversifying human diets with produce from nutrient-dense minor crops and orphan crops grown in marginalized areas is essential to ensure long-term food and nutrition security.

The emphasis on these crops has raised global demand, broadened their consumer base, and made them valuable in mitigating the threats of climate change. Developing genetic resources for these underutilized crops could improve their production and sustainability.

About the study

In the present study, researchers assembled and analyzed the chia reference genome.

The research involved genome sequencing, transcriptomic analysis of metabolic genes (rosmarinic acid production, seed mucilage synthesis, and fatty acid metabolism), and the discovery of useful genetic markers for crop improvement. Chia seeds of second-generation inbred varieties were grown in eight-inch-wide containers with autoclaved soil and carefully watered in a controlled greenhouse environment.

Young leaves were collected from 14-day-old seedlings that had been kept in the dark for two days, frozen in liquid nitrogen, and shipped for genomic deoxyribonucleic acid (DNA) extraction, sequencing, and assembly. The team created two Dovetail HiC libraries and a Chicago HiRise DNA sequencing library for genome scaffolding. For the de novo assembly, they used 2x150 bp paired-end reads obtained by shotgun sequencing. The initial data set included 956 million read pairs from the paired-end libraries.

The team predicted repeats de novo and combined six plant libraries with the newly identified repeats. They performed gene model prediction using protein datasets from five species and four Lamiaceae plants. For this prediction, the researchers used a trained dataset with external hints generated from previously published ribonucleic acid sequencing (RNA-seq) analyses of 13 tissues.

The team analyzed the chia proteome in silico for biopeptide signatures that can positively impact human health. They used a library of curated biopeptides as a probe to identify similar sequence signatures in chia proteins. The HiRise pipeline was used for genome assembly and scaffolding improvements. The team also predicted the subcellular locations of proteins encoded by the chia genome and compared recently published S. hispanica genome sequences with their own chia genome assembly and gene mappings. In addition, the researchers created highly accurate splice site classifiers to filter splice junctions in RNA-seq read alignments.
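The probe-style search described above can be pictured with a minimal sketch. It assumes exact substring matching, which may be simpler than the similarity search the study actually used, and every peptide and protein sequence below is a hypothetical placeholder rather than data from the paper.

```python
# Minimal sketch: scan protein sequences for exact occurrences of short
# peptide motifs. All names and sequences are hypothetical placeholders;
# the study's curated library and matching criteria are not reproduced here.

BIOPEPTIDES = {
    "peptide_A": "VPP",
    "peptide_B": "IPP",
    "peptide_C": "LKP",
}

PROTEINS = {
    "protein_1": "MAVPPLKSTIPPGA",
    "protein_2": "MSTLKPQQVPPVPP",
    "protein_3": "MGGSSAA",
}


def find_signatures(proteins, peptides):
    """Return {protein_id: [(peptide_id, start), ...]} with 0-based starts.

    Proteins without any match are left out of the result.
    """
    hits = {}
    for prot_id, seq in proteins.items():
        found = []
        for pep_id, pep in peptides.items():
            start = seq.find(pep)
            while start != -1:
                found.append((pep_id, start))
                # Step forward by one so overlapping matches are also reported.
                start = seq.find(pep, start + 1)
        if found:
            hits[prot_id] = sorted(found, key=lambda hit: hit[1])
    return hits


if __name__ == "__main__":
    for prot_id, found in find_signatures(PROTEINS, BIOPEPTIDES).items():
        print(prot_id, found)
```

A real screen would typically read the proteome from a FASTA file and might tolerate mismatches; exact matching is used here only to keep the idea visible.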

Results

The chia genome assembly spanned 304 Mb and contained 48,090 protein-coding genes. The analysis showed that 42.0% of the genome consisted of repetitive sequences and identified three million single nucleotide polymorphisms (SNPs) along with 15,380 simple sequence repeat (SSR) regions. The haploid chia genome had an estimated size of 356 Mb. The HiRise scaffolding produced 304 Mb (85%) of this expected genome size, with 2,185 scaffolds and an estimated physical coverage of 2,692x.
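The reported figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this article; the assumption that marker counts are spread over the 304 Mb assembly, and the function names, are illustrative rather than taken from the paper.

```python
# Back-of-the-envelope check of the assembly statistics quoted above.
# Assumption: SNP and SSR counts are expressed relative to the 304 Mb assembly.

EXPECTED_GENOME_MB = 356.0   # estimated haploid genome size
ASSEMBLED_MB = 304.0         # total length after HiRise scaffolding
SNP_COUNT = 3_000_000        # "three million" SNPs, as rounded in the article
SSR_COUNT = 15_380


def assembled_fraction(assembled_mb, expected_mb):
    """Share of the expected genome size captured by the assembly."""
    return assembled_mb / expected_mb


def density_per_mb(count, assembled_mb):
    """Average number of features per megabase of assembled sequence."""
    return count / assembled_mb


if __name__ == "__main__":
    frac = assembled_fraction(ASSEMBLED_MB, EXPECTED_GENOME_MB)
    print(f"Assembled fraction: {frac:.1%}")
    print(f"SNPs per Mb: {density_per_mb(SNP_COUNT, ASSEMBLED_MB):.0f}")
    print(f"SSRs per Mb: {density_per_mb(SSR_COUNT, ASSEMBLED_MB):.1f}")
```

The assembled fraction comes out at roughly 85%, matching the figure reported for the HiRise scaffolding.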

The sequenced genome included 299 Mb of scaffolds representing haploid chromosomes or pseudomolecules. When recently published transcriptome atlas data from 13 tissue samples were mapped onto the six largest scaffolds, 99.0% of the de novo generated transcripts aligned to them. The findings indicated that the six scaffolds span nearly all of the transcribed regions and correspond to haploid chromosomes. The genome assembly was repeat-masked after its repeat content was identified, with repeats making up 42% of the chia genome. The most prevalent repeat sequences (99.6 Mb) were not classified, indicating they were not found in public databases.
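To put these proportions side by side, here is a short sketch that derives them from the quoted values. It assumes the 42% repeat figure is measured against the 304 Mb assembly, which the article does not state explicitly, so the derived numbers are approximate.

```python
# Rough breakdown of pseudomolecule and repeat content from the quoted values.
# Assumption: the 42% repeat fraction applies to the 304 Mb assembly.

ASSEMBLY_MB = 304.0
PSEUDOMOLECULE_MB = 299.0
REPEAT_FRACTION = 0.42
UNCLASSIFIED_REPEAT_MB = 99.6


def summarize(assembly_mb, pseudo_mb, repeat_fraction, unclassified_mb):
    """Return derived shares as a dict of floats."""
    repeat_mb = repeat_fraction * assembly_mb
    return {
        "pseudomolecule_share": pseudo_mb / assembly_mb,
        "repeat_mb": repeat_mb,
        "unclassified_share_of_repeats": unclassified_mb / repeat_mb,
    }


if __name__ == "__main__":
    stats = summarize(
        ASSEMBLY_MB, PSEUDOMOLECULE_MB, REPEAT_FRACTION, UNCLASSIFIED_REPEAT_MB
    )
    print(f"Pseudomolecules: {stats['pseudomolecule_share']:.1%} of the assembly")
    print(f"Repeats: about {stats['repeat_mb']:.1f} Mb")
    print(f"Unclassified: {stats['unclassified_share_of_repeats']:.1%} of repeats")
```

Under that assumption, the six pseudomolecules hold about 98% of the assembled sequence, and unclassified elements account for roughly three-quarters of the repeat content.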

For gene model prediction and downstream analysis, the researchers used only the six pseudomolecules (Sh1-6). To generate non-redundant and comprehensive gene models, 48,743 protein-coding genes were filtered using gene filtering, analysis, and conversion (gFACs). The chia genome had 799 transfer ribonucleic acid (tRNA) genes, 30% and 70% more than tomato and Arabidopsis, respectively. The ribosomal RNA (rRNA) annotation identified 37 rRNA genes in the genome, of which only ten were located on the pseudochromosomes. The team identified 98 lectin family homologs in chia based on sequence similarity to Arabidopsis lectin family members.

Based on the study findings, the reference genome of the nutrition-rich orphan crop chia (Salvia hispanica) provides nearly complete coverage of the gene space and contributes to genomic data resources. The 304 Mb genome assembly comprises 2,185 scaffolds covering 94% of the gene space and 48,090 protein-coding genes. The team proposes consistent naming of chia chromosomes and a reference genome nomenclature based on chromosome numbers and gene locations in pseudochromosomes. Harmonizing genome and gene nomenclature is a high priority.