Genomic diversity evaluation of Populus trichocarpa germplasm by screening 1,014 full genomes
Data files
Version of Dec 12, 2024 (58.88 GB total):
- README.md (971 B)
- Variant_calling_P_trichocarpa_1014.vcf.gz (58.88 GB)
Abstract
Forest trees may harbor naturally occurring exon disruptive variants (DVs) in their gene sequences, which can affect phenotypic traits of ecological and economic importance. However, the abundance and molecular regulation of these variants remain largely unexplored. Here, 24,420 DVs were identified by screening 1,014 Populus trichocarpa full genomes. Most DVs were heterozygous and had allele frequencies below 5% (only 26% of DVs had frequencies above 5%). Using trees grown in common gardens, DVs were assessed for their effect on gene expression in the developing xylem, revealing that the expression of genes carrying DVs can be significantly altered, particularly by homozygous DVs (27% to 38% of cases, depending on the common garden studied). DVs were further tested for correlations with 13 wood quality traits: among the 148 DV associations discovered, 15 correlated with more than one wood property, and six genes carried more than one wood-trait-associated DV in their coding sequences. Approximately one-third of the DVs correlated with wood property variation also showed significant gene expression variation, confirming their non-spurious impact. These findings offer potential avenues for the targeted introduction of homozygous mutations using tree biotechnology. Although the exact mechanisms by which DVs may directly influence wood formation remain to be unraveled, this study lays the groundwork for further investigation.
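The 5% frequency threshold and the heterozygous/homozygous distinction above can be made concrete with a short sketch. It computes a site's non-reference allele frequency from diploid VCF genotype (GT) strings and counts heterozygous and homozygous non-reference carriers. The function names, the treatment of any non-"0" allele as a variant allele, and the choice to skip missing calls are assumptions for illustration, not the authors' pipeline.

```python
from typing import Iterable, List, Tuple


def split_gt(gt: str) -> List[str]:
    """Split a VCF GT string such as '0/1' or '1|1' into allele indices."""
    return gt.replace("|", "/").split("/")


def alt_allele_frequency(genotypes: Iterable[str]) -> float:
    """Fraction of called alleles that are non-reference (index other than '0').

    Assumption: missing or partially missing calls ('./.', './1') are skipped.
    """
    alt = total = 0
    for gt in genotypes:
        alleles = split_gt(gt)
        if "." in alleles:
            continue
        total += len(alleles)
        alt += sum(1 for a in alleles if a != "0")
    return alt / total if total else 0.0


def zygosity_counts(genotypes: Iterable[str]) -> Tuple[int, int]:
    """Return (heterozygous, homozygous non-reference) carrier counts."""
    het = hom_alt = 0
    for gt in genotypes:
        alleles = split_gt(gt)
        if "." in alleles:
            continue
        if len(set(alleles)) > 1:
            het += 1
        elif alleles[0] != "0":
            hom_alt += 1
    return het, hom_alt


def is_rare(genotypes: Iterable[str], threshold: float = 0.05) -> bool:
    """True when the non-reference allele frequency is below `threshold` (5% by default)."""
    return alt_allele_frequency(genotypes) < threshold


# Example: 40 diploid individuals, two of them heterozygous carriers.
site = ["0/1", "0|1"] + ["0/0"] * 38
print(alt_allele_frequency(site))  # 2 / 80 = 0.025
print(zygosity_counts(site))       # (2, 0)
print(is_rare(site))               # True
```

With two heterozygous carriers among 40 trees, 2 of 80 alleles are non-reference, so the site falls under the 5% cut-off even though every carrier is heterozygous, which is the typical DV profile described above.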
README: Genotypic Dataset of a Large Populus trichocarpa Germplasm for Rare Variant Genetic Association Studies
https://doi.org/10.5061/dryad.g79cnp5wb
This genotypic dataset contains rare and common small genetic variants (1-3 bp) from 1,014 Populus trichocarpa individuals sampled across most of the species' range. The original genomes were downloaded from the JGI Genome Portal (https://genome.jgi.doe.gov/portal/).
The file (Variant_calling_P_trichocarpa_1014.vcf.gz) is a compressed VCF file. Genetic variants were identified using the Platypus and GATK HaplotypeCaller variant callers. The resulting VCF file was annotated with SnpEff using the P. trichocarpa reference genome v3.0.
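Because the VCF is about 59 GB compressed, it is easiest to stream it line by line rather than load it. The sketch below uses only the Python standard library to read the file and pull out the SnpEff `ANN` INFO field, whose comma-separated entries are pipe-delimited and begin with allele, annotation (effect), putative impact and gene name. It assumes the standard SnpEff `ANN` layout; the function names are hypothetical and the sketch does not reproduce the authors' processing.

```python
import gzip
from typing import Dict, Iterator, List, Tuple

# One SnpEff ANN entry: (allele, annotation/effect, putative impact, gene name)
Effect = Tuple[str, str, str, str]


def parse_ann(info: str) -> List[Effect]:
    """Extract SnpEff ANN entries from a VCF INFO column."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            effects = []
            for entry in item[len("ANN="):].split(","):
                parts = entry.split("|")
                if len(parts) >= 4:
                    effects.append((parts[0], parts[1], parts[2], parts[3]))
            return effects
    return []


def parse_record(line: str) -> Dict:
    """Parse the eight fixed VCF columns of one data line; genotype columns are ignored."""
    chrom, pos, _vid, ref, alt, _qual, filt, info = line.rstrip("\n").split("\t")[:8]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filter": filt, "effects": parse_ann(info)}


def iter_records(path: str) -> Iterator[Dict]:
    """Stream records from a gzip- or bgzip-compressed VCF without loading it into memory."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):  # skip meta-information and header lines
                yield parse_record(line)
```

For example, `[r for r in iter_records("Variant_calling_P_trichocarpa_1014.vcf.gz") if any(e[2] == "HIGH" for e in r["effects"])]` would collect records with at least one HIGH-impact annotation; that filter is only an illustration and is not the DV definition used in the study. For repeated or region-based queries, an indexed reader such as pysam's `VariantFile` is likely faster than a full scan.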
A complete description of how the dataset was produced is given in the associated publication (https://doi.org/10.3389/fgene.2019.01384).
Methods
A total of 1,014 pure Populus trichocarpa whole genomes were used to identify rare and common small genetic variants across individual genomes. Variant calls from the Platypus and HaplotypeCaller pipelines were compared, and strict quality filters were applied to improve variant identification. Only genetic variants identified by both variant callers were retained, to increase calling confidence. Based on these shared, stringently filtered variants, P. trichocarpa germplasm showed high genomic diversity, with 7.4 million small genetic variants. Notably, 377k of these variants (5% of the total) were non-synonymous.
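Retaining only variants found by both callers amounts to a set intersection over comparable variant keys. The sketch below keys each call on chromosome, position, REF and a single ALT allele, splitting multi-allelic ALT fields. It assumes both call sets were already normalized (for example, indels left-aligned, as `bcftools norm` can do) so that the same variant is written identically by Platypus and HaplotypeCaller. The helper names and toy coordinates are hypothetical; the authors' actual matching procedure is described in the cited publication.

```python
from typing import Iterable, Set, Tuple

# (chromosome, position, REF, ALT): one key per alternate allele
VariantKey = Tuple[str, int, str, str]


def variant_keys(chrom: str, pos, ref: str, alt: str) -> Set[VariantKey]:
    """Build comparison keys for one VCF record, splitting multi-allelic ALT fields."""
    return {(chrom, int(pos), ref.upper(), a.upper()) for a in alt.split(",")}


def shared_calls(calls_a: Iterable[VariantKey], calls_b: Iterable[VariantKey]) -> Set[VariantKey]:
    """Keep only variants reported identically by both callers."""
    return set(calls_a) & set(calls_b)


# Toy call sets (hypothetical coordinates).
platypus = variant_keys("Chr01", 100, "A", "G") | variant_keys("Chr01", 250, "AT", "A")
haplotypecaller = variant_keys("Chr01", "100", "a", "G,T") | variant_keys("Chr02", 7, "C", "T")
consensus = shared_calls(platypus, haplotypecaller)
print(sorted(consensus))  # [('Chr01', 100, 'A', 'G')]
```

Exact key matching is strict by design: a variant that one caller represents differently (for example, an unnormalized indel) is dropped, which trades some sensitivity for the higher confidence the consensus approach aims at. As a quick check of the reported proportion, 377,000 / 7,400,000 is about 0.051, i.e. roughly 5%.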