Raw genotyped total called structural variant (SV)
Songsomboon, Kittikun et al. (2021), Raw genotyped total called structural variant (SV), Dryad, Dataset, https://doi.org/10.5061/dryad.9cnp5hqhc
Genomic structural mutations, especially deletions, are an important source of variation in many species and can play key roles in phenotypic diversification and evolution. Previous work in many plant species, including some crops, has identified multiple structural variations (SVs) in or near genes related to stress response and disease resistance, suggesting a possible role for SVs in local adaptation. Sorghum (Sorghum bicolor (L.) Moench) is one of the most widely grown cereal crops in the world. Over the course of its history it has been adapted to an array of different climates and bred for multiple purposes, resulting in striking phenotypic diversity within the existing germplasm. In this study, we identified genome-wide deletions in the Biomass Association Panel (BAP), a collection of 347 diverse sorghum genotypes collected from multiple countries and continents. Using Illumina short-read whole-genome resequencing data from every genotype, we found a total of 22,359 deletions after filtering. Deletion sizes ranged from 51 to 89,716 bp, with a median of 956 bp. The global site frequency spectrum of the deletions fit a model of neutral evolution, suggesting that most deletions were not under any type of selection. SNP-based clustering separated the genotypes into eight clusters that largely corresponded to their geographic origins. Even though most deletions appeared to be neutral, a handful of cluster-specific deletions were found in genes related to biotic (plant defense and bacterial resistance) and abiotic (drought and temperature) stress responses, supporting the possibility that at least some deletions contribute to local adaptation in sorghum.
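The neutrality claim above can be made concrete with a small sketch. Under the standard neutral model with constant population size, the expected number of segregating sites whose derived allele appears i times in a sample of n chromosomes is proportional to 1/i. The Python below tallies an unfolded site frequency spectrum (SFS) from per-site deletion-allele counts and scales the 1/i expectation to the observed number of segregating sites. It assumes that the deletion is the derived allele, that missing genotypes have been removed, and that `n_chrom` is the number of sampled chromosomes (for a largely selfing crop such as sorghum, counting each genotype as one chromosome is an equally valid assumption). The function names are hypothetical, and this is not necessarily how the authors fitted their neutral model.

```python
from typing import List, Sequence


def unfolded_sfs(site_alt_counts: Sequence[Sequence[int]], n_chrom: int) -> List[int]:
    """Return sfs where sfs[i] is the number of sites whose deletion allele appears i times.

    site_alt_counts holds one inner sequence per site with each individual's
    count of deletion alleles (e.g. 0/1/2 for diploid calls). Missing data is
    assumed to have been filtered out beforehand.
    """
    sfs = [0] * (n_chrom + 1)
    for site in site_alt_counts:
        sfs[sum(site)] += 1
    return sfs


def neutral_expectation(n_chrom: int, n_segregating: int) -> List[float]:
    """Expected counts for classes i = 1..n_chrom-1, proportional to 1/i (standard neutral model)."""
    weights = [1.0 / i for i in range(1, n_chrom)]
    total = sum(weights)
    return [n_segregating * w / total for w in weights]


def chi_square_statistic(observed: Sequence[int], expected: Sequence[float]) -> float:
    """Pearson chi-square distance between observed and expected SFS classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Toy data: 4 sites scored in 2 diploid individuals, so n_chrom = 4.
    sites = [[0, 1], [1, 1], [2, 1], [0, 2]]
    sfs = unfolded_sfs(sites, n_chrom=4)
    segregating = sfs[1:4]  # drop the monomorphic classes 0 and n_chrom
    expected = neutral_expectation(4, sum(segregating))
    print(sfs, [round(e, 2) for e in expected])
    print(round(chi_square_statistic(segregating, expected), 3))
```

The statistic is only a distance; turning it into a formal test would also require choosing degrees of freedom and accounting for demography, which this sketch does not attempt.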
The pipeline for calling SVs in the BAP was adapted from the svtools pipeline (Larson et al. 2019). Briefly, demultiplexed sequence reads in FASTQ format for each individual were aligned to version 3.0.1 of the BTx623 reference genome (downloaded from Phytozome v12.1.6: https://phytozome.jgi.doe.gov/pz/portal.html) using speedseq (Chiang et al. 2015). Structural variations were identified in each individual's aligned BAM file using LUMPY (Layer et al. 2014) with default parameters. The resulting 347 structural variation files were then sorted and merged with svtools (Larson et al. 2019). The svtools authors provide a full tutorial of this process at https://github.com/hall-lab/svtools/blob/master/Tutorial.md. The merged VCF was then used to genotype each individual at the variant positions, producing a fully genotyped VCF file for each individual. CNVnator (Abyzov et al. 2011) was run within svtools to annotate the called variants with copy number. Finally, svtools merged the genotyped and CNV-annotated VCF files and removed any redundant variants that were called by both programs.
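Removing redundant calls rests on deciding when two calls describe the same event. The following is a minimal sketch, assuming each call is reduced to a hypothetical `(chrom, start, end, quality)` tuple with half-open coordinates and using 50% reciprocal overlap as the redundancy criterion. Calls are visited from highest to lowest quality, and a call is kept only if it does not reach `min_ro` reciprocal overlap with an already kept call on the same chromosome. This is a simplified stand-in for illustration, not the logic svtools itself uses to prune variants.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, start, end, quality), half-open [start, end).
Call = Tuple[str, int, int, float]


def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Fraction of the shorter relative coverage shared by two intervals (0.0 if disjoint)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def prune_redundant(calls: List[Call], min_ro: float = 0.5) -> List[Call]:
    """Keep the highest-quality call from each group of mutually overlapping calls."""
    kept: List[Call] = []
    # Visit calls from best to worst quality so the best call in each group survives.
    for call in sorted(calls, key=lambda c: c[3], reverse=True):
        chrom, start, end, _ = call
        redundant = any(
            k[0] == chrom and reciprocal_overlap(start, end, k[1], k[2]) >= min_ro
            for k in kept
        )
        if not redundant:
            kept.append(call)
    # Return the survivors in genomic order.
    return sorted(kept, key=lambda c: (c[0], c[1]))
```

A greedy, quality-ordered pass keeps the sketch short; it is order dependent by design, so two calls of equal quality are resolved by their input order.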
The VCF file is the product of the LUMPY pipeline, integrated with the genotyping and CNV-detection steps, to generate a merged SV call set.
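Deletion sizes like those reported in the abstract can be recomputed from a VCF of this kind. The sketch below reads VCF text lines, keeps records whose INFO field contains `SVTYPE=DEL`, and takes the absolute value of `SVLEN` (LUMPY-style VCFs typically encode deletion lengths as negative numbers), falling back to `END - POS` when `SVLEN` is absent. It assumes a tab-delimited VCF with INFO in the eighth column; the helper names are hypothetical. Because this dataset is the raw call set, its summary may differ from the post-filtering figures (22,359 deletions, 51 to 89,716 bp).

```python
import gzip
import statistics
from typing import Dict, Iterable, List, TextIO, Union


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def parse_info(info_field: str) -> Dict[str, Union[str, bool]]:
    """Split an INFO column into a dict; flags such as IMPRECISE map to True."""
    info: Dict[str, Union[str, bool]] = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def deletion_sizes(vcf_lines: Iterable[str]) -> List[int]:
    """Return the length in bp of every SVTYPE=DEL record."""
    sizes: List[int] = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])
        if info.get("SVTYPE") != "DEL":
            continue
        if "SVLEN" in info:
            # Use the first value in case several ALT alleles are listed.
            sizes.append(abs(int(str(info["SVLEN"]).split(",")[0])))
        else:
            sizes.append(int(str(info["END"])) - int(fields[1]))
    return sizes


def summarize(sizes: List[int]) -> Dict[str, float]:
    """Count, range and median of deletion sizes."""
    return {
        "n": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "median": statistics.median(sizes),
    }
```

For a file on disk, wrap the call as `with open_vcf(path) as handle: print(summarize(deletion_sizes(handle)))`, where `path` points to a local copy of the dataset's VCF.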