Combined analysis of transposable elements and structural variation in maize genomes reveals genome contraction outpaces expansion
Munasinghe, Manisha et al. (2023), Combined analysis of transposable elements and structural variation in maize genomes reveals genome contraction outpaces expansion, Dryad, Dataset, https://doi.org/10.5061/dryad.5qfttdz9t
Background – Structural differences between genomes are a major source of genetic variation that contributes to phenotypic differences. Transposable elements, mobile genetic sequences capable of increasing their copy number and propagating themselves within genomes, can generate structural variation. However, their repetitive nature makes it difficult to characterize fine-scale differences in their presence at specific positions, limiting our understanding of their impact on genome variation. Domesticated maize is a particularly good system for exploring the impact of transposable element proliferation because over 70% of the genome is annotated as transposable elements. High-quality transposable element annotations were recently generated for de novo genome assemblies of 26 diverse inbred maize lines.
Results – We generated base-pair resolved pairwise alignments between the B73 maize reference genome and the remaining 25 inbred maize line assemblies. From these data, we classified transposable elements as either shared or polymorphic in a given pairwise comparison. Our analysis uncovered substantial structural variation between lines, representing both putative insertion and deletion events. Putative insertions in SNP-depleted regions, which represent recently diverged identity-by-state blocks, suggest some TE families may still be active. However, our analysis reveals that, genome-wide, deletions of transposable elements account for more structural variation than insertions. These deletions are often large structural variants containing multiple transposable elements.
Conclusions – Combined, our results highlight how transposable elements contribute to structural variation and demonstrate that deletion events are a major contributor to genomic differences.
High-quality genome assemblies for the 26 Nested Association Mapping (NAM) inbred founder lines were downloaded from MaizeGDB (https://www.maizegdb.org/genome). AnchorWave v1.0.1 was used to align each of the other 25 NAM inbred genomes to the B73 reference genome (itself one of the NAM founder lines) using the 'genoAli' command with the '-IV' parameter, yielding 25 pairwise whole-genome alignments. The MAFToGVCF plugin of TASSEL v5.2.82 was used to convert the genome alignments from MAF format into variant calling records in GVCF format. Both the MAF and GVCF files are provided here.
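For readers who want to inspect the MAF files directly, the following is a minimal Python sketch of a reader for the standard MAF layout ("a" lines open an alignment block, "s" lines carry source name, start, size, strand, source length, and gapped sequence). It is not part of the deposited scripts. The sequence names "B73.chr1" and "Oh43.chr1" and all coordinates in the example are hypothetical placeholders; the names used in the deposited files may differ.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class MafSeq:
    """One 's' line of a MAF alignment block."""
    src: str        # sequence name
    start: int      # 0-based start on the given strand
    size: int       # number of non-gap bases in this block
    strand: str     # '+' or '-'
    src_size: int   # total length of the source sequence
    text: str       # aligned sequence, '-' marks gaps

    @property
    def end(self) -> int:
        """Exclusive end coordinate on the given strand."""
        return self.start + self.size


def read_maf_blocks(handle: TextIO) -> Iterator[List[MafSeq]]:
    """Yield each alignment block as a list of its 's' lines."""
    block: List[MafSeq] = []
    for raw in handle:
        fields = raw.split()
        # A blank line or a new 'a' line closes the current block.
        if not fields or fields[0] == "a":
            if block:
                yield block
            block = []
        elif fields[0] == "s":
            block.append(MafSeq(fields[1], int(fields[2]), int(fields[3]),
                                fields[4], int(fields[5]), fields[6]))
        # Header ('##maf') and any other line types are ignored here.
    if block:
        yield block


# Hypothetical two-sequence block; names and coordinates are invented.
EXAMPLE_MAF = """##maf version=1
a score=100
s B73.chr1  10  8 + 1000 ACGT--ACGT
s Oh43.chr1 20 10 + 1200 ACGTTTACGT
"""

blocks = list(read_maf_blocks(io.StringIO(EXAMPLE_MAF)))
ref, query = blocks[0]
# Gap columns in the reference row indicate sequence present only in the query.
ref_gap_columns = ref.text.count("-")
```

In this toy block the reference row contains two gap columns, which is how sequence present in the query but absent from the reference appears in a pairwise MAF block.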
TE annotations, gene annotations, and gene synteny calls were obtained from MaizeGDB: TE and gene annotations were downloaded from https://maizegdb.org/NAM_project, and synteny classifications for the NAM genes were downloaded from https://ars-usda.app.box.com/v/maizegdb-public/folder/186350887665.
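As a rough illustration of how TE annotations and pairwise alignments can be combined to label an element as shared or polymorphic, the sketch below measures what fraction of a TE interval is covered by aligned blocks. This is a simplified stand-in, not the authors' procedure (their scripts are in the GitHub repository listed below). The function names, the half-open coordinate convention, and the 0.95 and 0.05 cutoffs are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(te: Interval, aligned: List[Interval]) -> float:
    """Fraction of the TE's bases that fall inside aligned blocks."""
    te_start, te_end = te
    covered = 0
    for start, end in merge_intervals(aligned):
        covered += max(0, min(te_end, end) - max(te_start, start))
    return covered / (te_end - te_start)


def classify_te(te: Interval, aligned: List[Interval],
                min_shared: float = 0.95,        # hypothetical cutoff
                max_polymorphic: float = 0.05    # hypothetical cutoff
                ) -> str:
    """Label a TE by how much of it aligns to the other genome."""
    frac = covered_fraction(te, aligned)
    if frac >= min_shared:
        return "shared"
    if frac <= max_polymorphic:
        return "polymorphic"
    return "ambiguous"


# Invented aligned blocks and TE coordinates, for illustration only.
aligned_blocks = [(0, 100), (90, 200), (500, 600)]
labels = [classify_te(te, aligned_blocks)
          for te in [(50, 150), (250, 450), (150, 250)]]
```

Keeping an explicit "ambiguous" class avoids forcing partially aligned elements into either category; where exactly to draw those lines is a design choice that depends on the analysis.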
Scripts used to filter publicly available datasets and to generate new data can be found on GitHub at https://github.com/mam737/PolymorphicTEs_NAM along with a README detailing what each script does.
National Science Foundation, Award: IOS-2010908
National Science Foundation, Award: IOS-2109697
National Science Foundation, Award: IOS-1907343
National Science Foundation, Award: IOS-1934384