Population analysis of retrotransposons in giraffe genomes supports RTE decline and widespread LINE1 activity in Giraffidae
Data files
May 27, 2022 version files 2.48 GB
- md5sums.txt
- Petersen_et_al_2021_data_package_1.tar.bz2
- Petersen_et_al_2021_data_package_2.tar.bz2
- Petersen_et_al_2021_data_package_3.tar.bz2
- Petersen_et_al_2021_data_package_4.tar.bz2
- README_Dryad.txt
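Since the package ships an md5sums.txt file, the downloaded archives can be checked for corruption before unpacking. The following is a minimal sketch, assuming md5sums.txt uses the usual `md5sum` layout (one "<hash>  <filename>" pair per line); the layout has not been verified against the actual file, and the function names are illustrative only.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sums(listing, base_dir="."):
    """Compare each file named in `listing` against its recorded checksum.

    Assumes (not confirmed for this data set) one '<hash>  <filename>' pair
    per line. Returns {filename: True/False}; missing files map to False.
    """
    results = {}
    for line in Path(listing).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum marks binary mode with '*'
        path = Path(base_dir) / name
        results[name] = path.is_file() and md5_of(path) == expected.lower()
    return results


# Example call, run from the download directory:
#   verify_md5sums("md5sums.txt")
```

Any archive reported as False should be downloaded again before extraction.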
Abstract
The majority of structural variation in genomes is caused by insertions of transposable elements (TEs). In mammalian genomes, the main TE fraction is made up of autonomous and non-autonomous non-LTR retrotransposons, commonly known as LINEs and SINEs (Long and Short Interspersed Nuclear Elements). Here we present one of the first population-level analyses of TE insertions in a non-model organism, the giraffe. Giraffes are ruminant artiodactyls, one of the few mammalian groups whose genomes are colonized by putatively active LINEs of two different clades of non-LTR retrotransposons, namely LINE1 and RTE/BovB, as well as their associated SINEs. We analyzed insertions of both LINE types and of their associated SINEs in three giraffe genome assemblies, as well as across a population-level sampling of 48 individuals covering all extant giraffe species.
Results: The comparative genome screen identified 139,525 recent LINE1 and RTE insertions in the sampled giraffe population. The analysis revealed drastically reduced RTE activity in giraffes, whereas LINE1 is still actively propagating in the genomes of extant (sub)species. In concert with the extremely low activity of the giraffe RTE, we also found that the RTE-dependent SINEs, namely Bov-tA and Bov-A2, have been virtually immobile in the last 2 million years. Despite the high current activity of the giraffe LINE1, we did not find evidence for currently active LINE1-dependent SINEs. TE insertion heterozygosity rates differ among the (sub)species, likely due to divergent population histories.
Conclusions: The horizontally transferred RTE/BovB and its derived SINEs appear to be close to inactivation and subsequent extinction in the genomes of extant giraffe species. This is the first time that the decline of a TE family has been meticulously analyzed from a population genetics perspective. Our study shows how detailed information about past and present TE activity can be obtained by analyzing large-scale population-level genomic data sets.
Methods
Sampling and sequencing
Whole-genome shotgun short-read sequencing data of 48 individuals covering all four giraffe species and seven subspecies, taken from Coimbra et al. (2021), were used for TE analysis with MELT (Gardner et al. 2017). The northern giraffe (G. camelopardalis) is represented by 15 individuals, including its three subspecies: the Nubian (G. c. camelopardalis), the Kordofan (G. c. antiquorum), and the West African giraffe (G. c. peralta). The reticulated giraffe (G. reticulata) is represented by ten individuals. The Masai giraffe sensu lato (G. tippelskirchi) is represented by 12 individuals, including its two subspecies: the Luangwa (G. t. thornicrofti) and the Masai giraffe sensu stricto (G. t. tippelskirchi). Finally, the southern giraffe (G. giraffa) is represented by 11 individuals from its two subspecies: the Angolan (G. g. angolensis) and the South African giraffe (G. g. giraffa). The read FASTQ files are available from SRA under BioProject PRJNA635165 (Coimbra et al. 2021).
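The sampling breakdown above can be tallied as a quick consistency check. The sketch below only restates the per-species counts and subspecies names given in the text; the dictionary layout and variable names are illustrative, and per-subspecies sample sizes are not given in this description.

```python
# Per-species sample sizes and subspecies as listed in the sampling description.
sampling = {
    "G. camelopardalis (northern)": {
        "individuals": 15,
        "subspecies": ["G. c. camelopardalis", "G. c. antiquorum", "G. c. peralta"],
    },
    "G. reticulata (reticulated)": {
        "individuals": 10,
        "subspecies": [],  # no subspecies listed in the text
    },
    "G. tippelskirchi (Masai s. l.)": {
        "individuals": 12,
        "subspecies": ["G. t. thornicrofti", "G. t. tippelskirchi"],
    },
    "G. giraffa (southern)": {
        "individuals": 11,
        "subspecies": ["G. g. angolensis", "G. g. giraffa"],
    },
}

total_individuals = sum(entry["individuals"] for entry in sampling.values())
total_subspecies = sum(len(entry["subspecies"]) for entry in sampling.values())

print(len(sampling), "species,", total_subspecies, "subspecies,",
      total_individuals, "individuals")
```

The totals agree with the four species, seven subspecies and 48 individuals stated above.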
Quality control of short reads:
- FastQC version 0.11.7 (www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- Trimmomatic version 0.38 with options ‘ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10’, ‘SLIDINGWINDOW:4:20’, and ‘MINLEN:40’
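As an illustration of how the listed Trimmomatic options fit into a paired-end run, the sketch below assembles a command line. Only the three trimming steps come from the description above; the jar name, the input and output file names and the helper function are hypothetical placeholders, and any further options used in the original run (such as thread count) are not known.

```python
def trimmomatic_pe_command(reads_1, reads_2, prefix, jar="trimmomatic-0.38.jar"):
    """Build a Trimmomatic paired-end command as an argument list.

    File names and the jar path are placeholders; the trimming steps are the
    ones listed in the quality-control section.
    """
    return [
        "java", "-jar", jar, "PE",
        reads_1, reads_2,
        # Trimmomatic PE writes paired and unpaired output for each mate.
        f"{prefix}_1P.fq.gz", f"{prefix}_1U.fq.gz",
        f"{prefix}_2P.fq.gz", f"{prefix}_2U.fq.gz",
        "ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10",  # adapter clipping
        "SLIDINGWINDOW:4:20",  # cut where mean quality in a 4-base window drops below 20
        "MINLEN:40",           # drop reads shorter than 40 bp after trimming
    ]


cmd = trimmomatic_pe_command("sample_1.fq.gz", "sample_2.fq.gz", "sample_trimmed")
print(" ".join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.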
Mapping:
- Reference genome: Kordofan giraffe (accession: ASM1828223v1)
- BWA-MEM version 0.7.17-r1188
- BAM sorting with Samtools version 1.9
- Duplicate marking with the MarkDuplicates tool from Picard version 2.18.21
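The mapping steps listed above can be sketched as two shell commands: alignment piped into coordinate sorting, followed by duplicate marking. This is a minimal outline under assumptions, not the authors' exact invocation: file names, the Picard jar path and the helper function are hypothetical, and options such as read groups or thread counts are omitted because they are not stated here. The reference is assumed to have been indexed beforehand with `bwa index`.

```python
import shlex


def mapping_commands(reference, reads_1, reads_2, sample, picard_jar="picard.jar"):
    """Return shell command strings for alignment + sorting and duplicate marking.

    All paths are placeholders; only the tools themselves come from the text.
    """
    q = shlex.quote
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    # BWA-MEM writes SAM to stdout; samtools sort reads it from stdin ('-').
    align_sort = (
        f"bwa mem {q(reference)} {q(reads_1)} {q(reads_2)} "
        f"| samtools sort -o {q(sorted_bam)} -"
    )
    # Picard 2.18.x accepts the KEY=value argument style used here.
    mark_dups = (
        f"java -jar {q(picard_jar)} MarkDuplicates "
        f"I={q(sorted_bam)} O={q(dedup_bam)} M={q(metrics)}"
    )
    return [align_sort, mark_dups]


for command in mapping_commands("reference.fa", "s_1P.fq.gz", "s_2P.fq.gz", "s"):
    print(command)
```

Returning plain command strings keeps the sketch easy to inspect or paste into a job script; each string could also be run with `subprocess.run(command, shell=True, check=True)`.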
The mapped BAM files from the 48 giraffe individuals have a mean coverage of 19.5X (range 7-31X) and a mean insert size of 310 bp (range 247-515 bp).
TE annotation in genome assemblies of Kordofan giraffe, okapi and cattle:
- RepeatMasker with curated giraffe-specific repeat library
- Annotation tracks and consensus sequences for LINE1v3 and RTEv3 used to create MEI files for MELT (see below)
TE insertion calling and analysis:
- MELT (Gardner et al. 2017); helper scripts are available in the GitLab repository
- Filtering and downstream analysis: RMarkdown document; all parameters and auxiliary files are also available in the GitLab repository
Usage notes
Refer to the included README file for usage instructions.