Bourgeois, Yann; Ruggiero, Robert; Hariyani, Imtiyaz; Boissinot, Stephane (2020), Disentangling the determinants of transposable elements dynamics in vertebrate genomes using empirical evidences and simulations, Dryad, Dataset, https://doi.org/10.5061/dryad.wpzgmsbjw
The interactions between transposable elements (TEs) and their hosts constitute one of the most profound co-evolutionary processes found in nature. The population dynamics of TEs depends on factors specific to each TE family, such as the rate of transposition and insertional preference, the demographic history of the host and the genomic landscape. How these factors interact has yet to be investigated holistically. Here we address this question in the green anole (Anolis carolinensis), whose genome contains an extraordinary diversity of TEs (including non-LTR retrotransposons, SINEs, LTR-retrotransposons and DNA transposons). We observe a positive correlation between recombination rate and TE frequencies and densities for LINEs, SINEs and DNA transposons. For these elements, there was a clear impact of demography on TE frequency and abundance, with a loss of polymorphic elements and skewed frequency spectra in recently expanded populations. On the other hand, some LTR-retrotransposons displayed patterns consistent with a very recent phase of intense amplification. To determine how demography, genomic features and intrinsic properties of TEs interact, we ran simulations using SLiM3. We determined that i) short TE insertions are not strongly counter-selected, but long ones are, ii) neutral demographic processes, linked selection and preferential insertion may explain positive correlations between average TE frequency and recombination, iii) TE insertions are unlikely to have been massively recruited in recent adaptation. We demonstrate that deterministic and stochastic processes have different effects on categories of TEs and that a combination of empirical analyses and simulations can disentangle these mechanisms.
TEs were called using MELT (https://melt.igs.umaryland.edu/), using the consensus sequences found in the file TE_consensus_for_MELT.fasta. Information about TE counts, lengths, and densities was extracted from the VCF files using VCFtools and BEDTools. A file with the average effective recombination rate divided by nucleotide diversity in 1 Mb windows is also provided. Because the effective recombination rate depends on the effective population size, it is directly correlated with local reductions of diversity due to linked selection. Thus, a low effective recombination rate may reflect either reduced diversity caused by linked selection or a reduced recombination rate (r). Since nucleotide diversity is an estimator of 4Nm, with m the mutation rate and N the local effective population size, this statistic can be seen as the ratio between r and the mutation rate m.
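The reasoning behind this ratio can be written out explicitly. The sketch below assumes the standard diploid scaling for the effective (population-scaled) recombination rate, ρ = 4Nr; the symbols ρ (effective recombination rate) and π (nucleotide diversity) are notation introduced here for illustration and are not used in the dataset files.

```latex
\rho = 4Nr, \qquad \pi \approx 4Nm
\quad\Longrightarrow\quad
\frac{\rho}{\pi} \approx \frac{4Nr}{4Nm} = \frac{r}{m}
```

Under these assumptions the local effective population size N cancels, so variation in the ratio across windows should mostly reflect variation in r relative to m rather than the local loss of diversity caused by linked selection.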
VCF files contain TE genotypes without missing data for the 29 individuals included in the study. The file Correspondance_individuals_VCF_clades.txt indicates the genetic cluster/species to which each individual belongs.
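Because the genotypes contain no missing data, insertion frequencies can be computed directly from the GT fields. The following is a minimal Python sketch, not the authors' pipeline: the embedded VCF excerpt, the sample names (ind1 to ind3), the positions and the ALT labels are all made up for illustration, and the function name insertion_frequencies is hypothetical. It assumes diploid, biallelic genotypes in which allele 1 denotes the TE insertion.

```python
import io

# Hypothetical VCF excerpt: sample names, positions and ALT labels are
# invented for illustration and do not come from the dataset.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3
chr1\t1000\t.\tN\t<INS:ME:L1>\t.\tPASS\tSVTYPE=L1\tGT\t0/1\t1/1\t0/0
chr1\t5000\t.\tN\t<INS:ME:SINE>\t.\tPASS\tSVTYPE=SINE\tGT\t0/0\t0/1\t0/0
"""


def insertion_frequencies(handle):
    """Yield (chrom, pos, alt, frequency) for each record of a VCF stream.

    Assumes diploid, biallelic genotypes with no missing calls, where
    allele "1" is the TE insertion.
    """
    for line in handle:
        # Skip blank lines, meta-information lines and the header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        # Locate the GT sub-field within the FORMAT column.
        gt_index = fields[8].split(":").index("GT")
        alleles = []
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            # Treat phased (|) and unphased (/) genotypes alike.
            alleles.extend(genotype.replace("|", "/").split("/"))
        frequency = sum(allele == "1" for allele in alleles) / len(alleles)
        yield chrom, pos, alt, frequency


results = list(insertion_frequencies(io.StringIO(EXAMPLE_VCF)))
for record in results:
    print(record)
```

To work on the real data, the StringIO object would be replaced by an open file handle on one of the provided VCF files. Per-cluster frequencies would additionally require restricting the sample columns to the individuals listed for each cluster in Correspondance_individuals_VCF_clades.txt, whose exact layout is not described here.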