Data from: Differential effect of selection against LINE retrotransposons among vertebrates inferred from whole-genome data and demographic modeling
Data files
May 28, 2018 version; files: 1.95 GB
Abstract
Variation in LINE composition is one of the major determinants of the substantial size and structural differences among vertebrate genomes. In particular, the larger genomes of mammals are characterized by hundreds of thousands of copies from a single LINE clade, L1, whereas nonmammalian vertebrates possess a much greater diversity of LINEs, yet with copy numbers that are orders of magnitude lower. It has been proposed that such variation in copy number among vertebrates is due to a differential effect of LINE insertions on host fitness. To investigate LINE selection, we deployed a framework of demographic modeling, coalescent simulations, and probabilistic inference against population-level whole-genome data sets for four model species: one population each of threespine stickleback, green anole, and house mouse, as well as three human populations. Specifically, we inferred a null demographic background utilizing SNP data, which was then exploited to simulate a putative null distribution of summary statistics that was compared with LINE data. Subsequently, we applied the inferred null demographic model with an additional exponential size change parameter, coupled with model selection, to test for neutrality as well as estimate the strength of either negative or positive selection. We found a robust signal for purifying selection in anole and mouse, but a lack of clear evidence for selection in stickleback and human. Overall, we demonstrated LINE insertion dynamics that are not in accordance with a mammalian versus nonmammalian dichotomy; instead, the degree of existing LINE activity together with host-specific demographic history may be the main determinants of LINE abundance.