Data from: Patterns of linkage disequilibrium and long range hitchhiking in evolving experimental Drosophila melanogaster populations
Data files
Nov 11, 2015 version files 1.87 GB
- Dmel_haplotype_fasta_base.zip
- Dmel_haplotype_fasta_F67.zip
- scripts.zip
Dec 07, 2016 version files 1.93 GB
- AllDmelCagePops_F0toF59_RepIndMasked_pub_EU.sync.gz
- Dmel_haplotype_fasta_base.zip
- Dmel_haplotype_fasta_F67.zip
- scripts.zip
Abstract
Whole genome re-sequencing of experimental populations evolving under a specific selection regime has become a popular approach to determine genotype-phenotype maps and understand adaptation to new environments. Despite its conceptual appeal and success in identifying some causative genes, it has become apparent that many studies suffer from an excess of candidate loci. Several explanations have been proposed for this phenomenon, but it is clear that information about the linkage structure during such experiments is needed. Until now only Pool-Seq data were available, which do not provide sufficient information about the correlation between linked sites. We address this problem in two complementary analyses of three replicate D. melanogaster populations evolving to a new hot temperature environment for almost 70 generations. In the first analysis, we sequenced 58 haploid genomes from the founder population and evolved flies at generation 67. We show that during the experiment LD increased almost uniformly over much greater distances than typically seen in Drosophila. In the second analysis, Pool-Seq time series data of the three replicates were combined with haplotype information from the founder population to follow blocks of initial haplotypes over time. We identified 17 selected haplotype-blocks that started at low frequencies in the base population and increased in frequency during the experiment. The size of these haplotype-blocks ranged from 0.082 to 4.095 Mb. Moreover, between 42 and 46% of the top candidate SNPs from the comparison of founder and evolved populations fell into the genomic region covered by the haplotype-blocks. We conclude that LD in such rising haplotype-blocks results in long range hitchhiking over multiple kb sized regions. LD in such haplotype-blocks is therefore a major factor contributing to an excess of candidate loci.
While modifications of the experimental design may help to reduce the hitchhiking effect and allow for more precise mapping of causative variants, we also note that such haplotype-blocks might be well suited to study the dynamics of selected genomic regions during experimental evolution studies.
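The abstract describes measuring LD from sequenced haploid genomes. As a rough illustration of what such a measurement involves, the sketch below computes the common pairwise statistic r² between two biallelic sites from a set of aligned haploid sequences. It is not taken from the deposited scripts.zip: the function name `r_squared`, the toy sequences, and the choice to skip any haplotype that is not an uppercase A/C/G/T call at either site are all assumptions made for this example, and the study's own LD statistic and filtering may differ.

```python
def r_squared(haplotypes, i, j):
    """Return r^2 between sites i and j, or None if it is undefined.

    haplotypes: list of equal-length strings, one per haploid genome.
    Assumption: any character outside uppercase A/C/G/T (e.g. N or a gap)
    is missing data, and such haplotypes are skipped for this pair of sites.
    """
    called = [(h[i], h[j]) for h in haplotypes
              if h[i] in "ACGT" and h[j] in "ACGT"]
    n = len(called)
    if n == 0:
        return None

    # Use the alleles of the first called haplotype as the reference pair.
    # For biallelic sites r^2 does not depend on which allele is chosen.
    a, b = called[0]
    p_a = sum(x == a for x, _ in called) / n
    p_b = sum(y == b for _, y in called) / n
    p_ab = sum(x == a and y == b for x, y in called) / n

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # at least one site is monomorphic among called haplotypes

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom


# Toy data (hypothetical, not from the deposited FASTA files).
toy = ["AC", "AC", "GT", "GT"]
print(r_squared(toy, 0, 1))
```

Applied to all site pairs within a distance window, separately for the founder and generation 67 haplotypes, a function of this kind would give the sort of LD-versus-distance comparison the abstract refers to.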