Data from: Patterns of linkage disequilibrium and long range hitchhiking in evolving experimental Drosophila melanogaster populations
Franssen, Susanne U., University of Veterinary Medicine Vienna
Nolte, Viola, University of Veterinary Medicine Vienna
Tobler, Ray, University of Veterinary Medicine Vienna
Schlötterer, Christian, University of Veterinary Medicine Vienna
Published Dec 07, 2016 on Dryad.
Cite this dataset
Franssen, Susanne U.; Nolte, Viola; Tobler, Ray; Schlötterer, Christian (2016). Data from: Patterns of linkage disequilibrium and long range hitchhiking in evolving experimental Drosophila melanogaster populations [Dataset]. Dryad. https://doi.org/10.5061/dryad.403b2
Whole genome re-sequencing of experimental populations evolving under a specific selection regime has become a popular approach to determine genotype-phenotype maps and understand adaptation to new environments. Despite its conceptual appeal and success in identifying some causative genes, it has become apparent that many studies suffer from an excess of candidate loci. Several explanations have been proposed for this phenomenon, but it is clear that information about the linkage structure during such experiments is needed. Until now, only Pool-Seq data were available, which do not provide sufficient information about the correlation between linked sites. We address this problem in two complementary analyses of three replicate D. melanogaster populations adapting to a new hot temperature environment for almost 70 generations. In the first analysis, we sequenced 58 haploid genomes from the founder population and from evolved flies at generation 67. We show that during the experiment LD increased almost uniformly over much greater distances than typically seen in Drosophila. In the second analysis, Pool-Seq time series data of the three replicates were combined with haplotype information from the founder population to follow blocks of initial haplotypes over time. We identified 17 selected haplotype-blocks that started at low frequencies in the base population and increased in frequency during the experiment. The size of these haplotype-blocks ranged from 0.082 to 4.095 Mb. Moreover, between 42% and 46% of the top candidate SNPs from the comparison of founder and evolved populations fell into the genomic region covered by the haplotype-blocks. We conclude that LD in such rising haplotype-blocks results in long range hitchhiking over multiple kb sized regions. LD in such haplotype-blocks is therefore a major factor contributing to an excess of candidate loci.
While modifications of the experimental design may help to reduce the hitchhiking effect and allow for more precise mapping of causative variants, we also note that such haplotype-blocks might be well suited to study the dynamics of selected genomic regions during experimental evolution studies.
This zip archive includes two Python scripts referenced in the associated publication. Their functionality includes: 1) estimation of short and long range LD from haplotype data, and 2) identification of haplotype-blocks through singleton markers. Each script comes with an associated README file, parameter specifications and example data. See Material and Methods and the description in the Supplemental Material for more details.
This zip archive contains FASTA files of 29 haploid D. melanogaster genomes associated with the experimental evolution experiment in the referenced publication. They correspond to 29 haploid genomes from the experimental starting population. Provided are the major chromosomal arms X, 2L, 2R, 3L and 3R. See Material and Methods for more details.
This zip archive contains FASTA files of 29 haploid D. melanogaster genomes associated with the experimental evolution experiment in the referenced publication. They correspond to 29 haploid genomes from a (hot) evolved population at generation 67 (2nd replicate). Provided are the major chromosomal arms X, 2L, 2R, 3L and 3R. See Material and Methods for more details.
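For orientation, pairwise LD between two biallelic sites in a set of haploid genomes is commonly summarised as r². The following Python sketch is illustrative only and is not the script shipped in the archive; the function name `r_squared` and the input layout (one allele character per haplotype, in the same haplotype order for both sites) are assumptions made for this example.

```python
def r_squared(site_a, site_b):
    """Return r^2 between two biallelic sites, or None if undefined.

    site_a, site_b: equal-length strings holding one allele per haplotype
    (hypothetical layout, e.g. one alignment column across 29 genomes).
    """
    if len(site_a) != len(site_b):
        raise ValueError("both sites must cover the same haplotypes")

    # Keep only haplotypes with a called base at both sites.
    pairs = [(a, b) for a, b in zip(site_a, site_b)
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        return None

    alleles_a = sorted({a for a, _ in pairs})
    alleles_b = sorted({b for _, b in pairs})
    # r^2 is only defined here when both sites are strictly biallelic.
    if len(alleles_a) != 2 or len(alleles_b) != 2:
        return None

    a1, b1 = alleles_a[0], alleles_b[0]
    p_a = sum(a == a1 for a, _ in pairs) / n              # frequency of a1
    p_b = sum(b == b1 for _, b in pairs) / n              # frequency of b1
    p_ab = sum(a == a1 and b == b1 for a, b in pairs) / n  # haplotype a1-b1

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
```

Calling `r_squared("AAAATTTT", "CCCCGGGG")` describes two sites whose alleles always co-occur, while `r_squared("AATTAATT", "CCCCGGGG")` describes two sites whose alleles are statistically independent in the sample.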
Sync file for 3 time points for 3 replicates
Sync file of allele coverage counts for 1,453,774 called SNPs in the euchromatin of the *D. melanogaster* genome. For the format specification see Kofler et al., PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 2011;27:3435-3436. A header line has been added to the sync file; it specifies the time points, replicates and treatment abbreviations (FX: time point at generation X, RY: replicate number Y, H: hot treatment). The sync file contains time points for generations 0, 15, 37 and 59. For generations 0, 15 and 37 please additionally cite: Orozco-terWengel P, Kapun M, Nolte V, Kofler R, Flatt T, Schlötterer C. Adaptation of *Drosophila* to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol Ecol. 2012;21:4931-4941.
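As a reading aid, the sketch below parses one data line of a sync file in Python. It relies on the PoPoolation2 layout: tab-separated columns for reference contig, position and reference base, followed by one `A:T:C:G:N:del` count field per population. The function names and the example line are made up for illustration, and the added header line of this particular file is not parsed here; it would need to be skipped or handled separately.

```python
ALLELE_ORDER = ("A", "T", "C", "G", "N", "del")  # field order in sync format


def parse_sync_line(line):
    """Split one sync data line into (chrom, pos, ref, per-sample counts)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    samples = []
    for column in fields[3:]:
        counts = [int(x) for x in column.split(":")]
        if len(counts) != len(ALLELE_ORDER):
            raise ValueError(f"unexpected count field: {column!r}")
        samples.append(dict(zip(ALLELE_ORDER, counts)))
    return chrom, pos, ref, samples


def allele_frequency(sample, allele):
    """Frequency of `allele` among A/T/C/G reads; None if coverage is zero."""
    depth = sum(sample[base] for base in "ATCG")
    return sample[allele] / depth if depth else None


# Hypothetical example line with two populations (not taken from the dataset).
example = "2L\t1000\tA\t30:10:0:0:0:0\t20:20:0:0:0:0"
chrom, pos, ref, samples = parse_sync_line(example)
freqs = [allele_frequency(s, "T") for s in samples]
```

Tracking the frequency of one allele across the population columns in this way gives a per-SNP trajectory over time points and replicates; which column belongs to which time point and replicate has to be read from the file's header line.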