Dryad

Data from: The role of structural genomic variants in population differentiation and ecotype formation in Timema cristinae walking sticks

Cite this dataset

Lucek, Kay; Gompert, Zach; Nosil, Patrik (2019). Data from: The role of structural genomic variants in population differentiation and ecotype formation in Timema cristinae walking sticks [Dataset]. Dryad. https://doi.org/10.5061/dryad.j4b543t

Abstract

Theory predicts that structural genomic variants such as inversions can promote adaptive diversification and speciation. Despite increasing empirical evidence that adaptive divergence can be triggered by one or a few large inversions, the degree to which widespread genomic regions under divergent selection are associated with structural variants remains unclear. Here we test for an association between structural variants and genomic regions that underlie parallel host-plant associated ecotype formation in Timema cristinae stick insects. Using mate-pair re-sequencing of 20 new whole genomes we find that modest-sized structural variants such as inversions, deletions, and duplications are widespread across the genome, being retained as standing variation within and among populations. Using 160 previously published, standard-orientation whole genome sequences we find little to no evidence that the DNA sequences within inversions exhibit accentuated differentiation between ecotypes. In contrast, a formerly described large region of reduced recombination that harbors genes controlling color-pattern exhibits evidence for accentuated differentiation between ecotypes, which is consistent with differences in the frequency of color-pattern morphs between host-associated ecotypes. Our results suggest that some types of structural variants (e.g., large inversions) are more likely to underlie adaptive divergence than others, and that structural variants are not required for subtle yet genome-wide genetic differentiation with gene flow.

Usage notes

Deletion variants

VCF file with the 194 deletion structural variants identified using Lumpy and Delly. Data for all 20 Timema cristinae individuals are included.

mod_del_genotyped.vcf.gz
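As a quick sanity check on any of the SV files, the records can be tallied by structural variant type. The sketch below assumes the INFO column carries an SVTYPE key, which is typical of Lumpy and Delly output but is not confirmed on this page. The function names are illustrative.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string (KEY=VALUE;FLAG;...) into a dict; flags map to ''."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        out[key] = value
    return out


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each data line, skipping header lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")


def count_svtypes(lines: Iterable[str]) -> Counter:
    """Tally records by the SVTYPE INFO key (INFO is the 8th VCF column)."""
    counts: Counter = Counter()
    for fields in iter_records(lines):
        counts[parse_info(fields[7]).get("SVTYPE", "unknown")] += 1
    return counts


def count_svtypes_in_file(path: str) -> Counter:
    # gzip.open in text mode ("rt") yields decoded str lines.
    with gzip.open(path, "rt") as handle:
        return count_svtypes(handle)
```

If the description above holds, `count_svtypes_in_file("mod_del_genotyped.vcf.gz")` should account for 194 records in total.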

Duplication variants

VCF file with the 223 duplication structural variants identified using Lumpy and Delly. Data for all 20 Timema cristinae individuals are included.

mod_dup_genotyped.vcf.gz
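The per-individual genotypes in these VCFs can be turned into a simple count-based allele frequency for each SV. This is a minimal sketch, assuming a GT key in the FORMAT column. Note that it uses hard genotype calls and is not the maximum likelihood estimator behind svAlleleFreqs.tar.gz, so its values need not match those files.

```python
import gzip
from typing import Dict, List, Optional


def genotype_alt_frequency(fields: List[str]) -> Optional[float]:
    """Non-reference allele frequency from the hard GT calls of one VCF record.

    fields: tab-split columns of a data line (FORMAT is column 9, samples follow).
    Missing alleles ('.') are skipped; any allele other than '0' counts as
    non-reference. Returns None when no allele is called or GT is absent.
    """
    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys:
        return None
    gt_index = fmt_keys.index("GT")
    alt = called = 0
    for sample in fields[9:]:
        parts = sample.split(":")
        if gt_index >= len(parts):
            continue
        # Treat phased ('|') and unphased ('/') separators the same way.
        for allele in parts[gt_index].replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else None


def alt_frequencies(path: str) -> Dict[str, Optional[float]]:
    """Hard-call frequency per record of a gzipped VCF, keyed by ID (or CHROM:POS)."""
    out: Dict[str, Optional[float]] = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            key = fields[2] if fields[2] != "." else f"{fields[0]}:{fields[1]}"
            out[key] = genotype_alt_frequency(fields)
    return out
```

For example, `alt_frequencies("mod_dup_genotyped.vcf.gz")` would return one value per duplication record across all 20 individuals.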

Inversion variants

VCF file with the 492 inversion structural variants identified using Lumpy and Delly. Data for all 20 Timema cristinae individuals are included.

mod_lumpy_inversions_genotyped.vcf.gz
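The abstract describes the structural variants as modest-sized. One way to check the size distribution is to derive each record's length from its coordinates. The sketch below assumes the INFO column carries END (falling back to SVLEN), as Lumpy and Delly records usually do, and measures length as END minus POS.

```python
import statistics
from typing import Dict, Iterable, List


def sv_lengths(lines: Iterable[str]) -> List[int]:
    """Length of each SV record in bp, from END - POS, falling back to |SVLEN|.

    Records with neither key are skipped. Assumes standard VCF columns
    (POS = column 2, INFO = column 8).
    """
    lengths: List[int] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        # partition("=")[::2] keeps (key, value); flags get an empty value.
        info = dict(item.partition("=")[::2] for item in fields[7].split(";") if item)
        if "END" in info:
            lengths.append(int(info["END"]) - pos)
        elif "SVLEN" in info:
            lengths.append(abs(int(info["SVLEN"].split(",")[0])))
    return lengths


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Count, minimum, median and maximum of a list of SV lengths."""
    if not lengths:
        raise ValueError("no lengths to summarize")
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
    }
```

Applied to mod_lumpy_inversions_genotyped.vcf.gz opened with `gzip.open(path, "rt")`, `summarize(sv_lengths(handle))` would describe the size range of the inversion calls.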

SV population genetics script

R script for population genetic analyses and plots of the structural variant data, including Fst calculations.

svSummary.R
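The Fst estimator used in svSummary.R is not specified here. For orientation, the sketch below implements one common choice: Hudson's Fst for two populations, computed from allele frequencies as a ratio of averages across loci. Per locus it takes (p1 - p2)^2 over p1(1 - p2) + p2(1 - p1), which equals 1 - H_within / H_between. It applies no sample-size correction, so values may differ from those produced by the R script.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Multi-locus Hudson Fst for two populations (ratio of averages).

    p1, p2: non-reference allele frequencies at the same loci in each population.
    Per locus: numerator (p1 - p2)^2, denominator p1(1 - p2) + p2(1 - p1).
    Loci fixed for the same allele in both populations add zero to both sums.
    """
    if len(p1) != len(p2):
        raise ValueError("frequency vectors must have equal length")
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no polymorphic loci")
    return num / den
```

Summing numerators and denominators before dividing, rather than averaging per-locus ratios, keeps low-diversity loci from dominating the estimate.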

SV allele frequencies

This compressed directory contains maximum likelihood allele frequency estimates for the SVs. There is one file per SV type (inv = inversion, del = deletion, dup = duplication) and population; files without a population ID pool all individuals. Each file has one row per SV: the first column gives the locus ID and the third column gives the non-reference SV allele frequency.

svAlleleFreqs.tar.gz
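Given the file layout described above (locus ID in column 1, non-reference frequency in column 3), the archive can be read without unpacking it to disk. This sketch assumes whitespace-delimited rows and skips any row whose third field is not numeric, such as a possible header. The second column is not described on this page and is ignored.

```python
import io
import tarfile
from typing import Dict, Iterable, Iterator, Tuple


def read_allele_freqs(lines: Iterable[str]) -> Dict[str, float]:
    """Map locus ID (column 1) to non-reference allele frequency (column 3).

    Assumes whitespace-delimited rows; rows with fewer than three fields or a
    non-numeric third field (e.g. a header) are skipped.
    """
    freqs: Dict[str, float] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            freqs[fields[0]] = float(fields[2])
        except ValueError:
            continue
    return freqs


def read_archive(path: str) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Yield (member name, frequencies) for each regular file in a .tar.gz."""
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            # Wrap the binary member stream so it yields str lines.
            yield member.name, read_allele_freqs(io.TextIOWrapper(handle, encoding="utf-8"))
```

Iterating `read_archive("svAlleleFreqs.tar.gz")` would give one dictionary per SV type and population; the member names identify which is which.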

MeasureOrientationFreqs

One of two complementary Perl scripts (together with ExtractOrientInversions) used to identify inversions from the whole-genome comparative alignment.

ExtractOrientInversions

The second of the two complementary Perl scripts (together with MeasureOrientationFreqs) used to identify inversions from the whole-genome comparative alignment.
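The Perl scripts are not reproduced on this page, and their actual logic may differ. The sketch below only illustrates the general idea of detecting inversions from alignment orientation. It assumes aligned blocks on one reference chromosome, given as hypothetical (ref_start, ref_end, strand) tuples. It first measures the fraction of aligned bases in reverse orientation, then reports runs of reverse-strand blocks flanked on both sides by blocks in the other orientation.

```python
from typing import List, Tuple

Block = Tuple[int, int, str]  # hypothetical layout: (ref_start, ref_end, strand '+' or '-')


def orientation_frequency(blocks: List[Block]) -> float:
    """Fraction of aligned reference bases in reverse ('-') orientation."""
    total = sum(end - start for start, end, _ in blocks)
    rev = sum(end - start for start, end, strand in blocks if strand == "-")
    return rev / total if total else 0.0


def candidate_inversions(blocks: List[Block], min_len: int = 0) -> List[Tuple[int, int]]:
    """Maximal runs of consecutive '-' blocks with non-'-' blocks on both sides.

    Blocks must be sorted by ref_start. Returns (start, end) spans of at
    least min_len bp; runs touching either chromosome end are not reported.
    """
    calls: List[Tuple[int, int]] = []
    i, n = 0, len(blocks)
    while i < n:
        if blocks[i][2] != "-":
            i += 1
            continue
        j = i
        while j + 1 < n and blocks[j + 1][2] == "-":
            j += 1
        # The run is maximal, so neighbours (if present) are not '-'.
        flanked = i > 0 and j < n - 1
        start, end = blocks[i][0], blocks[j][1]
        if flanked and end - start >= min_len:
            calls.append((start, end))
        i = j + 1
    return calls
```

Requiring flanking blocks in the other orientation distinguishes a local inversion from a whole scaffold that is simply assembled in reverse.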

SNP variant file

VCF file with SNPs from the 160 Timema cristinae genomes.

filtered1X_tcr_wgs_variants_x.vcf.gz
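Low-coverage SNP data are usually analysed through genotype likelihoods rather than hard calls. Whether this VCF carries a PL field is an assumption: it may instead carry GL or only GT. If PL is present, the sketch below converts it to per-genotype likelihoods. By the VCF convention, PL is phred-scaled and normalized so the most likely genotype has PL 0 (likelihood 1).

```python
from typing import List, Optional


def pl_to_likelihoods(pl: str) -> Optional[List[float]]:
    """Convert a biallelic PL string such as '0,30,255' to likelihoods 10**(-PL/10).

    Returns None for missing values or for anything other than three PL
    values (i.e. non-biallelic sites).
    """
    if pl in (".", ""):
        return None
    values = pl.split(",")
    if len(values) != 3 or "." in values:
        return None
    return [10 ** (-int(v) / 10) for v in values]


def sample_likelihoods(fields: List[str]) -> List[Optional[List[float]]]:
    """Per-sample (hom-ref, het, hom-alt) likelihoods for one VCF data line."""
    keys = fields[8].split(":")
    if "PL" not in keys:
        raise KeyError("record has no PL field")
    idx = keys.index("PL")
    out: List[Optional[List[float]]] = []
    for sample in fields[9:]:
        parts = sample.split(":")
        out.append(pl_to_likelihoods(parts[idx]) if idx < len(parts) else None)
    return out
```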

SNP allele frequencies

This compressed directory contains maximum likelihood allele frequency estimates for the SNPs from the 160 genomes, with one file per population. Each file has one row per SNP: the first column gives the locus ID and the third column gives the non-reference allele frequency.

snpAlleleFreqs.tar.gz
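This page describes the estimates only as maximum likelihood and does not document the procedure. One standard way to obtain such an estimate from low-coverage data is expectation-maximization (EM) over per-individual genotype likelihoods under Hardy-Weinberg proportions. The sketch below is offered as an illustration of that technique, not as the method actually used here.

```python
from typing import Sequence


def em_allele_frequency(
    likelihoods: Sequence[Sequence[float]],
    p0: float = 0.5,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> float:
    """ML non-reference allele frequency from genotype likelihoods via EM.

    likelihoods[i] = (L(g=0), L(g=1), L(g=2)) for individual i, where g is the
    number of non-reference alleles. Each iteration sets
    p' = sum_i E[g_i | data_i, p] / (2 * n_used), with Hardy-Weinberg priors.
    Individuals whose weighted likelihoods sum to zero are skipped.
    """
    if not likelihoods:
        raise ValueError("need at least one individual")
    p = p0
    for _ in range(max_iter):
        priors = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
        expected, used = 0.0, 0
        for lik in likelihoods:
            w = [l * q for l, q in zip(lik, priors)]
            total = sum(w)
            if total > 0:
                # Posterior expected count of non-reference alleles.
                expected += (w[1] + 2 * w[2]) / total
                used += 1
        if used == 0:
            return p
        p_new = expected / (2 * used)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p
```

With perfectly informative likelihoods the estimate reduces to the allele count divided by 2n. With flat likelihoods (no information), it stays at the starting value p0.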

R population genomics script

This R script contains the core analyses of genetic variation within inversion sequences, based on SNPs from the 160 Timema cristinae genomes.

popgen.R
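The abstract reports little to no evidence that sequences within inversions show accentuated differentiation between ecotypes. The analyses in popgen.R are not reproduced here. As an illustration of one way to frame that comparison, the sketch below compares mean per-SNP differentiation (for example, per-SNP Fst between ecotypes) inside versus outside inversion intervals, using a one-sided label-permutation test. The function names and the half-open interval convention are assumptions.

```python
import random
from bisect import bisect_right
from typing import List, Sequence, Tuple


def inside_mask(positions: Sequence[int], intervals: Sequence[Tuple[int, int]]) -> List[bool]:
    """True for positions inside any [start, end) interval (sorted, non-overlapping)."""
    starts = [s for s, _ in intervals]
    mask = []
    for pos in positions:
        k = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        mask.append(k >= 0 and pos < intervals[k][1])
    return mask


def mean_difference(values: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean of values inside the intervals minus mean outside."""
    inside = [v for v, m in zip(values, mask) if m]
    outside = [v for v, m in zip(values, mask) if not m]
    if not inside or not outside:
        raise ValueError("need SNPs both inside and outside intervals")
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def permutation_test(values: Sequence[float], mask: Sequence[bool],
                     n_perm: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """One-sided test of higher differentiation inside the intervals.

    Shuffles inside/outside labels across SNPs. This ignores linkage between
    nearby SNPs and is therefore anti-conservative; a block or circular-shift
    permutation would better respect genomic structure.
    """
    rng = random.Random(seed)
    observed = mean_difference(values, mask)
    labels = list(mask)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mean_difference(values, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```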

Funding

European Research Council, Award: R/129639

Swiss National Science Foundation, Award: P2BEP3_152103