Identifying the genetic changes driving adaptive variation in natural populations is key to understanding the origins of biodiversity. The mosaic of mimetic wing patterns in Heliconius butterflies makes an excellent system for exploring adaptive variation using next-generation sequencing. In this study, we use a combination of techniques to annotate the genomic interval modulating red color pattern variation, identify a narrow region responsible for adaptive divergence and convergence in Heliconius wing color patterns, and explore the evolutionary history of these adaptive alleles. We use whole genome resequencing from four hybrid zones between divergent color pattern races of Heliconius erato and two hybrid zones of the co-mimic Heliconius melpomene to examine genetic variation across 2.2 Mb of a partial reference sequence. In the intergenic region near optix, the gene previously shown to be responsible for the complex red pattern variation in Heliconius, population genetic analyses identify a shared 65 kb region of divergence that includes several sites perfectly associated with phenotype within each species. This region likely contains multiple cis-regulatory elements that control discrete expression domains of optix. The parallel signatures of genetic differentiation in H. erato and H. melpomene support a shared genetic architecture between the two distantly related co-mimics; however, phylogenetic analysis suggests mimetic patterns in each species evolved independently. Using a combination of next-generation sequencing analyses, we have refined our understanding of the genetic architecture of wing pattern variation in Heliconius and gained important insights into the evolution of novel adaptive phenotypes in natural populations.
Neighbor-joining trees of H. erato and H. melpomene combined using aligned fragments from the region of highest association
Trees output from a neighbor-joining distance analysis, run in PAUP*4b10, of the aligned fragments in melperato_alignedfrags_D454to742KB_DATA.nex. NJ trees were constructed to test monophyly of fragments across the peak region of association (454 to 742 KB in H. erato) in the red color pattern interval. The file is in nexus tree format, with extra notes from the PAUP* output.
melperato_alignedfrags_454to742KB_NJTREES.nex
Aligned sequence fragments of H. erato and H. melpomene combined from the region of highest association
Fragments from the peak of association within the red pattern interval (454 KB to 742 KB in H. erato), aligned across Heliconius erato and Heliconius melpomene individuals. Automated alignment was followed by manual filtering to remove regions of poor alignment. This file was used to build NJ trees for each fragment to test monophyly (melperato_alignedfrags_454to742KB_NJTREES.nex).
melperato_alignedfrags_D454to742KB_DATA.nex
Data and resulting phylogenetic tree of H. erato and H. melpomene involving Bayesian analysis of SNPs from the 515 - 580 KB peak of highest association
This data file contains the most reliable SNPs extracted from the highest peak of association (515 to 580 KB). Automated alignments of fragments from this region (see melperato_alignedfrags_D454to742KB_DATA.nex) were manually adjusted to improve the alignments, the fragments were concatenated, and invariant sites and sites with >20% missing data were removed. The resulting SNPs were used for a Bayesian analysis of sequences from the highest peak involving both H. erato and H. melpomene. This nexus-formatted file includes the Bayes block and the resulting Bayesian consensus tree.
melperato_manalignSNPsD515to580Bayes_DATATREE.nex
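The site filter described for this file (dropping invariant sites and sites with >20% missing data) can be illustrated with a minimal sketch. This is not the authors' script: the function names, the dictionary-of-strings alignment layout, and the choice of N, ? and - as missing-data symbols are all assumptions made for illustration.

```python
# Hypothetical sketch of a per-site filter; not the pipeline used for this dataset.
MISSING = {"N", "?", "-"}  # assumed missing-data symbols


def filter_sites(alignment, max_missing=0.20):
    """Return indices of columns that are variable and have <= max_missing missing data.

    `alignment` maps a sample name to its aligned sequence (all the same length).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = []
    for i in range(length):
        column = [alignment[name][i].upper() for name in names]
        called = [base for base in column if base not in MISSING]
        missing_fraction = (len(column) - len(called)) / len(column)
        if missing_fraction > max_missing:
            continue  # too much missing data at this site
        if len(set(called)) < 2:
            continue  # invariant among the called samples
        keep.append(i)
    return keep


def subset(alignment, keep):
    """Build a new alignment containing only the columns listed in `keep`."""
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}
```

A site with exactly 20% missing data is retained here, since only sites exceeding the threshold are removed.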
Manual alignments of fragments within the 65KB window for H. erato and H. melpomene
This file is provided for visualization of the manual alignments used to extract the SNPs that went into the Bayesian analysis of H. erato and H. melpomene across the 65KB peak of association (515-580KB in H. erato; melperato_manalignSNPsD515to580Bayes_DATATREE.nex). It involves H. melpomene and H. erato aligned fragments from this region (see melperato_alignedfrags_D454to742KB_DATA.nex) that have been concatenated and manually aligned. The file is in nexus format.
melperato65KB_manalignfragsDATA.nex
Sequence alignments from across the red pattern interval of Heliconius erato
This nexus file includes all SNP data for each individual of Heliconius erato sequenced across the four hybrid zones aligned against the full red color pattern interval (D interval). Ns represent missing or excluded data from low quality sequence calls.
wgserato_Dregion_aligned.nex
Aligned sequences of Heliconius erato from across the color pattern unlinked BAC including cinnabar
This nexus file includes all SNP data for each individual of Heliconius erato sequenced across the four hybrid zones aligned against the color pattern-unlinked BAC region that includes the gene cinnabar. Ns represent missing or excluded data from low quality sequence calls.
Unlinked_Cinn_Final_29March2009.nex
Aligned sequences of Heliconius erato across the color pattern-unlinked BAC 46F09
This nexus file includes all SNP data for each individual of Heliconius erato sequenced across the four hybrid zones aligned against the color pattern-unlinked BAC region 46F09. Ns represent missing or excluded data from low quality sequence calls.
Unlinked_Hera_46F09.nex
Aligned sequences of Heliconius erato across the color pattern-unlinked BAC 48A16
This nexus file includes all SNP data for each individual of Heliconius erato sequenced across the four hybrid zones aligned against the color pattern-unlinked BAC 48A16. Ns represent missing or excluded data from low quality sequence calls.
Unlinked_Hera_48A16.nex
Sequence data used for performing 15KB sliding window phylogenetic analysis of the H. erato red pattern interval
Sequence data of Heliconius erato used to perform 15KB sliding window distance-based (neighbor-joining) phylogenetic analysis across the red pattern interval. The file includes the script that generated the trees using PAUP*4b10.
Hera_D_15KBslidingnjtreesDATA.nex
Likelihood scores for geographic and color-based trees for Heliconius erato across the red pattern interval
Sequence data of Heliconius erato across the red color pattern interval and scripts used to obtain scores of the likelihood of two alternative phylogenetic tree topologies (geographic - GEO; color-pattern based - D) explaining the data across 15KB sliding windows. The file includes sequences, trees used to compute likelihood scores, and the script run in PAUP* to calculate the likelihood scores for each tree.
Hera_D_15KBslidingLIKLscoresDATA.nex
Phylogenetic trees from the 15KB sliding window analysis of H. erato along the red pattern interval
Distance-based neighbor-joining trees across the sliding window, derived from Hera_D_15KBslidingnjtreesDATA.nex. Trees are ordered by position along the D interval, with windows staggered every 5 KB. The file is in nexus format.
D15kbsliding_NJTREES.nex
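To make the window layout concrete, the sketch below enumerates 15 KB windows staggered every 5 KB. It is illustrative only: the function name, the 1-based inclusive coordinates, and the decision to drop a trailing partial window are assumptions, not details taken from the PAUP* script in the data file.

```python
# Hypothetical helper; coordinate conventions are assumed, not taken from the dataset.
def sliding_windows(start, end, width=15_000, step=5_000):
    """Return (left, right) 1-based inclusive windows of `width` bases, offset by `step`."""
    windows = []
    left = start
    while left + width - 1 <= end:
        windows.append((left, left + width - 1))
        left += step
    return windows


# First few windows over an arbitrary 30 KB stretch:
# (1, 15000), (5001, 20000), (10001, 25000), (15001, 30000)
example = sliding_windows(1, 30_000)
```

With these settings each base away from the interval edges falls in three overlapping windows, so neighboring trees share two thirds of their data.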
Data and phylogenetic tree for the Bayesian analysis of H. erato in region 1 of the red pattern interval
Includes region 1 of 5 separate partitions of the red pattern interval based on optimal tree partitioning from the MDL method. Includes the SNPs from Bases 1 to 324,665, the Bayes block that went into the analysis, and the resulting Bayesian consensus tree.
D1to324K_BayesTREEDATA.nex
Data and phylogenetic tree for the Bayesian analysis of H. erato in region 2 of the red pattern interval
Includes region 2 of 5 separate partitions of the red pattern interval based on optimal tree partitioning from the MDL method. Includes the SNPs from Bases 324,666 to 474,310, the Bayes block that went into the analysis, and the resulting Bayesian consensus tree.
D324to474K_BayesDATATREE.nex
Data and phylogenetic tree for the Bayesian analysis of H. erato in region 3 of the red pattern interval
Includes region 3 of 5 separate partitions of the red pattern interval based on optimal tree partitioning from the MDL method. Includes the SNPs from Bases 474,311 to 773,020, the Bayes block that went into the analysis, and the resulting Bayesian consensus tree.
D474to773K_BayesDATATREE.nex
Data and phylogenetic tree for the Bayesian analysis of H. erato in region 4 of the red pattern interval
Includes region 4 of 5 separate partitions of the red pattern interval based on optimal tree partitioning from the MDL method. Includes the SNPs from Bases 773,021 to 846,050, the Bayes block that went into the analysis, and the resulting Bayesian consensus tree.
D773to846K_DATATREE.nex
Data and phylogenetic tree for the Bayesian analysis of H. erato in region 5 of the red pattern interval
Includes region 5 of 5 separate partitions of the red pattern interval based on optimal tree partitioning from the MDL method. Includes the SNPs from Bases 846,050 to 945,009, the Bayes block that went into the analysis, and the resulting Bayesian consensus tree.
D846K+_BayesDATATREE.nex
Bayesian phylogeny of sequence unlinked to color pattern in H. erato
"Color pattern unlinked" Bayesian phylogenetic tree of H. erato derived from SNPs from the 3 color pattern unlinked regions combined (cinnabar BAC, BAC 46F09, BAC 48A16) and excluding sites with >20% missing data. Also includes Bayes block and the resulting Bayesian consensus tree. The file is in nexus format.
bestunlinked_BayesDATATREE.nex
Sequence data and Bayesian phylogenetic tree of Heliconius erato from the 65 KB peak of association in the red pattern interval
The sequence data and Bayesian consensus tree for H. erato from the 65 KB region (515 to 580 KB) of highest association in the red pattern interval. The data exclude invariant sites and sites with >20% missing data. This is the same data set as the Splitstree file, except that sites variable only in heterozygotes have been removed: heterozygous calls are treated as missing data in the Bayesian analysis, so these sites are effectively constant. The file also includes the Bayes block for running in MrBayes.
bestsnpsD515to580KB_Bayes_DATATREE.nex
Sequence data for Splitstree analysis of Heliconius erato from the 65 KB peak of association in the red pattern interval
The sequence data used for Splitstree analysis of H. erato from the 65KB region (515 to 580 KB) of highest association in the red pattern interval. The data excludes sites with >20% missing data or that were invariant. The file includes sites that had variation only in heterozygotes as Splitstree analysis can take into account heterozygote information.
best_D_515to580KSplitstreeDATA.nex
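The difference between the Splitstree and Bayesian versions of this data set comes down to sites that vary only through heterozygous calls. A minimal sketch of that test follows; it assumes heterozygotes are encoded as two-base IUPAC ambiguity codes (R, Y, S, W, K, M), which is an assumption about the encoding rather than something stated in the file descriptions, and the function name is hypothetical.

```python
# Hypothetical sketch; the heterozygote encoding is assumed to be IUPAC two-base codes.
UNAMBIGUOUS = set("ACGT")
HETEROZYGOUS = set("RYSWKM")


def variable_only_in_heterozygotes(column):
    """True if a site shows at most one unambiguous base plus at least one heterozygous call.

    Such a site is informative when heterozygotes are used, but is effectively
    constant when heterozygous calls are treated as missing data.
    """
    chars = {c.upper() for c in column}
    bases = chars & UNAMBIGUOUS
    hets = chars & HETEROZYGOUS
    return len(bases) <= 1 and bool(hets)


def drop_heterozygote_only_sites(columns):
    """Keep only the columns that remain variable without heterozygote information."""
    return [col for col in columns if not variable_only_in_heterozygotes(col)]
```

For example, the column `AAARA` would be kept for a heterozygote-aware analysis but dropped here, while `AAGRA` is kept in both cases because two homozygous states are present.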
Genotype calls for Heliconius, 4 erato and 2 melpomene populations (vcf format)
VCF files were generated by aligning reads to a reference using BWA and calling genotypes using GATK. File prefix is _. Files are in standard VCF format.
vcfs4dryad.zip
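Because the files follow the standard VCF layout, per-sample genotype calls can be read with a few lines of plain Python. The sketch below is a generic illustration rather than a tool distributed with the data: the function name is hypothetical, it relies only on the fixed VCF column order and the GT key in the FORMAT column, and it ignores all other annotations.

```python
# Hypothetical minimal VCF reader; handles tab-delimited records with a FORMAT column.
def parse_vcf_genotypes(lines):
    """Yield-free parser: return a list of (chrom, pos, ref, alt, {sample: GT}) tuples."""
    samples = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the nine fixed columns
            continue
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        gt_index = fields[8].split(":").index("GT")
        genotypes = {
            sample: call.split(":")[gt_index]
            for sample, call in zip(samples, fields[9:])
        }
        records.append((chrom, pos, ref, alt, genotypes))
    return records
```

To use it on a file extracted from the archive, pass an open file handle, for example `parse_vcf_genotypes(open(path))`, where `path` is whichever VCF you unpacked.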