The Heliconius butterflies are a diverse recent radiation comprising multiple levels of divergence with ongoing gene flow between species. The recently sequenced genome of Heliconius melpomene allowed us to investigate the genomic evolution of this group using dense RAD marker sequencing. Phylogenetic analysis of 54 individuals robustly supported reciprocal monophyly of H. melpomene and H. cydno and refuted previous phylogenetic hypotheses that H. melpomene may be paraphyletic with respect to H. cydno. H. timareta also formed a monophyletic clade closely related to but distinct from H. cydno, with H. heurippa falling within this clade. We find evidence for pervasive gene flow between sympatric populations of the sister clades H. melpomene and H. cydno/timareta, particularly between H. cydno and H. melpomene from Central America and between H. timareta and H. melpomene from the eastern slopes of the Andes. Between races, divergence is primarily explained by isolation-by-distance, and there is little, if any, genetic population structure between parapatric races, suggesting that hybrid zones between races are not zones of secondary contact. Our results support previous findings that colour pattern loci are shared between populations and species with similar colour pattern elements. Further, this pattern is almost unique to these genomic regions, with only a very small number of other loci showing significant similarity between populations and species with similar colour patterns.
RAD_cydmelhec_data.geno
Space-delimited tabular file giving the genotype calls used in this paper. They were filtered from the .vcf files (generated by the GATK UnifiedGenotyper) and represent all bases that were called with a genotype quality >=30 in multiple individuals. Individuals are in columns and genome positions in rows. Missing data are given as "N". Before any of the given scripts were run, this file was filtered to retain only the relevant individuals, and all sites with missing data were removed.
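The pre-filtering step described above (keep the relevant individuals, then drop every site with a missing call) could be sketched roughly as follows in Python. This is only an illustration, not the filter actually used: the function name, the header row, and the assumption that the first two columns hold scaffold and position are all hypothetical.

```python
def filter_complete_sites(lines, keep):
    """Keep only the named individuals and drop sites with any missing call.

    Assumed (hypothetical) layout: a header row, then one row per site,
    with scaffold and position in the first two columns.
    """
    header = lines[0].split()
    idx = [header.index(name) for name in keep]
    out = [" ".join(header[:2] + list(keep))]
    for line in lines[1:]:
        fields = line.split()
        calls = [fields[i] for i in idx]
        if "N" in calls:  # "N" marks missing data in the .geno file
            continue
        out.append(" ".join(fields[:2] + calls))
    return out


example = [
    "scaffold position ind1 ind2 ind3",
    "s1 10 A A N",
    "s1 11 C N C",
    "s1 12 G G T",
]
filtered = filter_complete_sites(example, ["ind1", "ind2"])
```

Note that a site is judged only on the retained individuals, so a site missing in a discarded individual is still kept.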
clumpy
Python script used to split the single calls file(s) into individual RAD loci. The output consists of multiple files, one for each locus found, all in the same format as the original calls input file, plus a comma-separated values file giving the start and end positions in the genome and the length of each RAD locus.
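One plausible way to split called sites into loci is to group positions on the same scaffold that lie within a small gap of each other. The sketch below illustrates that idea only; it is not the logic of the script itself, and the function name and the max_gap parameter are assumptions.

```python
def clump_sites(sites, max_gap=1):
    """Group (scaffold, position) pairs into runs of nearby positions.

    max_gap is a hypothetical parameter: the largest distance between
    neighbouring called sites that still counts as the same locus.
    """
    loci = []
    for scaffold, pos in sorted(sites):
        last = loci[-1] if loci else None
        if last and last["scaffold"] == scaffold and pos - last["end"] <= max_gap:
            last["end"] = pos  # extend the current locus
        else:
            loci.append({"scaffold": scaffold, "start": pos, "end": pos})
    for locus in loci:
        locus["length"] = locus["end"] - locus["start"] + 1
    return loci


sites = [("s1", 10), ("s1", 11), ("s1", 12), ("s1", 500), ("s1", 501), ("s2", 12)]
loci = clump_sites(sites)
```

Each returned record carries the start, end and length fields that the summary file is described as containing.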
population_matrix_taxon
A file giving taxonomic groupings for each population for the AMOVA analysis.
population_matrix_CP
A file giving colour pattern groupings for each population for the AMOVA analysis.
Hmel_chromosomes_dec_2011_OLD_HOX_no_header_filtered
A file relating the scaffolds to chromosomes (inferred from RAD mapping from The Heliconius Genome Consortium 2012). This was used to generate genome-wide plots.
Fst2
R script used to calculate Fst between all pairs of populations. Inputs are the same as for Fst.r.
population_matrix_geo
A file giving geographic groupings for each population for the AMOVA analysis.
geno_to_haplo_54
Perl script used to go from genotype calls to haplotype calls (not phased). This was done before running the Fst scripts but not AMOVAs.
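The conversion from genotype calls to unphased haplotype calls can be illustrated in Python (the original is a Perl script, so this is a re-sketch, not a port). It assumes that each genotype is stored as a single-letter IUPAC code, with heterozygotes as ambiguity codes; that encoding is an assumption and is not stated in the file description.

```python
# Standard IUPAC codes for single bases and two-base ambiguities.
IUPAC = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def genotype_to_haplotypes(calls):
    """Split a sequence of genotype codes into two unphased haplotypes.

    The order of alleles at heterozygous sites is arbitrary, so the two
    strings are not phased haplotypes. Unknown codes become "N".
    """
    hap1, hap2 = [], []
    for call in calls:
        first, second = IUPAC.get(call, "NN")
        hap1.append(first)
        hap2.append(second)
    return "".join(hap1), "".join(hap2)


haplotypes = genotype_to_haplotypes("ARYN")
```

Because phase is unknown, the result is only suitable for statistics that depend on allele counts per site rather than on linkage between sites.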
Fst
R script used to calculate the Fst values for the major race and species comparisons discussed in the paper. Inputs are the individual RAD locus haplotype files and the summary statistics file generated by clump.py.
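For readers unfamiliar with Fst, a per-site value for two populations can be sketched in Python from haplotype strings. The estimator used in Fst.r is not specified here, so the sketch uses the simple heterozygosity-based definition Fst = (HT - HS) / HT with equal population weights; the function names are hypothetical.

```python
def allele_freq(haplotypes, site, allele):
    """Frequency of one allele at one site across a list of haplotype strings."""
    return sum(h[site] == allele for h in haplotypes) / len(haplotypes)


def site_fst(pop1, pop2, site):
    """Fst = (HT - HS) / HT for a biallelic site; None if not biallelic."""
    alleles = {h[site] for h in pop1 + pop2}
    if len(alleles) != 2:
        return None
    ref = sorted(alleles)[0]
    p1 = allele_freq(pop1, site, ref)
    p2 = allele_freq(pop2, site, ref)
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    return (ht - hs) / ht


fixed_difference = site_fst(["A", "A"], ["G", "G"], 0)
no_difference = site_fst(["A", "G"], ["A", "G"], 0)
```

A fixed difference between the two populations gives the maximum value, and identical allele frequencies give zero.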
amova1
R script used to perform AMOVA analyses. It generates a distance matrix based on genotype calls and uses the ade4 package to run the AMOVAs. Inputs are the individual RAD locus genotype files, the summary statistics file generated by clump.py, and the character matrix files (see below).
Fst_spcorr
R script used to calculate correlations between Fst values at increasing genomic distances. Input is the output from Fst.r.
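The idea of correlating Fst values at increasing genomic distances can be sketched in Python by pairing loci on the same scaffold, binning the pairs by distance, and computing a Pearson correlation per bin. This is an illustration under assumptions, not the method in Fst_spcorr: the binning scheme, the function names and the input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def fst_correlation_by_distance(loci, bin_width):
    """loci: list of (scaffold, position, fst). Returns {bin index: correlation}."""
    pairs = defaultdict(lambda: ([], []))
    for (s1, p1, f1), (s2, p2, f2) in combinations(loci, 2):
        if s1 != s2:
            continue  # distance is only defined within a scaffold
        b = abs(p1 - p2) // bin_width
        pairs[b][0].append(f1)
        pairs[b][1].append(f2)
    return {b: pearson(xs, ys) for b, (xs, ys) in sorted(pairs.items())}


loci = [("s1", 0, 0.1), ("s1", 100, 0.2), ("s1", 200, 0.3), ("s1", 300, 0.4)]
by_bin = fst_correlation_by_distance(loci, bin_width=150)
```

Bins with fewer than two pairs, or with no variation in Fst, return None rather than a correlation.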
genotype_matrix
A starting genotype matrix, assuming separate genotypes for each individual, that is modified by the AMOVA script.