
Recent hybrids recapitulate ancient hybrid outcomes

Cite this dataset

Gompert, Zachariah et al. (2020). Recent hybrids recapitulate ancient hybrid outcomes [Dataset]. Dryad.


Genomic outcomes of hybridization depend on selection and recombination in hybrids. Whether these processes have similar effects on hybrid genome composition in contemporary hybrid zones versus ancient hybrid lineages is unknown. Here we show that patterns of introgression in a contemporary hybrid zone in Lycaeides butterflies predict patterns of ancestry in geographically adjacent, older hybrid populations. We find a particularly striking lack of ancestry from one of the hybridizing taxa, Lycaeides melissa, on the Z chromosome in both the old and contemporary hybrids. The same pattern of reduced L. melissa ancestry on the Z chromosome is seen in two other ancient hybrid lineages. More generally, we find that patterns of ancestry in old or ancient hybrids are remarkably predictable from contemporary hybrids, which suggests selection and recombination affect hybrid genomes in a similar way across disparate time scales and during distinct stages of speciation and species breakdown.



We analyzed partial genome sequences from 835 Lycaeides butterflies from 23 populations in western North America. The sequence data from 643 of these butterflies were previously described in a study of admixture across the Lycaeides species complex (Gompert et al. 2014). Data from the remaining 192 butterflies were generated for the current study, and these include many (but not all) of the Dubois individuals. Specific locations (latitude and longitude) and sample sizes for each population are provided in the manuscript. DNA sequence data were generated on an Illumina HiSeq 2500 (100 bp, single-end reads) by the Genome Sequencing and Analysis Facility at the University of Texas (Austin, TX).

Usage notes

A compressed archive (tar.gz) file is included for each step in the analysis. Each archive contains the scripts/code used and a readme file. Here we summarize each one:

1. DNA sequence alignment and filtering scripts (alignment.tar.gz) = perl script to filter variants = bash script to run alignments using bwa mem from script = bash script to run script below on the cluster = wrapper perl script to run alignments using bwa mem = bash script to do variant calling = bash script to run on the cluster = perl script to filter variants based on allele frequencies = wrapper script to convert sam files to bam format

2. Variant filtering scripts (variantcalling_filtering.tar.gz) = This is a perl script to filter variants = This perl script converts a vcf file to a genotype likelihood file and filters variants based on depth of coverage. = This perl script converts a vcf file to a genotype likelihood file and filters variants based on the minor allele frequency of SNPs.
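
As a rough illustration of the two filters named above (depth of coverage and minor allele frequency), the following Python sketch applies both to a single variant. It is not the archived perl code: the function names, the use of called genotypes rather than genotype likelihoods, and the threshold values are all assumptions made for the example.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes coded 0, 1, 2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def passes_filters(genotypes, depths, min_maf=0.05, min_mean_depth=2.0):
    """Return True if a variant passes hypothetical MAF and mean-depth thresholds.

    genotypes: per-individual alternate-allele counts (0, 1, 2 or None)
    depths:    per-individual read depths at this variant
    The default thresholds are placeholders, not values from the study.
    """
    freq = allele_frequency(genotypes)
    if freq is None or not depths:
        return False
    maf = min(freq, 1.0 - freq)          # minor allele frequency
    mean_depth = sum(depths) / len(depths)
    return maf >= min_maf and mean_depth >= min_mean_depth
```

A variant that is fixed for one allele, or that has too few reads on average, is dropped; everything else is retained.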

3. Genotype inference with entropy (entropy.tar.gz) = perl fork script to run entropy = perl script to convert genotype likelihood estimates to genotype estimates for entropy
initq.R  = R command script to prepare entropy input files = run entropy on the cluster
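
One common way to turn genotype likelihoods into a point estimate of genotype is to take the mean genotype under a uniform prior. The Python sketch below shows that calculation for phred-scaled likelihoods. Whether the archived perl script uses this estimator is an assumption, and the function name is hypothetical.

```python
def genotype_point_estimate(phred_likelihoods):
    """Mean alternate-allele count (0 to 2) from phred-scaled genotype likelihoods.

    phred_likelihoods: three values for genotypes 0, 1 and 2 (as in a VCF PL
    field), where a lower value means a more likely genotype. A uniform prior
    over the three genotypes is assumed for this illustration.
    """
    # Convert phred-scaled values back to relative likelihoods.
    weights = [10 ** (-pl / 10.0) for pl in phred_likelihoods]
    total = sum(weights)
    # Likelihood-weighted mean of the genotype codes 0, 1, 2.
    return sum(k * w for k, w in enumerate(weights)) / total
```

With no information (equal likelihoods) the estimate is 1; with strong support for one homozygote it approaches 0 or 2.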

4. Ancestry frequency inference (popanc.tar.gz) = This is a perl script to run the popanc program = This is a bash script to run the perl script on the cluster

5. Genomic clines analysis (bgc.tar.gz)
bgc = bgc program execution file  
estpost_bgc = program execution file to calculate posterior estimates = perl script to run bgc
plot_clines_males.R = R script to plot clines = bash script to run bgc on the cluster = perl script to split populations and create one file per population

6. Whole genome phylogenetic analysis (phylo.tar.gz)

phylo.R = whole genome phylogenetic analyses with R = processing script, adds scaffold and position boundaries to 1000 bp windows = processing script, grabs the first and last position and scaffold of the SNPs for each window (works by linkage group) = generates alignments of $win SNPs and generates a tree for each alignment
catWindowTrees1kb = phylogenetic trees for 1 kb windows
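
The window-boundary step described above can be sketched in Python as follows. This is an illustration only, not the archived processing script: it assumes windows are defined as runs of `win` consecutive SNPs within a linkage group, and the function name and the (linkage group, scaffold, position) input layout are hypothetical.

```python
from itertools import groupby


def window_boundaries(snps, win):
    """Report the first and last scaffold/position of each SNP window.

    snps: list of (linkage_group, scaffold, position) tuples, already ordered
          along each linkage group.
    win:  number of SNPs per window (assumed definition of a window).
    Returns (linkage_group, first_scaffold, first_pos, last_scaffold, last_pos)
    tuples. A final, shorter window is kept rather than discarded.
    """
    windows = []
    # Windows never span linkage groups, so process one group at a time.
    for lg, group in groupby(snps, key=lambda s: s[0]):
        members = list(group)
        for start in range(0, len(members), win):
            chunk = members[start:start + win]
            _, first_scaf, first_pos = chunk[0]
            _, last_scaf, last_pos = chunk[-1]
            windows.append((lg, first_scaf, first_pos, last_scaf, last_pos))
    return windows
```

Each returned tuple marks the span of one window, which is the information needed to label the per-window alignments and trees.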

7. Genome annotation

This folder contains a working pipeline and scripts to do genome annotation. = describes a pipeline to do genome annotation using MAKER. = describes a pipeline to annotate a SNP dataset with information from the MAKER genome annotation. = is a python script used in SNP annotation (read = is a python script used in SNP annotation (read

8. Bayesian genomic clines 1.04b (bgcdist-tempversion.tar.gz)

This is a version of bgc that includes a new option to pre-sample hybrid index with alpha and beta set to 0.
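
To see why setting alpha and beta to 0 is a natural baseline for sampling hybrid index, consider the genomic cline function as it is commonly written for the Bayesian genomic clines model, phi = h + 2(h - h^2)(alpha + beta(2h - 1)), where h is hybrid index and phi is the probability of ancestry from one parental taxon at a locus. The Python sketch below assumes that parameterization and simply truncates phi to the interval [0, 1]; it is not taken from the bgc source code, and the function name is hypothetical.

```python
def ancestry_probability(h, alpha=0.0, beta=0.0):
    """Locus-specific ancestry probability under an assumed genomic cline function.

    h:     hybrid index (genome-wide admixture proportion), between 0 and 1
    alpha: cline center parameter (shifts ancestry toward one parent)
    beta:  cline rate parameter (steepens or flattens the cline)
    """
    phi = h + 2.0 * (h - h * h) * (alpha + beta * (2.0 * h - 1.0))
    # Simple truncation so the result is a valid probability.
    return min(1.0, max(0.0, phi))
```

With alpha and beta both 0 the function returns h itself, so every locus is expected to follow the genome-wide hybrid index.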



National Science Foundation