Dryad

Recent hybrids recapitulate ancient hybrid outcomes

Cite this dataset

Gompert, Zachariah et al. (2020). Recent hybrids recapitulate ancient hybrid outcomes [Dataset]. Dryad. https://doi.org/10.5061/dryad.76hdr7ssw

Abstract

Genomic outcomes of hybridization depend on selection and recombination in hybrids. Whether these processes have similar effects on hybrid genome composition in contemporary hybrid zones versus ancient hybrid lineages is unknown. Here we show that patterns of introgression in a contemporary hybrid zone in Lycaeides butterflies predict patterns of ancestry in geographically adjacent, older hybrid populations. We find a particularly striking lack of ancestry from one of the hybridizing taxa, Lycaeides melissa, on the Z chromosome in both the old and contemporary hybrids. The same pattern of reduced L. melissa ancestry on the Z chromosome is seen in two other ancient hybrid lineages. More generally, we find that patterns of ancestry in old or ancient hybrids are remarkably predictable from contemporary hybrids, which suggests selection and recombination affect hybrid genomes in a similar way across disparate time scales and during distinct stages of speciation and species breakdown.

Methods

We analyzed partial genome sequences from 835 Lycaeides butterflies from 23 populations in western North America. Sequence data from 643 of these butterflies were previously described in a study of admixture across the Lycaeides species complex (Gompert et al. 2014). Data from the remaining 192 butterflies were generated for the current study; these include many (but not all) of the Dubois individuals. Specific locations (latitude and longitude) and sample sizes for each population are provided in the manuscript. DNA sequence data were generated on an Illumina HiSeq 2500 (100 bp, single-end reads) by the Genome Sequencing and Analysis Facility at the University of Texas (Austin, TX).

Usage notes

Compressed archive (tar.gz) files are included for each step in the analysis. Each file contains the scripts/code used and a readme file. Here we summarize each one:

1. DNA sequence alignment and filtering scripts (alignment.tar.gz)

filterSomeMore.pl = perl script to filter variants
runbwamem.sh = bash script to run alignments using bwa mem from the wrap_qsub_slurm_bwa_mem.pl script
runVcfFilter_af.sh = bash script to run the vcfFilter_af.pl script below on the cluster
wrap_qsub_slurm_bwa_mem.pl = wrapper perl script to run alignments using bwa mem
lycaeidesVariantcalling.sh = bash script to do variant calling
runFilterSomeMore.sh = bash script to run filterSomeMore.pl on the cluster
vcfFilter_af.pl = perl script to filter variants based on allele frequencies
wrap_qsub_slurm_sam2bam.pl = wrapper script to convert sam files to bam format

2. Variant filtering scripts (variantcalling_filtering.tar.gz)

filterSomeMore.pl = This is a perl script to filter variants.
vcf2gl_depth.pl = This perl script converts a vcf file to a genotype likelihood file and filters variants based on depth of coverage.
vcf2gl_maf.pl = This perl script converts a vcf file to a genotype likelihood file and filters variants based on the minor allele frequency of SNPs.
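
The conversion and filtering described for the two vcf2gl scripts can be illustrated with a minimal Python sketch. It is not the archived perl code. It assumes biallelic SNPs, an INFO column carrying DP (total depth) and AF (alternate allele frequency), and a PL sub-field (phred-scaled genotype likelihoods) in each sample column. The function names and the thresholds passed to them are hypothetical.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=120;AF=0.25' into a dict."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def vcf_line_to_gl(line, min_depth, min_maf):
    """Return (locus_id, phred-scaled likelihoods) or None if the SNP is filtered.

    Assumption: INFO tags are named DP and AF; real VCF files may use
    other tags, so adjust the keys to match the file at hand.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, alt = fields[0], fields[1], fields[4]
    if "," in alt:
        return None  # keep biallelic sites only
    info = parse_info(fields[7])
    depth = int(info["DP"])
    alt_freq = float(info["AF"])
    maf = min(alt_freq, 1.0 - alt_freq)
    if depth < min_depth or maf < min_maf:
        return None  # fails the depth or minor allele frequency filter
    pl_index = fields[8].split(":").index("PL")
    likelihoods = []
    for sample in fields[9:]:
        parts = sample.split(":")
        pl = parts[pl_index] if pl_index < len(parts) else "."
        if pl == ".":
            likelihoods.extend([0, 0, 0])  # missing data: all genotypes equally likely
        else:
            likelihoods.extend(int(x) for x in pl.split(","))
    return f"{chrom}:{pos}", likelihoods
```

Each retained SNP yields three phred-scaled values per individual (homozygous reference, heterozygous, homozygous alternate), which is the information a genotype likelihood file needs to carry.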

3. Genotype inference with entropy (entropy.tar.gz)

forkRunEntropy.pl = perl fork script to run entropy
gl2genest.pl = perl script to convert genotype likelihood estimates to genotype estimates for entropy
initq.R = R command script to prepare entropy input files
runentropy.sh = bash script to run entropy on the cluster
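
As a rough illustration of turning genotype likelihoods into genotype point estimates, the Python sketch below computes a posterior mean genotype (0 to 2 copies of the alternate allele) from one individual's three phred-scaled likelihoods. The Hardy-Weinberg prior and the function name are assumptions made for this example; gl2genest.pl may compute its estimates differently.

```python
def genotype_estimate(pl, alt_freq):
    """Posterior mean count of the alternate allele for one individual.

    pl       : three phred-scaled likelihoods (hom. ref, het, hom. alt)
    alt_freq : alternate allele frequency, assumed to lie strictly between 0 and 1
    Assumption: genotype priors follow Hardy-Weinberg proportions.
    """
    p = alt_freq
    prior = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    # Convert phred scale back to (unnormalized) likelihoods.
    lik = [10 ** (-x / 10.0) for x in pl]
    post = [l * w for l, w in zip(lik, prior)]
    total = sum(post)
    # Weight each genotype (0, 1, 2) by its normalized posterior probability.
    return sum(g * w for g, w in enumerate(post)) / total
```

With uninformative likelihoods the estimate falls back to the prior mean (twice the allele frequency), and with strongly informative likelihoods it approaches 0, 1, or 2.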

4. Ancestry frequency inference (popanc.tar.gz)

forkrunpopanc.pl = This is a perl script to run the popanc program
runpopanc_fork.sh = This is a bash script to run the perl script on the cluster

5. Genomic clines analysis (bgc.tar.gz)

bgc = executable for the bgc program
estpost_bgc = executable to calculate posterior estimates
forkgenomicclines.pl = perl script to run bgc
plot_clines_males.R = R script to plot clines
rungenomicclines_fork.sh = bash script to run bgc on the cluster
splitPops.pl = perl script to split populations and create one file per population

6. Whole genome phylogenetic analysis (phylo.tar.gz)

phylo.R = whole genome phylogenetic analyses with R
appendWins.pl = processing script, adds scaffold and position boundaries to 1000 bp windows
getWinScafPos.pl = processing script, grabs the first and last position and scaffold of the SNPs for each window (works by linkage group)
mkRunWindows.pl = generates alignments of $win SNPs and generates a tree for each alignment
catWindowTrees1kb = phylogenetic trees for 1 kb windows

7. Genome annotation

This folder contains a working pipeline and scripts to do genome annotation.
maker_genome_annotation.md = describes a pipeline to do genome annotation using MAKER.
SNP_annotation.md = describes a pipeline to annotate a SNP dataset with information from the MAKER genome annotation.
create_snp_annotation.py = python script used in SNP annotation (read SNP_annotation.md).
analysesSnpAnnot_combine.py = python script used in SNP annotation (read SNP_annotation.md).

8. Bayesian genomic clines 1.04b (bgcdist-tempversion.tar.gz)

This is a version of bgc that includes a new option to pre-sample hybrid index with alpha and beta set to 0.
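
To check what an archive holds before unpacking it, the files in any of the tar.gz archives listed above can be enumerated with Python's standard tarfile module. This is a minimal sketch; the helper name list_archive is hypothetical, and the path passed to it is wherever the downloaded archive was saved.

```python
import tarfile


def list_archive(path):
    """Return the names of the regular files stored in a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]
```

For example, list_archive("alignment.tar.gz") would return the script and readme names for the first step, assuming the archive sits in the current working directory.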

Funding

National Science Foundation