The legacy of recurrent introgression during the radiation of hares

Name: The legacy of recurrent introgression during the radiation of hares
Keywords: Ancient introgression

Ferreira, Mafalda S.1 ; Jones, Matthew R.2; Callahan, Colin M.2; Farelo, Liliana1; Tolesa, Zelalem3; Suchentrunk, Franz4; Boursot, Pierre5; Mills, L. Scott2; Alves, Paulo C.1; Good, Jeffrey M.2 ; Melo-Ferreira, José 1

Published Dec 15, 2020 on Dryad. https://doi.org/10.5061/dryad.bzkh18967

Abstract

Hybridization may often be an important source of adaptive variation, but the extent and long-term impacts of introgression have seldom been evaluated in the phylogenetic context of a radiation. Hares (Lepus) represent a widespread mammalian radiation of 32 extant species characterized by striking ecological adaptations and recurrent admixture. To understand the relevance of introgressive hybridization during the diversification of Lepus, we analyzed whole exome sequences (61.7 Mb) from 15 species of hares (1- 4 individuals per species), spanning the global distribution of the genus, and two outgroups. We used a coalescent framework to infer species relationships and divergence times, despite extensive genealogical discordance. We found high levels of allele sharing among species and show that this reflects extensive incomplete lineage sorting and temporally layered hybridization. Our results revealed recurrent introgression at all stages along the Lepus radiation, including recent gene flow between extant species since the last glacial maximum, but also pervasive ancient introgression occurring since near the origin of the hare lineages. We show that ancient hybridization between northern hemisphere species has resulted in shared variation of potential adaptive relevance to highly seasonal environments, including genes involved in circadian rhythm regulation, pigmentation, and thermoregulation. Our results illustrate how the genetic legacy of ancestral hybridization may persist across a radiation, leaving a long-lasting signature of shared genetic variation that may contribute to adaptation within and among species.

The pipeline and custom scripts used to recapitulate the analyses in this work can be found at https://github.com/evochange/hare-phylogenomics.

Supplementary Files:

Supplementary_Materials_and_Methods.pdf

Supplementary Materials and Methods.

Supplementary_Figures.pdf

Supplementary Figures S1 to S13.

Supplementary_Tables.xlsx

Supplementary Table S1 to S15.

Data Files:

Pseudo-references

BID_CG461_pseudoref.gatk.iteration4.consensus.FINAL.fa.gz

LCL_FXG64H.gatk.iteration4.consensus.FINAL.fa.gz

LCP_TAN3023.gatk.iteration4.consensus.FINAL.fa.gz

LCR_ITA1958.gatk.iteration4.consensus.FINAL.fa.gz

LCS_1891.gatk.iteration4.consensus.FINAL.fa.gz

LER_ALM1580.gatk.iteration4.consensus.FINAL.fa.gz

LFG_FAG114_pseudoref.gatk.iteration4.consensus.FINAL.fa.gz

LGR_CRE2553.gatk.iteration4.consensus.FINAL.fa.gz

LHB_HAB68_pseudoref.gatk.iteration4.consensus.FINAL.fa.gz

LMS_PRI2460.gatk.iteration4.consensus.FINAL.fa.gz

LOT_2109.gatk.iteration4.consensus.FINAL.fa.gz

LST_STA65_pseudoref.gatk.iteration4.consensus.FINAL.fa.gz

LTM_FIN2191.gatk.iteration4.consensus.FINAL.fa.gz

LTW_JRRK3.gatk.iteration4.consensus.FINAL.fa.gz

Pseudo-reference genomes for each species generated by iteratively mapping (four iterations and allowing ambiguity codes) cleaned single and paired-end reads from one individual per species (Supplementary Table S1 available on Dryad) to the European rabbit reference genome (OryCun2.0). Names correspond to sample IDS in Supplementary Table S1.

Consensus sequences:

consensus_fasta_sequences.zip

Consensus fasta sequences for hares and pygmy rabbit used as input for all analyses in this study. Inside the zip file there is a file per individual (identified by Sample ID in Supplementary Table S1) containing one fasta sequence per chromosome.

ASTRAL:

astral5.6.3_chrX_TCs5_gene_trees.tre

ASTRAL species tree obtained from 181 chromosome X gene trees, displayed in Supplementary Figure S8.

astral5.6.3_lineage_TCs5_gene_trees.tre

astral5.6.3_species_TCs5_gene_trees.tre

ASTRAL species tree obtained with 8889 gene trees without (“lineage”) or with (“species”) assigning individuals to species. These trees are displayed in Supplementary Figure S2.

chr_X_bestTrees_TCs5.txt.gz

Maximum likelihood gene trees from 181 X-linked regions used as input to generate the ASTRAL species tree astral5.6.3_chrX_TCs5_gene_trees.tre displayed in Supplementary Figure S8.

gene_trees_TCs5.tree.gz

Maximum likelihood gene trees from 8889 regions used as input to generate the ASTRAL species trees astral5.6.3_lineage_TCs5_gene_trees.tre and astral5.6.3_species_TCs5_gene_trees.tre displayed in Supplementary Figure S2.

partitions_fasta_alignments.zip

gene_trees_raxml_output.zip

Fasta alignments generated from partitioning the genome in 50 kilobase (kb) windows with no overlap. Only windows inside target regions in the capture and 200 bp flanking regions were retained. Positions with more than 30% missing data were excluded from the alignments. These fasta alignments were used as input for RAxML to generate maximum likelihood gene trees (gene_trees_raxml_output.zip) that were later used for the multispecies coalescent analysis implemented in ASTRAL

SVDquartets:

snp_alignment_33indvs_chr_X.nex

snp_alignment_33indvs_WithX.nex

SNP alignments in nexus format containing only 1473 X-linked independent SNPs (snp_alignment_33indvs_chr_X.nex) or containing 45,779 exome-wide independent SNPs (snp_alignment_33indvs_WithX.nex) used as input for SVDquartets.

lineage_33indvs_chr_X_qall_b1000

lineage_33indvs_chr_X_qall_b1000.tre

lineage_33indvs_WithX_qall_b1000

lineage_33indvs_WithX_qall_b1000.tre

species_33indvs_chr_X_qall_b1000

species_33indvs_chr_X_qall_b1000.tre

species_33indvs_WithX_qall_b1000

species_33indvs_WithX_qall_b1000.tre

SVDquartets output log files (no extension) and tree files (.tre). The analyses were performed with (“species”) or without (“lineage”) assigning individuals to species, and with X-linked (“chr_X”) or whole-exome (“WithX”) SNPs. Exome-wide trees are displayed in Supplementary Figure S3 and Chromosome X trees are displayed in Supplementary Figure S8.

Concatenated whole-exome maximum likelihood tree:

exome_alignment_33indvs_f100.fas.gz

Fasta alignment containing 11,949,529 exome-wide concatenated positions with no missing data used to generate the Maximum Likelihood tree in Supplementary Figure S1.

RAxML_bipartitions.exome_alignment_33indvs_f100.tree

RAxML_bootstrap.exome_alignment_33indvs_f100.tree

RAxML_info.exome_alignment_33indvs_f100.tree

RAxML output files used to estimate a single bifurcating phylogeny using the fasta alignment exome_alignment_33indvs_f100.fas.gz and using a maximum likelihood (ML) search and rapid bootstrapping run under the GTR+GAMMA model of sequence evolution.

Bayesian estimation of divergence times:

9k_CDS_123_alignments.phy

PHYLIP alignment with data for 15 hare species (1 individual per species) and 1 outgroup species containing 3 partitions corresponding to the 1^st, 2^nd and 3^rd codon positions of 9015 protein coding genes used as input for Bayesian estimation of divergence times of the Lepus species tree with MCMCtree.

FigTree_mcmc_9k_123_CDS_tree3_run1.tre

FigTree_mcmc_9k_123_CDS_tree3_run2.tre

FigTree_mcmc_9k_123_CDS_tree5_run1.tre

FigTree_mcmc_9k_123_CDS_tree5_run2.tre

Output tree files obtained after Bayesian estimation of divergence times with MCMCtree. We run MCMCtree twice (run 1 and run2) per calibration strategy (a) molecular-based dates extrapolated from deep fossil calibrations in the lagomorphs (tree 3) or (b) fossil calibrations for the divergence of the genus Lepus and the Lepus species (tree5). See Materials and Methods for details. Run1 trees are displayed in Supplementary Figure S5.

mcmc_9k_123_CDS_tree3.zip

mcmc_9k_123_CDS_tree5.zip

Folders containing all input files (alignment and species with age priors), control files (mcmc.ctl) and output files (trees and log files for hessian and approximate maximum likelihood estimations) obtained from two MCMCtree runs with different calibration strategies (a) molecular-based dates extrapolated from deep fossil calibrations in the lagomorphs (tree 3) and (b) fossil calibrations for the divergence of the genus Lepus and the Lepus species (tree5).

Majority Rule Consensus Tree

RAxML_MajorityRuleExtendedConsensusTree_IC.gene_trees_TCs5.tree

Majority Rule Consensus Tree obtained with RAxML (-L MRE option) with the set of gene trees (gene_trees_TCs5.tree.gz) used to generate the ASTRAL species tree. This tree is displayed in Supplementary Figure S7.

PhyloNet:

gene_trees_TCs5_m0_plopt.nex.gz

gene_trees_TCs5_m1_plopt.nex.gz

gene_trees_TCs5_m2_plopt.nex.gz

gene_trees_TCs5_m3_plopt.nex.gz

gene_trees_TCs5_m4_plopt.nex.gz

Nexus file containing 8889 gene trees and PhyloNet command used to produce pseudo-maximum likelihood networks with 0 to 4 reticulation events displayed in Supplementary Figure S10.

gene_trees_TCs5_m0_plopt.out.gz

gene_trees_TCs5_m1_plopt.out.gz

gene_trees_TCs5_m2_plopt.out.gz

gene_trees_TCs5_m3_plopt.out.gz

gene_trees_TCs5_m4_plopt.out.gz

PhyloNet raw outputs obtained from pseudo-maximum likelihood network analyses with 0 to 4 reticulation events. Outputs contain the results of 10 interactions and the five best networks. The best networks from each run are displayed in Supplementary Figure S10.

TreeMix:

snp_alignment_LepusOnly_hap_treemix_inputtreemix_file_out.txt.gz

Allele frequencies for 30,709 biallelic SNPs from 15 hare species used as input for ancestral population graph reconstruction with TreeMix.

treemix_outputs_m0-9_reticulations.zip

Output files for 10 TreeMix runs with 0 to 9 reticulations events using allele frequency data of 30,709 biallelic SNPs for 15 hare species snp_alignment_LepusOnly_hap_treemix_inputtreemix_file_out.txt.gz. The networks in these files and corresponding residuals are displayed in Supplementary Figures S11 and S12.

Genetic divergence and nucleotide diversity:

wg_filtered_wHead.geno.gz

Geno file containing the whole-exome alignment used to calculate genetic divergence between and nucleotide diversity and heterozygosity within hare species with Genomic Generals (https://github.com/simonhmartin/genomics_general) scripts. This was also the input file used to calculate f_hom among black-tailed jackrabbits and snowshoe hare populations.

wg_filtered_wHead.pi

wg_filtered_wHead.dxy

Nucleotide diversity (“pi”) and genetic divergence (“d_xy”) in 1 Megabase overlapping windows calculated for 15 hare species with the script popgenWindows.py from Genomics General (https://github.com/simonhmartin/genomics_general). Results average across windows can be found in Supplementary Table S8 and S9.

Minimum D-statistics (D_min):

by_species_wg_filtered_wgHead.freq.tsv.gz

Derived allele frequencies for 15 hare species calculated with freq.py from Genomics General (https://github.com/simonhmartin/genomics_general) and the input file wg_filtered_wHead.geno.gz.

D_stat_by_species_results_FINAL.txt

Dmin_by_species_sig_FINAL.txt

D-statistic values (D_stat_by_species_results_FINAL.txt) for all possible trios of species in the dataset of 15 hare species analyzed in this study and the minimum D-statistic per trio (Dmin_by_species_sig_FINAL.txt). These results are represented in Supplementary Figure S9.

‘f-branch’ statistics (f_b(C)):

all_frequencies_filtered.freq.tsv.zip

fb_C_comparisons_f_G.txt

Derived allele frequencies for several taxa configurations (per species, per individual, per population or per haploid chromosome of the white-sided jackrabbit) used to calculate admixture proportion for all f(A,B,C,O) conformations necessary for the ‘f-branch’ statistic. In the table fb_C_comparisons_f_G.txt we list the ABBA-BABA configuration used to calculate admixture proportion for each branch b and C pair in the species tree (Figure 1b). The order of the columns is 1) branch b; 2) branch a (sister to b); 3) all B descendants of b; 4) all A descendants of a; 5) donor branch C; 6) taxa serving as Ca (or P3a); 7) taxa serving as Cb (or P3b).

fG_results_species.txt

Admixture proportion results for all f(A,B,C,O) configurations in fb_C_comparisons_f_G.txt used as input to calculate ‘f-branch’ statistics (f_b(C)) among all pairs of b and C branches in the species tree (Figure 1b).

Scans of fraction of admixture (f_d):

L_othus_vs_L_americanus_fdscan_50k5k.out

L_othus_vs_L_americanus_fdscan_99.5_sig_annotation.txt

L_townsendii_vs_Lamericanus_fd_scan_99.5_sig_annotation.txt

L_townsendii_vs_Lamericanus_fdscan_50k5k.out

L_timidus_vs_Lamericanus_fdscan_50k5k.out

L_timidus_vs_Lamericanus_fdscan_99.5_sig_annotation.txt

Fraction of admixture (f_d) between snowshoe hares and Alaskan hares, mountain hares or white-tailed jackrabbits for 50 kb overlapping (5 kb) sliding windows (.out files) and European rabbit annotations for the significant outlier windows in each scan (.txt files). Fraction admixture files were obtained with ABBABABAwindows.py from Genomics General (https://github.com/simonhmartin/genomics_general).

collective_sig_fd.txt

European rabbit annotations of the windows that are significant in all three fraction of admixture (f_d) scans performed between snowshoe hares and Alaskan hares, mountain hares or white-tailed jackrabbits. This annotation was used for Gene Ontology enrichment analysis with g:Profiler.

dxy_windows50k_sscc.out

Genetic divergence (d_xy) among snowshoe hares, Alaskan hares, mountain hares and white-tailed jackrabbits in 50 kb overlapping (5 kb) windows obtained with popgenWindows.py from Genomics General (https://github.com/simonhmartin/genomics_general), and used to determine d_xy of the fraction of admixture (f_d) outlier windows. Results of this scan are displayed in Supplementary Figure S13.

Citation

Subject keywords

Funding

Fundação para a Ciência e Tecnologia : Project Grant "CHANGE" - PTDC/BIA-EVF/1624/2014, Portuguese National Funds
National Science Foundation : EPSCoR OIA-1736249 and DEB-1907022 Grant
Luso American Development Foundation : PD/BD/108131/2015 PhD Grant, POPH-QREN from ESB and Portuguese MCTES/FCT
Fundação para a Ciência e Tecnologia : PTDC/BIA-EVF/1624/2014
National Science Foundation : OIA-1736249
National Science Foundation : NSF Graduate Research Fellowship (DGE-1313190)
Luso American Development Foundation : CEEC Contract (CEECIND/00372/2018)
Luso American Development Foundation : CIBIO NEW-GEN Sequencing Platform (no. 286431)
M J Murdock Charitable Trust : University of Montana Genomics Core
Eunice Kennedy Shriver National Institute of Child Health and Human Development : R01HD073439
COMPETE2020, PORTUGAL2020, and ERDF : POCI-01-0145-FEDER-022184
Luso American Development Foundation : Portugal-United States of America Research Networks Program Funds
COMPETE2020, PORTUGAL2020, and ERDF : POCI-01-0145-FEDER-022184

The legacy of recurrent introgression during the radiation of hares

Data files

Abstract

Methods

Usage notes