Full-likelihood genomic analysis clarifies a complex history of species divergence and introgression: the example of the erato-sara group of Heliconius butterflies
Thawornwattana, Yuttapong; Seixas, Fernando; Yang, Ziheng; Mallet, James (2022), Full-likelihood genomic analysis clarifies a complex history of species divergence and introgression: the example of the erato-sara group of Heliconius butterflies, Dryad, Dataset, https://doi.org/10.5061/dryad.dfn2z3526
Introgressive hybridization plays a key role in adaptive evolution and species diversification in many groups of species. However, frequent hybridization and gene flow between species make estimation of the species phylogeny and key population parameters challenging. Here, we show that by accounting for phasing and using full-likelihood methods, introgression histories and population parameters can be estimated reliably from whole-genome sequence data. We employ the multispecies coalescent (MSC) model with and without gene flow to infer the species phylogeny and cross-species introgression events using genomic data from six members of the erato-sara clade of Heliconius butterflies. The methods naturally accommodate random fluctuations in genealogical history across the genome due to deep coalescence. To avoid heterozygote phasing errors in haploid sequences commonly produced by genome assembly methods, we process and compile unphased diploid sequence alignments and use analytical methods to average over uncertainties in heterozygote phase resolution. There is robust evidence for introgression across the genome, both among distantly related species deep in the phylogeny and between sister species in shallow parts of the tree. We obtain chromosome-specific estimates of key population parameters such as introgression directions, times and probabilities, as well as species divergence times and population sizes for modern and ancestral species. We confirm ancestral gene flow between the sara clade and an ancestral population of H. telesiphe, a likely hybrid speciation origin for H. hecalesia, and gene flow between the sister species H. erato and H. himera. Inferred introgression among ancestral species also explains the history of two chromosomal inversions deep in the phylogeny of the group. This study illustrates how a full-likelihood approach based on the multispecies coalescent makes it possible to extract rich historical information of species divergence and gene flow from genomic data.