Data from: Most genomic loci misrepresent the phylogeny of an avian radiation because of ancient gene flow
Data files
Apr 11, 2021 version files 1.01 MB
Jun 17, 2021 version files 312.08 MB
Abstract
Phylogenetic trees based on genome-wide sequence data may not always represent the true evolutionary history for a variety of reasons. One process that can lead to incorrect reconstruction of species phylogenies is gene flow, especially if interspecific gene flow has affected large parts of the genome. We investigated phylogenetic relationships within a clade comprising eight species of passerine birds (Phylloscopidae, Phylloscopus, leaf warblers) using one de novo genome assembly and 78 resequenced genomes. On the basis of hypothesis-exclusion trials based on D-statistics, phylogenetic network analysis, and demographic inference analysis, we identified ancient gene flow affecting large parts of the genome between one species and the ancestral lineage of a sister species pair. This ancient gene flow consistently caused erroneous reconstruction of the phylogeny when using large amounts of genome-wide sequence data. In contrast, the true relationships were captured when smaller parts of the genome were analyzed, showing that the “winner-takes-all democratic majority tree” is not necessarily the true species tree. Under this condition, smaller amounts of data may sometimes avoid the effects of gene flow due to stochastic sampling, as hidden reticulation histories are more likely to emerge from the use of larger data sets, especially whole-genome data sets. In addition, we also found that genomic regions affected by ancient gene flow generally exhibited higher genomic differentiation but a lower recombination rate and nucleotide diversity. Our study highlights the importance of considering reticulation in phylogenetic reconstructions in the genomic era.
Usage notes
Using population genomic data from eight Phylloscopus (previously Seicercus) species, we estimated how ancient gene flow affects phylogenetic reconstruction. The supplementary material of the manuscript (https://doi.org/10.1093/sysbio/syab024) includes 3 figures and 5 tables.
Figure S1. DNA sampling localities and the approximate breeding distributions of the eight Phylloscopus species in two panels. (Page 2)
Figure S2. 15 tested divergence models in fastsimcoal2. (Pages 3-4)
Figure S3. Four tested divergence models in ∂a∂i for species pairs. (Page 5)
Table S1. Sampling information and sequencing statistics. (Pages 6-9)
Table S2. The estimated and observed likelihood values and the corresponding AIC for all 15 models in fastsimcoal2. (Page 10)
Table S3. Estimated parameters in fastsimcoal2 for models j and k. (Page 11)
Table S4. Log-likelihood values of the four different models for each species pair in ∂a∂i. (Page 12)
Table S5. Results of the Dsuite analyses. (Page 13)
dadi input files in folder "dadi_input_6_species_pairs"
fastsimcoal2 input files in folder "fastsimcoal2_input_SFS_and_15_models"
input files (VCF) for the phylogenetic analysis of Figure 5 in folder "Phylogenetic_input_of_Figure5_vcfFiles"
PhyloNet input files in folder "Phylonet_input_4K_trees"
recombination rate estimation file in folder "10_scaffolds_for_recobination_rate_estimation"