Data from: Whole genome shotgun phylogenomics resolves the pattern and timing of swallowtail butterfly evolution
Data files
Apr 29, 2019 version files 504.77 MB
Appendix S1 - Pipeline from raw reads to phylogenomics and dating analysis.sh (13.73 KB)
Appendix S10 - Chronogram files.zip (10.08 KB)
Appendix S11 - Prior and posterior distributions.pdf (123.35 KB)
Appendix S12 - Phylogenomic tree with cross-contam.pdf (59.60 KB)
Appendix S13 - Correlation cross-conta vs. no cross-conta.pdf (70.26 KB)
Appendix S2 - Dataset 1 with 760 genes (amino acids format).fasta (17.60 MB)
Appendix S3 - Dataset 2 with 6621 genes (amino acids format).fasta (101.02 MB)
Appendix S4 - Dataset 3 with 760 genes (nucleotides format with gene partitions).nex (55.27 MB)
Appendix S5 - Dataset 4 with 6407 genes (nucleotides format with gene partitions).nex (329.96 MB)
Appendix S6 - Phylogenomic trees.pdf (205.47 KB)
Appendix S7 - tree files.zip (13.69 KB)
Appendix S8 - Gene- and site-concordance and discordance factors.pdf (279.13 KB)
Appendix S9 - Dated trees with GTS.pdf (151.84 KB)
Abstract
Evolutionary relationships have remained unresolved in many well-studied groups, even though advances in next-generation sequencing and analysis, using approaches such as transcriptomics, anchored hybrid enrichment, or ultraconserved elements, have brought systematics to the brink of whole-genome phylogenomics. Recently, it has become possible to sequence the entire genomes of numerous non-model organisms in parallel at reasonable cost, particularly with shotgun sequencing. Here we identify orthologous coding sequences from whole-genome shotgun sequences, which we then use to investigate the relevance and power of phylogenomic relationship inference and time-calibrated tree estimation. We study an iconic group of butterflies, the swallowtails (family Papilionidae), that has remained phylogenetically unresolved, with continued debate about the timing of their diversification. Low-coverage whole genomes were obtained using Illumina shotgun sequencing for all genera. Genome assembly coupled to BLAST-based orthology searches allowed extraction of 6,621 orthologous protein-coding genes for 45 Papilionidae species and 16 outgroup species (with 32% missing data after cleaning phases). Supermatrix phylogenomic analyses were performed with both maximum-likelihood (IQ-TREE) and Bayesian mixture models (PhyloBayes) for amino acid sequences, which produced a fully resolved phylogeny providing new insights into controversial relationships. Species tree reconstruction from gene trees was performed with ASTRAL and SuperTriplets and recovered the same phylogeny. We estimated gene and site concordance factors to complement traditional node-support measures, strengthening confidence in the inferred phylogeny.
Bayesian estimates of divergence times based on a reduced dataset (760 orthologs and 12% missing data) indicate a mid-Cretaceous origin of Papilionoidea around 99.2 million years ago (Ma) (95% credibility interval: 68.6-142.7 Ma) and Papilionidae around 71.4 Ma (49.8-103.6 Ma), with subsequent diversification of modern lineages well after the Cretaceous-Paleogene event. These results show that shotgun sequencing of whole genomes, even when highly fragmented, represents a powerful approach to phylogenomics and molecular dating in a group that has previously been refractory to resolution.
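The abstract quotes missing-data proportions for the supermatrices (32% for the full dataset, 12% for the reduced one). For readers who want to check a comparable figure on one of the FASTA alignments, the following is a minimal sketch, not the authors' method. It assumes that gaps (`-`), `?`, and the ambiguous amino acid `X` count as missing; the authors' exact definition is not stated here and may differ. The function names and the toy alignment are illustrative only.

```python
from typing import Dict, Iterable, List

# Assumption: these characters are treated as missing or ambiguous data.
# The definition used to obtain the figures in the abstract may differ.
MISSING = set("-?Xx")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence name: sequence} mapping."""
    parts: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def missing_fraction(alignment: Dict[str, str]) -> float:
    """Return the fraction of alignment cells that hold a missing-data character."""
    total = sum(len(seq) for seq in alignment.values())
    if total == 0:
        raise ValueError("empty alignment")
    missing = sum(sum(ch in MISSING for ch in seq) for seq in alignment.values())
    return missing / total


# Toy alignment (hypothetical taxa, not taken from the dataset).
toy = """>taxon_A
MKV-LL
>taxon_B
M??ALL
>taxon_C
MKVALX
""".splitlines()

alignment = parse_fasta(toy)
print(f"missing data: {missing_fraction(alignment):.1%}")
```

To apply the sketch to a real file, pass an open file handle to `parse_fasta` in place of the toy lines. For the nucleotide datasets, which are in NEXUS format, a different parser would be needed and `N` would likely have to be added to the missing-character set.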