Dryad

Phylogenomics of the Andean tetraploid clade of the American Amaryllidaceae (subfamily Amaryllidoideae): unlocking a polyploid generic radiation abetted by continental geodynamics

Cite this dataset

Meerow, Alan; Gardner, Elliot; Nakamura, Kyoko (2020). Phylogenomics of the Andean tetraploid clade of the American Amaryllidaceae (subfamily Amaryllidoideae): unlocking a polyploid generic radiation abetted by continental geodynamics [Dataset]. Dryad. https://doi.org/10.5061/dryad.573n5tb4j

Abstract

The second large clade of the endemic American Amaryllidaceae subfam. Amaryllidoideae constitutes the tetraploid-derived (n = 23) Andean-centered tribes, most of which have 46 chromosomes. Despite progress in resolving phylogenetic relationships of the group with nrDNA, certain subclades were poorly resolved or weakly supported in those previous studies. Sequence capture using anchored hybrid enrichment was employed across 95 species of the clade along with five outgroups, generating sequences of 524 nuclear genes and a partial plastome. Maximum likelihood phylogenetic analyses were conducted on concatenated supermatrices, and coalescent species tree analyses were run on the gene trees, followed by hybridization network, age diversification and biogeographic analyses. The four tribes Clinantheae, Eucharideae, Eustephieae (the first branch), and Hymenocallideae (sister to Clinanthus) are resolved in all analyses with 100% support. Nuclear gene supermatrix and species tree results were largely concordant; however, cytonuclear discordance was evident. Hybridization network analysis identified significant reticulation in Clinanthus, Hymenocallis, Stenomesson and the subclade of Eucharideae comprising Eucharis, Caliphruria, and Urceolina. Our data support a previous treatment of the latter as a single genus, Urceolina, with the addition of Eucrosia dodsonii. Biogeographic analysis and penalized likelihood age estimation suggest an origin in the central Andean region (north and central Peru) for the complex in the mid-Oligocene, with more dispersals than vicariances in its history, but no extinctions. The Eucharideae experienced a sudden lineage radiation ca. 10 Mya. We tie many of the divergences in the Andean-centered lineages to the rise of the Andes, directly and indirectly, and suggest that the Amotape-Huancabamba Zone functioned both as a corridor (dispersal) and as a barrier to migration (vicariance). Several taxonomic changes are made. This is the largest DNA sequence data set to be applied within Amaryllidaceae to date.

Methods

This dataset represents the results of various phylogenetic analyses of genomic data collected from 100 species of the monocot family Amaryllidaceae (subfamily Amaryllidoideae). The data were generated by sequence capture using anchored hybrid enrichment, employed across 95 species of the tetraploid Andean clade along with five outgroups (100 taxa in total), and comprise sequences of 524 nuclear genes and a partial plastome. The fastq files from the sequencing are deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) as BioProject PRJNA635412, BioSample accession numbers SAMN15066108–SAMN15066207. These will not be publicly available until our paper is published. Files deposited with Dryad are as follows:

– Individual gene alignments in FASTA format (files are named by gene): All_Nuclear_Genes.zip

– Input files and results from RAxML thorough analyses (best tree and bootstrap) of concatenated supermatrices of nuclear loci: coding + flanking regions (AndeanSuperMatrix_RAxML_wholeAll; input: andean_whole_supermatrix_sorted.fasta, andean_whole_supermatrix_partition.txt); coding regions only (Andean RAxML All Coding_only; input: andean_coding_supermatrix_sorted.fasta, andean_coding_supermatrix_partition.txt); loci with 70% taxon coverage (input: supermatrix_over70.fasta, supermatrix_over70.partition.txt); loci with 90% taxon coverage (Andean whole RAxML_90 per cent; input: supermatrix90percent.fasta, supermatrix90percent.part); and the partial plastome, Large Single Copy (LSC) and Inverted Repeat (IR) regions (Andean_Plastome_RAxML; input: Andean_LSC&IRtrim.fasta, Andean_LSC&IRtrim.part.txt). Archived as RAxML.zip. For each analysis, the two input files (alignment and partition file) are all that is required to run RAxML.

– Gene trees for use in species tree estimation with ASTRAL-III (ASTRAL Analyses/gene trees/): andean_coding, andean_whole, loci70percent, loci90percent; and bootstrap and Local Posterior Probability (LPP) tree files from the various ASTRAL-III analyses (ASTRAL Analyses/). Archived as ASTRAL analyses.zip.

– Infile (90pcAndeanMLsupermatrix), R scripts (Andean_chronos_script3_EG.NEW.txt, Andean_lamda cycle_script5.txt), and the time-calibrated trees estimated under the correlated and relaxed rate models (time_calibrated_tree_3_correlated.tre, time_calibrated_tree_3_relaxed.tre) from the divergence dating analysis with APE in R v.4.0. Archived as ape.zip.

– Infiles, model test results, and results of biogeographic analyses (Nuclear supermatrix tree DIVAlike+J/, Plastome super matrix BAYSAREA+J) using BioGeoBEARS as implemented in RASP v.4.2. Archived as RASP.zip.

– Infiles (Clinantheae 99pc supermatrix.nex, Eucharideae99pcsupermatrix.nex, Hymenocallis99pcsupermatrix.nex) for hybridization network analyses with SplitsTree v.4.15.1. Archived as Splitstree v.4.15.1.

– Infile (time_calibrated_tree_3_relaxed.tre) and results of the BAMM (Bayesian Analysis of Macroevolutionary Mixtures) analysis of the tetraploid Andean clade. Archived as BAMM.zip.
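
Before rerunning a gene-tree or supermatrix analysis, it can help to confirm that each gene alignment in All_Nuclear_Genes.zip parses cleanly and that all sequences in a file share one length. The following is a minimal Python sketch using only the standard library; it assumes each archive member is a plain-text FASTA alignment, and `read_fasta` and `summarize_alignments` are hypothetical helper names, not scripts included in the dataset.

```python
import zipfile

# Hypothetical helpers for illustration; not part of the deposited dataset.


def read_fasta(text):
    """Parse FASTA text into a dict of {sequence name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = (line[1:].split() or [""])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def summarize_alignments(zip_source):
    """Return {member: (n_taxa, alignment_length, all_same_length)}.

    zip_source may be a path or a binary file-like object. Members that
    contain no FASTA records (e.g. stray metadata files) are skipped.
    """
    summary = {}
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            text = zf.read(info).decode("utf-8", errors="replace")
            seqs = read_fasta(text)
            if not seqs:
                continue
            lengths = {len(s) for s in seqs.values()}
            summary[info.filename] = (len(seqs), max(lengths), len(lengths) == 1)
    return summary
```

For example, `summarize_alignments("All_Nuclear_Genes.zip")` maps each member to its taxon count, its alignment length and a flag that is False when sequence lengths differ within the file.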

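The RASP archive includes model-test results, where BioGeoBEARS models such as DEC, DIVALIKE and BAYAREALIKE (with and without the +J founder-event parameter) are typically compared by AICc. To show how such a comparison works, here is a minimal Python sketch of AICc and Akaike weights. The log-likelihoods in the example are invented for illustration, and taking n as the number of tips in the tree is an assumption about how the sample size is defined; `aicc` and `akaike_weights` are hypothetical helpers.

```python
import math

# Hypothetical helpers for illustration; not part of the deposited dataset.


def aicc(lnl, k, n):
    """Small-sample corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("sample size n must exceed k + 1")
    aic = 2 * k - 2 * lnl
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert {model: AICc} into {model: Akaike weight}; weights sum to 1."""
    best = min(scores.values())
    rel = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Invented values for illustration only, NOT results from this dataset.
# DEC has 2 free parameters (d, e); adding +J gives 3. n assumed = 100 tips.
example = {"DEC": aicc(-210.0, 2, 100), "DEC+J": aicc(-205.0, 3, 100)}
weights = akaike_weights(example)
```

The model with the lowest AICc receives the largest weight; weights express relative support among the models compared, not absolute fit.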

Usage notes

To use the components of the dataset, users will need to download and learn the following programs: RAxML (or run it on the CIPRES Portal, http://www.phylo.org/), RASP 4.2, SplitsTree 4.15.1, R with the APE package (installed within R), BAMM (a standalone program whose output is usually analyzed in R with the BAMMtools package), and ASTRAL-III.
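
ASTRAL-III reads a single file of gene trees, one Newick tree per line. The layout of the folders in ASTRAL Analyses/gene trees/ is not described here, so if the trees need to be assembled into that form, the following minimal Python sketch may help. It assumes each tree file holds plain Newick trees terminated by ';' and taxon names without spaces or quotes; `combine_gene_trees` and `astral_command` are hypothetical helpers, and the jar file name depends on the ASTRAL release installed.

```python
from pathlib import Path

# Hypothetical helpers for illustration; not part of the deposited dataset.


def combine_gene_trees(tree_dir, out_file, pattern="*"):
    """Write every Newick tree found in files under tree_dir to out_file, one per line.

    Assumes plain Newick trees terminated by ';' and taxon names without
    spaces or quotes (all whitespace inside a tree is removed). Returns the
    number of trees written. Keep out_file outside tree_dir.
    """
    trees = []
    for path in sorted(Path(tree_dir).glob(pattern)):
        if not path.is_file():
            continue
        for chunk in path.read_text().split(";"):
            chunk = "".join(chunk.split())  # drop newlines/spaces within a tree
            if chunk:
                trees.append(chunk + ";")
    Path(out_file).write_text("\n".join(trees) + "\n")
    return len(trees)


def astral_command(gene_tree_file, out_file, jar="astral.jar"):
    """Build a basic ASTRAL-III command line: -i gene trees in, -o species tree out."""
    return ["java", "-jar", jar, "-i", str(gene_tree_file), "-o", str(out_file)]
```

The list returned by `astral_command` can be passed to `subprocess.run(cmd, check=True)` once Java and an ASTRAL-III jar are installed.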

All .tre files can be viewed in any program that reads tree files; we use FigTree.
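
BAMM requires an ultrametric, fully bifurcating time tree, such as time_calibrated_tree_3_relaxed.tre from the chronos analysis. Before a new BAMM run, both properties can be checked outside R with the minimal Python sketch below. It assumes the file holds a single plain Newick tree with branch lengths (not a NEXUS block) and unquoted taxon names without bracketed comments; `parse_newick`, `tip_depths`, `is_ultrametric` and `is_binary` are hypothetical helpers.

```python
# Hypothetical helpers for illustration; not part of the deposited dataset.


def parse_newick(text):
    """Parse one Newick tree into nested (name, branch_length, children) tuples.

    Minimal parser: assumes unquoted labels and no [bracketed] comments.
    Missing branch lengths are treated as 0.0.
    """
    s = text.strip()
    if s.endswith(";"):
        s = s[:-1]
    pos = 0

    def node():
        nonlocal pos
        while pos < len(s) and s[pos].isspace():
            pos += 1
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1
            children.append(node())
            while pos < len(s) and s[pos] == ",":
                pos += 1
                children.append(node())
            if pos >= len(s) or s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        name = s[start:pos].strip()
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return (name, length, children)

    root = node()
    if pos != len(s):
        raise ValueError("unexpected text after the tree")
    return root


def tip_depths(node, depth=0.0, out=None):
    """Return {tip name: root-to-tip distance} by summing branch lengths."""
    if out is None:
        out = {}
    name, length, children = node
    depth += length
    if not children:
        out[name] = depth
    for child in children:
        tip_depths(child, depth, out)
    return out


def is_ultrametric(newick, rel_tol=1e-6):
    """True if all root-to-tip distances agree within a relative tolerance."""
    depths = tip_depths(parse_newick(newick)).values()
    lo, hi = min(depths), max(depths)
    return (hi - lo) <= rel_tol * hi


def is_binary(node):
    """True if every internal node (root included) has exactly two children."""
    _, _, children = node
    if not children:
        return True
    return len(children) == 2 and all(is_binary(c) for c in children)
```

A tolerance is used because branch lengths written to text files carry rounding error, so exact equality of root-to-tip distances is too strict.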