An elevational phylogeographic diversity gradient in Neotropical birds is decoupled from speciation rates
Abstract
A key question about macroevolutionary speciation rates is whether they are controlled by microevolutionary processes operating at the population level. For example, does spatial variation in population genetic differentiation underlie geographical gradients in speciation rates? Previous work suggests speciation rates increase with elevation in Neotropical birds, but underlying population-level gradients remain unexplored. Here, we characterize elevational phylogeographic diversity between montane and lowland birds in the megadiverse Andes-Amazonian system and assess its relationship to speciation rates to evaluate the link between population-level differentiation and species-level diversification. We aggregated and georeferenced nearly 7000 mitochondrial DNA sequences across 103 species or species complexes in the Andes and Amazonia and used these sequences to describe phylogeographic differentiation across both regions. Our results show increased levels of both discrete and continuous metrics of population structure in the Andean mountains compared to the Amazonian lowlands. However, higher levels of population differentiation do not predict higher rates of speciation in our dataset. Multiple potential factors may lead to our observed decoupling of initial population divergence and speciation rates, including the ephemerality of incipient species and the multifaceted nature of the speciation process, as well as methodological challenges associated with estimating rates of population differentiation and speciation.
README: An elevational phylogeographic diversity gradient in Neotropical birds is decoupled from speciation rates
https://doi.org/10.5061/dryad.7pvmcvf0j
README for code and data from Wacker and Winger (2023) "An elevational phylogeographic diversity gradient in Neotropical birds is decoupled from speciation rates." The manuscript demonstrates an elevational gradient in phylogeographic diversity of 103 birds across the megadiverse Andes-Amazonian system, but fails to find a statistically supported relationship between the rates of population differentiation and the rates of speciation. Here, we include the mtDNA sequence data and metadata; the code for assessing population structure, conducting PGLS, and other analyses; and figures and supplementary materials associated with the main manuscript.
Description of the data and file structure
Data.zip: This folder contains sequence data for all species in the project (fasta files, nexus files, and MCC trees); individual-level metadata; map shapefiles for classifying samples as in vs out of region; the species-level data used in model analyses; and the MRC tree for running phylogenetically controlled regressions.
- /maps: This folder contains the .shp files and dependencies for the Andes biogeographic region and the Amazonia biogeographic region, onto which samples were mapped to be classified as in vs out of region and for the calculation of geographic sampling area. Amazonia shapefiles were based on areas of endemism in Da Silva et al. (2005) "The fate of the Amazonian areas of endemism" and Andes shapefiles were based on areas of endemism from Hazzi et al. (2018) "Biogeographic regions and events of isolation and diversification of the endemic biota of the tropical Andes". Also included are a shapefile and dependencies for an outline of the South American continent, used for visualization purposes. Note: .shp files contain the geometric coordinates for drawing the shapefile, and they are the type of file read into the R environment for spatial analyses; all other files with the same name but a different extension (i.e., .cpg, .dbf, .prj, and .shx) are dependencies with factual and attribute data associated with the principal shapefile, and they must be present in the same directory as the .shp file for the shapefile to be intelligible to the computer. Lastly, the .tif elevation file is used to estimate the elevational mean of samples from the three wide-ranging species with sampling restricted to the Andes or Amazonia, as described in the methods.
- /nexus_files: This folder contains .nexus files of the mtDNA alignment for every species or species complex in the study. The .nexus files were used to run BEAST to generate the mtDNA gene trees. To that end, they include in-region and out-of-region samples and outgroup sequences.
- /mcc_trees: This folder contains the Maximum Clade Credibility trees for every species or species complex in the study, generated with common ancestor heights in TreeAnnotator.
- /fasta_files: This folder contains the .fasta files of the mtDNA alignment for every species or species complex in the study. The .fasta alignments were used to calculate isolation by distance (IBD). They match the .nexus files and include in-region, out-of-region, and outgroup sequences; the out-of-region and outgroup sequences were removed from the alignment during analysis in the relevant R script.
- /100_complete_trees: This folder contains 100 trees sampled from the pseudoposterior distribution of the Jetz et al. 2012 bird tree of life, including tips lacking genetic data placed with taxonomic constraints. These are used to calculate tip DR.
- /scratch: This presently empty folder and all nested empty folders within it are available to hold the output that results from our code files.
- Jetz_50MRC.tre: This is a 50% Majority Rule Consensus tree based on the Jetz et al. 2012 bird tree of life with the Hackett et al. 2008 backbone. It is used to control for phylogenetic signal in the residuals of linear regressions run in the R package phylolm.
- Species_level_data.xlsx: This is the species-level summary of the input data and analytical results (population structure and speciation rates) modeled in this study. Its columns are:
- Family classifies the species or species complex to the taxonomic family level.
- Taxon_name is the name of the 103 different species or species complexes included in the study.
- Locus is the mtDNA locus comprising the phylogeographic dataset for each species.
- Seq_Model is the best model of molecular evolution identified by MEGA X and used to generate gene trees in BEAST.
- Region is the discrete (binary/binned) biogeographic region for each taxon.
- Mid_elev is the midpoint of the elevational distribution.
- Area is the in-region geographic sampling area of each taxon, given in kilometers squared.
- Stem_Age is the age of the node in the MCC tree connecting all in-region sequences to their closest relative.
- Crown_age is the age of the node at which all in-region sequences in the MCC tree coalesce.
- N_Total is the total number of sequences of each taxon used to generate the mtDNA gene tree in BEAST.
- N_InRegion is the number of sequences georeferenced to the area of interest used to calculate discrete levels of in-region phylogeographic structure.
- Superspecies is a binary variable (Yes/No) indicating whether the taxonomic unit of study comprises multiple named taxa according to the South American Checklist Committee.
- HWI is the hand-wing index of each taxon.
- GMYC is the raw number of discrete phylogeographic clusters.
- Diff_Rate is the rate at which discrete clusters form (GMYC/crown age).
- DR is the Tip DR statistic.
- BAMM is the speciation rate (lambda) estimated with BAMM.
- IBD.subclade is the slope of the linear regression of pairwise raw genetic distance versus log geographic distance, for the largest subclade per species.
- log.IBD.subclade is the log of the raw IBD slope.
- N.subclade is the number of sequences in the subclade for which IBD was analysed.
- Area.subclade is the geographic sampling area encompassed by the largest subclade.
- Jetz_genetic is a binary variable indicating whether the species/species complex is represented by genetic data in the global avian phylogeny.
- Jetz_tip_genetic is the name of the tip from which tip DR was assigned for each species in the study: if the species was genetically placed in the avian phylogeny, this is the species itself, but if the species lacks genetic data, this is the closest relative with genetic data in the phylogeny.
- Empty cells are for traits not estimable for a given species (e.g., one for which subclade IBD cannot be estimated because of insufficient sample size), and can be practically considered NA.
- Sequence_data_region_assignments.xlsx: This is the sequence-level metadata for all tips used in the study (full, type I dataset used for creating mtDNA gene trees in BEAST, including tips from in and out of region). In the first sheet (Data):
- Reference is the source publication for the sequence.
- Family is the taxonomic family of the taxon.
- Tip_name is the name of the sequence as represented in nexus files, fasta files, and MCC tree files.
- Taxon_name is the name of the species or species complex as used in this study.
- Locus is the mtDNA gene.
- Accession_number is the GenBank accession number for the sequence, for all sequences that were published to GenBank.
- Museum is the institution housing the specimen from which the sequence was collected.
- Voucher_number identifies the specific specimen associated with the sequence.
- Region is the binary classification of the species/species complex as Montane or Lowland.
- Latitude and Longitude are the WGS84 coordinates of the sample.
- Locality is the most specific locality description available for the sequence.
- InRegion is a binary classification of the sequence as in versus out of region (either Andes or Amazonia).
- The second sheet, Data_description, describes the precision of the lat-lon data and what we believe to be limitations for its potential use in future work. Empty cells can be practically considered NA.
- Taxon_key_lump.csv: This relates taxon names used in the study to Jetz et al. 2012 tip names. The "lump" key relates a single Jetz et al. tip name to species complexes analyzed as a single unit in this study (n=103).
- Taxon_key_split.csv: This relates taxon names used in the study to Jetz et al. 2012 tip names. The "split" key relates each species complex analyzed in the study to all Jetz et al. tip names (n=152) contained within our superspecies/species complexes.
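To illustrate how two of the derived columns in Species_level_data.xlsx relate to their inputs, the following Python sketch computes Diff_Rate (GMYC divided by crown age) and an IBD slope (ordinary least-squares regression of pairwise genetic distance on log geographic distance), along with the log of that slope. It is not the authors' code, which was written in R and is hosted on Zenodo. The function names, the use of natural logarithms, the kilometer units, and the handling of missing values as None are all assumptions made for this example.

```python
import math


def diff_rate(gmyc_clusters, crown_age):
    """Diff_Rate as described above: number of GMYC clusters / crown age.

    Returns None (treated like an empty, NA cell) when the crown age is
    missing or not positive. That NA handling is an assumption.
    """
    if crown_age is None or crown_age <= 0:
        return None
    return gmyc_clusters / crown_age


def ibd_slope(pairs):
    """OLS slope of genetic distance regressed on log geographic distance.

    `pairs` is an iterable of (geographic_distance, genetic_distance)
    tuples, one per pair of sequences. Natural log and kilometers are
    assumptions; the README does not state the log base or unit.
    """
    # Drop zero-distance pairs, whose log geographic distance is undefined.
    kept = [(math.log(geo), gen) for geo, gen in pairs if geo > 0]
    n = len(kept)
    if n < 2:
        return None  # too few pairs to fit a line
    mean_x = sum(x for x, _ in kept) / n
    mean_y = sum(y for _, y in kept) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in kept)
    if sxx == 0:
        return None  # all pairs share one distance; slope undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in kept)
    return sxy / sxx


def log_or_na(value):
    """Log of a raw slope (cf. log.IBD.subclade); None if not positive."""
    if value is None or value <= 0:
        return None
    return math.log(value)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not taken from the dataset.
    print(diff_rate(gmyc_clusters=4, crown_age=2.0))
    toy_pairs = [(10.0, 0.002), (100.0, 0.006), (1000.0, 0.009)]
    slope = ibd_slope(toy_pairs)
    print(slope, log_or_na(slope))
```

Returning None when a value cannot be estimated mirrors the empty cells in the spreadsheet, which the README says can be treated as NA.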
Code/Software
All code files are hosted on Zenodo and were run in R v4.2.1.