Data from: Corridors and reservoirs: An analysis of inter-Andean historical biogeography
Data files
Feb 20, 2026 version files 556.74 KB
- README.md (4.37 KB)
- scoring_clean.csv (69.06 KB)
- scraped_papers.csv (293.69 KB)
- scraped_papers.xlsx (189.63 KB)
Abstract
The Andes are a global biodiversity hotspot, and the formation of this mountain range has been linked to the rapid diversification of numerous lineages across the Tree of Life. In addition to generating high species richness, the Andes may have functioned as a reservoir from which lineages dispersed across South America. Recent syntheses of Andean geological history, together with the growing number of phylogenetic studies across diverse Andean clades, provide an opportunity to integrate current knowledge of the historical biogeography of the Andean biota. Here, we present a meta-analysis of phylogenetic studies in which we scored disjunction events within and out of the Andes based on published biogeographic reconstructions. Across clades and through time, dispersal events within the Andes were approximately as frequent as those out of the Andes. The highest number of extra-Andean disjunctions originated in the northern Andes, despite this region being the most recently uplifted. Our results also indicate sustained bidirectional dispersal within the Andes over the past 50 million years, with northward dispersals occurring at a slightly but significantly higher rate than southward movements. Furthermore, crown clades of Andean plants tend to be older than those of animals, the latter largely originating within the past 30 million years. Overall, these findings demonstrate that the Andes have functioned both as a reservoir and as a corridor of biodiversity across clades with widely differing dispersal strategies.
Dataset DOI: 10.5061/dryad.3n5tb2rwp
Description of the data and file structure
Using a Python script, we performed three rounds of web scraping of Google Scholar search results for the keyword “Andes.” Each web scraping round targeted publications that cite a specific major historical biogeographic reconstruction methodology. The first round scraped all the publications from 2014 to 2024 that cited the R package BioGeoBEARS (Matzke 2014) and returned 304 results. The second round scraped all publications from 2016 to 2024 that cited RevBayes (Höhna et al. 2016), returning 62 results. The last round scraped all publications from 2013 to 2024 that cited LagRange (Matzke 2013), returning 178 results. Among the scrape results, we excluded publications that did not perform ancestral reconstructions on a time-calibrated phylogeny, lacked at least one dispersal either out of or within the Andes, treated the entire Andes as one region in the phylo-biogeography analysis, or were not published in a peer-reviewed journal (e.g., a thesis). Finally, we excluded duplicates across scrape rounds.
For our scoring, we identified three regions in the Andes: northern, central, and southern. This is consistent with the regions used in the majority of biogeographical studies of the Andes and corresponds to major geologic and climatic barriers across the Andean range (Boschman, 2021; Luebert & Weigend, 2014; Pérez-Escobar et al., 2022). We only scored papers that included at least two of the three regions so that we could infer directionality of movement within the Andes. In total, we retained 52 studies out of 544 identified from our scraping process (see Appendix S1 in Supporting Information).
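The screening criteria described above can be summarized as a single predicate. The following is a minimal sketch, not the authors' script: the `Study` class and all of its field names are hypothetical and only mirror the exclusion rules stated in the text. Requiring at least two Andean regions also covers the case of studies that treated the entire Andes as one region.

```python
from dataclasses import dataclass

# The three Andean regions used for scoring, as named in the text.
ANDEAN_REGIONS = {"northern", "central", "southern"}


@dataclass
class Study:
    # Hypothetical fields: one flag per exclusion criterion in the text.
    time_calibrated_reconstruction: bool  # ancestral reconstruction on a dated tree
    has_andean_dispersal: bool            # at least one dispersal within or out of the Andes
    peer_reviewed: bool                   # published in a peer-reviewed journal
    andean_regions: frozenset             # Andean regions distinguished in the analysis


def passes_screening(study: Study) -> bool:
    """Return True if the study meets every inclusion criterion."""
    regions = set(study.andean_regions) & ANDEAN_REGIONS
    return (
        study.time_calibrated_reconstruction
        and study.has_andean_dispersal
        and study.peer_reviewed
        and len(regions) >= 2  # needed to infer direction of movement
    )
```

Duplicate removal across scrape rounds is a separate step and is not modeled here.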
For each paper, we recorded the major taxonomic group, clade name, areas represented (of northern Andes, central Andes, southern Andes, and outside of the Andes), and the crown age of the most inclusive clade with Andean origin. If multiple Andean clades with non-Andean common ancestry were present, we treated each clade separately. Within the target clade, we recorded each disjunction event separately, including the origin, destination, and age of movement. For reconstructions with weighted probabilities, we required a probability of >50% to assign a region. We biased our estimated ages of disjunction toward minimum ages by assigning the age of disjunction to the crown clade rather than the stem. Shifts occurring between the most recent node and the tip were recorded as “0.00 Ma.” We only scored inferred dispersals out of the Andes; dispersals back into the Andes were not relevant to our questions and were not scored. We also did not record range contractions (e.g., an inferred movement from northern Andes + central Andes to only northern Andes).
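Two of the scoring rules above lend themselves to a short illustration: the >50% probability threshold and the "0.00 Ma" convention for shifts on terminal branches. This is a sketch under stated assumptions. The function names, the dictionary input format, and the `on_terminal_branch` flag are hypothetical, since scoring was done manually from published reconstructions.

```python
from typing import Dict, Optional


def assign_region(probabilities: Dict[str, float],
                  threshold: float = 0.5) -> Optional[str]:
    """Return the best-supported region if its probability exceeds the threshold.

    `probabilities` maps a region (or range) label to its reconstructed
    probability at a node. Returns None when no state exceeds the threshold,
    meaning the node is left unassigned.
    """
    if not probabilities:
        return None
    best = max(probabilities, key=probabilities.get)
    # Strictly greater than the threshold, matching the ">50%" rule.
    return best if probabilities[best] > threshold else None


def disjunction_age(crown_age_ma: float, on_terminal_branch: bool) -> float:
    """Return the age recorded for a disjunction, in Ma.

    Shifts between the most recent node and the tip are recorded as 0.0;
    otherwise the crown age is used, which biases ages toward minima.
    """
    return 0.0 if on_terminal_branch else crown_age_ma
```

With a strict threshold, a node split exactly 50/50 between two regions stays unassigned.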
Files and variables
File: scoring_clean.csv
Description:
Variables
- paper_ID: assigned ID number
- doi: doi of paper scored
- scorer: person who did the scoring
- taxonomy: taxonomic group of the study clade
- areas: areas of the Andes defined and included in the historical biogeography analysis
- clade: name assigned to the Andean clade
- crown_age: crown age of the Andean clade
- region_shift: includes origin and destination of movement
- origin: inferred movement origin
- destination: inferred movement destination
- dispersal_age: inferred age of movement
- direction: inferred directionality of movement
- notes: notes
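As a starting point for working with this file, the sketch below tallies the values found in one column, for example `direction` or `region_shift`. It assumes only the column names listed above. The helper name `tally_column` is hypothetical, and no assumption is made about which values each column contains.

```python
import csv
from collections import Counter
from typing import Iterable


def tally_column(lines: Iterable[str], column: str) -> Counter:
    """Count occurrences of each value in one column of CSV input.

    `lines` can be an open file object or any iterable of CSV text lines
    whose first line is the header row.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    # Missing cells are counted under the empty string.
    return Counter((row[column] or "").strip() for row in reader)
```

To apply it to the dataset, open the file with `open("scoring_clean.csv", newline="", encoding="utf-8")` and pass the file object together with a column name. The UTF-8 encoding is an assumption and may need adjusting.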
File: scraped_papers.xlsx
Description: This is the document containing sheets with all scraped papers, sorted by citation. It was not used for downstream analyses, but rather to keep track of manual scoring of all papers and elimination of non-relevant papers.
Variables
- Tabs/sheets: citation of scraped papers
- Duplicate: whether or not the paper appears more than once across all sheets
- Title: title of paper
- Authors: authors of the paper
- Date: publication date
- Publisher: publisher
- Person: initials of the person who scored the paper
- Relevant: whether or not the paper was included in our study
- If no, why excluded: reason for excluding
- Done: whether scored or not
File: scraped_papers.csv
Description: The same content as scraped_papers.xlsx, but in CSV format, collapsed into a single sheet without formatting.
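The description does not state how the Duplicate column was populated. One plausible way to reproduce such a flag from the Title column is to compare normalized titles, sketched below. The matching rule and both function names are assumptions, not the authors' procedure.

```python
import re
from collections import Counter
from typing import List


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse non-alphanumeric runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def flag_duplicates(titles: List[str]) -> List[bool]:
    """Return one flag per title: True if its normalized form occurs more than once."""
    keys = [normalize_title(t) for t in titles]
    counts = Counter(keys)
    return [counts[k] > 1 for k in keys]
```

This normalization drops accented and other non-ASCII characters, so titles that differ only in such characters would be treated as the same paper. Matching on DOI would be stricter, but the file description lists no DOI column.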
