Data from: Spatial phylogenetics of the native Colombian flora
Data files
Dec 05, 2025 version; files total 623.60 MB
Colombia_native_plants_orderedt.tre (430.30 KB)
Colombia.phy (622.40 MB)
Loci_per_species_and_Genbank_IDs.xlsx (743.52 KB)
README.md (5.71 KB)
S03_R20160415_euphyllophyte.new.ed.210829.tre (17.91 KB)
Script_download_BIENoccurrence.R (1.41 KB)
Abstract
This dataset underpins the first nationwide analysis of spatial phylogenetic diversity and endemism in Colombia’s native vascular flora. Using 278,551 georeferenced records and DNA-based phylogenies comprising 8,337 species, 1,839 genera, and 266 families, we identified 17 centers of phylogenetic diversity. Randomization tests revealed pronounced clustering, with the Andes harboring the highest evolutionary diversity and endemism. Our phylogenetic results show that high-elevation areas tend to have more short-range restricted lineages, or centers of neoendemism (cradles); mid- to high-elevation areas act as a mix of both biodiversity cradles and museums; and lowlands more often preserve ancient lineages with long branches, or centers of paleoendemism (museums). Three major phylogenetically distinct biogeographic regions emerged: two clusters in the Andean region and one cluster in the lowlands.
File 1: Species Occurrences Native Flora of Colombia
File 1 is not available in Dryad. Instead, it can be accessed on Zenodo at the following DOI: 10.5281/zenodo.17513490.
As part of the spatial data, a species occurrence dataset was downloaded from BIEN 4.2.8 (Maitner et al. 2018; http://bien.nceas.ucsb.edu/bien/; downloaded March 14, 2024), containing 268,159 records corresponding to 8,583 species.
The script and parameters used to download the data are provided in Script_download_BIENoccurrence.R.
Description of the data and file structure.
Species occurrences of Colombia's native flora.
name: Sp_ocurr_Col_native_Flora.csv
Sp_ocurr_Col_native_Flora.csv: Contains occurrence information for all taxa used in the study.
ID: Unique ID, running from 1 to 278552, for each species occurrence.
X: Unique datasource number
scrubbed_species_binomial: Scientific name for each taxon used in the study.
Columns with other fields contained in the .csv are:
TAXONOMIC:
verbatim_family
verbatim_scientific_name
family_matched
name_matched
name_matched_author
higher_plant_group
scrubbed_taxonomic_status
scrubbed_family
ORIGIN:
scrubbed_author
native_status
native_status_reason
native_status_sources
is_introduced
native_status_country
native_status_state_province
native_status_county_parish
GEOGRAPHIC:
country
state_province
county
locality
elevation_m
latitude
longitude
is_new_world
CUSTODIANS:
date_collected
datasource
dataset
data owner
custodial_institution_codes
collection_code
datasource_id
Script_download_BIENoccurrence.R: Summary script for downloading BIEN occurrence data.
PARAMETERS:
library(BIEN)  # species_in_phylogeny: vector of species names passed to BIEN
occ <- BIEN_occurrence_species(species = species_in_phylogeny, new.world = TRUE, all.taxonomy = TRUE, native.status = TRUE,
natives.only = FALSE, political.boundaries = TRUE, only.geovalid = TRUE)
Next, the occurrences were merged into one table and subset to records with country == "Colombia".
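The merge-and-subset step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the authors' script: the two small data frames and their values are invented stand-ins for per-species BIEN results, and only the column names (scrubbed_species_binomial, country, latitude, longitude) are taken from the column list above.

```r
# Toy stand-ins for per-species occurrence tables (all values are invented).
occ_a <- data.frame(
  scrubbed_species_binomial = c("Species one", "Species one"),
  country   = c("Colombia", "Ecuador"),
  latitude  = c(4.6, -0.2),
  longitude = c(-74.1, -78.5),
  stringsAsFactors = FALSE
)
occ_b <- data.frame(
  scrubbed_species_binomial = "Species two",
  country   = "Colombia",
  latitude  = 6.2,
  longitude = -75.6,
  stringsAsFactors = FALSE
)

# Merge the per-species tables into one table (identical columns assumed).
occ_all <- do.call(rbind, list(occ_a, occ_b))

# Keep only records whose country field is exactly "Colombia".
# which() drops rows where country is NA instead of returning NA rows.
occ_col <- occ_all[which(occ_all$country == "Colombia"), ]

nrow(occ_col)
```

Wrapping the comparison in which() is a deliberate choice: a plain logical index would keep all-NA rows for records with a missing country.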
ORCHIDS DATA:
The dataset also includes 10,392 records of Colombian orchids, representing 730 species, extracted from GBIF (DOI: 10.15468/dl.v2gwxv). These data are freely available as part of the supplementary material of the following paper: Pérez-Escobar, O. A. et al. (2024). The origin and speciation of orchids. New Phytologist, 242, 700–716.
File 2: Species Phylogeny of Native Colombian Flora
Dataset DOI: https://doi.org/10.5061/dryad.fj6q5746j
Description of the data and file structure.
NEXUS file containing 10,853 terminals of native plant species for Colombia.
File name: Colombia_native_plants_orderedt.tre
Loci included: eight loci (ITS, rbcL, atpB, matK, matR, trnF, ndhF, and trnK)
Sequence source: GenBank (a table with full details of the sequences is provided in File 4).
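Because the species tree is described as a NEXUS file but carries a .tre extension, it may help to check the format before reading it. The sketch below is one possible approach, not part of the deposited scripts: guess_tree_format is a hypothetical helper written for this illustration, and the reading step assumes the ape package is installed and the file sits in the working directory.

```r
# Guess whether a tree file is NEXUS or Newick from its first non-blank line.
# NEXUS files begin with the token "#NEXUS"; Newick strings begin with "(".
guess_tree_format <- function(path, n_lines = 20L) {
  first <- trimws(readLines(path, n = n_lines, warn = FALSE))
  first <- first[nzchar(first)]
  if (length(first) == 0L) return("unknown")
  if (grepl("^#NEXUS", first[1], ignore.case = TRUE)) return("nexus")
  if (startsWith(first[1], "(")) return("newick")
  "unknown"
}

# Read the species tree only if the file and the ape package are available.
tree_file <- "Colombia_native_plants_orderedt.tre"
if (file.exists(tree_file) && requireNamespace("ape", quietly = TRUE)) {
  fmt  <- guess_tree_format(tree_file)
  tree <- switch(fmt,
    nexus  = ape::read.nexus(tree_file),
    newick = ape::read.tree(tree_file),
    stop("Unrecognised tree format in ", tree_file)
  )
  length(tree$tip.label)  # number of terminals, if a single tree was read
}
```

The tip count returned at the end can be compared with the 10,853 terminals reported for this file.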
File 3: Family Phylogeny of Native Colombian Flora
Dataset DOI: https://doi.org/10.5061/dryad.fj6q5746j
Description of the data and file structure.
Newick-format .tre file containing 266 terminals representing the native plant families of the Colombian flora.
name: S03_R20160415_euphyllophyte.new.ed.210829.tre
Originally prepared by: Gastauer and Meira (2017)
Source: Angiosperm Phylogeny Group IV classification (APG IV 2016)
Version used here: adapted by Diazgranados (2022), who reconciled the nomenclature with the taxonomic backbone of World Flora Online (WFO 2023) and pruned the tree to include as terminals only the families represented in our dataset (266 families).
File 4: Loci per species and GenBank IDs of Colombian native flora
Dataset DOI: https://doi.org/10.5061/dryad.fj6q5746j
Description of the data and file structure.
name: Loci_per_species_and_Genbank_IDs.xlsx
Loci_per_species_and_Genbank_IDs.xlsx: Table containing the names of all taxa used in the study, with locus types and GenBank IDs.
Taxon: 10,855 rows with species names of Colombian native flora.
No of charsets: number of loci identified in GenBank.
Eight columns with locus names and GenBank IDs: ndhf, ITS_binomials, atpB_binomials, matK_binomials, matR_binomials, rbcL_binomials, trnF_binomials, trnL_binomials
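As a rough consistency check, the number of loci available per taxon can be recomputed from the eight locus columns. The sketch below uses a toy table with invented taxa and placeholder IDs rather than the real spreadsheet, and it assumes that an empty or missing cell means no sequence was found for that locus; the n_loci column is a name introduced here for illustration.

```r
# Locus column names as listed in the file description.
loci_cols <- c("ndhf", "ITS_binomials", "atpB_binomials", "matK_binomials",
               "matR_binomials", "rbcL_binomials", "trnF_binomials",
               "trnL_binomials")

# Toy stand-in for the spreadsheet (invented taxa, placeholder IDs).
tab <- data.frame(Taxon = c("Species one", "Species two"),
                  stringsAsFactors = FALSE)
tab[loci_cols] <- NA_character_
tab[1, "ITS_binomials"]  <- "ID_A"
tab[1, "rbcL_binomials"] <- "ID_B"
tab[2, "matK_binomials"] <- "ID_C"

# Count, per taxon, how many locus columns hold a non-empty value.
m <- as.matrix(tab[loci_cols])
has_id <- !is.na(m) & m != ""
tab$n_loci <- rowSums(has_id)

tab[, c("Taxon", "n_loci")]
```

On the real table, n_loci could then be compared against the "No of charsets" column, assuming the two are meant to agree.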
File 5: Alignment Colombian native Flora
Dataset DOI: https://doi.org/10.5061/dryad.fj6q5746j
Dataset Description: A concatenated alignment in PHYLIP format containing ndhF, ITS, atpB, matK, matR, rbcL, trnF, and trnL.
This dataset contains the concatenated DNA sequence alignment used for the phylogenetic and spatial analyses of Colombia’s native flora. The dataset was compiled as part of a study assessing patterns of phylogenetic diversity and endemism across Colombian plant lineages.
Data Content
The dataset includes aligned DNA sequences from eight loci commonly used in plant phylogenetics:
ITS (Internal Transcribed Spacer)
rbcL (ribulose-bisphosphate carboxylase large chain)
atpB (ATP synthase beta subunit)
matK (maturase K)
matR (maturase R)
trnF (transfer RNA phenylalanine)
ndhF (NADH dehydrogenase subunit F)
trnK (transfer RNA lysine)
These loci were concatenated into a single alignment using SequenceMatrix version 1.9, resulting in 10,853 terminals (species-level entries) representing the native flora of Colombia.
File Information
File name: Colombia.phy
Format: PHYLIP (.phy)
Software used: SequenceMatrix v1.9
Number of loci: 8
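Since the alignment file is large (622.40 MB), it can be useful to inspect its dimensions without loading any sequences. A PHYLIP file starts with a header line giving the number of sequences and the alignment length, so reading that single line is enough. The helper below, read_phylip_header, is a hypothetical name introduced for this sketch, and the file is assumed to be in the working directory.

```r
# Read only the PHYLIP header: "<number of sequences> <alignment length>".
read_phylip_header <- function(path) {
  header <- readLines(path, n = 1L, warn = FALSE)
  if (length(header) == 0L) stop("File is empty: ", path)
  fields <- strsplit(trimws(header), "[[:space:]]+")[[1]]
  dims <- suppressWarnings(as.integer(fields))
  if (length(dims) < 2L || anyNA(dims[1:2])) {
    stop("First line is not a PHYLIP header: ", header)
  }
  c(n_sequences = dims[1], n_sites = dims[2])
}

# Inspect the deposited alignment only if it is present locally.
aln_file <- "Colombia.phy"
if (file.exists(aln_file)) {
  print(read_phylip_header(aln_file))
}
```

The n_sequences value can be checked against the 10,853 terminals reported for the concatenated alignment.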
Intended Use
The data are provided for phylogenetic, macroevolutionary, and biodiversity analyses. Users may employ these sequences to replicate the phylogenetic tree reconstruction or to conduct independent analyses of evolutionary relationships among Colombian plant taxa.
