Data from: Spatial phylogenetics of the North American flora

Cite this dataset

Mishler, Brent et al. (2020). Data from: Spatial phylogenetics of the North American flora [Dataset]. Dryad.


North America is a large continent with extensive climatic, geological, soil, and biological diversity. That biota is under threat from habitat destruction and climate change, making a quantitative assessment of biodiversity of critical importance. Rapid digitization of plant specimen records and accumulation of DNA sequence data enable a much-needed broad synthesis of species occurrences with phylogenetic data. Here we attempted the first such synthesis of a flora from such a large and diverse part of the world: all seed plants for the North American continent (here defined to include Canada, United States, and Mexico) with a focus on examining phylogenetic diversity and endemism. We collected digitized plant specimen records and chose a coarse grain for analysis, recognizing that this grain is currently necessary for reasonable completeness per sampling unit. We found that raw richness and endemism patterns largely support previous hypotheses of biodiversity hotspots. Application of phylogenetic metrics and a randomization test revealed novel results, including significant phylogenetic clustering across the continent, a striking east-west geographic difference in the distribution of branch lengths, and the discovery of centers of neo- and paleo-endemism in Mexico, the southwestern USA, and the southeastern USA. Finally, our examination of phylogenetic beta-diversity provides a new approach to comparing centers of endemism. We discuss the empirical challenges of working at the continental scale, and the need for more sampling across large parts of the continent, for both DNA data for terminal taxa and spatial data for poorly understood regions, to confirm and extend these results.


There are two datasets archived here:

One is the full spatial dataset presented by Mishler et al. (2020) for seed plants of North America, defined as Canada, the United States, and Mexico (using a biogeographic barrier in Yucatan as the southern boundary of our study area). We downloaded occurrence data from GBIF and iDigBio and carefully cleaned the data for taxonomic name matching (using the OpenTree Taxonomy and The Plant List) and for georeference problems, as described in the paper. After cleaning, the dataset contains 11,067,080 records for 44,171 species of seed plants. Note that only 19,649 of these species were included in the phylogenetic analysis presented in the paper, because only those were represented in the phylogeny (deposited as a tree file in the accompanying dataset).

The other is a NEXUS file containing the phylogenetic tree used in the study of spatial phylogenetics of the North American flora (Mishler et al. 2020). This tree was pruned from a dated phylogeny for seed plants, originally comprising 79,881 species, described by Smith & Brown (2018). The full unpruned tree is available from with alignments linked therein. The tree was pruned to include only the 19,649 species that had spatial data from North America (the full spatial dataset is deposited in the accompanying file).
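Because the spatial dataset holds more species than the tree, anyone combining the two files first needs to restrict the occurrence records to species that appear as tips in the tree. The following is a minimal sketch of that matching step, not the procedure used in the paper. It assumes a plain Newick string with simple unquoted labels (in the NEXUS file the tree is wrapped in a TREES block), underscore-joined species names, and a hypothetical `species` field in each record; the toy tree and records are invented for illustration and are not taken from the archive.

```python
import re


def tip_labels(newick: str) -> set[str]:
    """Return the tip labels of a Newick string (simple unquoted labels only)."""
    # A tip label directly follows "(" or ","; internal node labels follow ")"
    # and branch lengths follow ":", so neither is captured here.
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def split_by_tree(records, tips):
    """Partition occurrence records by whether their species is a tree tip."""
    matched = [r for r in records if r["species"] in tips]
    unmatched = [r for r in records if r["species"] not in tips]
    return matched, unmatched


# Toy data for illustration only; field names are assumptions.
toy_tree = "((Quercus_alba:1.0,Quercus_rubra:1.0):2.0,Pinus_strobus:3.0);"
toy_records = [
    {"species": "Quercus_alba", "lat": 38.9, "lon": -77.0},
    {"species": "Pinus_strobus", "lat": 44.5, "lon": -72.6},
    {"species": "Acer_rubrum", "lat": 35.8, "lon": -78.6},
]

tips = tip_labels(toy_tree)
matched, unmatched = split_by_tree(toy_records, tips)
print(len(matched), "records matched;", len(unmatched), "without a tip in the tree")
```

The regular expression is only adequate for simple labels; for the archived NEXUS file, a dedicated phylogenetics library would be a safer way to read the tip names.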

Literature cited: 

Mishler, B.D., Guralnick, R., Soltis, P.S., Smith, S.A., Soltis, D.E., Barve, N., Allen, J.M. and Laffan, S.W.  2020.  Spatial phylogenetics of the North American flora. J. Syst. Evol.

Smith, S.A. and Brown, J.W. 2018. Constructing a broadly inclusive seed plant phylogeny. American Journal of Botany 105: 302–314.