An interpolated biogeographic framework for tropical Africa using plant species distributions and the physical environment

Cite this dataset

Marshall, Cicely; Wieringa, Jan; Hawthorne, William (2021). An interpolated biogeographic framework for tropical Africa using plant species distributions and the physical environment [Dataset]. Dryad. https://doi.org/10.5061/dryad.rfj6q5786

Abstract

Aim: Existing phytogeographic frameworks for tropical Africa lack either spatial completeness, unit definitions smaller than the regional scale, or a quantitative approach. We investigate whether physical environmental variables can be used to interpolate floristically defined vegetation units, presenting an interpolated, hierarchical, quantitative phytogeographic framework for tropical Africa, which is compared to previously defined regions.

Location: Tropical mainland Africa 24°N to 24°S.

Taxon: 31,046 vascular plant species and infraspecific taxa.

Methods: We calculate a betasim dissimilarity matrix from a comprehensive whole-flora database of plant species distributions. We investigate environmental correlates of floristic turnover with local non-metric multidimensional scaling. We derive a hierarchical biogeographic framework by clustering the dissimilarity matrix. The framework is modelled using a classification decision tree method and 12 physical environmental variables to interpolate and downscale the framework across the study region.

Results: Floristic turnover is related strongly to water availability and temperature, with smaller contributions from land cover, topographic ruggedness and lithology. Region can be predicted with 90% accuracy by the model. We define 19 regions and 99 districts. We find a novel arrangement of the arid regions. Regional subdivision within the savanna biome is supported with minor variation to borders. Within the forests of west and central Africa, our whole-flora gridded regionalisation supports the divisions identified by a previous analysis of trees only.

Main conclusions: Physical environmental variables can be used to predict floristically defined vegetation units with very high accuracy, and the approach could be pursued for other incompletely sampled taxa and areas outside of tropical Africa. Geographic coherence is higher than in previous quantitative phytoregional definitions. For most tropical African vascular plant species, we provide predictions of which species will occur within each mapped district and region of tropical Africa. The framework should be useful for future studies in ecology, evolution and conservation.

Methods

Plant species records were summarised as unique taxon-by-cell presences at one-degree-square resolution, producing 533,383 records of 31,046 species and infraspecific taxa in 1,197 degree squares of tropical mainland Africa between 24°N and 24°S. Contributing datasets are cited in the dataset ReadMe and Appendix S1 of the manuscript. Larger datasets with DOI links have been included as cited works with the Dryad submission. Data cleaning, georeferencing and synonymy of the compiled data set are described in the dataset ReadMe and Appendix S1 of the associated manuscript.
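The summarisation step described above can be illustrated with a minimal Python sketch. It assumes each record is a (taxon, latitude, longitude) tuple in decimal degrees and that a degree square is identified by its south-west corner; the function names and example records are hypothetical and are not taken from the authors' R scripts.

```python
import math
from collections import defaultdict

def degree_square(lat, lon):
    """Return the south-west corner (whole degrees) of the one-degree cell holding a point."""
    return (math.floor(lat), math.floor(lon))

def summarise_records(records, lat_min=-24.0, lat_max=24.0):
    """Collapse point records to unique (taxon, degree square) presences.

    records: iterable of (taxon, lat, lon) tuples. Points outside the
    latitudinal window are dropped (assumed filter, for illustration).
    """
    presences = set()
    for taxon, lat, lon in records:
        if lat_min <= lat < lat_max:
            presences.add((taxon, degree_square(lat, lon)))
    return presences

def taxa_by_cell(presences):
    """Group unique presences into a {cell: set of taxa} mapping."""
    cells = defaultdict(set)
    for taxon, cell in presences:
        cells[cell].add(taxon)
    return dict(cells)

# Hypothetical example records
records = [
    ("Taxon A", 5.2, -1.7),
    ("Taxon A", 5.9, -1.1),   # same cell as the record above, so counted once
    ("Taxon B", 5.5, -1.5),
    ("Taxon A", -3.4, 29.8),
    ("Taxon C", 30.0, 10.0),  # north of 24°N, so dropped
]
presences = summarise_records(records)
cells = taxa_by_cell(presences)
```

Storing presences in a set is what makes each taxon count once per cell, however many underlying records fall in that cell.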

Environmental data were summarised at one-degree-square and half-degree-square resolution. We summarised: mean altitude from GMTED2010 at 30 arc second resolution (Danielson & Gesch, 2011); topographic ruggedness from GMTED2010 using the GDAL Terrain Ruggedness Index tool via QGIS; climatic variables Bio1 to Bio35 at 30-minute resolution for the years 1961-1990 from the CliMond database (Kriticos et al., 2012); the surficial lithology classification of Sayre et al. (2013); and majority land cover class from GlobCover 2009 (Arino et al., 2012). We estimated completeness of taxon sampling for each degree square by comparing the number of species recorded as present with the richness estimates of Barthlott, Mutke, Rafiqpoor, Kier, & Kreft (2005).
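As a rough illustration of these per-cell summaries, the sketch below takes the mean of a continuous layer, the majority class of a categorical layer, and a percent completeness value. Expressing completeness as recorded richness divided by estimated richness, capped at 100%, is an assumption made here for illustration; the source only states that the two quantities were compared. All names and values are hypothetical.

```python
from collections import Counter
from statistics import mean

def summarise_cell(altitudes, land_cover_classes):
    """Summarise fine-resolution pixels falling inside one grid cell."""
    # Majority class = most frequent categorical value among the pixels
    majority_class, _count = Counter(land_cover_classes).most_common(1)[0]
    return {
        "mean_altitude": mean(altitudes),
        "majority_land_cover": majority_class,
    }

def sampling_completeness(recorded_richness, estimated_richness):
    """Percent of estimated richness recorded (assumed formula, capped at 100)."""
    if estimated_richness <= 0:
        raise ValueError("estimated richness must be positive")
    return min(100.0, 100.0 * recorded_richness / estimated_richness)

# Hypothetical pixel values for a single degree square
summary = summarise_cell(
    altitudes=[100.0, 200.0, 300.0],
    land_cover_classes=["forest", "forest", "shrubland"],
)
completeness = sampling_completeness(recorded_richness=250, estimated_richness=1000)
```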

A betasim dissimilarity matrix was created from these summarised data and a local NMDS performed. The same betasim dissimilarity matrix was clustered using Ward’s algorithm. The 19-cluster solution was defined as the regional level, and the 99-cluster solution as the district level. Random Forest classification models were built using the summarised environmental data as predictors, using the R package randomForest (Liaw & Wiener, 2002). We trained one model on the 19 regions to predict the regional framework. We subsequently trained 19 models to predict the distribution of the 99 districts within each of the 19 regions, using the same selection of predictor variables. The interpolated regions and districts constitute the biogeographic framework presented here.
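For readers unfamiliar with betasim, the following Python sketch computes it using the standard Simpson dissimilarity, min(b, c) / (a + min(b, c)), where a is the number of taxa shared by two cells and b and c are the numbers unique to each. It is a minimal illustration on hypothetical cells, not the authors' R implementation, and it stops at the dissimilarity matrix (the ordination, Ward clustering and Random Forest steps are not shown).

```python
def betasim(taxa_i, taxa_j):
    """Simpson dissimilarity between two sets of taxa."""
    a = len(taxa_i & taxa_j)   # shared taxa
    b = len(taxa_i - taxa_j)   # taxa only in the first cell
    c = len(taxa_j - taxa_i)   # taxa only in the second cell
    denominator = a + min(b, c)
    if denominator == 0:
        raise ValueError("betasim is undefined when a cell has no taxa")
    return min(b, c) / denominator

def betasim_matrix(cells):
    """Return sorted cell keys and a symmetric matrix of pairwise betasim values."""
    keys = sorted(cells)
    n = len(keys)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = betasim(cells[keys[i]], cells[keys[j]])
            matrix[i][j] = matrix[j][i] = value
    return keys, matrix

# Hypothetical cells and taxa
cells_example = {
    "cell1": {"A", "B", "C", "D"},
    "cell2": {"A", "B"},        # nested within cell1
    "cell3": {"A", "E", "F"},
}
keys, d = betasim_matrix(cells_example)
```

Because the denominator uses the smaller of the two unique counts, a species-poor cell whose flora is a subset of a richer cell scores zero, so the index reflects turnover rather than richness differences.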

The biogeographic framework was characterised by the number of taxa, number of endemic taxa, percent endemism, percent sampling completeness, number of indicator species and number of significant indicator species. Continuous environmental data used in the Random Forest model were summarised by their mean, standard deviation, minimum, median, maximum and interquartile range (IQR); lower and upper confidence limits of the median were calculated as median +/- 1.58 IQR/sqrt(n), where n is the number of values summarised. Categorical data were summarised by their majority class.
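The confidence limits of the median can be reproduced with a short Python sketch. It assumes Python 3.8 or later and the "inclusive" quartile method of the standard library; the quartile definition used by the authors is not stated, so results may differ slightly at small n. The function name and example values are hypothetical.

```python
import math
from statistics import median, quantiles

def median_confidence_limits(values):
    """Return (lower, median, upper) using median +/- 1.58 * IQR / sqrt(n)."""
    n = len(values)
    # Quartiles by linear interpolation over the observed range (assumed method)
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    half_width = 1.58 * iqr / math.sqrt(n)
    centre = median(values)
    return centre - half_width, centre, centre + half_width

# Hypothetical values of one environmental variable within a region
lower, centre, upper = median_confidence_limits([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

For these nine values the quartiles are 3 and 7, so the IQR is 4 and the half-width is 1.58 × 4 / 3.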

Usage notes

Appendix S2: Input and output data for each one degree cell, including coordinates, sampling levels, classifications, ordination scores and environmental data (S2_onedegdata.csv).

Appendix S3: Input and output data for each half degree cell, including coordinates, sampling levels, classifications, ordination scores and environmental data (S3_halfdegdata.csv).

Appendix S4: Folder containing shape files (S4_shapefiles.rar) for (i) the regional framework and (ii) the district framework, each in latlong and Albers Africa Equal Area projection.

Appendix S5: Region attributes (S5_regionstats.csv) including region name, number of species and endemic species, summary of environmental data.

Appendix S6: District attributes (S6_districtstats.csv) including region and district names, number of species and endemic species, summary of environmental data.

Appendix S7: Species summary table (S7_speciesstats.csv) including species name and occurrence data by region and district, with indicator value for each region and district.

Appendix S9: Folder of R scripts used in the analysis (S9_code.rar).