Code and data from: Global patterns in plant environmental breadths

Barandun, Marco; Paz, Andrea; van den Hoogen, Johan; Pellissier, Loïc; Crowther, Thomas; Maynard, Daniel

Published Feb 17, 2025 on Dryad. https://doi.org/10.5061/dryad.0vt4b8h8k

Data files

Feb 17, 2025 version files (54.33 MB total): dryad_global_envbreadth.zip (54.32 MB), README.md (13.12 KB)

Abstract

The latitudinal gradient in plant diversity is one of the most famous patterns in ecology. It is hypothesised that narrow niche breadths and restricted geographic ranges in the tropics allow more species to coexist with minimal overlap, relative to high-latitude regions. Although a wealth of studies have investigated these questions across different regions and taxonomic groups, these have consistently yielded contradictory results, leading to the continued persistence of numerous ecological explanations. Here, using a global occurrence database containing over 100,000 plant species, we provide the first globally standardised investigation into the geographic relationships among latitudinal range, environmental breadth, and latitudinal median. We find limited evidence for a global latitudinal gradient in species’ ranges and environmental breadths, with results varying between hemispheres and along latitude within each hemisphere. In agreement with previous observations, we show consistent support for a latitudinal gradient in environmental breadth and latitudinal range, but only for trees in the northern hemisphere and for tropical species. In the southern hemisphere, conversely, these trends are inverted for non-tropical species, with latitudinal range and environmental breadth decreasing with distance from the equator. Moreover, these relationships are even weaker with environmental breadth, even though there is a strong relationship between environmental breadth and latitudinal range. By applying standardised methods at the global scale, these results illustrate that variation in species’ ranges is largely a by-product of biogeographic patterns rather than niche processes. Collectively, this work suggests that existing ecological “rules” linking niche breadth to latitude predominantly reflect regional sampling biases and a historical focus on the northern hemisphere and certain taxonomic groups.

The contents of the zip file are:

1_original_data

The folder containing the raw data

backbone_taxonomy: Includes the GBIF backbone taxonomy with all information, reduced to the Drosera genus (sample)

backbone: Includes taxonomy files

Taxon.csv: Taxonomy of the Drosera genus

— taxonID: unique identifier

— acceptedNameUsageID: unique identifier based on GBIF taxonomy

— phylum/kingdom/family/specificEpithet: the taxonomic designation of that species

— scientificName: the taxonomic name and authority

— taxonomicStatus: whether that species is accepted, or a synonym of another species based on the backbone
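The columns above are enough to map synonyms onto accepted names. The following is a minimal sketch, not the authors' workflow (their scripts are in R). It assumes that `acceptedNameUsageID` of a synonym points at the `taxonID` of the accepted taxon, and that accepted rows carry the literal status `accepted`. The rows are invented placeholders, not values copied from Taxon.csv.

```python
import csv
import io

# Illustrative rows only: IDs, names and statuses are invented placeholders.
SAMPLE = """taxonID,acceptedNameUsageID,scientificName,taxonomicStatus
1,,Drosera alpha L.,accepted
2,1,Drosera beta Sm.,synonym
3,,Drosera gamma Hook.,accepted
"""


def accepted_names(rows):
    """Map each scientificName to the scientificName of its accepted taxon."""
    by_id = {row["taxonID"]: row for row in rows}
    resolved = {}
    for row in rows:
        target = row
        # Assumption: any status other than "accepted" is treated as a synonym
        # whose acceptedNameUsageID refers to another row's taxonID.
        if row["taxonomicStatus"] != "accepted":
            target = by_id.get(row["acceptedNameUsageID"], row)
        resolved[row["scientificName"]] = target["scientificName"]
    return resolved


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
names = accepted_names(rows)
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an open handle on Taxon.csv and check the delimiter first, since it is not stated here.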

eml.xml: Metadata

sample_envs: Includes .tiff files of the environmental variables. As a sample, they are reduced to a resolution of 1 degree (~100 x 100 km)

— CHELSA_adj_1 to CHELSA_adj_19: Climatic variables from the CHELSA dataset, representing bioclimatic factors such as temperature, precipitation, and seasonality, following the CHELSA v2.1 numbering scheme, adjusted by downsampling to 1 degree.

— SG_Absolute_depth_to_bedrock: Depth to bedrock in meters

— SG_CEC_005cm: Cation Exchange Capacity (CEC) at 0-5 cm soil depth

— SG_Coarse_fragments_005cm: Proportion of coarse fragments at 0-5 cm soil depth

— SG_Silt_Content_005cm: Percentage of silt content in soil at 0-5 cm depth

— SG_Soil_pH_H2O_005cm: Soil pH measured in H₂O at 0-5 cm depth

world_shapefile: Includes a shapefile of the world administrative boundaries at the level 0

— world_adm0.dbf: The dBASE table that stores the attribute information of features.

— world_adm0.prj: The file that stores the coordinate system information.

— world_adm0.sbn: The files that store the spatial index of the features.

— world_adm0.sbx: The files that store the spatial index of the features.

— world_adm0.shp: The main file that stores the feature geometry.

— world_adm0.shx: The index file that stores the index of the feature geometry.

sample_occs.csv: Sample occurrences of the Drosera genus

— Species: the species name

— occID: unique occurrence identifier

— latitude/longitude: geographic coordinates

— country: country of occurrence
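Per-species latitude summaries can be derived directly from an occurrence table with these columns. The sketch below computes the median latitude and the standard deviation of latitude per species. It is an illustration under assumptions: the species labels and coordinates are invented, and the exact definitions used by the authors' R scripts are not given here, so this is not necessarily how their latitudinal metrics were produced.

```python
import csv
import io
import statistics
from collections import defaultdict

# Illustrative rows only: species labels, coordinates and country codes are invented.
SAMPLE = """Species,occID,latitude,longitude,country
Drosera sp. A,1,10.0,20.0,XX
Drosera sp. A,2,12.0,21.0,XX
Drosera sp. A,3,14.0,22.0,XX
Drosera sp. B,4,-30.0,150.0,YY
Drosera sp. B,5,-34.0,151.0,YY
"""


def latitude_summary(rows):
    """Return {species: {"n", "lat_median", "lat_sd"}} from occurrence rows."""
    latitudes = defaultdict(list)
    for row in rows:
        latitudes[row["Species"]].append(float(row["latitude"]))
    summary = {}
    for species, values in latitudes.items():
        summary[species] = {
            "n": len(values),
            "lat_median": statistics.median(values),
            # Sample standard deviation needs at least two occurrences.
            "lat_sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
summary = latitude_summary(rows)
```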

tree_species.csv: List of tree species (Beech et al. 2017)

— accepted_name_aggr: the accepted name for each tree species

2_scripts:

Folder including the R scripts

1_standardise_clean_thin.R: Script used for the cleaning and standardization of the species occurrences

2_modelling_maxent.R: Script used for generating the Species Distribution Models and calculating env.breadth

3-1_MESS_global.R: Script used for the MESS analysis at a global scale

3-2_MESS_perHemisphere.R: Script used for the MESS analyses per hemisphere

4-1_merging_results.R: Script used for merging the results from the different analyses (env.breadth, MESS) with the metrics computed directly from the cleaned occurrences, such as latitudinal range

4-2_merging_mess_values_perHemisphere.R: Script used for merging the results from the MESS analyses performed on distinct hemispheres

5-1_coefficients_bootstrap.R: Script used for calculating the coefficients (i.e. slope of the relationships)

6 and 7: Scripts used to generate the corresponding figures in the manuscript

z_my_maxent_funcs.R: Config file of slightly modified functions

3_generated_data:

Folder containing the data generated from the analyses. This folder is also intended to serve as the output location when testing the scripts, i.e. for sample outputs.

not_sample: Contains the files generated by the author

all_non-tree_medianLat.csv: List of all the non-tree species (including those present prior to cleaning) and their median latitude

— sp: the species name

— lat_median: median latitude for that species

— growthform: an indicator of tree or non-tree

all_tree_medianLat.csv: List of all the tree species (including those present prior to cleaning) and their median latitude

— sp: the species name

— lat_median: median latitude for that species

— growthform: indicator of tree or non-tree

area_perspecies.csv: List of all species and their geographic area

— Species: name of species

— mean_area_km2: the mean area in km²

coefficients_eblr.csv: Data frame of coefficients relative to the "Environmental Breadth vs. Latitudinal Range" relationship

— model: modelled relationship, in this case "env.breadth vs lat range"

— type: One of four types of data subsets:

— Global: Includes all data without hemisphere separation

— Separate Hemispheres: Treats northern and southern hemispheres independently

— Hemisphere Endemic: Only includes species endemic to one hemisphere

— Hemisphere Specific MESS: Uses environmental breadth specific to each hemisphere

— zone: Two latitude-based categories:

— tropical: Latitudes close to the equator

— else: Latitudes farther from the equator

— hemisphere: Either the North or South hemisphere

— growthform: indicator of tree or non-tree (tree or herb)

— valci: Bootstrap-estimated coefficient of the regression model

— valse: Standard error of the estimated coefficient

— lower_bci: Lower bound of the bootstrap confidence interval

— upper_bci: Upper bound of the bootstrap confidence interval

— b: Regression coefficient, representing the relationship between environmental breadth and latitudinal range

coefficients_lmeb.csv: Data frame of coefficients relative to the "Latitudinal Median vs. Environmental Breadth" relationship

— model: modelled relationship, in this case "lat median vs env.breadth"

— type: One of four types of data subsets:

— Global: Includes all data without hemisphere separation

— Separate Hemispheres: Treats northern and southern hemispheres independently

— Hemisphere Endemic: Only includes species endemic to one hemisphere

— Hemisphere Specific MESS: Uses environmental breadth specific to each hemisphere

— zone: Two latitude-based categories:

— tropical: Latitudes close to the equator

— else: Latitudes farther from the equator

— hemisphere: Either the North or South hemisphere

— growthform: indicator of tree or non-tree (tree or herb)

— valci: Bootstrap-estimated coefficient of the regression model

— valse: Standard error of the estimated coefficient

— lower_bci: Lower bound of the bootstrap confidence interval

— upper_bci: Upper bound of the bootstrap confidence interval

— b: Regression coefficient, representing the relationship between latitudinal median and environmental breadth

coefficients_lmlr.csv: Data frame of coefficients relative to the "Latitudinal Median vs. Latitudinal Range" relationship

— model: modelled relationship, in this case "lat median vs lat range"

— type: One of four types of data subsets:

— Global: Includes all data without hemisphere separation

— Separate Hemispheres: Treats northern and southern hemispheres independently

— Hemisphere Endemic: Only includes species endemic to one hemisphere

— Hemisphere Specific MESS: Uses environmental breadth specific to each hemisphere

— zone: Two latitude-based categories:

— tropical: Latitudes close to the equator

— else: Latitudes farther from the equator

— hemisphere: Either the North or South hemisphere

— growthform: indicator of tree or non-tree (tree or herb)

— valci: Bootstrap-estimated coefficient of the regression model

— valse: Standard error of the estimated coefficient

— lower_bci: Lower bound of the bootstrap confidence interval

— upper_bci: Upper bound of the bootstrap confidence interval

— b: Regression coefficient, representing the relationship between latitudinal median and latitudinal range
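The three coefficient files share the same layout, so one helper can read the sign of each relationship from its bootstrap confidence interval. This is a hedged sketch of one common reading, in which an interval that excludes zero indicates a positive or negative slope. The numeric values and the hemisphere labels `North` and `South` are invented for illustration, and the rule is not claimed to be the criterion used in the manuscript.

```python
import csv
import io

# Illustrative rows only: all numbers and hemisphere labels are invented.
SAMPLE = """model,type,zone,hemisphere,lower_bci,upper_bci,b
lat median vs lat range,Separate Hemispheres,tropical,North,0.10,0.30,0.20
lat median vs lat range,Separate Hemispheres,else,South,-0.25,-0.05,-0.15
lat median vs lat range,Hemisphere Endemic,else,North,-0.02,0.08,0.03
"""


def classify(row):
    """Label a slope by whether its bootstrap interval excludes zero."""
    lower = float(row["lower_bci"])
    upper = float(row["upper_bci"])
    if lower > 0:
        return "positive"
    if upper < 0:
        return "negative"
    return "not distinguishable from zero"


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labels = [(row["type"], row["zone"], row["hemisphere"], classify(row)) for row in rows]
```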

mess_values_perHemisphere.csv: Data frame of MESS values computed per hemisphere

— species: name of species

— mess_n: the MESS for the northern hemisphere (unitless)

— mess_s: the MESS for the southern hemisphere (unitless)

— growthform: indicator of tree or non-tree

niche_data_summarized.csv: Most important data frame containing the main results.

Data frame of species niche and model performance metrics.

Output from the 4-1_merging_results.R script.

— Species: the species name

— env_breadth: Environmental breadth estimated from the species distribution model

— n_inc_obs: Number of included observations after data filtering

— fc: Feature class used in the species distribution model

— rm: Regularization multiplier used in the model

— tune.args: Parameters used for tuning the model

— auc.train: Area Under the Curve (AUC) of the model during training

— cbi.train: Continuous Boyce Index (CBI) during training

— auc.diff.avg: Average difference between training and validation AUC

— auc.diff.sd: Standard deviation of the AUC difference

— auc.val.avg: Average AUC for validation data

— auc.val.sd: Standard deviation of validation AUC

— cbi.val.avg: Average CBI for validation data

— cbi.val.sd: Standard deviation of validation CBI

— or.10p.avg: Average omission rate at the 10th percentile threshold

— or.10p.sd: Standard deviation of omission rate at the 10th percentile

— or.mtp.avg: Average omission rate at the minimum training presence threshold

— or.mtp.sd: Standard deviation of omission rate at the minimum training presence

— AICc: Corrected Akaike Information Criterion for model selection

— delta.AICc: Difference in AICc values relative to the best model

— w.AIC: Model weight based on AICc

— ncoef: Number of model coefficients

— n_obs_total: Total number of observations available for the species

— growthform: indicator of tree or non-tree (tree or herb)

— mess: Multivariate Environmental Similarity Surface (MESS) value, indicating how well the species’ environment matches the training conditions

— lat_range_sd_n: Standard deviation of latitudinal range in the Northern Hemisphere

— lat_range_mad_n: Median absolute deviation of latitudinal range in the Northern Hemisphere

— lat_median_n: Median latitude of occurrences in the Northern Hemisphere

— lat_range_sd_s: Standard deviation of latitudinal range in the Southern Hemisphere

— lat_range_mad_s: Median absolute deviation of latitudinal range in the Southern Hemisphere

— lat_median_s: Median latitude of occurrences in the Southern Hemisphere

— lat_range_sd_g: Standard deviation of latitudinal range globally

— lat_range_mad_g: Median absolute deviation of latitudinal range globally

— lat_median_g: Median latitude of occurrences globally
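The `_n`, `_s` and `_g` suffixes make it straightforward to pull out the latitudinal metrics for one hemisphere at a time. The sketch below shows one way to do that with a subset of the columns. It assumes that species absent from a hemisphere carry R-style `NA` cells in that hemisphere's columns, which is not stated above, and every value in the sample is invented.

```python
import csv
import io

# Illustrative rows only: every value is invented, and only a few columns are shown.
SAMPLE = """Species,env_breadth,lat_median_n,lat_range_sd_n,lat_median_s,lat_range_sd_s
Drosera sp. A,0.42,35.5,4.1,NA,NA
Drosera sp. B,0.17,NA,NA,-33.0,2.5
"""


def to_float(text):
    """Convert a CSV cell to float; empty or NA cells (assumed) become None."""
    return None if text in ("", "NA") else float(text)


def hemisphere_view(row, suffix):
    """Select the latitudinal columns for one suffix: 'n', 's' or 'g'."""
    return {
        "Species": row["Species"],
        "env_breadth": to_float(row["env_breadth"]),
        "lat_median": to_float(row.get(f"lat_median_{suffix}", "")),
        "lat_range_sd": to_float(row.get(f"lat_range_sd_{suffix}", "")),
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
views = (hemisphere_view(row, "n") for row in rows)
# Keep only species that have a median latitude in the northern hemisphere.
north = [view for view in views if view["lat_median"] is not None]
```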

raw_niche_data_mess.csv: Intermediate dataframe in the 4-1_merging_results.R script.

For variable description see niche_data_summarized.csv

used_variables_nontree.csv: Dataframe containing the variables used for the modelling of the distribution of each species of non-tree

— species: the species name

Climatic variables (from CHELSA)

— I(CHELSA_adj_X^2): Squared transformations of the respective CHELSA-adjusted variables, used for modeling non-linear relationships.

Soil variables (from SoilGrids)

— SG_Absolute_depth_to_bedrock: Depth to bedrock in meters

— SG_CEC_005cm: Cation Exchange Capacity (CEC) at 0-5 cm soil depth

— SG_Coarse_fragments_005cm: Proportion of coarse fragments at 0-5 cm soil depth

— SG_Silt_Content_005cm: Percentage of silt content in soil at 0-5 cm depth

— SG_Soil_pH_H2O_005cm: Soil pH measured in H₂O at 0-5 cm depth

— I(SG_Absolute_depth_to_bedrock^2): Squared transformation of depth to bedrock

— I(SG_CEC_005cm^2): Squared transformation of CEC

— I(SG_Coarse_fragments_005cm^2): Squared transformation of coarse fragment proportion

— I(SG_Silt_Content_005cm^2): Squared transformation of silt content

— I(SG_Soil_pH_H2O_005cm^2): Squared transformation of soil pH

— growthform: indicator of tree or non-tree (tree or herb), in this case only "herb"
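Column names of the form `I(variable^2)` denote quadratic terms, so they can be separated from the untransformed variables by pattern. The following is a small sketch with a hypothetical helper, `split_terms`, applied to a few of the column names listed above. It only parses names and makes no assumption about what the cell values in the file contain.

```python
import re

# Matches names such as "I(SG_CEC_005cm^2)" and captures the base variable.
QUADRATIC = re.compile(r"^I\((?P<base>.+)\^2\)$")


def split_terms(columns):
    """Split column names into untransformed variables and squared-term bases."""
    linear, quadratic = [], []
    for name in columns:
        match = QUADRATIC.match(name)
        if match:
            quadratic.append(match.group("base"))
        else:
            linear.append(name)
    return linear, quadratic


# A few of the predictor columns described above (species and growthform excluded).
columns = [
    "I(CHELSA_adj_1^2)",
    "SG_CEC_005cm",
    "I(SG_CEC_005cm^2)",
    "SG_Soil_pH_H2O_005cm",
]
linear, quadratic = split_terms(columns)
```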

used_variables_ tree.csv: Dataframe containing the variables used for modelling the distribution of each tree species

(For variable description see used_variables_nontree.csv)

zones.csv: Data frame relating to the main zone of distribution (tropical vs. else) within hemispheres per species

— Species: the species name

— zone_n: Latitudinal zone classification for the species in the Northern Hemisphere

— zone_s: Latitudinal zone classification for the species in the Southern Hemisphere

— zone_g: Latitudinal zone classification for the species globally, considering occurrences across both hemispheres

The contents of one additional file outside of the folders:

ODMAP_Anonymous_2024-11-14.csv: A file reporting the ODMAP SDM protocol details

— section: the type of information reported

— subsection: the subset of information being reported

— element: the specific attribute being reported

— value: the specific value reported
