First atlas of soil microbial biodiversity in Mexico: a biogeographical approach
Data files
Sep 17, 2025 version files (118.57 MB)
- Dumping_SRA.sh (1.53 KB)
- estadosmx.shp (13.50 MB)
- Extract_data_from_kms_or_kml_files (1.80 KB)
- faith_pd_vector.qza (1.94 MB)
- Final_seqs.qza (30.82 MB)
- Final_table.qza (21.64 MB)
- Final_taxonomy.qza (20.81 MB)
- insertion-tree.qza (19.39 MB)
- Manifest_creator.sh (988 B)
- Metadata_final_atlas.csv.csv (145.18 KB)
- observed_features_vector.qza (1.94 MB)
- Qiime_commands (15.96 KB)
- README.md (5.59 KB)
- shannon_vector.qza (1.94 MB)
- Soil_atlas_R_code.Rmd (34.59 KB)
- Supplementary_Table_1.csv (184.57 KB)
- weighted_unifrac_distance_matrix.qza (6.19 MB)
Abstract
Mexico is recognized for its great biological diversity, supported by a wide variety of ecosystems. Nonetheless, to date, there is no comprehensive study about the magnitude of microbial diversity and its biogeographical patterns across Mexican ecosystems. Here, we present a meta-analysis describing the diversity and biogeographical patterns of soil microbial communities across Mexico. We gathered 16S rRNA sequencing data from > 700 soil samples collected across Mexico. We analyzed whether soil microbial communities differ between ecoregions and vegetation types. In addition, we evaluated the influence of edaphic and climatic factors on the diversity patterns of soil microbial communities. Our results showed that soil bacterial communities across Mexico exhibit biogeographical patterns across ecoregions and vegetation types. Specifically, diversity patterns in arid regions significantly differ from those in temperate and tropical ecoregions. Through redundancy and correlation analysis, we found that pH, carbon and nitrogen levels, temperature, and precipitation are the main drivers of the bacterial diversity patterns across ecoregions and vegetation types. This work contributes to a better understanding of the biogeographical patterns of soil microorganisms across Mexico, highlighting the influence of environmental variation in driving such diversity patterns.
Dataset DOI: 10.5061/dryad.ht76hdrvz
Description of the data and file structure
These data and scripts were used to retrieve and analyze information from 718 soil samples obtained from peer-reviewed studies and novel sequencing efforts investigating the bacterial diversity of Mexican soils.
Files and variables
File: Metadata_final_atlas.csv.csv
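Description: Per-sample metadata, including edaphic variables from SoilGrids, the aridity index from Global-AI_PET, and bioclimatic variables and elevation from WorldClim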
Variables
- sampleID = Name of the sample
Data from SoilGrids
- bdod = Bulk density of the fine earth fraction (cg/cm³)
- cfvo = Volumetric fraction of coarse fragments (> 2 mm) (cm³/dm³, vol‰)
- nitrogen = Total nitrogen (N) (cg/kg)
- phh2o = Soil pH (pHx10)
- sand = Proportion of sand particles (> 0.05/0.063 mm) in the fine earth fraction (g/kg)
- silt = Proportion of silt particles (≥ 0.002 mm and ≤ 0.05/0.063 mm) in the fine earth fraction (g/kg)
- soc = Soil organic carbon content in the fine earth fraction (dg/kg)
- ocd = Organic carbon density (hg/m³)
Data from Global-AI_PET
- AI = Aridity Index (mean annual precipitation / mean annual reference evapotranspiration)
Data from WorldClim
- BIO1 = Annual Mean Temperature (°C)
- BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp)) (°C)
- BIO3 = Isothermality (BIO2/BIO7) (×100) (%)
- BIO4 = Temperature Seasonality (standard deviation ×100)
- BIO5 = Max Temperature of Warmest Month (°C)
- BIO6 = Min Temperature of Coldest Month (°C)
- BIO7 = Temperature Annual Range (BIO5-BIO6) (°C)
- BIO8 = Mean Temperature of Wettest Quarter (°C)
- BIO9 = Mean Temperature of Driest Quarter (°C)
- BIO10 = Mean Temperature of Warmest Quarter (°C)
- BIO11 = Mean Temperature of Coldest Quarter (°C)
- BIO12 = Annual Precipitation (mm)
- BIO13 = Precipitation of Wettest Month (mm)
- BIO14 = Precipitation of Driest Month (mm)
- BIO15 = Precipitation Seasonality (Coefficient of Variation) (%)
- BIO16 = Precipitation of Wettest Quarter (mm)
- BIO17 = Precipitation of Driest Quarter (mm)
- BIO18 = Precipitation of Warmest Quarter (mm)
- BIO19 = Precipitation of Coldest Quarter (mm)
- ELEV = Elevation (masl)
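The SoilGrids variables listed above are stored in scaled units (for example, phh2o is pH×10). The following is a minimal sketch of rescaling them to more conventional units, assuming the CSV column names match the variable names above. The divisors are derived from the units stated in the list, and the function names `convert_row` and `convert_csv` are hypothetical, not part of the repository.

```python
import csv
import io

# Divisors that take each scaled unit listed above to a conventional unit.
SOILGRIDS_DIVISORS = {
    "bdod": 100,      # cg/cm³ -> g/cm³
    "cfvo": 10,       # cm³/dm³ (vol‰) -> vol%
    "nitrogen": 100,  # cg/kg -> g/kg
    "phh2o": 10,      # pH×10 -> pH
    "sand": 10,       # g/kg -> %
    "silt": 10,       # g/kg -> %
    "soc": 10,        # dg/kg -> g/kg
    "ocd": 10,        # hg/m³ -> kg/m³
}


def convert_row(row):
    """Return a copy of one metadata row with SoilGrids columns rescaled."""
    out = dict(row)
    for name, divisor in SOILGRIDS_DIVISORS.items():
        value = row.get(name)
        # Leave missing or placeholder values untouched.
        if value not in (None, "", "NA"):
            out[name] = float(value) / divisor
    return out


def convert_csv(handle):
    """Read a CSV from an open text handle and rescale every row."""
    return [convert_row(row) for row in csv.DictReader(handle)]


if __name__ == "__main__":
    # Made-up values, only to show the call pattern.
    demo = io.StringIO("sampleID,phh2o,soc\nS1,65,120\n")
    print(convert_csv(demo))
```

With the real file, the same call would be `convert_csv(open("Metadata_final_atlas.csv.csv", newline=""))`, provided the column headers are as assumed.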
File: Soil_atlas_R_code.Rmd
Description: R code used to analyze 16S amplicon data generated in QIIME 2
File: Supplementary_Table_1.csv
Description: Table containing the SRR_ID for each sample used in this study. These IDs are used by Dumping_SRA.sh to download data from the SRA database
File: faith_pd_vector.qza
Description: QIIME 2 artifact (QZA) containing Faith's phylogenetic diversity for each sample at a read depth of 40,000 sequences per sample
File: observed_features_vector.qza
Description: QIIME 2 artifact (QZA) containing observed features (richness) for each sample at a read depth of 40,000 sequences per sample
File: shannon_vector.qza
Description: QIIME 2 artifact (QZA) containing the Shannon index for each sample at a read depth of 40,000 sequences per sample
File: weighted_unifrac_distance_matrix.qza
Description: QIIME 2 artifact (QZA) containing a weighted UniFrac distance matrix for the 718 samples at a read depth of 40,000 sequences per sample
File: estadosmx.shp
Description: Shapefile of Mexico used as input for the microgeo package in R
File: insertion-tree.qza
Description: QIIME 2 artifact (QZA) containing a phylogenetic tree inferred using the fragment-insertion SEPP plugin
File: Final_table.qza
Description: QIIME 2 artifact (QZA) containing an ASV table inferred by DADA2
File: Final_seqs.qza
Description: QIIME 2 artifact (QZA) containing the representative sequences inferred by DADA2
File: Final_taxonomy.qza
Description: QIIME 2 artifact (QZA) containing the taxonomic assignment for each ASV, obtained with a naive Bayes classifier trained on the V3-V4 region of the 16S rRNA gene from the SILVA 138.1 database
File: Dumping_SRA.sh
Description: Script to download data from the SRA database using the SRR_ID provided for each sample in Supplementary_Table_1.csv
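As an illustration of the download step described above, the sketch below reads run accessions from a CSV and builds SRA Toolkit command lines without executing them. It is not the content of Dumping_SRA.sh. The column name `SRR_ID`, the output directory, and the choice of `prefetch` followed by `fasterq-dump` are assumptions.

```python
import csv
import re

# Run accessions from SRA, ENA, or DDBJ start with SRR, ERR, or DRR.
RUN_ID = re.compile(r"^[SED]RR\d+$")


def read_run_ids(handle, column="SRR_ID"):
    """Collect unique, well-formed run accessions from an open CSV handle.

    The column name is an assumption about Supplementary_Table_1.csv.
    """
    ids = []
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip()
        if RUN_ID.match(value) and value not in ids:
            ids.append(value)
    return ids


def download_commands(run_ids, outdir="fastq"):
    """Build (but do not run) SRA Toolkit commands for each accession."""
    commands = []
    for run_id in run_ids:
        # prefetch caches the run locally before conversion to FASTQ.
        commands.append(["prefetch", run_id])
        commands.append(
            ["fasterq-dump", run_id, "--split-files", "--outdir", outdir]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the SRA Toolkit is installed. Building the commands separately from running them makes it easy to review them first.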
File: Qiime_commands
Description: Commands used to process 16S rRNA data
File: Extract_data_from_kms_or_kml_files
Description: Commands to obtain geographical data from KML files
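One way to pull point coordinates out of a KML file is sketched below with the Python standard library. It is an independent illustration, not the commands stored in Extract_data_from_kms_or_kml_files. It assumes each sampling site is a `Placemark` with a `Point` geometry in the standard KML 2.2 namespace, and the function name `placemark_points` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard KML 2.2 namespace; files written with another version would differ.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_points(kml_text):
    """Return (name, longitude, latitude) for each point Placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            ".//kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        if coords.strip():
            # KML stores coordinates as "lon,lat[,alt]".
            lon, lat = coords.strip().split(",")[:2]
            points.append((name.strip(), float(lon), float(lat)))
    return points
```

Note that KML lists longitude before latitude, which is easy to swap by accident when building a metadata table.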
File: Manifest_creator.sh
Description: Script to create the manifest files used to import FASTQ files into QIIME 2
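For readers unfamiliar with manifests, the sketch below builds a tab-separated paired-end manifest in the QIIME 2 "V2" layout. It is not a port of Manifest_creator.sh. The paired-end layout, the file suffixes `_1.fastq.gz` and `_2.fastq.gz`, and the function name `build_manifest` are assumptions.

```python
from pathlib import Path

# Header of the QIIME 2 paired-end manifest (V2, tab-separated).
HEADER = "sample-id\tforward-absolute-filepath\treverse-absolute-filepath"


def build_manifest(fastq_dir, fwd_suffix="_1.fastq.gz", rev_suffix="_2.fastq.gz"):
    """Return manifest text for every sample that has both read files."""
    fastq_dir = Path(fastq_dir).resolve()  # the manifest needs absolute paths
    lines = [HEADER]
    for fwd in sorted(fastq_dir.glob(f"*{fwd_suffix}")):
        sample_id = fwd.name[: -len(fwd_suffix)]
        rev = fastq_dir / f"{sample_id}{rev_suffix}"
        # Skip samples whose reverse reads are missing.
        if rev.exists():
            lines.append(f"{sample_id}\t{fwd}\t{rev}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file such as `manifest.tsv` and passed to the QIIME 2 import step. Single-end data would need a different header and a single file path column.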
Code/software
To view the files in this repository, you will need access to a terminal, a text editor, or an R console.
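The .qza files are zip archives, so their contents can also be inspected without a QIIME 2 installation. The sketch below lists and reads the payload files, assuming the usual artifact layout in which data sit under a `<uuid>/data/` directory. The function names are hypothetical.

```python
import zipfile


def list_qza_data(qza):
    """List payload files stored under '<uuid>/data/' in a QIIME 2 artifact.

    `qza` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(qza) as archive:
        members = []
        for name in archive.namelist():
            parts = name.split("/")
            # Layout assumed: <uuid>/data/<payload files...>
            if len(parts) >= 3 and parts[1] == "data" and parts[-1]:
                members.append(name)
        return members


def read_qza_text(qza, member):
    """Return one payload file (for example a TSV) as text."""
    with zipfile.ZipFile(qza) as archive:
        return archive.read(member).decode("utf-8")
```

For example, `list_qza_data("shannon_vector.qza")` would return the path of the table inside the archive, which can then be passed to `read_qza_text`. Binary payloads such as the BIOM feature table in Final_table.qza should be extracted with `zipfile` directly rather than decoded as text.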
Access information
Data was derived from the following sources:
- https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1230145 (dataset of 16S rRNA sequences generated by our group)
- https://www.ncbi.nlm.nih.gov/sra/ (files from peer-reviewed publications were retrieved from the SRA database; accession numbers are provided in Supplementary_Table_1.csv)
Note
The dataset for this study contains both data generated by our research group and publicly available data from the NCBI SRA database. All newly generated data are original contributions and can be released under CC0. The data recovered from the SRA database are publicly accessible without restriction once released, according to NCBI policy, and carry no restrictions incompatible with CC0.
