Climate change impacts on Xanthium strumarium distribution: Integrating species distribution models with rhizosphere microbiome analysis in China
Data files
Jun 12, 2026 version files 20.21 MB
- README.md (12.04 KB)
- Sequencing_data.zip (14.88 MB)
- Suitable_area.zip (5.32 MB)
Abstract
Global environmental changes increasingly alter species distributions, yet their effects on plants serving both ecological and economic functions remain inadequately explored. We examined Xanthium strumarium, a species with medicinal and invasive properties, throughout China using integrated approaches: species distribution modeling (Biomod2), niche analysis (Ecospat), and rhizosphere microbiome profiling (Tax4Fun). Our findings demonstrate that human footprint index (66.6% variable importance), elevation, and topographic slope primarily determine current distribution patterns. Future climate scenarios predict habitat expansion of 8.9–28.6%, with notable increases in Yunnan, Guangdong, and Inner Mongolia provinces. Although niche overlap analysis indicates high conservatism (Schoener’s D = 0.8986–0.9338), ecological adaptability shows a modest decline under elevated emission scenarios. Rhizosphere bacterial assemblages, characterized by Proteobacteria dominance and nitrogen-cycling taxa enrichment (Nitrospira, Verrucomicrobia), facilitate adaptation through enhanced metabolic pathways and environmental stress responses, promoting establishment in anthropogenically disturbed environments. Our results underscore the interactive effects of climate-mediated range shifts and microbiome-assisted resilience mechanisms underlying X. strumarium’s invasive potential. This research offers essential guidance for managing dual-function species, emphasizing integrated strategies that consider both anthropogenic pressures and microbial associations in conservation planning under accelerating global change.
Dataset DOI: 10.5061/dryad.x0k6djhz9
We have submitted our habitat suitability raster files (Suitable_area.zip) and our comprehensive bioinformatic output files and tabular datasets for the microbiome analysis (Sequencing_data.zip).
Descriptions
Suitable_area
- Current.tif: Predicted potential suitable habitats under contemporary (baseline) climate conditions.
- Era: The target future time period (50 = average for 2041–2060; 70 = average for 2061–2080).
- SSP: Shared Socioeconomic Pathways from CMIP6 indicating greenhouse gas emission trajectories (126 = SSP1-2.6; 245 = SSP2-4.5; 585 = SSP5-8.5).
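The Era and SSP codes above can be turned into readable labels with a small lookup. This is a minimal sketch: the function name `describe_scenario` is hypothetical, and how the two codes are arranged inside the raster filenames is not stated in this README, so the sketch takes them as separate arguments.

```python
# Lookup tables copied from the Era and SSP definitions in this README.
ERA_PERIODS = {"50": "2041-2060", "70": "2061-2080"}
SSP_SCENARIOS = {"126": "SSP1-2.6", "245": "SSP2-4.5", "585": "SSP5-8.5"}


def describe_scenario(era_code: str, ssp_code: str) -> str:
    """Return a readable label for one future-climate raster (hypothetical helper)."""
    if era_code not in ERA_PERIODS:
        raise ValueError(f"Unknown era code: {era_code!r}")
    if ssp_code not in SSP_SCENARIOS:
        raise ValueError(f"Unknown SSP code: {ssp_code!r}")
    return f"{SSP_SCENARIOS[ssp_code]}, average for {ERA_PERIODS[era_code]}"


print(describe_scenario("50", "126"))
print(describe_scenario("70", "585"))
```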
Sequencing_data
General Abbreviations:
- OTU: Operational Taxonomic Unit (a proxy for microbial "species").
- Sample_ID: Unique alphanumeric identifiers for individual soil samples (e.g., ZD1, ZD2).
- Taxonomic Levels: Represented by single letters in filenames: p (Phylum), c (Class), o (Order), f (Family), g (Genus), s (Species).
Data Processing & Quality Control
These files document the standardized preprocessing pipeline from raw sequencing reads to clean, analyzable data.
- SampleSeq_info.xls: Master metadata table. Contains sample names, group assignments, barcode/primer sequences, and sequencing platform details used to demultiplex and configure the pipeline.
- assemble_stat.xls: Paired-end read merging statistics produced by the FLASH algorithm. Columns: Sample_name, Total_reads, Combined_reads (successfully merged), Uncombined_reads, Percent_combined(%), Combined_base(bp), and length metrics (Min_len, Max_len, Avg_len).
- QCstat.xls: Quality control statistics detailing the filtering process (removing low-quality reads, reads with Ns, and short reads) from raw to clean data. Contains read counts and retention percentages for each filtering step.
- histograms.xls: Sequence length distribution of the clean data. Columns: Length (base pairs) and Count (number of reads at that length). Used to generate length distribution histograms.
- Tags_stat.xls: Summary table of the final valid tags (clean reads) and clustered OTU counts per sample, serving as a baseline for sequencing depth evaluation.
- checkSize.xls: A simple verification file containing the byte size of the delivered data package (e.g., 1090423081 ./Result-xxx.zip), used to confirm that the zip file was downloaded completely and without corruption.
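The size check described for checkSize.xls can be sketched as follows. This assumes the file holds plain-text lines of the form `<bytes> <path>`, as in the example above; the helper names are hypothetical, and the demonstration runs on a throwaway temporary file rather than the real package.

```python
import os
import tempfile


def parse_checksize_line(line: str) -> tuple[int, str]:
    """Split a '<bytes> <path>' line into (expected_bytes, path)."""
    size_text, path = line.strip().split(maxsplit=1)
    return int(size_text), path


def size_matches(local_path: str, expected_bytes: int) -> bool:
    """True when the file on disk has exactly the recorded byte size."""
    return os.path.getsize(local_path) == expected_bytes


# Demonstration on a temporary 1024-byte file (not part of the dataset).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"x" * 1024)
    demo_path = handle.name

expected, _ = parse_checksize_line(f"1024 {demo_path}")
demo_ok = size_matches(demo_path, expected)
os.remove(demo_path)
print(demo_ok)
```

A matching size only shows that the download is complete; it does not prove the content is intact the way a checksum would.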
Sequences and Taxonomy
- OTUs.fasta: Representative nucleotide sequences (A, T, C, G) for all identified OTUs.
- OTUs.tre: Phylogenetic tree file in Newick format, mapping the evolutionary distances between OTUs.
- OTUs.tax_assignments: The taxonomic classification lineage assigned to each specific OTU ID.
- classified_stat.xls: Taxonomic annotation depth statistics. Columns: Sample_Name, followed by the number of tags successfully annotated at each taxonomic rank: Kingdom, Phylum, Class, Order, Family, Genus, Species.
- species_stat.txt: Summary statistics specifically for species-level annotations, detailing detection rates and abundance of taxa resolved to the species level.
Abundance Tables (otu_table.*.xls)
- otu_table.absolute: The raw integer sequence read counts for each OTU across all samples.
- otu_table.relative: The relative abundance (fractional values from 0 to 1) of each OTU across all samples.
- otu_table.[level].absolute/relative: OTU abundances merged at specific taxonomic ranks (e.g., otu_table.g.relative for the genus level).
- otu_table.[level].absolute/relative.tran: Transposed versions of the abundance tables (rows and columns switched), formatted to be compatible with specific statistical and plotting software in R.
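The relationship between the absolute, relative, rank-merged, and transposed tables can be illustrated with a few lines of plain Python. This is a sketch under assumptions: relative abundance is taken to be each count divided by its sample total, and the OTU IDs, counts, and genus assignments below are made-up illustration values, not data from the archive.

```python
def to_relative(counts):
    """Convert {otu: [count per sample]} to per-sample fractions (0 to 1)."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        otu: [c / t if t else 0.0 for c, t in zip(row, totals)]
        for otu, row in counts.items()
    }


def collapse_by_rank(counts, otu_to_taxon):
    """Sum OTU rows that share the same taxon label at one rank."""
    merged = {}
    for otu, row in counts.items():
        taxon = otu_to_taxon.get(otu, "Unclassified")
        previous = merged.get(taxon, [0] * len(row))
        merged[taxon] = [a + b for a, b in zip(previous, row)]
    return merged


def transpose(table, sample_ids):
    """Switch rows and columns: {sample: {feature: value}}."""
    return {
        sample: {feature: row[i] for feature, row in table.items()}
        for i, sample in enumerate(sample_ids)
    }


samples = ["ZD1", "ZD2"]
absolute = {"OTU1": [30, 10], "OTU2": [50, 30], "OTU3": [20, 60]}  # made-up counts
genus_of = {"OTU1": "Nitrospira", "OTU2": "Nitrospira"}  # made-up assignments

relative = to_relative(absolute)
genus_absolute = collapse_by_rank(absolute, genus_of)
genus_transposed = transpose(genus_absolute, samples)
print(relative["OTU1"], genus_absolute, genus_transposed["ZD1"])
```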
Alpha Diversity
Metrics estimating within-sample richness, evenness, and overall diversity.
- alpha_diversity_index.txt / alpha_diversity_index_group.txt: Comprehensive summary tables compiling all calculated alpha diversity indices for individual samples and aggregated sample groups, respectively.
- Observed_species / Chao1 / Shannon / Simpson / ACE / goods_coverage: These groups of files contain continuous numeric indices. ACE estimates total species richness; goods_coverage evaluates sequencing depth sufficiency (values closer to 1 indicate better coverage).
  - These files follow a standard naming convention with up to four files each: \*_Tukey.txt (ANOVA results), \*_Wilcox.txt (Wilcoxon results), \*SampleID.txt (raw index values per sample for plotting), and \*Description.txt (group metadata for plotting).
- plot_observed_species.xls: Rarefaction (saturation) curve data. Columns: series (sampling depth/number of sequences drawn), followed by one column per Sample_ID showing the observed OTU count at that depth.
- Alpha_div.pdf: Documentation explaining the statistical testing methods (parametric vs. non-parametric) used for alpha diversity comparisons and how to interpret the resulting text files.
- PD_whole_tree files: Faith's Phylogenetic Diversity, which incorporates phylogenetic branch lengths. Following the standard naming convention, there are four files: PD_whole_tree_Tukey.txt (ANOVA results), PD_whole_tree_wilcox.txt (Wilcoxon results), PD_whole_treeSampleID.txt (raw index values per sample for plotting), and PD_whole_treeDescription.txt (group metadata for plotting).
- _wilcox.txt / _Tukey.txt (e.g., ACE_Tukey.txt, goods_coverage_wilcox.txt): Files ending in _Tukey contain parametric ANOVA results, while files ending in _wilcox contain non-parametric Wilcoxon rank-sum test results comparing diversity between groups. Columns typically include group comparisons, test statistics, and p-values.
- [Metric]Description.txt / [Metric]SampleID.txt: Supplementary metadata and sample alignments used by R scripts to generate boxplots for the respective diversity indices.
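For readers who want to see how several of the indices listed above relate to a raw count vector, the sketch below computes them for one sample. It rests on assumptions the README does not confirm: Shannon uses log base 2, Simpson is reported as 1 minus the sum of squared proportions, and Chao1 uses the bias-corrected form. ACE and PD_whole_tree are omitted because they need more machinery (abundance classes and a phylogenetic tree).

```python
import math


def alpha_indices(otu_counts, log_base=2.0):
    """Compute common alpha diversity indices from one sample's OTU counts."""
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts]
    singletons = counts.count(1)   # OTUs seen exactly once
    doubletons = counts.count(2)   # OTUs seen exactly twice
    observed = len(counts)
    return {
        "observed_species": observed,
        "shannon": -sum(p * math.log(p, log_base) for p in proportions),
        "simpson": 1.0 - sum(p * p for p in proportions),
        # Bias-corrected Chao1 (assumed variant).
        "chao1": observed + singletons * (singletons - 1) / (2 * (doubletons + 1)),
        # Good's coverage: share of reads not belonging to singleton OTUs.
        "goods_coverage": 1.0 - singletons / total,
    }


example = alpha_indices([5, 3, 1, 1, 2])  # made-up counts
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Values produced this way may differ slightly from the delivered tables if the pipeline used another logarithm base or index variant.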
Beta Diversity & Clustering
Metrics evaluating differences in microbial community structure between samples.
- "unweighted" vs "weighted" unifrac files: UniFrac distance metrics used across Beta diversity analyses (PCoA, NMDS, UPGMA). Unweighted UniFrac considers only the presence/absence of lineages (sensitive to rare taxa), while Weighted UniFrac incorporates relative abundances (sensitive to dominant taxa).
- _dm.txt: Distance Matrices (e.g., bray_curtis, unifrac). Variables are Sample IDs on both rows and columns; values are continuous distance scores (0 = identical, 1 = completely different).
- _pc.txt / NMDS_scores.txt / PCoA.txt: Coordinate values used for plotting ordinations. Columns in PCoA.txt: pc (Sample ID), V1 (PC1 axis score), V2 (PC2 axis score), V3 (PC3 axis score).
- bray_adonis.txt: PERMANOVA (Adonis) results based on Bray-Curtis distance. Evaluates if overall community structures differ significantly between groups. Columns include degrees of freedom, sum of squares, R² (variance explained), and p-value.
- stat_anosim.txt / stat_mrpp.txt: Statistical results testing if grouped communities are significantly different (variables include R-value and p-value).
- cluster.[level].txt / group.cluster.[level].txt: Hierarchical clustering outputs (e.g., UPGMA algorithms) at various taxonomic levels.
- cluster.[level].diff.txt: Text files detailing the structural differences identified within the clustering trees.
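The layout of a _dm.txt distance matrix (Sample IDs on both axes, 0 for identical samples, 1 for samples sharing nothing) can be reproduced for Bray-Curtis with a short sketch. The abundance profiles are made-up values, and the function names are hypothetical; UniFrac is not shown because it requires the phylogenetic tree.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def distance_matrix(profiles):
    """Build a square {sample: {sample: distance}} table."""
    ids = list(profiles)
    return {
        row: {col: bray_curtis(profiles[row], profiles[col]) for col in ids}
        for row in ids
    }


# Made-up per-sample OTU counts, for illustration only.
profiles = {"ZD1": [10, 0, 5], "ZD2": [0, 10, 5], "ZD3": [10, 0, 5]}
dm = distance_matrix(profiles)
for sample, row in dm.items():
    print(sample, [round(value, 3) for value in row.values()])
```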
Environmental & Correlation Analysis
Analyses linking microbial community structures and diversity to environmental variables (e.g., Temperature, Depth).
- correlation.xls: Correlation matrix between environmental factors and alpha diversity indices. Rows are environmental variables (e.g., T, Depth) and columns are alpha indices (shannon, chao1, etc.). Values are correlation coefficients (positive or negative).
- mantel_test_table.xls: Mantel test results evaluating the correlation between environmental variable matrices and the microbial community distance matrix. Columns: Variable (environmental factor or combination), r (Mantel statistic/correlation coefficient), P (significance).
- BioENV.txt: Bioenv analysis results identifying the subset or combination of environmental variables that maximizes the correlation with the community distance matrix, used prior to CCA/RDA ordination.
- VIF.txt: Variance Inflation Factor results used to detect multicollinearity among environmental variables. Variables with a VIF > 10 are typically excluded from ordination models to prevent distortion.
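To make the r and P columns of mantel_test_table.xls concrete, here is a pure-Python sketch of a permutation Mantel test. It is not the vegan implementation used for the dataset: Pearson correlation, the permutation count, and the one-sided p-value are assumptions, and both the temperature values and the community distances are made up.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p) by permuting the sample order of dist_a."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_b = upper_triangle(dist_b)
    r_observed = pearson(upper_triangle(dist_a), flat_b)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[dist_a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= r_observed:
            hits += 1
    return r_observed, (hits + 1) / (permutations + 1)


temperature = [10.0, 12.0, 20.0, 25.0]  # made-up environmental variable
env_dist = [[abs(a - b) for b in temperature] for a in temperature]
community_dist = [  # made-up community distances
    [0.0, 0.2, 0.6, 0.8],
    [0.2, 0.0, 0.5, 0.7],
    [0.6, 0.5, 0.0, 0.3],
    [0.8, 0.7, 0.3, 0.0],
]
r_stat, p_value = mantel(env_dist, community_dist, permutations=199)
print(round(r_stat, 3), p_value)
```

With only four samples there are just 24 distinct orderings, so the p-value here is coarse; real analyses rely on more samples.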
Network Analysis
Files generated to construct and visualize microbial co-occurrence networks (typically using Cytoscape or Gephi).
- node.xls: Node attribute file. Columns: name (taxon/OTU name), group (taxonomic assignment, e.g., Phylum), score (topological score, degree, or relative abundance).
- edge.xls: Edge (connection) attribute file. Columns: source (node 1), target (node 2), value (correlation coefficient; positive values indicate co-occurrence, negative values indicate mutual exclusion).
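As a quick way to inspect an edge table before loading it into Cytoscape or Gephi, the sketch below counts each node's degree and splits edges by sign, following the source/target/value layout described above. The edge values are made up, and `summarize_network` is a hypothetical helper name.

```python
from collections import defaultdict


def summarize_network(edges):
    """Return (degree per node, co-occurrence edge count, exclusion edge count)."""
    degree = defaultdict(int)
    positive = negative = 0
    for source, target, value in edges:
        degree[source] += 1
        degree[target] += 1
        if value > 0:
            positive += 1      # co-occurrence
        elif value < 0:
            negative += 1      # mutual exclusion
    return dict(degree), positive, negative


# Made-up edges in (source, target, value) form.
edges = [
    ("Nitrospira", "Proteobacteria", 0.82),
    ("Nitrospira", "Verrucomicrobia", -0.71),
    ("Proteobacteria", "Verrucomicrobia", 0.65),
]
degree, n_positive, n_negative = summarize_network(edges)
print(degree, n_positive, n_negative)
```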
Differential Abundance & Overlap
- pvalue.xls: Consolidated table of raw and adjusted p-values (e.g., FDR/q-value) for all taxa across all group comparisons. Used as the primary filter to identify significant biomarkers.
- [group]_[group].xls (e.g., ZD1_ZD2.xls): Pairwise differential abundance tables between two specific groups. Contains mean abundances for both groups, fold changes, and p-values for each taxon.
- LDA.#.txt: Outputs from Linear Discriminant Analysis Effect Size (LEfSe). Variables include specific taxa names and their LDA Score.
- .vennarea.xls: Tabular data containing the overlapping OTU IDs and the subset count shared between specified sample groups.
- [Group]vs[Group].test.xls: Complete tabular results of comparative statistical tests (e.g., t-test, MetaStat) between specific sample groups prior to strict significance filtering.
- [Group]-vs-[Group].psig / .qsig: Pairwise statistical comparisons containing features filtered for a significant p-value (< 0.05) or False Discovery Rate corrected q-value (< 0.05).
- [Group]VS[Group].map01110.txt / .metabolism.txt: Differential abundance results specifically mapped to KEGG metabolic pathways.
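The difference between the .psig and .qsig filters can be shown with a small multiple-testing sketch. The README does not say which FDR procedure the pipeline applied, so Benjamini-Hochberg is used here as an assumption, and the taxa and p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


taxa = ["TaxonA", "TaxonB", "TaxonC", "TaxonD"]  # placeholder names
raw_p = [0.01, 0.04, 0.03, 0.20]                 # made-up p-values
q_values = benjamini_hochberg(raw_p)

p_significant = [t for t, p in zip(taxa, raw_p) if p < 0.05]
q_significant = [t for t, q in zip(taxa, q_values) if q < 0.05]
print(p_significant, q_significant)
```

In this toy case three taxa pass the raw p-value threshold but only one survives the FDR correction, which is why a .qsig file is typically shorter than its .psig counterpart.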
Functional Predictions
- predicted_metagenomes.KEGG_L1 / L2 / L3, KEGG_metagenome_predictions.txt: Files that aggregate functional relative abundances at different hierarchical levels of the KEGG database. Variables include Pathway_Name and corresponding relative abundance fractions per sample.
Key Information Sources
Climate layers, future emission scenarios, and microbiome reference databases were derived from the following sources:
- WorldClim database v2.1
- CMIP6 (Coupled Model Intercomparison Project Phase 6)
- SILVA Database (v138)
- KEGG Database (Kyoto Encyclopedia of Genes and Genomes)
Code/Software
Bioinformatic processing was performed using QIIME2 (version 2022.8) and the DADA2/FLASH plugins for OTU clustering, read merging, and quality filtering. The SILVA database (v138) was utilized for taxonomic assignment. Functional profiling was performed using Tax4Fun (version 1.0) mapped against the KEGG database.
R (version 4.2.3) was used for statistical and ecological modeling. Key packages included:
- biomod2 (version 4.2-3): For ensemble species distribution modeling (SDM).
- ecospat (version 3.2): For analyzing ecological niche dynamics and overlap.
- terra (version 1.7-18): For managing geographical spatial objects and environmental raster correlation.
- vegan (version 2.6-4): For PERMANOVA (Adonis), Mantel tests, Bioenv, VIF, and diversity index calculations.
- igraph / psych: For microbial co-occurrence network construction (node/edge generation).
