Structural and functional differences of gut microbiota in Pomacea canaliculata from different geographical locations and habitats
Data files
Aug 29, 2024 version files 23.45 MB
- alphadiversity.csv
- asv_group_g_even.csv
- asv_group_p_even.csv
- ASVtable.csv
- gxlevel12.csv
- HZ_ditch.csv
- HZ_paddy.csv
- HZ_pond.csv
- KEGG_L1.xlsx
- KEGG_L2.xlsx
- ldaresult.csv
- LZ_ditch.csv
- LZ_paddy.csv
- LZ_pond.csv
- NMDSdata.csv
- NN_ditch.csv
- NN_paddy.csv
- NN_pond.csv
- README.md
- WZ_ditch.csv
- WZ_paddy.csv
- WZ_pond.csv
- YL_ditch.csv
- YL_paddy.csv
- YL_pond.csv
Abstract
Gut microbiota is related to host fitness and is influenced by geographical location and habitat. Pomacea canaliculata is a malignant invasive alien snail that threatens agricultural production and ecosystem functions worldwide. Clarifying the general patterns in the structure and function of the snail's gut microbial community across geographical locations and habitats is important for understanding its invasion at different spatial scales. This study used high-throughput sequencing to compare the community structure and function of gut microbiota in P. canaliculata from five geographical locations (Liuzhou, Yulin, Nanning, Wuzhou, and Hezhou) and three habitats (pond, paddy field, and ditch) in Guangxi Province. The intestinal microbial alpha diversity of P. canaliculata was higher in Liuzhou and Yulin than in Nanning, Wuzhou, and Hezhou, and higher in ponds than in paddy fields and ditches. The dominant phyla of gut microbiota in snails were Firmicutes, Cyanobacteria, Proteobacteria, Fusobacteriota, and Bacteroidota, and the dominant genus was Lactococcus. The community structure of gut microbiota varied significantly across geographical locations and habitats, and the phyla Firmicutes and Cyanobacteria had significantly higher relative abundance in snails collected from Nanning and Yulin, respectively. Moreover, the relative abundance of gut functional microbiota associated with human disease in P. canaliculata was significantly affected by both geographical location and habitat, with the highest abundance in ponds. However, the relative abundance of functional microbiota related to metabolism, genetic information processing, organismal systems, environmental information processing, and cellular processes was significantly affected only by geographical location.
Collectively, geographical locations and habitats had significantly different effects on the community structure and function of gut microbiota in P. canaliculata, and the greater differences were caused by geographical locations rather than by habitats.
README: Structural and functional differences of gut microbiota in Pomacea canaliculata from different geographical locations and habitats
Description of the data and file structure
https://doi.org/10.5061/dryad.4f4qrfjmk
From August to September 2022, we collected Pomacea canaliculata in the field from five geographical locations (Nanning, NN; Liuzhou, LZ; Yulin, YL; Wuzhou, WZ; Hezhou, HZ) in Guangxi Province, China. In each geographical location, three sites that each contained all three habitats (pond, paddy field, ditch) were randomly selected, and 5 quadrats (1 m × 1 m) were set in each habitat. One adult snail was taken from each quadrat for intestinal sample collection, giving a total of 225 snails. The entire intestinal contents of each snail were extracted and sequenced to obtain the gut microbiota data that make up this dataset.
ASVtable contains the Amplicon Sequence Variant (ASV) counts of the 225 Pomacea canaliculata samples after sequencing. In the column names, NN, LZ, YL, WZ, and HZ are short for Nanning, Liuzhou, Yulin, Wuzhou, and Hezhou, respectively, and C, D, and G represent pond, paddy field, and ditch. The row names are the ASV IDs, and the number in each cell is the read count of a given ASV in an intestinal sample, without units. Fifteen snail intestinal samples were collected from each habitat within each geographical location; for example, the column name LZ.C.1 means that the snail intestinal sample was collected from one quadrat in a pond in Liuzhou.
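The naming scheme above can be decoded mechanically. The following is a minimal Python sketch, not part of the authors' R workflow; the function name `parse_sample_label` is hypothetical, and it assumes every column name follows the dotted pattern of the LZ.C.1 example.

```python
# Codes as documented in this README.
LOCATIONS = {"NN": "Nanning", "LZ": "Liuzhou", "YL": "Yulin",
             "WZ": "Wuzhou", "HZ": "Hezhou"}
HABITATS = {"C": "pond", "D": "paddy field", "G": "ditch"}


def parse_sample_label(label):
    """Split a column name such as 'LZ.C.1' into (location, habitat, index).

    Assumes the three parts are separated by dots, as in the README example.
    """
    loc_code, hab_code, index = label.split(".")
    return LOCATIONS[loc_code], HABITATS[hab_code], int(index)


print(parse_sample_label("LZ.C.1"))  # ('Liuzhou', 'pond', 1)
```

An unknown code raises `KeyError`, which makes it easy to spot columns that do not follow the documented scheme.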
The alphadiversity data are used to compare the alpha diversity indices of gut microbiota in Pomacea canaliculata across geographical locations and habitats and to produce Figure 2. The column names location, habitat, site, Chao1, Shannon, Simpson, weight, height, width, and smwidth denote geographical locations (NN, LZ, YL, WZ, and HZ), habitats (pond, paddy field, and ditch), sites (1, 2, and 3), Chao1 index, Shannon index, Simpson index, snail body mass, snail shell height, snail shell width, and snail shell mouth width, respectively. The unit of weight is gram (g), and the unit of height, width, and smwidth is centimeter (cm). The diversity indices have no units.
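For readers unfamiliar with the three indices, the sketch below shows one common definition of each, computed from the read counts of a single sample. It is an illustration only: the README does not state which variants the authors used, so the natural-log Shannon index, the Simpson index as 1 - sum(p^2), and the bias-corrected Chao1 estimator are all assumptions, and the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p * ln p); natural log is an assumption."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in its 1 - sum(p^2) form (one of several variants)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


even_sample = [10, 10, 10, 10]   # invented counts for four equally common ASVs
print(round(shannon(even_sample), 3))  # 1.386, i.e. ln(4)
print(simpson(even_sample))            # 0.75
print(chao1([5, 1, 1, 2]))             # 4.5
```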
The asv_group_p_even and asv_group_g_even data are used to visualize the relative abundance of gut microbiota in Pomacea canaliculata at the phylum and genus levels (Figure 3). The column names location, habitat, type, and abundance denote geographical locations (NN, LZ, YL, WZ, HZ), habitats (pond, paddy field, ditch), the names of phyla or genera, and the relative abundance of phyla or genera, respectively. The value in each cell is a proportion without a unit.
NMDSdata contains the results of the non-metric multidimensional scaling (NMDS) analysis based on Bray-Curtis distances of gut microbiota in Pomacea canaliculata from different geographical locations and habitats (visualized in Figure 4). The column names location and habitat denote geographical locations (NN, LZ, YL, WZ, HZ) and habitats (pond, paddy field, ditch), respectively. NMDS1 and NMDS2 are the sample coordinates on the first and second NMDS axes, with no units.
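The Bray-Curtis distance underlying the NMDS can be written out in a few lines. This is a minimal Python sketch of the standard formula, sum(|a_i - b_i|) / sum(a_i + b_i), applied to two abundance vectors over the same ASVs; the function name and the example counts are invented for illustration and are not taken from the dataset.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors must list the same ASVs in the same order.
    Returns 0 for identical samples and 1 for samples sharing no ASV.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have equal length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return numerator / denominator


sample_1 = [6, 7, 4]    # invented read counts
sample_2 = [10, 0, 6]
print(bray_curtis(sample_1, sample_2))  # 13/33, about 0.394
```

NMDS then searches for a low-dimensional arrangement of samples whose rank order of distances matches this dissimilarity matrix as closely as possible.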
The datasets with prefixes (NN, LZ, YL, WZ, HZ) and suffixes (pond, paddy, ditch) are for the Venn diagram to show the numbers of common and unique ASVs among the pond, paddy, and ditch under different geographical locations (Figure 5). For example, in the HZ_ditch.csv file, the column name "OUT" means the types and ID of ASVs in the gut of P. canaliculata from the ditch under Hezhou. The column name "phylo" means the phylogenetic relationship of the ASV. The prefixes k, p, c, o, f, g, and s in the cell mean the kingdom, phylum, class, order, family, genus, and species, respectively.
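The counts shown in a three-set Venn diagram follow directly from set operations on the ASV IDs of the three habitat files for one location. The sketch below is an illustration in Python, assuming the IDs have already been read from the ID column of each file; the function name `venn_regions` and the example IDs are hypothetical.

```python
def venn_regions(pond, paddy, ditch):
    """Count ASVs in each of the seven regions of a three-set Venn diagram."""
    pond, paddy, ditch = set(pond), set(paddy), set(ditch)
    return {
        "pond_only": len(pond - paddy - ditch),
        "paddy_only": len(paddy - pond - ditch),
        "ditch_only": len(ditch - pond - paddy),
        "pond_paddy": len((pond & paddy) - ditch),
        "pond_ditch": len((pond & ditch) - paddy),
        "paddy_ditch": len((paddy & ditch) - pond),
        "all_three": len(pond & paddy & ditch),
    }


# Invented ASV IDs, for illustration only.
regions = venn_regions(
    pond={"ASV1", "ASV2", "ASV3", "ASV4"},
    paddy={"ASV2", "ASV3", "ASV5"},
    ditch={"ASV3", "ASV4", "ASV6"},
)
print(regions["all_three"])  # 1 (only ASV3 occurs in all three habitats)
```

The seven region counts always sum to the number of distinct ASVs across the three habitats, which gives a quick consistency check.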
The ldaresult data are the results of the linear discriminant analysis effect size (LEfSe) analysis of gut microbiota in P. canaliculata from different geographical locations and habitats (|LDA| > 4, P < 0.05), obtained with the LEfSe software (Version 1.0) and visualized in Figure 6. The column names location, bioname, and habitat denote geographical locations (NN, LZ, YL, WZ, HZ), biomarkers in different habitats and geographical locations, and habitats (pond, paddy field, ditch), respectively. The column names score and p denote the LDA score and the P value, both without units; a biomarker is considered significant when |LDA| > 4 and P < 0.05. The prefixes p, c, o, f, g, and s in the bioname column mean phylum, class, order, family, genus, and species, respectively.
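The significance rule stated above (|LDA| > 4 and P < 0.05) can be applied to rows of the file as follows. This is a minimal Python sketch that assumes each row is a dictionary keyed by the documented column names bioname, score, and p; the function name and the example rows, including the exact form of the taxon prefix, are invented for illustration.

```python
def significant_biomarkers(rows, lda_cutoff=4.0, p_cutoff=0.05):
    """Keep rows whose absolute LDA score and P value pass both thresholds."""
    return [
        row for row in rows
        if abs(float(row["score"])) > lda_cutoff and float(row["p"]) < p_cutoff
    ]


# Invented rows; real values come from ldaresult.csv.
rows = [
    {"bioname": "p__Firmicutes", "score": "4.8", "p": "0.001"},
    {"bioname": "g__Lactococcus", "score": "-4.3", "p": "0.020"},
    {"bioname": "p__Bacteroidota", "score": "3.2", "p": "0.010"},
    {"bioname": "p__Proteobacteria", "score": "4.5", "p": "0.200"},
]
for row in significant_biomarkers(rows):
    print(row["bioname"])
# p__Firmicutes
# g__Lactococcus
```

Values are converted with `float` because a CSV reader such as `csv.DictReader` returns every field as text.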
The gxlevel12, KEGG_L1, and KEGG_L2 data give the functions and the relative abundance of functional pathways of gut microbiota in P. canaliculata from different geographical locations and habitats at KEGG level 1 and KEGG level 2, and are analyzed and visualized in Figure 7 and Figure 8. In the gxlevel12.csv file, the column names L1 and L2 denote the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways at level 1 and level 2; the other column names are the same as in ASVtable. The value in each cell is the relative abundance of the pathway, with no units. KEGG_L1 and KEGG_L2 together constitute the gxlevel12 dataset.
Sharing/Access information
https://doi.org/10.5061/dryad.4f4qrfjmk
Code/Software
The analyses were performed using R software (version 4.3.1).
Methods
Sample collection
From August to September 2022, we collected P. canaliculata in the field from 5 geographical locations (Nanning, NN (107°77′E, 23°09′N); Liuzhou, LZ (109°31′E, 24°37′N); Yulin, YL (109°99′E, 22°32′N); Wuzhou, WZ (110°30′E, 23°54′N); Hezhou, HZ (111°67′E, 24°35′N)) in Guangxi Province, China (Figure 1A). In each geographical location, three sites that each contained all 3 habitats (pond, paddy field, ditch; Figure 1D, E, F) were randomly selected, and 5 quadrats (1 m²) were set in each habitat (Figure S3). One adult snail was taken from each quadrat for intestinal sample collection, and the distance between quadrats was about 10 meters. We distinguished between males and females when sampling, and 23 female snails and 22 male snails were collected from each geographical location. A total of 225 P. canaliculata were collected in this study (5 geographical locations × 3 sites × 3 habitats × 5 replicates). All snails collected from the five geographical locations were first identified by shell morphology (Hayes et al., 2012), and their identity as P. canaliculata was then confirmed by amplifying the cytochrome C oxidase subunit I (COI) gene with primers LCO1490 and HCO2198 (Yang et al., 2019) before they were used for gut microbiota sequencing. The body weight, shell height (Table S2), shell width, and shell mouth width of each P. canaliculata were also measured. All sampled individuals were wiped three times with 75% ethanol and then rinsed twice in distilled water to sanitize the surface prior to dissection. The entire intestinal contents were extracted carefully to avoid rupturing the gut wall. Each sample was placed in a sterile tube, frozen in liquid nitrogen, and later stored at -80°C.
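The sampling design can be checked by enumerating it. The sketch below is a Python illustration, with site and replicate numbering assumed to start at 1; it confirms that the nested design gives 225 snails, 45 per geographical location (matching 23 females plus 22 males), and 15 per habitat within a location.

```python
from collections import Counter
from itertools import product

locations = ["NN", "LZ", "YL", "WZ", "HZ"]
sites = [1, 2, 3]
habitats = ["pond", "paddy field", "ditch"]
replicates = [1, 2, 3, 4, 5]   # one snail per quadrat

# Every (location, site, habitat, replicate) combination is one snail.
design = list(product(locations, sites, habitats, replicates))

per_location = Counter(loc for loc, _, _, _ in design)
per_location_habitat = Counter((loc, hab) for loc, _, hab, _ in design)

print(len(design))                           # 225
print(per_location["LZ"])                    # 45
print(per_location_habitat[("LZ", "pond")])  # 15
```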
DNA extraction and sequencing
The total genomic DNA (gDNA) of each sample was extracted using the cetyltrimethylammonium bromide (CTAB) method (Allen et al., 2006). The V3-V4 hypervariable region of the 16S rDNA gene was amplified by polymerase chain reaction (PCR) using the specific bacterial primers 341F (CCTAYGGGRBGCASCAG) and 806R (GGACTACNNGGGTATCTAAT). All PCR mixtures contained 15 µL of Phusion® High-Fidelity PCR Master Mix (New England Biolabs), 0.2 µM of each primer, and 10 ng of target DNA. Cycling conditions consisted of an initial denaturation step at 98°C for 1 min, followed by 30 cycles of denaturation at 98°C for 10 s, primer annealing at 50°C for 30 s, and extension at 72°C for 30 s, with a final extension step at 72°C for 5 min to ensure complete amplification. The PCR products were purified with a Qiagen Gel Extraction Kit (Qiagen, Germany). Sequencing libraries were generated with the NEBNext® Ultra™ II DNA Library Prep Kit (Cat No. E7645) following the manufacturer’s recommendations, and library quality was evaluated on the Qubit® 2.0 Fluorometer (Thermo Scientific) and the Agilent Bioanalyzer 2100 system. The library was sequenced on an Illumina NovaSeq platform.
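Both primers contain IUPAC degenerate bases (Y, R, B, S, N), so each primer string stands for several concrete sequences. The Python sketch below illustrates how such a primer can be expanded into a regular expression for locating or trimming it in reads; it is not the trimming procedure used by the authors, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Turn a degenerate primer into a regex with one character class per ambiguous base."""
    return "".join(
        base if len(IUPAC[base]) == 1 else "[" + IUPAC[base] + "]"
        for base in primer
    )


def degeneracy(primer):
    """Number of concrete sequences the degenerate primer represents."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


primer_341f = "CCTAYGGGRBGCASCAG"
primer_806r = "GGACTACNNGGGTATCTAAT"

print(primer_to_regex(primer_341f))  # CCTA[CT]GGG[AG][CGT]GCA[CG]CAG
print(degeneracy(primer_341f))       # 24
print(degeneracy(primer_806r))       # 16
```

A read can then be tested with, for example, `re.match(primer_to_regex(primer_341f), read)`.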
Statistical and bioinformatics analyses
First, paired-end reads were assigned to samples based on their unique barcodes, trimmed of their barcode and primer sequences, and merged using FLASH (Version 1.2.11). Quality filtering of the raw tags was performed using fastp (Version 0.20.0) to obtain high-quality clean tags, which were compared with the SILVA 123 database using Vsearch (Version 2.15.0) to detect chimeric sequences; these were removed to obtain the effective tags. The effective tags were denoised with DADA2 to obtain initial Amplicon Sequence Variants (ASVs), and ASVs with an abundance of less than 5 were filtered out. Second, species annotation was performed using QIIME2 software (Version QIIME2-202006) based on the SILVA 123 database, and multiple sequence alignment was performed to study the phylogenetic relationship of each ASV and the differences in the dominant species among samples. Finally, all samples were rarefied to the sequencing depth of the lowest sample (26970 clean reads).
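The last two steps, removing low-abundance ASVs and rarefying to a common depth, can be illustrated in a few lines. This is a minimal Python sketch, not the QIIME2 procedure used by the authors. It assumes that "abundance" means the total read count of an ASV across all samples and that rarefaction is random subsampling of reads without replacement; the function names and example counts are hypothetical.

```python
import random
from collections import Counter


def drop_rare_asvs(table, min_total=5):
    """Keep ASVs whose total count across samples is at least min_total.

    `table` maps ASV ID -> list of per-sample counts. Treating the threshold
    as a total across samples is an assumption, not stated in the methods.
    """
    return {asv: counts for asv, counts in table.items() if sum(counts) >= min_total}


def rarefy(counts, depth, seed=0):
    """Subsample one sample's ASV counts to `depth` reads without replacement."""
    if depth > sum(counts):
        raise ValueError("depth exceeds the number of reads in the sample")
    # One list entry per read, labelled by the index of its ASV.
    reads = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = Counter(random.Random(seed).sample(reads, depth))
    return [picked.get(i, 0) for i in range(len(counts))]


# Invented counts, for illustration only.
table = {"ASV1": [40, 10], "ASV2": [2, 1], "ASV3": [0, 9]}
print(sorted(drop_rare_asvs(table)))      # ['ASV1', 'ASV3']
print(sum(rarefy([50, 30, 20], depth=40)))  # 40
```

In the study the target depth was 26970 reads, the size of the smallest sample, so every sample contributes the same number of reads to the diversity comparisons.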