Data from: Molecular characterisation of faecal bacterial assemblages among four species of syntopic odonates
Data files
Sep 27, 2024 version files 4.49 GB
- ODO_Bac_17Feb2018.tgz (4.48 GB)
- rdp_16s_v18_plus_ZymoMock_out.txt (503.84 KB)
- README.md (3.92 KB)
- zotus.fa (1.02 MB)
- zotutab_global.txt (2.94 MB)
Abstract
Factors such as host species, phylogeny, diet, timing, and location of sampling are thought to influence the composition of gut-associated bacteria in insects. In this study, we compared the faecal-associated bacterial taxa for three Coenagrion and one Enallagma damselfly species. We expected high overlap in representation of bacterial taxa due to the shared ecology and diet of these species. Using metabarcoding based on the 16S rRNA gene, we identified 1513 sequence variants, representing distinct bacterial ‘taxa’. Intriguingly, the damselfly species showed somewhat different magnitudes of richness of ZOTUs, ranging from 480 to 914 ZOTUs. In total, 921 (or 60.8% of the 1513) distinct ZOTUs were non-shared, each found only in one species, and then most often in only a single individual. There was a surfeit of these non-shared incidental ZOTUs in the Enallagma species accounting for it showing the highest bacterial richness and accounting for a sample-wide pattern of more single-species ZOTUs than expected, based on comparisons to the null model. Future studies should address the extent to which faecal bacteria represent non-incidental gut bacteria and whether abundant and shared taxa are true gut symbionts.
README: Data from: Molecular characterisation of faecal bacterial assemblages among four species of syntopic odonates
https://doi.org/10.5061/dryad.08kprr58q
Molecular data from faecal PCR-based 16S rRNA gene analysis of four damselfly species.
Description of the data and file structure
DNA was extracted using the Macherey-Nagel NucleoSpin Tissue XS Kit (product nr 740901, Macherey-Nagel, Düren, Germany), followed by 16S gene PCR using primers 515F-Parada 5′-GTG YCA GCM GCC GCG GTA A-3′ (Parada et al. 2016) and 806R-Apprill 5′-GGA CTA CNV GGG TWT CTA AT-3′ (Apprill et al. 2015), and library preparation. Sequencing was carried out on an Illumina MiSeq v3 2×300 bp run.
Data consists of raw FASTQ files with two technical replicates per sample. Retrieved sequence variants (ZOTUs), ZOTU tables, and ZOTU assignations are also included. See the original article for more detailed information.
File contents and Column names
File = rdp_16s_v18_plus_ZymoMock_out.txt, tab-separated, no header-line
Column 1 = ZOTU ID. This is the sequence variant ID based on the bioinformatics carried out up to this step.
Column 2 = Taxonomic identification of each ZOTU. This column contains the whole taxonomic path from domain to genus level, based on probabilistic assignation of the ZOTUs. The file contents are standard output from the USEARCH SINTAX algorithm. Taxonomic levels are separated by commas, and each level is prefixed with its abbreviation. The probability of the assignation at each taxonomic level is given in parentheses. All taxonomic levels are abbreviated as follows:
d = Domain
p = Phylum
c = Class
o = Order
f = Family
g = Genus
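As an illustration of the layout described above, the following Python sketch splits one line of the assignation file into its ZOTU ID and a list of (rank, taxon, probability) entries. It assumes the usual SINTAX token form `d:Bacteria(1.00)`; the function name `parse_sintax_line` and the example line are hypothetical and are not taken from the dataset.

```python
import re

# Rank abbreviations as listed in this README.
RANKS = {"d": "domain", "p": "phylum", "c": "class",
         "o": "order", "f": "family", "g": "genus"}

# Assumed token form: <abbreviation>:<taxon name>(<probability>)
_TOKEN = re.compile(r"^([a-z]):(.+)\(([0-9.]+)\)$")


def parse_sintax_line(line):
    """Return (zotu_id, [(rank, taxon, probability), ...]) for one line."""
    fields = line.rstrip("\n").split("\t")
    zotu_id = fields[0]
    path = fields[1] if len(fields) > 1 else ""
    taxa = []
    for token in path.split(","):
        match = _TOKEN.match(token.strip())
        if match:  # tokens that do not fit the assumed form are skipped
            code, name, prob = match.groups()
            taxa.append((RANKS.get(code, code), name, float(prob)))
    return zotu_id, taxa


# Hypothetical example line (not copied from the data file).
example = "Zotu1\td:Bacteria(1.00),p:Proteobacteria(1.00),c:Gammaproteobacteria(0.97)"
zotu_id, taxa = parse_sintax_line(example)
# Keep only the levels assigned with probability 1.0.
confident = [t for t in taxa if t[2] >= 1.0]
```

Filtering on a probability of 1.0, as in the last line, mirrors the strict threshold described in the Methods; a different cutoff can be substituted if a more permissive assignation is wanted.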
File = zotutab_global.txt, space-separated, includes a header-line
First column = ZOTU ID. This is the sequence variant ID based on the bioinformatics carried out up to this step.
Columns 2-379 = Each column contains a sample, and each sample is represented by two technical PCR replicates. The naming of the samples follows this logic:
File = zotus.fa, fasta-formatted file
Each header line begins with character '>', followed by ZOTU ID.
The ZOTU nucleotide sequence always begins on the line after the header. The sequence is wrapped to 80-character rows.
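Because the sequences are wrapped, each record has to be re-joined before use. A minimal Python sketch, assuming only the layout described above (a '>' header followed by one or more sequence rows); the function name `read_fasta` and the example records are hypothetical:

```python
def read_fasta(lines):
    """Return {zotu_id: sequence} from an iterable of FASTA lines.

    Wrapped sequence rows are concatenated; the ID is taken as the first
    whitespace-delimited token after '>'.
    """
    records = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical records; real use would pass an open file handle instead,
# e.g. read_fasta(open("zotus.fa")).
demo = read_fasta([">Zotu1", "ACGT", "TTAA", ">Zotu2", "GGCC"])
```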
File = ODO_Bac_17Feb2018.tgz, tarball containing gzip-compressed FASTQ-formatted raw sequence files
Each file is named as follows, with the fields separated by a dash '-':
project-tag = Always Bac for this dataset.
replicate number 1 or 2 = Technical PCR replicate 1 or 2.
Odonate species = Odonate species information (COH = Coenagrion hastulatum; COL = Coenagrion lunulatum, COP = Coenagrion pulchellum, and ENC = Enallagma cyathigerum)
Sex code and individual number = sex (F = female; M = male) and a running number for each individual in the group.
ILLUMINA TECHNICAL CODE = everything after the underscore is a technical Illumina sample code and is not relevant to the study or data.
File name extensions = The extension FASTQ refers to FASTQ format sequence file and extension GZ refers to gzip-compressed file.
For example: Bac-1-ENC-M4_S25_L001_R1001.fastq.gz is the first PCR replicate of the Enallagma cyathigerum male number 4 sample.
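The naming scheme above can be decoded programmatically. The following Python sketch is an illustration only: the function name `parse_fastq_name` and the returned keys are hypothetical, and it assumes every file name has exactly the four dash-separated fields described above (files that deviate, such as control samples, would raise an error and need separate handling).

```python
SPECIES = {
    "COH": "Coenagrion hastulatum",
    "COL": "Coenagrion lunulatum",
    "COP": "Coenagrion pulchellum",
    "ENC": "Enallagma cyathigerum",
}
SEXES = {"F": "female", "M": "male"}


def parse_fastq_name(filename):
    """Decode project, replicate, species, sex and individual from a file name."""
    stem = filename
    for ext in (".gz", ".fastq"):  # strip the extensions, outermost first
        if stem.endswith(ext):
            stem = stem[: -len(ext)]
    # Everything after the first underscore is the Illumina technical code.
    sample, _, illumina_code = stem.partition("_")
    project, replicate, species, individual = sample.split("-")
    return {
        "project": project,
        "replicate": int(replicate),
        "species": SPECIES.get(species, species),
        "sex": SEXES.get(individual[0], individual[0]),
        "individual": int(individual[1:]),
        "illumina_code": illumina_code,
    }


info = parse_fastq_name("Bac-1-ENC-M4_S25_L001_R1001.fastq.gz")
```

For the example file name, `info` describes the first replicate of male number 4 of Enallagma cyathigerum, matching the description above.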
Methods
To assess the faecal bacterial assemblages of damselflies, we targeted four predatory odonate species at a freshwater pond of approximately 600 m × 200 m (12 ha), located in Southern Finland (ETRS-TM35FIN N: 67118; E: 2460). On 1–2 June 2016, we collected 185 individuals (20–26 males and females from each species) for faecal DNA analysis. All our focal damselfly species belong to the family Coenagrionidae: Coenagrion lunulatum (Charpentier, 1840), Coenagrion hastulatum (Charpentier, 1825), Coenagrion pulchellum (Vander Linden, 1825), and Enallagma cyathigerum (Charpentier, 1840). Species identification of damselflies was based on current literature, e.g. [27]. These four target species were selected as they were the most common predatory species at the study site, based on pilot surveys (K. Kaunisto, pers. obs.). Only sexually mature individuals with adult colours and hardened wings were included in the study. According to a previous study [28], all four focal species feed mainly on dipteran prey by open foraging flights and by gleaning insects from vegetation.
Each damselfly was placed into a sterile 10-ml collection tube housing a piece of dampened paper towel to reduce desiccation risk. To allow for defecation, damselflies were kept in the tubes for the next 24 h (sufficient time for defecation to occur, according to [18]). After the live individuals had defecated into the tube, we froze the entire sample without removing the faeces or the damselfly. All faecal material was collected from the tubes with sterile forceps, after which the faeces were frozen in 15-ml Falcon tubes at −64 °C until further processing and analysis.
Sample Processing and Molecular Analysis
Total DNA was extracted as described in a previous study using the NucleoSpin Tissue XS Kit (product nr 740901, Macherey-Nagel, Düren, Germany) [28]. To characterize the bacterial assemblages of the focal species, we used established metabarcoding protocols for dragonflies building on earlier optimization [18, 28]. To amplify the bacterial 16S rRNA gene (hypervariable region v4), we used primers 515F-Parada (also known as 515FB: 5′-GTG YCA GCM GCC GCG GTA A-3′; Parada et al. 2016) and 806R-Apprill (also known as 806RB: 5′-GGA CTA CNV GGG TWT CTA AT-3′; [29]). Each DNA sample was amplified in two separate reactions that were individually tagged and sequenced. The locus-specific PCR setup followed Kankaanpaa et al. [30] and included 5 μl of 2× MyTaq HS Red Mix (Bioline, UK), 2.4 μl of H2O, 150 nM of each primer (two forward and two reverse primer versions; total primer mix concentration 600 nM), and 2 μl of DNA extract per sample in a 10 μl volume. Cycling conditions were 3 min at 95 °C, then 35 cycles of 45 s at 95 °C, 1 min at 50 °C, and 1 min 30 s at 72 °C, ending with 10 min at 72 °C. In the second PCR stage, the first PCR products were modified by attaching Illumina-specific adapters and sample-specific indices. For a reaction volume of 10 μl in the indexing PCR, we mixed 5 μl of MyTaq HS RedMix, 500 nM of each tagged and indexed primer (i7 and i5), and 3 μl of locus-specific PCR product from the first PCR phase. For this second PCR, we used the following protocol: initial denaturation for 3 min at 98 °C, then 15 cycles of 20 s at 95 °C, 15 s at 60 °C, and 30 s at 72 °C, followed by 3 min at 72 °C. All the indexed reactions were then pooled and purified using magnetic beads [31, 32].
Sequencing was done on an Illumina MiSeq v3 PE 2×300 (Illumina Inc., San Diego, CA, USA) run, including the PhiX control library, by the Turku Centre for Biotechnology, Turku, Finland. After sequencing, the reads were demultiplexed into each original sample and uploaded onto CSC servers (IT Center for Science, https://www.csc.fi/ ) for bioinformatic analysis. Paired-end reads (13,027,754) were merged and trimmed for quality using the 64-bit vsearch version 2.14.2 [33] command ‘fastq_mergepairs’ with the default options and ‘fastq_allowmergestagger’. Primers were removed from the merged reads (11,179,018) using the software cutadapt version 1.14 (Martin 2011) with a 20% mismatch rate, a minimum length of 240 bp and a truncate length of 270 bp (the excess nucleotides were trimmed from the 3′ end). Trimmed reads (11,050,385) were then collapsed into unique sequences (singletons removed) with command ‘fastx_uniques’ and option ‘minuniquesize’ set to 10 (49,832 uniques retrieved). Finally, reads were corrected for point errors to obtain an accurate set of amplicon sequences (=denoised) and filtered of chimeric amplicons (=chimeras were removed), resulting in 3803 ZOTUs (‘ZOTU’, ‘zero-radius OTU’) through command ‘unoise3’ using USEARCH version 11.0.667 with settings minsize = 8 and unoise_alpha = 2. The median and mean length of ZOTUs was 253 bp (SD ± 2.50 bp). Then, ZOTUs were mapped back to the original trimmed reads with command ‘usearch_global’ to establish the total number of reads in each sample using vsearch. We were able to map 10,627,197 of 11,050,385 reads (96.17%) to our original samples. The ZOTUs (sequence variants) were assigned to taxa using the 16S RDP database with the SINTAX (Edgar, 2010) probabilistic algorithm implemented in vsearch. The database ‘16S RDP training set v18’ (21k seqs) was downloaded from the usearch website (https://drive5.com/usearch/manual/sintax_downloads.html; accessed 19th April 2023). For the chosen database, the genus level is the lowest taxonomic level.
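The read counts reported above can be turned into per-step retention percentages with a few lines of Python. This is only a bookkeeping sketch: the step labels and the function name `retention` are hypothetical, while the counts are those given in the text.

```python
# Read counts as reported in the paragraph above.
STEPS = [
    ("raw paired-end reads", 13_027_754),
    ("merged", 11_179_018),
    ("primer-trimmed", 11_050_385),
    ("mapped to ZOTUs", 10_627_197),
]


def retention(steps):
    """Return [(step name, % of reads kept relative to the previous step)]."""
    kept = []
    for (_, previous), (name, current) in zip(steps, steps[1:]):
        kept.append((name, round(100 * current / previous, 2)))
    return kept


for name, percent in retention(STEPS):
    print(f"{name}: {percent}% of the previous step")
```

The last entry reproduces the 96.17% mapping rate stated in the text.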
For any taxonomic level, we only accepted assignations with 100% probability. The data was further filtered to remove artefacts, spurious reads, and non-targets based on information on the numerous control samples, technical replicates, and taxonomy. First, we removed those ZOTUs from any sample that had fewer reads than extraction or PCR controls (9,833,618 reads retained). Then, we collapsed reads based on the taxonomy per each sample, that is, all the reads that were assigned to the same taxon per sample were summarized. Out of the 3803 ZOTUs, we identified 983 to genus, 1570 to family, 2002 to order, 3063 to class, 3319 to phylum, and 3482 to domain level. From the total ~10M reads, we identified 4.0M to genus, 4.4M to family, 8.5M to order, and 9.5M to the higher levels. Then, we removed taxa that were present in a sample in only one of the two replicates and finally summed the reads in both replicates (9,678,663 reads left). Then, to remove potentially leaked ‘tag-jumped’ reads from the data, we removed from each sample all taxa that accounted for less than 0.05% of the total reads in that sample (9,636,233 reads saved). We removed all the taxa outside domains Bacteria or Archaea, as well as Class Chloroplast (9,006,117 reads passed the filtering). The non-targets included mainly plants (~6200 reads) and Fungi (~250 reads). Altogether 284,351 reads could not be assigned with the strict 100% probability threshold. Finally, very rare occurrences (sequence count < 20) were removed (9,004,996 final reads).
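The three numeric filters described above (replicate concordance, the 0.05% proportion threshold, and the minimum count of 20) can be sketched for a single sample as follows. This is a minimal Python illustration, not the authors' pipeline: the function name `filter_sample`, the input layout (one taxon-to-reads dictionary per replicate), and the example counts are hypothetical, and it assumes the 0.05% threshold is evaluated against the sample total after the replicates have been summed.

```python
def filter_sample(rep1, rep2, min_fraction=0.0005, min_count=20):
    """Apply replicate, proportion and minimum-count filters to one sample.

    rep1, rep2: dicts mapping taxon -> read count for the two PCR replicates.
    Returns a dict mapping taxon -> summed reads for the taxa that survive.
    """
    # 1. Keep taxa detected in both replicates and sum their reads.
    summed = {
        taxon: rep1[taxon] + rep2[taxon]
        for taxon in rep1.keys() & rep2.keys()
        if rep1[taxon] > 0 and rep2[taxon] > 0
    }
    # 2. Drop taxa below 0.05% of the sample's total reads
    #    (assumption: total taken after summing replicates).
    total = sum(summed.values())
    if total == 0:
        return {}
    kept = {t: n for t, n in summed.items() if n / total >= min_fraction}
    # 3. Drop very rare occurrences (sequence count < 20).
    return {t: n for t, n in kept.items() if n >= min_count}


# Hypothetical counts for one sample.
rep1 = {"A": 5000, "B": 3, "C": 40, "D": 12, "F": 2}
rep2 = {"A": 4000, "C": 50, "D": 6, "E": 2, "F": 2}
filtered = filter_sample(rep1, rep2)
```

In this example, B and E are dropped because each occurs in only one replicate, F falls below the 0.05% threshold, and D is removed by the minimum count, leaving A and C.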