Raw reads and metadata for 16S and 12S sequencing for microbiome and dietary analysis of Tasmanian devils

Hogg, Carolyn¹; Molloy, Meadhbh¹; McLennan, Elspeth¹; Fox, Samantha²; Belov, Katherine¹

Published Dec 01, 2025 on Dryad. https://doi.org/10.5061/dryad.qz612jmrj

Data files

Version of Dec 01, 2025 (10.85 GB in total):

DevilGM.R (149.29 KB)
Diet_(12S)_fastq.zip (1.10 GB)
Microbiome_(16S)_fastq.zip (9.75 GB)
README.md (4.12 KB)
sample_metadata_anonymousIDs.csv (14.95 KB)

Abstract

The gut microbiome is an important component of host health and function and is influenced by internal and external factors such as host phylogeny, age, diet, and environment. Monitoring the gut microbiome has become an increasingly important management tool for wild populations of threatened species. The Tasmanian devil (Sarcophilus harrisii) is the largest extant carnivorous marsupial and is native to the island state of Tasmania, Australia. Devils are currently endangered due to devil facial tumour disease. Previous assessments have shown differences between captive and wild devil gut microbiomes and changes during translocations. However, variability in the wild gut microbiome across Tasmania, and the drivers of this variability, are not well understood. We conducted a range-wide assessment of gut microbiomes at ten locations across Tasmania via 16S rRNA sequencing and tested the influence of diet (12S sequencing), location, sex, and cohort. We show that the five most abundant phyla and genera were consistent across all ten locations. Location, cohort, and sex affected bacterial richness, but location did not affect diversity. While diet differed across the state, there was no strong evidence of dietary differences between juveniles and adults, or between males and females. Contrary to our hypothesis, diet explained only a small amount of the variation in microbial communities. We suspect that other variables, such as environmental factors and immune system development, may have a stronger influence on gut microbiome variability. Adjustments to dietary supplementation are not necessary when preparing devils for translocation to different sites. Future research should prioritize collecting environmental samples for microbial analysis and integrating metabolomics to elucidate functional differences associated with Tasmanian devil gut microbiome variability.

This dataset was generated to assess the gut microbiome and diet of Tasmanian devils (Sarcophilus harrisii) across 10 locations in Tasmania, Australia. We used high-throughput sequencing to target two amplicon regions:

The bacterial 16S rRNA gene (V3–V4 region) to characterize gut microbiota.
The vertebrate mitochondrial 12S rRNA gene to detect vertebrate dietary DNA.

The dataset includes raw sequencing reads (FASTQ files) and associated sample metadata.

Descriptions

The dataset includes paired-end FASTQ files from amplicon sequencing of Tasmanian devil fecal samples, generated to characterize both gut microbiota (16S rRNA gene) and dietary composition (12S mitochondrial gene). Files are demultiplexed, and file names use anonymised sample identifiers because the Tasmanian devil is a threatened species. There are 856 FASTQ files in total:

428 files for 16S sequencing (214 samples × forward and reverse reads; archive Microbiome_(16S)_fastq.zip)
428 files for 12S sequencing (214 samples × forward and reverse reads; archive Diet_(12S)_fastq.zip)
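After extracting an archive, a quick completeness check is to confirm that each sample has exactly one R1 and one R2 file, giving 214 pairs per archive. The sketch below assumes only the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` endings described below; the function names and the extracted folder name are hypothetical.

```python
from pathlib import Path

# Illumina-style endings for forward (R1) and reverse (R2) read files.
READ_SUFFIXES = {"R1": "_R1_001.fastq.gz", "R2": "_R2_001.fastq.gz"}


def group_read_pairs(filenames):
    """Map each sample prefix (name minus the read suffix) to the reads present."""
    pairs = {}
    for name in filenames:
        for read, suffix in READ_SUFFIXES.items():
            if name.endswith(suffix):
                prefix = name[: -len(suffix)]
                pairs.setdefault(prefix, set()).add(read)
    return pairs


def incomplete_pairs(filenames):
    """Return sorted sample prefixes that are missing either R1 or R2."""
    pairs = group_read_pairs(filenames)
    return sorted(p for p, reads in pairs.items() if reads != {"R1", "R2"})


def summarise_folder(folder):
    """Count samples and list incomplete pairs in an extracted archive folder."""
    names = [p.name for p in Path(folder).glob("*.fastq.gz")]
    return len(group_read_pairs(names)), incomplete_pairs(names)
```

For example, `summarise_folder("Microbiome_(16S)_fastq")` (a hypothetical extraction path) would be expected to report 214 samples and an empty list if the archive is complete.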

Each sample has two associated files: one forward read (R1) and one reverse read (R2). File names follow the convention:

[Gene]_[LocationCode][SampleCode]_S[RunNumber]_L001_R[ReadDirection]_001.fastq.gz

Where:

[Gene] is either 16S (microbiome) or 12S (diet)
[LocationCode] is a 2–3 letter abbreviation for the sampling location:
- BRO = Bronte
- FEN = Fentonbury
- BUC = Buckland
- GRA = Granville
- KEM = Kempton
- MI = Maria Island
- NNP = Narawntapu
- SH = Stony Head
- WOO = Woolnorth
- WU = wukalina
[SampleCode] is a numeric identifier for the sample at that location
S[RunNumber] is the sample number assigned during demultiplexing (the S# field of standard Illumina FASTQ file names)
L001 is the Illumina lane number
R1/R2 indicate forward/reverse reads
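The naming convention can be parsed with a regular expression. This is a minimal sketch that assumes location codes are upper-case letters immediately followed by a numeric sample code, as described above; the `sample_id` field (location code plus sample code) is a hypothetical key and may not match the metadata's SampleID format exactly.

```python
import re

# Location codes and names as listed in this README.
LOCATIONS = {
    "BRO": "Bronte", "FEN": "Fentonbury", "BUC": "Buckland", "GRA": "Granville",
    "KEM": "Kempton", "MI": "Maria Island", "NNP": "Narawntapu",
    "SH": "Stony Head", "WOO": "Woolnorth", "WU": "wukalina",
}

# [Gene]_[LocationCode][SampleCode]_S[RunNumber]_L001_R[ReadDirection]_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<gene>16S|12S)_"
    r"(?P<location>[A-Z]{2,3})(?P<sample>\d+)_"
    r"S(?P<s_number>\d+)_"
    r"L(?P<lane>\d{3})_"
    r"R(?P<read>[12])_001\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Split a FASTQ file name into its fields; raise ValueError if it does not fit."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    fields = match.groupdict()
    if fields["location"] not in LOCATIONS:
        raise ValueError(f"unknown location code: {fields['location']}")
    fields["location_name"] = LOCATIONS[fields["location"]]
    # Hypothetical join key: assumes SampleID = LocationCode + SampleCode.
    fields["sample_id"] = fields["location"] + fields["sample"]
    return fields
```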

Metadata File Description

The dataset includes a metadata file (sample_metadata_anonymousIDs.csv) that provides contextual information for each fecal sample included in the sequencing dataset. Each row corresponds to a single sample, and columns contain the following variables:

SampleID – Unique identifier for each sample, which corresponds to the FASTQ filenames.
Sex – Biological sex of the individual devil.
Age – Age of the individual at the time of sampling.
DFT_Score – Score indicating devil facial tumour (DFT) status.
DFT – Presence or absence of devil facial tumour.
Location – Full name of the sampling location (e.g., Woolnorth, Fentonbury, etc.), matching the location codes in FASTQ filenames.
Sample_Date – Date the fecal sample was collected, in MM_DD_YYYY format.
Extraction_Date – Date the DNA was extracted from the sample, in MM_DD_YYYY format.
ng_uL – DNA concentration (nanograms per microliter) after extraction.
Negative_Control – Indicates which negative control was processed alongside the sample.
Cohort – Age cohort classification (Juvenile or Adult), used for broad biological grouping.
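Because both date columns use MM_DD_YYYY with underscores, they need an explicit format string when parsed. The sketch below uses the column names listed above and assumes every row has both dates and a numeric ng_uL value; the real file may contain blanks or "NA" entries that need extra handling. The helper `days_to_extraction` is a hypothetical illustration, not part of DevilGM.R.

```python
import csv
from datetime import datetime

DATE_FORMAT = "%m_%d_%Y"  # MM_DD_YYYY, as described for Sample_Date and Extraction_Date


def load_metadata(lines):
    """Read the metadata CSV (any iterable of lines) into a dict keyed by SampleID."""
    rows = {}
    for row in csv.DictReader(lines):
        row["Sample_Date"] = datetime.strptime(row["Sample_Date"], DATE_FORMAT).date()
        row["Extraction_Date"] = datetime.strptime(row["Extraction_Date"], DATE_FORMAT).date()
        row["ng_uL"] = float(row["ng_uL"])  # DNA concentration in ng/µL
        rows[row["SampleID"]] = row
    return rows


def days_to_extraction(row):
    """Days elapsed between sample collection and DNA extraction."""
    return (row["Extraction_Date"] - row["Sample_Date"]).days
```

Typical use would be `with open("sample_metadata_anonymousIDs.csv", newline="") as fh: meta = load_metadata(fh)`.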

Sharing/Access information

All data are provided directly through this Dryad submission.

Code/Software

R (version 4.2.2) was used for all primary analyses. The workflow is documented in the script DevilGM.R, provided with this dataset. Some preprocessing and phylogenetic tree generation steps were completed using QIIME2 and shell commands.

The script includes:

Amplicon sequence processing and quality filtering using dada2
Taxonomic assignment with the SILVA reference database (16S) and BLASTN against NCBI mitochondrial genomes (12S)
Contaminant filtering using negative controls with decontam
Integration of QIIME2-generated phylogenetic trees into R using phyloseq
Diversity analyses (alpha and beta), ordination, and differential abundance testing using DESeq2
Metadata integration and taxonomic table formatting for downstream analysis

Annotations are provided throughout the script for reproducibility and clarity.
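For readers unfamiliar with the diversity measures, the sketch below gives textbook definitions of observed richness, the Shannon index, and Bray–Curtis dissimilarity on raw count vectors. It is illustrative only: the R workflow in DevilGM.R may use different indices, rarefaction, or normalisation, and these function names are hypothetical.

```python
import math


def observed_richness(counts):
    """Number of taxa (e.g. ASVs) with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa.

    Equivalent to sum|a_i - b_i| / sum(a_i + b_i); 0 = identical, 1 = no overlap.
    """
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same taxa")
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))
```

Richness and Shannon are per-sample (alpha) measures, whereas Bray–Curtis compares pairs of samples (beta) and is a common input to ordination.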

Methods

DNA was extracted from faecal material in a dedicated clean laboratory within a biosafety cabinet using the QIAamp PowerFecal Pro DNA Kit (Qiagen) and 250 mg of material taken from the centre of each scat. A core subsample was taken from each faecal sample to obtain a representative view of the bacterial community. To minimise contamination, the workspace and tools were cleaned with 70% ethanol and bleach between samples, and negative controls were included in each extraction batch to detect contaminants. DNA quantity and quality were checked with a NanoDrop spectrophotometer (Thermo Fisher Scientific); the faecal DNA extracts had a 260/280 nm absorbance ratio of approximately 1.8, indicating pure DNA.

To confirm successful isolation of bacterial DNA, the V3–V4 region was PCR-amplified with the 341F and 806R primers in a random subset of samples from each batch, and banding was confirmed by gel electrophoresis following Chong et al. (2019). The PCR program consisted of an initial denaturation at 95 °C for 1 min; 35 cycles of denaturation at 95 °C for 15 s, annealing at 55 °C for 15 s, and extension at 72 °C for 15 s; and a final extension at 72 °C for 1 min. A 20 µL aliquot of each resulting amplicon product, at a final concentration of 5–10 ng/µL, was loaded into 96-well fully skirted PCR plates and sent for 16S V3–V4 amplicon library preparation and Illumina MiSeq v3 (2 × 300 bp) sequencing.
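Reaching a fixed final volume at a target concentration is the standard C1V1 = C2V2 dilution calculation. The source does not describe how samples were normalised, so the sketch below is a generic illustration: `target_ng_ul=7.5` is an assumed mid-point of the 5–10 ng/µL range, `final_ul=20` matches the loaded volume, and the function name is hypothetical.

```python
def dilution_volumes(stock_ng_ul, target_ng_ul=7.5, final_ul=20.0):
    """Return (stock µL, diluent µL) that give final_ul at target_ng_ul.

    Uses C1 * V1 = C2 * V2, so V1 = C2 * V2 / C1. target_ng_ul = 7.5 is an
    assumed mid-point of 5-10 ng/µL; final_ul = 20 matches the loaded volume.
    """
    if stock_ng_ul <= 0 or target_ng_ul <= 0:
        raise ValueError("concentrations must be positive")
    if stock_ng_ul < target_ng_ul:
        raise ValueError("stock is more dilute than the target; cannot reach it by dilution")
    stock_ul = target_ng_ul * final_ul / stock_ng_ul
    return stock_ul, final_ul - stock_ul
```

For a 30 ng/µL stock, `dilution_volumes(30.0)` gives 5 µL of stock plus 15 µL of diluent.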

In addition, the 12S V5 region was amplified in all samples to capture a broad range of vertebrate prey species, with a blocking primer (“12Sv5DevilB”) included to reduce amplification of Tasmanian devil host DNA. Successful amplification was confirmed for all samples by gel electrophoresis (1% agarose, 90 V for 35 min). Amplicon products carrying Illumina overhang adapters were then sent for indexing PCR, library preparation, and sequencing on the Illumina MiSeq platform with v2 chemistry.

This dataset contains the raw reads for both 16S and 12S sequencing and the corresponding metadata. Sample identifiers have been anonymised because the Tasmanian devil is a threatened species.