Metagenomes and metagenome-assembled genomes from Onthophagus taurus
Data files
May 05, 2025 version files (96.91 MB)
- Alphaproteobacteria.gff (987.15 KB)
- Azobacteroidaceae.gff (2.07 MB)
- Caryophanon.fa (1.25 MB)
- Caryophanon.gff (1.03 MB)
- Christensenellales_A.gff (2.13 MB)
- Christensenellales_B.fa (2.24 MB)
- Christensenellales_B.gff (1.87 MB)
- Croceibacterium.fa (1.65 MB)
- Croceibacterium.gff (1.41 MB)
- Crocinitomicaceae.gff (1.88 MB)
- Desulfovibrionaceae.gff (1.76 MB)
- Devosiaceae.fa (2.49 MB)
- Devosiaceae.gff (2.12 MB)
- Dysgonomonas.gff (2.05 MB)
- Fimivivens.fa (2.93 MB)
- Fimivivens.gff (2.51 MB)
- Fluviicola.fa (2.26 MB)
- Fluviicola.gff (1.67 MB)
- Frigididesulfovibrio.gff (1.99 MB)
- Gemmatimonadaceae.gff (2.04 MB)
- Lachnospiraceae.fa (1.97 MB)
- Lachnospiraceae.gff (1.56 MB)
- Massilibacteroides.gff (2.77 MB)
- Moraxellaceae.gff (1.68 MB)
- Negativicutes.gff (1.47 MB)
- Paludibacteraceae_A.fa (2.45 MB)
- Paludibacteraceae_A.gff (1.68 MB)
- Paludibacteraceae_B.gff (2.51 MB)
- Parachlamydiaceae.fa (1.54 MB)
- Parachlamydiaceae.gff (1.10 MB)
- Polyangiales.fa (3.73 MB)
- Polyangiales.gff (3.02 MB)
- Proteiniphilum.gff (1.49 MB)
- Pseudomaricurvus.gff (2.15 MB)
- Pseudomonas.fa (3.59 MB)
- Pseudomonas.gff (2.77 MB)
- README.md (4.19 KB)
- Ruminococcaceae_JAAYCI01.fa (1.76 MB)
- Ruminococcaceae_JAAYCI01.gff (1.57 MB)
- Ruminococcaceae_WRAV01.fa (2.99 MB)
- Ruminococcaceae_WRAV01.gff (2.24 MB)
- Ruminococcaceae.fa (1.71 MB)
- Ruminococcaceae.gff (1.51 MB)
- Saccharimonadales.fa (865.82 KB)
- Saccharimonadales.gff (638.96 KB)
- Saccharospirillaceae.gff (1.80 MB)
- Saezia.gff (1.91 MB)
- Stenotrophomonas.fa (3.34 MB)
- Stenotrophomonas.gff (2.75 MB)
- summary_statistics.csv (7.45 KB)
Aug 11, 2025 version files (97.04 MB)
- Alphaproteobacteria.gff (987.15 KB)
- Azobacteroidaceae.gff (2.07 MB)
- Bacterial_Taxa.zip (113.81 KB)
- Caryophanon.fa (1.25 MB)
- Caryophanon.gff (1.03 MB)
- Christensenellales_A.gff (2.13 MB)
- Christensenellales_B.fa (2.24 MB)
- Christensenellales_B.gff (1.87 MB)
- Croceibacterium.fa (1.65 MB)
- Croceibacterium.gff (1.41 MB)
- Crocinitomicaceae.gff (1.88 MB)
- Desulfovibrionaceae.gff (1.76 MB)
- Devosiaceae.fa (2.49 MB)
- Devosiaceae.gff (2.12 MB)
- Dysgonomonas.gff (2.05 MB)
- Fimivivens.fa (2.93 MB)
- Fimivivens.gff (2.51 MB)
- Fluviicola.fa (2.26 MB)
- Fluviicola.gff (1.67 MB)
- Frigididesulfovibrio.gff (1.99 MB)
- Fungal_Taxa.zip (15.07 KB)
- Gemmatimonadaceae.gff (2.04 MB)
- Lachnospiraceae.fa (1.97 MB)
- Lachnospiraceae.gff (1.56 MB)
- Massilibacteroides.gff (2.77 MB)
- Moraxellaceae.gff (1.68 MB)
- Negativicutes.gff (1.47 MB)
- Paludibacteraceae_A.fa (2.45 MB)
- Paludibacteraceae_A.gff (1.68 MB)
- Paludibacteraceae_B.gff (2.51 MB)
- Parachlamydiaceae.fa (1.54 MB)
- Parachlamydiaceae.gff (1.10 MB)
- Polyangiales.fa (3.73 MB)
- Polyangiales.gff (3.02 MB)
- Proteiniphilum.gff (1.49 MB)
- Pseudomaricurvus.gff (2.15 MB)
- Pseudomonas.fa (3.59 MB)
- Pseudomonas.gff (2.77 MB)
- README.md (4.75 KB)
- Ruminococcaceae_JAAYCI01.fa (1.76 MB)
- Ruminococcaceae_JAAYCI01.gff (1.57 MB)
- Ruminococcaceae_WRAV01.fa (2.99 MB)
- Ruminococcaceae_WRAV01.gff (2.24 MB)
- Ruminococcaceae.fa (1.71 MB)
- Ruminococcaceae.gff (1.51 MB)
- Saccharimonadales.fa (865.82 KB)
- Saccharimonadales.gff (638.96 KB)
- Saccharospirillaceae.gff (1.80 MB)
- Saezia.gff (1.91 MB)
- Stenotrophomonas.fa (3.34 MB)
- Stenotrophomonas.gff (2.75 MB)
- summary_statistics.csv (7.26 KB)
Abstract
Shotgun metagenomic sequencing was carried out on Onthophagus taurus larval gut sections, adult female midguts, and the pedestal (a maternally provisioned fecal pellet provided to offspring). Here we present the FASTA sequences of the 16 metagenome-assembled genomes (MAGs) that fell below the NCBI completion threshold; the 32 GFF files of protein predictions and annotations produced by RASTtk analysis of all 32 genomes assembled in the study; and .csv files from Kaiju containing the abundance of bacterial and fungal taxa across samples.
https://doi.org/10.5061/dryad.4qrfj6qn8
Description of the data and file structure
"summary_statistics.csv" contains statistics and characterizations related to the metagenomically assembled genomes from this project.
Rows 1-4 are identifiers that record sample quality, NCBI sample information, and the name used for each genome in the manuscript text and figures.
Rows 5-11 contain summary statistics (base pairs, contig number, N50, L50, GC content, completeness, and contamination) from Quast v4.4 and CheckM v1.0.18.
Row 12 gives the number of predicted protein-coding sequences from RASTtk v1.073.
Rows 13-19 show the taxonomic classification at each level (Domain to Species) from GTDB-Tk v1.7.0.
Rows 20-25 show the percentage of sample sequencing reads that align to each metagenome-assembled genome, using Bowtie2 v2.3.2.
Rows 26-33 (column 1) give definitions of the terms and programs used to generate the statistics.
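Because the table is described by row rather than by column, it may be easier to work with after it is turned into one record per genome. The following is a minimal sketch that assumes column 1 holds the row label and each later column corresponds to one genome; that layout is an assumption and is not confirmed by this description. The function name `read_row_oriented` is hypothetical. Rows 26-33 (the definitions) are skipped by reading only the first 25 rows.

```python
import csv


def read_row_oriented(handle, n_field_rows=25):
    """Return one dict per data column, keyed by the label in column 1.

    Assumes (not confirmed by the README) that each row is a field and
    each column after the first is one genome.
    """
    rows = [r for r in csv.reader(handle) if r][:n_field_rows]
    if not rows:
        return []
    width = max(len(r) for r in rows)
    records = []
    for col in range(1, width):
        # Short rows are padded with empty strings.
        records.append({r[0]: (r[col] if col < len(r) else "") for r in rows})
    return records
```

For example, `with open("summary_statistics.csv", newline="") as fh: records = read_row_oriented(fh)` would give a list with one dictionary per genome if the assumed layout holds.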
Bacterial_Taxa.zip and Fungal_Taxa.zip contain .csv files with Kaiju outputs. Each folder has 6 .csv files with the total abundances at the phylum, class, order, family, genus, and species levels across the 5 samples.
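The .csv files can be read directly from the archives without unpacking them. The sketch below uses only the Python standard library; the function name `read_zip_csvs` is hypothetical, and the member file names and column layout inside the archives are not documented here, so the code simply returns every .csv member as a list of rows.

```python
import csv
import io
import zipfile


def read_zip_csvs(zip_source):
    """Map each .csv member name in a zip archive to its rows."""
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip folders and non-CSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.reader(text))
    return tables
```

Calling `read_zip_csvs("Bacterial_Taxa.zip")` should return six tables if the archive matches the description above.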
Table: Data included in the repository. Entries in parentheses in the assembly column are NCBI BioSample accessions for assemblies with >90% completion, which are stored on NCBI rather than here.
| Identifier | Assembly File (.fa) | RASTtk Output File (.gff) |
|---|---|---|
| Pseudomaricurvus | (SAMN41896156) | Pseudomaricurvus.gff |
| Frigididesulfovibrio | (SAMN41896157) | Frigididesulfovibrio.gff |
| Proteiniphilum | (SAMN41896158) | Proteiniphilum.gff |
| Moraxellaceae | (SAMN41896159) | Moraxellaceae.gff |
| Paludibacteraceae_B | (SAMN41896160) | Paludibacteraceae_B.gff |
| Alphaproteobacteria | (SAMN41896161) | Alphaproteobacteria.gff |
| Christensenellales_A | (SAMN41896162) | Christensenellales_A.gff |
| Gemmatimonadaceae | (SAMN41896163) | Gemmatimonadaceae.gff |
| Dysgonomonas | (SAMN41896164) | Dysgonomonas.gff |
| Desulfovibrionaceae | (SAMN41896165) | Desulfovibrionaceae.gff |
| Massilibacteroides | (SAMN41896166) | Massilibacteroides.gff |
| Negativicutes | (SAMN41896167) | Negativicutes.gff |
| Crocinitomicaceae | (SAMN41896168) | Crocinitomicaceae.gff |
| Saccharospirillaceae | (SAMN41896169) | Saccharospirillaceae.gff |
| Azobacteroidaceae | (SAMN41896170) | Azobacteroidaceae.gff |
| Saezia | (SAMN41896171) | Saezia.gff |
| Parachlamydiaceae | Parachlamydiaceae.fa | Parachlamydiaceae.gff |
| Fluviicola | Fluviicola.fa | Fluviicola.gff |
| Pseudomonas | Pseudomonas.fa | Pseudomonas.gff |
| Ruminococcaceae WRAV01 | Ruminococcaceae_WRAV01.fa | Ruminococcaceae_WRAV01.gff |
| Lachnospiraceae | Lachnospiraceae.fa | Lachnospiraceae.gff |
| Paludibacteraceae_A | Paludibacteraceae_A.fa | Paludibacteraceae_A.gff |
| Ruminococcaceae | Ruminococcaceae.fa | Ruminococcaceae.gff |
| Devosiaceae | Devosiaceae.fa | Devosiaceae.gff |
| Caryophanon | Caryophanon.fa | Caryophanon.gff |
| Ruminococcaceae JAAYCI01 | Ruminococcaceae_JAAYCI01.fa | Ruminococcaceae_JAAYCI01.gff |
| Croceibacterium | Croceibacterium.fa | Croceibacterium.gff |
| Polyangiales | Polyangiales.fa | Polyangiales.gff |
| Christensenellales_B | Christensenellales_B.fa | Christensenellales_B.gff |
| Fimivivens | Fimivivens.fa | Fimivivens.gff |
| Stenotrophomonas | Stenotrophomonas.fa | Stenotrophomonas.gff |
| Saccharimonadales | Saccharimonadales.fa | Saccharimonadales.gff |
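The split in the table (16 genomes with only a .gff file here, 16 with both .fa and .gff) can be checked against a downloaded copy of the repository. This is a small sketch under the assumption that the file stem matches the identifier, as the file names above suggest; the helper names are hypothetical.

```python
from pathlib import PurePath


def group_by_identifier(filenames):
    """Map each file stem to the set of .fa/.gff extensions present."""
    groups = {}
    for name in filenames:
        path = PurePath(name)
        if path.suffix in {".fa", ".gff"}:
            groups.setdefault(path.stem, set()).add(path.suffix)
    return groups


def gff_only(groups):
    """Identifiers with annotations here but no local assembly file."""
    return sorted(stem for stem, exts in groups.items() if exts == {".gff"})
```

Passing the output of `os.listdir()` on the download folder to `group_by_identifier` and then to `gff_only` should list the genomes whose assemblies are held on NCBI.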
August 6, 2025: Added Bacterial_Taxa.zip and Fungal_Taxa.zip, which contain .csv files of the Kaiju output. Each folder has 6 .csv files with the total abundances at the phylum, class, order, family, genus, and species levels across the 5 samples.
Wild-caught Onthophagus taurus were collected and maintained in the lab. Pedestals (n=17) from the brood balls these individuals produced were collected and pooled, and DNA was extracted using a Qiagen DNeasy PowerSoil Pro Kit. Larval gut sections (fore-, mid-, and hindgut; n=5) and adult female midguts (n=4), dissected from F1 offspring, were processed to enrich for bacteria before DNA was extracted with a modified phenol-chloroform protocol. The NEBNext Ultra II Kit protocol was used to size-select DNA fragments, using Serapure beads, and to generate libraries. DNA was sequenced on the NextSeq500 sequencer using 300 paired-end (PE) cycles.
Sequences were quality controlled with Trimmomatic v0.36 to remove adapter sequences and low-quality reads. Assembly was carried out on KBase, where sequences were assembled with metaSPAdes v1.3.4 and binned into metagenome-assembled genomes using CONCOCT v1.3.4, MaxBin2 v1.1.1, and MetaBAT2 v2.3.0, with bin optimization by DAS Tool v1.1.2. Finally, RASTtk v1.073 was used for protein prediction and annotation.
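Assembly statistics such as N50 and L50 can be recomputed from the .fa files as a sanity check. The sketch below uses the usual definitions: N50 is the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly length, and L50 is the number of contigs needed to reach that point. It is not the Quast implementation, and values may differ from summary_statistics.csv if Quast excluded short contigs. Function names are hypothetical.

```python
def read_fasta_lengths(handle):
    """Return the length of every sequence in a FASTA file handle."""
    lengths = []
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    return 0, 0
```

For example, `with open("Pseudomonas.fa") as fh: print(n50_l50(read_fasta_lengths(fh)))` prints the two values for that assembly.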
Assemblies with completion below 90%, and all RASTtk output files, are stored here.
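The RASTtk output files can be summarized by counting features by type. The following sketch assumes the files follow the standard tab-separated GFF layout, with the feature type in the third column; whether RASTtk labels protein-coding genes as `CDS` in these particular files is an assumption to verify against the data. The function name is hypothetical.

```python
from collections import Counter


def count_feature_types(handle):
    """Count GFF features by the value in column 3 (feature type)."""
    counts = Counter()
    for line in handle:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section, if any, ends the features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

If the assumption holds, `count_feature_types(open("Saezia.gff"))["CDS"]` can be compared with the predicted protein-coding sequence count in summary_statistics.csv.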
All MAG assemblies with completion above 90% (n=16) are stored on NCBI in BioProject PRJNA1117517 as BioSamples SAMN41896156-SAMN41896171.
