Data from: Shotgun microbiome analysis of two Schizaphis graminum biotypes with time with and without carried cereal yellow dwarf virus
Data files
Jul 07, 2025 version files (8.56 MB)
- gbmbiomeItodatadryad.tar.gz (8.56 MB)
- README.md (2.49 KB)
Abstract
Reads from a prior RNAseq study of gene expression in greenbug aphid (Schizaphis graminum (Rondani)) were used for shotgun metagenomic analysis in relation to two aphid biotypes, presence or absence of carried cereal yellow dwarf virus (CYDV), and five timepoints from zero to 20 days post-infestation. There were three biological replicates per condition. Reads were aligned with bwa mem to reference or representative genomes of 47000 bacterial, 1218 archaeal, 14165 viral, 571 fungal, and 94 protozoan taxa, plus greenbug and wheat, and each read was credited to the highest-scoring taxon. Read counts by taxon were imported into QIIME2 for statistical analysis and display. There were 105635 to 5938545 microbial reads per sample. The reads matched 3348 genera. The ratio of total microbial counts to total greenbug counts peaked at day 5 and declined 50% by day 20. Bar plots of relative frequencies indicated two major communities, one enriched in Shigella and Escherichia, the other depleted in Shigella and Escherichia but enriched in Acinetobacter and Gilbertella. With one exception, the depleted community was restricted to days 15 and 20 and existed in both biotypes with and without carried CYDV. CYDV was not detected in any sample, but an aphid pathogen, Rhopalosiphum padi virus, was present in 20 samples and exceeded 4000 counts in four samples, of which two were enriched in Shigella and Escherichia. Two unrelated photosynthetic genera, Microcystis and Lamprocystis, were relatively abundant; the latter was positively correlated with Shigella. Oddly, Letharia (as an ascomycete) was the fifteenth most abundant hit overall, even though Letharia itself is a lichen. Shigella-depleted communities in 24 samples were significantly more alpha-diverse than communities in the remaining samples (Shannon entropy 5.06 in depleted versus 3.84 otherwise, t = -8.46, p = 1.8e-09).
Principal-components projection of Bray-Curtis dissimilarity showed two distinct clusters whose membership matched the two groups distinguishable in the bar plots of relative frequencies. Sample B02 lay between the clusters. The first three principal components accounted for 60.54% of the variation. PERMANOVA of Bray-Curtis distances with adonis confirmed that only time (p = 0.001) and the interaction of time with biotype (p = 0.035) significantly affected beta diversity. In conclusion, two distinct microbiome communities existed in the aphids, and the Shigella-depleted community accompanied yellowing and death of the wheat host as it succumbed to aphid feeding and yellow dwarf disease.
https://doi.org/10.5061/dryad.z08kprrp1
Description of the data and file structure
To see the contents of this tar.gz file, type 'gunzip gbmbiomeItodatadryad.tar.gz; tar -xvf gbmbiomeItodatadryad.tar' at the Unix command line. This tar file contains two kinds of data files: all*counttable0502.txt and *usedsamcountsbytaxon*.txt. The former were aggregated from the latter. Each file of the former kind has the taxon name in the first column and counts for each combination of biotype, viral carrier status, timepoint, and replicate in subsequent columns. The header row gives the codes for these combinations: biotype (B or H), viral carrier status (no code or RPV), timepoint (integers from 0 through 7), and replicate (1, 2, or 3). Each file of the latter kind has two parts: an initial list of GenBank accessions grouped by NCBI taxonomic identifier (taxid), followed by counts by taxon obtained by grouping the GenBank accessions. There is one file of the second kind for each combination of biotype, viral carrier status, timepoint, and replicate.
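The header codes described above can be split into their four parts with a short parser. The following Python sketch is illustrative only and is not part of the deposited scripts. It assumes the parts are concatenated without separators in the order biotype, optional RPV, timepoint digit, replicate digit (consistent with the sample name B02 in the abstract, but not confirmed by this README); the names `SampleCode` and `parse_sample_code` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SampleCode(NamedTuple):
    biotype: str          # "B" or "H"
    carries_virus: bool   # True when the code contains "RPV"
    timepoint: int        # integer code from 0 through 7
    replicate: int        # 1, 2, or 3


# ASSUMPTION: codes look like "B02" or "HRPV31", i.e. biotype letter,
# optional "RPV", one timepoint digit, one replicate digit.
_CODE = re.compile(r"^([BH])(RPV)?([0-7])([123])$")


def parse_sample_code(code: str) -> Optional[SampleCode]:
    """Return the parsed parts of a header code, or None if it does not match."""
    m = _CODE.match(code.strip())
    if m is None:
        return None
    biotype, virus, timepoint, replicate = m.groups()
    return SampleCode(biotype, virus is not None, int(timepoint), int(replicate))
```

Returning `None` for unmatched codes, rather than raising, makes it easy to skip the taxon-name column and to notice headers that do not follow the assumed layout.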
Sharing/Access information
Reference and representative genomes were downloaded from subdirectories archaea, bacteria, viral, protozoa, and fungi in https://ftp.ncbi.nlm.nih.gov/genomes/refseq/. Taxonomic files nucl_gb.accession2taxid, nucl_wgs.accession2taxid, nucl_wgs.accession2taxid.EXTRA, and rankedlineage.dmp were downloaded from https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/ and https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/.
Code/Software
Perl and bash scripts were written to download and manipulate files. Two scraping scripts, drillingwget.pl and drillingwget2.pl, were written to download files from NCBI; the second script was needed for the somewhat different naming convention for viruses. Bash scripts ran bwa index and bwa mem. Perl script maxscore1015.pl collated alignments for the same read among databases and found the highest-scoring hit for each read. Script grouptaxa1211.pl grouped counts by taxa at taxonomic levels from genus to phylum. Script buildtoptable.pl isolated the top-ranked taxa by overall count for importation into QIIME2 for community analysis and graphical presentation.
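The best-hit step can be illustrated in a few lines. The sketch below is written in Python and is not the authors' maxscore1015.pl. It assumes that the alignment score is read from the `AS:i:` tag that bwa mem writes to each SAM record, that a read name identifies a read across all databases, and that ties are resolved by keeping the first hit seen; the README does not state how ties or read pairs were handled. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def alignment_score(fields: List[str]) -> Optional[int]:
    """Return the AS:i: tag value from the optional SAM fields, if present."""
    for tag in fields[11:]:
        if tag.startswith("AS:i:"):
            return int(tag[5:])
    return None


def best_hits(sam_lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map each read name to (reference name, score) of its highest-scoring hit.

    Lines may come from several SAM files concatenated together.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        read, flag, ref = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or ref == "*":      # read is unmapped
            continue
        score = alignment_score(fields)
        if score is None:
            continue
        # ASSUMPTION: strict ">" keeps the first hit seen when scores tie.
        if read not in best or score > best[read][1]:
            best[read] = (ref, score)
    return best
```

Because the function accepts any iterable of lines, the SAM files from separate databases can be streamed one after another (for example with `itertools.chain`) without loading them all into memory.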
An RNAseq study was conducted using whole Schizaphis graminum aphids without surface sterilization. Reads were mapped to reference genomes of Schizaphis graminum, its host Triticum aestivum (wheat), and all reference or representative genomes of bacterial, archaeal, viral, fungal, and protozoan taxa available in NCBI as of 31 October 2023. Because bwa indexing was slow, these genomes were divided among 61 separate databases, and the results of the bwa mem runs were then merged. A Perl script, maxscore1015.pl, was written to collate closest hits in the resulting SAM files. Other scripts were written to aggregate counts by taxon at different levels from species to phylum using files from NCBI taxonomy. The raw counts were analyzed with DESeq2 to identify significantly responding taxa, since the data format was identical to that of the sister RNAseq data analyzed for aphid gene expression. The counts by taxon were also imported into QIIME2 for graphical presentation and analysis of community diversity. Importantly, the library preparation apparently selected RNA with a polyA tail. CYDV lacks a polyA tail and was not detected. RhPV has a polyA tail and was detected.
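Aggregating counts to a chosen taxonomic level can be sketched as follows. This Python example is illustrative and is not the authors' grouptaxa1211.pl. It assumes that rankedlineage.dmp uses `|`-delimited fields in the column order listed in `RANK_COLUMNS`; that order should be checked against the readme shipped with the taxonomy dump that was actually downloaded, since the layout can differ between releases. The function names and the "unassigned" label are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# ASSUMPTION: column order of rankedlineage.dmp; verify against the
# readme of the taxonomy dump release in use.
RANK_COLUMNS = ("tax_id", "tax_name", "species", "genus", "family",
                "order", "class", "phylum", "kingdom", "superkingdom")


def parse_lineage(lines: Iterable[str], rank: str = "genus") -> Dict[str, str]:
    """Map each taxid (as a string) to its ancestor name at the given rank."""
    idx = RANK_COLUMNS.index(rank)
    lineage: Dict[str, str] = {}
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("|")]
        if len(fields) <= idx or not fields[0]:
            continue
        # The rank column may be empty, for example when the taxon sits at
        # or above that rank; such entries are labeled "unassigned" here.
        lineage[fields[0]] = fields[idx] or "unassigned"
    return lineage


def counts_by_rank(taxid_counts: Mapping[str, int],
                   lineage: Mapping[str, str]) -> Counter:
    """Sum per-taxid read counts into totals per name at the chosen rank."""
    totals: Counter = Counter()
    for taxid, n in taxid_counts.items():
        totals[lineage.get(str(taxid), "unassigned")] += n
    return totals
```

Calling `parse_lineage` once per rank (for example "genus", "family", and "phylum") and reusing the same per-taxid counts gives one table per taxonomic level.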