Data from: Dietary overlap of sympatric polyphagous alpine grasshoppers includes invasive plant species
Data files
May 15, 2026 version files (1.02 GB)
- ASV_data.xlsx (76.90 KB)
- Raw_sequence_file.7z (1.02 GB)
- README.md (3.64 KB)
Abstract
We performed gut content analysis of three alpine grasshopper species—Sigaus nitidus, Sigaus nivalis, and Sigaus australis—collected from three alpine regions in New Zealand, using chloroplast DNA markers. First, we assessed the taxonomic resolution of the trnL and rbcL markers using 12 gut samples from Mount Hutt. We then analysed diet composition across species, sexes, and locations using 28 samples from three sites (Mount Hutt, Craigieburn Range, and Foggy Peak) with the trnL marker. The dataset includes raw sequence files (FastQ) and amplicon sequence variants (ASVs) classified as plant taxa (Viridiplantae: Streptophyta), comprising 112 rbcL and 328 trnL ASVs. ASVs were generated using the DADA2 pipeline in QIIME2 and identified via a custom reference database using the QIIME2 BLAST+ algorithm. The trnL marker revealed higher taxonomic diversity (23 genera, 2 species), compared to 17 genera and 2 species identified using rbcL. To further refine species-level identifications, trnL sequences with 100% pairwise identity were additionally compared against the NCBI nucleotide database, with matches retained only when the species are known to occur in New Zealand.
Author Information
Mari Nakano (1) (0000-0002-6809-0001), Steven A. Trewick (1) (0000-0002-4680-8457), Richard N. Watson (2,3), Mary Morgan-Richards (1)* (0000-0002-3913-9814)
(1) Wildlife & Ecology, SFTNS, Massey University, Private Bag 11-222, Palmerston North 4410, New Zealand; m.nakano@massey.ac.nz; s.trewick@massey.ac.nz; m.morgan-richards@massey.ac.nz
(2) Entomology Department, Lincoln College, University of Canterbury, Christchurch, New Zealand.
(3) 23 East Street, Claudelands, Hamilton, New Zealand
*Correspondence
Data information and file inventory.
File: Raw_sequence_file.7z
Description: Raw sequence files (FASTQ format) for the rbcL and trnL markers were generated from gut contents of 28 samples collected across three mountain sites in New Zealand: Mt Hutt, Foggy Peak, and the Craigieburn Range.
File names are formatted as "sample_name"_1 or "sample_name"_2, where _1 denotes the forward read and _2 denotes the reverse read, with the .fq file extension. Files are organised into folders named “rbcL” or “trnL”, each containing subdirectories corresponding to the sampling sites (“Mt Hutt”, “Foggy Peak”, and “Craigieburn Range”). Within each site-specific directory, files are further separated into “Forward” and “Reverse” folders.
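The folder layout described above maps directly onto a QIIME 2 paired-end manifest. The following is a minimal sketch, assuming the archive has been extracted and that each site folder contains "Forward" and "Reverse" subfolders holding "sample_name"_1.fq and "sample_name"_2.fq files. The function name make_manifest and the example path are illustrative and not part of the dataset.

```shell
# Sketch: write a QIIME 2 paired-end manifest (PairedEndFastqManifestPhred33V2)
# for one site directory laid out as <site>/Forward/*_1.fq and <site>/Reverse/*_2.fq.
# make_manifest is a hypothetical helper name, not a file or tool shipped with the data.
make_manifest() (
  # Resolve the site directory to an absolute path, as the manifest expects.
  site_dir=$(cd "$1" && pwd) || exit 1
  # Tab-separated header required by the V2 manifest format.
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for fwd in "$site_dir"/Forward/*_1.fq; do
    [ -e "$fwd" ] || continue                  # glob matched nothing
    sample=$(basename "$fwd" _1.fq)            # strip the read suffix
    rev="$site_dir/Reverse/${sample}_2.fq"
    if [ ! -e "$rev" ]; then
      echo "warning: no reverse read for $sample, skipped" >&2
      continue
    fi
    printf '%s\t%s\t%s\n' "$sample" "$fwd" "$rev"
  done
)

# Example (assumed extraction path):
# make_manifest "trnL/Mt Hutt" > manifest.txt
```

Samples without a matching reverse file are skipped with a warning rather than written as incomplete rows, so the manifest stays importable.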
File: ASV_data.xlsx
Description: Sheet trnL - Processed trnL Amplicon Sequence Variant (ASV) data.
A total of 328 unique ASVs were generated from the trnL region (column A) amplified from 28 gut samples collected across three mountain sites: Mt Hutt (columns L1–W1), Craigieburn Range (columns X1–Z1), and Foggy Peak (columns AA1–AM1). ASVs were identified using a custom reference database via the QIIME2 classify-consensus-blast algorithm (columns C–I, with the pairwise identity in column B) and NCBI nucleotide BLAST (columns J and K).
Taxonomic identification using NCBI nucleotide BLAST (columns J and K) was assigned only when a match with 100% pairwise sequence identity was found for a genus or species present in New Zealand. Values for each sample (columns L–AM) indicate the number of sequences assigned to each ASV. ASVs that neither method could assign to a specific taxon are marked n/a; these likely reflect gaps or limitations in the available reference databases.
Abbreviations: nv = Sigaus nivalis; nt = Sigaus nitidus; au = Sigaus australis; F = female; M = male; n/a = sequence identity not available.
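Because the sample columns hold read counts per ASV, per-sample totals can be derived from the sheet once it is exported as text. The sketch below assumes the trnL sheet has been saved as a tab-separated file with the header in row 1 (the file name trnL.tsv and the function name sample_totals are hypothetical). Sample columns L–AM then correspond to fields 12–39.

```shell
# Sketch: per-sample read totals and number of ASVs detected, from a
# tab-separated export of one sheet. sample_totals is an illustrative name.
# Usage: sample_totals FILE FIRST_FIELD LAST_FIELD
sample_totals() (
  awk -F'\t' -v lo="$2" -v hi="$3" '
    { sub(/\r$/, "") }                          # tolerate Windows line endings
    NR == 1 {                                   # header row: remember sample names
      for (c = lo + 0; c <= hi + 0; c++) name[c] = $c
      next
    }
    {
      for (c = lo + 0; c <= hi + 0; c++) {
        reads[c] += $c                          # total sequences per sample
        if ($c + 0 > 0) asvs[c]++               # ASVs with at least one sequence
      }
    }
    END {
      for (c = lo + 0; c <= hi + 0; c++)
        printf "%s\t%d\t%d\n", name[c], reads[c], asvs[c]
    }
  ' "$1"
)

# Example (assumed export of the trnL sheet; columns L-AM are fields 12-39):
# sample_totals trnL.tsv 12 39
```

Each output row gives the sample name, its total sequence count, and the number of ASVs with a non-zero count.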
Description: Sheet rbcL - Processed rbcL
Amplicon Sequence Variant (ASV) data are provided in an Excel file (“ASV data”). A total of 112 unique plant ASVs were generated from the rbcL region (column A) amplified from 12 gut samples collected at Mount Hutt (columns J1–U1).
Taxonomic identification was performed using a custom reference database via the QIIME2 classify-consensus-blast algorithm (columns B–I). Values for each sample (columns J–U) indicate the number of sequences assigned to each ASV. ASVs that the custom reference database could not assign to a specific taxon are marked n/a; these likely reflect gaps or limitations in the available reference databases.
Abbreviations: nv = Sigaus nivalis; nt = Sigaus nitidus; au = Sigaus australis; F = female; M = male; n/a = sequence identity not available.
Funding:
The work was supported by funding from the Miss E. L. Hellaby Indigenous Grassland Research Trust, The Orthopterists’ Society and Entomological Society of New Zealand.
Methods
Sampling
Adult grasshoppers of Sigaus nitidus, Sigaus nivalis, and Sigaus australis used for gut content analysis were collected from Mount Hutt (−43.5118, 171.5492; authorization number: 49878-RES), Foggy Peak (−43.294107, 171.744770), and Craigieburn Range (−43.125750, 171.686239; authorization number: 97397-FLO) in New Zealand during the summer months (February–March) between 2020 and 2023. Collections were carried out with permission from the New Zealand Department of Conservation, the New Zealand Forest Service, and local ski area operators. Grasshoppers were sampled from alpine habitats at elevations between 1340 m and 1700 m above sea level, within areas of >100 m², to ensure access to the same range of food plants. Specimens were preserved in 99% ethanol, with an incision made in the abdomen to maximize DNA preservation in the crop and gut.
DNA extraction and amplification
DNA was extracted from crop contents of 28 adult grasshoppers using the GeneJET Plant DNA extraction Kit (Thermo Fisher Scientific) according to the manufacturer’s instructions. Two segments of the chloroplast genome, the trnL intron and rbcL, were amplified by polymerase chain reaction (PCR) (Erickson et al. 2017; Kress and Erickson 2007). Amplicons generated with customised primers and sample barcodes were sequenced on a NovaSeq platform, yielding 250 bp paired-end (PE) reads with > 50,000 tags per sample.
Bioinformatic analysis
The resulting 250 bp PE sequences were filtered and denoised using DADA2 in the QIIME2 (version 2024.5) platform with the following commands:
1. Converting data
# fastq -> qza format (QIIME2 format)
Ref: How to import data for use with QIIME 2 - Microbiome marker gene analysis with QIIME 2
# For paired-end sequences, use --type 'SampleData[PairedEndSequencesWithQuality]', --input-format PairedEndFastqManifestPhred33V2, and --output-path paired-end-demux.qza
qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path manifest.txt \
--output-path single-end-demux.qza \
--input-format SingleEndFastqManifestPhred33V2
2. DADA2 denoising
# Denoising
Ref: dada2 - Microbiome marker gene analysis with QIIME 2
# For paired-end sequences, use qiime dada2 denoise-paired with --i-demultiplexed-seqs paired-end-demux.qza, and replace --p-trunc-len 0 with --p-trunc-len-f 0 and --p-trunc-len-r 0
qiime dada2 denoise-single \
--i-demultiplexed-seqs single-end-demux.qza \
--p-trunc-len 0 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza
3. Data visualization (qzv files can be visualized at https://view.qiime2.org/)
# Denoising summary: percentage and number of sequences remaining after filtering, denoising, and chimera removal (and merging, if paired-end)
qiime metadata tabulate \
--m-input-file denoising-stats.qza \
--o-visualization denoising_stats
# ASV summary table: list of consensus sequences and sequence length statistics
qiime feature-table tabulate-seqs \
--i-data rep-seqs.qza \
--o-visualization rep-seqs
# ASV table: export the number of sequences per sample per ASV
qiime tools export \
--input-path table.qza \
--output-path table
4. Blasting ASVs against NCBI custom reference database
# NCBI_fasta_file.qza and NCBI_taxonomic_lineages.qza were created using the workflow of Dubois et al. 2022
qiime feature-classifier classify-consensus-blast \
--i-query rep-seqs.qza \
--i-reference-reads NCBI_fasta_file.qza \
--i-reference-taxonomy NCBI_taxonomic_lineages.qza \
--o-classification classified_sequence.qza \
--o-search-results searchresults.qza
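To see how many ASVs the classifier left without a taxon, the classification can be exported and tallied. The sketch below assumes that classified_sequence.qza has been exported with qiime tools export into a folder containing taxonomy.tsv, that the file has a header row with the taxon label in the second column, and that unclassified ASVs carry the label "Unassigned". The function name count_assigned and the export folder name are hypothetical.

```shell
# Sketch: tally classified versus unclassified ASVs in an exported taxonomy table.
# Assumes a tab-separated file with a header row and the taxon label in field 2,
# where unclassified ASVs are labelled "Unassigned". count_assigned is an illustrative name.
count_assigned() (
  awk -F'\t' '
    NR == 1 { next }                            # skip header row
    $2 == "Unassigned" { unassigned++; next }
    { assigned++ }
    END { printf "assigned\t%d\nunassigned\t%d\n", assigned, unassigned }
  ' "$1"
)

# Example (assumed export folder):
# qiime tools export --input-path classified_sequence.qza --output-path classification
# count_assigned classification/taxonomy.tsv
```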
BLAST and custom reference database
Identification of plant taxa was performed by comparing ASVs against a global custom QIIME2 trnL reference database, created using the DB4Q2 workflow (Dubois et al. 2022). This reference was built from FASTA-format nucleotide data retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/) in September 2024, using the following Entrez text queries:
trnL:
(viridiplantae[Organism] AND (trnL[Gene Name] OR tRNA-Leu[Title] OR (trnL-trnF[Title] AND intergenic spacer[Title]) OR trnL[Title] AND 100:500000[Sequence Length]) AND (chloroplast[Gene Name]) OR chloroplast[Title])
rbcL:
(viridiplantae[Organism] AND (rbcL[Gene Name] OR ribulose-1,5-bisphosphate carboxylase/oxygenase[Title] OR ribulose-1,5-bisphosphate carboxylase oxygenase[Title] OR rbcL[Title]) AND 100:500000[Sequence Length]) AND (chloroplast[Gene Name]) OR chloroplast[Title])
trnL ASV sequences that were not identified through the QIIME2 BLAST+ algorithm were further examined using the megablast function on the NCBI website to find the closest matches. Species-level identification was assigned only when a 100% pairwise identity match was found with plant species known to occur in New Zealand. The presence or absence of identified plant taxa in the Southern Alps was verified using the New Zealand Plant Conservation Network (https://www.nzpcn.org.nz/) and iNaturalist (https://www.inaturalist.org/).
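The 100% identity criterion can be applied mechanically to a downloaded hit table before the manual check of New Zealand occurrence. The sketch below is not the authors' procedure, which used the NCBI website. It assumes the megablast hits were saved as a tab-separated table in the standard BLAST tabular column order (query ID, subject ID, percent identity, and so on). The function name perfect_hits and the file name are hypothetical.

```shell
# Sketch: keep only hits with 100% pairwise identity from a BLAST tabular hit table.
# Assumes tab-separated fields in the standard order: query ID, subject ID,
# percent identity, ... ; lines starting with "#" are treated as comments.
# perfect_hits is an illustrative name.
perfect_hits() (
  awk -F'\t' '
    /^#/ { next }                               # skip comment lines
    NF >= 3 && $3 + 0 == 100 { print $1 "\t" $2 }
  ' "$1" | sort -u
)

# Example (assumed file name):
# perfect_hits megablast_hits.tsv
```

The filter checks percent identity only and does not consider alignment length or query coverage. Whether each matched species occurs in New Zealand still has to be verified separately, as described above.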
References
Dubois B, Debode F, Hautier L, et al (2022) A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data. BMC Genomic Data 23:1–14. https://doi.org/10.1186/s12863-022-01067-5
Erickson DL, Reed E, Ramachandran P, et al (2017) Reconstructing a herbivore’s diet using a novel rbcL DNA mini-barcode for plants. AoB Plants 9:. https://doi.org/10.1093/aobpla/plx015
Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One 2:. https://doi.org/10.1371/journal.pone.0000508
