Sequencing data for seabird eDNA in long-nosed fur seal diets from southeastern Australia
Data files: May 17, 2024 version (41.45 MB)
Abstract
Wildlife conflicts require robust quantitative data on incidence and impacts, particularly among species of conservation and cultural concern. We apply a multi-assay framework to quantify predation in southeastern Australia, where complex management implications and calls for predator culling have grown despite a paucity of data on seabird predation by recovering populations of long-nosed fur seals (Arctocephalus forsteri). We use two ecological surveillance techniques to analyse this predator’s diet: traditional morphometric (prey hard-part) analysis and environmental DNA metabarcoding (genetic) analysis using an avian-specific primer for the 12S ribosomal RNA (rRNA) gene. Together these provide managers with estimates of predation incidence, the number of seabird species impacted and the relative importance of each prey species to the predator. DNA metabarcoding identified additional seabird taxa and provided relative quantitative information where multiple prey species occurred within a sample, while parallel use of both genetic and hard-part analyses revealed a greater diversity of taxa than either method alone. Using data from both assays, the estimated frequency of occurrence of seabird predation by long-nosed fur seals ranged from 9.1–29.3% of samples, with up to six prey species detected. The most common seabird prey was the culturally valued little penguin (Eudyptula minor), which occurred in 6.1–25.3% of samples, a higher incidence than previously reported from traditional morphological assays alone. Because the little penguin is a species of conservation concern, we then explored haplotype diversity in its DNA to provide a preliminary estimate of the number of individuals consumed. Polymorphism analysis of consumed little penguin DNA identified five distinct mitochondrial haplotypes, representing a minimum of 16 individual penguins consumed across 10 of the 99 fur seal scat samples collected across southeastern Australia.
We recommend rapid uptake and development of cost-effective genetic techniques and broader spatiotemporal sampling of fur seal diets to further quantify predation and identify hotspots of concern for wildlife conflict management.
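The two summary statistics reported above can be made concrete with a short sketch. Frequency of occurrence is the percentage of scat samples in which a prey taxon was detected; if all 99 samples are taken as the denominator (an assumption, since the publication may use different denominators for different assays), the reported 9.1% and 29.3% correspond to 9 and 29 samples, and 6.1% and 25.3% to 6 and 25 samples. The `minimum_individuals` function shows one plausible counting rule (each distinct haplotype counted once per scat, with different scats assumed to contain different prey). It is a hypothetical illustration, not necessarily the rule used by Hardy et al. (2024), and the sample and haplotype labels are invented.

```python
"""Frequency of occurrence (FO) and a minimum-number-of-individuals count.
Function names and example labels are hypothetical illustrations."""


def frequency_of_occurrence(n_positive: int, n_samples: int) -> float:
    """Percentage of samples in which a prey taxon was detected."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 100.0 * n_positive / n_samples


def minimum_individuals(haplotypes_per_sample: dict[str, set[str]]) -> int:
    """Count each distinct haplotype once per scat sample.

    Assumption: separate scats contain separate prey individuals, so a
    haplotype seen in two scats counts as two individuals. This is one
    plausible rule, not necessarily the authors' method.
    """
    return sum(len(haps) for haps in haplotypes_per_sample.values())
```

For example, `minimum_individuals({"scat_A": {"H1", "H2"}, "scat_B": {"H1"}})` returns 3: haplotype H1 is counted twice because it appears in two different scats.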
README: Sequence data for seabird eDNA in long-nosed fur seal diets from southeastern Australia
https://doi.org/10.5061/dryad.stqjq2cb3
The dataset contained herein comprises seabird DNA sequences from the 12S ribosomal RNA gene, extracted from 99 long-nosed fur seal faecal (scat) samples collected at multiple time points and locations across southern New South Wales and Victoria, as described below and in greater detail in the associated publication. We provide sequenced, demultiplexed and merged paired-end reads in .fastq files that have been minimally filtered. Below we outline the steps applied to these shared data and how the authors chose to further filter sequences. More processed versions of these data are available upon request.
Data contained in .fastq files includes:
- All seabird sequences found within submitted samples, sequenced using the Bird12sa/h primer (Cooper, 1994). Methods for sample screening, diagnostic PCR and sequencing are described by Hardy et al. (2024); we summarise the key steps here.
- Using the avian-specific Bird12sa/h assay, 32 of the 99 samples showed target amplicons in one or both PCR duplicates at neat DNA concentration; all extraction and PCR controls were negative. DNA extracts of these 32 bird-positive samples, together with two extraction blanks and one positive control (n = 35 samples for sequencing), were therefore sent for quantitative PCR (qPCR), cleanup, sample-based rarefaction and extrapolation sampling curves, determination of appropriate sequencing depth (< 10,000 reads per sample) and next-generation sequencing, performed on an Illumina MiSeq by the Ramaciotti Centre for Genomics (RCG), University of New South Wales.
- There, a single-step fusion-tagging PCR was used to attach unique MID (Multiplex IDentifier) tag combinations and next-generation sequencing (NGS) adaptors to amplicons of the Bird12sa/h assay, assigning each sample its own tag combination.
- Amplicons were purified and pooled at RCG in equimolar concentrations to form a library, which was sequenced with a 150 bp paired-end sequencing kit (Illumina MiSeq v2 Nano 150 bp).
- After sequencing, RCG ‘demultiplexed’ the reads, assigning each to its original sample by its unique MID tags.
- For each sample sequenced on the Illumina MiSeq platform, RCG generated paired-end forward and reverse reads. These were merged based on their 70 bp overlap, and the forward and reverse primers, sample indexes (MID tags) and adapters were then trimmed from the ends of each sequence. The trimmed sequences are described in the metadata spreadsheet and in the description of the data and file structure below.
- Finally, sequences were discarded if they did not contain exact matches to both the forward and reverse PCR primers, tags and adaptor sequences, failed to pair, or were more than 10% shorter than the expected amplicon length (expected 220 bp; sequences below 200 bp were discarded). The data in the shared .fastq files are therefore minimally filtered, but have not been fully filtered to remove sequences that may have resulted from sequencing error or that fall below various sequence-quality thresholds.
- We share this partially processed, minimally filtered dataset because users differ greatly in how conservatively they treat sequencing data. Users can therefore reprocess these data with their preferred sequence-quality thresholds, in a format compatible with established eDNA processing pipelines.
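The processing steps above, and the kind of extra quality filtering users might apply, can be sketched in Python. This is a simplified illustration, not the pipeline used by RCG or the authors: `merge_pair` assumes the fixed 70 bp overlap described above (real read mergers search for the best overlap and score mismatches more carefully), quality strings are assumed to be Phred+33 encoded, the 200 bp minimum length follows the cut-off described above, and the expected-error threshold `max_ee=1.0` is a hypothetical user choice. Because the shared files are already merged and trimmed, users would typically apply only `read_fastq` and `passes_filters` to them. All function names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Complement table for reverse-complementing DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class Read:
    header: str
    seq: str
    qual: str  # quality string, assumed Phred+33 encoded


def read_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield 4-line FASTQ records from an open file or any iterable of lines."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield Read(header, seq, qual)


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: Read, r2: Read, overlap: int = 70) -> Read:
    """Merge a read pair, assuming a known, fixed overlap (a simplification)."""
    if not 0 < overlap <= min(len(r1.seq), len(r2.seq)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    r2_seq, r2_qual = revcomp(r2.seq), r2.qual[::-1]  # put R2 on R1's strand
    mid_seq, mid_qual = [], []
    # Within the overlap keep the higher-quality base call; comparing Phred+33
    # characters directly works because a higher ASCII code means higher quality.
    for a, qa, b, qb in zip(r1.seq[-overlap:], r1.qual[-overlap:],
                            r2_seq[:overlap], r2_qual[:overlap]):
        if qa >= qb:
            mid_seq.append(a)
            mid_qual.append(qa)
        else:
            mid_seq.append(b)
            mid_qual.append(qb)
    seq = r1.seq[:-overlap] + "".join(mid_seq) + r2_seq[overlap:]
    qual = r1.qual[:-overlap] + "".join(mid_qual) + r2_qual[overlap:]
    return Read(r1.header, seq, qual)


def expected_errors(qual: str) -> float:
    """Sum of per-base error probabilities, 10^(-Q/10), with Q = ord(c) - 33."""
    return sum(10 ** (-(ord(c) - 33) / 10) for c in qual)


def passes_filters(read: Read, min_len: int = 200, max_ee: float = 1.0) -> bool:
    """Keep reads of at least min_len bp whose expected errors are <= max_ee.

    min_len mirrors the 200 bp cut-off above; max_ee is a hypothetical choice.
    """
    return len(read.seq) >= min_len and expected_errors(read.qual) <= max_ee
```

For instance, `with open("sample.fastq") as fh: kept = [r for r in read_fastq(fh) if passes_filters(r, max_ee=0.5)]` would retain only reads meeting a stricter expected-error threshold, where `sample.fastq` is a placeholder for any of the shared files.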
Description of the data and file structure
A detailed description and metadata covering the file naming convention, sample locations and dates, and additional supporting information on all samples screened for both hard parts and genetic remains are contained in the metadata file "PINP_SEABIRD_META_AND_DATA.xlsx", uploaded alongside the .fastq files. Its tabs, listed below, also clarify how the data were processed to arrive at the genetic data shared in the .fastq files:
- README: A brief directory of the tabs in this spreadsheet, including a summary of the colours used to describe samples in the BIRD_DATA and BIRD_OTUs tabs.
- FILES_NAMES: Describes the file naming structure of the .fastq files
- Sequencing_samples_submission: contains metadata on how the data were processed prior to inclusion in this data publication, including a detailed list of samples that tested positive in the diagnostic PCR and were submitted for sequencing on the Illumina MiSeq. This sheet includes information on the adapters, sample indexes (MID tags) and primers used, all of which were also trimmed from the sequences. Note that the sequencing platform originally generated two .fastq files for each sample, containing paired-end forward and reverse reads (2 × ~150 bp fragments with a 70 bp overlap). To produce the single .fastq file per sample made available here, the authors merged the forward and reverse reads based on this 70 bp overlap and then trimmed the forward and reverse primers, sample indexes (MID tags) and adapters from the ends of each sequence.
- BIRD_DATA: includes fully processed presence/absence data for hard parts and genetic remains identified in each sample, provided as additional supporting information to accompany these data.
- BIRD_OTUs: includes the outputs of our custom genetic data quality filtering steps to accompany the data files shared herein.
- ESM_Tables_Figs: includes additional supporting information published online with the manuscript, reproduced here alongside the data for ease of access.
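One way to load the metadata workbook and confirm that the tabs listed above are present is sketched below. It assumes pandas and an Excel reader engine (such as openpyxl) are installed, that the workbook sits in the working directory under the file name given above, and that the tab names in the workbook match this list exactly; `missing_tabs` and `load_metadata` are hypothetical helper names.

```python
# Tab names as listed in this README (assumed to match the workbook exactly).
EXPECTED_TABS = (
    "README",
    "FILES_NAMES",
    "Sequencing_samples_submission",
    "BIRD_DATA",
    "BIRD_OTUs",
    "ESM_Tables_Figs",
)


def missing_tabs(sheet_names) -> list:
    """Return the expected tabs that are absent from sheet_names."""
    present = set(sheet_names)
    return [tab for tab in EXPECTED_TABS if tab not in present]


def load_metadata(path: str = "PINP_SEABIRD_META_AND_DATA.xlsx") -> dict:
    """Read every tab into a dict mapping tab name -> pandas DataFrame."""
    import pandas as pd  # imported lazily so missing_tabs works without pandas

    sheets = pd.read_excel(path, sheet_name=None)  # sheet_name=None loads all tabs
    absent = missing_tabs(sheets)
    if absent:
        raise KeyError(f"workbook is missing expected tabs: {absent}")
    return sheets
```

After `sheets = load_metadata()`, `sheets["BIRD_DATA"]` would hold the presence/absence table. Tabs laid out as free text rather than a single table (such as README) may need extra `read_excel` arguments, for example `header=None`.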
Sharing/Access information
These data are released under the terms of the Creative Commons Attribution License (CC BY). Use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and the original publication in Frontiers in Marine Science (referenced below) is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Code/Software
Processing of these data is described in the Methods section adapted from the manuscript linked below; code and processed data are also available in the project's public analysis repository on GitHub: https://github.com/NatashaAHardy/pinp_stats/tree/main.
We chose to provide demultiplexed and minimally filtered data, as described in Methods, in order to allow future users to filter and process our sequencing data with the level of scrutiny and stringency they prefer. In the manuscript cited below, Hardy et al. (2024), we suggest software and provide links to code used to process the data from .fastq files through to analyses described in the manuscript.
Associated Primary Research Article
Hardy, N. A., Berry, T. E., Bunce, M., Bott, N., Figueira, W. F., & McIntosh, R. R. (2024). Quantifying wildlife conflicts with metabarcoding and traditional dietary analyses: applied to seabird predation by long-nosed fur seals. Frontiers in Marine Science, 11, 1288769. doi: 10.3389/fmars.2024.1288769
Additional Literature Cited
Cooper, A. (1994). DNA from museum specimens. In B. Herrmann & S. Hummel (Eds.), Ancient DNA: Recovery and Analysis of Genetic Material from Paleontological, Archaeological, Museum, Medical, and Forensic Specimens (pp. 149–165). New York, NY, USA: Springer. doi: 10.1007/978-1-4612-4318-2_10