
Data from: DNA metabarcoding for biodiversity monitoring in a national park: screening for invasive and pest species

Cite this dataset

Hardulak, Laura et al. (2020). Data from: DNA metabarcoding for biodiversity monitoring in a national park: screening for invasive and pest species [Dataset]. Dryad.


  1. DNA metabarcoding was utilized for a large-scale, multi-year assessment of biodiversity in Malaise trap collections from the Bavarian Forest National Park (Germany, Bavaria). 
  2. Principal Component Analysis of read count-based biodiversities revealed clustering in concordance with whether collection sites were located inside or outside of the National Park.
  3. Jaccard distance matrices of the presences of BINs at collection sites in the two survey years (2016 and 2018) were significantly correlated.
  4. Overall similar patterns in the presence of total arthropod BINs, as well as BINs belonging to four major arthropod orders across the study area, were observed in both survey years, and are also comparable with results of a previous study based on DNA barcoding of Sanger-sequenced specimens.
  5. A custom reference sequence library was assembled from publicly available data to screen for pest or invasive arthropods among the specimens or from the preservative ethanol.
  6. A single 98.6% match to the invasive bark beetle Ips duplicatus was detected in an ethanol sample. This species has not previously been detected in the National Park.


Pre-processing and clustering of sequence data

All FASTQ files generated were combined, although they were sequenced on separate runs throughout the study period. Sequence processing was performed with the VSEARCH v2.4.3 suite (Rognes et al., 2016) and cutadapt v1.14 (Martin, 2011). Because some runs did not yield reverse reads of sufficiently high quality to enable paired-end merging, only forward reads were utilized. Forward primers were removed with cutadapt. Quality filtering was performed with the fastq_filter program of VSEARCH (fastq_maxee 2, minimum length of 100 bp). Sequences were dereplicated with derep_fulllength, first at the sample level; the per-sample files were then concatenated into one fasta file, which was dereplicated again. Chimeric sequences were removed from the fasta file using uchime_denovo. Remaining sequences were clustered into OTUs at 97% identity with cluster_size, and an OTU table was created with usearch_global. To reduce likely false positives, a cleaning step was employed which excluded read counts in the OTU table that represented less than 0.01% of the total read count for their respective sample (see Elbrecht and Steinke, 2019).
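The cleaning step can be illustrated with a minimal Python sketch. It assumes the OTU table is held as a dictionary mapping sample names to per-OTU read counts; the function name `clean_otu_table`, the data layout, and the example counts are hypothetical and are not taken from the study's own scripts.

```python
def clean_otu_table(table, min_fraction=0.0001):
    """Zero out counts below min_fraction of each sample's total reads.

    table maps sample name -> {OTU id -> read count} (assumed layout).
    min_fraction=0.0001 corresponds to the 0.01% threshold in the text.
    """
    cleaned = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        threshold = total * min_fraction
        # Counts below the per-sample threshold are treated as likely
        # false positives and set to zero; all others are kept unchanged.
        cleaned[sample] = {
            otu: (n if n >= threshold else 0)
            for otu, n in counts.items()
        }
    return cleaned


# Hypothetical example: one sample with 100,000 reads in total,
# so the threshold is 10 reads.
example = {"trap01": {"OTU_1": 99_000, "OTU_2": 995, "OTU_3": 5}}
cleaned = clean_otu_table(example)
print(cleaned["trap01"])
```

Because the threshold is computed per sample, the same absolute read count can be kept in a shallowly sequenced sample and removed from a deeply sequenced one.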


Construction of reference databases and sequence identification

BIN-based reference library

All arthropod sequences on BOLD were downloaded (FASTA format, including private and public data) to create a general reference database containing hierarchical taxonomic information and BINs. To create this database, downloaded fasta files were concatenated and imported into Geneious (v10, Biomatters, Auckland, New Zealand) (Kearse et al., 2012). To aid the monitoring of species of interest, a broad list of potentially relevant arthropod species was compiled from the following literature sources: Index of Economically Important Lepidoptera (Zhang, 1994), and Die Forstschädlinge Europas (“The Forest Pests of Europe”) (Pschorn-Walcher and Schwenke, 1982). Of the species in the Index of Economically Important Lepidoptera, 2,684 species names were found on BOLD. Of the species in the Forest Pests of Europe, 294 species names were found on BOLD. About two thirds of these species (1,962/2,978) had BINs. OTUs were BLASTed (Megablast, default parameters) against the downloaded database. The result was joined to the OTU table in LibreOffice, where the spreadsheet of pest names and BINs was used to cross-check with the BLAST results. All of these BINs and species names available on BOLD were added to a publicly available dataset named “Dataset - DS-BWPST Database of Pest Species of Insects in Germany” (Dataset DOI -

Pest and invasive species custom reference libraries

Reference sequences for species from the following sources were compiled into a list of 1,017 names: Nature protection warning list of the German Federal Office for Nature Conservation in Bonn (“Erstellung einer Warnliste in Deutschland noch nicht vorkommender invasiver Tiere und Pflanzen”) (Rabitsch et al., 2013), terrestrial arthropods only; "Die invasiven gebietsfremden Arten der Unionsliste der Verordnung (EU) Nr.1143/2014 -Erste Fortschreibung 2017" (Nehring and Skowronek); The International Union for Conservation of Nature’s Red List of Threatened Species (IUCN, 2019), accessed online, filter criteria of phylum = Arthropoda, land regions = Europe, Geographical scale = global, Red List Category = Critically Endangered, Endangered, Extinct in the wild, Lower risk/Conservation dependent, near threatened, or vulnerable; the European Plant Protection Global Database (, filter criteria of "Germany"; as well as the following 28 widely known invasive species (with one synonym), if not already listed: Periplaneta americana (Linnaeus, 1758), Harmonia axyridis (Pallas, 1773), Stictocephala bisonia (Kopp and Yonke, 1977), Anoplophora chinensis (Forster, 1771), Corythucha ciliata (Say, 1832), Rhagoletis completa (Cresson, 1929), Sceliphron curvatum (Smith, 1870), Leptinotarsa decemlineata (Say 1824), Reticulitermes flavipes (Kollar, 1837), Anoplophora glabripennis (Motschulsky, 1853), Hulecoeteomyia japonica (Theobald, 1901), Aedes japonicus (Theobald, 1901), Aedes koreicus (Edwards, 1917), Dryocosmus kuriphilus (Yasumatsu, 1951), Aproceros leucopoda (Takeuchi, 1939), Cacyreus marshalli (Butler, 1898), Dreyfusia nordmannianae (Eckstein, 1890), Frankliniella occidentalis (Pergande, 1895), Leptoglossus occidentalis (Heidemann, 1910), Cameraria ohridella (Deschka and Dimic, 1986), Cydalima perspectalis (Walker, 1859), Monomorium pharaonis (Linnaeus, 1758), Hypoponera punctatissima (Roger, 1859), Phyllonorycter robiniella (Clemens, 1859), Drosophila suzukii (Matsumura, 1931), Trialeurodes vaporariorum (Westwood, 1856), Diabrotica virgifera (J.L. LeConte, 1868), Viteus vitifoliae (Fitch, 1855), Ectobius vittiventris (Costa, 1847).

Sequences were downloaded using the R (R Core Team, 2019) package BOLD (Chamberlain, 2018). Of the 1,004 total species names, 361 were found in BOLD. These were exported as a tab-separated file and processed into fasta format with Linux command lines. The remaining species were searched for on NCBI GenBank (advanced search, criteria including (“COI” OR “CO1” OR “COXI” OR “COX1”)). 41 of the species names were found and downloaded as fasta files. To combine the sequences from both sources into a single database and BLAST, we used BOLD_NCBI_Merger (Macher et al., 2017). The highest scoring pair of the top hit (NCBI BLAST+, outfmt 6) for each OTU was imported into LibreOffice, joined with the OTU table, and filtered. A taxonomic neighbor-joining tree was constructed using the BOLD website. All arthropod species and corresponding BINs on the list that were available on BOLD were added to a publicly available data set named “Dataset - DS-BFNWARN Bundesamt für Naturschutz Warnliste, Arthropoden” (Dataset DOI -
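Selecting one hit per OTU from tabular BLAST+ output can be sketched in Python as follows. This is an illustration under stated assumptions, not the workflow actually used (which relied on BOLD_NCBI_Merger and LibreOffice): the function name `best_hit_per_otu` is hypothetical, the example rows are made up, and "best" is approximated here as the row with the highest bit score per query. The column order is the standard default of `-outfmt 6`.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_otu(handle):
    """Return {query id -> row dict}, keeping the highest-bitscore row per query."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        score = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or score > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Hypothetical BLAST output with made-up identifiers and scores.
example = io.StringIO(
    "OTU_1\tref_A\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t530\n"
    "OTU_1\tref_B\t91.0\t300\t27\t0\t1\t300\t1\t300\t1e-110\t400\n"
    "OTU_2\tref_C\t99.3\t300\t2\t0\t1\t300\t1\t300\t1e-155\t545\n"
)
hits = best_hit_per_otu(example)
for otu, row in hits.items():
    print(otu, row["sseqid"], row["pident"])
```

The resulting dictionary is keyed by OTU identifier, so it can be joined to an OTU table on that key before filtering by percent identity.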

Biodiversity analysis

As DNA metabarcoding is not quantitative (Piñol et al., 2019; Krehenwinkel et al., 2017), we utilize presence-absence data of BINs recovered at >= 97% identity over geographical areas represented by Malaise trap locations to calculate many of the biodiversity metrics. The OTU table indicates which BINs (or higher corresponding taxa) were detected in each collection event. To calculate detection frequencies, all counts in the table greater than zero were set to one. In this way, row sums across the table indicate the number of samples from which a particular taxon was recovered, while column sums indicate the total number of taxa recovered from a sample. Presence-absence data for the homogenized samples for all traps from 2016 and 2018 were also analyzed together with a dataset from the Global Malaise Trap Program (GMTP) downloaded from BOLD, project "GMTPE Germany Malaise 2012" (see Geiger et al., 2016a). Frequencies of BIN detection throughout the growing seasons could then be compared for each of the three years. Bar, line, and area charts were created with ggplot2 (Wickham, 2016) or base R.
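The conversion to presence-absence data and the meaning of the row and column sums can be shown with a short Python sketch. It assumes a table with taxa as rows and samples as columns, as described above; the function names and the read counts are hypothetical.

```python
def to_presence_absence(table):
    """Set every count greater than zero to one."""
    return [[1 if n > 0 else 0 for n in row] for row in table]


def detection_summary(table):
    """Return (samples per taxon, taxa per sample) for a taxa x samples table."""
    pa = to_presence_absence(table)
    samples_per_taxon = [sum(row) for row in pa]       # row sums
    taxa_per_sample = [sum(col) for col in zip(*pa)]   # column sums
    return samples_per_taxon, taxa_per_sample


# Hypothetical read counts: 3 taxa (rows) x 4 samples (columns).
counts = [
    [120, 0, 15, 3],
    [0, 0, 48, 0],
    [7, 22, 0, 9],
]
samples_per_taxon, taxa_per_sample = detection_summary(counts)
print(samples_per_taxon)  # [3, 1, 3]
print(taxa_per_sample)    # [2, 1, 2, 2]
```

In this example the first taxon was detected in three of the four samples, and the second sample contained only one taxon, regardless of how many reads each detection was based on.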

The presence of BINs in the 2016 and 2018 samples was used to calculate Jaccard distances and dissimilarity matrices for traps inside and outside the National Park, with R packages vegan (Dixon, 2003) and betapart (Baselga and Orme, 2012). A Mantel test was performed to compare the study years in terms of their dissimilarities among trap sites, utilizing R packages geosphere (Hijmans et al., 2017) and ade4 (Dray and Dufour, 2007). ANOSIM tests to compare BIN compositions of trap sites inside and outside of the park were performed with the anosim function of vegan: Community Ecology Package (Oksanen et al., 2010). Additionally, Principal Component Analyses for the 2016 and 2018 taxonomic composition data for each trap site were performed based on 7-level taxonomic identifications of OTUs and their read counts, with the R package ampvis2 (Andersen et al., 2018), amp_ordinate function, Hellinger transform.
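Two of the quantities named above, the Jaccard distance between trap sites and the Hellinger transform of read counts, can be written out in a few lines of Python. This is a sketch of the standard definitions, not a reproduction of the vegan, betapart, or ampvis2 calls used in the study; the site names, BIN labels, and counts are hypothetical placeholders.

```python
import math


def jaccard_distance(a, b):
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B| between two sets of BINs."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sites
    return 1.0 - len(a & b) / len(union)


def jaccard_matrix(sites):
    """Return {(site_p, site_q) -> distance} for all ordered pairs of sites."""
    names = sorted(sites)
    return {(p, q): jaccard_distance(sites[p], sites[q])
            for p in names for q in names}


def hellinger(counts):
    """Hellinger transform: square root of each count's share of the site total."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.sqrt(n / total) for n in counts]


# Hypothetical BIN presences at three trap sites.
sites = {
    "inside_1": {"BIN_A", "BIN_B", "BIN_C"},
    "inside_2": {"BIN_B", "BIN_C", "BIN_D"},
    "outside_1": {"BIN_D", "BIN_E"},
}
matrix = jaccard_matrix(sites)
print(matrix[("inside_1", "inside_2")])   # 0.5
print(matrix[("inside_1", "outside_1")])  # 1.0
print(hellinger([25, 75, 0]))
```

The Jaccard distance uses presence-absence information only, whereas the Hellinger transform operates on read counts and down-weights very abundant taxa before ordination.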


Bavarian State Ministry for Science and Art, Award: Barcoding Fauna Bavarica, BFB

Federal Ministry of Education and Research, Award: German Barcode of Life: BMBF FKZ 01LI1101 and 01LI1501