BAGS: an automated barcode, audit & grade system for DNA barcode reference libraries
Data files
Sep 01, 2020 version files 42.08 MB
Abstract
Biodiversity studies greatly benefit from molecular tools, such as DNA metabarcoding, which provides an effective identification tool in biomonitoring and conservation programmes. The accuracy of species-level assignment, and consequent taxonomic coverage, relies on comprehensive DNA barcode reference libraries. The role of these libraries is to support species identification, but accidental errors in the generation of the barcodes may compromise their accuracy. Here we present an R-based application, BAGS (Barcode, Audit & Grade System; https://github.com/tadeu95/BAGS), that performs automated auditing and annotation of cytochrome c oxidase subunit I (COI) sequence libraries, for a given taxonomic group of animals, available in the Barcode of Life Data System (BOLD). It then applies a qualitative ranking system that assigns one of five grades (A to E) to each species in the reference library, according to the attributes of the data and the congruency of species names with sequences clustered in Barcode Index Numbers (BINs). Our goal is to allow researchers to obtain the most useful and reliable data, highlighting and segregating records according to their congruency. Several tests were performed to assess its usefulness and limitations. BAGS fills a significant gap in the current landscape of DNA barcoding research tools by quickly screening reference libraries to gauge the congruence status of data and facilitate the triage of ambiguous data for subsequent review. BAGS therefore has the potential to become a valuable addition to forthcoming DNA metabarcoding studies, contributing in the long term to a global improvement in the quality and reliability of public reference libraries.
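The congruency check described in the abstract can be illustrated with a minimal R sketch. It is not the BAGS implementation: the toy records, the column names (`species`, `bin`) and the output table are all hypothetical, and the criteria that map these counts to grades A to E are not stated here, so the sketch only reports how many BINs each species falls into and whether any of those BINs also contains another species name.

```r
# Hypothetical toy records; column names are illustrative, not the BOLD export format.
records <- data.frame(
  species = c("Species a", "Species a", "Species b", "Species c", "Species c"),
  bin     = c("BIN:1", "BIN:1", "BIN:2", "BIN:2", "BIN:3"),
  stringsAsFactors = FALSE
)

# Number of distinct BINs each species name is found in
bins_per_species <- tapply(records$bin, records$species,
                           function(x) length(unique(x)))

# Number of distinct species names found in each BIN
species_per_bin <- tapply(records$species, records$bin,
                          function(x) length(unique(x)))

# TRUE for records whose BIN holds more than one species name
record_in_shared_bin <- as.vector(species_per_bin[records$bin]) > 1

# A species shares a BIN if any of its records sits in such a BIN
shares_bin <- tapply(record_in_shared_bin, records$species, any)

congruence <- data.frame(
  species    = names(bins_per_species),
  n_bins     = as.integer(bins_per_species),
  shares_bin = as.logical(shares_bin[names(bins_per_species)]),
  row.names  = NULL
)

print(congruence)
```

In this toy data, "Species a" has one BIN of its own, "Species b" and "Species c" share a BIN, and "Species c" is additionally split across two BINs.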
Methods
BAGS was developed entirely in the R language. The application was deployed with shinyapps.io so that it can be run remotely.
The R script presented here contains the BAGS version used during the performance assessments. Later developments of BAGS and further information can be found at:
https://github.com/tadeu95/BAGS
To use BAGS remotely, please use:
Usage notes
File list (Files within Appendixes_Fontes_etal_2020.rar)
fontes_et_al_BAGS.R
fontes_et_al_ambiguous_expressions_appendix1.txt
fontes_et_al_reference_libraries_palaemonidae_appendix2.xlsx
fontes_et_al_reference_libraries_gradeassessment_appendix3.xlsx
fontes_et_al_manual_curation_appendix4.xlsx
fontes_et_al_grade_c_nj_trees_appendix5.rar
File descriptions
fontes_et_al_BAGS.R – R script of the BAGS application, which can be used to launch BAGS locally within your R environment (e.g. using the R GUI or RStudio).
fontes_et_al_ambiguous_expressions_appendix1.txt – List of ambiguous expressions removed from the species name in the BAGS pipeline. The records with species names containing these expressions are not filtered out, although the expression is removed from the name. IUPAC nucleotides used to count the number of bases in each sequence are also shown.
fontes_et_al_reference_libraries_palaemonidae_appendix2.xlsx – Reference libraries (all taxa, marine and non-marine) generated with BAGS for the family Palaemonidae to test the marine / non-marine filter.
fontes_et_al_reference_libraries_gradeassessment_appendix3.xlsx – Reference libraries generated with BAGS used for the grade assignment performance test.
fontes_et_al_manual_curation_appendix4.xlsx – List of species manually curated for the grading assignment.
fontes_et_al_grade_c_nj_trees_appendix5.rar – Neighbour-joining trees created within the BOLD platform, which were used to categorize the grade C species as monophyletic or non-monophyletic.
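The two steps documented in fontes_et_al_ambiguous_expressions_appendix1.txt (removing ambiguous expressions from species names while keeping the record, and counting bases using IUPAC nucleotide codes) can be sketched in R as follows. This is an illustration under stated assumptions, not the code from fontes_et_al_BAGS.R: the two expressions and the set of IUPAC codes below are placeholders for the actual lists in appendix 1, and the function names are hypothetical.

```r
# Placeholder expressions for illustration; the actual list is in appendix 1.
ambiguous_expressions <- c("cf.", "aff.")

# Remove each expression from a species name without discarding the record.
clean_species_name <- function(name, expressions = ambiguous_expressions) {
  for (expr in expressions) {
    # fixed = TRUE matches the expression literally (the "." is not a wildcard)
    name <- gsub(expr, "", name, fixed = TRUE)
  }
  # Collapse the whitespace left behind by the removal
  trimws(gsub("\\s+", " ", name))
}

# Assumed set of IUPAC nucleotide codes; appendix 1 lists the codes BAGS uses.
iupac_codes <- strsplit("ACGTURYSWKMBDHVN", "")[[1]]

# Count the characters of a sequence that are IUPAC nucleotide codes,
# so that gap characters such as "-" do not add to the length.
count_bases <- function(sequence) {
  chars <- strsplit(toupper(sequence), "")[[1]]
  sum(chars %in% iupac_codes)
}

clean_species_name("Palaemon cf. elegans")  # "Palaemon elegans"
count_bases("ACGTN-ACG")                    # 8
```

Matching the expressions literally avoids accidental removals caused by regular-expression metacharacters in abbreviations that end in a full stop.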
Notes
BAGS manuscript:
João T Fontes, Pedro E Vieira, Torbjørn Ekrem, Pedro Soares, Filipe O Costa. BAGS: an automated Barcode, Audit & Grade System for DNA barcode reference libraries. Molecular Ecology Resources. 2020.
A preprint version can be found on Authorea:
https://www.authorea.com/users/317355/articles/457103-bags-an-automated-barcode-audit-grade-system-for-dna-barcode-reference-libraries
Comments and requests should be addressed to Filipe Costa (fcosta@bio.uminho.pt). For specific questions regarding the R script, please contact João Fontes (jtadeusfontes@gmail.com) or Pedro Vieira (pedroefrvieira@gmail.com).