Dryad

Evaluating the genetic variation of the COI gene of Insecta: Implications for DNA barcoding, metabarcoding and species delimitation studies

Data files

Nov 05, 2020 version files 42.14 MB
Aug 31, 2021 version files 49.87 MB
Sep 14, 2022 version files 49.87 MB
Dec 16, 2022 version files 75.55 MB

Abstract

Genetic variation in the COI gene strongly affects the results of species delimitation studies. However, little research has comprehensively investigated genetic divergence in COI across Insecta. The fast-growing body of COI data in BOLD provides an opportunity to appraise this variation comprehensively. We calculated K2P distances for 64,414 insect species downloaded from BOLD. The match ratios of clustering analyses based on different thresholds were compared among 4,288 genera (35,068 species). We also compared the match ratios obtained from two species delimitation methods: clustering analysis (a distance-based method) and bPTP analysis (a tree-based method). Furthermore, we tested the effectiveness of two different outputs of the bPTP analysis, bPTP_h and bPTP_ml. Approximately one-quarter of insect species showed high intraspecific genetic variation (> 3%); a conservative estimate of this proportion is 12.05-22.58%. Applying empirical thresholds (e.g., 2% and 3%) in clustering analysis may result in the overestimation of species diversity. In metabarcoding studies, a 3% threshold can provide only a rough estimate of insect diversity. For clustering analysis, the "threshOpt" or "localMinima" algorithms can provide an a priori threshold value for the researcher. Nevertheless, if the minimum interspecific genetic distance among congeneric species is greater than or equal to 2%, it is possible to avoid overestimating species diversity with the empirical thresholds. The match ratios of the bPTP_ml results were higher than those of the bPTP_h results, so the bPTP_ml results are recommended for bPTP analysis. With a properly selected threshold, clustering analysis may outperform bPTP analysis.
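For readers unfamiliar with the distance measure, the following is a minimal sketch of how a K2P (Kimura two-parameter) distance between two aligned sequences can be computed, and how a fixed threshold such as 3% turns pairwise distances into clusters. It is an illustration only and not the pipeline used to produce this dataset: the function names `k2p_distance` and `cluster_by_threshold` are hypothetical, the sequences are assumed to be pre-aligned and of equal length, sites with gaps or ambiguity codes are simply skipped, and the single-linkage grouping with a "less than or equal to" cutoff is an assumed rule chosen for simplicity.

```python
import math

PURINES = {"A", "G"}
BASES = {"A", "C", "G", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    x, y = 1 - 2 * p - q, 1 - 2 * q
    if x <= 0 or y <= 0:
        return math.inf  # sequences too divergent for the model
    return -0.5 * math.log(x * math.sqrt(y))


def cluster_by_threshold(seqs, threshold=0.03):
    """Single-linkage grouping: link any pair with distance <= threshold.

    Returns one cluster label per input sequence (assumed rule, see lead-in).
    """
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if k2p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]
```

Under this sketch, two sequences of the same species that differ by more than the chosen threshold end up in separate clusters, which is how high intraspecific variation can inflate a diversity estimate based on a fixed 2% or 3% cutoff.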