
A specimen-level dataset of functional traits and DNA barcodes for Chinese bees

Cite this dataset

Xie, Tingting et al. (2023). A specimen-level dataset of functional traits and DNA barcodes for Chinese bees [Dataset]. Dryad.


The full potential of DNA barcodes for profiling functional trait diversity has yet to be determined in plants and animals. We therefore outline a general framework for quantifying the functional trait diversity of insect community DNA, propose three methods for achieving this, and assess their accuracy.

We built a novel dataset of traits and DNA barcodes for wild bees in China. We developed an informatics framework for phylogeny-based integration of these data and prediction of traits for any subject barcodes, and compared it to two BLAST-based methods of trait assignment.

Under the specimen-level dataset, the rate and accuracy of trait assignment dropped as the distance between the query and its nearest reference member increased, though phylogenetic assignment was found to perform best under several criteria. For a wider range of compiled traits, conservative life-history traits showed the highest rates of assignment; for example, sociality was predicted with confidence at 53%, parasitism at 44%, and nest location at 33%.

Bayesian phylogenetic assignment of traits showed a particular advantage over distance-based methods: it returned incorrect state predictions at a much lower rate when the query lacked relatives in the reference data. Automated, taxon-independent trait assignment might be applied at scale to either barcodes or metabarcodes in any invertebrates. With the further generation of integrated barcode and trait data, the assignment accuracy of the method is likely to increase.
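To make the idea of distance-based trait assignment concrete, the following is a minimal sketch and not the authors' pipeline. It assumes pre-aligned barcode sequences of equal length, uses a simple uncorrected p-distance in place of BLAST, and declines to assign a trait when the nearest reference is farther than a cutoff. The function names and the `max_distance` value of 0.05 are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a non-ACGT character (gap, N) are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not compared:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in compared) / len(compared)


def assign_trait(
    query: str,
    references: List[Tuple[str, str]],
    max_distance: float = 0.05,  # hypothetical cutoff, not taken from the dataset
) -> Tuple[Optional[str], float]:
    """Return the trait state of the nearest reference and its distance.

    The trait is None when the nearest reference exceeds max_distance,
    i.e. the query is left unassigned rather than guessed.
    """
    best_trait: Optional[str] = None
    best_dist = float("inf")
    for ref_seq, ref_trait in references:
        d = p_distance(query, ref_seq)
        if d < best_dist:
            best_trait, best_dist = ref_trait, d
    if best_dist > max_distance:
        return None, best_dist
    return best_trait, best_dist


# Toy reference set: (aligned barcode fragment, sociality state)
toy_references = [
    ("ACGTACGTAC", "social"),
    ("TTGTACGAAC", "solitary"),
]
strict = assign_trait("ACGTACGTAA", toy_references)
relaxed = assign_trait("ACGTACGTAA", toy_references, max_distance=0.15)
```

In this toy case the query differs from its nearest reference at one site in ten, so the strict call leaves it unassigned while the relaxed call assigns the nearest reference's state. That mirrors the trade-off described above between the rate of assignment and the distance to the nearest reference member.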


For the newly created specimen-level database, morphological traits were measured using either a Motic SMZ-161-BLED stereomicroscope with an eyepiece ruler or a Zeiss Discovery V20 stereomicroscope. If a length exceeded the range of the microscope, we used a digital caliper to perform the measurement. Body length was measured as the distance between the antennal socket and the metasomal apex in lateral view; head width was taken as the widest point between the outer margins of the compound eyes in frontal view; tongue length was the distance between the prementum base and the glossa tip; hind leg length was measured as the sum of the lengths of the coxa, trochanter, femur, tibia and tarsus (including distitarsus); intertegular distance (ITD) was measured as the distance between the nearest inner edges of the tegulae in dorsal view; the length of the hairs on the mesonotum was used as a proxy for hair length; forewing length was recorded as the distance between the wing base and the anal angle.
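Hind leg length is the only trait above that is derived from several measurements. A minimal sketch of that sum follows; the class name, field names and the use of millimetres are assumptions made for illustration and are not taken from the dataset files.

```python
from dataclasses import dataclass


@dataclass
class HindLegSegments:
    """Segment lengths for one specimen's hind leg (units assumed to be mm)."""

    coxa: float
    trochanter: float
    femur: float
    tibia: float
    tarsus: float  # includes the distitarsus, as in the measurement protocol

    def total_length(self) -> float:
        # Hind leg length = coxa + trochanter + femur + tibia + tarsus
        return self.coxa + self.trochanter + self.femur + self.tibia + self.tarsus


# Made-up values, for illustration only
example_leg = HindLegSegments(coxa=1.0, trochanter=0.5, femur=2.0, tibia=2.5, tarsus=3.0)
example_total = example_leg.total_length()
```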

In addition to the specimen-level dataset, we collated data for a set of functional life history traits.

Usage notes



National Natural Science Foundation of China, Award: 31772495

CAS President’s International Fellowship Initiative, Award: 2020FSB0001

Strategic Priority Research Program of the Chinese Academy of Sciences, Award: XDB310304

National Natural Science Foundation of China, Award: 31625024

Program of Ministry of Science and Technology of China, Award: 2018FY100405