
Estimation of the relative abundance of species in artificial mixtures of insects using low-coverage shotgun metagenomics

Cite this dataset

Garrido-Sanz, Lidia; Senar, Miquel Àngel; Piñol, Josep (2021). Estimation of the relative abundance of species in artificial mixtures of insects using low-coverage shotgun metagenomics [Dataset]. Dryad.


Amplicon metabarcoding is an established technique for analysing the taxonomic composition of communities of organisms using high-throughput DNA sequencing, but there are doubts about its ability to quantify the relative proportions of the species, as opposed to merely producing a species list. Here, we bypass the enrichment step, and thus avoid PCR bias, by directly sequencing the extracted DNA using shotgun metagenomics. This approach is common practice for prokaryotes but not for eukaryotes, because few eukaryotic species have sequenced genomes. We tested the metagenomic approach using insect species whose genomes have already been sequenced and assembled to an advanced degree. We shotgun-sequenced, at low coverage, the DNA of 18 insect species in 22 single-species and 6 mixed-species libraries, and mapped the reads against 110 reference insect genomes. We used the single-species libraries to calibrate the assignment of reads to species, and the mixed-species libraries to evaluate the ability of the method to quantify relative species abundance. Our results showed that the shotgun metagenomic method easily distinguishes closely related insect species, such as the four Drosophila species included in the artificial libraries. However, to avoid counting rare misclassified reads, it was necessary to use a rather stringent detection limit of 0.001, so species with a lower relative abundance are ignored. We also found that approximately half of the raw reads were informative for taxonomic purposes. Finally, using the mixed-species libraries, we showed that it is feasible to quantify the relative abundance of individual species in the mixtures with confidence.
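The detection-limit step described above can be illustrated with a short Python sketch. This is not the authors' pipeline: it assumes that reads have already been mapped and assigned, giving a count of reads per species, and it uses hypothetical species labels and counts. Species whose share of assigned reads falls below 0.001 are discarded as likely misclassifications; rescaling the retained proportions so they sum to 1 is an additional assumption of this sketch that the abstract does not specify.

```python
from typing import Dict

# Detection limit on relative abundance, as reported in the abstract.
DETECTION_LIMIT = 0.001


def relative_abundance(read_counts: Dict[str, int],
                       detection_limit: float = DETECTION_LIMIT) -> Dict[str, float]:
    """Turn per-species assigned-read counts into relative abundances.

    Species below `detection_limit` are dropped (treated as misclassified
    reads). The survivors are rescaled to sum to 1; this rescaling is an
    assumption of this sketch, not a documented step of the original study.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    # Raw proportion of assigned reads for each species.
    raw = {sp: n / total for sp, n in read_counts.items()}
    # Keep only species at or above the detection limit.
    kept = {sp: p for sp, p in raw.items() if p >= detection_limit}
    kept_total = sum(kept.values())
    if kept_total == 0:
        return {}
    return {sp: p / kept_total for sp, p in kept.items()}


if __name__ == "__main__":
    # Hypothetical counts; species_C (5 of 10,000 reads, i.e. 0.0005) is dropped.
    counts = {"species_A": 6000, "species_B": 3995, "species_C": 5}
    print(relative_abundance(counts))
```

Keeping the threshold as a parameter makes it easy to see how a less stringent limit would let rare misclassified reads appear as spurious low-abundance species.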


Specimens of 18 different species were captured and stored in 70% ethanol at 4 °C. We used the DNeasy Blood & Tissue Kit (Qiagen) to extract DNA from ca. 20 mg of fresh material of each species.

We prepared two kinds of libraries: 22 libraries containing DNA from a single species and 6 libraries containing a mixture of DNA from several species at known relative concentrations. All libraries were prepared with the Illumina TruSeq DNA PCR-Free LT Kit following the manufacturer's instructions (Ref. 15037063) and sequenced on an Illumina MiSeq with 2×150 chemistry in three separate runs.


Spanish Government, Award: TIN2017‐84553‐C2‐1‐R

Government of Catalonia, Award: AGAUR 2017 SGR 1001