
Sequencing data and taxonomic assignments from: Biodiversity and vector-borne diseases: host dilution and vector amplification occur simultaneously for Amazonian leishmaniases


Kocher, Arthur (2021), Sequencing data and taxonomic assignments from: Biodiversity and vector-borne diseases: host dilution and vector amplification occur simultaneously for Amazonian leishmaniases, Dryad, Dataset,


This is the sequencing data used in the paper "Biodiversity and vector-borne diseases: host dilution and vector amplification occur simultaneously for Amazonian leishmaniases" by Kocher et al. The study aims to assess the effects of biodiversity changes on Leishmania transmission using molecular analyses of sand fly pools and blood-fed dipterans. The data are split into three files corresponding to the PCR amplicons used in the study: Ins16S for insect identification, 12SV5 for vertebrate identification, and leishmini for Leishmania identification. Each file combines output from several Illumina MiSeq and HiSeq runs after read demultiplexing, adapter trimming, dereplication, and removal of sequences present in fewer than 10 copies, but before any further read filtering. The data are presented in tabular format (similar to the output of the obitab command from the OBITools package), together with information on the corresponding sample, sequencing run, and taxonomic assignment (performed with ecotag from the OBITools).
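As a sketch of how one of these tabular files might be summarized, the Python snippet below sums read counts per sample and sequencing run. It assumes a long, tab-separated layout with one row per sequence and sample, and hypothetical columns named `id`, `sample`, `run` and `count`; the actual headers of the obitab-like files may differ, and the toy rows are invented for illustration only.

```python
import csv
import io
from collections import defaultdict

# Toy rows for illustration only (not taken from the dataset). The column
# names "id", "sample", "run" and "count" are assumptions; check the real
# header of each file before reusing this.
EXAMPLE = (
    "id\tsample\trun\tcount\n"
    "seq1\tpool_A\trun1\t1520\n"
    "seq2\tpool_A\trun1\t8\n"
    "seq3\tpool_A\trun1\t40\n"
    "seq4\tpool_B\trun2\t340\n"
)


def reads_per_sample(handle, min_count=10):
    """Sum read counts per (sample, run), ignoring rows below min_count.

    The published files were already filtered at 10 copies; min_count
    lets a stricter cut-off be applied on top of that.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        count = int(row["count"])
        if count < min_count:
            continue  # drop low-copy sequences
        totals[(row["sample"], row["run"])] += count
    return dict(totals)


if __name__ == "__main__":
    print(reads_per_sample(io.StringIO(EXAMPLE)))
```

Reading through `csv.DictReader` keeps the lookup by column name, so only the four assumed names need changing if the real headers differ.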


(See the paper for more details)

Insects were sampled between 2015 and 2017 in French Guiana using Centers for Disease Control (CDC) light traps. Each morning, the contents of each trap were collected. Sand fly females were sorted under a stereo microscope and stored in pools (each pool corresponding to a single trap-night) in microcentrifuge tubes with 95% ethanol for later molecular analyses. A pool contained at most 50 individuals; when more than 50 specimens were caught in a given trap, several pools were made, up to four pools per trap (i.e. 200 individuals). Visibly blood-fed dipterans, including sand flies, mosquitoes (Culicidae), and biting midges (Ceratopogonidae), were stored individually in microcentrifuge tubes with 95% ethanol for molecular analyses. Additional blood-fed dipterans resting during the day on tree trunks were collected using a Prokopack aspirator (John W. Hock Co., Gainesville, FL, USA) and preserved in the same way.
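The pooling rule amounts to simple arithmetic: a catch of n females gives min(⌈n/50⌉, 4) pools. The sketch below makes that explicit. It assumes, since the text does not say, that pools were filled to 50 one after another and that specimens beyond 200 were not pooled; `pool_sizes` is a hypothetical helper.

```python
MAX_PER_POOL = 50  # at most 50 sand fly females per pool
MAX_POOLS = 4      # at most four pools per trap, i.e. 200 individuals


def pool_sizes(n_caught):
    """Return the pool sizes for one trap-night catch of sand fly females.

    Assumptions (not stated in the protocol): pools are filled to 50 one
    after another, and specimens beyond 4 x 50 = 200 are left unpooled.
    """
    remaining = min(max(n_caught, 0), MAX_PER_POOL * MAX_POOLS)
    sizes = []
    while remaining > 0:
        take = min(remaining, MAX_PER_POOL)
        sizes.append(take)
        remaining -= take
    return sizes


for n in (37, 120, 260):
    print(n, pool_sizes(n))
```

Under these assumptions, 37 females form a single pool, 120 females give pools of 50, 50 and 20, and 260 females give four pools of 50.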

We analyzed sand fly pools to identify their species composition using DNA metabarcoding. Leishmania DNA detection and identification were performed on the same pools using high-throughput sequencing of kDNA minicircle amplicons. For each blood-fed specimen, the dipteran species and the blood meal source were also identified molecularly. Sand fly pools were homogenized using a Qiagen TissueLyser II (Qiagen, Valencia, CA, USA), and DNA was extracted with the Qiagen DNeasy Blood and Tissue kit. For individual blood-fed specimens, DNA was extracted using a modified Chelex (Bio-Rad, Hercules, CA, USA) protocol. The Ins16S_1 [F: 5′-TRRGACGAGAAGACCCTATA-3′; R: 5′-TCTTAATCCAACATCGAGGTC-3′; (Clarke, Soubrier, Weyrich, & Cooper, 2014)], 12S-V5 [F: 5′-TAGAACAGGCTCCTCTAG-3′; R: 5′-TTAGATACCCCACTATGC-3′; (Riaz et al., 2011)] and leishmini [F: 5′-GGKAGGGGCGTTCTGC-3′; R: 5′-STATWTTACACCAACCCC-3′; (Kocher, Valière, Bañuls, & Murienne, 2018)] PCR primers were used to amplify short fragments of dipteran, vertebrate and Leishmania DNA, respectively. Eight-base-pair tags, each differing from every other tag at five or more positions, were added to the 5′ end of each primer to enable multiplexing of PCR products for sequencing. A Latin square design was used for PCR multiplexing to allow the detection and filtering of mistagged sequencing reads. For sand fly metabarcoding, two PCR replicates were performed. PCR products were pooled according to the multiplexing design and used for sequencing library preparation and high-throughput sequencing on Illumina HiSeq or MiSeq platforms at the GeT-PlaGe core facility of Genotoul (Toulouse, France).
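The two tag properties described above can be illustrated with a short Python sketch. With a minimum pairwise Hamming distance of 5 between tags, a tag read with up to 2 errors is still closer to its true tag than to any other (its distance to any other tag is at least 5 - 2 = 3), and any read whose tag pair was never assigned to a sample can be flagged as mistagged. The tag sequences, sample names and the `match_tag`/`classify` helpers are hypothetical, the mistag check assumes that the multiplexing design deliberately leaves some forward/reverse combinations unused, and none of this describes how ngsfilter itself works.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags):
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def match_tag(observed, tags, max_mismatches=2):
    """Return the unique tag within max_mismatches of `observed`, else None."""
    hits = [t for t in tags if hamming(observed, t) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def classify(fwd, rev, design, tags):
    """Map an observed (forward, reverse) tag pair to a sample name.

    Returns "unassigned" if either tag cannot be matched, and "mistag" if
    both tags are valid but their combination was not used in the design.
    """
    f, r = match_tag(fwd, tags), match_tag(rev, tags)
    if f is None or r is None:
        return "unassigned"
    return design.get((f, r), "mistag")


# Hypothetical 8-bp tags and tag-pair design (not the study's).
TAGS = ["AGCTAGCT", "TCGATCGA", "GATCCTAG"]
DESIGN = {
    ("AGCTAGCT", "TCGATCGA"): "sample_1",
    ("TCGATCGA", "GATCCTAG"): "sample_2",
    ("GATCCTAG", "AGCTAGCT"): "sample_3",
}

if __name__ == "__main__":
    print(min_pairwise_distance(TAGS))
    print(classify("AGCTAGCA", "TCGATCGA", DESIGN, TAGS))  # one error in fwd tag
    print(classify("AGCTAGCT", "GATCCTAG", DESIGN, TAGS))  # unused combination
```

Checking `min_pairwise_distance(TAGS) >= 5` before ordering primers is a cheap way to confirm that a candidate tag set meets the five-difference criterion.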

Bioinformatic analyses were performed using the OBITools 1.2.9 package and R 4.0.3. Paired-end reads were merged with illuminapairedend and demultiplexed based on PCR primer tags using ngsfilter. Reads were dereplicated using obiuniq, and sequences supported by fewer than ten reads in a given sample were discarded using obigrep. Taxonomic assignments were performed using ecotag, with a reference DNA sequence dataset built for each studied taxonomic group. For dipteran identifications, we used previously published reference datasets for Neotropical sand flies and mosquitoes, to which we added mosquito reference sequences covering the targeted 16S region, extracted from NCBI GenBank using ecoPCR. For vertebrate identifications, we used a previously published reference dataset for Amazonian mammals, supplemented with vertebrate reference sequences covering the targeted 12S region extracted from GenBank. For Leishmania identifications, we used a previously published dataset of kDNA minicircle reference sequences. When the percentage of identity with the closest match was lower than 97%, we restricted taxonomic assignments to the genus level or above, to avoid biases due to reference dataset incompleteness (i.e. artifactual species-level identifications in cases where only one species of a given genus was represented in the dataset). We then performed de novo sequence clustering using sumaclust 1.0.31 with a 97% similarity threshold. We defined molecular operational taxonomic units (MOTUs) based on ecotag results for species-level identifications, and based on de novo clustering otherwise, in order to identify putative species within higher-level taxa. Vertebrate identifications were adjusted when only a subset of the matched species was known to occur in French Guiana. Because no reference sequence was available for local biting midge species, we defined MOTUs within the Ceratopogonidae family based on de novo clustering only.
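The 97% rule and the clustering step can be sketched in Python. `cap_rank` drops the species name when the best identity is below 0.97, and `greedy_cluster` processes sequences from most to least abundant, attaching each one to the most similar existing cluster center at or above 97% identity and otherwise making it a new center. This is a simplified greedy clustering in the spirit of sumaclust, not a reimplementation of it: the lineage dictionary, the function names and the identity measure (1 minus the edit distance divided by the longer sequence length) are assumptions, and the score may differ from the one sumaclust computes.

```python
THRESHOLD = 0.97  # identity threshold used both for ranks and for clustering


def cap_rank(lineage, best_identity, threshold=THRESHOLD):
    """Keep at most genus-level names when the best match is below threshold.

    `lineage` is a hypothetical dict such as
    {"family": ..., "genus": ..., "species": ...}.
    """
    if best_identity >= threshold:
        return lineage
    return {rank: name for rank, name in lineage.items() if rank != "species"}


def levenshtein(a, b):
    """Edit distance (substitutions, insertions, deletions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution/match
        prev = cur
    return prev[-1]


def identity(a, b):
    """Assumed identity score: 1 - edit distance / length of longer sequence."""
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def greedy_cluster(counts, threshold=THRESHOLD):
    """Greedy abundance-ordered clustering.

    `counts` maps sequence -> read count. Returns sequence -> cluster center.
    Sequences are only compared with centers, never with other members.
    """
    centers = []
    membership = {}
    for seq in sorted(counts, key=lambda s: (-counts[s], s)):
        hits = [(identity(seq, c), c) for c in centers]
        hits = [h for h in hits if h[0] >= threshold]
        if hits:
            membership[seq] = max(hits, key=lambda h: h[0])[1]
        else:
            centers.append(seq)
            membership[seq] = seq
    return membership
```

Because members are compared only with centers, a sequence one substitution away from a member can still start a new cluster if it falls below 97% identity to that member's center; this order dependence is inherent to greedy star clustering.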