The human impact on natural habitats is increasing the complexity of human-wildlife interactions and leading to the emergence of infectious diseases worldwide. Highly successful synanthropic wildlife species, such as rodents, will undoubtedly play an increasingly important role in transmitting zoonotic diseases. We investigated the potential for recent developments in 16S rRNA amplicon sequencing to facilitate the multiplexing of the large numbers of samples needed to improve our understanding of the risk of zoonotic disease transmission posed by urban rodents in West Africa. In addition to listing pathogenic bacteria in wild populations, as in other high-throughput sequencing (HTS) studies, our approach can estimate essential parameters for studies of zoonotic risk, such as prevalence and patterns of coinfection within individual hosts. However, the estimation of these parameters requires cleaning of the raw data to mitigate the biases generated by HTS methods. We present here an extensive review of these biases and of their consequences, and we propose a comprehensive trimming strategy for managing these biases. We demonstrated the application of this strategy using 711 commensal rodents, including 208 Mus musculus domesticus, 189 Rattus rattus, 93 Mastomys natalensis, and 221 Mastomys erythroleucus, collected from 24 villages in Senegal. Seven major genera of pathogenic bacteria were detected in their spleens: Borrelia, Bartonella, Mycoplasma, Ehrlichia, Rickettsia, Streptobacillus, and Orientia. Mycoplasma, Ehrlichia, Rickettsia, Streptobacillus, and Orientia have never before been detected in West African rodents. Bacterial prevalence ranged from 0% to 90% of individuals per site, depending on the bacterial taxon, rodent species, and site considered, and 26% of rodents displayed coinfection. 
The 16S rRNA amplicon sequencing strategy presented here has an advantage over other molecular surveillance tools: it covers a broad spectrum of bacterial pathogens without requiring prior assumptions about which are present in the samples. This approach is therefore particularly suitable for continuous pathogen surveillance within disease-monitoring programs.
Information concerning the samples and the positive and negative controls multiplexed in the two MiSeq runs
This XLSX file contains the run names, the sample IDs, the sample types, the PCR IDs, the PCR replicate numbers, the localities, the rodent species and the fastq file names for each of the 1560 PCR products multiplexed in the two Illumina MiSeq runs.
Table_of_the_samples_and_fastq_informaions.xlsx
MiSeq raw sequences (Run 1) of the V4 region of the 16S rRNA gene from the spleens of 355 murine rodents from Senegal and 21 positive and negative controls (part 1 of 4)
This ZIP file contains the FASTQ files of the paired-end reads (R1: read 1; R2: read 2) produced for each individual in duplicate or triplicate on the MiSeq platform (Run 1). The 823 multiplexed PCR products were indexed using both forward and reverse indices. The list of the 355 multiplexed samples and the 21 positive and negative controls is provided in the XLSX file "Information concerning the samples multiplexed in the MiSeq Run 1 & 2" (Table_of_the_samples_and_fastq_informaions.xlsx).
MiSeq_Reads_16Sv4_Murinae_Run1_part1.zip
MiSeq raw sequences (Run 1) of the V4 region of the 16S rRNA gene from the spleens of 355 murine rodents from Senegal and 21 positive and negative controls (part 2 of 4)
This ZIP file contains the FASTQ files of the paired-end reads (R1: read 1; R2: read 2) produced for each individual in duplicate or triplicate on the MiSeq platform (Run 1). The 823 multiplexed PCR products were indexed using both forward and reverse indices. The list of the 355 multiplexed samples and the 21 positive and negative controls is provided in the XLSX file "Information concerning the samples multiplexed in the MiSeq Run 1 & 2" (Table_of_the_samples_and_fastq_informaions.xlsx).
MiSeq_Reads_16Sv4_Murinae_Run1_part2.zip
MiSeq raw sequences (Run 1) of the V4 region of the 16S rRNA gene from the spleens of 355 murine rodents from Senegal and 21 positive and negative controls (part 3 of 4)
This ZIP file contains the FASTQ files of the paired-end reads (R1: read 1; R2: read 2) produced for each individual in duplicate or triplicate on the MiSeq platform (Run 1). The 823 multiplexed PCR products were indexed using both forward and reverse indices. The list of the 355 multiplexed samples and the 21 positive and negative controls is provided in the XLSX file "Information concerning the samples multiplexed in the MiSeq Run 1 & 2" (Table_of_the_samples_and_fastq_informaions.xlsx).
MiSeq_Reads_16Sv4_Murinae_Run1_part3.zip
MiSeq raw sequences (Run 1) of the V4 region of the 16S rRNA gene from the spleens of 355 murine rodents from Senegal and 21 positive and negative controls (part 4 of 4)
This ZIP file contains the FASTQ files of the paired-end reads (R1: read 1; R2: read 2) produced for each individual in duplicate or triplicate on the MiSeq platform (Run 1). The 823 multiplexed PCR products were indexed using both forward and reverse indices. The list of the 355 multiplexed samples and the 21 positive and negative controls is provided in the XLSX file "Information concerning the samples multiplexed in the MiSeq Run 1 & 2" (Table_of_the_samples_and_fastq_informaions.xlsx).
MiSeq_Reads_16Sv4_Murinae_Run1_part4.zip
MiSeq raw sequences (Run 2) of the V4 region of the 16S rRNA gene from the spleens of 348 murine rodents from Senegal and 26 positive and negative controls (part 1 of 3)
This ZIP file contains the FASTQ files of the paired-end reads (R1: read 1; R2: read 2) produced for each individual in duplicate on the MiSeq platform (Run 2). The 746 multiplexed PCR products were indexed using both forward and reverse indices. The list of the 348 multiplexed samples and the 26 positive and negative controls is provided in the XLSX file "Information concerning the samples multiplexed in the MiSeq Run 1 & 2" (Table_of_the_samples_and_fastq_informaions.xlsx).
MiSeq_Reads_16Sv4_Murinae_Run2_part1.zip
MiSeq raw sequences (Run 2) of the V4 region of the 16S rRNA gene from the spleens of 348 murine rodents from Senegal and 26 positive and negative controls (part 2 of 3)
This ZIP file contains the FASTQ files of the paired-end reads (R1: read 1; R2: read 2) produced for each individual in duplicate on the MiSeq platform (Run 2). The 746 multiplexed PCR products were indexed using both forward and reverse indices. The list of the 348 multiplexed samples and the 26 positive and negative controls is provided in the XLSX file "Information concerning the samples multiplexed in the MiSeq Run 1 & 2" (Table_of_the_samples_and_fastq_informaions.xlsx).
MiSeq_Reads_16Sv4_Murinae_Run2_part2.zip
MiSeq raw sequences (Run 2) of the V4 region of the 16S rRNA gene from the spleens of 348 murine rodents from Senegal and 26 positive and negative controls (part 3 of 3)
This ZIP file contains the FASTQ files of the paired-end reads (R1: read 1; R2: read 2) produced for each individual in duplicate on the MiSeq platform (Run 2). The 746 multiplexed PCR products were indexed using both forward and reverse indices. The list of the 348 multiplexed samples and the 26 positive and negative controls is provided in the XLSX file "Information concerning the samples multiplexed in the MiSeq Run 1 & 2" (Table_of_the_samples_and_fastq_informaions.xlsx).
MiSeq_Reads_16Sv4_Murinae_Run2_part3.zip
Raw output files generated by the mothur program
This ZIP file contains seven output files (.groups, .names, .table, .taxonomy, .list, .fasta and .log) generated by the mothur program for the whole 16S rRNA dataset. The .log file contains all the mothur command lines used for the processing of the raw sequences.
mothur_raw_output_files.zip
OTU table of raw abundance
This XLSX file contains, for each PCR product of the two MiSeq runs, the number of reads obtained for each Operational Taxonomic Unit (OTU) before data filtering. Because of Excel file size limits, only OTUs with ≥ 100 reads are included.
Table_of_abundance.xlsx
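A minimal Python sketch of the ≥ 100-read restriction applied to this table, assuming the threshold is applied to each OTU's reads summed over all PCR products of both runs (the description does not say whether the sum is per run or over the whole dataset). The dictionary layout and the function name `filter_otus_by_total_reads` are hypothetical and not part of the released files; this reproduces only the Excel size restriction, not the data filtering strategy used for the analyses.

```python
from typing import Dict

# Hypothetical in-memory layout: OTU name -> {PCR product ID -> read count}.
AbundanceTable = Dict[str, Dict[str, int]]


def filter_otus_by_total_reads(table: AbundanceTable, min_reads: int = 100) -> AbundanceTable:
    """Keep OTUs whose read count, summed over all PCR products, is at least `min_reads`."""
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts.values()) >= min_reads
    }


if __name__ == "__main__":
    toy = {
        "Otu001": {"PCR_A": 950, "PCR_B": 40},   # 990 reads in total: kept
        "Otu002": {"PCR_A": 30, "PCR_B": 25},    # 55 reads in total: dropped
        "Otu003": {"PCR_A": 0, "PCR_B": 100},    # exactly 100 reads: kept
    }
    print(sorted(filter_otus_by_total_reads(toy)))  # ['Otu001', 'Otu003']
```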
Table of presence and absence for the twelve pathogenic OTUs in the 704 analysed rodents
This XLSX file shows the presence (1) or absence (0) of each of the twelve pathogenic OTUs in the spleens of the 704 analysed rodents after data filtering.
Table_of_presence_absence.xlsx
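Prevalence and coinfection, the parameters highlighted in the study summary, can be derived from a presence/absence table of this kind. A minimal Python sketch, assuming the table has been loaded into a dictionary keyed by rodent ID; the names `prevalence` and `coinfection_rate` are hypothetical. Passing a subset of rodent IDs gives per-site or per-species estimates. The source does not state at which taxonomic level coinfection was counted, so the optional `otu_to_taxon` mapping lets the six Mycoplasma OTUs be collapsed into a single genus.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory layout of the presence/absence table:
# rodent ID -> {OTU name -> 1 (present) or 0 (absent)}.
PresenceTable = Dict[str, Dict[str, int]]


def prevalence(table: PresenceTable, otu: str, rodents: List[str]) -> float:
    """Proportion of the listed rodents in which `otu` is present."""
    if not rodents:
        raise ValueError("cannot compute prevalence over zero rodents")
    positives = sum(table[r].get(otu, 0) for r in rodents)
    return positives / len(rodents)


def coinfection_rate(
    table: PresenceTable,
    rodents: List[str],
    otu_to_taxon: Optional[Dict[str, str]] = None,
) -> float:
    """Proportion of the listed rodents carrying two or more distinct taxa.

    Without `otu_to_taxon`, each OTU counts as its own taxon; with it, OTUs
    mapped to the same name (e.g. several Mycoplasma OTUs) count only once.
    """
    if not rodents:
        raise ValueError("cannot compute a coinfection rate over zero rodents")
    mapping = otu_to_taxon or {}
    coinfected = 0
    for r in rodents:
        # Collect the taxa detected in this rodent's spleen.
        taxa = {mapping.get(otu, otu) for otu, flag in table[r].items() if flag == 1}
        if len(taxa) >= 2:
            coinfected += 1
    return coinfected / len(rodents)


if __name__ == "__main__":
    toy = {
        "R1": {"Bartonella": 1, "Mycoplasma_OTU_1": 1, "Mycoplasma_OTU_2": 0},
        "R2": {"Bartonella": 0, "Mycoplasma_OTU_1": 1, "Mycoplasma_OTU_2": 1},
        "R3": {"Bartonella": 0, "Mycoplasma_OTU_1": 0, "Mycoplasma_OTU_2": 0},
        "R4": {"Bartonella": 1, "Mycoplasma_OTU_1": 0, "Mycoplasma_OTU_2": 0},
    }
    ids = list(toy)
    genus = {"Mycoplasma_OTU_1": "Mycoplasma", "Mycoplasma_OTU_2": "Mycoplasma"}
    print(prevalence(toy, "Bartonella", ids))   # 0.5
    print(coinfection_rate(toy, ids))           # 0.5 (OTU level)
    print(coinfection_rate(toy, ids, genus))    # 0.25 (genus level)
```

The OTU-level and genus-level rates differ in the toy example because rodent R2 carries two Mycoplasma OTUs but only one genus.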
Sequence alignment file of the 16S rRNA PCR primers and 41,113 sequences from 79 bacterial zoonotic genera
This FASTA file contains 41,113 sequences of the V4 hypervariable region from 79 bacterial zoonotic genera extracted from the Silva SSU database v119, and the forward and reverse primers used in our 16S rRNA amplicon sequencing experiment.
16Sv4_primers_and_zoonotic_genera.fasta
Perl script for the extraction of the most abundant sequences for each OTU and each sample
This ZIP file contains a script called GOMS (Get OTU Main Sequence). GOMS is a Perl program that works on mothur output files. It therefore requires several files from mothur (http://www.mothur.org/wiki/454_SOP or http://www.mothur.org/wiki/MiSeq_SOP): the .list, .names, .groups and .fasta files. The user must also provide the OTU number and the threshold used to form the OTUs. GOMS generates a FASTA file containing, for each sample, the most representative sequence of the given OTU, that is, the unique sequence with the highest number of copies. The FASTA file also reports the number of sequences and of unique sequences assigned to the OTU and the number of copies of the main unique sequence.
GOMS.zip
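The logic described for GOMS can be re-expressed as a short Python sketch. This is not the GOMS Perl script. It assumes tab-separated mothur files in which each .list data line starts with the distance label and the number of OTUs, followed by one comma-separated OTU per column; each .names line maps a unique representative sequence to all identical read names (itself included); and each .groups line maps a read name to its sample. All function names and the FASTA header format are hypothetical. If the .fasta file is aligned, the returned sequences keep their gap characters.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def parse_two_column(lines: Iterable[str]) -> Dict[str, str]:
    """Parse tab-separated 'key<TAB>value' lines, as in mothur .names and .groups files."""
    table: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if line:
            key, value = line.split("\t")
            table[key] = value
    return table


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence name: sequence}."""
    seqs: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line
    return seqs


def otu_members(list_lines: Iterable[str], label: str, otu_number: int) -> List[str]:
    """Return the unique-sequence names of OTU `otu_number` (1-based) at distance `label`."""
    for line in list_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == label:
            # fields[1] is the number of OTUs; each later field is one OTU.
            return fields[1 + otu_number].split(",")
    raise ValueError(f"label {label!r} not found in the .list file")


def main_sequence_per_sample(list_lines, names, groups, fasta, label, otu_number):
    """For each sample, find the unique sequence of the OTU with the most copies."""
    copies: Dict[str, Counter] = defaultdict(Counter)
    for rep in otu_members(list_lines, label, otu_number):
        # Each read identical to `rep` adds one copy in the sample it came from.
        for read_name in names[rep].split(","):
            copies[groups[read_name]][rep] += 1
    result = {}
    for sample, counter in copies.items():
        best_rep, best_count = counter.most_common(1)[0]
        result[sample] = {
            "sequence": fasta[best_rep],
            "n_seqs": sum(counter.values()),   # reads of this OTU in the sample
            "n_unique": len(counter),          # unique sequences of this OTU in the sample
            "main_copies": best_count,         # copies of the main unique sequence
        }
    return result


def to_fasta(otu_number: int, result: Dict[str, dict]) -> str:
    """Write one FASTA record per sample; the header format here is hypothetical."""
    lines = []
    for sample in sorted(result):
        r = result[sample]
        lines.append(
            f">{sample}_OTU{otu_number} nseqs={r['n_seqs']} "
            f"nunique={r['n_unique']} maincopies={r['main_copies']}"
        )
        lines.append(r["sequence"])
    return "\n".join(lines) + "\n"
```

Counting copies per sample before choosing the winner matters: the same OTU can have a different dominant unique sequence in each rodent, which is what the per-sample output is meant to capture.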
Sequence alignment files of the V4 hypervariable region of the 16S rRNA used for the phylogenetic analysis
This ZIP file contains the FASTA files of the most abundant sequences from the 12 pathogenic OTUs (Bartonella, Borrelia, Ehrlichia, Orientia, Rickettsia, Streptobacillus and Mycoplasma_OTU_1 to Mycoplasma_OTU_6) and reference sequences from Silva ribosomal database v119 for the same genera.
Alignment_files_for_phylogenetic_analysis.zip