Impact of smoking cessation, coffee and bread consumption on the intestinal microbial composition among Saudis: A cross-sectional study
Harakeh, Steve et al. (2020), Impact of smoking cessation, coffee and bread consumption on the intestinal microbial composition among Saudis: A cross-sectional study, Dryad, Dataset, https://doi.org/10.5061/dryad.qv9s4mwbc
All participants signed a written informed consent form after being informed of the purpose of the study and assured of the confidentiality of their data. They then completed a questionnaire covering socio-demographic information, medical history and lifestyle practices. In addition, a structured food frequency questionnaire (FFQ) was administered to evaluate dietary practices, and weight and height were measured using standardized techniques. The questionnaire has been described, partially or fully, and used in previous manuscripts [12-15]. Weight and height were used to calculate body mass index (BMI, kg m−2), and the WHO criteria were used to classify participants as underweight, normal, overweight or obese. Weight categories were defined according to BMI as follows: underweight 18-20 kg m−2, normal 20-25 kg m−2, overweight 25-30 kg m−2, and obese >30 kg m−2. Stool samples were collected under aseptic conditions in clean, dry screw-top containers and immediately stored at -20 °C.
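The BMI calculation and weight categories described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code: the function names are hypothetical, and the handling of boundary values (for example, a BMI of exactly 25) and of values below 18 kg m−2 is an assumption, since the text does not specify it.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg m^-2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def weight_category(bmi_value: float) -> str:
    """Map a BMI value to the categories listed in the text.

    Cut-offs (20, 25, 30) are taken from the text. Assumptions:
    lower bounds are inclusive, exactly 30 counts as overweight
    (the text gives obese as >30), and values below 18 are grouped
    with underweight because the text defines no separate class.
    """
    if bmi_value < 20:
        return "underweight"
    if bmi_value < 25:
        return "normal"
    if bmi_value <= 30:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    # Hypothetical participant: 70 kg, 1.75 m
    value = bmi(70, 1.75)
    print(round(value, 1), weight_category(value))
```

A participant's measurements are passed to bmi first, and the result is then passed to weight_category; keeping the two steps separate makes it easy to change the cut-offs without touching the calculation.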
Extraction of DNA from stool samples and 16S rRNA sequencing using MiSeq technology
DNA was extracted from all participants' stool samples using a deglycosylation protocol as follows: 250 µL of each sample was placed in a 2 mL tube containing a mixture of acid-washed glass beads (Sigma-Aldrich) and two or three 0.5 mm glass beads. Mechanical lysis was performed by bead-beating the mixture using a Fast Prep BIO 101 apparatus (Qbiogene, Strasbourg, France) at maximum speed (6.5) for 3 × 30 seconds. The supernatant was centrifuged at 12,000 rpm for 10 min and the pellet retained. A mixture containing 2 µL of 10× glycoprotein denaturing buffer EndoHf (New England Biolabs) and 17 µL of H2O was added, and the sample was heated at 100 °C for 10 minutes. Deglycosylation was performed by adding a mixture of 2 µL of 10× G5 reaction buffer (ref. B1702, New England Biolabs), 2 µL of EndoHf (New England Biolabs), 2 µL of cellulase (Sigma) and 16 µL of H2O. The preparation was then incubated overnight at 37 °C. Finally, DNA was extracted using the NucleoSpin® Tissue Mini Kit (Macherey Nagel, Hoerdt, France) according to a previously described protocol. The quantity, purity, integrity and size of the DNA and its amenability to PCR amplification were assessed. The concentration of each DNA extract was measured by Qubit assay with the high-sensitivity kit (Life Technologies, Carlsbad, CA, USA) and, in accordance with the Nextera XT DNA sample prep kit (Illumina), each extract was diluted to 1 ng for paired-end sequencing analysis. DNA extracts were dispensed into 10- to 20-µL single-use aliquots and frozen at -20 °C to avoid repeated freeze-thaw cycles prior to downstream analyses. Samples were then sequenced targeting the V3–V4 regions of the 16S rRNA gene using MiSeq technology as previously described [18, 19].
Data processing: Filtering the reads, dereplication and clustering
Paired-end FASTQ files were assembled using FLASH. A total of 7,518,258 joined reads were filtered and then analyzed in QIIME, using ChimeraSlayer to remove chimeras and UCLUST [16, 20] to extract operational taxonomic units (OTUs), as described previously [18, 19]. All reads were clustered at a threshold of 97% identity to obtain OTUs. The extracted OTUs were searched with BLAST against release 123 of the SILVA SSU database, and taxonomy was assigned at the species level when an OTU matched a reference sequence with at least 97% identity, as previously described [22, 23]. Briefly, for each OTU, a representative sequence was extracted and searched against the reference database. For each unique representative sequence, we extracted the best matches from the reference database and sorted them by decreasing percentage of similarity, rounded to the nearest integer. We used the reference sequences with >97% similarity (or the highest available) for taxonomic assignment to species. When multiple matches with the same percentage of similarity were present, the taxonomy at each rank was obtained by consensus [16, 24]. OTUs not assigned to any species were considered "unidentified". Because several OTUs matched the same species, the combined number of identified species and unidentified OTUs was expected to be smaller than the total number of OTUs.
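The assignment rule described above (round similarities, keep the best matches, resolve ties by per-rank consensus) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the pipeline actually used: the function name, the input layout (a list of similarity and lineage pairs) and the example taxa are hypothetical; "consensus" is interpreted here as keeping each rank only while all tied matches agree; and an OTU whose best match falls below the threshold is returned as unidentified.

```python
from typing import Optional, Sequence, Tuple

Lineage = Tuple[str, ...]          # ranks ordered from broadest to species
Hit = Tuple[float, Lineage]        # (percent similarity, lineage)


def assign_taxonomy(hits: Sequence[Hit], threshold: int = 97) -> Optional[Lineage]:
    """Return a consensus lineage for one OTU, or None if unidentified.

    Assumptions (not confirmed by the text): ties are resolved by
    keeping ranks shared by all best matches, and Python's built-in
    round() is used for rounding (it rounds exact halves to even).
    """
    if not hits:
        return None
    # Round similarities to the nearest integer, as in the text.
    rounded = [(round(similarity), lineage) for similarity, lineage in hits]
    best = max(score for score, _ in rounded)
    if best < threshold:
        return None
    # Keep every match tied at the best rounded similarity.
    top = [lineage for score, lineage in rounded if score == best]
    consensus = []
    for names_at_rank in zip(*top):
        if len(set(names_at_rank)) != 1:
            break  # tied matches disagree from this rank downwards
        consensus.append(names_at_rank[0])
    return tuple(consensus)


if __name__ == "__main__":
    # Hypothetical BLAST results for one OTU representative sequence.
    example_hits = [
        (99.2, ("Bacteria", "Bacteroidetes", "Bacteroides", "Bacteroides fragilis")),
        (98.7, ("Bacteria", "Bacteroidetes", "Bacteroides", "Bacteroides ovatus")),
        (97.1, ("Bacteria", "Firmicutes", "Blautia", "Blautia obeum")),
    ]
    print(assign_taxonomy(example_hits))
```

In this example the two best matches both round to 99% but name different species, so the consensus stops at the genus level; under the rule in the text such an OTU would not count towards the identified species.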
King Abdulaziz City for Science and Technology, Award: AR-34-191