This README_file.txt file was generated on 2022-04-06 by Md. Mizanur Rahman


GENERAL INFORMATION

1. Title of Dataset: Data from: DNA-based assessment of environmental degradation in an unknown fauna: the freshwater macroinvertebrates of the Indo-Burmese hotspot.

2. Author Information

    Corresponding Investigator
        Name: Md. Mizanur Rahman
        Institution:
            Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, SL5 7PY, UK
            Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK
            Department of Zoology, University of Dhaka, Dhaka-1000, Bangladesh
        e-mail: m.rahman15@imperial.ac.uk; mizan.rahmanzool@du.ac.bd

    Co-investigator 1
        Name: Dr Alfred Burian
        Institution:
            Marine Ecology Department, Lurio University, Nampula, Mozambique
            Department of Computational Landscape Ecology, UFZ–Helmholtz Centre for Environmental Research, Leipzig, Germany

    Co-investigator 2
        Name: Dr Thomas J. Creedy
        Institution:
            Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK

    Co-investigator 3
        Name: Dr Alfried P. Vogler
        Institution:
            Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, SL5 7PY, UK
            Department of Life Sciences, Natural History Museum, London, SW7 5BD, UK

3. Date of data collection: 2017-2019

4. Geographic location of data collection: Bandarban, Bangladesh

5. Funding sources that supported the collection of the data: Commonwealth Scholarship Commission

6. Recommended citation for this dataset: Rahman et al. (2022), Data from: DNA-based assessment of environmental degradation in an unknown fauna: the freshwater macroinvertebrates of the Indo-Burmese hotspot, Dryad, Dataset


DATA & FILE OVERVIEW

1. Description of dataset

These data were generated by DNA-based methods to investigate the diversity of macroinvertebrates and their responses to different man-made stressors in 16 streams of the Indo-Burmese hotspot.
Macroinvertebrate samples (n=80) were collected from relatively pristine upland streams located in Bandarban, south-eastern Bangladesh. Anthropogenic impacts at each site were quantified based on an assessment of environmental components derived from 14 binomial variables. The diversity of macroinvertebrates, the responses of different macroinvertebrate groups to anthropogenic impacts, and potential indicator species were assessed using DNA metabarcoding. DNA was extracted from each bulk macroinvertebrate sample, and a 418 bp region of the Cytochrome Oxidase subunit I (COI) gene was PCR-amplified with invertebrate-specific primers. Amplicons were sequenced on an HTS platform (Illumina MiSeq), and subsequent bioinformatic processing generated species-level clusters (Operational Taxonomic Units, OTUs) across sites with different levels of impact.

2. File List:

    File 1 Name: Rahman_2022_a_All_OTUs_sequences.fasta
    File 1 Description: All OTUs, including ID numbers and their respective fasta sequences

    File 2 Name: Rahman_2022_b_OTUs_sequences_target_taxa.fasta
    File 2 Description: OTUs of targeted taxa, including ID numbers and their respective fasta sequences

    File 3 Name: Rahman_2022_c_OTU_Reads_Table.csv
    File 3 Description: Read abundance of all OTUs in the small-individual (bulk) and larger-individual (tissue) samples across the sites of the investigated streams.
    File 4 Name: Rahman_2022_d_Taxonomic_assignment_Final_OTUs.csv
    File 4 Description: Taxonomic assignment of the finally retained OTUs

    File 5 Name: Rahman_2022_e_Indicator_OTUs_sequences.fasta
    File 5 Description: Fasta sequences of potential indicator OTUs

    File 6 Name: Rahman_2022_f_metadata.csv
    File 6 Description: Data for the 14 binomial environmental variables across the sampling sites of all investigated streams


METHODOLOGICAL INFORMATION

Each of the 80 samples was split into three technical sub-samples (two sub-samples for small individuals and one sub-sample for large individuals) for DNA extraction, PCR amplification and sequencing. DNA was extracted from small and large individuals using the DNeasy PowerSoil Kit and the DNeasy 96 Blood and Tissue Kit, respectively. Metabarcoding of samples followed a standard protocol, targeting a 418 bp region of the Cytochrome Oxidase subunit I (COI) gene. PCR on each extraction was done in triplicate and the products were pooled prior to clean-up with AMPure XP paramagnetic beads and library construction. Samples were indexed with Nextera XT tags during a secondary PCR for library preparation, and amplicons were sequenced on an Illumina MiSeq (2x300 bp paired-end), aiming for 65,000 and 30,000 reads per sample of small and large individuals, respectively.

Bioinformatic processing followed an established pipeline for quality filtering, merging and clustering of sequence reads. OTU clustering was performed using USEARCH v11.0, and the most representative sequences were matched against the NCBI nr database for taxonomic assignment with the lowest common ancestor (LCA) method in MEGAN. Only OTUs assigned to Insecta (Coleoptera, Diptera, Ephemeroptera, Hemiptera, Odonata, Plecoptera, and Trichoptera), Decapoda and Mollusca were retained for the final analysis. Phylogenetic trees required for calculating phylogenetic diversity (PD) were constructed under maximum likelihood (ML) with RAxML.
A joint rarefaction-extrapolation approach was implemented with the iNEXT package in R to calculate richness for each sample. Biodiversity indices were calculated for each local site and correlated with environmental intactness measures. The impact of overall environmental intactness on the biodiversity measures of OTU richness, evenness and PD was assessed in regression analyses using a full model building approach. We also screened our community data for OTUs sensitive to anthropogenic influence using an indicator species analysis.


DATA-SPECIFIC INFORMATION FOR: Rahman_2022_a_All_OTUs_sequences.fasta

1. Number of OTU sequences: 3439

2. Number of OTU IDs: 3439


DATA-SPECIFIC INFORMATION FOR: Rahman_2022_b_OTUs_sequences_target_taxa.fasta

1. Number of OTU sequences: 936

2. Number of OTU IDs: 936


DATA-SPECIFIC INFORMATION FOR: Rahman_2022_c_OTU_Reads_Table.csv

1. Number of variables: 3443

2. Number of cases/rows: 225

3. Variable List:

    Sample_ID: Name of each sample across the sampling sites
    Sample: Sample type; bulk (smaller individuals) or tissue (larger individuals)
    River: Names of the 16 investigated streams
    Site: IDs of the 5 sites (1, 2, 3, 4, 5) in each stream
    otu1-otu3439: OTU IDs with their respective read abundance for both sample types across the sampling sites of all streams

4. Missing data codes: None

5. Abbreviations used: otu: operational taxonomic unit

6. Other relevant information: For some streams (Betchhora, Sangukhiang Chhora and Cheihkhiang Chhora), data for the two technical replicates (of the bulk sample) are combined in the table.


DATA-SPECIFIC INFORMATION FOR: Rahman_2022_d_Taxonomic_assignment_Final_OTUs.csv

1. Number of variables: 2

2. Number of cases/rows: 936

3. Variable List:

    Megan_otu: IDs of the OTUs finally retained under the targeted taxa
    Taxa: Name of the taxon to which each OTU was assigned


DATA-SPECIFIC INFORMATION FOR: Rahman_2022_e_Indicator_OTUs_sequences.fasta

1. Number of OTU sequences: 26

2. Number of OTU IDs: 26


DATA-SPECIFIC INFORMATION FOR: Rahman_2022_f_metadata.csv

1. Number of variables: 18

2. Number of cases/rows: 80

3. Variable List:

    Sample_IDs: Name of each sample across the sampling sites
    River: Names of the 16 investigated streams
    Replicate: IDs of the 5 sites, named Rep1, Rep2, Rep3, Rep4 and Rep5
    Rivercode: Abbreviated letter codes used for the sixteen streams
    Substrate composition appropriate: 1 for 'yes' and 0 for 'no'
    No sand, gravel or stone excavation: 1 for 'yes' and 0 for 'no'
    Natural channel structure: 1 for 'yes' and 0 for 'no'
    No damming or diversion: 1 for 'yes' and 0 for 'no'
    No significant water extraction: 1 for 'yes' and 0 for 'no'
    No dumping of household wastes: 1 for 'yes' and 0 for 'no'
    Minimal washing and bathing: 1 for 'yes' and 0 for 'no'
    Minimum run-off from cropland: 1 for 'yes' and 0 for 'no'
    Natural color and odor: 1 for 'yes' and 0 for 'no'
    Minimal tourist pressure: 1 for 'yes' and 0 for 'no'
    Minimal fishing pressure: 1 for 'yes' and 0 for 'no'
    Presence of adjacent natural vegetation: 1 for 'yes' and 0 for 'no'
    Representative diversity of wild animals: 1 for 'yes' and 0 for 'no'
    No significant intervention by exotic plant or animal: 1 for 'yes' and 0 for 'no'

4. Missing data codes: None
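

EXAMPLE: CHECKING THE FILES AFTER DOWNLOAD

The counts listed in this README can be checked with a short script. The following is a minimal Python sketch, not part of the original analysis. It assumes that the fasta files use standard '>' header lines and that the metadata CSV carries the identifier columns named in this README (Sample_IDs, River, Replicate, Rivercode); the actual headers in the file may differ and should be checked first. Summing the 0/1 variables into a single per-sample score is an illustrative assumption only; this README does not state how overall environmental intactness was computed. The function names are hypothetical.

```python
import csv
import io

# Identifier columns of Rahman_2022_f_metadata.csv as named in this README
# (assumption: the CSV headers match these names exactly).
ID_COLUMNS = {"Sample_IDs", "River", "Replicate", "Rivercode"}


def count_fasta_records(handle):
    """Count sequences by counting header lines that start with '>'."""
    return sum(1 for line in handle if line.startswith(">"))


def intactness_scores(handle):
    """Return {sample ID: number of variables coded 1} from the metadata CSV.

    Every column that is not an identifier column is treated as a 0/1
    variable. The simple sum is an illustrative assumption, not the
    authors' documented scoring method.
    """
    scores = {}
    for row in csv.DictReader(handle):
        values = [v for k, v in row.items() if k not in ID_COLUMNS]
        scores[row["Sample_IDs"]] = sum(int(v) for v in values)
    return scores


if __name__ == "__main__":
    # Tiny in-memory stand-ins with made-up values, so the sketch runs
    # without the real files.
    demo_fasta = io.StringIO(">otu1\nACGT\n>otu2\nGGCC\n")
    demo_meta = io.StringIO(
        "Sample_IDs,River,Replicate,Rivercode,VarA,VarB\n"
        "S1,DemoRiver,Rep1,DR,1,0\n"
    )
    print(count_fasta_records(demo_fasta))
    print(intactness_scores(demo_meta))
```

With the real files, open each one with open(path, newline="") and pass the handle to the relevant function. The fasta counts should then match the numbers given above (3439, 936 and 26), and the metadata file should yield 80 samples.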