Dataset for distinct microbiomes of the scleractinian coral Favia fragum in mangrove and adjacent reef habitats in the Panamanian Caribbean
Data files
Jan 26, 2026 version files 473.61 KB
- Favia_Demultiplexing_and_trimming.txt (3.73 KB)
- favia_metadata_2.csv (2.68 KB)
- Favia_workflow_dada2_to_phyloseq.R (5.27 KB)
- Favia_workflow_phyloseq_analysis_visualization.R (5.64 KB)
- Favia.ps.RDS (449.62 KB)
- README.md (6.67 KB)
Abstract
This dataset supports a manuscript examining microbial communities associated with the scleractinian coral Favia fragum across mangrove and adjacent reef habitats in the Panamanian Caribbean. The dataset includes the complete analytical workflow and scripts used to process, analyze, and visualize 16S rDNA amplicon sequencing data from coral-associated microbiomes. Demultiplexed amplicon libraries were analyzed in RStudio using the DADA2 pipeline, followed by community-level statistical analyses and visualizations. A finalized phyloseq object is provided, containing all sequences, taxonomy, and metadata necessary to reproduce statistical tests and graphical outputs. All scripts required to replicate the analyses are included, along with PDF versions of the R code. Trimmed and demultiplexed sequencing reads are publicly available through the NCBI Sequence Read Archive under BioProject PRJNA1023296.
This material is based upon work supported by the National Science Foundation under Grant Nos. EF-2025121, EF-2025067, EF-2025009, and EF-2150107. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Description of the data and file structure
- Favia_Demultiplexing_and_trimming.txt: A generic and customizable workflow (text script) for demultiplexing and trimming raw Illumina MiSeq sequencing files. This script is provided for users to apply to their own raw sequencing data and to understand the initial processing steps that lead to demultiplexed reads.
- Favia_workflow_dada2_to_phyloseq.R: An R script to implement the DADA2 pipeline for amplicon analysis, generating a phyloseq object from demultiplexed reads.
- Favia.ps.RDS: A precomputed phyloseq object in R's RDS format, built from the DADA2 outputs and the metadata file favia_metadata_2.csv. It represents the final processed amplicon dataset. The R package 'phyloseq' is required to load and use this object.
- Favia_workflow_phyloseq_analysis_visualization.R: An R script for analyzing community data, performing statistical tests, and generating visualizations from a phyloseq object.
- favia_metadata_2.csv: Metadata file to incorporate into the phyloseq object. Columns:
  - sample: Unique identifier for each measured sample or site (e.g., CC01, CK03).
  - site: General location where the sample was collected (e.g., Coral Key, STRI).
  - habitat: Type of environment where the sample occurs, such as a reef or mangrove.
  - length: Maximum length of the sampled feature or structure (typically in centimeters).
  - width: Maximum width of the sampled feature (typically in centimeters).
  - height: Vertical dimension or thickness of the sampled feature (typically in centimeters).
  - surf.area (surface area): Total surface area of the sampled feature, usually calculated from length, width, and height (typically in square centimeters).
  - lat (latitude): Geographic latitude of the sampling location, expressed in decimal degrees.
  - long (longitude): Geographic longitude of the sampling location, expressed in decimal degrees.
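Before merging the metadata into a phyloseq object, the column layout described above can be checked in base R. This is a minimal sketch, not part of the supplied scripts: the helper name `check_favia_metadata` is hypothetical, the numeric values in the toy row are made up for illustration, and treating the dimension and coordinate columns as numeric is an assumption.

```r
# Column names as listed in this README for favia_metadata_2.csv.
expected_cols <- c("sample", "site", "habitat", "length", "width",
                   "height", "surf.area", "lat", "long")

# Hypothetical helper: stops with a message if a column is missing, if a
# measurement or coordinate column is not numeric, or if sample IDs repeat.
# Returns the data frame invisibly when all checks pass.
check_favia_metadata <- function(meta) {
  missing_cols <- setdiff(expected_cols, names(meta))
  if (length(missing_cols) > 0) {
    stop("Missing metadata columns: ", paste(missing_cols, collapse = ", "))
  }
  numeric_cols <- c("length", "width", "height", "surf.area", "lat", "long")
  is_num <- vapply(meta[numeric_cols], is.numeric, logical(1))
  if (!all(is_num)) {
    stop("Non-numeric columns: ", paste(numeric_cols[!is_num], collapse = ", "))
  }
  if (anyDuplicated(meta$sample) > 0) {
    stop("Duplicated sample identifiers found")
  }
  invisible(meta)
}

# Toy one-row table; all measurement and coordinate values are placeholders.
toy_meta <- data.frame(sample = "CK03", site = "Coral Key", habitat = "reef",
                       length = 3.0, width = 2.5, height = 1.5,
                       surf.area = 20.0, lat = 9.0, long = -82.0)
check_favia_metadata(toy_meta)

# With the real file (the path is an assumption about where you saved it):
# meta <- read.csv("favia_metadata_2.csv")
# check_favia_metadata(meta)
```

Running the check on the real file before building the phyloseq object catches renamed or mistyped columns early, when they are easiest to fix.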
Getting Started
Demultiplexing and Trimming Your Own Raw Data (Informational Only for this Dataset)
While the raw sequencing files for this specific study are not included here, we provide the Favia_Demultiplexing_and_trimming.txt script as a generic guide for demultiplexing and trimming Illumina MiSeq raw data. This workflow outlines the steps that produce demultiplexed Read 1 and Read 2 files per sample, consistent with the format of the files available on SRA for this study. You can use this script to process your own raw sequencing data or to understand the initial steps taken in similar amplicon sequencing pipelines.
- Download Raw Data: Obtain your raw Illumina MiSeq sequencing files (e.g., FASTQ files).
- Demultiplexing and Trimming Workflow:
  - Open and customize the Favia_Demultiplexing_and_trimming.txt script.
  - This workflow requires the Perl script "MergeMeCheck3.pl", which is available on our GitHub website: https://marineinvert.github.io/microbiome/resources.html. Download MergeMeCheck3.pl and place it in an accessible directory.
  - Execute the steps outlined in Favia_Demultiplexing_and_trimming.txt using your preferred command-line environment.
- Outputs: The output of this workflow will be demultiplexed Read 1 and Read 2 FASTQ files for each sample, along with statistics from the demultiplexing and trimming steps.
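Once the workflow has produced per-sample files, it can help to confirm that every sample has both a Read 1 and a Read 2 file before moving on. The base R sketch below is illustrative only: the helper `pair_fastq_files` is hypothetical, and the naming pattern `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` is an assumption that should be adjusted to match your actual output.

```r
# Hypothetical helper: given a vector of FASTQ file names, match Read 1 and
# Read 2 files by sample and list samples that lack a mate.
# Assumes names ending in "_R1.fastq(.gz)" or "_R2.fastq(.gz)".
pair_fastq_files <- function(files) {
  pattern <- "_R[12]\\.fastq(\\.gz)?$"
  files <- files[grepl(pattern, files)]
  sample_id <- sub(pattern, "", basename(files))
  read <- sub(".*_(R[12])\\.fastq(\\.gz)?$", "\\1", basename(files))
  r1 <- setNames(files[read == "R1"], sample_id[read == "R1"])
  r2 <- setNames(files[read == "R2"], sample_id[read == "R2"])
  paired <- intersect(names(r1), names(r2))
  list(
    samples  = paired,
    forward  = unname(r1[paired]),
    reverse  = unname(r2[paired]),
    unpaired = setdiff(union(names(r1), names(r2)), paired)
  )
}

# Example with made-up file names: CK03 has no Read 2 file.
example_files <- c("CC01_R1.fastq.gz", "CC01_R2.fastq.gz", "CK03_R1.fastq.gz")
pairs <- pair_fastq_files(example_files)
pairs$samples   # "CC01"
pairs$unpaired  # "CK03"
```

On a real directory, the same helper could be called as `pair_fastq_files(list.files("path/to/demultiplexed", full.names = TRUE))`, where the directory path is a placeholder.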
Users have two primary options for working with this dataset:
- Start from demultiplexed files (downloaded from SRA): Use the demultiplexed Read 1 and Read 2 files from SRA as the input for the DADA2 pipeline.
- Start from the supplied phyloseq object: Directly use the pre-computed Favia.ps.RDS phyloseq object for downstream community analyses.
Amplicon Analysis with DADA2 (from Demultiplexed Files)
The demultiplexed Read 1 and Read 2 files for this study are available in the Sequence Read Archive (SRA) under Submission #SUB13878240 for BioProject #PRJNA1023296. If you have downloaded these files, you can proceed with ASV inference using DADA2:
- Download Demultiplexed Files: Obtain the demultiplexed Read 1 and Read 2 FASTQ files for each sample from SRA (BioProject #PRJNA1023296, Submission #SUB13878240).
- Download and install R and RStudio: Ensure you have a recent version of R and RStudio installed.
- Install Required R Packages: Open RStudio and install the necessary packages, particularly dada2 and phyloseq.
- Run Favia_workflow_dada2_to_phyloseq.R:
  - Open the Favia_workflow_dada2_to_phyloseq.R script in RStudio.
  - Download the favia_metadata_2.csv table containing the metadata for each sample and save it into an accessible directory.
  - Modify the file paths within the script to point to your downloaded demultiplexed Read 1 and Read 2 FASTQ files and to the metadata table.
  - Run the script. This script implements the DADA2 pipeline, including quality filtering, dereplication, ASV inference, chimera removal, and taxonomic assignment.
- Outputs of DADA2: The script will generate a phyloseq object. This object encapsulates:
  - ASV Table: A matrix of amplicon sequence variants (ASVs) by samples, containing the abundance of each ASV in each sample.
  - Taxonomy Table: A table assigning taxonomic classifications (e.g., Kingdom, Phylum, Class, Order, Family, Genus, Species) to each ASV.
  - Sample Data: A data frame containing metadata for each sample (e.g., experimental conditions, sampling location, etc.).
  - Phylogenetic Tree (Optional): Depending on the specific DADA2 pipeline implementation, a phylogenetic tree relating the ASVs may also be included.
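For orientation, the steps listed above follow the general shape of a standard DADA2 run. The sketch below is a generic outline under stated assumptions, not a copy of Favia_workflow_dada2_to_phyloseq.R: the wrapper name `run_dada2_sketch`, its arguments, the truncation lengths, the filtering thresholds, and the taxonomy reference file are all placeholders, and the settings actually used in this study should be taken from the supplied script.

```r
# Install once (commented out); dada2 and phyloseq are Bioconductor packages.
# install.packages("BiocManager")
# BiocManager::install(c("dada2", "phyloseq"))

# Hypothetical wrapper around the standard DADA2 steps. Every parameter value
# here is a placeholder, not a setting taken from this study.
run_dada2_sketch <- function(fwd, rev, sample_names, metadata_csv,
                             taxonomy_fasta, out_dir = "filtered",
                             trunc_len = c(240, 160)) {
  filt_fwd <- file.path(out_dir, paste0(sample_names, "_F_filt.fastq.gz"))
  filt_rev <- file.path(out_dir, paste0(sample_names, "_R_filt.fastq.gz"))
  names(filt_fwd) <- sample_names
  names(filt_rev) <- sample_names

  # Quality filtering and trimming of paired reads.
  # Note: samples with no reads passing the filter produce no output file.
  dada2::filterAndTrim(fwd, filt_fwd, rev, filt_rev, truncLen = trunc_len,
                       maxN = 0, maxEE = c(2, 2), truncQ = 2,
                       rm.phix = TRUE, compress = TRUE)

  # Learn error rates, then infer ASVs. When given file paths, dada()
  # dereplicates the reads internally.
  err_fwd <- dada2::learnErrors(filt_fwd)
  err_rev <- dada2::learnErrors(filt_rev)
  dada_fwd <- dada2::dada(filt_fwd, err = err_fwd)
  dada_rev <- dada2::dada(filt_rev, err = err_rev)

  # Merge read pairs, build the ASV table, and remove chimeras.
  merged <- dada2::mergePairs(dada_fwd, filt_fwd, dada_rev, filt_rev)
  seqtab <- dada2::makeSequenceTable(merged)
  seqtab_nochim <- dada2::removeBimeraDenovo(seqtab, method = "consensus")

  # Assign taxonomy against a reference FASTA supplied by the user.
  taxa <- dada2::assignTaxonomy(seqtab_nochim, taxonomy_fasta)

  # Combine ASV table, taxonomy, and metadata into one phyloseq object.
  # Assumes the "sample" column matches the sample names used above.
  meta <- read.csv(metadata_csv, row.names = "sample")
  phyloseq::phyloseq(
    phyloseq::otu_table(seqtab_nochim, taxa_are_rows = FALSE),
    phyloseq::tax_table(taxa),
    phyloseq::sample_data(meta)
  )
}

# Usage (all paths are placeholders):
# ps <- run_dada2_sketch(fwd_files, rev_files, sample_names,
#                        "favia_metadata_2.csv", "path/to/reference.fa.gz")
# saveRDS(ps, "my_favia.ps.RDS")
```

The functions are called with explicit `dada2::` and `phyloseq::` prefixes so the sketch makes clear which package each step comes from. A phylogenetic tree is not built here.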
Community Analysis and Visualization (from Phyloseq Object)
You can either use the phyloseq object generated from the previous step or directly use the supplied Favia.ps.RDS file to perform downstream analyses:
- Load Phyloseq Object:
  - If you generated your own phyloseq object, it will be in your R environment.
  - If using the supplied object, load it into R using readRDS("Favia.ps.RDS").
- Run Favia_workflow_phyloseq_analysis_visualization.R:
  - Open the Favia_workflow_phyloseq_analysis_visualization.R script in RStudio.
  - Run the script to reproduce the analyses and visualizations.
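As a quick sanity check before running the full analysis script, the supplied object can be loaded and inspected with a few phyloseq accessors. The sketch below is a generic illustration, not an excerpt from Favia_workflow_phyloseq_analysis_visualization.R: both helper names are hypothetical, the Bray-Curtis PCoA is just one common example of a community-level plot, and it assumes the sample data contain a `habitat` variable as described for favia_metadata_2.csv.

```r
# Hypothetical helper: load the phyloseq object and print a short summary.
# Requires the phyloseq package to be installed.
inspect_favia_ps <- function(rds_path = "Favia.ps.RDS") {
  ps <- readRDS(rds_path)
  stopifnot(methods::is(ps, "phyloseq"))
  cat("Samples:", phyloseq::nsamples(ps), "\n")
  cat("ASVs:", phyloseq::ntaxa(ps), "\n")
  cat("Metadata variables:",
      paste(phyloseq::sample_variables(ps), collapse = ", "), "\n")
  invisible(ps)
}

# Hypothetical helper: Bray-Curtis PCoA on relative abundances, colored by
# the "habitat" variable (assumed to exist in the sample data).
# Samples with zero total reads would give NaN values and should be
# removed beforehand.
plot_habitat_ordination <- function(ps) {
  ps_rel <- phyloseq::transform_sample_counts(ps, function(x) x / sum(x))
  ord <- phyloseq::ordinate(ps_rel, method = "PCoA", distance = "bray")
  phyloseq::plot_ordination(ps_rel, ord, color = "habitat")
}

# Usage (assumes Favia.ps.RDS is in the working directory):
# ps <- inspect_favia_ps("Favia.ps.RDS")
# print(plot_habitat_ordination(ps))
```

`plot_ordination()` returns a ggplot object, so the figure can be further customized or saved with the usual ggplot2 tools.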
