Regional transcriptomic divergence reveals thermal adaptation mechanisms in the giant kelp Macrocystis pyrifera
Data files
May 28, 2026 version files 35.78 MB
- matriz_Mp.txt (3.86 MB)
- README.md (9.82 KB)
- transcriptoma_Mp.fa (31.91 MB)
Abstract
This dataset contains a filtered set of Macrocystis pyrifera transcript sequences and raw expression counts derived from RNA-seq analysis of juvenile sporophytes exposed to contrasting thermal conditions. The data support expression-based analyses of thermal adaptation signatures across three geographically distinct Chilean populations (Atacama, Southern Chile, and Magallanes) spanning a temperature gradient from 16 °C to 8 °C. The transcriptome comprises 39,928 transcript sequences that passed quality thresholds after mapping to a reference assembly of 63,219 transcripts from M. pyrifera gametophyte and sporophyte life stages. Of these 39,928 transcripts, 67.8 % (27,094 transcripts) have a functional annotation and 38.9 % have Gene Ontology (GO) terms associated with biological processes. A raw expression count matrix is provided for the 32 sequenced samples, which are identified by region of origin and temperature treatment (8 °C, 12 °C, or 16 °C). Importantly, this dataset represents neither a canonical reference genome nor the complete raw assembly; instead, it includes the subset of 39,928 transcripts that passed strict quality control, contaminant filtering, and expression thresholds across all experimental treatments. The transcriptomic data derive from juvenile sporophytes generated through the common garden experiment described in Solas et al. (2024), which established population-level fitness differences consistent with local thermal adaptation. This supplementary dataset extends those phenotypic findings by characterizing the underlying gene expression signatures. No ethical considerations apply to this repository. The data are freely available for research use with appropriate attribution to this dataset and associated publication.
Dataset DOI: 10.5061/dryad.hqbzkh1xh
Description of the data and file structure
Dataset overview
This dataset contains a filtered set of Macrocystis pyrifera transcript sequences, a raw expression count matrix, and associated sample information used for RNA-seq-based analyses of thermal responses in juvenile sporophytes. The dataset was generated to support expression-based analyses derived from the experimental work of Solas et al. (2024), who obtained the organisms and conducted the common garden experiment across contrasting temperature regimes and regional origins in Chile.
The transcript sequences deposited here correspond to an experimentally validated subset of transcripts retained after mapping RNA-seq reads to a previously published de novo transcriptome assembly and filtering for expression and quality thresholds.
Experimental design summary
Juvenile sporophytes were produced through a reciprocal common garden experiment using parental kelps collected from three climatically distinct coastal regions of Chile: Atacama, Southern Chile, and Magallanes. Gametophytes were isolated from multiple parental sporophytes per region, pooled within regions, and crossed to generate region-specific offspring while minimizing individual parental effects.
Juvenile sporophytes were initially reared under standardized laboratory conditions and subsequently exposed for one month to three experimental temperature treatments (8 °C, 12 °C, and 16 °C). These treatments represent both native (local) and non-native (foreign) thermal environments for each regional origin. RNA was extracted after exposure and used for RNA sequencing.
Files and variables
Data files
1. transcriptoma_Mp.fa
- File type: FASTA
- Description: This file contains 39,928 transcript sequences of Macrocystis pyrifera representing the experimentally validated, expressed subset of a larger de novo transcriptome assembly. Transcripts were retained after mapping RNA-seq reads to an initial raw assembly (63,219 contigs) and filtering based on expression and quality thresholds.
- Content: Each FASTA entry corresponds to a single transcript sequence. Sequence headers contain unique transcript identifiers.
- Units: Nucleotide sequences (A, T, G, C, and N).
- Notes: Sequences include both coding and non-coding regions. No repeat masking or coding sequence prediction was applied. This file does not represent a complete reference genome.
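As a quick integrity check on the FASTA file described above, the number of entries can be counted and compared with the stated 39,928 transcripts. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `summarize_fasta` are illustrative and not part of this dataset, and the sketch assumes the transcript identifier is the first whitespace-delimited token of each header line.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Assumption: the identifier is the first token after ">".
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(path: str) -> Dict[str, int]:
    """Count entries and total bases in a FASTA file."""
    with open(path) as handle:
        lengths = [len(seq) for _, seq in read_fasta(handle)]
    return {"n_transcripts": len(lengths), "total_bases": sum(lengths)}
```

Calling `summarize_fasta("transcriptoma_Mp.fa")` should report 39,928 under `n_transcripts` if the downloaded file matches the description above.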
2. matriz_Mp.txt
- File type: Text file (matrix format)
- Description: Raw read count matrix generated after mapping RNA-seq reads to the reference transcriptome. This file is essential for reproducing differential expression analyses.
- Structure:
  - First row (header): Contains the sample IDs (e.g., MP_01, MP_02...). These match the IDs listed in the "Sample Information" table below.
  - First column: Contains the transcript identifiers (matching the headers in `transcriptoma_Mp.fa`).
  - Cell values: Raw integer counts representing the number of reads mapped to each transcript per sample.
- Notes: Data are raw counts and have not been normalized (e.g., TMM, RPKM, or TPM normalization has not been applied).
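Because the matrix holds raw counts, any comparison across samples needs a normalization step first. The sketch below reads a matrix with the layout described above and applies a simple counts-per-million (CPM) library-size scaling. It is an illustration only: it assumes the file is tab-delimited (the delimiter is not stated in this README), the function names are hypothetical, and CPM stands in for whichever normalization the associated study used.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def read_counts(lines: Iterable[str], delimiter: str = "\t") -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (sample_ids, {transcript_id: counts}) from count-matrix lines."""
    rows = [row for row in csv.reader(lines, delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    # If the header is as wide as the data rows, its first cell labels the ID column;
    # otherwise the header is assumed to list sample IDs only.
    samples = header[1:] if body and len(header) == len(body[0]) else header
    counts = {row[0]: [int(value) for value in row[1:]] for row in body}
    return samples, counts


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each sample (column) so that its counts sum to one million."""
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    return {
        tid: [1e6 * c / t if t else 0.0 for c, t in zip(row, totals)]
        for tid, row in counts.items()
    }
```

A typical call would be `with open("matriz_Mp.txt", newline="") as fh: samples, counts = read_counts(fh)`. If the file turns out to be space- or comma-delimited, pass the appropriate `delimiter`.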
Sample Information (Metadata)
Since no separate metadata file is provided, the experimental conditions for each sample ID used in matriz_Mp.txt are detailed below.
Variables:
- Sample ID: Unique identifier assigned to each juvenile sporophyte sample.
- Temperature treatment: Experimental seawater temperature exposure (8, 12, or 16 °C).
- Region: Climatic region of origin of the parental sporophytes (Atacama, Southern Chile, Magallanes).
Sample Metadata Table:
| Sample ID | Temperature treatment | Region |
|---|---|---|
| MP_01 | 16 °C | Atacama |
| MP_02 | 16 °C | Atacama |
| MP_03 | 12 °C | Atacama |
| MP_04 | 8 °C | Atacama |
| MP_05 | 16 °C | Atacama |
| MP_06 | 8 °C | Atacama |
| MP_26 | 12 °C | Atacama |
| MP_31 | 8 °C | Atacama |
| MP_38 | 12 °C | Atacama |
| MP_27 | 12 °C | Magallanes |
| MP_29 | 16 °C | Magallanes |
| MP_30 | 12 °C | Magallanes |
| MP_32 | 8 °C | Magallanes |
| MP_33 | 16 °C | Magallanes |
| MP_34 | 12 °C | Magallanes |
| MP_35 | 16 °C | Magallanes |
| MP_36 | 16 °C | Magallanes |
| MP_37 | 12 °C | Magallanes |
| MP_39 | 8 °C | Magallanes |
| MP_41 | 8 °C | Magallanes |
| MP_43 | 8 °C | Magallanes |
| MP_44 | 8 °C | Southern Chile |
| MP_46 | 12 °C | Southern Chile |
| MP_50 | 12 °C | Southern Chile |
| MP_51 | 16 °C | Southern Chile |
| MP_56 | 8 °C | Southern Chile |
| MP_57 | 8 °C | Southern Chile |
| MP_58 | 12 °C | Southern Chile |
| MP_63 | 16 °C | Southern Chile |
| MP_64 | 12 °C | Southern Chile |
| MP_70 | 16 °C | Southern Chile |
| MP_74 | 8 °C | Southern Chile |
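Since no separate metadata file is shipped, the table above has to be transcribed by hand before it can be joined to the count matrix. The sketch below encodes the same 32 sample IDs as a Python mapping and tallies replicates per region and temperature. The names `DESIGN`, `sample_metadata`, and `replicates_per_group` are illustrative; the sample assignments are copied from the table.

```python
from collections import Counter
from typing import Dict, Tuple

# Sample IDs transcribed from the Sample Metadata Table,
# grouped as region -> temperature (°C) -> sample IDs.
DESIGN = {
    "Atacama": {
        8: ["MP_04", "MP_06", "MP_31"],
        12: ["MP_03", "MP_26", "MP_38"],
        16: ["MP_01", "MP_02", "MP_05"],
    },
    "Magallanes": {
        8: ["MP_32", "MP_39", "MP_41", "MP_43"],
        12: ["MP_27", "MP_30", "MP_34", "MP_37"],
        16: ["MP_29", "MP_33", "MP_35", "MP_36"],
    },
    "Southern Chile": {
        8: ["MP_44", "MP_56", "MP_57", "MP_74"],
        12: ["MP_46", "MP_50", "MP_58", "MP_64"],
        16: ["MP_51", "MP_63", "MP_70"],
    },
}


def sample_metadata(design=DESIGN) -> Dict[str, Tuple[str, int]]:
    """Flatten the design into {sample_id: (region, temperature_c)}."""
    return {
        sample_id: (region, temperature)
        for region, by_temperature in design.items()
        for temperature, sample_ids in by_temperature.items()
        for sample_id in sample_ids
    }


def replicates_per_group(metadata: Dict[str, Tuple[str, int]]) -> Counter:
    """Count samples for each (region, temperature) combination."""
    return Counter(metadata.values())
```

The tally shows that the design is slightly unbalanced: Atacama has three replicates per temperature, while Magallanes and Southern Chile have four, except for Southern Chile at 16 °C, which has three.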
Missing data
- No missing values are present in the deposited FASTA file or count matrix.
- Sample metadata variables are complete for all samples listed above.
Compressed files
- This dataset does not contain compressed archives (.zip or .gz).
- All files are provided in uncompressed, plain-text formats to facilitate reuse.
Code/software
Software and workflow
Software required to view the data
The deposited data files are provided in standard formats (FASTA, Text) and can be opened using any standard text editor, spreadsheet software (e.g., Excel, LibreOffice), or common open-source bioinformatics tools (R, Python). No proprietary software is required to view or reuse the data.
Software used for data processing
RNA-seq data processing up to the mapping step was performed using standard open-source bioinformatics software. The following tools and versions were used:
- FastQC v0.12.0: quality assessment of raw sequencing reads.
- Trimmomatic v0.39: removal of adapter sequences and trimming of low-quality bases.
- PRINSEQ v0.20.x: additional quality filtering.
- Kraken2 v2.1.2: detection and removal of potential contaminant reads using the prebuilt RefSeq index (Standard-FULL database).
- STAR v2.1.3: mapping of RNA-seq reads to the reference transcriptome.
All software listed above is freely available and widely used in RNA-seq workflows. Analyses were conducted using command-line environments and the Galaxy platform.
Workflow overview
- Raw RNA-seq reads were assessed for quality using FastQC.
- Adapter sequences and low-quality bases were removed using Trimmomatic.
- Reads were further filtered using PRINSEQ to remove low-quality sequences.
- Potential contaminant reads were identified and removed using Kraken2.
- Filtered reads were mapped against a reference Macrocystis pyrifera transcriptome using STAR.
- Mapped reads were used to generate the raw count matrix (matriz_Mp.txt) and identify transcripts showing detectable expression.
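The last step, retaining transcripts with detectable expression, can be illustrated with a simple count-based filter. This is a sketch under stated assumptions, not the authors' procedure: the exact expression thresholds are not given in this README, so `min_count` and `min_samples` below are hypothetical parameters, and `filter_expressed` is an illustrative name.

```python
from typing import Dict, List


def filter_expressed(
    counts: Dict[str, List[int]],
    min_count: int = 1,    # hypothetical threshold: reads required in a sample
    min_samples: int = 1,  # hypothetical threshold: samples that must reach min_count
) -> Dict[str, List[int]]:
    """Keep transcripts with at least min_count reads in at least min_samples samples."""
    return {
        transcript_id: row
        for transcript_id, row in counts.items()
        if sum(value >= min_count for value in row) >= min_samples
    }


# Toy example: three transcripts across three samples.
toy_counts = {"t1": [0, 0, 0], "t2": [5, 0, 1], "t3": [10, 12, 9]}
kept = filter_expressed(toy_counts, min_count=5, min_samples=2)
print(sorted(kept))  # only t3 reaches 5 reads in at least 2 samples
```

With the default thresholds the filter simply drops transcripts that received no reads in any sample. Stricter settings trade sensitivity to lowly expressed transcripts for robustness against mapping noise.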
Code and scripts
No custom scripts or code files are included with this dataset. All processing steps were performed using standard software implementations and default or commonly used parameters. Detailed command-line settings and downstream statistical analyses are described in the associated study documentation.
Access information
Raw RNA-seq data
Raw RNA-seq reads corresponding to the juvenile sporophyte samples described in this dataset are publicly available through the NCBI Sequence Read Archive (SRA):
- NCBI BioProject accession: PRJNA1055545
- Repository: NCBI Sequence Read Archive (SRA)
- Access URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1055545
These raw sequencing reads were generated for the present study and were used to identify and validate expressed transcripts retained in the filtered transcriptome deposited here.
Source transcriptome assembly
The filtered transcript set deposited in this dataset was derived from a previously published de novo transcriptome assembly of Macrocystis pyrifera generated by Molano et al. (2022). The original assembly comprised 63,219 transcript contigs assembled from both gametophyte and sporophyte life stages and was used as the initial reference framework for read mapping and expression-based filtering.
- Original data source: Molano et al. (2022)
- Data repository: NCBI
- Associated BioProject: PRJNA661280
- Access URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA661280
The present dataset represents a derived, expression-supported subset of this original assembly, generated by mapping newly produced RNA-seq reads and retaining only transcripts showing detectable expression and passing quality control thresholds.
License and reuse conditions
Raw RNA-seq data deposited in the NCBI SRA and the original transcriptome assembly by Molano et al. (2022) are publicly available for research use under the access and reuse conditions specified by their respective repositories. The filtered transcriptome dataset deposited in Dryad is released under an open data license to facilitate reuse, reproducibility, and comparative analyses.
References
- Molano, G. et al. Selection intensity on sporophyte vs gametophyte genes in giant kelp. Front. Mar. Sci. 8, 774076 (2022).
- Solas, M. et al. Assessment of local adaptation and outbreeding risks in contrasting thermal environments of the giant kelp, Macrocystis pyrifera. J. Appl. Phycol. 36, 471–483 (2024).
