Regional transcriptomic divergence reveals thermal adaptation mechanisms in the giant kelp Macrocystis pyrifera

Guillemin, Marie-Laure 1

Research facility: Pontificia Universidad Católica de Chile

Published May 28, 2026 on Dryad. https://doi.org/10.5061/dryad.hqbzkh1xh

Data files

May 28, 2026 version files (35.78 MB total):

matriz_Mp.txt (3.86 MB)
README.md (9.82 KB)
transcriptoma_Mp.fa (31.91 MB)

Abstract

This dataset contains a filtered set of Macrocystis pyrifera transcript sequences and raw expression counts derived from RNA-seq analysis of juvenile sporophytes exposed to contrasting thermal conditions. The data support expression-based analyses of thermal adaptation signatures across three geographically distinct Chilean populations (Atacama, Southern Chile, and Magallanes) spanning a temperature gradient from 16 °C to 8 °C. The transcriptome comprises 39,928 transcript sequences that passed quality thresholds after mapping to a reference assembly of 63,219 transcripts from M. pyrifera gametophyte and sporophyte life stages. Of these, 67.8 % (27,094 transcripts) have a functional annotation and 38.9 % have Gene Ontology (GO) terms associated with biological processes. A raw expression count matrix is provided for the 32 sequenced samples, organized by region of origin and temperature treatment (8 °C, 12 °C, or 16 °C). Importantly, this dataset represents neither a canonical reference genome nor the complete raw assembly; it contains the subset of 39,928 transcripts that passed strict quality control, contaminant filtering, and expression thresholds across all experimental treatments. The transcriptomic data derive from juvenile sporophytes generated through the common garden experiment described in Solas et al. (2024), which established population-level fitness differences consistent with local thermal adaptation. This supplementary dataset extends those phenotypic findings by characterizing the underlying gene expression signatures. No ethical considerations apply to this repository. The data are freely available for research use with appropriate attribution to this dataset and the associated publication.

Dataset DOI: 10.5061/dryad.hqbzkh1xh

Description of the data and file structure

Dataset overview

This dataset contains a filtered set of Macrocystis pyrifera transcript sequences, a raw expression count matrix, and associated sample information used for RNA-seq–based analyses of thermal responses in juvenile sporophytes. The dataset was generated to support expression-based analyses derived from the experimental work of Solas et al. (2024), who obtained the organisms and conducted the common garden experiment across contrasting temperature regimes and regional origins in Chile.

The transcript sequences deposited here correspond to an experimentally validated subset of transcripts retained after mapping RNA-seq reads to a previously published de novo transcriptome assembly and filtering for expression and quality thresholds.

Experimental design summary

Juvenile sporophytes were produced through a reciprocal common garden experiment using parental kelps collected from three climatically distinct coastal regions of Chile: Atacama, Southern Chile, and Magallanes. Gametophytes were isolated from multiple parental sporophytes per region, pooled within regions, and crossed to generate region-specific offspring while minimizing individual parental effects.

Juvenile sporophytes were initially reared under standardized laboratory conditions and subsequently exposed for one month to three experimental temperature treatments (8 °C, 12 °C, and 16 °C). These treatments represent both native (local) and non-native (foreign) thermal environments for each regional origin. RNA was extracted after exposure and used for RNA sequencing.

Files and variables

Data files

1. transcriptoma_Mp.fa

File type: FASTA
Description:

This file contains 39,928 transcript sequences of Macrocystis pyrifera representing the experimentally validated, expressed subset of a larger de novo transcriptome assembly. Transcripts were retained after mapping RNA-seq reads to an initial raw assembly (63,219 contigs) and filtering based on expression and quality thresholds.

Content:

Each FASTA entry corresponds to a single transcript sequence. Sequence headers contain unique transcript identifiers.

Units:

Nucleotide sequences (A, T, G, C, and N).

Notes:

Sequences include both coding and non-coding regions. No repeat masking or coding sequence prediction was applied. This file does not represent a complete reference genome.
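
As a quick integrity check after download, the number of FASTA records can be compared with the 39,928 transcripts stated above. The following is a minimal sketch using only the Python standard library. The function names `read_fasta` and `check_alphabet` are illustrative, and taking the identifier as the header text up to the first whitespace is an assumption about this file's header layout.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {identifier: sequence}."""
    records = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Assumption: the identifier is the header text up to the first whitespace.
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def check_alphabet(records, allowed="ATGCN"):
    """Return identifiers whose sequences contain characters outside `allowed`."""
    allowed_set = set(allowed)
    return [tid for tid, seq in records.items() if set(seq) - allowed_set]


# Example with the deposited file (path assumed to be the working directory):
# with open("transcriptoma_Mp.fa") as handle:
#     transcripts = read_fasta(handle)
# print(len(transcripts))             # the README states 39,928 entries
# print(check_alphabet(transcripts))  # an empty list means only A, T, G, C, N occur
```

Note that a dict keeps one sequence per identifier, so a record count below 39,928 could also indicate duplicated headers rather than missing sequences.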

2. matriz_Mp.txt

File type: Text file (Matrix format)
Description:

Raw read count matrix generated after mapping RNA-seq reads to the reference transcriptome. This file is essential for reproducing differential expression analyses.

Structure:

- First Row (Header): Contains the Sample IDs (e.g., MP_01, MP_02...). These match the IDs listed in the "Sample Information" table below.

- First Column: Contains the Transcript identifiers (matching the headers in `transcriptoma_Mp.fa`).

- Cell Values: Raw integer counts representing the number of reads mapped to each transcript per sample.

Notes:

Data are raw counts; no normalization (e.g., TMM, RPKM, or TPM) has been applied.
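
The matrix layout described above can be read with the Python standard library alone. The sketch below is illustrative, not part of the deposited workflow: it assumes the file is tab-delimited (the README says only "Text file (Matrix format)"), and it accepts a header row with or without a label above the transcript column. The counts-per-million (CPM) step is a generic library-size scaling shown as an example of a first normalization; it is not the method used in the associated study.

```python
import csv


def read_count_matrix(lines, delimiter="\t"):
    """Return (sample_ids, counts); counts maps transcript ID -> list of ints."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    counts = {}
    for row in reader:
        if not row:
            continue
        counts[row[0]] = [int(value) for value in row[1:]]
    # The header may or may not include a label for the transcript column,
    # so keep only as many trailing header fields as there are count columns.
    n_samples = len(next(iter(counts.values()))) if counts else len(header)
    sample_ids = header[-n_samples:] if n_samples else []
    return sample_ids, counts


def library_sizes(counts, n_samples):
    """Total mapped reads per sample (column sums)."""
    totals = [0] * n_samples
    for row in counts.values():
        for i, value in enumerate(row):
            totals[i] += value
    return totals


def counts_per_million(counts, totals):
    """Scale each count by its sample's library size, per million reads."""
    return {
        tid: [value * 1e6 / total if total else 0.0
              for value, total in zip(row, totals)]
        for tid, row in counts.items()
    }


# Example with the deposited file (tab delimiter is an assumption):
# with open("matriz_Mp.txt", newline="") as handle:
#     sample_ids, counts = read_count_matrix(handle)
# totals = library_sizes(counts, len(sample_ids))
# cpm = counts_per_million(counts, totals)
```

For differential expression, count-based tools generally expect the raw integers as input, so scaled values such as CPM are better suited to exploration and plotting.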

Sample Information (Metadata)

Since no separate metadata file is provided, the experimental conditions for each sample ID used in matriz_Mp.txt are detailed below.

Variables:

Sample ID: Unique identifier assigned to each juvenile sporophyte sample.
Temperature treatment: Experimental seawater temperature exposure (8, 12, or 16 °C).
Region: Climatic region of origin of the parental sporophytes (Atacama, Southern Chile, Magallanes).

Sample Metadata Table:

Sample ID	Temperature treatment	Region
MP_01	16 °C	Atacama
MP_02	16 °C	Atacama
MP_03	12 °C	Atacama
MP_04	8 °C	Atacama
MP_05	16 °C	Atacama
MP_06	8 °C	Atacama
MP_26	12 °C	Atacama
MP_31	8 °C	Atacama
MP_38	12 °C	Atacama
MP_27	12 °C	Magallanes
MP_29	16 °C	Magallanes
MP_30	12 °C	Magallanes
MP_32	8 °C	Magallanes
MP_33	16 °C	Magallanes
MP_34	12 °C	Magallanes
MP_35	16 °C	Magallanes
MP_36	16 °C	Magallanes
MP_37	12 °C	Magallanes
MP_39	8 °C	Magallanes
MP_41	8 °C	Magallanes
MP_43	8 °C	Magallanes
MP_44	8 °C	Southern Chile
MP_46	12 °C	Southern Chile
MP_50	12 °C	Southern Chile
MP_51	16 °C	Southern Chile
MP_56	8 °C	Southern Chile
MP_57	8 °C	Southern Chile
MP_58	12 °C	Southern Chile
MP_63	16 °C	Southern Chile
MP_64	12 °C	Southern Chile
MP_70	16 °C	Southern Chile
MP_74	8 °C	Southern Chile
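
Because no separate metadata file is provided, it can help to transcribe the table above into a machine-readable structure before analysis. The sketch below copies the 32 rows as listed and tabulates replicates per region and temperature. The names `SAMPLE_TABLE`, `parse_samples`, and `temperature_c` are illustrative choices, not identifiers from the dataset.

```python
from collections import Counter

# Rows transcribed from the sample metadata table: sample ID, temperature (°C), region.
SAMPLE_TABLE = """\
MP_01 16 Atacama
MP_02 16 Atacama
MP_03 12 Atacama
MP_04 8 Atacama
MP_05 16 Atacama
MP_06 8 Atacama
MP_26 12 Atacama
MP_31 8 Atacama
MP_38 12 Atacama
MP_27 12 Magallanes
MP_29 16 Magallanes
MP_30 12 Magallanes
MP_32 8 Magallanes
MP_33 16 Magallanes
MP_34 12 Magallanes
MP_35 16 Magallanes
MP_36 16 Magallanes
MP_37 12 Magallanes
MP_39 8 Magallanes
MP_41 8 Magallanes
MP_43 8 Magallanes
MP_44 8 Southern Chile
MP_46 12 Southern Chile
MP_50 12 Southern Chile
MP_51 16 Southern Chile
MP_56 8 Southern Chile
MP_57 8 Southern Chile
MP_58 12 Southern Chile
MP_63 16 Southern Chile
MP_64 12 Southern Chile
MP_70 16 Southern Chile
MP_74 8 Southern Chile
"""


def parse_samples(text):
    """Map each sample ID to its temperature treatment and region of origin."""
    samples = {}
    for line in text.strip().splitlines():
        # maxsplit=2 keeps multi-word regions such as "Southern Chile" intact.
        sample_id, temperature, region = line.split(maxsplit=2)
        samples[sample_id] = {"temperature_c": int(temperature), "region": region}
    return samples


SAMPLES = parse_samples(SAMPLE_TABLE)

# Number of replicates in each region x temperature cell of the design.
DESIGN = Counter((s["region"], s["temperature_c"]) for s in SAMPLES.values())

for (region, temperature), n in sorted(DESIGN.items()):
    print(f"{region}\t{temperature} °C\t{n} samples")
```

Tabulated this way, the design is slightly unbalanced: Atacama has three samples per temperature, Magallanes has four, and Southern Chile has four at 8 °C and 12 °C but three at 16 °C.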

Missing data

No missing values are present in the deposited FASTA file or count matrix.
Sample metadata variables are complete for all samples listed above.
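
The completeness statements above can be verified on a local copy. The sketch below is one possible check, written under stated assumptions: it takes the matrix cells as raw strings that have already been split by whatever delimiter the file uses, together with the identifiers read from the FASTA headers. The function name `validate_matrix` and its argument layout are hypothetical.

```python
def validate_matrix(sample_ids, rows, fasta_ids):
    """Collect human-readable problems; an empty list means every check passed.

    sample_ids: list of column names from the matrix header.
    rows: dict mapping transcript ID -> list of raw string cells.
    fasta_ids: iterable of identifiers taken from the FASTA headers.
    """
    problems = []
    for tid, cells in rows.items():
        # Every transcript should have exactly one value per sample.
        if len(cells) != len(sample_ids):
            problems.append(
                f"{tid}: expected {len(sample_ids)} values, found {len(cells)}"
            )
        # Raw counts should be non-negative integers; empty or "NA" cells fail here.
        for sample, cell in zip(sample_ids, cells):
            if not cell.strip().isdecimal():
                problems.append(f"{tid}/{sample}: not a non-negative integer: {cell!r}")
    # Transcript identifiers should match between the matrix and the FASTA file.
    fasta_set = set(fasta_ids)
    for tid in sorted(set(rows) - fasta_set):
        problems.append(f"{tid}: in matrix but absent from FASTA")
    for tid in sorted(fasta_set - set(rows)):
        problems.append(f"{tid}: in FASTA but absent from matrix")
    return problems
```

An empty result for the deposited files would confirm that the matrix has no missing cells and that its first column matches the headers in transcriptoma_Mp.fa.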

Compressed files

This dataset does not contain compressed archives (.zip or .gz).
All files are provided in uncompressed, plain-text formats to facilitate reuse.

Code/software

Software and workflow

Software required to view the data

The deposited data files are provided in standard formats (FASTA, Text) and can be opened using any standard text editor, spreadsheet software (e.g., Excel, LibreOffice), or common open-source bioinformatics tools (R, Python). No proprietary software is required to view or reuse the data.

Software used for data processing

RNA-seq data processing up to the mapping step was performed using standard open-source bioinformatics software. The following tools and versions were used:

FastQC v0.12.0 – quality assessment of raw sequencing reads.
Trimmomatic v0.39 – removal of adapter sequences and trimming of low-quality bases.
PRINSEQ v0.20.x – additional quality filtering.
Kraken2 v2.1.2 – detection and removal of potential contaminant reads using a Prebuilt Refseq index: Standard-FULL database.
STAR v2.1.3 – mapping of RNA-seq reads to the reference transcriptome.

All software listed above is freely available and widely used in RNA-seq workflows. Analyses were conducted using command-line environments and the Galaxy platform.

Workflow overview

Raw RNA-seq reads were assessed for quality using FastQC.
Adapter sequences and low-quality bases were removed using Trimmomatic.
Reads were further filtered using PRINSEQ to remove low-quality sequences.
Potential contaminant reads were identified and removed using Kraken2.
Filtered reads were mapped against a reference Macrocystis pyrifera transcriptome using STAR.
Mapped reads were used to generate the raw count matrix (matriz_Mp.txt) and identify transcripts showing detectable expression.
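
The README does not state the threshold used to decide which transcripts show "detectable expression". Purely to illustrate what a count-based filter of this kind looks like, the sketch below keeps transcripts with at least `min_count` reads in at least `min_samples` samples. Both parameters and their default values are hypothetical and should not be read as the criteria applied to produce this dataset.

```python
def filter_expressed(counts, min_count=1, min_samples=1):
    """Keep transcripts with >= min_count reads in >= min_samples samples.

    counts: dict mapping transcript ID -> list of integer read counts.
    min_count, min_samples: hypothetical thresholds, chosen for illustration only.
    """
    kept = {}
    for tid, row in counts.items():
        n_passing = sum(1 for value in row if value >= min_count)
        if n_passing >= min_samples:
            kept[tid] = row
    return kept


# Toy example with three transcripts and three samples.
toy = {"a": [0, 0, 0], "b": [0, 2, 0], "c": [5, 7, 9]}
print(sorted(filter_expressed(toy)))                              # ['b', 'c']
print(sorted(filter_expressed(toy, min_count=5, min_samples=2)))  # ['c']
```

Stricter settings trade sensitivity for robustness: requiring reads in several samples removes transcripts supported by a single library.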

Code and scripts

No custom scripts or code files are included with this dataset. All processing steps were performed using standard software implementations and default or commonly used parameters. Detailed command-line settings and downstream statistical analyses are described in the associated study documentation.

Access information

Raw RNA-seq data

Raw RNA-seq reads corresponding to the juvenile sporophyte samples described in this dataset are publicly available through the NCBI Sequence Read Archive (SRA):

NCBI BioProject accession: PRJNA1055545
Repository: NCBI Sequence Read Archive (SRA)
Access URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1055545

These raw sequencing reads were generated for the present study and were used to identify and validate expressed transcripts retained in the filtered transcriptome deposited here.

Source transcriptome assembly

The filtered transcript set deposited in this dataset was derived from a previously published de novo transcriptome assembly of Macrocystis pyrifera generated by Molano et al. (2022). The original assembly comprised 63,219 transcript contigs assembled from both gametophyte and sporophyte life stages and was used as the initial reference framework for read mapping and expression-based filtering.

Original data source: Molano et al. (2022)
Data repository: NCBI
Associated BioProject: PRJNA661280
Access URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA661280

The present dataset represents a derived, expression-supported subset of this original assembly, generated by mapping newly produced RNA-seq reads and retaining only transcripts showing detectable expression and passing quality control thresholds.

License and reuse conditions

Raw RNA-seq data deposited in the NCBI SRA and the original transcriptome assembly by Molano et al. (2022) are publicly available for research use under the access and reuse conditions specified by their respective repositories. The filtered transcriptome dataset deposited in Dryad is released under an open data license to facilitate reuse, reproducibility, and comparative analyses.

References

Molano, G. et al. Selection intensity on sporophyte vs gametophyte genes in giant kelp. Front. Mar. Sci. 8, 774076 (2022).
Solas, M. et al. Assessment of local adaptation and outbreeding risks in contrasting thermal environments of the giant kelp, Macrocystis pyrifera. J. Appl. Phycol. 36, 471–483 (2024).
