Cryptic CAM photosynthesis in Joshua tree (Yucca brevifolia, Y. jaegeriana)
Data files
Aug 15, 2025 version files 256.64 MB
- 2024-04-17_counts_norm.txt (60.39 MB)
- 2024-04-17_counts_raw.txt (30.14 MB)
- 2024-04-17_tpm_norm.txt (60.93 MB)
- 2024-04-17_tpm_raw.txt (60.93 MB)
- full_RNA_metadata_dryad.csv (6.71 KB)
- primaryYucca_jaegeriana_transcripts.fa (44.24 MB)
- README.md (5.29 KB)
Abstract
Joshua trees are an iconic species of the Mojave Desert, but face threats from changes in climate and land use. Here we uncover cryptic Crassulacean acid metabolism (CAM) photosynthesis in Joshua trees using a common garden experiment, and use genomic data to understand other metabolic differences between populations and between the two species of Joshua tree. Our Dryad data package includes RNA sequencing data from a common garden experiment on both Joshua tree species (Yucca brevifolia and Yucca jaegeriana). Samples were taken in 2022 from a single garden and were combined with ecophysiological measurements and metabolomics. Our results indicate low-level CAM activity in all populations of Joshua tree, as well as strong differentiation in aspects of carbon metabolism between the two species.
Dataset DOI: 10.5061/dryad.7pvmcvf3d
Description of the data and file structure
To understand the presence, variability, and strength of Crassulacean acid metabolism (CAM) photosynthesis, we collected RNA-sequencing data in 2022 from plants of both Joshua tree species (Yucca brevifolia, Yucca jaegeriana) growing in a common garden. The data package includes only the computed read count and transcripts per million (TPM) values for samples collected in 2022, plus the reference used for read mapping. The 2021 data are available as raw reads only. Raw reads for both the 2021 and 2022 collections are available on SRA under BioProject PRJNA1132710.
Files and variables
File: 2024-04-17_counts_raw.txt
Description: Raw count data as computed by sleuth from 2022 samples mapped to the primary transcript reference file.
Variables
- Columns: Library ID. See "full_RNA_metadata_dryad.csv" for information.
- Rows: primary transcript ID
File: 2024-04-17_tpm_raw.txt
Description: Raw TPM data as computed by sleuth from 2022 samples mapped to the primary transcript reference file.
Variables
- Columns: Library ID. See "full_RNA_metadata_dryad.csv" for information.
- Rows: primary transcript ID
File: 2024-04-17_counts_norm.txt
Description: Normalized count data as computed by sleuth from 2022 samples mapped to the primary transcript reference file.
Variables
- Columns: Library ID. See "full_RNA_metadata_dryad.csv" for information.
- Rows: primary transcript ID
File: 2024-04-17_tpm_norm.txt
Description: Normalized TPM data as computed by sleuth from 2022 samples mapped to the primary transcript reference file.
Variables
- Columns: Library ID. See "full_RNA_metadata_dryad.csv" for information.
- Rows: primary transcript ID
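The four matrices share one layout (transcripts in rows, libraries in columns), so a single loader can serve all of them. The following is a minimal Python sketch, assuming the files are tab-delimited with a header row of library IDs and the transcript ID in the first column. The delimiter is not stated in this README, so inspect the first few lines of a file before relying on it. `read_matrix` is a hypothetical helper name, and the demo values are made up.

```python
import csv
import io

def read_matrix(handle, delimiter="\t"):
    """Return (library_ids, {transcript_id: [values as floats]})."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    library_ids = None
    rows = {}
    for fields in reader:
        if not fields:
            continue
        if library_ids is None:
            # Tables written from R may omit the corner cell, leaving the
            # header one field shorter than the data rows. Handle both cases.
            if len(header) == len(fields) - 1:
                library_ids = header
            else:
                library_ids = header[1:]
        rows[fields[0]] = [float(x) for x in fields[1:]]
    if library_ids is None:
        library_ids = header[1:]
    return library_ids, rows

# Tiny in-memory stand-in for a real file (placeholder values).
demo = io.StringIO("transcript_id\tlib1\tlib2\nt1\t10\t0\nt2\t3.5\t7\n")
library_ids, values = read_matrix(demo)

# With the real data (assumed tab-delimited):
# with open("2024-04-17_tpm_raw.txt", newline="") as fh:
#     library_ids, values = read_matrix(fh)
```

Returning the library IDs separately keeps the column order available for matching against the `sample` column of full_RNA_metadata_dryad.csv.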
File: full_RNA_metadata_dryad.csv
Description: Metadata for each sequenced library (library IDs are the column headers in the TPM and count files).
Variables
- sample: library ID (internal number)
- sourcepop: full name of source population
- populationcode: Abbreviation for the source population from which the individual came.
- time: Time of day the sample was collected, AM (morning) or PM (night)
- species: Which of the two species the sample came from (Eastern = Y. jaegeriana, Western = Y. brevifolia, hybrid = hybrid of the two.)
- tag: the unique tag identifier for the plant
- matriline: matriline information for the plant
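Because the metadata maps each library ID to a species and a sampling time, grouping libraries by those two fields is a natural first step before comparing AM and PM expression. The sketch below uses only the Python standard library and assumes the CSV header matches the variable names listed above exactly. The rows in `demo_csv` are made-up placeholders, not values from the dataset, and `group_libraries` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

def group_libraries(handle):
    """Map (species, time) to the list of library IDs in that group."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[(row["species"], row["time"])].append(row["sample"])
    return dict(groups)

# Placeholder rows for illustration only.
demo_csv = io.StringIO(
    "sample,sourcepop,populationcode,time,species,tag,matriline\n"
    "101,Example Pop A,EPA,AM,Eastern,T1,M1\n"
    "102,Example Pop A,EPA,PM,Eastern,T1,M1\n"
    "103,Example Pop B,EPB,AM,Western,T2,M2\n"
    "104,Example Pop B,EPB,AM,Western,T3,M2\n"
)
groups = group_libraries(demo_csv)

# With the real data:
# with open("full_RNA_metadata_dryad.csv", newline="") as fh:
#     groups = group_libraries(fh)
```

Library IDs are kept as strings so they can be compared directly with the column headers of the count and TPM files.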
File: primaryYucca_jaegeriana_transcripts.fa
Description: Transcript file derived from a preliminary annotation of the Y. jaegeriana genome. Transcripts are only the primary isoforms annotated per locus.
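A quick consistency check is to confirm that the row IDs in the count and TPM files correspond to the sequence IDs in this FASTA file. The Python sketch below parses FASTA headers with the standard library only. It assumes the transcript ID is the header text up to the first whitespace, which is a common convention but is not confirmed by this README. `fasta_lengths` is a hypothetical helper and the demo sequences are made up.

```python
import io

def fasta_lengths(handle):
    """Return {transcript_id: sequence_length} for a FASTA file handle."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths

# Placeholder records for illustration only.
demo_fasta = io.StringIO(">t1 example description\nACGTAC\nGT\n>t2\nAAAA\n")
lengths = fasta_lengths(demo_fasta)

# Compare against matrix row IDs (here a made-up set).
matrix_ids = {"t1", "t2", "t3"}
missing_from_fasta = sorted(matrix_ids - set(lengths))

# With the real data:
# with open("primaryYucca_jaegeriana_transcripts.fa") as fh:
#     lengths = fasta_lengths(fh)
```

An empty `missing_from_fasta` list would indicate that every matrix row has a matching reference transcript.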
Genome assembly:
A total of 136.8 Gb of PacBio CCS reads were generated using a PacBio Sequel II at the HudsonAlpha Institute for Biotechnology. PacBio HiFi libraries had insert sizes ranging from 17 to 23.6 kb. An estimated 44.1X coverage of the genome was generated. Initial assembly was completed using hifiasm 0.19.8 with default parameters.
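The yield and coverage figures above jointly imply a genome size, since coverage is total sequenced bases divided by genome size. The arithmetic below is a back-of-the-envelope check in Python, not a genome size estimate reported in this README.

```python
# Coverage = total bases / genome size, so genome size = total bases / coverage.
total_bases_gb = 136.8   # PacBio CCS yield stated above
coverage = 44.1          # estimated coverage stated above

implied_genome_size_gb = total_bases_gb / coverage
print(f"Implied genome size: {implied_genome_size_gb:.2f} Gb")  # 3.10 Gb
```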
Genome annotation:
The assembly was annotated using BRAKER v.3.0.4 with gene models from *Yucca aloifolia* (Ya24Inoko_839 v.2.1) and *Yucca filamentosa* (YfilamentosaC3pri_837_v.2.1) and evidence from transcriptomes generated in this manuscript. RepeatModeler v.2.0.3 (Flynn et al., 2020) and RepeatMasker v.4.1.2-p1 were run with default parameters to identify the consensus repeat families and to softmask the repeat regions in the genome, respectively. TSEBRA v.1.1.2.1 (Gabriel et al., 2021) was used to retrieve BRAKER's filtered single-exon genes. We ran InterProScan v.5.19-58.0 (Jones et al., 2014) to obtain protein evidence for the genes in the TSEBRA output. All single-exon genes lacking any protein evidence were filtered out of the gff3 file using AGAT v.0.7.0 (Dainat, 2021). The gff3 file output by AGAT was used as the final annotation and for downstream analysis.
Dainat J. 2021. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format (v0.8.0). Zenodo.
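The final filtering rule (drop single-exon genes that lack protein evidence) can be illustrated with a small Python sketch. This is not how AGAT is implemented and is not the authors' script. It only shows the logic on GFF3-style lines, assuming exon features carry a `Parent` attribute naming their transcript. `transcripts_to_keep` and the `with_evidence` set (standing in for transcripts with InterProScan hits) are hypothetical, and the demo records are made up.

```python
from collections import Counter

def parse_attributes(field):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def transcripts_to_keep(gff3_lines, with_evidence):
    """Keep multi-exon transcripts, plus single-exon ones with evidence."""
    exon_counts = Counter()
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        parents = parse_attributes(cols[8]).get("Parent", "")
        for parent in parents.split(","):
            if parent:
                exon_counts[parent] += 1
    return {t for t, n in exon_counts.items() if n > 1 or t in with_evidence}

# Placeholder GFF3 records for illustration only.
demo_gff3 = [
    "##gff-version 3",
    "chr1\tsrc\texon\t1\t100\t.\t+\t.\tID=e1;Parent=tA",
    "chr1\tsrc\texon\t200\t300\t.\t+\t.\tID=e2;Parent=tA",
    "chr1\tsrc\texon\t500\t900\t.\t-\t.\tID=e3;Parent=tB",
    "chr1\tsrc\texon\t950\t990\t.\t-\t.\tID=e4;Parent=tC",
]
kept = transcripts_to_keep(demo_gff3, with_evidence={"tB"})
```

In this demo, `tA` is kept because it has two exons, `tB` is kept because it has evidence, and `tC` is dropped.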
Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences of the United States of America 117: 9451–9457.
Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. 2021. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics 22: 566.
Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics 30: 1236–1240.
Code/software
All data can be viewed in standard text editors.
Access information
Other publicly accessible locations of the data:
- NCBI SRA BioProject PRJNA1132710
