Full-length transcriptomes of 25 grassland plant species
Data files
Version of Apr 17, 2025 (1.92 GB in total):
| File | Size |
|---|---|
| BMK230426-BJ202-01P0001.flnc.clustered.isoforms.fa | 81.55 MB |
| BMK230426-BJ202-01P0002.flnc.clustered.isoforms.fa | 97.97 MB |
| BMK230426-BJ202-01P0003.flnc.clustered.isoforms.fa | 33.58 MB |
| BMK230426-BJ202-01P0004.flnc.clustered.isoforms.fa | 71.25 MB |
| BMK230426-BJ202-01P0005.flnc.clustered.isoforms.fa | 22.54 MB |
| BMK230426-BJ202-01P0006.flnc.clustered.isoforms.fa | 62.58 MB |
| BMK230426-BJ202-01P0007.flnc.clustered.isoforms.fa | 62 MB |
| BMK230426-BJ202-01P0008.flnc.clustered.isoforms.fa | 37.53 MB |
| BMK230426-BJ202-01P0009.flnc.clustered.isoforms.fa | 80.56 MB |
| BMK230426-BJ202-01P0010.flnc.clustered.isoforms.fa | 69.16 MB |
| BMK230426-BJ202-01P0011.flnc.clustered.isoforms.fa | 83.04 MB |
| BMK230426-BJ202-01P0012.flnc.clustered.isoforms.fa | 119.28 MB |
| BMK230426-BJ202-01P0013.flnc.clustered.isoforms.fa | 77.85 MB |
| BMK230426-BJ202-01P0014.flnc.clustered.isoforms.fa | 78.11 MB |
| BMK230426-BJ202-01P0015.flnc.clustered.isoforms.fa | 97.39 MB |
| BMK230426-BJ202-01P0016.flnc.clustered.isoforms.fa | 77.75 MB |
| BMK230426-BJ202-01P0017.flnc.clustered.isoforms.fa | 90.45 MB |
| BMK230426-BJ202-01P0018.flnc.clustered.isoforms.fa | 101.49 MB |
| BMK230426-BJ202-01P0019.flnc.clustered.isoforms.fa | 72.86 MB |
| BMK230426-BJ202-01P0020.flnc.clustered.isoforms.fa | 86.90 MB |
| BMK230426-BJ202-01P0021.flnc.clustered.isoforms.fa | 49.06 MB |
| BMK230426-BJ202-01P0022.flnc.clustered.isoforms.fa | 115.51 MB |
| BMK230426-BJ202-01P0023.flnc.clustered.isoforms.fa | 107.95 MB |
| BMK230426-BJ202-01P0024.flnc.clustered.isoforms.fa | 40.13 MB |
| BMK230426-BJ202-01P0025.flnc.clustered.isoforms.fa | 104.29 MB |
| README.md | 7.38 KB |
Abstract
Grasslands are essential, biodiverse ecosystems of economic importance that play a critical role in carbon storage and soil health. Despite this importance, transcriptomic resources for wild grassland species that would facilitate eco-evolutionary and functional genomic studies remain limited. In this study, we present full-length transcriptomes for shoot tissue of natural accessions of 25 wild grassland plant species collected from the field site of a long-term grassland biodiversity experiment (Jena Experiment). Using PacBio Iso-Seq technology, we generated a total of 522.45 million subreads, which were assembled into isoforms for each species separately. This resulted in an average of 49,180 isoforms per species, of which 68.6% were successfully annotated against the Swiss-Prot database. Fifty-six percent of the transcripts had complete open reading frames (ORFs), and 29.6% of the transcripts were identified as non-coding RNAs (ncRNAs) by two prediction tools. This dataset provides a valuable full-length transcriptomic resource for exploring gene expression, alternative splicing, and evolutionary patterns in wild grassland plant species, paving the way for future functional genomics and conservation studies.
https://doi.org/10.5061/dryad.z08kprrpv
Description of the data and file structure
This dataset contains 25 FASTA files, each holding the final assembled Iso-Seq isoforms for one grassland plant species.
The file name for each species is listed below:
| Species | Assembly |
|---|---|
| Lotus corniculatus | BMK230426-BJ202-01P0001.flnc.clustered.isoforms.fa |
| Medicago variegata | BMK230426-BJ202-01P0002.flnc.clustered.isoforms.fa |
| Trisetum flavescens | BMK230426-BJ202-01P0003.flnc.clustered.isoforms.fa |
| Crepis biennis | BMK230426-BJ202-01P0004.flnc.clustered.isoforms.fa |
| Geranium pratense | BMK230426-BJ202-01P0005.flnc.clustered.isoforms.fa |
| Alopecurus pratensis | BMK230426-BJ202-01P0006.flnc.clustered.isoforms.fa |
| Daucus carota | BMK230426-BJ202-01P0007.flnc.clustered.isoforms.fa |
| Trifolium pratense | BMK230426-BJ202-01P0008.flnc.clustered.isoforms.fa |
| Primula veris | BMK230426-BJ202-01P0009.flnc.clustered.isoforms.fa |
| Knautia arvensis | BMK230426-BJ202-01P0010.flnc.clustered.isoforms.fa |
| Ranunculus acris | BMK230426-BJ202-01P0011.flnc.clustered.isoforms.fa |
| Plantago media | BMK230426-BJ202-01P0012.flnc.clustered.isoforms.fa |
| Prunella vulgaris | BMK230426-BJ202-01P0013.flnc.clustered.isoforms.fa |
| Plantago lanceolata | BMK230426-BJ202-01P0014.flnc.clustered.isoforms.fa |
| Bellis perennis | BMK230426-BJ202-01P0015.flnc.clustered.isoforms.fa |
| Veronica chamaedrys | BMK230426-BJ202-01P0016.flnc.clustered.isoforms.fa |
| Holcus lanatus | BMK230426-BJ202-01P0017.flnc.clustered.isoforms.fa |
| Lathyrus pratensis | BMK230426-BJ202-01P0018.flnc.clustered.isoforms.fa |
| Luzula campestris | BMK230426-BJ202-01P0019.flnc.clustered.isoforms.fa |
| Galium mollugo agg. | BMK230426-BJ202-01P0020.flnc.clustered.isoforms.fa |
| Leucanthemum vulgare agg. | BMK230426-BJ202-01P0021.flnc.clustered.isoforms.fa |
| Ajuga reptans | BMK230426-BJ202-01P0022.flnc.clustered.isoforms.fa |
| Trifolium dubium | BMK230426-BJ202-01P0023.flnc.clustered.isoforms.fa |
| Vicia cracca | BMK230426-BJ202-01P0024.flnc.clustered.isoforms.fa |
| Arrhenatherum elatius | BMK230426-BJ202-01P0025.flnc.clustered.isoforms.fa |
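As one way to start working with these files, the sketch below parses an assembly in pure Python and reports the number of isoforms together with basic length statistics. The helper names (`assembly_name`, `read_fasta`, `summarize`) are hypothetical and not part of the dataset. The file-name pattern is inferred from the table above, and the files are assumed to be uncompressed plain-text FASTA located in the working directory.

```python
from pathlib import Path
from statistics import mean
from typing import Dict, Iterator, Tuple


def assembly_name(sample_number: int) -> str:
    """Build a file name from the numbering pattern seen in the table (1..25)."""
    return f"BMK230426-BJ202-01P{sample_number:04d}.flnc.clustered.isoforms.fa"


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a plain-text FASTA file."""
    header = None
    chunks = []
    with open(path, "r", encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                # Keep only the first whitespace-delimited token as the ID.
                parts = line[1:].split()
                header = parts[0] if parts else ""
                chunks = []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(path: Path) -> Dict[str, float]:
    """Count isoforms and compute simple length statistics for one assembly."""
    lengths = [len(seq) for _, seq in read_fasta(path)]
    return {
        "isoforms": len(lengths),
        "mean_length": mean(lengths) if lengths else 0.0,
        "max_length": max(lengths, default=0),
    }


if __name__ == "__main__":
    # Sample 1 is Lotus corniculatus according to the table.
    fasta = Path(assembly_name(1))
    if fasta.exists():
        print(fasta.name, summarize(fasta))
    else:
        print(f"{fasta.name} not found; download it from the dataset first.")
```

The parser reads one record at a time, so only the sequence lengths are held in memory, which should be adequate for files of the sizes listed here.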
The plant samples were collected from the Jena Experiment, a long-term biodiversity experiment in a grassland ecosystem. Only shoot tissues were harvested for RNA extraction and sequencing. The sequencing was performed using the PacBio Sequel II platform.
Circular Consensus Sequences (CCS) were generated from subreads using the ccs tool (version 6.4.0) with default parameters; ccs builds high-fidelity consensus reads from multiple passes over each molecule. On average, 191,105 CCS reads were obtained per species (Table 1). These CCS reads were then processed with lima (version 2.9.0, using the --isoseq option) to remove sequencing adapters and barcodes. Poly(A) tails and artificial concatemers were removed using the isoseq tool (version 4.0.0), yielding Full-Length Non-Chimeric (FLNC) reads. The FLNC reads were clustered using isoseq cluster to generate polished isoforms, with the --singletons option enabled to retain singleton transcripts.
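The processing steps described above can be sketched as an ordered list of command lines. The Python snippet below only assembles and prints the commands; it does not run them and is not the authors' actual script. All file names, the `prefix` argument, and the function name `isoseq_commands` are hypothetical. The `--require-polya` flag on `isoseq refine` is an assumption based on the statement that poly(A) tails were removed; only `--isoseq` and `--singletons` are named in the text.

```python
import shlex
from typing import List, Optional


def isoseq_commands(
    subreads_bam: str,
    primers_fasta: str,
    prefix: str,
    lima_output_bam: Optional[str] = None,
) -> List[List[str]]:
    """Return the four pipeline steps as argument lists (nothing is executed)."""
    ccs_bam = f"{prefix}.ccs.bam"
    fl_bam = f"{prefix}.fl.bam"
    flnc_bam = f"{prefix}.flnc.bam"
    clustered_bam = f"{prefix}.flnc.clustered.bam"
    # lima usually adds primer-pair names to its output file name, so the
    # file it actually wrote should be passed in as lima_output_bam.
    refine_input = lima_output_bam or fl_bam
    return [
        # 1. Consensus calling from subreads (default parameters).
        ["ccs", subreads_bam, ccs_bam],
        # 2. Adapter and barcode removal in Iso-Seq mode.
        ["lima", ccs_bam, primers_fasta, fl_bam, "--isoseq"],
        # 3. Poly(A) and concatemer removal, yielding FLNC reads.
        #    --require-polya is an assumption, not stated in the text.
        ["isoseq", "refine", refine_input, primers_fasta, flnc_bam,
         "--require-polya"],
        # 4. Clustering into isoforms, keeping singleton transcripts.
        ["isoseq", "cluster", flnc_bam, clustered_bam, "--singletons"],
    ]


if __name__ == "__main__":
    for cmd in isoseq_commands("sample.subreads.bam", "primers.fasta", "sample"):
        print(shlex.join(cmd))
```

Keeping each step as an argument list makes it straightforward to hand the commands to `subprocess.run` later, once the paths and options have been checked against the installed tool versions.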
Files and variables
File: BMK230426-BJ202-01P0024.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Vicia cracca
File: BMK230426-BJ202-01P0025.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Arrhenatherum elatius
File: BMK230426-BJ202-01P0023.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Trifolium dubium
File: BMK230426-BJ202-01P0022.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Ajuga reptans
File: BMK230426-BJ202-01P0021.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Leucanthemum vulgare agg.
File: BMK230426-BJ202-01P0020.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Galium mollugo agg.
File: BMK230426-BJ202-01P0019.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Luzula campestris
File: BMK230426-BJ202-01P0018.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Lathyrus pratensis
File: BMK230426-BJ202-01P0017.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Holcus lanatus
File: BMK230426-BJ202-01P0016.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Veronica chamaedrys
File: BMK230426-BJ202-01P0015.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Bellis perennis
File: BMK230426-BJ202-01P0013.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Prunella vulgaris
File: BMK230426-BJ202-01P0014.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Plantago lanceolata
File: BMK230426-BJ202-01P0012.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Plantago media
File: BMK230426-BJ202-01P0011.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Ranunculus acris
File: BMK230426-BJ202-01P0008.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Trifolium pratense
File: BMK230426-BJ202-01P0010.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Knautia arvensis
File: BMK230426-BJ202-01P0009.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Primula veris
File: BMK230426-BJ202-01P0007.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Daucus carota
File: BMK230426-BJ202-01P0005.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Geranium pratense
File: BMK230426-BJ202-01P0004.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Crepis biennis
File: BMK230426-BJ202-01P0006.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Alopecurus pratensis
File: BMK230426-BJ202-01P0002.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Medicago variegata
File: BMK230426-BJ202-01P0003.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Trisetum flavescens
File: BMK230426-BJ202-01P0001.flnc.clustered.isoforms.fa
Description: Final assembled IsoSeq data for Lotus corniculatus
Access information
Data was derived from the following sources:
- BioProject PRJNA1171640 on NCBI. This accession holds the raw data and will be released to the public as soon as the manuscript is published.
Subreads from the PacBio Sequel II platform were processed using the PacBio Iso-Seq pipeline. Circular Consensus Sequences (CCS) were generated from subreads using the ccs tool (version 6.4.0) with default parameters; ccs builds high-fidelity consensus reads from multiple passes over each molecule. These CCS reads were then processed with lima (version 2.9.0, using the --isoseq option) to remove sequencing adapters and barcodes. Poly(A) tails and artificial concatemers were removed using the isoseq tool (version 4.0.0), yielding Full-Length Non-Chimeric (FLNC) reads. The FLNC reads were clustered using isoseq cluster to generate polished isoforms, with the --singletons option enabled to retain singleton transcripts.
