Background: Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to these environments, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and because technologies for ethanol production from agave already exist, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and as models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits.
Results: Here, we present comprehensive, high-quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support the completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies the biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis.
Conclusions: Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and for the improvement of agaves for bioenergy development.
Agave deserti transcriptome profiling (RPKM values)
A table of the normalized RPKM values for Agave deserti transcripts in various tissues.
a_deserti_normalized_rpkm.txt.gz
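For readers unfamiliar with the unit, the following minimal sketch computes RPKM (reads per kilobase of transcript per million mapped reads) from its standard definition. It does not reproduce whatever additional normalization was applied to produce the values in this table; the function name `rpkm` and its parameter names are illustrative assumptions.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads per kilobase of transcript per million mapped reads.

    Illustrative helper only; any further normalization used for the
    published table is not described here and is not modeled.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    # = reads * 1e9 / (length in bp * mapped reads)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 2,000 bp transcript in a library of 10 million mapped reads
    print(rpkm(500, 2000, 10_000_000))
```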
Sequences of Agave deserti commensal organisms and contaminants
FASTA file of assembled contigs that DO NOT appear to be Agave deserti sequences. Possible sources are commensal microbes, lab contaminants, and mis-assemblies.
agave_deserti_contaminants.fa.gz
Pfam and InterPro annotations of Agave deserti proteins
Table of Pfam and InterPro annotations for Agave deserti proteins.
agave_deserti_pfam_interpro_annotations.txt.gz
Sequences of Agave deserti proteins
FASTA file of Agave deserti proteins.
agave_deserti_proteins.fa.gz
Sequences of Agave deserti transcript contigs
FASTA of Agave deserti transcript contigs.
agave_deserti.fa.gz
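The gzipped FASTA files in this dataset can be read without decompressing them first. The sketch below uses only the Python standard library; the helper names `read_fasta` and `summarize` are illustrative, and the file name passed to `summarize` is assumed to be a local copy of one of the files listed here.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize(path: str) -> Tuple[int, int]:
    """Return (number of sequences, total sequence length) for a gzipped FASTA."""
    with gzip.open(path, "rt") as handle:
        lengths = [len(seq) for _, seq in read_fasta(handle)]
    return len(lengths), sum(lengths)


# Usage, assuming the file has been downloaded to the working directory:
# n_contigs, total_bp = summarize("agave_deserti.fa.gz")
```

The same reader applies to the protein, contaminant, and Pacific Biosciences FASTA files, since it makes no assumption about the sequence alphabet.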
Sequences of Agave tequilana commensal organisms and contaminants
Assembled contigs that DO NOT display evidence of Agave tequilana origin. Likely sources include commensal microbes, lab contaminants, and mis-assemblies.
agave_tequilana_contaminants.fa.gz
Agave tequilana transcriptome profiling (RPKM values)
Table of normalized RPKM values for Agave tequilana transcripts across various tissues.
agave_tequilana_normalized_rpkm.txt.gz
Pfam and InterPro annotations of Agave tequilana proteins
Table of Pfam and InterPro annotations for Agave tequilana proteins.
agave_tequilana_pfam_interpro_annotations.txt.gz
Sequences of Agave tequilana proteins
FASTA file of Agave tequilana proteins.
agave_tequilana_proteins.fa.gz
Sequences of Agave tequilana transcript contigs
FASTA file of Agave tequilana assembled transcript contigs.
agave_tequilana.fa.gz
Pacific Biosciences long-read sequences, unaltered
Pacific Biosciences sequencing of Agave tequilana transcriptome. These are unassembled subreads directly output from a Pacific Biosciences RS.
17618_filtered_subreads.fasta.gz
Illumina-corrected Pacific Biosciences long-read data
FASTA file of Pacific Biosciences subreads from 17618_filtered_subreads.fasta.gz after error correction with Agave tequilana Illumina data. Only subreads that passed the filtering metrics are reported.
corrected_pacbio.fasta.gz
OrthoMCL clustering
OrthoMCL clusters of Agave high-confidence proteins and proteins from the Phytozome Tester Set. Abbreviations are as follows:
ateq - A. tequilana, ades - A. deserti, atha - A. thaliana, bdis - B. distachyon, crei - C. reinhardtii, gmax - G. max, mtru - M. truncatula, osat - O. sativa, ptri - P. trichocarpa, rcom - R. communis, sbic - S. bicolor, sita - S. italica, zmay - Z. mays
orthomcl_clusters.txt.gz
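As a rough sketch of how the cluster file might be explored, the code below assumes the common OrthoMCL group layout, in which each line reads `group_id: species|protein species|protein ...`, and treats lines beginning with `#` as comments. The actual layout of orthomcl_clusters.txt.gz has not been verified here, so the parser, the function names, and the example identifiers are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple

AGAVE_CODES = {"ateq", "ades"}  # abbreviations given in the description above


def parse_orthomcl(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Parse 'group: sp|prot sp|prot' lines into {group: [(species, protein), ...]}.

    Assumed format; lines starting with '#' and blank lines are skipped.
    """
    clusters: Dict[str, List[Tuple[str, str]]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        group_id, _, members = line.partition(":")
        parsed = []
        for member in members.split():
            species, _, protein = member.partition("|")
            parsed.append((species, protein))
        clusters[group_id.strip()] = parsed
    return clusters


def agave_only_groups(clusters: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Return IDs of groups whose members all come from the two Agave species."""
    return [
        group_id
        for group_id, members in clusters.items()
        if members and all(species in AGAVE_CODES for species, _ in members)
    ]


# Usage, assuming a local copy of the file:
# import gzip
# with gzip.open("orthomcl_clusters.txt.gz", "rt") as handle:
#     clusters = parse_orthomcl(handle)
# print(len(agave_only_groups(clusters)))
```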
Comparison of Agave and Phytozome Tester Set protein lengths
The file agave_and_PTS_protein_lengths.txt contains the median protein lengths of the Agave species and the other species in the Phytozome Tester Set, grouped by OrthoMCL orthologous protein group (also known as PlantOG). Abbreviations are as follows: ades = A. deserti, ateq = A. tequilana, atha = A. thaliana, bdis = B. distachyon, crei = C. reinhardtii, gmax = G. max, mtru = M. truncatula, osat = O. sativa, ptri = P. trichocarpa, rcom = R. communis, sbic = S. bicolor, sita = S. italica, zmay = Z. mays
agave_and_PTS_protein_lengths.txt.gz
Agave deserti Variant Call Format (VCF)
VCF file of SNPs and indels of the Agave deserti transcriptome, using the most abundant transcript isoform (v1) contigs as a reference sequence.
agave_deserti.vcf.gz
Agave tequilana Variant Call Format (VCF) file
VCF file of SNPs and indels in the Agave tequilana transcriptome, using the most abundant transcript isoform (v1) contigs as a reference sequence.
agave_tequilana.vcf.gz
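A quick way to inspect either VCF file is to tally SNPs and indels from the REF and ALT columns. The sketch below relies only on the standard VCF column layout (tab-separated, with REF in column 4 and ALT in column 5, and header lines starting with `#`); the function name and the classification rules are simplifying assumptions, with multi-nucleotide substitutions, symbolic alleles, and records lacking an ALT allele counted as "other".

```python
import gzip
from typing import Dict, Iterable


def count_variant_types(lines: Iterable[str]) -> Dict[str, int]:
    """Count SNP, indel, and other records in VCF text lines."""
    counts = {"snp": 0, "indel": 0, "other": 0}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        alts = fields[4].split(",")
        if any(a in (".", "*") or a.startswith("<") for a in alts):
            counts["other"] += 1  # missing, spanning-deletion, or symbolic allele
        elif len(ref) == 1 and all(len(a) == 1 for a in alts):
            counts["snp"] += 1  # single-base substitution(s)
        elif any(len(a) != len(ref) for a in alts):
            counts["indel"] += 1  # at least one allele changes the length
        else:
            counts["other"] += 1  # e.g. same-length multi-nucleotide change
    return counts


# Usage, assuming a local copy of the file:
# with gzip.open("agave_tequilana.vcf.gz", "rt") as handle:
#     print(count_variant_types(handle))
```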