Orchids are renowned for their spectacular flowers and ecological adaptations. Following the sequencing of the genome of the tropical epiphytic orchid Phalaenopsis equestris, we combined RNA-Seq on the Illumina HiSeq2000 with de novo assembly by Trinity to characterize the transcriptomes of 11 diverse P. equestris tissues: root, stem, leaf, flower bud, column, lip, petal, sepal and three developmental stages of seeds. Our aims were to contribute to a better understanding of the molecular mechanisms underlying the characteristics of the analysed tissues and to enrich the data available for P. equestris. Here, we present three datasets. The first dataset contains the raw RNA-Seq reads, which can be reanalysed with different approaches. The other two datasets support different types of searches for candidate homologues. The second dataset includes the assembled unigenes and the predicted coding sequences and proteins, enabling a sequence-based search. The third dataset consists of the annotation results obtained by aligning the unigenes, with low E-values, against the Nonredundant (Nr) protein database, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and the Clusters of Orthologous Groups (COG) database, enabling a name-based search.
P. equestris genome assembly
The P. equestris genome scaffolds and a file describing the positions of the scaffolds or contigs within each superscaffold.
Pha_1213.scafSeq.FG2_superscaffold.tar.gz
P. equestris genome repeat annotation
The P. equestris genome repeat annotation, which contains the repeat annotation files produced by proteinmasker, RepeatMasker and TRF, the corresponding GFF files, the GFF file of the de novo repeat annotation and an xlsx file with the repeat annotation statistics.
pequ_repeat_dataset1.tar.gz
P. equestris genome gene models
The P. equestris genome gene models, comprising the predicted coding sequences, the predicted proteins and a GFF file.
pequ_gene_models_dataset1.tar
P. equestris genome functional annotation
The P. equestris genome functional annotation dataset contains the BLAST results against the KEGG, InterPro, Swiss-Prot and TrEMBL databases.
pequ_functional_annotation_dataset1.tar
The transcriptome assembly
The dataset contains the unigenes, taken as the longest contig per transcript generated by Trinity. The fb.flower bud.Unigene.fa file contains unigenes from the flower bud of P. equestris, the L5.root.Unigene.fa file contains unigenes from the root, the L6.stem.Unigene.fa file contains unigenes from the stem and the PHA.leaf.Unigene.fa file contains unigenes from the leaf. The 12_day.unigene.fasta, 7_day.unigene.fasta and 4_day.unigene.fasta files contain unigenes from seeds collected 12, 7 and 4 days after sowing on 1/2 MS medium, respectively. The sepal.unigene.fasta, petal.unigene.fasta, lip.unigene.fasta and column.unigene.fasta files contain unigenes from the sepal, petal, lip and column, respectively.
unigene_dataset3.tar
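As an illustration of how the per-tissue unigene files might be inspected after unpacking, the following is a minimal sketch in Python. It assumes only that the files are plain FASTA text; the helper names (read_fasta, n50, summarize) are hypothetical and are not part of the dataset, and the example sequences are invented for the demonstration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Return the length L such that sequences of length >= L hold half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(handle):
    """Return the unigene count, total length and N50 for one FASTA handle."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    return {"count": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


# Invented toy input standing in for one unigene file.
example = io.StringIO(">u1\nACGT\nAC\n>u2\nGG\n>u3\nACGTACGTAC\n")
print(summarize(example))
```

For a real file, the same function could be applied to an opened handle, for example with open("L5.root.Unigene.fa") as handle: summarize(handle), assuming the archive has been extracted into the working directory.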
The transcriptome functional annotation
The dataset contains the functional annotation and the coding sequence annotation for the 11 tissues. There are five annotation files per tissue: three functional annotation files (the KEGG, COG and Nr database annotations) and two structural annotation files (cds and pep). The cds and pep files are in FASTA format; each sequence title contains the unigene name of the predicted coding sequence, the locus and the coding direction.
annotation_dataset4.tar.gz
HSP gene families in the eleven transcriptomes
To examine the completeness of the data, we compared full-length transcripts of the HSP90 and HSP70 gene families from the 11 tissue transcriptomes with the P. equestris genome. PEQU denotes P. equestris; flower bud, root, stem and leaf are labelled fb, L5, L6 and PHA, respectively. 4_day_seed, 7_day_seed and 12_day_seed are seeds collected 4, 7 and 12 days after sowing on 1/2 MS medium, respectively.
HSP_dataset5.tar
100 CEGs for checking transcript assembly completeness
The alignment results for 100 randomly selected conserved core eukaryotic genes (CEGs) among Arabidopsis thaliana, the P. equestris genome and the eleven transcriptomes, used to examine the completeness of the transcript assemblies. Eighty-two CEG sequences (82%) were perfectly reconstructed, showing high consistency. However, some sequences suggest that parts are missing from the PEQU genome, such as the homologues of At2g36880.1 and At1g12840.1, and some sequences in the transcriptomes should be merged, such as the homologues of At4g39280.1.
CEGs_dataset6.tar
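The completeness figure quoted above is a simple proportion, which can be sketched as follows. This is an illustration only: the status labels ("complete", "other") and the function name are hypothetical, and the dataset does not necessarily record per-CEG outcomes in this form. The only numbers taken from the description are 82 perfectly reconstructed CEGs out of 100.

```python
from collections import Counter


def completeness(statuses):
    """Return (counts, percent_complete) for a list of per-CEG status labels."""
    counts = Counter(statuses)
    total = sum(counts.values())
    # Guard against an empty list to avoid division by zero.
    percent = 100.0 * counts["complete"] / total if total else 0.0
    return counts, percent


# 82 of the 100 CEGs were perfectly reconstructed; the remaining 18 are
# lumped under a hypothetical "other" label for this illustration.
statuses = ["complete"] * 82 + ["other"] * 18
counts, percent = completeness(statuses)
print(f"{counts['complete']} of {sum(counts.values())} CEGs: {percent:.0f}%")
```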