Data from: First insights into the transcriptome and development of new genomic tools of a widespread circum-Mediterranean tree species, Pinus halepensis Mill.
Pinosio, Sara et al. (2014), Data from: First insights into the transcriptome and development of new genomic tools of a widespread circum-Mediterranean tree species, Pinus halepensis Mill., Dryad, Dataset, https://doi.org/10.5061/dryad.vb131
Pinus halepensis is a relevant conifer species for studying adaptive responses to drought and fire regimes in the Mediterranean region. Deciphering the molecular basis of the adaptation of Aleppo pine to the Mediterranean environment is therefore needed. In this study, we performed Illumina RNA-Seq of two Pinus halepensis accessions that are phenotypically divergent for adaptive traits linked to fire adaptation and drought, with the aims of i) characterizing the transcriptome, ii) performing a functional annotation of the assembled transcriptome, iii) identifying genes with accelerated evolutionary rates, iv) studying the expression levels of the annotated genes, and v) developing gene-based markers for population genomic and association genetic studies. The assembled transcriptome consisted of 48,629 contigs and covered about 54.6 Mbp. The comparison of P. halepensis transcripts to Picea sitchensis protein-coding sequences resulted in the detection of 34,014 SNPs across species, with an average Ka/Ks value of 0.216, suggesting that the majority of the assembled genes are under negative selection. Among the assembled genes, those involved in protein synthesis were over-represented in expression. Several genes were differentially expressed between the two pine accessions with contrasting phenotypes, including those encoding glutathione S-transferase, cellulose synthase and the cobra-like protein. A large number of new markers (8,248 SSRs and 28,236 SNPs) have been identified, which should facilitate future population genomics and association genetics in this species. Our results show that Illumina next-generation sequencing is a valuable technology for obtaining an extensive overview of whole transcriptomes of non-model species with large genomes.