
Data from: RNA-Seq reveals adaptive genetic potential of the rare Torrey pine (Pinus torreyana) in the face of Ips bark beetle outbreaks

Cite this dataset

Steele, Stephanie; Ryder, Oliver; Maschinski, Joyce (2022). Data from: RNA-Seq reveals adaptive genetic potential of the rare Torrey pine (Pinus torreyana) in the face of Ips bark beetle outbreaks [Dataset]. Dryad.


The ability of tree species to adapt to water stress and increased frequency of bark beetle outbreaks with climate change may increase with population size and standing genetic variation, calling into question the resilience of small, rare plant populations.  The Torrey pine (Pinus torreyana) is a rare, genetically depauperate conifer that occurs naturally in one mainland and one island population in southern California.  Due to recent declines in the mainland population coinciding with drought and Ips paraconfusus bark beetle outbreaks, the species would benefit from an assessment of adaptive genetic diversity.  Here, we use RNA-Seq to survey gene-coding diversity across 40 individuals to 1) characterize patterns of genetic diversity in the species and 2) test for genetic differentiation between trees that succumbed to beetle attack and trees that survived following an outbreak.  Consistent with previous studies, we found few genetic variants, with most SNPs occurring as fixed differences between populations.  However, we found structure within the mainland population and polymorphisms segregating in both populations.  Interestingly, we found differentiation in genotypes between attacked and surviving trees and 11 SNPs associated with survival status, several of which were in genes with defense-related functions.  While low diversity suggests limited adaptive capacity, genetic associations with survival in functionally relevant genes suggest adaptive potential for bark beetle defense.  This initial study prompts future research to explore the genetic basis of putative resistance and suggests conservation efforts should protect surviving genotypes and the full spectrum of genetic diversity across populations to preserve the evolutionary potential of the species.


Our dataset consisted of 40 Torrey pine trees, including 32 cases and controls that succumbed to attack or survived following a bark beetle outbreak, respectively, and eight additional trees for diversity estimates.  We filtered RNA-Seq FASTQ files using the HTStream pre-processing pipeline (UC Davis) and aligned reads to the Pinus taeda genome v2.01 downloaded from the TreeGenes database.  Because the genome assembly is large and highly fragmented, we sorted contigs by size and concatenated them into 222 ‘pseudocontigs’ of ~100 Mb each, with adjacent contigs separated by 10 N’s.  We then aligned processed Torrey pine reads to the P. taeda pseudocontigs using two-pass mode with the splice-aware aligner STAR v2.6.0.  We used Samtools v1.9 to retain only uniquely mapped reads and to remove PCR/optical duplicates, secondary/supplementary alignments, reads failing QC, and unpaired reads.  Refer to the manuscript for further data processing details.
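The pseudocontig step can be illustrated with a minimal Python sketch.  It is not the authors' script: the function name `build_pseudocontigs`, the greedy packing rule (close a pseudocontig once it reaches the target length), and the 0-based half-open coordinates are assumptions made for illustration.  Only the size sorting, the ~100 Mb target, and the 10-N spacer come from the description above.

```python
from typing import Dict, List, Tuple

GAP = "N" * 10  # spacer between adjacent contigs, as described in the methods


def build_pseudocontigs(
    contigs: Dict[str, str], target_len: int = 100_000_000
) -> Tuple[List[str], List[Tuple[int, int, int, str]]]:
    """Concatenate contigs (largest first) into pseudocontigs.

    Returns the pseudocontig sequences and BED-like rows of
    (pseudocontig number, start, stop, contig name), where start/stop are
    0-based half-open offsets within the pseudocontig (an assumption).
    """
    # Sort contigs by size, largest first (ordering direction is assumed).
    ordered = sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True)

    pseudo_seqs: List[str] = []
    bed_rows: List[Tuple[int, int, int, str]] = []
    parts: List[str] = []
    cur_len = 0

    for name, seq in ordered:
        # Assumed rule: close the current pseudocontig once it reaches the target.
        if parts and cur_len >= target_len:
            pseudo_seqs.append(GAP.join(parts))
            parts, cur_len = [], 0
        if parts:
            cur_len += len(GAP)  # account for the spacer before this contig
        start = cur_len
        cur_len += len(seq)
        parts.append(seq)
        bed_rows.append((len(pseudo_seqs) + 1, start, cur_len, name))

    if parts:
        pseudo_seqs.append(GAP.join(parts))
    return pseudo_seqs, bed_rows
```

With toy input such as `build_pseudocontigs({"a": "AAAA", "b": "CC", "c": "GGGGGG"}, target_len=8)`, the two largest contigs share the first pseudocontig and the smallest starts a second one; the returned rows record where each original contig sits.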

Usage notes


SNP VCF file of 4,750,889 raw SNPs called with GATK v4.1.4.1 HaplotypeCaller and GenotypeGVCFs for 40 Torrey pine samples.  CHROM and POS are in pseudocontig coordinates.


SNP VCF file with 9599 SNPs across 40 Torrey pine samples using pseudocontig coordinates.  Raw SNPs were hard-filtered in GATK (see manuscript methods) and further filtered in VCFtools as follows: genotypes with depth (DP) < 5 or genotype quality (GQ) < 20 were set to missing, and sites with mean DP < 15, minor allele frequency < 0.05, > 2 alleles, mean GQ < 44, or call rate < 95% were removed.  Following a modified dDocent pipeline, sites with high read depth that may contain paralogs were removed (loci with DP > 2 SD above the mean; loci with DP > 1 SD above the mean if QUAL < 2×DP).  This file was used in DAPC and GEMMA, and was subset by population as needed.
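The high-depth (putative paralog) rule can be written as a short Python sketch.  This is an illustration only, not the dDocent code or the authors' script: the function name `depth_keep_mask`, the use of the sample standard deviation, and the reading of "from the mean" as "above the mean" are assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def depth_keep_mask(sites: Sequence[Tuple[float, float]]) -> List[bool]:
    """Return True for each (DP, QUAL) site that passes the depth filter."""
    depths = [dp for dp, _ in sites]
    mu = mean(depths)
    sd = stdev(depths)  # sample SD; the original pipeline's choice is not stated

    keep = []
    for dp, qual in sites:
        # Rule 1: depth more than 2 SD above the mean is always removed.
        too_deep = dp > mu + 2 * sd
        # Rule 2: depth more than 1 SD above the mean is removed only when
        # the site quality is low relative to depth (QUAL < 2 x DP).
        deep_low_qual = dp > mu + sd and qual < 2 * dp
        keep.append(not (too_deep or deep_low_qual))
    return keep
```

A site with moderately elevated depth therefore survives if its QUAL is at least twice its depth, while a site with very high depth is dropped regardless of quality.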


LD-pruned SNP VCF file of 4192 SNPs across 40 Torrey pine samples.  The VCF file was processed as above with an additional step to prune SNPs in linkage disequilibrium (r² > 0.5 in a window of 5 SNPs) with BCFtools v1.9 using the original genome coordinates.  This file was used for diversity estimates, PC-AiR, DAPC, and GEMMA, and was subset by population as needed.
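To make the pruning criterion concrete, here is a simplified Python sketch of greedy window-based LD pruning.  It does not reproduce the BCFtools algorithm: the functions `r_squared` and `ld_prune` are hypothetical, r² is approximated as the squared Pearson correlation of genotype dosages (0/1/2), and the window is interpreted as the five most recently retained SNPs.  Only the thresholds (r² > 0.5, window of 5) come from the description above.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic site: correlation is undefined, treat as unlinked
    return (sxy * sxy) / (sxx * syy)


def ld_prune(
    genotypes: Sequence[Sequence[float]], max_r2: float = 0.5, window: int = 5
) -> List[int]:
    """Return indices of SNPs kept after greedy pruning.

    `genotypes` holds one dosage vector per SNP, in genomic order.
    A SNP is kept only if its r^2 with each of the last `window`
    retained SNPs is at most `max_r2`.
    """
    kept: List[int] = []
    for i, g in enumerate(genotypes):
        recent = kept[-window:]
        if all(r_squared(g, genotypes[j]) <= max_r2 for j in recent):
            kept.append(i)
    return kept
```

In this sketch, a SNP that is perfectly correlated (or perfectly anti-correlated) with a retained neighbor is dropped, because both cases give r² = 1.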


BED file used to convert between original and pseudocontig coordinates.  The four columns are the Torrey pine pseudocontig (1 – 222), the Pita.2_01 contig start and stop (within the pseudocontig), and the Pita.2_01 contig name, respectively.
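A coordinate lookup with this four-column layout might look like the following Python sketch.  The function names `build_index` and `pseudo_to_original` are hypothetical, and the sketch assumes standard BED conventions (0-based, half-open start/stop) together with 1-based VCF positions; check the file itself before relying on either assumption.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Interval = Tuple[int, int, str]  # (start, stop, original contig name)


def build_index(bed_rows: Iterable[Tuple[int, int, int, str]]) -> Dict[int, List[Interval]]:
    """Group BED rows by pseudocontig and sort the intervals by start."""
    index: Dict[int, List[Interval]] = defaultdict(list)
    for pseudo, start, stop, name in bed_rows:
        index[pseudo].append((start, stop, name))
    for intervals in index.values():
        intervals.sort()
    return index


def pseudo_to_original(
    index: Dict[int, List[Interval]], pseudo: int, pos: int
) -> Optional[Tuple[str, int]]:
    """Map a 1-based pseudocontig position to (contig name, 1-based position).

    Returns None when the position falls in an N spacer or the
    pseudocontig is unknown.
    """
    intervals = index.get(pseudo, [])
    zero_based = pos - 1
    # Find the last interval whose start is <= the query position.
    i = bisect.bisect_right(intervals, (zero_based, float("inf"), "")) - 1
    if i < 0:
        return None
    start, stop, name = intervals[i]
    if zero_based >= stop:
        return None  # past the end of the contig, i.e. inside a spacer
    return name, zero_based - start + 1
```

For example, with toy rows `[(1, 0, 6, "c"), (1, 16, 20, "a")]`, position 17 on pseudocontig 1 maps to the first base of contig `a`, while positions 7 through 16 fall in the spacer and return `None`.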


Data for each Torrey pine tree used in analyses with corresponding readme file.