Skip to main content

Protea repens whole transcriptome count data for control and drought treatment for 8 populations, climatic data for the 8 populations and phenotypic data collected, and data used for linear mixed models for climate gene expression/trait correlation testing

Cite this dataset

Akman, Melis (2020). Protea repens whole transcriptome count data for control and drought treatment for 8 populations, climatic data for the 8 populations and phenotypic data collected, and data used for linear mixed models for climate gene expression/trait correlation testing [Dataset]. Dryad.


Long term environmental variation often drives local adaptation and leads to trait differentiation across populations. Additionally, when traits change in an environment-dependent way through phenotypic plasticity, the genetic variation underlying plasticity will also be under selection. These processes could create a landscape of differentiation across populations in traits and their plasticity. Here, we performed a dry-down experiment under controlled conditions to measure responses in seedlings of a shrub species from the Cape Floristic Region, the common sugarbush (Protea repens). We measured morphological and physiological traits, and sequenced whole transcriptomes of leaf tissues from 8 populations that represent both the climatic and the geographic distribution of this species. We found that there is substantial variation in how populations respond to drought, but we also observed common patterns such as reduced leaf size and leaf thickness, and upregulation of stress- and down-regulation of growth-related gene groups. Both high environmental heterogeneity and milder source site climates were associated with higher plasticity in various traits and co-expression gene networks. Associations between traits, trait plasticity, gene networks and the source site climate suggests that temperature may play a bigger role in shaping these patterns when compared to precipitation, in line with recent changes in the region due to climate change. We also found that traits respond to climatic variation in an environment dependent manner: some associations between traits and climate were apparent only under certain growing conditions. Together, our results uncover common responses of P. repens populations to drought, and climatic drivers of population differentiation in functional traits, gene expression and their plasticity.


Field sampling and climate data

In 2011, we collected seeds from 8 wild populations of P. repens spanning its geographic distribution in the CFR. From 40-50 plants per population (mean n= 45 plants), we collected 1-2 mature seed heads from the previous year’s growth. Plants were sampled along transects through the population centre, with at least 1 m between each sampled plant for small populations (<100 plants), and 5 m for larger populations. Seed heads were allowed to air-dry until achenes were released.

GPS coordinates of each site near the population center were used to extract elevation and six climate metrics for rainfall and temperature from GIS layers. The climate layers used were (1) mean annual rainfall, (2) average daily maximum temperature for summer (January), (3) average daily minimum temperature for winter (July), (4) precipitation concentration (a seasonality indicator we refer to as “rainfall seasonality”), measured as Markham’s concentration index (Markham 1970), which ranges from 0 to 100, with a value of 0 indicating perfectly even precipitation across all months, and 100 indicating that all precipitation falls in a single month (5) inter-annual rainfall variability, measured as the coefficient of variation in mean annual rainfall across years, and (6) extreme temperature range within a year, measured as the difference between summer maximum and winter minimum temperatures. Our seventh variable was elevation, which was extracted separately from ASTER GDEM (Daac 2011). All climate data layers were from the South African Atlas of Agrohydrology and Climatology (Schulze 1997) with higher resolution for 4 and 5 below, and (Schulze 2007) for the rest, and were based on 50 year (collected between 1950-1999 when data was available from field stations) climate averages, extremes, or variances.

Experimental setup and dry-down treatments

In July 2014, we sowed seeds from 8 Protea repens populations on low nutrient mix (50% bark, 50% sand) in trays of 100 individual plugs 2x2 cm in size with a complete random design. Trays were placed in a long-day growth chamber (8 hours in dark at 8℃, 16 hours in light at 20℃). The trays were randomly moved across the growth chambers every 3 days with each watering. After 1.5 months, we transplanted germinated seedlings into small, deep pots (5 cm diameter x 17 cm deep) in trays of 50 each, in the same low nutrient mix. After transplanting, seedlings were rested for two weeks indoors under growth lights before being transferred to a single greenhouse bench at Nicholls State University Farm, Thibodaux, LA.

Two weeks post-greenhouse transfer and two weeks prior to the start of the dry-down experiment, we selected a total of 316 out of 419 seedlings (19-20 seedlings/population x 8 populations x 2 treatments) for uniform size for the experiment. We then reorganized the 2.5 month-old seedlings into trays containing 20-25 seedlings each, in a complete random design. In the experiment, seedlings were assigned to one of two treatments: control, in which we applied regular watering (~every 3 days, equal amounts of water was given to each plant) and drought, in which we stopped watering entirely for 12 days. Because we had a variable number of seedlings per maternal plant, we could not perfectly balance maternal lines across populations and treatments. To the extent possible, however, we included at least one seedling per maternal family per treatment; for 82% of the 83 maternal families, we were able to include at least one seedling per maternal family in each treatment, and for 51% of families, we included multiple seedlings per treatment. The drought experiment began in late October 2014, when greenhouse conditions were warm but relatively dry, and thus most similar to summer conditions in the native range.

Trait measurements and gene expression sampling

At 6 days and 12 days into the drought experiment, plants were measured for functional traits and leaves were sampled for transcriptomics. Functional traits that were measured at both timepoints included plant height (cm), stomatal conductance, leaf area, leaf length:width ratio (LWR), specific leaf area (fresh leaf area/dry leaf mass; SLA), stomatal density (number per mm2), stomatal pore length (mm) and stem pigmentation. Stem pigmentation, measured as stem hue (H4), was derived using Endler’s segment classification (Maia et al. 2013), based on the average of two reflectance spectra readings (400-700 nm) taken at mid-stem (JAZ, Ocean Optics). On the segment classification scale for hue, higher values represent redder stems that contain higher levels of anthocyanin. Stomatal conductance was measured using a steady-state leaf porometer (SC-1, Decagon) on one leaf per plant, with an additional adjacent leaf measured simultaneously if the chamber was not filled by a single leaf due to small leaf size.

All other leaf measurements were taken from the 2nd or 3rd youngest expanding leaf from the stem apex. Immediately following conductance measurement, the leaf was removed to measure leaf area, width, length and stomatal traits. Fresh leaves were digitally scanned shortly after harvest at 400 dpi on a flatbed scanner. After scanning, acrylic peels were taken from the adaxial leaf surface and placed on cellophane, which were later viewed under a light microscope at 40x for stomatal measurements (see Carlson et al. 2016). Although height and pigmentation were measured on all plants on both timepoints (mean sample size per population=39), all other traits were measured on only a subset of plants on each of these timepoints, typically in association with the transcriptomic data (10-20 individuals per population for each timepoint). Two additional variables were calculated by subtracting day 6 from day 12 values on the same plant: pigment accumulation (day 12 hue - day 6 hue) and growth (day 12 height - day 6 height).

Transcriptome sampling 

A smaller subset of plants compared to functional trait measurements had one leaf per plant harvested for whole transcriptome sequencing and gene expression analysis (n=120 plants total, or 4 per population per treatment per day, except no ALC on day 12 due to insufficient number of individuals because of lower germination in this population). Plants with leaves removed on day 6 were not sampled for transcriptomics on day 12, to avoid possible shifts in gene expression caused by leaf removal. As a consequence, gene expression and most traits were measured in one set of plants on day 6 and in a separate set on day 12. Transcriptome sampling took place prior to any other measurements on a given day and consisted of collecting one leaf for RNA extraction per plant (the 2nd or 3rd youngest leaf from the tip) into a 1 mL tube and snap-freezing immediately in liquid nitrogen. These samples were stored at -80°C until library preparation. All sampling for transcriptome sequencing was done between 10 AM and 11 AM, and functional trait measurements were done between 10AM and 2PM each day. Both transcriptomics and other trait measurements were done in a random order. On day 12, we harvested a subset of plants for root length and carbohydrate storage measurements (n=9 per population including the 4 transcriptome plants and 5 additional plants). After completing all trait and transcriptome sampling, we cut plants at the root- shoot junction and placed aboveground tissues in a 10 mL tube and snap-froze the samples in liquid nitrogen for future carbohydrate analyses. Belowground tissues were gently removed from the soil, rinsed clean and stored in a separate tube and also snap-frozen. For these 9 plants per population, we also measured the length of the longest root. Tubes containing plant tissues were stored in a -80°C freezer for ~1 month, then freeze-dried for biomass and carbohydrate analyses. Dried tissues for above- and belowground material was measured and ground. Ten to hundred and fifty mg of ground material was used for carbohydrate analyses as described in (Akman et al. 2012). Briefly, carbohydrates and starch were extracted with 70% methanol solution. The extract was used for soluble carbohydrate content measurements and the pellet including starch was treated enzymatically with α-amylase and amyloglucosidase resulting in hexose sugars. 

Soluble carbohydrates and hexose sugar concentration was measured with a modified version of anthrone method using fixed glucose standards.

Survival measurements 

After all trait and gene expression data had been collected, the remaining plants in the drought treatment (those that had not been harvested at the end of the experiment) were maintained in the greenhouse under drought stress with regular watering. Plants were checked weekly to score survival and days until mortality was recorded. 

Transcriptome library construction and sequencing 

Transcriptome sequencing samples were homogenized in a Beadbeater-96 (Biospec Inc, Bartlesville, OK, USA) with 2.3 mm Chrome-Steel Beads (Biospec Inc, Bartlesville, OK, USA). Lysis/binding buffer (Life Technologies, Grand Island, NY, USA, with addition of %3 PVP-40) was added and the solution was homogenized again for 2 min. Tubes were transferred to a 60°C water bath for 30 min with occasional shaking every 10 min. The solution was centrifuged for 3 min at 12 000 x g and the supernatant was used for mRNA isolation and sequencing library preparation by the procedures explained in Kumar et al. (2012). Sequencing libraries were pooled and quality control of the libraries were done with Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA, USA). Before pooling, concentration of each library was measured in a plate reader and appropriate amounts were pooled in order to balance reads obtained from each library from the pooled samples. A total of 120 samples were sequenced; 60 in each of 2 Illumina Hiseq 2500 lanes, yielding 756 million single-end reads (100 bp long). A randomized block design was used for sequencing: two samples per population per treatment per time-point were included in each lane. 


Gene expression analyses and bioinformatics

Raw reads were trimmed and demultiplexed with trimmomatic (Bolger et al. 2014) and fastx toolkit (Hannon). By using bowtie (Langmead & Salzberg 2012), trimmed reads were mapped to a previously published de-novo transcriptome of P. repens (Akman et al. 2016) assembled by using sequences of individuals that also include all of our focal populations. Before reads were mapped to the transcriptome, a further clustering of the contigs were done using cd-hit (Fu et al. 2012), which reduced number of contigs from 120406 to 106403 (settings -c 0.95 -n 8 -r 1; N50 slightly increased from 1130 to 1132). Read counts, extracted using eXpress (Pachter 2011), averaged 6.02 million reads per sample (ranged between 2.46-21.80 million). Mapping rate was 92.4% on average (ranged between 89.5%-94.5%). Contigs with <1 counts per million (cpm) for at least 24 samples were excluded from the analyses (reducing the library size range to 2.35-20.46, average = 5.73 million reads), which reduced the number of contigs used to 45019 (29507 of which had hits to Arabidopsis genes). Differential gene expression analyses were performed using edgeR. For each time point, differentially expressed genes per population were revealed using pairwise comparisons using negative binomial models. The common set of differentially expressed genes was obtained by extracting the overlap of differentially expressed genes for each population at day 12. We then used linear models in edgeR, including sequencing lane as a fixed factor, to detect contigs that showed variation between treatments and among populations (population x treatment).

Co-expression gene networks were computed using WGCNA (Langfelder & Horvath 2008) with consensus clustering option for the 2 timepoints together. Normalized and cpm-filtered read counts were transformed with “voom” in limma package. A soft-threshold of 6 was used for clustering and 14 co-expressed gene networks were constructed. The number of genes within each gene network varied between 49 (GN13) to 7682 (GN1). The gene network “grey” (called GN0 in this study) as constructed by WGCNA for genes that do not fit any gene network has 3124 genes. A representative “eigen gene” value, which is the first principal component axis for all genes within the network, was used for further analyses of climate and trait associations.


Markham CG (1970) Seasonality of precipitation in the United States. Annals of the Association of American Geographers. Association of American Geographers, 60, 593–597.

Daac LP (2011) ASTER L1B. Sioux Falls, South Dakota: USGS/Earth Resources Observation and Science (EROS) Center.

Schulze RE (1997) South African atlas of agrohydrology and-climatology. Water Research Commission, Pretoria, South Africa.

Schulze RE (2007) South African atlas of climatology and agrohydrology: WRC Report 1489/1/06. Water Resource Commission.

Maia R, Eliason CM, Bitton P, Doucet SM, Shawkey MD. 2013. Pavo: An R package for the analysis, visualization and organization of spectral data. Methods in Ecology and Evolution, 10: 1097-1107.

Kumar R, Ichihashi Y, Kimura S et al. (2012) A high-throughput method for Illumina RNA-Seq library preparation. Frontiers in plant science, 3, 202.

Akman M, Bhikharie AV, McLean EH et al. (2012) Wait or escape? Contrasting submergence tolerance strategies of Rorippa amphibia, Rorippa sylvestris and their hybrid. Annals of botany, 109, 1263–1276.

Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, 30, 2114-2120.

Hannon GJ Fastx-toolkit. 2010. GNU Affero General Public License, 706.

Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nature methods, 9, 357–359.

Akman M, Carlson JE, Holsinger KE, Latimer AM (2016) Transcriptome sequencing reveals population differentiation in gene expression linked to functional traits and environmental gradients in the South African shrub Protea repens. The New phytologist, 210, 295–309.

Fu L, Niu B, Zhu Z, Wu S, Li W (2012) CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics , 28, 3150–3152.

Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC bioinformatics, 9, 559.

Usage notes

Seeds of 8 Protea repens were collected in the wild. Population location coordinates were used to extract climatic data from Schultze 1997 and 2007. Seeds were sown on soil and grown in climate rooms and a green house. After plants were 2.5 months old, we started our experiment in a green house. Half the plants were used in a dry-down treatment while the other half were watered regularly. Leaf tissues were collected and phenotypic measurements were done 6 and 12 days in to the treatment. Leaves were used in transcriptome library preparation. The libraries were then sequenced on Illumina Hiseq. After quality assessment, reads were mapped using bowtie and read counts were extracted using eXpress. After voom transformation, WGCNA was used for consensus clustering for the two timepoints. Here we include, raw count data (file 1) together with WGCNA eigen gene values, phenotypes and climate data for each population split into two timepoints (file2 and file3).


National Science Foundation, Award: 1046328

National Science Foundation, Award: 1045985