Data for: Induction of C4 genes during de-etiolation of Gynandropsis gynandra evolved through changes in cis allowing integration into ancestral C3 gene regulatory networks

Cite this dataset

Singh, Pallavi; Stevenson, Sean; Hibberd, Julian (2023). Data for: Induction of C4 genes during de-etiolation of Gynandropsis gynandra evolved through changes in cis allowing integration into ancestral C3 gene regulatory networks [Dataset]. Dryad.


C4 photosynthesis has evolved repeatedly and in doing so repurposed existing enzymes to drive a carbon pump that limits the oxygenation reaction of RuBisCO. C4 proteins accumulate to levels matching those of the photosynthetic apparatus, and to allow this, gene expression must be modified over evolutionary time. To better understand this rewiring of gene expression we undertook RNA-SEQ and DNaseI-SEQ on de-etiolating seedlings of C4 Gynandropsis gynandra, which is evolutionarily proximate to C3 Arabidopsis thaliana. Changes in chloroplast ultrastructure and C4 gene expression in G. gynandra were coordinated and rapid. C3 and C4 photosynthesis genes showed similar induction patterns, but C4 genes from G. gynandra were more strongly induced than orthologs from A. thaliana. The cistrome of G. gynandra was enriched in TGA, TCP and homeodomain binding sites. Furthermore, in vivo binding data in G. gynandra highlighted TGA and homeodomain binding sites, as well as light responsive elements such as G- and I-box motifs, as being associated with the rapid increase in transcripts derived from C4 genes. Although promoters of PPDK and ASP1 from G. gynandra contained distinct light responsive elements, promoters from both A. thaliana and G. gynandra allowed high expression. Deletion analysis of the Ppa6 gene from G. gynandra showed that regions containing G- and I-boxes were necessary for high expression. The data support a model in which accumulation of transcripts derived from C4 genes in leaves of G. gynandra is enhanced compared with homologs in A. thaliana because a variety of modifications in cis allowed integration into ancestral transcriptional networks.


Gynandropsis gynandra seeds were sown directly from intact pods and germinated on moist filter papers in the dark at 32°C for 24 hours. Germinated seeds were then transferred to half strength Murashige and Skoog (MS) medium with 0.8% (w/v) agar (pH 5.8) and grown for three days in a growth chamber at 26°C. De-etiolation was induced by exposure to white light with a photon flux density (PFD) of 350 μmol m⁻² s⁻¹ and photoperiod of 16 hours. Whole seedlings were harvested at 0.5, 2, 4 and 24 hours after illumination (starting at 8:00 with light cycle 6:00 to 22:00). Tissue was flash frozen in liquid nitrogen and stored at -80°C prior to processing.

RNA and DNaseI sequencing

Before processing, frozen samples were divided into two portions, the first being used for RNA-SEQ analysis and the second for DNaseI-SEQ. Samples were ground in a mortar and pestle and RNA extraction carried out with the RNeasy Plant Mini Kit (74904; QIAGEN) according to the manufacturer’s instructions. RNA quality and integrity were assessed on a Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies). Library preparation was performed with 500 ng of high integrity total RNA (RNA integrity number > 8) using the QuantSeq 3’ mRNA-SEQ Library Preparation Kit FWD for Illumina (Lexogen) following the manufacturer’s instructions. Library quantity and quality were checked using Qubit (Life Technologies) and a Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies). Libraries were sequenced on a NextSeq 500 (Illumina, Chesterford, UK) using single-end sequencing and a Mid Output 150 cycle run.

To extract nuclei, tissue was ground in liquid nitrogen and incubated for five minutes in 15 mM PIPES pH 6.5, 0.3 M sucrose, 1% (v/v) Triton X-100, 20 mM NaCl, 80 mM KCl, 0.1 mM EDTA, 0.25 mM spermidine, 0.25 g polyvinylpyrrolidone (SIGMA) and EDTA-free proteinase inhibitors (ROCHE), then filtered through two layers of Miracloth (Millipore) and pelleted by centrifugation at 4°C for 15 min at 3600 g. To isolate deproteinated DNA, 100 mg of tissue from seedlings exposed to 24 hours light were harvested two hours into the light cycle, four days after germination. DNA was extracted using a QIAGEN DNeasy Plant Mini Kit (QIAGEN, UK) according to the manufacturer’s instructions. 2 × 10⁸ nuclei were re-suspended at 4°C in digestion buffer (15 mM Tris-HCl, 90 mM NaCl, 60 mM KCl, 6 mM CaCl2, 0.5 mM spermidine, 1 mM EDTA and 0.5 mM EGTA, pH 8.0). DNase-I (Fermentas) at 2.5 U was added to each tube and incubated at 37°C for three minutes. Digestion was arrested by adding a 1:1 volume of stop buffer (50 mM Tris-HCl, 100 mM NaCl, 0.1% (w/v) SDS, 100 mM EDTA, pH 8.0, 1 mM spermidine, 0.3 mM spermine, RNase A 40 µg/ml) and incubating at 55°C for 15 minutes. 50 U of Proteinase K were then added and samples incubated at 55°C for 1 h. DNA was isolated by mixing with 1 ml 25:24:1 Phenol:Chloroform:Isoamyl Alcohol (Ambion) and spun for 5 minutes at 15,700 g followed by ethanol precipitation of the aqueous phase. Samples were size-selected (50-400 bp) using agarose gel electrophoresis and quantified fluorometrically using a Qubit 3.0 Fluorometer (Life Technologies), and a total of 10 ng of digested DNA (200 pg µl⁻¹) used for library construction. Sequencing ready libraries were prepared using a TruSeq Nano DNA library kit according to the manufacturer’s instructions.
Library quality was determined using a Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies), and libraries were quantified by Qubit (Life Technologies) and by qPCR using an NGS Library Quantification Kit (KAPA Biosystems). Libraries were then normalised, pooled, diluted and denatured for paired-end sequencing on a NextSeq 500 (Illumina, Chesterford, UK) using a High Output 150 cycle run (2 × 75 bp reads).

RNA-SEQ data processing and quantification 

Commands used are available on GitHub (“command_line_steps”), but an outline of the steps is as follows. Raw single-end reads were trimmed using trimmomatic (version 0.36). Trimmed reads were then quantified using salmon (version 0.4.234) after building an index file for a modified G. gynandra transcriptome. For G. gynandra gene models that lacked a 3’ UTR sequence, the transcriptome was modified to add a pseudo-3’ UTR of 339 bp (the mean length of identified 3’ UTRs), which was essentially an extension beyond the stop codon of the open reading frame. Inclusion of this pseudo-3’ UTR improved mapping rates. Each sample was then quantified using the salmon “quant” tool. The “NumReads” columns of all *.sf files were merged into a single file (All_read_counts.txt) to allow analysis with both DESeq2 and edgeR. The edgeR pipeline was run as the edgeR.R R script (here and on GitHub) on the All_read_counts.txt file to identify significantly differentially expressed genes by comparing each time-point to the previous one. A low expression filter step was also used. We then similarly analysed the data with the DESeq2 package using the DEseq2.R R script (on GitHub) on the same All_read_counts.txt file. This also included the Principal Component Analysis shown in Fig. 2A. The intersection of the gene sets from both methods was used to identify a robust set of differentially regulated genes. For most further analysis of the RNA-SEQ data, mean TPM values for each time-point (from three biological replicates) were first quantile normalised and then each value was divided by the sample mean, such that a value of 1 represented the average for that sample. This processing facilitated comparisons between experiments across species in identifying changes to transcript abundance between orthologs.
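As an illustration only, the final normalisation step described above (quantile normalisation of mean TPM values followed by division by the sample mean) could be sketched in Python as follows. This is not the authors' code, which is provided as R scripts on GitHub. The function names, the tie-handling rule and the example values are all assumptions made for this sketch.

```python
from statistics import mean


def quantile_normalise(columns):
    """Quantile-normalise equal-length columns (one list of mean TPMs per time-point)."""
    n_genes = len(columns[0])
    sorted_columns = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across all columns.
    reference = [mean(col[k] for col in sorted_columns) for k in range(n_genes)]
    normalised = []
    for col in columns:
        # Gene indices ordered from lowest to highest value in this column.
        # Assumption: ties are broken by input order; other tools average tied ranks.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalised.append(out)
    return normalised


def scale_to_sample_mean(columns):
    """Divide every value by its column mean, so 1.0 is the average for that sample."""
    scaled = []
    for col in columns:
        col_mean = mean(col)
        scaled.append([value / col_mean for value in col])
    return scaled


if __name__ == "__main__":
    # Hypothetical mean TPM values for four genes at two time-points.
    mean_tpm = [[10.0, 2.0, 6.0, 4.0], [1.0, 8.0, 3.0, 5.0]]
    for column in scale_to_sample_mean(quantile_normalise(mean_tpm)):
        print([round(value, 3) for value in column])
```

After both steps every sample shares the same distribution and has a mean of 1, so a gene with a value of 2 is at twice the sample average regardless of species or experiment, which is what makes ortholog comparisons straightforward.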


European Research Council, Award: 694733 R

Biotechnology and Biological Sciences Research Council, Award: BBP0031171