
The GENOMES UNCOUPLED1 protein has an ancient, highly conserved role but not in retrograde signalling

Cite this dataset

Small, Ian; Honkanen, Suvi (2022). The GENOMES UNCOUPLED1 protein has an ancient, highly conserved role but not in retrograde signalling [Dataset]. Dryad.


The pentatricopeptide repeat protein GENOMES UNCOUPLED1 (GUN1) is required for chloroplast-to-nucleus signalling in response to plastid stress during chloroplast development in Arabidopsis thaliana, but its exact molecular function remains unknown. Current data on GUN1 function is limited to Arabidopsis, so we set out to investigate the origin and evolution of the land plant GUN1 proteins. We retrieved GUN1 sequences from 76 phylogenetically diverse land plants and developed a GUN1 sequence profile using hmmbuild. We then used this profile to systematically analyse the presence/absence of GUN1 sequences in transcriptomes from land plants and streptophyte algae. This dataset includes the GUN1 profile we developed, the code we used to analyse the results of screening over 500,000 PPR protein sequences with the profile, and an alignment of the 893 GUN1 sequences that we obtained.

We used this data to show that GUN1 is highly conserved across land plants but missing from the Rafflesiaceae, which lack chloroplast genomes. Our findings suggest that GUN1 is an ancient protein that evolved within the streptophyte algal ancestors of land plants before the first plants colonised land more than 470 million years ago.

This dataset also includes transcript count data from an RNA-seq experiment comparing gene expression in wild-type and Mpgun1 mutant spore samples of the liverwort Marchantia polymorpha grown in the presence or absence of spectinomycin. We used this data to show that GUN1 does not act significantly in chloroplast retrograde signalling in M. polymorpha. Its primary role is likely to be in chloroplast gene expression, and its role in chloroplast retrograde signalling probably evolved more recently.


Dataset 1

Arabidopsis and Marchantia GUN1 sequences were retrieved from TAIR and MarpolBase, respectively. Full-length GUN1 sequences were obtained from a representative set of land plants by protein BLAST searches of GenBank using the Arabidopsis sequence as the query. A set of 76 phylogenetically diverse GUN1 sequences (including representatives from algae, bryophytes, lycophytes, ferns, gymnosperms, and angiosperms) was aligned using the G-INS-i algorithm in MAFFT v7 (Katoh & Standley, 2013). The most highly conserved region of this alignment (876 positions) was used to generate a GUN1 sequence profile with hmmbuild from the HMMER package (v3.3.1; http://hmmer.org; Eddy, 2011). This profile was in turn used to search for GUN1 sequences (using hmmsearch with default parameters) in translations of various transcriptome datasets, most notably the putative PPR protein sequences compiled by Gutmann et al. (2020) from the 1KP data set (Carpenter et al., 2019). The 1KP transcriptomes were filtered to remove those encoding fewer than 10000 distinct proteins (to avoid trivial false negatives due to low coverage) and those from organisms other than green algae and land plants. This resulted in 1128 analysable samples from 894 plant species. Specific searches were also made in data sets of particular interest (whole genome shotgun or transcriptome shotgun assemblies selected via the NCBI Sequence Set Browser). These additional data sets included genomes or transcriptomes where GUN1 could not be found in the corresponding 1KP samples, as well as whole genome shotgun data from Sapria himalayana (Cai et al., 2021) and whole transcriptome data from Rafflesia cantleyi (Lee et al., 2016), both holo-parasites from the Rafflesiaceae.
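
The sample filter described above can be sketched in a few lines of Python. This is only an illustration of the two stated criteria (at least 10000 distinct proteins, and organisms restricted to green algae and land plants); the `Sample` record, the clade labels, and the example values are assumptions made for the sketch and do not reflect the layout of taxonomy.csv or the code in the Julia notebook.

```python
from dataclasses import dataclass

# Threshold stated in the methods: samples encoding fewer than
# 10000 distinct proteins are removed.
MIN_DISTINCT_PROTEINS = 10_000

# Clade labels are hypothetical; the real metadata may use other names.
KEPT_CLADES = {"green algae", "land plants"}


@dataclass(frozen=True)
class Sample:
    """Hypothetical per-sample record used only for this sketch."""
    sample_id: str
    clade: str
    n_distinct_proteins: int


def is_analysable(sample: Sample) -> bool:
    """Return True if the sample passes both filters described in the text."""
    return (
        sample.n_distinct_proteins >= MIN_DISTINCT_PROTEINS
        and sample.clade in KEPT_CLADES
    )


# Made-up examples covering each outcome of the filter.
samples = [
    Sample("S1", "land plants", 25_000),   # kept
    Sample("S2", "land plants", 4_000),    # removed: low coverage
    Sample("S3", "red algae", 18_000),     # removed: outside the clades of interest
    Sample("S4", "green algae", 10_000),   # kept: exactly at the threshold
]

analysable = [s.sample_id for s in samples if is_analysable(s)]
print(analysable)
```

Applying the coverage filter before searching means that a missing GUN1 hit in a retained sample is less likely to be a trivial false negative.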


Katoh K, Standley DM. 2013. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular biology and evolution 30: 772–780.

Eddy SR. 2011. Accelerated Profile HMM Searches. PLoS computational biology 7: e1002195.

Gutmann B, Royan S, Schallenberg-Rüdinger M, Lenz H, Castleden IR, McDowell R, Vacher MA, Tonti-Filippini J, Bond CS, Knoop V, et al. 2020. The Expansion and Diversification of Pentatricopeptide Repeat RNA-Editing Factors in Plants. Molecular plant 13: 215–230.

Carpenter EJ, Matasci N, Ayyampalayam S, Wu S, Sun J, Yu J, Jimenez Vieira FR, Bowler C, Dorrell RG, Gitzendanner MA, et al. 2019. Access to RNA-sequencing data from 1,173 plant species: The 1000 Plant transcriptomes initiative (1KP). GigaScience 8.

Cai L, Arnold BJ, Xi Z, Khost DE, Patel N, Hartmann CB, Manickam S, Sasirat S, Nikolov LA, Mathews S, et al. 2021. Deeply Altered Genome Architecture in the Endoparasitic Flowering Plant Sapria himalayana Griff. (Rafflesiaceae). Current biology: CB 31: 1002-1011.e9.

Lee X-W, Mat-Isa M-N, Mohd-Elias N-A, Aizat-Juhari MA, Goh H-H, Dear PH, Chow K-S, Haji Adam J, Mohamed R, Firdaus-Raih M, et al. 2016. Perigone Lobe Transcriptome Analysis Provides Insights into Rafflesia cantleyi Flower Development. PloS one 11: e0167958.


Dataset 2

Dataset 2 is derived from the NCBI SRA BioProject PRJNA800059, which contains paired-end, random-primed, rRNA-depleted, strand-specific RNA-seq reads from 12 spore samples of the liverwort Marchantia polymorpha, either wild type (accession Takaragaike) or Mpgun1 mutant, grown in the presence or absence of spectinomycin. The raw read data can be obtained from NCBI SRA.


M. polymorpha spores were sterilised and plated on ½ Gamborg’s medium (Duchefa Biochemie) supplemented with 1.2% agar and 500 μg ml⁻¹ spectinomycin (an inhibitor of plastid translation). The spores were germinated under long day conditions for 48 hours, after which they were resuspended in 1 ml of sterile water, transferred into a microcentrifuge tube, and spun down at 6,000 rpm for 1 minute. Water was removed, and the spore pellet was flash-frozen in liquid nitrogen. RNA was extracted from spores using the Direct-Zol RNA MINIprep kit (Zymo Research) and its quality was estimated on an Agilent 4200 TapeStation (Agilent). Three independent biological replicates were extracted for each genotype/condition. RNA was quantified using a NanoDrop spectrophotometer (Thermo Fisher) and DNase treated using Turbo DNase (Ambion). Transcriptome libraries were prepared using the TruSeq Stranded Total RNA kit with Ribo-Zero Plant (Illumina). The libraries were sequenced on an Illumina HiSeq 4000 platform (150 nt paired-end reads) at Novogene, Hong Kong.

Optical duplicate reads were first removed with clumpify (parameters: dedupe optical dist=40) from the bbmap package, and adapters were trimmed with bbduk (parameters: ktrim=r k=23 mink=11 hdist=1 tpe tbo ftm=5). The reads were then assigned to transcripts using Salmon v1.3.0 (Patro et al., 2017) (parameters: -l A --validateMappings) against an index prepared with the M. polymorpha MpTak_v6.1 reference genome and cDNA assemblies. Differential expression analyses were carried out using DESeq2 (Love et al., 2014). Functional annotations for the MpTak_v5.1 genome release were used to identify M. polymorpha photosynthesis-associated nuclear genes.
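
As a quick check on the experimental design, two genotypes crossed with two treatments and three biological replicates each gives the 12 samples in the BioProject. The Python sketch below enumerates that layout; the sample names and the labels "control" and "spectinomycin" are hypothetical and are not taken from mp_sample_table.

```python
from itertools import product

# Factors described in the text; the label strings are assumptions.
GENOTYPES = ("WT", "Mpgun1")
TREATMENTS = ("control", "spectinomycin")
REPLICATES = (1, 2, 3)  # three independent biological replicates

# One row per sample: every genotype/treatment/replicate combination.
design = [
    {
        "sample": f"{genotype}_{treatment}_rep{replicate}",  # hypothetical naming
        "genotype": genotype,
        "treatment": treatment,
        "replicate": replicate,
    }
    for genotype, treatment, replicate in product(GENOTYPES, TREATMENTS, REPLICATES)
]

# 2 genotypes x 2 treatments x 3 replicates = 12 samples.
print(len(design))
for row in design:
    print(row["sample"], row["genotype"], row["treatment"])
```

A table of this shape (one row per sample, one column per factor) is the usual form of the design input for a DESeq2 analysis.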


Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. 2017. Salmon provides fast and bias-aware quantification of transcript expression. Nature methods 14: 417–419.

Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology 15: 550.

Usage notes

For dataset 1, the files included here are:

  • 76_GUN1.alignment.fasta — FASTA format file containing aligned GUN1 sequences from 76 phylogenetically diverse land plants
  • 76_GUN1.conserved_region.alignment.fasta — FASTA format file containing aligned GUN1 sequences (central conserved region only) from 76 phylogenetically diverse land plants; corresponds to the alignment shown in Figure 1.
  • GUN1.hmm — Hidden Markov model profile generated by hmmbuild using the 76 GUN1 sequences in 76_GUN1.conserved_region.alignment.fasta
  • GUN1s.domt — ‘domain table’ output from hmmsearch listing the GUN1 profile matches found in the 76 GUN1 sequences
  • taxonomy.csv — CSV file with taxonomical metadata on 1KP samples needed to run the analyses in GUN1 classification.ipynb
  • GUN1 classification.ipynb — Jupyter notebook (Julia code) used to identify GUN1 sequences from the 1KP dataset and to generate Fig. S2 and Table S2 
  • 893_GUN1.alignment.fasta — FASTA format file containing 893 aligned GUN1 sequences (includes both those found in the 1KP dataset, which are often partial sequences, and the 76 full-length sequences from 76_GUN1.alignment.fasta)
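
For readers who want to inspect a domain table such as GUN1s.domt outside the Julia notebook, the following Python sketch parses the whitespace-delimited hmmsearch domain table layout (22 fixed columns followed by a free-text description) and keeps the best-scoring domain per target sequence. It is a minimal illustration, not the classification logic used in GUN1 classification.ipynb; the `DomainHit` type, the helper names, and the example rows are made up for the sketch.

```python
from typing import Dict, Iterable, List, NamedTuple


class DomainHit(NamedTuple):
    """Subset of the per-domain fields reported in an hmmsearch domain table."""
    target: str
    query: str
    full_evalue: float   # full-sequence E-value (column 7)
    domain_score: float  # per-domain bit score (column 14)
    hmm_from: int        # start of the match on the profile (column 16)
    hmm_to: int          # end of the match on the profile (column 17)


def parse_domtblout(lines: Iterable[str]) -> List[DomainHit]:
    """Parse domain table lines, skipping comment lines and blank lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 22 fixed columns, then a description that may contain spaces.
        f = line.split(maxsplit=22)
        hits.append(
            DomainHit(f[0], f[3], float(f[6]), float(f[13]), int(f[15]), int(f[16]))
        )
    return hits


def best_hit_per_target(hits: Iterable[DomainHit]) -> Dict[str, DomainHit]:
    """Keep the highest-scoring domain for each target sequence."""
    best: Dict[str, DomainHit] = {}
    for hit in hits:
        if hit.target not in best or hit.domain_score > best[hit.target].domain_score:
            best[hit.target] = hit
    return best


# Made-up rows in the domain table layout (not real results).
EXAMPLE = """# target name accession tlen query name accession qlen ...
seqA - 900 GUN1 - 876 1e-300 1000.0 0.1 1 2 1e-200 1e-198 650.0 0.1 1 500 10 520 5 525 0.98 hypothetical example
seqA - 900 GUN1 - 876 1e-300 1000.0 0.1 2 2 1e-100 1e-98 350.0 0.1 501 876 530 890 528 895 0.97 hypothetical example
seqB - 400 GUN1 - 876 1e-50 180.0 0.2 1 1 1e-52 1e-50 180.0 0.2 300 600 20 330 15 335 0.95 hypothetical example
"""

best = best_hit_per_target(parse_domtblout(EXAMPLE.splitlines()))
for target, hit in sorted(best.items()):
    print(target, hit.domain_score, hit.hmm_from, hit.hmm_to)
```

To run it on the file in this dataset, pass an open file handle instead of the example text, for instance `parse_domtblout(open("GUN1s.domt"))`.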

For dataset 2, the files included here are:

  • Mpdata_spec.csv — table of read counts extracted from Salmon output 
  • mp_sample_table — text file containing experimental design used for identifying differentially expressed genes (supplemental table S3 from the paper)
  • mp_sample_table2 — text file containing experimental design used for making Figure 7a
  • RNAseq_WT_gun1_spores_spec.ipynb — Jupyter notebook (Python code) to reproduce supplemental table S3 from the paper: a DESeq2 analysis identifying differentially expressed genes between all genotype and treatment combinations using the Salmon quantifications (Mpdata_spec.csv). Requires Python packages pandas, numpy, matplotlib, seaborn and diffexpr.
  • Figure_7a.ipynb — Jupyter notebook (Python code) to reproduce Figure 7a from the paper using the Salmon quantifications (Mpdata_spec.csv). Requires Python packages pandas, numpy, matplotlib, seaborn and diffexpr.
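
Before running the notebooks, it can be useful to sanity-check a count table by summing reads per sample. The Python sketch below does this with the standard library only. It assumes a layout with a gene identifier in the first column and one column per sample, which may not match Mpdata_spec.csv; the gene identifiers, sample names, and values are invented. Salmon read estimates can be fractional, so the sketch rounds them to integers, which is what DESeq2 expects as input; whether the notebooks round in the same way is not stated here.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Invented example table; the real file layout is an assumption.
EXAMPLE_CSV = """gene_id,WT_control_rep1,WT_spec_rep1
geneA,10.4,3.0
geneB,0.0,7.6
"""


def read_counts(handle: TextIO) -> Tuple[List[str], Dict[str, List[int]]]:
    """Read a gene-by-sample table and round estimated counts to integers."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]
    counts: Dict[str, List[int]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [int(round(float(value))) for value in row[1:]]
    return samples, counts


def library_sizes(samples: List[str], counts: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the rounded counts for each sample column."""
    totals = [0] * len(samples)
    for values in counts.values():
        for i, value in enumerate(values):
            totals[i] += value
    return dict(zip(samples, totals))


samples, counts = read_counts(io.StringIO(EXAMPLE_CSV))
sizes = library_sizes(samples, counts)
print(sizes)
```

Very uneven per-sample totals would be worth investigating before interpreting differential expression results.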


Australian Research Council, Award: FL140100179

Australian Research Council, Award: CE140100008

Commonwealth Scientific and Industrial Research Organisation, Award: CSIRO Synthetic Biology Fellowship