Data for: A unique C-terminal domain contributes to the molecular function of restorer-of-fertility proteins in plant mitochondria
Data files
Version of Jul 24, 2023; total size of files: 177.48 MB
Abstract
Restorer-of-fertility (Rf) genes have practical applications in hybrid seed production as a means of controlling self-pollination. They encode pentatricopeptide repeat (PPR) proteins that are targeted to mitochondria, where they bind specifically to transcripts that induce cytoplasmic male sterility and repress their expression.
We have identified a unique domain, RfCTD (Restorer-of-fertility C-terminal domain), that distinguishes Restorer-of-fertility-like (RFL) proteins from the hundreds of other PPR proteins encoded in plant genomes. Using the sequences of this domain from hundreds of plant species, we have constructed a sequence profile that can quickly and accurately identify RfCTD sequences in plant genomes or transcriptomes.
This data set contains the PPR genes identified in 213 plant genomes, as summarised in the accompanying table.
Methods
The PPR genes were identified in the genome sequences using the PPRfinder approach (Cheng et al., 2016, Plant J. 85:532-47. doi: 10.1111/tpj.13121). Briefly, the genomic sequences were screened for open reading frames (ORFs) in six-frame translations with the getorf program of the EMBOSS 6.6.0 package (Rice et al., 2000, Trends Genet. 16:276-7. doi: 10.1016/s0168-9525(00)02024-2). The predicted ORFs were then screened for P- and PLS-class PPR motifs using hmmsearch from the HMMER 3.2.1 package (Eddy, 2011, PLoS Comput Biol. 7:e1002195. doi: 10.1371/journal.pcbi.1002195; http://hmmer.org) with hidden Markov models built using hmmbuild (Cheng et al., 2016, Plant J. 85:532-47. doi: 10.1111/tpj.13121).
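To illustrate the ORF-finding step, the Python sketch below translates a DNA sequence in all six reading frames using the standard genetic code and reports the stretches between stop codons. It is not getorf and does not reproduce getorf's options or output format; the function names and the `min_aa` length cutoff are illustrative assumptions.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate codon by codon; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_orfs(dna: str, min_aa: int = 30) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, peptide) for stop-to-stop stretches of at least min_aa residues."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement)):
        for frame in range(3):
            # Splitting the translation on '*' yields the stretches between stop codons.
            for peptide in translate(seq[frame:]).split("*"):
                if len(peptide) >= min_aa:
                    orfs.append((strand, frame, peptide))
    return orfs
```

For example, `six_frame_orfs("ATGAAATTTGGGCCCTAA", min_aa=5)` includes `("+", 0, "MKFGP")` from the forward strand and `("-", 0, "LGPKFH")` from the reverse complement.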
The RfCTD profile was built with hmmbuild from an alignment of 1,486 non-redundant sequences. It was then incorporated into the PPRfinder code (Cheng et al., 2016, Plant J. 85:532-47. doi: 10.1111/tpj.13121), and the genome sequences were screened again. A protein was classified as containing an RfCTD only if it met all three final filtering criteria: an hmmsearch score of at least 50 for the RfCTD, no overlap with a higher-scoring PPR motif, and the RfCTD being the last motif in the protein.
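The three filtering criteria can be sketched in Python as follows. This is an illustrative reimplementation, not the PPRfinder code. It assumes motif hits are read from an hmmsearch `--domtblout` table, uses the per-domain score and envelope coordinates, and treats a lower-scoring motif that overlaps the RfCTD as superseded, so "last motif" is taken to mean that no non-overlapping motif lies downstream. The `Hit` class, the function names and the profile name `RfCTD` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    """One motif match in one protein (hypothetical container)."""
    motif: str    # profile name, e.g. a PPR motif or "RfCTD"
    score: float  # per-domain bit score
    start: int    # envelope start (1-based, inclusive)
    end: int      # envelope end (1-based, inclusive)


def parse_domtblout(lines: Iterable[str]) -> Dict[str, List[Hit]]:
    """Group hmmsearch --domtblout rows by target (ORF/protein) name."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        # Columns used: 0 target name, 3 query (profile) name,
        # 13 domain score, 19/20 envelope from/to.
        hits[f[0]].append(Hit(f[3], float(f[13]), int(f[19]), int(f[20])))
    return hits


def overlaps(a: Hit, b: Hit) -> bool:
    return a.start <= b.end and b.start <= a.end


def has_rfctd(hits: List[Hit], min_score: float = 50.0, ctd_name: str = "RfCTD") -> bool:
    """Apply the three filters to all motif hits of one protein."""
    for ctd in hits:
        if ctd.motif != ctd_name or ctd.score < min_score:
            continue  # criterion 1: score must be >= min_score
        others = [h for h in hits if h is not ctd]
        if any(overlaps(ctd, h) and h.score > ctd.score for h in others):
            continue  # criterion 2: overlapped by a higher-scoring motif
        if any(h.start > ctd.end for h in others):
            continue  # criterion 3: another motif lies downstream
        return True
    return False
```

For the comparison to be meaningful, the domtblout table would need to come from a search with both the PPR motif models and the RfCTD profile, so that every motif in a protein is present in its hit list.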
Usage notes
The data files are plain FASTA or text files that can be opened in any text editor.
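Because the sequence files are plain FASTA, they can also be read programmatically. A minimal Python reader might look like the following, assuming standard FASTA with `>` header lines and sequences that may be wrapped over several lines; the function name is illustrative.

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # lines before the first '>' are ignored
    if header is not None:
        yield header, "".join(chunks)
```

For example, `sum(1 for _ in read_fasta(open(path)))` counts the sequences in a file, where `path` points to one of the FASTA files in this data set.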