Plastid-nuclear ERCnet analysis results
Data files
Nov 20, 2025 version files (1.94 GB)
- OUT_ERC_Final.zip (1.94 GB)
- README.md (7.12 KB)
Abstract
Plant cells rely on an interconnected network of proteins interacting at many levels (e.g. physical enzyme complexes, gene regulatory modules, and biosynthetic pathways). Pairs of proteins that interact at any of these levels have been shown to exhibit phylogenetic signatures of evolutionary rate covariation (ERC), providing a basis for detecting functional interactions among proteins. Here, we apply ERCnet, a bioinformatic tool for performing genome-scale ERC analyses, to predict a plant protein-protein interactome network. We find a clustered set of proteins that exhibit strong signatures of ERC with the plastid caseinolytic protease (Clp) and other plastid proteostasis components, thus forming a functional module within the network. In addition to including proteins with known or predicted functions in protein import, transcription, translation, and degradation in plastids, the module also includes proteins with previously unknown molecular function, thus providing evidence that these proteins may contribute to plastid proteostasis in novel ways. Furthermore, perhaps the most surprising members of this module are a set of proteins that are not thought to localize to the plastid at all. These proteins include a mitochondrial-localized pentatricopeptide repeat (PPR) protein with prior genetic evidence of interaction with the mitochondrial Clp system and two nuclear-localized actin-related proteins involved in chromatin remodeling and epigenetic regulation of nuclear genes. We speculate that these non-plastid-localized proteins act as mediators of organellar crosstalk and retrograde signaling of cellular proteostasis status in plants. In summary, our results highlight the connected nature of plant proteostasis systems and point to a promising set of novel proteostasis protein candidates.
https://doi.org/10.5061/dryad.hx3ffbgp7
Description of the data and file structure
The data deposited here consist of a zipped folder containing several subfolders. These subfolders contain the output files and metadata from the ERCnet program.
Files and variables
File: OUT_ERC_Final.zip
Description:
From ERCnet GitHub README (release v1.1.0):
Below is a brief description of each of the files and subdirectories that are output during a run of ERCnet. Many of the files created are intermediate files, which you likely will not need to inspect. Directories are shown in the order in which they're created during the ERCnet workflow.
- ERCnet-1.1.0.zip: contains a clone of the ERCnet code used for this analysis.
- Species_mapping.csv: shows how ERCnet recognizes species IDs.
- Seq_counts_per_species.csv: shows how many homologs from each species were found in each gene family.
- SpeciesTree_rooted_node_labels.txt: a copy of the species tree from OrthoFinder.
- Filtered_genefam_dataset.csv: each row represents a gene family (hierarchical orthogroup [HOG] from OrthoFinder) and the seq IDs of all the sequences in the gene family.
- benchmark/: log files that track the runtime of each step of ERCnet. If steps are performed multiple times, file names are suffixed with numbers in ascending order.
- HOG_seqs/: contains a separate file for each gene family (hierarchical orthogroup [HOG] from OrthoFinder).
- Alns/: contains a separate file with a multiple sequence alignment for each HOG.
- TAPER_Alns/: an (optional) version of the alignments that have been cleaned using TAPER.
- Gb_alns: a version of the alignments that have been cleaned using GBLOCKS. These are the final alignments used to infer trees with optimized branch lengths. Files ending in "reduced" were generated by RAxML when identical sequences were found within an alignment, but these reduced versions are not used in downstream analyses.
- Aln_pruning/: stores information about sequences that were pruned from alignments because GBLOCKS yielded only gaps.
- HOG_subtrees: OrthoFinder produces gene trees for each orthogroup (OG); however, ERCnet works with HOGs (which are subtrees of the larger OG tree). This folder contains the subtrees that are extracted from the larger OG tree.
- Non-binary_subtrees.txt: documents any gene families that were dropped from analysis because gene trees (from OrthoFinder) were non-bifurcating, which causes errors in R.
- BS_reps/: subtrees (see HOG_subtrees) are used as a constraint tree and RAxML bootstrapping is performed to get confidence values for each branch on the constraint tree. This directory stores the replicate files.
- BS_trees/: subtrees with BS confidence scores (from BS_reps).
- SpeciesTree_mapped_names.txt: version of the species tree with the mapped species IDs from Species_mapping.csv.
- Rearranged_trees/: the Treerecs program is used to rearrange any poorly supported branches (<80% bootstrap support) so that the branches best match the species tree.
- BL_trees/: branch length optimization is performed on the rearranged version of the tree. This folder contains three types of files output by the RAxML program. Only files starting in RAxML_result* are used in downstream analyses. These trees are the final gene trees used to perform ERC.
- DLC_par/: the last step in Phylogenomics.py creates the inputs (*NODES_BL.txt files) needed to run DLCpar, which helps map gene trees to the species tree. GTST_reconciliation.py writes additional files to this directory, resulting in five file types generated by the DLCpar program. Only files ending in *dlcpar.locus.recon are used in downstream steps of ERCnet.
- BL_results/: contains the branch lengths that were measured from the BL_trees/ trees. ERCnet measures branches by both branch-by-branch (BXB) and root-to-tip (R2T) methods. This directory also contains the normalized branch lengths, in which each branch length is normalized by the genome-wide average branch length for that particular branch. Normalized branch lengths are used for downstream ERC correlation analyses.
- ERC_results/: contains the results of the all-by-all ERC analysis. This folder will contain separate ERC results for BXB vs R2T (depending on user selection). The column headers in the ERC_results tsv file are described below. This directory will also contain a subdirectory, Filtered_results/, with a tsv file of 'ERC hits' (generated during Network_analyses.py).

Column headers in the ERC_results tsv file:
- GeneA_HOG: HOG id for 'gene A' (note that gene A vs B are arbitrary terms to denote the two genes being compared in the pairwise ERC comparison)
- GeneA_ID: the seqID for gene A from the 'focal species' (defined by the user)
- GeneB_HOG: HOG id for 'gene B'
- GeneB_ID: the seqID for gene B from the 'focal species' (defined by the user)
- Overlapping_branches: number of branches shared between gene trees A and B. This is the number of points on the linear correlation plot for a given ERC comparison
- Slope: Slope of the best fit line
- P_R2: Pearson correlation R-squared
- P_Pval: Pearson correlation P-value
- S_R2: Spearman correlation R-squared
- S_Pval: Spearman correlation P-value
- P_FDR_Corrected_Pval: FDR corrected version of the Pearson p-value
- S_FDR_Corrected_Pval: FDR corrected version of the Spearman p-value
Network_analyses/: This directory contains networks displaying the 'ERC hits'. ERC hits are defined by the user according to the p-value and r-squared cutoffs during Network_analyses.py. If the user tries several different filtering cutoffs, separate versions of the network files will be stored here. The file names indicate the cutoffs chosen (e.g. "0.0001_0.5" indicates filtering thresholds of p<0.0001 and r2>0.5). To begin inspecting these results, we recommend first looking at the ERC_network*.pdf file. This will give a quick (and sometimes ugly) view of the network. For more detailed inspection, we recommend using the Cytoscape GUI program and importing the Cytoscape_network*.graphml file. This creates a much more human-readable and interactive version of the network.

BINGO_analysis/ contains the outputs of GO enrichment analyses performed using the Cytoscape plugin BINGO. Network_assortativity_Filtered_ERC_results_R2T_3_0.0001_0.5_fg_trimcutoff_0.pdf contains the output of the functional clustering assortativity analysis. Plastnuc_fig_prep.cys and Peptidase_ERC_network.cys contain the Cytoscape sessions used to generate network figures displaying subcellular localization and proteolysis attributes, respectively. All other files are supplementary files not directly used in the study.
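For readers who want to re-apply cutoffs to an ERC_results tsv file themselves, the sketch below filters rows at p<0.0001 and r2>0.5 and collects the surviving pairs into a simple adjacency map. It is an illustration rather than a reproduction of Network_analyses.py: requiring both the Pearson and the Spearman statistics to pass, and using the uncorrected p-values, are assumptions made here, and the demo rows (HOG names, gene IDs, and numbers) are invented placeholders, not values from the deposited data.

```python
import csv
import io
from collections import defaultdict


def load_erc_hits(handle, p_cutoff=0.0001, r2_cutoff=0.5):
    """Return (GeneA_HOG, GeneB_HOG) pairs that pass both cutoffs.

    Assumption: a pair must pass on the Pearson AND Spearman statistics.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        p_ok = float(row["P_Pval"]) < p_cutoff and float(row["S_Pval"]) < p_cutoff
        r_ok = float(row["P_R2"]) > r2_cutoff and float(row["S_R2"]) > r2_cutoff
        if p_ok and r_ok:
            hits.append((row["GeneA_HOG"], row["GeneB_HOG"]))
    return hits


def build_adjacency(hits):
    """Turn a list of undirected hit pairs into a node -> neighbors map."""
    adjacency = defaultdict(set)
    for gene_a, gene_b in hits:
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)
    return adjacency


# Invented demo table using the column headers described above.
header = [
    "GeneA_HOG", "GeneA_ID", "GeneB_HOG", "GeneB_ID", "Overlapping_branches",
    "Slope", "P_R2", "P_Pval", "S_R2", "S_Pval",
    "P_FDR_Corrected_Pval", "S_FDR_Corrected_Pval",
]
rows = [
    ["HOG_A", "geneA1", "HOG_B", "geneB1", "20", "1.1",
     "0.82", "0.00001", "0.79", "0.00002", "0.0004", "0.0006"],
    ["HOG_A", "geneA1", "HOG_C", "geneC1", "18", "0.2",
     "0.10", "0.2", "0.08", "0.3", "0.5", "0.6"],
    ["HOG_B", "geneB1", "HOG_D", "geneD1", "22", "0.9",
     "0.66", "0.00003", "0.61", "0.00005", "0.0008", "0.0009"],
]
demo_tsv = "\n".join("\t".join(fields) for fields in [header] + rows) + "\n"

hits = load_erc_hits(io.StringIO(demo_tsv))
network = build_adjacency(hits)
```

To run this on a real results file, pass an open file handle instead of the in-memory demo table. Rows with non-numeric entries in the statistics columns are not handled in this sketch and would need to be skipped or cleaned first.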
Access information
Other publicly accessible locations of the data:
- NA
Data was derived from the following sources:
The data included here are the output of the ERCnet program. See the ERCnet GitHub page (https://github.com/EvanForsythe/ERCnet) for detailed descriptions of output files.
