Dual-Cys bacteriophytochromes: intermediates in cyanobacterial phytochrome evolution?
Data files
Jan 06, 2025 version files (720.44 KB):
Dual-Cys_BphP_Data.tar.gz (702.54 KB)
README.md (17.89 KB)
Abstract
This work (Yang et al., "Dual-Cys bacteriophytochromes: intermediates in cyanobacterial phytochrome evolution?" in FEBS J.) characterizes new cyanobacterial phytochrome photoreceptor proteins and engineers their chromophore preferences by cysteine mutagenesis, both in those proteins and in known reference proteins. Proteins were characterized using absorption spectroscopy, mass spectrometry, phylogenetic analysis, and X-ray crystallography. The crystal structure is available through the RCSB PDB (PDB ID: 9JRY). All other data are in this deposit.
Description of the data and file structure
Data are deposited as a gzipped tarball. After extraction, deposited data are in three folders:
I. Phylogeny
II. Mass_Spec
III. Absorption_Spectra
I. Phylogeny:
Data are all within a single folder, without subdirectories. A single phylogeny was inferred for this study, and the deposited files are all from this analysis.
Three files are deposited: the original multiple sequence alignment in CLUSTAL format, the PhyML input file in PHYLIP format after removal of gap-enriched columns, and the final tree with transfer bootstrap expectation values in Newick format. The files are:
DCB_tree.aln (CLUSTAL format)
This is the original alignment, generated in MAFFT (v7.450) using the E-INS-i algorithm. After a header indicating the MAFFT version, aligned sequences are presented as interleaved blocks. In each block, every line gives the sequence name followed by a segment of that sequence, with '-' serving as the gap character. As an example, here are three arbitrary protein sequences in CLUSTAL format:
protein MACDEFG--HIKLM
prot_2  MGSNEYPGGHLRVW
another MSSMEFG--HVQIS

protein NPQRSTVWY
prot_2  PG-KTTLFY
another G--MASIYF
DCB_tree.phy (Input PHYLIP)
For phylogenetic inference in PhyML, the CLUSTAL-format alignment was converted into PHYLIP format, and gap-enriched positions were removed. The resulting file is sequential (each complete protein sequence is followed by the next) rather than interleaved. The header line also differs: it gives the number of sequences (3 in the example above) and the number of positions, or characters (19 in the example above). For the example above, the converted file looks like this:
3 19
protein MACDEFGHIKLMNRSTVWY
prot_2 MGSNEYPHLRVWPKTTLFY
another MSSMEFGHVQISGMASIYF
(Note, however, that this example is for illustration only and is NOT taken from the deposited file.)
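The conversion described above can be sketched in Python for the toy alignment. This is a minimal illustration, not the workflow used to build the deposited file. It assumes CLUSTAL sequence lines of the form "name segment", skips the header and any conservation lines, and treats a column as gap-enriched when its gap fraction exceeds a cutoff (the hypothetical max_gap_fraction parameter, set here to the 5% cutoff described for the phylogenetic analysis). Output uses the relaxed PHYLIP layout of the example; strict PHYLIP instead pads names to 10 characters.

```python
def parse_clustal(text: str) -> dict[str, str]:
    """Concatenate interleaved CLUSTAL blocks into one sequence per name."""
    seqs: dict[str, str] = {}
    for line in text.splitlines():
        # Skip blank lines, the CLUSTAL/MAFFT header, and conservation lines
        # (which start with whitespace in CLUSTAL output).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        name, segment = line.split()[:2]
        seqs[name] = seqs.get(name, "") + segment
    return seqs


def drop_gappy_columns(seqs: dict[str, str], max_gap_fraction: float = 0.05,
                       gap: str = "-") -> dict[str, str]:
    """Remove alignment columns whose fraction of gaps exceeds the cutoff."""
    names = list(seqs)
    columns = zip(*(seqs[n] for n in names))
    kept = [col for col in columns
            if sum(c == gap for c in col) / len(col) <= max_gap_fraction]
    # Transpose the kept columns back into one string per sequence.
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


def to_phylip(seqs: dict[str, str]) -> str:
    """Sequential (relaxed) PHYLIP: 'nseq nchar' header, then 'name sequence'."""
    nchar = len(next(iter(seqs.values())))
    lines = [f"{len(seqs)} {nchar}"]
    lines += [f"{name} {seq}" for name, seq in seqs.items()]
    return "\n".join(lines)


toy = """protein MACDEFG--HIKLM
prot_2  MGSNEYPGGHLRVW
another MSSMEFG--HVQIS

protein NPQRSTVWY
prot_2  PG-KTTLFY
another G--MASIYF
"""
phylip = to_phylip(drop_gappy_columns(parse_clustal(toy)))
print(phylip)
```

With three sequences, any column containing a gap exceeds 5%, so the four gapped columns are dropped and the 23-column toy alignment shrinks to the 19 characters shown in the PHYLIP example.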
DCB_tree.txt (Newick tree)
The resulting phylogeny is presented in the Newick format, suitable for use with a broad range of tree viewers but not well suited for analysis in a text editor or word processor.
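For a quick look without a tree viewer, tip labels can be pulled out of a Newick string with a few lines of Python. This is a minimal sketch, assuming unquoted labels that contain no parentheses, commas, colons, or semicolons, and that support values appear only as internal-node labels after ')'. A dedicated library such as Biopython's Bio.Phylo module is the more robust choice for real analysis.

```python
import re


def newick_leaves(newick: str) -> list[str]:
    """Return tip labels from a Newick tree string, in file order."""
    leaves = []
    # Tokens are either structural characters or runs of label/length text.
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    prev = None
    for tok in tokens:
        if tok in {"(", ")", ",", ";"}:
            prev = tok
            continue
        # Text after '(' or ',' (or at the start) is a tip; text after ')'
        # is an internal-node label such as a support value.
        if prev != ")":
            name = tok.split(":")[0].strip()
            if name:
                leaves.append(name)
        prev = tok
    return leaves


tree = "((protein:0.1,prot_2:0.2)0.95:0.05,another:0.3);"
print(newick_leaves(tree))  # ['protein', 'prot_2', 'another']
```

Applied to the text of DCB_tree.txt, the number of returned labels should equal the number of sequences in the alignment.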
II. Mass_spec
Data are all within a single folder, without subdirectories.
Four samples were analyzed using selected reaction monitoring mass spectrometry (SRM-MS), a tandem MS technique. In SRM, the first quadrupole (Q1) selects a precursor ion of a chosen m/z, this ion is fragmented in a collision cell (q2), and a chosen fragment (product) ion is selected and detected in the third quadrupole (Q3). For this study, the precursor ions were peptides generated by proteolytic digestion and separated on a C18 column prior to SRM-MS. Peptides eluting from the column were ionized by electrospray before precursor selection in Q1.
Data from each sample are presented in a tab-delimited text file named for that sample:
FiDCB_BV.txt Wild-type Dual-Cys bacteriophytochrome (DCB) from Fischerella sp. (FiDCB), incorporating biliverdin IX-alpha (BV) as a chromophore.
FiDCB_C17A_BV.txt The C17A variant of FiDCB, in which Cys17 was mutated to Ala, was also examined with a BV chromophore.
FiDCB_C259I_BV.txt The C259I variant of FiDCB, in which Cys259 was mutated to Ile, was also examined with BV.
FiDCB_WO_Bilin.txt The wild-type protein was examined without a bilin chromophore (i.e., apoprotein).
In each of these files, the data are presented as tab-delimited text with the following columns:
- Compound name: The assigned precursor, i.e., the proteolytic peptide selected for tandem MS analysis. DCB proteins can attach BV through two Cys residues at the same time, so a precursor can include two discontinuous peptide sequences joined through BV.
- Precursor ion charge state: Charge state of the precursor ion selected in Q1.
- Product ion charge state: Charge state of the product (fragment) ion detected in Q3.
- Fragment Ion: The assigned product ion. Only discontinuous fragments are named explicitly; continuous fragments were instead assigned alphanumeric codes based on standard fragmentation schemes.
- Light Precursor ion Q1 (m/z): m/z value selected for the precursor ion in Q1.
- Light Product ion Q3 (m/z): Observed m/z value of the product ion after collision (Q3).
- Collision Energy: Collision energy applied for that fragment in the fragmentation step (q2).
- Retention time: Retention time in the chromatographic separation (C18 column).
- Area: Integrated peak area from the chromatographic separation.
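Because each row gives both the Q1 m/z and the precursor charge state, the neutral mass of a precursor can be recovered as M = z × (m/z) − z × 1.007276 Da, assuming positive-mode protonated ions [M + zH]^z+. The sketch below reads one of these files with Python's csv module and adds that mass to each row. It assumes the header row uses exactly the column names listed above; the example input is hypothetical, not taken from the deposit.

```python
import csv
import io

PROTON_MASS_DA = 1.007276  # mass of a proton (Da)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of a protonated [M + zH]^z+ ion observed at m/z."""
    return charge * mz - charge * PROTON_MASS_DA


def read_srm(handle):
    """Yield rows of a tab-delimited SRM export, adding the precursor mass.

    Assumes the column names listed in this README; the charge may be
    written as '2', '2+' or '+2'.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        z = int(row["Precursor ion charge state"].strip().strip("+"))
        mz = float(row["Light Precursor ion Q1 (m/z)"])
        row["Precursor neutral mass (Da)"] = neutral_mass(mz, z)
        yield row


# Minimal hypothetical input with illustrative values only.
example = io.StringIO(
    "Compound name\tPrecursor ion charge state\tLight Precursor ion Q1 (m/z)\n"
    "EXAMPLE\t2\t500.0\n"
)
rows = list(read_srm(example))
print(rows[0]["Precursor neutral mass (Da)"])
```

For a real file, replace the StringIO object with open("FiDCB_BV.txt", newline="").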
III. Absorption_Spectra
Most work was performed with an E. coli expression system, but a subset of proteins was also examined after expression in the model cyanobacterium Synechocystis sp. PCC 6803. All data files are supplied as tab-delimited text with UNIX newlines, and the spectra are organized into two folders by expression system:
Synechocystis_expression
E_coli_expression
Synechocystis_expression
Within this folder, one file is provided for each protein that was studied in this expression system. Each file contains the absorption spectra for that protein. The individual files are as follows:
DrBphP.txt Deinococcus radiodurans bacteriophytochrome
FiDCB.txt Fischerella sp. dual-Cys bacteriophytochrome
FiTCCP.txt Fischerella sp. tandem-Cys cyanobacterial phytochrome
Le3CP.txt Leptolyngbya sp. candidate 3-Cys phytochrome
NcDCB.txt Nostoc carneum dual-Cys bacteriophytochrome
ToBphP2.txt Tolypothrix sp. bacteriophytochrome 2
ToCph1.txt Tolypothrix sp. cyanobacterial phytochrome 1
ToDCB.txt Tolypothrix sp. dual-Cys bacteriophytochrome
ToTCCP.txt Tolypothrix sp. tandem-Cys cyanobacterial phytochrome
The same data are presented for each protein, so the organization of each file is identical. Each protein expressed in Synechocystis was analyzed under native and denaturing conditions. Denaturation used acidic guanidinium chloride. For native conditions, spectra are presented for the dark-adapted (15Z) state and for the photoproduct (15E). For denaturing conditions, the difference spectrum between the two photostates is presented. This difference spectrum was calculated as (15Z - 15E).
Therefore, each file contains the following columns:
Wavelength (nm) wavelengths for native spectra, in nanometers
15Z Absorbance values for the 15Z photostate
15E Absorbance values for the 15E photostate
d wavelength (nm) wavelengths for denatured difference spectra
Denatured difference spectra delta-absorbance values (15Z - 15E)
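A minimal Python sketch for reading one of these files follows. It assumes the header names listed above and that the shorter column set (native or denatured) is padded with empty cells. It also computes a native 15Z − 15E difference spectrum with the same sign convention as the denatured data. The input shown is hypothetical.

```python
import csv
import io


def read_synechocystis_file(handle):
    """Return (native, denatured) series from one Synechocystis_expression file.

    native:    list of (wavelength, A_15Z, A_15E, A_15Z - A_15E)
    denatured: list of (wavelength, delta_A) as deposited (15Z - 15E)
    """
    reader = csv.reader(handle, delimiter="\t")
    header = [h.strip() for h in next(reader)]
    iw, iz, ie = (header.index(h) for h in ("Wavelength (nm)", "15Z", "15E"))
    idw = header.index("d wavelength (nm)")
    idd = header.index("Denatured difference spectra")
    native, denatured = [], []
    for row in reader:
        row += [""] * (len(header) - len(row))  # pad rows shortened by blank cells
        if row[iw].strip():
            a_z, a_e = float(row[iz]), float(row[ie])
            native.append((float(row[iw]), a_z, a_e, a_z - a_e))
        if row[idw].strip():
            denatured.append((float(row[idw]), float(row[idd])))
    return native, denatured


# Hypothetical two-row input; the second row has no denatured data.
example = io.StringIO(
    "Wavelength (nm)\t15Z\t15E\td wavelength (nm)\tDenatured difference spectra\n"
    "700\t0.50\t0.20\t700\t0.10\n"
    "701\t0.48\t0.22\t\t\n"
)
native, denatured = read_synechocystis_file(example)
```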
E_coli_expression
Most of the work was done with protein produced in E. coli, which allowed production at higher levels with higher purity. Data are organized into a series of eight folders, each of which has data for one set of experiments.
The folders are:
Native_wildtype_new_proteins
DCB_single_Cys_removal
DCB_second_Cys_addition
TCCP_Cys_variants
BphP_Cph1_Cys_swaps
BphP_Cph1_2Cys_variants
FiDCB_BV_color_response
Conversion_timecourses
The Conversion_timecourses data are kinetic traces, in which data take the form of a series of time/Absorbance pairs, with Absorbance reported at a chosen wavelength of interest. All other subdirectories contain absorption spectra or difference spectra, in which the data take the form of a series of wavelength/Absorbance pairs (as in the data from expression in Synechocystis).
A description of each subdirectory follows:
(i) Native_wildtype_new_proteins
This folder contains absorption spectra for previously uncharacterized proteins under native conditions. One file is provided for each protein. Each protein is designated by a 2-letter code for the host organism (e.g., “Fi” = Fischerella sp.) followed by the type of protein in all caps:
DCB = Dual-Cys bacteriophytochrome
TCCP = Tandem-Cys cyanobacterial phytochrome
3CP = candidate 3-Cys cyanobacterial phytochrome
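The naming scheme can be decoded mechanically. This Python sketch uses only the organism prefixes and type suffixes that appear in this README; it is illustrative, and names outside this scheme (such as DrBphP or SyCph1) are deliberately rejected.

```python
import re

# Organism prefixes and protein-type suffixes as given in this README.
ORGANISMS = {"Fi": "Fischerella sp.", "Le": "Leptolyngbya sp.",
             "Nc": "Nostoc carneum", "To": "Tolypothrix sp."}
TYPES = {"DCB": "dual-Cys bacteriophytochrome",
         "TCCP": "tandem-Cys cyanobacterial phytochrome",
         "3CP": "candidate 3-Cys cyanobacterial phytochrome"}
CODE = re.compile(r"(Fi|Le|Nc|To)(DCB|TCCP|3CP)")


def decode_protein(code: str) -> tuple[str, str]:
    """Split a code such as 'NcDCB' into (organism, protein type)."""
    m = CODE.fullmatch(code)
    if m is None:
        raise ValueError(f"not a new-protein code: {code!r}")
    return ORGANISMS[m.group(1)], TYPES[m.group(2)]


print(decode_protein("Le3CP"))
```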
Therefore, the files are:
FiDCB.txt Data for FiDCB (dual-Cys bacteriophytochrome from Fischerella sp.)
FiTCCP.txt Data for TCCP from Fischerella sp.
Le3CP.txt Data for possible 3CP from Leptolyngbya sp.
NcDCB.txt Data for DCB from Nostoc carneum
ToDCB.txt Data for DCB from Tolypothrix sp.
ToTCCP.txt Data for TCCP from Tolypothrix sp.
Each of these files is tab-delimited and has the following columns:
Wavelength (nm) wavelength for wild-type protein with PCB chromophore
WT (PCB) 15Z Absorbance for that wavelength in the 15Z dark-adapted state
WT (PCB) 15E Absorbance for the wavelength in the 15E photoproduct state
Wavelength (nm) wavelength for wild-type protein with BV chromophore
WT (BV) 15Z Absorbance for that wavelength in the 15Z dark-adapted state
WT (BV) 15E Absorbance for the wavelength in the 15E photoproduct state
Wavelength (nm) wavelength for the dark-minus-light PCB difference spectrum
15Z-15E PCB delta-Absorbance for PCB, as (dark-light) or (15Z-15E)
Wavelength (nm) wavelength for the dark-minus-light BV difference spectrum
15Z-15E BV delta-Absorbance for BV, as (dark-light) or (15Z-15E)
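Because several columns share the header "Wavelength (nm)", these files are easiest to read by column position rather than by name. The Python sketch below (standard library only) pairs each data column with the nearest wavelength column to its left and skips empty cells. It assumes that each data column name is unique within a file, and the input shown is hypothetical. Since it is driven by the header row, the same approach also handles files with a single wavelength column or with some 15E columns absent.

```python
import csv
import io


def read_grouped_spectra(handle, wavelength_label="Wavelength (nm)"):
    """Map each data column name to a list of (wavelength, value) pairs."""
    reader = csv.reader(handle, delimiter="\t")
    header = [h.strip() for h in next(reader)]
    owner = {}  # data-column index -> index of its wavelength column
    current = None
    for i, name in enumerate(header):
        if name == wavelength_label:
            current = i
        elif current is not None:
            owner[i] = current
    spectra = {header[i]: [] for i in owner}
    for row in reader:
        row += [""] * (len(header) - len(row))  # pad short rows
        for i, w in owner.items():
            if row[i].strip() and row[w].strip():
                spectra[header[i]].append((float(row[w]), float(row[i])))
    return spectra


example = io.StringIO(
    "Wavelength (nm)\tWT (PCB) 15Z\tWavelength (nm)\t15Z-15E PCB\n"
    "650\t0.40\t650\t0.15\n"
    "651\t0.41\t\t\n"
)
spectra = read_grouped_spectra(example)
```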
(ii) DCB_single_Cys_removal
This folder contains spectra for variants of DCB proteins in which a single Cys residue was replaced by site-directed mutagenesis. Each file contains data for one such variant and is named by the parent protein and the substitution (e.g., “FiDCB C17A.txt” contains data for the C17A variant of FiDCB, in which Cys17 was replaced with Ala).
The individual files are:
FiDCB C17A.txt
FiDCB C259I.txt
FiDCB C259L.txt
FiDCB C259M.txt
NcDCB C257I.txt
NcDCB C257L.txt
NcDCB C257M.txt
NcDCB C8A.txt
ToDCB C289I.txt
ToDCB C289L.txt
ToDCB C289M.txt
ToDCB C40A.txt
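Filenames in this folder, and in the other variant folders below, consist of the parent protein followed by substitutions written as wild-type residue, position, and new residue. A hypothetical Python helper for splitting them is sketched here; it treats "WT" as the unsubstituted parent.

```python
import re
from pathlib import Path

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")  # e.g. C17A


def parse_variant_filename(filename: str) -> tuple[str, list[tuple[str, int, str]]]:
    """Split 'FiDCB C17A H260C.txt' into ('FiDCB', [('C', 17, 'A'), ('H', 260, 'C')])."""
    parent, *tokens = Path(filename).stem.split()
    substitutions = []
    for tok in tokens:
        if tok == "WT":  # wild-type parent, no substitution
            continue
        m = SUBSTITUTION.fullmatch(tok)
        if m is None:
            raise ValueError(f"unexpected token {tok!r} in {filename!r}")
        substitutions.append((m.group(1), int(m.group(2)), m.group(3)))
    return parent, substitutions


print(parse_variant_filename("ToDCB C40A H290C.txt"))
```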
Each of these files contains the absorption spectra for the given protein with BV and PCB chromophores in each photostate, except for the C40A variant of ToDCB (ToDCB C40A.txt). This variant did not undergo photoconversion with BV, so only a 15Z spectrum is provided for this chromophore.
Each file is thus of the following format:
Wavelength (nm) wavelength for the spectra, in nanometers
BV 15Z Absorbance with BV in the 15Z state
BV 15E Absorbance with BV in the 15E state (absent in ToDCB C40A)
PCB 15Z Absorbance with PCB in the 15Z state
PCB 15E Absorbance with PCB in the 15E state
(iii) DCB_second_Cys_addition
This folder contains spectra for similar variants in which the “second Cys” of TCCP proteins was introduced by substitution into each DCB protein, either alone or combined with removal of the N-terminal Cys (e.g., C17A in FiDCB). The naming convention is the same as in (ii) above; e.g., “FiDCB C17A H260C.txt” has data for the C17A H260C variant of FiDCB.
The individual files are:
FiDCB C17A H260C.txt
FiDCB H260C.txt
NcDCB C8A H258C.txt
NcDCB H258C.txt
ToDCB C40A H290C.txt
ToDCB H290C.txt
Each file has the same format as in (ii), as follows:
Wavelength (nm) wavelength for the spectra, in nanometers
BV 15Z Absorbance with BV in the 15Z state
BV 15E Absorbance with BV in the 15E state
PCB 15Z Absorbance with PCB in the 15Z state
PCB 15E Absorbance with PCB in the 15E state
(iv) TCCP_Cys_variants
This folder contains spectra for variant proteins in which similar Cys substitutions were made in TCCP/3CP proteins. This work demonstrated that the candidate “3CP” protein is in fact a standard TCCP protein, so the folder name reflects that conclusion. The file organization matches that used in (ii) and (iii).
The individual files are:
FiTCCP C19A C257H.txt
FiTCCP C19A.txt
FiTCCP C257H.txt
Le3CP C19A C260H.txt
Le3CP C19A.txt
Le3CP C259M C260H.txt
Le3CP C260H.txt
ToTCCP C258H.txt
ToTCCP H17C C257M C258H.txt
ToTCCP H17C C258H.txt
ToTCCP H17C.txt
Each file is in the following format:
Wavelength (nm) wavelength for the spectra, in nanometers
BV 15Z Absorbance with BV in the 15Z state
BV 15E Absorbance with BV in the 15E state (absent in some)
PCB 15Z Absorbance with PCB in the 15Z state
PCB 15E Absorbance with PCB in the 15E state (absent in some)
For the C259M C260H variant of Le3CP (Le3CP C259M C260H.txt), no photoproduct was formed with PCB, so no “PCB 15E” column is included.
For the ToTCCP C258H variant, no photoproduct was formed with BV, so no “BV 15E” column is included.
For the ToTCCP H17C C257M C258H variant, no photoproduct was observed with either chromophore, so only 15Z columns are included.
(v) BphP_Cph1_Cys_swaps
As a comparison to the work performed on DCB proteins, similar Cys substitutions were made in “reference” proteins belonging to previously recognized types of bacterial phytochromes: Cph1 and BphP proteins.
Both previously published reference proteins (SyCph1 and DrBphP) and newly characterized ones (ToBphP1, ToBphP2, and ToCph1) were used for these studies. Each file contains the absorption spectra for one protein; wild-type proteins are designated by “WT” in the filename, and variants are designated by the substitution(s) present in each case along with the name of the parent protein (matching above).
DrBphP C24A M259C.txt
DrBphP M259C.txt
DrBphP WT.txt
SyCph1 L15C E16D C259I.txt
SyCph1 L15C E16D.txt
SyCph1 WT.txt
ToBphP1 C19A I259C.txt
ToBphP1 I259C.txt
ToBphP1 WT.txt
ToBphP2 C24A L271C.txt
ToBphP2 L271C.txt
ToBphP2 WT.txt
ToCph1 L15C K16D.txt
ToCph1 L15C K16D C259L.txt
ToCph1 WT.txt
Each file is in the following format:
Wavelength (nm) wavelength for the spectra, in nanometers
BV 15Z Absorbance with BV in the 15Z state
BV 15E Absorbance with BV in the 15E state
PCB 15Z Absorbance with PCB in the 15Z state
PCB 15E Absorbance with PCB in the 15E state (absent in one)
ToCph1 L15C K16D C259L.txt does not contain a column for 15E PCB, because this variant did not exhibit formation of the relevant photoproduct.
(vi) BphP_Cph1_2Cys_variants
As a comparison for the introduction of the TCCP second Cys into DCB proteins, similar experiments were again performed with the same panel of BphP and Cph1 reference proteins. One file is provided for each of the resulting variants, using the same filename convention.
Individual files are as follows:
DrBphP C24A M259C H260C.txt
DrBphP M259C H260C.txt
SyCph1 H260C.txt
SyCph1 L15C E16D H260C.txt
ToBphP1 C19A I259C H260C.txt
ToBphP1 I259C H260C.txt
ToBphP2 C24A L271C H272C.txt
ToBphP2 L271C H272C.txt
ToCph1 H260C.txt
ToCph1 L15C K16D H260C.txt
Each file is in the following format:
Wavelength (nm) wavelength for the spectra, in nanometers
BV 15Z Absorbance with BV in the 15Z state
BV 15E Absorbance with BV in the 15E state (absent in two)
PCB 15Z Absorbance with PCB in the 15Z state
PCB 15E Absorbance with PCB in the 15E state (absent in two)
Two variants exhibited no photoconversion with either chromophore, so the 15E columns are not present in these files (DrBphP C24A M259C H260C.txt and ToBphP2 C24A L271C H272C.txt).
(vii) FiDCB_BV_color_response
The color response was tested for one DCB protein, FiDCB from Fischerella sp. PCC 9605, with biliverdin IX-alpha (BV) as a chromophore and with a series of different illumination wavelengths. The resulting absorption spectra are presented as a series of files, with each file named by the wavelength of light used for illumination.
Individual files are:
380 nm.txt
405 nm.txt
460 nm.txt
525 nm.txt
590 nm.txt
620 nm.txt
660 nm.txt
740 nm.txt
Each of these files has the same columns: wavelength in nm, followed by the 15Z and 15E absorption spectra, each labeled with the illumination wavelength and the photostate. For example, the column headers and first data point for 740 nm illumination look like this:
Wavelength (nm) 740nm 15Z 740nm 15E
250.00 0.51200 0.52100
(note that tab-delimited text does not line up in the README file).
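Assuming the header pattern shown above (a wavelength column, then "<illumination>nm 15Z" and "<illumination>nm 15E"), a 15Z − 15E difference spectrum for each illumination wavelength can be computed as in this Python sketch. The wavelength is assumed to be the first column, and the example input reuses the data row shown above.

```python
import csv
import io
import re

STATE_COLUMN = re.compile(r"(\d+)nm (15Z|15E)")  # e.g. '740nm 15Z'


def color_response_difference(handle):
    """Return (illumination_nm, [(wavelength, A_15Z - A_15E), ...]) for one file."""
    reader = csv.reader(handle, delimiter="\t")
    header = [h.strip() for h in next(reader)]
    columns = {}  # photostate -> (illumination wavelength, column index)
    for i, name in enumerate(header):
        m = STATE_COLUMN.fullmatch(name)
        if m:
            columns[m.group(2)] = (int(m.group(1)), i)
    illumination, iz = columns["15Z"]
    _, ie = columns["15E"]
    # The first column is assumed to hold the probe wavelength.
    diff = [(float(r[0]), float(r[iz]) - float(r[ie])) for r in reader if r]
    return illumination, diff


example = io.StringIO("Wavelength (nm)\t740nm 15Z\t740nm 15E\n250.00\t0.51200\t0.52100\n")
illumination, diff = color_response_difference(example)
```

Looping this over the eight files gives one difference spectrum per illumination wavelength for side-by-side comparison.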
(viii) Conversion_timecourses
The kinetics of forward photoconversion and dark reversion were examined for selected proteins (both wild-type and Cys variants) and are deposited as single-wavelength Time/Absorbance traces. In this case, each file contains several such traces for different proteins to permit easy comparison without having to reassemble traces from different files. The filenames thus provide approximate “themes” for comparison, along with the direction of conversion. In each case, data are presented at both 16 deg. C and 26 deg. C.
Within each file, the first column is time (in seconds), and the remaining columns are named by protein, chromophore, and temperature. For example, the first few headers from Cph1_PCB_forward.txt look like this:
Time (sec.) FiDCB_PCB_16 FiDCB_PCB_26 FiDCB_C17A_PCB_16 …
Time (sec.) is the elapsed time in seconds
FiDCB_PCB_16 is the forward conversion of FiDCB with PCB at 16 deg. C
FiDCB_PCB_26 is the same preparation but at 26 deg. C
FiDCB_C17A_PCB_16 is the C17A variant of FiDCB with PCB at 16 deg. C
(and so on)
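Since protein names can themselves contain underscores (e.g., FiDCB_C17A), these headers are most safely split from the right. A Python sketch, assuming every data column ends in _chromophore_temperature as in the examples above:

```python
from typing import NamedTuple


class TraceLabel(NamedTuple):
    protein: str        # e.g. 'FiDCB' or 'FiDCB_C17A'
    chromophore: str    # e.g. 'PCB' or 'BV'
    temperature_c: int  # 16 or 26


def parse_trace_header(header: str) -> TraceLabel:
    """Parse e.g. 'FiDCB_C17A_PCB_16' into TraceLabel('FiDCB_C17A', 'PCB', 16)."""
    protein, chromophore, temperature = header.rsplit("_", 2)
    return TraceLabel(protein, chromophore, int(temperature))


# Group the example headers by (protein, chromophore), collecting temperatures.
headers = ["FiDCB_PCB_16", "FiDCB_PCB_26", "FiDCB_C17A_PCB_16"]
by_sample = {}
for h in headers:
    label = parse_trace_header(h)
    by_sample.setdefault((label.protein, label.chromophore), []).append(label.temperature_c)
print(by_sample)
```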
The individual files are as follows:
BphP_BV_forward.txt
This file includes forward photoconversion traces for wild-type FiDCB, for the FiDCB C259I variant mimicking BphP proteins, and for three reference BphP proteins.
BphP_BV_reverse.txt
This file includes thermal relaxation (dark reversion) data for the same proteins. This is formally a reverse isomerization reaction relative to the forward photoconversion, so the filename reflects that fact.
Cph1_PCB_forward.txt
This file includes forward photoconversion traces for wild-type FiDCB, for the FiDCB C17A variant mimicking Cph1 proteins, and for the reference protein SyCph1. Here, the chromophore is phycocyanobilin (PCB).
Cph1_PCB_reverse.txt
This file contains dark reversion data for the same proteins.
FiDCB_various_forward.txt
This file combines the FiDCB data from the above “forward” files for easy comparison to each other. No reference proteins are included.
FiDCB_various_reverse.txt
This file is an equivalent combination but with dark reversion data from the “reverse” files.
Sharing/Access Information
NA
Methods
Three types of data are deposited: absorption spectroscopy, mass spectrometry, and phylogeny.
Absorption data were collected at 16°C or 26°C using a Shimadzu UV-1900 spectrophotometer equipped with a TCC-100 temperature controller. Data were exported as text files.
Selected reaction monitoring mass spectrometry (SRM-MS) analysis was conducted using a 1290 Infinity II Bio 2D-LC System (Agilent Technologies, USA) coupled to a 6495D Triple Quadrupole mass spectrometer (Agilent Technologies, USA) with a Jet Stream electrospray source. Separation was performed on a ZORBAX Eclipse Plus C18 column (ID: 959759-902, length: 150 mm, pore size: 95 Å, particle size: 1.8 μm, column temperature: 40 °C) (Agilent Technologies, USA) using Mobile Phase A [0.1% (vol/vol) formic acid in water] and Mobile Phase B [0.1% (vol/vol) formic acid in acetonitrile]. Peptides derived from protein digestion were separated with a 30-minute gradient at a flow rate of 0.4 mL/min: Mobile Phase B was held at 3% for the first 4 min, increased to 25% by 22 min, 40% by 24 min, and 60% by 25 min, and finally returned to 3% by 28 min for re-equilibration. Samples (0.48 mg/mL) were introduced onto the column as 5 μL injections. SRM-MS data were analyzed using Skyline.
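The gradient description can be written as a breakpoint table, reading each stated time as the point at which the given %B is reached. The Python sketch below assumes linear ramps between breakpoints and a 3% B hold from 28 to 30 min; neither is stated explicitly above, so treat both as assumptions rather than the instrument method. It also shows the injected amount: 0.48 mg/mL × 5 μL = 2.4 μg, since 1 mg/mL equals 1 μg/μL.

```python
# Breakpoints as (time in min, % Mobile Phase B). Reading each stated time as
# the moment that %B is reached, plus a 28-30 min hold at 3% B, is an assumption.
GRADIENT = [(0.0, 3.0), (4.0, 3.0), (22.0, 25.0), (24.0, 40.0),
            (25.0, 60.0), (28.0, 3.0), (30.0, 3.0)]


def percent_b(t: float, gradient=GRADIENT) -> float:
    """%B at time t (min), assuming linear ramps between breakpoints."""
    if t <= gradient[0][0]:
        return gradient[0][1]
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t <= t1:
            if t1 == t0:  # instantaneous step
                return b1
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return gradient[-1][1]  # after the last breakpoint, hold the final value


# Injected amount: 0.48 mg/mL = 0.48 ug/uL, times 5 uL = 2.4 ug.
injected_ug = 0.48 * 5

for t in (2, 13, 24.5, 26.5):
    print(f"{t:>5} min: {percent_b(t):.1f}% B")
print(f"injected: {injected_ug:.1f} ug")
```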
For phylogenetic analysis, a multiple sequence alignment of 175 sequences and 995 characters was constructed in MAFFT v7.450 (command-line settings --genafpair --maxiterate 16 --clustalout --reorder). After removal of gap-enriched columns (5% cutoff), 477 characters remained. The resulting file was used to infer a maximum likelihood phylogeny using PhyML-3.1 (command-line settings -m WAG -d aa -s SPR -a e -c 4 -v e -o tlr). Support was evaluated using the Shimodaira–Hasegawa-like approximate likelihood-ratio test (SH-aLRT) as implemented in PhyML.