Elucidating the origins of phycocyanobilin biosynthesis and phycobiliproteins

Cite this dataset

Rockwell, Nathan C.; Martin, Shelley S.; Lagarias, J. Clark (2023). Elucidating the origins of phycocyanobilin biosynthesis and phycobiliproteins [Dataset]. Dryad.


Terrestrial ecosystems and human societies depend on oxygenic photosynthesis, which began to reshape our atmosphere approximately 2.5 billion years ago. The earliest known organisms carrying out oxygenic photosynthesis are the cyanobacteria, which use large complexes of phycobiliproteins as light-harvesting antennae. Phycobiliproteins rely on phycocyanobilin (PCB), a linear tetrapyrrole (bilin) chromophore, as the light-harvesting pigment that transfers absorbed light energy from phycobilisomes to the chlorophyll-based photosynthetic apparatus. Cyanobacteria synthesize PCB from heme in two steps: A heme oxygenase converts heme into biliverdin IXα (BV), and the ferredoxin-dependent bilin reductase (FDBR) PcyA then converts BV into PCB. In the current work, we examine the origins of this pathway. We demonstrate that PcyA evolved from pre-PcyA proteins found in nonphotosynthetic bacteria and that pre-PcyA enzymes are active FDBRs that do not yield PCB. Pre-PcyA genes are associated with two gene clusters. Both clusters encode bilin-binding globin proteins, phycobiliprotein paralogs that we designate as BBAGs (bilin biosynthesis-associated globins). Some cyanobacteria also contain one such gene cluster, including a BBAG, two V4R proteins, and an iron–sulfur protein. Phylogenetic analysis shows that this cluster is descended from those associated with pre-PcyA proteins and that light-harvesting phycobiliproteins are also descended from BBAGs found in other bacteria. We propose that PcyA and phycobiliproteins originated in heterotrophic, nonphotosynthetic bacteria and were subsequently acquired by cyanobacteria.


Absorption spectra were collected on Cary 50 or Cary 60 spectrophotometers and were exported from the Cary software in .csv format. They were then processed using an in-house script to remove metadata and were plotted, analyzed, and prepared for presentation in Kaleidagraph. Presented spectra were exported from Kaleidagraph as tab-delimited text files with unix newlines and are deposited in that format. Some spectra were subtracted from each other to generate difference spectra and/or normalized to facilitate comparison; such spectra are presented separately in the same format. Some samples were characterized using a standard acid denaturation assay, and peak wavelengths from these spectra were plotted for comparison to references with known bilin composition. The reference and unknown values are deposited together as tab-delimited text files for short and long wavelengths.
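The difference and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' in-house script or their Kaleidagraph workflow. It assumes each deposited file holds two tab-delimited columns (wavelength, absorbance), possibly preceded by a header row; that layout, the function names, and normalization to the peak maximum are all assumptions made for the example.

```python
import csv
import io


def read_spectrum(text):
    """Parse tab-delimited text into (wavelengths, values).

    Assumes two columns per row; rows that do not start with two
    numbers (for example a header line) are skipped.
    """
    wavelengths, values = [], []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        try:
            w, v = float(row[0]), float(row[1])
        except (ValueError, IndexError):
            continue  # header or blank line
        wavelengths.append(w)
        values.append(v)
    return wavelengths, values


def difference_spectrum(values_a, values_b):
    """Point-by-point subtraction of two spectra on the same wavelength grid."""
    if len(values_a) != len(values_b):
        raise ValueError("spectra must share the same wavelength grid")
    return [a - b for a, b in zip(values_a, values_b)]


def normalize_to_peak(values):
    """Scale a spectrum so that its maximum value equals 1."""
    peak = max(values)
    if peak == 0:
        raise ValueError("cannot normalize a spectrum whose maximum is zero")
    return [v / peak for v in values]
```

In use, a file would be read with `open(path, newline="").read()` and passed to `read_spectrum`. Two spectra should be subtracted only after confirming that their wavelength columns match.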

Circular dichroism spectra were collected on an Applied Photophysics Chirascan and were converted to tab-delimited text using the Applied Photophysics software. They were then plotted, analyzed, and prepared for presentation in Kaleidagraph. Presented spectra were exported from Kaleidagraph as tab-delimited text files with unix newlines and are deposited in that format.

Fluorescence spectra were collected on a QM-6/2005SE fluorimeter equipped with red-enhanced photomultiplier tubes (Photon Technology International 814 Series). Files were displayed in tabular format and copy/pasted for text export. They were then plotted, analyzed, and prepared for presentation in Kaleidagraph. Emission spectra were numerically integrated for estimation of fluorescence quantum yield, and the data used for quantum yield estimation are included as tab-delimited text along with files used for figure presentation (also as tab-delimited text).
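The numerical integration step can be illustrated as follows. The description above does not say how integrated emission was converted to a quantum yield, so this sketch assumes the common relative method, in which the sample is compared to a reference fluorophore of known quantum yield measured under the same conditions. The function names, the reference values, and the optional refractive-index correction are assumptions for illustration only.

```python
def integrate_trapezoid(x, y):
    """Integrate y over x with the trapezoidal rule (x need not be evenly spaced)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
        for i in range(len(x) - 1)
    )


def relative_quantum_yield(sample_area, ref_area, sample_abs, ref_abs,
                           ref_qy, n_sample=1.0, n_ref=1.0):
    """Estimate a quantum yield relative to a reference standard.

    sample_area, ref_area: integrated emission of sample and reference
    sample_abs, ref_abs:   absorbance of each at the excitation wavelength
    ref_qy:                known quantum yield of the reference (assumed input)
    n_sample, n_ref:       solvent refractive indices (equal by default)
    """
    return (ref_qy
            * (sample_area / ref_area)
            * (ref_abs / sample_abs)
            * (n_sample ** 2 / n_ref ** 2))
```

The trapezoidal rule is used because emission spectra are sampled at discrete wavelengths. With matched absorbance and identical solvents, the estimate reduces to the reference quantum yield scaled by the ratio of integrated emission.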

Deposited phylogenetic data include the three files used in the workflow for the published phylogenies. For each analysis, a multiple sequence alignment was constructed using MAFFT v7.450 (command-line settings --genafpair --maxiterate 16 --clustalout --reorder), deposited here in CLUSTAL format (.aln extension). The file was then converted to PHYLIP format with removal of gap-enriched columns (≥5%) using an in-house script. The PHYLIP format file is deposited here (.phy extension) and was used to infer a maximum-likelihood phylogeny in PhyML-3.1 with 100 bootstraps (command-line settings -m WAG -d aa -s SPR -a e -c 4 -v e -o tlr -b 100). Statistical robustness was assessed using the transfer bootstrap expectation (TBE) as calculated in Booster (software available at and file deposited in Newick format with .TBE.txt extension). For figure preparation, the Newick file output by Booster was processed in FigTree prior to annotation, scaling, and coloring in Adobe Illustrator. For each analysis, three files are deposited, with file extensions listed above.
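The column-filtering and PHYLIP conversion step can be sketched as below. The in-house script itself is not part of this description, so this is only an illustration of the idea. It assumes the alignment has already been parsed into a dictionary of equal-length sequences, that "-" marks a gap, that a column is removed when at least 5% of its sequences have a gap there, and that a relaxed sequential PHYLIP layout (name, whitespace, sequence) is acceptable. All function names are hypothetical.

```python
def drop_gappy_columns(alignment, min_gap_fraction=0.05):
    """Remove columns whose gap fraction is >= min_gap_fraction.

    alignment: dict mapping sequence name to aligned sequence (equal lengths).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        gaps = sum(1 for s in seqs if s[i] == "-")
        if gaps / len(seqs) < min_gap_fraction:
            keep.append(i)  # column is not gap-enriched
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}


def to_phylip(alignment):
    """Write an alignment as relaxed sequential PHYLIP text with unix newlines."""
    if not alignment:
        raise ValueError("alignment is empty")
    n_seqs = len(alignment)
    n_cols = len(next(iter(alignment.values())))
    width = max(len(name) for name in alignment) + 2  # pad names, keep a separator
    lines = [f"{n_seqs} {n_cols}"]
    for name, seq in alignment.items():
        lines.append(name.ljust(width) + seq)
    return "\n".join(lines) + "\n"


# Toy example: the third column has a gap in one of three sequences
# (33%, above the 5% threshold), so it is removed.
toy = {"a": "MK-L", "b": "MKAL", "c": "MKAL"}
trimmed = drop_gappy_columns(toy)
phylip_text = to_phylip(trimmed)
```

The deposited .aln and .phy files remain the authoritative record of what was actually analyzed. A sketch like this is mainly useful for checking how many columns a given threshold would discard.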

Homology models are presented for three proteins (POZ53557, CAP_1520, and MBL9008304) characterized in the course of this work. Models were generated starting from the same sequence alignment used for phylogenetic analysis of FDBRs, prior to removal of gap-enriched columns. Target and template sequences were filtered from this alignment using the alnfilter utility (available as part of the homolmapper distribution) and were converted to PIR format using the -convert option in CLUSTALW2. The .pir file was then manually edited to generate an input for MODELLER v9.22, and a single homology model was generated for each target. These three models are deposited in PDB format.

Synthetic gene sequences and the complete sequence for plasmid Spam-545/alt were generated using ApE and are deposited in flat text.

Usage notes

All files are reported as flat text files with unix newlines. A README.txt is included with this submission.


United States Department of Energy, Award: DE-SC0002395