Data pertaining to structural and functional analyses of NF-kB in diverse holozoan taxa
Data files
Version of Mar 07, 2026 (9 files, 57.97 KB total):
- NFkB_review_code.Rmd (9.36 KB)
- NFkB_sequences.fa (14.98 KB)
- PBM_correlation_data.csv (455 B)
- README.md (3.34 KB)
- RHD_MSA.aln-clustal_num (13.05 KB)
- RHD_PCA_data.csv (923 B)
- RHD_percent_identity_data.csv (4.25 KB)
- RHD_sequences.fa (7.41 KB)
- RMSD_data.csv (4.20 KB)
Abstract
This repository contains data from sequence and structural analyses of the transcription factor NF-kB across a diverse set of holozoan species. The dataset includes data for comparative sequence analyses of the NF-kB Rel homology domain (RHD), as well as for comparative analyses of three-dimensional structures of this domain predicted using AlphaFold3. It also includes data for an analysis correlating the sequence and structural similarity of NF-kB RHDs between species with their DNA-binding preferences, as previously quantified using protein binding microarrays (PBMs). All data are available for open use.
Dataset DOI: 10.5061/dryad.t1g1jwtfx
Data and file structure
File 1: NFkB_sequences.fa
This file contains amino acid sequences in FASTA format for the NF-kB proteins analyzed.
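For readers working outside R, a minimal FASTA reader is sketched below. The helper names `parse_fasta` and `read_fasta` are hypothetical and not part of this dataset's code; the sketch assumes standard FASTA (a `>` header line followed by one or more sequence lines) and keys each record by its full header text.

```python
from pathlib import Path


# Hypothetical helpers for illustration; not part of this dataset's code.
def parse_fasta(text):
    """Map each FASTA header (text after '>') to its full sequence."""
    records = {}
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap across several lines
    if header is not None:
        records[header] = "".join(chunks)  # store the last record
    return records


def read_fasta(path):
    """Read a FASTA file from disk and parse it with parse_fasta."""
    return parse_fasta(Path(path).read_text())
```

For example, `{name: len(seq) for name, seq in read_fasta("NFkB_sequences.fa").items()}` gives the length of each protein; the same call works for `RHD_sequences.fa`. Because records are stored in a dictionary, a repeated header would overwrite the earlier record.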
File 2: PBM_correlation_data.csv
This file contains data on NF-kB binding specificity (as determined via protein binding microarrays [PBMs]) and on sequence or structural similarity (amino acid percent identity or root mean square deviation [RMSD]).
This file has headers, which are:
-Species_pair = the pair of species for which NF-kB proteins were compared; abbreviations are 4 letters with the first letter indicating genus and the next 3 letters indicating the first 3 letters of the species name (e.g., Mvir = Morbakka virulenta)
-PBM_z_score = correlation between the protein binding microarray (PBM) z-scores of the two species' NF-kB proteins
-Structure_metric = structural metric analyzed (either amino acid percent identity or RMSD)
-Structure_value = value for metric identified in "Structure_metric"
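One way to summarize this file is to correlate PBM_z_score with Structure_value separately for each value of Structure_metric. The sketch below does this with a Pearson correlation in plain Python; the helper names are hypothetical, and the README does not state which statistic the accompanying R Markdown code actually uses (it may, for example, use a rank correlation or a regression instead).

```python
import csv
import math
from collections import defaultdict


# Hypothetical helpers for illustration; not part of this dataset's code.
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # undefined if either list is constant


def correlate_by_metric(rows):
    """Group rows by Structure_metric and correlate Structure_value with PBM_z_score."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = groups[row["Structure_metric"]]
        xs.append(float(row["Structure_value"]))
        ys.append(float(row["PBM_z_score"]))
    return {metric: pearson(xs, ys) for metric, (xs, ys) in groups.items()}


def load_rows(path):
    """Read the CSV into a list of dicts keyed by the column headers."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Usage: `correlate_by_metric(load_rows("PBM_correlation_data.csv"))` returns one coefficient per metric label, using whatever labels appear in the Structure_metric column.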
File 3: RHD_PCA_data.csv
This file contains the results of a principal component analysis (PCA) performed on substitution matrix scores from a multiple sequence alignment of the Rel homology domain (RHD) sequences of the NF-kB proteins.
This file has headers, which are:
-Sequence = species of origin for RHD sequence; abbreviations are 4 letters with the first letter indicating genus and the next 3 letters indicating the first 3 letters of the species name (e.g., Mvir = Morbakka virulenta)
-PC1–3 = scores of each sequence on the first through third principal components
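The file already holds the PC scores. To show how such scores arise, the sketch below runs a PCA by singular value decomposition on a numeric matrix. It assumes a hypothetical layout with one row per RHD sequence and one column per substitution-score feature; the README does not describe how the substitution scores were arranged, and the R code may scale the data differently. Principal-component signs are arbitrary, so a reimplementation can return the same scores with flipped signs.

```python
import numpy as np


# Hypothetical helper for illustration; not part of this dataset's code.
def pca_scores(matrix, n_components=3):
    """Return (scores, explained_variance_ratio) for the first principal components.

    matrix: shape (n_sequences, n_features); assumed layout, one row per sequence.
    """
    X = np.asarray(matrix, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature column
    # Rows of Vt are the principal axes; U * S gives the projected scores
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # equals Xc @ Vt[:k].T
    var = S ** 2  # proportional to the variance along each axis
    return scores, var[:n_components] / var.sum()
```

In this sketch, column `k` of `scores` plays the role of the PC(k+1) column in RHD_PCA_data.csv, and the returned ratios give the share of total variance captured by each component.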
File 4: RHD_percent_identity_data.csv
This file contains the percent amino acid identity from pairwise sequence alignments of the NF-kB sequences for each pair of species.
This file has headers, which are:
-Species_1–2 = species being compared; abbreviations are 4 letters with the first letter indicating genus and the next 3 letters indicating the first 3 letters of the species name (e.g., Mvir = Morbakka virulenta)
-Percent_identity = percent amino acid identity between the two aligned sequences
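Percent identity depends on the denominator chosen. The sketch below computes it from one pairwise alignment under an assumed convention: identical residues divided by the aligned columns in which neither sequence has a gap. The README does not say which convention produced this file; some tools divide by the full alignment length including gaps, which gives lower values. The function name is hypothetical.

```python
# Hypothetical helper for illustration; not part of this dataset's code.
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two gapped, equal-length aligned sequences.

    Assumed convention: identical residues / columns where neither sequence has '-'.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)
```

For example, `percent_identity("MKV-LA", "MRVQLA")` ignores the gapped fourth column and finds 4 identities among 5 columns, returning 80.0.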
File 5: RHD_sequences.fa
This file contains amino acid sequences in FASTA format for the Rel homology domains of the NF-kB proteins analyzed.
File 6: RMSD_data.csv
This file contains root mean square deviations (RMSDs) between NF-kB structures predicted with AlphaFold3 for each pair of species.
This file has headers, which are:
-Species_1–2 = species being compared; abbreviations are 4 letters with the first letter indicating genus and the next 3 letters indicating the first 3 letters of the species name (e.g., Mvir = Morbakka virulenta)
-RMSD = root mean square deviation between the two predicted structures, calculated using PyMOL
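As an illustration of what an RMSD between two structures measures, the sketch below superimposes two matched coordinate sets with the Kabsch algorithm and returns the square root of the mean squared distance between paired atoms, RMSD = sqrt((1/N) Σ ||R p_i + t − q_i||²). It assumes atoms (for example, C-alpha atoms) are already paired one-to-one. The README does not say which PyMOL command was used; PyMOL's `align`, for instance, also aligns the sequences and by default runs outlier-rejection cycles, so its values need not match this plain calculation. The function name is hypothetical.

```python
import numpy as np


# Hypothetical helper for illustration; not PyMOL's implementation.
def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc  # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would mean a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T  # optimal proper rotation taking Pc onto Qc
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A structure compared with a rotated and translated copy of itself gives an RMSD of essentially zero, since the superposition removes rigid-body motion before distances are measured.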
File 7: RHD_MSA.aln-clustal_num
This file contains the multiple sequence alignment (MSA) of the Rel homology domain (RHD) sequences in CLUSTAL format with residue numbering.
Sharing/Access information
Links to other publicly accessible locations of the data:
- NA
Data were derived from the following sources:
- PBM data originally from Mansfield et al. (2017) Scientific Reports
Code/Software
File 1: NFkB_review_code.Rmd
This file contains R Markdown code to reproduce all analyses and figures. The code runs on the CSV files in this repository.
