RefAHL: A curated quorum sensing reference linking diverse LuxI-type signal synthases with their acyl-homoserine lactone products
Data files
Apr 22, 2025 version files 84.08 KB
- README.md (5.38 KB)
- RefAHL_complete_rev20250421.xlsx (47.14 KB)
- RefAHL_rev20250421.fasta (31.56 KB)
Abstract
Some bacteria use acyl-homoserine lactone (AHL) signals in quorum sensing, a type of cell-cell communication. Here we present “RefAHL”, a curated collection of LuxI-type AHL synthases with their AHL structures and associated metadata. RefAHL is publicly available as a community resource to help catalog LuxI-type diversity encoded in (meta)genomic data.
Dataset DOI: 10.5061/dryad.866t1g21s
Description of the data and file structure
Here we curate a list of previously published LuxI homologs that have been experimentally demonstrated to synthesize an acyl-homoserine lactone (AHL) signal of well-supported structure. We refer to this collection as “RefAHL” and will update it when new LuxI-AHL signal pairs are defined.
Contact Amy Schaefer (amyschae@uw.edu) or Aaron Puri (a.puri@utah.edu) with any questions or to submit newly defined LuxI-AHL signal pairs.
There are two associated files (note that date suffixes in YYYYMMDD format will change in future versions).
Files and variables
File: RefAHL_complete_revYYYYMMDD.xlsx
Description: The RefAHL_complete_revYYYYMMDD.xlsx file has three tabs: Tab 1 (“RefAHL LuxI-AHL”), Tab 2 (“AHL compounds”), and Tab 3 (“RefAHL confidence ranking”). Their variables (columns) are described below; the number in parentheses before each variable indicates the tab in which it appears.
Variables
-
(1) RefAHL identifier: Unique RefAHL name, composed of the beginning of the genus/clone name (letters) and a number; LuxI homologs that share >65% amino acid identity and synthesize the same AHL product as an existing RefAHL LuxI are given the same RefAHL ID with a subcategory letter (e.g. Mes_001a, Mes_001b)
(1) Taxonomy or clone: Either the bacterial Class taxonomy or, if derived from metagenomic sequence, the clone
(1) Published LuxI homolog name: Most, but not all, LuxI homologs have an associated gene/protein name; for homologs in the same RefAHL identifier grouping, the shared percent amino acid identity is indicated in parentheses
(1) Major AHL (defined in Tab 2): The abbreviation of the major AHL signal synthesized by the LuxI homolog; in cases where cognate LuxR activity data is not available, the major AHL is defined as the most abundant AHL produced; see Tab 2 for full compound names and additional chemical information; in cases where the major AHL depends on the growth condition or strain background, this is listed in parentheses [e.g. in minimal medium (M9) vs. rich medium (K9) for Burk_007]
(1) AHL confidence category (defined in Tab 3): Confidence category ranking for the AHL product synthesized by the LuxI homolog depending upon its associated data; see Tab 3 for specific criteria
(1) Strain or metagenome: Name of the bacterial strain or metagenomic library clone encoding the LuxI homolog gene
(1) Reference(s) (PMID): Reference(s) for the data summarized in RefAHL; in most cases this is the PubMed identifier (PMID) number for published manuscript(s); one entry references unpublished 14C-methionine feeding data, which supports the published relaxed-specificity bioassay data
(1) LuxI-homolog protein sequence: Amino acid sequence of the LuxI homolog
(1) IMG gene identifier: Unique object identifier of the LuxI homolog gene in the JGI-IMG database; ‘none’ indicates the genome sequence is not hosted by IMG
(1) GenBank identifier: Unique object identifier of the LuxI homolog gene in the GenBank database; only used when there was no IMG identifier available
-
(2) AHL abbreviation: Abbreviation of the acyl homoserine lactone (AHL) structure used in Tab 1
(2) AHL common name: Common chemical name of the AHL structure
(2) PubChem compound: Unique object identifier of the AHL structure in the NIH PubChem database; ‘not available’ indicates the AHL has an undefined double bond so a PubChem number cannot be assigned; 'none' indicates the compound has not been deposited in PubChem
(2) AHL isomeric SMILES: The isomeric Simplified Molecular Input Line Entry System (SMILES) line notation describing the AHL structure; we assume a stereochemistry of L for the homoserine lactone and R for any 3-hydroxy acyl groups; ‘not available’ indicates the AHL has an undefined double bond so an isomeric SMILES cannot be assigned
(2) AHL formula: Chemical formula for the AHL structure
(2) AHL monoisotopic mass: Monoisotopic mass for the AHL structure
-
(3) Category: Confidence ranking of data used to assign the AHL structure; lower values denote better confidence
(3) Criteria: Criteria used to assign the category confidence rank; the rationale for omitting data (superscript note a) that utilize only relaxed-specificity bioassays is discussed in the manuscript text
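As an illustration of how the RefAHL identifier and AHL formula columns might be handled after export, here is a minimal Python sketch. It is not part of the dataset. The identifier pattern is an assumption inferred only from the examples given above (Mes_001a, Burk_007), so real identifiers may need a looser pattern. The helper names `parse_refahl_id` and `monoisotopic_mass` are hypothetical, and the formula C10H17NO3 (N-hexanoyl-L-homoserine lactone) is used purely as a worked example rather than being read from the spreadsheet.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes of C, H, N and O.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# ASSUMED identifier layout: letters, underscore, digits, optional lowercase
# subcategory letter. Inferred from "Mes_001a" and "Burk_007" only.
ID_PATTERN = re.compile(r"^([A-Za-z]+)_(\d+)([a-z]?)$")


def parse_refahl_id(identifier):
    """Split an identifier into (prefix, number, subcategory letter or None)."""
    match = ID_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"unexpected identifier layout: {identifier!r}")
    prefix, number, subcategory = match.groups()
    return prefix, int(number), subcategory or None


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple CHNO formula such as 'C10H17NO3'.

    Elements outside MONOISOTOPIC raise KeyError; brackets and charges
    are not supported.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


print(parse_refahl_id("Mes_001a"))              # ('Mes', 1, 'a')
print(parse_refahl_id("Burk_007"))              # ('Burk', 7, None)
print(round(monoisotopic_mass("C10H17NO3"), 4))  # 199.1208
```

Entries that share a prefix and number but differ in subcategory letter (for example Mes_001a and Mes_001b) would fall into the same group under this parsing, which mirrors the grouping rule described for the RefAHL identifier.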
File: RefAHL_revYYYYMMDD.fasta
Description: The RefAHL_revYYYYMMDD.fasta file contains the following:
Header: FASTA-format header containing the RefAHL identifier and Major AHL, separated by an underscore, for each LuxI homolog entry listed in the .xlsx spreadsheet; the text ‘RefAHL’ is included in each header for easy filtering
Sequence: Amino acid sequence of the LuxI homolog in one-letter code
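The header filtering mentioned above could be done along these lines. This is a minimal sketch using only the Python standard library. The function names `read_fasta` and `refahl_records` are hypothetical, and the example headers and sequences are made up for illustration; they are not taken from the RefAHL file, whose exact header layout should be checked before splitting headers into fields.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def refahl_records(lines):
    """Keep only records whose header contains the text 'RefAHL'."""
    return [(h, s) for h, s in read_fasta(lines) if "RefAHL" in h]


# Made-up records for illustration only (not real RefAHL entries).
example = """>RefAHL_example_header_1
MKLT
AAG
>unrelated_protein
MSSQ
""".splitlines()

records = refahl_records(example)
print(records)  # [('RefAHL_example_header_1', 'MKLTAAG')]

# With the real file, pass an open handle instead, e.g.:
# with open("RefAHL_rev20250421.fasta") as handle:
#     records = refahl_records(handle)
```

Filtering on the literal text is useful when RefAHL sequences have been combined with other proteins, for example in a custom search database.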
Code/software
.xlsx files can be opened with any version of Microsoft Excel, as well as Google Sheets, WPS Office, or OpenOffice Calc.
.fasta files can be opened with any text editor.
Access information
Other publicly accessible locations of the data:
- none
Data was derived from the following sources:
- Data were assembled from the original references as cited in RefAHL.
The RefAHL dataset was collected, assembled and curated by humans from the public scientific literature.
