
Data for: Identification of integrons and gene cassette-associated recombination sites in bacteriophage genomes

Cite this dataset

Qi, Qin et al. (2022). Data for: Identification of integrons and gene cassette-associated recombination sites in bacteriophage genomes [Dataset]. Dryad.


Bacteriophages are versatile mobile genetic elements that play key roles in driving the evolution of their bacterial hosts through horizontal gene transfer. Phages co-evolve with their bacterial hosts and have plastic genomes with extensive mosaicism. In this study, we present bioinformatic and experimental evidence that temperate and virulent (lytic) phages carry integrons, including integron-integrase genes, attC/attI recombination sites and gene cassettes. Integrons are normally found in Bacteria, where they capture, express and re-arrange mobile gene cassettes via integron-integrase activity. We demonstrate experimentally that a panel of attC sites carried in virulent phage can be recognized by the bacterial class 1 integron-integrase (IntI1) and then integrated into the paradigmatic attI1 recombination site using an attC x attI recombination assay. With an increasing number of phage genomes projected to become available, more phage-associated integrons and their components will likely be identified in the future. The discovery of integron components in bacteriophages establishes a new route for lateral transfer of these elements and their cargo genes between bacterial host cells.


All bacteriophage genomes with the descriptions “complete genomes” or “genomic sequences” were downloaded from the NCBI Genome database (last accessed on 1 May 2022). “Unverified”, “partial” or “incomplete” phage genomes were excluded. IntegronFinder 2.0 was run with default parameters on the 10,705 downloaded phage genomes to detect complete integrons, CALINs (clusters of attCs lacking an associated integron-integrase) and In0 elements (an integron-integrase that does not carry any gene cassettes). In addition, every phage genome was screened for integron attC sites using a previously described HattCI + INFERNAL pipeline. For the HattCI + INFERNAL analysis, the minimum bit-score threshold was set to 20, and at least two predicted attC sites were required in each candidate CALIN, according to previously published methods. All predicted attC sites located within 3 kb of another attC site were used in subsequent analysis. Annotations of attC sites were based on the results from both sets of bioinformatic predictions (i.e. IntegronFinder and HattCI + INFERNAL). ORF annotations were based on IntegronFinder results and the original GenBank annotations for the respective phages from NCBI. An in-house script was used to classify the phage-associated attC sites according to their sequence and structural homology to chromosomal attC sequences from eleven bacterial taxa.
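The attC filtering criteria described above (bit-score of at least 20, neighbouring attC site within 3 kb, at least two sites per candidate CALIN) can be illustrated with a short Python sketch. This is not the authors' pipeline or in-house script. The `AttCHit` record, the `cluster_attc_hits` function and the way distance is measured (from the end of one site to the start of the next on a linear coordinate system) are assumptions made for illustration only, and the actual HattCI + INFERNAL output format is not modelled.

```python
from collections import defaultdict
from typing import NamedTuple


class AttCHit(NamedTuple):
    """Hypothetical record for one predicted attC site (not a real tool's format)."""
    genome: str
    start: int
    end: int
    bit_score: float


MIN_BIT_SCORE = 20.0  # minimum bit-score stated in the methods
MAX_GAP_BP = 3000     # "within 3 kb of another attC site"
MIN_SITES = 2         # at least two attC sites per candidate CALIN


def cluster_attc_hits(hits, min_score=MIN_BIT_SCORE,
                      max_gap=MAX_GAP_BP, min_sites=MIN_SITES):
    """Group score-filtered attC hits into candidate clusters per genome.

    Assumption: the gap is measured from the end of the previous site to the
    start of the next one, and genomes are treated as linear.
    """
    # Drop low-scoring hits and bucket the rest by genome.
    by_genome = defaultdict(list)
    for hit in hits:
        if hit.bit_score >= min_score:
            by_genome[hit.genome].append(hit)

    clusters = []
    for genome_hits in by_genome.values():
        genome_hits.sort(key=lambda h: h.start)
        current = [genome_hits[0]]
        for hit in genome_hits[1:]:
            if hit.start - current[-1].end <= max_gap:
                # Close enough to the previous site: extend the cluster.
                current.append(hit)
            else:
                # Gap too large: keep the finished cluster only if it has
                # enough sites, then start a new one.
                if len(current) >= min_sites:
                    clusters.append(current)
                current = [hit]
        if len(current) >= min_sites:
            clusters.append(current)
    return clusters
```

Under these assumptions, isolated attC predictions and hits below the score threshold are discarded, and each returned list holds the sites of one candidate cluster in genomic order.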

Usage notes

Supplementary File 1 can be opened using Geneious or SnapGene.


Australian Research Council, Award: DP200101874