Evolution under vancomycin selection drives divergent collateral sensitivity patterns in Staphylococcus aureus
Data files (version of Sep 05, 2025; 9.81 GB total)
- Card_2025_breseq_data.zip (9.81 GB)
- README.md (5.11 KB)
Abstract
Staphylococcus aureus bacteremia is typically treated empirically with vancomycin, with therapy later tailored based on susceptibility results. However, these tests occur before vancomycin exposure and do not account for adaptation during empiric treatment that can alter S. aureus’ susceptibility to first-line drugs. To investigate these collateral drug responses, we experimentally evolved 18 methicillin-susceptible S. aureus (MSSA) populations under increasing vancomycin concentrations until they achieved intermediate resistance. Genomic sequencing revealed two distinct adaptive pathways characterized by mutations in the WalKR regulon, affecting cell wall metabolism, or rpsU, impacting translational stress responses. These pathways correlated with divergent collateral sensitivity profiles to first-line antibiotics. By developing a Collateral Response Score (CRS), we quantified the probability and magnitude of these responses, demonstrating that evolutionary dynamics critically influence resistance outcomes. Our findings suggest a probabilistic approach to antimicrobial therapy, advocating for rapid genomic diagnostics alongside susceptibility testing to better anticipate and respond to evolutionary changes.
Author: Kyle J. Card, et al. | Dataset DOI: 10.5061/dryad.qnk98sfw2
Description
This repository contains the raw genomic data and bioinformatic outputs required to replicate the analyses presented in Card et al. (2025), "Evolution under vancomycin selection drives divergent collateral sensitivity patterns in Staphylococcus aureus." This study reveals that evolution toward vancomycin-intermediate resistance in the pathogen Staphylococcus aureus proceeds through at least two distinct evolutionary pathways: one characterized by alterations in cell wall metabolism and another by changes in global stress response.
The raw sequencing reads for each population are available from the NCBI Sequence Read Archive (SRA) under accession number PRJNA1075422.
Methodological Overview
To generate the data in this repository, we first filtered the raw sequencing reads using Trimmomatic v0.39. We then used the breseq v0.39.0 bioinformatic pipeline to identify mutations relative to the S. aureus ATCC 29213 reference genome. This procedure was performed in two steps: first, to identify pre-existing mutations in our lab's ancestral strain, and second, to call new mutations in the evolved populations against this corrected ancestral reference.
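The two-step breseq procedure can be sketched in Python as a list of command lines. This is a minimal sketch, not the exact commands used in the study. The reference and read file names, the output layout, and the use of breseq's polymorphism mode (`-p`) for the evolved populations are assumptions for illustration. The Trimmomatic step is omitted because its parameters are not listed here. The sketch also assumes that `gdtools APPLY` is used to write the corrected ancestral reference as FASTA.

```python
from pathlib import Path


def breseq_commands(
    reference: str,
    ancestor_reads: list[str],
    evolved_reads: dict[str, list[str]],
    out_root: str = ".",
) -> list[list[str]]:
    """Build (but do not run) command lines for a two-step breseq workflow.

    All file names and the output layout are hypothetical placeholders.
    """
    root = Path(out_root)
    ancestor_dir = root / "ancestor"
    updated_fasta = root / "updated_ancestor" / "updated_ancestor.fasta"
    commands = []

    # Step 1: call background mutations in the lab ancestor against ATCC 29213.
    commands.append(["breseq", "-r", reference, "-o", str(ancestor_dir), *ancestor_reads])

    # Apply the ancestor's mutations to the original reference to obtain a
    # corrected ancestral reference, written as FASTA.
    commands.append([
        "gdtools", "APPLY", "-f", "FASTA",
        "-o", str(updated_fasta),
        "-r", reference,
        str(ancestor_dir / "data" / "output.gd"),
    ])

    # Step 2: call new mutations in each evolved population against the corrected
    # reference. "-p" (polymorphism mode for mixed populations) is an assumption.
    for name, reads in sorted(evolved_reads.items()):
        commands.append(["breseq", "-p", "-r", str(updated_fasta), "-o", str(root / name), *reads])

    return commands
```

Each command could then be executed with `subprocess.run(cmd, check=True)`. Returning the commands instead of running them keeps the sketch easy to inspect before anything touches the file system.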
The contents of this repository are the complete outputs from these breseq analyses.
File and Folder Structure
The data are contained within a single compressed folder:
Card_2025_breseq_data.zip: This zip file contains all breseq output directories for the ancestral strain, control populations, and vancomycin-selected experimental populations.
Upon unzipping, you will find four sub-folders:
- ancestor: Contains the initial breseq output for the ancestral strain mapped to the original ATCC 29213 reference genome.
- updated_ancestor: Contains the FASTA file for the reference genome that has been updated with the background mutations found in the ancestor analysis. All evolved populations were mapped to this reference.
- control_lines: Contains individual output directories for each of the 87 control populations that evolved without vancomycin.
- experimental_lines: Contains individual output directories for each of the 18 experimental populations evolved in vancomycin.
Within the control_lines and experimental_lines folders, each sub-directory corresponds to a single evolved population and is named using the following convention: [CONDITION]_[POPULATION]
- [CONDITION]: CTL for control lines or VAN for vancomycin-selected lines.
- [POPULATION]: The specific population number.
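The naming convention lends itself to programmatic indexing of the unzipped folders. The Python sketch below assumes that population numbers are plain integers (for example CTL_3 or VAN_12). The helper names are hypothetical. Directories that do not match the pattern are skipped.

```python
import re
from pathlib import Path

# Assumed pattern: condition code, an underscore, then an integer population number.
NAME_RE = re.compile(r"^(CTL|VAN)_(\d+)$")
CONDITION = {"CTL": "control", "VAN": "vancomycin"}


def parse_population_dir(name: str) -> tuple[str, int]:
    """Split a directory name such as 'VAN_12' into ('vancomycin', 12)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected population directory name: {name!r}")
    return CONDITION[match.group(1)], int(match.group(2))


def list_populations(root: str) -> list[tuple[str, int, Path]]:
    """Collect (condition, population, path) for every population directory."""
    found = []
    for sub in ("control_lines", "experimental_lines"):
        folder = Path(root) / sub
        if not folder.is_dir():
            continue  # tolerate a partially unzipped archive
        for d in sorted(folder.iterdir()):
            if d.is_dir() and NAME_RE.match(d.name):
                condition, population = parse_population_dir(d.name)
                found.append((condition, population, d))
    return found
```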
Description of Key File Types
Each population directory contains a full breseq output. For a comprehensive explanation of every file and data column, please consult the official breseq documentation authored by Dr. Jeffrey Barrick.
Below is a summary of the most important files for interpreting the results.
Human-Readable Summary (HTML)
This is the best place to start for exploring the results for a single population.
output/index.html: Open this file in any web browser to see a user-friendly summary of the mutations predicted by breseq. It includes tables describing the mutation, its position, the affected gene(s), and links to the underlying evidence.
Variant and Mutation Data (Computer-Readable)
These files are best for computational analysis across multiple samples.
data/output.vcf (Variant Call Format): A standardized text file containing information about the locations of genetic variants. This file is useful for downstream analysis with tools like BCFtools, GATK, or vcftools.
- How to open: Plain text editor or specialized genomics software (e.g., IGV).
- Key columns:
  - #CHROM: Reference sequence identifier.
  - POS: Position of the mutation.
  - REF: The reference base(s).
  - ALT: The alternate (mutant) base(s).
  - INFO: Additional information, such as allele frequency (AF).
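For quick comparisons across populations without installing BCFtools, the key columns can be read with the Python standard library. This is a minimal sketch that assumes one ALT allele per record and an AF entry in the INFO column when a frequency is reported. It is not a full VCF parser (it ignores FORMAT and sample columns), and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def parse_vcf(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per variant record from VCF text lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, "##" meta-information lines, and the "#CHROM" header.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        info = {}
        if len(fields) > 7 and fields[7] != ".":
            for item in fields[7].split(";"):
                key, sep, value = item.partition("=")
                # Flag-type INFO entries have no "=value" part.
                info[key] = value if sep else True
        # Assumes a single ALT allele; only the first AF value is kept.
        af = float(info["AF"].split(",")[0]) if isinstance(info.get("AF"), str) else None
        yield {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt, "af": af, "info": info}
```

Passing an open file handle for a population's data/output.vcf to `parse_vcf` yields records whose `af` values can then be tabulated across populations.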
data/output.gd (GenomeDiff): A breseq-specific format that details all predicted mutations and evidence. It is more detailed than the VCF file.
- How to open: Plain text editor or using gdtools.
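GenomeDiff files are tab-delimited. Each entry starts with a type code, an ID, parent evidence IDs, a sequence ID, and a position. Type-specific columns and key=value attributes follow. The sketch below keeps only mutation entries (SNP, SUB, DEL, INS, MOB, AMP, CON, INV) and skips header lines and evidence entries. The set of type codes and the column layout are assumptions based on the GenomeDiff format description, so consult the breseq documentation before relying on it. The function name is hypothetical.

```python
from typing import Iterable

# Mutation type codes assumed from the GenomeDiff format description.
MUTATION_TYPES = {"SNP", "SUB", "DEL", "INS", "MOB", "AMP", "CON", "INV"}


def parse_gd_mutations(lines: Iterable[str]) -> list[dict]:
    """Return the mutation entries of a GenomeDiff file as dicts."""
    mutations = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and "#=" header/metadata lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] not in MUTATION_TYPES:
            continue  # evidence entries (e.g., RA, MC, JC) are ignored
        extra = fields[5:]
        mutations.append({
            "type": fields[0],
            "id": fields[1],
            "seq_id": fields[3],
            "position": int(fields[4]),
            # Type-specific positional columns, e.g., the new base for a SNP.
            "args": [f for f in extra if "=" not in f],
            # Optional key=value attributes such as frequency, kept as strings.
            "attrs": dict(f.split("=", 1) for f in extra if "=" in f),
        })
    return mutations
```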
Alignment Files
These files show how the sequencing reads were mapped to the reference genome. They are essential for manually verifying mutations.
data/reference.bam (Binary Alignment Map): The compressed binary file containing all read alignments.
- How to open: Use a genome viewer like the Integrative Genomics Viewer (IGV) or command-line tools like SAMtools. To view in IGV, you will also need the reference FASTA file (data/reference.fasta) and its index (data/reference.fasta.fai).
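To spot-check a predicted mutation from the command line, one option is to count the alignments that overlap a window around its position with SAMtools. The Python sketch below builds and optionally runs the commands. It assumes samtools is on PATH and that the sequence ID matches the header in data/reference.fasta. The 50-bp flank and the function names are arbitrary, hypothetical choices.

```python
import subprocess


def region_count_commands(bam: str, seq_id: str, position: int, flank: int = 50) -> list[list[str]]:
    """Build samtools commands that count alignments overlapping position +/- flank."""
    start = max(1, position - flank)  # samtools regions are 1-based and inclusive
    end = position + flank
    region = f"{seq_id}:{start}-{end}"
    return [
        ["samtools", "index", bam],  # region queries require a .bai index
        ["samtools", "view", "-c", bam, region],  # -c prints the count of matching alignments
    ]


def run_count(bam: str, seq_id: str, position: int, flank: int = 50) -> int:
    """Run the commands (requires samtools on PATH) and return the alignment count."""
    index_cmd, count_cmd = region_count_commands(bam, seq_id, position, flank)
    subprocess.run(index_cmd, check=True)
    result = subprocess.run(count_cmd, check=True, capture_output=True, text=True)
    return int(result.stdout.strip())
```

The count includes every alignment record in the window. Visual inspection in IGV remains the more informative check for whether reads actually support the alternate allele.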
data/<read_file>.unmatched.fastq: Contains reads from the original sequencing file that did not map to the reference genome.
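A large number of unmatched reads can signal contamination or sequence absent from the reference. A rough count can be obtained as sketched below. The sketch assumes standard four-line FASTQ records without wrapped sequence lines. The function name is hypothetical.

```python
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count records in FASTQ text, assuming exactly four lines per record."""
    count = 0
    for i, line in enumerate(lines):
        if i % 4 == 0:
            # Every record must begin with an '@' header line.
            if not line.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header: {line!r}")
            count += 1
    return count
```

Iterating over an open data/<read_file>.unmatched.fastq handle with this function gives the number of unmapped reads for that population.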
The data were collected using antimicrobial broth microdilution assays and whole genome sequencing as described in the associated publication. All of the statistical analyses of experimental data were performed using the R software environment (version 4.5.0). The details of our statistical analyses are provided in the associated R Notebook in our GitHub repository.
