Data from: Attenuation of virulence in Yersinia pestis across three plague pandemics
Data files
Mar 31, 2025 version files 2.20 MB
-
raw_chromosome.snps.filter5.aln
2.15 MB
-
README.md
43.01 KB
Abstract
Yersinia pestis has spilled over from wild rodent reservoirs to commensal rodents and humans causing three historically recorded pandemics. Depletion in the copy number of the plasmid-encoded virulence gene pla occurred in later-dated strains of the first and second pandemics, yet the biological relevance of the pla deletion has been difficult to test. We identified modern Y. pestis strains that independently acquired the same pla depletion as ancient strains, and herein show that excision of pla from the multi-copy pPCP1 plasmid is accompanied by the integration of a separate full pPCP1 harboring pla into the single-copy pCD1 plasmid, reducing pla dosage. Moreover, we demonstrate that this depletion decreases mortality of mice in models of bubonic plague, but not in the pneumonic and septicemic forms of the disease. We hypothesize that pla depletion may have been selectively advantageous in bubonic plague due to rodent fragmentation following pandemic-induced mortality.
https://doi.org/10.5061/dryad.xksn02vs1
Description of the data and file structure
Supplementary figures have been uploaded as individual .PDF files and their figure captions are listed below. Data files have been uploaded within a single excel file, descriptions/titles for each table are listed below. The filtered multiple sequence alignment of 317 Y. pestis samples used to create Figure 1 and Fig. S5 and its description are also listed below. For detailed methodology please refer to the materials and methods section of the manuscript.
Figures were made using R-studio, Geneious, and BioRender, and Mummerplot. Edits were made in Adobe Illustrator.
Files and variables
File: raw_chromosome.snps.filter5.aln
Description: Filtered multiple sequence alignment of 317 Y. pestis samples. Alignment file for a maximum likelihood phylogenetic analysis of Yersinia pestis. Ancient samples were only included in the analysis if they had a minimum of 70% of the chromosome covered at a minimum depth of 3X (n=71), 235 modern samples were included in the analysis, representing broadest Y. pestis diversity. Snippy v4.6.0 was used to perform a pairwise alignment of each sample to the Y. pestis CO92. Input for ancient samples was sorted and deduplicated, mapped bam, filtered for a min35MQ30. For modern samples, assembly files were used as input. Minimum mapping quality to accept a variant call was 30, minimum number of reads covering a site was 3, minimum fraction of reads differing from the reference for a variant to be called was 0.9, and minimum quality for a nucleotide to be used in variant calling was set to 20. Pairwise alignment, this multiple alignment was created using snippy-core v4.6.0 . SNP-sites v2.5.1 was used to count the number of constant sites present in the alignment file, and to extract all SNP sites present in the chromosome. SNP sites in the resulting chromosomal alignment were filtered to ensure that a minimum of 95% of all samples containing data for any given site, which did not reach this threshold were removed.
File: FigureS1.pdf
Description: Fig. S1. Distinguishing between wildtype versus pla-reduced samples. A)Sequencing reads from a wildtype Y. pestis (pla-WT) sample mapped to a reference pPCP1 will present with even coverage across the plasmid, including the pla and toxin-antitoxin containing region. B) Sequencing reads from a pla-reduced Y. pestis sample mapped to a reference pPCP1 will present with decreased coverage throughout the pla and toxin-antitoxin containing region. C) Sequencing reads from a pla-reduced Y. pestis sample mapped to pPCP1∆pla, where the pla containing region is deleted, will contain gap bridging reads which span across the point of deletion. This figure was made in BioRender and edited in Adobe Illustrator.
File: FigureS2.pdf
Description: Fig. S2. Bioinformatic screening for gap bridging reads and pla depletion. Sequencing data from pPCP1 enriched Medieval Danish samples mapped to the Y. pestis whole genome, including pPCP1∆pla. Samples which we considered positive for pla depletion required the presence of reads mapping as GBRs. The results of this screen are shown in Table S1. Once mapped, the average coverage of 3 regions was calculated using samtools. A) Average coverage was taken across the pst gene (4615-5888 bp). B) Average coverage was taken across the pla gene (6665-7603 bp). When comparing coverage of this region to coverage around pst, wildtype samples will appear similar, however pla-reduced samples will experience a drop in average coverage. C) Average coverage of the gap bridging region on pPCP1∆pla (6420-6440 bp). Wildtype samples experience a drop in coverage in this region when compared to their coverage of pst and pla. Samples with pla depletion generally have higher coverage in this region than across the pla gene. These images were exported from Geneious and edited in Adobe Illustrator.
File: FigureS3.pdf
Description: Fig. S3. Example of percent mismatches in reads mapping to the gap bridging region (6420-6440 bp) from pla-reduced vs. wildtype Y. pestis samples. The pla depletion screen revealed reads mapping within the gap bridging region, in low amounts, in Y. pestis wildtype samples. These reads were explored further by considering their percent mismatch rate within the region. In this example, wildtype Y. pestis sample G25Bx98 is shown in green and pla-reduced sample G861x1035 is shown in blue. There is a low and consistent percent mismatch rate found in sequencing reads from pla-reduced samples mapping along the gap bridging region. In contrast, there is a high percent mismatch rate (up to 97%) in sequencing reads from wildtype samples mapping along the gap bridging region. Reads mapping to the gap bridging region in wildtype samples correspond to the sequences surrounding the point of deletion, but their high mismatch rate shows that they do not consist of reads which truly span across the region. Reads with low mismatch rates, which truly span the across the gap bridging region, are only found in pla-reduced strains of Y. pestis (i.e., G861x1035).
File: FigureS4.pdf
Description: Fig. S4. Low coverage of pPCP1 from ancient samples prevents accurate identification of pla depletion. Sequencing reads from G861x1035 mapped to pPCP1 (NC_003132.1), following Y. pestis whole genome enrichment. IS100 elements (1-1888 bp) have been removed from the figure, as mapping quality filters applied to the alignment resulted in a coverage depth of 0. Although the decreased presence of reads within the depleted region is a key characteristic of pla-reduced strains, we use gap-bridging reads as evidence of pPCP1∆pla within a sample. Lower coverage samples may not recover gap-bridging reads.
File: FigureS5.pdf
Description: Fig. S5. Complete maximum likelihood (ML) phylogenetic tree (n = 317). Tip colours of pla-reduced strains are light blue, pla-intermediate are dark blue, pla-WT strains are green, strains with less than 10X coverage of pPCP1 are grey, and strains not included in our screen are black (Data file S1). In the ML trees, nodes with over 95% bootstrap support are indicated with asterisks. Presence and absence of gap bridging reads (GBRs), in addition to the pla/pst gene depth ratio and % depletion, was considered before colouring of tips on the phylogenetic tree (Data file S1). Samples without GBRs were identified as pla-WT. Only ancient samples with >70% coverage at read depth of 3X are present in the phylogenetic analyses (Data file S4). Sweeps of pla-depletion in pandemic clades are highlighted in blue. Third pandemic samples, corresponding to 1.ORI are highlighted in pink (Data file S5). Red branch lengths represent ancient strains which are ancestral to the third pandemic. Yellow stars have been placed beside G861x1035 and G25Bx98, which are the samples used in pPCP1∆pla and pPCP1WT de novo assembly. Human and rat figures have been placed next to modern Vietnamese strains used in in vivo and in vitro analyses, to represent their isolation source. Strains have been labelled according to 12 subpopulations, population labels and corresponding colours are within the legend (Data file S1).
File: FigureS6.pdf
Description: Fig. S6. Reads mapping to the gap bridging region (pPCP1∆pla) in ‘back-mutated’ samples, LAR8, LAR11, and Rostov2033, do not resemble true GBRs. A) The sequence of pPCP1WT surrounding 6430 bp, the 3’ end before the deletion . B) The sequence of pPCP1WT surrounding 8530 bp, the 5’ end following the deletion. C) The sequence of pPCP1∆pla from 6420 to 6440 bp, where we screen for gap bridging reads. D) The red box surrounding the single read mapping from LAR8 contains 2 mismatches, which correspond to the pPCP1WT sequence from 6431-6433 bp. E) The red box surrounding the single read mapping from LAR11 contains 1 mismatch, which correspond to the pPCP1WT sequence at 6432 bp. F) The red box surrounding the single read mapping from Rostov2033 contains 2 mismatches, which correspond to the pPCP1WT sequence at 6431-6433 bp. The green box contains 4 mismatches, all corresponding to the sequence of pPCP1WT from 8520-8530 bp.
File: FigureS7.pdf
Description: Fig. S7. Presence of protein-coding genes across the Y. pestis CO92 chromosome (NC_003143.1). Regions were extracted from the ref-seq feature table and filtered to only contain protein-coding genes. Coverage was calculated by the tool bedtools genomecov which records the coverage at each position within a provided genomic region. Therefore, the heat map represents the percent coverage of each gene region, and we considered a gene to be present or absent based on several criteria. For example, if a sample presents with below 70% coverage across the replicon genes, it was not considered absent, rather a sample with low coverage. On the contrary, if gene coverage was low in a sample which had consistent and high coverage throughout the replicon genes, it was considered absent. Samples names are along the bottom of the heatmap, names in green represent pla wt strains, blue represent pla-reduced strains, dark blue represent pla-intermediate strains, and grey represents samples which could not be classified by our pla depletion screen. First, second, and third pandemic labels are along the bottom of the figure. Names of gene regions are along the right-hand side. All 114 ancient samples were included in this screen, along with 5 modern Vietnamese isolates.
File: FigureS8.pdf
Description: Fig. S8. Presence of protein-coding genes across the Y. pestis CO92 pCD1 plasmid (NC_003131.1). Regions were extracted from the ref-seq feature table and filtered to only contain protein-coding genes. Coverage was calculated by bedtools genomecov which records the coverage at each position within a provided genomic region. Therefore, the heat map represents the percent coverage of each gene region, and we considered a gene to be present or absent based on several criteria. For example, if a sample presents with below 70% coverage across the replicon genes were not considered absent, rather samples were low coverage. On the contrary, if gene coverage was low in a sample which had consistent and high coverage throughout, the replicon genes were considered absent. Samples names are along the bottom of the heatmap, names in green represent pla wt *strains, blue represent *pla-reduced strains, dark blue represent pla-intermediate strains, and grey represents samples which could not be classified by our pla depletion screen. First, second, and third pandemic labels are along the bottom of the figure. Names of gene regions are along the right-hand side. All 114 ancient samples were included in this screen, along with 5 modern Vietnamese isolates.
File: FigureS9.pdf
Description: Fig. S9. Presence of protein-coding genes across the Y. pestis CO92 pMT1 plasmid (NC_003134.1). Regions were extracted from the ref-seq feature table and filtered to only contain protein-coding genes. Coverage was calculated by bedtools genomecov which records the coverage at each position within a provided genomic region. Therefore, the heat map represents the percent coverage of each gene region, and we considered a gene to be present or absent based on several criteria. For example, if a sample presents with below 70% coverage across the replicon, genes were not considered absent, rather samples were low coverage. On the contrary, if gene coverage was low in a sample which had consistent and high coverage throughout the replicon, genes were considered absent. Samples names are along the bottom of the heatmap, names in green represent pla wt strains, blue represent pla-reduced strains, dark blue represent pla-intermediate strains, and grey represents samples which could not be classified by our pla depletion screen. First, second, and third pandemic labels are along the bottom of the figure. Names of gene regions are along the right-hand side. All 114 ancient samples were included in this screen, along with 5 modern Vietnamese isolates.
File: FigureS10.pdf
Description: Fig. S10. Presence of protein-coding genes across the Y. pestis CO92 pPCP1 plasmid (NC_003131.1). Regions were extracted from the ref-seq feature table and filtered to only contain protein-coding genes. Coverage was calculated by bedtools genomecov which records the coverage at each position within a provided genomic region. Therefore, the heat map represents the percent coverage of each gene region, and we considered a gene to be present or absent based on several criteria. For example, if a sample presents with below 70% coverage across the replicon, genes were not considered absent, rather samples were low coverage. On the contrary, if gene coverage was low in a sample which had consistent and high coverage throughout the replicon, genes were considered absent. Samples names are along the bottom of the heatmap, names in green represent pla wt strains, blue represent pla-reduced strains, dark blue represent pla-intermediate strains, and grey represents samples which could not be classified by our pla depletion screen. First, second, and third pandemic labels are along the bottom of the figure. Names of gene regions are along the right side. Notably, pla still has coverage because pla-reduced samples contain the gene at low levels, the colouring of sample names is required to identify reduced strains. All 114 ancient samples were included in this screen, along with 5 modern Vietnamese isolates.
File: FigureS11.pdf
Description: Fig. S11. Identifying gap bridging reads in previously enriched and sequenced Y. pestis samples from Medieval Denmark. Reads were mapped to pPCP1 (NC_003132.1) from the Y. pestis CO92 reference genome, with modifications to remove the ‘pla region’ (6429-8531bp), as indicated by Susat et al., 2020 (13). *Following mapping bam files were filtered for a minimum length of 35 bp and mapping quality of 30 and then visualized in Geneious. Gap bridging reads were detectable in three samples: A) A146x3001, B) G371 C) A1155 x1155. These reads were also extracted following mapping to determine if they originated from organisms other than *Y. pestis (Data files S11-13).
File: FigureS12.pdf
Description: Fig. S12. Damage plots and fragment length distributions of enriched reads from Danish medieval sample G25Bx98 mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS13.pdf
Description: Fig. S13. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample GR ID 319 mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS14.pdf
Description: Fig. S14. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample A19 x21 mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS15.pdf
Description: Fig. S15. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample Gr GCx15 mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS16.pdf
Description: Fig. S16. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample G.25A mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS17.pdf
Description: Fig. S17. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample G.16 mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS18.pdf
Description: Fig. S18. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample G207, mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS19.pdf
Description: Fig. S19. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample G861x1035 mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS20.pdf
Description: Fig. S20. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample A146x3011 mapping to Y. pestis C092 (GCA_00009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS21.pdf
Description: Fig. S21. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample x1265 mapping to Y. pestis C092 (GCA_000009065.1). A)Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS22.pdf
Description: Fig. S22. Damage plots and fragment length distributions of enriched reads from Danish Medieval sample x1155 A1155 mapping to Y. pestis C092 (GCA_000009065.1). A) Fragment length distribution of reads mapping to Y. pestis which have been filtered for a minimum length of 35 and a map quality of 30. B) Ancient DNA misincorporation plots for mapped and filtered reads.
File: FigureS23.pdf
Description: Fig. S23. Node 1 assembled by SPAdes from enriched pPCP1 data from G861x1035 is missing the 2100 bp pla and toxin-antitoxin containing region. A). Node 1 was split into 1000 bp pieces and mapped back to NC_003132.1 (pPCP1) to identify large insertions and deletions, rearrangements, and/or SNPs. B) Mummer plot of Node 1 and pPCP1, indicates 100% sequence similarity between the assembled node and the reference sequence. The assembled node contains a deletion between 6430 and 8529 bp in pPCP1. C) Mummer plot after Node 1 was manually rearranged in Geneious, making the starting point of Node 1 and the reference the same, simplifying downstream comparisons between Node 1 and pPCP1. D) Enriched pPCP1 reads from G861x1035 mapped back to the pPCP1∆pla contig (Node 1). The area with increased coverage represents Insertion sequence elements: istA and istB. The mapped reads were filtered for a minimum length of 65 and map quality of 30, as fastq files were trimmed by 10bp on each end and filtered for a minimum length of 45bp before reconstruction by SPAdes. SNPs between the mapped reads and the assembled contig were called using Geneious (Data file S16), changes to the contig were manually applied, with a threshold of >95%.
File: FigureS24.pdf
Description: Fig. S24. Node 20 assembled by SPAdes from enriched pPCP1 data from G861x1035 corresponds to the missing 2100 bp pla and toxin-antitoxin containing region. A) Node 20 taken as 1000 bp stretches and mapped back to pPCP1WT *(NC_003132.1) from *Y. pestis CO92 (GCA_000009065.1) using bwa mem. The previously assembled node was split into 1000 bp contigs and mapped back to NC_003132.1 to determine if there were any changes in the sequence (large insertions and deletions, rearrangements, and/or SNPs). B) Mummer plot of an alignment between the entire Node 1 and pPCP1 (NC_003132.1), indicates 100% sequence similarity between the assembled node and the pPCP1 reference sequence. The assembled node consists of the region between 6420 and 8530 bp in NC_003132.1. This node was not incorporated into Node 1 despite attempts at providing the node as a trusted contig to SPAdes. C) **Enriched pPCP1 reads from G861x1035 mapped back to Node 20. **The mapped reads were filtered for minimum length 65 and mapping quality of 30, as the fastq files were trimmed by 10 bp on each end and then filtered to > 45 bp prior to assembly with SPAdes.
File: FigureS25.pdf
Description: Fig. S25. Node 2 (G25Bx98) assembled by SPAdes from enriched pPCP1 data, represents pPCP1WT. Node 2 (G25Bx98) confirms that pPCP1 could be assembled in its entirety without any breaks in the contig. A) Node 2 was split into 1000 bp pieces and mapped back to NC_003132.1 (pPCP1) to identify large insertions and deletions, rearrangements, and/or SNPs. B) Mummer plot of Node 2 and pPCP1, indicates 100% sequence similarity between the assembled node and the reference sequence, and no deletion present between 6430 and 8529 bp. C) Mummer plot after Node 2 was manually rearranged in Geneious, making the starting point of Node 2 and the reference the same, simplifying downstream comparisons between the Node 2 and pPCP1. D) Enriched pPCP1 reads from G25Bx98 mapped back to the pPCP1WT contig (Node 2). The area with increased coverage represents Insertion sequence elements: istA and istB. The mapped reads were filtered for a minimum length of 65 and a mapping quality of 30, as the fastq files were trimmed by 10 bp on each side and then filtered for the minmum length of 45 bp before assembly by SPAdes.
File: FigureS26.pdf
Description: Fig. S26. Mummer plot of IP1642H pCD1 and pPCP1 assemblies against Y. pestis CO92. The pink dotted line represents the start of the pPCP1 reference sequence. The yellow star is placed to emphasize that the pCD1 assembly from IP1642H contains the entire pPCP1WT. The blue box shows that the IP1642H assembly for pPCP1 is missing the 2100 bp deleted region. The legend for percent similarity represents the between the CO92 reference plasmids and the IP1642H pPCP1 and pCD1 plasmids.
File: FigureS27.pdf
Description: Fig. S27. Mummer plot of IP1638H pCD1 and pPCP1 assemblies against Y. pestis CO92. The pink dotted line represents the start of the pPCP1 reference sequence. The yellow star is placed to emphasize that the pCD1 assembly from IP1638H contains the entire pPCP1WT. The blue box shows that the IP1642H assembly for pPCP1 does not have a deletion. The legend for percent similarity represents the between the CO92 reference plasmids and the IP1638H pPCP1 and pCD1 plasmids.
File: FigureS28.pdf
Description: Fig. S28. Nodes assembled by SPAdes from short-read sequencing data of IP1642H, a pla-reduced modern Vietnamese isolate, aligned against pPCP1 (NC_003132.1). All three pla-reduced Vietnamese isolate contained 3 assemblies corresponding to pPCP1, notably the IS element was not assembled in these samples. A) Node 38 (4592 bp) aligned against pPCP1 (NC_003132.1), presenting 100% sequence similarity to 1839-6430 bp on the Y. pestis CO92 reference genome. B) Node 45 (3433 bp) aligned against pPCP1 (NC_003132.1), presenting 100% sequence similarity to 6304-9612 bp and 1 to 126 bp. C) Node 129 is 252 bp in length, presenting with 100% sequence similarity to the gap bridging and surrounding region, spanning between: 6304-6430 bp and 8531-8657 bp on pPCP1.
File: FigureS29.pdf
Description: Fig. S29. Single insertion/deletion difference present within the gap bridging region of ancient and modern pla-reduced samples. Mapping of 252bp contigs from IP1640H, IP1642H, IP1643H, and gap bridging reads from G861x1035 to pPCP1∆pla revealed an indel difference between ancient and modern Y. pestis strains with depletion.
File: FigureS30.pdf
Description: Fig. S30. Long-read alignment on pPCP1WT-pCD1 chimera for the pla-WT IP1638H isolate. Three reads spanning the pPCP1WT plasmid as well as mapping on pCD1 are highlighted in red, yellow and blue. Reference chimera sequence comes from IP1642H assembly. Sequencing depth is shown in logarithmic scale. Increased depth of coverage is seen along the integrated pPCP1 region because mapping reads also correspond to the pPCP1WT which are not incorporated into the pCD1-pPCP1 chimera and are present at a higher copy-number.
File: FigureS31.pdf
Description: Fig. S31. Long-read alignment on pPCP1WT-pCD1 chimera for the pla-reduced IP1640H isolate. Three reads spanning the pPCP1WT plasmid as well as mapping on pCD1 are highlighted in blue, red, and yellow. Reference chimera sequence comes from IP1642H assembly. Alignment gaps are represented in dark grey on the reads, highlighting the 2.1 kbp pla region excision and a potential pCD1 plasmid without pPCP1 insertion. Sequencing depth is shown in logarithmic scale. Increased depth of coverage is seen along the integrated pPCP1 region because mapping reads also correspond to the pPCP1∆pla which is not incorporated into the pPCP1WT-pCD1 chimera and is present at a higher copy-number. This is also reflected by the dip in coverage specifically around pla and the TA system (annotated above).
File: FigureS32.pdf
Description: *Fig. S32. Long-read alignment on pPCP1WT-pCD1 chimera for the *pla-reduced IP1642H isolate. *Three reads spanning the pPCP1WT plasmid as well as mapping on pCD1 are highlighted in red, yellow and blue. The yellow read covers the whole chimera. Reference chimera sequence comes from IP1642H assembly. Alignment gaps are represented in dark grey on the reads, highlighting the 2.1 kbp *pla region excision. Sequencing depth is shown in logarithmic scale. Increased depth of coverage is seen along the integrated pPCP1 region because mapping reads also correspond to the pPCP1∆pla which is not incorporated into the pPCP1WT-pCD1 chimera and is present at a higher copy-number. This is also reflected by the dip in coverage specifically around pla and the TA system (annotated above).
File: FigureS33.pdf
Description: Fig. S33. Long-read alignment on pPCP1WT-pCD1 chimera for 4 IP1643H clones. (A,B) Clones showing a pla-reduced genotype and (C,D) clones showing a pla-WT genotype. One read spanning the pPCP1WT plasmid as well as mapping on pCD1 is highlighted in blue for each clone. Reference chimera sequence comes from IP1642H assembly. Alignment gaps are represented in dark grey on the reads, highlighting the 2.1 kbp pla region excision. Sequencing depth is shown in logarithmic scale.
File: FigureS34.pdf
Description: Fig. S34. IS100 elements interfere with de novo reconstruction of pPCP1WT integration in samples with short read lengths. Dark blue, green, and grey are used to represent IS100 elements, all other genes on pPCP1, and pCD1, respectively. Light pink arrows represent short-reads corresponding to pPCP1 and light blue arrows correspond to short-reads mapping to pCD1. A) **Clinker diagram showing the structure of the known pPCP1-pCD1 integration from Vietnamese isolate IP1642H. This was used as a structural guide for identifying pPCP1 integration in our ancient samples. **B) **Mummerplots were made to visualize the integration of pPCP1 into other replicons in the Y. pestis genome. The green box shows our expectation for a successfully assembled integration, here we used the IP1642H sequence assembled from long read nanopore data and aligned it against the CO92 references for pPCP1 and pCD1. The orange box shows our attempt at simulating the IP1642H, pPCP1-pCD1 integration, using gargammel. SPAdes was used to assemble the plasmid and the resulting nodes are visualized here, the blue box highlights that the IS100 element was not assembled, and the integration could not be successfully assembled as a contiguous sequence. The remaining mummerplots were made using nodes assembled from Gdansk8, despite assembling over 94% of the Y. pestis genome from this high coverage ancient genome, the assemblies do not show integration of pPCP1 into any of the replicons. **C) The assembled contigs form Gdansk8 were mapped to the Y. pestis chromosome using bwa-mem. The number of IS100-like elements annotated in the Y. pestis genome were counted, their co-ordinates were extracted, and bedtools was used to calculate per base coverage. The table shows that none of the assembled contigs covered these elements. D) Schematic diagrams of pPCP1, pCD1, and chimeric pPCP1-pCD1 plasmids, highlighting that no unique short reads would map to the integrated regions of pPCP1, into other replicons due to the presence of IS100 elements on either side.
File: FigureS35.pdf
Description: Fig. S35. Average sequencing depth of the Y. pestis chromosome, pPCP1, pCD1, and pMT1 across 2339 modern Y. pestis samples. **Sequencing data from modern isolates of Y. pestis was downloaded and mapped to the Y. pestis CO92 reference genome. To determine the expected depth of coverage per replicon of Y. pestis the average depth across each sample was calculated and visualized in a violin plot. **A) Plasmid depth of coverage, not normalized against chromosomal depth of coverage. B) Plasmid depths normalized against the chromosome.
File: FigureS36.pdf
Description: Fig. S36. Tukey honest significant difference (HSD) test comparing mean coverage depth across pla compared to genes on pPCP1, pMT1, pCD1, and the chromosome. G861x1035 was shotgun sequenced to a depth of 348M clusters to determine the ratio of coverage across Y. pestis within the sample. Following an ANOVA analysis, a Tukey HSD test was performed. This test confirms that there are differences in the mean coverage between pst and the genes on all other replicons. This is expected as pst is present on a higher copy number genomic element. Genes present on all other replicons have a smaller difference in mean replicon depth. Notably, pla and YPO3619 (chromosome), have no difference in mean coverage. Data used for visualization can be found in Table S7.
File: FigureS37.pdf
Description: Fig. S37. Analyzing mean coverage depth in ancient sample G701 from Medieval Latvia. A) Average depth of coverage across each position in the following genes: pla, pst, YPMT1.67c, yopD, and YPO3619, is presented in violin plots. B) Tukey HSD test was performed, following an ANOVA analysis of the mean depths. The coverage of G701 is lower than other ancient shotgun sequenced Y. pestis samples (G488, X52, G861x1035). The test confirms that there are differences in the mean coverage between pst and the genes on all other replicons. Genes present on all other replicons have a smaller difference in mean replicon depth. Similar depths were found between in pMT1, the chromosome, and pPCP1, therefore the exact position of pla *cannot be confidently resolved. From this comparison it appears that *pla could be integrated into the chromosome or pMT1. Data used for visualization can be found in Table S8.
File: FigureS38.pdf
Description: Fig. S38. Analyzing mean coverage depth in ancient sample G488 from Medieval Latvia. A) Average depth of coverage across each position in the following genes: pla, pst, YPMT1.67c, yopD, and YPO3619, is presented in violin plots. B) Tukey HSD test was performed, following an ANOVA analysis of the mean depths. The test confirms that there are differences in the mean coverage between pst and the genes on all other replicons. Genes present on all other replicons have a smaller difference in mean replicon depth. From this comparison it appears that pla could be integrated into the chromosome. Data used for visualization can be found in Table S9.
File: FigureS39.pdf
Description: Fig. S39. Analyzing mean coverage depth in ancient sample X52 from Medieval Denmark. A) Average depth of coverage across each position in the following genes: pla, pst, YPMT1.67c, yopD, and YPO3619, is presented in violin plots. B) Tukey HSD test was performed, following an ANOVA analysis of the mean depths. The test confirms that there are differences in the mean coverage between pst and the genes on all other replicons. Genes present on all other replicons have a smaller difference in mean replicon depth. From this comparison it appears that pla could be integrated into the chromosome. Data used for visualization can be found in Table S10.
File: FigureS40.pdf
Description: Fig. S40. Linear model representing the relationship between pla gene depth and Pla activity in modern isolates. Absorbance data was collected from each time point between 80-120 minutes during the plasminogen activation assay, from each of the modern isolates, CO92 and the Pla-D206A control. This data was used to create the model shown above which has an adjusted R2= 0.8035, and p-value < 2e-16. Data is present in Data file S23 and detailed results can be found in Table S12.
File: FigureS41.pdf
Description: Fig. S41. Absorbance at 405 nm resulting from the enzymatic cleavage of plasminogen into chromogenic plasmin by Pla across 10 clones sampled from each modern Vietnamese Y. pestis isolate. A) All 10 clones of IP1638H show wildtype enzymatic activity; B) All 10 clones from IP1640H show a decreased enzymatic activity which we associate with pla-reduced* strains; C) All 10 clones from IP1642H show decreased enzymatic activity which we associate with *pla-reduced* *strains; D) Clones from IP1643H show mixed enzymatic activities: 4 colonies show wildtype Pla activity and 6 isolates display decreased Pla activity (1 biological replicate with 3 technical replicates in each panel).
File: FigureS42.pdf
Description: Fig. S42. Impact of pla depletion on bubonic plague after intradermal injection. **Mice were infected intradermally (5 mice / group, 1 biological replicate for each dose) with around **A) 136 CFUs, B) 271 CFUs, C) 542 CFUs and D) 1,084 CFUs of Vietnamese Y. pestis pla-WT IP1638H (green) isolate and the pla-reduced IP1642H (blue) isolate. Mouse survival was recorded daily for 21 days. While no significative difference was observed with log-rank test, the LD50 calculated by logistic regression was 197 CFUs (standard error: 65) for IP1642H, while the LD50 was below the lower dose tested (135 CFUs) for IP1638H.
File: FigureS43.pdf
Description: Fig. S43. Impact of pla depletion on septicaemic plague. **Mice were infected intravenously with around **A) 13-21 CFUs and B) 26-43 CFUs, C) 44-53 CFUs and D) 90-111 CFUs of Y. pestis pla-WT IP1638H (green) isolate, the pla-reduced IP1642H (blue) isolate and the Pla-D206A proteolytically inactive mutant (grey) (for A and B) (1 biological replicate for each dose, 7 mice / group in A and B, 5 mice / group in C and D). Mouse survival was recorded daily for 21 days. No significant difference was observed with log-rank test comparing death kinetics using IP1638H strain as reference. The LD50 calculated by logistic regression was 26 (standard error: 9) for IP1638H and 29 (standard error: 5) for IP1642H.
File: FigureS44.pdf
Description: Fig. S44. Schematic presentation of Y. pestis strains studied in the present project. We highlighted pPCP1 plasmids, their pla *copies and the genetic rearrangements that have occurred where known, resulting in a reduction but not loss of the pla gene and its corresponding Pla surface enzyme. Strains have been classified vertically according to the decreasing Pla activity and pla gene copy number (in green). Next to strain names are symbols to signify isolation source (rat or human). On the left of the modern samples, we show schematics of deeply shotgun sequenced medieval samples, for which coverage was deep enough to estimate the putative location of pla (pPCP1) insertion. One 14th C sample showing pla-WT and three pla* depleted samples with putative insertion of the pPCP1 into the chromosome.
File: FigureS45.pdf
Description: Fig. S45. Workflow diagram summarizing genomic, in vitro, and in vivo analyses performed. Preliminary screening was performed on all ancient and modern samples. Following preliminary screening, samples meeting minimum coverage cut-offs were carried forward in downstream analyses. Samples from our in-house collections were selected for enrichment and shotgun sequencing (ancient samples) and in vitro and in vivo analyses (modern samples) as outlined.
File: adt3880_Suppl. Excel_seq1_v5_2025-0223.xlsx
Description of Tabs:
Data file S1. All publicly available Ancient and modern Y. pestis samples screened for *pla *depletion in this study.
Data file S2. Fastq files from enriched G861x1035 were down sampled and mapped to the Y. pestis CO92 reference genome with the inclusion of pPCP1 (pla -). Gap bridging reads could not be detected when coverage of the pPCP1 dropped below 10X.
Data file S3. Sequencing statistics for samples analyzed from Medieval and early modern Denmark in this study.
Data file S4. Percent of Y. pestis chromosome covered (3X) among first and second pandemic samples included in this study (n =114).
Data file S5. Ancient and Modern Y. pestis samples included in Maximum Likelihood (ML) phylogeny produced in iqtree.
Data file S6. All ancient samples (n = 114) and the modern Vietnamese isolates (n = 5) were screened for the presence of the following protein coding genes on the *Y. pestis *chromosome and pMT1, pPCP1, and pCD1. Hypothetical proteins and IS elements were removed, 89 chromosomal genes, 6 pPCP1 genes, 56 pCD1 genes, and 49 pMT1 genes were used in this analysis. Annotation information was extracted from the RefSeq annotation for CO92.
Data file S7. Coverage of chromosomal genes used in Fig. S7 (n = 114 ancient genomes, n = 5 modern genomes)
Data file S8. Coverage of pCD1 genes used in Fig. S8 (n = 114 ancient genomes, n = 5 modern genomes)
Data file 9. Coverage of pMT1 gene used in Fig. S9 (n = 114 ancient genomes, n = 5 modern genomes)
Data file S10. Coverage of pCD1 genes used in Fig. S10 (n = 114 ancient genomes, n = 5 modern genomes)
Data file S11. Megablast results from gap bridging reads retrieved from previously enriched sample A1155 x1155
Data file S12. Megablast results from gap bridging reads retrieved from previously enriched sample G371.
Data file S13. Megablast results from gap bridging reads retrieved from previously enriched sample A146x3011.
Data file S14. Quast summary reports produced following assembly with SPAdes.
Data file S15. Megablast results of Node 1 and Node 20 produced from SPAdes analysis of G861x1035.
Data file S16. Assessment of SNPs present between pPCP1 (pla -) and input reads from G861x1035, staring from position 1 on CO92. Where SNP frequency of >95% was present in the sequencing reads the position on the assembled contig was changed.
Data file S17. Megablast results of Node 2 generate from SPAdes analysis of sample G25Bx98.
Data file S18. Megablast results of Node 38, 45, and 129 generated from SPAdes analysis of IP1642H.
Data file S19. Toxin-antitoxin module annotation on Yersinia pestis pPCP1 and other species. Of note, plasmid dimers and chimers can be due to assembly artefacts mediated by IS100 insertion sequences.
Data file S20. Structural variation between Vietnamese isolates IP1638H and CO92, and between IP1638H, IP1640H, IP1642H and IP1643H (Clones 1, 2, 6, and 9).
Data file S21. Average coverage of* Y. pestis *chromosome in modern sequenced isolates, compared to the ratio of pMT1, pPCP1, and pCD1 (n = 2339).
Data file S22. Coverage of genes chosen for the depth of coverage analyses for G861x1035 and Vietnamese isolates (calculated using Illumina data).
Data file S23. Plasminogen activation assay performed using Vietnamese strains. Absorbance (405 nm) measurements were collected in 5 minute intervals, %pla/pst is from Data file S1. (Three biological replicates).
Data file S24. Plasminogen activation assay performed using clones (N=10) of the Vietnamese isolates. A405 measurements were collected in 5 minute intervals, %pla/pst is from Data file S1.
Data file S25. Dunn Test performed among clones of sample IP1643H.
Data file S26. Time to death of mice infected subcutaneously with Y. pestis during infection experiments in murine models of bubonic plague (n = 14 individuals per strain).
Data file S27. Time to death of mice infected intradermally with Y. pestis during infection experiments in murine models of bubonic plague (n = 20 individuals per strain).
Data file S28. Time to death of mice infected intranasally with Y. pestis during infection experiments in murine models of pneumonic plague (n = 13 for IP1638H and Pla-D206A, n = 14 for IP1642H).
Data file S29. Time to death of mice infected intravenously with Y. pestis during infection experiments in murine models of septicaemic plague (n = 24 individuals per IP1638H and IP1642H, n = 14 for Pla-D206A).
Access information
Data generated in the research article is available at:
- PRJNA1164799 (data from archaeological remains, human read removal tool has been activated for ethical and privacy purposes).
- PRJNA1137242 (all other data).
- All other data are available in the main text or the supplementary materials.