Data from: High-throughput sequencing of Bacillus anthracis in France: investigating genome diversity and population structure using whole-genome SNP discovery
Girault, Guillaume; Blouin, Yann; Vergnaud, Gilles; Derzelle, Sylviane (2015), Data from: High-throughput sequencing of Bacillus anthracis in France: investigating genome diversity and population structure using whole-genome SNP discovery, Dryad, Dataset, https://doi.org/10.5061/dryad.rc6m9
Background: Single nucleotide polymorphisms (SNPs) are ideal signatures for subtyping monomorphic pathogens such as Bacillus anthracis. Here we report the use of next-generation sequencing technology to investigate the historical, geographic and genetic diversity of Bacillus anthracis in France. A total of 122 strains isolated over a 50-year period throughout the country were whole-genome sequenced, and comparative analyses were carried out with a focus on SNP discovery to discriminate regional sub-groups of strains.
Results: A total of 1581 chromosomal SNPs precisely establish the phylogenetic relationships between the French strains. Phylogeographic patterns were observed within the three canSNP sub-lineages present in France (i.e. B.Br.CNEVA, A.Br.011/009 and A.Br.001/002). One of the more remarkable findings was the identification of a variety of genotypes within the A.Br.011/009 sub-group that persist in the different regions of France. The 560 SNPs defining the A.Br.011/009-affiliated French strains split the Trans-Eurasian sub-group into six distinct branches without any intermediate nodes. Distinct sub-branches, with some geographic clustering, were resolved. The 345 SNPs defining the major B.Br.CNEVA sub-lineage grouped the strains into three main phylogeographic clades, the Alps, the Pyrenees, and the Massif Central, with a small Saône-et-Loire sub-cluster nested within the latter group. The French strains affiliated to the minor A.Br.001/002 group were characterized by 226 SNPs. All recent isolates collected from the Doubs department were closely related. Identification of SNPs from whole-genome sequences facilitates high-resolution strain tracking and provides the level of discrimination required for outbreak investigations. Eight diagnostic SNPs, representative of the main French-specific phylogeographic clusters, were therefore selected and developed into high-resolution melting SNP discriminative assays.
Conclusions: This work has established one of the most accurate phylogenetic reconstructions of B. anthracis population structure in a country. An extensive next-generation sequencing (NGS) dataset of 122 French strains has been created, allowing the identification of novel diagnostic SNPs useful for rapidly determining the geographic origin of any strain found in France.