
Data from: Comparing phylogeographies to reveal incompatible geographical histories within genomes

Cite this dataset

Singer, Benjamin; Di Nardo, Antonello; Hein, Jotun; Ferretti, Luca (2024). Data from: Comparing phylogeographies to reveal incompatible geographical histories within genomes [Dataset]. Dryad.


Modern phylogeography aims at reconstructing the geographic movement of organisms based on their genomic sequences and spatial information. Phylogeographic approaches are often applied to pathogen sequences and therefore tend to neglect the possibility of recombination, which decouples the evolutionary and geographic histories of different parts of the genome. Genomic regions of recombining or reassorting pathogens often originate and evolve at different times and locations, which characterise their unique spatial histories. Measuring the extent of these differences requires new methods to compare geographic information on phylogenetic trees reconstructed from different parts of the genome. Here we develop for the first time a set of measures of phylogeographic incompatibility, aimed at detecting differences between geographical histories in terms of distances between phylogeographies. We study the effect of varying demography and recombination on phylogeographic incompatibilities using coalescent simulations. We further apply these measures to the evolutionary history of human and livestock pathogens, either reassorting or recombining, such as the Victoria and Yamagata lineages of influenza B and the O/ME-SA/Ind-2001 foot-and-mouth disease virus strain. Our results reveal diverse geographical paths of migration that characterise the origins and evolutionary histories of different viral genes and genomic segments. These incompatibility measures can be applied to any phylogeography, and more generally to any phylogeny where each tip has been assigned either a continuous or discrete “trait” independent of the sequence. We illustrate this flexibility with an analysis of the interplay between the phylogeography and phylolinguistics of Uralic-speaking human populations, hinting at patrilinear language transmission.



1_MASTER_sim folder

XML files generated in MASTER 6.1.1 (via BEAST 2.6.1) and used to run structured coalescent simulations and generate ancestral recombination graphs. The simulation uses an island model of population structure with three populations of 20 individuals each and different patterns of migration between all populations. The structured coalescent model is implemented backward in time as a combination of coalescence events (two lineages within the same island that coalesce into one) with rate 1, migration events with rate \mu for each arrow, and recombination events (a lineage that splits into two lineages within the same island) with rate \rho.
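The event rates described above can be illustrated with a minimal Gillespie-style sketch in Python. This is not the MASTER XML itself, only an illustration under stated assumptions: the coalescence rate of 1 is taken to apply to each pair of lineages on the same island, \mu and \rho are taken to apply per lineage, and each "arrow" is written as a (source, destination) pair in the backward-time direction. The function names and the example arrows are hypothetical.

```python
import random
from math import comb

def event_rates(counts, arrows, mu, rho):
    """Total rate of each possible event given lineage counts per island.

    Assumptions (not taken from the dataset): coalescence has rate 1 per
    pair of lineages on the same island; migration has rate mu per lineage
    per arrow; recombination has rate rho per lineage.
    """
    rates = {}
    for island, n in enumerate(counts):
        rates[("coalesce", island)] = 1.0 * comb(n, 2)
        rates[("recombine", island)] = rho * n
    for src, dst in arrows:
        rates[("migrate", src, dst)] = mu * counts[src]
    return rates

def step(counts, arrows, mu, rho, rng):
    """Draw one backward-time event; return (waiting time, event, new counts)."""
    rates = event_rates(counts, arrows, mu, rho)
    total = sum(rates.values())
    wait = rng.expovariate(total)  # exponential waiting time to the next event
    events = list(rates)
    event = rng.choices(events, weights=[rates[e] for e in events])[0]
    new_counts = list(counts)
    if event[0] == "coalesce":      # two lineages merge into one
        new_counts[event[1]] -= 1
    elif event[0] == "recombine":   # one lineage splits into two
        new_counts[event[1]] += 1
    else:                           # one lineage moves along an arrow
        new_counts[event[1]] -= 1
        new_counts[event[2]] += 1
    return wait, event, new_counts

# Three islands of 20 sampled lineages each; the arrows are illustrative only.
start = [20, 20, 20]
example_arrows = [(0, 1), (1, 2)]
wait, event, after = step(start, example_arrows, mu=0.5, rho=0.1,
                          rng=random.Random(1))
```

Repeating `step` until a single lineage remains per genomic segment would trace out an ancestral recombination graph. The sketch only tracks lineage counts and omits the bookkeeping of which genomic material each lineage carries.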

2_InfB folder

XML files generated in BEAST 1.10.4 and used to run discrete phylogeographic reconstructions for both the influenza B Victoria (1_Victoria folder) and Yamagata (2_Yamagata folder) lineages. For influenza B virus we selected 122 and 120 unique genome sequences of the Victoria and Yamagata lineages, respectively (Bedford et al. 2015; Langat et al. 2017), covering five of the eight gene segments, which encode the polymerase basic subunits 1 and 2 (PB1 and PB2), the hemagglutinin (HA) and neuraminidase (NA) glycoproteins, and the non-structural protein 1 (NS1), along with the joint sequences of all these genes (~8.6 kb). MCC trees resulting from the analyses are included.

3_FMDV folder

XML files generated in BEAST 1.10.4 and used to run discrete phylogeographic reconstructions for the Foot-and-Mouth Disease Virus (FMDV) O/ME-SA/Ind-2001 lineage. For FMDV we selected 74 whole-genome sequences (Bachanek-Bankowska et al. 2018), extracting separate alignments for each of the four structural (VP1 to VP4) and six non-structural (2A to 2C, and 3A to 3D) proteins, along with the leader polypeptide (Lpro) and both the 3' and 5' UTR alignments. MCC trees resulting from the analyses are included.

4_Phyloling folder

We retrieved previously published genotype data of autosomal chromosomes (1_auto folder), mtDNAs (3_mt folder) and chrY (4_y folder) of Uralic-speaking individuals (Tambets et al. 2018). Conventional FST distance matrices derived from each genotype dataset, along with linguistic (lexical) data, were used to infer distinct phylogenies in FastME (.nex files), which were subsequently re-projected in time using the chronos function implemented in the R package ape (_CH.tree files). The linguistic data consist of a set of 226 words translated into each language. The linguistic space was built from lexical distances using the default MDS projection routines in R.
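As an illustration of how a distance matrix can be projected into a 2-dimensional space, the following Python sketch implements classical (Torgerson) MDS, the technique computed by R's `cmdscale`. Whether this is the exact routine used for the lexical space is an assumption, and the toy distance matrix below is invented for the example and is not part of the dataset.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Project a symmetric distance matrix into k dimensions (classical MDS)."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (d ** 2) @ centring
    # Keep the k largest eigenvalues and their eigenvectors.
    vals, vecs = np.linalg.eigh(gram)
    order = np.argsort(vals)[::-1][:k]
    top_vals = np.clip(vals[order], 0.0, None)  # guard against tiny negatives
    return vecs[:, order] * np.sqrt(top_vals)

# Hypothetical pairwise lexical distances between three languages.
toy = [[0.0, 3.0, 4.0],
       [3.0, 0.0, 5.0],
       [4.0, 5.0, 0.0]]
coords = classical_mds(toy)  # one row of 2-D coordinates per language
```

The resulting coordinates are defined only up to rotation and reflection, so it is the distances between points that carry meaning, not the axes themselves.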

XML files generated in BEAST 1.10.4 for each genotype dataset and used to run continuous phylogeographies using the 2-dimensional lexical or geographic spaces as continuous traits. MCC trees resulting from the analyses are included.


  • Bachanek-Bankowska K, Di Nardo A, Wadsworth J, Mioulet V, Pezzoni G, Grazioli S, Brocchi E, Kafle SC, Hettiarachchi R, Kumarawadu PL, et al. 2018. Reconstructing the evolutionary history of pandemic foot-and-mouth disease viruses: the impact of recombination within the emerging O/ME-SA/Ind-2001 lineage. Sci Rep, 8(1): 14693.
  • Bedford T, Riley S, Barr IG, Broor S, Chadha M, Cox NJ, Daniels RS, Gunasekaran CP, Hurt AC, Kelso A, et al. 2015. Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature, 523(7559): 217–220.
  • Langat P, Raghwani J, Dudas G, Bowden TA, Edwards S, Gall A, Bedford T, Rambaut A, Daniels RS, Russell CA, et al. 2017. Genome-wide evolutionary dynamics of influenza B viruses on a global scale. PLOS Pathog, 13(12): 1–26.
  • Tambets K, Yunusbayev B, Hudjashov G, Ilumäe AM, Rootsi S, Honkola T, Vesakoski O, Atkinson Q, Skoglund P, Kushniarevich A, et al. 2018. Genes reveal traces of common recent demographic history for most of the Uralic-speaking populations. Genome Biol, 19(1): 139.


Biotechnology and Biological Sciences Research Council, Award: BB/M011224/1

Department for Environment Food and Rural Affairs, Award: SE2945