Unveiling the impacts of land use on the phylogeography of zoonotic New World Hantaviruses

García Peña, Gabriel Ernesto; Rubio, André Víctor

Published Feb 22, 2024 on Dryad. https://doi.org/10.5061/dryad.rv15dv4fq

Data files

Feb 22, 2024 version files (14.42 MB total)

AIC_tree.nwk (18.31 KB)
BLAST_Nprot.csv (168.46 KB)
hosts_america_PH.csv (1.48 KB)
Nprot_MaxAlign.fas (113.23 KB)
README.md (4.44 KB)
taxa_in_network_NWH.zip (14.12 MB)

Abstract

Public repositories hold billions of genomic sequences (NCBI) and records of species occurrence (GBIF). By applying analytical tools from different scientific disciplines, data mining of these databases can inform the global surveillance of zoonotic pathogens that circulate among wildlife. We illustrate this by investigating the hantavirus-rodent system in the Americas, i.e. New World Hantaviruses (NWH). First, we trace the circulation of pathogenic NWH among rodents by inferring the phylogenetic links among 278 genomic samples of the S segment (N protein) of NWH found in 55 species of Cricetidae rodents. Second, we use machine learning to assess the impact of land use on the probability of presence of the rodent species linked with reservoirs of pathogenic hantaviruses. Our results show that hosts are widely present across the Americas. Some hosts are present in primary forest and agricultural land but not in secondary forest, whereas other hosts are present in secondary forest and agricultural land. The diversity of host species allows hantaviruses to circulate across a wide spectrum of habitats, particularly rural rather than urban ones. We highlight that public repositories of genomic data and species occurrence are very useful resources for monitoring potential enzootic transmission and spillover of zoonotic viruses in relation to the changes that humans produce in the biosphere.

Unveiling the Impacts of Land Use on the Phylogeography of Zoonotic New World Hantaviruses

Gabriel E García-Peña and André V. Rubio. Ecography 2024. DOI: 10.1111/ecog.06996

Supplementary Material

Description of the data and file structure

The analysis presented in the main article was performed in R (R Core Team 2022), MAFFT (Katoh 2005), and JModelTest2 (Darriba et al. 2012), following four main steps:

1. Data Collection and Curation.

BLAST_Nprot.csv: Accession numbers from the BLAST search for Hantavirus. With this list of accession numbers, the genetic sequences can be downloaded in R with the function read.GenBank() from the library ape. Metadata for these sequences can be accessed with the R code in the file fetch.metadata.R (see the Code/Software section).
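The download step described above could be sketched as follows. This is a minimal illustration, not the authors' script: the helper read_accessions() and the column name "accession" are assumptions, since the header of BLAST_Nprot.csv is not documented here, and the accession strings in the toy file are placeholders.

```r
# Hypothetical helper: collect unique, non-empty accession numbers from a CSV.
# ASSUMPTION: the accessions sit in a column called "accession"; check the
# real header of BLAST_Nprot.csv and pass the right name via `column`.
read_accessions <- function(path, column = "accession") {
  tab <- read.csv(path, stringsAsFactors = FALSE)
  stopifnot(column %in% names(tab))
  acc <- trimws(tab[[column]])
  unique(acc[!is.na(acc) & nzchar(acc)])
}

# Toy file with placeholder identifiers so the sketch runs offline.
toy <- tempfile(fileext = ".csv")
writeLines(c("accession", "ACC0001", "ACC0002 ", "ACC0001"), toy)
acc <- read_accessions(toy)
print(acc)

# With the real file (requires the ape package and internet access):
# acc  <- read_accessions("BLAST_Nprot.csv")
# seqs <- ape::read.GenBank(acc, species.names = TRUE)  # DNAbin list
# attr(seqs, "species")                                 # organism per sequence
```

Removing duplicates and stray whitespace before the download keeps read.GenBank() from requesting the same record twice.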

2. Genetic Sequence Alignment and Phylogenetic Inference.

Nprot_MaxAlign.fas: FASTA file with the multiple sequence alignment of the genetic sequences. The file can be read in R with the function read.dna() from the library ape, using the argument format = "fasta". These data can be analyzed with the software JModelTest2.

AIC_tree.nwk: Phylogenetic relationships, with topology and branch lengths inferred with JModelTest2, in Newick format. The file can be read with the function read.tree() from the R library ape. A figure of the tree is included in this repository (phylogeny_NWH.jpeg).
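A minimal sketch of reading an alignment and a tree with ape is given below. It uses small toy stand-ins written to temporary files so that it runs without the repository data; the sequence names and branch lengths are invented for illustration. Whether the tip labels of AIC_tree.nwk match the sequence names in Nprot_MaxAlign.fas is an assumption worth checking, which is what the last line does.

```r
library(ape)

# Toy stand-ins for Nprot_MaxAlign.fas and AIC_tree.nwk (hypothetical content).
fas <- tempfile(fileext = ".fas")
writeLines(c(">seqA", "ATGGCAACTA",
             ">seqB", "ATGGCTACTA",
             ">seqC", "ATGACAACTT"), fas)
nwk <- tempfile(fileext = ".nwk")
writeLines("((seqA:0.1,seqB:0.1):0.05,seqC:0.2);", nwk)

# Aligned sequences of equal length are returned as a DNAbin matrix.
aln <- read.dna(fas, format = "fasta")

# A file with one tree gives a "phylo" object; a file with several trees
# gives a "multiPhylo" list instead.
tree <- read.tree(nwk)

dim(aln)      # number of sequences, alignment length
Ntip(tree)    # number of tips in the tree

# Tips that have no matching sequence name in the alignment.
missing_tips <- setdiff(tree$tip.label, rownames(aln))
```

For the real data, replace the temporary files with the paths to Nprot_MaxAlign.fas and AIC_tree.nwk.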

3. Phylogenetic Network analysis on the genetic links of Hantaviruses among hosts.

The phylogenetic network was inferred from the phylogenetic trees contained in AIC_tree.nwk and the dataset hosts_america_PH.csv, using the R code phylonet.R.

4. Geographic analysis on the habitat suitability of hosts linked in the phylogenetic network.

taxa_in_network_NWH.zip: The zip file contains the results of the habitat suitability analysis performed with the R code predict_suitable_habitat.R. The maps are GeoJSON files named after each species. They record the probability of species presence (X1) and absence (X0) within pixels of 0.25° inside the distribution area of the species. For each pixel, land use variables for 2015 are included. These variables are the proportions of the pixel covered by each land use type: primary forest (primf), primary non-forest (primnf), secondary forest (secdf), secondary non-forest (secdn), rangeland (range), pasture (pastr), annual C4 crops (C4ann), perennial C4 crops (C4per), perennial C3 crops (C3per), annual C3 crops (C3ann), and nitrogen-fixing plants (C3fix). These land use variables were used to predict X1 and X0.

The files can be viewed with any geographic information system (GIS) software, including R.
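As one possible way to inspect these maps in R, the sketch below reads a GeoJSON file with the sf package and summarises land use in the pixels where predicted presence exceeds predicted absence. The toy file, its coordinates and its values are invented; the property names (X1, X0, primf, pastr) are assumed to match the description above, and the X1 > X0 rule is only an illustrative cut-off, not the one used in the article.

```r
library(sf)

# Toy GeoJSON with two 0.25-degree pixels (hypothetical values).
toy <- tempfile(fileext = ".geojson")
writeLines(c(
  '{"type": "FeatureCollection", "features": [',
  '  {"type": "Feature",',
  '   "properties": {"X1": 0.8, "X0": 0.2, "primf": 0.6, "pastr": 0.1},',
  '   "geometry": {"type": "Polygon", "coordinates":',
  '     [[[-100, 20], [-99.75, 20], [-99.75, 20.25], [-100, 20.25], [-100, 20]]]}},',
  '  {"type": "Feature",',
  '   "properties": {"X1": 0.3, "X0": 0.7, "primf": 0.1, "pastr": 0.5},',
  '   "geometry": {"type": "Polygon", "coordinates":',
  '     [[[-99.75, 20], [-99.5, 20], [-99.5, 20.25], [-99.75, 20.25], [-99.75, 20]]]}}',
  ']}'
), toy)

pix  <- st_read(toy, quiet = TRUE)   # sf data frame, one row per pixel
vals <- st_drop_geometry(pix)        # attribute table without geometries

# Pixels where presence is predicted as more likely than absence.
likely <- vals[vals$X1 > vals$X0, ]
nrow(likely)
colMeans(likely[, c("primf", "pastr")])

# With the real data (file names inside the archive are not assumed here):
# unzip("taxa_in_network_NWH.zip", exdir = "maps")
# files <- list.files("maps", pattern = "\\.geojson$",
#                     recursive = TRUE, full.names = TRUE)
# pix <- st_read(files[1], quiet = TRUE)
```

Calling plot(pix["X1"]) on the resulting object draws the suitability map for that species.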

The occurrence data used to analyse habitat suitability can be accessed from the original source, GBIF: https://doi.org/10.15468/dl.pqwhfw

Sharing/Access information

Primary data used to perform the analysis can be accessed from the official repositories:

GBIF data: https://doi.org/10.15468/dl.pqwhfw
Historical land-use dataset states.nc (LUH2 v2h) covers the period 850-2015 and projections for 2025: https://luh.umd.edu/data.shtml
Distributions of rodent species from the IUCN: https://www.iucnredlist.org/resources/spatial-data-download

Code/Software

Description of files within this repository:

fetch.metadata.R: Code for a web scraper that retrieves information about the sequences in the NCBI repository.
BLAST_Nprot.csv: List of the accession numbers obtained from the BLAST search.
Nprot_MaxAlign.fas: Fasta file with the nucleotide sequences analysed; 278 genomic samples of the S segment (N protein) of NWH found in 55 species of Cricetidae rodents.
AIC_tree.nwk: Inferred phylogenetic tree.
hosts_america_PH.csv: List of species known to host New World Hantavirus. The first column contains the genus name, the second column the species name, and the third column denotes whether the species is known to harbor a pathogenic strain of Hantavirus (1) or not (0).
phylonet.R: Code describing the phylogenetic network analysis.
predict_suitable_habitat.R: Code for the classification tree analysis of the habitat suitability of the rodent hosts.
taxa_in_network_NWH.zip: Maps for each species analysed with a prediction of habitat suitability (X1) in the distribution range of the species, drawn from the land use change variables (García-Peña et al. 2021).
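
As an illustration of the three-column layout described for hosts_america_PH.csv, the sketch below builds binomial names and selects the hosts flagged as carrying a pathogenic strain. It is a minimal example under assumptions: the toy rows use placeholder taxa, the column names are invented for readability, and the real file may or may not have a header row (set header = FALSE in read.csv() if it does not).

```r
# Toy stand-in for hosts_america_PH.csv (placeholder taxa, hypothetical header).
toy <- tempfile(fileext = ".csv")
writeLines(c("genus,species,pathogenic",
             "GenusA,alpha,1",
             "GenusB,beta,0",
             "GenusA,gamma,1"), toy)

hosts <- read.csv(toy, stringsAsFactors = FALSE)

# Name the columns by position, following the README description:
# column 1 = genus, column 2 = species, column 3 = pathogenic flag (1/0).
names(hosts)[1:3] <- c("genus", "species", "pathogenic")

hosts$binomial <- paste(hosts$genus, hosts$species)

# Hosts known to harbor a pathogenic strain.
reservoirs <- hosts$binomial[hosts$pathogenic == 1]
print(reservoirs)

# Number of flagged host species per genus.
table(hosts$genus[hosts$pathogenic == 1])
```

Naming the columns by position rather than by header keeps the sketch independent of whatever labels the real file uses.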