Development of a SNP panel for geographic assignment and population monitoring of jaguars (Panthera onca)
Data files
Jun 04, 2025 version files 185.22 MB
- compiled_results_self-assignment_rubias.xlsx (52.60 KB)
- Jag-SNPpanel_coordinates_and_genotypes.csv (124.44 KB)
- jaguar.57samples.459snps.vcf (19.14 MB)
- jaguar.57samples.84snps.vcf (18.55 MB)
- jaguar.57samples.allChr.snps.hardFilter.bi.maf.ld.masked.hwe.recode.vcf (147.35 MB)
- README.md (3.44 KB)
Abstract
The jaguar (Panthera onca) is an iconic top predator that is threatened by habitat loss and fragmentation, along with an emerging expansion of poaching for the illegal trade of live individuals and their parts. To address the need for tools that improve surveillance and monitoring of its remaining populations, we have developed a genome-enabled single nucleotide polymorphism (SNP) panel targeting this species. From a dataset of 58 complete jaguar genomes, we identified and selected highly informative SNPs for geographic traceability, individual identification, kinship, and sexing. Our panel, named ‘Jag-SNP’, comprises 459 SNPs selected from an initial pool of 13,373,949 markers based on the inter-biome FST, followed by rigorous filtering and addition of eight sex-linked SNPs. We then randomly selected subsets of this panel and identified an 84-SNP set that exhibited a similar resolving power. With both the 459-SNP panel and its 84-SNP subset, samples were assigned with 98% success to their biomes of origin and 65-69% of them were assigned to within 500 km of their origin. Furthermore, ca. 10-18 SNPs within these panels were sufficient to distinguish individuals, while 6 sex-linked SNPs perfectly separated males and females. We used whole-genome data from an additional 18 jaguars to further test these panels, which correctly recovered kinship relationships and allowed inference of geographic origin of samples collected outside the spatial scope of the original sample set. These results support the strong potential of these panels as an efficient tool for application in forensic, genetic, ecological, behavioral and conservation projects targeting jaguars.
Paper: Development of a SNP panel for geographic assignment and population monitoring of jaguars (Panthera onca)
This README file describes the data accompanying the above publication.
Description
The present panel comprises jaguar genotypes from 459 highly informative SNPs for geographic traceability, individual identification and kinship, and another 8 sex-linked SNPs for sexing. These SNPs were identified and selected from a dataset of 58 complete jaguar genomes from Brazil, mapped against the chromosome-length genome assembly generated by the DNA Zoo team (GCF_028533385.1).
Files
1 - The CSV file named "Jag-SNPpanel_coordinates_and_genotypes.csv" contains:
- ID_SNP: SNP identifier.
- CHROM:POS: Scaffold identifier and SNP position, based on the reference genome.
- Allele1/Allele2: Reference allele (Allele1) and alternative allele (Allele2), relative to the reference genome.
- bPon001 to bPon555: Jaguar samples, giving each individual's genotype at each SNP.
Missing data are denoted by zero. Sex-linked SNPs are denoted by the prefix "sex_SNP_".
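As a rough illustration of how this table might be loaded, the sketch below uses Python's standard csv module. It relies only on what is described above: an ID_SNP column, sample columns whose names start with "bPon", zero for missing data, and the "sex_SNP_" label for sex-linked SNPs. The helper name summarize_panel, the toy rows, the scaffold names, and the two-letter genotype encoding are all hypothetical; check the actual file before relying on any of them.

```python
import csv
import io

def summarize_panel(handle):
    """Split SNP IDs into sex-linked and other SNPs, and count missing calls per sample."""
    reader = csv.DictReader(handle)
    # Sample columns are assumed to be the ones whose names start with "bPon".
    sample_cols = [c for c in reader.fieldnames if c.startswith("bPon")]
    sex_snps, other_snps = [], []
    missing = {s: 0 for s in sample_cols}
    for row in reader:
        target = sex_snps if row["ID_SNP"].startswith("sex_SNP_") else other_snps
        target.append(row["ID_SNP"])
        for s in sample_cols:
            if row[s].strip() == "0":  # missing data are denoted by zero
                missing[s] += 1
    return other_snps, sex_snps, missing

# Hypothetical toy rows (invented coordinates and genotype encoding).
example = io.StringIO(
    "ID_SNP,CHROM:POS,Allele1/Allele2,bPon001,bPon002\n"
    "SNP_1,scaffoldA:1000,A/G,AG,0\n"
    "sex_SNP_1,scaffoldX:2000,C/T,CC,CT\n"
)
other_snps, sex_snps, missing = summarize_panel(example)
```

To run it on the real table, the same function could be given a file handle opened with `open("Jag-SNPpanel_coordinates_and_genotypes.csv", newline="")`.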
2 - The VCF file named "jaguar.57samples.allChr.snps.hardFilter.bi.maf.ld.masked.hwe.recode.vcf" contains 83,527 independent, high-confidence SNPs distributed across the 19 jaguar chromosomes, filtered according to their coverage, depth, mapping quality, missing data, minor allele frequency, linkage disequilibrium, repetitive regions, and Hardy-Weinberg Equilibrium.
3 - The VCF file named "jaguar.57samples.459snps.vcf" contains 57 samples (we removed individual 302 due to ≥ 20% missing data) and 459 highly informative SNPs for geographic traceability, individual identification, and kinship, which were selected from the 83,527 SNPs in the file above. From this file, we then identified an 84-SNP set that exhibited a similar resolving power (VCF file named "jaguar.57samples.84snps.vcf").
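A quick way to check that a VCF matches the sample and SNP counts stated above is to count the sample columns in the "#CHROM" header line and the number of data lines. The following is a minimal sketch using only the Python standard library; the function name vcf_dimensions and the toy records are hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF that includes a FORMAT column.

```python
def vcf_dimensions(lines):
    """Return (number of samples, number of variant records) for VCF text lines."""
    n_samples = 0
    n_records = 0
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        if line.startswith("#CHROM"):
            # Nine fixed columns (CHROM through FORMAT) precede the sample names.
            n_samples = len(line.rstrip("\n").split("\t")) - 9
        elif line.strip():
            n_records += 1  # every other non-empty line is one variant record
    return n_samples, n_records

# Hypothetical two-sample, two-record VCF used only to exercise the function.
toy_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tbPon001\tbPon002\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t./.\n",
]
dims = vcf_dimensions(toy_vcf)
```

Passing an open handle to "jaguar.57samples.459snps.vcf" instead of the toy list should give 57 samples and 459 records if the file matches its description.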
4 - The XLSX file named "compiled_results_self-assignment_rubias.xlsx" contains the output of the self-assignment function (rubias R package; doi:10.1139/F08-049), showing the probabilities of assignment using both SNP panels (459-SNP panel on the first sheet, and 84-SNP panel on the second sheet).
- indiv: individual's ID
- collection: the name of the population that the individual is from.
- repunit: the reporting unit that an individual/collection belongs to.
- inferred_collection: the assigned collection
- inferred_repunit: the assigned unit
- scaled_likelihood: the posterior probability of assigning the jaguar to the inferred_collection, given an equal prior on every collection in the reference.
- log_likelihood: the log of the probability of the individual's genotype given that it is from the collection
- z_score: a statistical metric that measures how far a value is from the mean of a distribution. A z-score close to 0 means the individual is within the distribution expected for its population; a z-score ≥ 2 means the individual is more similar to the population than the average of the other individuals; a z-score ≤ -2 means the individual is atypical for the assigned population.
- n_non_miss_loci: number of non-missing loci
- n_miss_loci: number of missing loci
- missing_loci: list of missing loci
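The columns above can be summarized into a reporting-unit assignment rate. The sketch below follows the usual way rubias self-assignment output is summarized: for each individual, scaled_likelihood is summed over the collections belonging to each inferred_repunit, and the best-supported unit is compared with the individual's own repunit. It assumes the sheet has been exported to rows keyed by the column names listed above (for example via a CSV export); the function name, the toy individuals, and the use of biome names as reporting units are hypothetical.

```python
from collections import defaultdict

def repunit_assignment_rate(rows):
    """Fraction of individuals whose best-supported reporting unit matches their own."""
    support = defaultdict(lambda: defaultdict(float))
    true_unit = {}
    for r in rows:
        true_unit[r["indiv"]] = r["repunit"]
        # Sum the scaled likelihoods of all collections within each reporting unit.
        support[r["indiv"]][r["inferred_repunit"]] += float(r["scaled_likelihood"])
    correct = sum(
        1
        for indiv, units in support.items()
        if max(units, key=units.get) == true_unit[indiv]
    )
    return correct / len(support)

# Hypothetical rows: jag_A is assigned to its own unit, jag_B is not.
rows = [
    {"indiv": "jag_A", "repunit": "Amazon", "inferred_repunit": "Amazon", "scaled_likelihood": 0.90},
    {"indiv": "jag_A", "repunit": "Amazon", "inferred_repunit": "Pantanal", "scaled_likelihood": 0.10},
    {"indiv": "jag_B", "repunit": "Cerrado", "inferred_repunit": "Caatinga", "scaled_likelihood": 0.60},
    {"indiv": "jag_B", "repunit": "Cerrado", "inferred_repunit": "Cerrado", "scaled_likelihood": 0.40},
]
rate = repunit_assignment_rate(rows)
```

The function also works if the sheet holds only one row per individual, since the sum then has a single term.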
From a dataset of 58 complete jaguar genomes from Brazil (Sartor et al. [in prep]), we used a genotype likelihood approach to identify 13,373,949 SNPs, which were later filtered according to their coverage, depth, mapping quality, missing data, minor allele frequency, linkage disequilibrium, repetitive regions, and Hardy-Weinberg Equilibrium. With the remaining 83k SNPs, we grouped individuals according to their biomes of origin (Amazon: n=18; Atlantic Forest: n=14; Cerrado: n=14; Caatinga: n=6; Pantanal: n=6) and calculated pairwise FST values between biomes.
To select the most informative sites for geographic assignment, we ranked the 83k SNPs by their pairwise FST values (from highest to lowest) between the sampled biomes and kept the 50 loci with the highest FST from each pairwise comparison. This yielded a panel of 459 SNPs: 41 SNPs were retrieved in more than one comparison and were kept only once. Then, we randomly selected subsets of this 459-SNP panel to assess whether smaller sets of SNPs (which can be genotyped more quickly and affordably) provide similar levels of information. As an example, we identified a set with only 84 SNPs that exhibited a similar resolving power, correctly separating individuals by biomes of origin in a PCA.
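The panel size follows from simple counting: five biomes give 10 pairwise comparisons, so taking 50 SNPs from each yields 500 picks, and removing 41 repeated picks leaves 459 distinct SNPs. The selection step itself can be sketched as below. This is an illustrative sketch, not the authors' script: the function name, the SNP identifiers, and the toy FST values are hypothetical, and tie handling is not specified in the text.

```python
from itertools import combinations

def select_top_snps(fst_by_pair, n_top):
    """Union of the n_top highest-FST SNPs from every pairwise comparison."""
    selected = set()
    n_repeats = 0  # picks that were already selected by an earlier comparison
    for fst in fst_by_pair.values():
        top = sorted(fst, key=fst.get, reverse=True)[:n_top]
        for snp in top:
            if snp in selected:
                n_repeats += 1
            selected.add(snp)
    return selected, n_repeats

# Counting check for the numbers given in the text.
biomes = ["Amazon", "Atlantic Forest", "Cerrado", "Caatinga", "Pantanal"]
n_pairs = len(list(combinations(biomes, 2)))  # 10 pairwise comparisons
panel_size = n_pairs * 50 - 41                # 500 picks minus 41 repeats

# Hypothetical toy example: three comparisons, top 2 SNPs from each.
toy_fst = {
    ("Amazon", "Cerrado"): {"s1": 0.9, "s2": 0.8, "s3": 0.1},
    ("Amazon", "Pantanal"): {"s1": 0.7, "s2": 0.2, "s3": 0.6},
    ("Cerrado", "Pantanal"): {"s1": 0.1, "s2": 0.3, "s3": 0.5},
}
toy_panel, toy_repeats = select_top_snps(toy_fst, n_top=2)
```

In the toy example, six picks collapse to three distinct SNPs, mirroring how 500 picks collapse to 459 in the real panel.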
To select SNPs that were informative for sexing, we identified 698 SNPs located on the X chromosome and searched for sites that were close to the X-linked genes Amelogenin (AMELX) and Zinc-finger protein (ZFX), which have been used for sex identification of jaguars and other felids (Pilgrim et al. 2005). Subsequently, we analyzed the genotypes of these SNPs in each individual and visualized their distribution in males and females using a PCA. We selected 6 SNPs among these that showed a clear-cut pattern in which all females were homozygous and males were either heterozygous or homozygous for the alternative allele. These SNPs were located in the X-linked genes ZFX, FRMPD4, TRAPPC2, TXLNG, and USP9X.
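The clear-cut pattern described above can be expressed as a simple per-SNP test. The sketch below makes two assumptions that are not stated in the text: genotypes are coded as the number of alternative alleles (0, 1, or 2, with None for missing calls), and "homozygous" females are homozygous for the reference allele, which is what a clean separation from homozygous-alternative males would require. The function name and sample labels are hypothetical.

```python
def separates_sexes(genotypes, sexes):
    """True if every female is hom-ref (0) and every male carries the alt allele (1 or 2)."""
    for sample, alt_count in genotypes.items():
        if alt_count is None:
            continue  # missing calls do not count for or against the SNP
        if sexes[sample] == "F" and alt_count != 0:
            return False
        if sexes[sample] == "M" and alt_count not in (1, 2):
            return False
    return True

# Hypothetical samples and genotypes (alt-allele counts) for two candidate SNPs.
sexes = {"f1": "F", "f2": "F", "m1": "M", "m2": "M"}
diagnostic_snp = {"f1": 0, "f2": 0, "m1": 1, "m2": 2}
uninformative_snp = {"f1": 0, "f2": 1, "m1": 1, "m2": None}

is_diagnostic = separates_sexes(diagnostic_snp, sexes)
is_uninformative = separates_sexes(uninformative_snp, sexes)
```

Applying such a test to each candidate X-linked SNP would keep only those where the two sexes never share a genotype class.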
