Genomic analyses reveal poaching hotspots and illegal trade in pangolins from Africa to Asia
Data files
Dec 19, 2023 version, files total 5.76 GB
- PangHKsamples2.csv (33.48 KB)
- pangolin_samples_for_code.xlsx (50.98 KB)
- Pangolin_tests_assertainment.xlsx (12.66 KB)
- Pangolin.111breeding_b.4Origen.loc.anonymize.txt (1.98 KB)
- Pangolin.breeders.FinalPanel.geno_LCWG.2mind.rm_dup.name_fix.rm3outliers.map (2.88 KB)
- Pangolin.breeders.FinalPanel.geno_LCWG.2mind.rm_dup.name_fix.rm3outliers.ped (45.33 KB)
- Pangolin.HK1_8.FinalPanel.94.647ind_genotyped.map (2.88 KB)
- Pangolin.HK1_8.FinalPanel.94.647ind_genotyped.ped (265.57 KB)
- Pangolin.incl_HK1_8.FinalPanel.94.2mind.rubias_input.txt (273.02 KB)
- Pangolin.merged_gatk.srs_filt.vcf.gz (5.75 GB)
- Pangolin.reference.Panel1_2.94.geno_LCWG.111ind..rubias_input.txt (50.04 KB)
- Pangolin.reference.Panel1_2.94.geno_only.rmoutlier.rubias_input.newname.txt (33.84 KB)
- README.md (8.21 KB)
Abstract
Reducing the illegal wildlife trade requires an understanding of its origins. Here we present a genomic approach for tracing confiscated scales from the world’s most trafficked mammal, the white-bellied pangolin (Phataginus tricuspis), to their geographic origins. Analyzing scales seized in Hong Kong SAR, China from 2012–2018 revealed intense poaching along Cameroon’s southern border. Poaching pressures shifted over time from West to Central Africa. Using data from seizures representing nearly one million African pangolins, we identified Nigeria as a significant hub for trafficking, where scales are amassed and shipped to Vietnam and Hong Kong SAR, China, with final transit to markets in Guangdong and Guangxi, China. This origin-to-destination approach offers new opportunities to disrupt the illegal wildlife trade and to guide anti-trafficking measures.
README: Genomic analyses reveal poaching hotspots and illegal trade in pangolins from Africa to Asia
https://doi.org/10.5061/dryad.zkh1893g7
Here we present a genomic approach for tracing confiscated scales from the world’s most trafficked mammal, the white-bellied pangolin (Phataginus tricuspis), to their geographic origins. Analyzing scales seized in Hong Kong SAR, China from 2012-2018 revealed intense poaching along Cameroon’s southern border. We used whole genome resequencing of white-bellied pangolins across their range to identify population structure, and then designed Fluidigm SNP-type assays diagnostic of the distinct genetic clusters to assign confiscated pangolin scales back to their origin. This data set includes the genotypes called from whole genome sequencing (in VCF format), the raw data from targeted SNP genotyping of georeferenced pangolins and confiscated pangolin scales, and the custom scripts used for assignment analyses.
Description of the data and file structure
- pangolin_samples_for_code.xlsx: This is the metadata file of individuals sampled for whole genome resequencing (sheet 1). The georeferenced sample metadata includes a unique identifier (UCLA sample ID), Country of Origin, Type (sample type, e.g. blood dot or scale), Date collected, Detailed location (e.g. Site, Town, State, Province, District), Simplified Location, Map_number (the map number in the Figure 1 genoscape), Region_K5 (the distinct genetic cluster the sample belongs to), Latitude and Longitude. Note that Latitude and Longitude are rounded up to maintain anonymity of the sample locations. Missing data are denoted with NA. The confiscated pangolin scales metadata can be found in Supplementary material, Data S3.
- Pangolin.merged_gatk.srs_filt.vcf.gz: This genotype file was created using GATK HaplotypeCaller to detect variants, then filtered for biallelic loci, missingness (--max-missing 0.75), quality (--minQ 20) and minor allele frequency (MAF > 0.05). This VCF has 26,108,076 variants for 89 individuals. Missing genotypes are coded as "./.".
- Pangolin.reference.Panel1_2.94.geno_LCWG.111ind..rubias_input.txt: This is the genotype file used in rubias for the 111 georeferenced pangolins genotyped at 94 targeted SNP-type assays. Genotypes are designated as A=1,C=2,G=3,T=4, and missing data is denoted as NA.
- Pangolin.incl_HK1_8.FinalPanel.94.2mind.rubias_input.txt: This is the genotype file used in rubias for the 647 confiscated pangolin scales genotyped at 94 targeted SNP-type assays. Genotypes are designated as A=1,C=2,G=3,T=4, and missing data is denoted as NA.
- Pangolin.reference.Panel1_2.94.geno_only.rmoutlier.rubias_input.newname.txt: This is the genotype file used in the rubias self-assessment to test the accuracy of our targeted SNPs while accounting for ascertainment bias, as these individuals were not used to design the targeted SNPs. Genotypes are designated as A=1,C=2,G=3,T=4, and missing data is denoted as NA.
- Pangolin_tests_assertainment.xlsx: This Excel sheet documents how ascertainment bias was assessed prior to SNP-type design. Because sample sizes were small, we used a leave-one-out method when choosing divergent variants between major genetic clusters to account for ascertainment bias. Each sample was excluded once, FST outliers were assessed between genetic clusters, and assignment with rubias was used to determine whether the individual left out could be assigned to its known genetic unit.
- Pangolin.breeders.FinalPanel.geno_LCWG.2mind.rm_dup.name_fix.rm3outliers.ped: This is the genotype file for 111 georeferenced pangolins genotyped at 94 variants used as reference in the OriGen analyses. Missing data is 0.
- Pangolin.breeders.FinalPanel.geno_LCWG.2mind.rm_dup.name_fix.rm3outliers.map: This map file corresponds to the .ped file of the georeferenced pangolin samples. It includes the assay names for each of the 94 assays used.
- Pangolin.111breeding_b.4Origen.loc (listed in the data files as Pangolin.111breeding_b.4Origen.loc.anonymize.txt): This is the location file for the 111 georeferenced pangolin samples, with the sample name, lat and long of each sample.
- Pangolin.HK1_8.FinalPanel.94.647ind_genotyped.ped: This is the genotype file for 647 confiscated pangolin scales genotyped at 94 variants that we wanted assigned back to origin using the OriGen analyses. Missing data is 0.
- Pangolin.HK1_8.FinalPanel.94.647ind_genotyped.map: This map file corresponds to the .ped file of the confiscated pangolin scale samples. It includes the assay names for each of the 94 assays used. Identical to the georeferenced map file.
- PangHKsamples2.csv: This file is the results file from OriGen of the predicted lat/long of confiscated scales that can be used to create a heatmap of pangolin scale origins by year using the script (HeatMapCreation.R). Each line represents a predicted Latitude/Longitude of a sampled scale. The SEIZURE column refers to different seizure groups that the confiscated scales were sampled from. Additional information includes the year the seizure occurred and the declared Country of origin of the seizure.
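The .ped descriptions above state that missing data is coded as 0. As a rough illustration of what that means when reading these files, here is a minimal sketch in Python (the authors' own scripts are in R and are not reproduced here). It assumes the files follow the standard PLINK .ped layout: six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per SNP. The example line, the sample names and the numeric allele codes in it are made up for illustration.

```python
from typing import List, Optional, Tuple

# A genotype is a pair of allele codes, or None when it is missing.
Genotype = Optional[Tuple[str, str]]


def parse_ped_line(line: str) -> Tuple[str, List[Genotype]]:
    """Split one PLINK-style .ped line into (individual ID, genotypes).

    Assumes six leading metadata columns, then two allele columns per SNP.
    A genotype is treated as missing if either allele is "0".
    """
    fields = line.split()
    sample_id = fields[1]   # second column is the individual ID
    alleles = fields[6:]    # everything after the six metadata columns
    if len(alleles) % 2 != 0:
        raise ValueError("expected two allele columns per SNP")
    genotypes: List[Genotype] = []
    for i in range(0, len(alleles), 2):
        a, b = alleles[i], alleles[i + 1]
        genotypes.append(None if "0" in (a, b) else (a, b))
    return sample_id, genotypes


def missing_rate(genotypes: List[Genotype]) -> float:
    """Fraction of SNPs with a missing genotype for one sample."""
    if not genotypes:
        return 0.0
    return sum(g is None for g in genotypes) / len(genotypes)


# Hypothetical line with three SNPs; the second genotype is missing.
example = "HK1 scale_001 0 0 0 -9 1 3 0 0 2 2"
sample_id, genos = parse_ped_line(example)
print(sample_id, genos, round(missing_rate(genos), 3))
```

A check like this can be useful before running an assignment, since samples with many missing genotypes carry less information; whether and how the authors filtered on missingness is not stated here.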
Sharing/Access information
Links to other publicly accessible locations of the data:
Raw SNP genotyping data specific for assignment of confiscated scales can be found on GitHub:
Processed bam files of whole genome sequenced pangolin samples can be found on NCBI:
- BioProject ID PRJNA1014914
Code/Software
We used R scripts to assign confiscated scales back to origin and to determine the accuracy of our targeted assays while taking ascertainment bias into account. The scripts are listed here with a description of each:
- PopAssignment.HK.Rmd: This is an R Markdown file that assigns confiscated scales back to origin using the R software rubias and OriGen. The genotype files are targeted SNP genotype files of 94 SNPs. The rubias analyses, which assign individuals back to the distinct genetic clusters identified in the population genetic analyses, use rubias genotype input files for 111 georeferenced individuals genotyped at 94 SNPs (Pangolin.reference.Panel1_2.94.geno_LCWG.111ind..rubias_input.txt) and 647 confiscated pangolin scales genotyped at 94 SNPs (Pangolin.incl_HK1_8.FinalPanel.94.2mind.rubias_input.txt). The OriGen analyses, which narrow the origin to a predicted Lat/Long based on reference individual allele frequencies and locations, use the map and ped files of reference individuals (Pangolin.breeders.FinalPanel.geno_LCWG.2mind.rm_dup.name_fix.rm3outliers.map/ped), a location file of reference individuals (Pangolin.111breeding_b.4Origen.loc), and the map and ped files of confiscated pangolin scale individuals (Pangolin.HK1_8.FinalPanel.94.647ind_genotyped.map/ped). Note that the location file uses rounded Latitudes and Longitudes to maintain anonymity of the sample locations.
- Leave1out_rubias.PopAssignment.breeders.R: This is an R script that was used to assess ascertainment bias in the design of the targeted SNP assays. Each individual used in the FST outlier analysis to identify highly differentiated loci between groups was left out in turn; FST analyses were then conducted, the top 10 variants were chosen, and rubias was used to determine whether the sample left out could be assigned back to its correct genetic cluster of origin. The Excel sheet Pangolin_tests_assertainment.xlsx shows which individual was left out and then analyzed.
- OriGen.PredLatLong.KnonwPang.output.R: This R script was used to assess OriGen accuracy using a leave-one-out method. The method removes one georeferenced individual from the reference file and location file, then uses that individual as an "unknown" individual to be analyzed in OriGen to obtain its predicted Lat/Long.
- OriGen_plot.accuracy.Pang.R: To assess the accuracy of the OriGen results, we used the leave-one-out method described above. This script takes the results from the leave-one-out runs, determines the distance between the known location and the location predicted by OriGen, and assesses the distance error in the predictions.
- HeatMapCreation.R: This R script creates the heat map of OriGen predicted locations of all the confiscated pangolin scales (PangHKsamples2.csv) that we plot in Figure 2.
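The leave-one-out accuracy check described for the OriGen scripts comes down to two steps: hold out one georeferenced sample, predict its location from the rest, and measure how far the prediction falls from the known location. The sketch below illustrates that logic in Python; it is not the authors' R code. Several parts are assumptions: the haversine great-circle formula is one common way to measure the distance (the README does not say which measure was used), `centroid_predictor` is a hypothetical stand-in that simply averages the reference coordinates and has nothing to do with how OriGen actually predicts locations, and the sample names and coordinates are made up.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def leave_one_out(samples):
    """Yield (held_out, remaining_reference) for each sample in turn."""
    for i, held_out in enumerate(samples):
        yield held_out, samples[:i] + samples[i + 1:]


def centroid_predictor(reference):
    """Hypothetical stand-in for OriGen (NOT the real method):
    predict the mean latitude and longitude of the reference samples."""
    lats = [lat for _, lat, _ in reference]
    lons = [lon for _, _, lon in reference]
    return sum(lats) / len(lats), sum(lons) / len(lons)


# Made-up georeferenced samples: (name, latitude, longitude)
samples = [("ref_A", 4.0, 10.0), ("ref_B", 5.0, 10.0), ("ref_C", 6.0, 10.0)]

errors = {}
for (name, lat, lon), reference in leave_one_out(samples):
    pred_lat, pred_lon = centroid_predictor(reference)
    errors[name] = haversine_km(lat, lon, pred_lat, pred_lon)

print("median distance error (km):", median(errors.values()))
```

In a real run, the predictor call would be replaced by the predicted Lat/Long that OriGen reports for each held-out individual; the distance and summary steps stay the same.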