Data from: Recommendations for population and individual diagnostic SNP selection in non-model species
Data files
Dec 21, 2024 version files (634.88 MB total):
- PanTig-genmap1.0.chromrename.bed (56.28 MB)
- PanTig1.0-chrrename.fa (284.94 MB)
- README.md (2.06 KB)
- tiger_chromosomes.txt (11 B)
- tiger_pop.csv (731 B)
- tiger.vcf.gz (293.66 MB)
Abstract
Despite substantial reductions in the cost of sequencing over the last decade, genetic panels remain relevant due to their cost-effectiveness and flexibility across a variety of sample types. In particular, single nucleotide polymorphism (SNP) panels are increasingly favored for conservation applications. SNP panels are often used because of their adaptability, effectiveness with low-quality samples, and cost-efficiency for use in population monitoring and forensics. However, the selection of diagnostic SNPs for population assignment and individual identification can be challenging. The consequences of poor SNP selection are under-powered panels, inaccurate results, and monetary loss. Here, we develop mPCRselect, a novel, user-friendly SNP selection pipeline for population assignment and individual identification. mPCRselect allows any researcher with sufficient SNP-level data to design a successful and cost-effective SNP panel for species of conservation concern.
README: Data from: Recommendations for Population and Individual Diagnostic SNP Selection in Non-Model Species
https://doi.org/10.5061/dryad.0k6djhb96
Description of the data and file structure
Tiger data from Armstrong et al. 2024 and Armstrong et al. 2021. The dataset contains unrelated individuals: tigers from five tiger subspecies and generic (admixed) tigers. The data have been restricted to two chromosomes (B1 and F2) and filtered as described in the manuscript. This is the test dataset for the mPCRselect pipeline.
Files and variables
File: tiger_pop.csv
Description: Information corresponding to individuals contained in the VCF and their population assignment.
Variables
- Sample: Tiger samples correspond to individuals from Armstrong et al. 2024 (see Supplementary Table 1)
- Population: One of five tiger subspecies (Amur, Bengal, Indochinese, Malayan, Sumatran) or admixed population (Generic)
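As a quick sanity check, the population table can be summarized per population. The sketch below is a minimal example that assumes the file is comma-delimited with a header row naming the columns `Sample` and `Population` as listed above; the sample IDs in the example are hypothetical placeholders, not real entries from `tiger_pop.csv`.

```python
import csv
import io
from collections import Counter


def count_by_population(handle):
    """Return a Counter of individuals per population from a CSV handle.

    Assumes a header row with a 'Population' column (an assumption based
    on the variable names listed in this README).
    """
    reader = csv.DictReader(handle)
    return Counter(row["Population"].strip() for row in reader)


# Hypothetical rows for illustration only.
example = io.StringIO(
    "Sample,Population\n"
    "T001,Amur\n"
    "T002,Bengal\n"
    "T003,Generic\n"
    "T004,Amur\n"
)
counts = count_by_population(example)
print(dict(counts))
```

To run it on the real file, open it with `open("tiger_pop.csv", newline="")` and pass the handle to `count_by_population`.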
File: tiger_chromosomes.txt
Description: Names of chromosomes contained in VCF file.
File: PanTig-genmap1.0.chromrename.bed
Description: BED file of sites to be filtered from the VCF because they have low mappability scores or fall in repetitive regions.
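The following is a minimal sketch of how BED intervals could be used to drop VCF records, not the mPCRselect implementation (the pipeline may rely on dedicated tools for this step). It assumes the BED intervals mark regions to exclude and that chromosome names match between the two files. BED coordinates are 0-based and half-open, whereas VCF positions are 1-based, so the position is shifted by one before the lookup. The function names `load_bed`, `in_bed`, and `filter_vcf` are hypothetical.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into per-chromosome sorted, merged (start, end) intervals."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [ivs[0]]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:  # overlapping or adjacent: extend the last interval
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = out
    return merged


def in_bed(intervals, chrom, vcf_pos):
    """True if a 1-based VCF position lies in a 0-based half-open BED interval."""
    ivs = intervals.get(chrom, [])
    pos0 = vcf_pos - 1
    # Index of the last interval whose start is <= pos0.
    i = bisect.bisect_right(ivs, (pos0, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos0 < ivs[i][1]


def filter_vcf(vcf_lines, intervals):
    """Yield header lines and records whose position is outside all intervals."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if not in_bed(intervals, chrom, int(pos)):
            yield line


# Toy data for illustration; coordinates are made up.
bed = ["B1\t10\t20\n", "B1\t15\t30\n", "F2\t0\t5\n"]
vcf = ["##fileformat=VCFv4.2\n"] + [
    f"{c}\t{p}\t.\tA\tG\t.\tPASS\t.\n"
    for c, p in [("B1", 10), ("B1", 11), ("B1", 30), ("B1", 31), ("F2", 5), ("F2", 6)]
]
kept = [l.split("\t")[:2] for l in filter_vcf(vcf, load_bed(bed)) if not l.startswith("#")]
print(kept)
```

Merging the intervals first lets each lookup use a single binary search, which matters for a BED file of this size.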
File: tiger.vcf.gz
Description: VCF containing single nucleotide polymorphisms from tiger individuals.
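Before running an analysis, it can be useful to confirm that the sample names in the VCF match those in the population table. The sketch below reads sample names from the `#CHROM` header line of a gzip-compressed VCF using only the standard library. The in-memory file and its sample IDs are hypothetical; the helper name `vcf_samples` is also an assumption made for illustration.

```python
import gzip
import io


def vcf_samples(handle):
    """Return sample names from the #CHROM header line of a text-mode VCF handle."""
    for line in handle:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed VCF fields; samples follow.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached records without finding a header line
    return []


# Tiny in-memory gzipped VCF with hypothetical sample IDs.
data = gzip.compress(
    b"##fileformat=VCFv4.2\n"
    b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tT001\tT002\n"
    b"B1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n"
)
with gzip.open(io.BytesIO(data), "rt") as fh:
    samples = vcf_samples(fh)
print(samples)
```

For the real file, replace the in-memory buffer with `gzip.open("tiger.vcf.gz", "rt")` and compare the result against the `Sample` column of `tiger_pop.csv`.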
File: PanTig1.0-chrrename.fa
Description: Reference genome file containing two chromosomes.
Access information
Data was derived from the following sources:
- Armstrong, E. E., Khan, A., Taylor, R. W., Gouy, A., Greenbaum, G., Thiéry, A., ... & Ramakrishnan, U. (2021). Recent evolutionary history of tigers highlights contrasting roles of genetic drift and selection. Molecular Biology and Evolution, 38(6), 2366-2379.
- Armstrong, E. E., Mooney, J. A., Solari, K. A., Kim, B. Y., Barsh, G. S., Grant, V. B., ... & Hadly, E. A. (2024). Unraveling the genomic diversity and admixture history of captive tigers in the United States. Proceedings of the National Academy of Sciences, 121(39), e2402924121.
Methods
We selected 13 tigers from each of the Amur and Bengal subspecies and 13 Generic tigers (N = 39 individuals in total, all from Armstrong et al., 2024). Unrelated individuals had previously been identified in Armstrong et al., 2024. We then created a subset of 10,000 markers by randomly sampling across the genome from the remaining markers in linkage equilibrium. This dataset is provided as the test data for the mPCRselect pipeline (https://github.com/ellieearmstrong/mPCRselect).
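The random marker subsampling described above can be illustrated with a short sketch. This is not the procedure used to build the dataset; it only shows one way to draw a fixed number of markers uniformly without replacement while keeping genomic order. The function name `subsample_markers`, the seed value, and the toy marker list are all assumptions made for illustration.

```python
import random


def subsample_markers(markers, n, seed=0):
    """Return n markers sampled uniformly without replacement, in input order."""
    if n >= len(markers):
        return list(markers)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    keep = sorted(rng.sample(range(len(markers)), n))
    return [markers[i] for i in keep]


# Toy marker list of (chromosome, position) pairs; positions are made up.
markers = [("B1", p) for p in range(1, 501)] + [("F2", p) for p in range(1, 501)]
subset = subsample_markers(markers, 100, seed=42)
print(len(subset))
```

Sampling indices rather than the markers themselves makes it straightforward to restore the original order, which most downstream VCF tools expect.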
References:
- Armstrong et al. 2024, DOI: 10.1073/pnas.2402924121