Data from: Unraveling the genomic diversity and admixture history of captive tigers in the United States
Data files
Sep 06, 2024 version files 37.45 GB
- genotypes.highcov.vcf.gz (21.85 GB)
- genotypes.highcov.vcf.gz.tbi (2.32 MB)
- highcov_tigers_phased_headerv1.sorted.vcf.gz (1.08 GB)
- highcov_tigers_phased_headerv1.sorted.vcf.gz.tbi (1.95 MB)
- lowcov-all86-impute.vcf.gz (14.51 GB)
- lowcov-all86-impute.vcf.gz.tbi (2.29 MB)
- README.md (2.33 KB)
Abstract
Genomic studies of rare and endangered species have focused broadly on describing diversity patterns and resolving phylogenetic relationships, with the overarching goal of informing conservation efforts. However, many studies do not consider genetic reserves that are potentially housed in captive populations. For tigers (Panthera tigris) in particular, captive individuals vastly outnumber those in the wild, and their diversity remains largely unexplored. Here, we present the first large-scale genetic study of the private (non-zoo) captive tiger population in the United States (U.S.), also known as ‘Generic’ tigers. We find that the U.S. Generic tiger population has an admixture fingerprint comprising all six extant wild tiger subspecies (P. t. altaica, Amur; P. t. tigris, Bengal; P. t. corbetti, Indochinese; P. t. jacksoni, Malayan; P. t. amoyensis, South China; P. t. sumatrae, Sumatran). We show that the Generic tiger population has a comparable amount of genetic diversity to most wild subspecies, relatively few private variants, and fewer deleterious mutations. We also observe inbreeding coefficients that are similar to wild populations, suggesting that inbreeding in captive populations is not as prevalent as previously thought, although there are some individuals within the Generic population that are quite inbred. Our results reflect the complex demographic history of the Generic tiger population in the U.S. Additionally, we develop a reference panel for tigers and show that it can be used with imputation to accurately distinguish individuals and assign ancestry even with ultra-low coverage (0.25×) data. We anticipate this comprehensive study and panel will propel future research and preservation of tigers in the U.S. and globally.
README: Data from "Unraveling the Genomic Diversity and Admixture History of Captive Tigers in the United States"
https://doi.org/10.5061/dryad.k0p2ngff1
A total of 155 tiger samples were collected opportunistically during routine veterinary care at sanctuary facilities (Supplementary Table 1) by veterinary and sanctuary staff. DNA was extracted from all samples using a Qiagen DNeasy kit (Cat. No. 69504), and libraries were prepared using a modified Nextera library prep protocol. 78 of these samples were sequenced to between approximately 2× and 5× depth. An additional 99 publicly available (as of December 2019) tiger genome samples were downloaded from NCBI. Reads were mapped to the GenTig1.0 genome using BWA-MEM v0.7.17, and variant calling was subsequently performed by Gencove using the Genome Analysis Toolkit (GATK) v4.1.4.1 according to best practices. Initial variant calling was performed on all samples available at the time, excluding those sequenced at 0.25×, for a total of 177 individuals. The rest of the samples (86 individuals) were imputed. The data in this repository include raw variant call files for the initial variant calling, the imputation, and a phased variant call file for future imputation purposes.
Please note: This file contains duplicates that were subsequently removed for downstream analysis. Please refer to the preprint and the GitHub/Zenodo repository for details.
Description of the data and file structure
Here we provide raw variant call files (VCFs) for a project investigating the diversity and ancestry of captive tigers in the United States. We are uploading three VCFs, each with its index file:
- Phased high-coverage variant calls and index file (highcov_tigers_phased_headerv1.sorted.vcf.gz/tbi)
- High-coverage (unphased) raw variant calls and index file (genotypes.highcov.vcf.gz/tbi)
- Low-coverage (imputed) raw variant calls and index file (lowcov-all86-impute.vcf.gz/tbi)
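As a quick sanity check after download, the sketch below lists the sample IDs in a VCF header and counts phased versus unphased genotype calls in the first records of a file. It is an illustration only, not part of the authors' pipeline: the function names `read_vcf_samples` and `count_phased_genotypes` are hypothetical, and it assumes the files follow the standard VCF layout (a `#CHROM` header line, a `FORMAT` column containing `GT`) and that Python's `gzip` module can stream the compressed files directly.

```python
import gzip
from itertools import islice


def read_vcf_samples(path):
    """Return the sample IDs from the #CHROM header line of a gzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 1-9 are fixed VCF fields; sample IDs start at column 10.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached records without seeing a header line
    raise ValueError(f"no #CHROM header line found in {path}")


def count_phased_genotypes(path, max_records=1000):
    """Count phased ('|') and unphased ('/') GT calls in the first max_records records."""
    phased = 0
    unphased = 0
    with gzip.open(path, "rt") as handle:
        records = (line for line in handle if not line.startswith("#"))
        for line in islice(records, max_records):
            fields = line.rstrip("\n").split("\t")
            # FORMAT (column 9) names the per-sample subfields; locate GT.
            format_keys = fields[8].split(":")
            if "GT" not in format_keys:
                continue
            gt_index = format_keys.index("GT")
            for sample_field in fields[9:]:
                genotype = sample_field.split(":")[gt_index]
                if "|" in genotype:
                    phased += 1
                elif "/" in genotype:
                    unphased += 1
    return phased, unphased
```

For example, `read_vcf_samples("highcov_tigers_phased_headerv1.sorted.vcf.gz")` should return the sample IDs in the phased panel, and `count_phased_genotypes` on the same file would be expected to report mostly phased calls. Only the first `max_records` records are read, so the check stays fast even on the larger files; for region queries that use the `.tbi` indexes, a dedicated tool such as bcftools or tabix is the better choice.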
Sharing/Access information
Raw whole-genome sequence data for this project can be found under NCBI Bioproject number PRJNA976043.
Code/Software
All code used to process and analyze this data can be found at: Jazlyn Mooney, & Ellie Armstrong. (2024). jaam92/Tigers: Tigers (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.13540809
Methods
This dataset was generated through whole-genome sequencing and subsequent variant calling of tigers (Panthera tigris). Variant calling was performed using GATK by Gencove Inc. Further details can be found in the available preprint: https://www.biorxiv.org/content/10.1101/2023.06.19.545608