Data from: implications of methodologies for integrating empirical kinships into ex situ population management using PMx: a case study of Baer’s Pochard (Aythya baeri) in North America
Data files
Nov 02, 2023 version files: 76.31 GB
- BaersPochard_Scenario1.pmxproj (104.15 KB)
- BaersPochard_Scenario2.pmxproj (333.95 KB)
- BaersPochard_Scenario3.pmxproj (492.34 KB)
- Individuals_1_20.zip (9.17 GB)
- Individuals_108_123.zip (9.43 GB)
- Individuals_124_141.zip (5.83 GB)
- Individuals_21_38.zip (10.40 GB)
- Individuals_39_55.zip (10.15 GB)
- Individuals_56_72.zip (10.58 GB)
- Individuals_73_89.zip (10.52 GB)
- Individuals_90-107.zip (10.09 GB)
- KinshipMatrix_N141_Scenario3.txt (179.89 KB)
- KinshipMatrix_N56_Scenario2.txt (33.88 KB)
- KinshpMatrix_N141_OriginalWithNegativeValues.txt (263.06 KB)
- PopulationMap (846 B)
- populations.snps.vcf (123.90 MB)
- README.md (7.55 KB)
Abstract
In this study, we aimed to understand the implications of integrating empirical kinships into the genetic management of an ex situ population of the endangered waterfowl Baer’s pochard (Aythya baeri) in North America. Single nucleotide polymorphism (SNP) data were generated for 141 Baer’s pochard using double digest restriction site-associated DNA sequencing (ddRAD), and empirical kinships were derived and integrated into the population management software PMx. We compared three scenarios for applying empirical kinships within PMx: 1) no empirical kinships applied, 2) empirical kinships applied to pedigree terminals, and 3) empirical kinships applied to the entire population of pedigree terminals and descendants. We found that most genetic summary statistics were affected through the calculation of the population’s mean kinship, which increased significantly after empirical kinships were integrated into our analyses. Our results also revealed the importance of understanding how molecular kinships derived from a particular estimator are scaled, particularly when that scale differs significantly from the scale of pedigree-based kinships. We describe the theory behind the affected genetic metrics, provide general guidance on incorporating empirical kinships into ex situ population management, and suggest sampling strategies to minimize the biases inherent in merging two types of kinship estimators.
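Because the abstract's main result runs through the population's mean kinship, a small Python sketch may help pin down that quantity. It assumes a symmetric kinship matrix with self-kinships (0.5 for a non-inbred bird) on the diagonal, defines an individual's mean kinship as its average kinship with every bird including itself, and uses the common convention gene diversity = 1 − mean kinship. It is not PMx's implementation, and the three-bird matrix is hypothetical.

```python
# Sketch only: population mean kinship (MK) and gene diversity (GD) from a
# kinship matrix. Assumes self-kinships on the diagonal and GD = 1 - MK;
# this is not PMx's own implementation.
from typing import List

Matrix = List[List[float]]


def individual_mean_kinships(kin: Matrix) -> List[float]:
    """Average kinship of each bird with every bird in the matrix, itself included."""
    n = len(kin)
    return [sum(row) / n for row in kin]


def population_mean_kinship(kin: Matrix) -> float:
    """Population MK: the average of the individual mean kinships."""
    mks = individual_mean_kinships(kin)
    return sum(mks) / len(mks)


def gene_diversity(kin: Matrix) -> float:
    """Gene diversity under the GD = 1 - MK convention."""
    return 1.0 - population_mean_kinship(kin)


if __name__ == "__main__":
    # Hypothetical three birds: a full-sibling pair (0.25) and one unrelated bird.
    kin = [
        [0.5, 0.25, 0.0],
        [0.25, 0.5, 0.0],
        [0.0, 0.0, 0.5],
    ]
    print(round(population_mean_kinship(kin), 4))  # 0.2222
    print(round(gene_diversity(kin), 4))  # 0.7778
```

Under these definitions, raising any pairwise kinship (for example, replacing a pedigree value of 0 with a positive empirical value) raises population mean kinship and lowers gene diversity.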
README
This README file was generated on 2023-10-03 by Asako Chaille.
GENERAL INFORMATION
- Title of Dataset: Implications of methodologies for integrating empirical kinships into ex situ population management using PMx: A case study of Baer’s Pochard (Aythya baeri) in North America
- Author Information
  First Author Contact Information
  Name: Asako Chaille
  Institution: San Diego Zoo Wildlife Alliance
  Address: San Diego, CA, USA
  Email: achaille@sdzwa.org
- Date of data collection (single date, range, approximate date): Birds were sampled for blood between 2017 and 2019. ddRAD data were generated and analyzed between 2019 and 2021.
- Geographic location of data collection: Birds were sampled across 21 participating North American facilities. ddRAD library preparation and sequencing were done at the Genomic and Bioinformatics Service, Texas A&M AgriLife Research laboratory.
- Information about funding sources that supported the collection of the data: This work was funded by Akron Zoo, Buttonwood Park Zoo, Minnesota Zoo, Pinola Conservancy, and San Diego Zoo Wildlife Alliance.
SHARING/ACCESS INFORMATION
- Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain
- Links to publications that cite or use the data:
Chaille, A. Y., Lacy, R. C., Putnam, A. S., Toste, J. L., & Ivy, J. A. (2023). Implications of methodologies for integrating empirical kinships into ex situ population management using PMx: A case study of Baer’s Pochard (Aythya baeri) in North America. Journal of Heredity.
- Links to other publicly accessible locations of the data: None
- Links/relationships to ancillary data sets: None
- Was data derived from another source? No
  A. If yes, list source(s): NA
- Recommended citation for this dataset:
Chaille, A. Y., Lacy, R. C., Putnam, A. S., Toste, J. L., & Ivy, J. A. (2023). Data from: Implications of methodologies for integrating empirical kinships into ex situ population management using PMx: A case study of Baer’s Pochard (Aythya baeri) in North America. Dryad Digital Repository. https://doi.org/10.5061/dryad.6t1g1jx3m
DATA & FILE OVERVIEW
- File List:
A) Individuals_1_20.zip
B) Individuals_21_38.zip
C) Individuals_39_55.zip
D) Individuals_56_72.zip
E) Individuals_73_89.zip
F) Individuals_90_107.zip
G) Individuals_108_123.zip
H) Individuals_124_141.zip
I) PopulationMap.txt
J) populations.snps.vcf
K) KinshipMatrix_N56_Scenario2.txt
L) KinshipMatrix_N141_Scenario3.txt
M) KinshipMatrix_N141_OriginalWithNegativeValues.txt
N) BaersPochard_Scenario1.pmxproj
O) BaersPochard_Scenario2.pmxproj
P) BaersPochard_Scenario3.pmxproj
- Relationship between files, if important: Files A-H are raw ddRAD sequence data (paired-end) for 141 sampled Baer's pochard.
- Additional related data collected that was not included in the current data package: None
- Are there multiple versions of the dataset? No
  A. If yes, name of file(s) that was updated: NA
    i. Why was the file updated? NA
    ii. When was the file updated? NA
#########################################################################
DATA-SPECIFIC INFORMATION FOR:
Individuals_1_20.zip
Individuals_21_38.zip
Individuals_39_55.zip
Individuals_56_72.zip
Individuals_73_89.zip
Individuals_90_107.zip
Individuals_108_123.zip
Individuals_124_141.zip
These files contain raw ddRAD sequence data (paired-end) for 141 sampled Baer's pochard in FASTQ format.
Data are divided into multiple zipped folders by sample number 1-141, and not by individual ID.
#########################################################################
DATA-SPECIFIC INFORMATION FOR: PopulationMap.txt
This is the population map input file used for the STACKS analysis.
- Number of variables: 2
- Number of cases/rows: 141
- Variable List:
- 1st column: studbook number (individual identifier) for all 141 birds sampled for the study
- 2nd column: population assignment (all birds assigned to population 1)
- Missing data codes: None
- Specialized formats or other abbreviations used: None
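A STACKS population map is a plain-text file with one sample per line and its population label in the second column. The Python sketch below shows that layout for this dataset under two assumptions: the columns are tab-separated, and studbook numbers are kept as strings so leading zeros (for example '024') are not lost. The helper names are hypothetical.

```python
# Sketch only: write and read a two-column population map. Assumes tab
# separation and the single-population layout described above (every bird = 1).
import io
from typing import Dict, Iterable, TextIO


def write_popmap(ids: Iterable[str], out: TextIO, pop: str = "1") -> None:
    """Write one 'studbook<TAB>population' line per bird."""
    for sample_id in ids:
        out.write(f"{sample_id}\t{pop}\n")


def read_popmap(handle: TextIO) -> Dict[str, str]:
    """Return {studbook number: population}, skipping blank lines."""
    popmap: Dict[str, str] = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        sample_id, pop = line.split("\t")
        popmap[sample_id] = pop
    return popmap


if __name__ == "__main__":
    buf = io.StringIO()
    write_popmap(["024", "285"], buf)  # hypothetical subset of studbook numbers
    buf.seek(0)
    print(read_popmap(buf))  # {'024': '1', '285': '1'}
```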
#########################################################################
DATA-SPECIFIC INFORMATION FOR: populations.snps.vcf
Output of SNPs and haplotypes from STACKS containing 26,454 filtered SNPs for 141 individuals.
- Number of variables: 150 (the 9 fixed VCF columns, CHROM through FORMAT, plus one genotype column for each of the 141 birds sampled for the study)
- Number of cases/rows: 26,454 polymorphic SNPs
- Variable List:
- CHROM: Catalog locus
- POS: SNP location
- ID: Locus number not applicable (.)
- REF: Primary SNP allele
- ALT: Alternative SNP allele
- QUAL: Not applicable '.'
- FILTER: Passed the filtering steps
- INFO: NS = number of samples with data; AF = allele frequency
- FORMAT: The genotype fields reported for each individual (GT:DP:AD:GQ:GL, i.e., genotype, read depth, allelic depths, genotype quality, and genotype likelihoods)
- Columns '024' through '285': genotype data for the 141 birds, each column headed by the bird's studbook identifier
- Missing data codes: '.'
- Specialized formats or other abbreviations used: Variant Call Format (VCF)
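For downstream work such as kinship estimation, the per-bird genotype columns are usually reduced to alternate-allele counts. The Python sketch below assumes biallelic SNPs, GT as the first FORMAT subfield, and '.' for missing alleles; the two-bird VCF in the demo is invented for illustration and is not an excerpt of populations.snps.vcf.

```python
# Sketch only: convert the GT subfield of each sample column to an
# alternate-allele count (0, 1, 2), or None when missing. Assumes biallelic
# SNPs, GT as the first FORMAT subfield, and '.' for missing alleles.
import io
from typing import Dict, List, Optional, TextIO


def gt_to_dosage(sample_field: str) -> Optional[int]:
    gt = sample_field.split(":")[0]            # e.g. "0/1" from "0/1:12:..."
    alleles = gt.replace("|", "/").split("/")  # phased or unphased
    if any(a == "." for a in alleles):
        return None
    return sum(int(a) for a in alleles)


def read_dosages(handle: TextIO) -> Dict[str, List[Optional[int]]]:
    """Map each sample column (studbook ID) to its dosages across all SNPs."""
    samples: List[str] = []
    dosages: Dict[str, List[Optional[int]]] = {}
    for line in handle:
        if line.startswith("##"):
            continue                            # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]                # after the 9 fixed columns
            dosages = {s: [] for s in samples}
            continue
        for sample, value in zip(samples, fields[9:]):
            dosages[sample].append(gt_to_dosage(value))
    return dosages


if __name__ == "__main__":
    # Hypothetical two-bird, two-SNP VCF.
    demo = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t024\t285\n"
        "1\t10\t.\tA\tG\t.\tPASS\tNS=2;AF=0.25\tGT:DP\t0/1:12\t0/0:9\n"
        "1\t25\t.\tC\tT\t.\tPASS\tNS=1;AF=0.50\tGT:DP\t./.:0\t0/1:7\n"
    )
    print(read_dosages(io.StringIO(demo)))  # {'024': [1, None], '285': [0, 1]}
```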
#########################################################################
DATA-SPECIFIC INFORMATION FOR: KinshipMatrix_N56_Scenario2.txt
Empirical kinship text file for 56 pedigree terminals integrated into the PMx project for Scenario 2.
The first 56 rows list the 56 pedigree terminals by studbook identifier.
This is followed by a full matrix of pairwise empirical kinships (KING-robust) for every pair of individuals, in the order listed in the rows above.
Missing data codes: '-1'
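The layout described above (ID rows, then the matrix) can be read with a short parser. The Python sketch assumes whitespace-delimited values, no header or count line, and exactly 2N non-empty lines; none of these details is stated in this README, so they should be checked against the actual file. The same layout applies to the two N = 141 files.

```python
# Sketch only: read a kinship file laid out as N studbook-ID rows followed by an
# N x N matrix. Assumptions not confirmed by this README: whitespace-delimited
# values, no header or count line, exactly 2N non-empty lines.
import io
from typing import List, Optional, TextIO, Tuple

MISSING = -1.0  # missing-data code documented for these files
Row = List[Optional[float]]


def read_kinship_file(handle: TextIO) -> Tuple[List[str], List[Row]]:
    lines = [line.split() for line in handle if line.strip()]
    if len(lines) % 2:
        raise ValueError("expected N ID rows followed by N matrix rows")
    n = len(lines) // 2
    ids = [row[0] for row in lines[:n]]
    matrix: List[Row] = []
    for row in lines[n:]:
        if len(row) != n:
            raise ValueError(f"matrix row has {len(row)} values, expected {n}")
        values = [float(v) for v in row]
        matrix.append([None if v == MISSING else v for v in values])
    return ids, matrix


def is_symmetric(matrix: List[Row], tol: float = 1e-9) -> bool:
    """True if k[i][j] == k[j][i] everywhere, with missing values matching too."""
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = matrix[i][j], matrix[j][i]
            if a is None or b is None:
                if a is not b:
                    return False
            elif abs(a - b) > tol:
                return False
    return True


if __name__ == "__main__":
    demo = "024\n285\n0.5 -1\n-1 0.5\n"  # hypothetical two-bird file
    ids, kin = read_kinship_file(io.StringIO(demo))
    print(ids, kin, is_symmetric(kin))
    # ['024', '285'] [[0.5, None], [None, 0.5]] True
```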
#########################################################################
DATA-SPECIFIC INFORMATION FOR: KinshipMatrix_N141_Scenario3.txt
Empirical kinship text file for all 141 birds in the population integrated into the PMx project for Scenario 3.
The first 141 rows list all 141 birds by studbook identifier.
This is followed by a full matrix of pairwise empirical kinships (KING-robust) for every pair of individuals, in the order listed in the rows above.
Missing data codes: '-1'
#########################################################################
DATA-SPECIFIC INFORMATION FOR: KinshipMatrix_N141_OriginalWithNegativeValues.txt
The original empirical kinship text file for all 141 birds in the population prior to truncating negative values to '0'.
The first 141 rows list all 141 birds by studbook identifier.
This is followed by a full matrix of pairwise empirical kinships (KING-robust) for every pair of individuals, in the order listed in the rows above.
Missing data codes: '-1'
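The truncation step that distinguishes this file from the Scenario 3 matrix can be sketched in a few lines of Python. How the '-1' missing code was handled during truncation is not stated here; the sketch assumes it was left untouched, which means a genuine estimate of exactly -1 could not be told apart from missing data. Because truncation only ever replaces a value with a larger one, it can raise, but never lower, any mean kinship computed from the non-missing entries.

```python
# Sketch only: set negative empirical kinships to 0. Assumption: the '-1'
# missing-data code is preserved rather than truncated.
from typing import List

MISSING = -1.0


def truncate_negative_kinships(matrix: List[List[float]]) -> List[List[float]]:
    return [
        [v if v == MISSING or v >= 0.0 else 0.0 for v in row]
        for row in matrix
    ]


if __name__ == "__main__":
    raw = [[0.5, -0.08, -1.0], [-0.08, 0.5, 0.12], [-1.0, 0.12, 0.5]]
    print(truncate_negative_kinships(raw))
    # [[0.5, 0.0, -1.0], [0.0, 0.5, 0.12], [-1.0, 0.12, 0.5]]
```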
#########################################################################
DATA-SPECIFIC INFORMATION FOR:
BaersPochard_Scenario1.pmxproj
BaersPochard_Scenario2.pmxproj
BaersPochard_Scenario3.pmxproj
PMx project files for Scenario 1, in which no empirical kinships were applied; Scenario 2, in which empirical kinships were applied to pedigree terminals; and Scenario 3, in which empirical kinships were applied to the entire population.
The PMx software (scti.tools/pmx/) is required to open the PMx project files. Each project file is itself a compressed archive in the standard ZIP format.
PMx version 1.6.20200804 was used to run these files for the manuscript.
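Since a .pmxproj file is a standard ZIP archive, its contents can be listed without PMx using Python's zipfile module. This sketch relies only on that statement; the names of the entries inside are whatever PMx wrote and are not documented here, and the helper name is hypothetical.

```python
# Sketch only: list the entries of a PMx project file with the standard-library
# zipfile module. A file that is not a ZIP archive raises zipfile.BadZipFile.
import zipfile
from typing import BinaryIO, List, Union


def list_pmxproj_entries(source: Union[str, BinaryIO]) -> List[str]:
    """Return entry names; source may be a path or an open binary file object."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()
```

For example, `list_pmxproj_entries("BaersPochard_Scenario1.pmxproj")` would return the names of the files packed into the Scenario 1 project. Opening the archive this way is read-only and does not modify the project.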
#########################################################################
CODE/SOFTWARE
A custom bioinformatics pipeline (available at https://github.com/apwilder/StacksParameterSelection) was used to select optimal parameters (m, M, n) for STACKS based on the guidelines of Paris et al. (2017).
Methods
DNA extracted from whole blood of each of the 141 sampled individuals was sent to the Genomic and Bioinformatics Service, Texas A&M AgriLife Research laboratory, for ddRAD library preparation and sequencing; 500 ng of DNA was provided per individual. Paired-end reads approximately 150 bp in length were produced on a single lane of the Illumina NovaSeq 6000 S2 X platform. Demultiplexed sequence data were obtained as compressed FASTQ files (fastq.gz) containing the raw paired-end reads. Data filtering and SNP discovery were performed in STACKS v2.41 (Catchen et al. 2013, Rochette et al. 2019). Sequence data were first cleaned with the program process_radtags, which removed reads with an uncalled base or a low quality score (raw Phred score <10). A custom bioinformatics pipeline (available at https://github.com/apwilder/StacksParameterSelection) was then used to select optimal STACKS parameters (m, M, n) following the guidelines of Paris et al. (2017). The pipeline ran iterations of the STACKS de novo program, varying one parameter at a time (m, M, or n) while holding the other two at their default settings (m=3, M=2, n=1). Tested values ranged from 2 to 5 for the maximum distance allowed between stacks (-M) and from 1 to 5 for the minimum depth of coverage required to create a stack (-m). A catalog was assembled from consensus loci, with the number of mismatches allowed between sample loci when building the catalog (-n) tested from 1 to 5. The parameter values that maximized the number of total loci, polymorphic loci, and SNPs genotyped in at least 80% of individuals (r = 0.80) were used for downstream analyses. Filtered reads were aligned into identical sequences, or 'stacks', and putative loci were identified de novo by comparing stacks. Putative loci (sets of stacks) were then matched against the catalog.
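The one-at-a-time parameter search can be expressed as a simple loop. This Python sketch follows the procedure as described here and is not the code in the StacksParameterSelection repository: score() is a hypothetical function that would run STACKS with the given m, M and n and collapse total loci, polymorphic loci and r80 SNPs into one number (a simplification of the three criteria), and ties are resolved toward the first value tested.

```python
# Sketch only: one-parameter-at-a-time selection of STACKS m, M and n, as
# described above. score() is a hypothetical stand-in for running STACKS and
# summarising total loci, polymorphic loci and r80 SNPs as one number.
from typing import Callable, Dict, Iterable

DEFAULTS = {"m": 3, "M": 2, "n": 1}
RANGES = {"m": range(1, 6), "M": range(2, 6), "n": range(1, 6)}


def sweep(
    score: Callable[[Dict[str, int]], float],
    defaults: Dict[str, int] = DEFAULTS,
    ranges: Dict[str, Iterable[int]] = RANGES,
) -> Dict[str, int]:
    """For each parameter, return the tested value with the highest score."""
    best: Dict[str, int] = {}
    for name, values in ranges.items():
        results = []
        for value in values:
            params = dict(defaults)  # the other two parameters stay at defaults
            params[name] = value
            results.append((value, score(params)))
        # max() keeps the first maximum, so ties go to the earliest value tested
        best[name] = max(results, key=lambda r: r[1])[0]
    return best


if __name__ == "__main__":
    # Hypothetical score that peaks at m=3, M=4, n=4.
    def toy(p: Dict[str, int]) -> float:
        return -((p["m"] - 3) ** 2) - (p["M"] - 4) ** 2 - (p["n"] - 4) ** 2

    print(sweep(toy))  # {'m': 3, 'M': 4, 'n': 4}
```

Because each parameter is tuned with the others fixed at defaults, interactions between m, M and n are not explored; that is a property of the described procedure, not of this sketch.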
Reads from each sample were then aligned one locus at a time to identify SNPs across the entire sample set, and each individual was genotyped at each SNP. Finally, SNPs were filtered with a minor allele frequency (MAF) cut-off of 0.02 to remove SNPs potentially arising from genotyping error, and only loci shared by at least 90% of the population (r = 0.90) were retained for further analyses. A higher r value than the one used for parameter selection was ultimately chosen for the final dataset because enough SNPs were available to support a more consistent pool of loci across individuals in downstream analyses. The KING algorithm in PLINK v2.00a was used to calculate pairwise KING-robust kinships between individuals in the dataset.
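The final filtering and kinship steps are sketched below in Python on a toy genotype matrix (alternate-allele counts, None for missing). Two simplifications are assumptions: the 90% threshold is applied per SNP rather than per locus as STACKS does with r, and the kinship uses the KING-robust form φ_ij = (N_het,het − 2·N_opp.hom) / (N_het,i + N_het,j) from Manichaikul et al. (2010). PLINK 2 may normalize its estimate differently, so the output is illustrative rather than a reproduction of PLINK's values.

```python
# Sketch only: MAF / call-rate filtering and a KING-robust kinship estimate.
# Genotypes are alternate-allele counts (0, 1, 2) or None, one list per SNP.
from typing import List, Optional

Genotypes = List[List[Optional[int]]]  # [snp][individual]


def filter_snps(snps: Genotypes, min_maf: float = 0.02, min_call: float = 0.90) -> Genotypes:
    """Keep SNPs called in >= min_call of individuals with MAF >= min_maf."""
    kept = []
    for row in snps:
        called = [g for g in row if g is not None]
        if not called or len(called) / len(row) < min_call:
            continue
        alt_freq = sum(called) / (2 * len(called))
        if min(alt_freq, 1 - alt_freq) >= min_maf:
            kept.append(row)
    return kept


def king_robust(snps: Genotypes, i: int, j: int) -> float:
    """(N_het,het - 2 * N_opposite_hom) / (N_het_i + N_het_j) over shared calls."""
    het_het = opposite_hom = het_i = het_j = 0
    for row in snps:
        gi, gj = row[i], row[j]
        if gi is None or gj is None:
            continue  # use only SNPs called in both individuals
        het_i += gi == 1
        het_j += gj == 1
        het_het += gi == 1 and gj == 1
        opposite_hom += {gi, gj} == {0, 2}
    denom = het_i + het_j
    if denom == 0:
        raise ValueError("no heterozygous calls; kinship undefined")
    return (het_het - 2 * opposite_hom) / denom


if __name__ == "__main__":
    # Hypothetical genotypes: rows are SNPs, columns are three birds.
    snps = [[0, 0, 2], [1, 1, 0], [2, 2, 0], [1, 1, 1], [0, 1, None]]
    kept = filter_snps(snps)
    print(len(kept), round(king_robust(kept, 0, 1), 3))  # 4 0.5
```

Identical genotypes give 0.5, matching the pedigree self-kinship, while opposite homozygotes pull the estimate down; unrelated pairs can therefore come out negative, which is why the dataset includes a version of the matrix with negative values retained.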
Usage notes
ddRAD paired-end sequence data for 141 sampled Baer's pochard:
- BAPO_Seq_1_20
- BAPO_Seq_21_38
- BAPO_Seq_39_55
- BAPO_Seq_56_72
- BAPO_Seq_73_89
- BAPO_Seq_90_107
- BAPO_Seq_108_123
- BAPO_Seq_124_141
Stacks population map input file:
- PopulationMap
Stacks output file (haplotype and SNPs):
- populations.snps.vcf
Empirical kinship files imported into PMx:
- KinshipMatrix_N56_Scenario2.txt
- KinshipMatrix_N141_Scenario3.txt
- KinshipMatrix_N141_OriginalWithNegativeValues.txt
The PMx software (scti.tools/pmx/) is required to open the PMx project files:
- BaersPochard_Scenario1.pmxproj
- BaersPochard_Scenario2.pmxproj
- BaersPochard_Scenario3.pmxproj