Data from: Implications of methodologies for integrating empirical kinships into ex situ population management using PMx: A case study of Baer’s Pochard (Aythya baeri) in North America

Chaille, Asako 1; Lacy, Robert 2; Putnam, Andrea 1; Toste, Jamie 3; Wilder, Aryn 4; Ivy, Jamie 2

Research facility: San Diego Zoo Institute for Conservation Research

Published Nov 02, 2023 on Dryad. https://doi.org/10.5061/dryad.6t1g1jx3m

Abstract

In this study, we aimed to understand the implications of integrating empirical kinships into the genetic management of an ex situ population of the endangered waterfowl, Baer’s pochard (Aythya baeri), in North America. Single nucleotide polymorphism data were generated for 141 Baer’s pochard using double digest restriction site-associated DNA sequencing, and empirical kinships were derived and integrated into the population management software PMx. We compared three scenarios for applying empirical kinships within PMx: 1) no empirical kinships applied, 2) empirical kinships applied for pedigree terminals, and 3) empirical kinships applied for the entire population of pedigree terminals and descendants. We determined that most genetic summary statistics were impacted through the calculation of the population’s mean kinship, which increased significantly after empirical kinships were integrated into our analyses. Our results also revealed the importance of understanding how molecular kinships derived from a particular estimator are scaled, particularly when that scale differs significantly from pedigree-based kinships. We describe the theory behind the genetic metrics impacted, provide general guidance on incorporating empirical kinships into ex situ population management, and suggest sampling strategies to minimize the biases inherent in merging two types of kinship estimators.

This README file was generated on 2023-10-03 by Asako Chaille.

GENERAL INFORMATION

Title of Dataset: Implications of methodologies for integrating empirical kinships into ex situ population management using PMx: A case study of Baer’s Pochard (Aythya baeri) in North America
Author Information
First Author Contact Information
Name: Asako Chaille
San Diego Zoo Wildlife Alliance
Address: San Diego, CA USA
Email: achaille@sdzwa.org
Date of data collection (single date, range, approximate date): Birds were sampled for blood between 2017 and 2019. ddRAD data were generated and analyzed between 2019 and 2021.
Geographic location of data collection: Birds were sampled across 21 participating North American facilities. The ddRAD library preparation and sequencing were done at the Genomic and Bioinformatics Service, Texas A&M AgriLife Research laboratory.
Information about funding sources that supported the collection of the data: This work was funded by Akron Zoo, Buttonwood Park Zoo, Minnesota Zoo, Pinola Conservancy, and San Diego Zoo Wildlife Alliance.

SHARING/ACCESS INFORMATION

Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain
Links to publications that cite or use the data:

Chaille, A. Y., Lacy, R. C., Putnam, A. S., Toste, J. L., & Ivy, J. A. (2023). Implications of methodologies for integrating empirical kinships into ex situ population management using PMx: A case study of Baer’s Pochard (Aythya baeri) in North America. Journal of Heredity.

Links to other publicly accessible locations of the data: None
Links/relationships to ancillary data sets: None
Was data derived from another source? No
A. If yes, list source(s): NA
Recommended citation for this dataset:

Chaille, A. Y., Lacy, R. C., Putnam, A. S., Toste, J. L., & Ivy, J. A. (2023). Data from: Implications of methodologies for integrating empirical kinships into ex situ population management using PMx: A case study of Baer’s Pochard (Aythya baeri) in North America. Dryad Digital Repository. https://doi.org/10.5061/dryad.6t1g1jx3m

DATA & FILE OVERVIEW

File List:

A) Individuals_1_20.zip
B) Individuals_21_38.zip
C) Individuals_39_55.zip
D) Individuals_56_72.zip
E) Individuals_73_89.zip
F) Individuals_90_107.zip
G) Individuals_108_123.zip
H) Individuals_124_141.zip
I) PopulationMap.txt
J) populations.snps.vcf
K) KinshipMatrix_N56_Scenario2.txt
L) KinshipMatrix_N141_Scenario3.txt
M) KinshipMatrix_N141_OriginalWithNegativeValues.txt
N) BaersPochard_Scenario1.pmxproj
O) BaersPochard_Scenario2.pmxproj
P) BaersPochard_Scenario3.pmxproj

Relationship between files, if important: Files A-H are raw ddRAD sequence data (paired-end) for 141 sampled Baer's pochard.
Additional related data collected that was not included in the current data package: None
Are there multiple versions of the dataset? No
A. If yes, name of file(s) that was updated: NA
i. Why was the file updated? NA
ii. When was the file updated? NA

#########################################################################

DATA-SPECIFIC INFORMATION FOR:

Individuals_1_20.zip
Individuals_21_38.zip
Individuals_39_55.zip
Individuals_56_72.zip
Individuals_73_89.zip
Individuals_90_107.zip
Individuals_108_123.zip
Individuals_124_141.zip

These files contain raw ddRAD sequence data (paired-end) for 141 sampled Baer's pochard in FASTQ format.
Data are divided into multiple zipped folders by sample number 1-141, and not by individual ID.

#########################################################################

DATA-SPECIFIC INFORMATION FOR: PopulationMap.txt

This is the population map input file used for the STACKS analysis.

Number of variables: 2
Number of cases/rows: 141
Variable List:
- 1st column: studbook number (individual identifier) for all 141 birds sampled for the study
- 2nd column: population assignment - all given 1
Missing data codes: None
Specialized formats or other abbreviations used: None
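A minimal sketch for reading the two-column population map described above. It assumes the columns are separated by tabs or other whitespace, which is the usual STACKS convention but is not stated in this README. The function name and the example rows are hypothetical.

```python
def parse_population_map(text):
    """Return a list of (studbook_id, population) pairs from population-map text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()  # assumption: whitespace- or tab-delimited
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {len(fields)}: {line!r}")
        # Keep studbook numbers as strings so leading zeros (e.g. '024') survive.
        records.append((fields[0], fields[1]))
    return records


# Made-up example rows; the real file has 141 rows, all assigned to population 1.
example = "024\t1\n031\t1\n285\t1\n"
popmap = parse_population_map(example)
```

With the real file, the same function could be called as parse_population_map(open("PopulationMap.txt").read()), after which len(popmap) should equal 141.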

#########################################################################

DATA-SPECIFIC INFORMATION FOR: populations.snps.vcf

Output of SNPs and haplotypes from STACKS containing 26,454 filtered SNPs for 141 individuals.

Number of variables: 150 columns (9 fixed VCF columns plus 141 sample columns, one per bird sampled for the study)
Number of cases/rows: 26,454 polymorphic SNPs
Variable List:
- CHROM: Catalog locus
- POS: SNP location
- ID: Locus number not applicable (.)
- REF: Primary SNP allele
- ALT: Alternative SNP allele
- QUAL: Not applicable '.'
- FILTER: Passed the filtering steps
- INFO: NS = number of samples with data; AF = allele frequency
- FORMAT: The format used in the VCF file to describe haplotypes for each individual (GT:DP:AD:GQ:GL)
- Columns '024' through '285': studbook identifiers of the 141 birds
Missing data codes: not applicable '.'
Specialized formats or other abbreviations used: Variant Call Format (VCF)
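The column layout above can be read with a few lines of standard-library code. This is a rough sketch rather than a full VCF parser: it assumes tab-delimited text with the nine fixed columns listed above followed by one column per bird, and the function name and example record are made up for illustration.

```python
def read_vcf(text):
    """Return (sample_ids, records) parsed from VCF-formatted text."""
    samples, records = [], []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            continue
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP:AD:GQ:GL
        # Missing calls ('./.') carry fewer sub-fields; zip() simply stops early.
        calls = {s: dict(zip(keys, v.split(":")))
                 for s, v in zip(samples, fields[9:])}
        records.append({"chrom": fields[0], "pos": int(fields[1]),
                        "ref": fields[3], "alt": fields[4], "calls": calls})
    return samples, records


# Made-up two-sample example; the real file has 141 sample columns.
example = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t024\t285\n"
    "1\t15\t.\tA\tG\t.\tPASS\tNS=1;AF=0.5\tGT:DP:AD:GQ:GL\t"
    "0/1:20:11,9:40:-30.1,-0.0,-35.2\t./.\n"
)
samples, records = read_vcf(example)
```

Applied to populations.snps.vcf, len(samples) should be 141 and len(records) should be 26,454 if the file matches the description above.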

#########################################################################

DATA-SPECIFIC INFORMATION FOR: KinshipMatrix_N56_Scenario2.txt

Empirical kinship text file for 56 pedigree terminals integrated into the PMx project for Scenario 2.

The first 56 rows list the 56 pedigree terminals by studbook identifier.
These rows are followed by a full matrix of pairwise empirical kinships (KING-robust) between each pair of individuals, in the same order as the identifiers listed above.

Missing data codes: '-1'
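The layout described above (N identifier rows followed by an N x N matrix) could be read along these lines. This is a sketch under assumptions: one identifier per row, and matrix values separated by tabs, spaces, or commas. The delimiter is not stated in this README, and the function name and example values are hypothetical.

```python
import re

MISSING = -1.0  # missing data code given in this README


def parse_kinship_file(text, n):
    """Return (ids, matrix) for a file with n ID rows followed by an n x n matrix.

    Entries equal to the missing data code are returned as None.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2 * n:
        raise ValueError(f"expected at least {2 * n} non-empty lines, got {len(lines)}")
    ids = lines[:n]
    matrix = []
    for ln in lines[n:2 * n]:
        # Assumption: values are delimited by commas and/or whitespace.
        row = [float(x) for x in re.split(r"[,\s]+", ln) if x]
        if len(row) != n:
            raise ValueError(f"expected {n} values per matrix row, got {len(row)}")
        matrix.append([None if v == MISSING else v for v in row])
    return ids, matrix


# Made-up three-bird example; the Scenario 2 file has 56 birds.
example = "024\n031\n285\n0.5\t0.12\t-1\n0.12\t0.5\t0.0\n-1\t0.0\t0.5\n"
ids, matrix = parse_kinship_file(example, 3)
```

For the Scenario 2 file the call would use n=56, and for the two 141-bird files n=141.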

#########################################################################

DATA-SPECIFIC INFORMATION FOR: KinshipMatrix_N141_Scenario3.txt

Empirical kinship text file for all 141 birds in the population integrated into the PMx project for Scenario 3.

The first 141 rows list the 141 birds (pedigree terminals and descendants) by studbook identifier.
These rows are followed by a full matrix of pairwise empirical kinships (KING-robust) between each pair of individuals, in the same order as the identifiers listed above.

Missing data codes: '-1'
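Because the abstract reports that most summary statistics changed through the population's mean kinship, a small illustration of that quantity may help. The sketch below uses the common definition in which an individual's mean kinship is the average of its kinships to every individual in the population, itself included. It is an unweighted illustration only and is not PMx's implementation, whose handling of missing values and weighting is not described in this README. Function names and matrix values are made up.

```python
def mean_kinships(matrix, missing=-1.0):
    """Per-individual mean kinship: average of each row, skipping missing codes."""
    result = []
    for row in matrix:
        values = [v for v in row if v != missing]
        result.append(sum(values) / len(values) if values else None)
    return result


def population_mean_kinship(matrix, missing=-1.0):
    """Unweighted average of the per-individual mean kinships."""
    mks = [mk for mk in mean_kinships(matrix, missing) if mk is not None]
    return sum(mks) / len(mks)


# Made-up 3 x 3 kinship matrix with one missing pair (coded -1).
example = [
    [0.5, 0.25, 0.0],
    [0.25, 0.5, -1.0],
    [0.0, -1.0, 0.5],
]
per_bird = mean_kinships(example)
overall = population_mean_kinship(example)
```

Raising any off-diagonal kinship in the matrix raises the population value, which is the direction of change the abstract reports after empirical kinships were integrated.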

#########################################################################

DATA-SPECIFIC INFORMATION FOR: KinshipMatrix_N141_OriginalWithNegativeValues.txt

The original empirical kinship text file for all 141 birds in the population prior to truncating negative values to '0'.

Missing data codes: '-1'
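The truncation step mentioned above can be sketched as follows. One assumption is made loudly here: a value of exactly -1 is treated as the missing data code and left untouched, while every other negative estimate is set to 0. How the authors distinguished the two cases is not stated in this README, and the function name and example values are hypothetical.

```python
def truncate_negative_kinships(matrix, missing=-1.0):
    """Set negative kinship estimates to 0 while preserving the missing code."""
    return [
        [v if v == missing else max(v, 0.0) for v in row]
        for row in matrix
    ]


# Made-up values: -0.03 is a negative estimate, -1.0 is the missing code.
original = [
    [0.5, -0.03, -1.0],
    [-0.03, 0.5, 0.12],
    [-1.0, 0.12, 0.5],
]
truncated = truncate_negative_kinships(original)
```

The input matrix is not modified, so the original and truncated versions can be compared side by side.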

#########################################################################

DATA-SPECIFIC INFORMATION FOR:

BaersPochard_Scenario1.pmxproj
BaersPochard_Scenario2.pmxproj
BaersPochard_Scenario3.pmxproj

PMx project files for Scenario 1, in which no empirical kinships were applied; Scenario 2, in which empirical kinships were applied to pedigree terminals; and Scenario 3, in which empirical kinships were applied to the entire population.

The PMx software (scti.tools/pmx/) is required to open the PMx project files. Each project file is itself a compressed (zipped) archive that uses the standard file compression format.

For the manuscript, PMx version 1.6.20200804 was used to run those files.
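Since the project files are described as standard zipped archives, their contents can presumably be listed without PMx. The sketch below uses Python's zipfile module. The names of the files inside a real .pmxproj archive are not documented in this README, so the example builds a stand-in archive in memory with a hypothetical member name.

```python
import io
import zipfile


def list_project_contents(path_or_file):
    """Return the member names of a zip-compressed project file."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.namelist()


# Stand-in archive built in memory; 'example.txt' is a hypothetical member name.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.txt", "placeholder")
buffer.seek(0)
names = list_project_contents(buffer)
```

For a downloaded file, the call would be list_project_contents("BaersPochard_Scenario1.pmxproj"). Opening and editing the project itself still requires PMx.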

#########################################################################

CODE/SOFTWARE

A custom bioinformatics pipeline (available at https://github.com/apwilder/StacksParameterSelection) was used to select optimal parameters (m, M, n) for STACKS based on the guidelines of Paris et al. (2017).

DNA extracted from whole blood of each of the 141 sampled individuals was sent to the Genomic and Bioinformatics Service, Texas A&M AgriLife Research laboratory, for ddRAD library preparation and sequencing. A total of 500 ng of DNA was provided from each individual. Paired-end sequences approximately 150 bp in length were produced on a single lane of the Illumina NovaSeq 6000 S2 X platform. Demultiplexed sequence data were obtained as compressed FASTQ files (fastq.gz) representing raw paired-end sequencing reads.

Data filtering and SNP discovery were performed in STACKS v2.41 (Catchen et al. 2013, Rochette et al. 2019). Initially, sequence data were cleaned with the program process_radtags by removing reads with an uncalled base or a low quality score (raw phred score <10). A custom bioinformatics pipeline (available at https://github.com/apwilder/StacksParameterSelection) was then used to select optimal parameters (m, M, n) for STACKS based on the guidelines of Paris et al. (2017). The pipeline ran iterations of the STACKS de novo program, varying one parameter at a time (m, M, or n) while holding the other two constant at default settings (m=3, M=2, n=1). The maximum distance allowed between stacks (-M) was tested from 2 to 5, and the minimum depth of coverage required to create a stack (-m) was tested from 1 to 5. A catalog was assembled from consensus loci, with the number of mismatches allowed between sample loci when building the catalog (-n) tested from 1 to 5. Parameter values that maximized the number of total loci, polymorphic loci, and SNPs genotyped in at least 80% of individuals (r=0.80) were then used for downstream analyses.

Filtered reads were aligned into identical sequences or ‘stacks’ and putative loci were then identified de novo by comparing stacks. Putative loci (sets of stacks) were then matched against the catalog.
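The one-parameter-at-a-time sweep described above can be enumerated as in the sketch below. It only lists the parameter combinations implied by the stated defaults and ranges. It does not run STACKS, and it is not taken from the StacksParameterSelection pipeline, whose internals are not described in this README. The function name is hypothetical.

```python
DEFAULTS = {"m": 3, "M": 2, "n": 1}
RANGES = {
    "m": range(1, 6),  # minimum depth of coverage to create a stack: 1 to 5
    "M": range(2, 6),  # maximum distance allowed between stacks: 2 to 5
    "n": range(1, 6),  # mismatches allowed when building the catalog: 1 to 5
}


def one_at_a_time_grid(defaults, ranges):
    """Vary each parameter over its range while the others stay at defaults."""
    runs = []
    for name, values in ranges.items():
        for value in values:
            params = dict(defaults)
            params[name] = value
            if params not in runs:  # the all-default combination recurs in each sweep
                runs.append(params)
    return runs


runs = one_at_a_time_grid(DEFAULTS, RANGES)
```

With these ranges the sweep yields 12 distinct combinations, far fewer than the 100 a full grid over the same values would require. Whether the actual pipeline deduplicates the default run in this way is an assumption.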
Reads from each sample were aligned one locus at a time to identify SNPs across the entire sample set for each locus, genotyping each individual at each SNP. Finally, SNPs were further filtered with a minor allele frequency (MAF) cut-off of 0.02 to remove potential SNPs that might have been generated by genotyping error, and loci shared by at least 90% of the population were retained for further analyses (r=0.90). A higher r-value than that used for parameter selection was ultimately chosen for the final dataset because the number of available SNPs supported identifying a more consistent pool of loci across individuals for downstream analyses. The KING algorithm in PLINK v2.00a was used to calculate pairwise KING-robust kinships between individuals in the dataset.
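To make the two final thresholds concrete, the sketch below applies a MAF cut-off of 0.02 and a 90% sharing requirement to the genotypes of a single biallelic SNP. It is an illustration only: STACKS applies these filters internally and evaluates the sharing threshold per locus rather than per SNP, so this is not a reimplementation. The function name and genotype lists are made up.

```python
def snp_passes(genotypes, min_maf=0.02, min_call_rate=0.90):
    """Check one biallelic SNP against call-rate and minor-allele-frequency cut-offs.

    genotypes: list of GT strings such as '0/0', '0/1', '1/1', or './.' (missing).
    """
    called = [g for g in genotypes if "." not in g]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False  # genotyped in too few individuals
    alleles = [a for g in called for a in g.replace("|", "/").split("/")]
    alt_freq = alleles.count("1") / len(alleles)
    maf = min(alt_freq, 1 - alt_freq)  # frequency of the rarer allele
    return maf >= min_maf


# Made-up examples with 10 individuals each.
common = ["0/0"] * 9 + ["0/1"]        # all called, MAF = 1/20 = 0.05
sparse = ["0/1"] * 8 + ["./."] * 2    # only 80% of individuals called
monomorphic = ["0/0"] * 10            # MAF = 0
results = [snp_passes(common), snp_passes(sparse), snp_passes(monomorphic)]
```

In a sample of 141 birds, a MAF of 0.02 corresponds to roughly 6 copies of the rarer allele among the 282 allele copies when every bird is genotyped.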
