Dryad

Data from: Harnessing the power of regional baselines for broad-scale genetic stock identification: a multistage, integrated, and cost-effective approach

Data files

Dec 27, 2023 version files 966.40 KB

Abstract

In mixed-stock fishery analyses, genetic stock identification (GSI) estimates the contribution of each population to a mixture and is typically conducted at a regional scale using genetic baselines specific to the stocks expected in that region. Often these regional baselines cannot be combined to produce broader geographical baselines because their populations and genetic markers do not overlap. When a mixture contains stocks spanning a wide area, a broad-scale baseline is created instead, but often at the cost of resolution. Here, we introduce a new GSI method that harnesses the resolution of baselines developed for regional applications when analyzing mixtures containing individuals from a broad geographic range. The method employs a multistage framework that allows disparate baselines to be used in a single integrated process, producing estimates along with the errors propagated from each stage. All individuals in the mixture sample must be genotyped for all genetic markers in the baselines used by the model, but the broad-scale and regional baselines need not overlap in genetic markers or populations.

We demonstrate our integrated multistage GSI model using a synthesized data set made up of Chinook salmon, Oncorhynchus tshawytscha, from the North Bering Sea of Alaska. The data set is designed to be run using the R package Ms.GSI and does not represent the composition of the real fishery. The results show improved accuracy for estimates from the integrated multistage framework compared to the conventional framework of separate hierarchical steps. The integrated multistage framework allows GSI over a wide geographic area without first developing a large-scale, high-resolution genetic baseline or dividing a mixture sample into smaller regions beforehand. This approach is more cost-effective than updating range-wide baselines with all regionally important markers.
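
The idea of carrying uncertainty from one stage into the next can be illustrated with a small toy sketch. This is not the Ms.GSI algorithm and does not use its API. The group names, the Dirichlet parameters, and the helper functions `dirichlet`, `combine_stages`, and `summarize` are all hypothetical, and the draws are simulated rather than taken from the data set. The sketch assumes that a broad-scale stage yields draws of broad-group proportions, that a regional stage yields draws of regional proportions within one broad group, and that the two stages are treated as independent, which is a simplification. Multiplying the draws one by one, rather than multiplying two point estimates, lets the spread from both stages show up in the final intervals.

```python
import random


def dirichlet(alpha, rng):
    """Draw one probability vector from a Dirichlet(alpha) distribution."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def combine_stages(broad_draw, regional_draw, target):
    """Split the broad-scale share of `target` among its regional groups."""
    combined = {name: p for name, p in broad_draw.items() if name != target}
    for name, p in regional_draw.items():
        combined[name] = broad_draw[target] * p
    return combined


def summarize(draws, name):
    """Return the mean and the 5th and 95th percentiles for one group."""
    values = sorted(d[name] for d in draws)
    n = len(values)
    mean = sum(values) / n
    return mean, values[int(0.05 * n)], values[int(0.95 * n) - 1]


# Hypothetical groups and simulated draws (assumptions, not real results).
rng = random.Random(1)
broad_names = ["Broad group 1", "Broad group 2"]
regional_names = ["Region A", "Region B", "Region C"]

draws = []
for _ in range(2000):
    # Stage 1: proportions of the mixture among broad-scale groups.
    broad = dict(zip(broad_names, dirichlet([60, 40], rng)))
    # Stage 2: proportions among regional groups within "Broad group 1".
    regional = dict(zip(regional_names, dirichlet([30, 20, 10], rng)))
    draws.append(combine_stages(broad, regional, target="Broad group 1"))

for name in regional_names + ["Broad group 2"]:
    mean, low, high = summarize(draws, name)
    print(f"{name}: mean={mean:.3f}, 90% interval=({low:.3f}, {high:.3f})")
```

Each combined draw still sums to one, so the result can be read as a single set of mixture proportions across the regional groups and the remaining broad group.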