Data and code from: Accounting for unobserved population dynamics and aging error in close-kin mark-recapture assessments

Data files

Version of Feb 07, 2024 (4.37 GB total)

- Dryad_repository.zip (4.37 GB)
- README.md (13.14 KB)

Abstract

Obtaining robust estimates of population abundance is a central challenge hindering the conservation and management of many threatened and exploited species. Close-kin mark-recapture (CKMR) is a genetics-based approach that has strong potential to improve monitoring of data-limited species by enabling estimates of abundance, survival, and other parameters for populations that are challenging to assess. However, CKMR models have received limited sensitivity testing under realistic population dynamics and sampling scenarios, impeding application of the method in population monitoring programs and stock assessments. Here, we use individual-based simulation to examine how unmodeled population dynamics and aging uncertainty affect the accuracy and precision of CKMR parameter estimates under different sampling strategies. We then present adapted models that correct the biases that arise from model misspecification. Our results demonstrate that a simple base-case CKMR model produces robust estimates of population abundance with stable populations that breed annually; however, if a population trend or non-annual breeding dynamics are present, or if year-specific estimates of abundance are desired, a more complex CKMR model must be constructed. In addition, we show that CKMR can generate reliable abundance estimates for adults from a variety of sampling strategies, including juvenile-focused sampling where adults are never directly observed (and aging error is minimal). Finally, we apply a CKMR model that has been adapted for population growth and intermittent breeding to two decades of genetic data from juvenile lemon sharks (Negaprion brevirostris) in Bimini, Bahamas, to demonstrate how application of CKMR to samples drawn solely from juveniles can contribute to monitoring efforts for highly mobile populations. 
Overall, this study expands our understanding of the biological factors and sampling decisions that cause bias in CKMR models, identifies key areas for future inquiry, and provides recommendations that can aid biologists in planning and implementing an effective CKMR study, particularly for long-lived data-limited species.

https://doi.org/10.5061/dryad.bk3j9kdkg

This compilation of data and code contains three primary folders and one spreadsheet, all accessible from the main directory:

Data_used_in_MS
JAGS_models
Scripts_used_in_MS
Simulation_log_key

Each of these folders/files, and the structure of their subfolders, is described in detail below.

Description of the data and file structure

Data_used_in_MS: This folder contains the final data presented in the manuscript (MS), as well as the code used to analyze them. The structure and content of this directory are as follows:
- 04_DataViz: This folder contains two primary scripts, as well as several functions that are sourced. The code assumes that the working directory is set to the same directory as the script, and that all folders and files from the repository are present.
  - mcmc analysis.R contains code for assessing convergence of the Markov chains, as well as cross-correlation and autocorrelation. Note, however, that the full output from JAGS is not included in this repository because the files are very large.
  - results_analysis_and_figures_markdown_v3.Rmd is the primary code we used to analyze, summarize, and plot our results. It pulls in various files from the “output” folder that is one level up.
  - functions contains various functions that are mostly vestigial and not used in the current version of the code, but may be useful.
- output: This folder contains the output from fitting various CKMR models to simulated data. There are many levels to this folder, but maintaining its (rather convoluted) file structure should allow the results_analysis_and_figures_markdown_v3.Rmd code to run as-is once output file locations and similar paths have been updated. The most important subfolders and their contents are summarized below.
  - Model.results contains output files from the CKMR model, including parameter estimates, summary statistics for the posterior distributions, and calculations of relative bias (some of which are recalculated in the results_analysis_and_figures_markdown_v3 code). The primary results files are those whose names begin with CKMR_results*. In these files, three columns may contain NAs: 1) the column imposters contains the number of aunt/niece (uncle/nephew, etc.) pairs that were included as half-sibling pairs (HSPs). Except in the simulations that explicitly tested the consequences of including these “imposters”, we did not include them as HSPs, so this column contains NA. The columns 2) POPs_detected and 3) POPs_expected were relevant only for the scenario in which adults were sampled alongside juveniles (the “sample all ages” scenario). When only juveniles were sampled, we did not include parent-offspring pairs (POPs) in the CKMR model, so these columns contain NA. Other saved output includes the pairwise comparison matrices for mothers and fathers (labeled mom.comps* and dad.comps*, respectively) and various summary files to be imported for analysis, such as lambda.df, objective3_mom.comps, and pop.size_all.
  - Bimini_dataset contains the results and output of simulations and real genetic data for a small population of Bimini lemon sharks, corresponding to section 3.5 of the associated MS.
  - The other three folders (crossCorr, multiennial_comps, and samples) contain subsets of data that were used at some point during the analysis, either collated into a new file (e.g., the files in multiennial_comps were collated into the RDS file objective3_mom.comps) or used transiently in the analysis script.
  - Population.simulations contains output from our data-generating model (DGM). The primary types of data here are information about aunt/uncle and niece/nephew pairs (files beginning with aunt.uncle*), a breakdown of offspring per parent for the last 50 years of each simulation (parents.breakdown*), the size of the total and breeding populations each year, used for comparison with the CKMR estimates (pop.size*), the sample data for each simulation (the main input to the estimation model, starting with sample.info*), and the true values of survival and lambda for each simulation (truth*). More details about the scripts used to generate these files can be found below, and in sections 2.1 and 2.2 of the associated MS.
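The relative-bias values stored in Model.results compare CKMR estimates with the simulated truth (truth* and pop.size* files). A minimal Python sketch of that comparison is shown below; it assumes the conventional definition (estimate - truth) / truth and uses the posterior median as the point estimate. Both are assumptions for illustration: the repository's R code may use a different posterior summary or report bias as a percentage, and the function name relative_bias is hypothetical.

```python
from statistics import median


def relative_bias(posterior_draws, true_value):
    """Relative bias of a posterior point estimate against the simulated truth.

    Assumes the conventional definition (estimate - truth) / truth and the
    posterior median as the point estimate; both are illustrative choices.
    """
    if true_value == 0:
        raise ValueError("true_value must be non-zero")
    estimate = median(posterior_draws)
    return (estimate - true_value) / true_value


# Hypothetical posterior draws for adult female abundance and a hypothetical true value
draws = [950, 1010, 1040, 1100, 1200]
print(round(relative_bias(draws, 1000), 3))  # 0.04
```

A positive value means the estimate overshoots the truth; averaging this quantity across simulation replicates gives a summary of bias for a scenario.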

JAGS_models: This folder contains text files with the models used throughout the MS. When a parameter for population growth was included in the model, we took two different approaches: one that directly estimates abundance in the year of interest, and another that derives abundance in the year of interest from estimates of abundance in an initial reference year (t0) and estimates of population growth. The former are labeled with the suffix “estimateNtdirectly” and were used in Appendix S1, Figure S4; the latter are labeled with the suffix “deriveNt” and are the primary models used for objectives 2-5 (see Section 2.4.2 and Appendix S1: S1.3 in the associated MS for more information). Models that did not include a parameter for population growth were used for initial model validation and as the naive model for tests of population growth; these contain the term “noLambda” in the file name. Finally, the model labeled MHS.only_narrowLambda_skip_model_deriveNt is the model that was used to estimate abundance of adult female lemon sharks in Bimini, Bahamas.
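The “deriveNt” idea can be illustrated with a short Python sketch (the repository itself uses R and JAGS). It assumes constant geometric growth, N_t = N_t0 * lambda^(t - t0), which is one natural reading of deriving abundance from a reference-year abundance and a growth rate; the exact parameterization is given in Section 2.4.2 of the MS, and the function name derive_nt is hypothetical.

```python
def derive_nt(n_t0, lam, t, t0):
    """Abundance in year t derived from abundance in reference year t0.

    Assumes constant geometric growth: N_t = N_t0 * lam ** (t - t0).
    Years before the reference year (t < t0) give a negative exponent.
    """
    return n_t0 * lam ** (t - t0)


# Hypothetical values: 1,000 adults in 2000, growing 2% per year
for year in (2000, 2005, 2010):
    print(year, round(derive_nt(1000, 1.02, year, 2000), 1))
```

Under this parameterization only N_t0 and lambda are estimated, so abundance in every other year follows from them; the “estimateNtdirectly” alternative instead treats abundance in the year of interest as its own parameter.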
Simulation_log_key: This spreadsheet is a key for the various scenarios that were tested in the associated MS. There are three columns:
- scenario_label_in_scripts: The primary scenarios tested in this project are listed in Table 1 of the MS. However, the labels for the different tests changed as the MS was written, so they do not always match those in the code. In the code (Scripts_used_in_MS/02_Estimation.model/*), each scenario's settings are tied to the scenario object: when the scenario object is created, an array of other parameters is set according to which scenario is being run. The label in this column is the value that can be assigned to the scenario object in the code to produce the appropriate settings for each test.
- scenario_label_in_text: Not all scenarios tested in the code were included in Table 1 of the main MS. This column connects each “scenario” from the MS to the associated scenario in the code. Together, the two columns let one look up a scenario from Table 1 and find the appropriate script in this repository.
- test_summary: A short summary of what each scenario tested.
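The pattern described above, where a single scenario label determines a whole bundle of settings, can be sketched in Python as a lookup table. This is an illustration only: the repository implements it in R (specify.simulation.R), and every label, field name, and model file name below is hypothetical; the real labels are listed in the scenario_label_in_scripts column of Simulation_log_key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioSettings:
    """Hypothetical bundle of settings implied by one scenario label."""
    jags_model_file: str
    estimate_lambda: bool
    breeding_interval_years: int


# Hypothetical labels and values, for illustration only
SCENARIOS = {
    "hypothetical_base_case": ScenarioSettings(
        "hypothetical_noLambda_model.txt", False, 1),
    "hypothetical_growth_skip": ScenarioSettings(
        "hypothetical_deriveNt_model.txt", True, 2),
}


def specify_scenario(label):
    """Return every setting implied by a single scenario label."""
    try:
        return SCENARIOS[label]
    except KeyError:
        raise ValueError(f"Unknown scenario label: {label!r}") from None


settings = specify_scenario("hypothetical_growth_skip")
print(settings.jags_model_file, settings.estimate_lambda)
```

Keeping all settings behind one label means a script only needs to change the scenario value to switch tests, which is why the key maps labels rather than individual parameters.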

Sharing/Access information

Kinship data for lemon sharks come from: Feldheim, K. A., S. H. Gruber, J. D. DiBattista, E. A. Babcock, S. T. Kessel, A. P. Hendry, E. K. Pikitch, M. V. Ashley, and D. D. Chapman. 2014. Two decades of genetic profiling yields first evidence of natal philopatry and long-term fidelity to parturition sites in sharks. Molecular Ecology 23:110–117 and can be accessed at: https://doi.org/10.5061/dryad.1q9r8

Code/Software

Running the code on the files described in the section “Description of the data and file structure” above will reproduce the results and figures presented in the MS. To start from the very beginning, or to adapt the code used in this project, use the scripts in the folder Scripts_used_in_MS. The code is organized in two levels: the first level generates and samples populations with different dynamics and distinct pedigrees, then saves each set of samples along with other useful information about the simulated populations. The second level imports the samples and other relevant output files from the population simulations and fits one or more CKMR models according to the scenario being tested. The scripts should run with minimal modifications, as long as the required packages are installed and the input/output file locations are specified (they are presently blank). The file structure is outlined below.

Scripts_used_in_MS: This folder contains the scripts that were run for our population simulations and to fit CKMR models to each set of samples drawn from those simulations.
- 01_Data.generating.model: Each script here generates a distinct population, samples three different subsets of the population, and then saves various outputs, including information about each set of samples. The file rseeds_2022_04.15.rda can be imported to reproduce the exact populations used in the MS.
  - functions: This folder contains various functions and scripts that are sourced in the main population simulation scripts one level up. The file lemon_shark_indv_based_sim.R contains the main population simulation code, saved as a function, while the specify_simulation.R script sets parameters governing mortality (and by extension, population growth) and fecundity/breeding cycle based on the values input to the population simulation script. PopSim_truth.R and query_results_PopSim.R are also sourced by the main population simulation scripts.
- 02_Estimation.model: This folder contains scripts that import the output from the population simulations, specify parameters for the CKMR model being tested, and then fit one or more CKMR models to each set of samples. The file rseeds_2022_04.15.rda can be imported to reproduce the exact model output used in the MS. The different scripts are labeled according to the scenario being tested (see the file Simulation_log_key.xlsx for details).
  - functions: This folder contains functions sourced by the main estimation scripts. The files Obj123.functions.R and Obj4.functions.R have code for generating pairwise comparison matrices and for importing the truth to compare to model estimates. The files are mostly the same, but in Objective 4 we have added/edited functions for age misassignment. The RunJAGS* files contain code that specifies data for the CKMR model, fits the model, and calculates summary statistics for the posterior distributions. The file specify.simulation.R takes the scenario object from the script and specifies the appropriate parameters for the model as well as the specific model file to be used.
- 03_Lemon_shark_data: This folder contains data, code, and simulations for Bimini lemon sharks (see sections 2.5 and 3.5 in the associated MS).
  - The file Main_lemon_shark.csv contains real parentage data and morphometrics from a long-term monitoring program in Bimini, Bahamas. The study from which this was derived is linked above under the Sharing/Access information heading.
  - lemon_shark_data.R contains code to filter the Bimini lemon shark data and then fit a CKMR model to various subsets of the data. By default, the code estimates abundance in the present for three different time windows (three-year, five-year, and all available). In addition, there are “toggles” near the top of the script to specify whether full siblings should be retained or not, and whether the (extensive) dataset should be downsampled or not. If downsampling is specified, then the code will run on a loop to account for randomness in which samples are retained.
  - lemon_shark_FullSim_allsibs.R contains code to simulate a small population of lemon sharks. The base of the code is the same as the other simulations; however, the population is much smaller, and mortality is adjusted three times over the last 20 years of the simulation to produce a population that grows, stabilizes, and then declines. This code fits the CKMR model to three sliding time windows of data over a 20-year period, similar to the approach taken with the real dataset. In this case, full siblings are retained in the analysis.
  - lemon_shark_FullSim_filterSibs.R is the same as above, except only one sibling is kept from each mother/father pairing.
  - functions: This folder contains functions that are sourced by the code in the parent folder. These functions are similar to those in the 02_Estimation.model/functions folder, except in this case the main population function is embedded in the Obj5.functions.R script.
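Two data-handling steps described above, keeping only one sibling per mother/father pairing (lemon_shark_FullSim_filterSibs.R) and repeatedly downsampling to account for randomness in which samples are retained (lemon_shark_data.R), can be sketched in Python as follows. This is a minimal illustration, not the repository's R code: the record keys ('id', 'mother', 'father'), the function names, and the fraction and seed parameters are all hypothetical.

```python
import random


def keep_one_per_parent_pair(samples, rng=None):
    """Keep one offspring per unique (mother, father) combination.

    `samples` is a list of dicts with hypothetical keys 'id', 'mother' and
    'father'. Offspring sharing only one parent (half-siblings) are all kept,
    because their (mother, father) pairs differ. If `rng` is supplied, the
    retained full sibling is chosen at random; otherwise the first one seen is kept.
    """
    groups = {}
    for s in samples:
        groups.setdefault((s["mother"], s["father"]), []).append(s)
    return [rng.choice(sibs) if rng is not None else sibs[0]
            for sibs in groups.values()]


def downsample_replicates(samples, fraction, n_reps, seed=0):
    """Draw n_reps random subsets (without replacement) of `samples`.

    Repeating the draw lets estimates be summarized across replicates,
    so that no single random subset drives the result.
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(samples)))
    return [rng.sample(samples, k) for _ in range(n_reps)]


# Hypothetical juveniles: J1 and J2 are full siblings, J3 is their maternal half-sibling
juveniles = [
    {"id": "J1", "mother": "F1", "father": "M1"},
    {"id": "J2", "mother": "F1", "father": "M1"},
    {"id": "J3", "mother": "F1", "father": "M2"},
]
print([s["id"] for s in keep_one_per_parent_pair(juveniles)])  # ['J1', 'J3']
```

Filtering on the (mother, father) pair rather than on the mother alone removes only full siblings, so the maternal half-sibling pairs that the juvenile-only CKMR model relies on are preserved.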

There is a lot of code and many data files here, but we have endeavored to make it accessible. Part of this process involved clarifying obscure names of functions and objects within the scripts. If any of the code does not run, first double-check that the file paths are appropriate, as the workflow relies on frequent export/import of data. If the code still does not run, make sure that the objects needed are present and appropriately named. If it still will not run, please don’t hesitate to reach out to me, the lead author; I will be happy to assist you.

Thank you for your interest in this project!