Inbreeding reduces fitness in spatially structured populations of a threatened rattlesnake
Data files
Sep 03, 2025 version files (52.36 MB)
- EMR_inbreeding_project.zip (52.34 MB)
- README.md (16.86 KB)
Abstract
Small and fragmented populations are at high risk of local extinction, in part because of elevated inbreeding and subsequent inbreeding depression. A major conservation priority is to identify the mechanisms and extent of inbreeding depression in small populations. The eastern massasauga (Sistrurus catenatus) rattlesnake is listed as Federally Threatened in the United States, having experienced significant habitat fragmentation and concomitant population declines over the past 200 years. Here, we use long-term monitoring of two wild populations of eastern massasaugas in Michigan to estimate the extent of inbreeding in each population, identify mechanisms that generate inbreeding, and test for the impact of inbreeding on fitness. Using targeted genomic data and spatial coordinates of capture locations from over 1000 individuals, we find evidence of inbreeding and link inbreeding to spatial kinship structure within populations, possibly driven by limited dispersal. We reconstruct multi-generational pedigrees for each population to measure reproductive output and use long-term capture-recapture data to estimate individual survival (i.e., the two major components of fitness). We find evidence of inbreeding depression in both fitness metrics. The 5% most inbred individuals are 13.5% less likely to have any surviving offspring and have 11.6% lower annual survival compared to all less inbred individuals. By combining genomics and long-term monitoring data, we are able to link the life history of eastern massasaugas to inbreeding and detect relationships between fitness and inbreeding. These insights provide important conservation context for future management and for understanding how spatial structure can generate inbreeding depression even at fine spatial scales.
Contact person:
Corresponding author:
Meaghan Clark, meaghaniclark[at]gmail[dot]com, https://orcid.org/0000-0003-3297-8372
Description of the data and file structure
For a visual representation of the analysis pipeline, see pipeline_viz.pdf
EMR_inbreeding_project.zip is a compressed directory that includes the following folders:
data_inputs/: Contains processed data input files for genetic, spatial, and pedigree analyses in R.
- BAR_LifeHistData_dupes.csv: Each row contains a set of IDs that reference technical duplicates of the same individual from the Barry County site
- CAS_captive_born.Robj: R object containing two objects.
- 1: A list containing the numerical IDs of captive-born siblings. Name of each object in list is the ID of the mother.
- 2: List of individual IDs of captive-born individuals who were never captured again after parturition, and who did not have offspring assigned in the pedigree.
- depth_vcftools_12924.idepth: Output of vcftools --depth with per-individual sequencing depth information.
- depth_vcftools_12924.ldepth.mean: Output of vcftools --site-mean-depth with average per-site sequencing depth
- depth_vcftools_12924.log: Log file for command that generated depth_vcftools_12924.idepth.
- emr_rapture_filtered_15.vcf: VCF file containing genomic sequencing data.
- final_data_objects.Robj: R object containing a list with three entries:
- 1: input for pedigree reconstruction
- id: individual ID
- site: site of origin
- sex: sex of individual
- birthYear: birth year (if known, NA if unknown)
- BY.est: estimated birth year, NA if birth year could not be estimated
- BY.min: minimum birth year to be considered during pedigree reconstruction
- BY.max: maximum birth year to be considered during pedigree reconstruction
- 2: Centroid spatial locations of individuals, Barry County
- id: individual ID
- site: site of origin
- easting_shifted: centroid location of capture locations. Coordinates have been shifted to protect sensitive spatial information.
- northing_shifted: centroid location of capture locations. Coordinates have been shifted to protect sensitive spatial information.
- 3: Centroid spatial locations of individuals, Cass County
- id: individual ID
- site: site of origin
- easting_shifted: centroid location of capture locations. Coordinates have been shifted to protect sensitive spatial information.
- northing_shifted: centroid location of capture locations. Coordinates have been shifted to protect sensitive spatial information.
- Individual_estimates_SVL52.csv: Survival estimates from CJS models
- Fgrm: inbreeding estimated as Fgrm
- Site_name: site of origin
- Phi: apparent annual survival probability
- site_depth.ldepth: Output of vcftools --site-depth with per-site sequencing depth
- site_depth.log: Log file for command that generated site_depth.ldepth.
- Spaced_baits.fas_mrg_buf.bed: Genomic positions of SNPs targeted by RAPTURE protocol.
- survival_data.Robj: Estimated survival probabilities for each individual across time using top-ranked CJS model. R object of list with two entries, 1: Cass County, 2: Barry County.
- ID: individual ID
- Sex.new: sex of individual, NA if unknown
- First.cap: year of first capture
- Last.cap: last year captured
- 1989-2023: estimated survival probability in the associated year
- Fgrm: inbreeding, estimated by Fgrm
- Age.at.1st.capture: estimated age of individual at first capture event
- survival_Fgrm_CAS&BAR_52svl_15June24.csv: Apparent annual survival probabilities from top-ranked CJS model. Each row represents an individual
- Fgrm: inbreeding, estimated by Fgrm
- Phi: apparent annual survival probability
- SE: standard error
- LCI: lower limit of the 85% confidence interval
- UCI: upper limit of the 85% confidence interval
- Site: site of origin
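As a minimal sketch of how the survival table described above might be explored, the following Python snippet reads rows with the documented columns (Fgrm, Phi, SE, LCI, UCI, Site) and compares mean apparent annual survival between the most inbred 5% of individuals and all others. The rows are invented placeholder values, not data from this archive. The function name, the site codes, and the rule used to pick the top 5% are assumptions for illustration only; the estimates reported in the paper come from CJS models run in MARK, not from this calculation.

```python
import csv
import io
import math
from statistics import mean

# Placeholder rows that follow the documented column names.
# All values (including the site codes) are invented for illustration.
EXAMPLE_CSV = """Fgrm,Phi,SE,LCI,UCI,Site
-0.05,0.80,0.02,0.77,0.83,CAS
-0.02,0.78,0.02,0.75,0.81,CAS
0.00,0.79,0.02,0.76,0.82,BAR
0.01,0.77,0.02,0.74,0.80,BAR
0.03,0.76,0.02,0.73,0.79,CAS
0.04,0.78,0.02,0.75,0.81,BAR
0.06,0.75,0.02,0.72,0.78,CAS
0.08,0.74,0.02,0.71,0.77,BAR
0.10,0.73,0.02,0.70,0.76,CAS
0.25,0.62,0.03,0.58,0.66,BAR
"""


def compare_most_inbred(rows, top_fraction=0.05):
    """Return (mean Phi of the most inbred group, mean Phi of the rest).

    The most inbred group is the top `top_fraction` of rows ranked by Fgrm,
    rounded up so that it always holds at least one individual.
    """
    ordered = sorted(rows, key=lambda r: float(r["Fgrm"]), reverse=True)
    n_top = max(1, math.ceil(len(ordered) * top_fraction))
    top, rest = ordered[:n_top], ordered[n_top:]
    return (
        mean(float(r["Phi"]) for r in top),
        mean(float(r["Phi"]) for r in rest),
    )


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
top_phi, rest_phi = compare_most_inbred(rows)
print(f"most inbred 5%: {top_phi:.3f}; all others: {rest_phi:.3f}")
```

To try this on the real file, the `io.StringIO(EXAMPLE_CSV)` argument could be replaced with an open file handle for the CSV in data_inputs/.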
data_objects/: Includes intermediate data objects produced during data processing for genetic, spatial, and pedigree analyses.
intermediate_plots/: Contains intermediate figures generated by code in scripts/. These figures are not included in the paper but were useful to us in exploring the data.
scripts/: Includes .R scripts for genetic, spatial, and pedigree analyses in the paper. All analyses are conducted in R, with the exception of the CJS models, which are run in MARK (requires separate installation).
- hpcc_scripts/: directory containing scripts required to process raw sequencing reads.
- wrapper-align_baits_to_genome.sh: Align fasta files containing bait information to a reference genome and generate .bed files corresponding to the targeted loci
- Executable: align_baits_to_genome.sbatch
- INPUT: FASTA file containing spaced baits information: "Spaced_baits.fas"
- OUTPUT: BED file of location of spaced baits in reference genome "../data_inputs/Spaced_baits.fas_mrg_buf.bed"
- wrapper-run_Process_radtags.sh: Runs the process_radtags functionality of STACKS.
- Executable: run_Process_radtags.sbatch
- INPUT:
- "lib_id.txt"
- Raw sequencing files in fastq.gz format (BioProject PRJNA1295676)
- OUTPUT: demultiplexed sequencing data
- wrapper-run_cutadapt.sh: Trims adapter contamination from sequencing reads
- Executable: run_cutadapt.sbatch
- INPUT: demultiplexed sequencing data
- OUTPUT: trimmed sequencing data
- wrapper-align_to_genome.sh: Aligns trimmed reads to reference genome using bwa mem and processes alignments with samtools
- Executable: align_to_genome.sbatch
- INPUT: trimmed sequencing data
- OUTPUT: processed .bam files
- wrapper-isolate_rapture_loci.sh: Filters .bam alignment files to retain only targeted loci
- Executable: isolate_rapture_loci.sbatch
- INPUT: processed .bam files
- OUTPUT: processed .bam files for targeted genomic regions
- wrapper-run_ref_map.sh: Calls SNPs using the ref_map.pl STACKS pipeline on targeted genomic regions
- Executable: run_ref_map.sbatch
- INPUT: processed .bam files for targeted genomic regions
- OUTPUT: VCF file
- wrapper-run_populations.sh: Calculate population genetic statistics from STACKS pipeline
- Executable: run_populations.sbatch
- INPUT: VCF file
- OUTPUT: population genetic statistics
- vcf_filter.sh: Perform preliminary SNP filtering
- INPUT: VCF file
- OUTPUT: "../data_inputs/emr_rapture_filtered_15.vcf"
- filter_vcf.R: Load and filter VCF file from STACKS, calculate genotyping error rate
- INPUT:
- VCF file: "../data_inputs/emr_rapture_filtered_15.vcf"
- BED file of BAITS: "../data_inputs/Spaced_baits.fas_mrg_buf.bed"
- Depth information:
- "../data_inputs/depth_vcftools_12924.ldepth.mean"
- "../data_inputs/depth_vcftools_12924.idepth"
- "../data_inputs/site_depth.ldepth"
- OUTPUT: Filtered SNPs "../data_objects/BAR_filtered_gt_all_snps_noZ_04152025.Robj" "../data_objects/CAS_filtered_gt_all_snps_noZ_04152025.Robj"
- calc_popgen_stats.R: Calculate population genetic statistics, including Fgrm, for filtered genotypes
- INPUT:
- Filtered SNPs "../data_objects/BAR_filtered_gt_all_snps_noZ_04152025.Robj" "../data_objects/CAS_filtered_gt_all_snps_noZ_04152025.Robj"
- Processed individual metadata "../data_inputs/final_data_objects.Robj"
- OUTPUT:
- Population genetic statistics: heterozygosity, Fgrm, Principal Components "../data_objects/pop_gen_stats_04152025.Robj"
- Pairwise pi matrices "../data_objects/pwp_04152025.Robj"
- Site specific principal components "../data_objects/PCA_loadings_04152025.Robj"
- vcf_funcs.R: Custom functions required for filter_vcf.R and calc_popgen_stats.R
- extract_by_longevity.R: Estimate individual longevity and extract birth year information from survival probabilities
- INPUT: Survival estimates from CJS models "../data_inputs/survival_data.Robj"
- OUTPUT: Birth year estimates for pedigree reconstruction "../data_objects/byEstimates_04152025.Robj"
- make_pedigree.R: Reconstruct pedigrees
- INPUT:
- Processed individual metadata "../data_inputs/final_data_objects.Robj"
- Filtered SNPs "../data_objects/BAR_filtered_gt_all_snps_noZ_04152025.Robj" "../data_objects/CAS_filtered_gt_all_snps_noZ_04152025.Robj"
- IDs of captive born individuals "../data_inputs/CAS_captive_born.RObj"
- Birth year estimates "../data_objects/byEstimates_04152025.Robj"
- OUTPUT:
- Reconstructed pedigrees "../data_objects/BAR_pedigree_results_04152025.Robj" "../data_objects/CAS_pedigree_results_04152025.Robj"
- Pedigree confidence results "../data_objects/BAR_conf_04152025.Robj" "../data_objects/CAS_conf_04152025.Robj"
- pedigree_conf.R: Plot pedigree confidence results and output Supplemental Tables
- INPUT: Pedigree confidence results "../data_objects/BAR_conf_04152025.Robj" "../data_objects/CAS_conf_04152025.Robj"
- OUTPUT: Supplemental Tables "../figs/supplemental/BAR_conf_prob_04152025.txt" "../figs/supplemental/CAS_conf_prob_04152025.txt"
- make_clean_data_object.R: Make a clean data object for final analyses, combining information from pedigree, metadata, and spatial data
- INPUT:
- Processed individual metadata "../data_inputs/final_data_objects.Robj"
- Population genetic statistics "../data_objects/pop_gen_stats_04152025.Robj"
- Pairwise pi matrices "../data_objects/pwp_04152025.Robj"
- Reconstructed pedigrees "../data_objects/BAR_pedigree_results_04152025.Robj" "../data_objects/CAS_pedigree_results_04152025.Robj"
- IDs of captive born individuals "../data_inputs/CAS_captive_born.RObj"
- Birth year estimates "../data_objects/byEstimates_04152025.Robj"
- OUTPUT:
- Pairwise distances between individual centroids "../data_objects/centroid_distances_04152025.Robj"
- Filtered pedigrees "../data_objects/filtered_pedigrees_04152025.Robj"
- Pedigree relationship and relatedness matrices "../data_objects/kin_relM_04152025.Robj"
- Pairwise distances between relationship types "../data_objects/BAR_distances_04152025.Robj" "../data_objects/CAS_distances_04152025.Robj"
- Individual data (id, dam, sire, site, sex, geometry, Fgrm, heterozygosity, pedigree inbreeding, offspring number, birth year, estimated birth year, maximum birth year, minimum birth year, estimated last year, number of years contributing to pedigree, PCs 1 through 6) "../data_objects/data_for_analyses_04152025.Robj"
- pedigree_funcs.R: Custom functions required for make_clean_data_object.R and analyses_figures.R
- run_RO_models.R: Build and run reproductive output models
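- INPUT: Individual data "../data_objects/data_for_analyses_<date>.Robj", where the file name is built from a date stamp (e.g., "../data_objects/data_for_analyses_04152025.Robj")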
- OUTPUT:
- Model summary for conditional model "../data_objects/Fgrm_data_for_cond_plot.Robj"
- Model summary for zero-inflated model "../data_objects/zi_Fgrm_plot_data_04152025.Robj"
- analyses_figures.R: Do spatial analyses and make figures for EMR inbreeding paper
- INPUT:
- Processed individual metadata "../data_inputs/final_data_objects.Robj"
- Pedigree confidence results "../data_objects/BAR_conf_04152025.Robj" "../data_objects/CAS_conf_04152025.Robj"
- Pairwise pi matrices "../data_objects/pwp_04152025.Robj"
- Site specific principal components "../data_objects/PCA_loadings_04152025.Robj"
- Reconstructed pedigrees "../data_objects/BAR_pedigree_results_04152025.Robj" "../data_objects/CAS_pedigree_results_04152025.Robj"
- Filtered pedigrees "../data_objects/filtered_pedigrees_04152025.Robj"
- Pairwise distances between relationship types "../data_objects/BAR_distances_04152025.Robj" "../data_objects/CAS_distances_04152025.Robj"
- Pedigree relationship and relatedness matrices "../data_objects/kin_relM_04152025.Robj"
- Pairwise distances between individual centroids "../data_objects/centroid_distances_04152025.Robj"
- OUTPUT:
- All statistics and figures for manuscript
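The INPUT and OUTPUT entries above imply an order in which the R scripts must be run. As a rough sketch, the dependencies can be written down as a graph and sorted with Python's standard-library `graphlib` (Python 3.9+). The dictionary below is our reading of the listings above, not something shipped with the archive. It omits vcf_funcs.R and pedigree_funcs.R, which only define helper functions, and the hpcc_scripts/ steps that run before filter_vcf.R. The order it prints is one valid order among several; pipeline_viz.pdf remains the authoritative description.

```python
from graphlib import TopologicalSorter

# Each script maps to the scripts whose OUTPUT files it lists as INPUT.
# Scripts that read only from data_inputs/ have no upstream scripts.
DEPENDS_ON = {
    "filter_vcf.R": set(),
    "extract_by_longevity.R": set(),
    "calc_popgen_stats.R": {"filter_vcf.R"},
    "make_pedigree.R": {"filter_vcf.R", "extract_by_longevity.R"},
    "pedigree_conf.R": {"make_pedigree.R"},
    "make_clean_data_object.R": {
        "calc_popgen_stats.R",
        "make_pedigree.R",
        "extract_by_longevity.R",
    },
    "run_RO_models.R": {"make_clean_data_object.R"},
    "analyses_figures.R": {
        "calc_popgen_stats.R",
        "make_pedigree.R",
        "make_clean_data_object.R",
    },
}


def run_order(depends_on):
    """Return the scripts in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


order = run_order(DEPENDS_ON)
for step, script in enumerate(order, start=1):
    print(step, script)
```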
Survival, growth, longevity/: Contains input files and scripts for running CJS models in MARK as well as estimating longevity from survival data.
- CJS analysis/
- CJS candidate set/: Contains input and output files for all CJS candidate models.
- CJS_ELF_PCCI4GROUPSADDITIVE_AF_FUNCTION2.FPT: MARK output file with log information for running of CJS candidate models.
- CJS_ELF_2009-2023.inp: MARK input file with capture-recapture data from 2009-2023.
- Columns represent: individual field IDs; temporal capture records (0 = not captured, 1 = captured); group ID; 30 individual temporal covariates: snout-vent length in cm, SVL1-SVL15 (15), and AF_priorcap1-AF_priorcap14 (14), a function that returns 1 if a female was captured two years ago and was >=45 cm; and 1 static individual covariate: Fgrm. "." indicates missing data.
- CJS_ELF_PCCI4GROUPSADDITIVE_AF_FUNCTION2.DBF: Database file showing model results from MARK analyses.
- CJS top model (standardized)/: Contains input and output files for top-ranked CJS model from AICc comparison with standardized covariates.
- CJS_ELF_2009-2023_stdz.inp: MARK input file with capture-recapture data from 2009-2023.
- Columns represent: individual field IDs; temporal capture records (0 = not captured, 1 = captured); group ID; 30 standardized individual temporal covariates: snout-vent length in cm, SVL1-SVL15 (15), and AF_priorcap1-AF_priorcap14 (14), a function that returns 1 if a female was captured two years ago and was >=45 cm; and 1 static standardized individual covariate: Fgrm. "." indicates missing data.
- CJS_ELF_2009-2023_stdz.DBF: Database file showing model results from MARK analyses.
- CJS_ELF_2009-2023_stdz.FPT: MARK output file with log information for running of top CJS model with standardized covariates.
- Growth and longevity/
- SVL_by_occasion_Cass.csv: Snout-vent-length (SVL) across capture years at Cass County site. NAs indicate years individuals were not captured.
- ID: individual ID
- Sex: individual sex
- Columns 1-15: SVL in cm across years
- Age_first_cap_Cass.csv: File contains age estimates at first capture events for individuals at Cass and Barry County sites
- ID: individual ID
- Age.at.1st.capture: age estimate of individual at its first capture event.
- CJS_betas.csv: Effect sizes from top-ranked CJS model
- Label: variable name
- Estimate: beta or effect size
- SE: standard error
- LCI: lower limit of the 85% confidence interval
- UCI: upper limit of the 85% confidence interval
- First_caps_age_0_Barry.csv: List of individuals who were first captured at age 0 at Barry County site.
- ID: individual field ID
- Age_at_first_cap: age at first capture (0 or NA)
- First_caps_age_0_Cass.csv: List of individuals who were first captured at age 0 at Cass County site.
- ID: individual field ID
- Age_at_first_cap: age at first capture (0 or NA)
- First_caps_age_1_lessthan_24.5_Cass.csv: List of field IDs of individuals who were first captured at age 1 with a SVL of less than 24.5 cm at Cass County.
- Growth_data.csv: SVL measurements from two capture events per individual
- ID: individual ID
- Date: date of first capture event
- SVL1: SVL at Date in cm
- Date2: date of second capture event
- SVL2: SVL at Date2 in cm
- diff_in_days: difference in days between Date and Date2
- Sex: sex of individual
- SexNum: sex as an integer (0 = female, 1 = male)
- deltat: Length of time between Date and Date2 in years (diff_in_days/365)
- Site: Study site
- Site_sex: string containing site and sex for each individual
- SiteNum: site as an integer
- ID_Fgrm.csv: Inbreeding (Fgrm) estimates
- ID: individual field ID
- Fgrm: inbreeding as estimated by Fgrm
- PNAS_code.R: R code for Fabens (1965) von Bertalanffy models, model selection, and derived estimates of SVL by age, SVL predictions used for covariate in CJS model, backward in time predictions of SVL used for inference of individual birth years, and cumulative survivorship estimates used for calculation of longevity and years contributing offspring to the pedigree.
- SVL_by_occasion_Barry.csv: Snout-vent-length (SVL) across capture years at Barry County site. NAs indicate years individuals were not captured.
- ID: individual ID
- Sex: individual sex
- Columns 1-15: SVL in cm across years
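For readers unfamiliar with the Fabens (1965) form of the von Bertalanffy model mentioned for PNAS_code.R, the sketch below shows how the Growth_data.csv columns fit into it. Under this model, the expected SVL at recapture is SVL1 + (Linf - SVL1) * (1 - exp(-k * deltat)), where Linf is the asymptotic length and k the growth coefficient. This is a generic Python illustration, not a translation of PNAS_code.R: the function names are ours, and the values of Linf, k, SVL1, and the capture dates are hypothetical placeholders rather than estimates or records from this study.

```python
import math
from datetime import date


def delta_years(first, second):
    """Time between two capture dates in years, computed as diff_in_days / 365."""
    return (second - first).days / 365


def fabens_predicted_svl(svl1, deltat, l_inf, k):
    """Expected SVL at recapture under the Fabens growth-increment equation."""
    return svl1 + (l_inf - svl1) * (1.0 - math.exp(-k * deltat))


# Hypothetical parameter values, chosen only to make the example run.
L_INF = 60.0  # asymptotic SVL in cm (assumed)
K = 0.4       # growth coefficient per year (assumed)

# Hypothetical capture record: SVL1 = 40 cm, recaptured 365 days later.
dt = delta_years(date(2015, 5, 1), date(2016, 4, 30))
predicted_svl2 = fabens_predicted_svl(40.0, dt, L_INF, K)
print(f"deltat = {dt:.2f} years; predicted SVL2 = {predicted_svl2:.2f} cm")
```

The same equation, solved for elapsed time, is what makes backward-in-time predictions of size (and hence birth-year inference) possible once Linf and k have been estimated.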
