Data from: Translocations spur population growth but fail to prevent genetic erosion in imperiled Florida Scrub-Jays

Linderoth, Tyler; Deaner, Lauren; Chen, Nancy; Bowman, Reed; Boughton, Raoul K.; Fitzpatrick, Sarah W.

Published Feb 25, 2025 on Dryad. https://doi.org/10.5061/dryad.z612jm6j0

Abstract

Land and natural resource use that supports human society can restrict populations to degraded and fragmented habitat, which catalyzes extinction and biodiversity loss through the interplay of small population size and genetic decay. Translocating individuals is a powerful approach for overcoming direct threats from human development and reconnecting isolated populations, although this strategy is not without risks. Consequently, there is a pressing need to understand the demographic and genetic outcomes of translocations in order to determine their conservation efficacy. We achieved this by leveraging the rare opportunity of having a nearly complete population pedigree from two decades of intensive demographic monitoring coupled with temporal genomic sequencing and simulations to evaluate how translocating Federally Threatened Florida Scrub-Jays from five subpopulations into an area of restored habitat with a small recipient population influenced their recovery. Translocations led to an expanding core population that rapidly grew 10-fold in size, primarily fueled by a small subset of highly successful translocated individuals, with one breeding pair responsible for ~24% of the population’s expected genetic ancestry for 15 years on average. This high reproductive skew led to increased inbreeding and genetic erosion, despite the population expansion and simulation results showing that the variance of ancestral genetic contributions was likely reduced by translocations. These mixed conservation outcomes stress the importance of genomic and demographic monitoring, as well as the potential need for genetic rescue to offset the consequences of reproductive skew in isolated populations following translocations, regardless of demographic recovery, in order to achieve long-term species viability.

Data for the study "Translocations spur population growth but fail to prevent genetic erosion in imperiled Florida Scrub-Jays"

Authors: Tyler Linderoth, Lauren Deaner, Nancy Chen, Reed Bowman, Raoul K. Boughton, Sarah W. Fitzpatrick
Year: 2025

Contact: Tyler Linderoth, lindero1@msu.edu

The code used to analyze the data is available at https://github.com/tplinderoth/M4_FSJ_translocations or
in the archived release of this GitHub repository: https://doi.org/10.5281/zenodo.14606489

Usage notes

The following is a list of the different data types by file suffix included in this repository along with brief
descriptions for how to work with them. Many examples for how these data were used can be found in the
"Linderoth_etal_Mosaic_FSJ_translocations_study_code.txt" document from the GitHub repository referenced above.

Descriptions of FASTA, BED, and VCF file formats are found in the SAMtools file-format specifications page.

.xlsx: Data in Excel format that can be viewed and manipulated with programs like LibreOffice Calc, OpenOffice Calc,
Microsoft Office, or imported into Google Sheets.

.txt: Plain text files that can be viewed or manipulated with any plain text editor (e.g., Linux less command, Nano, Vim, Text Editor).

.tsv: Tab-separated values files, which are plain text files with tab-delimited data fields. These files can be
viewed/manipulated with any plain text editor or read with the R software
using the read.table command with the sep="\t" argument.
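
For readers working outside R, the same kind of file can be read with the Python standard library. This is a minimal sketch; the function name read_tsv and the sample values are illustrative assumptions and are not part of this repository.

```python
import csv
import io

def read_tsv(handle):
    """Return one dict per data row, keyed by the names in the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Made-up text laid out like cr_pedigree_population_size.tsv (values are not real data).
example = io.StringIO("YEAR\tN_TOTAL\n2003\t12\n2004\t15\n")
rows = read_tsv(example)
print(rows[0]["YEAR"], rows[0]["N_TOTAL"])
```

For a file on disk, open it with open(path, newline="") and pass the handle to read_tsv. All values are returned as strings, so convert numeric columns as needed.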

.fa: Plain text DNA sequence information in FASTA format.

.vcf.gz: Variant Call Format (VCF) files containing genetic variant information that have been compressed using bgzip.
The VCF files contain information about the identity, position with respect to the reference genome, quality, and calling
of genetic variants. These files can also contain individual genotype information. BCFtools is
useful software for viewing, manipulating, and analyzing VCF format data. Header lines begin with "#".
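
Because bgzip output is gzip-compatible, a compressed VCF can also be streamed with the Python standard library when BCFtools is not at hand. The sketch below skips header lines and yields the fixed CHROM, POS, REF, and ALT columns; the function name and the example record are illustrative assumptions.

```python
import gzip
import io

def iter_vcf_records(binary_handle):
    """Yield (chrom, pos, ref, alt) for each non-header line of a gzip-compressed VCF."""
    with gzip.open(binary_handle, mode="rt") as text:
        for line in text:
            if line.startswith("#"):
                continue  # meta-information and column header lines
            fields = line.rstrip("\n").split("\t")
            yield fields[0], int(fields[1]), fields[3], fields[4]

# Made-up, in-memory VCF content (not real data from this repository).
raw = (
    b"##fileformat=VCFv4.2\n"
    b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    b"scaf1\t100\t.\tA\tG\t50\tPASS\t.\n"
)
records = list(iter_vcf_records(io.BytesIO(gzip.compress(raw))))
```

A file name can be passed in place of the in-memory handle, since gzip.open accepts either.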

.bed: Plain text files containing genomic coordinates in BED format. Various genomic analysis programs
(e.g., BCFtools, VCFtools, SAMtools) can use BED format files to include
or exclude specific genomic regions in analyses. BEDtools is a useful program for
working with BED format files. Coordinates in BED files are 0-based and half-open.

.pos: Plain text files containing genomic coordinates that are 1-based and inclusive. These genomic
position files are used by genomic analysis software, including programs used in the study (ANGSD, BCFtools),
for specifying particular genomic regions to analyze.
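
The two coordinate conventions differ by one at the start coordinate, which is a common source of off-by-one errors. The following sketch shows the conversion in both directions; the function names are illustrative and the scaffold name is made up.

```python
def pos_to_bed(chrom, pos):
    """Convert a 1-based inclusive position to a 0-based half-open BED interval of length 1."""
    return chrom, pos - 1, pos

def bed_to_positions(chrom, start, end):
    """List the 1-based positions covered by a 0-based half-open BED interval."""
    return [(chrom, p) for p in range(start + 1, end + 1)]

# The first base of a scaffold is position 1 in a .pos file and interval [0, 1) in BED.
first_base = pos_to_bed("scaf1", 1)
covered = bed_to_positions("scaf1", 0, 3)
```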

.covar: Plain text, symmetric covariance matrix file with tab-delimited values. The first row contains
individual IDs. This file can be viewed with any plain text editor or read with
R using the read.table command and head=TRUE, sep="\t" arguments.
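
As an alternative to R, a matrix with an ID header row can be read and checked for symmetry in plain Python. This is a sketch only; the function names and the 2 x 2 example values are assumptions made for illustration.

```python
import io

def read_labeled_matrix(handle):
    """Read a tab-delimited square matrix whose first row holds the individual IDs."""
    ids = handle.readline().rstrip("\n").split("\t")
    matrix = [[float(x) for x in line.split("\t")] for line in handle if line.strip()]
    return ids, matrix

def is_symmetric(matrix, tol=1e-9):
    """True if the matrix is square and matrix[i][j] matches matrix[j][i] within tol."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol for i in range(n) for j in range(i))

# Made-up values, not taken from the repository.
example = io.StringIO("ind1\tind2\n0.50\t-0.10\n-0.10\t0.40\n")
ids, cov = read_labeled_matrix(example)
```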

.pedstat1 and .skewstat3: These are plain text files with tab-delimited values produced by the relateStats software using the
--pedstat 1 and --skewstat 3 arguments. The format of these files is described below and in the relateStats documentation.
These files can be viewed with any plain text editor or read with R using the read.table command and head=TRUE argument.

M4 Florida Scrub-Jay demographic monitoring data

M4_FSJ_monitoring.xlsx

This Excel file contains demographic monitoring data for the M4 Florida Scrub-Jay metapopulation.
The following text describes the information contained in its four sheets.

The "jay_individuals" sheet lists all recorded individuals and has columns:
(1) FEDERAL_ID: US Fish and Wildlife Service (USFWS) identifier for individual.
(2) COLOR_ID: Color band identifier for individual.
(3) SEX: Individual's sex in the monitoring data.
(4) YEAR_CLASS: Year the individual was born.
(5) NATAL_PATCH: Location of individual's birth.
(6) YEAR_GROUP.NEST: Individual's birth nest identifier in terms of the family group to which the nest belonged and year.

The "census" sheet contains information for individuals observed during
censuses and has columns:
(1) CENSUS_MONTH: Month of the census.
(2) CENSUS_YEAR: Year of the census.
(3) CENSUS_MONTH.YEAR: The combined month and year of the census.
(4) FEDERAL_ID: The USFWS identifier of the observed individual.
(5) COLOR_ID: The color band identifier of the observed individual.
(6) PROPERTY: Location where the individual was observed.

The "nest" sheet contains information about every nest attempt and has columns:
(1) BREEDING_MALE: Male of breeding pair.
(2) BREEDING_FEMALE: Female of breeding pair.
(3) YEAR_GROUP: Nesting year and family group to which the nest belonged.
(4) NEST_ID: Nest identifier.
(5) PROPERTY: Location of nest.
(6) CLUTCH_SIZE: Total number of eggs laid.
(7) NUMBER_HATCHED: Number of eggs that hatched.
(8) NUMBER_FLEDGED: Number of offspring that fledged.
(9) NUMBER_HELPERS: Number of helper individuals for nest attempt.

The "Core_Region_pedigree" sheet contains the pedigree for the M4 Core Region population. The
columns are:
(1) ID: Focal individual's color band identifier.
(2) SIRE_ID: Color band identifier of focal individual's father.
(3) DAM_ID: Color band identifier of focal individual's mother.
(4) SEX: Focal individual's sex.
(5) COHORT: Year focal individual first existed in the Core Region.
(6) COHORT_LAST: Last year focal individual was observed in the Core Region.
(7) POPULATION: Population of origin.

In all sheets "*" denotes missing data.

Location abbreviations:
MW = Mosaic Wellfield site (part of the M4 Core Region)
COKER = Coker Tract (part of the M4 Core Region)
DP,DUETTE = Duette Preserve (part of the M4 Core Region)
LMRSP = Little Manatee River State Park

cr_pedigree_population_size.tsv

Core Region population size calculated from the pedigree. The columns are:
(1) YEAR
(2) N_TOTAL: Total number of individuals that existed in the population in the respective year.

cr_census_population_size.tsv

Core Region population size from the population census data. Note that these population sizes differ slightly from those
calculated from the pedigree because some individuals that existed in the Core Region in a given year were missed in a
census. The columns are:
(1) YEAR
(2) CENSUS_N: Total number of individuals that were censused in the population in the respective year.

M4 Florida Scrub-Jay genomic data

Whole genome short read data

Sequencing read FASTQ files are available from the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1099469.

FSJ.V3.fa

Florida Scrub-Jay (Aphelocoma coerulescens, hereafter FSJ) genome assembly, which served as the reference
genome for short read mapping. This is a draft version of the assembly described in
Romero et al. "A new high-quality genome assembly and annotation for the threatened
Florida Scrub-Jay (Aphelocoma coerulescens)": https://doi.org/10.1093/g3journal/jkae232.
An alignment using Minimap2 with argument -f 0.02 between these two genome assemblies is provided in the
FSJV.3_vs_Romero_etal_july2024_FSJgenome_alignment.tsv file. In this alignment file the query
corresponds to the FSJ.V3.fa genome contained in this repository while the target genome is the one cited above.

FSJ_V3_main_chr.bed

BED format file listing the genomic coordinates of scaffolds in the FSJ.V3 assembly that were
homologous with zebra finch chromosomes. Genetic analyses were restricted to these scaffolds
and also excluded Chromosome 24 (the Z chromosome) except for when determining the sex of
individuals based on Z to autosome depth ratios.
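
In birds, males carry two Z chromosomes and females carry one, so the Z to autosome depth ratio is expected to be near 1 in males and near 0.5 in females. The sketch below illustrates that logic; the 0.75 cutoff, the "M"/"F" labels, and the function name are assumptions for illustration and are not the thresholds used in the study.

```python
def infer_sex(avg_z_depth, avg_autosome_depth, threshold=0.75):
    """Classify sex from the Z:autosome depth ratio (threshold is a hypothetical midpoint)."""
    ratio = avg_z_depth / avg_autosome_depth
    sex = "M" if ratio >= threshold else "F"
    return sex, ratio

# Made-up depths: equal Z and autosome depth suggests ZZ, half depth suggests ZW.
male_like = infer_sex(7.0, 7.0)
female_like = infer_sex(3.5, 7.0)
```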

fsj_mosaic_allsites_genome_sitesonly.vcf.gz

Variant call format (VCF) file containing all sites in the genome (including monomorphic sites)
without individual genotype information. Sample-wide variant and quality information are contained in
this file.

fsj_mosaic_variants_genome.vcf.gz

Variant call format (VCF) file containing genotypes and genetic variant information for
all variable sites in the genome. The sequencing depth of coverage for individuals in this
study was low (mean 7.2x, SD = 1.0), so the genotype calls should not be used directly. Instead, we
recommend using the genotype likelihoods (FORMAT/PL field) or genotype
posterior probabilities (FORMAT/GP field) for analyses.
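
PL values are Phred-scaled genotype likelihoods, so each one converts to a relative likelihood as 10^(-PL/10). The sketch below extracts a field from a sample column and normalizes the likelihoods to sum to one; the function names and the example strings are illustrative assumptions, and missing values (".") are not handled.

```python
def format_value(format_col, sample_col, key):
    """Return the value for one FORMAT key (e.g., "PL") from a sample column."""
    keys = format_col.split(":")
    values = sample_col.split(":")
    return values[keys.index(key)]

def pl_to_likelihoods(pl_field):
    """Convert a comma-separated PL string into normalized genotype likelihoods."""
    raw = [10 ** (-int(pl) / 10) for pl in pl_field.split(",")]
    total = sum(raw)
    return [r / total for r in raw]

# Made-up FORMAT and sample columns for one biallelic site.
pl = format_value("GT:PL:GP", "0/1:10,0,20:0.1,0.8,0.1", "PL")
likes = pl_to_likelihoods(pl)
```

The three values correspond to the homozygous reference, heterozygous, and homozygous alternate genotypes at a biallelic site.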

fsj_mosaic_allsites_genome_qc.pos

Position file of sites (including monomorphic) across the entire FSJ.V3 genome assembly that
pass quality controls and are located in regions accessible to accurate short read mapping.

fsj_mosaic_allsites_main_autosomes_qc.pos

Position file of sites (including monomorphic) restricted to autosomal scaffolds in FSJ_V3_main_chr.bed
that pass quality controls and are in regions accessible to accurate short read mapping. Genetic analyses
based on all sites used these sites.

fsj_mosaic_biallelic_snps_genome_qc.pos

Position file of biallelic SNPs across the entire FSJ.V3 genome assembly that pass quality controls
and are in regions accessible to accurate short read mapping.

fsj_mosaic_biallelic_snps_main_autosomes_qc.pos

Position file of biallelic SNPs restricted to autosomal scaffolds listed in FSJ_V3_main_chr.bed that
pass quality controls and are in regions accessible to accurate short read mapping. Genetic analyses based on
SNPs used these sites.

M4_sequenced_individual_metadata.tsv

Metadata for sequenced M4 Florida Scrub-Jays. The columns are:
(1) LAB_ID: The Fitzpatrick Lab identifier for the individual.
(2) USFWS_ID: Individual's USFWS identifier.
(3) BAND_ID: Individual's color band identifier.
(4) POPULATION: Population of origin at the time of sampling.
(5) NATAL_GROUP: Family group into which the individual was born.
(6) SEX: Sex
(7) YEAR_SAMPLED: The year at which blood was sampled for genetic analyses.
(8) AVG_AUTOSOME_DEPTH: The average sequencing depth across covered regions of autosomes listed in FSJ_V3_main_chr.bed.
(9) AVG_Z_DEPTH: The average sequencing depth for covered regions of the Z chromosome.
(10) DEPTh_RATIO: The ratio of average Z chromosome to autosome sequencing depth.
(11) H: Heterozygosity per bp.
(12) F: Inbreeding coefficient estimated with ngsF (Vieira et al. 2013).
(13) FROH: Inbreeding coefficient measured as the proportion of the genome in ROH.
(14) TRANSLOCATION_YEAR: Year that the individual was translocated.
(15) TRANSLOCATION_MONTH: Month that the individual was translocated.
(16) TRANSLOCATION_AGE: Age in years at which the individual was translocated.
(17) TRANSLOCATION_WEIGHT_GRAMS: Individual's weight in grams when they were translocated.

"*" denotes missing data.

inbreed_coefficient_alternate_calculations.tsv

Inbreeding coefficients calculated using different approaches for estimating allele frequencies, specifically
exploring the effect of pruning relatedness and using different reference populations. The columns are:
(1) LAB_ID: The Fitzpatrick Lab identifier for the individual.
(2) BAND_ID: Individual's color band identifier.
(3) USFWS_ID: Individual's USFWS identifier.
(4) F_PS_ALL: ngsF F using group-specific allele frequencies estimated without relatedness pruning.
(5) F_META_ALL: ngsF F using metapopulation-wide allele frequencies estimated from all individuals without relatedness pruning.
(6) FROH_META_PRUNE: FROH using metapopulation-wide allele frequencies estimated from a subset of individuals for which all pairwise relatedness was below 0.4.

Matrices of relationships

fsj_mosaic_biallelic_snps_main_autosomes_qc_relatedness_matrix_relateStats_input.txt

Symmetric relatedness matrix for all sequenced individuals estimated from genomic data using
ngsRelate (Hanghøj et al. 2019). The first row contains individual identifiers.

cr_ped_20230512_relatedness_matrix.txt

Symmetric relatedness matrix estimated from the M4 Core Region pedigree using the makeA
function of the nadiv (Wolak 2012) R package. The first row contains individual identifiers.
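
For readers unfamiliar with pedigree-based relatedness, the sketch below builds an additive relationship matrix with the standard tabular method. It is an illustration of the general technique, not nadiv's implementation, and it assumes that parents are listed before their offspring and that unknown parents are given as None. The toy pedigree is made up.

```python
def additive_relationship_matrix(pedigree):
    """pedigree: list of (id, sire, dam), parents before offspring, unknown parents None."""
    ids = [ind for ind, _, _ in pedigree]
    index = {ind: i for i, ind in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = index.get(sire)  # None when the sire is unknown
        d = index.get(dam)   # None when the dam is unknown
        for j in range(i):
            # Relationship to an earlier individual is the mean of the parents' relationships.
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
        # The diagonal is 1 plus the inbreeding coefficient (half the parents' relationship).
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return ids, A

# Two unrelated founders, two full siblings, and one offspring of the siblings.
toy = [("A", None, None), ("B", None, None), ("C", "A", "B"), ("D", "A", "B"), ("E", "C", "D")]
ids, A = additive_relationship_matrix(toy)
```

In this toy example the full siblings C and D have a relationship of 0.5, and their offspring E has a diagonal value of 1.25, reflecting an inbreeding coefficient of 0.25.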

fsj_mosaic_biallelic_snps_main_autosomes_qc_all_unweighted_wIDs.covar

Genetic covariance matrix for all sequenced individuals estimated with ngsCovar (Fumagalli et al. 2014).
The first row contains individual identifiers (Fitzpatrick Lab IDs).

Reproductive skew analysis

expected_genetic_contribution.tar.gz

This tarball contains output files from relateStats (https://github.com/tplinderoth/PopGenomicsTools) --pedstat 1.
Each *.pedstat1 file has columns:
(1) ID: Individual ID
(2) N_GENOME_COPIES: Expected number of genome copies contributed to the Core Region (CR) population or cohort by individual.
(3) P_ANC_FOCAL: The expected proportion of genomic copies contributed by individual to the CR population or cohort out of all ancestors' contributions.
(4) P_ANC_MAX: Expected proportion of genomic copies contributed by individual to the CR population or cohort
out of the max contribution possible by ancestors when the focal ancestor entered the population (Hunter et al. (2019) normalization).

Expected contributions to the CR population are given in files named T_IE_20230512_< * >_pop_contribution.pedstat1,
where < * > is the year for which contributions are measured.

Expected contributions to CR cohorts are given in files named T_IE_20230512_to_< * >_LR_contribution.pedstat1,
where < * > is the year in which individuals comprising the cohort were born into the CR.
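
When working through the many files in the tarball, the two naming patterns above can be told apart programmatically. The sketch below does this with regular expressions; it assumes the year placeholder is a four-digit number, and the function name and example file names are illustrative.

```python
import re

POP_PATTERN = re.compile(r"^T_IE_20230512_(\d{4})_pop_contribution\.pedstat1$")
COHORT_PATTERN = re.compile(r"^T_IE_20230512_to_(\d{4})_LR_contribution\.pedstat1$")

def classify_pedstat_file(name):
    """Return ("population" or "cohort", year) for a recognized file name, else None."""
    match = POP_PATTERN.match(name)
    if match:
        return "population", int(match.group(1))
    match = COHORT_PATTERN.match(name)
    if match:
        return "cohort", int(match.group(1))
    return None

# Example names built from the patterns described above (years chosen arbitrarily).
pop_file = classify_pedstat_file("T_IE_20230512_2015_pop_contribution.pedstat1")
cohort_file = classify_pedstat_file("T_IE_20230512_to_2015_LR_contribution.pedstat1")
```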

CR_T_IE_to_C_contributions_c0.0243058.skewstat3

This file contains reproductive skew statistics, K, for ancestors of the contemporary CR population calculated with
relateStats --skewstat 3. This file has columns:
(1) ID: color band ID
(2) Swtr: K statistic

number_contributing_lineages_to_cohorts_cr_ped_20230512.tsv

The number of expected founding individuals contributing to Core Region population cohorts. The columns are:
(1) YEAR: Year that cohort was born.
(2) N_CONTRIBUTORS: Total number of genetically contributing founders.
(3) N_RES: Number of genetically contributing resident founders.
(4) N_TRANS: Number of genetically contributing translocated founders.

Population simulations

All scripts used to perform simulations (including simPed.R and simJay.R) are available at https://github.com/tplinderoth/M4_FSJ_translocations

pedigree_constrained_simulations.tar.gz

Expected genetic contribution results from simulations with simPed.R in which pairing among unpaired adults and reproduction were
completely random and the population size and number of breeding pairs were constrained to closely match the observed pedigree. The probability of pairing
among unpaired adults and reproductive output among pairs was uniform. Each file contains genetic contributions by individuals
or groups to either the population or cohort (if "cohort" is in the file name) for the year given in the file name. Each file
contains the results of 10k simulations where each row is one simulation. The columns are:
(1) SEED: Random number generator seed for the simulation.
(2) MAX_CONTRIBUTION: The maximum genetic contribution observed among the focal individuals by a single individual.
(3) N_RESIDENT_CONTRIBUTORS: Number of genetically contributing resident individuals.
(4) N_TRANSLOCATED_CONTRIBUTORS: Number of genetically contributing translocated individuals.
(5–78): The genetic contribution by the founding individual specified by the column name (names are color band identifiers).
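
One way to use these files is to ask how often a simulated value is at least as large as an observed one. The sketch below computes that fraction for the MAX_CONTRIBUTION column. It is an illustration rather than the analysis performed in the study; it assumes the files are tab-delimited with a header row, and the values and function names are made up.

```python
import csv
import io

def max_contributions(handle):
    """Collect the MAX_CONTRIBUTION value from every simulation row."""
    return [float(row["MAX_CONTRIBUTION"]) for row in csv.DictReader(handle, delimiter="\t")]

def fraction_at_least(simulated, observed):
    """Fraction of simulated values that are greater than or equal to the observed value."""
    return sum(value >= observed for value in simulated) / len(simulated)

# Made-up simulation rows (only the first two columns are shown).
example = io.StringIO("SEED\tMAX_CONTRIBUTION\n1\t0.125\n2\t0.25\n3\t0.5\n4\t0.25\n")
simulated = max_contributions(example)
fraction = fraction_at_least(simulated, 0.25)
```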

The following files are inputs to simPed.R, which was used to simulate pedigrees.

cr_ped_20240602.tsv

Core Region FSJ population pedigree used for simPed.R "ped tsv file" input. The columns are:
(1) ID: Color band identifier of the focal individual.
(2) SIRE_ID: Focal individual's father.
(3) DAM_ID: Focal individual's mother.
(4) SEX: Sex of individual.
(5) COHORT: First year that the individual was censused in the population.
(6) COHORT_LAST: Last year that the individual was censused in the population.
(7) POPULATION: Individual's population of origin.

jayid_metadata_20240602.tsv

Information for each individual in the population used for simPed.R "individual metadata tsv" input. The columns are:
(1) COLOR_ID: Individual's color band identifier.
(2) YEAR_CLASS: Year that individual was born.
(3) Biological_Origin_Status: Code for the individual's biological group membership (translocated, local recruit, resident, immigrant, etc.).

nest_metadata_20241115.tsv

Breeding pair information used for simPed.R "nest metadata" with columns (only the first four are required by simPed.R):
(1) Property: Location of the breeding pair's territory.
(2) Breeding_Male: Color band identifier of the male comprising the pair.
(3) Breeding_Female: Color band identifier of the female comprising the pair.
(4) YEAR: Nesting year.
(5) NEST_ID: Unique nest identifier.
(6) HELPERS: Specifies whether the breeding pair had helpers ("Y") or not ("N"). "U" indicates that it is unknown whether the breeding pair had helpers.
(7) Nest_Attempt: The order in the sequence of nest attempts made by the breeding pair for the respective year.
(8) Number_Fledged: Number of fledged offspring. "FAILED" and "ABANDONED" indicate zero offspring fledged. "N/A", "*", or "UNKNOWN" indicate missing data for the number of fledged offspring.
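
Because the Number_Fledged column mixes counts with text codes, it needs to be recoded before numeric use. The sketch below applies the coding described above; the function name is illustrative, and matching the codes case-insensitively is an assumption.

```python
ZERO_CODES = {"FAILED", "ABANDONED"}      # zero offspring fledged
MISSING_CODES = {"N/A", "*", "UNKNOWN"}   # number fledged not recorded

def parse_number_fledged(value):
    """Return the fledgling count as an int, 0 for failed nests, or None when missing."""
    text = value.strip().upper()
    if text in ZERO_CODES:
        return 0
    if text in MISSING_CODES:
        return None
    return int(text)

parsed = [parse_number_fledged(v) for v in ["3", "FAILED", "N/A", "*", "ABANDONED"]]
```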

cr_ancestors_20240602.tsv

Identity of population founders used for simPed.R --anc argument. The columns are:
(1) Ancestor's color band identifier.
(2) Whether they were a resident or translocated individual.

res_trans_focal_IDs.txt

List of names of individuals used as input to simPed.R --focal_ind argument. These are the individuals for
which genetic contributions are reported in the simulation results output.

population_simulations.tar.gz

Demographic and expected genetic contribution results from simulations with simJay.R. The probabilities for new pair
formation, reproductive output, and survival were estimated from the observed pedigree. Pairing among unpaired
adults was completely random. We simulated populations under the following three demographic scenarios:
const_notrans: A resident population that did not receive translocated individuals and which could not exceed four breeding pairs.
grow_notrans: A resident population that did not receive translocated individuals and which could not exceed 134 breeding pairs (pair carrying capacity).
grow_wtrans: A resident population that received translocated individuals and which could not exceed 134 breeding pairs (this most closely matches the actual scenario).

Files named < model >_genetic_contributions3.tsv contain the following data for the respective model:
(1) SEED: The random number seed for simJay.R.
(2) YEAR: Year in the simulation.
(3–last column): The expected number of genomic copies contributed to the population by the respective ancestor named in the header row.

The simulation_demographic_stats3.tsv file has columns:
(1) model: Demographic model (i.e., const_notrans, grow_notrans, or grow_wtrans).
(2) seed: The random number seed for simJay.R.
(3) year: Year in the simulation.
(4) n: Number of individuals in the population.
(5) n_adult: Number of adults in the population (individuals >= 2 years old).
(6) n_breeder: Number of individuals that produced offspring.
(7) n_pairs: Number of breeding pairs currently in existence.
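
Since each model is simulated under many seeds, a typical first step is to average a statistic across seeds for each model and year. The sketch below does this for the n column; it assumes the file is tab-delimited with the header names listed above, and the rows shown are made up.

```python
import csv
import io
from collections import defaultdict

def mean_n_by_model_year(handle):
    """Average the n column across seeds for every (model, year) combination."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["model"], row["year"])
        totals[key][0] += float(row["n"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up rows for two seeds of one model in one year.
example = io.StringIO(
    "model\tseed\tyear\tn\tn_adult\tn_breeder\tn_pairs\n"
    "grow_wtrans\t1\t2010\t40\t30\t16\t8\n"
    "grow_wtrans\t2\t2010\t50\t36\t20\t10\n"
)
means = mean_n_by_model_year(example)
```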

The following files are inputs to simJay.R, which was used to simulate populations.

cr_individuals_20241014.tsv

Information for individuals used as input to simJay.R --ind_file argument for seeding the simulations ("*" denotes missing data). The columns are:
(1) ID: Individual's color band identifier.
(2) SEX: Individual's sex.
(3) YEAR_CLASS: The year that the individual was born.
(4) COHORT: The first year that the individual existed in the population.
(5) COHORT_LAST: The last year that the individual existed in the population.
(6) ORIGIN: Individual's origin where "IE" = resident, "T" = translocated, "LR" = born in the focal population, "I" = immigrant, "TD-FR" = descendant of translocated individual born outside of the population.
(7) ANCESTOR: 0 or 1 value, where 1 indicates that the individual should be treated as a founding ancestor of the population.
(8) SIRE_ID: Individual's father.
(9) DAM_ID: Individual's mother.
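
When reading this file, the "*" missing-data marker should be converted to a proper missing value before analysis. The sketch below replaces "*" with None and lists individuals with no recorded parents; the function name and example rows are illustrative assumptions, and only a subset of the columns is shown.

```python
import csv
import io

def read_with_missing(handle, missing="*"):
    """Read a tab-delimited file with a header row, replacing the missing marker with None."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({key: (None if value == missing else value) for key, value in row.items()})
    return rows

# Made-up individuals (not real color band identifiers).
example = io.StringIO("ID\tSEX\tSIRE_ID\tDAM_ID\nbird1\tM\t*\t*\nbird2\tF\tbird1\t*\n")
rows = read_with_missing(example)
no_known_parents = [r["ID"] for r in rows if r["SIRE_ID"] is None and r["DAM_ID"] is None]
```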

mixed_poisson_survival_pmf2.txt

Probability distribution for survival used as input to simJay.R --survive_file argument. The columns are:
(1) Number of years.
(2) Probability that an individual survives the respective number of years.

empiric_offspring_pmf.tsv

Probability distribution for the number of offspring per pair each breeding season used as input to simJay.R --offspring_file argument. The columns are:
(1) Number of offspring.
(2) Probability that a pair has the respective total number of offspring in a breeding season.
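
A two-column probability file like this one can be used to draw random values in a simulation. The sketch below reads such a table and samples from it with the Python standard library. It illustrates the general idea and is not how simJay.R is implemented; the function names and probabilities are made up, and non-numeric lines such as a possible header row are skipped.

```python
import io
import random

def read_pmf(handle):
    """Read whitespace-delimited (value, probability) pairs, skipping non-numeric lines."""
    values, probs = [], []
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            value, prob = int(parts[0]), float(parts[1])
        except ValueError:
            continue  # e.g., a header row
        values.append(value)
        probs.append(prob)
    return values, probs

def draw(values, probs, rng):
    """Draw one value with probability proportional to its weight."""
    return rng.choices(values, weights=probs, k=1)[0]

# Made-up distribution for the number of offspring per pair.
values, probs = read_pmf(io.StringIO("0\t0.5\n1\t0.25\n2\t0.25\n"))
rng = random.Random(1)  # fixed seed so draws are reproducible
samples = [draw(values, probs, rng) for _ in range(5)]
```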

pair_limits_const.tsv and pair_limits_grow.tsv

Demographic parameter files used as input to simJay.R --events_file argument that specify the timing and type of demographic events.
The pair_limits_const.tsv file was used for the const_notrans model and the pair_limits_grow.tsv file was used for the grow_notrans and
grow_wtrans models. The columns are:
(1) TIME: Time step in the simulation at which the event occurs (1-based).
(2) PAIR_LIMITS: Specifies the maximum number of breeding pairs allowed to exist at any particular time.
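
The sketch below shows one way to look up the pair limit in effect at a given time step. It assumes that each row sets the limit from its TIME onward until a later row replaces it, which is an interpretation of the file rather than documented simJay.R behavior. The function name and event values are illustrative.

```python
import bisect

def pair_limit_at(events, time_step):
    """events: list of (TIME, PAIR_LIMITS) sorted by TIME; return the limit active at time_step."""
    times = [time for time, _ in events]
    i = bisect.bisect_right(times, time_step) - 1  # last event at or before time_step
    return None if i < 0 else events[i][1]

# Hypothetical schedule: four pairs from step 1, then 134 pairs from step 5 onward.
events = [(1, 4), (5, 134)]
limits = [pair_limit_at(events, t) for t in (1, 4, 5, 20)]
```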