Data from: Sample size guidelines for mapping migration corridors and population distributions using tracking data

Beaupre, Chloe1,2,3; Curtis, Angelique2; Frankland, Brent2; Halseth, Joseph2; Johnson, Aran; van de Kerk, Madelon1; Kircher, Alyssa2; Mao, Julie S.2; Shapiro, Jessie4; Slezak, Elissa2; VanNatta, Eric2; Young, Jessica R.1; Blecha, Kevin A.2

Published Jun 26, 2025 on Dryad. https://doi.org/10.5061/dryad.gmsbcc2tc

Data files

Jun 26, 2025 version files 50.06 MB

Abstract

Animal distribution maps are a key tool for wildlife conservation, guiding high-profile decisions, such as legally designating priority habitat or building highway crossing structures. GPS tracking data enhances these efforts but requires balancing statistically robust sample sizes with minimizing researcher impacts on wildlife and costs. Nevertheless, rigorous guidelines that leverage a priori information are still lacking on how to determine the optimal number of tracked animals (i.e., sample size) for accurately mapping migration corridors and seasonal ranges at the population level, particularly in the context of ungulate conservation. We used a cumulative curve resampling approach to evaluate the consequences of reduced animal sample size, assessed sample size sufficiency, and extrapolated where sample size sufficiency might occur outside of the observed data. We illustrate our approach with simulated data. We then compiled GPS data from 77 ungulate populations and aggregated individuals’ spatial distributions in each study area to create population-level migration and seasonal distributions, and examined whether known explanatory variables (e.g., population abundance, environmental metrics) could predict sufficient sample sizes to map population distribution. Our simulated and empirical analyses to assess and model sample size sufficiency demonstrated that sample size varies depending on the species, season, population-level percent volume contour of interest, and population abundance. For example, for migration distributions at the 95% volume contour, the interquartile range for the number of individuals needed to reach an adequate sample was 10 – 23 for bighorn sheep, 51 – 93 for elk, and 58 – 164 for mule deer. For existing datasets, the resampling approach quantifies the sensitivity of population distribution maps to sample size. 
To guide study design for future GPS tracking projects aimed at mapping population distributions, our models provide specific sample size recommendations incorporating known population covariates. If adequate model training data are available, our approach can be extended across a wide range of taxa and populations to inform sample size requirements for estimating robust distribution patterns.

We have submitted simulated location data for animals (example_data.zip) and R scripts to recreate our analyses.

We compiled GPS data from ungulate studies throughout Colorado and Utah to examine animal sample size sufficiency for population distributions. We resampled population migration and seasonal range distributions to evaluate the consequences of reduced data, assessed sample size sufficiency, and extrapolated where sample size sufficiency might occur outside of the observed data.

Rarefaction R scripts

1a_CorridorRarefaction.R, 1b_SeasonalRangeRarefaction.R

These scripts perform a rarefaction analysis to evaluate how the estimated area of population-level utilization distributions (UDs) changes as more individual animals are added: migration corridors in 1a_CorridorRarefaction.R and seasonal (summer or winter) ranges in 1b_SeasonalRangeRarefaction.R.

Key Steps:

Loads individual UD rasters for each project.
Generates 100 random permutations of individual inclusion order.
Iteratively stacks rasters to build cumulative population-level UDs.
Calculates smoothed volume contours (50%, 75%, 90%, 95%, 99%) and associated areas.
Tracks data duration, number of migrations, and animal-years per step.
Runs in parallel to speed up processing.
Saves .rda, .csv, and .RData outputs per project.

Requirements:

R packages: raster, spatialEco, stringr, arrangements, foreach, doSNOW, parallel

Outputs:

Rarefaction curves showing how corridor or seasonal range area scales with sample size.
Summary metrics for individual contributions and data richness.
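
The key steps listed above for the rarefaction scripts can be sketched in a few lines. This is a minimal illustration in base R, not the submitted scripts: it replaces the raster files with toy Gaussian UDs on a 500 m grid, skips contour smoothing and parallel processing, and every object name (`uds`, `contour_areas`, `rare`, `curve`) is hypothetical.

```r
# Minimal rarefaction sketch in base R (the real scripts use raster, foreach, etc.).
set.seed(42)

# Toy landscape: 0.5 km (500 m) cells on a 50 x 50 km grid.
grid <- expand.grid(x = seq(0, 50, by = 0.5), y = seq(0, 50, by = 0.5))
cell_km2 <- 0.5 * 0.5

# Hypothetical individual UDs: Gaussian kernels around random centres.
n_animals <- 20
centres <- cbind(rnorm(n_animals, 25, 10), rnorm(n_animals, 25, 10))
uds <- lapply(seq_len(n_animals), function(a) {
  d <- exp(-((grid$x - centres[a, 1])^2 + (grid$y - centres[a, 2])^2) / (2 * 3^2))
  d / sum(d)  # each individual UD sums to 1
})

# Area (sq km) enclosed by each percent volume contour of a UD vector:
# the smallest set of highest-valued cells that holds pct% of the volume.
contour_areas <- function(ud, pcts) {
  cum_vol <- cumsum(sort(ud, decreasing = TRUE)) / sum(ud)
  sapply(pcts, function(p) which(cum_vol >= p / 100)[1]) * cell_km2
}

contours <- c(50, 75, 90, 95, 99)
n_perm <- 100  # number of random inclusion orders
rows <- vector("list", n_perm * n_animals)
idx <- 0
for (p in seq_len(n_perm)) {
  ord <- sample(n_animals)            # one random permutation of the animals
  pop_ud <- numeric(nrow(grid))
  for (k in seq_along(ord)) {
    pop_ud <- pop_ud + uds[[ord[k]]]  # stack the next individual's UD
    idx <- idx + 1
    rows[[idx]] <- data.frame(simulation = p, nAnimals = k, Contour = contours,
                              area_sqkm = contour_areas(pop_ud, contours))
  }
}
rare <- do.call(rbind, rows)

# Mean rarefaction curve per contour level.
curve <- aggregate(area_sqkm ~ nAnimals + Contour, data = rare, FUN = mean)
head(curve)
```

The column names `simulation`, `nAnimals`, `Contour`, and `area_sqkm` follow the variables documented for `rare_long`; the grid size, kernel width, and number of animals are arbitrary choices for the illustration.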

2_CompileAddCovariates.R

This script compiles rarefaction results from multiple projects, adds ecological and management covariates, and prepares both wide and long-format datasets for analysis.

Requirements:

R packages: dplyr, reshape2
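
As a rough illustration of the compile step, the sketch below joins project-level covariates onto long-format rarefaction results and derives a wide-format table. It uses base R (`merge`, `reshape`) rather than dplyr and reshape2, and the projects, covariate values, and placeholder areas are invented for the example.

```r
# Hypothetical long-format rarefaction results for two projects.
rare_long <- expand.grid(projName = c("sim1", "sim2"), simulation = 1:2,
                         nAnimals = 1:3, Contour = c(50, 95),
                         stringsAsFactors = FALSE)
rare_long$area_sqkm <- with(rare_long, Contour * log1p(nAnimals))  # placeholder areas

# Hypothetical project-level covariates (one row per project).
covs <- data.frame(projName = c("sim1", "sim2"),
                   spp = c("elk", "deer"),
                   abundance = c(250, 500),
                   stringsAsFactors = FALSE)

# Long format with covariates: one row per project x simulation x nAnimals x Contour.
rare_long <- merge(rare_long, covs, by = "projName", all.x = TRUE)

# Wide format: one area column per contour level (area_sqkm.50, area_sqkm.95).
rare_wide <- reshape(rare_long,
                     idvar = c("projName", "simulation", "nAnimals"),
                     timevar = "Contour", v.names = "area_sqkm",
                     direction = "wide")
str(rare_wide)
```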

3_areaAccumulation.R

This R script estimates the number of animals needed for utilization distribution (UD) area estimates to stabilize using asymptotic regression. It models UD area vs. sample size across projects, seasons, and contour levels.

Requirements:

R packages: nlme, aomisc, stringr, drc, ggplot2, statforbiology
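
The idea of asymptotic regression on an area accumulation curve can be sketched with base R's `nls` and the self-starting `SSasymp` model. This is not the submitted script, which relies on the packages listed above; the data are simulated, and the sufficiency rule used here (the fitted area reaching 95% of the asymptote) is an assumption chosen only for illustration.

```r
set.seed(7)

# Hypothetical mean rarefaction curve for one project, season, and contour.
n <- 1:60
area <- 500 - (500 - 50) * exp(-0.1 * n) + rnorm(length(n), sd = 10)
dat <- data.frame(nAnimals = n, area_sqkm = area)

# SSasymp parameterisation: Asym + (R0 - Asym) * exp(-exp(lrc) * nAnimals)
fit <- nls(area_sqkm ~ SSasymp(nAnimals, Asym, R0, lrc), data = dat)
est <- coef(fit)
rate <- exp(est[["lrc"]])

# Assumed sufficiency rule: smallest n whose fitted area reaches 95% of Asym.
q <- 0.95
n_needed <- ceiling(
  -log((1 - q) * est[["Asym"]] / (est[["Asym"]] - est[["R0"]])) / rate
)
n_needed
```

Solving the fitted curve for the sample size at a chosen fraction of the asymptote gives a single number per project, season, and contour level, which can then be related to covariates such as species or abundance.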

example_data.zip

This archive contains simulated input data for the rarefaction workflow, organized as follows.

example_data/
│
├── RarefactionLONGWCovs.rda  
│   # Final cleaned and formatted dataset used as input for modeling.  
│   # Generated from `2_CompileAddCovariates.R` and ready for analysis in
│   # `3_areaAccumulation.R`.
│   # This data frame (object named `rare_long`) includes the following variables:
│   #   - anID         : Character ID for the individual animal.
│   #   - nAnimals     : Number of animals included in the rarefaction sample.
│   #   - simulation   : Simulation replicate number.
│   #   - projName     : Project or scenario name (e.g., "sim2").
│   #   - UD           : Season of UD (e.g., "mig", "winter", "summer").
│   #   - Contour      : Isopleth value used to define the UD (e.g., 50 for 50% UD).
│   #   - area_sqkm    : Area of the estimated UD isopleth in square kilometers.
│   #   - [additional covariates] : Variables to be added in `2_CompileAddCovariates.R`
│   #     and used for modeling, such as:
│   #       - spp         : Species code (e.g., "elk", "deer").
│   #       - abundance   : Simulated or real population abundance value (e.g., 250).
│   #       - allDuration : Project duration (numerical).
│
├── outputs/  
│   # Intermediate outputs from `1a_CorridorRarefaction.R` and `1b_SeasonalRangeRarefaction.R`.
│   ├── *.rda  
│   ├── *.RData  
│   └── *.csv
│
└── projFolders/  
    # Raw input data (simulated utilization distribution [UD] rasters).  
    # All rasters are in 500 m resolution and share a common coordinate system and extent.
    |
    └── pop250/  
        # Example population of 250 individuals (other populations could be included
        # similarly).
        ├── footprints_pop/  
        │   └── *.asc  # Migration 99% contour footprint rasters
        │              # Filenames follow the convention: AnimalID_99pct_contour.asc
        │              # For example, `14_99pct_contour.asc` indicates: AnimalID = 14.
        └── UDsWinter/  
            └── *.asc  # Seasonal (winter) utilization distribution rasters
                       # Filenames follow the convention: AnimalID_SeasonYear_ASCII.asc
                       # `1_wi22_ASCII.asc`: AnimalID = 1, SeasonYear = wi22 (winter 2022).
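
The filename conventions described in the tree above can be parsed with a few lines of base R. This is a sketch under stated assumptions: the `info` data frame and its column names other than `anID` are hypothetical, and the two-digit year is assumed to fall in the 2000s, as in the `wi22` (winter 2022) example.

```r
files <- c("14_99pct_contour.asc", "1_wi22_ASCII.asc")

# Split "AnimalID_tag_suffix.asc" into its underscore-separated parts.
stem <- sub("\\.asc$", "", basename(files))
parts <- strsplit(stem, "_", fixed = TRUE)

info <- data.frame(
  file = files,
  anID = vapply(parts, `[`, character(1), 1),  # kept as character, like `anID`
  tag  = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

# Seasonal rasters carry a SeasonYear tag such as "wi22"; footprints do not.
seasonal <- grepl("^[a-z]{2}[0-9]{2}$", info$tag)
info$season <- NA_character_
info$year <- NA_integer_
info$season[seasonal] <- substr(info$tag[seasonal], 1, 2)
info$year[seasonal] <- 2000L + as.integer(substr(info$tag[seasonal], 3, 4))
info
```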

Simulation details

Each folder under pop250/ includes UD rasters at 500 m resolution, simulated for 250 individuals.

To generate the data:

Seasonal range movements: We created 250 random starting points drawn from a normal distribution centered on a mean centroid, with a standard deviation of 10 km on both the x- and y-axes.
Migratory movements: We defined two seasonal range centroids 99.24 km apart (based on average empirical mule deer migration data). Start and end points were randomly drawn around these centroids from a normal distribution with a 10 km standard deviation.
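
The start and end point draws described above might look like the following sketch. The centroid coordinates, the east-west orientation of the migration axis, and all object names are assumptions made for illustration; only the 250 individuals, the 10 km standard deviation, and the 99.24 km separation come from the description.

```r
set.seed(2025)
n_ind <- 250
sd_km <- 10

# Assumed layout: seasonal range centroids 99.24 km apart along the x-axis.
winter_centroid <- c(x = 0, y = 0)
summer_centroid <- c(x = 99.24, y = 0)

# Draw n points around a centroid, normally distributed on both axes.
draw_points <- function(centroid, n, sd) {
  data.frame(x = rnorm(n, centroid[["x"]], sd),
             y = rnorm(n, centroid[["y"]], sd))
}

starts <- draw_points(winter_centroid, n_ind, sd_km)  # seasonal range / migration start
ends   <- draw_points(summer_centroid, n_ind, sd_km)  # migration end

# Straight-line migration distance per simulated individual (km).
migration_km <- sqrt((ends$x - starts$x)^2 + (ends$y - starts$y)^2)
summary(migration_km)
```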

UD rasters were generated using Brownian Bridge Movement Models (see manuscript for full methodological details).
