Demography and environment modulate the effects of genetic diversity on extinction risk in a butterfly metapopulation
Data files
Jul 25, 2024 version files 9.91 MB
- DiLeo_et_al_DRYAD_ver3.zip
- README.md
Abstract
Linking genetic diversity to extinction is a common goal in genomic studies. Recently, a debate has arisen regarding the importance of genetic variation in conservation, as some studies have failed to find associations between genome-wide genetic diversity and extinction risk. However, only rarely are genetic diversity and fitness measured together in the wild, and typically demographic history and environment are ignored. It is therefore difficult to infer whether the lack of an association is real or obscured by confounding factors. To address these shortcomings, we analysed genetic data from 7,501 individuals together with extinction data from 279 meadows and mortality data from 1,742 larval nests in a butterfly metapopulation. We found a strong negative association between genetic diversity and extinction when considering only heterozygosity in models. However, this association disappeared when accounting for ecological covariates, suggesting confounding between demography and genetics and a more complex role for heterozygosity in extinction risk. Modelling interactions between heterozygosity and demographic variables revealed that associations between extinction and heterozygosity were context-dependent. For example, extinction declined with increasing heterozygosity in large, but not in currently small, populations, although negative associations between heterozygosity, extinction, and mortality were detected in small populations with a recent history of decline. We conclude that low genetic diversity is an important predictor of extinction, predicting a >25% increase in extinction beyond ecological factors in certain contexts. These results highlight that inferences about the importance of genetic diversity for population viability should not rely on genomic data alone but require investment in obtaining demographic and environmental data from natural populations.
README
This README file was generated on 2024-03-21 by Michelle DiLeo.
GENERAL INFORMATION
1. Title of Dataset: Demography and environment modulate the effects of genetic diversity on extinction risk in a butterfly metapopulation
2. Author Information
A. Principal Investigator Contact Information
Name: Marjo Saastamoinen
Institution: University of Helsinki
Address: Helsinki, FI
Email: marjo.saastamoinen@helsinki.fi
B. Associate or Co-investigator Contact Information
Name: Michelle DiLeo
Institution: Ontario Ministry of Natural Resources
Address: Peterborough, ON Canada
Email: michelle.dileo@ontario.ca
3. Date of data collection (single date, range, approximate date): 2007-2012
4. Geographic location of data collection: Aland Islands, Finland
5. Information about funding sources that supported the collection of the data: Academy of Finland, Natural Sciences and Engineering Research Council of Canada, Helsinki Institute of Life Science
SHARING/ACCESS INFORMATION
- Licenses/restrictions placed on the data: CC0 1.0 Universal (CC0 1.0) Public Domain
- Links to publications that cite or use the data: In Review
- Links to other publicly accessible locations of the data: None
- Links/relationships to ancillary data sets: None
- Was data derived from another source? Yes
A. If yes, list source(s): Fountain, T., Husby, A., Nonaka, E., DiLeo, M.F., Korhonen, J.H., Rastas, P., Schulz, T., Saastamoinen, M. & Hanski, I., (2018). Inferring dispersal across a fragmented landscape using reconstructed families in the Glanville fritillary butterfly. Evolutionary Applications, 11(3), pp.287-297.
- Recommended citation for this dataset:
DiLeo, M. F., Nair, A., Kardos, M., Husby, A., & Saastamoinen, M. (2024). Data from: Demography and environment modulate the effects of genetic diversity on extinction risk in a butterfly metapopulation. Dryad Digital Repository. https://doi.org/10.5061/dryad.905qfttrg
DATA & FILE OVERVIEW
- File List:
A) empirical_models/data/RAWDATA/colonyFamilies.csv
B) empirical_models/data/RAWDATA/fall_survey_2004_2013.csv
C) empirical_models/data/RAWDATA/nest_survival.csv
D) empirical_models/data/RAWDATA/raw.snps.csv
E) empirical_models/data/nests.245snp_21122023.csv
F) empirical_models/data/nests_withMerged.245snp_21122023.csv
G) empirical_models/data/patch.245snp_21122023.csv
H) empirical_models/helperFunctions/getSi.R
I) empirical_models/helperFunctions/pitFunctions.R
J) empirical_models/prepareModelData.R
K) empirical_models/nestMortality.R
L) empirical_models/popExtinction.R
M) empirical_models/popExtinctionOW.R
N) empirical_models/SEM.R
O) power_analysis/rCode_Åland_butterflies_familyAnalysis_23February2024.R
P) power_analysis/rCode_Åland_butterflies_PatchExtinction_power_23Feb2024.R
Q) power_analysis/rCode_estiamteProcessHetSDWithinNests_22Feb2024.R
R) power_analysis/rCode_sim_het_var_12Feb2024.R
S) power_analysis/pedR
- Relationship between files, if important: None
- Additional related data collected that was not included in the current data package: None
- Are there multiple versions of the dataset? No
A. If yes, name of file(s) that was updated: NA
i. Why was the file updated? NA
ii. When was the file updated? NA
#########################################################################
DATA-SPECIFIC INFORMATION FOR: empirical_models/data/RAWDATA/colonyFamilies.csv
These data contain information on reconstructed full-sibling families from Fountain, T., Husby, A., Nonaka, E., DiLeo, M.F., Korhonen, J.H., Rastas, P., Schulz, T., Saastamoinen, M. & Hanski, I., (2018). Inferring dispersal across a fragmented landscape using reconstructed families in the Glanville fritillary butterfly. Evolutionary Applications, 11(3), pp.287-297.
- Number of variables: 4
- Number of cases/rows: 8322
- Variable List:
- ID: unique identification number for individual larval DNA
- Region: site where data were collected (Sottunga, Saltvik, Foglo)
- Year: sampling year (2007-2012)
- COLONY2 Family ID: identification number of assigned full-sibling family. Note: must be combined with Year to be unique
- Missing data codes: None
- Specialized formats or other abbreviations used: None
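Because COLONY2 family IDs repeat across years, a Year-qualified key is needed before families can be counted or joined to other files. The following is a minimal Python sketch of that step, not the authors' code: the sample rows are invented, the column headers are assumed to match the variable names listed above, and `add_family_key` and `family_key` are hypothetical names.

```python
import csv
import io

# Invented excerpt; the real file is empirical_models/data/RAWDATA/colonyFamilies.csv
# and its exact header spelling should be checked before use.
sample = io.StringIO(
    "ID,Region,Year,COLONY2 Family ID\n"
    "1001,Sottunga,2009,1\n"
    "1002,Sottunga,2009,1\n"
    "2001,Saltvik,2010,1\n"
)

def add_family_key(rows):
    """Attach a Year-qualified family key, since family IDs repeat across years."""
    keyed = []
    for row in rows:
        row = dict(row)
        row["family_key"] = f"{row['Year']}_{row['COLONY2 Family ID']}"
        keyed.append(row)
    return keyed

families = add_family_key(csv.DictReader(sample))
family_keys = sorted({r["family_key"] for r in families})
print(family_keys)
```

In this invented sample, family ID 1 appears in both 2009 and 2010, and the combined key keeps the two families apart.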
#########################################################################
DATA-SPECIFIC INFORMATION FOR: empirical_models/data/RAWDATA/fall_survey_2004_2013.csv
These data originate from annual fall surveys
- Number of variables: 17
- Number of cases/rows: 36704
- Variable List:
- Patch: habitat patch (meadow) where data were collected
- Year: survey year (2004-2013)
- Survey: survey identification code (A=Aland, F=Fall, #=Year)
- Network: identification number for group of patches considered in the same semi-independent network
- Area: patch size in square meters
- Centroid_X: geographic X coordinate of patch (coordinate system=EPSG:3067)
- Centroid_Y: geographic Y coordinate of patch (coordinate system=EPSG:3067)
- Occupancy: occupancy status of patch (0=unoccupied, 1=occupied)
- Nest_count: number of larval nests in patch
- Grazing_percent: percentage of the patch that had evidence of cattle grazing (0-100)
- Vs: abundance of Veronica spicata in patch (0=none present, 1=very sparse, 2=at least one dense group which could support one larval group but no more, 3=at least one large high quality patch of plants that could support tens of larval nests)
- Vs_dry: percentage of desiccated Veronica spicata (0-100)
- Vs_low: percentage of Veronica spicata plants growing in low vegetation (0-100)
- Pl: abundance of Plantago lanceolata in patch (0=none present, 1=very sparse, 2=at least one dense group which could support one larval group but no more, 3=at least one large high quality patch of plants that could support tens of larval nests)
- Pl_dry: percentage of desiccated Plantago lanceolata (0-100)
- Pl_low: percentage of Plantago lanceolata plants growing in low vegetation (0-100)
- Age: number of years the patch has been continuously occupied
- Missing data codes: n/a (data not available)
- Specialized formats or other abbreviations used: None
#########################################################################
DATA-SPECIFIC INFORMATION FOR: empirical_models/data/RAWDATA/nest_survival.csv
These data contain information on overwintering survival of individual nests. Data originate from fall and spring surveys
- Number of variables: 4
- Number of cases/rows: 20685
- Variable List:
- Dataset: dataset identification number
- Year: year of fall survey
- Family: unique identifier for individual nest
- survived: survival of nest overwinter (1=survived, 0=died)
- Missing data codes: none
- Specialized formats or other abbreviations used: None
#########################################################################
DATA-SPECIFIC INFORMATION FOR: empirical_models/data/RAWDATA/raw.snps.csv
These data give individual genotypes of larvae from Fountain, T., Husby, A., Nonaka, E., DiLeo, M.F., Korhonen, J.H., Rastas, P., Schulz, T., Saastamoinen, M. & Hanski, I., (2018). Inferring dispersal across a fragmented landscape using reconstructed families in the Glanville fritillary butterfly. Evolutionary Applications, 11(3), pp.287-297.
- Number of variables: 261
- Number of cases/rows: 8322
- Variable List:
- DNA: unique identification number for individual larval DNA (corresponds to ID in file A)
- region: site where data were collected (Sottunga, Saltvik, Foglo)
- Year: sampling year (2007-2012)
- Patch: habitat patch (meadow) where data were collected
- Family: unique identifier for individual nest from which larva originated
- Sex: sex of larva (M=male, F=female)
- Centroid_X: geographic X coordinate of patch (coordinate system=EPSG:3067)
- Centroid_Y: geographic Y coordinate of patch (coordinate system=EPSG:3067)
- KASPX-XXX: genotype
- Missing data codes: n/a (data not available) or empty cell
- Specialized formats or other abbreviations used: none
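As an illustration of how the genotype columns can be summarised, the Python sketch below computes the proportion of heterozygous calls for one individual while skipping missing values. The genotype encoding is an assumption: calls are taken to be two-character strings such as `AG`, which this README does not specify, so the encoding in raw.snps.csv should be checked first. `observed_heterozygosity` is a hypothetical helper, not part of the deposited scripts.

```python
def observed_heterozygosity(genotypes, missing=("n/a", "")):
    """Proportion of heterozygous calls among non-missing genotypes.

    Assumes each call is a two-character string such as "AG" (hypothetical
    encoding; verify against the real genotype columns).
    """
    called = [g for g in genotypes if g is not None and g not in missing]
    if not called:
        return None  # no usable genotypes for this individual
    het = sum(1 for g in called if g[0] != g[1])
    return het / len(called)

# Invented calls for one individual: four usable genotypes, two heterozygous.
example = ["AA", "AG", "n/a", "GG", "CT", ""]
ho = observed_heterozygosity(example)
```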
#########################################################################
DATA-SPECIFIC INFORMATION FOR: empirical_models/data/nests.245snp_21122023.csv
These data were generated with empirical_models/prepareModelData.R using files A-D as input. These data were used in nest mortality models
- Number of variables: 25
- Number of cases/rows: 2873
- Variable List:
- Patch: habitat patch (meadow) where data were collected
- Year: sampling year (2009-2012)
- Family: unique identifier for individual nest
- Network: identification number for group of patches considered in the same semi-independent network
- Centroid_X: geographic X coordinate of patch (coordinate system=EPSG:3067)
- Centroid_Y: geographic Y coordinate of patch (coordinate system=EPSG:3067)
- merged: indicator of whether the nest consisted entirely of full siblings (0) or contained non-full siblings (1)
- n_genotypes: number of genotypes contributing to heterozygosity values
- Age: number of years the patch has been continuously occupied
- survived: survival of nest overwinter (1=survived, 0=died)
- Extinct_in_Tp1: variable indicating if local population in patch went extinct in the next fall survey year or not (0=persisted, 1=went extinct)
- Hs: expected heterozygosity of nest
- Ho: observed heterozygosity of nest
- Fis: inbreeding coefficient of nest
- Occupancy: occupancy status of patch (0=unoccupied, 1=occupied)
- Area: patch size in square meters
- Nest_count: number of larval nests in patch in year t (corresponds to variable:Year)
- Nests_Tm1: number of larval nests in patch in previous year (t-1)
- Nests_Tm2: number of larval nests in patch in year t-2
- connectivity: connectivity of patch
- Ntrend_Tm1_to_T: population growth-rate trend in the area surrounding the patch between year t-1 and year t
- Grazing_percent: percentage of the patch that had evidence of cattle grazing (0-100)
- host: abundance of most dominant host plant in patch (0=none present, 1=very sparse, 2=at least one dense group which could support one larval group but no more, 3=at least one large high quality patch of plants that could support tens of larval nests)
- dryhost: percentage of desiccated host plant (0-100)
- lowhost: percentage of host plant growing in low vegetation (0-100)
- Missing data codes: n/a (data not available)
- Specialized formats or other abbreviations used: None
#########################################################################
DATA-SPECIFIC INFORMATION FOR: empirical_models/data/nests_withMerged.245snp_21122023.csv
These data were generated with empirical_models/prepareModelData.R using files A-D as input. They contain the data from file E plus additional data from merged nests. These data were used for overwintering extinction models
- Number of variables: 25
- Number of cases/rows: 3572
- Variable List:
- Patch: habitat patch (meadow) where data were collected
- Year: sampling year (2009-2012)
- Family: unique identifier for individual nest
- Network: identification number for group of patches considered in the same semi-independent network
- Centroid_X: geographic X coordinate of patch (coordinate system=EPSG:3067)
- Centroid_Y: geographic Y coordinate of patch (coordinate system=EPSG:3067)
- merged: indicator of whether the nest consisted entirely of full siblings (0) or contained non-full siblings (1)
- n_genotypes: number of genotypes contributing to heterozygosity values
- Age: number of years the patch has been continuously occupied
- survived: survival of nest overwinter (1=survived, 0=died)
- Extinct_in_Tp1: variable indicating if local population in patch went extinct in the next fall survey year or not (0=persisted, 1=went extinct)
- Hs: expected heterozygosity of nest
- Ho: observed heterozygosity of nest
- Fis: inbreeding coefficient of nest
- Occupancy: occupancy status of patch (0=unoccupied, 1=occupied)
- Area: patch size in square meters
- Nest_count: number of larval nests in patch in year t (corresponds to variable:Year)
- Nests_Tm1: number of larval nests in patch in previous year (t-1)
- Nests_Tm2: number of larval nests in patch in year t-2
- connectivity: connectivity of patch
- Ntrend_Tm1_to_T: population growth-rate trend in the area surrounding the patch between year t-1 and year t
- Grazing_percent: percentage of the patch that had evidence of cattle grazing (0-100)
- host: abundance of most dominant host plant in patch (0=none present, 1=very sparse, 2=at least one dense group which could support one larval group but no more, 3=at least one large high quality patch of plants that could support tens of larval nests)
- dryhost: percentage of desiccated host plant (0-100)
- lowhost: percentage of host plant growing in low vegetation (0-100)
- Missing data codes: n/a (data not available)
- Specialized formats or other abbreviations used: None
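The lagged variables Nests_Tm1 and Nests_Tm2 can be thought of as lookups of the same patch's nest count one and two years earlier. A minimal Python sketch of that lookup, with invented counts and a hypothetical helper `add_lagged_counts`, is shown below; it is not the logic of prepareModelData.R, which may handle unsurveyed years differently.

```python
def add_lagged_counts(counts):
    """counts maps (patch, year) -> nest count; returns rows with t-1 and t-2 lags.

    A lag is None when the earlier patch-year is absent from the input.
    """
    rows = []
    for (patch, year), n in sorted(counts.items()):
        rows.append({
            "Patch": patch,
            "Year": year,
            "Nest_count": n,
            "Nests_Tm1": counts.get((patch, year - 1)),
            "Nests_Tm2": counts.get((patch, year - 2)),
        })
    return rows

# Invented nest counts for two patches.
counts = {(1, 2009): 3, (1, 2010): 5, (1, 2011): 0, (2, 2011): 4}
lagged = add_lagged_counts(counts)
```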
#########################################################################
DATA-SPECIFIC INFORMATION FOR: empirical_models/data/patch.245snp_21122023.csv
These data were generated with empirical_models/prepareModelData.R using files A-D as input. These data were used for annual and overwintering extinction models
- Number of variables: 22
- Number of cases/rows: 729
- Variable List:
- Patch: habitat patch (meadow) where data were collected
- Year: sampling year (2009-2012)
- Network: identification number for group of patches considered in the same semi-independent network
- Centroid_X: geographic X coordinate of patch (coordinate system=EPSG:3067)
- Centroid_Y: geographic Y coordinate of patch (coordinate system=EPSG:3067)
- n_genotypes: number of genotypes contributing to heterozygosity values
- Age: number of years the patch has been continuously occupied
- Extinct_in_Tp1: variable indicating if local population in patch went extinct in the next fall survey year or not (0=persisted, 1=went extinct)
- Hs: expected heterozygosity of local population in patch
- Ho: observed heterozygosity of local population in patch
- Fis: inbreeding coefficient of local population in patch
- Occupancy: occupancy status of patch (0=unoccupied, 1=occupied)
- Area: patch size in square meters
- Nest_count: number of larval nests in patch in year t (corresponds to variable:Year)
- Nests_Tm1: number of larval nests in patch in previous year (t-1)
- Nests_Tm2: number of larval nests in patch in year t-2
- connectivity: connectivity of patch
- Ntrend_Tm1_to_T: population growth-rate trend in the area surrounding the patch between year t-1 and year t
- Grazing_percent: percentage of the patch that had evidence of cattle grazing (0-100)
- host: abundance of most dominant host plant in patch (0=none present, 1=very sparse, 2=at least one dense group which could support one larval group but no more, 3=at least one large high quality patch of plants that could support tens of larval nests)
- dryhost: percentage of desiccated host plant (0-100)
- lowhost: percentage of host plant growing in low vegetation (0-100)
- Missing data codes: n/a (data not available)
- Specialized formats or other abbreviations used: None
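Extinct_in_Tp1 describes a transition between consecutive fall surveys: a patch occupied in year t that is unoccupied in year t+1. The Python sketch below derives such an indicator from an occupancy table. It is an illustration under stated assumptions: the occupancy values are invented, `extinct_in_next_year` is a hypothetical helper, and returning `None` for unsurveyed follow-up years is a choice made here rather than documented behaviour.

```python
def extinct_in_next_year(occupancy):
    """occupancy maps (patch, year) -> 0/1.

    For each occupied patch-year, returns 1 if the patch is unoccupied in
    year + 1, 0 if it is still occupied, and None if year + 1 is absent.
    Unoccupied patch-years are skipped because they cannot go extinct.
    """
    out = {}
    for (patch, year), occ in occupancy.items():
        if occ != 1:
            continue
        nxt = occupancy.get((patch, year + 1))
        out[(patch, year)] = None if nxt is None else int(nxt == 0)
    return out

# Invented occupancy histories for two patches.
occupancy = {(1, 2009): 1, (1, 2010): 0, (2, 2009): 1, (2, 2010): 1, (2, 2011): 1}
extinct = extinct_in_next_year(occupancy)
```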
#########################################################################
INFORMATION FOR: empirical_models/helperFunctions/getSi.R
This R script contains a function for calculating patch connectivity.
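For readers unfamiliar with connectivity measures, the Python sketch below shows one widely used form from metapopulation ecology, in which the connectivity of patch i is a distance-weighted sum over all other patches: S_i = sum over j != i of exp(-alpha * d_ij) * w_j. This is an assumption about the general approach, not a description of getSi.R, which may use different weights, scaling, or parameters. The value of `alpha`, the choice of weight, and the conversion to kilometres are all illustrative.

```python
import math

def connectivity(patches, alpha=1.0):
    """S_i = sum over j != i of exp(-alpha * d_ij) * w_j.

    patches: dict id -> (x_m, y_m, weight), with projected coordinates in
    metres (as in EPSG:3067); distances are converted to km. alpha (1/km)
    and the weight (e.g. nest count or area) are illustrative assumptions.
    """
    s = {}
    for i, (xi, yi, _) in patches.items():
        total = 0.0
        for j, (xj, yj, wj) in patches.items():
            if j == i:
                continue  # a patch does not contribute to its own connectivity
            d_km = math.hypot(xi - xj, yi - yj) / 1000.0
            total += math.exp(-alpha * d_km) * wj
        s[i] = total
    return s

# Two invented patches 1 km apart with weights 2 and 3.
patches = {"a": (0.0, 0.0, 2.0), "b": (1000.0, 0.0, 3.0)}
s = connectivity(patches, alpha=1.0)
```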
#########################################################################
INFORMATION FOR: empirical_models/helperFunctions/pitFunctions.R
This R script contains functions for adjusting probability integral transform (PIT) values for binomially distributed data
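PIT values from discrete responses are not uniform even under a correct model, which is why an adjustment is needed. One common adjustment, sketched below in Python, is the randomized PIT, which draws uniformly between F(y-1) and F(y). This is offered as background only: it is an assumption that pitFunctions.R does something comparable, and the functions `binom_cdf` and `randomized_pit` are hypothetical.

```python
import math
import random

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    k = min(k, n)
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def randomized_pit(y, n, p, rng):
    """Randomized PIT for a discrete outcome: uniform draw on [F(y-1), F(y)]."""
    lower = binom_cdf(y - 1, n, p)
    upper = binom_cdf(y, n, p)
    return lower + rng.random() * (upper - lower)

# A single Bernoulli outcome (n = 1) with an illustrative fitted probability of 0.3.
rng = random.Random(1)
u = randomized_pit(1, 1, 0.3, rng)
```

For an observed 1 with fitted probability 0.3, the draw falls between F(0) = 0.7 and F(1) = 1.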
#########################################################################
INFORMATION FOR: empirical_models/prepareModelData.R
This R script takes files A-D as input to prepare data used to run nest mortality and patch extinction models (files K-N).
This script also writes the file raw.snps_nest0removed_mergedRemoved_namesFixed.csv (empty cells indicate missing data), which is used in the power analysis.
#########################################################################
INFORMATION FOR: empirical_models/nestMortality.R
This R script runs INLA models for nest mortality. Analysis in the manuscript was run with R version 3.6.3
#########################################################################
INFORMATION FOR: empirical_models/popExtinction.R
This R script runs INLA models for population extinction. Analysis in the manuscript was run with R version 3.6.3
#########################################################################
INFORMATION FOR: empirical_models/popExtinctionOW.R
This R script runs INLA models for overwintering population extinction. Analysis in the manuscript was run with R version 3.6.3
#########################################################################
INFORMATION FOR: empirical_models/SEM.R
This R script runs structural equation models for population extinction. Analysis in the manuscript was run with R version 4.2.3
#########################################################################
INFORMATION FOR: power_analysis/rCode_Åland_butterflies_familyAnalysis_23February2024.R
This R script runs the power analysis for nest survival
#########################################################################
INFORMATION FOR: power_analysis/rCode_Åland_butterflies_PatchExtinction_power_23Feb2024.R
This R script runs the power analysis for population extinction
#########################################################################
INFORMATION FOR: power_analysis/rCode_estiamteProcessHetSDWithinNests_22Feb2024.R
This R script estimates the standard deviation in real heterozygosity within nests in the metapopulation
#########################################################################
INFORMATION FOR: power_analysis/rCode_sim_het_var_12Feb2024.R
This R script estimates the standard error in estimates of individual heterozygosity
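As a rough guide to the size of this sampling error, the Python sketch below uses a simple binomial approximation, SE = sqrt(h(1 - h) / L) for true heterozygosity h and L genotyped loci. This is an analytic shortcut, not the simulation carried out by the R script. It assumes independent loci with equal heterozygosity, and it takes L = 245 from the `245snp` part of the data file names, which is itself an assumption about what that label means.

```python
import math

def het_standard_error(h, n_loci):
    """Binomial approximation to the SE of an individual heterozygosity estimate.

    h: assumed true proportion of heterozygous loci (0-1).
    n_loci: number of genotyped loci, treated as independent and identical.
    """
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return math.sqrt(h * (1.0 - h) / n_loci)

# Illustrative values only: h = 0.3 and 245 loci.
se_245 = het_standard_error(0.3, 245)
```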
#########################################################################
INFORMATION FOR: power_analysis/pedR
This folder contains the R package pedR for pedigree simulation. Package author is Marty Kardos