Opening the museum’s vault: Historical field records preserve reliable ecological data
Data files
Oct 26, 2023 version files 797.78 KB
Abstract
Museum specimens have long served as foundational data sources for ecological, evolutionary, and environmental research. Continued reimagining of museum collections is now also generating new types of data associated with, but beyond physical specimens, a concept known as “extended specimens”. Field notes penned by generations of naturalists contain first-hand ecological observations associated with museum collections and comprise a form of extended specimens with the potential to provide novel ecological data spanning broad geographic and temporal scales. Despite their data-yielding potential, however, field notes remain underutilized in research due to their heterogeneous, unstandardized, and qualitative nature. We introduce an approach for transforming descriptive ecological notes into quantitative data suitable for statistical analysis. Tests with simulated and real-world published data show that field notes and our transformation approach retain reliable quantitative ecological information under a range of sample sizes and evolutionary scenarios. Unlocking the wealth of data contained within field records could facilitate investigations into the ecology of clades whose diversity, distribution, or other demographic features present challenges to traditional ecological studies, improve our understanding of long-term environmental and evolutionary change, and enhance predictions of future change.
README: Overview of Publication and Associated Data/Scripts
Citation to Associated Publication: Astudillo-Clavijo et al. 2023. Opening the museum’s vault: historical field records preserve reliable ecological data. American Naturalist.
Citation to Associated Data and Scripts: Astudillo-Clavijo et al. 2023. Opening the museum’s vault: historical field records preserve reliable ecological data [Dataset]. Dryad. https://doi.org/10.5061/dryad.59zw3r2cg.
Authors: Viviana Astudillo-Clavijo[1,*], Tobias Mankis[2], Hernán López-Fernández[1,3]
[1] Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, 48109, USA
[2] Ecology and Evolutionary Biology, University of Toronto, Toronto, M5S 3B2, Canada
[3] Museum of Zoology, University of Michigan, Ann Arbor, 48108, USA
[*] Corresponding Author: vivianaa@umich.edu
Data collection and analyses were performed by the corresponding author, Viviana Astudillo-Clavijo.
Repository Description: The Dryad repository associated with the cited publication (see publication and data citations above) contains scripts and subdirectories with all data used in the study.
Software Requirements
Data are available as .csv files, which can be opened with software such as Microsoft Excel or most text editors, and imported into most coding environments, including R.
Data analyses were performed in R version 4.0.2.
The following R packages were used in analyses:
factoextra 1.0.7
FactoMineR 2.4
corrplot 0.92
vegan 2.5-6
fmsb 0.7.2
ape 5.4
TreeSim 2.4
phytools 0.7-47
mvMORPH 1.1.3
ggplot2 3.3.3
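The package list above can be compared against a local R installation before running the scripts. The following is a minimal sketch, not part of the repository: check_pkgs() is a hypothetical helper name, and the check only reports versions rather than installing anything.

```r
# Package versions as listed in this README (name = version).
required_pkgs <- c(
  factoextra = "1.0.7", FactoMineR = "2.4", corrplot = "0.92",
  vegan = "2.5-6", fmsb = "0.7.2", ape = "5.4", TreeSim = "2.4",
  phytools = "0.7-47", mvMORPH = "1.1.3", ggplot2 = "3.3.3"
)

# check_pkgs() is a hypothetical helper, not part of the repository scripts.
# For each package it reports the listed version and the locally installed
# version (NA if the package is not installed). Nothing is loaded or installed.
check_pkgs <- function(pkgs) {
  found <- vapply(names(pkgs), function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageDescription(p, fields = "Version"))
    } else {
      NA_character_
    }
  }, character(1))
  data.frame(
    package = names(pkgs),
    listed = unname(pkgs),
    installed = unname(found),
    stringsAsFactors = FALSE
  )
}

print(check_pkgs(required_pkgs))
```

Newer package versions may work as well, but the analyses were only reported with the versions listed above.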
Repository Layout and Contents
BAUMBERGER_RAW.csv: Real-world habitat data, California, USA. Percent composition of different soil and vegetation types across spadefoot burrows.
Data Link: https://datadryad.org/stash/dataset/doi:10.5061/dryad.8359820
Column_Name Description
AnimalNumber Individual spadefoot identity
clay Percent of burrow sediment classified as clay
silt Percent of burrow sediment classified as silt
sand Percent of burrow sediment classified as sand
gravel Percent of burrow sediment classified as gravel
pebble Percent of burrow sediment classified as pebble
cobble Percent of burrow sediment classified as cobble
bolder_bedrock Percent of burrow sediment classified as boulders or bedrock
grass Percent cover of grass within 1m^2 of burrow opening
forbs Percent cover of forbs within 1m^2 of burrow opening
shrubs Percent cover of shrubs within 1m^2 of burrow opening
trees Percent cover of trees within 1m^2 of burrow opening
open_ground Percent cover of open ground within 1m^2 of burrow opening
leaf_litter Percent cover of recently fallen leaves within 1m^2 of burrow opening
duff Percent cover of dead and decomposing vegetation within 1m^2 of burrow opening
NA cells correspond to samples with missing information in the original dataset and are excluded from analysis in the associated R code.
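As an illustration of how rows with NA cells can be dropped after import, here is a minimal sketch. It uses a small invented stand-in for BAUMBERGER_RAW.csv (only a few of the documented columns, with made-up values), and complete.cases() is one common way to exclude incomplete rows; the repository scripts may handle missing values differently.

```r
# Toy stand-in for BAUMBERGER_RAW.csv using a few of the documented columns.
# The values are invented for illustration only.
toy <- data.frame(
  AnimalNumber = c("A1", "A2", "A3"),
  clay = c(10, NA, 30),
  silt = c(20, 25, 30),
  sand = c(70, 60, 40)
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# read.csv() treats the string "NA" as a missing value by default.
raw <- read.csv(path, stringsAsFactors = FALSE)

# Keep only rows with no missing values in any column.
complete <- raw[complete.cases(raw), ]
print(complete)
```

With the real file, replace the temporary path with the location of BAUMBERGER_RAW.csv inside the downloaded repository.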
CLOYED2017_RAW.csv: Real-world diet data, Kentucky, USA. Composition of different invertebrate groups across anuran stomach contents.
Data Link: https://datadryad.org/stash/dataset/doi:10.5061%2Fdryad.jh843
Column_Name Description
Species Species sampled
Site Habitat in which anuran was sampled
Sp_Site Combined denotation of sampled species and sampling site
A.Orthoptera Preference for orthopteran prey, based on Chesson's alpha computed using stomach contents
A.Beetle Preference for beetle prey, based on Chesson's alpha computed using stomach contents
A.Ant Preference for ant prey, based on Chesson's alpha computed using stomach contents
A.Flying Preference for miscellaneous flying prey (e.g., flying hymenopterans, dipterans, flying hemipterans, adult lepidopterans, adult odonates), based on Chesson's alpha computed using stomach contents
A.Non-Flying Preference for miscellaneous non-flying prey (e.g., spiders, flightless Pentatomidae, Reduviidae, Membracidae, non-flying hemipterans, larval lepidopterans), based on Chesson's alpha computed using stomach contents
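For readers unfamiliar with Chesson's alpha, the sketch below shows the standard form of the index for the case without prey depletion: each prey type's share in the diet is divided by its share in the environment, and the ratios are rescaled to sum to 1. This is an assumption about the general formula only; the exact calculation used for the source dataset is described in the original publication, and the function name and prey counts here are invented for illustration.

```r
# chesson_alpha() is a hypothetical helper illustrating the standard index:
#   alpha_i = (r_i / p_i) / sum_j (r_j / p_j)
# where r_i is the proportion of prey type i in the diet and p_i is its
# proportion in the environment. Values sum to 1; with m prey types,
# alpha_i = 1/m indicates no preference.
chesson_alpha <- function(diet, env) {
  stopifnot(length(diet) == length(env), all(env > 0))
  ratio <- (diet / sum(diet)) / (env / sum(env))
  ratio / sum(ratio)
}

# Invented counts of prey items in stomach contents and in the environment.
diet <- c(Orthoptera = 5, Beetle = 10, Ant = 5)
env  <- c(Orthoptera = 10, Beetle = 10, Ant = 20)

alpha <- chesson_alpha(diet, env)
print(round(alpha, 3))
```

In this toy example beetles are eaten more often than their availability predicts, so their alpha exceeds 1/3, while ants fall below it.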
MELISetal_RAW.csv: Real-world habitat data, Northern Sweden. Proportion of different natural and artificial habitat elements across habitats from which raccoon dogs were sampled.
Data link: https://datadryad.org/stash/dataset/doi:10.5061/dryad.q2k7k
Column_Name Description
Trajectory Identity of raccoon dog
Prop.Artificial Proportion artificial structures (e.g., cities, airports, roads) in sampled habitat
Prop.Broadleaved.forest Proportion broadleaved forest in sampled habitat
Prop.Conifer.forest Proportion coniferous forest in sampled habitat
Prop.Mixed.forest Proportion mixed forest in sampled habitat
Prop.Agricultural.area Proportion agricultural area in sampled habitat
Prop.Open.natural.area Proportion open natural area (e.g., meadows) in sampled habitat
Prop.Wetland Proportion wetland in sampled habitat
Prop.Water Proportion open water in sampled habitat
ROFFetal_RAW.csv: Real-world habitat data, Palau Archipelago. Percent composition of different benthic habitat elements across coral-reef sites.
Data link: https://datadryad.org/resource/doi:10.5061/dryad.0145sn6/1
Column_Name Description
Site Identity of sampled reef site
Hard coral Percent cover of hard coral groups
Soft coral / gorgonian / sponge Percent cover of soft coral, gorgonian, or sponge groups
Crustose corallines Percent cover of crustose corallines
Turf algae Percent cover of multi-species assemblages of tiny, primarily filamentous, algae
Macroalgae Percent cover of macroalgae
Lobophora Percent cover of Lobophora spp.
Non crustose corallines Percent cover of non-crustose corallines
Articulated corallines Percent cover of articulated corallines
SIEVERS2021_RAW.csv: Real-world habitat data, Philippines. Percent composition of different substrate and benthic cover elements across coral-reef sites.
Data Link: https://datadryad.org/stash/dataset/doi:10.5061%2Fdryad.0rxwdbrxs
Column_Name Description
Location_Status Name of sampled location and status as a fished or marine protected area (MPA)
P_Rub Percent rubble substrate
P_Sand Percent sand substrate
P_HC Percent benthic cover comprising hard coral
P_MA Percent benthic cover comprising macroalgae
P_EAM Percent benthic cover comprising epilithic algal matrix
P_SC Percent benthic cover comprising soft coral
Frag_C Percent benthic cover comprising fragile coral
Robust_C Percent benthic cover of robust coral
WOOD2019_RAW.csv: Real-world diet data, Michigan, USA. Percent composition of different pollen species in honeybee diets.
Data link: https://datadryad.org/stash/dataset/doi:10.5061%2Fdryad.913qd93
Column_Name Description
Site Sampling year and identity of hive sample site
Ambrosia Percent of honeybee pollen load from the Ambrosia genus
Brassicaceae Percent of honeybee pollen load from the Brassicaceae family
Centaurea Percent of honeybee pollen load from the Centaurea genus
Chenopodium Percent of honeybee pollen load from the Chenopodium genus
Cichorium/Sonchus Percent of honeybee pollen load from the Cichorium or Sonchus genera
Daucus Percent of honeybee pollen load from the Daucus genus
Plantago Percent of honeybee pollen load from the Plantago genus
Solidago Percent of honeybee pollen load from the Solidago genus
Trifolium pratense Percent of honeybee pollen load from the species Trifolium pratense
Trifolium repens Percent of honeybee pollen load from the species Trifolium repens
Zea mays Percent of honeybee pollen load from the species Zea mays
Rhus Percent of honeybee pollen load from the Rhus genus
Scripts
DistributionComps.R: Code for comparing distribution of values in real-world and simulated datasets (Fig. 5 in main manuscript). This script imports R Objects produced by the PubTests.R and SimTests.R scripts, so those two scripts must be run prior to the DistributionComps.R script.
PubTests.R: Code for performing tests of our transformation approach on six real-world published datasets.
SimTests.R: Code for simulating habitat data, and then performing tests of our transformation approach on simulated datasets.
How to Use
Prior to running scripts do the following:
(1) Download the repository directory from Dryad to your local computer
(2) Unpack the repository directory and all subdirectories on your local computer
(3) Open R and install/load required software for the script you are working with (see the "Prepare the workspace" section of the script you are working with for a list of required software)
(4) All scripts should be run in R from within the downloaded repository directory. Navigate to the correct working directory in R with the setwd() function. For example, if the downloaded repository directory is in your local Downloads folder and is called "doi_10.5061_dryad.59zw3r2cg__v3", then in R set the working directory with the following command:
setwd("~/Downloads/doi_10.5061_dryad.59zw3r2cg__v3")
SimTests.R and PubTests.R can be run in any order. DistributionComps.R can only be run once SimTests.R and PubTests.R have been completed, as it uses files output by these latter two scripts.
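The run order described above can be sketched as a small wrapper. This is an illustrative sketch rather than part of the repository: run_all() is a hypothetical function, and it assumes the three scripts sit at the top level of the unpacked repository directory.

```r
# run_all() is a hypothetical wrapper, not part of the repository.
# It assumes the three scripts are at the top level of `repo_dir`.
run_all <- function(repo_dir, runner = source) {
  first <- c("SimTests.R", "PubTests.R")  # these two can run in either order
  last  <- "DistributionComps.R"          # needs the outputs of the two above
  scripts <- file.path(repo_dir, c(first, last))

  # Stop early if any script is missing from the directory.
  missing <- scripts[!file.exists(scripts)]
  if (length(missing) > 0) {
    stop("Scripts not found: ", paste(missing, collapse = ", "))
  }

  # Run from inside the repository directory, then restore the old directory.
  old_wd <- setwd(repo_dir)
  on.exit(setwd(old_wd), add = TRUE)
  for (s in basename(scripts)) runner(s)
  invisible(basename(scripts))
}

# Example (not run): run_all("path/to/unpacked/repository")
```

The runner argument defaults to source(), which executes each script in full; the simulation script may take a while to complete.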
Methods
Simulations are performed using the R code provided.
Real-world data are provided as .csv files. These data were downloaded from previously published articles and reduced to retain only the variables used in our study. See the associated article and scripts for citations of the source articles.