Data from: Vespula pensylvanica locate odor sources across diverse natural wind conditions
Data files
Nov 13, 2025 version files (14.62 GB total)
- farm_preprocessed_hdfs.zip (4.17 GB)
- files_used_in_analysis.zip (3.25 GB)
- forest_preprocessed_hdfs.zip (1.51 GB)
- raw_wind_data.zip (4.24 GB)
- README.md (10.51 KB)
- urban_preprocessed_hdfs.zip (1.44 GB)
Abstract
Many organisms across ecosystems track odor plumes to locate mates and food. This behavior holds
particular ecological significance for flying animals, such as bees and birds, due to their critical role in crop
pollination. In flying insects, the task of localizing an odor source is particularly challenging due to the
complicated dynamics associated with wind flow and odor plume dispersion through spatially complex
environments. Though prior studies have discussed the role of wind dynamics in the foraging and flight
behaviors of bees, our knowledge of how wind characteristics influence insects’ success and strategies to
locate odor sources remains an open area of investigation. Here, we tested whether certain wind conditions
were more favorable for foraging insects by comparing yellowjacket arrival times and corresponding wind
conditions across three distinct natural environments. Our results indicated that Vespula pensylvanica
were capable of locating odor sources across the full range of observed wind conditions, without any clear
preferences. This suggests that insects have adapted strategies to perform odor localization tasks across
the full spectrum of natural wind that they may encounter. These experiments provide insight into key
considerations for future wind tunnel experiments which seek to better resolve insect plume tracking in
understudied flow conditions.
[Access this dataset on Dryad](https://doi.org/10.5061/dryad.wwpzgmsx0)
Description of the data and file structure
The data provided in this repository includes:
- Raw wind files (.bin format) collected over a ~3 month period (raw_wind_data.zip)
- Processed wind files (.hdf format), which can be read directly in Python (farm_preprocessed_hdfs.zip, urban_preprocessed_hdfs.zip, forest_preprocessed_hdfs.zip)
- Simplified wind and arrival/trigger data for ease of analysis, as the combined preprocessed wind data is a large dataset (files_used_in_analysis.zip)
Examples of how to use these files can be found in the associated GitHub repository.
Files and variables
File: raw_wind_data.zip
Description: All raw wind files collected (in .bin format). Separate directories are included for each environment ("farm", "verdi" (i.e., the forest location), and "urban"). Subdirectories within each environment directory split the recordings into daily/weekly periods and are named in MMDDYYYY format.
Each binary file contains the data_type variable, with values:
millis - time since recording started (in milliseconds)
lat - gps latitude
lon - gps longitude
gps_time - hour/minute/second of data, recorded by GPS (UTC)
gps_date - year/month/day of data, recorded by GPS (UTC)
S2 - Horizontal wind speed (m/s)
wind - String variable with all recorded measurements, including wind speed (S2), horizontal wind direction (D), temperature (T), E-W wind velocity (U), N-S wind velocity (V), Z wind velocity (W), air density (AD).
File: farm_preprocessed_hdfs.zip
Description: Preprocessed wind files for the farm environment, produced by running the processing script in our GitHub repository on all raw wind .bin files. A list of any corrupt/erroneous files is also included as a separate .csv file. Each .hdf file corresponds to one preprocessed .bin file.
File: urban_preprocessed_hdfs.zip
Description: Preprocessed wind files for the urban environment, produced by running the processing script in our GitHub repository on all raw wind .bin files. A list of any corrupt/erroneous files is also included as a separate .csv file. Each .hdf file corresponds to one preprocessed .bin file.
File: forest_preprocessed_hdfs.zip
Description: Preprocessed wind files for the forest environment, produced by running the processing script in our GitHub repository on all raw wind .bin files. A list of any corrupt/erroneous files is also included as a separate .csv file. Each .hdf file corresponds to one preprocessed .bin file.
Below is a list of variable names used in these preprocessed hdf files:
lat - gps latitude
lon - gps longitude
year - year of data recording
month - month of data recording
day - day of data recording
hour - hour of data recording (UTC)
minute - minute of data recording (UTC)
second - second of data recording (UTC)
time_epoch - Epoch time
S2 - Horizontal wind speed (m/s)
D - Horizontal wind direction (0-360 degrees)
T - Temperature (in Celsius)
U - E-W wind velocity component (m/s)
V - N-S wind velocity component (m/s)
W - Z wind velocity component (m/s)
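The separate UTC date and time columns listed above can be combined into a single timestamp. The following is a minimal sketch using only the Python standard library. The helper names, the example date, and the fixed UTC-7 offset used for PDT are assumptions made for illustration, and the sketch assumes (this is not stated above) that `time_epoch` is seconds since 1970-01-01 UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: all recordings fall within daylight saving time, so a fixed
# UTC-7 offset is enough to represent Pacific Daylight Time (PDT).
PDT = timezone(timedelta(hours=-7), name="PDT")


def row_to_utc(year, month, day, hour, minute, second):
    """Combine the separate UTC columns of one row into an aware datetime.

    `second` may be fractional; the fractional part is kept as microseconds.
    """
    whole = int(second)
    micro = int(round((second - whole) * 1_000_000))
    base = datetime(int(year), int(month), int(day), int(hour), int(minute),
                    whole, tzinfo=timezone.utc)
    return base + timedelta(microseconds=micro)


def utc_to_epoch(dt):
    """Seconds since 1970-01-01 UTC for a timezone-aware datetime."""
    return dt.timestamp()


def utc_to_pdt(dt):
    """Express a UTC datetime in local PDT (fixed UTC-7 offset)."""
    return dt.astimezone(PDT)


# Hypothetical row values, not taken from the dataset.
example = row_to_utc(2024, 7, 4, 21, 30, 12.5)
print(example.isoformat())
print(utc_to_epoch(example))
print(utc_to_pdt(example).isoformat())
```

Comparing `utc_to_epoch(...)` against the stored `time_epoch` value for a few rows is a quick way to check whether the epoch assumption holds for these files.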
File: files_used_in_analysis.zip
Description: All files that are used directly in our data analysis and figure generation notebook (see the GitHub repository for all code/analysis). This .zip file contains multiple files:
- File: forest_df_6am_to_9pm.parquet
  Description: A concatenation of all files in 'forest_preprocessed_hdfs.zip', alongside the recorded trigger events in 'forest_master_sheet.csv'. Only wind data between 6am and 9pm is included, to keep the file size manageable and because nighttime wind recordings were not relevant to the analysis.
- File: farm_df_6am_to_9pm.parquet
  Description: A concatenation of all files in 'farm_preprocessed_hdfs.zip', alongside the recorded trigger events in 'farm_master_sheet.csv'. Only wind data between 6am and 9pm is included, to keep the file size manageable and because nighttime wind recordings were not relevant to the analysis.
- File: urban_df_6am_to_9pm.parquet
  Description: A concatenation of all files in 'urban_preprocessed_hdfs.zip', alongside the recorded trigger events in 'urban_master_sheet.csv'. Only wind data between 6am and 9pm is included, to keep the file size manageable and because nighttime wind recordings were not relevant to the analysis.
Relevant variable names in these preprocessed parquet files:
lat - gps latitude
lon - gps longitude
year - year of data recording
month - month of data recording
day - day of data recording
hour - hour of data recording (UTC)
minute - minute of data recording (UTC)
second - second of data recording (UTC)
time_epoch - Epoch time
S2 - Horizontal wind speed (m/s)
D - Horizontal wind direction (0-360 degrees)
T - Temperature (in Celsius)
U - E-W wind velocity component (m/s)
V - N-S wind velocity component (m/s)
W - Z wind velocity component (m/s)
trap_number - 'nan' if there was no arrival at that time, '1' or '2' during arrival events
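Because `trap_number` is 'nan' for every row without an arrival, arrival events can be isolated by filtering out the missing values. The sketch below shows one way to do this on plain Python records; the function names and the sample rows are hypothetical, and the missing value is handled whether it is stored as a float NaN, `None`, or the string 'nan', since the storage type is not specified above.

```python
import math
from collections import Counter


def is_arrival(trap_number):
    """Return True when trap_number holds a trap ID rather than a missing value."""
    if trap_number is None:
        return False
    if isinstance(trap_number, float) and math.isnan(trap_number):
        return False
    if isinstance(trap_number, str) and trap_number.strip().lower() in ("", "nan"):
        return False
    return True


def arrivals_per_trap(rows):
    """Count arrival rows for each trap (keys are the integer trap IDs)."""
    return Counter(
        int(float(row["trap_number"]))
        for row in rows
        if is_arrival(row["trap_number"])
    )


# Made-up rows that only mimic the column names listed above.
rows = [
    {"time_epoch": 0.0, "S2": 1.2, "trap_number": float("nan")},
    {"time_epoch": 0.1, "S2": 1.3, "trap_number": 1.0},
    {"time_epoch": 0.2, "S2": 1.1, "trap_number": "nan"},
    {"time_epoch": 0.3, "S2": 0.9, "trap_number": "2"},
    {"time_epoch": 0.4, "S2": 1.0, "trap_number": 1},
]

print(arrivals_per_trap(rows))
```

If the parquet files are loaded with pandas, the same selection can be written as a boolean mask such as `df[df["trap_number"].notna()]`, provided the missing values are stored as real NaN values rather than strings.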
- File: urban_master_sheet.csv
  Description: A list of all yellowjacket arrival times at the urban site throughout the period of data collection, and which trap they entered.
- File: farm_master_sheet.csv
  Description: A list of all yellowjacket arrival times at the farm site throughout the period of data collection, and which trap they entered.
- File: forest_master_sheet.csv
  Description: A list of all yellowjacket arrival times at the forest site throughout the period of data collection, and which trap they entered.
Relevant variable names in these csv files:
date_time - local date + time of arrival (PDT)
time_epoch - epoch time
trap_number - '1' or '2' depending on which trap the insect entered
- File: small_urban_10000.parquet
  Description: Wind data merged with urban trap arrival times. To reduce data size, only the 5000 frames before and the 5000 frames after each arrival are kept (frame rate is 10 Hz).
- File: small_forest_10000.parquet
  Description: Wind data merged with forest trap arrival times. To reduce data size, only the 5000 frames before and the 5000 frames after each arrival are kept (frame rate is 10 Hz).
- File: small_farm_10000.parquet
  Description: Wind data merged with farm trap arrival times. To reduce data size, only the 5000 frames before and the 5000 frames after each arrival are kept (frame rate is 10 Hz).
Relevant variable names in these preprocessed parquet files:
lat - gps latitude
lon - gps longitude
year - year of data recording
month - month of data recording
day - day of data recording
hour - hour of data recording (UTC)
minute - minute of data recording (UTC)
second - second of data recording (UTC)
time_epoch - Epoch time
S2 - Horizontal wind speed (m/s)
D - Horizontal wind direction (0-360 degrees)
T - Temperature (in Celsius)
U - E-W wind velocity component (m/s)
V - N-S wind velocity component (m/s)
W - Z wind velocity component (m/s)
trap_number - 'nan' if there was no arrival at that time, '1' or '2' during arrival events
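At a 10 Hz frame rate, 5000 frames correspond to 500 seconds (about 8.3 minutes) on each side of an arrival. The sketch below illustrates that arithmetic and how such a window could be cut from a longer recording. The function names are hypothetical, and whether the arrival frame itself is counted, or how windows near the start or end of a recording were handled in the published files, are assumptions here.

```python
FRAME_RATE_HZ = 10          # frame rate stated in the file descriptions
HALF_WINDOW_FRAMES = 5000   # frames kept before and after each arrival


def frames_to_seconds(n_frames, frame_rate_hz=FRAME_RATE_HZ):
    """Duration in seconds covered by n_frames at the given frame rate."""
    return n_frames / frame_rate_hz


def window_bounds(arrival_index, n_frames, half_window=HALF_WINDOW_FRAMES):
    """Return (start, stop) slice bounds around an arrival frame.

    The arrival frame is included (an assumption), and the window is
    clipped so that it never runs past either end of the recording.
    """
    start = max(0, arrival_index - half_window)
    stop = min(n_frames, arrival_index + half_window + 1)
    return start, stop


# Hypothetical recording of 100000 frames with an arrival at frame 20000.
start, stop = window_bounds(20000, 100000)
print(start, stop, frames_to_seconds(HALF_WINDOW_FRAMES))
```

The returned bounds can be used directly as a slice, for example `frames[start:stop]` on a list or `df.iloc[start:stop]` on a pandas DataFrame.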
- File: urban_daily_stats.hdf
  Description: Randomly sampled 5-minute statistics (mean wind speed, mean wind direction, standard deviation in wind speed, standard deviation in wind direction) taken across the full 'urban_df_6am_to_9pm.parquet' dataset.
- File: forest_daily_stats.hdf
  Description: Randomly sampled 5-minute statistics (mean wind speed, mean wind direction, standard deviation in wind speed, standard deviation in wind direction) taken across the full 'forest_df_6am_to_9pm.parquet' dataset.
- File: farm_daily_stats.hdf
  Description: Randomly sampled 5-minute statistics (mean wind speed, mean wind direction, standard deviation in wind speed, standard deviation in wind direction) taken across the full 'farm_df_6am_to_9pm.parquet' dataset.
Relevant variable names in these preprocessed hdf files:
avg_speed - 5 min average horizontal wind speed (m/s)
avg_dir - 5 min average horizontal wind direction (0-360 degrees)
avg_AD - 5 min average air density (kg/m^3)
std_speed - 5 min standard deviation in horizontal wind speed (m/s)
std_dir - 5 min standard deviation in direction (0-90 degrees)
std_AD - 5 min standard deviation in air density (kg/m^3)
avg_temp - 5 min average temperature (C)
trap_number - 'nan' if there was no arrival at that time, '1' or '2' during arrival events
time_epoch - epoch time
TKE - 5 min turbulent kinetic energy
environment - 'urban', 'farm' or 'forest'
local_time - Local time (PDT)
HMS_local - Hours, Minutes, Seconds (PDT)
hours - local hour at time of recording
TI - 5 minute average turbulence intensity
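For readers unfamiliar with the quantities above, the sketch below computes a mean direction, turbulence intensity, and turbulent kinetic energy from one window of samples using common textbook definitions. These definitions (a circular mean for direction, TI as the standard deviation of speed divided by mean speed, TKE as half the summed variances of the three velocity components, all with population statistics) are assumptions and may differ from the exact formulas used to produce these files; the processing scripts in the GitHub repository are the authoritative source. All function names and sample values are hypothetical.

```python
import math
from statistics import fmean, pstdev, pvariance


def circular_mean_deg(directions_deg):
    """Mean of angles in degrees that respects the 0/360 wrap-around."""
    s = fmean(math.sin(math.radians(d)) for d in directions_deg)
    c = fmean(math.cos(math.radians(d)) for d in directions_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def turbulence_intensity(speeds):
    """Standard deviation of wind speed divided by mean wind speed."""
    return pstdev(speeds) / fmean(speeds)


def turbulent_kinetic_energy(u, v, w):
    """Half the sum of the variances of the three velocity components (m^2/s^2)."""
    return 0.5 * (pvariance(u) + pvariance(v) + pvariance(w))


# Made-up samples standing in for one 5-minute window of U, V, W, S2 and D.
u = [1.0, 3.0]
v = [0.0, 0.0]
w = [2.0, 2.0]
speeds = [1.0, 3.0]
directions = [350.0, 10.0]

print(circular_mean_deg(directions))
print(turbulence_intensity(speeds))
print(turbulent_kinetic_energy(u, v, w))
```

The circular mean matters because an arithmetic mean of 350 and 10 degrees would give 180 degrees, which points in the opposite direction to both samples.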
- File: NOAA_daily_averages.csv
  Description: Daily averages for the Reno Tahoe Airport weather station (KRNO), available from NOAA's NCEI database.
Relevant variable names in this file:
NAME - Weather station name/state/country
LATITUDE - Latitude of station
LONGITUDE - Longitude of station
ELEVATION - Elevation of station
DATE - Date of weather measurement
PRCP - Daily total precipitation (inches)
Code/software
All relevant processing and analysis scripts can be found in our public GitHub repository (https://github.com/JaleesaHoule/YellowJacketTracking) or in the archived release of that repository (https://doi.org/10.5281/zenodo.17443918).
For information on the specific methodologies involved in data collection, see the associated article.
