Bat-aggregated time series workflow

Published Oct 15, 2024 on Dryad. https://doi.org/10.5061/dryad.w0vt4b8zf

Abstract

This dataset and its accompanying code provide radar-based detections of Brazilian free-tailed bats (Tadarida brasiliensis) across select regions of California and Texas, compiled using weather radar data from the NEXRAD (NEXt-generation weather RADar) system. NEXRAD radars, operated by the US National Weather Service, continuously monitor the airspace, detecting various airborne organisms including birds, insects, and bats.

The dataset was generated using the ‘BATS’ Python toolkit (program included), which automates the retrieval, processing, and classification of radar data. It employs a pre-trained machine learning model specifically designed to detect radar echoes associated with Brazilian free-tailed bats. The dataset includes the results from machine learning models trained and tested on radar data, which achieved an AUC of 0.963, demonstrating high accuracy in identifying bat activity. The dataset also includes pre-trained neural network and random forest models for reproducibility.

This dataset provides valuable spatiotemporal information on bat presence at a large landscape scale and across extended timeframes. By distilling radar data into efficient summaries of bat occurrence, the dataset enables researchers to explore patterns in bat activity and their potential ecosystem services, such as insect consumption, in agricultural regions.

Doppler Radar Data Pipeline

BATS is a Python-based algorithm that identifies Mexican free-tailed bats in weather radar data. BATS downloads, processes, and classifies large amounts of NOAA NEXRAD Weather Radar data using a pre-trained neural network.

Background

Mexican free-tailed bats (also commonly called Brazilian free-tailed bats) are a common bat species found across much of North and South America. Due to their voracious appetite and large roosting numbers, free-tailed bats are believed to provide invaluable ecosystem services in the form of pest control.

This project highlights a computer vision algorithm based on an artificial neural network that quantifies the occurrence of free-tailed bats over a given area within a given time frame.

Features
Prerequisites
Installation and Setup
Usage
Dataset
Model Information
Contributing
License
Acknowledgments

Features

Data Download: Automates the download of radar scans for specified dates.
Classification: Uses a trained model to classify radar scans.
Aggregation: Processes and aggregates radar data for analysis.

Prerequisites

Python 3.10.8
Conda (recommended for environment management)

Installation and Setup

Clone the Repository:

git clone https://github.com/bhyleee/bats_doppler_test.git
cd bats_doppler_test

Set up the conda environment

conda env create -f environment.yml
conda activate doppler_env

Additional Setup
- Download the pretrained neural network (and the other models, if needed). These models are hosted in the same Zenodo repository as the sample data.

Usage

Downloading/Accessing
```
python main.py [start_date] [end_date] [tower] --hours [hours] --start_time [start_time]
```
Replace [start_date], [end_date], [tower], [hours], and [start_time] with your desired values.
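As a rough illustration of how the command line above is structured, the sketch below parses three positional arguments and two optional flags with argparse. It is not the actual code in main.py: the date format, the time format, the integer type of --hours, and the example tower "KDAX" are all assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the usage line (hypothetical, not main.py itself)."""
    parser = argparse.ArgumentParser(description="Download and classify radar scans.")
    # Positional arguments, in the order shown in the usage line.
    parser.add_argument("start_date", help="first date to process (format assumed: YYYY-MM-DD)")
    parser.add_argument("end_date", help="last date to process (format assumed: YYYY-MM-DD)")
    parser.add_argument("tower", help="radar call sign, for example KDAX")
    # Optional flags; the types and defaults here are assumptions.
    parser.add_argument("--hours", type=int, default=None, help="number of hours to process")
    parser.add_argument("--start_time", default=None, help="time of day to start (format assumed: HH:MM)")
    return parser


# Example invocation with placeholder values.
args = build_parser().parse_args(
    ["2018-06-21", "2018-06-22", "KDAX", "--hours", "3", "--start_time", "03:00"]
)
print(args.start_date, args.end_date, args.tower, args.hours, args.start_time)
```

Check the repository's main.py for the formats and defaults it actually expects.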
Other Functionalities:
...

Dataset

The reference data used for model training and testing are in the data/reference directory hosted on Zenodo.
The radar data is sourced from the NOAA NEXRAD data repository and accessed through the Python packages Py-ART and NEXRADAWS.
For the purposes of demonstration, a sample dataset is included in the data/ directory hosted on Zenodo.

File Descriptions:

There are three main data types attached to this project and database.

There are three CSV (comma-separated value) files of reference data used to train the machine learning models. These files are titled "california_data.csv", "texas_data.csv", and "gauthreaux_and_diehl_2020_base_dataset.csv". As the names imply, the California dataset contains radar reference data created from California free-tailed bat observations, and the Texas dataset contains the equivalent for Texas. Finally, the Gauthreaux & Diehl dataset includes radar reference data from other species, as detailed in this paper (https://pubs.usgs.gov/publication/70250307).
1. For california_data.csv and texas_data.csv, the column headers represent radar variables or required information: date (date of data collection), cor (correlation coefficient), pha (differential phase, deg), dif (differential reflectivity, dB), ref (reflectivity, dB), spw (spectrum width, m/s), vel (velocity, m/s), and training_class (class label of the species). More information can be found in the publication.
2. For the Gauthreaux & Diehl dataset, the column headers represent radar variables and required information: scatterer (type of scatterer using 4-letter codes in Gauthreaux and Diehl), radar (call sign of WSR-88D radar from which data were drawn), date (YYYY-MM-DD), time (in decimal hours), sweep (integer count; always 0 relating to the lowest elevation angle ~0.5 deg), rndAz (azimuth (deg); rounded to nearest 0.25 deg in 0.50 deg increments), rndBegGate (range to the leading edge of a sample volume (m); rounded to nearest 250 m), vel (radial velocity (m/s)), ref (radar reflectivity factor (dB)), spw (spectrum width (m/s)), dif (differential reflectivity (dB)), cor (correlation coefficient), pha (differential phase (deg)), velElevAngle (elevation angle (deg); applies to radial velocity, spectrum width).
The files ending in .pb and .index are required to re-train both the random forest and neural network models using the Python package TensorFlow. These files contain the model weights.
There are 12 geotiff files: 6 raw and 6 classified. The classified files show the final data output, a binary representation of pixels that contain bats versus those that do not, while each raw geotiff (e.g., 20180621_0336.tif) contains 6 layers, each representing a separate radar data type. Ideally, when one loads the pre-trained models from the .pb and .index files and classifies the raw geotiff layers, the output is identical to the classified_*.tif data.
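The column layout described for california_data.csv and texas_data.csv can be read into six-element feature vectors with the standard library alone. The sketch below is illustrative: the feature order, the function name load_reference_rows, the class labels "bat" and "weather", and every numeric value in the sample are made up for the example and are not taken from the dataset.

```python
import csv
import io

# The six radar variables named in the file description (order is an assumption).
FEATURES = ["cor", "pha", "dif", "ref", "spw", "vel"]


def load_reference_rows(handle):
    """Return (feature_vectors, labels) from an open reference CSV file."""
    vectors, labels = [], []
    for row in csv.DictReader(handle):
        try:
            vector = [float(row[name]) for name in FEATURES]
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric radar values
        vectors.append(vector)
        labels.append(row["training_class"])
    return vectors, labels


# Made-up rows for illustration only; real values come from the CSV files.
SAMPLE = """date,cor,pha,dif,ref,spw,vel,training_class
2018-06-21,0.45,120.0,2.5,18.0,3.1,-4.2,bat
2018-06-21,0.98,40.0,0.3,35.0,1.0,6.5,weather
2018-06-21,,55.0,1.1,12.0,2.0,1.5,bat
"""

vectors, labels = load_reference_rows(io.StringIO(SAMPLE))
print(len(vectors), labels)
```

To read a real file, pass an open file handle, for example `open("california_data.csv", newline="")`, in place of the in-memory sample.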

Model Information

Various popular machine learning algorithms were considered for this purpose, including Support Vector Machines, Random Forest, and Neural Networks. Ultimately, the neural network was chosen for the final model. These exploratory models may be trained and tested in the attached Jupyter notebooks.

Our classification task leverages a traditional Artificial Neural Network (ANN) constructed using the Keras Python Deep Learning package from Google’s TensorFlow library. The classifier takes a single pixel of the Cartesian radar grid as input, which is a six-dimensional vector of radar variables.

Following the methods of Chilson et al. (2019) and Zewdie et al. (2019), who used neural networks to track purple martins and pollen via NEXRAD, respectively, we structured our classifier as a feed-forward, fully connected network. It comprises:

A 6-unit input layer
Three intermediate 152-unit layers with ReLU activation functions
A concluding 2-unit layer with a SoftMax activation, outputting a scaled probability (between 0 and 1) signifying the likelihood of a pixel containing a bat swarm.
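As a quick sanity check on the size of this network, the layer widths listed above imply the following number of trainable parameters, assuming every layer is fully connected and has a bias term (the usual Keras Dense default, but an assumption here).

```python
# Layer widths from the description: input, three hidden layers, output.
layer_sizes = [6, 152, 152, 152, 2]


def count_parameters(sizes):
    """Sum weights (n_in * n_out) and biases (n_out) over consecutive layer pairs."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


total = count_parameters(layer_sizes)
print(total)  # 47882
```

The two 152-to-152 connections account for 46,512 of these parameters, so the hidden layers dominate the model size.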

This architecture was chosen based on several metrics including precision, recall scores, AUC, and a qualitative assessment rooted in established bat dispersal strategies. The assessment was performed on a validation set extracted from the primary training set. For a deeper dive into the training methodology, refer to section 2.4.

The model was trained for 20 epochs using mini-batches of size 32 and the Adam optimizer. The learning rate was set to 0.001, with Adam parameters at the recommended defaults of β1 = 0.9 and β2 = 0.999. Training was semi-supervised and used the standard cross-entropy loss function.
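To make the optimizer settings concrete, here is a minimal sketch of one Adam update for a single scalar parameter, using the learning rate and β values quoted above. The epsilon value and the function name adam_step are assumptions for the example. In practice Keras applies this update to every weight in the network.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the mean
    v_hat = v / (1 - beta2 ** t)                # bias correction for the variance
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# First step from theta = 1.0 with an illustrative gradient of 0.5.
theta, m, v = adam_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(theta)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate (about 0.001) regardless of the gradient's magnitude.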

Network Architecture:

\[
f(x) = \mathrm{softmax}\big(W_3\,\mathrm{relu}\big(W_2\,\mathrm{relu}\big(W_1\,\mathrm{relu}(W_0 x + b_0) + b_1\big) + b_2\big) + b_3\big)
\]

Equation 1: This represents the network's architectural flow, where x is the radar data input vector. The symbols W0...W3 and b0...b3 denote the network's trainable parameters.
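Equation 1 can be traced step by step with a small NumPy forward pass. This is a sketch with randomly initialized weights, not the trained model: the weight scale and random seed are arbitrary, and which of the two outputs corresponds to the bat class is not specified here.

```python
import numpy as np


def relu(z):
    """Element-wise rectified linear unit."""
    return np.maximum(z, 0.0)


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()


def forward(x, weights, biases):
    """Evaluate f(x): ReLU after W0..W2, softmax after W3."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return softmax(weights[-1] @ h + biases[-1])


# Random placeholder parameters with the shapes implied by the architecture.
rng = np.random.default_rng(0)
sizes = [6, 152, 152, 152, 2]
weights = [rng.normal(0.0, 0.1, size=(n_out, n_in)) for n_in, n_out in zip(sizes, sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

x = rng.normal(size=6)  # one pixel: six radar variables (placeholder values)
probs = forward(x, weights, biases)
print(probs, probs.sum())
```

The output is a two-element vector that sums to 1, matching the softmax layer described above.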

Contributing

Contributions are welcome! Please reach out here on GitHub or via email at brianlee52@ucsb.edu.

License

This project is licensed under the MIT License.

Acknowledgements

Thanks to Py-ART, an essential library used in this project.
Appreciation to the team or individuals who contributed to this project.

Data Description

This dataset provides detailed radar-based detections of Brazilian free-tailed bats (Tadarida brasiliensis) across select regions of California and Texas. The data were compiled from the NEXRAD (NEXt-generation weather RADar) system, which operates S-band Doppler weather radars across the United States. NEXRAD radars detect various airborne targets such as birds, insects, and bats.

The dataset is processed using the 'BATS' Python toolkit, which automates the retrieval and classification of radar data. Using radar data sourced from the Amazon Web Services (AWS) repository, the BATS toolkit classifies radar echoes based on a machine learning model trained to identify Brazilian free-tailed bats. The dataset contains bat presence information at a pixel resolution of 70 meters, derived from radar data over multiple time periods in 2018 and 2019. This data will be useful for researchers exploring bat ecology, insectivorous bat ecosystem services, and landscape-level bat monitoring.

The dataset includes:

Radar data processed to detect bat presence in California (2018) and Texas (2019)
Classified radar pixels indicating bat presence or absence
Machine learning-derived bat occurrence probabilities (thresholded for binary classification)
Geotiff files that aggregate radar data over six-month periods

Methods

Data Collection

The dataset was generated using NEXRAD radar data, sourced from AWS. The BATS Python toolkit facilitated the collection and processing of radar data files, automating the pipeline from raw radar retrieval to bat detection. Radar data was selected based on specific regions, timeframes, and weather conditions associated with confirmed Brazilian free-tailed bat emergence events. The radar data collected spans 11 weather-free days in California (2018) and 7 days in Texas (2019). Reference data on bat emergence was gathered from field observations provided by local bat monitoring organizations.

Data Processing

Once downloaded, the raw radar data (Level II “.gz” files) was processed using the Py-ART library, which is designed for radar data manipulation. Py-ART converted the radar data from its native polar coordinates into a uniform Cartesian grid, with a resampled pixel resolution of 70 meters to facilitate accurate bat detection.
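The idea behind the polar-to-Cartesian step can be sketched with basic trigonometry. The example below is a simplification and not what Py-ART does internally: it assumes a flat earth, ignores beam elevation and refraction, measures azimuth clockwise from north, and indexes pixels relative to the radar location. The function name gate_to_pixel is hypothetical.

```python
import math

PIXEL_SIZE_M = 70.0  # grid resolution stated in the text


def gate_to_pixel(range_m, azimuth_deg, pixel_size_m=PIXEL_SIZE_M):
    """Map a radar gate (range, azimuth) to x/y offsets and a grid cell index.

    Returns (x, y, col, row), where x is metres east and y is metres north of
    the radar, and col/row are integer cell offsets from the radar location.
    """
    az = math.radians(azimuth_deg)
    x = range_m * math.sin(az)  # east component
    y = range_m * math.cos(az)  # north component
    col = math.floor(x / pixel_size_m)
    row = math.floor(y / pixel_size_m)
    return x, y, col, row


# A gate 7 km due east of the radar falls 100 cells east of it.
x_east, y_east, col_east, row_east = gate_to_pixel(7000.0, 90.0)
# A gate 1 km due north falls 14 cells north of it.
x_north, y_north, col_north, row_north = gate_to_pixel(1000.0, 0.0)
print(col_east, row_east, col_north, row_north)
```

Because many polar gates can fall in one 70 m cell near the radar and none in some cells far away, real gridding also needs an interpolation or weighting scheme, which this sketch leaves out.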

The processed radar data was then classified using a machine learning pipeline. The BATS toolkit includes scripts for classification, in which radar echoes were evaluated by pre-trained machine learning models. The dataset was classified using three machine learning models: random forest (RF), support vector machines (SVM), and artificial neural networks (ANN). The ANN model, selected for its superior performance (AUC of 0.963), was used to classify each radar pixel as either containing or not containing Brazilian free-tailed bats. The model outputs a binary classification based on a 90% probability threshold to ensure accurate detection while minimizing false positives.
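The thresholding step, and one plausible way of aggregating scans, can be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, the comparison is taken to be inclusive (probability >= 0.9), and aggregation is assumed to be a per-pixel count of positive scans, none of which is confirmed by the text. The probability grids are made-up values.

```python
import numpy as np

THRESHOLD = 0.9  # 90% probability threshold from the text


def classify_scan(prob_grid, threshold=THRESHOLD):
    """Convert a grid of bat probabilities into a binary presence mask."""
    return (prob_grid >= threshold).astype(np.uint8)


def aggregate_scans(prob_grids, threshold=THRESHOLD):
    """Count, per pixel, how many scans were classified as containing bats."""
    total = np.zeros(prob_grids[0].shape, dtype=np.int32)
    for grid in prob_grids:
        total += classify_scan(grid, threshold)
    return total


# Two made-up 2x2 probability grids standing in for classified scans.
scans = [
    np.array([[0.95, 0.20], [0.90, 0.89]]),
    np.array([[0.99, 0.91], [0.10, 0.50]]),
]
counts = aggregate_scans(scans)
print(counts)
```

A high threshold such as 0.9 trades some recall for precision: pixels with moderately high probabilities (0.89 in the example) are treated as absence.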

Evaluation and Quality Control

To ensure the accuracy of the model and its classifications, the dataset was evaluated using standard binary classification metrics: precision, recall, AUC (Area Under the ROC Curve), and precision-recall curves. Hyperparameter tuning and spatial cross-validation were performed to account for spatial autocorrelation in the radar data and to improve the generalization of the machine learning models.
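The metrics named above can be computed directly from labels and scores. The sketch below uses only the standard library and made-up data. AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve. The function names are illustrative and not part of the BATS toolkit.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for six pixels.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.95, 0.92, 0.91, 0.30, 0.60, 0.10]
y_pred = [1 if s >= 0.9 else 0 for s in scores]  # same 0.9 threshold as above

precision, recall = precision_recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, auc)
```

Precision and recall depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why both kinds of metric are reported.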

Training data for the model was primarily sourced from California, while independent testing was conducted using radar data from Texas. The dataset also includes labeled data representing noise sources (such as birds, vehicles, and weather phenomena) to reduce false positives during classification.

By processing large volumes of radar data and applying machine learning algorithms, the BATS toolkit condensed terabytes of raw radar data into concise geotiff maps of bat presence, enabling efficient analysis of bat populations across landscapes.
