DroneZaic Dataset: a robust end-to-end pipeline for mosaicking freely flown aerial video of agricultural fields

Kharismawati, Dewi Endah; Kazic, Toni

Published Aug 06, 2025 on Dryad. https://doi.org/10.5061/dryad.r4xgxd2q7

Data files

Aug 06, 2025 version files (70.40 GB total)

1_raw_videos.zip: 38.21 GB
2_processed_data.zip: 30.96 GB
3_models.zip: 1.23 GB
README.md: 12.14 KB

Abstract

Unoccupied aerial vehicles (UAVs) are increasingly used for high-throughput phenotyping in quantitative genetics and breeding trials. In principle, freely flown vehicles would permit real-time flexibility in identifying and monitoring regions and plants of interest. Mosaicking multiple images provides a high-resolution global image, and consumer-grade UAVs offer low cost, ease of flying, and excellent RGB cameras. However, the vehicles’ inaccurate telemetry complicates estimating the homographies between pairs of frames during mosaicking, and accumulated errors distort later portions of the mosaic. Crop fields are particularly challenging because their regular planting pattern and very similar plants eliminate the distinctive features that guide mosaicking. To meet these challenges for a wider range of investigators, we propose DroneZaic, an end-to-end pipeline that dynamically samples video frames, automates camera and gimbal calibration, estimates homographies, and generates mini-mosaics. Together, these techniques significantly reduce errors in the output mosaics. Our unsupervised deep learning model component is trained on a comprehensive video dataset comprising different flight trajectories, maize lines, growth stages, and synthetic illumination data augmentation, which involves systematically altering lighting conditions and adding noise to enhance model generalizability. DroneZaic, with its refined CorNetv3, is more accurate (a 13.1% improvement in APE), 14.11 times faster than ASIFT, and more robust than our earlier CorNet and CorNetv2. We demonstrate DroneZaic’s effectiveness and generalizability in computing accurate mosaics of imagery captured by freely flown UAVs.

Dataset DOI: 10.5061/dryad.r4xgxd2q7

Description of the data and file structure

DroneZaic Dataset

Welcome to the DroneZaic dataset repository, accompanying our paper:

"DroneZaic: A robust end-to-end pipeline for mosaicking freely flown aerial video of agricultural fields."

This dataset supports the evaluation and reproducibility of our proposed end-to-end aerial video mosaicking pipeline.

Dataset Structure

The dataset is organized into four main directories:

`1_raw_videos.zip`

This directory contains the raw aerial video test data used in the paper. The videos were captured using different flight trajectories and at various growth stages of agricultural fields. Each subdirectory represents a specific flight mission.

Each video is accompanied by a subtitle file (.SRT) that can be parsed with common video or text processing tools. Each .SRT file contains timestamped metadata (such as GPS coordinates), which is particularly useful for synchronizing frames with geospatial coordinates and for downstream processing such as:

Mosaic generation
Crop growth analysis
Frame alignment across different missions

Once unzipped, the archive contains 11 subdirectories, each corresponding to a unique aerial flight mission:

1_seedling_linear
- DJI_0205.MOV, DJI_0205.SRT
2_seedling_serpentine_8passes
- DJI_0063.MOV, DJI_0063.SRT
- DJI_0064.MOV, DJI_0064.SRT
3_adult_linear
- DJI_0696.MOV, DJI_0696.SRT
4_adult_serpentine_2passes
- DJI_0700.MOV, DJI_0700.SRT
5_adult_serpentine_9passes
- DJI_0693.MOV, DJI_0693.SRT
6_pumpkin_patch_cruising
- DJI_0296.MOV, DJI_0296.SRT
- DJI_0297.MOV, DJI_0297.SRT
7_pumpkin_patch_linear
- DJI_0298.MOV, DJI_0298.SRT
8_farm_building
- DJI_0006.MOV, DJI_0006.SRT
- DJI_0007.MOV, DJI_0007.SRT
- DJI_0008.MOV, DJI_0008.SRT
9_sunflower
- DJI_0004.MOV, DJI_0004.SRT
10_seedling_linear_dji_air_2s
- DJI_0001.MOV, DJI_0001.SRT
11_adult_serpentine_2passes_dji_air_2s
- DJI_0044.MOV
- DJI_0045.MOV
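
The .SRT sidecar files can be read without any video tooling if they follow the usual SubRip layout (a numeric index, a `start --> end` timecode line, then free text). The sketch below is a minimal parser under that assumption. The `GPS_PATTERN` regex and the `SAMPLE` text are hypothetical: the exact telemetry wording depends on the UAV model and firmware, so inspect one file and adjust the pattern before relying on it.

```python
import re
from datetime import timedelta

TIMECODE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)
# ASSUMPTION: telemetry is written as "latitude: <float> ... longitude: <float>".
GPS_PATTERN = re.compile(
    r"latitude\s*:\s*(-?\d+(?:\.\d+)?).*?longitude\s*:\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE | re.DOTALL,
)


def _to_timedelta(h, m, s, ms):
    return timedelta(hours=int(h), minutes=int(m), seconds=int(s), milliseconds=int(ms))


def parse_srt(text):
    """Return one dict per subtitle block: start, end, text, and lat/lon if found."""
    entries = []
    # SubRip blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        tc_index = next((i for i, ln in enumerate(lines) if TIMECODE.search(ln)), None)
        if tc_index is None:
            continue  # not a subtitle block
        groups = TIMECODE.search(lines[tc_index]).groups()
        payload = " ".join(lines[tc_index + 1:])
        gps = GPS_PATTERN.search(payload)
        entries.append({
            "start": _to_timedelta(*groups[:4]),
            "end": _to_timedelta(*groups[4:]),
            "text": payload,
            "lat": float(gps.group(1)) if gps else None,
            "lon": float(gps.group(2)) if gps else None,
        })
    return entries


# Hypothetical sample; real files may name or order the fields differently.
SAMPLE = """1
00:00:00,000 --> 00:00:00,033
[latitude: 38.904000] [longitude: -92.281000]

2
00:00:00,033 --> 00:00:00,066
[latitude: 38.904010] [longitude: -92.281000]
"""
```

Calling `parse_srt(open("DJI_0205.SRT").read())` would then give one entry per subtitle block; entries whose text does not match `GPS_PATTERN` keep `lat` and `lon` as `None` rather than failing.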

`2_processed_data.zip`

DroneZaic is an end-to-end UAV video processing pipeline that transforms raw UAV videos into high-resolution global mosaics through a five-step processing workflow.

This zipped directory contains two fully processed datasets:

1_seedling_linear
5_adult_serpentine_9passes

The raw video data for these datasets is available separately in 1_raw_videos.zip.

Each subdirectory contains saved images and data from each processing step, starting from raw frame extraction and progressing all the way to the final stitched global mosaic. This structure is designed to showcase the step-by-step process of the DroneZaic pipeline in a transparent and reproducible way.

Detailed Description of Each Subfolder

raw/
- Contents: Extracted frames from the original UAV video using a dynamic sampling approach. This sampling method leverages Farnebäck dense optical flow to estimate camera motion and frame-to-frame overlap. By analyzing the motion vectors, the pipeline adaptively selects frames that maximize scene coverage and minimize redundant imagery, ensuring an optimal balance between data volume and spatial completeness.
- File format: .jpg, .tif, or .png images, depending on requirements. Use JPG or TIF if GPS metadata needs to be embedded in the image, and PNG for standard RGB images without embedded metadata.
- Naming convention: <year>_<month>_<day>_<videoID>_frame_<frame_number>.png
  - Example: 2023_06_23_205_frame_000001.png
- Purpose: Provides the original extracted frames.
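
The overlap-driven selection described for raw/ can be illustrated without an imaging library. The sketch below is not the DroneZaic implementation. It assumes a per-frame displacement in pixels has already been estimated (for example, as the median Farnebäck flow between consecutive frames) and keeps a frame whenever adding the next step would drop the overlap with the last kept frame below a target. The function name, the `min_overlap` parameter, and the overlap model (pure translation along the direction of travel) are illustrative assumptions.

```python
def select_frames(displacements, frame_extent, min_overlap=0.75):
    """Pick frame indices so consecutive kept frames overlap by >= min_overlap.

    displacements[i] is the estimated shift (pixels) from frame i to frame i+1
    along the direction of travel; frame_extent is the frame size (pixels)
    along that direction. Overlap is modelled as 1 - shift / frame_extent.
    """
    if not 0.0 < min_overlap < 1.0:
        raise ValueError("min_overlap must be in (0, 1)")
    max_shift = (1.0 - min_overlap) * frame_extent
    kept = [0]            # always keep the first frame
    accumulated = 0.0     # shift since the last kept frame
    for i, step in enumerate(displacements):
        step = abs(step)
        if accumulated + step > max_shift and accumulated > 0.0:
            kept.append(i)        # frame i is the last one within the overlap budget
            accumulated = 0.0
        accumulated += step
    last = len(displacements)
    if kept[-1] != last:
        kept.append(last)         # keep the final frame so the tail is covered
    return kept
```

With a 100-pixel frame extent and a 75% overlap target, a steady 10-pixel shift per frame keeps every second frame, while a slower 5-pixel shift keeps every fifth, which is the adaptive behaviour the description refers to.
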
quiver/
- Contents: Motion vector visualizations derived from optical flow computations between pairs of frames extracted during the dynamic sampling step. Each image represents the calculated motion vectors as quiver plots overlaid on the frame.
- File format: .png images with overlaid quiver plots.
- Naming convention: <year>_<month>_<day>_<videoID>_quiver_frame_<frame_number>.png
  - Example: 2023_06_23_205_quiver_frame_000002.png (generated from frames 2023_06_23_205_frame_000001.png and 2023_06_23_205_frame_000002.png)
- Purpose: Validates the dynamic sampling step by visualizing camera and scene motion.
calibrated/
- Contents: Frames after lens distortion correction and gimbal calibration. Each lens has its own intrinsic and extrinsic camera parameters used for correction, which are obtained using a checkerboard calibration procedure. These frames are geometrically corrected to ensure accurate alignment in later processing steps.
- File format: .jpg, .tif, or .png images, depending on requirements. Use JPG or TIF if GPS metadata needs to be embedded in the image, and PNG for standard RGB images without embedded metadata.
- Naming convention: <year>_<month>_<day>_<videoID>_frame_<frame_number>.png
  - Example: 2023_06_23_205_frame_000001.png
- Purpose: Provides geometrically corrected frames ready for feature matching and mosaicking, which reduces distortion in the global mosaic.
homography_matrices/
- Contents: Homography transformation matrices for aligning frames. These matrices are computed using feature descriptors and homography estimation methods (both traditional and deep learning-based) to map one frame to the next. These samples include homographies computed from all versions of CorNet (v1, v2, and v3), as well as ASIFT. Each file contains the sequence of pairwise homographies from the beginning to the end of the sequence and therefore has frame_count - 1 lines.
- File format: Serialized matrix files (.csv). Each line is a flattened 3x3 transformation matrix (converted into a 1x9 vector) with values separated by commas:
```
0.9987421823473002,-0.000609277434688708,1.808174062745844,0.0008320474162297961,0.9986113181857407,8.602796753772944,9.79313685914524e-07,-7.920604925794536e-07,1.0
```
- Naming convention: H_<estimation_method>.csv
  - Example: H_cornetv3.csv
- Purpose: Stores pairwise transformations for the next processing step.
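
A minimal sketch of reading such a file and chaining the pairwise matrices follows. It uses only the standard library. Two points are assumptions rather than statements from the dataset: the nine values are taken to be in row-major order (the sample row is consistent with this, since its last value is 1.0 and the two small perspective terms precede it), and each row is taken to map frame i into frame i+1 with points treated as column vectors. All function names are illustrative.

```python
import csv


def parse_homographies(lines):
    """Turn an iterable of CSV lines into a list of 3x3 matrices (nested lists)."""
    matrices = []
    for row in csv.reader(lines):
        if not row:
            continue  # tolerate blank lines
        values = [float(v) for v in row]
        if len(values) != 9:
            raise ValueError(f"expected 9 values per row, got {len(values)}")
        # ASSUMPTION: row-major flattening of the 3x3 matrix.
        matrices.append([values[0:3], values[3:6], values[6:9]])
    return matrices


def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def chain(matrices):
    """Cumulative transforms: result[k] maps frame 0 into frame k.

    ASSUMPTION: matrices[i] maps frame i to frame i+1 and points are column
    vectors, so each later matrix multiplies on the left.
    """
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cumulative = [identity]
    for h in matrices:
        cumulative.append(matmul3(h, cumulative[-1]))
    return cumulative


def apply_homography(h, x, y):
    """Map pixel (x, y) through h, dividing by the homogeneous coordinate."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w
```

For a file on disk, `with open("H_cornetv3.csv", newline="") as f: hs = parse_homographies(f)` would load the matrices. Because the file holds frame_count - 1 rows, `chain(hs)` returns frame_count matrices, starting with the identity for the first frame. If the matrices turn out to map in the opposite direction, invert them or reverse the multiplication order.
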
cornetv3_mini_partition/
- Contents: Subdirectories representing shots based on shot boundaries computed using directional changes from the optical flow. In this example, the grouping step is performed using CorNetv3. Each group_<id> subdirectory contains the frames belonging to that group. Additionally, there are corresponding .csv files for the homography matrices associated only with the frames in that group.
- Directory format: group_<id> with a corresponding CSV file H_<estimator>_group_<id>.csv
  - Example: group_001 and H_cornetv3_group_001.csv
- Naming convention: Same as the files in raw/ and calibrated/, as they are moved or copied over.
  - Example: 2023_06_23_205_frame_000001.png
- Purpose: Groups frames into smaller shots that will be processed as mini-mosaics.
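
One simple way to picture direction-based shot boundaries is to cut wherever the heading between consecutive frame pairs turns sharply. The sketch below illustrates that idea only and is not the grouping code used to produce these folders. The function name, the `max_turn_deg` and `min_speed` parameters, and their default values are assumptions, and the per-pair motion vectors are taken as given (for example, a mean optical-flow vector per frame pair).

```python
import math


def split_into_shots(motions, max_turn_deg=60.0, min_speed=1.0):
    """Group frame-pair indices into shots, cutting where the heading turns sharply.

    motions[i] is a (dx, dy) estimate of camera motion between frame i and i+1.
    Pairs slower than min_speed (pixels) carry no reliable heading, so they are
    attached to the current shot without updating the reference heading.
    """
    shots, current, heading = [], [], None
    cos_limit = math.cos(math.radians(max_turn_deg))
    for i, (dx, dy) in enumerate(motions):
        speed = math.hypot(dx, dy)
        if speed >= min_speed:
            unit = (dx / speed, dy / speed)
            if heading is not None:
                cos_turn = unit[0] * heading[0] + unit[1] * heading[1]
                if cos_turn < cos_limit and current:
                    shots.append(current)   # close the shot before the turn
                    current = []
            heading = unit
        current.append(i)
    if current:
        shots.append(current)
    return shots
```

For a serpentine flight, the two straight passes and the short turn between them would fall into separate groups. Because the reference heading is updated at every pair, a slow, gradual turn is not detected by this sketch.
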
cornetv3_mini_mosaics/
- Contents: Intermediate stitched mosaics generated from each shot group. These mini-mosaics are the result of stitching together frames from the cornetv3_mini_partition/ step.
- File format: .png images (one mini-mosaic per shot group).
- Naming convention: <datetime_when_generated>_<group_id>.png
  - Example: 2024-04-02_12-15-23_group001.png
- Purpose: Provides intermediate mosaics that can be validated before combining them into the final global mosaic.
cornetv3_global_mosaics/
- Contents: Final stitched global mosaics assembled from all the mini-mosaics using ASIFT for robust matching. This folder also includes the ASIFT processing time log and the homography matrices used to align each mini-mosaic in the final assembly.
- File format: .png images.
- Naming convention: final_global_mosaic_<number_of_shots>.png
  - Example: final_global_mosaic_2.png
- Purpose: The final deliverable mosaic produced by the DroneZaic pipeline.

Notes

Consistency: File names across subfolders share the same frame or group indices, making it easy to cross-reference data between pipeline stages.
Data volume: Some subfolders (e.g., raw/) may contain thousands of frames depending on the duration of the UAV flight.
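
Because the frame and quiver file names follow the conventions listed above, they can be split into fields with a regular expression and matched across folders. The sketch below follows the two documented patterns; the names `FrameName`, `parse_frame_name`, and `frame_key` are illustrative, and the accepted extensions are an assumption based on the formats mentioned for raw/ and calibrated/.

```python
import re
from typing import NamedTuple, Optional


class FrameName(NamedTuple):
    year: int
    month: int
    day: int
    video_id: str
    kind: str          # "frame" or "quiver_frame"
    frame_number: int
    extension: str


FRAME_PATTERN = re.compile(
    r"^(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})_(?P<video_id>[^_]+)_"
    r"(?P<kind>quiver_frame|frame)_(?P<frame_number>\d+)"
    r"\.(?P<ext>png|jpg|jpeg|tif|tiff)$",
    re.IGNORECASE,
)


def parse_frame_name(filename: str) -> Optional[FrameName]:
    """Split a frame or quiver file name into its fields, or return None."""
    m = FRAME_PATTERN.match(filename)
    if m is None:
        return None
    return FrameName(
        int(m["year"]), int(m["month"]), int(m["day"]), m["video_id"],
        m["kind"].lower(), int(m["frame_number"]), m["ext"].lower(),
    )


def frame_key(name: FrameName):
    """Key shared by a raw frame, its calibrated copy, and its quiver plot."""
    return (name.year, name.month, name.day, name.video_id, name.frame_number)
```

Building a dictionary keyed by `frame_key` for each folder is then enough to look up, for instance, the quiver plot that belongs to a given calibrated frame.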

`3_models.zip`

This directory contains all trained models for all versions of CorNet deep homography estimation, including:

CorNetv1
CorNetv1.2
CorNetv2
CorNetv3

Each model version has its own subdirectory that includes:

The trained model files
The corresponding checkpoint file

For example, under the cornetv3 directory, there is a checkpoint file whose first line reads: model_checkpoint_path: "../models/cornetv3/l1_loss_normalize/model.ckpt-700000"

Before running the model through the test pipeline, verify that the relative path in the model_checkpoint_path entry is correct.
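
This check can be scripted. The sketch below reads the first line of a checkpoint file, extracts the quoted `model_checkpoint_path` value, and lists the files that start with that prefix. It assumes the checkpoint is stored as a path prefix shared by several files, as TensorFlow-style checkpoints are, and that the relative path is resolved against the folder containing the checkpoint file unless `base_dir` is given; the pipeline may instead resolve it against its working directory. The function name is illustrative.

```python
import re
from pathlib import Path

CHECKPOINT_LINE = re.compile(r'^\s*model_checkpoint_path\s*:\s*"([^"]+)"')


def resolve_checkpoint(checkpoint_file, base_dir=None):
    """Return (prefix, matches) for the entry on the first line of the file.

    prefix is the resolved path named by model_checkpoint_path; matches lists
    the existing files whose names start with that prefix. An empty list
    means the relative path needs to be corrected.
    """
    checkpoint_file = Path(checkpoint_file)
    with checkpoint_file.open() as handle:
        first_line = handle.readline()
    m = CHECKPOINT_LINE.match(first_line)
    if m is None:
        raise ValueError(f"no model_checkpoint_path entry in {checkpoint_file}")
    # ASSUMPTION: relative paths are resolved against the checkpoint file's folder.
    base = Path(base_dir) if base_dir is not None else checkpoint_file.parent
    prefix = (base / m.group(1)).resolve()
    if prefix.parent.is_dir():
        matches = sorted(prefix.parent.glob(prefix.name + "*"))
    else:
        matches = []
    return prefix, matches
```

If `matches` comes back empty, edit the path in the checkpoint file (or pass the directory the pipeline is run from as `base_dir`) until the model files are found.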

Usage

These models can be used to:

Estimate homography matrices with an unsupervised deep learning method for faster and more accurate homography estimation.
Train or fine-tune the CorNet models on new datasets.

`4_full_sized_figures.zip`

This directory contains all high-resolution figures used in the main paper and supplementary materials.

Note: Due to copyright license restrictions, this archive is not among the data files listed for this dataset; the figures will be made available under the Supplemental Information section.

Files and variables

File: 1_raw_videos.zip

Description: Contains the raw aerial video data used to test the robustness of DroneZaic, captured along different flight trajectories, at various plant growth stages and altitudes, and with different UAVs. Includes associated .SRT metadata files containing latitude and longitude information.

File: 4_full_sized_figures.zip

Description: Contains all full-resolution figures used in the paper and supplementary materials for easy reference and high-quality visuals.

File: 3_models.zip

Description: Stores all trained models for the different versions of CorNet (CorNetv1, CorNetv1.2, CorNetv2, CorNetv3). Each version has its own subdirectory with checkpoints for reproducibility and fine-tuning.

File: 2_processed_data.zip

Description: Demonstrates the full pipeline on example test data, showcasing every step: dynamic sampling with optical flow and quiver plots, automatic lens and gimbal calibration, homography estimation, shot detection, mini-mosaicking, and global mosaicking.

Code/software

No special software is required to view the raw data in this dataset. All videos are stored in the standard .MOV format, and all images are in the .PNG format, which can be opened with any common media or image viewer.

Viewing Data

Videos (.MOV) can be viewed using:

Default media players (VLC, QuickTime, Windows Media Player)

Any other standard video playback software.

Images (.PNG) can be viewed using:

Standard image viewers (Windows Photos, Preview on macOS)

Web browsers and image editors (GIMP, Photoshop, etc.).

Processing the Data

No specialized or proprietary software is required to open the data. The DroneZaic processing pipeline itself was implemented in Python using various open-source libraries, as described in our paper.

If you would like to replicate our processing pipeline or run the included code, please refer to the DroneZaic GitHub repository for detailed setup instructions, including installation of dependencies and model checkpoints.

Access information

Other publicly accessible locations of the data:

https://drive.google.com/drive/folders/1oVcTgs3YV3AzSi2Iajl6O9g1onI7ArBN?usp=sharing