Complex landscapes, complex diets: DNA metabarcoding reveals lady beetle prey richness increase with landcover diversity
Data files
Apr 14, 2026 version files 416 KB
-
dna_data.zip
229.99 KB
-
inventory2019.csv
36.54 KB
-
inventory2021.csv
53.54 KB
-
Iuliano_EA_2026_analysis.Rmd
33.36 KB
-
Iuliano_EA_2026.Rproj
205 B
-
landscape250_reclass.csv
39.30 KB
-
prey_species_data.csv
2.52 KB
-
README.md
13.69 KB
-
site_info.csv
6.85 KB
Abstract
Overview
This repository contains the data and R analysis code associated with:
Iuliano B, Gratton C (2026). Complex landscapes, complex diets: DNA metabarcoding reveals lady beetle prey richness increases with landcover diversity. Ecological Applications.
The study used DNA metabarcoding of gut contents to characterize the diets of eight lady beetle (Coccinellidae) species collected across agricultural and semi-natural habitats in Wisconsin, USA (2019 and 2021). We examined how local habitat type and landscape composition and diversity relate to prey detection rates and prey taxon richness.
Files included
Code
Iuliano_EA_2026.Rproj— RStudio project file. Opening this file in RStudio will automatically set the working directory to the project folder, ensuring all relative file paths in the analysis script resolve correctly.Iuliano_EA_2026_analysis.Rmd— R Markdown script containing all data processing, cleaning, and statistical analyses reported in the manuscript. Produces all figures and model outputs.
Data files
inventory2019.csv— Sample collection metadata for 2019 (beetle ID, species, collection site, date, habitat class, etc.)inventory2021.csv— Sample collection metadata for 2021 (same format as above)site_info.csv— Site-level information including site ID, name, and yearlandscape250_reclass.csv— Landscape composition and diversity metrics (% cropland, Simpson diversity index) calculated within a 250 m radius of each collection site, with land cover reclassified into broad categoriesprey_species_data.csv— Prey taxon lookup table including taxonomic group, habitat association (crop/natural/mixed/aquatic/unknown), and functional role (pest/predator/neutral/mixed)dna_data.zip— Compressed archive containing OTU abundance tables (one CSV per sequencing run:run1_data.csv,run2_data.csv,run3_data.csv,run4_data.csv,run5_data.csv).
Note on raw sequencing data: Raw Illumina sequencing reads are not currently available in a public archive.
Software requirements
Analyses were conducted in R (version ≥ 4.1 recommended). The following packages are required:
| Package | Purpose |
|---|---|
| tidyverse | Data manipulation and visualization |
| sjmisc | Data utilities |
| glmmTMB | Generalized linear mixed models |
| DHARMa | Residual diagnostics for mixed models |
| visreg | Model visualization |
| bipartite | Bipartite interaction network visualization |
| MuMIn | Multi-model inference and AICc model selection |
| emmeans | Estimated marginal means and contrasts |
| ggeffects | Marginal effects and partial residual plots |
All packages are available on CRAN. Install with:
install.packages(c("tidyverse", "sjmisc", "glmmTMB", "DHARMa", "visreg",
"bipartite", "MuMIn", "emmeans", "ggeffects"))
How to run the analysis
- Place all data files and the
./dna.data/folder in the same working directory asIuliano_Ch3_analysis.Rmd. - Open the
.Rmdfile in RStudio and set the working directory to the source file location (Session > Set Working Directory > To Source File Location). - Run all chunks sequentially (or knit the document to HTML).
The script is organized in the following order:
- Setup — load packages and define color palettes
- Load & format data — import sequencing results, sample metadata, and landscape data; merge into a single analysis dataset
- Clean data — subtract negative control contamination, apply read-count threshold (1% of total reads per sample), resolve duplicate samples across sequencing runs
- Data exploration — sample counts by species and habitat, prey detection summaries
- Species-level diet analysis — bipartite interaction network, prey taxonomic composition
- Local habitat and landscape context — mixed models (species as random effect) examining effects of local habitat type, % cropland, and landscape diversity on prey detection rate and prey taxon richness
Variable descriptions
Landscape variables (landscape250_reclass.csv)
Proportion of landcover types and landscape diversity in a 250m radius of collection sites, derived from the USDA Cropland Data Layer for 2019 and 2021
| Variable | Description | Variable class | Levels/scale | Units |
|---|---|---|---|---|
id |
Site ID (links to site.info.csv) |
character | ||
scale |
Radius of landscape buffer | numeric | 250 | meters |
corn_site250 |
% maize within 250 meter radius of collection site | numeric | 0-100 | percentage |
soy_site250 |
% soybean within 250 meter radius of collection site | numeric | 0-100 | percentage |
alfalfa_site250 |
% alfalfa within 250 meter radius of collection site | numeric | 0-100 | percentage |
small.grains_site250 |
% wheat, rye, and barley within 250 meter radius of collection site | numeric | 0-100 | percentage |
other_site250 |
% other cropland within 250 meter radius of collection site | numeric | 0-100 | percentage |
grass_site250 |
% grassland and pasture within 250 meter radius of collection site | numeric | 0-100 | percentage |
wood_site250 |
% forest and woody vegetation within 250 meter radius of collection site | numeric | 0-100 | percentage |
water_site250 |
% water within 250 meter radius of collection site | numeric | 0-100 | percentage |
developed_site250 |
% impervious surface within 250 meter radius of collection site | numeric | 0-100 | percentage |
seminatural_site250 |
% seminatural habitat (sum of grassland and woodland classes) within 250 meter radius of collection site | numeric | 0-100 | percentage |
crop_site250 |
% cropland habitat (sum of all crop classes) within 250 meter radius of collection site | numeric | 0-100 | percentage |
sidi_site250 |
Simpson diversity index of land cover within 250 meter radius | numeric | 0-1 |
Site information (site_info.csv)
| Variable | Description | Variable class | Levels/scale | Units |
|---|---|---|---|---|
id |
Site ID (links to landscape250_reclass.csv) |
character | ||
name |
Unique landscape location name | factor | 24 levels | |
site |
Site number within each landscape | character | ||
year |
Sampling year | factor | 2 levels (2019, 2021) | yyyy |
Inventory files (inventory2019.csv, inventory2021.csv)
Database of all adult lady beetle specimens collected (note: not all specimens were processed and sequenced)
| Variable | Description | Variable class | Levels/scale | Units |
|---|---|---|---|---|
id |
Unique sample ID | character | ||
number |
Unique landscape location ID number | factor | 24 levels | |
name |
Unique landscape location name | factor | 24 levels | |
site |
Site number within each landscape | character | ||
class |
Local habitat type | factor | 5 levels (corn, soy, alfalfa, small grains, grass, wood) | |
round |
Sequential sampling round within year | ordered factor | 7 levels (1-7) | |
year |
Collection year | factor | 2 levels (2019, 2021) | yyyy |
date |
Collection date | date | mm/dd/yy | |
species |
Lady beetle species | factor | 19 levels |
Prey species lookup (prey_species_data.csv)
| Variable | Description | Variable class | Levels/scale |
|---|---|---|---|
prey.taxon |
Prey taxon name (matches OTU column names in DNA data) | character | |
group |
Broad taxonomic group | factor | 8 levels (Arachnids, Collembola, Beetles Flies, Aphids, Mirids, Moths, Thrips) |
habitat |
Prey taxon's typical habitat association | factor | 5 levels (natural, crop, mixed, aquatic, unknown) |
specific.habitat |
More specific habitat description, if available | character | |
function |
Functional role of prey | factor | 4 levels (pest, predator, neutral, mixed) |
notes |
Additional notes | character |
DNA data files (dna.data.zip)
Each of the five CSV files (run1_data.csv through run6_data.csv) represents one Illumina sequencing run and has the same structure. Rows are OTUs; columns are samples. The Taxonomy column is always the last column. Read counts are raw (not normalized); contamination filtering and read-count thresholding are performed in the analysis script.
| Variable | Description |
|---|---|
#OTU ID |
MD5 hash uniquely identifying each OTU (Operational Taxonomic Unit) |
M-XXXX columns |
Raw read counts for each beetle sample, identified by sample ID. Columns prefixed M-BLANK are negative controls; columns prefixed M-C are positive controls |
Taxonomy |
BLAST/BOLD taxonomic assignment for the OTU, including match quality score, reference ID, and full taxonomic path (e.g., `GS|99.5|BOLD:AAB5640;k:Animalia,p:Arthropoda,...`) |
Data collection notes
- Beetles were collected by sweep netting across six habitat types (corn, soy, alfalfa, small grains, grassland, woodland) at multiple farms in Wisconsin in 2019 and 2021.
- Gut contents were extracted and subjected to amplicon sequencing targeting a COI barcode region to identify arthropod prey.
- DNA was sequenced across five runs. Where the same individual was sequenced in multiple runs, the run with higher detection sensitivity was retained (see code comments in the data cleaning section).
- Negative controls were included in each sequencing run; a per-run contamination filter was applied before analysis (reads exceeding 1% of total sample reads were retained as detections).
Contact
Ben Iuliano
CUNY Baruch College
benjamin.iuliano@baruch.cuny.edu
