Data from: Every hue has its fan club: Diverse patterns of color-dependent flower visitation across Lepidoptera
Data files
Jun 03, 2025 version files 1.51 MB
Abstract
This dataset contains RGB color measurements and family-level plant identifications for flowers visited by adult Lepidoptera in the Southwestern Ozarks, USA. Observations were manually curated from iNaturalist and span diurnal and crepuscular flower visits between 2002 and 2024. The dataset includes: flower color data from Lepidoptera visitation records, a control sample of commonly photographed flowers matched for seasonal availability, and a subset of records for evaluating the randomness of flower choices. Flower colors were measured digitally using standardized RGB sampling procedures to assess color preferences across Lepidoptera species.
Dataset DOI: 10.5061/dryad.4f4qrfjqr
Description of the data and file structure
This README.txt file was generated on 8 May 2025 by Dmitry Kutcherov
GENERAL INFORMATION
1. Title of Dataset: Data from: 'Every hue has its fan club: Diverse patterns of color-dependent flower visitation across Lepidoptera'
2. Author Information:
Corresponding Investigator
Name: Dr Dmitry Kutcherov
Institution: University of Arkansas, Fayetteville, AR, USA
Institution email: kucherov@uark.edu
Personal email: dmitry.kutcherov@gmail.com
Co-investigator
Name: Dr Erica L Westerman
Institution: University of Arkansas
Email: ewesterm@uark.edu
3. Date of data collection: 31 Dec 2024
4. Geographic location of data collection: southwestern Ozark Highlands in the central United States of America, including the following counties: Benton, Carroll, and Washington in Arkansas; Adair, Cherokee, Delaware, and Ottawa in Oklahoma; Cherokee in Kansas; and Barry, Jasper, Lawrence, McDonald, Newton, and Stone in Missouri, which together occupy 23,814 km².
5. Funding sources that supported the collection of the data:
National Science Foundation (grant IOS-2238931 to ELW)
6. Recommended citation for this dataset: Kutcherov, Dmitry and Westerman, Erica L (2025), Data from: 'Every hue has its fan club: Diverse patterns of color-dependent flower visitation across Lepidoptera', Dryad, Dataset
DATA & FILE OVERVIEW
This dataset contains information on the RGB color and family-level taxonomic identity of Lepidoptera-visited flowers in the Southwestern Ozarks (USA).
In short, File 1 is a dataset of color-dependent flower visitation by Lepidoptera; File 3 is a sample of the most frequently photographed flowers; and File 2 contains subsamples from Files 1 and 3 used to test whether flower choices were random.
Files and variables
FILE LIST
File 1 Name: iNaturalist_Butterfly-Visited_Flower_Colors_Southwestern_Ozarks.csv
File 1 Description: Comma-delimited dataset containing 8,032 entries of Lepidoptera visiting flowers and RGB color of those flowers.
File 2 Name: iNaturalist_Flower_Colors_Control_Southwestern_Ozarks.csv
File 2 Description: Comma-delimited dataset of observed vs. expected flower visitation by 32 selected Lepidoptera species.
File 3 Name: iNaturalist_Flower_Colors_Southwestern_Ozarks.csv
File 3 Description: Comma-delimited dataset containing 700 pseudorandom entries of flowers and RGB color of those flowers.
DATA-SPECIFIC INFORMATION FOR: iNaturalist_Butterfly-Visited_Flower_Colors_Southwestern_Ozarks.csv
1. Number of variables: 14
2. Number of rows: 8033 total, first row is headings
3. Variable list:
Group: butterflies (Papilionoidea) or moths (rest of Lepidoptera)
Family: taxonomic family
Species: species
ID: iNaturalist observation ID
Date: calendar date (DD-Mon-YY)
Red: red channel in the RGB color space
Green: green channel in the RGB color space
Blue: blue channel in the RGB color space
Month: month number (1 is January, 12 is December)
Cluster: flower color group (Violet, Lavender, Magenta, Blush, Crimson, Red, Orange, Yellow, Beige or White)
Plant_Family: family to which the visited flower with measured color belongs
4. Missing data codes:
Family: Unknown (when family-level ID was problematic)
Species: Genus sp. (e.g., Hemaris sp., when species-level ID was problematic, or Genus sp., when even genus-level ID was difficult)
Plant_Family: Unknown (when family-level ID was problematic; this category also includes a few cases in which observations were deleted after we took color measurements but before we identified the plant family, and a few cases in which the objects in the photo turned out not to be real flowers upon reexamination)
5. Abbreviations used:
N/A; not applicable
6. Other relevant information:
N/A; not applicable
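The variable list above can be checked mechanically once the file is read. The following is a minimal sketch in Python (standard library only), not part of the original workflow. It assumes that dates look like 20-Apr-02 with English month abbreviations and that the RGB channels are numbers between 0 and 255. The two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from datetime import datetime


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    # Assumed date layout: DD-Mon-YY, e.g. 20-Apr-02 (English month names).
    date = datetime.strptime(row["Date"], "%d-%b-%y").date()
    rgb = tuple(float(row[ch]) for ch in ("Red", "Green", "Blue"))
    return {**row, "Date": date, "RGB": rgb, "Month": int(row["Month"])}


def check_row(rec):
    """Return a list of problems found in a parsed record (empty if none)."""
    problems = []
    if rec["Date"].month != rec["Month"]:
        problems.append("Month does not match Date")
    if any(not 0 <= v <= 255 for v in rec["RGB"]):
        problems.append("RGB value outside 0-255")
    return problems


# Hypothetical rows for illustration only; not real records.
SAMPLE = """Group,Family,Species,ID,Date,Red,Green,Blue,Month,Cluster,Plant_Family
butterflies,Nymphalidae,Danaus plexippus,1,20-Apr-02,250,200,40,4,Yellow,Asteraceae
moths,Sphingidae,Hemaris sp.,2,21-Dec-24,230,225,240,12,White,Unknown
"""

records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for rec in records:
    print(rec["ID"], rec["Date"].isoformat(), rec["RGB"], check_row(rec))
```

To run the same checks on the real file, replace io.StringIO(SAMPLE) with an open file handle for iNaturalist_Butterfly-Visited_Flower_Colors_Southwestern_Ozarks.csv. Two-digit years are resolved by Python's %y rule (00 to 68 become 2000 to 2068), which covers the 2002 to 2024 range of this dataset.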
DATA-SPECIFIC INFORMATION FOR: iNaturalist_Flower_Colors_Control_Southwestern_Ozarks.csv
1. Number of variables: 8
2. Number of rows: 12990 total, first row is headings
3. Variable list:
Species: species
Distribution: Observed (copied from iNaturalist_Butterfly-Visited_Flower_Colors_Southwestern_Ozarks.csv) or expected (a seasonally-controlled pseudorandom subsample from iNaturalist_Flower_Colors_Southwestern_Ozarks.csv)
Date: calendar date (DD-Mon-YY)
Red: red channel in the RGB color space
Green: green channel in the RGB color space
Blue: blue channel in the RGB color space
Month: month number (1 is January, 12 is December)
Cluster: flower color group (Violet, Lavender, Magenta, Blush, Crimson, Red, Orange, Yellow, Beige or White)
4. Missing data codes:
N/A; not applicable
5. Abbreviations used:
N/A; not applicable
6. Other relevant information:
N/A; not applicable
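One possible way to use this file is to tabulate color clusters for the observed and expected distributions of each species. The sketch below is illustrative only and does not reproduce the authors' analysis. It lowercases the Distribution value because the exact capitalization used in the file is an assumption, and the sample rows are invented.

```python
import csv
import io
from collections import Counter, defaultdict

# Cluster names as listed in the variable description.
CLUSTERS = ["Violet", "Lavender", "Magenta", "Blush", "Crimson",
            "Red", "Orange", "Yellow", "Beige", "White"]


def cluster_table(rows):
    """Count flower color clusters per (species, distribution) pair."""
    counts = defaultdict(Counter)
    for row in rows:
        # Normalize case: the file may use "Observed"/"Expected" or lowercase.
        key = (row["Species"], row["Distribution"].strip().lower())
        counts[key][row["Cluster"]] += 1
    return counts


# Hypothetical rows for illustration only; not real records.
SAMPLE = """Species,Distribution,Date,Red,Green,Blue,Month,Cluster
Danaus plexippus,Observed,01-Sep-20,150,90,200,9,Violet
Danaus plexippus,Observed,02-Sep-20,240,210,30,9,Yellow
Danaus plexippus,Expected,01-Sep-20,245,215,40,9,Yellow
Danaus plexippus,Expected,02-Sep-20,250,250,250,9,White
"""

table = cluster_table(csv.DictReader(io.StringIO(SAMPLE)))
for (species, dist), counter in sorted(table.items()):
    # One count per cluster, in the order given in CLUSTERS.
    print(species, dist, [counter[c] for c in CLUSTERS])
```

The resulting pairs of count vectors (observed and expected) can then be passed to whatever test of homogeneity the user prefers.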
DATA-SPECIFIC INFORMATION FOR: iNaturalist_Flower_Colors_Southwestern_Ozarks.csv
1. Number of variables: 8
2. Number of rows: 701 total, first row is headings
3. Variable list:
ID: iNaturalist observation ID
Date: calendar date (DD-Mon-YY)
Red: red channel in the RGB color space
Green: green channel in the RGB color space
Blue: blue channel in the RGB color space
Month: month number (1 is January, 12 is December)
Cluster: flower color group (Violet, Lavender, Magenta, Blush, Crimson, Red, Orange, Yellow, Beige or White)
Plant_Family: plant family to which the flower with measured color belongs
4. Missing data codes:
Plant_Family: Unknown
5. Abbreviations used:
N/A; not applicable
6. Other relevant information:
N/A; not applicable
Access information
Data was derived from the following source:
All data used in this study were sourced from https://www.inaturalist.org/ by doing a search on the website constrained to the order Lepidoptera and the following 14 counties: Benton, Carroll, and Washington in Arkansas; Adair, Cherokee, Delaware, and Ottawa in Oklahoma; Cherokee in Kansas; and Barry, Jasper, Lawrence, McDonald, Newton, and Stone in Missouri: https://www.inaturalist.org/observations?place_id=2601,2595,2162,1779,710,1187,256,389,2900,2939,2753,2940,229,576&subview=map&taxon_id=47157.
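For reference, the search URL above can be rebuilt from its parts. This is a small sketch using only the Python standard library. The place and taxon IDs are copied from the URL, and the README does not state which place ID corresponds to which county, so no such mapping is attempted here.

```python
from urllib.parse import urlencode

# The 14 place IDs and the taxon ID exactly as they appear in the search URL.
PLACE_IDS = [2601, 2595, 2162, 1779, 710, 1187, 256, 389,
             2900, 2939, 2753, 2940, 229, 576]
LEPIDOPTERA_TAXON_ID = 47157


def search_url(place_ids, taxon_id):
    """Build the iNaturalist observation search URL for the given IDs."""
    query = urlencode(
        {
            "place_id": ",".join(str(p) for p in place_ids),
            "subview": "map",
            "taxon_id": taxon_id,
        },
        safe=",",  # keep commas literal, as in the original URL
    )
    return "https://www.inaturalist.org/observations?" + query


print(search_url(PLACE_IDS, LEPIDOPTERA_TAXON_ID))
```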
We manually filtered the photographs of adult butterflies and moths to those visiting flowers during daylight and twilight hours and recorded the calendar date of each such observation. The final dataset was compiled on December 31, 2024, and included all relevant observations uploaded to iNaturalist by that date. We also identified, to the family level, the flowering plants on which Lepidoptera were photographed. In total, our dataset comprised diurnal flower visitation entries spanning a period from April 20, 2002 to December 21, 2024.
Additionally, we created a control flower color dataset by measuring the colors of 700 individual flowers from our study area. The purpose of this control dataset was to test whether individual Lepidoptera species showed non-random flower color preferences or simply visited flowers at random, relative to what was seasonally available in the environment. To construct this additional dataset, we first generated a distribution of 700 calendar dates that closely mirrored the seasonal distribution of Lepidoptera observations in our primary dataset; the total sample size of 700 was chosen because it exceeded the number of observations for the most observed species, the monarch Danaus plexippus (N = 618). Both insects and plants exhibit strong seasonal patterns, and we aimed to control for phenological overlap by matching the timing of plant and Lepidoptera records as closely as possible. For each calendar date, we searched iNaturalist for angiosperm records from the study area, selected the one to three most recently uploaded photographs (depending on how many were needed, based on the generated distribution) that clearly depicted flowers, and measured flower color from these images as described below. We did not control for plant taxonomic identity; instead, we intended to sample the most frequently photographed flowers.
For each Lepidoptera species with >50 observations in the primary dataset, we drew a subset of expected visited flowers from the additional control dataset that matched not only the sample size but also the day-to-day seasonal distribution of the corresponding Lepidoptera observations. We acknowledge that drawing multiple random subsets, not just one, and from a larger control pool, would be a more rigorous approach for estimating expected distributions. However, this was not feasible in our case due to a necessary tradeoff between randomness and seasonal matching. Generating multiple seasonally constrained random subsets of sufficient sample size would have required orders of magnitude more manual color measurements. We therefore chose a single expected subset of flowers per species.
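The exact procedure used to draw the seasonally matched subset is not spelled out here, so the following Python sketch shows only one way such a draw could work. It assumes greedy nearest-day-of-year matching without replacement. The function name draw_expected, the tie-breaking by shuffling, and the sample data are all hypothetical.

```python
import random
from datetime import date


def day_of_year(d):
    """Day number within the year (1 = January 1)."""
    return d.timetuple().tm_yday


def draw_expected(observed_dates, control, seed=0):
    """Pick one control flower per observed date, nearest in day of year.

    control is a list of (date, cluster) tuples; each is used at most once.
    Wrap-around at the turn of the year is ignored in this sketch.
    """
    if len(observed_dates) > len(control):
        raise ValueError("control pool is smaller than the observed sample")
    rng = random.Random(seed)
    pool = list(control)
    rng.shuffle(pool)  # so that ties between equally distant flowers are random
    expected = []
    for obs in sorted(observed_dates, key=day_of_year):
        target = day_of_year(obs)
        best = min(range(len(pool)),
                   key=lambda i: abs(day_of_year(pool[i][0]) - target))
        expected.append(pool.pop(best))  # remove so it cannot be drawn twice
    return expected


# Hypothetical control pool and observation dates, for illustration only.
control = [(date(2023, 4, 20), "Violet"), (date(2023, 7, 4), "Yellow"),
           (date(2023, 7, 6), "White"), (date(2023, 10, 1), "Orange")]
observed = [date(2021, 7, 5), date(2019, 4, 22)]

expected = draw_expected(observed, control)
print(expected)
```

The result has the same sample size as the observed set and a similar seasonal distribution, which are the two properties the text requires of the expected subset.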
For our analyses of Lepidoptera flower color preference, we took into account all flowering plants except grasses (Poaceae). Flower colors were measured on a MacBook Pro 2019 (macOS Sonoma v. 14.5) with the display color profile set to sRGB IEC61966-2.1. We used the Digital Color Meter application v. 5.22 with an aperture size of 11 × 11 pixels and the color space set to ‘Display native values.’ To expedite the measurements, we did not download the original photos from iNaturalist and instead viewed them in the Google Chrome browser v. 125.0.6422.78, whose color profile was also set to sRGB. Before measuring the flower colors, we checked the red, green, and blue (RGB) readings from the Digital Color Meter by sampling known CSS web colors and confirmed that the software reported the RGB values from web pages correctly. We then picked an area of each image that was as representative of the overall flower color as possible, which could include not only the petals, but also the calyx, stamens and pistils, bracts, and other parts that contributed to the general appearance of the flower or inflorescence. Sometimes this procedure required zooming in or out of the image. If the flower was obscured by deep shade or by the body and wings of the butterfly or moth, we measured the average color of a well-lit neighboring flower of the same species in the same image. We did not make any adjustments to white balance, exposure, etc., and treated our photographic data as inherently noisy, but we did assume that the nature and amount of statistical noise would be constant across all families and species. Thus, we had three values (red, green, and blue), which were calculated as means across a representative area of 11 × 11 pixels, for one flower per observation. Sometimes one iNaturalist observation contained images of the same individual sitting on flowers of different colors, and these were treated as separate observations.
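The aperture measurement described above is a per-channel mean over a square window. The measurements themselves were taken with the Digital Color Meter application, not with code, so the Python sketch below only illustrates the arithmetic. It assumes an image held in memory as rows of (R, G, B) tuples, and the function name aperture_mean is hypothetical.

```python
def aperture_mean(pixels, cx, cy, size=11):
    """Mean (R, G, B) over a size x size window centred on column cx, row cy.

    pixels is a list of rows; each row is a list of (r, g, b) tuples.
    """
    half = size // 2
    inside_rows = half <= cy < len(pixels) - half
    inside_cols = half <= cx < len(pixels[0]) - half
    if not (inside_rows and inside_cols):
        raise ValueError("aperture extends beyond the image")
    window = [pixels[y][x]
              for y in range(cy - half, cy + half + 1)
              for x in range(cx - half, cx + half + 1)]
    n = len(window)  # 121 pixels for the default 11 x 11 aperture
    return tuple(sum(p[ch] for p in window) / n for ch in range(3))


# Synthetic 15 x 15 image filled with the CSS web color "orange" (#FFA500),
# mirroring the check against known CSS colors described in the text.
CSS_ORANGE = (255, 165, 0)
image = [[CSS_ORANGE for _ in range(15)] for _ in range(15)]

print(aperture_mean(image, cx=7, cy=7))
```

On a uniform patch the mean equals the patch color, which is the property the CSS color check relies on.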
