Data and analysis from: Two centuries of monarch butterfly collections reveal contrasting effects of range expansion and migration loss on wing traits

Cite this dataset

Freedman, Micah; Dingle, Hugh; Strauss, Sharon; Ramírez, Santiago (2020). Data and analysis from: Two centuries of monarch butterfly collections reveal contrasting effects of range expansion and migration loss on wing traits [Dataset]. Dryad.


Migratory animals exhibit traits that allow them to exploit seasonally variable habitats. In environments where migration is no longer beneficial, such as oceanic islands, migration-associated traits may be selected against or be under relaxed selection. Monarch butterflies are best known for their continent-scale migration in North America but have repeatedly become established as non-migrants in the tropical Americas and on Atlantic and Pacific Islands. These replicated non-migratory populations provide natural laboratories for understanding the rate of evolution of migration-associated traits. We measured >6,000 museum specimens of monarch butterflies collected from 1856 to the present, as well as contemporary wild-caught monarchs from around the world. We determined (1) how wing morphology varies across the monarch’s global range, (2) whether initial long-distance founders were particularly suited for migration and (3) whether recently established non-migrants show evidence for contemporary phenotypic evolution. We further reared >1,000 monarchs from six populations around the world under controlled conditions and measured migration-associated traits. Historical specimens show that (1) initial founders are well-suited for long-distance movement and (2) loss of seasonal migration is associated with reductions in forewing size and elongation. Monarch butterflies raised in a common garden from four derived non-migratory populations exhibit genetically based reductions in forewing size, consistent with a previous study. Our findings provide a compelling example of how migration-associated traits may be favored during the early stages of range expansion, and of the rate at which those same traits are reduced upon loss of migration.


This repository contains two sets of analyses and raw data. The first folder ("monarch_morphology_primary_analysis") contains the raw data and code used for generating all of the analyses and figures reported in the main manuscript. The second folder ("RAFM_and_DRIFTSEL") contains raw data and code used for some of the analyses reported in the supplementary materials.

The first file, "wings_04.25.20.csv", contains all measurements from museum and contemporary wild-caught monarch specimens. The measurement protocol is described in the manuscript. The raw data file includes the following columns:

  1. Index: variable for sorting purposes.
  2. SampleID: unique identifier for each individual butterfly.
  3. Collection: refers to the ID of the collection from which images were taken.
  4. Region: refers to the broad geographic region from which the monarch butterfly was collected. Regions include North America, Central America, South America, the Caribbean, Pacific Islands, and the Atlantic.
  5. Country/Archipelago: refers to the country or archipelago from which butterflies were collected.
  6. Island/State: refers to the individual island or state from which butterflies were collected.
  7. County/District: refers to the county or district from which butterflies were collected.
  8. Site/City: refers to the site or city from which butterflies were collected.
  9. exact_location: variable describing whether the locality information is sufficient to provide pinpointed GPS coordinates to be used in analysis
  10. lon: decimal longitude, generated using the geocode function
  11. lat: decimal latitude, generated using the geocode function
  12. overwintering?: variable describing whether butterflies were collected from known overwintering sites in North America
  13. Wild-Caught?: variable distinguishing butterflies caught in the wild as adults from those that were reared or collected as eggs, larvae, or pupae
  14. image_type: variable describing the imaging procedure. "scan" refers to butterflies that were dissected and had their wings imaged using a flatbed scanner. "photo" refers to butterflies that were photographed in standard pinning position. "satscan" is specific to the Natural History Museum (London) collection and refers to images that were generated as satscans, which involves whole drawer-level imaging.
  15. smoothing: variable describing whether forewing perimeter was measured using the "spline fit" or "interpolate" option in ImageJ
  16. Host_plant: variable describing host plant ID, if known, for reared monarchs. Not important for analysis.
  17. Collection_Date: refers to the date of collection for monarchs, in the form YYYYMMDD. In cases where month or day of collection is unknown, the MM and DD fields are left as "00"
  18. Sex: variable describing butterfly sex
  19. LLength: left forewing length
  20. LWidth: left forewing width
  21. LArea: left forewing area
  22. LPerimeter: left forewing perimeter
  23. RLength: right forewing length
  24. RWidth: right forewing width
  25. RArea: right forewing area
  26. RPerimeter: right forewing perimeter
  27. observer: refers to the individual who conducted the measurements. >95% of entries were measured by the first author (MF).
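
The Collection_Date convention above (YYYYMMDD, with "00" standing in for an unknown month or day) means the field cannot be passed straight to a standard date parser. The following is a minimal Python sketch of one way to handle it; it is not taken from the repository, whose analyses were run in R, and the names `PartialDate` and `parse_collection_date` are hypothetical.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: Optional[int]  # None when the MM field is "00"
    day: Optional[int]    # None when the DD field is "00"


def parse_collection_date(raw) -> PartialDate:
    """Split a YYYYMMDD value, mapping "00" month/day fields to None."""
    text = str(raw).strip()
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected 8 digits (YYYYMMDD), got {raw!r}")
    year, month, day = int(text[:4]), int(text[4:6]), int(text[6:])
    # int("00") is 0, which is falsy, so `or None` marks the field unknown.
    return PartialDate(year, month or None, day or None)


print(parse_collection_date("18560000"))
# PartialDate(year=1856, month=None, day=None)
print(parse_collection_date("20170815"))
# PartialDate(year=2017, month=8, day=15)
```

Keeping unknown fields as None, rather than defaulting them to January or the first of the month, avoids inventing precision that the specimen label does not have.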

The second and third files ("adult_morphology.csv" and "thorax_abdomen_mass.csv") are measurements from common-garden reared monarchs. The rearing procedure is described in detail in Freedman et al. (2020), Evolution. The adult morphology file includes the following columns:

  1. Cat.ID: unique identifier for butterflies
  2. Index: sorting variable
  3. Year: year in which rearing occurred (2017 or 2018)
  4. Date: the date on which caterpillars were added to host plants
  5. Usage: refers to whether a given host plant was being used for the first or second time
  6. Plant.ID: unique identifier for host plants
  7. Species: host plant species
  8. Pop: host plant population
  9. GH: refers to the identity of the greenhouse in which butterflies were reared
  10. Group: refers to the position of the plants on the bench within each greenhouse
  11. Mon.Pop: monarch population identity
  12. ID: maternal family identity
  13. Caterpillar: the index of caterpillars on each host plant
  14. Weight: weight (in g) on day 8 of development
  15. Pupation: date of pupation
  16. Eclosion: date of eclosion
  17. Env.weight: weight of glassine envelope in which adults were weighed
  18. Total.weight: weight of glassine envelope and enclosed butterfly together. Adult butterfly mass determined as the difference between this and previous column (see column 34).
  19. Sex: butterfly sex
  20. OE: infection status with the protozoan parasite Ophryocystis elektroscirrha, on an approximate log(10) scale
  21. LLength: left forewing length
  22. LWidth: left forewing width
  23. LArea: left forewing area
  24. LPerimeter: left forewing perimeter
  25. RLength: right forewing length
  26. RWidth: right forewing width
  27. RArea: right forewing area
  28. RPerimeter: right forewing perimeter
  29. RHWArea: right hindwing area (not used in analysis)
  30. HW.Mass: right hindwing mass (not used in analysis)
  31. days_to_pupation: time difference between caterpillar placement and pupation, in days
  32. days_to_eclosion: time difference between caterpillar placement and eclosion, in days
  33. maternal_family: combination of Mon.Pop and ID columns, gives unique identifier for each maternal family
  34. emergence_weight: mass of butterflies (in g) ~6 hours after eclosion
  35. infection_status: considered infected if log(10) OE spore counts >= 2, uninfected if < 2
  36. infection_status_binomial: same as previous column, but with 1/0
  37. sym.allo: refers to whether host plant species and monarch population are sympatric or allopatric
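
Several of the columns above (31 to 36) are derived from earlier ones. As an illustration of how they relate to the raw fields, here is a hedged Python sketch that recomputes them for a single row. It is not the authors' code: the function name `derive_columns`, the ISO date format, the underscore separator in `maternal_family`, and every value in the example row are assumptions made for the example.

```python
from datetime import date


def derive_columns(row: dict) -> dict:
    """Recompute the derived fields from one row of raw rearing data."""
    # ISO-formatted dates (YYYY-MM-DD) are an assumption of this sketch.
    placed = date.fromisoformat(row["Date"])
    pupated = date.fromisoformat(row["Pupation"])
    eclosed = date.fromisoformat(row["Eclosion"])
    # Infected when the log10 OE score is at least 2, per the column list.
    infected = row["OE"] >= 2
    return {
        "days_to_pupation": (pupated - placed).days,
        "days_to_eclosion": (eclosed - placed).days,
        # The "_" separator is an assumption; the source only says "combination".
        "maternal_family": f'{row["Mon.Pop"]}_{row["ID"]}',
        # Adult mass = envelope plus butterfly, minus the empty envelope.
        "emergence_weight": round(row["Total.weight"] - row["Env.weight"], 4),
        "infection_status_binomial": 1 if infected else 0,
    }


# Hypothetical row; none of these values come from the dataset.
example_row = {
    "Date": "2018-06-01", "Pupation": "2018-06-14", "Eclosion": "2018-06-24",
    "Mon.Pop": "POP", "ID": 3, "OE": 0,
    "Env.weight": 0.112, "Total.weight": 0.652,
}
example_derived = derive_columns(example_row)
print(example_derived)
```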

The thorax/abdomen mass file comes from the same experiment and includes the following columns:

  1. Index: sorting variable, matches with adult_morphology
  2. Cat.ID: identifier, matches with adult_morphology, used to merge files
  3. Eclosion: eclosion date
  4. Sex: butterfly sex
  5. thorax: mass of dried thoracic tissue in g
  6. abdomen: mass of dried abdominal tissue in g
  7. fresh: refers to whether butterflies were frozen immediately post-eclosion (fresh = "yes") or were kept alive in envelopes subsequent to eclosion (fresh = "no"). Only fresh specimens were used for analysis.
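
The two rearing files are linked through Cat.ID, and only specimens with fresh = "yes" enter the analysis. The merge-then-filter step might look like the following Python sketch, which operates on rows already loaded as dictionaries (for example via `csv.DictReader`). The function name `merge_fresh` and the example rows are hypothetical, and the original analysis was carried out in R.

```python
from typing import Dict, List


def merge_fresh(adults: List[dict], masses: List[dict]) -> List[dict]:
    """Inner-join the two tables on Cat.ID, keeping only fresh specimens."""
    adults_by_id: Dict[str, dict] = {row["Cat.ID"]: row for row in adults}
    merged = []
    for mass in masses:
        if mass["fresh"] != "yes":
            continue  # only immediately frozen butterflies are analysed
        adult = adults_by_id.get(mass["Cat.ID"])
        if adult is not None:
            # Columns present in both files (Index, Eclosion, Sex) take the
            # value from the thorax/abdomen row here; that is a choice of
            # this sketch, not something the source specifies.
            merged.append({**adult, **mass})
    return merged


# Hypothetical rows for illustration only.
adults = [
    {"Cat.ID": "A1", "Sex": "F", "LArea": 850.0},
    {"Cat.ID": "A2", "Sex": "M", "LArea": 910.0},
]
masses = [
    {"Cat.ID": "A1", "Sex": "F", "thorax": 0.041, "abdomen": 0.035, "fresh": "yes"},
    {"Cat.ID": "A2", "Sex": "M", "thorax": 0.044, "abdomen": 0.030, "fresh": "no"},
]
merged_rows = merge_fresh(adults, masses)
print(merged_rows)
```

With these example rows, only the A1 butterfly survives the filter, and its merged row carries both the wing measurements and the tissue masses.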

Additional .csv files included in this repository are associated with the RAFM and driftsel analyses reported in the Supplementary Information. Files included in the "RAFM_and_DRIFTSEL" folder are as follows:

  1. flt_snps_clean.RDS -- this file contains called genotypes
  2. RAFM_1000SNPs_10000_theta.csv -- this is a single output run from RAFM reporting the theta parameter, which is used for generating the ancestry/coancestry matrix shown in Figure S4a. The remaining .csv files are all quantitative trait inputs for the driftsel analysis
  3. monarchs_rafm.R -- code for generating the ancestry/coancestry matrix reported in the supplementary information
  4. RAFM.R -- code associated with the R admixture F model
  5. DRIFTSEL -- folder containing driftsel analyses
    1. analysis_quant.traits_drifstel_monarchs.R -- code for implementing the driftsel model
    2. driftsel.r -- the underlying driftsel model
    3. viztraits_driftsel_monarchs.R -- code for creating the drift ellipses shown in the supplementary information
    4. input_data -- folder containing quantitative trait data used in driftsel models
      1. complete_driftsel_raw_data.csv
      2. complete_monarch_covariates_driftsel.csv
      3. complete_monarch_traits_driftsel.csv
      4. complete_monarchs_pedigree_driftsel.csv


Usage notes

All analyses were conducted in R, and the scripts are included in this repository. The primary script, which contains all analyses reported in the main text of the paper, is global_monarch_wing_morphology.R. The remaining scripts are associated with the RAFM and driftsel analyses reported in the Supplementary Information.