Skip to main content

Comparative phylogenetics of Papilio butterfly wing shape and size demonstrates independent hindwing and forewing evolution

Cite this dataset

Owens, Hannah et al. (2020). Comparative phylogenetics of Papilio butterfly wing shape and size demonstrates independent hindwing and forewing evolution [Dataset]. Dryad.


The complex forces that shape butterfly wings have long been a subject of experimental and comparative research. Butterflies use their wings for flight, camouflage, mate recognition, warning and mimicry. However, general patterns and correlations among wing shape and size evolution are still poorly understood. We collected geometric morphometric measurements from over 1400 digitized museum specimens of Papilio swallowtails and combined them with phylogenetic data to test two hypotheses: 1) forewing shape and size evolve independently of hindwing shape and size, and 2) wing size evolves more quickly than wing shape. We also determined the major axes of wing shape variation and discovered that most shape variability occurs in hindwing tails and adjacent areas. We conclude that forewing shape and size are functionally and biomechanically constrained, while hindwings are more labile, perhaps in response to disruptive selective pressure for Batesian mimicry or against predation. The development of a significant, re-usable, digitized data resource will enable further investigation on tradeoffs between flight performance and ecological selective pressures, along with the degree to which intraspecific, local-scale selection may explain macroevolutionary patterns.



Standardized dorsal and ventral images of Papilio butterfly specimens with scale and color bars were obtained from four natural history museums (Supplemental Figure 1): the American Museum of Natural History (AMNH), Field Museum (FMNH), Florida Museum of Natural History, McGuire Center for Lepidoptera and Biodiversity (MGCL), and the Smithsonian Institution National Museum of Natural History (NMNH; Supplemental Table 1). Images were taken with a NIKON D300S with an AF Micro-Nikkor 60mm f/2.8D lens (AMNH), Canon EOS 70D with a Canon EF 50mm f/1.4 USM lens (FMNH), Canon EOS 70D with a Canon EF-S 60mm f/2.8 Macro USM lens (MGCL), or a Canon EOS 6D with a Canon EF 28-80mm f/3.5-5.6 lens (NMNH) mounted on either a copy stand or tripod and operated manually (AMNH) or tethered to a MacBook Air with the Canon EOS Utility (FMNH, MGCL, NMNH; Supplemental Figure 2). Photographs were centered with white-space around images, where labels and color bar are located, to limit lens distortion.  Specimen label data, collection date, location and sex of specimen (where available) were transcribed by volunteers via the Notes from Nature platform (Hill et al. 2012). All standardized images used in this study have been made publicly available.

            Landmarks for morphometric measurement were based on previous morphological work on Heraclides swallowtails (Lewis et al. 2015; Fig. 1). One forewing landmark (F1 in Fig. 1), was removed from final analysis due to particularly high rate of measurement error; this was largely due to curling of the anterior wing margin in many specimens. To allow full view of otherwise overlapping wing elements, we used 10 forewing landmarks from dorsal images and 12 hindwing landmarks from ventral images (Figure 1). Landmark and 1 cm scale bar coordinates were collected in ImageJ 1.49 ( using the PointPicker plugin ( and imported into Microsoft Excel for collation and post-processing. We collected 1,449 dorsal and 1,404 ventral landmark measurement sets representing 60 and 59 species, respectively (Supplemental Table 1). Measurements were calibrated with scale bar coordinates, and final coordinate text files were prepared using TextWrangler 5.5.2 ( These data are available in Appendix 1. Forewing and hindwing landmark data with scale information were superimposed using Procrustes alignment; species represented by fewer than 10 specimens and statistical outlier landmark sets were removed, and the final datasets were re-aligned. We tested for allometric effects in fore- and hindwing shape variation using the homogeneity of slopes test (grouping specimens by species) and a subsequent Procrustes ANOVA to determine the contribution of size in determining shape variation. Mean species shape was calculated, the resulting dataset was re-aligned, gross outliers were removed, and allometric effects (grouping species into two subclades: Agehana + Pterourus, Alexanoria, Chilasa, and Heraclides, Fig. 1) were examined. Finally, mean intraspecific Procrustes distance from mean shape, morphospace volume occupied by each species (the product of the range of each principal component excluding values more than 1.5 times greater or less than the interquartile range), mean intraspecific centroid size, and intraspecific centroid variance were calculated from the specimen-level dataset. These four shape summary statistics were each rescaled to values between 0 and 1 using min-max normalization to render them comparable for downstream analyses. All shape analyses were performed using the R package geomorph 3.1.2 (Adams et al. 2013), except for intraspecific Procrustes distance from mean shape, which was calculated using Evomorph 0.9 (Cabrera and Giri 2016; details can be found in Appendix 2).


Testing independence of forewing and hindwing shape evolution

Using a recently-published time-calibrated phylogeny for swallowtails in the clade of interest (Fig. 1; Appendix 1; Owens et al. 2017), we performed a suite of comparative phylogenetic analyses of forewing and hindwing morphology to test the independence of fore- and hindwing shape and size evolution. These analyses were all done in R 3.5.1 (R Core Team 2012; details of the analyses can be found in Appendix 2). We estimated phylogenetic signal of fore- and hindwing shapes using Kmult, (Blomberg et al. 2003), a generalization of Blomberg’s K implemented in the R package geomorph 3.1.2 (Adams et al. 2013). Just as with the traditional Blomberg’s K statistic, the closer Kmult is to 1, the more variation in species’ characters can be explained by their phylogenetic relationships under a Brownian motion (BM) model of evolution, and values of K< 1 indicate more variation than expected under BM (Blomberg et al. 2003). We then used geomorph to calculate the modular rate ratio (function: compare.multi.evol.rates) for the fore- and hindwing datasets (Denton and Adams 2015), and test whether fore- and hindwing shape evolution was correlated


(also known as phylogenetic integration, function: phylo.integration; Adams and Felice 2014). Both tests were conducted under an assumption of BM, as this is the only model currently available for such multidimensional datasets implemented in geomorph (Adams and Collyer 2018). 


Testing independence of fore- and hindwing size evolution

We also tested the independence of forewing and hindwing size evolution. First, we estimated the phylogenetic signal (univariate Blomberg’s K), then estimated the evolutionary rates of fore- and hindwing shape based on the best-fit evolutionary model for each dataset, and finally, tested for significant fore- and hindwing centroid size correlation. These analyses were done using the R packages phytools 0.6-60


(Revell 2012), geiger 2.0.6 (Harmon et al. 2008), and phylolm 2.6 (Tung Ho and Ané 2014); additional details can be found in Appendix 2. Correlations were assessed using a linear-time algorithm developed by Tung Ho and Ané (2014). For the correlation between fore- and hindwing centroid size, we fit a series of phylogenetic linear regressions with different evolutionary models—BM, Ornstein-Uhlenbeck (OU) with root ancestral state estimated from the data, and early burst (EB). We chose the best-fit model for the relationship between fore- and hindwing centroid size based on its minimum corrected Akaike Information Criterion (AICc) score as calculated using the R package MuMIn 1.42.1 (Barton 2018) and examined the ratio of hindwing to forewing evolutionary rates (σ­­2) under these best-fit models using geomorph. To get a clear picture of the degree to which fore- and hindwing centroid sizes were correlated, we calculated R2­pred for hindwing size as explained by forewing size; this is an extension of the traditional R2 that weighs the residuals by variance and covariance (Ives 2018). These calculations were done using the R package rr2 1.0.1 (Ives and Li 2018; Appendix 2).

Differences in speed of shape and size evolution

We made a final comparison of rates of fore- and hindwing shape and size evolution (σ­­2) as obtained from the analyses described above to determine the relative evolutionary lability of these four characteristics.  Appendix 2 provides an R markdown script of all analyses performed.

Usage notes

As a caveat, please note that at each of the four instutions where specimens were imaged, a different camera lens was used (see Methods section). We did not correct for possible lens-based image distortion. We considered lens-based distortion of images to likely be very minor for this study because 1) the specimens are centered in the images, and 2) there is quite a bit of space between the specimen and the edge of the image (as we include color standards and all labels associated with the specimen in each image). Further, variance introduced by this distortion is likely minimal compared to variance between species. If our data were to be used for analyses at the intraspecies level in the future, as we hope they will, lens corrections will likely be necessary.


National Science Foundation, Award: DEB 1523732

National Science Foundation, Award: DEB 1557007

Agence Nationale de la Recherche, Award: ANR-10-LABX-25-01