Skip to main content

Global biogeography of warning colouration in the butterfly Danaus chrysippus


Liu, Wanzhen et al. (2022), Global biogeography of warning colouration in the butterfly Danaus chrysippus, Dryad, Dataset,


Warning colouration provides a textbook example of natural selection, but the frequent observation of polymorphism in aposematic species presents an evolutionary puzzle. We investigated biogeography and polymorphism of warning patterns in the widespread butterfly Danaus chrysippus using records from citizen science (n=5467), museums (n=8864), and fieldwork (n=2586). We find that polymorphism in three traits controlled by known mendelian loci is extensive. Broad allele frequency clines, hundreds of km wide, suggest a balance between long-range dispersal and predation of unfamiliar morphs. Mismatched clines for the white hindwing and forewing tip in East Africa are consistent with a previous finding that the black wingtip allele has spread recently in the region through hitchhiking with a heritable endosymbiont. Light/dark background colouration shows more extensive polymorphism. The darker genotype is more common in cooler regions, possibly reflecting a trade-off between thermoregulation and predator warning. Overall, our findings show how studying local adaptation at the global scale provides a more complete picture of the evolutionary forces involved.


Data sources

We considered two sources of D. chrysippus records (Table S1). “Research records” included museum collections and field collections/observations. “Citizen science records” are those from online biodiversity monitoring platforms to which users can upload photographs and other metadata. These were obtained from the Global Biodiversity Information Facility (GBIF), which collects observations from multiple online databases (Table S1, last downloaded on 01 November 2021,


Phenotyping and allele frequency estimation

We scored each butterfly for three traits, corresponding to three known loci: hindwing white/orange (A locus), light orange/dark brown background colour (B locus) and presence/absence of forewing black tip (C locus) (Figure 1, S1). Heterozygotes at all three loci are not reliably scorable due to variable penetrance [12]. We therefore considered two genotype classes: homozygous recessive (e.g. aa) and dominant (e.g. A-) (Figure 1, S1). Records were grouped in grid cells of 4x4 or 2x2 degrees (latitude and longitude) and allele frequencies were estimated assuming Hardy-Weinberg equilibrium. Maps were plotted using the R packages ggplot2, maps, and mapproj.


Transects and cline analysis

We selected four transects that connected approximate regions of highest abundance of contrasting genotypes at two of the three loci. For cline analysis, we used the estimated allele frequencies for each grid cell ≤450 km from the transect path, accounting for the curvature of the earth (cross-track distance). The position along the transect for each cell was the position nearest to the centre of the cell (along-track distance). Distances were calculated using the R package geosphere. We used HZAR to fit cline models and compared the fits with different parameters for scaling ("none", "fixed", "free") and tails ("none", "both") using corrected AIC scores. Cline parameters were estimated using 105 generations of an MCMC search in three independent chains after 104 burn-in generations.


Environmental associations

To test the hypothesis that the frequency of dark/light morphs (B locus) may be partly driven by the abiotic environment, we selected four environmental variables of potential relevance from the CliMond database (averaged 1961–1990): annual mean temperature (bio01), solar radiation (bio20), annual precipitation (bio12) and soil moisture index (bio28) (Figure S10-S11). Variable layers were processed and checked for errors using Arcmap v. 10.3 (ESRI, 2015). We extracted the corresponding variables for each butterfly record location using the R packages rgdal, rgeos, and maps. For records where coordinates were out of range of the variable layers (N = 283) we manually applied values of nearby areas.

To test for associations, we first assessed Pearson correlations and pairwise linear regressions between each environmental variable (averaged across all record locations within each grid cell) and genotype frequency, weighted by the number of records per cell. We then generated generalised least-squares (GLS) models using the R package nlme: one null model and 15 models including all combinations of explanatory variables. The GLS accounted for spatial autocorrelation assuming that errors are exponentially related to the euclidean distance between grid cells based on coordinates. Variance weights were defined as the inverse of the number of records per grid cell. We assessed the effect of correlation among the explanatory variables using the PerformanceAnalytics R package and by quantifying the variance inflation factor (VIF).


Royal Society, Award: URF\R1\180682

National Geographic Society, Award: WW-138R-17