Skip to main content

Two ways to be endemic: Alps and Apennines are different functional refugia during climatic cycles

Cite this dataset

Dapporto, Leonardo (2021). Two ways to be endemic: Alps and Apennines are different functional refugia during climatic cycles [Dataset]. Dryad.


Endemics co-occur because they evolved in situ and persist regionally or because they evolved ex situ and later dispersed to shared habitats, generating evolutionary or ecological endemicity centres, respectively. We investigate whether different endemicity centres can intertwine in the region ranging from Alps to Sicily, by studying their butterfly fauna. We gathered an extensive occurrence dataset for butterflies of the study area (27,123 records, 269 species, in cells of 0.5x0.5 degrees of latitude-longitude). We applied molecular-based delimitation methods (GMYC model) to 26,557 COI sequences of Western Palearctic butterflies. We identified entities based on molecular delimitations and the most recent checklist of European butterflies and objectively attributed occurrences to their most probable entity. We obtained a zoogeographic regionalisation based on the 69 endemics of the area. Using phylogenetic ANOVA we tested if endemics from different centres differ from each other and from non-endemics for key ecological traits and divergence time. Endemicity showed high incidence in the Alps and Southern Italy. The regionalisation separated the Alps from the Italian Peninsula and Sicily. The endemics of different centres showed a high turnover and differed in phenology and distribution traits. Endemics are on average younger than non-endemics and the Peninsula-Sicily endemics also have lower variance in divergence than those from the Alps. The observed variation identifies Alpine endemics as paleoendemics, now occupying an ecological centre, and the Peninsula-Sicily ones as neoendemics, that diverged in the region since the Pleistocene. The results challenge the common view of the Alpine-Apennine area as a single “Italian refugium”.


Sampling and datasets

The study area includes the Alps (, the Italian Peninsula, Sicily and the small Italian islands closer to this land than to any other (Figure 2a). We obtained 307,228 records for butterfly species as recognised in Wiemers et al. (2018) within the study area for cells of 0.5x0.5 degrees of latitude and longitude, corresponding to 1277 km2 in the centre of the study area (Rome) (sources described in Appendix S1). We generated occurrence maps for each species and compared them with the distribution of European butterflies (Kudrna, 2019) with the goal to removed possible misplaced records. After filtering unique occurrences for each cell, we counted 27,123 records . We gathered 26,557 COI (standard barcode, 658 bp) sequences from 519 species occurring in the Western Palearctic . Among these, 23,563 COI sequences belong to the 269 species occurring in the study area.

Phylogeny and GMYC

We collapsed the COI dataset to unique haplotypes using the “haplotype” function of the R package “pegas” ( We used BEAST 1.8 (Drummond, Rambau, & Suchard, 2013) to reconstruct five ultrametric phylogenetic trees, one for each butterfly family (the single European Riodinidae was merged with Lycaenidae) (available in Dryad). The number of haplotypes was 6459 (3232 Nymphalidae, 644 Pieridae, 561 Hesperiidae, 247 Papilionidae and 1775 Lycaenidae-Riodinidae). Each dataset included one outgroup for each of the other families. Two independent chains of 100 million generations were run in BEAST for each dataset. The substitution model was set to GTR+I+G with six gamma rate categories. A coalescent tree prior was set. Divergence times were estimated by applying a strict clock and a normal prior distribution centered on the mean between two widely used substitution rates of 1.5% uncorrected pairwise distance per million years (Quek, Davies, Itino, & Pierce, 2004), and 2.3 % (Brower, 1994). Values were sampled every 10% of the run length and convergence was inspected in Tracer v.1.6 ( We applied the general mixed Yule-coalescent model (GMYC, Fujisawa & Barraclough, 2013) for each family tree to identify evolutionary significant units (ESUs) using the R package “splits” ( with default settings.

We identified entities as taxa recognised by the taxonomic list of Wiemers et al. (2018) or as haplotypes belonging to different GMYC ESUs (Figure 2b). According to the GMYC results each species identified by Wiemers et al. (2018) could be 1) “single entity species (SE)”: all haplotypes of a species belong to a single GMYC ESU, 2) “multiple entity species (ME)”: haplotypes belong to two or more ESUs, 3) “lumped entities (LE)”: two or more species are recovered as a single ESU, and 4) “lumped + multiple entities (LME): species are split in multiple ESUs and lumped with other species (Figure 2b).

For SE and LE all occurrences were attributed to the original species while for ME and LME, we attributed species occurrence to their most probable ESU by using “biodecrypt” (“recluster” R package, The function creates concave hulls based on the distribution of the sequences attributed to a given ESU and uses the relative hull geometries to attribute unknown occurrence data to a given species (Platania et al., 2020). The biodecrypt function also provides a measure for hull overlap as an evaluation of sympatry among cryptic entities.

We identified as endemics those entities for which all COI sequences occurred exclusively within the study area.

1) Which are the centres of endemism?

To locate the centres of endemism we ran regionalisation analyses for the occurrence data of endemics in 0.5x0.5 cells. We used the “recluster.region” function in the R package “recluster” ( specifically designed to retrieve biogeographic regions at the intracontinental scale. We obtained clustering solutions from 2 to 8 centres based on the Simpson turnover index, suited to identify regions based on vicariant patterns of distribution. The “recluster.region” function also calculates the silhouette width and the explained dissimilarity, evaluating how cells resemble those of their own centre (cohesion) compared to other centres (separation). Once the centres were obtained, we identified their exclusive endemics using the “indval” function in the “labdsv” R package (

2) Are endemics characterised by different ecological traits in different centres?

The traits of species which entities belong to, were compared between endemics from different centres and between endemics and non-endemics from the same centre. We used a series of 10 ecological traits for European Butterflies (Middleton-Welling et al, in press; Platania et al., 2020). These traits were used to describe both the alpha niche (i.e. functional traits describing the primary functions of invertebrates, and the beta niche (features related to distributional and environmental preferences) (Table 1). Butterfly traits are highly intercorrelated and are usually reduced to factors by Principal Component Analyses (PCA). We applied PCA to life history and distribution traits using the function rda of the R package “vegan” ( Those components showing eigenvalues higher than one were retained as variables.

To assess differences in traits we applied a phylogenetic ANOVA, using the “aov.phylo” function of the R package “geiger” ( As a reference phylogeny we used a time-calibrated phylogenetic tree for all 496 species of European butterflies, based on 14 mitochondrial and nuclear genes (Wiemers, Chazot, Wheat, Schweiger, & Wahlberg, 2020). We carried out pairwise comparisons through sequential Bonferroni corrections. We log transformed the number of host plants to improve its normality. To investigate if the traits show different variances among groups, we carried out tests of variance homogeneity (followed by pairwise comparisons with sequential Bonferroni correction) through the non-parametric Fligner-Killeen test, using the “check_homogeneity” function of the R package “performance” (

3) Do the endemics from EVOc(s) show lower variance in genetic divergence?

For each entity, we obtained genetic divergence from its closest entity in the five family trees using the function “distTips” of the R package “adephylo” ( We compared divergence between endemics from different centres and between endemics and non-endemics of the same region using ANOVAs and variance comparisons as described above. We also compared divergence among types of entity using the same method. We did not apply a correction for phylogenetic autocorrelation because genetic distances are exactly the variable compared here. We compared the incidence of the SE, ME, LE and LME endemics among centres through a Chi Square test.

Usage notes

All analyses can be replicated by launching the R script (script.R).