Skip to main content

Diversity and phylogenetic community structure across elevation during climate change in a family of hyperdiverse neotropical beetles (Staphylinidae)

Cite this dataset

Smith, M. Alex (2021). Diversity and phylogenetic community structure across elevation during climate change in a family of hyperdiverse neotropical beetles (Staphylinidae) [Dataset]. Dryad.


Environmental stress from abiotic conditions imposes physiological limits on individuals within communities, and these stressful conditions can act as a filter on the species present in any given environment. Such abiotic stressors can reduce a community’s diversity and make its composition more phylogenetically clustered. Using a decade of staphylinid beetle (Staphylinidae, Coleoptera, rove beetles) collections made across a 1,500 m elevation gradient in northwestern Costa Rica (2008-2017) we asked what species lived there, how large and overlapping were the communities across this gradient, and what relationship was there between elevation and diversity. Using DNA barcodes for identification and phylogenetic estimates of community structure, we found high turnover across elevation, and that staphylinid diversity increased linearly with elevation. Because of this, we found staphylinid diversity was negatively related to surface area and temperature, and positively with precipitation. We suggest that historical biogeography and contemporary environmental stress have combined to produce these observed patterns. The forests in which these beetles are found are heating and drying rapidly and our finding that diversity increases with elevation suggests that there will be catastrophic biodiversity loss in the coming decades.


The Área de Conservaciόn Guanacaste (ACG) is a 165,000 hectare UNESCO Natural World Heritage Site in northwestern Costa Rica that includes three stratovolcanoes ( As elevation increases, precipitation increases and temperature decreases. For example, the lapse rate of maximum temperature across this gradient is approximately 1 °C for every 100 m along Volcan Cacao (Smith et al., in prep). Due to this, there are three distinct forest types across the gradient (Hulshof and Powers 2020, Janzen and Hallwachs 2020, Smith, Hallwachs and Janzen 2014). Low elevations are hot and dry (dry forests), making them physically stressful for organisms needing moisture. High elevations (tropical montane cloud forests) are cold and wet making them physically stressful for ectotherms. Mid-elevations (rain forests) are a mixture of these two environments being warm and moist during much of the year. Beetles were derived from collections made over a decade of sampling in the ACG between 2008 and 2017 along elevational transects established from sea level to the summit of the volcano Cacao (Smith, Hallwachs and Janzen 2014) (Fig 1). Sampling was standardized so that each elevation was sampled for the same amount of time using the same combination of sampling methods. We focused principally on leaf-litter arthropod fauna by using pitfall traps, Davis-sifting of the leaf litter, mini-Winkler sifting of the leaf-litter, bait (tuna and cookies), active searching and Malaise traps.  For full details regarding sampling please see Smith, Hallwachs and Janzen (2014), and in particular Appendix ECOG‐00631 at All samples were preserved in 95% ethanol upon collection and later preserved at -20 °C.

Abiotic Factors

To calculate the surface area of the elevational band associated with each of the eight collection sites, we used topographic data for Costa Rica from the Earth Observing System Data and Information System (EOSDIS; and downloaded the Digital Elevation Model (DEM) into ArcGIS ( Shape files of all the individual terrestrial protected areas in the ACG are available from and were used here. These projections were then defined and matched to the EOSDIS land data using the spatial reference CR LAMBERT NORTE. We categorized the topographic data into 100 m elevation bands, starting at 0 m ± 50 m to 1800 m ± 50 m above sea level. We used categories starting at -50 m to surround the elevation sites that are typically on the 100 m (Fig 1; i.e. the 600 m site was represented from 550 to 650 m).

Temperature has been recorded at each site since 2013 (every 15 minutes) using Hobo RG3M and Pendant data loggers. From these, we used daily maximum temperatures (Smith et al., in prep). We extracted mean annual precipitation data from each of the 8 elevation sites using Worldclim - Global Climate data from 1960-1990 (; (Hijmans, et al. 2005)).

Staphylinidae Sampling

For all staphylinid collections made between 2008 and 2017 in the ACG, we calculated abundance for each of the 8 elevation sites. Of the staphylinids sampled, initial identifications to subfamily were made using a key to staphylinid genera in Mexico (Navarrete-Heredia, et al. 2002), and a key to subfamilies (Brunke, et al. 2011). Subsequent identifications to a finer taxonomic scale, where possible, where made by one of us (AJB). We included sub-families such as Scydmaeninae in our analyses due to recent changes in staphylinid systematics (McKenna, et al. 2015).

Tissue Sampling, DNA Sequencing, and Amplification

All collected staphylinids were tissue sampled (point-side leg(s)) and then point mounted for preservation. Three high-resolution focus-stacked photographs were taken of each specimen under a Leica Z16 AP0A microscope using Leica Application Software V4.3 at three orientations (dorsal, lateral, and anterior (head)).

Total genomic DNA was extracted from 2-6 legs depending on beetle size. Mitochondrial DNA from the 5’ region of the cytochrome c oxidase (COI) gene (the animal DNA barcode locus) was amplified and sequenced using standard methods (Ivanova, et al. 2006, Janzen, et al. 2009). Where sequencing failed or if the amplicon was low-quality, samples were re-amplified using primers that amplified a smaller portion of the same locus (i.e. 400 bp) (as in (Smith and Fisher 2009)). Successful amplifications of this smaller region were then sequenced at the University of Guelph Genomics Facility. We trimmed primers, removed or resolved ambiguities using Sequencher 5.4.1 (Sequencher® version 5.4.6 DNA sequence analysis software, Gene Codes Corporation, USA), aligned all edited sequences using MUSCLE (Edgar 2004) and examined sequences alignment and base calls using MEGA6 (Tamura, et al. 2013), and BioEdit (Hall 1999). All DNA sequences, trace files, images and sample metadata were uploaded to the Barcode of Life Data System (BOLD;; (Ratnasingham and Hebert 2013)) and can be accessed on BOLD (

Alpha Diversity

Due to the multiple taxonomic impediments affecting neotropical staphylinids (great diversity, largely undescribed, and a paucity of taxonomists and resources supporting work on the group (Smith 2012)), we used Barcode Index Numbers (BINs) to quantify taxon richness. A BIN is a type of molecular operational taxonomic unit (MOTU - Blaxter, et al. 2005) based on the RESL algorithm calculated upon COI sequences within BOLD (Ratnasingham and Hebert 2013). For a sequence in BOLD to receive a BIN designation, it must meet certain sequence quality criteria, including length. In cases where a staphylinid sequence did not meet the length criteria we used the RESL algorithm directly within BOLD to calculate BIN-equivalent MOTUs (Cluster Sequences). We used these unique alphanumeric BIN names as a species proxy, and for simplicity, use them synonymously as species names from now on. We calculated phylogenetic diversity by constructing a maximum likelihood tree in MEGA5 using a single representative high-quality sequence (longest read length and lowest number of ambiguities) for each species (Tamura, Stecher, Peterson, Filipski and Kumar 2013). We calculated the best substitution model to describe nucleotide substitution pattern in MEGA6 (Nei and Kumar 2000, Tamura, Stecher, Peterson, Filipski and Kumar 2013) and used this to create a ML tree (Goldman 1990). Subsequent calculations of the summation of branch lengths within a community (a measure of phylogenetic diversity) were made using the ‘picante’ package (Kembel, et al. 2010) in R v 3.6.1 (R: A language and environment for statistical computing  2013).

We used rarefaction and non-parametric estimators to measure sampling intensity to predict species diversity and evaluate sampling variation amongst sites. We used observed species estimators run 1000 times to extrapolate our hypothetical extensions of sampling of individuals using the ‘iNEXT’ package (Hsieh, et al. v 2.0.19).

To explore whether there were differing relationships with diversity amongst the staphylinids, we examined the relationship of diversity and elevation for the six most abundant staphylinid subfamilies.

Beta Diversity

We used a pairwise Jaccard distance (Jaccard Index (Jaccard 1901)) to determine the percent similarity of each elevation site by examining the number of shared species between each site. We additionally used a Mantel test (Mantel 1967) in R using the package ‘ade4’ version 1.7-15 (Dray and Dufour 2007) with 1000 replications to determine if distances between elevation sites were related to Jaccard distance.

We further examined the nature of the betadiversity patterns by testing whether the species shared between sites is a result of nestedness (species from one community are nested within other communities) or turnover (distinct communities across the gradient with limited species shared between sites) using the package ‘betapart’ version 1.5.2  (Baselga 2010, Baselga and Orme 2012). ade4’

Phylogenetic Community Structure

To examine phylogenetic community structure, we made an incidence matrix of species and sites, and used the maximum likelihood phylogeny described above. The community data matrix was first randomized to determine a random phylogenetic distance (PD) in order to compare the calculated observed values to the randomly generated ones. We randomized the matrix using the ‘taxalabels’ null model to maintain species richness and frequency within a sample site (Gotelli and Graves 1996). We calculated taxon richness and the mean nearest taxon distance (MNTD) (distance observed) in the ‘picante’ package (Kembel, Cowan, Helmus, Cornwell, Morlon, Ackerly, Blomberg and Webb 2010) in R.

Comparing diversity and abundance to abiotic explanatory variables

To test the relationship between staphylinid abundance, species richness, and phylogenetic diversity against independent variables of area, elevation, precipitation, and temperature we performed general linear regressions in R. We also used a general linear regression of the log of both species richness and surface. We further examined the residuals of the relationship between PD and species richness to determine if there were elevational sites where PD was higher or lower than was predicted by species richness.

We additionally used a multiple linear regression to test the relationship between staphylinid species richness against independent variables of area, elevation, precipitation, and temperature. Due to the collinearity between these variables as is expected along a gradient, we used a stepwise regression and our final model contained the independent variables of log(area), precipitation, and temperature. Abundance was included in the model in a separate analysis. The same model was used to test the relationship to nearest taxon index (NTI). All analyses were performed in R using the package ‘MASS’ version 7.3-53 for the stepwise regression. To determine if there was a mid-elevation peak (or trough) of species richness or NTI, we tested the fit of our data to a second order quadratic function (based on Akaike Information Criterion - AIC (Akaike 1973)).

Usage notes

Dolson et al ECOG-05427 community matrix.csv: abundance matrix of staphylinid species across elevational gradient of Volcan Cacao in ACG, Costa Rica.

Dolson et al ECOG-05427 ML tree.nwk: ML tree with branch lengths for each species in community matrix.  

Dolson et al ECOG-05427 Sample Information.xlsx: sample information and DNA sequences associated with each staphylinid sample successfully sequenced. 

Dolson et al Ecography in press (