Radiation of tropical island bees and the role of phylogenetic niche conservatism as an important driver of biodiversity
Data files
Mar 31, 2020 version files 247.85 MB
-
Dryad_files.zip
247.85 MB
Abstract
Methods
Sample locations and collection methods. Collections throughout Fiji were made between 2010 and 2017 from multiple localities including the main islands of Viti Levu, Vanua Levu, Kadavu and Taveuni, as well as multiple small islands in the Lau group (SI Appendix, Fig. S5). Sampling of specimens at each location was not biased towards particular species because, for these very small bees, only H. achrostuscould be easily identified in the field due to its distinctive coloration; all other species required microscopy or DNA sequencing for species identification.
Samples were collected from 3 m to 1,324 m asl (highest elevation of Fiji) by sweep netting both native and introduced plants, and from nesting aggregations along roadsides. For each collection site, latitude, longitude and elevation were recorded using a Garmin 550 (Garmin Ltd., USA); latitude and longitude were then checked against satellite images (Google Earth) to confirm accuracy. Once collected, bees were immediately transferred into vials containing 98% ethanol. Vials were kept cool at ~5˚C and ethanol was replaced within a week of collection to reduce DNA degradation.
Maps of Fiji were produced in ArcMap (50)and a digital elevation model (DEM) of the archipelago was provided by Fiji Lands Information System (FLIS).
Sampling bias and elevational species richness. It was not possible to evenly sample bees across all geographical and elevational regions of Fiji because physical access to many regions was restricted by terrain and lack of roads. Access constraints could therefore affect sampling effort and this, in turn, could influence ability to recover true species richness in different elevational bands. Here, we quantize sampling effort as the number of DNA sequences obtained for different elevations, categorized into 200 m asl bands. Because specimens were only identified to species levels after DNA sequencing, the number of obtained sequences represents sampling effort. We examined whether this sampling effort may have influenced our estimates of species richness using multiple regression, where the number of detected species was the dependent variable and the number of sequences (sampling effort) and elevational band were the independent variables. The relative importance of sampling effort and elevation band for detected species richness can then be explored by regression bvalues and their statistical significance.
DNA extraction and sequencing. Tissue samples for DNA extraction were obtained by removing a single hind leg from each of the 764 specimens. For all samples obtained after 2014, DNA extraction and PCR amplification was completed at the South Australian Regional Facility for Molecular Ecology and Evolution (SARFMEE). DNA extraction and PCR amplification of COI prior to the 2014 samples was completed at the Canadian Centre for DNA Barcoding (CCDB) at the Biodiversity Institute of Ontario (31)and amplification used the universal primer pair LepF1 and LepR2 (31, 51). Extractions at SARFMEE followed protocols described by (52)with the subsequent DNA eluted into 75 µL of TLE buffer. PCR amplification of the 710 bp fragment of the DNA (COI) was completed using the primers LCO1490 (forward) and HCO2198 (reverse). The 25 µL PCR reactions comprised the following reagents: Sterile H2O (15.9 µL), MRT buffer (5 µL), 1 µL (5 µM) of LCO1490, 1 µL(5 µM) of HCO2198, Immolase Taq (0.1 µL) and DNA from specimen (2 µL). The thermocycling regime comprised of one cycle at 94˚C for 10 minutes, then five cycles at 94˚C for 60 seconds, 46˚C for 90 seconds, 72˚C for 75 seconds, followed by 35 cycles at 94˚C for 60 seconds, 51˚C for 90 seconds, 72˚C for 75 seconds, followed by 72˚C for 10 minutes and then 25˚C for 2 minutes.
Sequences were checked against the NCBI BLAST database to screen for non-target DNA. Forward and reverse sequences were aligned and chromatograms visually checked before creating final consensus sequences in Geneious version 10.2.2 (53). Initial alignments were trimmed to 630 bp to avoid any problems associated with missing data.
Phylogenetic, elevational and species analyses. The full COI alignment consisted of 630 bp for 764 specimens. Partition finder version 2 was employed using BIC and a greedy algorithm to find the best partition schemes and DNA substitution models from widely-used (i.e. MrBayes) models (54-56). The first and second codon positions were combined into a single partition with an HKY + I substitution model. A GTR substitution model was applied to third codon position. The BEAST file and parameters for phylogenetic analyses were set using BEAUti version 1.8.4 (57). Because of the small numbers of substitutions on each branch, a strict clock was used to avoid overparameterization. To infer changes in elevation across the tree we included elevation as a continuous trait using a strict or relaxed Brownian motion model (confirmed as adequate given our λ, κ, and δestimates;Table 1). Phylogenetic analyses were implemented in BEAST version 1.10(34)with 200 million iterations sampled every 50,000thiteration. Resulting log files were analyzed in Tracer version 1.6 (58)and a burnin of 2.5x107iterations was employed, which was always after stationarity had been achieved. Maximum clade credibility trees and posterior probability support values were obtained using TreeAnnotator Version 1.8.4 (57). Each run was performed four times for each analysis to ensure consistent results and stationarity. Post-burnin log and tree files for each run were then combined using LogCombiner version 2.5.2 (59)for the final analysis.
To infer the evolutionary mode and phylogenetic signal in the elevation data, we used BayesTraits version 3.0 (29). The tree-transformation models employed in BayesTraits assume that each terminal taxon is a species, hence we repeated the BEAST analysis using only one DNA sample from each species, and elevation data as either the median or minimum for all samples of that species. The (reduced) BEAST analysis used 100 million iterations, sampling every 50,000thiteration; stationarity and burnin was checked as above. The resulting consensus tree was run in BayesTraits using the median and the minimum elevational value for each terminal taxon to estimate λ (degree of phylogenetic signal), κ (degree of punctuated evolution), and δ (degree of early burst, adaptive radiation). The model of best fit for each estimate was chosen using Akaike’s Information Criterion with 100 bootstrap replicates in Tracer (60). Analyses in BayesTraits used 500 million iterations sampled every 50,000thiteration. Each run was performed four times for each model at each elevation to ensure consistent results. BayesTraits log files for each run were then combined using LogCombiner version 2.5.2 (59)for the final analysis.
We attempted to co-estimate phylogeny and elevational niche evolution, but these analyses repeatedly failed to converge. Thus, to infer elevational changes across the full phylogeny, we mapped elevation across all post-burnin trees sampled in the full COI analysis. This was done using BEAST, under a standard rate-constant Brownian motion model, as well as a rate-variable Brownian motion model, which assumes rates vary across branches according to an uncorrelated relaxed clock (35). Stationarity and burnin were confirmed as above. Both models gave very similar ancestral state reconstructions, but the latter model fitted better and is shown in Fig. 1.
Genetic analyses of bee clades were explored using Arlequin version 3.11 (61). For each species with multiple haplotypes and a sample size of more than 10 specimens we calculated haploytype diversity (h) and pairwise FSTvalues.
SNP quality filtering and analyses. The thorax and front legs were taken from 19 individuals from H. fijiensis, H. tuiwawae, H. ostridorsum, H. groomi and H.sp.S, respectively. To perform Restriction-site Associated DNA sequencing (RAD seq), the solid state method Diversity Arrays Technology (DArT) was used (62). The restriction enzymes used werea combination of PstI and HpaIIenzymes. Only female specimens were used to avoid the impact of male haploidy on SNP diversity. Post filtering, missing data was capped at 1.16%.
A total of 62,426 SNP loci were called across all species. Using the R package ‘DArTR’ version 1.0.5 low quality loci were removed at a threshold of 0.85% removing loci with 15% or more missing values (63), leading to retention of 8,381 SNP loci.The neighbour joining tree (SI Appendix, Fig. S4)was made using the R package ‘ape’ with the ‘nj’ function (64).
Once SNP data were filtered they were subjected to a discriminant analysis of principal components (DAPC) using the DAPC procedure (65)in the Adegenet package in R (66). The DAPC was used to identify the number of genetic clusters within the SNP data and the relationship between these clusters. DAPC uses synthetic variables constructed as linear combinations from the original alleles, showing the largest between group variations and lowest within group variation. Discriminant analysis also provides membership probabilities of each individual to the different clusters. Our DAPC followed protocols outlined by Jombart (65).
Usage notes
Files are uploaded in a zip folder. Within this zip, files and folders are labelled according to analyses. These materials are mostly run files.