Skip to main content

Run and output files from: Holocene population expansion of a tropical bee coincides with early human colonisation of Fiji rather than climate change

Cite this dataset

Dorey, James B. et al. (2021). Run and output files from: Holocene population expansion of a tropical bee coincides with early human colonisation of Fiji rather than climate change [Dataset]. Dryad.


There is substantial debate about the relative roles of climate change and human activities on biodiversity and species demographies over the Holocene. In some cases, these two factors can be resolved using fossil data, but for many taxa such data are not available. Inferring historical demographies of taxa has become common, but the methodologies are mostly recent and their shortcomings often unexplored. The bee genus Homalictus is developing into a tractable model system for understanding how native bee populations in tropical islands have responded to past climate change. We greatly expand on previous studies using sequences of the mitochondrial gene COI from 474 specimens and between 171 and 3,928 autosomal (DArTSeq) SNP loci from 19 specimens of the native Fijian bee, Homalictus fijiensis (Perkins & Cheesman, 1928), to explore its historical demography using coalescent and mismatch analyses. We ask whether past changes in demography were human- or climate-driven, while considering analytical assumptions. We show that inferred changes in population sizes are too recent to be explained by past climate change. Instead we find that a dramatic increase in population size for the main island of Viti Levu coincides with increasing occupation by humans and their modification of the environment. We found no corresponding change in bee population size for another major island, Kadavu, where human populations and agricultural activities have been historically very low. Our analyses indicate that molecular approaches can be used to disentangle the impacts of humans and climate change on a major tropical pollinator and that stringent analytical approaches are required for reliable interpretation of results. 


Sampling sites and methods

Collections were made throughout Fiji between 2010 and 2017 from multiple localities but with the greatest number of samples from the largest island of Viti Levu (n = 309) and then the island of Kadavu (n = 109) (Table S1). Samples were collected from 3 to 1,328 m asl by sweep netting both native and introduced plants, and from nesting aggregations along roadsides. For each collection site, latitude, longitude and elevation were recorded using GPS devices (primarily using a Garmin 550). Once collected, bees were immediately transferred into individual vials of ³98% ethanol. Vials were kept at ~5˚C and ethanol was replaced within a week of collection to lessen DNA degradation.


Geographic information systems

In order to explore whether patterns in historical demography were related to subaerial land mass over time, we used bathymetric maps to examine how subaerial landmass and connectivity have changed since the last glacial maximum. Bathymetric data were downloaded in R version 3.6.2 using the package marmap version 1.0.4 (Pante & Simon-Bouhet, 2013). The marmap package was also used to produce maps and calculate subaerial landmasses presently and at the last glacial maximum. 


COI data generation

We subjected a subset of COI data from a previous study (Dorey et al., 2020b) to different analyses to answer novel hypotheses about past population demography of a single species, rather than relationships between many species. Tissue samples for DNA extraction were obtained by removing a single hind leg from each specimen. Samples prior to 2015 were sequenced at the Canadian Centre for DNA Barcoding (CCDB) at the Biodiversity Institute of Ontario (Groom et al., 2013). For these samples DNA amplification used the universal primer pair LepF1 and LepR2 (Groom et al., 2013; Hebert, Penton, Burns, Janzen, & Hallwachs, 2004). For all other samples DNA extractions and PCR amplifications were completed at the South Australian Regional Facility for Molecular Ecology and Evolution (SARFMEE) and DNA sequencing and purification carried out at Macrogen Inc. (Korea). DNA extractions at SARFMEE were performed using a Gentra Puregene® DNA Purification kit (Gentra Systems Inc.) according to the manufacturer’s protocol. PCRs amplified a 710 bp fragment of the mtDNA (COI) gene using the primers LCO1490 (forward) and HCO2198 (reverse). The 25 µL PCR reactions comprised the following reagents: Sterile H2O (15.9 µL), MRT buffer (5 µL), 1 µL (5 µM) of LCO1490, 1 µL (5 µM) of HCO2198, Immolase Taq (0.1 µL) and mtDNA from specimen (2 µL). The thermocycling regime comprised of one cycle at 94˚C for 10 minutes, then five cycles at 94˚C for 60 seconds, 46˚C for 90 seconds, 72˚C for 75 seconds, followed by 35 cycles at 94˚C for 60 seconds, 51˚C for 90 seconds, 72˚C for 75 seconds, followed by 72˚C for 10 minutes and then 25˚C for 2 minutes. 


Sequences were checked against the NCBI database using BLAST (blastn and blastx) to screen for non-HomalictusDNA. Forward and reverse sequences of each H. fijiensis specimen were aligned and checked for stop codons and/or nucleotide assignment errors using chromatograms examined with Geneious version 10.2.2 (Kearse et al., 2012). Any sequences with one or more base pairs that could not be reliably determined were excluded from the dataset. The H. fijiensis alignment was trimmed to 630 bp to remove primers and avoid spurious results that could arise from missing data in mismatch and Bayesian skyline coalescence analyses (Grant, 2015; Joly, Stevens, & Jansen van Vuuren, 2007).A total of 474 H. fijiensis sequences were analysed from across the entire Fijian archipelago including 309 sequences from the largest island of Viti Levu.


SNP data generation

We subjected the raw SNP data from a previous study (Dorey et al., 2020b) to more rigorous filtering and analyses that resulted in substantially changed subsets of the initial dataset that are more relevant to the present questions. The thorax and front legs were taken from 19 Viti Levu females from each of five species: H. fijiensisH. tuiwawae, H. ostridorsum, H. groomi and H. sp. S (Dorey et al., 2020a; Dorey et al., 2019). We used the solid-state method Diversity Arrays Technology in Canberra, Australia (DArTseq™) (Jaccoud, Peng, Feinstein, & Kilian, 2001) to perform restriction site-associated DNA sequencing. DArTseq combines complexity reduction with a next generation sequencing platform in a conceptually-similar method to double digest RADseq (Shams et al., 2019). The restriction enzymes PstI and Hpall were used.


A total of 62,426 SNP loci were called across all five sequenced species. We used the R package DArTR version 1.3.4 (Gruber, Unmack, Berry, & Georges, 2018) to filter our data. The original SNP dataset was filtered to only include H. fijiensis. Monomorphic (non-variable) sites were then removed, leaving 7,719 loci. We then filtered these data to remove all missing data (4,046 loci remaining), for repeatability (percentage of scores that are repeated in the technical replicate dataset; 3,928 loci remaining) and to remove secondaries (multiple linked SNPs per fragment; 3,768 loci remaining). Genome-wide SNPs can suffer from large numbers of linked loci and this linkage can break assumptions of independence for many analyses, and bias results (O'Leary, Puritz, Willis, Hollenbeck, & Portnoy, 2018). Hence, we analysed a wide variety of linkage disequilibrium (LD) filtering criteria. We filtered for linkage, removing loci with r2values below 0.9, 0.7 and 0.2, retaining 1,811, 1,646 and 171 loci, respectively. Linked loci were removed sequentially in order of most- to least-linked connections to retain loci that might otherwise be removed (Script S1). We used the latter five filtering levels in analyses.


Haplotype analyses of COI data

A minimum-spanning network (Bandelt, Forster, & Röhl, 1999) of our complete COI dataset was created using PopART version 1.7 (Leigh & Bryant, 2015). Geneious version 10.2.2 (Kearse et al., 2012) was used to examine unique haplotypes and amino acid sequences.


Mismatch analyses of COI data

Mismatch analyses, and extended Bayesian skyline plots (EBSPs), make several assumptions about the data provided, including that: (i) a random sample is drawn from the population, (ii) the population is panmictic, and (iii) largely neutral markers were used (Grant, 2015). To examine panmixia, we used Arlequin version 3.11 (Excoffier, Laval, & Schneider, 2005) to examine pairwise FST values of COI haplotypes between all islands and combined island datasets that were not significantly different (p > 0.05). We then carried out mismatch analyses to explore whether past demographic changes could be explained by population expansion towards the present, graphing observed pairwise nucleotide differences with those expected under a recent population expansion, with 2,000 simulations used to generate an expected distribution of nucleotide differences. A unimodal distribution in a mismatch graph can be consistent with a sudden population expansion, whilst multimodal distributions can suggest past population bottlenecks or demographic structure (Rogers & Harpending, 1992).


Extended Bayesian skyline plots (EBSP) of COI data

We employed PartitionFinder version 2.1 (Lanfear, Frandsen, Wright, Senfeld, & Calcott, 2016) to determine the most appropriate model of molecular evolution for all COI datasets. Because the existence of population structure violates assumptions of panmixia (Heller, Chikhi, & Siegismund, 2013) our all-islands dataset was not used in our final demographic Bayesian analyses. For the island groups with sample sizes >50 (see Appendix) — Viti Levu and Kadavu — we used extended Bayesian skyline plots of COI data sets in BEAST version 2.6.3 (Bouckaert et al., 2019a; Drummond, Suchard, Xie, & Rambaut, 2012) to infer changes in historical demography. We restricted demographic EBSP analyses to the third codon position, where most synonymous mutations occur (Ermolaeva, 2001). We applied a strict molecular clock and the best-fit PartitionFinder 2 model, an HKY and an HKY+Γ substitution model for the Viti Levu and Kadavu populations, respectively. Four independent runs for each analysis were performed, to confirm stationarity. For the Viti Levu population, each run consisted of 4 chains with heating, carried out for 300 million iterations, resampling every 30,000th iteration using the BEAST package CoupledMCMC version 1.0.2 (Altekar, Dwarkadas, Huelsenbeck, & Ronquist, 2004). Multiple chains were required to properly sample across multiple possible optima in phylospace. For the Kadavu population, we used single chains with 500 million iterations sampled every 100,000th iteration. The log files from each run were examined in Tracer version 1.7.1 (Rambaut, Drummond, Xie, Baele, & Suchard, 2018) and a burnin of 10% was used, which was always after stationarity had been achieved (effective sample sizes all exceeding 200). Log and tree files were combined using LogCombiner version 2.5.0 (Bouckaert et al., 2019a). The EBSP log files were analysed with the plotEBSP script in R version 3.5 (Bouckaert et al., 2019a).


The estimated mutation rate of 1.09x10-7 per site per generation, based on only the 3rd codon position from the whole mitogenome of Caenorhabditis elegans (Denver, Morris, Lynch, Vassilieva, & Thomas, 2000) was applied to all of our EBSP plots to infer an approximate time scale in generation units. This directly-estimated mutation rate is appropriate for inferring recent demographic changes and is broadly consistent with other empirical values (Gratton, Konopiński, & Sbordoni, 2008; Papadopoulou, Anastasiou, & Vogler, 2010). The AT bias of C. elegans (70.3% from 21 C. elegansCOI sequences; BOLD (2020)) is similar to that of our H. fijiensis COI fragment (74% from 474 H. fijiensis COI sequences). We converted the EBSP time scale from generations to chronological time by assuming four Homalictus generations per year, following Groom et al. (2014). However, we also explored the effects of assuming three or five generations per year, which is analytically equivalent to assuming a faster or slower per-generation mutation rate. 


For our reconstruction of island ancestral states, we analysed all three codon positions using within-island unique COI sequences and the BEAST 2 package CoupledMCMC with eight chains and an EBSP tree prior. Four independent runs were performed, to confirm stationarity. We used two outgroup species — H. groomi and H. sp. O. The molecular data were allocated to a single partition to which we applied a single HKY+I substitution model and an uncorrelated relaxed clock model. "Island" was included in the analysis as a discrete trait given a symmetric change model and a strict clock (more complex models prevented this partition from converging). Each run was 100 million iterations, resampling every 20,000th iteration. Log files were examined in Tracer and a burnin of 10% was used. The maximum clade credibility tree as well as posterior support values were produced in TreeAnnotator version 2.6.3 (Bouckaert et al., 2019b) using median node heights. The tree was visualised in FigTree version 1.4.4 (Drummond, 2016).


Extended Bayesian skyline plots and Ne using SNP data

We employed PartitionFinder 2 to determine the most appropriate model of molecular evolution for each SNP dataset corresponding to the different levels of linkage filtering. For our SNP datasets, we ran EBSP analyses in BEAST using the following models: for our LDR2=0.2 we used a K80 model, for LDR2=0.7 and LDR2=0.9 we used HKY+Γ and for the remaining datasets (without and with secondaries — multiple SNPs on a single fragment — included) we used a GTR+Γ model. We used a relaxed log normal clock for all SNP EBSP analyses (Lemey, Rambaut, Welch, & Suchard, 2010). All runs except for that with secondaries were executed for 100 million iterations, sampling every 50,000thiteration, and were repeated four times to confirm convergence in EBSP results. For the run that kept secondaries, four heated-chain runs were carried out for 100 million iterations, resampling every 10,000th iteration using the BEASTpackage CoupledMCMC. The log files from each run were examined in Tracer and then combined using LogCombinerwith a burnin of 20%, which was always after stationarity had been achieved (effective sample sizes all exceeding 200). The EBSP log files were analysed using plotEBSP (Bouckaert et al., 2019a). 


Supplementary methods summary

We examined haplotype sample sizes required for robust demographic inference, using rarefaction analysis in EstimateS version 9.1.0 (Colwell, 2013) (see Appendix). We also undertook nested sampling (NS package in BEAST 2) and used DIYABC-RF version 1.0.12 in R (Collin et al., Submitted; Durif & Collin, 2021) to explicitly compare alternative possible demographic patterns (see Appendix). 

Usage notes

See the "DryadFile_info.docx" file, the published article, or contact the authors for clarification.


Australia and Pacific Science Foundation

Playford Trust

Flinders University

Department of Foreign Affairs and Trade