Genomic and phenomic analysis of island ant community assembly

Cite this dataset

Darwell, Clive et al. (2019). Genomic and phenomic analysis of island ant community assembly [Dataset]. Dryad.


Island biodiversity has long fascinated biologists as it typically presents tractable systems for unpicking the eco-evolutionary processes driving community assembly. In general, two recurring themes are of central theoretical interest. First, immigration, diversification, and extinction typically depend on island geographical properties (e.g. area, isolation, and age). Second, predictable ecological and evolutionary trajectories readily occur after colonization, such as the evolution of adaptive trait syndromes, trends toward specialization, adaptive radiation, and eventual ecological decline. Hypotheses such as the taxon cycle draw on several of these themes to posit particular constraints on colonization and subsequent eco-evolutionary dynamics. However, it has been challenging to examine these integrated dynamics with traditional methods. Here we combine phylogenomics, population genomics, and phenomics to unravel community assembly dynamics among Pheidole (Hymenoptera, Formicidae) ants in the isolated Fijian archipelago. We uphold basic island biogeographic predictions that isolated islands accumulate diversity primarily through in situ evolution rather than dispersal, and find population genomic support for taxon cycle predictions that endemic species have decreased dispersal ability and demography relative to regionally widespread taxa. However, rather than trending toward island syndromes, ecomorphological diversification in Fiji was intense, filling much of the genus-level global morphospace. Furthermore, while most endemic species exhibit demographic decline and reduced dispersal, we show that the archipelago is not an evolutionary dead-end. Rather, several endemic species show signatures of population and range expansion, including a successful colonization of the Cook Islands. These results shed light on the processes shaping island biotas and refine our understanding of island biogeographic theory.


We provide filtered VCF files for restriction site associated DNA (RAD) markers for 942 Pheidole individuals, mostly from the Fijian archipelago, in order to facilitate island biogeographical investigation of community assembly dynamics. Also included are phylogenomic trees from RAD data. Additionally, we provide GPS coordinates and Python scripts for calculating the relationships between population structure and geographic connectivity between islands, according to current bathymetric data and past sea level change data.

Usage notes

There are three folders containing different groups of data files and/or R/Python scripts. These are called:

1. bathymetry

2. phylogenomics

3. VCFs

1. bathymetry folder

This folder contains a workflow for running Mantel tests of pairwise FST estimates against historic sea level connectivity estimates. The primary workflow is outlined in the bathymetry.R script. This R script contains commented notes indicating when certain Python scripts (also included in the folder) are to be run (apologies, some of the Python scripts are very old and look it, but they work). The folder contains all the files required to conduct Circuitscape analysis on the FST estimates derived for P. roosevelti (EGPA number: EGPA0048). It also contains all files that will be generated by the scripts and by Circuitscape. One script takes the seaLevels.csv file (from Haq et al. 1987) to create a formatted bathymetry raster file as described in the Methods of the paper.
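As a rough illustration of how a past sea level can be turned into a land/sea raster for connectivity analysis, here is a minimal Python sketch. It is not the script shipped in the folder: the function names, the cost values, and the tiny elevation grid are all hypothetical, and the actual procedure is the one described in the paper's Methods.

```python
def land_mask(elevation_grid, sea_level_m):
    """Return 1 for cells exposed as land at the given sea level, else 0.

    elevation_grid holds elevations in metres relative to present sea level
    (negative values lie below the present-day surface). Hypothetical helper.
    """
    return [[1 if elev > sea_level_m else 0 for elev in row]
            for row in elevation_grid]


def resistance_grid(elevation_grid, sea_level_m, land_cost=1, sea_cost=100):
    """Assign a low cost to land cells and a high cost to sea cells.

    The cost values are illustrative assumptions, not values from the paper.
    """
    mask = land_mask(elevation_grid, sea_level_m)
    return [[land_cost if cell else sea_cost for cell in row] for row in mask]


# Toy 2 x 3 elevation grid in metres (made-up numbers).
grid = [[5, -20, -150],
        [-60, -90, 10]]

present_day = land_mask(grid, 0)       # only cells above 0 m are land
lowstand = land_mask(grid, -100)       # a 100 m drop exposes shallow shelf
```

Lowering the sea level in the sketch joins cells that are separated by water today, which is the kind of change in connectivity the sea level data are meant to capture.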

The file pops.csv contains both the population code and a locality code that matches up with the global_locality_full.csv file to give each individual a GPS coordinate. Another script uses the EGPA0048.csv SNP file to obtain taxa (but this step can be organized by the user), and a further script generates a formatted file of FST estimates from the EGPA0048_WC.csv file (generated from popgen analysis).
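The locality-code lookup described above amounts to a simple table join. The following Python sketch shows the idea on in-memory CSV text; the column names (individual, population, locality_code, latitude, longitude) and the sample rows are assumptions made for illustration and may not match the headers in the actual files.

```python
import csv
import io


def attach_coordinates(pops_rows, locality_rows):
    """Give each individual the GPS coordinate of its locality code.

    Column names are hypothetical; adjust them to the real file headers.
    """
    coords = {
        row["locality_code"]: (float(row["latitude"]), float(row["longitude"]))
        for row in locality_rows
    }
    joined = []
    for row in pops_rows:
        lat, lon = coords[row["locality_code"]]
        joined.append({
            "individual": row["individual"],
            "population": row["population"],
            "latitude": lat,
            "longitude": lon,
        })
    return joined


# Made-up stand-ins for the population and locality tables.
pops_text = "individual,population,locality_code\nind1,popA,LOC1\nind2,popB,LOC2\n"
locs_text = "locality_code,latitude,longitude\nLOC1,-17.8,178.0\nLOC2,-16.6,179.9\n"

pops_rows = list(csv.DictReader(io.StringIO(pops_text)))
locality_rows = list(csv.DictReader(io.StringIO(locs_text)))
records = attach_coordinates(pops_rows, locality_rows)
```

With the real files, the two StringIO objects would be replaced by open file handles.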

The R scripts use global_locality_full.csv and pop.csv to generate a main file (here roos.txt) of the relevant data. A script then generates several files for geographic distance analysis (in particular, it makes a polygon delineating the geographic range of each population). If there are not enough points to generate the polygon, it will randomly make a few more within the range of the most extreme sample points.
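The padding step for populations with too few sample points can be sketched as follows. This is an illustrative Python version under stated assumptions, not the code in the folder: the function name, the min_points threshold, and the coordinates are hypothetical, and "within the range of the most extreme sample points" is interpreted here as the bounding box of the existing points.

```python
import random


def pad_points(points, min_points=3, seed=None):
    """Add random points inside the bounding box until min_points is reached.

    points is a list of (longitude, latitude) tuples. Existing points are
    kept unchanged and come first in the result. Hypothetical helper.
    """
    pts = list(points)
    if not pts:
        raise ValueError("at least one sample point is required")
    rng = random.Random(seed)
    lons = [p[0] for p in pts]
    lats = [p[1] for p in pts]
    lo_lon, hi_lon = min(lons), max(lons)
    lo_lat, hi_lat = min(lats), max(lats)
    while len(pts) < min_points:
        pts.append((rng.uniform(lo_lon, hi_lon), rng.uniform(lo_lat, hi_lat)))
    return pts


# Two made-up sample points padded to four so a polygon can be drawn.
padded = pad_points([(178.0, -18.0), (179.0, -17.0)], min_points=4, seed=1)
```

Note that with a single sample point the bounding box has zero size, so every added point coincides with the original; such populations would need separate handling.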

R scripts now write raster files using the outputs. A script then creates a dummy mask file.

R scripts to format raster files and load data

R scripts to load and format outputs, then conduct Mantel tests
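For readers unfamiliar with the final step, a Mantel test correlates two distance matrices and assesses significance by permuting the rows and columns of one of them. The Python sketch below is a generic permutation implementation written for illustration; it is not the R code used in bathymetry.R, and the function names and toy matrices are hypothetical. It assumes both matrices are symmetric with populations in the same order.

```python
import math
import random


def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after reordering populations."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(fst, dist, permutations=999, seed=None):
    """Return (r, one-sided p) for the correlation between two matrices.

    Rows and columns of fst are permuted together; the p-value is the share
    of permutations (plus the observed one) with r at least as large.
    """
    rng = random.Random(seed)
    x = upper_triangle(dist)
    r_obs = pearson(upper_triangle(fst), x)
    order = list(range(len(fst)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(upper_triangle(fst, order), x) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Toy matrices for four populations (made-up values).
dist = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
fst = [[0.1 * d for d in row] for row in dist]
r, p = mantel(fst, dist, permutations=199, seed=42)
```

In the workflow described here, the second matrix would hold the connectivity estimates produced by Circuitscape rather than the toy distances used above.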

2. phylogenomics folder

This folder contains four phylogenetic trees derived from our RADseq data. These are: (i) the main ExaBayes Pheidole phylogeny from Figure 2 (pheiFJ_exa.tre); (ii) the RAxML phylogeny from the same sequence files as the main Pheidole phylogeny (pheiFJ_raxml.tre); (iii) the ExaBayes tree showing four distinct P. knowlesi lineages (with three obvious species clades plus one singleton clade) (knowlesi_3spp.exabayes.tre); and (iv) the ExaBayes tree showing two distinct P. vatu lineages (vatu_2spp.exabayes.tre).

3. VCFs folder

This folder contains VCF files from all species, generated from raw fastq files using the parameter settings in Stacks and vcftools described in the paper’s Methods section. From these, population genomic analyses can be conducted. The sequenceMetaData.csv file contains both population designations and GPS coordinates for all sampled individuals, plus the global database CASENT number if available. The EGPA number is a code that was given to each group of samples (i.e. species) for conducting analyses.
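As a starting point for working with the VCF files, the sketch below counts the number of called (non-missing) genotypes per individual using only the Python standard library. It is a hedged illustration, not part of the dataset: the function name and the sample lines are hypothetical, and it assumes the standard VCF layout in which sample columns start at the tenth field and GT is the first entry of each sample field.

```python
def genotype_counts(vcf_lines):
    """Count called genotypes per sample from an iterable of VCF lines."""
    samples = []
    called = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            called = {name: 0 for name in samples}
            continue
        for name, entry in zip(samples, fields[9:]):
            genotype = entry.split(":")[0]
            alleles = genotype.replace("|", "/").split("/")
            if "." not in alleles:  # "." marks a missing allele call
                called[name] += 1
    return called


# Made-up VCF content with two individuals and two sites.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tindA\tindB",
    "1\t10\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.",
    "1\t20\t.\tC\tT\t.\tPASS\t.\tGT:DP\t1/1:8\t0/0:5",
]
counts = genotype_counts(example)
```

For a real file, passing an open file handle works the same way, for instance `genotype_counts(open(path))` with `path` pointing at one of the VCFs. The resulting sample names can then be matched against sequenceMetaData.csv to recover population designations.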