Raw allelic matrix and supplementary materials: Origin and dispersion pathways of guava in the Galapagos Islands inferred through genetics and historical records

Cite this dataset

Urquia, Diego et al. (2022). Raw allelic matrix and supplementary materials: Origin and dispersion pathways of guava in the Galapagos Islands inferred through genetics and historical records [Dataset]. Dryad.


Guava (Psidium guajava) is an aggressive invasive plant in the Galapagos Islands. Determining its provenance and genetic diversity could explain its adaptability and spread, and how these relate to past human activities. To this end, we analyzed 11 SSR markers in guava individuals from Isabela, Santa Cruz, San Cristobal and Floreana islands in the Galapagos, as well as from mainland Ecuador. The mainland guava population appeared genetically differentiated from the Galapagos populations, with higher genetic diversity levels found in the former. We consistently found that the Central Highlands region of mainland Ecuador is one of the most likely origins of the Galapagos populations. Moreover, the guavas from Isabela and Floreana show a potential genetic input from southern mainland Ecuador, while the population from San Cristobal appears to be linked to the coastal mainland regions. Interestingly, the proposed origins for the Galapagos guava coincide with the first human settlements of the archipelago. Through Approximate Bayesian Computation, we propose a model where San Cristobal was the first island to be colonized by guava from the mainland; guava would then have spread to Floreana and finally to Santa Cruz, while Isabela would have been seeded from Floreana. An independent trajectory could also have contributed to the invasion of Floreana and Isabela. The pathway shown in our model agrees with the human colonization history of the different islands in the Galapagos. Our model, in conjunction with the clustering patterns of the individuals (based on genetic distances), suggests that the guava introduction history in the Galapagos archipelago was driven by either a single event or a series of introduction events in rapid succession. We thus show that genetic analyses supported by historical sources can be used to track the arrival and spread of invasive species in novel habitats and the potential role of human activities in such processes.


A total of 376 individuals of Psidium guajava were sampled for this study: 96 from mainland Ecuador and 280 from the Galapagos Islands (Fig. 1). Samples were grouped into five populations: Mainland (96 individuals), Santa Cruz (SCZ, 80 individuals), Isabela (ISA, 95 individuals), San Cristobal (SCY, 94 individuals) and Floreana (FLO, 11 individuals). Because the mainland study area is considerably larger, its samples were further grouped into nine regions. These combine the three geographic regions of continental Ecuador (Coast, Highlands and Amazon) with a latitudinal division: the extent of the country, measured from its northernmost tip to its southernmost one (720 km), was divided into three latitudinal bands (North, Center, South), each spanning 240 km from north to south (Fig. 1). The nine mainland regions obtained were therefore: North Coast (NC, 8 individuals), North Highlands (NH, 13 individuals), North Amazon (NA, 12 individuals), Central Coast (CC, 11 individuals), Central Highlands (CH, 11 individuals), Central Amazon (CA, 8 individuals), South Coast (SC, 10 individuals), South Highlands (SH, 13 individuals), and South Amazon (SA, 10 individuals). Collection sites were georeferenced using a Garmin E-Trex Legend HCx GPS system (Garmin International Inc., USA). Sampled individuals were separated from one another by a minimum distance of 100 m in order to minimize pseudo-sampling (Urquía et al., 2019). After confirming the taxonomic identity of the samples, two to five young leaves were collected from each individual and transported either to the Molecular Biology and Microbiology Laboratory at the Galapagos Science Center in San Cristobal or to the Plant Biotechnology Laboratory at the Universidad San Francisco de Quito campus in Quito, Ecuador, where they were stored at -20 °C.
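The nine-region grouping described above can be sketched in a few lines of Python. This is only an illustration of the stated scheme (three 240 km latitudinal bands crossed with three geographic regions), not the authors' procedure: the function name `region_code`, the use of a distance measured southward from the northernmost tip, and the handling of points falling exactly on a band boundary are all assumptions made here. The sample counts are copied from the text.

```python
# Illustrative sketch; names and boundary handling are assumptions, not from the dataset.
TOTAL_EXTENT_KM = 720   # north-south extent of mainland Ecuador as stated in the text
BAND_WIDTH_KM = 240     # each latitudinal band spans 240 km
BANDS = ["N", "C", "S"]                                  # North, Center, South
GEO = {"Coast": "C", "Highlands": "H", "Amazon": "A"}    # geographic regions

def region_code(km_from_north_tip, geo_region):
    """Return a two-letter region code such as 'NC' or 'CH'.

    km_from_north_tip: distance (km) south of the northernmost tip (hypothetical input).
    A point exactly on a boundary is assigned to the band south of it (assumption),
    except the southernmost tip, which stays in the South band.
    """
    if not 0 <= km_from_north_tip <= TOTAL_EXTENT_KM:
        raise ValueError("distance outside the 0-720 km extent")
    band_index = min(int(km_from_north_tip // BAND_WIDTH_KM), len(BANDS) - 1)
    return BANDS[band_index] + GEO[geo_region]

# Sample sizes as reported in the text.
MAINLAND_COUNTS = {"NC": 8, "NH": 13, "NA": 12, "CC": 11, "CH": 11,
                   "CA": 8, "SC": 10, "SH": 13, "SA": 10}
GALAPAGOS_COUNTS = {"SCZ": 80, "ISA": 95, "SCY": 94, "FLO": 11}

# Consistency check against the reported totals (96 mainland, 280 Galapagos, 376 overall).
assert sum(MAINLAND_COUNTS.values()) == 96
assert sum(GALAPAGOS_COUNTS.values()) == 280
assert sum(MAINLAND_COUNTS.values()) + sum(GALAPAGOS_COUNTS.values()) == 376
```

The checks at the end confirm that the per-region and per-island counts add up to the totals given in the text.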

The CTAB protocol (Saghai-Maroof et al., 1984) was used to isolate total genomic DNA from each individual, after which the concentration and quality of the DNA were measured using a Nanodrop 1000 Spectrophotometer (Thermo Scientific, USA).

PCR amplification of 11 SSR regions was performed for all samples using species-specific primers developed by Risterucci et al. (2005). For amplification, a third, fluorescently labeled universal primer was incorporated, as described by Blacket et al. (2012). Annealing temperatures were optimized for each primer pair: 55 °C was the ideal temperature for all loci except mPgCIR25, for which it was 54 °C. The program used was 15 min at 95 °C; 30 to 40 cycles of 30 sec at 94 °C, 90 sec at the optimized annealing temperature and 60 sec at 72 °C; and a final step of 5 min at 72 °C. PCR products were labeled with one of four fluorescent dyes (6-FAM, VIC, PET or NED) and genotyped by Macrogen (Seoul, Korea) using an ABI 3130 Genetic Analyzer (ThermoFisher Scientific, USA). Results were analyzed with the GeneMarker software v 2.4.0 (Softgenetics, State College, PA, USA).
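As a quick arithmetic aid, the thermal program above can be written down as data and its total hold time computed. This is a minimal sketch: the function names are hypothetical, and ramp times between steps are ignored (an assumption), so real run times will be longer.

```python
# Illustrative sketch of the PCR program described above; names are hypothetical.
DEFAULT_ANNEALING_C = 55
ANNEALING_EXCEPTIONS_C = {"mPgCIR25": 54}   # the only exception named in the text

def annealing_temp_c(locus):
    """Annealing temperature (deg C) for a locus, using the stated exception."""
    return ANNEALING_EXCEPTIONS_C.get(locus, DEFAULT_ANNEALING_C)

def program_hold_time_s(cycles):
    """Total hold time in seconds, excluding ramping between steps (assumption)."""
    if not 30 <= cycles <= 40:
        raise ValueError("the text specifies 30 to 40 cycles")
    initial_denaturation = 15 * 60      # 15 min at 95 deg C
    per_cycle = 30 + 90 + 60            # denaturation + annealing + extension
    final_extension = 5 * 60            # 5 min at 72 deg C
    return initial_denaturation + cycles * per_cycle + final_extension

print(program_hold_time_s(30) / 60)   # 110.0 minutes
print(program_hold_time_s(40) / 60)   # 140.0 minutes
```

Under these assumptions the program holds for between 110 and 140 minutes, depending on the number of cycles.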

Usage notes

The uploaded allelic matrix shows the genotypes of the 376 guava individuals studied in our research. Genotypes are shown for the 11 analyzed SSR markers. Alleles are expressed as SSR fragment size (bp), including the 15 bp from the universal primer added for fluorophore genotyping. The allelic matrix format is suited for analyses in the adegenet package implemented for R; a slash (/) separates the two alleles of each diploid individual.
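For readers working outside R, the genotype encoding described above can be handled with a short Python sketch. It assumes each cell holds two integer allele sizes separated by a slash; the function names and the example value `215/221` are hypothetical and not taken from the dataset. How missing genotypes are coded is not stated here, so the sketch does not handle them. In R, adegenet's `df2genind` takes a `sep` argument that can be set to `"/"` for this kind of matrix.

```python
# Illustrative sketch; function names and example values are hypothetical.
UNIVERSAL_PRIMER_BP = 15   # tail length included in every reported allele size

def parse_genotype(cell, sep="/"):
    """Split a diploid genotype string such as '215/221' into two integer sizes."""
    first, second = cell.strip().split(sep)
    return int(first), int(second)

def without_universal_primer(alleles, tail_bp=UNIVERSAL_PRIMER_BP):
    """Subtract the universal-primer tail to recover the SSR amplicon sizes."""
    return tuple(size - tail_bp for size in alleles)

def is_heterozygous(alleles):
    """True when the two alleles differ in size."""
    return alleles[0] != alleles[1]

genotype = parse_genotype("215/221")          # hypothetical cell value
print(genotype)                               # (215, 221)
print(without_universal_primer(genotype))     # (200, 206)
print(is_heterozygous(genotype))              # True
```

Subtracting the 15 bp tail is only needed when comparing allele sizes with data generated without the universal primer; analyses run on this matrix alone can use the sizes as reported.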