
Pieces in a global puzzle: Population genetics at two whale shark aggregations in the western Indian Ocean

Cite this dataset

Hardenstine, Royale et al. (2023). Pieces in a global puzzle: Population genetics at two whale shark aggregations in the western Indian Ocean [Dataset]. Dryad.


The whale shark Rhincodon typus is found throughout the world’s tropical and warm-temperate ocean basins. Despite its broad physical distribution, research on the species has been concentrated at a few aggregation sites. Comparing DNA sequences from sharks at different sites can provide a demographically neutral understanding of the whale shark’s global ecology. Here, we created genetic profiles for 84 whale sharks from the Saudi Arabian Red Sea and 72 individuals from the coast of Tanzania using a combination of microsatellite markers and mitochondrial sequences. These two sites, separated by approximately 4500 km (shortest over-water distance), exhibit markedly different population demographics and behavioral ecologies. Eleven microsatellite DNA markers revealed that the two aggregation sites have similar levels of allelic richness and appear to be derived from the same source population. We sequenced the mitochondrial control region to produce multiple global haplotype networks (based on different alignment methodologies) that were broadly similar to each other in terms of population structure but suggested different demographic histories. Data from both microsatellite and mitochondrial markers demonstrated that genetic diversity within the Saudi Arabian aggregation site remained stable throughout the sampling period. These results contrast with previously measured declines in diversity at Ningaloo Reef, Western Australia. Mapping the geographic distribution of whale shark lineages provides insight into the species' connectivity and can be used to direct management efforts at both local and global scales. Similarly, understanding historical fluctuations in whale shark abundance provides a baseline against which to assess current trends. Continued development of new sequencing methods and the incorporation of genomic data could lead to considerable advances in the scientific understanding of whale shark population ecology and corresponding improvements to conservation policy.



Whale shark tissue was collected over six seasons in Saudi Arabia (2010-2015) and over two seasons in Tanzania (late 2012 to early 2014). Sampling at both sites occurred during their respective whale shark tourism seasons and followed the same procedure. Free-swimming whale sharks were approached by snorkelers, and tissue samples were taken using a Hawaiian sling pole-spear fitted with a biopsy tip. Collected samples were preserved immediately in 70-90% ethanol or, where ethanol was not available, put on ice and transferred to a freezer. All samples were eventually placed in 90% ethanol and kept at -20 ℃ for long-term storage. Each sample consisted of white, subcutaneous tissue and a black dermal cap. The dermal cap was separated with a scalpel and used for all further analyses.

Microsatellite Loci

DNA was extracted using one of two kits, the DNeasy Blood and Tissue Kit (Qiagen Inc.) or the NucleoSpin Tissue Kit (Macherey-Nagel), following each manufacturer's instructions. The quality and quantity of extracted DNA were measured using a NanoDrop 8000 spectrophotometer (Thermo Scientific). Next-generation sequencing was performed for one specimen using a Roche 454 GS FLX (Titanium) sequencer, and the genomic library was constructed following the manufacturer’s protocol. Raw unassembled reads from this library were mined for putative microsatellite loci using msat-commander v1.0.8. Default settings were used to screen for perfect dinucleotide and tetranucleotide repeats at least 20 bp long, yielding 1588 putative microsatellite loci. Primer3 was then used to design 353 primer pairs for all reads that contained suitable microsatellite repeat motifs. From these, 14 novel primer pairs were selected and synthesized along with eight primer pairs from earlier publications (three from Ramirez-Macias et al. 2009, five from Schmidt et al. 2009), producing 22 candidates for PCR trials.
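The repeat-screening criterion above (perfect dinucleotide and tetranucleotide repeats at least 20 bp long) can be illustrated with a short regular-expression scan. This is a minimal sketch, not msat-commander's implementation. The function names are hypothetical, and so is the rule that discards motifs that merely repeat a shorter unit (for example `AA` or `ACAC`); both are assumptions made for illustration.

```python
import re
from typing import List, NamedTuple


class RepeatHit(NamedTuple):
    start: int   # 0-based start position in the read
    motif: str   # repeat unit, e.g. "AC" or "GATA"
    copies: int  # number of perfect tandem copies
    length: int  # total repeat length in bp


def is_reducible(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. "AA", "ACAC")."""
    for unit_len in range(1, len(motif)):
        if len(motif) % unit_len == 0 and motif[:unit_len] * (len(motif) // unit_len) == motif:
            return True
    return False


def find_perfect_repeats(read: str, motif_lengths=(2, 4), min_length: int = 20) -> List[RepeatHit]:
    """Hypothetical screen for perfect tandem repeats of at least min_length bp."""
    read = read.upper()
    hits = []
    for k in motif_lengths:
        min_copies = -(-min_length // k)  # ceiling division: 10 copies for k=2, 5 for k=4
        # One k-mer followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(read):
            motif = m.group(1)
            if is_reducible(motif):
                continue  # e.g. a homopolymer or a dinucleotide run seen as a tetramer
            span = m.end() - m.start()
            hits.append(RepeatHit(m.start(), motif, span // k, span))
    return sorted(hits)
```

Consider a read such as `"TT" + "AC" * 12 + "G" + "GATA" * 5 + "CC"`. The sketch reports a 24-bp (AC)12 repeat and a 20-bp (GATA)5 repeat. It ignores the (ACAC)6 match because that match only restates the dinucleotide run.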

Microsatellite allele sizes were read using Geneious 8.1.6 (Biomatters Ltd.). After scoring, duplicate genotypes were identified using the Microsatellite Toolkit. Loci were screened in Genepop 4.2 to exclude markers showing evidence of null alleles or linkage, or departing from Hardy-Weinberg equilibrium. The remaining 11 loci were used for all further microsatellite analyses.
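The sketch below gives a rough illustration of the quantities a Hardy-Weinberg screen examines at a single locus. It computes observed heterozygosity (Ho), expected heterozygosity under Hardy-Weinberg proportions (He = 1 - sum of squared allele frequencies), and F_IS = 1 - Ho/He. It is not Genepop's exact test and produces no p-value. A strongly positive F_IS (heterozygote deficit) only hints at the kind of departure that null alleles can cause. Genotypes are assumed to be pairs of allele sizes in base pairs, with 0 marking a missing allele as in the data file. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Genotype = Tuple[int, int]  # two allele sizes in bp; 0 marks a missing allele


def heterozygosity(genotypes: Iterable[Genotype]) -> Tuple[float, float, float]:
    """Return (Ho, He, F_IS) for one locus, ignoring genotypes with a missing allele."""
    called = [g for g in genotypes if 0 not in g]
    if not called:
        raise ValueError("no complete genotypes at this locus")
    n = len(called)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in called if a != b) / n
    # Allele frequencies from the 2n allele copies, then He = 1 - sum(p_i^2).
    counts = Counter(allele for g in called for allele in g)
    he = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    # F_IS > 0 means fewer heterozygotes than expected under Hardy-Weinberg proportions.
    f_is = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, f_is
```

For the made-up genotypes (120, 122), (120, 120), (122, 124) plus one fully missing call, the sketch returns Ho = 2/3, He = 11/18 and F_IS = -1/11. This indicates a slight heterozygote excess.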

Usage notes


Includes clarifying information about the MicrosatellitePrimers and MicrosatelliteData files. 


<Microsatellite Primer File>

Includes the 48 primer pairs that were tested, including the final 14 novel primer pairs selected for use in the study (highlighted rows). The first column lists primer names; the second column is the forward sequence; the third column (left_tm) is the forward melting temperature; the fourth column is the reverse sequence; the fifth column (right_tm) is the reverse melting temperature; the sixth column (pair_product_size) is the PCR product size in base pairs; and the seventh column is the repeat motif and number of repeats.
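If the MicrosatellitePrimers table is exported to CSV, its seven columns can be loaded by position as in the sketch below. Several details are assumptions: the CSV export itself, the presence of a single header row, the class and function names, and the example row, which is invented. The highlighting that marks the 14 selected primer pairs is spreadsheet formatting and would not survive a CSV export, so those rows would need to be flagged separately.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class PrimerPair:
    name: str
    forward: str
    left_tm: float        # forward melting temperature
    reverse: str
    right_tm: float       # reverse melting temperature
    product_size_bp: int  # pair_product_size
    repeat: str           # repeat motif and number of repeats, kept as text

    @property
    def tm_difference(self) -> float:
        """Absolute difference between forward and reverse melting temperatures."""
        return abs(self.left_tm - self.right_tm)


def read_primer_pairs(handle: TextIO) -> List[PrimerPair]:
    """Read the seven primer columns by position (hypothetical CSV layout)."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the assumed header row
    pairs = []
    for row in reader:
        if len(row) < 7 or not row[0].strip():
            continue  # skip blank or incomplete rows
        name, fwd, ltm, rev, rtm, size, repeat = (cell.strip() for cell in row[:7])
        pairs.append(PrimerPair(name, fwd, float(ltm), rev, float(rtm), int(size), repeat))
    return pairs


if __name__ == "__main__":
    # Invented example row, not taken from the dataset.
    demo = io.StringIO(
        "name,forward,left_tm,reverse,right_tm,pair_product_size,repeat\n"
        "example_1,ACGTTGCAAGGTCCTAGCAT,59.8,TGCATCCAGTTGACCGTAAC,60.3,182,(AC)12\n"
    )
    for pair in read_primer_pairs(demo):
        print(pair.name, pair.product_size_bp, round(pair.tm_difference, 2))
```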


The MicrosatelliteData file includes microsatellite profiles for all sharks sampled at Mafia Island, Tanzania, and Al Lith, Saudi Arabia (Red Sea).

The Working Sample ID column gives the labels used on the samples. MF denotes samples from Mafia Island, Tanzania; all remaining samples are from the Saudi Arabian Red Sea. Among these, UK denotes unknown individuals, UT denotes individuals that were not tagged during other studies, and WS denotes individuals that were tagged during other studies.

The Wildbook-ID column gives each shark's online identification number on Wildbook for Whale Sharks. If a Wildbook ID was not available for a shark, its Working Sample ID was retained. If the sample could not be linked to a shark on Wildbook, the column contains a dash (-). TZ denotes individuals sampled in Tanzania, MZ denotes individuals originally photo-identified in Mozambique but sampled in Tanzania, and R denotes individuals sampled in the Saudi Arabian Red Sea.
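The ID conventions in the two paragraphs above can be decoded programmatically, for example to split samples by site. The helper below is hypothetical. The prefix meanings come from these notes, but the assumption that each ID begins with its letter prefix (for example `MF07` or `MZ-104`) has not been checked against the files, and all example IDs are invented.

```python
import re
from typing import Optional

# Prefix meanings are taken from the usage notes; the ID formats are assumptions.
WORKING_ID_GROUPS = {
    "UK": "unknown individual",
    "UT": "not tagged during other studies",
    "WS": "tagged during other studies",
}
WILDBOOK_PREFIXES = {
    "TZ": "sampled in Tanzania",
    "MZ": "photo-identified in Mozambique, sampled in Tanzania",
    "R": "sampled in the Saudi Arabian Red Sea",
}


def _prefix(identifier: str) -> str:
    """Leading letters of an ID, upper-cased (e.g. "MF" from "MF07")."""
    match = re.match(r"[A-Za-z]+", identifier.strip())
    return match.group(0).upper() if match else ""


def sampling_site(working_id: str) -> str:
    """MF marks Mafia Island; every other sample comes from the Red Sea."""
    return "Mafia Island, Tanzania" if _prefix(working_id) == "MF" else "Saudi Arabian Red Sea"


def working_id_group(working_id: str) -> Optional[str]:
    """UK/UT/WS group for Red Sea samples, or None if the prefix is not listed."""
    return WORKING_ID_GROUPS.get(_prefix(working_id))


def wildbook_origin(wildbook_id: str) -> Optional[str]:
    """None for "-" (no Wildbook link) or when a Working Sample ID was retained."""
    if wildbook_id.strip() == "-":
        return None
    return WILDBOOK_PREFIXES.get(_prefix(wildbook_id))
```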

All locus columns are labeled with the locus name. Missing values in the microsatellite profiles are coded as "0".
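Missing alleles are coded as "0" rather than left blank. They therefore have to be excluded before allele frequencies are computed; otherwise they would be counted as a real allele of size 0. The sketch below counts missing calls per locus column in a CSV export of the MicrosatelliteData file. It assumes the two ID columns come first, followed by one column per allele (two per locus), each headed by the locus name. The file layout, the function name, and the locus names and rows in the demo are all hypothetical.

```python
import csv
import io
from typing import List, TextIO, Tuple


def count_missing(handle: TextIO, n_id_columns: int = 2) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (number of samples, [(column header, number of "0" entries), ...])."""
    reader = csv.reader(handle)
    header = next(reader)
    locus_columns = header[n_id_columns:]
    missing = [0] * len(locus_columns)
    n_samples = 0
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        n_samples += 1
        # Only look at the allele columns; "0" is the missing-value code.
        for i, value in enumerate(row[n_id_columns:n_id_columns + len(locus_columns)]):
            if value.strip() == "0":
                missing[i] += 1
    return n_samples, list(zip(locus_columns, missing))


if __name__ == "__main__":
    # Invented layout and values, not taken from the dataset.
    demo = io.StringIO(
        "Working Sample ID,Wildbook-ID,LocA,LocA,LocB,LocB\n"
        "MF01,-,120,122,0,0\n"
        "WS05,-,118,120,200,204\n"
    )
    print(count_missing(demo))
```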