Skip to main content

Genetic relatedness in social groups of the emerald coral goby Paragobiodon xanthosoma creates potential for weak kin selection

Cite this dataset

Rueger, Theresa; Buston, Peter; Bogdanowicz, Steven; Wong, Marian (2021). Genetic relatedness in social groups of the emerald coral goby Paragobiodon xanthosoma creates potential for weak kin selection [Dataset]. Dryad.


The explanation for why animals form social groups that include breeders and non-breeders is an evolutionary puzzle with some parts yet unsolved. Ecological constraints explain why non-breeders do not disperse to breed on their own, while social constraints explain why they do not contest to breed at home. Often kin selection explains why non-breeders behave cooperatively. Originally kin selection was assumed to play no role in social evolution in organisms with dispersive larval phases, such as many marine organisms. However, recent evidence suggests small-scale patterns of relatedness in some marine animals, which may influence behavior. To understand the relevance of kin selection in marine fishes, it is first necessary to examine relatedness patterns within groups. We, therefore, investigated the genetic structure of social groups of the emerald coral goby, Paragobiodon xanthosoma, for which kin selection has been assumed to play a minimal role in its social evolution. We genotyped 20 microsatellite loci in 73 individuals, from 16 groups, on three reefs in Kimbe Bay, Papua New Guinea, and estimated pairwise relatedness. We found that relatedness was slightly higher within, compared to among groups, but relatedness values were generally low and did not differ depending on the dyad maturity. Thus, kin selection does not help explain social evolution in this species. Rather, it is likely that the combination of future fitness gains via territory inheritance, and ecological and social constraints drives social group formation. These findings emphasize the role that simple societies can play in providing new insights into social evolution in the absence of kin selection. 



The study was conducted on inshore reefs near Mahonia Na Dari Research and Conservation Centre, Kimbe Bay, Papua New Guinea (5°30’S, 150°05’E), in May 2019. We collected tissue samples from 78 individuals in 16 groups on three inshore reefs. Ten groups were collected at ‘Bob’s Knob’ (reef 1, Nfish=39), two at ‘Kilu’ (reef 2, Nfish=19), and four at ‘Lui’ (reef 3, Nfish=20). ‘Kilu’ lies furthest to the north in Kimbe Bay and is approximately 560m offshore, ‘Lui’ (~630m offshore) is located ~3.5km further south, and ‘Bob’s Knob’ (~730m offshore) is ~1.5km further south than ‘Lui’. No direct information about ocean currents connecting these reefs is available, however, larval connectivity among inshore reefs has been found in other reef fish within the same area (Rueger et al. 2020). Groups were comprised of 2-12 individuals (mean group size ± standard error = 4.69 ± 0.62); two groups had N=2 group members, two groups N=3, five groups N=4, three groups N=5, three groups N=7, and one group N=12. All individuals in a group were caught on SCUBA using hand nets and diluted clove oil solution as a mild anesthetic (Munday & Wilson 1997). Groups were clearly distinguishable by living in one distinct coral head and to the best of our knowledge all group members were sampled. Each fish was measured underwater (Standard Length, SL) and a fin clip was taken from the caudal fin. Maturity and breeder status were assessed according to SL and rank in the size hierarchy: The two largest individuals represent the monogamous breeding pair (mean SL ± standard error, SLR1=18.44 ± 0.88mm; SLR2= 15.19 ± 0.99), all other individuals are non-breeders (Wong et al. 2008b) (SLR3= 10.68 ± 1.33 mm; SLR4= 7.79 ± 1.24mm; SLR5= 7.71 ± 1.61mm; SLR6= 8.63 ± 2.21mm; SLR7= 7.75 ± 2.36mm; SLR8= 10mm; SLR9=8mm; SLR10= 7mm; SLR11= 5.5mm; SLR12= 5mm).  Tissue samples were preserved in 99% ethanol for genetic analysis. After sampling, the fish were released back into their host coral.

Isolation of microsatellite loci

Construction and screening of a P. xanthosomus genomic DNA library enriched for microsatellite loci followed procedures described in Nali et al. (2014), with modifications. Genomic DNA from four individuals was pooled, and the resulting library was screened for 29 unique tetrameric motifs. Paired reads (2 x 250 bp) were collected on an Illumina Miseq platform at the BioResource Center at Cornell University. Contig assembly was with NGen (v.11) software (DNASTAR, Madison, WI), using a de novo assembly within a Transcriptome project type. Reads were trimmed of adapter and low-quality regions (minimum quality 20), with mer size set to 99 and minimum match percentage 93. Gap penalty and maximum gap length were set to 10 and 50, respectively. Contigs shorter than 150 bp were not kept. The assembly generated 44,421 contigs, with an N50 of 376 bp and an average coverage of 16 reads. 

Contigs were scanned for microsatellite repeats (minimum perfect repeat length=5) and primers were designed with msatcommander (v. 1.08-beta) software. Primer lengths ranged between 18 and 24 bp, primer Tms between 57°C and 62°C, and optimal PCR product size was 325-350 bp. Over 4000 unique tetrameric microsatellite loci with designed primers were recovered from the assembled contigs.

Marker development and multiplex PCR

All individuals were sequenced at 54 microsatellite loci using a multiplex PCR protocol for targeted amplicon sequencing. For a detailed description of multiplex PCR and Nextera barcoding methods, see D’Aloia et al. (2017). We amplified the loci with multiplex PCR reactions using QIAGEN Multiplex kits and the primers listed in Table S1. Samples were pooled across multiplexes and Illumina’s S5 and N7 Nextera primers were used to run a barcoding PCR. The sequencing library was prepared by pooling all barcoded individuals, which were then size-selected with Ampure XP (Beckman Coulter). The library was sequenced on an Illumina MiSeq with paired 250 bp reads at the BioResource Center, Cornell University.

Data processing

We used a python script (, to call genotypes at each microsatellite locus. Default commands were used except the following: -c 1, -a 0.005, -l 150. We also explored two reads ratios (-r command) for calling heterozygotes; the default of –r 20 as well as –r 40. A minimum of two reads were required for each allele; otherwise, the diploid genotype was re-coded as missing data. To retain only the highest quality markers and individuals, we first excluded loci missing >20% of individuals, then individuals with >20% missing loci. After the filters were applied, 73 individuals remained, analyzed at 37 loci. Seventeen loci were in HWE at –r 20 and three more were in HWE at –r 40; our final dataset contained 73 individuals and 20 markers. Marker sequences and marker-specific statistics are provided in Supplemental Table 1.

Population structure

Before investigating relatedness, we investigated whether there was any population structure, because relatedness of any two individuals is the probability that they share an allele relative to the probability that two individuals from the same population share the allele (Queller 1994). We estimated pairwise Fst between different reefs using GENEPOP (Raymond & Rousset 1995, Rousset 2008). Genotypic differentiation was not significant between reef 1 and reef 2 (Fst=0.013, Exact G-test, χ2(40) = 53.604, p =0.074), reef 1 and reef 3 (Fst=0.011, χ2(40) = 38.904, p =0.519), or reef 2 and reef 3 (Fst=0.016, χ2(40) = 29.777, p =0.881). We proceeded assuming that we were working with a single population.  


Pairwise relatedness was calculated for all dyads in the population using the R package Demerelate (Kraemer & Gerlach 2017). Dyad relatedness calculations were done using the Queller-Goodnight estimator (Queller & Goodnight 1989) because it has recently been used in similar species (D’Aloia et al. 2018) and it allows comparison with other reef fish species where relatedness has been investigated (Buston et al. 2009, Rueger et al. 2020). We did not assess dyad sibship, since the number of markers utilized did not allow us to identify full or half-siblings with confidence (D’Aloia et al. 2018). Overall mean degree of relatedness in the sample was calculated as the average of pairwise relatedness among all sampled individuals. Mean degree of relatedness among group members was calculated as the average of the pairwise relatedness within each group. 

Usage notes

The readme file contains an explanation of each of the variables in the dataset and its measurement units. #NA =  values not available. Information on how the measurements were done can be found in the associated manuscript referenced above.


Marie Sklodowska-Curie Individual Global Fellowship, Award: 841263

Marie Sklodowska-Curie Individual Global Fellowship, Award: 841263