Phenotypic architecture of sociality and its associated genetic polymorphisms in zebrafish

Cite this dataset

Oliveira, Rui et al. (2021). Phenotypic architecture of sociality and its associated genetic polymorphisms in zebrafish [Dataset]. Dryad.


Sociality is often seen as a single phenotypic trait, but it relies on motivational and cognitive components implemented by specific causal mechanisms. Hence, these components may have evolved independently, or they may have been linked by phenotypic correlations driven by a shared selective pressure for increased social competence. Furthermore, these components may be domain-specific or domain-general, operating across social and non-social contexts. Here we characterized the phenotypic architecture of sociality in zebrafish, which is increasingly used as a model organism in social neuroscience. For this purpose, we behaviorally phenotyped zebrafish from different wild type lines in four tests: social tendency, social recognition, non-social recognition, and open-field. Our results indicate that: (1) sociality has two main components that are independent of each other (social tendency and social recognition), hence not supporting the occurrence of a sociality syndrome; and (2) both social traits are phenotypically linked to non-social traits (non-social exploration and non-social memory, respectively), forming two general behavioral modules, general inspection and general recognition, which suggests that sociality traits have been co-opted from domain-general motivational and cognitive traits. Moreover, the analysis of associations between genetic polymorphisms (i.e., single nucleotide polymorphisms, SNPs) and each behavioral module further supports this view: several SNPs from a list of candidate “social” genes are statistically associated with the general inspection (motivational) module, but not with the general recognition (cognitive) module. The SNPs associated with general inspection are spread across different chromosomes and lie in neurotransmitter, neuromodulator, and synaptic plasticity genes, suggesting that this behavioral module is regulated by multiple genes, each with a small effect.
Together, these results support the occurrence of domain-general motivational and cognitive behavioral modules in zebrafish that have been co-opted for the social domain.


Ethics statement

All experimental procedures were reviewed by the institutional internal Ethics Committee at the Gulbenkian Institute of Science and approved by the National Veterinary Authority (Direção Geral de Alimentação e Veterinária, Portugal; permit number 0421/000/000/2017).


Zebrafish lines and housing conditions

Zebrafish were raised in the Fish Facility of the Gulbenkian Institute of Science under laboratory conditions. The following lines were used in this study:

1. AB was established by George Streisinger and Charline Walker at the University of Oregon from two lines, A and B, which George Streisinger purchased at different times from a pet shop in Albany, Oregon, in the late 1970s. The original A and B lines probably originated from a hatchery in Florida. The AB line was screened for lethal-free embryos by in vitro fertilization, and selected females were subsequently used to establish the current AB line [47-48]. This procedure reduced the number of lethal mutations in the line, which has since served as the primary background for most of the transgenic and mutant lines currently available.

2. TU (Tuebingen) originated from a pet store in Tuebingen and was selected during the 1990s at the Max Planck Institute in Tuebingen to remove embryonic lethal mutations from the background before being used by the Sanger Institute for the zebrafish genome sequencing project [49-50].

3. WIK (Wild India Kolkata) was derived from a single pair from a wild catch in India, near Kolkata. The WIK line is highly polymorphic relative to the TU line and was first described as WIK11 [51].

4. TL (Tüpfel long fin) was derived from a cross between a spotted AB fish and a TU fish, resulting in a long-finned phenotype. This line is homozygous for leot1, a recessive mutation causing spotting in adult fish (also known as tup), and for lofdt2, a dominant mutation causing long fins.

5. 5D (5D Tropical) was derived in 2007 at the Sinnhuber Aquatic Research Laboratory (SARL) at Oregon State University from fish from a commercial breeding facility (5D Tropical Inc., Florida), to generate a line free of Pseudoloma neurophilia (Microsporidia) [52].

6. LEO (Leopard) is a wild type line commonly available in pet shops that displays a spotted rather than striped adult pigment pattern. This line is homozygous for leot1, a spontaneous mutation in the gene leopard (leo) [53-55].

A total of 164 experimentally naive adult zebrafish of both sexes, aged 6-8 months, were used as focal subjects (AB: M = 8, F = 14; TU: M = 9, F = 12; WIK: M = 12, F = 4; TL: M = 13, F = 10; LEO: M = 7, F = 10; 5D: M = 32, F = 33). Focal fish were raised and housed separately from the fish used as stimuli to prevent effects of prior familiarity. Stimulus fish were of the same line as the focal fish. Fish were housed in groups of 35 in 3.5 L aquaria on a recirculating system (ZebraTec, Tecniplast), with water parameters set at 27-28 °C, pH 7.5 ± 0.2, conductivity ~900 μS/cm, <0.2 ppm nitrites, <50 ppm nitrates and 0.01-0.1 ppm ammonia. Fish were kept on a 14 h light : 10 h dark photoperiod and fed twice daily with a combination of live food (Paramecium caudatum; Artemia salina) and processed dry food (GEMMA Micro).


Experimental setup and procedures

The behavior of each experimental fish was assessed in four tests: (1) a shoal preference test to measure social tendency; two one-trial recognition tests, using either objects (2) or conspecifics (3) as stimuli, to measure non-social and social recognition/exploration, respectively; and (4) an open-field test to measure anxiety. With the exception of the open field, all setups consisted of an experimental tank (30 L x 15 W x 15 H cm) and two adjacent tanks (15 L x 15 W x 7.5 H cm), each with a stimulus-holding compartment whose viewing side measured 10 cm in the shoal preference test and 5 cm in the social and object recognition tests; the difference accounts for the larger visual target offered by a shoal than by a single fish or an object. Water depth was kept constant at 9 cm in all tanks (Fig 1a). For the open-field test, a round tank 22 cm in diameter was used, with the water level kept at 6 cm (Fig 1b).

All tests took place during the light period, between 09:00 and 19:00. Before testing, fish were kept overnight in an aquarium with individual compartments for identification purposes. The compartments were separated by fine mesh that allowed visual and chemical contact with neighbors, minimizing isolation stress. In the experimental tanks, external visual stimuli were blocked by opaque, non-reflective stickers, and during the shoal preference and recognition tests opaque covers hid the adjacent stimulus containers until recording began. Behavior was recorded with black-and-white mini surveillance cameras (Henelec 300B) suspended above the experimental tank, relaying the image to a laptop kept at a distance to reduce disturbance of the fish by the experimenter. During recording, room lighting was set to reduce water-surface reflections in the videos, and an infrared lightbox placed under the experimental tank provided additional illumination to facilitate video tracking. Between tests, the water in the experimental tank was changed to eliminate olfactory cues.

Before each test, focal fish were netted from their individual overnight compartment and immediately placed in the experimental tank. In the shoal preference test, fish were first given 10 min to acclimatize to the experimental tank; the test then began with the removal of the opaque covers, giving the fish visual access for 10 min to the two adjacent containers, one empty (control) and the other holding a mixed-sex shoal (Fig 1c). The side on which each stimulus was presented was counterbalanced between focal individuals to control for side biases. The recognition tests consisted of two phases, an acquisition phase and a probe-test phase, preceded by a 10 min acclimation period and separated by a 10 min interval. Both phases began with the removal of the opaque covers, giving the fish visual access to the two adjacent containers. During the acquisition phase, animals were presented with two novel stimuli for 10 min: two conspecifics in the social test and two objects (0.5 ml Eppendorf tubes of the same color) in the non-social test. During the probe test, animals were presented for 10 min with one of the stimuli from the acquisition phase (familiar) and a novel stimulus (a new conspecific or a differently colored Eppendorf tube) (Fig 1d). In the non-social recognition test, the tubes were matched in size to the average zebrafish to control for size-dependent prey- or predator-directed responses, and, based on preliminary preference tests, they were painted in colors that the fish preferred equally (green and red for all lines except LEO, for which purple and blue were used because this line showed no preference between them). The side of each stimulus (novel or familiar) in the probe tests was counterbalanced across animals to control for side biases, and the color used for the familiar and novel stimuli was randomized to control for color biases (Fig 1e).
For the open-field test (Fig 1b), animals were placed in the center of the circular tank and recorded for 10 min.
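One way to implement the counterbalancing of stimulus sides and the randomization of tube colors described above is sketched below. It is an illustration only: the alternating-side scheme, the fish IDs and the function name `assign_stimuli` are hypothetical, since the text states only that sides were counterbalanced across animals and colors randomized.

```python
import random

def assign_stimuli(fish_ids, colors=("green", "red"), seed=None):
    """Hypothetical probe-test plan for a list of focal fish.

    Assumption: sides are counterbalanced by alternating the novel stimulus
    between left and right across consecutive fish; the familiar/novel color
    pair is drawn at random for each fish.
    """
    rng = random.Random(seed)
    plan = []
    for k, fish in enumerate(fish_ids):
        novel_side = "left" if k % 2 == 0 else "right"  # counterbalance side
        familiar_color, novel_color = rng.sample(colors, 2)  # randomize color
        plan.append({
            "fish": fish,
            "novel_side": novel_side,
            "familiar_color": familiar_color,
            "novel_color": novel_color,
        })
    return plan

# LEO fish would use the purple/blue pair instead of green/red.
for row in assign_stimuli(["f01", "f02", "f03", "f04"], seed=1):
    print(row)
```

Alternating sides guarantees an exact left/right balance for an even number of fish, whereas drawing sides at random would only balance them on average.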

Videos were analyzed with commercial video-tracking software (EthoVision XT, version 11.5, Noldus Information Technology), and behavioral measures were extracted from each test. Regions of interest (ROIs) extended one average body length from each target location (grey regions in Fig 1a-b). Social tendency in the shoal preference test was quantified as the proportion of ROI time spent near the shoal. Social and non-social discrimination in the conspecific and object recognition tests were measured as the proportion of ROI time spent near the preferred stimulus (familiar or novel), and the overall time spent in the ROIs near both stimuli was used as a measure of exploration. Anxiety in the open-field test is typically expressed as thigmotaxis (i.e., the propensity to avoid exposed areas), which was measured as the proportion of time spent within the peripheral ROI after the first entry into it (to control for any initial freezing in the center); the average distance (in cm) from the wall was used to quantify the edge- or wall-orienting tendency associated with fear-induced thigmotaxis [56].
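The behavioral measures above reduce to simple ratios and averages. The sketch below assumes per-stimulus ROI times in seconds and, for the open field, per-frame periphery flags and positions in cm with the tank centre at (0, 0); these input formats are our assumptions, not the EthoVision export schema. For the 22 cm tank, distance from the wall is 11 cm minus the distance from the centre.

```python
import math
from statistics import mean

TANK_RADIUS_CM = 11.0  # open-field tank is 22 cm in diameter

def roi_proportion(time_target_s, time_other_s):
    """Share of total ROI time spent near the target stimulus.

    Social tendency: target = shoal, other = empty container.
    Discrimination: target = preferred stimulus, other = the alternative.
    """
    total = time_target_s + time_other_s
    if total == 0:
        return float("nan")  # fish never entered either ROI
    return time_target_s / total

def exploration(time_a_s, time_b_s):
    """Overall time spent in the ROIs near both stimuli."""
    return time_a_s + time_b_s

def thigmotaxis(in_periphery):
    """Proportion of frames in the peripheral ROI, counted from first entry."""
    if True not in in_periphery:
        return 0.0  # assumption: no entry means no peripheral time
    first = in_periphery.index(True)
    after = in_periphery[first:]
    return sum(after) / len(after)

def mean_distance_from_wall(xy_cm, radius_cm=TANK_RADIUS_CM):
    """Average distance (cm) to the wall; assumes (0, 0) is the tank centre."""
    return mean(radius_cm - math.hypot(x, y) for x, y in xy_cm)

print(roi_proportion(420.0, 140.0))                          # 0.75
print(thigmotaxis([False, False, True, True, False, True]))  # 0.75
print(mean_distance_from_wall([(0.0, 0.0), (11.0, 0.0)]))    # 5.5
```

Starting the thigmotaxis count at the first peripheral frame implements the "following first entry" rule, so initial freezing in the center does not lower the score.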


Genetic polymorphisms analysis

At the end of the behavioral phenotyping, animals were anesthetized by immersion in a solution of ethyl 3-aminobenzoate methanesulfonate salt (MS222, 100-200 mg/L), and a fin clip was collected from the caudal fin of each experimental fish and preserved in a digestion mix (proteinase K at 10 mg/ml, Lysis solution [Fermentas #K0512] and TE buffer) until further processing. DNA was then extracted from the preserved fin clips using a DNA extraction kit (Fermentas #K0512), with some adjustments to the manufacturer's protocol. Briefly, samples were thawed at room temperature and incubated in a thermomixer for approximately 20 h at 50 °C with shaking (700 rpm). Chloroform was then added at a 1:1 ratio and the samples were gently mixed by inversion. Samples were centrifuged at 18506 × g (13200 rpm in an Eppendorf 5430R centrifuge) for 7 min, and the upper aqueous phase was transferred to a new 1.5 ml tube. Next, 800 µl of precipitation mix (720 µl H2O + 80 µl precipitation solution [Fermentas #K0512]) was added to each tube, mixed gently by inversion for 2 min, and centrifuged again for 10 min at 18506 × g. The supernatant was removed, the DNA pellet was dissolved in 100 µl of 1.2 M NaCl solution [Fermentas #K0512], and 300 µl of freezer-cold 100% ethanol (-20 °C) was added to precipitate the DNA overnight at -20 °C. The next day, samples were centrifuged for 10 min (18506 × g) and the ethanol was removed. To wash the pellet, 200 µl of freezer-cold 70% ethanol was added to each sample, followed by centrifugation for 10 min (18506 × g). Finally, the pellet was dried for 15-30 min at 37 °C and 30 µl of DNase-free sterile H2O was added. To assess DNA concentration and quality, samples were quantified on a NanoDrop spectrophotometer (Thermo Scientific, NanoDrop 2000), and the 260/280 and 260/230 absorbance ratios were recorded.
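The centrifugation steps give speed both as relative centrifugal force (18506 × g) and as rotor speed (13200 rpm). The two are related by the standard formula RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm. The sketch below is only illustrative: the radius it recovers (about 9.5 cm) is back-calculated from the two numbers in the text, not taken from the centrifuge or rotor specifications.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard constant when the radius is in cm

def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2

def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))

def implied_radius_cm(rcf, rpm):
    """Radius consistent with a stated RCF/rpm pair (back-calculated, assumed)."""
    return rcf / (RCF_CONSTANT * rpm ** 2)

r = implied_radius_cm(18506, 13200)
print(round(r, 2))                    # 9.5
print(round(rcf_from_rpm(13200, r)))  # 18506
```

Reporting RCF rather than rpm makes the protocol transferable to centrifuges with a different rotor radius; `rpm_from_rcf` gives the speed to set on such a machine.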

We built a list of candidate genes to test for association with the behavioral traits, based on evidence from the literature for their involvement in the regulation of social behavior. This list included genes for neurotransmitter systems (e.g., dopamine, serotonin), neuromodulators (e.g., oxytocin, AVT, NPY), neuroplasticity (e.g., bdnf, neurexins, neuroligins), and genes linked to autism (e.g., shank3a). To obtain candidate SNPs for the genes of interest, all germline variants for this species were downloaded as a GVF file. The GVF file was filtered to keep only SNPs located in the loci of interest whose evidence was supported by frequency observations, to increase the probability that they would be variable. Sequences were extracted with Ensembl's BioMart tool using the "Zebrafish Short Variants (SNPs and indels excluding flagged variants) (GRCz11)" dataset. Several iterations of Assay Design 4.0 (Agena Bioscience), which designs multiplexed MassEXTEND® assays for mass spectrometry detection, were run to achieve an even distribution of assays across the genes of interest. Four multiplexes were designed, with 38, 36, 35 and 35 assays. The Agena Bioscience iPLEX® kit, the MassARRAY® platform and Typer software v.4 were used, following the manufacturer's standard protocols, for the genotyping reactions, the acquisition of genotypes and the inspection of results, respectively. Of the SNPs in the loci of interest, 139 were successfully genotyped, but 7 were removed because they showed no variation among the 164 tested zebrafish (the final list of SNPs is available in Table 2).
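A minimal sketch of the GVF filtering step, under stated assumptions: the file follows the GVF/GFF3 layout of nine tab-separated columns, SNPs carry the type `SNV`, and the supporting evidence is recorded in an `evidence_values` attribute (for example `evidence_values=Frequency`). The gene label and coordinates below are hypothetical placeholders; the loci and code actually used for this dataset are not described here.

```python
def parse_attributes(field):
    """Split a GFF3/GVF attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs

def filter_gvf(lines, loci):
    """Keep SNVs inside any locus whose evidence includes 'Frequency'.

    loci maps a gene label to (seqid, start, end), 1-based inclusive.
    Assumption: evidence is stored in an 'evidence_values' attribute.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip pragmas, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "SNV":
            continue
        attrs = parse_attributes(cols[8])
        if "Frequency" not in attrs.get("evidence_values", "").split(","):
            continue
        seqid, pos = cols[0], int(cols[3])
        for gene, (chrom, start, end) in loci.items():
            if seqid == chrom and start <= pos <= end:
                kept.append((gene, attrs.get("Dbxref", ""), seqid, pos))
                break
    return kept

# Hypothetical example: one locus and three records (coordinates are made up).
example = [
    "##gvf-version 1.10\n",
    "1\tEnsembl\tSNV\t150\t150\t.\t+\t.\tID=1;Dbxref=dbSNP:rs1;evidence_values=Frequency\n",
    "1\tEnsembl\tSNV\t160\t160\t.\t+\t.\tID=2;Dbxref=dbSNP:rs2\n",
    "1\tEnsembl\tSNV\t9000\t9000\t.\t+\t.\tID=3;Dbxref=dbSNP:rs3;evidence_values=Frequency\n",
]
print(filter_gvf(example, {"gene_x": ("1", 100, 200)}))
```

Only the first record survives: the second lacks frequency evidence and the third lies outside the locus.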


Statistical Analysis

To confirm that all lines express social tendency and are capable of social and object recognition, one-sample t-tests (µ ≠ 0.5 vs. >0.5) were used to test whether social tendency, object discrimination and social discrimination scores differed significantly from chance level (0.5) for each sex and each line. Next, we extracted behavioral modules that aggregate correlated behaviors by carrying out a factor analysis with principal component extraction (PCA) followed by varimax rotation, based on the correlation matrix of all behavioral measures (social tendency, social discrimination, social exploration, object discrimination, object exploration, thigmotaxis and edge-orienting). The analysis identified three main components (Cs), which we call behavioral modules: general inspection, general recognition and anxiety (see the Results section for more details). Linear mixed models (LMM) were then used to assess the effects of sex, line and their interaction on the scores of each behavioral module, with fish ID as a random covariate, followed by Tukey post-hoc tests. These analyses were carried out in Minitab® version 17 (Minitab Inc., State College, PA, USA).
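The one-sample t-test against chance compares a group's mean score with 0.5. The stdlib-only sketch below computes the test statistic t = (mean − 0.5) / (s / √n) and its n − 1 degrees of freedom; it stops short of the p-value, which needs Student's t distribution (in R, for example, `t.test(x, mu = 0.5)` with the appropriate `alternative` argument returns both). The scores in the usage line are made up for illustration.

```python
import math
from statistics import mean, stdev

def one_sample_t(scores, mu0=0.5):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    standard_error = stdev(scores) / math.sqrt(n)  # sample SD / sqrt(n)
    return (mean(scores) - mu0) / standard_error, n - 1

# Made-up social-tendency scores for one hypothetical group of fish.
t, df = one_sample_t([0.62, 0.71, 0.55, 0.80, 0.68])
print(f"t = {t:.2f}, df = {df}")
```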

The remaining analyses were carried out in R, version 4.0.4 [57]. To test whether the behavioral modules are related differently in each zebrafish line, we computed Pearson correlation matrices between the three PC scores for each line. All p-values were corrected for multiple testing with the Benjamini-Hochberg method. Heatmaps were used to represent the correlation matrices of each line visually. The packages “Hmisc” [58] and “ggplot2” [59] were used to compute the correlations and build the heatmaps, respectively. The quadratic assignment procedure (QAP) correlation test with 5000 permutations [60] was used in UCINET 6 [61] to assess the association between the correlation matrices of each pair of zebrafish lines. Because the null hypothesis of the QAP test is that there is no association between the matrices, a significant p-value indicates that the two correlation matrices are similar.
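The QAP correlation test can be sketched as follows, under our own simplifying assumptions: both matrices are square and symmetric, their upper triangles are correlated with Pearson's r, and the null distribution is built by applying the same random permutation to the rows and columns of one matrix. The one-sided p-value is our choice for illustration; UCINET's implementation and output may differ.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def upper_triangle(m, order):
    """Upper-triangle entries of m after relabeling rows and columns by order."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def qap_correlation(a, b, n_perm=5000, seed=0):
    """Observed correlation of a and b, and a one-sided permutation p-value.

    Assumption: symmetric matrices, so only the upper triangle is used.
    """
    identity = list(range(len(a)))
    x = upper_triangle(a, identity)
    observed = pearson(x, upper_triangle(b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # same permutation for rows and columns of b
        if pearson(x, upper_triangle(b, perm)) >= observed:
            hits += 1
    # +1 counts the observed labeling as one of the permutations
    return observed, (hits + 1) / (n_perm + 1)

# Made-up 4 x 4 symmetric matrices for illustration.
m1 = [[1, .2, .5, .1], [.2, 1, .3, .7], [.5, .3, 1, .4], [.1, .7, .4, 1]]
m2 = [[1, .3, .6, .0], [.3, 1, .2, .8], [.6, .2, 1, .5], [.0, .8, .5, 1]]
print(qap_correlation(m1, m2, n_perm=1000))
```

Permuting node labels rather than individual cells keeps the row and column dependence within each matrix intact, which is what distinguishes QAP from a naive permutation of matrix entries.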

To check whether the genetic distances between subjects (based on their genotypes for the 132 SNPs) are structured by line or represent a uniform population, we performed a hierarchical clustering analysis using the “philentropy” package [62]. We computed the Jaccard distance between all pairs of subjects, a dissimilarity index that is 0 for identical genotype profiles and increases as profiles diverge. From the resulting distance matrix, we performed hierarchical clustering with complete linkage, which defines the distance between two clusters as the maximum distance between any pair of their members and, at each step, merges the two closest clusters. We then plotted the hierarchical clustering as a dendrogram using the “dendextend” package [63]. We found a structured population with 5 clusters: 4 corresponded to 4 of the 6 lines, and the 5th merged the TU and WIK lines (see the Results section for more details). We therefore included line as a covariate in the analyses of SNP-behavior associations (see below).
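A minimal sketch of the clustering step, assuming genotypes coded 0/1/2 with no missing calls. For numeric vectors it uses the Tanimoto form of the Jaccard distance, d = Σ(p − q)² / (Σp² + Σq² − Σpq), which is how we understand philentropy to define `jaccard` for non-binary data; treat that as an assumption. The naive complete-linkage loop stands in for R's `hclust(..., method = "complete")` and only illustrates the merging rule.

```python
def jaccard_distance(p, q):
    """Numeric (Tanimoto-form) Jaccard distance; assumed to match philentropy."""
    dot = sum(a * b for a, b in zip(p, q))
    denom = sum(a * a for a in p) + sum(b * b for b in q) - dot
    if denom == 0:
        return 0.0  # both vectors all zero: treat as identical
    return sum((a - b) ** 2 for a, b in zip(p, q)) / denom

def complete_linkage(dist, n_clusters):
    """Agglomerate until n_clusters remain; returns lists of subject indices."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: cluster distance = largest pairwise distance
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the closest pair (j > i)
        del clusters[j]
    return clusters

# Made-up genotypes (0/1/2) for four subjects at four SNPs.
genotypes = [[0, 0, 2, 2], [0, 0, 2, 1], [2, 2, 0, 0], [2, 1, 0, 0]]
dist = [[jaccard_distance(p, q) for q in genotypes] for p in genotypes]
print(complete_linkage(dist, 2))  # [[0, 1], [2, 3]]
```

Because complete linkage uses the largest member-to-member distance, it tends to produce compact clusters and is less prone than single linkage to chaining distinct lines together.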

To assess the associations between genetic polymorphisms and behavior, we tested each of the 132 SNPs independently against each behavioral phenotype (the 7 behaviors and the 3 PC scores). Three zebrafish were excluded from this analysis because their sample call rate was below 5%, meaning that they lacked genetic information for most SNPs. For the behaviors measured on a continuous scale that were not proportions (general inspection, general recognition, anxiety and edge-orienting), we used linear models (LM) implemented with the R “base” package. For the behaviors that were proportions (social tendency, social discrimination, social exploration, object discrimination, object exploration and thigmotaxis), we used generalized linear models (GLM) with beta regression implemented with the “betareg” package [64]. In all models, the behavior was the response variable, the SNP was the explanatory variable and line was a covariate. SNPs were coded as integers, with 1 representing the heterozygote and 0 and 2 the two homozygotes. For example, for SNP rs180151563, 0 represents the genotype AA, 1 the genotype CA and 2 the genotype CC. For some SNPs, only two of the three possible genotypes were present. Line represents the different origins of the tested zebrafish. It was also coded as an integer, from 1 to 6: 1 for the 5D line, 2 for AB, 3 for LEO, 4 for TL, 5 for TU and 6 for WIK. For each model, we used the summary() function in R to extract the p-value of the SNP, which was adjusted for the line effect. Because we ran 132 independent tests (one per SNP) for each behavioral phenotype, we corrected the p-values with the false discovery rate (FDR) adjustment method.
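The genotype coding, the call-rate screen and the FDR correction can be sketched as below. Following the rs180151563 example, a genotype is encoded as the number of copies of one allele (C in that example: AA = 0, CA = 1, CC = 2); which allele is counted for the other SNPs is an assumption. The Benjamini-Hochberg adjustment mirrors what R's `p.adjust(p, method = "BH")` computes (`method = "fdr"` is an alias).

```python
def encode_genotype(call, counted_allele):
    """Copies of counted_allele in a two-letter call; None if the call is missing.

    Assumption: the counted allele is chosen per SNP, as C for rs180151563.
    """
    if call is None or len(call) != 2:
        return None
    return call.count(counted_allele)

def call_rate(calls):
    """Fraction of non-missing genotype calls for one subject."""
    return sum(c is not None for c in calls) / len(calls)

def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # walk from the largest p-value down, keeping a running minimum
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# rs180151563 example from the text: counting C gives AA -> 0, CA -> 1, CC -> 2
print([encode_genotype(g, "C") for g in ("AA", "CA", "CC")])  # [0, 1, 2]
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

In the analysis described above, this correction would be applied to the 132 SNP p-values of each phenotype separately.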

For some of the SNP-behavior associations that remained significant after FDR adjustment, we used the “ggplot2” [59] and “ggpubr” [65] packages to draw boxplots of the phenotype broken down by SNP genotype. Over the boxplots, we added dot plots broken down by line to help visualize the line effect on zebrafish behavior. To compare the SNP-behavior associations more comprehensively, we plotted the significant associations by behavioral category as Venn diagrams, using the “VennDiagram” package [66].


Fundação para a Ciência e Tecnologia, Award: PTDC/BIA-COM/30627/2017

Fundação para a Ciência e Tecnologia, Award: PTDC/BIA-ANM/0810/2014

Fundação para a Ciência e Tecnologia, Award: UIDB/04555/2020