Data from: Do goldfish like to be informed?
Data files (version of Apr 17, 2025; 39.55 MB in total):
- Annotated_data.zip (255.41 KB)
- Code.zip (20.21 KB)
- README.md (13.10 KB)
- Source_data.zip (39.26 MB)
Abstract
Like humans, several mammalian and avian species prefer foretold over unsignalled future events, even if the information is costly and confers no direct benefit. It is unclear whether this is an epiphenomenon of basic associative learning mechanisms or whether these preferences reflect a derived form of information-seeking that is reminiscent of human curiosity. We investigate whether a fish that shares basic reinforcement learning mechanisms with birds and mammals also shows such a preference, with the aim of elucidating whether widely shared conditioning processes are sufficient to explain paradoxical preferences resulting in unusable information. Goldfish (Carassius auratus) chose between two alternatives, both resulting in a 5 s delay and 50% reward chance. The ‘informative’ option immediately produced a stimulus correlated with the trial’s forthcoming outcome (reward/no reward). Choosing the ‘non-informative’ option instead triggered an uncorrelated stimulus. Goldfish discriminated between the different contingencies, but did not develop a preference for the informative option, suggesting that in goldfish, associative learning mechanisms are not sufficient to generate preferences between alternatives differing only in outcome predictability. These results challenge the notion that informative preferences are a by-product of ubiquitous associative processes, and are consistent with the possibility that derived information-seeking mechanisms have evolved in some vertebrate species.
Dataset DOI: 10.5061/dryad.ksn02v7gh
1. Data and File Overview
1.1. Dataset files by figure/panel (all in Annotated_data.zip):
- Fig2a_InfoPreference_all.csv
- Fig2b_ForcedLatencies.csv
- Fig3aTOP_fulltracks.csv
- Fig3aTOP_SplusTracks.csv
- Fig3aTOP_noInfoTracks.csv
- Fig3aTOP_SminusTracks.csv
- Fig3a_BOTTOM.csv
- Fig3b.csv
- Fig3c_MovementMetric.csv
- FigS3_TerminalChoices_all.csv
- processingMetaData_tracks.csv
1.2. Source_data.zip (See 3. Raw data for details)
1.2.1 Phase 1
Includes individual animal/session event data files
1.2.2 Phase 2
Includes individual animal/session event and tracking data files
1.3 Code (all in Code.zip):
- figures.m
- dataProcessing.m
- parse_choices_and_latencies_phase_1.m
- parse_choices_phase_2.m
- parseTracks.m
- Statistical_analysis_do_goldfish_like_to_be_informed.R
2. Annotated data and accompanying code
2.1. Annotated data descriptions by figure
Figure 2a
Filename: Fig2a_InfoPreference_all.csv
Number of variables: 3
Variable list:
- Choice_proportion: proportion of choices made for the Info option over the NoInfo option on a given day for a given individual, i.e., number of Info choices / (number of Info choices + number of NoInfo choices).
- Subject: number identifying a given individual goldfish in the study (1-8).
- Day: denotes the testing day in consecutive order (1-30).
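As a minimal illustration of the Choice_proportion formula, the Python sketch below computes one day's value for one subject from a list of free-choice outcomes. The function name and the labels "Info" and "NoInfo" are hypothetical; the released pipeline is the MATLAB routine in <<parse_choices_and_latencies_phase_1.m>>, which may differ in detail.

```python
from collections import Counter


def info_choice_proportion(choices):
    """Proportion of Info choices for one subject on one day.

    `choices` holds one label per free-choice trial; the labels "Info" and
    "NoInfo" are hypothetical placeholders for however choices are coded.
    """
    counts = Counter(choices)  # missing labels count as 0
    n_info, n_noinfo = counts["Info"], counts["NoInfo"]
    total = n_info + n_noinfo
    if total == 0:
        raise ValueError("no Info or NoInfo choices on this day")
    # Info / (Info + NoInfo)
    return n_info / total


# Three Info choices out of four free-choice trials
print(info_choice_proportion(["Info", "NoInfo", "Info", "Info"]))  # 0.75
```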
Figure 2b
Filename: Fig2b_ForcedLatencies.csv
Number of variables: 4
Variable list:
- Latency: median (across trials) of the time, in seconds, taken to respond to the presentation of either the Info or the NoInfo option in forced trials, where only one option is presented.
- Option: indicates whether data is from Info or NoInfo forced trials.
- Subject: number identifying a given individual goldfish in the study (1-8).
- Day: denotes the testing day in consecutive order (1-30).
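A sketch of how one Latency value per subject, day and option could be derived from trial-level records. The record keys (subject, day, option, forced, latency) are hypothetical and do not describe the layout of the source event files; only the grouping and the median across forced trials follow the description above.

```python
from collections import defaultdict
from statistics import median


def forced_latency_medians(trials):
    """Median response latency (s) per (subject, day, option) for forced trials.

    Each trial is a dict with hypothetical keys: 'subject', 'day',
    'option' ("Info" or "NoInfo"), 'forced' (bool) and 'latency' (seconds).
    """
    groups = defaultdict(list)
    for t in trials:
        if t["forced"]:  # free-choice trials are excluded
            groups[(t["subject"], t["day"], t["option"])].append(t["latency"])
    return {key: median(values) for key, values in groups.items()}


trials = [
    {"subject": 1, "day": 1, "option": "Info", "forced": True, "latency": 2.0},
    {"subject": 1, "day": 1, "option": "Info", "forced": True, "latency": 4.0},
    {"subject": 1, "day": 1, "option": "Info", "forced": True, "latency": 3.0},
    {"subject": 1, "day": 1, "option": "NoInfo", "forced": True, "latency": 1.0},
    {"subject": 1, "day": 1, "option": "NoInfo", "forced": False, "latency": 9.0},
]
print(forced_latency_medians(trials))
```

The median, rather than the mean, keeps a single slow forced trial from dominating a day's value.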
Figure 3a top panel
Filename: Fig3aTOP_fulltracks.csv
Number of Variables: 2
Variable list:
- Xpos: normalised x co-ordinates of an example subject’s (BuBu) centroid from an experimental session presented in sequence.
- Ypos: normalised y co-ordinates of an example subject’s (BuBu) centroid from an experimental session presented in sequence.
Filename: Fig3aTOP_SplusTracks.csv
Number of Variables: 10
Variable list:
- t1_x
- t1_y
- t2_x
- t2_y
- t3_x
- t3_y
- t4_x
- t4_y
- t5_x
- t5_y
The data show x and y co-ordinates of the centroid of an example subject (BuBu) during the post-choice delay period in trials where the Info stimulus for reward (S+) is presented. Paired x and y co-ordinates are provided for 5 example trials. Column headings denote the trial and co-ordinate type of each sequence. For example, ‘t1_x’ is the column of x-coordinates from trial 1 during the post-choice delay when S+ is presented, and ‘t1_y’ is the corresponding column of y-coordinates.
Filename: Fig3aTOP_noInfoTracks.csv
Number of Variables: 20
Variable list:
- t1_x
- t1_y
- t2_x
- t2_y
- t3_x
- t3_y
- t4_x
- t4_y
- t5_x
- t5_y
- t6_x
- t6_y
- t7_x
- t7_y
- t8_x
- t8_y
- t9_x
- t9_y
- t10_x
- t10_y
The data show x and y co-ordinates of the centroid of an example subject (BuBu) during the post-choice delay period in trials where a stimulus of the non-informative option (N1 or N2) is presented. Paired x and y co-ordinates are provided for 10 example trials. Column headings denote the trial and co-ordinate type of each sequence. For example, ‘t1_x’ is the column of x-coordinates from trial 1 during the post-choice delay when either N1 or N2 is presented, and ‘t1_y’ is the corresponding column of y-coordinates.
Filename: Fig3aTOP_SminusTracks.csv
Number of Variables: 18
Variable list:
- t1_x
- t1_y
- t2_x
- t2_y
- t3_x
- t3_y
- t4_x
- t4_y
- t5_x
- t5_y
- t6_x
- t6_y
- t7_x
- t7_y
- t8_x
- t8_y
- t9_x
- t9_y
The data show x and y co-ordinates of the centroid of an example subject (BuBu) during the post-choice delay period in trials where the Info stimulus for no reward (S-) is presented. Paired x and y co-ordinates are provided for 9 example trials. Column headings denote the trial and co-ordinate type of each sequence. For example, ‘t1_x’ is the column of x-coordinates from trial 1 during the post-choice delay when S- is presented, and ‘t1_y’ is the corresponding column of y-coordinates.
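The three example-track files share the same wide layout, so one reader can serve all of them. This Python sketch groups each tN_x/tN_y column pair into a list of (x, y) points per trial. It assumes that shorter trials are padded with empty or NaN cells; that is an assumption about the export, not a documented property of the files, and the function name is hypothetical.

```python
import csv
import io
import re

COLUMN = re.compile(r"t(\d+)_([xy])$")  # e.g. "t3_x" -> trial 3, axis "x"


def read_example_tracks(text):
    """Return {trial: [(x, y), ...]} from a Fig3aTOP_*Tracks.csv-style table."""
    tracks = {}
    for row in csv.DictReader(io.StringIO(text)):
        for name, value in row.items():
            # Skip overflow cells (DictReader stores them under key None).
            if not isinstance(name, str):
                continue
            match = COLUMN.match(name.strip().lstrip("\ufeff"))
            if match is None or value is None:
                continue
            value = value.strip()
            if value == "" or value.lower() == "nan":
                continue  # assumed padding for shorter trials
            trial, axis = int(match.group(1)), match.group(2)
            tracks.setdefault(trial, {"x": [], "y": []})[axis].append(float(value))
    # Sort numerically so t10 follows t9, then pair x with y sample by sample.
    return {t: list(zip(xy["x"], xy["y"])) for t, xy in sorted(tracks.items())}
```

For a real file, read its text first, e.g. `read_example_tracks(open("Fig3aTOP_SplusTracks.csv", encoding="utf-8-sig").read())`.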
Figure 3a bottom panel
Filename: Fig3a_BOTTOM.csv
Number of Variables: 4
Variable list:
- occupancy probability: probability of the animal being present in a given tank region. The probabilities are organised within a 10 by 20 matrix that represents the total area of the experimental tank.
- subject: identifies the participating animal.
- condition: denotes whether occupancy histograms are derived from the subject’s position when S+, N1/N2 or S- are presented.
The data comprise three 10 by 20 matrices (one per stimulus type) showing occupancy probabilities for tank regions during stimulus presentation. All data are taken from an example session of the same example subject (BuBu).
Figure 3b
Filename: Fig3b.csv
Number of Variables: 4
Variable list:
- occupancy probability: probability of the animal being present in a given tank region. The probabilities are organised within a 10 by 20 matrix that represents the total area of the experimental tank. The probability shown for each region is the average across the last 5 experimental days for each subject.
- subject: identifies the participating animal.
- condition: denotes whether occupancy histograms are derived from the subject’s position when S+, N1/N2 or S- are presented.
For each subject, the data comprises three 10 by 20 matrices (one for each stimulus type) showing averaged occupancy probabilities for tank regions during stimulus presentation.
Figure 3c
Filename: Fig3c_MovementMetric.csv
Number of Variables: 3
Variable list:
- Entropy: movement metric computed as the entropy of subjects’ spatial distributions during the post-choice delay when various anticipatory stimuli are presented, split by stimulus type. Larger values correspond to more movement during these periods.
- subject: identifies participating animal (1-8).
- Stimulus: denotes the reward-predictive stimulus presented during the post-choice delay: S+ (100% reward), N1/N2 (50%) or S- (0%).
Figure S1
Same data as in Figure 2a and Figure 2b (see above).
Figure S2b,c
Same data as in Figure 3b (see above).
Figure S3
Filename: FigS3_TerminalChoices_all.csv
Number of Variables: 4
Variable list:
- choice_proportion: proportion of choices for S+ stimulus over N1/N2 stimuli or proportion of choices for S- stimulus over N1/N2 stimuli.
- Stimulus: shows whether S+ or S- was available to choose.
- Subject: identifies participating animal (1-8).
- Day: denotes the testing day in consecutive order (1-5).
2.2. Code to generate figures from annotated data
All code required to generate the figures is included in <<figures.m>>, with separate annotated sections for each relevant figure-panel combination.
2.3. R code for statistical analysis
All R code for the statistical analyses is included in <<Statistical_analysis_do_goldfish_like_to_be_informed.R>>, with separate sections for each relevant figure-panel combination.
3. Raw data files (source data)
Section 2 references code that reproduces the figures and statistical analyses from the pre-processed annotated data. Here we list the source data files and the code used to produce the annotated data.
The experiment was fully controlled using a custom Bonsai workflow (see Ajuwon et al. 2024 for details). This workflow simultaneously controlled task contingencies, tracked each subject’s centroid in real time (see Apparatus and task control) and saved the resulting data.
For every subject and session, two file types were saved: an “events.csv” file, containing the task’s events (with accompanying timestamps and outcomes, where relevant), and a “tracks.csv” file (Figure 1a), containing the x,y coordinates of the subject’s centroid and the luminance value of a specific screen location.
4. Parsing choices and response latencies from event data from phase 1
Choice proportions and response latencies were obtained for every trial, session and subject from <<“{subject}events{timestamp}.csv”>> files (see the example snippet in Appendix figure 1a) following the routine in <<parse_choices_and_latencies_phase_1.m>>. The source data can be found in <<data/source data/phase 1/events>>. The resulting data are compiled and annotated in <<‘Fig2a_InfoPreference_all.csv’>> and <<‘Fig2b_ForcedLatencies.csv’>> (see 2.1. Annotated data descriptions by figure for details).
Appendix figure 1. Example snippet of an “events.csv” file (a) and a “tracks.csv” file (b). a. Every trial started with an inter-trial interval (ITI) onset and ended with its corresponding trial number. The other events were the ITI offset time, trial initiation, choice, terminal stimulus onset and offset times, and outcome. b. For every video frame (rows), the screen luminance at a specific fixed location and the x and y position values were stored (columns).
5. Combining event and track data
All code that combines and parses event and tracking data, and that normalizes and transforms tracks, can be found in <<parseTracks.m>>. The accompanying source data can be found in <<data/source data//phase1/tracking/>>.
5.1. Event detection from luminance traces
The luminance of a circular area (0.75 cm radius) located at the top center of the screen was continuously recorded (see Appendix Figure 1b for an example raw file and Appendix Figure 2a for an example session). Its shade changed from black during the ITI, to grey when a trial was available, to white following a choice. Thresholding this signal allowed trial-by-trial detection of these events (Appendix Figure 2b). Because lighting conditions and camera position varied slightly across animals, these thresholds were set individually for each animal and stored, for easy access in analysis calls, in a metadata file (<<processingMetaData_tracks.csv>>).
Combining the events detected from luminance profiles with the same events logged in the event files (Appendix Figure 2c) made it possible to map every relevant task condition and epoch onto the animals’ movement trajectories, in particular the post-choice, pre-outcome period (Figure 3a top).
Appendix figure 2. Example session luminance trace. a. Raw luminance traces. b. ITI offset and choice detection based on luminance values. c. Choice detection combined with event data; colours distinguish Info/NoInfo choices and the corresponding terminal stimulus presented.
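A minimal sketch of the two-threshold detection described above, assuming one luminance value per video frame and two subject-specific thresholds, here called grey_thr and white_thr (hypothetical names; the actual columns in processingMetaData_tracks.csv may be named differently). Each frame is labelled black, grey or white, and the black-to-grey and grey-to-white transitions are reported as ITI offsets and choices.

```python
def classify_frames(luminance, grey_thr, white_thr):
    """Label frames 0 = black (ITI), 1 = grey (trial available), 2 = white (post-choice)."""
    return [0 if v < grey_thr else 1 if v < white_thr else 2 for v in luminance]


def detect_events(luminance, grey_thr, white_thr):
    """Return frame indices of ITI offsets (black -> grey) and choices (grey -> white).

    grey_thr and white_thr are hypothetical stand-ins for the per-subject
    thresholds stored in processingMetaData_tracks.csv.
    """
    states = classify_frames(luminance, grey_thr, white_thr)
    iti_offsets, choices = [], []
    for i in range(1, len(states)):
        prev, cur = states[i - 1], states[i]
        if prev == 0 and cur == 1:
            iti_offsets.append(i)
        elif prev == 1 and cur == 2:
            choices.append(i)
    return iti_offsets, choices


trace = [10, 10, 100, 100, 100, 250, 250, 10, 10, 100, 250]
print(detect_events(trace, grey_thr=50, white_thr=200))  # ([2, 9], [5, 10])
```

This sketch has no debouncing, so a noisy trace that flickers around a threshold would produce spurious events; a minimum-duration rule would be a natural addition.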
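One simple way to combine the two sources is sketched below, under the assumption that detected and logged choices occur in the same order and in equal number: the k-th detected choice frame is paired with the k-th logged terminal stimulus, and the following 5 s (the post-choice delay) is taken as that trial's window. The frame rate fps and both function names are hypothetical; the released code in <<parseTracks.m>> may align events differently.

```python
def post_choice_windows(choice_frames, logged_stimuli, fps, delay_s=5.0):
    """Pair detected choices with logged terminal stimuli, in order.

    Returns (stimulus, start_frame, end_frame) tuples covering the post-choice
    delay. `fps` (video frame rate) is a hypothetical parameter.
    """
    if len(choice_frames) != len(logged_stimuli):
        raise ValueError("detected and logged choice counts differ")
    n_frames = int(round(delay_s * fps))
    return [(stim, start, start + n_frames)
            for start, stim in zip(choice_frames, logged_stimuli)]


def cut_track(xs, ys, start, end):
    """Centroid samples within one window (end frame excluded)."""
    return list(zip(xs[start:end], ys[start:end]))


windows = post_choice_windows([5, 10], ["S+", "N1"], fps=30)
print(windows)  # [('S+', 5, 155), ('N1', 10, 160)]
```

Raising an error on a count mismatch, rather than silently truncating, keeps a missed or spurious luminance detection from shifting every later trial's label.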
5.2. Generating normalized occupancy histograms/plots
Animals were tested in two experimental tanks. The tanks were identical in all respects except that the camera orientation was flipped. The side of each option and the colour allocation of the terminal stimuli were counterbalanced across subjects. In addition to the thresholding values for luminance detection, the metadata file (<<processingMetaData_tracks.csv>>) records the experimental tank, the counterbalancing condition and whether the tracking data are corrupted (see Supplementary Table 2). Using this information, all trajectory coordinates were normalized to [-1, 1], with the Info side arbitrarily set to the left (i.e., bottom left of trajectory and occupancy plots). Normalized 2D occupancy histograms (i.e., each bin holds the number of elements in that bin relative to the total number of elements in the input data, so bin values sum to at most 1) were built for every session and subject and then averaged (Supplementary Figure 2 and <<Fig3b.csv>>).
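The normalization and histogram steps can be sketched in plain Python as follows. The tank bounds, the flip flag and the 10-row by 20-column bin layout (rows for y, columns for x) are assumptions for illustration, and all function names are hypothetical; the released implementation is in <<parseTracks.m>>. As in the probability normalization described above, points outside the normalized range are not binned but still count toward the total, so bin values sum to at most 1.

```python
def normalise(values, lo, hi, flip=False):
    """Map values from [lo, hi] to [-1, 1]; optionally mirror (e.g. to put Info on the left)."""
    scaled = [2 * (v - lo) / (hi - lo) - 1 for v in values]
    return [-v for v in scaled] if flip else scaled


def _bin(v, n):
    """Bin index in [0, n) for v in [-1, 1]; None if outside the range (or NaN)."""
    if not (-1 <= v <= 1):
        return None
    return min(int((v + 1) / 2 * n), n - 1)  # v == 1 falls in the last bin


def occupancy(xs, ys, n_rows=10, n_cols=20):
    """Probability-normalized 2D histogram: rows index y (bottom first), columns index x."""
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in zip(xs, ys):
        r, c = _bin(y, n_rows), _bin(x, n_cols)
        if r is not None and c is not None:
            counts[r][c] += 1
    total = min(len(xs), len(ys))  # out-of-range points still count here
    if total == 0:
        raise ValueError("no tracking samples")
    return [[n / total for n in row] for row in counts]


def average(histograms):
    """Element-wise mean of equally sized histograms (e.g. across sessions)."""
    k = len(histograms)
    return [[sum(h[r][c] for h in histograms) / k
             for c in range(len(histograms[0][0]))]
            for r in range(len(histograms[0]))]
```

For example, `occupancy(normalise(raw_x, x_min, x_max, flip=True), normalise(raw_y, y_min, y_max))` would give one session's histogram, where the bounds and the flip choice would come from the tank and counterbalancing information in the metadata file.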
6. Movement metric calculation
Normalized occupancy histograms (data in <<Fig3b.csv>>) were converted into a movement metric score (for details, see the Data analyses section) using the <<entropy()>> function by Will Dwinnell, which returns entropy in bits. Example code for this analysis can be found in <<dataProcessing.m>>.
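A sketch of an entropy-based movement metric on a normalized occupancy histogram, using the Shannon entropy in bits of the bin probabilities. This is a swapped-in plain Python implementation, not Dwinnell's MATLAB function, and it may not treat its input identically; in particular, it renormalizes the non-zero bins to sum to 1. Time spread evenly over many bins gives a high value, and time spent in a single bin gives 0.

```python
import math


def occupancy_entropy(histogram):
    """Shannon entropy, in bits, of a 2D occupancy histogram.

    Empty bins contribute nothing (0 * log 0 is taken as 0). The non-zero bins
    are renormalized to sum to 1, since probability-normalized histograms
    can sum to less than 1 when some points fall outside the bins.
    """
    probs = [p for row in histogram for p in row if p > 0]
    total = sum(probs)
    if total == 0:
        raise ValueError("histogram is empty")
    h = -sum((p / total) * math.log2(p / total) for p in probs)
    return max(0.0, h)  # avoids returning -0.0 for a single occupied bin


spread_out = [[0.25, 0.25], [0.25, 0.25]]  # time split evenly over 4 bins
stay_put = [[1.0, 0.0], [0.0, 0.0]]        # all time in one bin
print(occupancy_entropy(spread_out), occupancy_entropy(stay_put))  # 2.0 0.0
```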
7. Parsing choices from event data from phase 2
Choices during phase 2 for every trial, session and subject were obtained from <<“{subject}terminal_events{timestamp}.csv”>> files following the routine in <<parse_choices_phase_2.m>>. The source data can be found in <<data/source data/phase2/>>. The resulting data are compiled and annotated in <<‘FigS3_TerminalChoices_all.csv’>> (see 2.1. Annotated data descriptions by figure for details).
