Reinforcement learning theory reveals the cognitive requirements for solving the cleaner fish market task


Quiñones, Andres; Leimar, Olof; Lotem, Arnon; Bshary, Redouan (2019), Reinforcement learning theory reveals the cognitive requirements for solving the cleaner fish market task, Dryad, Dataset,


Learning is an adaptation that allows individuals to respond to environmental stimuli in ways that improve their reproductive outcomes. The degree of sophistication in learning mechanisms potentially explains variation in behavioural responses. Here, we present a model of learning that is inspired by documented intra- and interspecific variation in performance in a simultaneous two-choice task, the ‘biological market task’. The task presents a problem that cleaner fish often face in nature: the decision of choosing between two client types, one that is willing to wait for inspection and one that may leave if ignored. The cleaners’ choice hence influences the future availability of clients, i.e. it influences food availability. We show that learning the preference that maximizes food intake requires subjects to represent in their memory different combinations of pairs of client types rather than just individual client types. In addition, subjects need to account for future consequences of actions, either by estimating expected long-term reward or by experiencing a client leaving as a penalty (negative reward). Finally, learning is influenced by the absolute and relative abundance of client types. Thus, cognitive mechanisms and ecological conditions jointly explain intra- and interspecific variation in the ability to learn the adaptive response.
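The two routes for accounting for future consequences can be illustrated with a generic temporal-difference update. This is a minimal sketch of the textbook rule, not the code used to generate the dataset; the function name `td_update` and all numerical values are hypothetical, chosen only to show where a learning rate (alpha), a discount factor (gamma) and a negative reward would enter.

```python
def td_update(value, reward, next_value, alpha, gamma):
    """Return the value estimate after one temporal-difference step.

    alpha is the learning rate; gamma weights the estimated value of the
    next state, i.e. the expected long-term reward.
    """
    return value + alpha * (reward + gamma * next_value - value)


# Hypothetical numbers, for illustration only.
# Route 1: gamma > 0, so the estimate of future reward feeds into the update.
with_future = td_update(value=0.0, reward=1.0, next_value=2.0, alpha=0.5, gamma=0.5)

# With gamma = 0 the same experience only reflects the immediate reward.
immediate_only = td_update(value=0.0, reward=1.0, next_value=2.0, alpha=0.5, gamma=0.0)

# Route 2: gamma = 0, but a client leaving is experienced as a negative reward.
after_penalty = td_update(value=1.0, reward=-1.0, next_value=0.0, alpha=0.5, gamma=0.0)
```

In this sketch, an agent with gamma set to zero and no penalty has no signal linking its choice to the later loss of a client, which is the situation the abstract describes as insufficient.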


All the data for the paper were generated by running individual-based simulations written in C++. The code used both to generate the data and to produce the figures of the published article can be found here: DOI: 10.5281/zenodo.3361665

File names starting with the identifier "FAA" correspond to Fully Aware Agents, while file names starting with "PAA" correspond to Partially Aware Agents. After the identifier, each file name contains seven numbers, each one preceded by a label for the simulation parameter it reports. The first number gives the value of parameter alpha; it is followed by the word "alph". The second number gives the value of parameter gamma. The third number gives the value of parameter tau (this parameter is only relevant for simulations run with the SARSA algorithm). The fourth number gives the value of a boolean variable that determines whether a penalty is used in the simulation. The fifth and sixth numbers give the probability of a visitor and the probability of a resident, respectively. Finally, the last number gives the seed used in the random number generator.
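The naming convention above can be decoded programmatically. The following is a minimal sketch in Python, not part of the dataset or its code repository. It assumes that the seven numbers appear in the order described, that the text labels themselves contain no digits, and that values are written as plain non-negative decimals. The example file name, the class `SimulationRun` and the function `parse_run_name` are hypothetical; the exact labels and separators should be checked against the actual files.

```python
import re
from typing import NamedTuple


class SimulationRun(NamedTuple):
    """Parameters encoded in a file name (field names are illustrative)."""
    agent_type: str        # "FAA" or "PAA"
    alpha: float
    gamma: float
    tau: float             # only relevant for SARSA runs
    penalty: bool
    prob_visitor: float
    prob_resident: float
    seed: int


# Plain non-negative decimals such as "10" or "0.01"; no signs or exponents.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_run_name(file_name: str) -> SimulationRun:
    """Extract the identifier and the seven numbers, in order of appearance."""
    agent_type = file_name[:3]
    if agent_type not in ("FAA", "PAA"):
        raise ValueError(f"unknown identifier in {file_name!r}")
    numbers = _NUMBER.findall(file_name[3:])
    if len(numbers) != 7:
        raise ValueError(f"expected 7 numbers, found {len(numbers)} in {file_name!r}")
    alpha, gamma, tau, penalty, p_visitor, p_resident, seed = numbers
    return SimulationRun(
        agent_type=agent_type,
        alpha=float(alpha),
        gamma=float(gamma),
        tau=float(tau),
        penalty=float(penalty) != 0,
        prob_visitor=float(p_visitor),
        prob_resident=float(p_resident),
        seed=int(float(seed)),
    )


# Hypothetical file name, made up for illustration only.
run = parse_run_name("PAA_alph0.01_gamma0.8_tau10_pen1_pV0.5_pR0.3_seed7.txt")
```

Because the sketch reads the numbers purely by position, it does not depend on whether each label comes before or after its number, only on the order of the seven values.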

Columns in the files have headers that correspond to values or parameters in the models. All other parameter values can be found in the associated JSON files contained in the same folder.