
On the strategic learning of signal associations

Citation

Sherratt, Thomas; James, Voll (2022), On the strategic learning of signal associations, Dryad, Dataset, https://doi.org/10.5061/dryad.6m905qg1x

Abstract

Signal detection theory (SDT) has been widely used to identify the optimal response of a receiver to a stimulus when that stimulus could be generated by more than one type of signaller. While SDT assumes that the receiver adopts the optimal response at the outset, in reality receivers often have to learn how to respond. We therefore recast a simple signal detection problem as a multi-armed bandit (MAB) in which inexperienced receivers choose between accepting a signaller (gaining information and an uncertain payoff) and rejecting it (gaining no information but a certain payoff). An exact solution to this exploration-exploitation dilemma can be identified by solving the relevant dynamic programming equation (DPE). However, to evaluate how the problem is solved in practice, we conducted an experiment in which humans (n = 135) were repeatedly presented with four readily discriminable signaller types, some of which were on average profitable to accept in the long term and others unprofitable. We then compared the performance of SDT, DPE and three candidate exploration-exploitation models (Softmax, Thompson and Greedy) in explaining the observed sequences of acceptance and rejection. All of the models predicted volunteer behaviour well when signallers were clearly profitable or clearly unprofitable to accept. Overall, however, the Softmax and Thompson sampling models, which predict the optimal (SDT) response towards signallers with borderline profitability only after extensive learning, explained the responses of volunteers significantly better. By highlighting the relationship between the MAB and SDT models, we encourage others to evaluate how receivers strategically learn about their environments.
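The three learning rules named in the abstract can be illustrated for a single signaller type as follows. This is a hypothetical sketch, not the models fitted in the study: the Beta(1, 1) prior, the temperature value, the tie-breaking rule and all function names are assumptions. The only element taken from this record is the payoff structure (accepting gives +1 or -1; rejecting gives a certain 0 and no information).

```python
import math
import random

# Hypothetical sketch: each rule sees only the counts of desirable (s) and
# undesirable (f) outcomes observed after accepting one signaller type.
# Accepting is worth 2p - 1 in expectation (+1 with probability p, -1 otherwise);
# rejecting is worth 0 and teaches nothing.


def posterior_mean(s: int, f: int, a0: float = 1.0, b0: float = 1.0) -> float:
    """Mean of the Beta(a0 + s, b0 + f) posterior on P(desirable); Beta(1, 1) prior assumed."""
    return (a0 + s) / (a0 + b0 + s + f)


def greedy_accept(s: int, f: int) -> bool:
    """Accept only if the expected payoff of accepting beats rejecting (ties reject)."""
    return 2 * posterior_mean(s, f) - 1 > 0


def softmax_accept_prob(s: int, f: int, temperature: float = 0.5) -> float:
    """Softmax over {accept: 2p - 1, reject: 0}, i.e. a logistic function of 2p - 1."""
    value_accept = 2 * posterior_mean(s, f) - 1
    return 1.0 / (1.0 + math.exp(-value_accept / temperature))


def thompson_accept(s: int, f: int, rng: random.Random) -> bool:
    """Draw a plausible P(desirable) from the posterior and act greedily on that draw."""
    p_draw = rng.betavariate(1.0 + s, 1.0 + f)
    return 2 * p_draw - 1 > 0
```

Under these assumptions an unfamiliar type (s = f = 0) has a posterior mean of exactly 0.5, so this greedy rule never accepts it and therefore never learns about it, whereas the Softmax rule accepts it with probability 0.5 and Thompson sampling accepts it whenever its draw exceeds 0.5. How ties are broken is therefore a modelling choice, not something stated in this record.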

Methods

A computer game was created in Microsoft® Visual Basic 6.0 in which human subjects were sequentially presented with a series of signallers of four distinct types and asked to accept or reject each one. Each of the four signaller types had a particular probability of being desirable; otherwise it was undesirable. Throughout the experiment, accepting a desirable signaller always gave a benefit of 1 and accepting an undesirable signaller always incurred a cost of 1. Rejecting a signaller resulted in neither benefit nor cost, and also provided no information about whether it was desirable. Data collection took place at the University Centre and MacOdrum Library on the Carleton University campus from January to April 2019. Human volunteers (mainly undergraduates) were recruited by invitation as they passed by. Consenting participants were shown a Microsoft® PowerPoint presentation outlining the rules of the game but were given no information about the purpose of the experiment. Subjects were simply asked to accept or reject signallers so as to maximize their score. All protocols were approved by the Carleton University Research Ethics Board.
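Because rejecting yields a certain payoff of 0 and no information, a single signaller type behaves like a one-armed bandit. A minimal finite-horizon dynamic programming sketch follows, assuming a Beta(1, 1) prior on the probability p of desirability and a known number n of remaining encounters with that type. With s desirable and f undesirable outcomes observed so far, it evaluates V(s, f, n) = max{V(s, f, n - 1), p[1 + V(s + 1, f, n - 1)] + (1 - p)[-1 + V(s, f + 1, n - 1)]}, with V(s, f, 0) = 0. The prior, the single-type horizon and the function names are our assumptions; this is not the DPE used in the study, which covered the full game.

```python
from functools import lru_cache

# Hypothetical one-type sketch: s and f count desirable/undesirable outcomes seen
# after accepting; n is the number of encounters with this type still to come.


def p_desirable(s: int, f: int) -> float:
    """Posterior mean of P(desirable) under an assumed Beta(1, 1) prior."""
    return (1 + s) / (2 + s + f)


def accept_value(s: int, f: int, n: int) -> float:
    """Expected score from accepting now and behaving optimally afterwards (n >= 1)."""
    p = p_desirable(s, f)
    return p * (1 + value(s + 1, f, n - 1)) + (1 - p) * (-1 + value(s, f + 1, n - 1))


@lru_cache(maxsize=None)
def value(s: int, f: int, n: int) -> float:
    """Optimal expected future score with n encounters left."""
    if n == 0:
        return 0.0
    reject_value = value(s, f, n - 1)  # payoff 0 and knowledge (s, f) unchanged
    return max(reject_value, accept_value(s, f, n))


def should_accept(s: int, f: int, n: int) -> bool:
    """True if accepting is strictly better than rejecting (ties reject); needs n >= 1."""
    return accept_value(s, f, n) > value(s, f, n - 1)
```

For an unfamiliar type the posterior mean is 0.5, exactly the SDT threshold, so accepting has zero immediate expected payoff: `should_accept(0, 0, 1)` is False, but `should_accept(0, 0, 25)` is True because what is learned from accepting can be exploited later.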

The four signaller types were generated by combining a colour (red or blue) with a pattern (cross or circle). We denote the colour most strongly associated with desirable signallers by C+ (here, red), and the conditional probability that a desirable signaller has this colour by pC+ (> 0.5). Likewise, we denote the pattern that most reliably indicated a desirable signaller by P+ (here, cross), with association probability pP+ (> 0.5). We assumed a form of symmetry in which the probabilities that undesirable signallers have the alternative colour (pC-) and the alternative pattern (pP-) satisfy pC+ = pC- = pC and pP+ = pP- = pP. To generate virtual signallers with these attributes, each signaller was first designated either desirable (probability ρ, the base rate) or undesirable (probability 1 - ρ). Depending on its desirability, the signaller was then stochastically allocated a colour C+ or C- (probabilities pC and 1 - pC respectively if desirable; 1 - pC and pC respectively if undesirable) and a pattern P+ or P- (probabilities pP and 1 - pP if desirable; 1 - pP and pP if undesirable). Colour and pattern were assumed to be conditionally independent; that is, desirable signallers with C+ were no more likely to have P+ than desirable signallers with C-. Together, the parameters ρ, pC and pP determined both the frequency of each signaller type and the underlying probability that a signaller of that type was desirable. Because signallers were generated stochastically, there was inevitable variation not only in the order in which a volunteer encountered signallers, but also in their realized frequencies and realized probabilities of being desirable.
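The generative rules above imply, by Bayes' theorem and conditional independence, that a red cross (C+, P+) occurs with frequency ρ pC pP + (1 - ρ)(1 - pC)(1 - pP) and is desirable with probability ρ pC pP / [ρ pC pP + (1 - ρ)(1 - pC)(1 - pP)]; the other three types follow by swapping the relevant factors. The sketch below computes this table and draws signallers in the stated order (desirability first, then colour and pattern). The function names are hypothetical and this is not the original Visual Basic code.

```python
import random
from itertools import product

COLOURS = ("red", "blue")       # C+ = red, C- = blue
PATTERNS = ("cross", "circle")  # P+ = cross, P- = circle


def type_table(rho: float, p_c: float, p_p: float) -> dict:
    """Return {(colour, pattern): (expected frequency, P(desirable | type))}."""
    table = {}
    for colour, pattern in product(COLOURS, PATTERNS):
        colour_plus, pattern_plus = colour == "red", pattern == "cross"
        # Conditional independence: the joint likelihood is a product of the two cues.
        lik_desirable = (p_c if colour_plus else 1 - p_c) * (p_p if pattern_plus else 1 - p_p)
        lik_undesirable = (1 - p_c if colour_plus else p_c) * (1 - p_p if pattern_plus else p_p)
        freq = rho * lik_desirable + (1 - rho) * lik_undesirable
        table[(colour, pattern)] = (freq, rho * lik_desirable / freq)
    return table


def sample_signaller(rho: float, p_c: float, p_p: float, rng: random.Random):
    """Draw desirability first, then colour and pattern given desirability."""
    desirable = rng.random() < rho
    colour = "red" if rng.random() < (p_c if desirable else 1 - p_c) else "blue"
    pattern = "cross" if rng.random() < (p_p if desirable else 1 - p_p) else "circle"
    return desirable, colour, pattern
```

For example, with ρ = 0.5, pC = 0.75 and pP = 0.6, red crosses make up 0.275 of signallers and are desirable with probability 0.225 / 0.275 ≈ 0.82, while red circles are desirable with probability 0.15 / 0.225 ≈ 0.67. A volunteer's realized frequencies will scatter around these values, as noted above.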

A trial began with the presentation of a single square-shaped, computer-generated signaller placed at a random position on a white background (11.2 cm × 11.2 cm). For each presentation, the volunteer could either accept the signaller (by clicking on it) or move to the next screen (by clicking the “Reject” button). Every screen contained a single signaller, and the pace at which new signallers were presented was set entirely by the volunteer (the only way to move to a new screen was by pressing the “Next Screen” button). All volunteers were presented with the same total number of signallers regardless of how they behaved, so there was no incentive to rush. The cumulative score for the trial was shown at the top of the screen. To reinforce the change (± 1) in the continuously displayed total score, accepting a signaller generated one of two distinct sounds depending on its desirability (a cash register sound for desirable signallers and a buzzer for undesirable ones).
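The trial loop can be simulated to make explicit that feedback, and hence learning, follows only an acceptance. In this sketch the receiver is a Thompson-sampling learner with a Beta(1, 1) prior per signaller type; that receiver, `run_session` and its parameters are hypothetical stand-ins, not a model of the volunteers or the study's code.

```python
import random
from collections import defaultdict


def run_session(rho, p_c, p_p, n_signallers=100, seed=0):
    """Simulate one game with an assumed Thompson-sampling receiver."""
    rng = random.Random(seed)
    # type -> [desirable, undesirable] outcomes observed after accepting
    counts = defaultdict(lambda: [0, 0])
    score = 0
    for _ in range(n_signallers):
        desirable = rng.random() < rho
        colour = "red" if rng.random() < (p_c if desirable else 1 - p_c) else "blue"
        pattern = "cross" if rng.random() < (p_p if desirable else 1 - p_p) else "circle"
        s, f = counts[(colour, pattern)]
        if rng.betavariate(1 + s, 1 + f) > 0.5:  # accept (click on the signaller)
            score += 1 if desirable else -1      # cash register (+1) or buzzer (-1)
            counts[(colour, pattern)][0 if desirable else 1] += 1
        # Rejecting: score unchanged and nothing is learned about this type.
    return score, dict(counts)
```

Because every volunteer saw the same number of signallers, the loop length is fixed regardless of how many signallers the receiver accepts.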

Trials were run for 27 different simulated environments (treatments), with 5 volunteers per treatment (135 volunteers in total). Treatments differed in the proportion of desirable signallers (the base rate, ρ = 0.25, 0.5 or 0.75) and, for each base rate, in the reliabilities of the component colour and pattern signals (pC = 0.6, 0.75 or 0.95; pP = 0.6, 0.75 or 0.95), giving a 3 × 3 × 3 factorial design. Each volunteer was presented with a total of 100 signallers generated stochastically according to the rules above, and the game ended once all signallers had been presented.
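The factorial design can be enumerated directly, which confirms the counts given above (27 treatments × 5 volunteers = 135 volunteers, each seeing 100 signallers, 13,500 presentations in all). The sequential volunteer numbering is hypothetical; the record does not say how volunteers were assigned to treatments.

```python
from itertools import product

BASE_RATES = (0.25, 0.5, 0.75)     # rho
RELIABILITIES = (0.6, 0.75, 0.95)  # levels used for both pC and pP
VOLUNTEERS_PER_TREATMENT = 5
SIGNALLERS_PER_VOLUNTEER = 100

# Full 3 x 3 x 3 factorial: every base rate crossed with every colour and pattern reliability.
TREATMENTS = [
    {"rho": rho, "p_c": p_c, "p_p": p_p}
    for rho, p_c, p_p in product(BASE_RATES, RELIABILITIES, RELIABILITIES)
]


def volunteer_schedule():
    """Yield (volunteer_id, treatment) pairs, VOLUNTEERS_PER_TREATMENT per treatment."""
    volunteer_id = 0
    for treatment in TREATMENTS:
        for _ in range(VOLUNTEERS_PER_TREATMENT):
            volunteer_id += 1
            yield volunteer_id, treatment
```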

Usage Notes

There are no missing values. The key code used to generate the predictions and fit the models is provided in our Supplementary Information.

Funding

NSERC