Dryad

On the strategic learning of signal associations

Data files

Mar 25, 2022 version files 1.32 MB

Abstract

Signal detection theory (SDT) has been widely used to identify the optimal response of a receiver to a stimulus that could be generated by more than one signaller type. While SDT assumes that the receiver adopts the optimal response at the outset, in reality receivers often have to learn how to respond. We therefore recast a simple signal detection problem as a multi-armed bandit (MAB) in which inexperienced receivers chose between accepting a signaller (gaining information and an uncertain payoff) and rejecting it (gaining no information but a certain payoff). An exact solution to this exploration-exploitation dilemma can be identified by solving the relevant dynamic programming equation (DPE). However, to evaluate how the problem is solved in practice, we conducted an experiment in which humans (n = 135) were repeatedly presented with four readily discriminable signaller types, some of which were on average profitable, and others unprofitable, to accept in the long term. We then compared the performance of SDT, DPE and three candidate exploration-exploitation models (Softmax, Thompson and Greedy) in explaining the observed sequences of acceptance and rejection. All of the models predicted volunteer behaviour well when signallers were clearly profitable or clearly unprofitable to accept. Overall, however, the Softmax and Thompson sampling models, which predict the optimal (SDT) response towards signallers with borderline profitability only after extensive learning, explained the responses of volunteers significantly better. By highlighting the relationship between the MAB and SDT models, we encourage others to evaluate how receivers strategically learn about their environments.
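The accept/reject bandit described above can be sketched with a simple Beta-Bernoulli Thompson sampling receiver. This is an illustrative toy, not the study's actual model or parameters: the payoff probability `p_good`, the certain `reject_payoff`, and the trial count are all invented for the example. Accepting yields a stochastic payoff and updates the receiver's belief; rejecting yields a fixed payoff and no information.

```python
import random

def thompson_accept(trials, p_good, reject_payoff=0.5, seed=0):
    """Toy Thompson-sampling receiver for one signaller type.

    Hypothetical parameters: p_good is the (unknown to the receiver)
    probability that accepting pays off; reject_payoff is the certain
    payoff for rejecting. Returns the mean payoff per trial.
    """
    rng = random.Random(seed)
    alpha, beta = 1, 1  # Beta(1, 1) prior over the acceptance payoff rate
    total = 0.0
    for _ in range(trials):
        # Sample a belief about the acceptance payoff rate and compare it
        # with the certain payoff of rejecting (the exploration step).
        if rng.betavariate(alpha, beta) > reject_payoff:
            reward = 1.0 if rng.random() < p_good else 0.0
            alpha += reward        # accepting yields information...
            beta += 1 - reward     # ...which updates the posterior
            total += reward
        else:
            total += reject_payoff  # rejecting yields no new information
    return total / trials
```

With a clearly profitable type (high `p_good`) the sampler quickly converges on accepting; with a borderline type (`p_good` near `reject_payoff`) the posterior stays diffuse and the receiver keeps alternating for much longer, which mirrors the slow convergence towards the SDT-optimal response reported for borderline signallers.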