Experimental data analyzed in: Signal detection models as contextual bandits
Data files
May 29, 2023 version files (272.24 KB total)
- finalversionALL36.xlsx (226.41 KB)
- FittingofModelsToHumanData.R (40.41 KB)
- README.md (5.42 KB)
Abstract
Signal detection theory (SDT) has been widely applied to identify the optimal discriminative decisions of receivers under uncertainty. However, the approach assumes that decision-makers immediately adopt the appropriate acceptance threshold, even though the optimal response must often be learned. Here we recast the classical normal-normal (and power-law) signal detection model as a contextual multi-armed bandit (CMAB). Thus, rather than starting with complete information, decision-makers must infer how the magnitude of a continuous cue is related to the probability that a signaller is desirable, while simultaneously seeking to exploit the information they acquire. We explain how various CMAB heuristics resolve the trade-off between better estimating the underlying relationship and exploiting it. Next, we determined how naïve human volunteers resolve signal detection problems with a continuous cue. As anticipated, a model of choice (accept/reject) that assumed volunteers immediately adopted the SDT-predicted acceptance threshold did not predict volunteer behaviour well. The Softmax rule for solving CMABs, with choices based on a logistic function of the expected payoffs, best explained the decisions of our volunteers, but a simple midpoint algorithm also predicted decisions well under some conditions. CMABs offer principled parametric solutions to many classical SDT problems when decision-makers start with incomplete information.
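To make the "logistic function of the expected payoffs" concrete, here is a minimal R sketch of a two-option Softmax choice rule. It assumes the payoffs described in the Methods (+1 for accepting a good signaller, -1 for accepting a bad one, 0 for rejecting). The function name `softmax_accept_prob` and the inverse-temperature parameter `beta` are hypothetical, and this is an illustration rather than the model fitted in the accompanying R code.

```r
# Two-option Softmax: accept (expected payoff depends on the current estimate
# of P(good)) versus reject (payoff 0).
# `softmax_accept_prob` and `beta` are illustrative names, not the authors'.
softmax_accept_prob <- function(p_good, beta = 5) {
  # Expected payoff of accepting, given an estimated probability of "good".
  expected_accept <- p_good * 1 + (1 - p_good) * (-1)
  # Rejecting never changes the score.
  expected_reject <- 0
  exp(beta * expected_accept) /
    (exp(beta * expected_accept) + exp(beta * expected_reject))
}

# With only two options this reduces to a logistic function of the
# difference in expected payoffs.
p_good <- c(0.2, 0.5, 0.8)
softmax_accept_prob(p_good, beta = 2)
plogis(2 * (2 * p_good - 1))  # same values
```

Larger values of `beta` make choices track the expected payoff more deterministically, while values near zero give near-random choices and hence more exploration.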
Methods
The experiment was administered through a web application programmed in R (version 3.6.2) using the Shiny package. Our volunteers (36 in total) were drawn primarily from Biology undergraduate and graduate programs at Carleton University, Ottawa, and were recruited by email. Participants accessed the web application via a URL link in the email invitation and each took part only once. No details of the experimental aims were given at that time, and no information was given concerning a potential relationship between a signaller’s appearance and its true nature.
Our volunteers were presented with a series of computer-generated signallers (solid-coloured circles) over a sequence of trials. In any given trial, the signaller was either desirable (“good”) or undesirable (“bad”), and its type could be probabilistically inferred from its appearance (its greyness, see below). Participants were told the benefit of correctly accepting a desirable signaller (+1 point), the cost of incorrectly accepting an undesirable signaller (-1 point), and the number of trials they were to complete (50). They were also made aware that they would neither gain nor lose points for rejecting a signaller. With this limited information, participants were asked to accept or reject signallers with the aim of maximizing their total points by the end of the 50 trials. There was no time limit. Volunteers who accepted a signaller found out whether it was desirable or undesirable from the change in their payoff and the accompanying feedback. However, volunteers who rejected a signaller received no information, creating an exploration-exploitation dilemma.
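The payoff and feedback rules for a single trial can be sketched in R as follows. This is an illustration of the rules stated above, not code from the web application; the helper `trial_outcome` and the example vectors are hypothetical.

```r
# Outcome of one trial: +1 for accepting a good signaller, -1 for accepting
# a bad one, 0 for any rejection. `trial_outcome` is a hypothetical helper.
trial_outcome <- function(is_good, accept) {
  payoff <- if (!accept) 0 else if (is_good) 1 else -1
  # The signaller's type is revealed only after an acceptance;
  # a rejection yields no information (NA).
  feedback <- if (accept) {
    if (is_good) "good" else "bad"
  } else {
    NA_character_
  }
  list(payoff = payoff, feedback = feedback)
}

# Made-up example: four trials with arbitrary types and choices.
is_good <- c(TRUE, FALSE, TRUE, FALSE)
accept  <- c(TRUE, TRUE, FALSE, FALSE)

outcomes <- Map(trial_outcome, is_good, accept)
payoffs  <- vapply(outcomes, function(o) o$payoff, numeric(1))

# Running score of the kind shown to a participant.
cumulative_points <- cumsum(payoffs)
```

Because a rejection returns `NA` feedback, a learner who rejects too readily never gets the chance to correct an overly pessimistic estimate, which is the source of the exploration-exploitation dilemma.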
Signallers were solid-coloured circles that varied only in their shade of grey. Setting the R, G, and B values to a common value C produced a shade of grey between black (C = 0) and white (C = 255). When generating signallers, their greyness (i.e. the value of C) was drawn randomly from one of two normal distributions whose population means depended on the nature of the signaller (µgood and µbad). As in the classical normal-normal SDT model, the variance in C was the same for the two types of signaller. There were four treatment groups in a factorial design with two levels of discriminability (based on the difference in the population mean greyness of the two signaller types) and two levels of base rate. In high discriminability treatments, µgood and µbad were 90 and 165, respectively (with a common standard deviation of 25, this represents a difference of 3 standard deviation units). In low discriminability treatments, µgood and µbad were 115 and 140, respectively (a difference of 1 standard deviation unit). In all cases, the vast majority of sampled values of C fell between 0 and 255, and any draw of C outside this range was truncated. The underlying probability of a signaller being good in any given trial (the base rate) was either high (0.7) or low (0.3). Subjects were invited to play one of the four treatments according to their birth month (Jan.-Mar.; Apr.-Jun.; etc.). Consequently, our sample sizes for the different treatments were similar but not identical (n = 8 or 10). For each trial, we recorded whether the signaller was good or bad, the C value of the signaller, the choice made by the participant (accept or reject), and the participant’s cumulative points based on their acceptances of good and bad signallers.
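The stimulus design above can be sketched in R as follows, together with the acceptance threshold that the equal-variance normal-normal SDT model implies for payoffs of +1, -1 and 0. This is a minimal sketch under stated assumptions, not the code used to run the experiment: the function names are hypothetical, and clamping out-of-range draws to 0 or 255 is one reading of "truncated".

```r
# Generate one participant's sequence of signallers.
# `generate_trials` and its argument names are hypothetical.
generate_trials <- function(n_trials = 50, base_rate = 0.7,
                            mu_good = 90, mu_bad = 165, sigma = 25) {
  # Signaller type: good with probability equal to the base rate.
  is_good <- runif(n_trials) < base_rate
  # Greyness drawn from the normal distribution for that type.
  C <- rnorm(n_trials, mean = ifelse(is_good, mu_good, mu_bad), sd = sigma)
  # Assumption: out-of-range draws are clamped to the RGB bounds.
  C <- pmin(pmax(C, 0), 255)
  data.frame(trial = seq_len(n_trials), is_good = is_good, C = C)
}

# SDT-optimal threshold for equal variances and payoffs +1 / -1 / 0.
# Accepting pays on average when P(good | C) > 0.5; because good signallers
# are darker here (mu_good < mu_bad), that means accepting when C is below:
sdt_threshold <- function(base_rate, mu_good, mu_bad, sigma) {
  (mu_good + mu_bad) / 2 +
    sigma^2 * log(base_rate / (1 - base_rate)) / (mu_bad - mu_good)
}

set.seed(1)  # arbitrary seed, for reproducibility of the example only
trials <- generate_trials()

# Threshold for the high-discriminability, high-base-rate treatment.
threshold_high_high <- sdt_threshold(0.7, mu_good = 90, mu_bad = 165, sigma = 25)
```

The threshold follows from equating `base_rate * dnorm(C, mu_good, sigma)` with `(1 - base_rate) * dnorm(C, mu_bad, sigma)` and solving for C. For the high-discriminability, high-base-rate treatment it is roughly 134.6, slightly above the midpoint of 127.5 because good signallers are more common.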
Usage notes
We used Stan (https://mc-stan.org/) to fit and compare multi-level models of human choices. Stan was accessed in R via RStan and the models were coded using the ulam function in the rethinking package. All posterior distributions were estimated using Markov chain Monte Carlo (MCMC) sampling for 4000 iterations in four separate chains. To facilitate model fitting, the RGB values (C) of all signallers were rescaled by dividing C by 255, giving a value of perceived appearance (x) between 0 and 1. We provide a full listing of the R code as part of this submission (FittingofModelsToHumanData.R).
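The rescaling step described above can be sketched in R as follows. The data frame and its column names are made up for illustration and may not match those in finalversionALL36.xlsx.

```r
# Hypothetical trial records; column names are illustrative only.
trials <- data.frame(
  participant = c(1, 1, 2, 2),
  C           = c(0, 90, 165, 255),  # greyness on the 0-255 RGB scale
  accept      = c(1, 1, 0, 0)        # 1 = accepted, 0 = rejected
)

# Rescale greyness to perceived appearance x on the unit interval.
trials$x <- trials$C / 255

# Values of x can be mapped back to the RGB scale by multiplying by 255.
C_back <- trials$x * 255
```

Keeping the predictor on a 0 to 1 scale tends to make priors easier to specify and can help MCMC sampling behave well.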