Optimizing the mnemonic similarity task for efficient, widespread use
Data files (Sep 05, 2023 version, 146.31 KB)

- Master_Summary_Data.ods (142.07 KB)
- README.md (4.24 KB)
Abstract
Introduction: The Mnemonic Similarity Task (MST) has become a popular test of memory and, in particular, of hippocampal function. It has been heavily used in research settings and is currently included as an alternate outcome measure in a number of clinical trials. However, as it typically requires ~15 min to administer and benefits substantially from an experienced test administrator to ensure the instructions are well-understood, its use in trials and in other settings is somewhat restricted. Several different variants of the MST are in common use that alter the task format (study-test vs. continuous) and the response prompt given to participants (old/similar/new vs. old/new).
Methods: In eight online experiments, we sought to address three main goals: (1) To determine whether a robust version of the task could be created that could be conducted in half the traditional time; (2) To determine whether the test format or response prompt choice significantly impacted the MST’s results; and (3) To determine how robust the MST is to repeat testing. In Experiments 1–7, participants received both the traditional and alternate forms of the MST to determine how well the alternate version captured the traditional task’s performance. In Experiment 8, participants were given the MST four times over approximately 4 weeks.
Results: In Experiments 1–7, we found that the test format had no effect on the reliability of the MST, but that shifting to the two-choice response format significantly reduced its ability to reflect the traditional MST’s score. We also found that the full running time could be cut in half or less without an appreciable reduction in reliability. We confirmed the efficacy of this reduced task in older adults as well. Here, and in Experiment 8, we found that while there often are no effects of repeat-testing, small effects are possible, but appear limited to the initial testing session.
Discussion: The optimized version of the task developed here (oMST) is freely available for web-based experiment delivery and provides an accurate estimate of the same memory ability as the classic MST in less than half the time.
Subject-level data from Experiments 1-7 are provided here. Where “n/a” is present, it means the metric cannot be computed. For example, the 3-choice (old / similar / new) experiments can have legitimate hit-rates for repeated targets and both false-alarm and correct rejection rates for novel foils. They cannot have legitimate hit, false-alarm or correct rejection rates for the similar lures as the correct response for these is “similar” rather than “old” or “new”. In the 2-choice versions of the task (old / new), these rates are legitimate and therefore provided. Likewise, the number of “similar” responses (`nsim`) is coded as “n/a” in the 2-choice versions as “similar” isn’t even an option.
Description of the data and file structure
Data are provided as an open-source ODS spreadsheet. Each tab represents data from one experiment and each row represents an individual participant (`sid`). Data from the first test are indicated by `_t1` columns and data from the second test by `_t2` columns.

- `stimset1`: the stimulus set from the MST used (which group of images)
- `rec`: the object recognition metric (aka corrected recognition, i.e., p(“old”|target) - p(“old”|foil))
- `psep`: the pattern separation or lure discrimination metric (typically p(“similar”|lure) - p(“similar”|foil))
- `hr`: hit rate
- `crf`: correct rejection to foil rate
- `crl`: correct rejection to lure rate
- `faf`: false alarm to foil rate
- `fal`: false alarm to lure rate
- `nsim`: the total number of “similar” responses made, regardless of trial type
- `task_order`: which task (baseline MST-OSN or variant) came first and which came second
- `base_psep` and `base_rec`: `psep` and `rec` pulled out for just the baseline, standard MST task
- `comp_psep` and `comp_rec`: the same scores for the comparison (alternate) task being tested
- `rec_valid`: using the recognition memory tasks’ values, is this a valid participant? (1 = yes, 0 = no)
- `rout_outlier`: following robust linear regression of the two lure discrimination metrics against each other, was this participant flagged as an outlier?
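For quick orientation, a minimal loading sketch in Python is shown below (assuming the pandas and odfpy packages are installed; the sheet name "Exp1" is a placeholder for whatever tab names appear in the workbook):

```python
import pandas as pd

# Each tab holds one experiment; "n/a" marks metrics that cannot be computed
# for that variant (e.g., lure hit rates in the 3-choice tasks).
df = pd.read_excel("Master_Summary_Data.ods", sheet_name="Exp1",
                   engine="odf", na_values=["n/a"])

# Keep participants who passed the recognition-based validity check and were
# not flagged as robust-regression outliers.
valid = df[(df["rec_valid"] == 1) & (df["rout_outlier"] == 0)]

# Correlate the baseline MST lure discrimination score with the variant's.
print(valid["base_psep"].corr(valid["comp_psep"]))
```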
Sharing/Access information
Data are also available on the StarkLabUCI GitHub (https://github.com/StarkLabUCI/oMST-Data).
The final task created, the oMST, is available in its own repository.
Overview:
Briefly, each sheet contains data from one of the experiments. Each experiment consisted of giving participants both the classic MST and a variant of the MST (order was counter-balanced).
Published Methods
The MST has been extensively described elsewhere (Stark et al., 2013, 2019). Briefly, when used in the study-test format, color pictures of objects appear on the screen (2.0s, ≥0.5s ISI) during an initial incidental encoding phase in which participants are asked to classify each object as belonging indoors or outdoors. For example, one might judge a picnic basket as an outdoor item and a rubber duck as an indoor item. The traditional full-length version uses 128 study items and is partially self-paced (the image disappears after the 2s duration, but the prompts remain on screen until the participant makes a response, with a minimum total trial length of 2.5s). Immediately following the encoding phase, a test phase is given with equal numbers of novel foils that are unrelated to any study items, true repetitions of study items, and similar lure items. Lure items have the same name as study items (names are not shown to the subject) but can vary in their similarity to the originally studied items and can be altered along a range of dimensions or be different exemplars (Stark et al., 2013, 2019). The continuous format is similar but uses a single phase, separating first and second (or lure) presentations by, typically, 4-100 trials. In either format, participants’ memory can be probed with either a three-choice old / similar / new (OSN) response prompt (the ideal responses for repeat, lure, and foil trials, respectively) or a two-choice old / new (ON) prompt (here, “new” would apply to both novel foils and similar lures). Both test structures and both choice formats have been used extensively in prior work (see Stark et al., 2019 for review). The MST has six independent sets of 192 image pairs, with each pair having a particular degree of “mnemonic similarity” derived from testing a large number of individuals, assessing the actual false alarm rate across individuals for each image pair, and binning the pairs into five lure-bin difficulty levels (Lacy et al., 2011).
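As an illustration only (not the actual jsPsych implementation), the trial structure of the full-length study-test format can be sketched as follows; the item counts come from the description above, while the variable names are invented for the example:

```python
import random

stimuli = list(range(192))      # stand-in IDs for one 192-pair stimulus set
random.shuffle(stimuli)

repeats = stimuli[:64]          # studied, then repeated identically at test
lured   = stimuli[64:128]       # studied, then probed with a similar lure at test
foils   = stimuli[128:192]      # novel items appearing only at test

study_list = repeats + lured    # 128 incidental-encoding (indoor/outdoor) trials
random.shuffle(study_list)

test_list = ([("repeat", s) for s in repeats] +
             [("lure",   s) for s in lured]   +
             [("foil",   s) for s in foils])
random.shuffle(test_list)       # 192 test trials with the old/similar/new prompt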
The primary outcome measure of interest reflects a participant’s ability to discriminate similar lure items as being unique images rather than repetitions of the studied items. In the OSN tasks, this measure is dubbed the LDI (Lure Discrimination Index) and equates to the probability of responding “similar” to the similar lure items minus the probability of responding “similar” to the novel foil items. This difference score helps to adjust for response biases. A parallel secondary outcome measure, dubbed REC, is a traditional “corrected recognition” score, equating to the probability of responding “old” to repeated items minus the probability of responding “old” to novel items. In the ON tasks, a signal detection theory framework is adopted (Stanislaw and Todorov, 1999). Here, we create two different d’ measures to index discriminability, paralleling prior work (Stark et al., 2015). Paralleling the LDI, we compute d’(TL) to reflect how well participants can discriminate a true repetition from a similar lure (“old” responses to repetitions form the hit rate and “old” responses to lures form the false alarm rate). Paralleling the REC, we compute d’(TF) to reflect how well participants can discriminate a true repetition from a novel foil (the same hit rate is used, but “old” responses to novel foils become the false alarm rate).
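In code, these outcome measures reduce to simple differences of conditional response rates; the sketch below (with made-up rates) is purely illustrative:

```python
from scipy.stats import norm

# OSN (old / similar / new) measures
def ldi(p_sim_lure, p_sim_foil):
    """Lure Discrimination Index: p("similar"|lure) - p("similar"|foil)."""
    return p_sim_lure - p_sim_foil

def rec(p_old_repeat, p_old_foil):
    """Corrected recognition: p("old"|repeat) - p("old"|foil)."""
    return p_old_repeat - p_old_foil

# ON (old / new) signal-detection measures; rates should be kept off 0 and 1
def dprime(hit_rate, fa_rate):
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

d_tl = dprime(0.85, 0.40)   # d'(TL): repetitions vs. similar lures
d_tf = dprime(0.85, 0.10)   # d'(TF): repetitions vs. novel foils
```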
The traditional version of the MST has been freely available in several formats (stand-alone and online) on GitHub (https://github.com/celstark/MST). In prior work (Stark et al., 2021), we created an online version of the MST using the open-source jsPsych library for web-based deployment (de Leeuw, 2015) and the open-source JATOS package (Lange et al., 2015) to provide a reliable means of securely administering test sessions on the web and managing the data. We utilized the same structure here. All code is available at https://github.com/StarkLabUCI/oMST-Data.
Experiments 1-7
In each of Experiments 1-7, all participants received a full-length traditional MST (Study-test, OSN prompt, 128 study trials, 192 test trials, referred to as the “baseline MST”) and a modified version of the MST back-to-back in one session. Which test appeared first was counterbalanced across participants. For lure items, we used Set 1 and Set 2 from the MST, counterbalancing which was assigned to each test variant. Our primary outcome measure in these experiments was the correlation between the baseline MST’s LDI and the analogous measure in the modified version of the MST from the same subject. We used linear regression with automatic outlier detection (Prism 9.4.1, ROUT Q=1%) to help address non-compliance with testing.
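ROUT is a Prism-specific procedure. As a rough, non-equivalent stand-in, one could flag low-weight points from a robust (Huber) linear regression of the variant score on the baseline score, as in this sketch (the 0.5 weight cutoff is an arbitrary choice for illustration):

```python
import numpy as np
import statsmodels.api as sm

def flag_outliers(base_ldi, comp_ldi, weight_cutoff=0.5):
    """Return a boolean array; True marks candidate outliers."""
    X = sm.add_constant(np.asarray(base_ldi, dtype=float))
    fit = sm.RLM(np.asarray(comp_ldi, dtype=float), X,
                 M=sm.robust.norms.HuberT()).fit()
    return fit.weights < weight_cutoff
```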
Prior to each phase came a set of instructions. Experiments 1-5 used the same video-based instructions used in many previous studies and made available on our lab website and on GitHub (https://github.com/celstark/MST). Experiments 6-7 shifted to a short series of guided practice trials in which participants are first told what to press for several stimuli before trying the task on their own for several more. Specifically, the study-test variant had 4 practice study trials and 6 practice test trials (3 guided and 3 unguided), and the continuous version had 9 practice trials (5 guided and 4 unguided). Correct answers are forced on all trials and differences between studied items and lures are shown on all lure trials to clarify how participants should treat similar lure items.
Experiments 1-6 had participants recruited from UCI’s Human Subjects Lab Pool consisting of undergraduate students who participated for course credit. Participants were anonymous and were not screened. Experiment 7’s participants were older adults (mean age = 74 ± 8.3 years), recruited from several sources: an existing lab database, UCI’s Alzheimer’s Disease Research Center, and UCI’s Consent to Contact database. Cognitively normal, English-fluent older adults without prior history of neurological disorders or injury were included in the study.
Prior work in our lab with repeat testing on the standard MST has shown correlation coefficients ranging from 0.48 to 0.8 when comparing a full-length MST to variants shortened to 25-50% of the original length (Stark et al., 2015). Resolving a 0.48 correlation (α=0.05, β=0.2) requires a sample size of 32, which formed the minimum sample size used in each experiment. Participants were recruited in waves, however, and because testing was done wholly online, data loss from poor engagement was anticipated. We recruited until, following analysis of a batch, at least 32 valid samples were present (see below), leading to sufficient, albeit unequal, numbers of participants in each experiment.
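That sample-size figure can be reproduced with the standard Fisher-z approximation for detecting a correlation of 0.48 at α = 0.05 (two-tailed) with power = 0.80:

```python
import math
from scipy.stats import norm

r, alpha, power = 0.48, 0.05, 0.80
z_r = 0.5 * math.log((1 + r) / (1 - r))                        # Fisher z of r
n = ((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / z_r) ** 2 + 3
print(math.ceil(n))                                             # -> 32
```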
Experiment 1 used two baseline MST tasks. Experiment 2’s variant was a continuous task with 256 trials using the OSN response prompt. 128 of the trials were first presentations, 64 were similar lures, and 64 were repetitions. For both repetitions and lures, the gaps between first and subsequent presentations included 32 trials with gaps of 4-11 and 32 trials with gaps of 20-99. Experiment 3’s variant was a full-length study-test, but shifting to the ON prompt. Experiment 4 combined both of these for a continuous ON test. Experiment 5a-b used the OSN prompt to test shortened versions of study-test (20 repetitions, 44 lures, and 20 novel foils) and continuous (64 1st presentations, 20 repetitions, and 44 lures) tasks. Experiment 6a-b replicated 5a-b but shifted away from our traditional video-based instructions to a guided practice task. Finally, Experiment 7 replicated 6b in healthy older adults.
We used the MST’s measure of traditional object recognition to filter out participants who were not actively engaged in the study. In the OSN experiments, a minimum REC score of 0.5 was required. As this is a difference score (the probability of responding “old” to repetitions minus the probability of responding “old” to novel foils), chance would be 0, but even older adults with Mild Cognitive Impairment typically score ~0.6 on this measure (Stark et al., 2013), making 0.5 a reasonable threshold for young, healthy adults. In the ON experiments, the analogous metric is the d’(TF) score, with a threshold of 1.5. Note that neither of these measures is our primary outcome measure.
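Applied to the spreadsheet columns described earlier, this engagement screen amounts to the check sketched below (illustrative only; hit or false-alarm rates of exactly 0 or 1 would need adjustment before the z-transform):

```python
from scipy.stats import norm

def passes_screen(hr, faf, three_choice=True):
    """REC >= 0.5 for 3-choice (OSN) tasks; d'(TF) >= 1.5 for 2-choice (ON) tasks."""
    if three_choice:
        return (hr - faf) >= 0.5                   # corrected recognition (REC)
    return (norm.ppf(hr) - norm.ppf(faf)) >= 1.5   # d'(TF)
```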