Data from: Constructing concepts without feedback: An empirical investigation of how relational information affects multidimensional concept completion behavior in an unsupervised task
Data files
Aug 02, 2025 version files (3.29 MB)
- GRIT_NPE_ModelPredictions.xlsx (13.50 KB)
- Main_Experiment_XOR_C3D_Data.xlsx (1.86 MB)
- Pilot_Experiment_1D_XOR_C3D_Data.xlsx (1.40 MB)
- README.md (14.05 KB)
Abstract
The ability of humans to intentionally learn, without feedback, unidimensional stimulus relations in categorization tasks has been empirically established over the past two decades. However, whether observers can learn more complex multidimensional stimulus relations across these unsupervised tasks has not yet been determined. We demonstrate across an unsupervised concept completion experiment that the failure to observe multidimensional learning in previous experiments may be attributable to factors such as increased stimulus or task complexity. We posit that concept completion is related to category learning in that it reveals the underlying tendencies that are associated with some categories being easier to learn than others. In our experiments, we found observers readily learned to complete a two-dimensional exclusive-or concept, evidenced by an increase in object selection as the task progressed with a decrease in choice response times. We also found that observers readily learned to complete, as evidenced by similar patterns in object selection and response time behavior, a more complex three-dimensional stimulus relation that has empirically been associated with large amounts of categorization errors in related supervised classification tasks. Accordingly, we tested two existing formal models to determine their ability to account for our observations: namely, the Simplicity Model and the Generalized Representational Information Theory (GRIT) basic measure. We show how relational information processing as expounded in GRIT accounts for the observed completion behavior. Overall, our findings show how people gravitate, in a gradual and composite fashion, towards minimizing the perceived complexity of categories as much as possible.
Dataset DOI: 10.5061/dryad.ht76hdrtk
Description of the data and file structure
This dataset contains the raw and processed data for the pilot experiment and the main experiment reported in Doan and Vigo (In Press, PLOS One), which assessed whether individuals display an increase in multidimensional concept completion behavior across trials of the concept completion task used by the authors. Data cover six repeated-measures conditions of concept completions for the pilot experiment (1D-XOR-C3D, 1D-C3D-XOR, XOR-1D-C3D, XOR-C3D-1D, C3D-1D-XOR, C3D-XOR-1D) and two between-subjects conditions of concept completions for the main experiment (XOR-XOR, C3D-C3D).
For each experiment, we include data associated with the series of objects seen by participants (partial 1D, XOR, or C3D concept) and the set of five objects from which they selected to complete the partial concept. For each trial of each experiment, we include the object that each participant selected from the set of five and the accompanying response time.
Statistical analyses on the processed pilot data revealed statistical increases in XOR and C3D concept completions and statistical decreases in response times to make these decisions for participants' first task in the pilot experiment. The main experiment sought to replicate and extend these results with substantially more statistical power. Regression analyses on the processed data for the main experiment revealed statistically significant increases in XOR and C3D concept completions and statistically significant decreases in response times to make these decisions. The results indicated a logarithmic increase in XOR completion behavior from task 1 to task 2 and a linear increase in C3D completion behavior from task 1 to task 2. The results also indicated statistically significant logarithmic decreases in response times to make completion decisions across tasks 1 and 2 for both the XOR and C3D concepts.
Files and variables
File: GRIT_NPE_ModelPredictions.xlsx
Description: This file contains formal model predictions from Generalized Representational Information Theory for the three concept completion conditions (1D, XOR, C3D). The formulas used to calculate the model predictions in column S are given in the cells of columns M and O-S.
Variables
- Fam = Describes the dimensional makeup of the categorical stimulus along with the number of positive examples of the category (3[3] = 3 dimensions and 3 positive examples).
- Type = Logical configuration of the positive examples (binary dimensions = Boolean). Type 1 = 1D partial, Type 2 = XOR partial, Type 3 = C3D partial.
- Objects = Binary notation of the object stimuli belonging to the categorical stimulus (3 object stimuli/positive examples) or of the categorical stimulus that resulted after one of the other five object stimuli was added (completion decision; 4 object stimuli).
- Logical Manifold = Degree of dimensional diagnosticity for each of the stimulus dimensions (each is a structural kernel) as determined via the invariance-detection process as described in Generalized Invariance Structure Theory (GIST; Vigo, 2013, 2015).
- D = Number of stimulus dimensions composing the object stimuli.
- P = Number of object stimuli as positive examples of the categorical stimulus.
- D1-D3 = Numerator of each logical manifold structural kernel.
- Alpha1-Alpha3 = Sensitivity parameters (not estimated or used for the current study).
- k = Scaling parameter reflecting degree of discriminability between different categorical stimuli for observers (nonparametric value = 2/D; not estimated for the current study).
- Disc. Param = Metric selection. The value is 1 if City-block metric (Integral dimension stimuli; Vigo, Doan, & Zhao, 2022) and the value is 2 if Euclidean metric (Separable dimension stimuli; Vigo, Doan, & Zhao, 2022).
- Phi = Degree of categorical invariance of the categorical stimulus as calculated using GIST (Vigo, 2013, 2015).
- Phi^2 = The squared value of the degree of categorical invariance of the categorical stimulus as calculated using GIST (Vigo, 2013, 2015).
- Lambda = Degree of structural equilibrium of the categorical stimulus as calculated using GIST (Vigo, 2013, 2015).
- GISTM-NPE (Lambda) = The non-parametric Generalized Invariance Structure Theory Model using the Lambda variant for structural equilibrium.
- GRIT-NPE (Lambda) = The non-parametric Generalized Representational Information Theory Model using the Lambda variant for structural equilibrium. This model is a rate-of-change measure of conceptual complexity. For the current task, the model predicts that observers will add to the 3-object category (partial concept) the object that minimizes the complexity of the resulting category (the bolded values; see Doan & Vigo, 2016, 2023, in press for more information).
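To illustrate the Objects, D, and k entries above, the following minimal Python sketch enumerates the binary-coded objects over three dimensions, lists the five objects that remain once a 3-object partial concept is fixed, and computes the nonparametric value k = 2/D. The string encoding of objects (e.g., "010"), all function names, and the caller-supplied `complexity` function are assumptions made for illustration. The sketch does not implement the GIST or GRIT formulas, which are provided in the spreadsheet and the cited papers.

```python
from itertools import product


def all_objects(d=3):
    """All binary-coded objects over d dimensions, e.g. '000' through '111'."""
    return ["".join(bits) for bits in product("01", repeat=d)]


def completion_candidates(partial, d=3):
    """Objects not already in the partial concept (five when d=3 and len(partial)=3)."""
    return [obj for obj in all_objects(d) if obj not in partial]


def nonparametric_k(d):
    """Nonparametric scaling value k = 2/D, as stated in the variable list."""
    return 2 / d


def pick_min_complexity(partial, complexity, d=3):
    """Return the candidate whose addition gives the lowest complexity score.

    `complexity` is any caller-supplied function of a 4-object category.
    It is a placeholder for illustration, NOT the GRIT measure.
    """
    candidates = completion_candidates(partial, d)
    return min(candidates, key=lambda obj: complexity(list(partial) + [obj]))


# Hypothetical partial concept, used only to show the call pattern.
partial = ["000", "001", "010"]
print(completion_candidates(partial))
print(nonparametric_k(3))
```

Passing the complexity measure in as a function keeps the selection rule (choose the minimum) separate from the model that supplies the scores.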
File: Main_Experiment_XOR_C3D_Data.xlsx
Description: This file contains the raw and processed concept completion data (selections, response times) for the main experiment reported in Doan and Vigo (In Press, PLOS One). Two sheets in the Excel file contain the raw data: MainExpt_XOR (raw data) and MainExpt_C3D (raw data). Cells highlighted in red in these two sheets mark participants who were excluded from the main analyses in the paper because they did not select the XOR or C3D relation most often. Cells highlighted in yellow mark participants who did not select the XOR or C3D relation most often in at least one of the two tasks but who made more XOR or C3D selections overall than alternative possibilities. The bolded values in these sheets represent the relation that was selected most frequently for each participant and task. Three sheets contain the processed data: Data Set_Prop (Maj), Data Set_RT_2SD (Maj), and Data Set_RT (Maj). The bolded red values in the rightmost sheet (Data Set_RT (Maj)) are average response times that were more than two standard deviations above the mean for that block of trials and were thus excluded from the main analyses (Outliers; see sheet Data Set_RT_2SD (Maj)). Rows 52-54 contain the Excel formulas for calculating the mean RT, the standard deviation of the RT, and the two-standard-deviation upper limit used to determine the bolded red values (Outliers).
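The two-standard-deviation rule described above can be sketched in Python as follows. This is an illustrative approximation, not the authors' spreadsheet formulas: the function names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the description does not say which standard deviation the Excel formulas in rows 52-54 compute.

```python
from statistics import mean, stdev


def upper_limit_2sd(block_rts):
    """Mean plus two standard deviations for one block of average RTs.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return mean(block_rts) + 2 * stdev(block_rts)


def flag_outliers(block_rts):
    """True for each average RT that exceeds the two-SD upper limit."""
    limit = upper_limit_2sd(block_rts)
    return [rt > limit for rt in block_rts]


# Hypothetical per-participant average RTs for a single block (arbitrary units).
example_rts = [1.0] * 9 + [10.0]
print(flag_outliers(example_rts))
```

Only the upper limit is checked, matching the description of excluding response times more than two standard deviations above the block mean.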
Variables
- PN = Anonymized Participant Number
- Task = First or second task for the participant (1, 2)
- Stimuli = Type of stimulus shown for that Task number for that participant (Clock, Tshirt)
- TrialNumber = As labeled (one series of 3 clocks + modification decision of 5 clocks thereafter)
- Type = Boolean structure type (2 = 3[3] - II = XOR; 3 = 3[3] - III = C-3D)
- Instance = The specific instance of the Boolean structure type (3 objects presented in the series)
- Inst_align_t# = The particular object belonging to the Boolean structure type instance (see above; limited to 3 columns). "Align" refers to the fact that every object in a given column occupies the same structural role.
- Inst_align_b# = The particular object that could be added to the Boolean structure type instance (limited to 5 columns for the 5 objects that could be added on each trial). "Align" refers to the fact that every object in a given column occupies the same structural role.
- Inst_shuff_t# = A randomized order of the three objects belonging to the Boolean structure type instance (e.g., compare "Inst_align_t#" to these columns).
- Inst_shuff_b# = The randomized order of the five objects possible to add to the Boolean structure type instance (e.g., compare "Inst_align_b#" to these columns). This was the order shown left to right to participants.
- Inst_shuff_t#_o# = The randomized order of the three objects belonging to the Boolean structure type instance (e.g., compare "Inst_align_t#" to these columns). Shown to participants in this order (left to right on screen).
- StartTime = The internal timestamp clocked by the computer at the start of the Trial.
- RTime = The amount of time elapsed between the onset of the Trial (StartTime) and the mouse click on one of the 5 objects enclosed in the white box (the program advanced only if the mouse clicked on one of the five objects).
- Choice = The object selected by the participant out of the 5 objects enclosed in the white box (e.g., compare this object to the objects in the "Inst_align_b#" columns).
- xCoord = The horizontal pixel location of the mouse click on the computer screen.
- yCoord = The vertical pixel location of the mouse click on the computer screen.
- Choice_Obj = Excel function determining whether the chosen object (Choice) corresponds to aligned bottom object 1, 2, 3, 4, or 5 (see "Inst_align_b#").
- PropCorr_#_XOR = Proportion of completion decisions consistent with selecting the XOR object from the set of 5 for that block of 4 trials (PropCorr_4_XOR = trials 1-4, PropCorr_8_XOR = trials 5-8, etc.).
- PropCorr_#_C3D = Proportion of completion decisions consistent with selecting the C-3D object from the set of 5 for that block of 4 trials (PropCorr_4_C3D = trials 1-4, PropCorr_8_C3D = trials 5-8, etc.).
- RT_#_XOR = Average response time for making a decision from the set of 5 for that block of 4 trials (RT_4_XOR = trials 1-4, RT_8_XOR = trials 5-8, etc.).
- RT_#_C3D = Average response time for making a decision from the set of 5 for that block of 4 trials (RT_4_C3D = trials 1-4, RT_8_C3D = trials 5-8, etc.).
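The derived variables in this list (Choice_Obj, PropCorr_#, RT_#) can be approximated with the short Python sketch below. It is a hedged illustration rather than a reproduction of the Excel functions in the file: the function names are hypothetical, objects are assumed to be comparable values such as strings, and `target_position` (the aligned bottom column holding the XOR- or C3D-consistent object) is an assumed parameter, because the list above does not state which column that is.

```python
def choice_obj(choice, aligned_bottom):
    """1-based position of `choice` among the five aligned bottom objects."""
    return aligned_bottom.index(choice) + 1


def block_proportions(choice_positions, target_position, block_size=4):
    """Proportion of trials per block whose Choice_Obj equals target_position.

    Keys are the last trial number of each block (4, 8, ...), mirroring
    names such as PropCorr_4_XOR (trials 1-4) and PropCorr_8_XOR (trials 5-8).
    """
    result = {}
    for start in range(0, len(choice_positions), block_size):
        block = choice_positions[start:start + block_size]
        hits = sum(1 for pos in block if pos == target_position)
        result[start + len(block)] = hits / len(block)
    return result


def block_mean_rt(rts, block_size=4):
    """Average response time per block, keyed like RT_4_XOR, RT_8_XOR, etc."""
    result = {}
    for start in range(0, len(rts), block_size):
        block = rts[start:start + block_size]
        result[start + len(block)] = sum(block) / len(block)
    return result
```

Keying each block by its final trial number keeps the output aligned with the column names used in the processed sheets.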
File: Pilot_Experiment_1D_XOR_C3D_Data.xlsx
Description: This file contains the raw and processed concept completion data (selections, response times) for the pilot experiment reported in Doan and Vigo (In Press, PLOS One). There are three sheets in the Excel file that contain the raw data: Type_3_3_1 (raw_data), Type_3_3_2 (raw_data), Type_3_3_3 (raw_data). The bolded values in these sheets represent the relation that was selected most frequently for each participant and task. There are two sheets in the Excel file that contain the processed data in full: Full Data Set_Prop, Full Data Set_RT. There are two sheets in the Excel file that contain the processed data after removing participants who did not predominantly engage in 1D, XOR, or C3D concept completion behavior: Data Set_1D_XOR_C3D_Prop, Data Set_1D_XOR_C3D_RT.
Variables
- PN = Anonymized Participant Number
- TrialNumber = As labeled (one series of 3 clocks + modification decision of 5 clocks thereafter)
- Type = Boolean structure type (1 = 3[3] - I = 1D; 2 = 3[3] - II = XOR; 3 = 3[3] - III = C-3D)
- Instance = The specific instance of the Boolean structure type (3 objects presented in the series)
- Inst_align_t# = The particular object belonging to the Boolean structure type instance (see above; limited to 3 columns). "Align" refers to the fact that every object in a given column occupies the same structural role.
- Inst_align_b# = The particular object that could be added to the Boolean structure type instance (limited to 5 columns for the 5 objects that could be added on each trial). "Align" refers to the fact that every object in a given column occupies the same structural role.
- Inst_shuff_t# = A randomized order of the three objects belonging to the Boolean structure type instance (e.g., compare "Inst_align_t#" to these columns).
- Inst_shuff_b# = The randomized order of the five objects possible to add to the Boolean structure type instance (e.g., compare "Inst_align_b#" to these columns). This was the order shown left to right to participants.
- Inst_shuff_t#_o# = The randomized order of the three objects belonging to the Boolean structure type instance (e.g., compare "Inst_align_t#" to these columns). Shown to participants in this order (left to right on screen).
- StartTime = The internal timestamp clocked by the computer at the start of the Trial.
- RTime = The amount of time elapsed between the onset of the Trial (StartTime) and the mouse click on one of the 5 objects enclosed in the white box (the program advanced only if the mouse clicked on one of the five objects).
- Choice = The object selected by the participant out of the 5 objects enclosed in the white box (e.g., compare this object to the objects in the "Inst_align_b#" columns).
- xCoord = The horizontal pixel location of the mouse click on the computer screen.
- yCoord = The vertical pixel location of the mouse click on the computer screen.
- Choice_Obj = Excel function determining whether the chosen object (Choice) corresponds to aligned bottom object 1, 2, 3, 4, or 5 (see "Inst_align_b#").
- Order = The value for the between-subjects variable of presentation order. (1 = 1D, XOR, C3D; 2 = 1D, C3D, XOR; 3 = XOR, 1D, C3D; 4 = XOR, C3D, 1D; 5 = C3D, 1D, XOR; 6 = C3D, XOR, 1D).
- 1st_Task = The specific strategy associated with the first task for that participant (1D = Order 1 or 2; XOR = Order 3 or 4; C3D = Order 5 or 6).
- 1D_Block# = Proportion of completion decisions consistent with 1D sorting strategy across the four trials in this block.
- XOR_Block# = Proportion of completion decisions consistent with XOR sorting strategy across the four trials in this block.
- C3D_Block# = Proportion of completion decisions consistent with C3D sorting strategy across the four trials in this block.
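The relationship between Order and 1st_Task given in the list above can be written out as a small Python lookup. The dictionary and function names are hypothetical; the order codes and task sequences are taken directly from the Order and 1st_Task definitions.

```python
# Presentation order code -> sequence of tasks, as defined for the Order variable.
ORDER_SEQUENCES = {
    1: ("1D", "XOR", "C3D"),
    2: ("1D", "C3D", "XOR"),
    3: ("XOR", "1D", "C3D"),
    4: ("XOR", "C3D", "1D"),
    5: ("C3D", "1D", "XOR"),
    6: ("C3D", "XOR", "1D"),
}


def first_task(order):
    """Return the 1st_Task label (1D, XOR, or C3D) for an Order code 1-6."""
    return ORDER_SEQUENCES[order][0]


# Print the first task for every order code.
for code in sorted(ORDER_SEQUENCES):
    print(code, first_task(code))
```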
Code/software
JASP version 0.19.3 was used for statistical analyses and can be used to view the data.
Human subjects data
Participants provided written and verbal informed consent. Each participant was assigned a unique number, which was neither attached to nor stored with their name. No document links these numbers to participants' names, so the data cannot be traced to any individual participant.