Data for: Inter-rater reliability of risk of bias tools for non-randomized studies

Name: Data for: Inter-rater reliability of risk of bias tools for non-randomized studies
Creator: Isabel Kalaycioglu

Kalaycioglu, Isabel, University of Montreal, https://orcid.org/0000-0002-7116-151X

Isabel.kalaycioglu@umontreal.ca

Published Apr 24, 2024 on Dryad. https://doi.org/10.5061/dryad.5qfttdz8p

Cite this dataset

Kalaycioglu, Isabel (2024). Data for: Inter-rater reliability of risk of bias tools for non-randomized studies [Dataset]. Dryad. https://doi.org/10.5061/dryad.5qfttdz8p

Abstract

PURPOSE: Currently, there is limited knowledge about the reliability of risk of bias (ROB) tools for assessing internal validity in systematic reviews of exposure and frequency studies. We aimed to identify and then compare the inter-rater reliability (IRR) of six commonly used tools for frequency (Loney scale, Gyorkos checklist, American Academy of Neurology [AAN] tool) and exposure (Newcastle-Ottawa scale, SIGN50 checklist, AAN tool) studies.

METHODS: Six raters independently assessed the ROB of 30 frequency and 30 exposure studies using the 3 respective ROB tools. Articles were rated on a 3-level summary measure of ROB (low, intermediate, or high). We calculated an intraclass correlation coefficient (ICC) for each tool and category of ROB tool. We compared the IRR between ROB tools and tool type by inspection of overlapping ICC 95% CIs and by comparing their coefficients after transformation to Fisher Z values. We assessed criterion validity of the AAN ROB tools by calculating an ICC for each rater in comparison with the original ratings from the AAN.

RESULTS: All individual ROB tools had an IRR in the substantial range or higher (ICC point estimate = 0.61-0.80). The IRR was almost perfect (ICC point estimate > 0.80) for the AAN frequency tool and the SIGN50 checklist. All tools were comparable in IRR, except for the AAN frequency tool which had a significantly higher ICC than the Gyorkos checklist (p=0.021) and trended towards a higher ICC when compared to the Loney scale (p=0.085). When examined by category of ROB tool, scales and checklists had a substantial IRR, whereas the AAN tools had an almost perfect IRR. For the criterion validity of the AAN ROB tools, the average agreement between our raters and the original AAN ratings was moderate.

CONCLUSION: All tools had substantial IRR except for the AAN frequency tool and the SIGN50 checklist, which both had an almost perfect IRR. The AAN ROB tools were the only category of ROB tool to demonstrate an almost perfect IRR. This category of ROB tool had fewer and simpler criteria. Overall, parsimonious tools with clear instructions, such as those from the AAN, may provide more reliable ROB assessments.
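As an illustration of the statistic named in the abstract, the following is a minimal sketch of an intraclass correlation coefficient computed from an articles-by-raters table. The abstract does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption, as is the numeric coding of ratings (for example low = 1, intermediate = 2, high = 3). The function name `icc_2_1` is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per article, one numeric score per rater.
    The choice of ICC form is an assumption, not the authors' stated method.
    """
    n = len(ratings)       # number of articles
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for articles (rows), raters (columns) and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

The function expects a complete table (every rater scored every article) with at least two articles and two raters, and with some variation in the scores.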

README: GENERAL INFORMATION

1. Title of Dataset: Inter-rater Reliability of Risk of Bias Tools for Non-Randomized Studies

2. Author Information

a. Principal Investigator Contact Information

Name: Mark Keezer

Institution: Université de Montréal

Address: Centre Hospitalier de l'Université de Montréal, Pavillon R R04-700 1000 Saint-Denis St., Montreal, QC, Canada, H2X 0C1

Email: mark.keezer@umontreal.ca

b. Associate or Co-investigator Contact Information

Name: Isabel Kalaycioglu

Institution: Université de Montréal

Email: Isabel.kalaycioglu@umontreal.ca

3. Date of data collection (single date, range, approximate date): 2020-2021

4. Geographic location of data collection: Montréal, Québec, Canada

SHARING/ACCESS INFORMATION

1. Links to publications that cite or use the data:

Kalaycioglu, I., Rioux, B., Briard, J.N. et al. Inter-rater reliability of risk of bias tools for non-randomized studies. Syst Rev 12, 227 (2023). https://doi.org/10.1186/s13643-023-02389-w

2. Links to other publicly accessible locations of the data: None

3. Links/relationships to ancillary data sets: None

4. Was data derived from another source? No

5. Recommended citation for this dataset:

Kalaycioglu, I., Rioux, B., Briard, J.N. et al. (2023). Data from: Inter-rater reliability of risk of bias tools for non-randomized studies. Dryad Digital Repository. https://doi.org/10.5061/dryad.5qfttdz8p

DATA & FILE OVERVIEW

1. File List:

A) AAN__Criteria_for_Rating_Population_Screening_Studies.csv

B) AAN__Criteria_for_Rating_Prognostic_Studies.csv

C) Gyorkos_Cohort_Studies.csv

D) Gyorkos_Cross-Sectional_Studies.csv

E) Loney__Prevalence_or_Incidence_ROB.csv

F) NEWCASTLE_-_OTTAWA_QUALITY_ASSESSMENT_SCALE_COHORT_STUDIES.csv

G) NEWCASTLE-OTTAWA_QUALITY_ASSESSMENT_SCALE_CASE_CONTROL_STUDIES.csv

H) SIGN_Methodology_Checklist_3_Cohort_Studies.csv

I) SIGN_Methodology_checklist_4__Case_control_studies.csv

2. Relationship between files, if important: None

3. Additional related data collected that was not included in the current data package: None

4. Are there multiple versions of the dataset? No

#######################################################################

DESCRIPTION OF THE DATA AND FILE STRUCTURE:

Each dataset was collected through Microsoft Forms, with one form for each of the nine risk of bias tools investigated in the study. Raters completed the relevant form while evaluating an article with that tool. Each dataset records the rater's identity, the start and completion times of the submission, the rater's final rating of the article, and the answers to any additional questions specific to the risk of bias tool used.
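As a rough illustration of how the start and completion times could be used, the sketch below reads a file by column position (A, B, C as described in the variable lists) and computes the minutes each submission took. The timestamp layout in `TIME_FORMAT`, the header names in the sample, and the function name `completion_minutes` are assumptions; check them against the actual export before use.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout; adjust to match the actual export.
TIME_FORMAT = "%m/%d/%y %H:%M:%S"


def completion_minutes(csv_text, time_format=TIME_FORMAT):
    """Return (submission number, minutes taken) for each data row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    results = []
    for row in reader:
        start = datetime.strptime(row[1], time_format)  # column B
        end = datetime.strptime(row[2], time_format)    # column C
        results.append((row[0], (end - start).total_seconds() / 60))
    return results


# Hypothetical sample row; header names are illustrative only.
sample = (
    "ID,Start time,Completion time,Email,Name,Article,Rating\n"
    "1,1/5/21 10:00:00,1/5/21 10:12:30,anonymous,Rater A,41,Low\n"
)
print(completion_minutes(sample))
```

To run this on a real file, pass the text of that file (for example from `open(path, newline="").read()`) instead of `sample`.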

DATA-SPECIFIC INFORMATION FOR: AAN__Criteria_for_Rating_Population_Screening_Studies.csv

Description:

1. Number of variables: 7

2. Number of cases/rows: 192

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. For example, in row 4, Joel Neves Briard was evaluating article 41.

g. Column G: This column represents the final rating that the rater gave the article that was being evaluated. For example, in row 4, Joel Neves Briard was evaluating article 41 and assigned this article a low risk of bias class. Please note that the question asked in column G also had further prompts included on the Microsoft form to help the rater classify the article into a low, intermediate or high risk of bias class.

4. Missing data codes: E2, E3. Names are missing in both these cells.

5. Specialized formats or other abbreviations used: None
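Because each row is a single submission, an agreement analysis would first need the rows reshaped into one entry per article and rater. The sketch below shows one way to do that from columns E, F and G. The numeric coding in `LEVELS`, the exact rating text in the cells, the rule that a later submission replaces an earlier one, and the function name `rating_matrix` are all assumptions made for illustration.

```python
import csv
import io

# Assumed ordinal coding of the three-level summary rating.
LEVELS = {"low": 1, "intermediate": 2, "high": 3}


def rating_matrix(csv_text):
    """Return {article: {rater: score}} built from columns E, F and G."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    table = {}
    for row in reader:
        rater = row[4].strip()            # column E: rater name
        article = row[5].strip()          # column F: article number
        rating = row[6].strip().lower()   # column G: final rating
        if not rater or rating not in LEVELS:
            continue  # skip rows with a missing name or unrecognized rating
        # A later submission for the same rater and article overwrites
        # an earlier one (an assumption, not the authors' stated rule).
        table.setdefault(article, {})[rater] = LEVELS[rating]
    return table


# Hypothetical rows laid out like the file described above.
sample = (
    "ID,Start time,Completion time,Email,Name,Article,Rating\n"
    "1,t0,t1,anonymous,,41,Low\n"
    "2,t0,t1,anonymous,Rater A,41,Low\n"
    "3,t0,t1,anonymous,Rater B,41,Intermediate\n"
    "4,t0,t1,anonymous,Rater A,42,High\n"
)
print(rating_matrix(sample))
```

Rows with an empty name cell, such as those listed under the missing data codes, are skipped rather than guessed at.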

DATA-SPECIFIC INFORMATION FOR: AAN__Criteria_for_Rating_Prognostic_Studies.csv

Description:

1. Number of variables: 7

2. Number of cases/rows: 184

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. For example, in row 4, Lahoud Touma was evaluating article 12.

g. Column G: This column represents the final rating that the rater gave the article that was being evaluated. For example, in row 4, Lahoud Touma was evaluating article 12 and assigned this article a low risk of bias class. Please note that the question asked in column G also had further prompts included on the Microsoft form to help the rater classify the article into a low, intermediate or high risk of bias class.

4. Missing data codes: E2. Names are missing in this cell.

5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: Gyorkos_Cohort_Studies.csv

Description:

1. Number of variables: 13

2. Number of cases/rows: 92

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. For example, in row 4, Joel Neves Briard was evaluating article 42.

g. Column G: This column contains each rater’s answer to the question, “[Did the article have a] proper assembly of [their] cohort?”. This is the first question of the Gyorkos Cohort Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Joel Neves Briard was evaluating article 42 and he said that this article did have a proper assembly of their cohort.

h. Column H: This column contains each rater’s answer to the question, “[Did the article] control for confounders?”. This is the second question of the Gyorkos Cohort Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Joel Neves Briard was evaluating article 42 and he said that this article had a partial/incomplete control for confounders.

i. Column I: This column contains each rater’s answer to the question, “[Did the article have] soundness and completeness in measurement of intervention/exposure?”. This is the third question of the Gyorkos Cohort Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Joel Neves Briard was evaluating article 42 and he said that this article had soundness and completeness in measurement of intervention/exposure.

j. Column J: This column contains each rater’s answer to the question, “[Did the article have] soundness of outcome assessment?”. This is the fourth question of the Gyorkos Cohort Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Joel Neves Briard was evaluating article 42 and he said that this article had soundness of outcome assessment.

k. Column K: This column contains each rater’s answer to the question, “[Did the article have] blinding of observers?”. This is the fifth question of the Gyorkos Cohort Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Joel Neves Briard was evaluating article 42 and he said that he couldn’t tell if this article had blinding of observers.

l. Column L: This column contains each rater’s answer to the question, “[Did the article have] completeness of follow-up?”. This is the sixth and final question of the Gyorkos Cohort Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Joel Neves Briard was evaluating article 42 and he said this article did have completeness of follow-up.

m. Column M: This column represents the final class of bias assigned by each rater to the article in question. For example, in row 4, Joel Neves Briard was evaluating article 42 and he said this article had a low risk of bias.

4. Missing data codes: E2. Names are missing in this cell.

5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: Gyorkos_Cross-Sectional_Studies.csv

Description:

1. Number of variables: 11

2. Number of cases/rows: 97

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. For example, in row 4, Bastien was evaluating article 54.

g. Column G: This column contains each rater’s answer to the question, “[Did the article have a] proper selection of study population?”. This is the first question of the Gyorkos Cross-Sectional Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Bastien was evaluating article 54 and he said this article did not have a proper selection of study population.

h. Column H: This column contains each rater’s answer to the question, “[Did the article] control for confounders?”. This is the second question of the Gyorkos Cross-Sectional Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Bastien was evaluating article 54 and he said that this article did control for confounders.

i. Column I: This column contains each rater’s answer to the question, “[Did the article have] soundness and completeness in measurement of intervention/exposure?”. This is the third question of the Gyorkos Cross-Sectional Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Bastien was evaluating article 54 and he said that this article had partial/incomplete soundness and completeness in measurement of intervention/exposure.

j. Column J: This column contains each rater’s answer to the question, “[Did the article have] soundness of outcome assessment?”. This is the fourth question of the Gyorkos Cross-Sectional Checklist. The rater could answer Yes, No, Partial/Incomplete or Can’t Tell. For example, in row 4, Bastien was evaluating article 54 and he said that this article had partial/incomplete soundness of outcome assessment.

k. Column K: This column represents the final class of bias assigned by each rater to the article in question. For example, in row 4, Bastien was evaluating article 54 and he said that this article had a high risk of bias.

4. Missing data codes: E2. Names are missing in this cell.

5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: Loney__Prevalence_or_Incidence_ROB.csv

Description:

1. Number of variables: 15

2. Number of cases/rows: 198

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. For example, in row 6, Joel Neves Briard was evaluating article 41.

g. Column G: This column contains each rater’s answer to the question, “Are the study design and sampling method appropriate for the research question?”. This is the first question of the Loney Scale. The rater could answer Yes, No, or Can’t Say. For example, in row 6, Joel Neves Briard was evaluating article 41 and he said this article did have an appropriate study design and sampling method for the research question.

h. Column H: This column contains each rater’s answer to the question, “Is the sampling frame appropriate?”. This is the second question of the Loney Scale. The rater could answer Yes, No, or Can’t Say. For example, in row 6, Joel Neves Briard was evaluating article 41 and he said this article did have an appropriate sampling frame.

i. Column I: This column contains each rater’s answer to the question, “Is the sample size adequate?”. This is the third question of the Loney Scale. The rater could answer Yes, No, or Can’t Say. For example, in row 6, Joel Neves Briard was evaluating article 41 and he said this article did have an adequate sample size.

j. Column J: This column contains each rater’s answer to the question, “Are objective, suitable and standard criteria used for measurement of the health outcome?”. This is the fourth question of the Loney Scale. The rater could answer Yes, No, or Can’t Say. For example, in row 6, Joel Neves Briard was evaluating article 41 and he said this article did not have objective, suitable and standard criteria to measure the health outcome.

k. Column K: This column contains each rater’s answer to the question, “Is the health outcome measured in an unbiased fashion?”. This is the fifth question of the Loney Scale. The rater could answer Yes, No, or Can’t Say. For example, in row 6, Joel Neves Briard was evaluating article 41 and recorded that he can’t say if this article measured the health outcome in an unbiased fashion.

l. Column L: This column contains each rater’s answer to the questions, “Is the response rate adequate? Are the refusers described?”. This is the sixth question of the Loney Scale. The rater could answer Yes, No, or Can’t Say. For example, in row 6, Joel Neves Briard was evaluating article 41 and recorded that this article did have an adequate response rate and the refusers are described.

m. Column M: This column contains each rater’s answer to the question, “Are the estimates of prevalence or incidence given with confidence intervals and in detail by subgroup, if appropriate?”. This is the seventh question of the Loney Scale. The rater could answer Yes, No, or Can’t Say. For example, in row 6, Joel Neves Briard was evaluating article 41 and recorded that this article did not have estimates of prevalence or incidence given with confidence intervals.

n. Column N: This column contains each rater’s answer to the question, “Are the study subjects and the setting described in detail and similar to those of interest to you?”. This is the eighth question of the Loney Scale. The rater could answer Yes, No, or Can’t Say. For example, in row 6, Joel Neves Briard was evaluating article 41 and recorded he cannot say if this article had study subjects and setting described in detail.

o. Column O: This column represents the final class of bias assigned by each rater to the article in question. For example, in row 6, Joel Neves Briard was evaluating article 41 and recorded that this article had an intermediate risk of bias.

4. Missing data codes: E2, E3, E4, E5. Names are missing in these cells.

5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: NEWCASTLE_-_OTTAWA_QUALITY_ASSESSMENT_SCALE_COHORT_STUDIES.csv

Description:

1. Number of variables: 18

2. Number of cases/rows: 127

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. For example, in row 4, Lahoud Touma was evaluating article 12.

g. Column G: This column contains each rater’s answer when considering the representativeness of the exposed cohort in the article. This is the first criterion of the Newcastle-Ottawa Cohort Scale. The rater could select an answer from the following:

truly representative of the average _______________ (describe) in the community*

somewhat representative of the average ______________ in the community*

selected group of users eg nurses, volunteers

no description of the derivation of the cohort

The rater would then have to fill in the blank in the next question by describing the population sample the article was aiming to study. The asterisk at the end of the first two answers indicates that if the rater selects either of those answers, the article collects a point. The points later on determine the article’s rating of risk of bias.

h. Column H: This column contains each rater’s free text answer in describing the population sample studied in each article.

i. Column I: This column contains each rater’s answer in describing how the article selected the non exposed cohort. This is the second criterion of the Newcastle-Ottawa Cohort Scale. The rater could select an answer from the following:

drawn from the same community as the exposed cohort*

drawn from a different source

no description of the derivation of the non exposed cohort

j. Column J: This column contains each rater’s answer in describing how the exposure was ascertained in the article. This is the third criterion of the Newcastle-Ottawa Cohort Scale. The rater could select an answer from the following:

secure record (eg surgical records)*

structured interview*

written self report

no description

k. Column K: This column contains each rater’s assessment of whether the article demonstrated that the outcome of interest was not present at the start of the study. This is the fourth criterion of the Newcastle-Ottawa Cohort Scale. The rater could select an answer from the following:

yes*

l. Column L: This column contains each rater’s assessment of the article’s comparability of the included cohorts on the basis of design or analysis. This is the fifth criterion of the Newcastle-Ottawa Cohort Scale. The rater could select an answer from the following:

study controls for __________ (select the most important factor)*

study controls for any additional factor* (this criteria could be modified to indicate specific control for a second important factor)

not applicable

If the rater selected the first or second answer, the rater would then have to fill in the blank in the next question by describing the factor that the study controls for. If the rater selected not applicable, then the next question remained blank.

m. Column M: This column contains each rater’s free text answer in describing the factor(s) that the study controls for if they selected the first or second answer in the previous question.

n. Column N: This column contains each rater’s answer concerning the assessment of outcome in the study. This is the sixth criterion of the Newcastle-Ottawa Cohort Scale. The rater could select an answer from the following:

independent blind assessment*

record linkage*

self report

no description

o. Column O: This column contains each rater’s answer to the question “Was follow-up long enough for outcomes to occur?” This is the seventh criterion of the Newcastle-Ottawa Cohort Scale. The rater could select an answer from the following:

yes (select an adequate follow up period for outcome of interest) *

p. Column P: This column contains each rater’s assessment of the adequacy of follow up of cohorts in the article. This is the eighth criterion of the Newcastle-Ottawa Cohort Scale. The rater could select an answer from the following:

complete follow up-all subjects accounted for*

subjects lost to follow up unlikely to introduce bias-small number lost > ___% (select an adequate %) follow up, or description provided of those lost*

follow up rate <____% (select an adequate %) and no description of those lost

no statement

If the rater selected the second or third answer, they will fill in the blank percentage with the next question.

q. Column Q: This column contains each rater’s free text answer to fill in the blank percentage from the previous question. Please note that if the rater chose “no statement” or “complete follow up” in the previous question, then no percentage was provided in this column.

r. Column R: This column represents the final class of bias assigned by each rater to the article in question based on the number of “stars” or points their assessment accumulated.

4. Missing data codes: None

5. Specialized formats or other abbreviations used: None
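The point system described above (an asterisk marks an answer that earns a point) can be sketched as a simple tally. This assumes the cells in the file keep the asterisk exactly as shown in the option lists, which should be checked against the data. The function name `count_stars` is hypothetical, and the thresholds that map a star total to a low, intermediate or high class are not given here, so the sketch stops at the count.

```python
def count_stars(answers):
    """Count selected answers that carry an asterisk (one point each)."""
    return sum(1 for answer in answers if "*" in answer)


# Answer strings copied from the option lists above.
selected = [
    "somewhat representative of the average ______________ in the community*",
    "drawn from a different source",
    "secure record (eg surgical records)*",
    "yes*",
    "not applicable",
    "self report",
    "yes (select an adequate follow up period for outcome of interest) *",
    "no statement",
]
print(count_stars(selected))  # 4
```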

DATA-SPECIFIC INFORMATION FOR: NEWCASTLE-OTTAWA_QUALITY_ASSESSMENT_SCALE_CASE_CONTROL_STUDIES.csv

Description:

1. Number of variables: 16

2. Number of cases/rows: 56

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. For example, in row 4, Lahoud Touma was evaluating article 17.

g. Column G: This column contains each rater’s answer when considering if the case definition is adequate in the selected article. This is the first criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

yes, with independent validation*

yes, eg record linkage or based on self reports

no description

The asterisk at the end of the first answer indicates that if the rater selects this answer, the article collects a point. The points later on determine the article’s rating of risk of bias.

h. Column H: This column contains each rater’s answer concerning the representativeness of the cases. This is the second criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

consecutive or obviously representative series of cases *

potential for selection biases or not stated

i. Column I: This column contains each rater’s answer in describing the selection of controls. This is the third criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

community controls*

hospital controls

no description

j. Column J: This column contains each rater’s answer concerning the definition of controls. This is the fourth criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

no history of disease (endpoint) *

no description of source

k. Column K: This column contains each rater’s answer concerning the comparability of cases and controls on the basis of the design or analysis of the study. This is the fifth criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

study controls for __________ (select the most important factor)*

study controls for any additional factor* (this criteria could be modified to indicate specific control for a second important factor)

not applicable

l. Column L: This column contains each rater’s free text answer in describing the factor(s) that the study controls for if they selected the first or second answer in the previous question.

m. Column M: This column contains each rater’s answer concerning the study’s ascertainment of exposure. This is the sixth criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

secure record (eg surgical records) *

structured interview where blind to case/control status *

interview not blinded to case/control status

written self report or medical record only

no description

n. Column N: This column contains each rater’s answer to whether the same method of ascertainment was used for both cases and controls. This is the seventh criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

yes *

o. Column O: This column contains each rater’s answer in determining the non-response rate of participants within the article. This is the eighth criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

same rate for both groups *

non respondents described

rate different and no designation

p. Column P: This column represents the final class of bias assigned by each rater to the article in question based on the number of “stars” or points their assessment accumulated.

4. Missing data codes: None

5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: SIGN_Methodology_Checklist_3_Cohort_Studies.csv

Description:

1. Number of variables: 22

2. Number of cases/rows: 128

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. I.e in row 4, Lahoud Touma was evaluating article 13.

g. Column G: This column contains each rater’s answer when asked if the study addresses an appropriate and clearly focused question. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not addressed or not applicable. This is the first question in the Sign50 Cohort Checklist.

h. Column H: This column contains each rater’s answer when asked if the two groups being studied are selected from source populations that are comparable in all respects other than the factor under investigation. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not addressed or not applicable. This is the second question in the Sign50 Cohort Checklist.

i. Column I: This column contains each rater’s answer when asked if the study indicates how many of the people asked to take part did so, in each of the groups being studied. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the third question in the Sign50 Cohort Checklist.

j. Column J: This column contains each rater’s answer when asked if the likelihood that some eligible subjects might have the outcome at the time of enrolment is assessed and taken into account in the analysis. The raters could select an answer from: well covered, adequately addressed, poorly addressed. not addressed or not applicable. This is the fourth question in the Sign50 Cohort Checklist.

k. Column K: This column contains each rater’s free text answer when asked what percentage of individuals or clusters recruited into each arm of the study dropped out before the study was completed. This is the fifth question in the Sign50 Cohort Checklist.

l. Column L: This column contains each rater’s answer when asked how well the information to the question in column K is reported in the article. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the sixth question in the Sign50 Cohort Checklist.

m. Column M: This column contains each rater’s answer when if the article makes a comparison between full participants and those who were lost to follow up based on exposure status. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the seventh question in the Sign50 Cohort Checklist.

n. Column N: This column contains each rater’s answer when asked if the article’s outcomes are clearly defined. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the eight question in the Sign50 Cohort Checklist.

o. Column O: This column contains each rater’s answer when asked if the article’s assessment of outcome is made blind to exposure status. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the ninth question in the Sign50 Cohort Checklist.

p. Column P: This column contains each rater’s response to the statement “When blinding was not possible, there is some recognition that knowledge of exposure status could have influenced the assessment of outcome”. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the tenth question in the Sign50 Cohort Checklist.

q. Column Q: This column contains each rater’s response to the statement “The measure of assessment of exposure is reliable.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the 11th question in the Sign50 Cohort Checklist.

r. Column R: This column contains each rater’s response to the statement “Evidence from other sources is used to demonstrate that the method of outcome assessment is valid and reliable.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the 12th question in the Sign50 Cohort Checklist.

s. Column S: This column contains each rater’s response to the statement “Exposure level or prognostic factor is assessed more than once.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the 13th question in the Sign50 Cohort Checklist.

t. Column T: This column contains each rater’s response to the statement “The main potential confounders are identified and taken into account in the design and analysis.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the 14th question in the Sign50 Cohort Checklist.

u. Column U: This column contains each rater’s response to the statement “Confidence intervals are provided.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the 15th question in the Sign50 Cohort Checklist.

v. Column V: This column represents the final class of bias assigned by each rater to the article in question.

4. Missing data codes: None

5. Specialized formats or other abbreviations used: None
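
For readers who want to work with the SIGN cohort checklist file programmatically, the following is a minimal sketch and not the authors' own procedure. It assumes the CSV has a single header row and that the columns appear in the spreadsheet order described above (E = rater name, F = article number, V = final class of bias). The function names `col_index` and `ratings_by_article`, and the choice to let a later submission replace an earlier one from the same rater, are illustrative assumptions.

```python
import csv

def col_index(letter):
    """Convert a spreadsheet column letter (A, B, ..., Z, AA) to a 0-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1

def ratings_by_article(csv_file, rater_col="E", article_col="F", rating_col="V"):
    """Return {article: {rater: final rating}} from an open submissions CSV."""
    r_i, a_i, v_i = (col_index(c) for c in (rater_col, article_col, rating_col))
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row (assumed to be present)
    table = {}
    for row in reader:
        if len(row) <= max(r_i, a_i, v_i):
            continue  # skip rows that are too short to hold all three columns
        rater, article, rating = row[r_i].strip(), row[a_i].strip(), row[v_i].strip()
        # Assumption: a later submission by the same rater replaces an earlier one.
        table.setdefault(article, {})[rater] = rating
    return table
```

The resulting article-by-rater table is one convenient starting point for agreement statistics; the other files can be read the same way by passing the letter of their final column as `rating_col`.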

SIGN_Methodology_checklist_4__Case_control_studies.csv

Description:

1. Number of variables: 20

2. Number of cases/rows: 56

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 4, Lahoud Touma was evaluating article 17.

h. Column H: This column contains each rater’s answer when asked if the cases and controls are taken from comparable populations. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not addressed or not applicable. This is the second question in the Sign50 Case-Control Checklist.

i. Column I: This column contains each rater’s answer when asked if the same exclusion criteria are used for both cases and controls. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the third question in the Sign50 Case-Control Checklist.

j. Column J: This column contains each rater’s free text answer when asked “What percentage of cases participated in the study?” This is the fourth question in the Sign50 Case-Control Checklist.

k. Column K: This column contains each rater’s free text answer when asked “What percentage of controls participated in the study?” This is the fifth question in the Sign50 Case-Control Checklist.

l. Column L: This column contains each rater’s answer when asked if a comparison was made between participants and non-participants to establish their similarities or differences. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the sixth question in the Sign50 Case-Control Checklist.

m. Column M: This column contains each rater’s answer when asked if the article has cases that are clearly defined and differentiated from controls. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the seventh question in the Sign50 Case-Control Checklist.

n. Column N: This column contains each rater’s answer when asked if it is clearly established that controls are non-cases. The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the eighth question in the Sign50 Case-Control Checklist.

o. Column O: This column contains each rater’s response to the statement “Measures will have been taken to prevent knowledge of primary exposure influencing case ascertainment.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the ninth question in the Sign50 Case-Control Checklist.

p. Column P: This column contains each rater’s response to the statement “Exposure status is measured in a standard, valid and reliable way.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the tenth question in the Sign50 Case-Control Checklist.

q. Column Q: This column contains each rater’s response to the statement “The main potential confounders are identified and taken into account in the design and analysis.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the 11th question in the Sign50 Case-Control Checklist.

r. Column R: This column contains each rater’s response to the statement “Confidence intervals are provided.” The raters could select an answer from: well covered, adequately addressed, poorly addressed, not reported/addressed or not applicable. This is the 12th question in the Sign50 Case-Control Checklist.

s. Column S: This column represents the final class of bias assigned by each rater to the article in question.

4. Missing data codes: E2, name is missing.

5. Specialized formats or other abbreviations used: None

GENERAL INFORMATION

1. Title of Dataset: Inter-rater Reliability of Risk of Bias Tools for Non-Randomized Studies

2. Author Information

a. Principal Investigator Contact Information

Name: Mark Keezer

Institution: Université de Montréal

Address: Centre Hospitalier de l'Université de Montréal, Pavillon R R04-700 1000 Saint-Denis St., Montreal, QC, Canada, H2X 0C1

Email: mark.keezer@umontreal.ca

b. Associate or Co-investigator Contact Information

Name: Isabel Kalaycioglu

Institution: Université de Montréal

Email: Isabel.kalaycioglu@umontreal.ca

3. Date of data collection (single date, range, approximate date): 2020-2021

4. Geographic location of data collection: Montréal, Québec, Canada

SHARING/ACCESS INFORMATION

1. Links to publications that cite or use the data:

Kalaycioglu, I., Rioux, B., Briard, J.N. et al. Inter-rater reliability of risk of bias tools for non-randomized studies. Syst Rev 12, 227 (2023). https://doi.org/10.1186/s13643-023-02389-w

2. Links to other publicly accessible locations of the data: None

3. Links/relationships to ancillary data sets: None

4. Was data derived from another source? No

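5. Recommended citation for this dataset: Kalaycioglu, Isabel (2024). Data for: Inter-rater reliability of risk of bias tools for non-randomized studies [Dataset]. Dryad. https://doi.org/10.5061/dryad.5qfttdz8p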

Data & File Overview

1. File List:

A) AAN__Criteria_for_Rating_Population_Screening_Studies.csv

B) AAN__Criteria_for_Rating_Prognostic_Studies.csv

C) Gyorkos_Cohort_Studies.csv

D) Gyorkos_Cross-Sectional_Studies.csv

E) Loney__Prevalence_or_Incidence_ROB.csv

F) NEWCASTLE_-_OTTAWA_QUALITY_ASSESSMENT_SCALE_COHORT_STUDIES.csv

G) NEWCASTLE-OTTAWA_QUALITY_ASSESSMENT_SCALE_CASE_CONTROL_STUDIES.csv

H) SIGN_Methodology_Checklist_3_Cohort_Studies.csv

I) SIGN_Methodology_checklist_4__Case_control_studies.csv

2. Relationship between files, if important: None

3. Additional related data collected that was not included in the current data package: None

4. Are there multiple versions of the dataset? No
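
The file list above can be checked against the variable counts that this README reports for each file. The following is a minimal sketch under stated assumptions: each CSV has one header row, the number of header fields equals the "Number of variables" given in the file descriptions, and the helper names `count_columns` and `check_file` are hypothetical.

```python
import csv

# Variable counts as stated in this README for each file.
EXPECTED_VARIABLES = {
    "AAN__Criteria_for_Rating_Population_Screening_Studies.csv": 7,
    "AAN__Criteria_for_Rating_Prognostic_Studies.csv": 7,
    "Gyorkos_Cohort_Studies.csv": 13,
    "Gyorkos_Cross-Sectional_Studies.csv": 11,
    "Loney__Prevalence_or_Incidence_ROB.csv": 15,
    "NEWCASTLE_-_OTTAWA_QUALITY_ASSESSMENT_SCALE_COHORT_STUDIES.csv": 18,
    "NEWCASTLE-OTTAWA_QUALITY_ASSESSMENT_SCALE_CASE_CONTROL_STUDIES.csv": 16,
    "SIGN_Methodology_Checklist_3_Cohort_Studies.csv": 22,
    "SIGN_Methodology_checklist_4__Case_control_studies.csv": 20,
}

def count_columns(csv_file):
    """Return the number of fields in the header row of an open CSV file."""
    header = next(csv.reader(csv_file), [])
    return len(header)

def check_file(name, csv_file):
    """Compare a file's header width with the README; return (ok, found, expected)."""
    found = count_columns(csv_file)
    expected = EXPECTED_VARIABLES[name]
    return found == expected, found, expected
```

A file on disk could be passed in with `open(name, newline="", encoding="utf-8")`; the encoding is an assumption, since the README does not state it.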

#######################################################################

DESCRIPTION OF THE DATA AND FILE STRUCTURE:

DATA-SPECIFIC INFORMATION FOR: AAN__Criteria_for_Rating_Population_Screening_Studies.csv

Description:

1. Number of variables: 7

2. Number of cases/rows: 192

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 4, Joel Neves Briard was evaluating article 41.

4. Missing data codes: E2, E3. Names are missing in both these cells.

5. Specialized formats or other abbreviations used: None
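
The missing data codes in this README are given as spreadsheet cell references (for example E2). As a rough sketch for locating those cells after loading a file, the code below converts a reference to 0-based row and column indices. It assumes row 1 of the spreadsheet is the header row, so E2 is the rater name in the first data row; `parse_cell` and `is_missing` are hypothetical helper names.

```python
import re

def parse_cell(ref):
    """Split a reference such as 'E2' into 0-based (row, column) indices."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref.strip())
    if match is None:
        raise ValueError("not a cell reference: %r" % ref)
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

def is_missing(rows, ref):
    """True if the cell at `ref` is empty in `rows` (list of lists, header first)."""
    row, col = parse_cell(ref)
    return rows[row][col].strip() == ""
```

Here `rows` would be the full contents of a file as returned by `list(csv.reader(f))`, with the header kept as the first element.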

AAN__Criteria_for_Rating_Prognostic_Studies.csv

Description:

1. Number of variables: 7

2. Number of cases/rows: 184

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 4, Lahoud Touma was evaluating article 12.

4. Missing data codes: E2. The name is missing in this cell.

5. Specialized formats or other abbreviations used: None

Gyorkos_Cohort_Studies.csv

Description:

1. Number of variables: 13

2. Number of cases/rows: 92

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 4, Joel Neves Briard was evaluating article 42.

4. Missing data codes: E2. The name is missing in this cell.

5. Specialized formats or other abbreviations used: None

Gyorkos_Cross-Sectional_Studies.csv

Description:

1. Number of variables: 11

2. Number of cases/rows: 97

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 4, Bastien was evaluating article 54.

i. Column I: This column contains each rater’s answer to the question, “[Did the article have] soundness and completeness in measurement of intervention/exposure?” This is the third question of the Gyorkos Cross-Sectional Checklist. The rater could answer with either Yes, No, Partial/Incomplete or Can’t Tell. E.g., in row 4, Bastien was evaluating article 54 and he said that this article had partial/incomplete soundness and completeness in measurement of intervention/exposure.

4. Missing data codes: E2. The name is missing in this cell.

5. Specialized formats or other abbreviations used: None

Loney__Prevalence_or_Incidence_ROB.csv

Description:

1. Number of variables: 15

2. Number of cases/rows: 198

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 6, Joel Neves Briard was evaluating article 41.

4. Missing data codes: E2, E3, E4, E5. Names are missing in these cells.

5. Specialized formats or other abbreviations used: None

NEWCASTLE_-_OTTAWA_QUALITY_ASSESSMENT_SCALE_COHORT_STUDIES.csv

Description:

1. Number of variables: 18

2. Number of cases/rows: 127

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 4, Lahoud Touma was evaluating article 12.

truly representative of the average _______________ (describe) in the community*

somewhat representative of the average ______________ in the community*

selected group of users eg nurses, volunteers

no description of the derivation of the cohort

drawn from the same community as the exposed cohort*

drawn from a different source

no description of the derivation of the non exposed cohort

secure record (eg surgical records)*

structured interview*

written self report

no description

yes*

study controls for __________ (select the most important factor)*

study controls for any additional factor* (this criteria could be modified to indicate specific control for a second important factor)

not applicable

m. Column M: This column contains each rater’s free text answer in describing the factor(s) that the study controls for if they selected the first or second answer in the previous question.

independent blind assessment*

record linkage*

self report

no description

yes (select an adequate follow up period for outcome of interest) *

complete follow up-all subjects accounted for*

subjects lost to follow up unlikely to introduce bias-small number lost > ___% (select an adequate %) follow up, or description provided of those lost*

follow up rate <____% (select an adequate %) and no description of those lost

no statement

If the rater selected the second or third answer, they will fill in the blank percentage with the next question.

r. Column R: This column represents the final class of bias assigned by each rater to the article in question based on the number of “stars” or points their assessment accumulated.

4. Missing data codes: None

5. Specialized formats or other abbreviations used: None

NEWCASTLE-OTTAWA_QUALITY_ASSESSMENT_SCALE_CASE_CONTROL_STUDIES.csv


Description:

1. Number of variables: 16

2. Number of cases/rows: 56

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 4, Lahoud Touma was evaluating article 17.

yes, with independent validation*

yes, eg record linkage or based on self reports

no description

The asterisk at the end of the first answer indicates that if the rater selects this answer, the article collects a point. This is the second criterion of the Newcastle-Ottawa Case-Control Scale. The points later on determine the article’s rating of risk of bias.

h. Column H: This column contains each rater’s answers concerning the representativeness of the cases. The rater could select an answer from the following:

consecutive or obviously representative series of cases *

potential for selection biases or not stated

community controls*

hospital controls

no description

no history of disease (endpoint) *

no description of source

k. Column K: This column contains each rater’s answer concerning the comparability of cases and controls on the basis of the design or analysis of the study. This is the fifth criterion of the Newcastle-Ottawa Case-Control Scale. The rater could select an answer from the following:

study controls for __________ (select the most important factor)*

study controls for any additional factor* (this criteria could be modified to indicate specific control for a second important factor)

not applicable

l. Column L: This column contains each rater’s free text answer in describing the factor(s) that the study controls for if they selected the first or second answer in the previous question.

secure record (eg surgical records) *

structured interview where blind to case/control status *

interview not blinded to case/control status

written self report or medical record only

no description

yes *

same rate for both groups *

non respondents described

rate different and no designation

p. Column P: This column represents the final class of bias assigned by each rater to the article in question based on the number of “stars” or points their assessment accumulated.

4. Missing data codes: None

5. Specialized formats or other abbreviations used: None
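
As described above, an answer marked with an asterisk earns the article one point on the Newcastle-Ottawa scales. The following is a minimal sketch of that tally, assuming the exported cell text keeps the trailing asterisk of the selected answer (this README does not confirm that it does). The function name `count_stars` is hypothetical, and the thresholds that map a star total to a class of bias are not given here, so they are left out.

```python
def count_stars(answers):
    """Count selected answers that end with an asterisk (one point each)."""
    return sum(1 for answer in answers if answer.strip().endswith("*"))

# Illustrative answers for one submission, copied from the option lists above.
example_submission = [
    "yes, with independent validation*",
    "potential for selection biases or not stated",
    "community controls*",
    "no history of disease (endpoint) *",
    "no description",
]
stars = count_stars(example_submission)
```

Free-text columns, such as the description of the factors a study controls for, should be left out of the list passed to `count_stars`.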

SIGN_Methodology_Checklist_3_Cohort_Studies.csv

Description:

1. Number of variables: 22

2. Number of cases/rows: 128

3. Variable list:

a. Column A: This represents the submission number.

b. Column B: Start time represents the time the rater began the Microsoft form.

c. Column C: Completion time represents the time the rater completed the Microsoft form.

d. Column D: This column represents the submitted emails for each rater. This column has been left anonymous.

e. Column E: This column includes the names of all the raters who used this Microsoft form.

f. Column F: This column represents the article number that the rater was evaluating during that submission. E.g., in row 4, Lahoud Touma was evaluating article 13.

v. Column V: This column represents the final class of bias assigned by each rater to the article in question.

4. Missing data codes: None

5. Specialized formats or other abbreviations used: None


Funding

Université de Montréal