Data from: The dual nature of trust in participatory science: An investigation into data quality and household privacy preferences
Data files
Aug 29, 2024 version files 207.40 KB
LinHunter_CSTP697_data_08.27.2024.csv (200.94 KB)
LinHunter_CSTP697_metadata_08.27.2024.csv (2.30 KB)
README.md (4.16 KB)
Abstract
There is a duality of trust in participatory science (citizen science) projects: the data produced by volunteers must be trusted by the scientific community, and participants must trust the scientists who lead projects. Facilitator organizations can diversify recruitment and broaden learning outcomes. We investigated the degree to which they can broker trust in participatory science projects. In Crowd the Tap, we recruited participants through partnerships with facilitators, including high schools, faith communities, universities, and a corporate volunteer program. We compared data quality (a proxy for scientists’ trust in the project) and participant privacy preferences (a proxy for participants’ trust in the project leaders) across the various facilitators as well as to those who came to the project independently (unfacilitated). In general, we found that data quality differed based on the project’s level of investment in the facilitation partner in terms of both time and money. We also found that demographic characteristics, rather than facilitation, were most important in predicting privacy preferences. Ultimately, our results reveal several tradeoffs that project leaders and facilitators should weigh when deciding to work together.
README: Data from: The dual nature of trust in participatory science: An investigation into data quality and household privacy preferences
The dataset contains data on participation in Crowd the Tap, a large-scale participatory science (citizen science) project focused on identifying and addressing lead contamination in household drinking water. The project crowdsources information on plumbing materials, age of home, water aesthetics, and demographic data to learn more about the geographic spread of lead plumbing and its social and environmental correlates. We investigated how data quality (completeness, accuracy, and understandability) and participant privacy preferences (whether participants chose to be public or private, and the number of times they selected “prefer not to say”) differed by facilitator. Data quality relates to scientists’ trust in the project, and privacy relates to the trust that participants have in the project leadership team. As participatory science projects increasingly recruit participants through partnerships with facilitators, understanding facilitators’ roles in establishing trust with participants and collecting high-quality data will be increasingly important.
Description of the Data and file structure
Column A - ID: numeric anonymous identifier
Column B - Facilitator: Which partner organization (if any) did this participant come to the project through? Levels: Unfacilitated, University, Level 1, Level 2, Corporate, Faith Community
Column C - Understandability: How many questions did they select "Unknown" for instead of providing an answer that would give more detail? Levels: 1, 2, 3, 4, 5, 99
Column D - Completeness: How many web pages of questions did they complete when participating in the project? Levels: 1, 2, 3, 4
Column E - Accuracy_InternalPipes: Did they accurately identify their internal plumbing based on their magnet and scratch tests? Levels: 0 = inaccurate, 1 = accurate, 99 = missing
Column F - Accuracy_ExternalPipes: Did they accurately identify their external plumbing based on their magnet and scratch tests? Levels: 0 = inaccurate, 1 = accurate, 99 = missing
Column G - Accuracy_HouseholdCount: Did they report the right number of total people based on the number of infants, children, and adults they reported? Levels: 0 = inaccurate, 1 = accurate, 99 = missing
Column H - Accuracy_ChemStrip: If they conducted a water chemistry test strip, did they submit a photo? Levels: 0 = inaccurate, 1 = accurate, 99 = missing
Column I - Privacy: Did they select to be publicly associated with the project, or opt to remain private? Levels: 0 = Private, 1 = Public
Column J - Number_PreferNotToSay: How many questions did they select "Prefer not to say" for instead of providing an answer that would have given more detail? Levels: 0, 1, 2, 3, 4
Column K - Race: What is their race? Levels: White, POC, Prefer not to say
Column L - Insurance: Do they have health and home insurance? Levels: Yes = they have both home and health insurance, No = they do not have both home and health insurance, Unknown = they are unsure if they have home or health insurance
Column M - HomeTypeAndOwnership: What type of home do they live in and do they own it? Levels: Apartment owner, Apartment renter, Stand-alone home owner, Stand-alone home renter, Stand-alone home neither rented nor owned, Other type of home owner, Other type of home renter, Other type of home neither rented nor owned
Column N - PlumbingType: What materials was their plumbing made of? Levels: All metal, Both plastic and metal, All plastic, Unknown, 99 = Missing
Column O - ConstructionPeriod: When was their home built? Levels: 1986 or earlier, After 1986, Unknown
99 = Missing data
POC = Person of color
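Because 99 is a code rather than a measured value, it should be converted to a missing value before any counts or means are computed. The following is a minimal sketch in Python using only the standard library. It assumes that the CSV header names match the column names listed above, which has not been verified against the file, and the sample rows are invented for illustration. The helper names `clean_row` and `load_rows` are hypothetical.

```python
import csv
import io

# Columns whose documented levels include the 99 code.
# Assumption: the CSV header names match the README column names exactly.
# ID is deliberately excluded, since 99 could be a legitimate identifier.
MISSING_CODE_COLUMNS = {
    "Understandability",
    "Accuracy_InternalPipes",
    "Accuracy_ExternalPipes",
    "Accuracy_HouseholdCount",
    "Accuracy_ChemStrip",
    "PlumbingType",
}


def clean_row(row):
    """Return a copy of row with the 99 code replaced by None in coded columns."""
    return {
        key: (
            None
            if key in MISSING_CODE_COLUMNS
            and value is not None
            and value.strip() == "99"
            else value
        )
        for key, value in row.items()
    }


def load_rows(handle):
    """Read CSV rows from an open text handle and recode missing values."""
    return [clean_row(row) for row in csv.DictReader(handle)]


# Invented sample rows, not taken from the dataset.
SAMPLE = """ID,Facilitator,Understandability,Accuracy_InternalPipes,PlumbingType
99,University,2,1,All metal
100,Unfacilitated,99,99,99
"""

rows = load_rows(io.StringIO(SAMPLE))
```

To apply the same recoding to the published file, pass an open handle instead of the sample, for example `open("LinHunter_CSTP697_data_08.27.2024.csv", newline="")`. All values are kept as strings, so numeric columns still need to be converted before analysis.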
Sharing/access Information
Data from another paper, “Diversifying large-scale participatory science: The efficacy of engagement through facilitator organizations,” is available at https://doi.org/10.5061/dryad.v15dv422h
Methods
The data was collected through an IRB-approved survey in which Crowd the Tap participants submitted data on the types of pipes they had, the age of their home, water aesthetics, and demographic information. As part of this process, participants also indicated if they came to the project through a partner organization (what we call facilitator organizations). Using the data available to us, we determined how completely, accurately, and informatively (understandability) they participated in the project to assess data quality. We also asked if they had interest in being publicly associated with the project or if they preferred to remain private. We used this and the number of times they selected "Prefer not to say" as indicators of privacy. We compared data quality and privacy preferences across the facilitator organizations through which participants came to the project.
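As an illustration of the comparison described above, the two privacy indicators can be summarized per facilitator. This is a descriptive sketch only, not the statistical analysis used in the paper. It assumes the column names Facilitator, Privacy (0 = Private, 1 = Public), and Number_PreferNotToSay appear as CSV headers exactly as written in the README. The function name `privacy_by_facilitator` is hypothetical, and the sample rows are invented.

```python
from collections import defaultdict


def privacy_by_facilitator(rows):
    """Return {facilitator: (share_public, mean_prefer_not_to_say)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Facilitator"]].append(
            (int(row["Privacy"]), int(row["Number_PreferNotToSay"]))
        )

    summary = {}
    for facilitator, values in groups.items():
        n = len(values)
        # Privacy is coded 0 = Private, 1 = Public, so the mean is the public share.
        share_public = sum(privacy for privacy, _ in values) / n
        mean_prefer_not = sum(count for _, count in values) / n
        summary[facilitator] = (share_public, mean_prefer_not)
    return summary


# Invented sample rows, not taken from the dataset.
sample_rows = [
    {"Facilitator": "University", "Privacy": "1", "Number_PreferNotToSay": "0"},
    {"Facilitator": "University", "Privacy": "0", "Number_PreferNotToSay": "2"},
    {"Facilitator": "Unfacilitated", "Privacy": "1", "Number_PreferNotToSay": "1"},
]

summary = privacy_by_facilitator(sample_rows)
```

With the sample rows, the University group has a public share of 0.5 and a mean of 1.0 "Prefer not to say" selections.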