Screenshots and metadata for 214 reCAPTCHA challenges encountered between September 2022 and September 2023
Data files
Version: Jun 19, 2024 (total size of files: 92.72 MB)
-
00383d35-870f-4f79-9098-b56182bf2e3a.png
92.06 KB
-
09693400-6cb5-4c6a-abf0-2660cafb4b47.png
3.07 MB
-
09ccf814-2a7a-41b1-8539-dbaf66fdcbb3.png
218.38 KB
-
0a80c2c2-7003-4fba-b11b-cc767a05a5c5.png
623.51 KB
-
0afbba96-ff9c-48a3-a75a-a9967daa8b9c.png
159.59 KB
-
0c69b12f-b467-4cfa-ad2d-a8f774181935.png
395.65 KB
-
0c8ed564-6b52-4b69-abcb-21927e483ff5.png
814.34 KB
-
0cdf511e-ac1d-4eb9-a802-0c218d3f6042.png
429.90 KB
-
0e61007f-596d-4201-8174-1c2b14c9cc27.png
355.98 KB
-
0f709c55-9919-44a8-83cb-a3005b039d23.png
90.90 KB
-
0fb4bb7c-94e6-4702-80a7-adf7c80795c3.png
361.76 KB
-
1054564e-7f8e-4ef2-bfc4-765689c3d651.png
226.74 KB
-
1471c57a-0af1-4563-a15a-78c84ee8d83e.png
319.72 KB
-
15d35241-09b2-4c2c-891f-b0ad93fa6bdc.png
292.18 KB
-
17c8121c-04ee-4610-aee9-90900df365fa.png
85.80 KB
-
1a665038-f523-456d-9997-97e4086669b8.png
1.23 MB
-
1aca8e78-79ab-4fcd-aa67-0c5707206489.png
1.27 MB
-
1f90380f-9e80-4423-bdf8-f4fa14320bc2.png
545.53 KB
-
21501bab-17b4-49fd-82db-685ffcc171ce.png
1.51 MB
-
236e62af-177a-41db-867f-6e0018225e3c.png
382.30 KB
-
2378e91a-3651-4d6b-8603-adee66933e32.png
176.67 KB
-
246b0dd5-a0b3-410b-87ee-66276b8aefe9.png
715.55 KB
-
26248943-ace6-41a1-98ed-d2a29adaeff9.png
197.11 KB
-
26c12f7d-9971-4092-8ac9-19c880d92ed4.png
236.12 KB
-
28cd64d0-1909-4d95-912a-b61a5aff438d.png
47.68 KB
-
2ae69df4-1720-4cba-afa5-e6f781272fc5.png
47.13 KB
-
2e2f2e05-3343-41a0-bec2-ccada40d702a.png
43.28 KB
-
2e70fc56-127a-47eb-80ca-411d5cab0551.png
4.53 MB
-
2f782845-8919-4a80-a195-705ec020d092.png
58.46 KB
-
2fd87db8-73d6-499b-a383-16cc94203da4.png
30.56 KB
-
2feab8ac-b59d-4a1e-837c-6f29842873cd.png
50.69 KB
-
3076c975-2420-4bd9-a5e8-8b7e5bbe5eca.png
31.46 KB
-
3397c908-eea8-4279-90f6-a43362f01c89.png
346.56 KB
-
352ba275-d57a-4540-96b9-877269334f56.png
392.91 KB
-
358b08a1-cc15-4fd4-a324-2a73056281d5.png
21.29 KB
-
35a7dc5c-214d-4e8f-8d4f-3eb310572b79.png
546.33 KB
-
376f6cd6-092a-425d-ad23-d64a801c8819.png
173.69 KB
-
37b12c40-8932-4153-9cb1-152891482817.png
159.81 KB
-
37ed609c-3e13-45e8-a1f2-9de2a33859ba.png
763.14 KB
-
39b2cf6a-f9b9-46f7-b1f5-633593e6ef7d.png
65.53 KB
-
3d072551-3a79-43c6-8a43-5cd18add4020.png
2.37 MB
-
3dbbee09-922f-43bd-a7e6-5c41d2944202.png
45.04 KB
-
3e443de4-11aa-466b-be4b-ca1e45c58ced.png
729.66 KB
-
3fe0391b-c044-4d7a-98c4-9c23a4a9071c.png
434.99 KB
-
414714a1-dbfe-42ed-b588-f16edc425a02.png
81.27 KB
-
455179c7-6521-4776-bdee-042d51e91a52.png
82.71 KB
-
463e1c14-af5c-4fc5-a9d9-3da6979aa244.png
1.57 MB
-
49af5464-5041-458b-8585-f4de1e5a9257.png
1.21 MB
-
4be55d11-98b4-4dbf-9613-3e51365306df.png
89.90 KB
-
4d9f754f-1b2c-4b33-a78e-f3aa63737304.png
58.94 KB
-
4dfcc781-b7a6-46f1-bece-2527e89f80fc.png
234.51 KB
-
4fee58c8-fbb8-431a-a2cb-f80bd5518452.png
390.50 KB
-
501b6a6d-fe4f-4838-aad6-2c6778c8e12a.png
134.42 KB
-
53ad57b6-2089-406d-8697-a7f14b0bf646.png
945.75 KB
-
569d60af-ae03-40d0-9aa7-fb7f09083348.png
59.23 KB
-
571d1fa9-23c6-4b2c-aa7a-8957fb11298f.png
109.56 KB
-
59361cb2-bb81-4657-9b07-e9df6cdd77a8.png
425.19 KB
-
59750fcf-c0a8-42bd-953d-0847b59a890c.png
46.24 KB
-
5a78038f-7171-45b5-a34c-ec574ef6ca37.png
412.31 KB
-
5aa9d52c-6d86-46af-ba13-d39e614cd982.png
3.59 MB
-
5eea0969-94f8-4c5a-bc14-3e16d3c33799.png
510.69 KB
-
5f10be4c-1c1b-4fbe-8016-d6c99a9b8556.png
474.72 KB
-
61618842-d22b-4090-af3b-41682ac5b25f.png
363.14 KB
-
641074a9-7fb5-47f7-ba0f-b8d4ce988bc1.png
55.59 KB
-
68440240-0373-4359-a0e1-07d470d54b7a.png
1.83 MB
-
68add777-5beb-4e0f-9d68-94814073390a.png
291.23 KB
-
68ca0c72-53af-4665-84ae-9939f5b41192.png
347.90 KB
-
692e7a8a-0082-4110-a027-6cf658360290.png
237.49 KB
-
6957d3f5-3aa5-4eea-9475-8be4cee74574.png
107.41 KB
-
69ddac18-4733-4829-b3e2-e378fbbe45f0.png
128.11 KB
-
6ae3a22a-42d9-459d-bae2-41ba78e9f60c.png
900.52 KB
-
6f515a48-0b0e-4b39-9e80-2b09d43b3d90.png
349.22 KB
-
71659b40-9f0c-4316-876e-2601cb19d877.png
34.70 KB
-
746a4d4c-37df-4061-a578-12370458258a.png
443.55 KB
-
74c52e09-b378-4359-a5e5-29e80b037c73.png
384.84 KB
-
76b150bb-3607-4cc8-86a6-6a014631f6b0.png
57.54 KB
-
77a43384-a10b-44bd-8643-cbe55a2c3678.png
701.38 KB
-
784f0139-438f-45d7-8b08-e365459e6ebd.png
430.12 KB
-
78618643-8bd8-4cc9-96f4-0d63d4a7d642.png
153.57 KB
-
78880b82-17b7-4604-a54f-7a08490c8f3b.png
851.16 KB
-
78f47f98-715a-4c0c-8dbb-bbbc4624ba8b.png
1.67 MB
-
7eaf640f-eee7-4cd7-bfa5-4739fbd3cf7c.png
603.45 KB
-
7fffbc71-b17e-46c3-ae1c-77bd5159e760.png
10.50 KB
-
8120753f-f19d-4ff5-b64f-ca0dfbe683e1.png
471.24 KB
-
8235c297-ead3-4895-945b-9932da87c1b4.png
104.59 KB
-
83fea360-52e6-42bf-8cfa-14261481aead.png
92.60 KB
-
851f13ec-09b7-493c-9ec8-ef4f65d1f446.png
139.26 KB
-
87988fec-4f72-416f-b952-c05f4e077753.png
459.72 KB
-
87f597d8-2f97-49e5-ad20-4e23d56c1863.png
3.44 MB
-
89376d70-0f72-4c53-9a32-328227ccc6e8.png
379.70 KB
-
89d8753e-fc6d-4cb6-89d1-ef124dbfce39.png
357.82 KB
-
8bbb47f1-4e18-49e5-92a1-19d62252a026.png
169.15 KB
-
93e3029a-94d2-4a83-b91e-2fb1d26b44b7.png
190.83 KB
-
949232db-a09d-49ff-a828-29c570c7e05e.png
1.64 MB
-
98203c1b-64fa-48be-8029-b705cbd83c11.png
144.96 KB
-
989e6fbb-a582-4423-8c8b-48dcb79a9c66.png
135.08 KB
-
99358c30-7db9-41f8-8550-d5f040fdfb94.png
204.21 KB
-
9a4fdd6d-23c4-4efa-8fee-d3f195a04209.png
344.13 KB
-
9a94a744-0c9b-431e-be2a-29a768aadcf5.png
96.03 KB
-
9e477aa3-9b90-4863-b445-08bb01bef9bf.png
21.83 KB
-
9ebf1639-011e-48fd-868f-7ee5537165ee.png
1.12 MB
-
a161179b-c801-49eb-a32b-cc31092039b2.png
896.45 KB
-
a1618ab4-6eea-40d1-aab7-fd0cf4b9dee1.png
277.91 KB
-
a32304fa-a891-4f32-b3d8-9b4f69ffa86f.png
360.42 KB
-
a4c1d407-648a-4ed0-a603-4aad2c8882dc.png
575.60 KB
-
a90edd02-e80d-49e0-9531-1f4816a9fc64.png
527.95 KB
-
a94fa950-d601-493a-9399-46f6a7544426.png
285.02 KB
-
ac2e496f-8858-4067-a8ab-d26f6e73e87f.png
2.61 MB
-
ac3c97bb-eabe-44ad-b9da-1c54c4be403b.png
1.70 MB
-
ac6b01ce-54f1-4d55-bdb5-01bac99ba0a4.png
267.32 KB
-
ad848c51-2caf-46e3-a92e-74ac815293bf.png
96.03 KB
-
afccc06e-9a66-4087-858b-7fc932634796.png
1.72 MB
-
b1b46491-54bd-441f-8a3a-0abb3785024e.png
123.56 KB
-
b51b6f72-3389-4b6a-8b51-ee6197515a01.png
35.94 KB
-
b76ade9e-9c5f-4e0f-91a8-9f4f36018755.png
396.86 KB
-
c19e56b7-5104-4730-bc34-135ef22ef277.png
456.56 KB
-
c2e14660-143e-4fcc-96c6-1de2e4675f2a.png
72.35 KB
-
c34d9263-47c0-4b79-9af8-0f4ad8e0a7d5.png
1.05 MB
-
c3c4f719-159b-4076-aec6-e527c6ad4e23.png
784.55 KB
-
c5ea962a-d13e-49c8-a52f-12427b91e95e.png
2.66 MB
-
c6a6895d-2414-4294-b7a8-c97081d766f9.png
19.83 KB
-
c6ce11f3-2845-4411-b1be-bb289cf82a38.png
30.14 KB
-
c79430f5-b809-4736-9bc1-7da13fc51653.png
356.52 KB
-
c7c476d7-871d-4046-82f6-7158b1f49963.png
1.08 MB
-
c87fada1-0bd7-4e72-85d8-0230fff67af8.png
192.72 KB
-
c9e4dc32-3460-43f4-b97f-e1b0f23da1ae.png
21.28 KB
-
ca49a8e2-5aab-40c8-a777-3915c8518634.png
60 KB
-
ca718d25-b138-4048-94a7-c266f357dad0.png
90.30 KB
-
ca7cfb4a-5c0f-4def-b0ce-99f796c1e6fe.png
494.18 KB
-
caba7214-41d7-4ceb-825a-c9b6ab1b5f4b.png
181.46 KB
-
captcha-out.jsonl
172.58 KB
-
cb9b31bc-cf10-47cd-b244-e191fe5b7e75.png
1.57 MB
-
ce6356a4-5cee-4401-b4f3-3d974b10fb74.png
52.91 KB
-
ceeaef00-f2e5-4b62-9de4-414aef6ee219.png
500.43 KB
-
d3a1fe50-3929-4627-bac8-d1753d877149.png
168.70 KB
-
d48ceedb-4e30-4394-a8c7-874585273619.png
36.48 KB
-
d7108974-6b26-4218-9fd3-0433d8f9a139.png
413.66 KB
-
d918ca7e-6061-444e-a490-5fcae7e8a1df.png
531.04 KB
-
d9abc864-10a7-4ee3-a1b0-1a7d36159f27.png
1.10 MB
-
d9dfea4a-a7ad-4b59-bd29-adde49331434.png
121.72 KB
-
dc2ea08a-04d0-45a2-936b-530b456f1e0d.png
142.41 KB
-
dc4e1214-8c10-4a49-b973-a919f07d1e92.png
157.25 KB
-
dc6b03b4-a75a-45d0-8850-0bac5f78da4f.png
107.41 KB
-
ddd133a4-40fa-4cbd-b7d0-c135bea4f5bf.png
54.33 KB
-
e1166f47-6399-4770-900c-570e724ca492.png
17.36 KB
-
e2a2cf95-40a2-44d8-af35-2816ac0879a4.png
533.56 KB
-
e48e3542-3da6-4594-83e1-16768dbc44a2.png
224.76 KB
-
e4d08761-eaa5-4610-bda1-20dae803ed85.png
27.64 KB
-
e5929391-1312-46c8-ae95-f63b2245c0d9.png
454.15 KB
-
e5b2d7bf-ee37-4bd9-b67f-7232c9bfb020.png
554.94 KB
-
e6eb0981-a5e0-4677-8dd9-bebc76759d2b.png
158.33 KB
-
e7691220-a6cb-4216-858e-f276a1bc56e2.png
547.23 KB
-
e7f817c1-0a90-4a57-8d07-052090671863.png
168.92 KB
-
e7ff43b7-4dc2-4c16-b373-b25b08bcf3cc.png
701.09 KB
-
e88c6a46-2fbe-4a3a-81d1-5773f086c46b.png
39.35 KB
-
e91a32e1-73cd-465a-a27f-0fdfda28c104.png
217.91 KB
-
e962efa2-b23e-4460-8d35-e7adc19e8a01.png
476.90 KB
-
e9aa0fd0-85f6-4106-ab9c-7365629493c3.png
648.08 KB
-
ea3b2795-8f3f-4ab7-b21c-54289d366787.png
678.45 KB
-
ea6e6764-2e4d-4271-becb-178e9615e9c4.png
1.65 MB
-
ea7462db-335c-41da-9616-fc403289e07b.png
2.99 MB
-
eb332fc6-a4ac-410b-b427-23202c256478.png
111.72 KB
-
ed85f7a8-9881-4260-8c6f-1241dd60fc2e.png
46.38 KB
-
ef00e504-c053-42a7-8b63-0c369669abc6.png
122.89 KB
-
f0707721-b487-4038-83ad-005aea70f7e7.png
165.21 KB
-
f0997512-3340-4f52-87f5-8705681b9110.png
591 KB
-
f1efc530-7f0c-4c17-9397-60e5909e12fb.png
86.48 KB
-
f84f644e-59c9-4519-a7f9-9b7cac01beb5.png
167.67 KB
-
f99bc19f-c719-4f54-9803-88f0f3bdff06.png
1.67 MB
-
f9ae4359-8cb3-44a3-b646-7bf517978a8b.png
357.82 KB
-
f9fb19ca-7119-40c4-b210-21e09c77f873.png
1.27 MB
-
fc31009b-17c5-44bc-8b91-111bbb6ddd24.png
911.02 KB
-
README.md
3.75 KB
Abstract
In Chapter 3 of my dissertation (tentatively titled "Becoming Users: Layers of People, Technology, and Power on the Internet"), I describe how online user activities are datafied and monetized in subtle and often obfuscated ways. The chapter focuses on Google’s reCAPTCHA, a popular implementation of a CAPTCHA challenge. A CAPTCHA, or “Completely Automated Public Turing test to tell Computers and Humans Apart”, is a simple task or challenge intended to distinguish genuine human users from those who may be using software or other automated means to interact maliciously with a website, such as for spam, mass data scraping, or denial-of-service attacks. Increasingly, reCAPTCHA challenges are hidden from the user's direct view; instead, they assess our mouse movements, browsing patterns, and other data to evaluate the likelihood that we are “authentic” users. These hidden challenges raise the stakes of understanding our own construction as Users because they obfuscate practices of surveillance and the ways that large corporations commodify our activities as users (Pettis, 2023). By studying the specifics of how such data collection works (that is, how we’re called upon and situated as Users), we can make more informed decisions about how we engage with the contemporary internet.
This data set contains metadata for the 214 reCAPTCHA elements that I encountered during my personal use of the Web over one year (September 2022 through September 2023). Of these reCAPTCHAs, 137 were visible challenges, meaning that the page gave some indication that a reCAPTCHA challenge was present. The remaining 77 reCAPTCHAs were entirely hidden on the page; had I not been running my browser extension, I would likely never have known that the page used a reCAPTCHA. The data set also includes screenshots for 174 of the reCAPTCHAs. Screenshots that contain sensitive or private information have been excluded from public access. Researchers can request access to these withheld files by contacting Ben Pettis <bpettis@wisc.edu>. A browsable and searchable version of the data is also available at https://capturingcaptcha.com.
https://doi.org/10.5061/dryad.h70rxwdsr
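As a rough cross-check of the visible/hidden split, the Python sketch below tallies records in captcha-out.jsonl by the "visible" field that appears in each record. It assumes that "visible": true marks a challenge with some on-page indication and false marks an entirely hidden one. Because the export may also contain records with other processing statuses (pending, withheld, possible spam), the tally is not guaranteed to reproduce the 137/77 split exactly. The function name count_visibility is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def count_visibility(lines: Iterable[str]) -> Counter:
    """Tally JSON Lines records by their boolean "visible" field."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, such as a trailing newline
        value = json.loads(line).get("visible")
        if value is True:
            counts["visible"] += 1
        elif value is False:
            counts["hidden"] += 1
        else:
            counts["unknown"] += 1  # field missing or not a boolean
    return counts
```

For example, opening the file with open("captcha-out.jsonl", encoding="utf-8") and passing the file object to count_visibility returns a Counter keyed by "visible", "hidden", and "unknown".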
Description of the data and file structure
All metadata about the reCAPTCHAs is stored in a single file, captcha-out.jsonl, in the “JSON Lines” format: every line of the file contains one complete record. For example:
{"_id":"b4f256f6-7503-42d9-9a18-b60c2331a7c6","status":3,"timestamp":{"$date":"2022-08-20T13:54:33.574Z"},"original_filename":"Screen Shot 2022-08-20 at 8.53.53 AM.png","new_filename":"b4f256f6-7503-42d9-9a18-b60c2331a7c6.png","privacy":true,"website_name":"Esurance","website_url":"esurance.com","website_type":"financial","website_type_other":"","visible":true,"challenge_description":"\"Protected by reCAPTCHA\" logo in the bottom corner","challenge_time":0,"challenge_attempts":0,"additional_description":"","accept_terms":true,"_keywords":["bottom","by","com","corner","esurance","financial","in","logo","protected","recaptcha","the"],"updated_at":{"$date":"2022-08-22T02:03:34.569Z"}}
Each record contains several fields:
- _id - unique identifier for the record. Append it to https://capturingcaptcha.com/submissions/ (e.g. https://capturingcaptcha.com/submissions/6957d3f5-3aa5-4eea-9475-8be4cee74574) to view the record online
- status - a numeric code to represent the data processing status of the record:
  - 1: Submitted, Pending Review (No File Uploaded)
  - 2: Submitted, Pending Review and File Scan
  - 3: Accepted Submission and File
  - 4: Accepted Submission (No File Uploaded)
  - 5: Withheld (hide entire submission from public website)
  - 6: Pending Deletion
  - 7: Possible Spam
- timestamp - when the record was submitted, stored as a MongoDB Extended JSON "$date" value in UTC
- original_filename - name of the file that was uploaded via the extension
- new_filename - newly generated name for the file before it is stored in Google Cloud Storage
- privacy - whether the screenshot is hidden from public access:
  - true: hide screenshot from public access
  - false: make screenshot public
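- website_name - human-readable name of the source website
- website_url - address (domain) of the source website
- website_type - category of the website (e.g. "financial")
- website_type_other - if the “Other” category was selected, this field contains a written description
- visible - true if the page gave some indication that a reCAPTCHA was present; false if it was entirely hidden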
- challenge_description - written description of the type of reCAPTCHA challenge
- challenge_time - estimated number of seconds to complete the reCAPTCHA challenge
- challenge_attempts - number of attempts needed to complete the reCAPTCHA challenge successfully
- additional_description - written description/additional notes
- accept_terms - indicates whether I checked the “accept terms” box before submitting my screenshot
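- _keywords - automatically generated keywords to improve website search functionality
- updated_at - when the record was last updated, stored as a MongoDB Extended JSON "$date" value in UTC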
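The status codes, the privacy flag, and the submission URL pattern described for _id can be captured in a few helpers. This is a hypothetical sketch: the names STATUS_LABELS, submission_url, and public_screenshot are illustrative. The rule that a screenshot is public only when status is 3 and privacy is false is an inference from the field descriptions above, not a documented rule of the website.

```python
from typing import Optional

# Processing-status codes, copied from the "status" field description.
STATUS_LABELS = {
    1: "Submitted, Pending Review (No File Uploaded)",
    2: "Submitted, Pending Review and File Scan",
    3: "Accepted Submission and File",
    4: "Accepted Submission (No File Uploaded)",
    5: "Withheld (hide entire submission from public website)",
    6: "Pending Deletion",
    7: "Possible Spam",
}

SUBMISSION_BASE_URL = "https://capturingcaptcha.com/submissions/"


def status_label(record: dict) -> str:
    """Translate a record's numeric status into its documented label."""
    return STATUS_LABELS.get(record.get("status"), "Unknown status")


def submission_url(record: dict) -> str:
    """Build the record's page URL by appending its _id to the base URL."""
    return SUBMISSION_BASE_URL + record["_id"]


def public_screenshot(record: dict) -> Optional[str]:
    """Return the screenshot filename if it should be publicly visible.

    Assumption: only accepted records with a file (status 3) whose
    privacy flag is false have a public screenshot.
    """
    if record.get("status") == 3 and record.get("privacy") is False:
        return record.get("new_filename")
    return None
```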
In addition to the JSON file, there is a directory containing the public screenshot files. Each file is named with the value of “new_filename” in its corresponding record.
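A minimal sketch for pairing records with the screenshots that are actually present, assuming the screenshot files have been downloaded into a single local directory. Records whose "new_filename" does not match a file (for example, because the screenshot was withheld for privacy or no file was uploaded) are reported separately. The function name match_screenshots is hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def match_screenshots(
    records: Iterable[dict], directory: Path
) -> Tuple[Dict[str, Path], List[str]]:
    """Pair each record's _id with its screenshot file, if present.

    Returns (found, missing): found maps _id to the file path, and
    missing lists the _ids with no matching file in the directory.
    """
    available = {p.name: p for p in directory.iterdir() if p.is_file()}
    found: Dict[str, Path] = {}
    missing: List[str] = []
    for record in records:
        name = record.get("new_filename")
        if name and name in available:
            found[record["_id"]] = available[name]
        else:
            missing.append(record["_id"])
    return found, missing
```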
Sharing/Access information
Link to other publicly accessible locations of the data: https://capturingcaptcha.com
Code/Software
I created a browser extension to support internet researchers interested in user interactions with specific Web elements. The extension searches each page for a specified HTML element and, if the element is detected, invites the user to record their screen. It adapts work I did for a project focused specifically on Google’s reCAPTCHA and users’ interactions with these challenges; I spun it off as a side project so that other researchers can use this approach in their own work. Its code is available at: https://github.com/bpettis/html-search-and-record
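The extension itself runs in the browser, and its actual implementation is in the repository linked above. As a rough, standard-library-only illustration of the same idea, the Python sketch below scans static HTML for a start tag with a given tag name and class. The target div.g-recaptcha is an assumption (it is the container class in Google's documented markup for the checkbox widget). Hidden reCAPTCHAs may leave no such element, and a static parser cannot see elements that scripts insert after the page loads, which an in-browser extension can. The names ElementFinder and contains_element are hypothetical.

```python
from html.parser import HTMLParser


class ElementFinder(HTMLParser):
    """Flag whether any start tag matches a tag name and a CSS class."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.found = True


def contains_element(html: str, tag: str, css_class: str) -> bool:
    """Return True if the HTML contains <tag class="... css_class ...">."""
    finder = ElementFinder(tag, css_class)
    finder.feed(html)
    finder.close()
    return finder.found
```

For instance, contains_element(page_html, "div", "g-recaptcha") checks a saved page for the assumed widget container.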
I developed a custom Google Chrome extension that detects when a page contains a reCAPTCHA and prompts the user to save a screenshot or screen recording while also collecting basic metadata. In Summer 2022, I began work on a website to collate and present the screen captures that I saved throughout the year. The purpose of collecting these examples of websites where reCAPTCHAs appear is to understand how this Web element is situated within websites and presented to users, and to sketch out how often it is used and on what kinds of websites. Because I collected records only of my own interactions with reCAPTCHAs, this is not a comprehensive sample that can be generalized as representative of all Web users. Although my experiences of the reCAPTCHA will differ from those of any other person, this collection is nevertheless useful for demonstrating how the interface element may be embedded within websites and presented to users. Following Niels Brügger’s descriptions of Web history methods, these screen capture techniques provide an effective way to preserve a portion of the Web as it was actually encountered by a person, as opposed to methods such as automated scraping. My dissertation therefore offers a methodological contribution to Web historians by demonstrating a technique for identifying and preserving a representation of one Web element within a page, rather than focusing analysis on a whole page or entire website.
The browser extension is configured to store data in a cloud-based document database running in MongoDB Atlas, and any screenshots or video recordings are uploaded to a Google Cloud Storage bucket. Both the database and the storage bucket are private and cannot be accessed directly; the data and screenshots are viewable and searchable at https://capturingcaptcha.com. This data set is an export of the database as of June 10, 2024. Data collection may resume after this date, in which case the website may display records that are not included in this data set.
The data was exported from the database to a single JSON Lines file using the mongoexport command-line tool (the exported file is included in this data set as captcha-out.jsonl):
mongoexport --uri mongodb+srv://[database-url].mongodb.net/production --collection submissions --out captcha-out.json --username [databaseuser]
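By default, mongoexport writes one document per line (unless --jsonArray is given), which is why the export is already in JSON Lines form. A quick sanity check after exporting might confirm that every line parses and that no _id repeats. The sketch below is a hypothetical example of such a check (validate_export is an illustrative name), not part of the original workflow.

```python
import json
from typing import Iterable, List


def validate_export(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in a JSON Lines export."""
    problems: List[str] = []
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines carry no record
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict) or "_id" not in record:
            problems.append(f"line {lineno}: missing _id")
            continue
        # Serialize the _id so non-string ids (e.g. {"$oid": ...}) are hashable.
        key = json.dumps(record["_id"], sort_keys=True)
        if key in seen:
            problems.append(f"line {lineno}: duplicate _id {key}")
        seen.add(key)
    return problems
```

An empty list means every non-blank line parsed and each _id appeared once.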