Home alone: Remote work, isolation, and mental health
Data files
Jun 02, 2026 version files (29.22 MB)
- README.md (35.41 KB)
- replication_Home_Alone_de-id.zip (29.19 MB)
Abstract
How does remote work impact isolation and mental health? We draw on five nationally representative surveys of American workers (n=567,668) conducted from 2011 to 2024, omitting the peak pandemic years of 2020−2021. Our difference-in-differences approach compares changes in mental health among people in remoteable jobs — who experienced a large and persistent rise in remote work since COVID-19 — to people in non-remoteable jobs, where remote work increased far less. We find remote work increases time spent alone, worsens mental well-being across multiple measures, and increases the use of mental-health services and prescriptions. These effects are concentrated among individuals living alone. We estimate the rise of remote work explains about a third of the increase in isolation and mental distress between 2011−2019 and 2022−2024.
Dataset DOI: 10.5061/dryad.w3r22816p
Description of the data and file structure
This repository contains the data and code required to replicate the analyses in Emanuel, Harrington and Pallais (in review), testing the hypothesis that remote work increases time alone and mental distress. The repository houses the raw extracts of publicly available datasets and crosswalks needed to merge them.
Files and variables
Folder Structure
The main folder, replication_Home_Alone.zip, contains:
- code
  - main.sh -- a bash script that runs everything
  - analysis_submission.R -- the main script for producing final tables and figures from cleaned data
  - cleaning -- scripts that produce the files in the data/clean folder from raw data that a replicator downloads themselves
  - helper_files -- scripts that define custom functions for the analysis
- data
  - clean
    - atus_clean.csv -- a cleaned, activity-level dataset derived from the ATUS that can be used to produce our output tables
    - atus_person_alldays.csv -- a cleaned, person-level dataset derived from the ATUS that can be used to produce our output tables
    - gss_clean.csv -- a cleaned version of the GSS that can be used to produce our output tables
    - meps_clean.csv -- a cleaned version of the MEPS that can be used to produce our output tables
    - nhis_clean.csv -- a cleaned version of the NHIS that can be used to produce our output tables
    - psid_clean.csv -- a cleaned version of the PSID that can be used to produce our output tables
  - int
    - teleworking_from_soc -- derived crosswalk that is used in our analysis
  - raw
    - atus -- empty, but where raw downloaded data should go
    - crosswalks
      - meps_medications_with_mh_flags.csv -- categorization of MEPS prescription medications as being used for mental health conditions
      - 2018-occupation-code-list_2010-to-2018-crosswalk -- Census 2018 occupation code list for crosswalk
      - 2010-occ-codes_2002to2010xwalk.csv -- Census crosswalk to 2010 occupation codes
      - national_M2018_dl.csv -- BLS national occupation counts for 2018
      - 2010-occ-codes_2010OccCodeList.csv -- Census occupation code list for crosswalk
      - atus_ambient_socialization_final.csv -- categorization of ATUS activities as having ambient socialization or being totally isolated
      - 2002-census-occupation-codes.csv -- Census occupation codes for crosswalking
      - 2018-occupation-code-list_2018-census-occ-code-list -- Census 2018 crosswalk
      - nhis_occ_dn.csv -- categorization of occupations' remotability in the NHIS occupation scheme
    - gss -- empty, but where raw downloaded data should go
    - meps -- empty, but where raw downloaded data should go
    - nhis -- empty, but where raw downloaded data should go
    - psid -- empty, but where raw downloaded data should go
    - clean
- output -- all folders are empty, ready to receive the output from running the code
  - tables
  - figures
  - stats
Variables
Control variables, their names and definitions, are listed in SM Table M.9. Variable names in the cleaned datasets are given below.
NHIS
(Source: data/clean/nhis_clean.csv)
- aeffort: In the past 30 days, how often did you feel that everything was an effort? (none/little/some/most/all of the time)
- age: Age
- age2: Age as a quadratic.
- age3: Age as a cubic.
- ahopeless: In the past 30 days, how often did you feel hopeless? (none/little/some/most/all of the time)
- anervous: In the past 30 days, how often did you feel nervous? (none/little/some/most/all of the time)
- arestless: In the past 30 days, how often did you feel restless? (none/little/some/most/all of the time)
- asad: In the past 30 days, how often did you feel so sad nothing could cheer you up? (none/little/some/most/all of the time)
- aworthless: In the past 30 days, how often did you feel worthless? (none/little/some/most/all of the time)
- dep_weight: Weights used for analyses involving K6 distress (k6_distress_nhis).
- dn_prefix: The appropriate occupation prefix to merge on the Dingel-Neiman crosswalk.
- educ_level: The respondent's level of education, encoded.
- employed: Whether the respondent is employed, encoded.
- empstat: The employment status of the respondent, encoded.
- gender: The respondent's gender, encoded.
- k6_distress_nhis: The respondent's K6 Psychological Distress measure.
- living_with_partner_children: Whether the respondent lives with children or a partner.
- marst: Whether the respondent is married, encoded.
- mean_genai_index: The respondent's occupational exposure to generative AI.
- occ: The respondent's occupation group.
- occ_title: The title of the respondent's occupation group.
- parental_status: The respondent's parental status.
- post: Whether the respondent was surveyed post-COVID.
- race_general: The respondent's race.
- sampweight: Weights used for analyses involving frequency of depression (scaled_depfreq).
- scaled_depfreq: Number of days a year that respondent feels depressed.
- share_mh_care_either: Whether respondent saw a mental health professional or needed but could not afford mental health care in the last year.
- share_needed_no_afford_mh: Whether respondent needed but could not afford mental health care in the last year.
- share_saw_doctor_last_yr: Whether respondent saw a doctor in the last year.
- share_saw_mh: Whether a respondent saw a mental health professional in the last year.
- student: Whether the respondent is a student, encoded.
- unemployed: Whether the respondent is unemployed, encoded.
- weighted_teleworkable_avg: The continuous remotability measure of the respondent's occupation group.
- weighted_teleworkable_sd: Because the respondents are given occupation groups, this variable is the standard deviation of that occupation group's remotability.
- worked_last_yr: Whether the respondent worked in the last year, encoded.
- year: Year of survey.
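The NHIS variables above are enough to set up the comparison described in the abstract. The following is a minimal sketch, not the authors' specification (their analysis is in R, in analysis_submission.R): it computes a weighted two-by-two difference-in-differences from group means. The function names, the 0.5 cutoff that splits occupations into remoteable and non-remoteable, the 0/1 coding of post, and the toy rows are all assumptions made for illustration.

```python
def weighted_mean(values, weights):
    """Weighted average of `values` using survey `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def did_estimate(rows, outcome="k6_distress_nhis", weight="dep_weight",
                 remote="weighted_teleworkable_avg", cutoff=0.5):
    """Change (post minus pre) among remoteable jobs minus the same change
    among non-remoteable jobs. The cutoff is a hypothetical threshold."""
    cells = {}
    for r in rows:
        key = (r[remote] >= cutoff, bool(r["post"]))
        values, weights = cells.setdefault(key, ([], []))
        values.append(r[outcome])
        weights.append(r[weight])
    m = {k: weighted_mean(v, w) for k, (v, w) in cells.items()}
    return (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])


def row(post, remote, k6, w):
    return {"post": post, "weighted_teleworkable_avg": remote,
            "k6_distress_nhis": k6, "dep_weight": w}


# Toy rows (made up): distress rises by 2 in remoteable jobs and by 0 elsewhere.
toy = [
    row(0, 0.9, 4, 1.0), row(0, 0.8, 6, 1.0),   # remoteable, pre
    row(1, 0.9, 8, 2.0), row(1, 0.7, 5, 1.0),   # remoteable, post
    row(0, 0.1, 5, 1.0),                        # non-remoteable, pre
    row(1, 0.2, 6, 1.0), row(1, 0.0, 4, 1.0),   # non-remoteable, post
]
did = did_estimate(toy)
```

The sketch omits controls, fixed effects, and standard errors, so it only illustrates how post, the remotability measure, and the survey weight fit together.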
MEPS
(Source: data/clean/meps_clean.csv)
- age: Age
- age2: Age as a quadratic.
- age3: Age as a cubic.
- avg_teleworkable_meps: The continuous remotability measure of the respondent's occupation group.
- educ_level: The respondent's level of education, encoded.
- gender: The respondent's gender, encoded.
- general_race: The respondent's race, encoded.
- living_alone_famsize: Whether the respondent lives alone (i.e., whether their family size is one).
- living_with_partner_children: Whether the respondent lives with children or a partner.
- marst: Whether the respondent is married, encoded.
- meps_num: The occupation group of the respondent, as defined by MEPS.
- occcat: The name of the respondent's occupation group, as defined by MEPS.
- perweight: The weighting variable used for all MEPS analyses.
- post: Whether the respondent was surveyed post-COVID.
- share_wtr_on_anxiety_meds: Whether the respondent is on prescription medication to treat anxiety.
- share_wtr_on_any_meds: Whether the respondent is on prescription medication to treat any medical issues.
- share_wtr_on_any_mh_meds: Whether the respondent is on prescription medication to treat any mental health issues.
- share_wtr_on_depression_meds: Whether the respondent is on prescription medication to treat depression.
- share_wtr_on_depression_or_anxiety_meds: Whether the respondent is on prescription medication to treat depression or anxiety.
- share_wtr_on_non_mh_meds: Whether the respondent is on prescription medication to treat any non-mental health issues.
- var_teleworkable_meps: Because the respondents are given occupation groups, this variable is the standard deviation of that occupation group's remotability.
- year: Year of survey.
PSID
(Source: data/clean/psid_clean.csv)
- age: Age
- age2: Age as a quadratic.
- age3: Age as a cubic.
- cross_section_weight: The weighting variable used for all PSID analyses (renamed to weights in the analysis file for consistency).
- educ_level: The respondent's level of education.
- employed: Whether the respondent was employed at the time of the survey, encoded.
- everything_effort: In the past 30 days, how often did you feel that everything was an effort? (none/little/some/most/all of the time)
- fam_num1968: The family number, constructed by the PSID.
- feelings_interfere: How often the feelings from the K-6 Distress Scale interfere with daily life in the past 30 days (a lot, some, a little, or not at all)
- filledin_avg_zerovar: An indicator for if the respondent's occupation remotability was filled in by an average and if the variance of the occupations' remotability in that group is zero.
- filledin_by_average: An indicator for whether the respondent's occupation remotability was filled in by an average of similar occupations' remotabilities.
- filledin_by_hand: An indicator for whether the respondent's occupation remotability was filled in by hand.
- gender: The respondent's gender.
- genaiexp_estz_total: The respondent's occupational exposure to generative AI.
- general_race: The respondent's race, encoded.
- hopeless: In the past 30 days, how often did you feel hopeless? (none/little/some/most/all of the time)
- k6_distress: The respondent's Psychological Distress K6 Scale.
- live_alone: Whether the respondent lives alone (i.e., whether their family size is one), encoded.
- living_with_partner_children: Whether the respondent lives with children or a partner.
- marst_general: The respondent's marital status, encoded.
- nervous: In the past 30 days, how often did you feel nervous? (none/little/some/most/all of the time)
- occ: The respondent's occupation.
- parental_status: The respondent's parental status, encoded.
- period: A helper variable to determine which census occupation regime the respondent falls under, depending on the year of the survey (please see technical appendix for more information).
- person_num1968: A person in the panel, constructed by the PSID.
- post: Whether the respondent was surveyed after COVID.
- previous_dn_from_soc: The respondent's previous occupation remotability (if they are unemployed at the time of the survey).
- previous_occ: The respondent's previous occupation (if they are unemployed at the time of the survey).
- relation_hd: How the respondent is related to the head of the household (needed to determine who is asked the mental health questions and can be included in the analysis).
- restless: In the past 30 days, how often did you feel restless? (none/little/some/most/all of the time)
- sadness: In the past 30 days, how often did you feel so sad that nothing could cheer you up? (none/little/some/most/all of the time)
- soc2010: The respondent's occupation, harmonized to the 2010 SOC Code.
- student: Whether the respondent is a student, encoded.
- unemployed: Whether the respondent is unemployed, encoded.
- var_teleworkable: If the respondent's occupation remotability was filled in by an average of remotabilities of similar occupations, then this variable contains the variance of that occupation group.
- wfh_index_from_soc: The remotability of the respondent's current occupation.
- who_respondent: Indicates whether the respondent was asked the mental health questions and can be included in the analysis.
- worked_last_6mo: Whether the respondent worked in the last six months.
- worthless: In the past 30 days, how often did you feel worthless? (none/little/some/most/all of the time)
- year: The year of the survey.
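For readers unfamiliar with the K6 scale, the sketch below shows the standard scoring: each of the six items is scored from 0 (none of the time) to 4 (all of the time) and the scores are summed to a 0-24 total. It is an illustration only. The answer labels in K6_POINTS are an assumption about how responses are encoded, and the k6_distress column in psid_clean.csv is already computed and may be coded differently.

```python
# The six PSID item columns listed above.
K6_ITEMS = ("everything_effort", "hopeless", "nervous",
            "restless", "sadness", "worthless")

# Hypothetical label encoding; the cleaned file may use numeric codes instead.
K6_POINTS = {"none": 0, "little": 1, "some": 2, "most": 3, "all": 4}


def k6_score(row):
    """Sum the six item scores (0-24); return None if any answer is missing
    or not one of the assumed labels."""
    total = 0
    for item in K6_ITEMS:
        answer = row.get(item)
        if answer not in K6_POINTS:
            return None
        total += K6_POINTS[answer]
    return total


complete = {item: "some" for item in K6_ITEMS}
incomplete = dict(complete, hopeless=None)
score_complete = k6_score(complete)      # six items at 2 points each
score_incomplete = k6_score(incomplete)  # missing item, so no score
```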
GSS
(Source: data/clean/gss_clean.csv)
- AGE: Age
- AGE2: Age as a quadratic.
- AGE3: Age as a cubic.
- educ_level: The respondent's level of education, encoded.
- employed: Whether or not the respondent is employed, encoded.
- filledin_avg_zerovar: An indicator for if the respondent's occupation remotability was filled in by an average and if the variance of the occupations' remotability in that group is zero.
- filledin_by_average: An indicator for whether the respondent's occupation remotability was filled in by an average of similar occupations' remotabilities.
- filledin_by_hand: An indicator for whether the respondent's occupation remotability was filled in by hand.
- gender: The respondent's gender, encoded.
- genaiexp_estz_total: The respondent's occupational exposure to generative AI.
- HLTHMNTL: Respondent's mental health, mood, and ability to think (Excellent, very good, good, fair, poor.)
- marst: The respondent's marital status, encoded.
- OCC10: The respondent's occupation.
- parental_status: The respondent's parental status, encoded.
- politics: The respondent's political affiliation (Democrat, Republican, Independent/Other).
- post: Whether the respondent was surveyed after COVID.
- RACE: The respondent's race, encoded.
- soc2010: The respondent's occupation, harmonized to the 2010 SOC codes.
- student: Whether the respondent is a student, encoded.
- unemployed: Whether the respondent is unemployed, encoded.
- var_teleworkable: Variance of remotability across the respondent's occupation group (when remotability was filled in by an average).
- wfh_index_from_soc: The remotability of the respondent's current occupation.
- WRKSTAT: The respondent's employment status.
- WTSSNRPS: The weighting variable used for all GSS analyses (renamed to weights in the analysis file for consistency).
- YEAR: The year of the survey.
ATUS
(Sources: data/clean/atus_person_alldays.csv)
- age: Age
- age2: Age as a quadratic
- age3: Age as a cubic
- all_socialization_is_ambient: Whether all socialization the respondent engaged in that day was ambient (i.e., none was direct).
- all_work_remote: Whether all of the respondent's work that day was remote.
- any_socializing_with_friends_5to10: Whether the respondent socialized with any of their friends from 5-10 PM that day.
- day_week: The day of the week that the respondent was surveyed.
- educ_level: The respondent's level of education, encoded.
- filledin_avg_zerovar: An indicator for if the respondent's current occupation remotability was filled in by an average and if the variance of the occupations' remotability in that group is zero.
- filledin_avg_zerovar_cps: An indicator for if the respondent's previous occupation remotability was filled in by an average and if the variance of the occupations' remotability in that group is zero.
- filledin_by_average: An indicator for whether the respondent's current occupation remotability was filled in by an average of similar occupations' remotabilities.
- filledin_by_average_cps: CPS occupation analogue of filledin_by_average for the respondent's previous occupation.
- filledin_by_hand: An indicator for whether the respondent's current occupation remotability was filled in by hand because similar occupations did not exist.
- filledin_by_hand_cps: CPS occupation analogue of filledin_by_hand for the respondent's previous occupation.
- gender: The respondent's gender, encoded.
- general_race: The respondent's race, encoded.
- hours_alone: The total hours the respondent spent alone.
- hours_ambient_socialization: The total hours the respondent spent ambiently socializing.
- hours_at_home: The total hours the respondent spent at home.
- hours_total_isolation: The total hours the respondent spent in total isolation.
- hours_with_others: The total hours the respondent spent with others.
- hours_worked: The total hours the respondent spent working.
- hours_worked_alone: The total hours the respondent spent working alone.
- hours_worked_at_home: The total hours the respondent spent working at home.
- hours_worked_at_office: The total hours the respondent spent working in the office.
- hours_worked_with_others: The total hours the respondent spent working with others.
- live_alone_hh_size: Whether the respondent lives alone according to the definition of household size (i.e., whether the household size is one.)
- living_with_partner_children: Whether the respondent lives with children or a partner.
- marst_general: The respondent's marital status, encoded.
- month: The month that the respondent participated in the ATUS.
- month_cps8: The month that the respondent participated in the CPS (because ATUS respondents are a subsample of CPS respondents).
- occ: The respondent's occupation.
- occ_cps8: The occupation that the respondent held at the time of the CPS.
- occ2010: The respondent's occupation, harmonized to the 2010 Census codes.
- parental_status: The respondent's parental status.
- post: Whether the survey was administered post-COVID.
- recently_unemployed: Whether the respondent was unemployed at the time of the ATUS but was not unemployed at the time of the CPS (approximately six months prior).
- retired: Whether the respondent is retired.
- share_all_activities_alone: Whether the respondent spent all their activities alone that day.
- share_all_activities_alone_at_home: Whether the respondent spent all their activities alone and at home that day.
- share_all_activities_at_home: Whether the respondent spent all their activities at home that day.
- share_all_activities_total_isolation: Whether the respondent spent all their activities in total isolation that day (i.e., none of their activities involved either direct or ambient socialization).
- share_all_socialization_is_ambient: Whether the respondent spent all of their activities alone or ambiently socializing with others.
- share_all_work_remote: Whether the respondent worked entirely remotely that day.
- share_any_socializing_with_friends_5to10: Share-style version of any_socializing_with_friends_5to10.
- share_travel_for_work_rev: Whether the respondent traveled to work that day (reverse-coded, so 0 = traveled for work and 1 = did not travel for work).
- share_working_office_main_rev: Whether the respondent worked in the office that day (reverse-coded, so 0 = worked in the office and 1 = did not work in the office).
- weekday: Whether the activity diary was recorded on a weekday.
- weekend: Whether the activity diary was recorded on a weekend.
- weights: The weights used for all ATUS analyses.
- wfh_index_from_soc: The remotability of the respondent's current occupation.
- wfh_index_from_soc_cps: The remotability of the respondent's previous occupation (if the respondent was unemployed at the time of the ATUS).
- working_office_main: An indicator for whether the respondent did the majority of their work from the office that day.
- year: The year that the respondent took the survey.
- year_cps8: The year that the respondent participated in the CPS.
Teleworkability Crosswalk
(Sources: data/int/teleworking_from_soc.csv)
- filledin_by_hand: Whether this teleworkability index was filled in by hand (only in cases when the occupation was very clearly remote or not remote, was missing in the original Dingel-Neiman index, AND did not have any adjacent occupations from which an average teleworkability could be imputed).
- filledin_by_average5: Whether the occupation was filled in with an average of the teleworkability of occupations that shared the first five digits of the SOC with that occupation.
- filledin_by_average4: Whether the occupation was filled in with an average of the teleworkability of occupations that shared the first four digits of the SOC with that occupation.
- filledin_by_average: Whether the occupation was filled in by an average at all (i.e., whether filledin_by_average5 or filledin_by_average4 is TRUE).
- filledin_avg_zerovar: If the occupation was filled in by an average, an indicator for whether the occupations contributing to that average all have the same teleworkability index.
- occ: Occupation code.
- period: The period of the census occupation code regime that the occupation-teleworkability mapping applies to.
- title: The title of that occupation.
- var_teleworkable_first5: The variance of the teleworkability of the occupations that shared the first five digits of the SOC with that occupation.
- var_teleworkable_first4: The variance of the teleworkability of the occupations that shared the first four digits of the SOC with that occupation.
- var_teleworkable: If the occupation used an average of related occupations for its teleworkability index, then this is the variance of those contributing occupations' teleworkability. If the occupation did not use an average, then this value is zero.
- wfh_index_from_soc: Teleworkability index, derived from Dingel-Neiman.
- years: The years of the census occupation code regime that the occupation-teleworkability mapping applies to.
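The fill-in rules described by these variables can be sketched as follows. This is a hedged illustration, not the code in cleaning/00_dn_crosswalk.R: it assumes "first five digits" means the first five digits of the SOC code with the hyphen removed, uses the population variance (the README does not say which variance is used), and the index values in the toy dictionary are made up. The function name impute_teleworkability is hypothetical.

```python
from statistics import mean, pvariance


def impute_teleworkability(known, soc):
    """Look up `soc` in `known`; if absent, average the occupations that share
    its first five digits, falling back to the first four digits."""
    result = {
        "wfh_index_from_soc": known.get(soc),
        "filledin_by_average5": False,
        "filledin_by_average4": False,
        "var_teleworkable": 0.0,
    }
    if soc not in known:
        digits = soc.replace("-", "")
        for n in (5, 4):
            peers = [value for code, value in known.items()
                     if code.replace("-", "")[:n] == digits[:n]]
            if peers:
                result["wfh_index_from_soc"] = mean(peers)
                result[f"filledin_by_average{n}"] = True
                result["var_teleworkable"] = pvariance(peers)
                break
    result["filledin_by_average"] = (
        result["filledin_by_average5"] or result["filledin_by_average4"])
    result["filledin_avg_zerovar"] = (
        result["filledin_by_average"] and result["var_teleworkable"] == 0.0)
    return result


# Toy index values (made up for illustration).
known = {"15-1131": 1.0, "15-1132": 1.0, "15-1141": 0.5, "47-2061": 0.0}
five = impute_teleworkability(known, "15-1133")   # two peers share five digits
four = impute_teleworkability(known, "15-1151")   # falls back to four digits
none = impute_teleworkability(known, "47-9999")   # no peers at either level
```

An occupation with no peers at either level keeps a missing index, which corresponds to the cases the README describes as candidates for filledin_by_hand.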
Ambient Socialization
(Sources: data/raw/crosswalks/atus_ambient_socialization_final.csv)
- activity: The code of the activity, which corresponds to the codes used in the ATUS.
- chat_ambient: Whether the activity can be tagged as ambient socialization.
- description: The description of the activity.
- uncertainty: Whether there is any uncertainty about whether this activity may involve ambient socialization.
2002 Occupation List
(Sources: data/raw/crosswalks/2002-census-occupation-codes.csv)
- V2: The title of the occupation.
- V3: The 2002 Census Occupation code.
- V4: The corresponding SOC code.
2010 Occupation List
(Sources: data/raw/crosswalks/2010-occ-codes_2010OccCodeList.csv)
- Occupation 2010 Description: The title of the occupation.
- 2010 Census Code: The 2010 Census Occupation code.
- 2010 SOC Code: The corresponding SOC code.
2018 Occupation List
(Sources: data/raw/crosswalks/2018-occupation-code-list_2018-census-occ-code-list.csv)
- 2018 Census Title: The title of the occupation.
- 2018 Census Code: The 2018 Census Occupation code.
Medications Crosswalk
(Sources: data/raw/crosswalks/meps_medications_with_mh_flags.csv)
- mental_health_related: Whether the medicine is mental health related.
- mh_flag: A 0/1 that is equivalent to mental_health_related.
- mh_use: What kinds of mental health issues the medication is used to treat.
- n: The number of times MEPS respondents required this medication within the sample.
- rxname: The name of the medicine
- rxname_upper: The name of the medicine, in all caps.
2018 to 2010 Occupation Crosswalk
(Sources: data/raw/crosswalks/2018-occupation-code-list_2010-to-2018-crosswalk.csv)
- 2010 SOC code: The 2010 SOC occupation code.
- 2010 Census code: The 2010 Census occupation code.
- 2010 Census Title: The occupation title in 2010.
- 2018 SOC code: The 2018 SOC occupation code.
- 2018 Census code: The 2018 Census occupation code.
- 2018 Census Title: The occupation title in 2018.
2010 to 2002 Occupation Crosswalk
(Sources: data/raw/crosswalks/2010-occ-codes_2002to2010xwalk.csv)
- 2002 SOC code: The 2002 SOC occupation code.
- 2002 Census code: The 2002 Census occupation code.
- 2002 Census Title: The occupation title in 2002.
- 2010 SOC code: The 2010 SOC occupation code.
- 2010 Census code: The 2010 Census occupation code.
- 2010 Census Title: The occupation title in 2010.
Occupation counts
(Sources: data/raw/crosswalks/national_M2018_dl.csv)
- OCC_CODE: The occupation code, according to Occupational Employment Statistics (OES).
- OCC_GROUP: The level of the occupation group (e.g., detailed)
- TOT_EMP: The total number of employees in that occupation group in 2018.
NHIS Occupation Group Crosswalk
(Sources: data/raw/crosswalks/nhis_occ_dn.csv)
- Code: The occupation group code used in NHIS.
- Occupation Title: The name of the occupation group used in the NHIS.
- Period: The years during which this crosswalk applies.
- Dingel-Neimann prefix: The three-digit SOC prefix that corresponds to the NHIS group.
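One way the occupation counts and the prefix crosswalk could combine into group-level measures such as weighted_teleworkable_avg and weighted_teleworkable_sd is an employment-weighted mean and standard deviation over the detailed occupations that share a prefix. The sketch below assumes that TOT_EMP is the weight and that prefixes are matched with a simple string comparison; neither is confirmed by this README, and the function name and toy values are hypothetical.

```python
from math import sqrt


def group_remotability(occupations, prefix):
    """Employment-weighted mean and SD of teleworkability across detailed
    occupations whose code starts with `prefix`. Returns None if no
    occupation matches."""
    members = [(o["wfh_index_from_soc"], o["TOT_EMP"])
               for o in occupations if o["OCC_CODE"].startswith(prefix)]
    total = sum(weight for _, weight in members)
    if total == 0:
        return None
    avg = sum(x * weight for x, weight in members) / total
    var = sum(weight * (x - avg) ** 2 for x, weight in members) / total
    return avg, sqrt(var)


# Toy rows (index values and employment counts are made up).
occupations = [
    {"OCC_CODE": "15-1131", "wfh_index_from_soc": 1.0, "TOT_EMP": 300},
    {"OCC_CODE": "15-1141", "wfh_index_from_soc": 0.5, "TOT_EMP": 100},
    {"OCC_CODE": "29-1141", "wfh_index_from_soc": 0.0, "TOT_EMP": 500},
]
avg, sd = group_remotability(occupations, "15-1")
```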
Code/software
Software
The analysis uses both Stata and R. Within Stata 18.0, there are no additional packages needed. It was last run in R 4.1.2, using the following packages:
• ggpubr 0.4.0
• ggforce 0.3.3
• remotes 2.4.2.1
• gridExtra 2.3
• formattable 0.2.1
• ggdist 3.3.3
• fastDummies 1.6.3
• readxl 1.3.1
• cdlTools 1.13
• chron 2.3-62
• haven 2.4.3
• Hmisc 4.6-0
• Formula 1.2-5
• survival 3.2-13
• lattice 0.20-45
• xlsx 0.6.5
• stargazer 5.2.3
• modi 0.1.2
• readstata13 0.10.1
• beepr 1.3
• margins 0.3.26
• janitor 2.1.0
• boot 1.3-28
• scales 1.4.0
• ggrepel 0.9.1
• xtable 1.8-4
• forcats 0.5.1
• dplyr 1.1.2
• purrr 1.0.1
• readr 2.1.1
• tidyr 1.1.4
• tibble 3.2.1
• tidyverse 1.3.1
• stringr 1.5.0
• stringi 1.7.12
• lubridate 1.8.0
• ggplot2 4.0.2
• RColorBrewer 1.1-3
• ipumsr 0.4.5
• ggthemes 4.2.4
• changepoint 2.2.4
• zoo 1.8-12
• cpt 1.0.2
• lfe 2.8-7.1
• Matrix 1.3-4
Code
main.sh is a bash program that calls all of the other programs and produces all figures, tables, and statistics in the paper and supplementary materials. A replicator who wants to use only the cleaned, derived datasets needs to run only analysis_submission.R.
Cleaning Code
- cleaning/00_dn_crosswalk.R applies the Dingel-Neiman index [1] and matches census occupation codes across different occupation regimes.
- cleaning/00_genAI_crosswalk.R applies the generative AI exposure index [7] and matches census occupation codes across different occupation regimes.
- cleaning/00_prep_atus.do reads in and combines ATUS data.
- cleaning/00_prep_psid.do reads in PSID data to make it more accessible in R, renames variables, and identifies the weighting scheme.
- cleaning/01_clean_atus.R cleans activity- and person-level time use data, merging into one clean dataset.
- cleaning/01_clean_gss.R cleans the GSS data and saves a clean dataset.
- cleaning/01_clean_meps.R cleans the MEPS data and assigns remotability measures to each occupation, and saves a clean dataset.
- cleaning/01_clean_nhis.R cleans the NHIS data, merging in the relevant crosswalks, and saves a clean dataset.
- cleaning/01_clean_psid.R cleans the PSID data, identifies living arrangements, occupation remotability, and saves a clean dataset.
Analysis Code
- helper_files/cleanup.R removes files after each figure/table to maintain a clean workspace.
- helper_files/helpers.R defines user-written helper functions that, for example, give graphs a consistent look and make regression results easier to access.
- analysis_submission.R runs the cleaning code labeled with '01_' and also runs the analysis code to produce all the figures, tables, and in-text statistics.
Replication Workflow
To replicate our findings, please use the following steps:
- Download the original datasets from the hosts listed in the Data Access section below. (The publicly available datasets have too many indirect identifiers to be posted on Dryad.)
- Edit main.sh, code/00_prep_atus.do, code/00_prep_psid.do, and code/analysis.R so that the default path points to where you have saved the repository on your own machine.
- Run main.sh, a bash script.
- To clean the data from scratch, set the clean_from_scratch variable in analysis_submission.R to TRUE (on line 83). Otherwise, the code uses the data files in data/clean to run the analyses.
To replicate without downloading raw datasets, simply run code/analysis_submission.R.
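Before running the analysis script on its own, it can help to confirm that the cleaned files listed under Folder Structure are present. The helper below is a small convenience sketch and is not part of the repository; the function name missing_clean_files is hypothetical, and repo_root should be wherever the archive was unzipped.

```python
from pathlib import Path

# Cleaned files listed under data/clean in the folder structure.
CLEAN_FILES = (
    "atus_clean.csv",
    "atus_person_alldays.csv",
    "gss_clean.csv",
    "meps_clean.csv",
    "nhis_clean.csv",
    "psid_clean.csv",
)


def missing_clean_files(repo_root):
    """Return the names of expected cleaned files not found in data/clean."""
    clean_dir = Path(repo_root) / "data" / "clean"
    return [name for name in CLEAN_FILES if not (clean_dir / name).is_file()]


# A path that does not exist reports every file as missing.
missing = missing_clean_files("path/that/does/not/exist")
```

An empty list means every cleaned dataset is in place and code/analysis_submission.R can read from data/clean.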
Within code/analysis_submission.R, each exhibit is identified in comments by its number in the paper or supplementary materials. For example, Figure 1 is identified by
“####=== Fig 1: Shifts in Work ===###”
As such, each exhibit can be jumped to using RStudio's document outline feature.
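Outside RStudio, the same comment markers can be located with a short script. This is a sketch that assumes every exhibit header follows the shape of the Figure 1 example above (leading hashes and equals signs, a title, then trailing equals signs and hashes); the function name list_exhibits and the sample script text are hypothetical.

```python
import re

# Matches comment lines shaped like: ####=== Fig 1: Shifts in Work ===###
EXHIBIT = re.compile(r"^#+=+\s*(.+?)\s*=+#+\s*$")


def list_exhibits(script_text):
    """Return (line_number, title) pairs for exhibit header comments."""
    found = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        match = EXHIBIT.match(line.strip())
        if match:
            found.append((number, match.group(1)))
    return found


# Made-up stand-in for the contents of the R script.
sample = "x <- 1\n####=== Fig 1: Shifts in Work ===###\nplot(x)\n"
exhibits = list_exhibits(sample)
```

To scan the real script, pass the text of code/analysis_submission.R, for example by reading the file with pathlib.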
Access information
Data Access
All data used in this project are publicly available. Below we delineate the sources for each component of the data.
Dingel-Neiman Index (D-N) data were downloaded from the authors' GitHub on August 27, 2025. A copy of the data is provided as part of this archive. The data are in the public domain [1].
American Time Use Survey (ATUS) data were downloaded from IPUMS on February 26, 2026 [2]. We use data from 2011-2024, extracting the relevant variables on all households. A copy of the data is provided as part of this archive.
ATUS does not allow for redistribution, except for the purpose of replication archives. Permissions as per https://www.ipums.org/about/terms have been obtained. The attached data file is intended only for replication purposes. Individuals are not to redistribute the data without permission. For all other uses of these data, please access data directly via the url above.
Panel Study of Income Dynamics (PSID) data were downloaded from https://psidonline.isr.umich.edu/default.aspx on February 25, 2026 [3]. We use data from 2011-2023, extracting the relevant variables on all households. The data are in the public domain.
PSID does not allow reposting of data, except in their own repository, per https://simba.isr.umich.edu/U/CondUse.aspx. As such, the extract we use must be downloaded separately from https://doi.org/10.3886/ICPSR303102.v1. It should be saved into the "data/raw" folder.
National Health Interview Survey (NHIS) data were downloaded from IPUMS on February 26, 2026 [4]. We use data from 2011-2024, extracting the relevant variables on all households.
NHIS does not allow for redistribution, except for the purpose of replication archives. Permissions as per https://www.ipums.org/about/terms have been obtained. The attached data file is intended only for replication purposes. Individuals are not to redistribute the data without permission. For all other uses of these data, please access data directly via the url above.
Medical Expenditure Panel Survey (MEPS) data were downloaded from IPUMS on February 26, 2026 [5]. We use data from 2011-2023, extracting the relevant variables on all households. A copy of the data is provided as part of this archive.
MEPS does not allow for redistribution, except for the purpose of replication archives. Permissions as per https://www.ipums.org/about/terms have been obtained. The attached data file is intended only for replication purposes. Individuals are not to redistribute the data without permission. For all other uses of these data, please access data directly via the url above.
General Social Survey (GSS) data were downloaded from NORC on February 25, 2026 [6]. We use data from 2018 and 2021, when the relevant mental health variable is available, extracting the relevant variables on all survey respondents. A copy of the data is provided as part of this archive. The data are in the public domain.
AI Index (AI) data were downloaded from the authors' website, on February 25, 2026 [7]. A copy of the data is provided as part of this archive. The data are in the public domain.
Ambient Socialization Activities. We created a list of activities that included ambient socialization (See SM Section A.10), which is included here.
Mental Health Medications. We created a list of medications that are commonly used to treat anxiety, depression, and other mental health issues (See SM Section A.5), which is included here.
Occupation Groupings. We created a crosswalk from occupation groupings in the NHIS and MEPS to prefixes of the SOC codes available in the original Dingel-Neiman crosswalks, which are included here.
Census Occupation Crosswalks data were downloaded from https://www.census.gov/topics/employment/industry-occupation/guidance/code-lists.html, the Census' Industry and Occupation Code Lists & Crosswalks database, on December 30, 2024 (2002 Census Crosswalk from 2000 SOC lists); February 27, 2026 (2002 to 2010 Census Occupation Crosswalk); August 14, 2025 (2010 to 2018 Census Occupation Crosswalk); and December 30, 2024 (Census 2000 Codes to SOC 2000 Codes; these are provided in PDF format, so we used OCR to convert them to Excel files, which are named occ2000_1, occ2000_2, and occ2000_3 in the crosswalk raw data). A copy of the data is provided as part of this archive. The data are in the public domain.
Bureau of Labor Statistics data were downloaded via Dingel and Neiman's makefile. We use BLS-provided occupation counts (national_M2018_dl.xlsx), the crosswalk from O*NET SOC codes to Census SOC codes (2010_to_SOC_Crosswalk.xlsx), and the crosswalk from the Occupational Employment Statistics (OES) occupation codes to those of the Census (oes_2019_hybrid_structure.xlsx), all downloaded on August 27, 2025. We use these specific files because they were used to construct the Dingel-Neiman index, and we follow Dingel and Neiman's approach of aggregating remotability indices for individual occupations up to occupation groups. The data are in the public domain.
References
[1] Jonathan I Dingel and Brent Neiman. How Many Jobs Can be Done at Home? Journal of Public Economics, 189:104235, 2020.
[2] Sarah M. Flood, Liana C. Sayer, Daniel Backman, and Annie Chen. American Time Use Survey data extract builder: Version 3.3 [dataset], 2025.
[3] Social Research Center. Panel Study of Income Dynamics, public use dataset, 2025.
[4] Lynn A. Blewett, Julia A. Rivera Drew, Andrew Fenelon, Miriam L. King, Kari C. W. Williams, Daniel Backman, Etienne Breton, Grace Cooper, and Stephanie Richards. IPUMS Health Surveys: National Health Interview Survey, version 8.1 [dataset], 2025.
[5] Lynn A. Blewett, Julia A. Rivera Drew, Andrew Fenelon, Daniel Backman, Etienne Breton, Grace Cooper, Stephanie Richards, and Renae Rogers. IPUMS Health Surveys: Medical Expenditure Panel Survey, version 3.0 [dataset], 2025.
[6] Tom W. Smith, Michael Davern, Jeremy Freese, and Stephen L. Morgan. General Social Surveys, 1972–2018 [machine-readable data file], 2019. Principal Investigator: Tom W. Smith; Co-Principal Investigators: Michael Davern, Jeremy Freese, and Stephen L. Morgan; Sponsored by the National Science Foundation; NORC ed.
[7] Gregor Schubert. Organizational technology ladders: Remote Work and Generative AI Adoption. Available at SSRN, 2025.
[8] U.S. Census Bureau. Industry and Occupation Code Lists & Crosswalks, 2025. Page last revised May 6, 2025.
[9] U.S. Bureau of Labor Statistics. Employment, 2025. Last modified September 15, 2025.
Human subjects data
All of the datasets are publicly available in de-identified form. They may be downloaded in their original form from the data hosts. The datasets posted here are derived datasets in which some variables have been masked to further comply with Dryad's de-identification rules. In particular, we have masked the control variables, including gender, marital status, parental status, race, and educational attainment, to ensure we have sufficiently few indirect identifiers.
