Loneliness and well-being in Finnish immigrants: A multimodal dataset from wearables and passive data collection
Data files
Version of Jun 11, 2025 (1.97 GB in total):
- Loneliness_Dataset_Jun6.zip (1.97 GB)
- README.md (29.23 KB)
Abstract
This dataset was collected from first-generation immigrants in Finland between 2022 and 2023. Over a 28-day period, 39 participants aged 18 to 65 who were fluent in English and experiencing loneliness (UCLA Loneliness Scale score ≥ 28) contributed to the study. Data were collected with the Samsung Watch Active 2, the Oura Ring, and the AWARE and Centralive smartphone applications. The dataset contains raw photoplethysmogram (PPG), inertial measurement unit (IMU), and air pressure data, as well as processed data on heart rate, heart rate variability, sleep metrics (bedtime, stages, quality), physical activity (steps, active calories, activity types), and smartphone usage patterns (screen time, notifications, call and message logs). Participants also completed ecological momentary assessments (EMA) and weekly surveys, including instruments such as the Beck Depression Inventory (BDI), the Patient Health Questionnaire-9 (PHQ-9), the Perceived Stress Scale, the Sense of Coherence Scale, the Social Connectedness Scale, the Twente Engagement with E-Health Technologies questionnaire, and the UCLA Loneliness Scale. The dataset can be used to study the interplay between loneliness, mental well-being, and the daily behaviors of immigrants in a real-world context.
Overview
The dataset consists of longitudinal physiological, behavioral, and self-reported data collected from first-generation immigrants in Finland during 2022 and 2023. The study included 39 participants aged 18–65, all fluent in English and experiencing loneliness (UCLA Loneliness Scale score ≥28). Data were collected over a 28-day period from multiple sources, including the Samsung Watch Active 2, the Oura Ring, and the AWARE smartphone application.
The dataset includes raw and processed data on cardiovascular health, sleep patterns, physical activity, smartphone usage, and mental health assessments. Daily and weekly ecological momentary assessments (EMA) captured momentary emotional states, while structured surveys administered through Centralive provided insights into participants’ mental health and well-being.
Data and File Structure
At the root of the dataset directory, each participant has an individual folder, named “Participant_x,” where “x” represents a unique participant identifier. These folders contain all data collected from that participant, organized into four subfolders based on data sources:
- Aware: Passive smartphone usage data collected through the AWARE app.
- Oura: Sleep and activity data collected from the Oura Ring.
- Watch: Physiological and motion data from the Samsung Watch Active 2.
- Survey: Self-reported mental health and well-being assessments.
Each subfolder contains Comma-Separated Value (CSV) files with relevant data points. A description of the variables in each file is provided below.
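Because some subfolders are legitimately empty (see "Data Gaps and Exceptions"), it can help to inventory the unzipped dataset before analysis. The sketch below assumes only the folder layout described above (Participant_x folders, each with Aware, Oura, Watch, and Survey subfolders); the function names are hypothetical and not part of the dataset.

```python
from pathlib import Path

# Subfolder names as listed in this README.
SUBFOLDERS = ("Aware", "Oura", "Watch", "Survey")


def inventory(root):
    """Map each Participant_x folder to {subfolder: sorted list of CSV file names}.

    A missing subfolder, or one without CSV files, maps to an empty list.
    """
    result = {}
    for pdir in sorted(Path(root).glob("Participant_*")):
        if not pdir.is_dir():
            continue
        files = {}
        for sub in SUBFOLDERS:
            folder = pdir / sub
            # rglob also finds CSVs in nested folders, in case a subfolder has any.
            files[sub] = sorted(f.name for f in folder.rglob("*.csv")) if folder.is_dir() else []
        result[pdir.name] = files
    return result


def empty_subfolders(inv):
    """Return (participant, subfolder) pairs that contain no CSV files."""
    return [(p, sub) for p, subs in inv.items() for sub, names in subs.items() if not names]
```

For example, `empty_subfolders(inventory("path/to/unzipped/dataset"))` (the path is a placeholder) should list Participant_25's Oura subfolder among the empty ones.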
Data Gaps and Exceptions
Certain expected data are missing because of phone permission settings, real-world device power-saving modes, or participants' choice not to wear the devices:
- Phone permissions: Specific power-management settings prevented continuous background recording.
- Device power-saving modes: Under real-life usage, the wearables sometimes entered a low-power state, interrupting scheduled data captures.
- Participant choice: Participants were informed that they could remove a device if it interfered with normal activities; in some cases, they chose not to wear it consistently.
In addition, the following participant-specific gaps exist:
- Participant 25 (Oura): Participant 25’s Oura device was never connected, so no Oura data exist for this participant, and the Oura subfolder for Participant_25 is empty.
- Watch 12-minute PPG segments (Participants 14, 24, 46): The scheduled 12-minute PPG segments, recorded every two hours, are not available for Participants 14, 24, and 46.
1. Aware (Smartphone Data)
This folder contains passive smartphone usage and behavioral data collected using the AWARE app. Data files include:
- battery.csv: Logs on battery usage and charging events.
- calls.csv: Records of incoming and outgoing calls.
- messages.csv: Logs of sent and received text messages.
- notifications.csv: Information on received notifications.
- screen.csv: Data on screen usage, such as screen-on and screen-off events.
Table 1. Variables in battery.csv
Column Name | Description | Type |
---|---|---|
timestamp | 13-digit timestamp | Timestamp (milliseconds) |
participant | Participant ID | String |
battery_charge_start | Percentage of the battery when starting to charge | Integer |
battery_charge_end | Percentage of the battery when stopping the charge | Integer |
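As a minimal sketch of reading battery.csv with the Python standard library, the example below converts the 13-digit millisecond timestamps to UTC datetimes and computes the percentage points gained per charging event. It assumes the column names in Table 1; the helper names are hypothetical. Because timestamps are date-shifted (see "Additional Notes"), the resulting dates are not the real calendar dates.

```python
import csv
import io
from datetime import datetime, timezone


def to_datetime(ms):
    """Convert a 13-digit millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)


def charging_events(csv_text):
    """Return (time, percentage points gained) for each row of battery.csv."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        gained = int(row["battery_charge_end"]) - int(row["battery_charge_start"])
        events.append((to_datetime(row["timestamp"]), gained))
    return events
```

To use it on a real file, pass the file contents, e.g. `charging_events(open(path, newline="").read())`.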
Table 2. Variables in calls.csv
Column Name | Description | Type |
---|---|---|
timestamp | 13-digit timestamp | Timestamp (milliseconds) |
participant | Participant ID | String |
dur | Length of the call session | Integer |
type | 1 – incoming, 2 – outgoing, 3 – missed | Integer |
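The `type` codes in Table 2 can be decoded to summarize one participant's call behavior. This sketch counts calls and sums `dur` per call type; it assumes the column names above, and the function name is hypothetical. The table does not state the unit of `dur`; Android's call log records call duration in seconds, but this should be verified against the data.

```python
import csv
import io
from collections import Counter

# Codes from the `type` column of calls.csv.
CALL_TYPES = {1: "incoming", 2: "outgoing", 3: "missed"}


def summarize_calls(csv_text):
    """Count calls and sum `dur` per call type for one participant's calls.csv."""
    counts, total_dur = Counter(), Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = CALL_TYPES.get(int(row["type"]), "other")
        counts[label] += 1
        total_dur[label] += int(row["dur"])
    return dict(counts), dict(total_dur)
```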
Table 3. Variables in messages.csv
Column Name | Description | Type |
---|---|---|
timestamp | 13-digit timestamp | Timestamp (milliseconds) |
participant | Participant ID | String |
message_type | 1 – received, 2 – sent | Integer |
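A common derived feature is the number of messages received and sent per day. The sketch below assumes the columns in Table 3 and groups by UTC calendar day of the (date-shifted) timestamp, so "days" here are shifted days rather than participants' real local days; the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone

MESSAGE_TYPES = {1: "received", 2: "sent"}


def daily_message_counts(csv_text):
    """Count received and sent messages per (shifted) UTC calendar day."""
    per_day = defaultdict(lambda: {"received": 0, "sent": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        kind = MESSAGE_TYPES.get(int(row["message_type"]))
        if kind is None:
            continue  # skip codes not documented in Table 3
        day = datetime.fromtimestamp(int(row["timestamp"]) / 1000, tz=timezone.utc).date()
        per_day[day][kind] += 1
    return dict(per_day)
```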
Table 4. Variables in notifications.csv
Column Name | Description | Type |
---|---|---|
timestamp | 13-digit timestamp | Timestamp (milliseconds) |
participant | Participant ID | String |
package_category | Application’s category in Google Play Store | String |
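Since app identifiers were generalized into categories, notification data are most naturally summarized by category. This is a minimal sketch, assuming the columns in Table 4; empty category values are counted as "unknown", which is an illustrative choice rather than a documented convention.

```python
import csv
import io
from collections import Counter


def top_notification_categories(csv_text, n=5):
    """Return the n most frequent package_category values as (category, count) pairs."""
    counts = Counter(
        (row["package_category"] or "unknown").strip()
        for row in csv.DictReader(io.StringIO(csv_text))
    )
    return counts.most_common(n)
```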
Table 5. Variables in screen.csv
Column Name | Description | Type |
---|---|---|
timestamp | 13-digit timestamp | Timestamp (milliseconds) |
participant | Participant ID | String |
screen_status | 0=off, 1=on, 2=locked, 3=unlocked | Integer |
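Screen-on durations can be estimated from screen.csv by pairing events. The sketch below treats a session as an "on" event (1) followed by the next "off" event (0), ignores lock and unlock events (2 and 3), and drops an "on" that never gets a matching "off". This pairing rule is an assumption for illustration; other definitions (e.g., unlock to lock) are equally plausible.

```python
import csv
import io

SCREEN_OFF, SCREEN_ON = 0, 1


def screen_on_durations(csv_text):
    """Durations in seconds of screen-on sessions: each 'on' paired with the next 'off'."""
    events = sorted(
        (int(row["timestamp"]), int(row["screen_status"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    )
    durations = []
    on_at = None
    for ts, status in events:
        if status == SCREEN_ON:
            on_at = ts  # a repeated 'on' restarts the session
        elif status == SCREEN_OFF and on_at is not None:
            durations.append((ts - on_at) / 1000)
            on_at = None
        # lock/unlock events (2, 3) are ignored
    return durations
```

Because the date shift preserves intervals within a participant, these durations are unaffected by it.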
2. Oura (Sleep and Activity Data)
This folder holds data collected from the Oura Ring, covering metrics related to sleep and physical activity. The dataset has been aligned by timestamp to ensure consistency across features.
- Sleep variables ending in “_5min” (e.g., OURA_sleep_hr_5min, OURA_sleep_hypnogram_5min) were sampled every five minutes while participants were asleep.
- Activity variables ending in “_5min” (OURA_activity_class_5min) were recorded every five minutes throughout the day.
- Variables ending in “_1min” (OURA_activity_met_1min) were recorded every minute throughout the day.
- All other features were recorded once per day.
For variable descriptions, refer to the Oura API Documentation (Oura Ring Cloud API, v2).
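Because oura.csv mixes daily, 5-minute, and 1-minute variables in one file, it can be useful to split its columns by sampling interval before analysis. The column names in Table 6 carry the interval as a suffix (e.g., OURA_sleep_hr_5min), so this sketch matches on the OURA_sleep_ or OURA_activity_ prefix plus the `_5min` or `_1min` suffix; the group labels and function names are hypothetical.

```python
def sampling_group(column):
    """Classify an oura.csv column by its sampling interval, based on its name."""
    if column in ("timestamp", "participant"):
        return "key"
    if column.endswith("_1min"):
        return "1min"
    if column.endswith("_5min"):
        if column.startswith("OURA_sleep_"):
            return "sleep_5min"
        if column.startswith("OURA_activity_"):
            return "activity_5min"
    return "daily"  # all other features are recorded once per day


def group_columns(header):
    """Group a list of column names by sampling interval, preserving order."""
    groups = {}
    for col in header:
        groups.setdefault(sampling_group(col), []).append(col)
    return groups
```

The header can be read with `next(csv.reader(f))` on an open oura.csv file.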
Table 6. Variables in oura.csv
Column Name | Description | Type (Unit) |
---|---|---|
timestamp | 13-digit timestamp | Timestamp (milliseconds) |
participant | Participant ID | String |
OURA_activity_average_met | Average metabolic equivalent of task (MET), indicating the average activity intensity. | Float (MET) |
OURA_activity_cal_active | Total calories burned during active periods. | Integer (kcal) |
OURA_activity_cal_total | Total calories burned. | Integer (kcal) |
OURA_activity_class_5min | Activity classification data every 5 minutes. (0=non wear, 1= rest, 2= inactive, 3=low activity, 4=medium activity, 5=high activity) | Integer |
OURA_activity_daily_movement | Daily total movement, expressed as the walking distance that would produce the same amount of activity. | Integer (meters) |
OURA_activity_high | Total time spent in high-intensity activity, measured in minutes. | Integer |
OURA_activity_inactive | Total inactive time, measured in minutes. | Integer |
OURA_activity_inactivity_alerts | Number of inactivity alerts triggered after prolonged inactivity. | Integer |
OURA_activity_low | Total time spent in low-intensity activity, measured in minutes. | Integer |
OURA_activity_medium | Total time spent in medium-intensity activity, measured in minutes. | Integer |
OURA_activity_met_1min | Per-minute MET data. | Float |
OURA_activity_met_min_high | Total MET minutes for high-intensity activity. | Integer |
OURA_activity_met_min_inactive | Total MET minutes for inactivity. | Integer |
OURA_activity_met_min_low | Total MET minutes for low-intensity activity. | Integer |
OURA_activity_met_min_medium | Total MET minutes for medium-intensity activity. | Integer |
OURA_activity_non_wear | Total time when the device was not worn, measured in minutes. | Integer |
OURA_activity_rest | Total time at rest, measured in minutes. | Integer |
OURA_activity_rest_mode_state | Rest mode state, indicating whether the user was in rest mode. | Boolean |
OURA_activity_score | Activity score, assessing overall daily activity level. | Integer |
OURA_activity_score_meet_daily_targets | Score for meeting daily targets, evaluating if daily activity goals were achieved. | Integer |
OURA_activity_score_move_every_hour | Score for moving every hour, evaluating if the user had activity each hour. | Integer |
OURA_activity_score_recovery_time | Recovery time score, assessing user’s recovery status. | Integer |
OURA_activity_score_stay_active | Stay active score, evaluating user’s consistent activity levels. | Integer |
OURA_activity_score_training_frequency | Training frequency score, evaluating the user’s training frequency. | Integer |
OURA_activity_score_training_volume | Training volume score, assessing the overall volume of user’s training. | Integer |
OURA_activity_steps | Total steps, indicating the daily number of steps taken. | Integer |
OURA_activity_target_calories | Target calories, user-defined goal for daily calorie burn. | Integer |
OURA_activity_target_km | Target distance in kilometers, user-defined goal for daily walking or running distance. | Float |
OURA_activity_target_miles | Target distance in miles, user-defined goal for daily walking or running distance. | Float |
OURA_activity_to_target_km | Remaining distance to target in kilometers, indicating how much distance is left to reach the target. | Float |
OURA_activity_to_target_miles | Remaining distance to target in miles, indicating how much distance is left to reach the target. | Float |
OURA_activity_total | Total active time, measured in minutes. | Integer |
OURA_ideal_bedtime_bedtime_window_end | End time of the ideal bedtime window. | Timestamp |
OURA_ideal_bedtime_bedtime_window_start | Start time of the ideal bedtime window. | Timestamp |
OURA_readiness_period_id | Unique identifier for the readiness period. | String |
OURA_readiness_rest_mode_state | Readiness rest mode state, indicating whether the user was in rest mode. | Boolean |
OURA_readiness_score | Readiness score, assessing the user’s overall recovery and readiness state. | Integer |
OURA_readiness_score_activity_balance | Activity balance score, assessing balance between activity and rest. | Integer |
OURA_readiness_score_hrv_balance | HRV balance score, assessing balance in heart rate variability. | Integer |
OURA_readiness_score_previous_day | Previous day’s readiness score. | Integer |
OURA_readiness_score_previous_night | Previous night’s readiness score. | Integer |
OURA_readiness_score_recovery_index | Recovery index score, assessing user’s recovery speed. | Integer |
OURA_readiness_score_resting_hr | Resting heart rate score, assessing resting heart rate health. | Integer |
OURA_readiness_score_sleep_balance | Sleep balance score, assessing sleep quality and balance. | Integer |
OURA_readiness_score_temperature | Temperature score, assessing stability and health of temperature. | Integer |
OURA_sleep_average_breath_variation | Average breathing rate variation, assessing variation in breathing rate. | Float |
OURA_sleep_awake | Total awake time during sleep, measured in minutes. | Integer |
OURA_sleep_bedtime_end_delta | Difference between actual and planned wake-up time, measured in minutes. | Integer |
OURA_sleep_bedtime_start_delta | Difference between actual and planned bedtime, measured in minutes. | Integer |
OURA_sleep_breath_average | Average breathing rate, measured in breaths per minute. | Float |
OURA_sleep_deep | Time spent in deep sleep, measured in minutes. | Integer |
OURA_sleep_duration | Total sleep duration, measured in minutes. | Integer |
OURA_sleep_efficiency | Sleep efficiency: percentage of time in bed spent asleep. | Integer (%) |
OURA_sleep_got_up_count | Number of times user got up during sleep. | Integer |
OURA_sleep_hr_5min | Heart rate data every 5 minutes during sleep. | Integer |
OURA_sleep_hr_average | Average heart rate during sleep. | Integer |
OURA_sleep_hr_lowest | Lowest heart rate during sleep. | Integer |
OURA_sleep_hypnogram_5min | Sleep stage data every 5 minutes. (1=Deep Sleep, 2=Light Sleep, 3=REM Sleep, 4=Awake) | String |
OURA_sleep_is_longest | Indicates if the sleep period was the longest sleep session of the day. | Boolean |
OURA_sleep_light | Time spent in light sleep, measured in minutes. | Integer |
OURA_sleep_lowest_heart_rate_time_offset | Offset time to the lowest heart rate during sleep, measured in minutes. | Integer |
OURA_sleep_midpoint_at_delta | Offset between actual and planned sleep midpoint, measured in minutes. | Integer |
OURA_sleep_midpoint_time | Midpoint time of the sleep period. | Timestamp |
OURA_sleep_onset_latency | Sleep latency, time taken to fall asleep, measured in minutes. | Integer |
OURA_sleep_period_id | Unique identifier for the sleep period. | String |
OURA_sleep_rem | REM sleep duration, measured in minutes. | Integer |
OURA_sleep_restless | Restless time during sleep, indicating periods of disturbance. | Integer |
OURA_sleep_rmssd | RMSSD value during sleep, assessing heart rate variability. | Float |
OURA_sleep_rmssd_5min | RMSSD data every 5 minutes during sleep. | Float |
OURA_sleep_score | Sleep score, assessing overall sleep quality. | Integer |
OURA_sleep_score_alignment | Sleep alignment score, assessing alignment of sleep time with ideal time. | Integer |
OURA_sleep_score_deep | Deep sleep score, assessing quality of deep sleep. | Integer |
OURA_sleep_score_disturbances | Disturbance score, assessing interruptions in sleep. | Integer |
OURA_sleep_score_efficiency | Efficiency score, assessing effectiveness of sleep duration. | Integer |
OURA_sleep_score_latency | Latency score, assessing time taken to fall asleep. | Integer |
OURA_sleep_score_rem | REM sleep score, assessing quality of REM sleep. | Integer |
OURA_sleep_score_total | Total sleep score, assessing overall sleep quality. | Integer |
OURA_sleep_temperature_delta | Change in body temperature during sleep, measured in degrees Celsius. | Float |
OURA_sleep_temperature_deviation | Temperature deviation during sleep, indicating deviation from normal body temperature. | Float |
OURA_sleep_temperature_trend_deviation | Temperature trend deviation, indicating trend deviation over time. | Float |
OURA_sleep_total | Total time spent in sleep, measured in minutes. | Integer |
OURA_sleep_wake_up_count | Number of times user woke up during sleep. | Integer |
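The OURA_sleep_hypnogram_5min codes in Table 6 can be turned into minutes per sleep stage. This sketch assumes the value is a string of stage digits, one per 5-minute epoch (1=Deep, 2=Light, 3=REM, 4=Awake); if the file instead stores one digit per 5-minute row, join those digits first. The function name is hypothetical. The resulting totals can be compared with OURA_sleep_deep, OURA_sleep_light, OURA_sleep_rem, and OURA_sleep_awake as a sanity check.

```python
# Stage codes for OURA_sleep_hypnogram_5min, as listed in Table 6.
STAGES = {"1": "deep", "2": "light", "3": "rem", "4": "awake"}
EPOCH_MINUTES = 5


def stage_minutes(hypnogram):
    """Sum minutes per sleep stage from a hypnogram string (one digit per 5-min epoch)."""
    minutes = {name: 0 for name in STAGES.values()}
    for code in hypnogram.strip():
        if code in STAGES:  # ignore any character that is not a documented code
            minutes[STAGES[code]] += EPOCH_MINUTES
    return minutes
```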
3. Watch (Physiological and Motion Data)
This folder contains raw physiological and motion data from the Samsung Watch Active 2. The device generated two types of PPG files:
- 30-second PPG files (automatically recorded when the Watch detected a sleep event)
- Scheduled 12-minute PPG segments (intended to record continuous PPG every two hours; see “Data Gaps and Exceptions” for missing intervals)
Each of these CSV files includes the following columns (sensor streams):
- PPG (Photoplethysmogram)
- Heart Rate (HRM)
- Accelerometer (accx, accy, accz)
- Gyroscope (gyrx, gyry, gyrz)
- Gravity Sensor (grax, gray, graz)
- Barometric Pressure Sensor
Table 7. Variables in Watch Data
Column Name | Description | Type |
---|---|---|
timestamp | 13-digit timestamp | Timestamp (milliseconds) |
ppg | Photoplethysmography signal | Float |
hrm | Heart rate measured in BPM | Integer |
accx | Acceleration in x-axis | Float |
accy | Acceleration in y-axis | Float |
accz | Acceleration in z-axis | Float |
grax | Gravity component in the x-axis | Float |
gray | Gravity component in the y-axis | Float |
graz | Gravity component in the z-axis | Float |
gyrx | Gyroscope measurement in the x-axis | Float |
gyry | Gyroscope measurement in the y-axis | Float |
gyrz | Gyroscope measurement in the z-axis | Float |
pressure | Atmospheric pressure | Float |
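To tell 30-second sleep-triggered recordings apart from scheduled 12-minute segments, and to check the effective sampling rate, the span of a file's timestamps can be inspected. This is a sketch under stated assumptions: it uses only the `timestamp` column of Table 7, counts rows regardless of which sensor columns are filled (so it reflects the row rate if PPG and pressure share a file), and the 60-second cut-off between the two file types is a hypothetical choice.

```python
import csv
import io

# Hypothetical cut-off separating 30-second files from 12-minute segments.
LONG_RECORDING_THRESHOLD_S = 60


def recording_summary(csv_text):
    """Return duration, estimated row rate, and likely file type, or None if too short."""
    ts = sorted(int(row["timestamp"]) for row in csv.DictReader(io.StringIO(csv_text)))
    if len(ts) < 2:
        return None
    duration_s = (ts[-1] - ts[0]) / 1000
    rate_hz = (len(ts) - 1) / duration_s if duration_s > 0 else float("nan")
    kind = "12-minute" if duration_s > LONG_RECORDING_THRESHOLD_S else "30-second"
    return {"duration_s": duration_s, "rate_hz": rate_hz, "kind": kind}
```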
4. Survey (Self-Reported Assessments)
This folder contains self-reported survey data, administered at different time points through the Centralive app. Surveys include:
- BDI: Beck Depression Inventory (beginning and end of study)
- PHQ-9: Patient Health Questionnaire-9 (weekly)
- Perceived Stress Scale (beginning, weekly, and end of study)
- Sense of Coherence Scale (beginning and end of study)
- Social Connectedness Scale (beginning and end of study)
- Twente Engagement with E-Health Technologies (beginning, weekly, and end of study)
- UCLA Loneliness Scale (end of study)
- EMA: Daily ecological momentary assessments capturing feelings of loneliness, connectedness, isolation, and positive/negative emotions, each rated on a 0-10 scale.
Table 8. Additional Variables in Survey Files (Excluding Raw Scale Scores)
Column Name | Description | Type |
---|---|---|
timestamp | 13-digit timestamp | Timestamp (milliseconds) |
participant | Participant ID | String |
date | Date (YYYY-MM-DD) | Date |
status | Whether the participant completed the submission | Boolean |
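The `status` column allows a simple completion rate per survey file. The README does not say how booleans are written, so this sketch assumes values such as True/False or 1/0 and treats "true", "1", and "yes" (case-insensitive) as completed; the function name is hypothetical.

```python
import csv
import io

# Assumed spellings of a completed submission in the `status` column.
TRUE_VALUES = {"true", "1", "yes"}


def completion_rate(csv_text):
    """Fraction of rows whose status marks a completed submission, or None if no rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return None
    done = sum(row["status"].strip().lower() in TRUE_VALUES for row in rows)
    return done / len(rows)
```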
Additional Notes
To further protect participant privacy while preserving the relative timing of events, all original 13-digit millisecond timestamps have been uniformly shifted by a randomly selected offset. This “date shifting” approach ensures the order and intervals of events remain exactly the same for each participant, supporting downstream analyses of within-participant temporal features (e.g., inter-event durations, activity bursts).
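Within a participant, a constant offset δ cancels when differences are taken: (t_b + δ) − (t_a + δ) = t_b − t_a, so inter-event intervals computed on the shifted timestamps equal the original ones. Absolute dates, weekdays, and clock times are a different matter: they are only meaningful if the offset is a whole number of days, which this README does not state. A minimal sketch (the function name is hypothetical):

```python
def inter_event_intervals_s(timestamps_ms):
    """Seconds between consecutive events, after sorting by time."""
    ordered = sorted(int(t) for t in timestamps_ms)
    return [(later - earlier) / 1000 for earlier, later in zip(ordered, ordered[1:])]
```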
Human subjects data
All participants provided written informed consent to share their de-identified data for public research purposes at the time of enrollment.
To protect participant privacy and minimize the risk of re-identification, we applied the following de-identification procedures:
- All direct identifiers (e.g., names, contact information, device IDs) were removed.
- Each participant was assigned a pseudonymous identifier in the format Participant_#.
- Timestamp fields were randomly shifted to obscure precise timing while preserving temporal patterns essential for analysis.
- App identifiers were generalized into broader categories (e.g., “social media app”).
- GPS location data were excluded.
Design and set up
This study was designed to create a longitudinal dataset capturing physiological, behavioral, and psychological data from first-generation immigrants living in Finland. The dataset aims to support research on the relationship between mental health and daily lifestyle factors and to provide a foundation for the future development of detection algorithms.
To achieve this, the study collected multimodal data from every participant over a 28-day period. Objective data were gathered from wearable devices, which recorded sleep patterns, physical activity, cardiovascular health metrics, and raw PPG signals. Passive smartphone data, such as screen usage, notifications, calls, and messages, were also collected to capture digital behavior patterns.
Subjective data were collected through EMAs delivered via push notifications and through weekly self-report surveys. These assessments measured emotional states and well-being, including loneliness, stress, depression, and social connectedness. By integrating multiple data sources, this dataset allows researchers to explore the complex interactions between mental health and lifestyle behaviors under free-living conditions.
Data collection
To facilitate continuous data collection and remote monitoring, the Centralive platform was used. Centralive is a digital health platform designed for continuous data collection, data storage, real-time monitoring, and remote management of participant engagement. Data from the different applications and wearable devices were centralized in Centralive, then transferred to and stored on its cloud server. The research team used the Centralive dashboard to monitor the incoming data and track participant engagement during data collection.
Centralive also delivered the subjective daily EMAs and weekly surveys. Every participant was prompted to complete the daily EMA at 8 a.m., 2 p.m., 5 p.m., 8 p.m., and 10 p.m. The daily EMA contained questions about the participant's current emotions, including feelings of loneliness, social connectedness, and affect. The weekly survey was prompted every Sunday and remained open from 12:00 a.m. to 11:59 p.m.
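EMA compliance is usually reported as completed responses divided by expected prompts. Under the schedule above, a full 28-day enrollment yields 5 × 28 = 140 daily EMA prompts, and because any 28 consecutive days contain exactly four Sundays, four weekly surveys. This sketch assumes every study day is a full prompting day; the start date in the usage is hypothetical.

```python
from datetime import date, timedelta

# Daily EMA prompt times and study length, as described above.
DAILY_PROMPT_TIMES = ("08:00", "14:00", "17:00", "20:00", "22:00")
STUDY_DAYS = 28


def expected_prompts(start):
    """Return (daily EMA prompts, weekly Sunday surveys) for a study starting on `start`."""
    days = [start + timedelta(days=i) for i in range(STUDY_DAYS)]
    daily = len(days) * len(DAILY_PROMPT_TIMES)
    weekly = sum(d.weekday() == 6 for d in days)  # weekday(): Monday=0 ... Sunday=6
    return daily, weekly
```

For example, `expected_prompts(date(2022, 5, 2))` returns `(140, 4)`.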
The Samsung Watch Active 2, which runs the open-source Tizen operating system (Tizen OS), was used to collect objective physiological signals. The device recorded photoplethysmography (PPG), accelerometer, and gyroscope data at a sampling rate of 20 Hz, and air pressure at 10 Hz. Data collection was scheduled at two-hour intervals, with each recording session lasting 12 minutes.
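These parameters imply nominal sample counts that can be used to judge how complete a recorded session is: 20 Hz × 720 s = 14,400 samples per session for PPG, accelerometer, and gyroscope, 10 Hz × 720 s = 7,200 for pressure, and at most 24 / 2 = 12 sessions (144 minutes) per day. The sketch below encodes only the rates stated here; the gravity sensor's rate is not given, so it is omitted, and the helper names are hypothetical.

```python
# Nominal Watch recording parameters from the description above.
SESSION_MINUTES = 12
INTERVAL_HOURS = 2
RATES_HZ = {"ppg": 20, "accelerometer": 20, "gyroscope": 20, "pressure": 10}

SESSIONS_PER_DAY = 24 // INTERVAL_HOURS  # 12 scheduled sessions


def expected_samples_per_session():
    """Nominal sample count per sensor in one 12-minute session."""
    seconds = SESSION_MINUTES * 60
    return {sensor: hz * seconds for sensor, hz in RATES_HZ.items()}


def coverage(observed_samples, sensor):
    """Fraction of the nominal per-session samples actually present for a sensor."""
    return observed_samples / expected_samples_per_session()[sensor]
```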
The Oura Ring was used to track participants' sleep and activity patterns throughout the study. The data collected by the Oura Ring included sleep and activity metrics, as well as heart rate and heart rate variability measured during sleep. Centralive used OAuth (Open Authorization) to securely access and retrieve these data, making them available to researchers daily for further analysis.
The AWARE framework was used to collect passive phone activity data. The AWARE app ran in the background on participants’ smartphones, continuously logging data without requiring active user input. Battery logs recorded charging events and power consumption to monitor device usage trends. Call logs tracked incoming, outgoing, and missed calls with metadata such as timestamps and call duration, without capturing conversation content. Similarly, message logs recorded only metadata about sent and received text messages, not their content. Notification logs provided insights into participants’ digital engagement by recording received notifications, including the source app category and timestamps. Screen logs captured screen-on and screen-off events to estimate interaction frequency and duration.
Recruitment and enrollment
Participants were recruited between 2022 and 2023 through purposive and snowball sampling methods. Recruitment was conducted at various sites across Finland. Eligible participants were encouraged to recommend other first-generation immigrants who met the study criteria, expanding the recruitment network.
To be eligible for the study, participants had to meet the following criteria: (1) be between 18 and 65 years old, (2) be fluent in English, (3) have resided in Finland for a relatively short time, and (4) experience loneliness, as indicated by a UCLA Loneliness Scale score of 28 or higher. A total of 42 participants initially enrolled, but three withdrew before completing the study. Therefore, data from the remaining 39 participants were included in the final dataset.
Upon expressing interest, potential participants were screened to confirm eligibility. Those who qualified were scheduled for an in-person enrollment session. During the session, participants were provided with detailed information about the study, and the research team reviewed the informed consent form with them before obtaining written consent. Participants then completed baseline psychological assessments, including surveys measuring loneliness, depression, stress, and social connectedness.
After enrollment, participants were guided through the study setup process. The research team assisted in configuring wearable devices, including the Samsung Watch Active 2 and the Oura Ring, and ensured that all necessary applications (AWARE, Oura, Galaxy Wearable) were installed on participants’ smartphones. Instructions were provided on how to use and maintain the devices throughout the study.
Participants were required to wear the devices daily and to respond to the EMAs and weekly surveys. The research team remotely monitored data collection through the Centralive platform to track participant adherence, provide support, and safeguard data integrity.
Exit
At the end of the 28-day study period, participants received a final set of surveys through Centralive. These surveys included the Beck Depression Inventory (BDI), the Patient Health Questionnaire-9 (PHQ-9), the Perceived Stress Scale, the Sense of Coherence Scale, the Social Connectedness Scale, the Twente Engagement with E-Health Technologies questionnaire, and the UCLA Loneliness Scale.
After completing these final assessments, participants were instructed to remove all study-related applications from their smartphones, including AWARE, Oura, and Galaxy Wearable. They were also required to reset the Samsung Watch Active 2 and Oura Ring before returning the devices.