An inertial and positioning dataset for the walking activity
Abstract
We are publishing a walking activity dataset that includes inertial and positioning information from 18 volunteers, together with a reference distance measured using a trundle wheel. The dataset covers a total of 96.7 km walked by the volunteers, split into 203 separate tracks. Two types of trundle wheel were used: an analogue trundle wheel, which provides the total number of meters walked in a single track, and a sensorized trundle wheel, which measures every revolution of the wheel and therefore records a continuous incremental distance.
Each track has data from the accelerometer and gyroscope embedded in the phones, location information from the Global Navigation Satellite System (GNSS), and the step count obtained by the device. The dataset can be used to implement walking distance estimation algorithms and to explore data quality in the context of walking activity and physical capacity tests, fitness, and pedestrian navigation.
README: An Inertial and Positioning Dataset for the walking activity
https://doi.org/10.5061/dryad.n2z34tn5q
Description of the data and file structure
We publish a walking activity dataset that includes inertial and positioning information from 18 volunteers, together with a reference distance measured using a trundle wheel. Each track has data from the accelerometer and gyroscope embedded in the phones, location information from the Global Navigation Satellite System (GNSS), and the step count obtained by the device.
Example code can be found at the following link
The data folder contains a metadata_tracks.csv file with attributes for each track. In addition, for every participant there is a folder (subject_X, X being a unique subject identifier), and for every track, a sub-folder (X_N, X being a unique subject identifier and N being the track identifier).
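The folder layout described above can be navigated with a few lines of Python. This is a minimal sketch, assuming the folders are named exactly `subject_X` and `X_N` as stated; the helper names `track_dir` and `track_files` are hypothetical and not part of the dataset.

```python
from pathlib import Path


def track_dir(data_root, subject_id, track_id):
    """Return the folder of one track: <data_root>/subject_X/X_N."""
    root = Path(data_root)
    return root / f"subject_{subject_id}" / f"{subject_id}_{track_id}"


def track_files(data_root, subject_id, track_id):
    """Map each CSV file name found in the track folder to its path."""
    folder = track_dir(data_root, subject_id, track_id)
    if not folder.is_dir():
        # Hypothetical choice: an absent track yields an empty mapping.
        return {}
    return {p.name: p for p in sorted(folder.glob("*.csv"))}
```

Because some tracks lack motion or GNSS data, checking which files are actually present before loading them is likely safer than assuming every CSV exists.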
Empty cells within a CSV file can be considered missing data due to jittering, signal loss during data transmission, or recording errors.
Every track folder contains the following CSV files:
events.csv | |
---|---|
signalStart | time at which the test starts (walking has not yet begun); always set to 0 ms. |
testStart | ms since signalStart when the GNSS signal reaches sufficient quality (15 m accuracy). |
testEnd | ms since signalStart when data collection stops. |
positions.csv | |
---|---|
ms | milliseconds from signalStart. Zero ms corresponds to the value of testStart in events.csv. |
latitude, longitude, and altitude | geolocation coordinates. |
confInterval | confidence interval reported by the GNSS system heading. |
speed | sample-wise speed computed by the system. [m/s] |
heading | value of heading for that timestamp. |
orientation.csv | |
---|---|
ms | milliseconds from signalStart. |
alpha, beta, gamma | device orientation. |
steps.csv | |
---|---|
ms | milliseconds from signalStart. |
steps | incremental number of steps taken. |
startDate, endDate | ms intervals when those steps were taken. |
floorsUp | number of floors ascended (not to be used). |
floorsDown | number of floors descended (not to be used). |
distance | distance estimated by the smartphone (not to be used). |
motion.csv | |
---|---|
ms | milliseconds from signalStart. |
accelX, accelY, accelZ, accelWithGX, accelWithGY, accelWithGZ | acceleration without gravity (accelX/Y/Z) and with gravity (accelWithGX/Y/Z). [m/s^2] |
rotRateAlpha, rotRateBeta, rotRateGamma | rotation rate. |
interval | sampling interval in ms. |
reference_cont_distance.csv | available if the reference distance is continuous |
---|---|
ms | milliseconds from signalStart |
distance | continuous incremental value of distance [m] |
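Since empty cells denote missing data, a reader should not convert every cell blindly to a number. The following is a minimal sketch using only the standard library. It assumes each CSV has a header row whose names match the tables above and that all columns are numeric. The function names and the sample text are hypothetical.

```python
import csv
import io


def read_numeric_csv(text):
    """Parse CSV text into a list of dicts; empty cells become None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for key, value in record.items():
            # DictReader yields '' for an empty cell and None for a short row.
            has_value = value is not None and value.strip() != ""
            row[key] = float(value) if has_value else None
        rows.append(row)
    return rows


def final_distance(rows):
    """Return the last non-missing value of the 'distance' column, or None."""
    for row in reversed(rows):
        if row["distance"] is not None:
            return row["distance"]
    return None


# Made-up sample in the shape of reference_cont_distance.csv.
sample = "ms,distance\n0,0.0\n1000,1.2\n2000,\n3000,3.9\n4000,\n"
rows = read_numeric_csv(sample)
total_m = final_distance(rows)
```

Because the continuous reference distance is incremental, its last valid sample gives the total distance of the track under these assumptions.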
Columns and description of the metadata_tracks.csv files and attributes:
Missing data for certain tracks (hasMotion: False or hasGNSS: False) is due to technical issues (subjects did not update the smartphone app) or to difficulties in acquiring a GNSS signal.
Column | Description |
---|---|
subject | unique subject ID. |
testID | test ID. |
testName | subjectID_testID. |
isPatient | boolean value True or False. |
distanceReference | total distance walked [m]. |
hasMotion | boolean (True or False): whether IMU data was collected during the walk. |
hasGNSS | boolean (True or False): whether a GNSS signal was received during the walk. |
device | brand, model and operating system of the smartphone. |
distanceByApp | distance measured by the Timed Walk App. |
totSteps | total steps taken during the walk. |
path curvature | 0, 1, or 2, indicating a straight path (no turns wider than 90 degrees), a gently curved path (one to five turns wider than 90 degrees), or a curved path (more than five turns wider than 90 degrees). |
total_gaps_time_inertial | total time in seconds during which IMU data was not received for more than 0.05 seconds. |
total_gaps_time_gnss | total time in seconds during which the GNSS signal was not received for more than 6 seconds. |
gt_type | "final" or "continuous" according to which type of reference distance was collected. |
country | country where the subject registered the track (UK or SE) |
gnss_anonimized | boolean True or False according to whether the geographical positions were anonymized. |
duration | duration of the walk [s]. |
fs_acc | average IMU sampling frequency. |
fs_gnss | average GNSS signal sampling frequency. |
fs_steps | average step counting sampling frequency. |
average_walking_speed | total reference distance divided by the test duration. |
smartphone_position | position of the smartphone during the walk (hand-held). |
smartphone app | smartphone app used (Timed Walk App or Malisa). |
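The gap and speed attributes in the table can be approximated from the raw timestamps. The sketch below is an assumption about how such values could be derived, not the authors' exact procedure: it counts the full duration of every interval between consecutive samples that exceeds the threshold (0.05 s for inertial data, 6 s for GNSS). The function names are hypothetical.

```python
def total_gap_time(timestamps_ms, threshold_s):
    """Sum, in seconds, of intervals between consecutive samples longer than threshold_s."""
    total = 0.0
    for previous, current in zip(timestamps_ms, timestamps_ms[1:]):
        gap_s = (current - previous) / 1000.0
        if gap_s > threshold_s:
            # Assumption: the whole gap is counted, not only the excess.
            total += gap_s
    return total


def average_walking_speed(distance_m, duration_s):
    """Total reference distance divided by the test duration, in m/s."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return distance_m / duration_s
```

For example, `total_gap_time(ms_values, 0.05)` applied to the `ms` column of motion.csv would give a value comparable in spirit to total_gaps_time_inertial, though the published figures may follow a slightly different convention.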
Methods
The proposed dataset is a collection of walks in which participants used their own smartphones to capture inertial and positioning information. The participants come from two sites. The first site is the Oxford University Hospitals NHS Foundation Trust, United Kingdom, where 10 participants (7 affected by cardiovascular diseases and 3 healthy individuals) performed unsupervised six-minute walk tests (6MWTs) in an outdoor environment of their choice (ethical approval obtained from the UK National Health Service Health Research Authority, protocol reference number: 17/WM/0355). All participants involved provided informed consent. The second site is Malmö University, Sweden, where a group of 9 healthy researchers collected data.
This dataset can be used by researchers to develop distance estimation algorithms and to study how data quality affects the estimation.
The walked paths vary in length, duration, and shape. Participants were instructed to walk paths of increasing curvature, from straight to rounded. Irregular paths are particularly useful for determining the accuracy limits of walked distance algorithms. Two smartphone applications were developed to collect the information of interest from the participants' devices, both available for the Android and iOS operating systems. The first is a web application that retrieves inertial data (acceleration, rotation rate, orientation) while connecting to the sensorized trundle wheel to record the incremental reference distance [1]. The second is the Timed Walk app [2], which guides the user in performing a walking test by signalling when to start and when to stop the walk while collecting both inertial and positioning data. All participants in the UK used the Timed Walk app.
The data collected during the walk comes from the Inertial Measurement Unit (IMU) of the phone and, when available, the Global Navigation Satellite System (GNSS). In addition, the step count is retrieved from the sensors embedded in each participant's smartphone. With the dataset, we provide a descriptive table with the characteristics of each recording, including the brand and model of the smartphone, the duration, the total reference distance, and the types of signals included, as well as scores for some parameters related to the quality of the various signals.
The path curvature is one of the most relevant parameters. Previous work from our team confirmed the negative impact of curved paths on multiple distance estimation algorithms [3]. We visually inspected the walked paths and grouped them into three classes: a) straight path, i.e. no turns wider than 90 degrees; b) gently curved path, i.e. between one and five turns wider than 90 degrees; and c) curved path, i.e. more than five turns wider than 90 degrees. Other features relevant to the quality of the collected signals are the total amount of time in which inertial or GNSS data were missing for longer than a threshold (0.05 s and 6 s, respectively), either because of technical issues or because the app went into the background and lost access to the sensors; the sampling frequency of the different data streams; the average walking speed; and the smartphone position.
The start of each walk is set to 0 ms, so no absolute time information is reported. Walk locations collected in the UK are anonymized using the following approach: the first position is fixed to a central location in the city of Oxford (latitude: 51.7520, longitude: -1.2577), and all other positions are reassigned by applying a translation along the longitudinal and latitudinal axes that maintains the original distance and angle between samples. This way, the exact geographical location is lost, but the path shape and the distances between samples are maintained. The straight-line ("as the crow flies") distance between consecutive points and the path curvature were inspected numerically and visually to confirm that they matched the original walks. Computations were made using the Haversine Python library.
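The anonymization idea can be illustrated with a short sketch. It is an assumption-laden approximation rather than the authors' procedure: it applies a constant offset in degrees so that the first sample lands on the Oxford anchor, and it implements the haversine formula directly instead of calling the Haversine library. A constant offset in degrees preserves distances only approximately, since the length of a degree of longitude depends on latitude. The function names and the mean Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed constant)


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(points):
    """Sum of straight-line distances between consecutive samples."""
    return sum(haversine_m(a, b) for a, b in zip(points, points[1:]))


def translate_path(points, anchor=(51.7520, -1.2577)):
    """Shift every point by the offset that moves the first point onto the anchor."""
    dlat = anchor[0] - points[0][0]
    dlon = anchor[1] - points[0][1]
    return [(lat + dlat, lon + dlon) for lat, lon in points]
```

Comparing `path_length_m` before and after `translate_path` mirrors the numerical check described above: for walks that start near the anchor latitude, the two lengths should agree closely.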
This research is partially funded by the Swedish Knowledge Foundation and the Internet of Things and People research center through the Synergy project Intelligent and Trustworthy IoT Systems.