Codebase & further details for the paper:
Addressing Data Quality Challenges in Observational Ambulatory Studies: Analysis, methodologies and practical solutions for wrist-worn wearable monitoring
In this project, we address data quality challenges encountered in remote wearable monitoring by utilizing two distinct datasets:
- ETRI Lifelog 2020: https://nanum.etri.re.kr/share/schung/ETRILifelogDataset2020?lang=En_us
- mBrain21: https://www.kaggle.com/datasets/jonvdrdo/mbrain21/data
For each identified challenge, denoted as C<ID>, we have curated a dedicated notebook. These notebooks are specifically designed to demonstrate effective countermeasures against the respective challenges.
- 📰 How is the repository structured?
- 🗃️ How to acquire the data
- ✨ Challenges & features
- 📖 Citation
- 📝 License
├── code_utils <- module containing all shared code
│ ├── empatica <- Empatica E4 specific code (signal processing pipelines)
│ ├── etri <- ETRI specific code (data parsing, visualization, dashboard)
│ ├── mbrain <- mBrain specific code (data parsing, visualization, dashboard)
│ └── utils <- utility code (dashboard, dataframes, interaction analysis)
├── loc_data <- local data folder in which intermediate data is stored
└── notebooks <- Etri and mBrain specific notebooks
    ├── EmbracePlus.ipynb <- EmbracePlus demo notebook
    ├── etri
    └── mBrain
This repository uses poetry as dependency manager.
A specification of the dependencies is provided in the pyproject.toml and poetry.lock files.
You can install the dependencies in your Python environment by executing the following steps:
- Install poetry: https://python-poetry.org/docs/#installation
- Activate your poetry environment by calling
poetry shell
- Install the dependencies by calling
poetry install
The ETRI Lifelog 2020 dataset is available at https://nanum.etri.re.kr/share/schung/ETRILifelogDataset2020?lang=En_us.
In order to download the dataset, you should first create an account on the ETRI Nanum website. Afterwards, fill in the license agreement form, and upon approval, you will be able to download the dataset via the web platform.
A subset of the mBrain21 dataset is available on Kaggle datasets and can be downloaded via the following command:
kaggle datasets download -d jonvdrdo/mbrain21
Make sure that you've extended the hostname if-statement in the path_conf.py file with your machine's hostname and that you've configured the paths to the mBrain and ETRI datasets.
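As a rough illustration of what such a hostname-based configuration can look like, the sketch below selects dataset folders based on the machine name. It is not the repository's actual path_conf.py: the function name `resolve_dataset_paths`, the hostname `my-workstation`, and both paths are hypothetical placeholders.

```python
import socket
from pathlib import Path
from typing import Dict


def resolve_dataset_paths(hostname: str) -> Dict[str, Path]:
    """Return the dataset root folders configured for `hostname`.

    All hostnames and paths below are hypothetical placeholders.
    """
    if hostname == "my-workstation":  # assumed machine name
        return {
            "mbrain": Path("/data/mbrain21"),  # assumed mBrain location
            "etri": Path("/data/etri_lifelog_2020"),  # assumed ETRI location
        }
    # Add one branch per machine on which the notebooks should run.
    raise ValueError(f"No dataset paths configured for host '{hostname}'")


if __name__ == "__main__":
    # Look up the paths for the machine this script is running on.
    try:
        print(resolve_dataset_paths(socket.gethostname()))
    except ValueError as err:
        print(err)
```

Raising an error for an unknown hostname, rather than falling back to a default, makes a missing configuration visible as soon as a notebook is started on a new machine.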
Below, a subset of exemplified challenges and features is listed.
This section elaborates on the longitudinal time series visualization dashboards for both the ETRI and mBrain datasets.
Each dashboard contains, as can be observed in the figures below, a left column with selection boxes. The general flow to visualize a specific time series excerpt is as follows:
- Select a folder (in our case, all data from the ETRI and mBrain datasets are stored in the same folder, so there is only one option to select)
- Select a user (e.g., user30 for the ETRI dataset)
  Note: after selecting a folder and user, the time-span selection is updated to the available time span for the selected user-folder combination.
- Select sensors (e.g., 'E4 accelerometer' and 'E4 temperature')
Finally, to visualize, press the run interact button.
Once the ETRI dataset has been downloaded and parsed via the ETRI parsing notebook, the corresponding dashboard script can be used to explore and analyse the data. The dashboard can be run via the following command (after activating the poetry shell):
python code_utils/etri/dashboard.py
The output should show the following:
Dash is running on http://0.0.0.0:<PORT>
In the dashboard screenshot below, both the wearable data and the application event labels are visualized. One can immediately observe that this participant tends to be more alone during evenings (light blue shaded area of the lower row in the upper subplot). During the weekends (indicated with a gray shaded area), this participant tends to be alone and spend a lot of time at home.
The mBrain dashboard can be run via the following command (after activating the poetry shell):
python code_utils/mBrain/dashboard.py
The output will show the following:
Dash is running on http://0.0.0.0:<PORT>
Below, we provide a screenshot of the mBrain dashboard. As can be observed from the selection box on the left side, the dashboard shows the headache timeline of the participant, along with the Empatica E4's accelerometer signal and the smartphone light data. When hovering over a headache event, as shown in the upper plot, one can see the associated characteristics of that event.
The wearable non-wear detection is demonstrated in the C5.1_off_wrist_detection notebook.
Moreover, the C7_missing_data notebook demonstrates how this off-wrist pipeline can be used to remove non-wear bouts as a preprocessing step.
Below, a screenshot of the off-wrist pipeline devised by Böttcher et al. (2022) is shown.
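To make the preprocessing idea concrete, the sketch below drops all samples that fall inside detected off-wrist bouts. It is a minimal, standard-library illustration and not the code of the C7_missing_data notebook: the function name `remove_non_wear`, the representation of bouts as (start, end) pairs, and the toy data are all assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, float]  # (timestamp, sensor value)
Bout = Tuple[datetime, datetime]  # (start, end) of an off-wrist period


def remove_non_wear(samples: List[Sample], off_wrist_bouts: List[Bout]) -> List[Sample]:
    """Keep only samples outside every off-wrist bout [start, end)."""
    return [
        (ts, value)
        for ts, value in samples
        if not any(start <= ts < end for start, end in off_wrist_bouts)
    ]


# Toy data: one sample per minute for ten minutes.
t0 = datetime(2020, 9, 1, 12, 0)
samples = [(t0 + timedelta(minutes=i), float(i)) for i in range(10)]

# Assumed detector output: the device was off the wrist from 12:03 to 12:06.
bouts = [(t0 + timedelta(minutes=3), t0 + timedelta(minutes=6))]

worn = remove_non_wear(samples, bouts)
print([value for _, value in worn])
```

With tabular data, the same filtering is typically expressed as a boolean mask over the timestamp index; the list-based version here only keeps the example dependency-free.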
The C5.1_label_off_wrist mBrain notebook demonstrates how large bouts of time-series data can be annotated using plotly-resampler.
Below, a demo shows how this annotation tool can be used to label off-wrist periods.
@article{van2024addressing,
title={Addressing Data Quality Challenges in Observational Ambulatory Studies: Analysis, Methodologies and Practical Solutions for Wrist-worn Wearable Monitoring},
author={Van Der Donckt, Jonas and Vandenbussche, Nicolas and Van Der Donckt, Jeroen and Chen, Stephanie and Stojchevska, Marija and De Brouwer, Mathias and Steenwinckel, Bram and Paemeleire, Koen and Ongenae, Femke and Van Hoecke, Sofie},
journal={arXiv preprint arXiv:2401.13518},
year={2024}
}
The code is available under the imec license.
👤 Jonas Van Der Donckt