Language-based representations have recently emerged as a promising approach for cross-domain Human Activity Recognition (HAR) in smart homes, where binary sensor streams are verbalized into natural-language descriptions and processed by pretrained encoders. However, prior work has typically fixed both the textualization scheme and the text embedding model, leaving open how linguistic design choices influence transferability. This paper presents a comprehensive factorial analysis of textualization and embedding strategies for language-based HAR. We systematically vary (i) how sensor event windows are expressed—across seven existing and novel sequential and summarized textualizations—and (ii) how they are embedded using lexical (TF-IDF), static (Word2Vec), and contextual (SBERT) encoders. Experiments on four public smart-home datasets under consistent in-domain and cross-domain transfer conditions reveal that textualization design, not encoder complexity, governs performance. Sequential, event-ordered sentences maximize in-domain accuracy, while single-sentence, schema-based summaries—such as the proposed Compound Sensor Summary (CSS)—generalize best across homes. Clause-level ablations further show that event descriptions drive recognition, whereas explicit timing information can reduce robustness by overfitting to home-specific schedules. Overall, our findings establish a reproducible framework for analyzing and designing language-based representations in HAR, demonstrating that linguistic structure—rather than deep contextualization—is the primary determinant of domain robustness in smart-home activity recognition.
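For concreteness, here is a minimal sketch of what such a verbalization step can look like. The event format, sensor names, and sentence templates below are illustrative assumptions, not the exact textualization schemes evaluated in the paper:

```python
from datetime import datetime

# Hypothetical sensor events: (timestamp, sensor name, binary state).
events = [
    (datetime(2010, 11, 4, 7, 2), "kitchen motion", "ON"),
    (datetime(2010, 11, 4, 7, 3), "fridge door", "OPEN"),
    (datetime(2010, 11, 4, 7, 9), "kitchen motion", "OFF"),
]

def sequential_text(events):
    """Event-ordered sentences: one clause per sensor event."""
    return " ".join(
        f"At {ts:%H:%M}, the {name} sensor turned {state}." for ts, name, state in events
    )

def summary_text(events):
    """Single-sentence, schema-based summary of the whole window."""
    names = sorted({name for _, name, _ in events})
    start, end = events[0][0], events[-1][0]
    return f"Between {start:%H:%M} and {end:%H:%M}, the active sensors were: {', '.join(names)}."

print(sequential_text(events))
print(summary_text(events))
```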
The code requires:

- Python >= 3.12

All dependencies are specified in `pyproject.toml`. Key packages include:
- PyTorch (< 2.7)
- Lightning
- Sentence Transformers
- Scikit-learn
- Gensim
- Streamlit
- Matplotlib, Seaborn
- And more (see `pyproject.toml` for the complete list)
The recommended way to set up this project is with `uv`, a fast Python package installer and resolver.
- Install `uv` if you haven't already:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Create a virtual environment and install dependencies:

```bash
uv venv
source .venv/bin/activate  # On Linux/macOS
uv pip install -e .
```

If you prefer using pip:
```bash
python -m venv .venv
source .venv/bin/activate  # On Linux/macOS
pip install -e .
```

The project provides two main modes of operation through `main.py`:
Launch an interactive Streamlit dashboard to explore datasets and visualizations:
```bash
python main.py --mode explore
```

This will start a Streamlit application for:
- Exploring activity data
- Exploring sensor event data and metadata
- Analyzing seven window textualization methods
- Experimenting with vectorization methods (TF-IDF, Word2Vec, SBERT) and visualizing UMAP projections (see the sketch after this list)
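As a rough standalone sketch of those vectorization and projection steps (the SBERT model name and UMAP parameters are illustrative choices, not necessarily those used in the dashboard):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sentence_transformers import SentenceTransformer
import umap  # from the umap-learn package

# Toy textualized windows; in the dashboard these come from the testbeds.
texts = [f"At 0{h}:00, the kitchen motion sensor turned ON." for h in range(6)]

# Lexical vectors: TF-IDF bag-of-words.
tfidf_vecs = TfidfVectorizer().fit_transform(texts).toarray()

# Contextual vectors: SBERT sentence embeddings.
sbert_vecs = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)

# 2-D UMAP projection for plotting.
proj = umap.UMAP(n_components=2, n_neighbors=2, init="random").fit_transform(sbert_vecs)
print(tfidf_vecs.shape, sbert_vecs.shape, proj.shape)
```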
Execute the full experimental pipeline:
```bash
python main.py --mode runxp
```

This runs the complete experiment workflow as configured in `config/experiment.toml`.
All datasets are located in the `testbed/` directory with the following structure:

```
testbed/
├── casas-aruba/
│   ├── activities.csv   # Activity annotations
│   ├── sensors.csv      # Sensor event data
│   └── meta.toml        # Dataset metadata (sensors, activities)
├── casas-milan/
│   ├── activities.csv
│   ├── sensors.csv
│   └── meta.toml
├── ordonez-a/
│   ├── activities.csv
│   ├── sensors.csv
│   └── meta.toml
└── ordonez-b/
    ├── activities.csv
    ├── sensors.csv
    └── meta.toml
```
Each dataset directory contains:
- `activities.csv`: Ground-truth activity labels with timestamps
- `sensors.csv`: Raw sensor event data
- `meta.toml`: Metadata including sorted sensor names and activity labels
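These files can also be read directly; here is a minimal sketch assuming pandas and Python's built-in tomllib. The repository's own `loading.py` provides the proper utilities, and the CSV column layout is whatever the files define:

```python
import tomllib
import pandas as pd

root = "testbed/casas-aruba"

activities = pd.read_csv(f"{root}/activities.csv")  # ground-truth labels
sensors = pd.read_csv(f"{root}/sensors.csv")        # raw sensor events
with open(f"{root}/meta.toml", "rb") as f:
    meta = tomllib.load(f)                          # sensor names, activity labels

print(activities.head())
print(sensors.head())
print(list(meta))
```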
Configuration files are located in the `config/` directory:

- `config/data.toml`: Activity definitions for each testbed
- `config/experiment.toml`: Experiment parameters including:
  - Window parameters (`nb_days`, `pad_trunc_length`)
  - Vectorization settings (TF-IDF, Word2Vec, SBERT)
  - Model hyperparameters (MLP, LSTM)
  - Cross-validation settings (`n_splits`)
- `config/mailing.toml`: Email notification settings (not included in the repository)
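As a purely hypothetical sketch of how such parameters might be grouped in `config/experiment.toml` (only `nb_days`, `pad_trunc_length`, and `n_splits` are documented names; all table names and other keys below are assumptions, so consult the actual file):

```toml
# Hypothetical layout; see the real config/experiment.toml for the actual schema.
[window]
nb_days = 1             # window span (documented parameter)
pad_trunc_length = 128  # pad/truncate windows to a fixed length (documented parameter)

[vectorization]
sbert_model = "all-MiniLM-L6-v2"  # assumed key

[model.mlp]
hidden_size = 256                 # assumed key

[cross_validation]
n_splits = 5                      # documented parameter
```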
To enable email notifications for experiment completion, create a `config/mailing.toml` file with the following structure:
```toml
[email]
sender = "your-email@gmail.com"
recipient = "recipient-email@gmail.com"
server = "smtp.gmail.com"
port = 587
secret = "your-app-password-here"
```

Note: For Gmail, you'll need to generate an App Password rather than using your regular password.
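For reference, these five fields map onto a standard SMTP flow. Here is a minimal sketch with Python's standard library; the repository's `mailing.py` is the actual implementation, and the subject and body below are made up:

```python
import smtplib
import tomllib
from email.message import EmailMessage

with open("config/mailing.toml", "rb") as f:
    cfg = tomllib.load(f)["email"]

msg = EmailMessage()
msg["From"] = cfg["sender"]
msg["To"] = cfg["recipient"]
msg["Subject"] = "Experiment finished"            # made-up subject
msg.set_content("The runxp pipeline completed.")  # made-up body

# Port 587 implies STARTTLS before authenticating with the app password.
with smtplib.SMTP(cfg["server"], cfg["port"]) as smtp:
    smtp.starttls()
    smtp.login(cfg["sender"], cfg["secret"])
    smtp.send_message(msg)
```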
All outputs are stored in the `output/` directory:

```
output/
├── tidy_results.csv   # Experiment results in tidy format
├── figures/           # Generated plots and visualizations
└── paper/             # Paper-ready assets
```
- `output/tidy_results.csv`: Main results file containing experimental outcomes
- `output/figures/`: Visualizations and plots generated during analysis
- `output/paper/`: Publication-ready figures and tables
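Because the results are tidy (one observation per row), they aggregate cleanly with pandas. The grouping columns used below (`textualization`, `encoder`, `score`) are hypothetical placeholders, so check the real header first:

```python
import pandas as pd

results = pd.read_csv("output/tidy_results.csv")
print(results.columns.tolist())  # inspect the actual column names

# Hypothetical aggregation: mean score per textualization/encoder pair.
summary = results.groupby(["textualization", "encoder"])["score"].mean().unstack()
print(summary)
```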
Pre-computed results are available in the `imported/` directory:

```
imported/
├── tidy_results.csv           # Main experiment results
└── tidy_results_ablation.csv  # Ablation study results
```
The `scripts/` directory contains Jupyter notebooks for analysis and visualization:

- `scripts/paper_assets.ipynb`: Generates figures and tables for publications
- `scripts/xp_runner_usage_demo.ipynb`: Demonstrates how to run experiments programmatically
To use the notebooks:
```bash
jupyter notebook scripts/
```

The core functionality is organized in the `codebase/` package:
- `domain.py`: Core domain models and enums (TestBed definitions)
- `loading.py`: Dataset loading utilities
- `preprocessing.py`: Data preprocessing functions
- `windowing.py`: Sliding window generation
- `textualization.py`: Window-to-text conversion
- `vectorization.py`: Text-to-vector encoding (TF-IDF, Word2Vec, SBERT)
- `lstm.py`: LSTM model implementation
- `experiment.py`: Experiment orchestration
- `plotting.py`: Visualization utilities
- `reporting.py`: Results reporting
- `mailing.py`: Email notification system
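Conceptually, these modules chain into a load, preprocess, window, textualize, vectorize, classify flow. The following is a generic, self-contained sketch of that flow using scikit-learn on toy data; it is not the repository's actual API, whose entry point is `experiment.py`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Toy stand-ins for textualized sensor windows and their activity labels.
texts = [
    "the kitchen motion sensor turned ON then the fridge door opened",
    "the bedroom motion sensor turned ON during the night",
] * 10
labels = ["cooking", "sleeping"] * 10

# TF-IDF vectorization feeding an MLP classifier: a high-level analogue of
# the textualization.py -> vectorization.py -> experiment.py stages.
clf = make_pipeline(TfidfVectorizer(), MLPClassifier(max_iter=500, random_state=0))
print(cross_val_score(clf, texts, labels, cv=5).mean())
```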
- Set up the environment:

```bash
uv venv && source .venv/bin/activate
uv pip install -e .
```

- Explore the data interactively:

```bash
python main.py --mode explore
```

- Run an experiment:

```bash
python main.py --mode runxp
```

or use the notebook to run custom experiments:

```bash
jupyter notebook scripts/xp_runner_usage_demo.ipynb
```

- Analyze results:

```bash
jupyter notebook scripts/paper_assets.ipynb
```

If you use this work in your research, please cite our paper:
```bibtex
@inproceedings{flowar-textual2026,
  title={Towards Domain-Robust Activity Recognition using Textual Representations of Binary Sensor Events},
  author={Ali Ncibi},
  year={2026},
  booktitle={Proceedings of the 18th International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
  publisher={SciTePress},
  organization={INSTICC},
  doi={XX.XXXX/XXXXX}
}
```

See CITATION.cff for machine-readable citation information.
See LICENSE file for details.