Analyze Keystroke Data

The original dataset used here has approximately 136 million presses and 168 thousand volunteers.

After filtering out suspicious (too fast) and incorrect data (e.g., uppercase letter, but no shift pressed), about 55 million presses and 77 thousand volunteers are left.
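The two filters mentioned above can be illustrated with a minimal sketch. The `Press` record, its field names, and the 20 ms threshold are assumptions made for illustration only; the actual rules live in filter_and_convert_keystroke_dataset.py and may differ.

```python
from dataclasses import dataclass


# Hypothetical record layout; the real dataset columns may differ.
@dataclass
class Press:
    letter: str        # character produced by the press
    press_ms: int      # press timestamp in milliseconds
    release_ms: int    # release timestamp in milliseconds
    shift_held: bool   # whether a shift key was down during the press


MIN_HOLD_MS = 20  # assumed threshold below which a press counts as "too fast"


def is_plausible(p: Press) -> bool:
    """Return True if the press passes both illustrative checks."""
    too_fast = (p.release_ms - p.press_ms) < MIN_HOLD_MS
    uppercase_without_shift = p.letter.isupper() and not p.shift_held
    return not (too_fast or uppercase_without_shift)


def filter_presses(presses: list[Press]) -> list[Press]:
    """Keep only the presses that look plausible."""
    return [p for p in presses if is_plausible(p)]
```

In this sketch a press is dropped as soon as either check fails, so the order of the two checks does not affect the result.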

Usage

Step 1: Filter and convert the dataset (Optional)

Download the 136 Million Keystrokes dataset (1.4 GiB).
Extract the downloaded dataset into the extract_archive_in_here directory (16 GiB unzipped).
▶️ Run filter_and_convert_keystroke_dataset.py.

If you'd like to skip this step, you can download the filtered version (filtered_events.csv.gz) from the releases section and move it to the dataset directory.
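Before running the analysis, it can help to confirm that the downloaded file is in place and readable. The following is a minimal sketch using only the Python standard library; the helper name `peek_csv_gz` is hypothetical, and the UTF-8 encoding and comma delimiter are assumptions about the file rather than documented facts.

```python
import csv
import gzip
from itertools import islice


def peek_csv_gz(path: str, n_rows: int = 5) -> list[list[str]]:
    """Return the first n_rows rows of a gzipped CSV file without loading it all."""
    # "rt" opens the archive in text mode; newline="" is what the csv module expects.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as fh:
        return list(islice(csv.reader(fh), n_rows))


# Example call, assuming the file was moved to the dataset directory:
# rows = peek_csv_gz("dataset/filtered_events.csv.gz")
```

Reading only the first few rows avoids decompressing the whole file just to check its layout.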

Step 2: Analyze converted dataset and create training dataset

▶️ Run analyze.py.

If you'd like to use the resulting training dataset to train prediction functions, head over to the evolve_tap_hold_predictors repository.

Evolved functions are used in the Predictive Tap-Hold community module for QMK.

📊 Open analyze.md to see the output for the default settings.

Dataset

The dataset used here can be found at: https://userinterfaces.aalto.fi/136Mkeystrokes/

Vivek Dhakal, Anna Maria Feit, Per Ola Kristensson, Antti Oulasvirta
Observations on Typing from 136 Million Keystrokes. 
In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, ACM, 2018.

@inproceedings{dhakal2018observations,
author = {Dhakal, Vivek and Feit, Anna and Kristensson, Per Ola and Oulasvirta, Antti},
booktitle = {Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18)},
title = {{Observations on Typing from 136 Million Keystrokes}},
year = {2018},
publisher = {ACM},
doi = {https://doi.org/10.1145/3173574.3174220},
keywords = {text entry, modern typing behavior, large-scale study}
}

You are free to use this data for non-commercial use in your own research or projects with attribution to the authors.

As a result, the same applies to all files in the /dataset directory and the GitHub release files that are based on the dataset.
