This repository contains data preprocessing scripts for the following datasets in FinRegistry:
- DVV & DVV Extended
- ETK Pension
- FICC Intensive Care
- KELA Kanta (drug prescriptions & drug deliveries)
- KELA Drug Reimbursement
- KELA Drug Purchase
- Minimal Phenotype
- SF Causes of Death
- SF Education, Occupation, and SES
- THL AvoHilmo
- THL Birth
- THL Cancer
- THL Hilmo
- THL Infectious Diseases
- THL Malformations
- THL Social Assistance
- THL Social Hilmo
- THL Vaccination

In addition, scripts for generating a pedigree in FinRegistry are available in a separate repository: https://github.com/dsgelab/FinRegistry_pedigree
The preprocessing steps for each dataset are summarized in the GitHub Releases, and the code used to generate the processed datasets is attached to each release.
The repository also includes scripts used for profiling each dataset for the FinRegistry data dictionary. Please note that profiling list-type columns is not currently implemented.
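
The profiling scripts themselves are not reproduced in this README. The block below is a minimal, hypothetical Python sketch of per-column profiling for a data dictionary. It assumes a table held in memory as a dict that maps column names to lists of values, with `None` marking missing values. The function names (`is_list_column`, `profile_column`, `profile_table`), the chosen statistics and the toy data are illustrative assumptions, not the repository's code. The sketch shows one reason list-type columns need separate handling: list values are unhashable, so distinct counts and frequency tables cannot be computed on them directly. Here such columns are detected and skipped.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List, Optional


def is_list_column(values: List[Any]) -> bool:
    """Hypothetical rule: a column is list-type if any non-missing value is a list or tuple."""
    return any(isinstance(v, (list, tuple)) for v in values if v is not None)


def profile_column(values: List[Any], top_n: int = 3) -> Optional[Dict[str, Any]]:
    """Summarize one column, or return None for list-type columns (not profiled)."""
    if is_list_column(values):
        # Lists are unhashable, so set() and Counter() below would fail on them.
        return None
    present = [v for v in values if v is not None]
    profile: Dict[str, Any] = {
        "n_rows": len(values),
        "n_missing": len(values) - len(present),
        "n_distinct": len(set(present)),
        "top_values": Counter(present).most_common(top_n),
    }
    # Add range and mean only when every non-missing value is numeric (bools excluded).
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(present):
        profile.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return profile


def profile_table(columns: Dict[str, List[Any]]) -> Dict[str, Optional[Dict[str, Any]]]:
    """Profile every column of a dict-of-lists table."""
    return {name: profile_column(values) for name, values in columns.items()}


# Toy data for illustration only.
table = {
    "birth_year": [1950, 1962, None, 1950],
    "sex": ["F", "M", "F", None],
    "atc_codes": [["N02BE01"], ["C09AA05", "N02BE01"], None, []],
}
for name, summary in profile_table(table).items():
    print(name, summary)  # atc_codes prints None (skipped as list-type)
```

A fuller implementation could profile list-type columns by flattening them first, for example counting individual elements and list lengths. Which approach suits the data dictionary is a design decision this sketch does not make.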