
Update thl mapping #40


@piotor87

I was looking at the unmapped tests and we have `s-p�lyer [u/ml]`, which clearly should be `s-pölyer`, so I wrote a bash command to fix it. I downloaded the new (2025) THL mapping file from
https://koodistopalvelu.kanta.fi/codeserver/pages/classification-view-page.xhtml?classificationKey=88&versionKey=120

and ran:

```
pete@pete-VirtualBox:/media/sf_Dropbox/Projects/kanta_lab_preprocessing/finngen_qc/data$ iconv -f ISO-8859-1 -t UTF-8 120_1387444168447.txt | tr -d '\r' | awk -F';' 'NR==1{print "CodeId\tAbbreviation"} NR>1{abbr=$2; gsub(/ /,"",abbr); print $1"\t"tolower(abbr)}' | grep 7154
7154	s-pölyer
pete@pete-VirtualBox:/media/sf_Dropbox/Projects/kanta_lab_preprocessing/finngen_qc/data$ cat thl_lab_id_abbrv_map.tsv | grep 7154
7154	s-p�lyer
```
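For reuse, the pipeline above can be wrapped in a small function. This is a minimal sketch that assumes, as the command above does, that the THL export is ISO-8859-1 encoded, semicolon-separated, uses CRLF line endings and has a header line, with the code id in column 1 and the abbreviation in column 2. The function name `build_thl_map` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the CodeId -> abbreviation map from the THL export.
# Assumptions (taken from the command in this issue, not from THL docs):
#   - the export is ISO-8859-1, semicolon-separated, with CRLF line endings
#   - column 1 is the code id, column 2 the abbreviation, line 1 a header
set -euo pipefail

build_thl_map() {
  local src="$1"   # e.g. 120_1387444168447.txt
  iconv -f ISO-8859-1 -t UTF-8 "$src" \
    | tr -d '\r' \
    | awk -F';' '
        NR == 1 { print "CodeId\tAbbreviation"; next }  # replace the header
        {
          abbr = $2
          gsub(/ /, "", abbr)           # drop spaces inside the abbreviation
          print $1 "\t" tolower(abbr)   # lower-case it; depending on the awk
                                        # and locale, non-ASCII capitals such
                                        # as "Ö" may be left unchanged
        }'
}
```

Usage: `build_thl_map 120_1387444168447.txt > thl_lab_id_abbrv_map.new.tsv` (the output filename is hypothetical). To find other affected rows in the current `thl_lab_id_abbrv_map.tsv`, `LC_ALL=C.UTF-8 grep -naxv '.*' thl_lab_id_abbrv_map.tsv` lists lines containing bytes that are not valid UTF-8, while `grep -n $'\xef\xbf\xbd' thl_lab_id_abbrv_map.tsv` lists lines containing a literal U+FFFD replacement character. Which of the two applies depends on how the old file was produced.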

But then I looked at the other changes in the new file, and there are more of them (~500) that also affect our data. Fortunately, if we look only at the mapped values, the affected tests are:

| OLD_TEST | OMOP_ID | COUNT | NEW_TEST | IS_IN_USAGI |
|---|---|---:|---|---|
| pt-ekg-12 | 3044889 | 1163408 | ekg-12 | true |
| pt-ekg-atk | 3013512 | 299923 | ekg-atk | true |
| pt-fvspido | 4010399 | 50607 | fvspido | false |
| pt-fvspird | 3000492 | 38360 | fvspird | false |
| pt-klr-tje | 4094501 | 13035 | klr-tje | false |
| pt-fvspiro | 3000492 | 11757 | fvspiro | false |
| pt-fvspio | 3000492 | 10517 | fvspio | false |
| -pncanho | 3009595 | 5301 | -pnjinho | false |
| s-d-1 | 3011391 | 4661 | s-d-1,25 | false |
| pt-ekg-15 | 3004451 | 3021 | ekg-15 | false |
| pt-spirom | 4133840 | 1441 | spirom | false |
| s-hasphe | 3011951 | 633 | s-haspähe | true |
| s-vehne | 3027231 | 528 | s-vehnäe | true |
| s-plyry | 3023351 | 251 | s-pölyry | true |
| p-kupari | 3001186 | 148 | p-cu | true |
p-kupari 3001186 148 p-cu true

Some of them are already in the Usagi file (IS_IN_USAGI = true); the others would need to be added.
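To get the list of tests that would need to be added, the rows with `IS_IN_USAGI` = `false` can be selected from the table above. A minimal sketch, assuming the table is saved as a whitespace-separated text file with the same header line. It looks columns up by header name, so column order does not matter. The function name `missing_from_usagi` is hypothetical, and the sketch does not read the Usagi file itself, whose format is not shown here.

```bash
#!/usr/bin/env bash
# Sketch: print NEW_TEST, OMOP_ID and COUNT for rows not yet in Usagi.
# Assumes a whitespace-separated table whose first line is the header
# OLD_TEST OMOP_ID COUNT NEW_TEST IS_IN_USAGI (in any order).
set -euo pipefail

missing_from_usagi() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # header name -> column index
    $(col["IS_IN_USAGI"]) == "false" {
      print $(col["NEW_TEST"]) "\t" $(col["OMOP_ID"]) "\t" $(col["COUNT"])
    }' "$1"
}
```

The output keeps the input order, so if the table is already sorted by COUNT, the most frequent tests come first.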
