Update regex pattern for diseaseFromSourceMappedId #237

Merged
ireneisdoomed merged 1 commit into master from ds_4104_disease_pattern
Jan 20, 2026

DSuveges commented Dec 1, 2025

Contributor

Context

Relaxed the pattern used to validate the contents of the `diseaseFromSourceMappedId` column. It might not catch every possible problem (e.g. it does not enforce digits in the suffix), but the downstream disease validation takes care of these nuances.

ireneisdoomed left a comment

Contributor

What's the reason behind this?

DSuveges commented Jan 5, 2026

Contributor Author

> What's the reason behind this?

#4104: Some identifiers defy the {prefix}_{number} pattern. Also, given that EFO keeps extending its list of imported ontologies, I wonder whether we should drop this rather laborious, yet still crude, form of validation for this field altogether, considering we validate diseases in PTS anyway.

@ireneisdoomed

Contributor

After the chat with EVA, we decided not to keep a whitelist of ontologies in the regex. I suggest changing the pattern to: `^[A-Za-z]+_\d+$`
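
For readers who want to sanity-check the suggested pattern outside the schema, here is a minimal Python sketch. The identifiers in the list are hypothetical examples chosen only to illustrate which shapes pass or fail, and `is_valid` is an illustrative helper, not part of the repository. Note that the schema itself is JSON, so the validator's regex dialect may differ slightly from Python's `re` module.

```python
import re

# Pattern suggested in the review comment above.
PATTERN = re.compile(r"^[A-Za-z]+_\d+$")


def is_valid(identifier: str) -> bool:
    """Return True if the identifier has the {prefix}_{number} shape."""
    return PATTERN.search(identifier) is not None


# Hypothetical identifiers, used only to illustrate the pattern's behaviour.
candidates = [
    "EFO_0000270",    # letters, underscore, digits
    "MONDO_0005148",  # any alphabetic prefix is accepted, no whitelist
    "Orphanet_908",   # mixed-case prefix
    "EFO:0000270",    # colon instead of underscore
    "EFO_",           # no digits after the underscore
    "HP_0001250abc",  # trailing letters after the digits
]

for identifier in candidates:
    print(f"{identifier}: {is_valid(identifier)}")
```

Because the prefix is any run of ASCII letters, newly imported ontologies need no change to the pattern; whether a given identifier actually exists is left to the downstream disease validation mentioned earlier in the thread.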

Comment thread schemas/disease_target_evidence.json Outdated
DSuveges force-pushed the ds_4104_disease_pattern branch from 1db5313 to ad636fa on January 20, 2026 14:08
Relaxed pattern of validating the contents of the `diseaseFromSourceMappedId` column.
DSuveges force-pushed the ds_4104_disease_pattern branch from ad636fa to c8fb7b8 on January 20, 2026 14:15
ireneisdoomed merged commit 6e3109e into master on Jan 20, 2026
2 checks passed
ireneisdoomed deleted the ds_4104_disease_pattern branch on January 20, 2026 15:45
Development

Linked issue: Relaxing disease identifier pattern in evidence schema

2 participants