Update coloc readme to work with R14 data #646

Open
Lipastomies wants to merge 1 commit into master from coloc_readme_update_r14.al

@Lipastomies

Collaborator

While importing the R14 coloc data, I noticed that some things had changed.

Added a preprocessing step to the coloc import and fixed columns where necessary.

Lipastomies self-assigned this Jun 10, 2026

tobtobtob commented Jun 10, 2026

Collaborator

Read this through, looks OK to me. This data loading pipeline is quite complicated. In the future we should try to make it simpler, for example by extracting the Python scripts into files so that they can be run directly from the command line, similar to what we do in the SQL repo. Maybe this could also be moved to the SQL repo? That is not something to do in this PR, though: I get that this is just an update to the old pipeline, so there is no need to refactor it now.

(I'm not familiar with the colocalization stuff, so it would be good if someone else reviewed this as well.)

juhaa commented Jun 10, 2026

Collaborator

Looks OK to me as well. As mentioned, this is quite complicated and could be simplified a lot, but doing that properly needs time. I definitely agree that this should be moved to the SQL repo.


## Optional : Data Fixes

Check the trait2 column for any rows where the entry begins with “seq”

Collaborator

I wasn't aware of this. It should be fixed directly in the source rather than here.
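
For reference, the check quoted from the readme (rows whose trait2 entry begins with "seq") could be sketched roughly as below. This is only an illustration, not the code in the PR: it assumes a tab-separated file with a header row containing a `trait2` column, and the function name and sample rows are made up.

```python
import csv
import io


def rows_with_seq_trait2(handle, delimiter="\t"):
    """Yield (line_number, row) for rows whose trait2 value starts with 'seq'.

    Assumes a header row with a 'trait2' column and one record per line;
    the tab delimiter is also an assumption about the coloc file format.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    # Data starts on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        if (row.get("trait2") or "").startswith("seq"):
            yield line_number, row


# Hypothetical sample data, only to show how the check behaves.
sample = io.StringIO(
    "trait1\ttrait2\n"
    "T2D\tseq.1234.56\n"
    "T2D\tLDL\n"
)
flagged = list(rows_with_seq_trait2(sample))
for line_number, row in flagged:
    print(line_number, row["trait2"])
```

Running the same function over the upstream source file would show whether the fix belongs there, as suggested above.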

Lipastomies commented Jun 11, 2026

Collaborator Author

Agree with that. We could do the following:

  1. Move all SQL statements, such as the coloc and variant table creation and the index and view creation, into a create_coloc.sql file.
  2. Write a script (or scripts) that processes the data so that it can be imported directly into the tables with the gcloud sql command.

That way it would be a four-step operation:

  1. Create the SQL tables.
  2. Process the data into an ingestible form.
  3. Copy it to a bucket.
  4. Import it into the SQL tables.
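
A rough sketch of how the four steps could be driven from one place is shown below. It only builds the command lines and prints them, it does not run anything. Everything except `create_coloc.sql` is a placeholder I made up (instance, database, bucket, table and data file names), it assumes the processed data ends up as CSV, and running the table creation through `gcloud sql import sql` is just one possible way to do step 1.

```python
import shlex


def plan_coloc_import(instance, database, bucket, table,
                      sql_file="create_coloc.sql",
                      data_file="coloc_processed.csv"):
    """Return the gcloud commands for the proposed import as argv lists.

    All argument values are hypothetical placeholders. Step 2 (processing
    the raw data into data_file) is assumed to have happened already.
    """
    sql_uri = f"gs://{bucket}/{sql_file}"
    data_uri = f"gs://{bucket}/{data_file}"
    return [
        # Step 1: create tables, indexes and views from the SQL file.
        ["gcloud", "storage", "cp", sql_file, sql_uri],
        ["gcloud", "sql", "import", "sql", instance, sql_uri,
         f"--database={database}"],
        # Step 3: copy the processed data to the bucket.
        ["gcloud", "storage", "cp", data_file, data_uri],
        # Step 4: import the processed data into the table.
        ["gcloud", "sql", "import", "csv", instance, data_uri,
         f"--database={database}", f"--table={table}"],
    ]


commands = plan_coloc_import(
    instance="example-instance",
    database="example_db",
    bucket="example-bucket",
    table="colocalization",
)
for argv in commands:
    print(shlex.join(argv))
```

Keeping the commands as data would make it easy to review them before running, or to pass each list to `subprocess.run` later.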
