Update coloc readme to work with R14 data #646
Read this through, looks ok to me. This data loading pipeline is quite complicated. In the future we should try to simplify it, for example by extracting the Python scripts into files so that they can be run directly from the command line, similar to what we do in the SQL repo. Maybe this could also be moved to the SQL repo? That is not something to do in this PR, though: I get that this is just an update to the old pipeline, so there is no need to refactor it now. (I'm not familiar with the colocalization stuff, so it might be good if someone else reviews this as well.)
Looks ok to me as well. As mentioned, this is quite complicated and could be simplified quite a lot, but that needs time to be done properly. And I definitely agree that this should be moved to the SQL repo.
## Optional : Data Fixes
Check the trait2 column for any rows where the entry begins with “seq”
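A minimal sketch of how that check could be scripted, using only the Python standard library. It assumes the coloc data is a tab-separated file with a header row containing a `trait2` column; the delimiter, the function name `find_seq_rows`, and the sample rows are all hypothetical and not taken from the actual pipeline.

```python
import csv
import io


def find_seq_rows(handle, column="trait2", prefix="seq", delimiter="\t"):
    """Return (line_number, value) pairs for rows whose `column` starts with `prefix`.

    `handle` is any iterable of text lines, e.g. an open file.
    The tab delimiter is an assumption about the file layout.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    hits = []
    for row in reader:
        # Short rows yield None for missing fields, so fall back to "".
        value = (row[column] or "").strip()
        if value.startswith(prefix):
            # reader.line_num is the source line of the row just read.
            hits.append((reader.line_num, value))
    return hits


# Made-up sample standing in for the real coloc file.
SAMPLE = "trait1\ttrait2\nT2D\tseq.1234.5\nT2D\tIL6\nAST\tseq.99.1\n"

if __name__ == "__main__":
    for line_number, value in find_seq_rows(io.StringIO(SAMPLE)):
        print(f"line {line_number}: trait2 = {value}")
```

For a real file, the same function can be called on `open(path, newline="")`. The sketch only reports the affected rows; it does not modify the data.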
This was unknown to me. We need to fix this directly in the source rather than here.
Agree with that. We could do the following:
That way it would be a four-step operation:
Imported R14 coloc data, and in the process noticed that some things had been changed.
Added preprocessing step to coloc import, and fixed columns where necessary.