A small container to get an OMOP CDM database running quickly, with support for both PostgreSQL and SQL Server.
Drop your data into data/, and run the container.
You can configure the container or CLI using the following environment variables:
- DB_HOST: The hostname of the database. Default is db.
- DB_PORT: The port number of the database. Default is 5432.
- DB_USER: The username for the database. Default is postgres.
- DB_PASSWORD: The password for the database. Default is password.
- DB_NAME: The name of the database. Default is omop.
- DIALECT: The type of database to use. Default is postgresql, but can also be mssql.
- SCHEMA_NAME: The name of the schema to be created/used in the database. Default is public.
- DATA_DIR: The directory containing the data CSV files. Default is data.
- SYNTHETIC: Load synthetic data (boolean). Default is false.
- SYNTHETIC_NUMBER: Size of synthetic data, 100 or 1000. Default is 100.
- DELIMITER: The delimiter used to separate data. Default is tab, can also be a comma (,).
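As a quick reference, the defaults above can be written as a small lookup that overlays whatever is set in the environment. This is only an illustrative sketch: `resolve_settings` is a hypothetical helper, not part of omop-lite, and how omop-lite itself reads these variables is not shown in this README.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults copied from the variable list above.
DEFAULTS: Dict[str, str] = {
    "DB_HOST": "db",
    "DB_PORT": "5432",
    "DB_USER": "postgres",
    "DB_PASSWORD": "password",
    "DB_NAME": "omop",
    "DIALECT": "postgresql",
    "SCHEMA_NAME": "public",
    "DATA_DIR": "data",
    "SYNTHETIC": "false",
    "SYNTHETIC_NUMBER": "100",
    "DELIMITER": "tab",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: overlay environment values on the documented defaults."""
    source = os.environ if env is None else env
    # Any variable that is not set falls back to its documented default.
    return {name: source.get(name, default) for name, default in DEFAULTS.items()}
```

For example, `resolve_settings({"DIALECT": "mssql", "DB_PORT": "1433"})` keeps every other value at its default, which mirrors what you would expect when passing only those two variables to the container.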
pip install omop-lite
python omop-lite --help
docker run -v ./data:/data ghcr.io/health-informatics-uon/omop-lite
# docker-compose.yml
services:
  omop-lite:
    image: ghcr.io/health-informatics-uon/omop-lite
    volumes:
      - ./data:/data
    depends_on:
      - db
  db:
    image: postgres:latest
    environment:
      - POSTGRES_DB=omop
      - POSTGRES_PASSWORD=password
    ports:
      - "5432:5432"

To install using Helm:
# Install the chart directly from the OCI registry
helm install omop-lite oci://ghcr.io/health-informatics-uon/charts/omop-lite --version 0.2.2

The Helm chart deploys OMOP Lite as a Kubernetes Job that creates an OMOP CDM in a database. You can customise the installation using a values file:
# values.yaml
env:
  dbHost: postgres
  dbPort: "5432"
  dbUser: postgres
  dbPassword: postgres
  dbName: omop_helm
  dialect: postgresql
  schemaName: public
  synthetic: "false"

Install with custom values:
helm install omop-lite omop-lite/omop-lite -f values.yaml

If you need synthetic data, some is provided in the synthetic directory. It provides a small amount of data to load quickly.
To load the synthetic data, run the container with the SYNTHETIC environment variable set to true. SYNTHETIC_NUMBER selects which dataset is loaded:
- 100 is fake data.
- 1000 is Synthea 1k data.
- 1001 is Synthea 1k data with Specimen, Death, and Device Exposure added.
You can provide your own data for loading into the tables by placing your files in the data/ directory. This should contain .csv files matching the data tables (DRUG_STRENGTH.csv, CONCEPT.csv, etc.).
To match the vocabulary files from Athena, this data should be tab-separated but keep the .csv file extension.
You can override the delimiter with the DELIMITER environment variable.
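If you are producing your own files, the sketch below shows one way to write and read a tab-separated table with Python's standard csv module. It rests on assumptions: that the DELIMITER value tab stands for a tab character and any other value is used literally, and the helper names and the example rows are hypothetical. How omop-lite itself parses the files is not described here.

```python
import csv
import io
from typing import Iterable, List, Sequence, TextIO


def delimiter_char(value: str = "tab") -> str:
    """Assumption: 'tab' means a tab character; anything else is used as-is."""
    return "\t" if value == "tab" else value


def write_table(rows: Iterable[Sequence[str]], out: TextIO, delimiter: str = "tab") -> None:
    """Write rows (header first) to an open text file using the chosen delimiter."""
    writer = csv.writer(out, delimiter=delimiter_char(delimiter), lineterminator="\n")
    writer.writerows(rows)


def read_table(src: TextIO, delimiter: str = "tab") -> List[List[str]]:
    """Read every row back as a list of strings."""
    return list(csv.reader(src, delimiter=delimiter_char(delimiter)))


# Illustrative rows only; real files must match the OMOP table columns.
example_rows = [
    ["concept_id", "concept_name"],
    ["1", "Example concept, with a comma"],
]

buffer = io.StringIO()
write_table(example_rows, buffer)  # tab-separated by default
buffer.seek(0)
round_trip = read_table(buffer)
```

To write a real file, open it with `open("data/CONCEPT.csv", "w", newline="")` and pass the handle to `write_table`. With a tab delimiter, commas inside values need no quoting, which is one practical reason to keep the default.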
Adding a tsvector column to the concept table and an index on that column makes full-text search queries on the concept table run much faster.
Postgres does vector search too!
To enable these features in omop-lite, you can use the text-search profile:
docker compose --profile text-search up

To do this, you need to have text-search/embeddings.parquet, containing concept_ids and embeddings (an example file is provided).
This uses pgvector to create an embeddings table.
If you're a developer and want to iterate on omop-lite quickly, synthetic/ contains a small subset of the vocabularies that is sufficient for a build.
If you wish to test the vector search, there are matching embeddings in embeddings/embeddings.parquet.