Skip to content

Feature: Add external database volume and external database connection#155

Open
nshandra wants to merge 9 commits intodevelopmentfrom
feature/external-database
Open

Feature: Add external database volume and external database connection#155
nshandra wants to merge 9 commits intodevelopmentfrom
feature/external-database

Conversation

@nshandra
Copy link
Copy Markdown
Contributor

@nshandra nshandra commented Dec 5, 2025

📌 References

📝 Implementation

Add parameters to d2-docker start to allow either:

  • Store the DB in an external persistent volume
  • Use an external DB instead of a container

Testing

External Persistent Volume Case

This case will store the DB volume at the specified absolute path.

./d2-docker-dev.sh --log-level=DEBUG start <data_image> --external-db-volume="/absolute/path/to/volume_dir"

External DB Case

This case needs a psql DB runnig in the machine hosting the dockers or in a accessible network.

This DB has to have the correct settings:

  • /etc/postgresql/{ver}/main/postgresql.conf

    Set listen_addresses to:

    • '*' (easiest for testing)
    • 'localhost,172.17.0.1' (docker and DB in same system)
    • 'localhost,<reachable_ip>'
  • /etc/postgresql/{ver}/main/pg_hba.conf

    Add an entry to allow the connection. Example:

    # Docker bridge subnet (check with: docker network inspect bridge)
    host    dhis2           dhis            172.17.0.0/16           scram-sha-256
    # LAN
    host    dhis2           dhis            192.168.0.0/24          scram-sha-256

Running d2-cloner:

./d2-docker-dev.sh --log-level=DEBUG start <data_image> --external-db-url="postgresql://user:pass@host:port/db"

#1ck80y8

@nshandra nshandra requested a review from tokland December 5, 2025 08:55
@nshandra nshandra self-assigned this Dec 5, 2025
@nshandra nshandra marked this pull request as draft December 5, 2025 13:06
@nshandra nshandra marked this pull request as ready for review January 21, 2026 21:28
@tokland
Copy link
Copy Markdown
Contributor

tokland commented Jan 26, 2026

@nshandra Sorry, I had this pending. Can you merge with development? (re-add me when done)

@nshandra
Copy link
Copy Markdown
Contributor Author

@tokland done.

@tokland tokland requested review from tokland and removed request for tokland January 26, 2026 09:13
Copy link
Copy Markdown
Contributor

@tokland tokland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Features worked for me (only I had to add a trailing comma, probably a merge artifact)

In-line comments about the code:

Comment thread src/d2_docker/utils.py Outdated
@@ -300,6 +318,9 @@ def run_docker_compose(
("ROOT_PATH", ROOT_PATH),
("PSQL_ENABLE_QUERY_LOGS", "") if not enable_postgres_queries_logging else None,
("GLOWROOT_PORT", get_port_glowroot(glowroot_port))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing trailing comma, the code does not run as-is.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, i messed the merge, sorry about that.

"--external-db-volume",
metavar="DIRECTORY",
help="Directory for external database volume",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option worked for me. Only problem is that the final permissions of the folder are off:

$ ls -al | grep volume
drwx------  19    70    70 4.0K Apr 14 10:21 volume

However, that's something d2-docker has been ignoring all along - we probably should have user: "${UID}:${GID}" or similar in our docker-compose to avoid this kind of problems. Not sure it's the moment to try that (it may have unintended consequences), but I'd be a step in the good direction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a test with data["services"]["db"]["user"] = "{}:{}".format(os.getuid(), os.getgid()) and as expected it failed with errors like:
initdb: could not look up effective user ID 1000: user does not exist
because "my" user (or more specifically, an user with the same UID) doesn't exist in the db container.

The next test was with data["services"]["db"]["user"] = "0:0". This one works because there should be a superuser (should we run postgresql as root instead of postgres? probably not). In practice i'm not sure how much of an improvement this is, you still need sudo to view the folder.

I guess a better solution is to have a DB container that creates a postgres user whose UID/GID matches the user launching d2-docker.

type=str,
metavar="postgresql://user:pass@host:port/dbname",
help="Use external PostgreSQL database"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked for me

"--load-dump-from-data",
action="store_true",
help="Load database dump from data container (only with --external-db-url)",
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked for me, only that I had to explicitly create the database with psql for it to work. Is that the intended workflow?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In README it explain this option as:

Use option --load-dump-from-data with --external-db-url to import the SQL dump to the external database. Equivalent to running without -k/--keep-containers. The receiving DB should have the appropriate config (DB owner, user permissions, postgis extension).

Given that setting up a new DB can be a complex process (installing packages, sudo access, remote access to the machine hosting the DB and setting IPs in pg_hba.conf) i decided to not include a DB setup in the script.

else
echo "Normal mode: $path"
$psql_cmd < "$path"
$psql_cmd <"$path"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting is being applied to code outside of the PR. It would be fine if the formatting were incorrect, but we don't have a defined prettifier, so there is no "incorrect" for now. Would the best approach be a .vscode/settings.json file where we set up the formatter and its options?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If im not mistaken .vscode/settings.json would mean using the same shell format plugin or include the configs for the popular ones.

In my case the plugin uses shfmt and that has no typical config file like .pylintrc. Closest thing is using an .editorconfig file with the printer flags:

binary_next_line = false
switch_case_indent = false
space_redirects = false
keep_padding = false
function_next_line = false

Also a format should be decided (not sure how many of our devs have strong preferences in sh formatting). So maybe this could be a topic for the technical decisions meetings?

For this PR i will revert this formatting.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anyone has a strong opinion about this, me neither, take the one your prefer.

Comment thread src/d2_docker/utils.py Outdated
("GLOWROOT_PORT", get_port_glowroot(glowroot_port))
("EXTERNAL_DB_VOLUME", external_db_volume) if external_db_volume else None,
("EXTERNAL_DB_URL", external_db_url) if external_db_url else None,
("LOAD_DUMP_FROM_DATA", "yes" if load_dump_from_data else "no") if external_db_url else "no",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grouping/parentheses look off, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, should be either (k, v) or None. With check_conflicting_external_params from src/d2_docker/commands/start.py there is no need for the if external_db_url check.

Comment thread src/d2_docker/utils.py Outdated
logger.info("Validating external database connection...")
try:
psql_cmd = ["psql", "-d", db_url, "-c", "SELECT now();", "&> /dev/null"]
run(psql_cmd, capture_output=False)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems it worked, but I am curious about "&> /dev/null", since the default for run is shell=False.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That output capture does nothing indeed. Changed code to hide the SELECT output via capture_output=True. The exception still shows the psql command error output due to raise_on_error = True in run default.


if args.external_db_url:
utils.validate_external_db_connection(args.external_db_url)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a check to raise error if both options are passed, as they are mutually exclusive?

Also, a standalone --load-dump-from-data is not correct workflow.

Copy link
Copy Markdown
Contributor Author

@nshandra nshandra Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check_conflicting_external_params to raise exceptions with incorrect parameters.
Also added a line in README about the mutually exclusive parameters.

psql_cmd="$psql_base_cmd -v ON_ERROR_STOP=0"
psql_strict_cmd="$psql_base_cmd -v ON_ERROR_STOP=1"
pgrestore_cmd="pg_restore -h db -U dhis -d dhis2"
pgrestore_cmd="pg_restore ${db_url:-"-h db -U dhis dhis2"}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we had -d dhis2 but now it's directly dhis2. Is that correct?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking pg_restore docs it should have -d. Restored.

Comment thread src/d2_docker/utils.py Outdated
if parsed.hostname == "localhost":
hostname = "host.docker.internal"
else:
hostname = parsed.hostname
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] we can use hostname = ..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, changed to:
hostname = "host.docker.internal" if parsed.hostname == "localhost" else parsed.hostname

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants