Skip to content

chore(loader): improve the perf in loader-ci#724

Merged
imbajin merged 9 commits intoapache:masterfrom
lokidundun:ciImprove
Apr 15, 2026
Merged

chore(loader): improve the perf in loader-ci#724
imbajin merged 9 commits intoapache:masterfrom
lokidundun:ciImprove

Conversation

@lokidundun
Copy link
Copy Markdown
Contributor

@lokidundun lokidundun commented Apr 12, 2026

Purpose of the PR

Main Changes

  1. .github/workflows/loader-ci.yml
  • Migrated MySQL to GitHub Actions services: Adopted the official mysql:8.0 container service with health check configured. This eliminates the need to execute docker pull and docker run for every workflow run.
  • Added Hadoop caching: Cached ~/hadoop-3.3.6.tar.gz with a fixed cache key hadoop-3.3.6.
  • Added HugeGraph Server caching: Cached ~/hugegraph-cache-${{ env.COMMIT_ID }}; the cache key contains the commit ID to enable auto-invalidation when the version changes.
  • Removed the invocation of install-mysql.sh: MySQL is now provisioned by GitHub Actions services.
  1. hugegraph-loader/assembly/travis/install-hadoop.sh
  • Added local installation & cache validation: Skip re-download/extraction if /usr/local/hadoop already exists; use the cached tar.gz file directly if it exists (while the local directory is missing) to avoid repeated downloads from archive.apache.org.
  • Added writes to GITHUB_ENV: Ensure subsequent steps can correctly obtain HADOOP_HOME and PATH environment variables.
  • Made configuration writes idempotent: Only write core-site.xml and hdfs-site.xml if the configuration files do not exist or their content is mismatched.
  1. hugegraph-loader/assembly/travis/install-mysql.sh
  • Added idempotent check: Directly skip docker pull and docker run if the mysql:8.0 container is already running (e.g., provided by GitHub Actions services).
  1. hugegraph-loader/assembly/travis/install-hugegraph-from-source.sh
  • Added build caching: Check if the packaged apache-hugegraph-*.tar.gz exists in ~/hugegraph-cache-${COMMIT_ID} before building. Use the cached tarball directly and skip the time-consuming git clone + mvn package if it exists; copy the tarball to the cache directory for reuse in future runs after the build if it does not exist.

result:
image

Verifying these changes

  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • xxx

Does this PR potentially affect the following parts?

  • Nope
  • Dependencies (add/update license info)
  • Modify configurations
  • The public API
  • Other affects (typed here)

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 12, 2026
@github-actions github-actions Bot added the loader hugegraph-loader label Apr 12, 2026
@dosubot dosubot Bot added the ci Continuous integration label Apr 12, 2026
@lokidundun
Copy link
Copy Markdown
Contributor Author

@imbajin could you please take a look when you are convenient❤️

echo "PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin" >> "${GITHUB_ENV}"
fi

echo "export HADOOP_HOME=${HADOOP_HOME}" >> ~/.bashrc
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ The PR is making this script reusable across cached/repeated runs, but these unconditional ~/.bashrc appends still make it non-idempotent: every invocation adds another Hadoop segment to PATH, so a reused runner or local repro environment keeps growing shell state. It would be safer to guard these writes (or rely on GITHUB_ENV/GITHUB_PATH in CI) so repeated runs stay stable.

Suggested change
echo "export HADOOP_HOME=${HADOOP_HOME}" >> ~/.bashrc
if ! grep -qxF "export HADOOP_HOME=${HADOOP_HOME}" ~/.bashrc; then
echo "export HADOOP_HOME=${HADOOP_HOME}" >> ~/.bashrc
fi
if ! grep -qxF "export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native" ~/.bashrc; then
echo "export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native" >> ~/.bashrc
fi
if ! grep -qxF "export PATH=\$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin" ~/.bashrc; then
echo "export PATH=\$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin" >> ~/.bashrc
fi

DB_PASS="${2:-root}"

# Skip if MySQL is already running (e.g., provided by GitHub Actions service)
if docker ps | grep -q "mysql:5.7"; then
Copy link
Copy Markdown
Member

@imbajin imbajin Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ This broad container check changes the script semantics in a risky way. If a developer already has any unrelated mysql:5.7 container running, the script now exits without ensuring the requested test container name/password/port match what the loader tests expect.

That can turn a clean setup failure into a much harder-to-diagnose auth/connection failure later. Could we scope the idempotency check to the requested container instead?

Suggested change
if docker ps | grep -q "mysql:5.7"; then
if docker ps --format '{{.Names}}' | grep -qx "${DB_NAME}"; then
echo "MySQL container ${DB_NAME} is already running, skipping setup."
exit 0
fi

Also consider upgrading the mysql version to 8.x (stable version) if possible

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

set -ev

sudo wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz
HADOOP_VERSION="2.8.5"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to use 3.x (stable version)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Apr 14, 2026
@@ -23,12 +27,20 @@ TRAVIS_DIR=$(dirname $0)
CONF=hugegraph-test/src/main/resources/hugegraph.properties
MYSQL_USERNAME=root

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ This skip check only matches an exact container name. GitHub Actions service containers are usually not named mysql, so the guard will miss the case you want to avoid and the script will still try to start a second MySQL instance.

Consider checking whether port 3306 is already occupied or probing the database itself instead.

.setParameter("rewriteBatchedStatements", "true")
.setParameter("useServerPrepStmts", "false")
.setParameter("autoReconnect", "true");
.setParameter("autoReconnect", "true")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ buildUrl() is shared by MYSQL and POSTGRESQL. Putting a MySQL-only auth workaround in the common path leaks this flag into the PostgreSQL URL too.

Please scope allowPublicKeyRetrieval to the MySQL branch instead.

Suggested change
.setParameter("autoReconnect", "true")
uriBuilder.setParameter("useSSL", "false")
.setParameter("characterEncoding", Constants.CHARSET.name())
.setParameter("rewriteBatchedStatements", "true")
.setParameter("useServerPrepStmts", "false")
.setParameter("autoReconnect", "true");

@imbajin imbajin changed the title Improve: Improve ci-loader chore(loader): improve the perf in loader-ci Apr 15, 2026
Copy link
Copy Markdown
Member

@imbajin imbajin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THX

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Apr 15, 2026
@imbajin imbajin merged commit 39e4692 into apache:master Apr 15, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci Continuous integration lgtm This PR has been approved by a maintainer loader hugegraph-loader size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improve][CI] Avoid repeated environment downloads in loader-ci

2 participants