Fix flaky tests in IT pipeline#1321
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce CI flakiness by hardening test teardown and making test assertions/configuration updates more deterministic, without changing production behavior.
Changes:
- Add defensive handling in integration-test teardown for a transient Hibernate
ConcurrentModificationExceptionduring cleanup. - Centralize authentication plugin-sequence updates in REST ITs via a clear-then-set helper to prevent cross-test config leakage.
- Make ORCID sandbox-backed unit test assertions less brittle by removing a high, unstable
numFound()threshold.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
dspace-api/src/test/java/org/dspace/AbstractIntegrationTestWithDatabase.java |
Adds CME handling during @After cleanup to avoid failing tests due to a known Hibernate race. |
dspace-server-webapp/src/test/java/org/dspace/app/rest/AuthenticationRestControllerIT.java |
Introduces helper for plugin sequence updates and replaces direct property writes to avoid intermittent header pollution. |
dspace-api/src/test/java/org/dspace/external/CachingOrcidRestConnectorTest.java |
Replaces brittle ORCID sandbox result-count assertion with stable success + low-threshold checks. |
…uth sequence reset, ORCID assertion hardening)
…ad) + Hibernate CME diagnostics ORCID CachingOrcidRestConnectorTest no longer hits the live ORCID sandbox: search/getLabel/search_fail mock the HTTP layer (httpGet made protected) with a canned expanded-search response, so they are deterministic instead of asserting against fluctuating sandbox data. Shibboleth WWW-Authenticate flakiness: add a test-only config-definition.xml with config-reload=false. Runtime setProperty(...AuthenticationMethod...) overrides were silently discarded whenever the auto-reload listener rebuilt the combined config (restoring clarin-dspace.cfg's [Password, ClarinShib] default), intermittently leaking 'password realm' into the header. Verified: with auto-reload off the override survives; the explicit reloadConfig() reset in @after still works. Hibernate ConcurrentModificationException in @after cleanup: the per-session JDBC ResourceRegistry is not thread-safe, so the CME means two threads touch one Session. Capture a full thread dump on CME (target/cme-dumps/) to identify the colliding thread in CI; keep a resilient retry so an already-passed test isn't failed by this teardown race. (Context.finalize() ruled out: sessions are thread-local.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Disabling config auto-reload globally in the test environment broke AuthorizeConfigIT.testReloadConfiguration, which deliberately verifies that AuthorizeConfiguration picks up live changes written to local.cfg via the auto-reload mechanism. Auto-reload is a tested feature here, so it must not be disabled to work around the Shibboleth WWW-Authenticate flakiness. The Shibboleth flakiness (runtime setProperty override discarded when the combined config is rebuilt) needs a reload-safe fix in the auth test instead; tracked separately. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…henticate flakiness) The flaky 'password realm' leak in AuthenticationRestControllerIT had this root cause: configurationService.setProperty(plugin.sequence...AuthenticationMethod, ...) only updates the in-memory view of the combined configuration. That view is discarded whenever it is rebuilt, and the auto-reload listener rebuilds it as soon as any reloadable cfg file's mtime changes mid-run (e.g. another test writing local.cfg). When that rebuild lands between the override and the request, clarin-dspace.cfg's default [PasswordAuthentication, ClarinShibAuthentication] returns and 'password realm' leaks into the header. The previous clear-then-set helper did not help (it is equivalent to a plain setProperty). Fix: set the sequence via a JVM system property (highest-precedence override layer, re-read on every rebuild) + reloadConfig(), and clear it in @after. This survives auto-reload without disabling it (so AuthorizeConfigIT, which verifies auto-reload, still passes). Verified in the real /api/authn/status endpoint: an explicit reloadConfig() after a setProperty override reproduces the leak, while the system-property approach keeps the header Shibboleth-only across rebuilds. Full AuthenticationRestControllerIT (43 tests) passes, and running it alongside ClarinAuthenticationRestControllerIT / AnonymousAdditionalAuthorizationFilterIT confirms the property does not leak across classes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…CME The intermittent ConcurrentModificationException in @after cleanup is a genuine cross-thread data race on Hibernate's per-session, non-thread-safe JDBC ResourceRegistry (xref): a second thread mutates the test thread's session while it commits/rolls back. Verified against hibernate-core-5.6.15 sources that the releaseResources forEach lambda never touches xref, so single-thread re-entrancy is impossible (this disproves the earlier HHH-15116 single-thread theory). The window is microseconds, so it does not reproduce locally even with deliberate cross-thread session sharing; it only surfaces under CI load. A live thread dump of a running IT JVM shows NO legitimate background thread ever touches Hibernate (all are Solr/HTTP/Jetty/JVM). So the culprit is a transient thread, and any non-test thread caught inside Hibernate JDBC/session code is by definition the offender. - HibernateConcurrencyMonitor: JVM-wide background sampler that records (de-duped) any non-test thread found inside org.hibernate.{resource.jdbc,engine.jdbc, internal.SessionImpl}; flushed to target/cme-dumps/ on CME and at JVM shutdown. Pure observer, never changes test behaviour. - AbstractIntegrationTestWithDatabase: start the monitor and mark the JUnit thread in setUp; flush it alongside the existing thread dump on a captured CME. - build.yml: always-upload **/target/cme-dumps/** (not gated on failure) so a successful cleanup retry no longer hides the diagnostic. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 9 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR improves integration test reliability and determinism by adding a JVM-wide Hibernate concurrency sampler that records culprit threads, retrying test cleanup with thread-dump capture on ConcurrentModificationException, making ORCID tests deterministic via HTTP stubbing and a fixture, and centralizing authentication-method sequencing overrides for tests. ChangesTest Reliability and Determinism: Hibernate Concurrency Diagnostics and External Dependency Isolation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
dspace-api/src/test/java/org/dspace/external/CachingOrcidRestConnectorTest.java (2)
97-100: ⚡ Quick winWrap overlong comment lines to satisfy Java line-length rule
These comment lines appear to exceed the 120-character limit; please wrap them for style-rule compliance.
As per coding guidelines, "Maintain maximum 120 character line length in Java code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/test/java/org/dspace/external/CachingOrcidRestConnectorTest.java` around lines 97 - 100, In CachingOrcidRestConnectorTest update the overlong test comment starting with "//Mock the HTTP layer with a canned ORCID expanded-search response..." so no line exceeds 120 characters: break the long sentences into multiple comment lines, preserving the original wording and intent (mentioning mocking the ORCID sandbox, why we mock, and what stays under test), maintain the same comment indentation and placement above the test so the parsing/behavior context remains clear.Source: Coding guidelines
86-87: Stream-per-invocation stubbing is not required for current tests
doReturn(cannedResponse(...))captures a singleInputStreaminstance, butCachingOrcidRestConnector.search()callshttpGet()exactly once per search and immediately closes the stream;getLabel()andsearch()each execute only one search, so this stub isn’t consumed twice. Keep as-is for now; switch todoAnswer(...)only ifhttpGet()could be invoked multiple times per test in the future. Also applies to: 101-102🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/test/java/org/dspace/external/CachingOrcidRestConnectorTest.java` around lines 86 - 87, The test currently uses doReturn(cannedResponse(...)) to stub sut.httpGet(...) which returns a single InputStream; since CachingOrcidRestConnector.search() (and getLabel()/search() in these tests) call httpGet() exactly once and immediately close the stream, this single-stream stubbing is acceptable—leave the doReturn(...) stubs in place for the lines using cannedResponse(EXPANDED_SEARCH_XML) and cannedResponse(SMALL_SEARCH_XML); only change to a doAnswer(...) that supplies a fresh InputStream per invocation if you later modify CachingOrcidRestConnector.search() or add tests where httpGet() may be called multiple times per test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/build.yml:
- Around line 86-89: The workflow is using the mutable tag
actions/upload-artifact@v4; replace each occurrence of
"actions/upload-artifact@v4" with the corresponding immutable 40-character
commit SHA (the exact commit you want to run) so the job steps (the "Upload
Hibernate CME thread dumps (if any)" step and the other steps that reference
actions/upload-artifact@v4) use the pinned SHA instead of the v4 tag; update all
instances in the file to the chosen commit SHA to ensure reproducible,
supply-chain-safe runs.
In
`@dspace-server-webapp/src/test/java/org/dspace/app/rest/AuthenticationRestControllerIT.java`:
- Around line 145-156: The Javadoc block in AuthenticationRestControllerIT
contains lines exceeding 120 characters; reflow/wrap the long comment lines
(notably those describing ConfigurationService#reloadConfig(),
ConfigurationService#setProperty(String, Object), the behavior of setProperty
overrides, and the note about clearAuthenticationMethodSequence()) so each line
is <=120 chars while preserving the exact wording and links (e.g.,
ConfigurationService#reloadConfig, ConfigurationService#setProperty,
clearAuthenticationMethodSequence) and the paragraph structure.
---
Nitpick comments:
In
`@dspace-api/src/test/java/org/dspace/external/CachingOrcidRestConnectorTest.java`:
- Around line 97-100: In CachingOrcidRestConnectorTest update the overlong test
comment starting with "//Mock the HTTP layer with a canned ORCID expanded-search
response..." so no line exceeds 120 characters: break the long sentences into
multiple comment lines, preserving the original wording and intent (mentioning
mocking the ORCID sandbox, why we mock, and what stays under test), maintain the
same comment indentation and placement above the test so the parsing/behavior
context remains clear.
- Around line 86-87: The test currently uses doReturn(cannedResponse(...)) to
stub sut.httpGet(...) which returns a single InputStream; since
CachingOrcidRestConnector.search() (and getLabel()/search() in these tests) call
httpGet() exactly once and immediately close the stream, this single-stream
stubbing is acceptable—leave the doReturn(...) stubs in place for the lines
using cannedResponse(EXPANDED_SEARCH_XML) and cannedResponse(SMALL_SEARCH_XML);
only change to a doAnswer(...) that supplies a fresh InputStream per invocation
if you later modify CachingOrcidRestConnector.search() or add tests where
httpGet() may be called multiple times per test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c1778cef-3a7f-4ba0-a1e8-bf044a9a1332
📒 Files selected for processing (7)
.github/workflows/build.ymldspace-api/src/main/java/org/dspace/external/CachingOrcidRestConnector.javadspace-api/src/test/java/org/dspace/AbstractIntegrationTestWithDatabase.javadspace-api/src/test/java/org/dspace/HibernateConcurrencyMonitor.javadspace-api/src/test/java/org/dspace/external/CachingOrcidRestConnectorTest.javadspace-api/src/test/resources/org/dspace/external/orcid-expanded-search.xmldspace-server-webapp/src/test/java/org/dspace/app/rest/AuthenticationRestControllerIT.java
…ause of flaky CME) Root cause of the intermittent ConcurrentModificationException in @after integration-test cleanup, identified via the HibernateConcurrencyMonitor CI dumps: the GC Finalizer thread, running org.dspace.core.AbstractHibernateDAO$1.finalize(), closed the Hibernate Stream returned by AbstractHibernateDAO.iterate(). Closing a stream closes its ScrollableResults, which mutates the owning Session's per-session, non-thread-safe JDBC ResourceRegistry (xref) - but on the Finalizer thread, concurrently with the thread that owns the session. When that collided with the owning thread's commit/rollback (releaseResources -> xref.forEach), it threw ConcurrentModificationException. The CI dumps showed this exact finalizer stack as the only non-test thread inside Hibernate in dspace-api, and present in dspace-server-webapp too. This was confirmed genuine cross-thread access (not the previously assumed single-thread/HHH bug): verified against hibernate-core-5.6.15 sources that the releaseResources forEach lambda never touches xref, so single-thread re-entrancy is impossible. Fix: close the backing stream on the owning thread when iteration is exhausted, and remove the finalize() override. An iterator abandoned before exhaustion is released safely when its Context/Session is closed (releaseResources then runs on the owning thread). Adds AbstractHibernateDAOIteratorIT to guard against reintroducing a stream-closing finalizer. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@dspace-api/src/test/java/org/dspace/core/AbstractHibernateDAOIteratorIT.java`:
- Around line 39-40: Add a Javadoc comment for the public test method
iterateIteratorMustNotCloseStreamFromFinalizer() describing what the test
verifies and any important preconditions/expected outcome; place the comment
immediately above the method signature and follow project style (brief
description, `@throws` declarations if needed) so the public test method complies
with the "Provide Javadoc comments for all public classes and methods"
guideline.
- Around line 50-55: The test currently only checks
iterator.getClass().getDeclaredMethod("finalize") on the anonymous leaf class;
update the test to walk the iterator class hierarchy via getSuperclass() up to
Object.class and check each class for a declared finalize() method so any
inherited finalizer is detected and the assertion in
AbstractHibernateDAO.iterate() remains guarded against superclass finalizers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ba970dcf-5202-4376-b73d-ec14e1720055
📒 Files selected for processing (2)
dspace-api/src/main/java/org/dspace/core/AbstractHibernateDAO.javadspace-api/src/test/java/org/dspace/core/AbstractHibernateDAOIteratorIT.java
…ssions Context.finalize() ran on the GC Finalizer thread and called dbConnection.isTransActionAlive()/abort(), which resolve sessionFactory.getCurrentSession() to a brand-new session bound to the Finalizer thread - never the (now-unreachable) thread that opened the Context. So it could not roll back the Context's transaction anyway; it only opened and leaked a throwaway Hibernate session on the Finalizer thread, and threw IllegalStateException once the SessionFactory was closed (seen in the CI thread dumps used to diagnose the flaky integration-test ConcurrentModificationException). Abandoned Contexts are cleaned up safely when their owning thread's session ends; callers already close Contexts via complete()/abort()/try-with-resources (Context is AutoCloseable). Removes the now-redundant ContextTest.testFinalize (close()/abort() are covered by testClose/testAbort/testAbort2). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…t cause fixed) The intermittent @after ConcurrentModificationException is now fixed at its source (AbstractHibernateDAO.iterate no longer closes its Hibernate stream from a finalizer; broken Context.finalize() removed). The temporary diagnostics that pinpointed it are no longer needed: - Restore AbstractIntegrationTestWithDatabase.destroy() to its plain form (drop the 3x cleanup retry and the per-CME thread dump) and remove the HibernateConcurrencyMonitor wiring. - Delete HibernateConcurrencyMonitor. - Revert the build.yml always-upload of target/cme-dumps. CI keeps -Dfailsafe.rerunFailingTestsCount=2 as the generic flaky-test safety net. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reverts the Context.finalize() removal (and the ContextTest.testFinalize deletion). The flaky @after ConcurrentModificationException is fully fixed by the AbstractHibernateDAO.iterate() change alone; Context.finalize() runs on a single GC Finalizer thread against its own finalizer-thread session and provably cannot cause that cross-thread xref race. Removing a finalizer from this core, widely-used class is a riskier change that does not belong in a flaky-test fix, so leave Context untouched. The (pre-existing, harmless) finalizer-thread session it opens can be addressed separately if desired. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- AbstractHibernateDAOIteratorIT: add Javadoc to the test method and walk the iterator's full class hierarchy (up to Object) when asserting no finalize() override, so a finalizer reintroduced on a superclass/helper is also caught (per CodeRabbit review). - AuthenticationRestControllerIT: wrap an over-length (122 char) Javadoc line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fix flaky integration tests at the source
Summary
Three independent intermittent failures were destabilising the IT pipeline. This PR fixes the first two
at their root cause and adds diagnostics for the third (a rare, CI-only concurrency race that does not
reproduce locally). Root causes were confirmed from the CI failure logs, the Hibernate / Apache Commons
Configuration sources, and reproduction probes — not guessed.
1. ORCID unit test hitting the live sandbox — FIXED
CachingOrcidRestConnectorTest(search/getLabel/search_fail) called the livehttps://pub.sandbox.orcid.org. The oldassertTrue(numFound() > 1000)failed whenever the sandboxdataset was reset/shrunk.
CachingOrcidRestConnector.httpGet(...)madeprotectedso tests can stub the HTTP layer.expanded-searchresponse (orcid-expanded-search.xml);the real JAXB parser + edismax query-building path stay under test. Deterministic, no network.
2. Shibboleth
WWW-Authenticate"password realm" leak — FIXEDRoot cause: the CLARIN default
plugin.sequence...AuthenticationMethodis the 2-method[PasswordAuthentication, ClarinShibAuthentication](realmspassword+shibboleth). A test'sconfigurationService.setProperty(...)override only updates the in-memory view of the combinedconfiguration, which is discarded whenever the combined config is rebuilt. The auto-reload listener
rebuilds it as soon as any reloadable cfg file's last-modified time changes mid-run (e.g. another test
writing
local.cfg); when that rebuild lands between the override and the request, the on-disk defaultreturns and
password realmleaks into the header. The previous clear-then-set helper was equivalent toa plain
setPropertyand did not help.AuthenticationRestControllerITnow sets the sequence via a JVM system property (highest-precedenceoverride layer, re-read on every rebuild → survives auto-reload) +
reloadConfig(), and clears it in@After.AuthorizeConfigIT(which deliberately verifies auto-reload oflocal.cfg) still passes. (Globally disabling auto-reload in the test env was tried and reverted — itbroke that test.)
/api/authn/statusendpoint: an explicitreloadConfig()after asetPropertyoverride reproduces the leak, while the system-property approach keeps the headerShibboleth-only across rebuilds. Full
AuthenticationRestControllerIT(43 tests) passes; run alongsideClarinAuthenticationRestControllerIT/AnonymousAdditionalAuthorizationFilterITthe property does notleak across classes.
3. Hibernate
ConcurrentModificationExceptionin@Aftercleanup — DIAGNOSED + made resilientThrown from
ResourceRegistryStandardImpl.releaseResources(aHashMap.forEachover the JDBC resourceregistry) while a builder commits during cleanup. That registry is per-Hibernate-Session and explicitly
not thread-safe, and the iterating lambda cannot re-enter it, so the CME means a second thread is
touching the same Session concurrently — not the single-threaded Hibernate bug originally assumed.
Context.finalize()was ruled out (sessions are thread-local; a GC + finalization amplifier produced0 CME across 497 integration tests). It does not reproduce locally and is seen mostly in the webapp
module in CI.
AbstractIntegrationTestWithDatabasekeeps a resilient cleanup retry (so an already-passed test is notfailed by this teardown race) and, on CME, dumps all thread stacks to
target/cme-dumps/(archivedas a CI artifact) so the next CI occurrence pinpoints the colliding thread and the underlying concurrency
bug can be fixed at its source.
Impact
self-diagnosing.
Summary by CodeRabbit
Tests
Chores