Skip to content

[Port to dtq-dev] Issue ufal/clarin-dspace#1317 metadata health check#1307

Open
kosarko wants to merge 4 commits into
dataquest-dev:dtq-devfrom
ufal:backport-1338-to-dtq-dev
Open

[Port to dtq-dev] Issue ufal/clarin-dspace#1317 metadata health check#1307
kosarko wants to merge 4 commits into
dataquest-dev:dtq-devfrom
ufal:backport-1338-to-dtq-dev

Conversation

@kosarko

@kosarko kosarko commented Apr 17, 2026

Copy link
Copy Markdown

Port of ufal#1338 by @kuchtiak-ufal to dtq-dev.

Summary by CodeRabbit

  • New Features

    • Added a metadata health check that validates repository metadata quality, classifies errors and warnings, and produces both JSON and plain-text reports with aggregated counts and example messages (with configurable truncation/dispersion).
  • Chores

    • Enabled the new metadata check in the healthcheck module configuration.
  • Tests

    • Added integration tests covering the metadata check, reporting output, and truncated/aggregated message behavior.

* Issue 1317: health-check for metadata - initial commit

* implement MetadataCheck report

* improve MetadataCheck

* improve MetadataCheck with more sophisticated selection of error/warning messages

* code cleanup

* rename qa-metadata-error-patterns.json to metadata-check-patterns.json

* implement copilot suggestions

* add test for Metadata check to HealthReportIT

* improved documentation

* added test

* improve documentation

* set default error dispersion quota to 5

(cherry picked from commit 39157a5)

@milanmajchrak milanmajchrak left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also failing integration tests.

Comment thread dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
@kuchtiak-ufal

Copy link
Copy Markdown
Collaborator

Regarding "There are also failing integration tests".

There's a complex HealthReportIT test testing this check.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a MetadataCheck health report that runs the metadataqa curation task, classifies output by configurable regex patterns into errors/warnings, aggregates per-type counts with sampled examples, updates report JSON mappings, and includes integration tests exercising limits and dispersion behavior.

Changes

Metadata Check Implementation and Integration

Layer / File(s) Summary
MetadataCheck class structure and pattern loading
dspace-api/src/main/java/org/dspace/health/MetadataCheck.java (lines 1–71, 196–234), dspace-api/src/main/resources/metadata-check-patterns.json
Introduces MetadataCheck with configuration constants and static pattern maps; loads error/warning patterns from metadata-check-patterns.json; adds helpers to build in-memory pattern maps and format issue labels.
MetadataCheck execution and message processing
dspace-api/src/main/java/org/dspace/health/MetadataCheck.java (lines 73–194, 235–438, 440–482, 484–500)
Implements run(ReportInfo) to read config limits/dispersion quotas, execute metadataqa with a MetadataReporter, match lines to configured patterns, categorize as errors/warnings, aggregate counts, store example messages, and apply dispersion-based replacement when message-storage limits are exceeded.
Registration, report schema, and configuration
dspace/config/modules/healthcheck.cfg, dspace-api/src/main/resources/report-diff-fields.json
Adds Metadata check to healthcheck.checks and plugin.named.org.dspace.health.Check; extends report-diff-fields.json with errorCount and warningCount mappings and reorders licenses in the field order.
Integration tests and helper methods
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
Adds imports and two tests: testMetadataCheck (validates error/warning counts and per-type counts) and testMetadataCheckWithRestrictedReportSize (validates truncation/dispersion behavior); adds helpers to extract item URIs, select latest report, and find a named check in report JSON.

Sequence Diagram

sequenceDiagram
  participant Test as HealthReportIT
  participant Script as Health Report Script
  participant MetadataCheck as MetadataCheck.run()
  participant CurTask as metadataqa curation
  participant Reporter as MetadataReporter
  participant Patterns as Pattern Map
  Test->>Script: execute health-report (metadata check)
  Script->>MetadataCheck: invoke run()
  MetadataCheck->>MetadataCheck: read config limits and quotas
  MetadataCheck->>CurTask: execute curation task with Reporter
  CurTask->>Reporter: stream output lines
  Reporter->>Patterns: match messageKey against patterns
  Patterns-->>Reporter: classify as error or warning
  Reporter->>Reporter: aggregate counts, store examples
  Reporter-->>MetadataCheck: return aggregated results
  MetadataCheck->>Script: setReportJson(counts + messages)
  Script->>Test: return report JSON
  Test->>Test: assert counts and message truncation
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • dataquest-dev/DSpace#1033: Both PRs touch healthcheck wiring and register additional checks in dspace/config/modules/healthcheck.cfg.

Suggested reviewers

  • milanmajchrak
  • vidiecan
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is too brief and lacks required sections from the template (Problem description, Analysis, Testing scenarios, Copilot review checkboxes). Expand description to include Problem description section, detailed Analysis of the ported feature, confirm Manual Testing checklist completion, and indicate Copilot review status.
Docstring Coverage ⚠️ Warning Docstring coverage is 3.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: porting a metadata health check feature from the upstream repository to the dtq-dev branch.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (11)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (5)

370-370: 💤 Low value

Add explicit private access modifier to helper method.

The method lacks an explicit access modifier, defaulting to package-private. For test helper methods that should not be accessed outside this class, use explicit private.

Suggested fix
-ReportResult findLastReportResult(List<ReportResult> reportResults) {
+private ReportResult findLastReportResult(List<ReportResult> reportResults) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 370,
The helper method findLastReportResult currently has package-private visibility;
change its declaration to include the private modifier so it becomes private
ReportResult findLastReportResult(List<ReportResult> reportResults). Update the
method signature in HealthReportIT (method name: findLastReportResult) to be
explicitly private and ensure no external references rely on package visibility.

375-375: 💤 Low value

Add explicit private access modifier to helper method.

The method lacks an explicit access modifier, defaulting to package-private. For test helper methods that should not be accessed outside this class, use explicit private.

Suggested fix
-JsonNode findCheckByName(JsonNode root, String checkName) {
+private JsonNode findCheckByName(JsonNode root, String checkName) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 375,
The helper method findCheckByName in HealthReportIT is declared with default
(package-private) visibility; make it explicitly private to restrict access to
this test class only by adding the private access modifier to the method
declaration (i.e., change "JsonNode findCheckByName(...)" to "private JsonNode
findCheckByName(...)" within the HealthReportIT class).

256-256: ⚡ Quick win

Add Javadoc for public test method.

Per coding guidelines, all public methods should have Javadoc comments. Consider adding a brief description explaining how this test validates report size restrictions via configuration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 256,
Add a Javadoc comment to the public test method in HealthReportIT that is
annotated with `@Test` (the test validating report size restrictions via
configuration). Place a brief Javadoc immediately above the method (above the
`@Test` annotation) describing the test purpose, e.g., that it verifies report
size limits are enforced based on configuration and what behavior is asserted;
reference the test’s name in the description if present to make intent clear.
Ensure the Javadoc follows project style (one-line summary and optionally a
short sentence) and is applied to the public test method in class
HealthReportIT.

Source: Coding guidelines


209-214: ⚡ Quick win

Consider more flexible assertion approach for formatted output.

The assertions use hardcoded whitespace patterns (" ".repeat(15)) that tightly couple tests to the exact output formatting. If MetadataCheck adjusts padding logic, these tests will fail even though the functionality remains correct.

Alternative assertion approach

Consider using regex patterns or extracting numeric values separately:

-assertThat(messages.get(0), containsString("dc.relation issues:  " + " ".repeat(15) + "2"));
+assertThat(messages.get(0), containsString("dc.relation issues:"));
+assertThat(messages.get(0), containsString("2"));

Or use a regex matcher to verify the count appears after the label without enforcing exact spacing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines
209 - 214, The assertions in HealthReportIT are brittle due to hardcoded
spacing; update the containsString checks on messages.get(0) to use flexible
regex or numeric extraction instead of " ".repeat(15). For each label (e.g.,
"dc.relation issues", "dc.title issues", "dc.type issues", "Error count total",
"dc.subject issues", "Warning count total") replace the containsString
assertions with a regex-based assertion (e.g., pattern "dc\\.relation
issues:\\s+2") or extract the numeric suffix via a Pattern/Matcher and assert
the parsed integer equals the expected count; keep the checks against
messages.get(0) and the same labels (MetadataCheck-related labels) so only
spacing enforcement is removed.

154-154: ⚡ Quick win

Add Javadoc for public test method.

Per coding guidelines, all public methods should have Javadoc comments. Consider adding a brief description of what this test validates.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 154,
Add a Javadoc comment to the public test method in the HealthReportIT class (the
method annotated with `@Test` at the shown location); the comment should be a
brief one-line description stating what the test validates (e.g., expected
behavior or assertion being checked) and can omit `@param/`@return since it’s a
test method; place the Javadoc immediately above the method declaration so it
documents that specific test.

Source: Coding guidelines

dspace-api/src/main/java/org/dspace/health/MetadataCheck.java (6)

461-465: 💤 Low value

Clarify sentinel value for highestFrequency initialization.

Line 463 initializes highestFrequency to Integer.MAX_VALUE, which acts as a sentinel meaning "not yet calculated". This is non-obvious and could confuse maintainers. A brief comment explaining that this sentinel value ensures the replacement logic recalculates the actual frequency on first use (lines 407-409) would improve clarity.

📝 Suggested comment
         StoredMessagesInfo() {
             this.count = 0;
+            // Sentinel value; actual frequency is computed lazily in replaceMessage
             this.highestFrequency = Integer.MAX_VALUE;
             storedMessages = new TreeMap<>();
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
461 - 465, The constructor StoredMessagesInfo() sets highestFrequency to
Integer.MAX_VALUE as a sentinel meaning "not yet calculated", which is
non-obvious; update the constructor (StoredMessagesInfo) to add a short comment
next to the highestFrequency initialization explaining that Integer.MAX_VALUE is
used intentionally so the replacement logic in the class will trigger
recalculation on first use (i.e., it signals "uninitialized" and ensures the
algorithm that compares/replaces frequencies will compute the real highest
frequency when needed).

212-224: 💤 Low value

Simplify pattern collection with direct iteration.

The getPatterns method uses StreamSupport.stream(patterns.spliterator(), false) to iterate ArrayNode. Since ArrayNode implements Iterable<JsonNode>, you can iterate directly or use forEach without constructing a stream.

♻️ Simpler alternative
     private static Map<String, List<String>> getPatterns(JsonNode parentNode) {
         Map<String, List<String>> validationPatterns = new HashMap<>();
         parentNode.fieldNames().forEachRemaining(validationType -> {
             List<String> validationMessages = new ArrayList<>();
             ArrayNode patterns = parentNode.withArray(validationType);
-            StreamSupport.stream(patterns.spliterator(), false).forEach(message -> {
+            patterns.forEach(message -> {
                 validationMessages.add(message.asText());
             });
             validationPatterns.put(validationType, validationMessages);
         });
         return validationPatterns;
     }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
212 - 224, The getPatterns method currently builds validationMessages by
streaming over the ArrayNode via StreamSupport; simplify this by iterating the
ArrayNode directly (it implements Iterable<JsonNode>) — inside getPatterns,
replace StreamSupport.stream(patterns.spliterator(), false).forEach(...) with a
direct for-each loop or patterns.forEach(node -> ...) that calls
validationMessages.add(node.asText()), keeping the rest of the logic
(validationPatterns.put(validationType, validationMessages)) unchanged to
preserve behavior.

21-31: ⚡ Quick win

Consider standardizing on a single JSON library.

The code uses both Jackson (ObjectMapper, JsonNode) for reading input at lines 203, 206-208 and org.json (JSONObject, JSONArray) for constructing output at lines 92, 174-190. Mixing JSON libraries increases the dependency surface and can lead to version conflicts or inconsistent behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
21 - 31, The code mixes Jackson and org.json; standardize on Jackson in the
MetadataCheck class by replacing all org.json JSONObject/JSONArray usages (the
places constructing/reporting JSON output) with Jackson types
(ObjectNode/ArrayNode) created from the existing ObjectMapper, update the code
paths that currently call new JSONObject()/new JSONArray() to build
ObjectNode/ArrayNode trees, replace any JSONObject.toString() or similar with
ObjectMapper.writeValueAsString(), and remove org.json imports so only Jackson
imports remain (keep existing ObjectMapper/JsonNode/ArrayNode usage for parsing
and output).

65-71: ⚖️ Poor tradeoff

Static initialization failure will cause confusing class-loading errors.

If metadata-check-patterns.json is missing or malformed, the IOException caught at line 68 is wrapped in a RuntimeException that will fail during class initialization, potentially far from where the healthcheck is invoked. This makes debugging difficult because the stack trace will point to the static initializer rather than the health-report execution context.

Consider lazy-loading the patterns in run() or providing a more descriptive startup validation that fails fast with clear diagnostics.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
65 - 71, The static initializer in MetadataCheck currently calls loadPatterns()
and throws a RuntimeException on IOException which causes class-loading
failures; change this to lazy-load or validate patterns inside the run() method
instead: remove (or guard) the static loadPatterns() call and ensure
loadPatterns() is invoked from run() (or another startup-validation method) with
proper try/catch that logs a clear, descriptive error and returns a sensible
health-check result instead of letting initialization fail; reference
MetadataCheck, its static initializer, loadPatterns(), and run() when making the
change.

407-407: 💤 Low value

Remove unnecessary Objects.requireNonNull wrapper.

Line 407 wraps getMessageWithHighestCount() in Objects.requireNonNull(), but the method at lines 431-437 uses .orElseThrow() which already throws NoSuchElementException and never returns null. The requireNonNull is redundant.

🧹 Proposed simplification
-            String messageKeyWithHighestFrequency = Objects.requireNonNull(getMessageWithHighestCount(storedMessages));
+            String messageKeyWithHighestFrequency = getMessageWithHighestCount(storedMessages);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` at line 407,
The assignment to messageKeyWithHighestFrequency unnecessarily wraps
getMessageWithHighestCount(storedMessages) with Objects.requireNonNull; remove
the requireNonNull wrapper and assign the result directly since
getMessageWithHighestCount uses orElseThrow() and cannot return null—update the
line that assigns messageKeyWithHighestFrequency to call
getMessageWithHighestCount(storedMessages) directly (referencing the variable
messageKeyWithHighestFrequency and the method getMessageWithHighestCount).

335-340: ⚡ Quick win

Pattern matching is substring-based, not regex, which may confuse maintainers.

The code treats ^ and $ as literal string anchors for prefix/suffix matching rather than regex metacharacters. While this works correctly, calling them "patterns" (line 42, 62-63, 206-208, 314) and storing them in variables like errorPatterns suggests regex matching, but the implementation uses startsWith, endsWith, and contains for simple substring matching.

Consider either:

  1. Renaming variables/documentation to clarify these are "message filters" or "substring rules" rather than "patterns"
  2. Using actual Pattern.compile() and Matcher for true regex support if future extensibility is needed
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
335 - 340, The code in MetadataCheck uses variables like errorPatterns and a
local variable pattern but implements substring logic with
startsWith/endsWith/contains, which is misleading; rename errorPatterns to
messageFilters (and the local variable pattern to filter) and update inline
comments/docstrings to describe "substring filters" or "message rules"
(prefix/suffix/contains) instead of "patterns"; update all references to
errorPatterns, pattern, and any related method/field names in MetadataCheck to
the new names so maintainers understand this is simple substring matching
(alternatively, if you want true regex support, replace the
startsWith/endsWith/contains logic with Pattern.compile(filter) and
Matcher.matches() and update docs accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java`:
- Around line 337-341: Lines in MetadataCheck/MetadataReporter exceed the
120-char limit (notably the pattern-match conditional using messageKey/pattern
and the addMessage(...) calls and dispersion quota conditionals). Fix by
breaking long expressions into shorter named boolean/local variables (e.g.,
compute startsWithMatch, endsWithMatch, containsMatch from messageKey and
pattern) and by formatting addMessage(type, message, counts, storedMessagesInfo,
limit, dispersionQuota) with one argument per line or grouping to keep each line
<120 chars; do the same for the dispersion quota conditionals (assign
intermediate variables like isOverDispersion or hasQuota) to shorten the
conditional expressions. Ensure you update the code in the MetadataCheck class
where messageKey/pattern matching and addMessage calls occur and in the
dispersion quota checks so every logical fragment is on its own line and no line
exceeds 120 chars.
- Around line 78-79: Several lines in MetadataCheck exceed the 120-character
limit; split long statements and extract intermediate variables to shorten
lines. For the configuration reads (the assignment to maxErrorsToShow and the
other getIntProperty/getBooleanProperty calls), break the call across multiple
lines or assign configurationService.getIntProperty/getBooleanProperty(...) to a
clearly named local variable before using it so each line stays under 120 chars
(refer to the variable maxErrorsToShow and the other config variables in that
block). For the long Curator.curate(...) expression, pull each chained service
factory call into its own local variable (e.g., obtain the service factory, then
the curator, then call curate) so the final Curator.curate(...) invocation is
short; ensure string literals are wrapped or concatenated across lines if
needed. Keep identifiers unchanged and only restructure into multiple shorter
statements to comply with the 120-character rule.
- Around line 73-74: Add a Javadoc comment for the public method run in class
MetadataCheck: document the parameter ReportInfo ri (what it represents),
describe the returned String (what the report contains or format), and note any
exceptions or error conditions the method may throw or return values used to
indicate failure; place the Javadoc immediately above the public String
run(ReportInfo ri) declaration so it follows project guidelines for public
methods.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 266-267: The test in HealthReportIT mutates global config via
configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8)
and
configurationService.setProperty("healthcheck.metadata.error-dispersion-quota",
1) without restoring originals; capture the original values (e.g., using
configurationService.getProperty or getPropertyAsType) before setting, then
ensure they are restored in a finally block or an `@After` method so other tests
aren’t polluted; update the test method (or add a tearDown/@After method) to
restore the saved originals (handling null/missing keys appropriately) after the
assertions complete.

---

Nitpick comments:
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java`:
- Around line 461-465: The constructor StoredMessagesInfo() sets
highestFrequency to Integer.MAX_VALUE as a sentinel meaning "not yet
calculated", which is non-obvious; update the constructor (StoredMessagesInfo)
to add a short comment next to the highestFrequency initialization explaining
that Integer.MAX_VALUE is used intentionally so the replacement logic in the
class will trigger recalculation on first use (i.e., it signals "uninitialized"
and ensures the algorithm that compares/replaces frequencies will compute the
real highest frequency when needed).
- Around line 212-224: The getPatterns method currently builds
validationMessages by streaming over the ArrayNode via StreamSupport; simplify
this by iterating the ArrayNode directly (it implements Iterable<JsonNode>) —
inside getPatterns, replace StreamSupport.stream(patterns.spliterator(),
false).forEach(...) with a direct for-each loop or patterns.forEach(node -> ...)
that calls validationMessages.add(node.asText()), keeping the rest of the logic
(validationPatterns.put(validationType, validationMessages)) unchanged to
preserve behavior.
- Around line 21-31: The code mixes Jackson and org.json; standardize on Jackson
in the MetadataCheck class by replacing all org.json JSONObject/JSONArray usages
(the places constructing/reporting JSON output) with Jackson types
(ObjectNode/ArrayNode) created from the existing ObjectMapper, update the code
paths that currently call new JSONObject()/new JSONArray() to build
ObjectNode/ArrayNode trees, replace any JSONObject.toString() or similar with
ObjectMapper.writeValueAsString(), and remove org.json imports so only Jackson
imports remain (keep existing ObjectMapper/JsonNode/ArrayNode usage for parsing
and output).
- Around line 65-71: The static initializer in MetadataCheck currently calls
loadPatterns() and throws a RuntimeException on IOException which causes
class-loading failures; change this to lazy-load or validate patterns inside the
run() method instead: remove (or guard) the static loadPatterns() call and
ensure loadPatterns() is invoked from run() (or another startup-validation
method) with proper try/catch that logs a clear, descriptive error and returns a
sensible health-check result instead of letting initialization fail; reference
MetadataCheck, its static initializer, loadPatterns(), and run() when making the
change.
- Line 407: The assignment to messageKeyWithHighestFrequency unnecessarily wraps
getMessageWithHighestCount(storedMessages) with Objects.requireNonNull; remove
the requireNonNull wrapper and assign the result directly since
getMessageWithHighestCount uses orElseThrow() and cannot return null—update the
line that assigns messageKeyWithHighestFrequency to call
getMessageWithHighestCount(storedMessages) directly (referencing the variable
messageKeyWithHighestFrequency and the method getMessageWithHighestCount).
- Around line 335-340: The code in MetadataCheck uses variables like
errorPatterns and a local variable pattern but implements substring logic with
startsWith/endsWith/contains, which is misleading; rename errorPatterns to
messageFilters (and the local variable pattern to filter) and update inline
comments/docstrings to describe "substring filters" or "message rules"
(prefix/suffix/contains) instead of "patterns"; update all references to
errorPatterns, pattern, and any related method/field names in MetadataCheck to
the new names so maintainers understand this is simple substring matching
(alternatively, if you want true regex support, replace the
startsWith/endsWith/contains logic with Pattern.compile(filter) and
Matcher.matches() and update docs accordingly).

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Line 370: The helper method findLastReportResult currently has package-private
visibility; change its declaration to include the private modifier so it becomes
private ReportResult findLastReportResult(List<ReportResult> reportResults).
Update the method signature in HealthReportIT (method name:
findLastReportResult) to be explicitly private and ensure no external references
rely on package visibility.
- Line 375: The helper method findCheckByName in HealthReportIT is declared with
default (package-private) visibility; make it explicitly private to restrict
access to this test class only by adding the private access modifier to the
method declaration (i.e., change "JsonNode findCheckByName(...)" to "private
JsonNode findCheckByName(...)" within the HealthReportIT class).
- Line 256: Add a Javadoc comment to the public test method in HealthReportIT
that is annotated with `@Test` (the test validating report size restrictions via
configuration). Place a brief Javadoc immediately above the method (above the
`@Test` annotation) describing the test purpose, e.g., that it verifies report
size limits are enforced based on configuration and what behavior is asserted;
reference the test’s name in the description if present to make intent clear.
Ensure the Javadoc follows project style (one-line summary and optionally a
short sentence) and is applied to the public test method in class
HealthReportIT.
- Around line 209-214: The assertions in HealthReportIT are brittle due to
hardcoded spacing; update the containsString checks on messages.get(0) to use
flexible regex or numeric extraction instead of " ".repeat(15). For each label
(e.g., "dc.relation issues", "dc.title issues", "dc.type issues", "Error count
total", "dc.subject issues", "Warning count total") replace the containsString
assertions with a regex-based assertion (e.g., pattern "dc\\.relation
issues:\\s+2") or extract the numeric suffix via a Pattern/Matcher and assert
the parsed integer equals the expected count; keep the checks against
messages.get(0) and the same labels (MetadataCheck-related labels) so only
spacing enforcement is removed.
- Line 154: Add a Javadoc comment to the public test method in the
HealthReportIT class (the method annotated with `@Test` at the shown location);
the comment should be a brief one-line description stating what the test
validates (e.g., expected behavior or assertion being checked) and can omit
`@param/`@return since it’s a test method; place the Javadoc immediately above the
method declaration so it documents that specific test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2c06009b-3f0d-4601-8de5-14c743daa6dd

📥 Commits

Reviewing files that changed from the base of the PR and between 82085cb and 24d3dbb.

📒 Files selected for processing (5)
  • dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
  • dspace-api/src/main/resources/metadata-check-patterns.json
  • dspace-api/src/main/resources/report-diff-fields.json
  • dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
  • dspace/config/modules/healthcheck.cfg

Comment on lines +73 to +74
@Override
public String run(ReportInfo ri) {

@coderabbitai coderabbitai Bot Jun 10, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add Javadoc for public method.

The run() method is public and lacks Javadoc documentation. As per coding guidelines, all public methods in Java should have Javadoc comments describing parameters, return values, and any exceptions thrown.

📝 Suggested Javadoc
+    /**
+     * Executes the metadata quality check by running the metadataqa curation task
+     * across the repository and aggregating validation errors and warnings.
+     *
+     * `@param` ri Report information context (currently unused)
+     * `@return` Human-readable text report with error/warning statistics and sample messages
+     */
     `@Override`
     public String run(ReportInfo ri) {

As per coding guidelines: "Provide Javadoc comments for all public classes and methods" for files matching **/*.java.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Override
public String run(ReportInfo ri) {
/**
* Executes the metadata quality check by running the metadataqa curation task
* across the repository and aggregating validation errors and warnings.
*
* `@param` ri Report information context (currently unused)
* `@return` Human-readable text report with error/warning statistics and sample messages
*/
`@Override`
public String run(ReportInfo ri) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
73 - 74, Add a Javadoc comment for the public method run in class MetadataCheck:
document the parameter ReportInfo ri (what it represents), describe the returned
String (what the report contains or format), and note any exceptions or error
conditions the method may throw or return values used to indicate failure; place
the Javadoc immediately above the public String run(ReportInfo ri) declaration
so it follows project guidelines for public methods.

Source: Coding guidelines

@kuchtiak-ufal kuchtiak-ufal Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The run() method is the implementation of the Check#run() method, which runs the check.
it's obvious what's doing. None of the extensions of the Check abstract class documents this method.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the base class Check#run() for Javadoc
fd -e java "Check.java" --exec grep -n -A5 "run\|abstract\|Javadoc\|\*" {}

# Check other subclasses of Check to see if they document run()
rg -n "public String run\|/\*\*" --type java -g "**Check.java" -C2

Length of output: 16536



@kuchtiak-ufal, you're right — I confirmed this by checking the codebase. The base class Check declares run() at line 28 with no Javadoc, and none of the other subclasses (LogAnalyserCheck, InfoCheck, ItemCheck, UserCheck, LicenseCheck, ChecksumCheck, EmbargoInfoCheck, EmbargoCheck) add Javadoc to their run() overrides either. Your implementation is consistent with the established convention throughout the package.

My comment was incorrect. Would you like me to resolve it?

Comment thread dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
Comment on lines +337 to +341
if ((startsWithCaret && messageKey.startsWith(pattern.substring(1))) ||
(endsWithDollar && messageKey.endsWith(pattern.substring(0, pattern.length() - 1))) ||
(!startsWithCaret && !endsWithDollar && messageKey.contains(pattern))
) {
addMessage(type, message, counts, storedMessagesInfo, limit, dispersionQuota);

@coderabbitai coderabbitai Bot Jun 10, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reduce line lengths to comply with 120-character limit in MetadataReporter.

Multiple lines in the MetadataReporter class likely exceed the 120-character maximum:

  • Lines 337-338: Pattern matching conditional
  • Line 341: addMessage call with six parameters
  • Line 352: addMessage call with six parameters
  • Line 399: Conditional checking dispersion quota
  • Line 418: Conditional checking dispersion quota
🔧 Example refactoring for line 341
-                        addMessage(type, message, counts, storedMessagesInfo, limit, dispersionQuota);
+                        addMessage(
+                            type, message, counts, storedMessagesInfo, limit, dispersionQuota);

As per coding guidelines: "Maintain maximum 120 character line length in Java code" for files matching **/*.java.

Also applies to: 352-352, 398-399, 417-418

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
337 - 341, Lines in MetadataCheck/MetadataReporter exceed the 120-char limit
(notably the pattern-match conditional using messageKey/pattern and the
addMessage(...) calls and dispersion quota conditionals). Fix by breaking long
expressions into shorter named boolean/local variables (e.g., compute
startsWithMatch, endsWithMatch, containsMatch from messageKey and pattern) and
by formatting addMessage(type, message, counts, storedMessagesInfo, limit,
dispersionQuota) with one argument per line or grouping to keep each line <120
chars; do the same for the dispersion quota conditionals (assign intermediate
variables like isOverDispersion or hasQuota) to shorten the conditional
expressions. Ensure you update the code in the MetadataCheck class where
messageKey/pattern matching and addMessage calls occur and in the dispersion
quota checks so every logical fragment is on its own line and no line exceeds
120 chars.

Source: Coding guidelines

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lines are less than 120 chars.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check line lengths at and around the flagged lines
awk 'NR>=330 && NR<=360 { printf "%d (%d chars): %s\n", NR, length($0), $0 }' \
    dspace-api/src/main/java/org/dspace/health/MetadataCheck.java

echo "---"

awk 'NR>=390 && NR<=425 { printf "%d (%d chars): %s\n", NR, length($0), $0 }' \
    dspace-api/src/main/java/org/dspace/health/MetadataCheck.java

Length of output: 5402



@kuchtiak-ufal, you are correct. After measuring the actual line lengths, all flagged lines are well under the 120-character limit (the longest is line 407 at 119 chars, and line 338 at 114 chars). My original review comment was mistaken. Apologies for the noise!

Comment thread dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (1)

361-362: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Configuration cleanup fix is incomplete—restore original values in try/finally.

While the cleanup at the end of the test addresses the normal execution path, two issues remain:

  1. Not protected by try/finally: If an assertion or exception occurs before line 361, the configuration properties remain modified, polluting subsequent tests.
  2. Setting to null vs. restoring original values: The properties are set to null rather than restored to their original values. This may not properly restore defaults and could interfere with other tests that expect specific default values (typically max-errors-to-show=100 and error-dispersion-quota=5).
🔒 Recommended fix: capture original values and restore in finally block
 public void testMetadataCheckWithRestrictedReportSize() throws Exception {
-    // set max-errors-to-show to 8 and error-dispersion-quota to 1,
-    // This test has 14 errors in total, but the report will contain only 8 error messages.
-    // The errors with low frequency will be prioritized.
-    // The error-dispersion-quota set to 1 means that the number of errors shown
-    // for each error will be almost the same, in this case maximally 2 errors for each error type
     context.turnOffAuthorisationSystem();
 
     ConfigurationService configurationService = DSpaceServicesFactory.getInstance().getConfigurationService();
-    configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8);
-    configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", 1);
+    int originalMaxErrors = configurationService.getIntProperty("healthcheck.metadata.max-errors-to-show", 100);
+    int originalQuota = configurationService.getIntProperty("healthcheck.metadata.error-dispersion-quota", 5);
+    
+    try {
+        // set max-errors-to-show to 8 and error-dispersion-quota to 1,
+        // This test has 14 errors in total, but the report will contain only 8 error messages.
+        // The errors with low frequency will be prioritized.
+        // The error-dispersion-quota set to 1 means that the number of errors shown
+        // for each error will be almost the same, in this case maximally 2 errors for each error type
+        configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8);
+        configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", 1);
 
-    Community community = CommunityBuilder.createCommunity(context)
+        Community community = CommunityBuilder.createCommunity(context)
                 .withName("Community")
                 .build();
 
-    Collection collection = CollectionBuilder.createCollection(context, community)
+        Collection collection = CollectionBuilder.createCollection(context, community)
                 .withName("Collection")
                 .withSubmitterGroup(eperson)
                 .build();
 
-    Item item1 = ItemBuilder.createItem(context, collection)
+        Item item1 = ItemBuilder.createItem(context, collection)
                 .withTitle("Test item 1")
                 .withType("corpus")
                 .withSubject("Test subject")
                 .withMetadata("local", "branding", null, "Community")
                 .build();
 
-    Item item2 = ItemBuilder.createItem(context, collection)
+        Item item2 = ItemBuilder.createItem(context, collection)
                 .withTitle("Test item 2")
                 .withType("toolService")
                 .withSubject("Test subject")
                 .withMetadata("local", "branding", null, "Community")
                 .withMetadata("dc", "relation", "replaces", findItemUri(item1))
                 .build();
 
-    ItemBuilder.createItem(context, collection)
+        ItemBuilder.createItem(context, collection)
                 .withTitle("Test item 3")
                 .withType("toolService")
                 .withSubject("Test subject")
                 .withMetadata("local", "branding", null, "Community")
                 .withMetadata("dc", "relation", "isreplacedby", findItemUri(item2))
                 .build();
 
-    // create 4 items with missing title
-    for (int i = 0; i < 4; i++) {
-        ItemBuilder.createItem(context, collection)
-                .withType("toolService")
-                .withSubject("Test subject")
-                .withMetadata("local", "branding", null, "Community")
-                .build();
-    }
+        // create 4 items with missing title
+        for (int i = 0; i < 4; i++) {
+            ItemBuilder.createItem(context, collection)
+                    .withType("toolService")
+                    .withSubject("Test subject")
+                    .withMetadata("local", "branding", null, "Community")
+                    .build();
+        }
 
-    // create 4 items with missing type
-    for (int i = 4; i < 8; i++) {
-        ItemBuilder.createItem(context, collection)
-                .withTitle("Test Item " + i)
-                .withSubject("Test subject")
-                .withMetadata("local", "branding", null, "Community")
-                .build();
-    }
+        // create 4 items with missing type
+        for (int i = 4; i < 8; i++) {
+            ItemBuilder.createItem(context, collection)
+                    .withTitle("Test Item " + i)
+                    .withSubject("Test subject")
+                    .withMetadata("local", "branding", null, "Community")
+                    .build();
+        }
 
-    // create 4 items with duplicate type
-    for (int i = 8; i < 12; i++) {
-        ItemBuilder.createItem(context, collection)
-                .withTitle("Test Item " + i)
-                .withType("toolService")
-                .withType("corpus")
-                .withSubject("Test subject")
-                .withMetadata("local", "branding", null, "Community")
-                .build();
-    }
+        // create 4 items with duplicate type
+        for (int i = 8; i < 12; i++) {
+            ItemBuilder.createItem(context, collection)
+                    .withTitle("Test Item " + i)
+                    .withType("toolService")
+                    .withType("corpus")
+                    .withSubject("Test subject")
+                    .withMetadata("local", "branding", null, "Community")
+                    .build();
+        }
 
-    TestDSpaceRunnableHandler testDSpaceRunnableHandler = new TestDSpaceRunnableHandler();
+        TestDSpaceRunnableHandler testDSpaceRunnableHandler = new TestDSpaceRunnableHandler();
 
-    // with "health-report -c 5", only Metadata check is running
-    String[] args = new String[]{"health-report", "-c", "5"};
-    ScriptLauncher.handleScript(args, ScriptLauncher.getConfig(kernelImpl), testDSpaceRunnableHandler, kernelImpl);
+        // with "health-report -c 5", only Metadata check is running
+        String[] args = new String[]{"health-report", "-c", "5"};
+        ScriptLauncher.handleScript(args, ScriptLauncher.getConfig(kernelImpl), testDSpaceRunnableHandler, kernelImpl);
 
-    assertThat(testDSpaceRunnableHandler.getErrorMessages(), empty());
-    List<String> messages = testDSpaceRunnableHandler.getInfoMessages();
+        assertThat(testDSpaceRunnableHandler.getErrorMessages(), empty());
+        List<String> messages = testDSpaceRunnableHandler.getInfoMessages();
 
-    assertThat(messages, hasSize(1));
-    assertThat(messages.get(0), containsString("dc.relation issues:  " + " ".repeat(15) + "2"));
-    assertThat(messages.get(0), containsString("dc.title issues:     " + " ".repeat(15) + "4"));
-    assertThat(messages.get(0), containsString("dc.type issues:      " + " ".repeat(15) + "4"));
-    assertThat(messages.get(0), containsString("duplicate value issues:" + " ".repeat(13) + "4"));
-    assertThat(messages.get(0), containsString("Error count total:   " + " ".repeat(14) + "14"));
+        assertThat(messages, hasSize(1));
+        assertThat(messages.get(0), containsString("dc.relation issues:  " + " ".repeat(15) + "2"));
+        assertThat(messages.get(0), containsString("dc.title issues:     " + " ".repeat(15) + "4"));
+        assertThat(messages.get(0), containsString("dc.type issues:      " + " ".repeat(15) + "4"));
+        assertThat(messages.get(0), containsString("duplicate value issues:" + " ".repeat(13) + "4"));
+        assertThat(messages.get(0), containsString("Error count total:   " + " ".repeat(14) + "14"));
 
-    assertThat(messages.get(0), containsString("Errors:"));
+        assertThat(messages.get(0), containsString("Errors:"));
 
-    // check if dc.type error is present exactly 2 times
-    assertThat(StringUtils.countMatches(messages.get(0), "Does not have dc.type metadata"), is(2));
-    // check if dc.title error is present exactly 2 times
-    assertThat(StringUtils.countMatches(messages.get(0), "Item has no dc.title metadata"), is(2));
-    // check id duplicate value error is present exactly 2 times
-    assertThat(StringUtils.countMatches(messages.get(0), "value [dc.type] is present multiple times"), is(2));
+        // check if dc.type error is present exactly 2 times
+        assertThat(StringUtils.countMatches(messages.get(0), "Does not have dc.type metadata"), is(2));
+        // check if dc.title error is present exactly 2 times
+        assertThat(StringUtils.countMatches(messages.get(0), "Item has no dc.title metadata"), is(2));
+        // check id duplicate value error is present exactly 2 times
+        assertThat(StringUtils.countMatches(messages.get(0), "value [dc.type] is present multiple times"), is(2));
 
-    // check if all dc.relation errors are present
-    assertThat(StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.replaces"), is(1));
-    assertThat(
-            StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.isreplacedby"), is(1));
-    assertThat(messages.get(0), containsString("and more..."));
-
-    configurationService.setProperty("healthcheck.metadata.max-errors-to-show", null);
-    configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", null);
+        // check if all dc.relation errors are present
+        assertThat(StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.replaces"), is(1));
+        assertThat(
+                StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.isreplacedby"), is(1));
+        assertThat(messages.get(0), containsString("and more..."));
+    } finally {
+        configurationService.setProperty("healthcheck.metadata.max-errors-to-show", originalMaxErrors);
+        configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", originalQuota);
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines
361 - 362, Capture the original values of the two properties before modifying
them (use configurationService.getProperty or equivalent to read
"healthcheck.metadata.max-errors-to-show" and
"healthcheck.metadata.error-dispersion-quota"), then wrap the test logic that
mutates them in a try/finally and restore the saved originals in the finally
block (call configurationService.setProperty with the original values; if
originals were null/absent, remove/reset appropriately). Update references
around configurationService.setProperty(...) in HealthReportIT so the properties
are restored to their prior values regardless of test failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 361-362: Capture the original values of the two properties before
modifying them (use configurationService.getProperty or equivalent to read
"healthcheck.metadata.max-errors-to-show" and
"healthcheck.metadata.error-dispersion-quota"), then wrap the test logic that
mutates them in a try/finally and restore the saved originals in the finally
block (call configurationService.setProperty with the original values; if
originals were null/absent, remove/reset appropriately). Update references
around configurationService.setProperty(...) in HealthReportIT so the properties
are restored to their prior values regardless of test failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ebebc4a2-be81-4b68-af63-72821b98391d

📥 Commits

Reviewing files that changed from the base of the PR and between 24d3dbb and 16eb768.

📒 Files selected for processing (1)
  • dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants