[Port to dtq-dev] Issue ufal/clarin-dspace#1317 metadata health check#1307
[Port to dtq-dev] Issue ufal/clarin-dspace#1317 metadata health check#1307kosarko wants to merge 4 commits into
Conversation
* Issue 1317: health-check for metadata - initial commit * implement MetadataCheck report * improve MetadataCheck * improve MetadataCheck with more sophisticated selection of error/warning messages * code cleanup * rename qa-metadata-error-patterns.json to metadata-check-patterns.json * implement copilot suggestions * add test for Metadata check to HealthReportIT * improved documentation * added test * improve documentation * set default error dispersion quota to 5 (cherry picked from commit 39157a5)
milanmajchrak
left a comment
There was a problem hiding this comment.
There are also failing integration tests.
|
Regarding "There are also failing integration tests". There's a complex HealthReportIT test testing this check. |
📝 WalkthroughWalkthroughAdds a MetadataCheck health report that runs the metadataqa curation task, classifies output by configurable regex patterns into errors/warnings, aggregates per-type counts with sampled examples, updates report JSON mappings, and includes integration tests exercising limits and dispersion behavior. ChangesMetadata Check Implementation and Integration
Sequence DiagramsequenceDiagram
participant Test as HealthReportIT
participant Script as Health Report Script
participant MetadataCheck as MetadataCheck.run()
participant CurTask as metadataqa curation
participant Reporter as MetadataReporter
participant Patterns as Pattern Map
Test->>Script: execute health-report (metadata check)
Script->>MetadataCheck: invoke run()
MetadataCheck->>MetadataCheck: read config limits and quotas
MetadataCheck->>CurTask: execute curation task with Reporter
CurTask->>Reporter: stream output lines
Reporter->>Patterns: match messageKey against patterns
Patterns-->>Reporter: classify as error or warning
Reporter->>Reporter: aggregate counts, store examples
Reporter-->>MetadataCheck: return aggregated results
MetadataCheck->>Script: setReportJson(counts + messages)
Script->>Test: return report JSON
Test->>Test: assert counts and message truncation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (11)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (5)
370-370: 💤 Low valueAdd explicit
privateaccess modifier to helper method.The method lacks an explicit access modifier, defaulting to package-private. For test helper methods that should not be accessed outside this class, use explicit
private.Suggested fix
-ReportResult findLastReportResult(List<ReportResult> reportResults) { +private ReportResult findLastReportResult(List<ReportResult> reportResults) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 370, The helper method findLastReportResult currently has package-private visibility; change its declaration to include the private modifier so it becomes private ReportResult findLastReportResult(List<ReportResult> reportResults). Update the method signature in HealthReportIT (method name: findLastReportResult) to be explicitly private and ensure no external references rely on package visibility.
375-375: 💤 Low valueAdd explicit
privateaccess modifier to helper method.The method lacks an explicit access modifier, defaulting to package-private. For test helper methods that should not be accessed outside this class, use explicit
private.Suggested fix
-JsonNode findCheckByName(JsonNode root, String checkName) { +private JsonNode findCheckByName(JsonNode root, String checkName) {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 375, The helper method findCheckByName in HealthReportIT is declared with default (package-private) visibility; make it explicitly private to restrict access to this test class only by adding the private access modifier to the method declaration (i.e., change "JsonNode findCheckByName(...)" to "private JsonNode findCheckByName(...)" within the HealthReportIT class).
256-256: ⚡ Quick winAdd Javadoc for public test method.
Per coding guidelines, all public methods should have Javadoc comments. Consider adding a brief description explaining how this test validates report size restrictions via configuration.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 256, Add a Javadoc comment to the public test method in HealthReportIT that is annotated with `@Test` (the test validating report size restrictions via configuration). Place a brief Javadoc immediately above the method (above the `@Test` annotation) describing the test purpose, e.g., that it verifies report size limits are enforced based on configuration and what behavior is asserted; reference the test’s name in the description if present to make intent clear. Ensure the Javadoc follows project style (one-line summary and optionally a short sentence) and is applied to the public test method in class HealthReportIT.Source: Coding guidelines
209-214: ⚡ Quick winConsider more flexible assertion approach for formatted output.
The assertions use hardcoded whitespace patterns (
" ".repeat(15)) that tightly couple tests to the exact output formatting. If MetadataCheck adjusts padding logic, these tests will fail even though the functionality remains correct.Alternative assertion approach
Consider using regex patterns or extracting numeric values separately:
-assertThat(messages.get(0), containsString("dc.relation issues: " + " ".repeat(15) + "2")); +assertThat(messages.get(0), containsString("dc.relation issues:")); +assertThat(messages.get(0), containsString("2"));Or use a regex matcher to verify the count appears after the label without enforcing exact spacing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines 209 - 214, The assertions in HealthReportIT are brittle due to hardcoded spacing; update the containsString checks on messages.get(0) to use flexible regex or numeric extraction instead of " ".repeat(15). For each label (e.g., "dc.relation issues", "dc.title issues", "dc.type issues", "Error count total", "dc.subject issues", "Warning count total") replace the containsString assertions with a regex-based assertion (e.g., pattern "dc\\.relation issues:\\s+2") or extract the numeric suffix via a Pattern/Matcher and assert the parsed integer equals the expected count; keep the checks against messages.get(0) and the same labels (MetadataCheck-related labels) so only spacing enforcement is removed.
154-154: ⚡ Quick winAdd Javadoc for public test method.
Per coding guidelines, all public methods should have Javadoc comments. Consider adding a brief description of what this test validates.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` at line 154, Add a Javadoc comment to the public test method in the HealthReportIT class (the method annotated with `@Test` at the shown location); the comment should be a brief one-line description stating what the test validates (e.g., expected behavior or assertion being checked) and can omit `@param/`@return since it’s a test method; place the Javadoc immediately above the method declaration so it documents that specific test.Source: Coding guidelines
dspace-api/src/main/java/org/dspace/health/MetadataCheck.java (6)
461-465: 💤 Low valueClarify sentinel value for
highestFrequencyinitialization.Line 463 initializes
highestFrequencytoInteger.MAX_VALUE, which acts as a sentinel meaning "not yet calculated". This is non-obvious and could confuse maintainers. A brief comment explaining that this sentinel value ensures the replacement logic recalculates the actual frequency on first use (lines 407-409) would improve clarity.📝 Suggested comment
StoredMessagesInfo() { this.count = 0; + // Sentinel value; actual frequency is computed lazily in replaceMessage this.highestFrequency = Integer.MAX_VALUE; storedMessages = new TreeMap<>(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines 461 - 465, The constructor StoredMessagesInfo() sets highestFrequency to Integer.MAX_VALUE as a sentinel meaning "not yet calculated", which is non-obvious; update the constructor (StoredMessagesInfo) to add a short comment next to the highestFrequency initialization explaining that Integer.MAX_VALUE is used intentionally so the replacement logic in the class will trigger recalculation on first use (i.e., it signals "uninitialized" and ensures the algorithm that compares/replaces frequencies will compute the real highest frequency when needed).
212-224: 💤 Low valueSimplify pattern collection with direct iteration.
The
getPatternsmethod usesStreamSupport.stream(patterns.spliterator(), false)to iterateArrayNode. SinceArrayNodeimplementsIterable<JsonNode>, you can iterate directly or useforEachwithout constructing a stream.♻️ Simpler alternative
private static Map<String, List<String>> getPatterns(JsonNode parentNode) { Map<String, List<String>> validationPatterns = new HashMap<>(); parentNode.fieldNames().forEachRemaining(validationType -> { List<String> validationMessages = new ArrayList<>(); ArrayNode patterns = parentNode.withArray(validationType); - StreamSupport.stream(patterns.spliterator(), false).forEach(message -> { + patterns.forEach(message -> { validationMessages.add(message.asText()); }); validationPatterns.put(validationType, validationMessages); }); return validationPatterns; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines 212 - 224, The getPatterns method currently builds validationMessages by streaming over the ArrayNode via StreamSupport; simplify this by iterating the ArrayNode directly (it implements Iterable<JsonNode>) — inside getPatterns, replace StreamSupport.stream(patterns.spliterator(), false).forEach(...) with a direct for-each loop or patterns.forEach(node -> ...) that calls validationMessages.add(node.asText()), keeping the rest of the logic (validationPatterns.put(validationType, validationMessages)) unchanged to preserve behavior.
21-31: ⚡ Quick winConsider standardizing on a single JSON library.
The code uses both Jackson (
ObjectMapper,JsonNode) for reading input at lines 203, 206-208 and org.json (JSONObject,JSONArray) for constructing output at lines 92, 174-190. Mixing JSON libraries increases the dependency surface and can lead to version conflicts or inconsistent behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines 21 - 31, The code mixes Jackson and org.json; standardize on Jackson in the MetadataCheck class by replacing all org.json JSONObject/JSONArray usages (the places constructing/reporting JSON output) with Jackson types (ObjectNode/ArrayNode) created from the existing ObjectMapper, update the code paths that currently call new JSONObject()/new JSONArray() to build ObjectNode/ArrayNode trees, replace any JSONObject.toString() or similar with ObjectMapper.writeValueAsString(), and remove org.json imports so only Jackson imports remain (keep existing ObjectMapper/JsonNode/ArrayNode usage for parsing and output).
65-71: ⚖️ Poor tradeoffStatic initialization failure will cause confusing class-loading errors.
If
metadata-check-patterns.jsonis missing or malformed, theIOExceptioncaught at line 68 is wrapped in aRuntimeExceptionthat will fail during class initialization, potentially far from where the healthcheck is invoked. This makes debugging difficult because the stack trace will point to the static initializer rather than the health-report execution context.Consider lazy-loading the patterns in
run()or providing a more descriptive startup validation that fails fast with clear diagnostics.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines 65 - 71, The static initializer in MetadataCheck currently calls loadPatterns() and throws a RuntimeException on IOException which causes class-loading failures; change this to lazy-load or validate patterns inside the run() method instead: remove (or guard) the static loadPatterns() call and ensure loadPatterns() is invoked from run() (or another startup-validation method) with proper try/catch that logs a clear, descriptive error and returns a sensible health-check result instead of letting initialization fail; reference MetadataCheck, its static initializer, loadPatterns(), and run() when making the change.
407-407: 💤 Low valueRemove unnecessary
Objects.requireNonNullwrapper.Line 407 wraps
getMessageWithHighestCount()inObjects.requireNonNull(), but the method at lines 431-437 uses.orElseThrow()which already throwsNoSuchElementExceptionand never returnsnull. TherequireNonNullis redundant.🧹 Proposed simplification
- String messageKeyWithHighestFrequency = Objects.requireNonNull(getMessageWithHighestCount(storedMessages)); + String messageKeyWithHighestFrequency = getMessageWithHighestCount(storedMessages);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` at line 407, The assignment to messageKeyWithHighestFrequency unnecessarily wraps getMessageWithHighestCount(storedMessages) with Objects.requireNonNull; remove the requireNonNull wrapper and assign the result directly since getMessageWithHighestCount uses orElseThrow() and cannot return null—update the line that assigns messageKeyWithHighestFrequency to call getMessageWithHighestCount(storedMessages) directly (referencing the variable messageKeyWithHighestFrequency and the method getMessageWithHighestCount).
335-340: ⚡ Quick winPattern matching is substring-based, not regex, which may confuse maintainers.
The code treats
^and$as literal string anchors for prefix/suffix matching rather than regex metacharacters. While this works correctly, calling them "patterns" (line 42, 62-63, 206-208, 314) and storing them in variables likeerrorPatternssuggests regex matching, but the implementation usesstartsWith,endsWith, andcontainsfor simple substring matching.Consider either:
- Renaming variables/documentation to clarify these are "message filters" or "substring rules" rather than "patterns"
- Using actual
Pattern.compile()andMatcherfor true regex support if future extensibility is needed🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines 335 - 340, The code in MetadataCheck uses variables like errorPatterns and a local variable pattern but implements substring logic with startsWith/endsWith/contains, which is misleading; rename errorPatterns to messageFilters (and the local variable pattern to filter) and update inline comments/docstrings to describe "substring filters" or "message rules" (prefix/suffix/contains) instead of "patterns"; update all references to errorPatterns, pattern, and any related method/field names in MetadataCheck to the new names so maintainers understand this is simple substring matching (alternatively, if you want true regex support, replace the startsWith/endsWith/contains logic with Pattern.compile(filter) and Matcher.matches() and update docs accordingly).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java`:
- Around line 337-341: Lines in MetadataCheck/MetadataReporter exceed the
120-char limit (notably the pattern-match conditional using messageKey/pattern
and the addMessage(...) calls and dispersion quota conditionals). Fix by
breaking long expressions into shorter named boolean/local variables (e.g.,
compute startsWithMatch, endsWithMatch, containsMatch from messageKey and
pattern) and by formatting addMessage(type, message, counts, storedMessagesInfo,
limit, dispersionQuota) with one argument per line or grouping to keep each line
<120 chars; do the same for the dispersion quota conditionals (assign
intermediate variables like isOverDispersion or hasQuota) to shorten the
conditional expressions. Ensure you update the code in the MetadataCheck class
where messageKey/pattern matching and addMessage calls occur and in the
dispersion quota checks so every logical fragment is on its own line and no line
exceeds 120 chars.
- Around line 78-79: Several lines in MetadataCheck exceed the 120-character
limit; split long statements and extract intermediate variables to shorten
lines. For the configuration reads (the assignment to maxErrorsToShow and the
other getIntProperty/getBooleanProperty calls), break the call across multiple
lines or assign configurationService.getIntProperty/getBooleanProperty(...) to a
clearly named local variable before using it so each line stays under 120 chars
(refer to the variable maxErrorsToShow and the other config variables in that
block). For the long Curator.curate(...) expression, pull each chained service
factory call into its own local variable (e.g., obtain the service factory, then
the curator, then call curate) so the final Curator.curate(...) invocation is
short; ensure string literals are wrapped or concatenated across lines if
needed. Keep identifiers unchanged and only restructure into multiple shorter
statements to comply with the 120-character rule.
- Around line 73-74: Add a Javadoc comment for the public method run in class
MetadataCheck: document the parameter ReportInfo ri (what it represents),
describe the returned String (what the report contains or format), and note any
exceptions or error conditions the method may throw or return values used to
indicate failure; place the Javadoc immediately above the public String
run(ReportInfo ri) declaration so it follows project guidelines for public
methods.
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 266-267: The test in HealthReportIT mutates global config via
configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8)
and
configurationService.setProperty("healthcheck.metadata.error-dispersion-quota",
1) without restoring originals; capture the original values (e.g., using
configurationService.getProperty or getPropertyAsType) before setting, then
ensure they are restored in a finally block or an `@After` method so other tests
aren’t polluted; update the test method (or add a tearDown/@After method) to
restore the saved originals (handling null/missing keys appropriately) after the
assertions complete.
---
Nitpick comments:
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java`:
- Around line 461-465: The constructor StoredMessagesInfo() sets
highestFrequency to Integer.MAX_VALUE as a sentinel meaning "not yet
calculated", which is non-obvious; update the constructor (StoredMessagesInfo)
to add a short comment next to the highestFrequency initialization explaining
that Integer.MAX_VALUE is used intentionally so the replacement logic in the
class will trigger recalculation on first use (i.e., it signals "uninitialized"
and ensures the algorithm that compares/replaces frequencies will compute the
real highest frequency when needed).
- Around line 212-224: The getPatterns method currently builds
validationMessages by streaming over the ArrayNode via StreamSupport; simplify
this by iterating the ArrayNode directly (it implements Iterable<JsonNode>) —
inside getPatterns, replace StreamSupport.stream(patterns.spliterator(),
false).forEach(...) with a direct for-each loop or patterns.forEach(node -> ...)
that calls validationMessages.add(node.asText()), keeping the rest of the logic
(validationPatterns.put(validationType, validationMessages)) unchanged to
preserve behavior.
- Around line 21-31: The code mixes Jackson and org.json; standardize on Jackson
in the MetadataCheck class by replacing all org.json JSONObject/JSONArray usages
(the places constructing/reporting JSON output) with Jackson types
(ObjectNode/ArrayNode) created from the existing ObjectMapper, update the code
paths that currently call new JSONObject()/new JSONArray() to build
ObjectNode/ArrayNode trees, replace any JSONObject.toString() or similar with
ObjectMapper.writeValueAsString(), and remove org.json imports so only Jackson
imports remain (keep existing ObjectMapper/JsonNode/ArrayNode usage for parsing
and output).
- Around line 65-71: The static initializer in MetadataCheck currently calls
loadPatterns() and throws a RuntimeException on IOException which causes
class-loading failures; change this to lazy-load or validate patterns inside the
run() method instead: remove (or guard) the static loadPatterns() call and
ensure loadPatterns() is invoked from run() (or another startup-validation
method) with proper try/catch that logs a clear, descriptive error and returns a
sensible health-check result instead of letting initialization fail; reference
MetadataCheck, its static initializer, loadPatterns(), and run() when making the
change.
- Line 407: The assignment to messageKeyWithHighestFrequency unnecessarily wraps
getMessageWithHighestCount(storedMessages) with Objects.requireNonNull; remove
the requireNonNull wrapper and assign the result directly since
getMessageWithHighestCount uses orElseThrow() and cannot return null—update the
line that assigns messageKeyWithHighestFrequency to call
getMessageWithHighestCount(storedMessages) directly (referencing the variable
messageKeyWithHighestFrequency and the method getMessageWithHighestCount).
- Around line 335-340: The code in MetadataCheck uses variables like
errorPatterns and a local variable pattern but implements substring logic with
startsWith/endsWith/contains, which is misleading; rename errorPatterns to
messageFilters (and the local variable pattern to filter) and update inline
comments/docstrings to describe "substring filters" or "message rules"
(prefix/suffix/contains) instead of "patterns"; update all references to
errorPatterns, pattern, and any related method/field names in MetadataCheck to
the new names so maintainers understand this is simple substring matching
(alternatively, if you want true regex support, replace the
startsWith/endsWith/contains logic with Pattern.compile(filter) and
Matcher.matches() and update docs accordingly).
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Line 370: The helper method findLastReportResult currently has package-private
visibility; change its declaration to include the private modifier so it becomes
private ReportResult findLastReportResult(List<ReportResult> reportResults).
Update the method signature in HealthReportIT (method name:
findLastReportResult) to be explicitly private and ensure no external references
rely on package visibility.
- Line 375: The helper method findCheckByName in HealthReportIT is declared with
default (package-private) visibility; make it explicitly private to restrict
access to this test class only by adding the private access modifier to the
method declaration (i.e., change "JsonNode findCheckByName(...)" to "private
JsonNode findCheckByName(...)" within the HealthReportIT class).
- Line 256: Add a Javadoc comment to the public test method in HealthReportIT
that is annotated with `@Test` (the test validating report size restrictions via
configuration). Place a brief Javadoc immediately above the method (above the
`@Test` annotation) describing the test purpose, e.g., that it verifies report
size limits are enforced based on configuration and what behavior is asserted;
reference the test’s name in the description if present to make intent clear.
Ensure the Javadoc follows project style (one-line summary and optionally a
short sentence) and is applied to the public test method in class
HealthReportIT.
- Around line 209-214: The assertions in HealthReportIT are brittle due to
hardcoded spacing; update the containsString checks on messages.get(0) to use
flexible regex or numeric extraction instead of " ".repeat(15). For each label
(e.g., "dc.relation issues", "dc.title issues", "dc.type issues", "Error count
total", "dc.subject issues", "Warning count total") replace the containsString
assertions with a regex-based assertion (e.g., pattern "dc\\.relation
issues:\\s+2") or extract the numeric suffix via a Pattern/Matcher and assert
the parsed integer equals the expected count; keep the checks against
messages.get(0) and the same labels (MetadataCheck-related labels) so only
spacing enforcement is removed.
- Line 154: Add a Javadoc comment to the public test method in the
HealthReportIT class (the method annotated with `@Test` at the shown location);
the comment should be a brief one-line description stating what the test
validates (e.g., expected behavior or assertion being checked) and can omit
`@param/`@return since it’s a test method; place the Javadoc immediately above the
method declaration so it documents that specific test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2c06009b-3f0d-4601-8de5-14c743daa6dd
📒 Files selected for processing (5)
dspace-api/src/main/java/org/dspace/health/MetadataCheck.javadspace-api/src/main/resources/metadata-check-patterns.jsondspace-api/src/main/resources/report-diff-fields.jsondspace-api/src/test/java/org/dspace/scripts/HealthReportIT.javadspace/config/modules/healthcheck.cfg
| @Override | ||
| public String run(ReportInfo ri) { |
There was a problem hiding this comment.
Add Javadoc for public method.
The run() method is public and lacks Javadoc documentation. As per coding guidelines, all public methods in Java should have Javadoc comments describing parameters, return values, and any exceptions thrown.
📝 Suggested Javadoc
+ /**
+ * Executes the metadata quality check by running the metadataqa curation task
+ * across the repository and aggregating validation errors and warnings.
+ *
+ * `@param` ri Report information context (currently unused)
+ * `@return` Human-readable text report with error/warning statistics and sample messages
+ */
`@Override`
public String run(ReportInfo ri) {As per coding guidelines: "Provide Javadoc comments for all public classes and methods" for files matching **/*.java.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @Override | |
| public String run(ReportInfo ri) { | |
| /** | |
| * Executes the metadata quality check by running the metadataqa curation task | |
| * across the repository and aggregating validation errors and warnings. | |
| * | |
| * `@param` ri Report information context (currently unused) | |
| * `@return` Human-readable text report with error/warning statistics and sample messages | |
| */ | |
| `@Override` | |
| public String run(ReportInfo ri) { |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
73 - 74, Add a Javadoc comment for the public method run in class MetadataCheck:
document the parameter ReportInfo ri (what it represents), describe the returned
String (what the report contains or format), and note any exceptions or error
conditions the method may throw or return values used to indicate failure; place
the Javadoc immediately above the public String run(ReportInfo ri) declaration
so it follows project guidelines for public methods.
Source: Coding guidelines
There was a problem hiding this comment.
The run() method is the implementation of the Check#run() method, which runs the check.
it's obvious what's doing. None of the extensions of the Check abstract class documents this method.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the base class Check#run() for Javadoc
fd -e java "Check.java" --exec grep -n -A5 "run\|abstract\|Javadoc\|\*" {}
# Check other subclasses of Check to see if they document run()
rg -n "public String run\|/\*\*" --type java -g "**Check.java" -C2Length of output: 16536
@kuchtiak-ufal, you're right — I confirmed this by checking the codebase. The base class Check declares run() at line 28 with no Javadoc, and none of the other subclasses (LogAnalyserCheck, InfoCheck, ItemCheck, UserCheck, LicenseCheck, ChecksumCheck, EmbargoInfoCheck, EmbargoCheck) add Javadoc to their run() overrides either. Your implementation is consistent with the established convention throughout the package.
My comment was incorrect. Would you like me to resolve it?
| if ((startsWithCaret && messageKey.startsWith(pattern.substring(1))) || | ||
| (endsWithDollar && messageKey.endsWith(pattern.substring(0, pattern.length() - 1))) || | ||
| (!startsWithCaret && !endsWithDollar && messageKey.contains(pattern)) | ||
| ) { | ||
| addMessage(type, message, counts, storedMessagesInfo, limit, dispersionQuota); |
There was a problem hiding this comment.
Reduce line lengths to comply with 120-character limit in MetadataReporter.
Multiple lines in the MetadataReporter class likely exceed the 120-character maximum:
- Lines 337-338: Pattern matching conditional
- Line 341:
addMessagecall with six parameters - Line 352:
addMessagecall with six parameters - Line 399: Conditional checking dispersion quota
- Line 418: Conditional checking dispersion quota
🔧 Example refactoring for line 341
- addMessage(type, message, counts, storedMessagesInfo, limit, dispersionQuota);
+ addMessage(
+ type, message, counts, storedMessagesInfo, limit, dispersionQuota);As per coding guidelines: "Maintain maximum 120 character line length in Java code" for files matching **/*.java.
Also applies to: 352-352, 398-399, 417-418
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@dspace-api/src/main/java/org/dspace/health/MetadataCheck.java` around lines
337 - 341, Lines in MetadataCheck/MetadataReporter exceed the 120-char limit
(notably the pattern-match conditional using messageKey/pattern and the
addMessage(...) calls and dispersion quota conditionals). Fix by breaking long
expressions into shorter named boolean/local variables (e.g., compute
startsWithMatch, endsWithMatch, containsMatch from messageKey and pattern) and
by formatting addMessage(type, message, counts, storedMessagesInfo, limit,
dispersionQuota) with one argument per line or grouping to keep each line <120
chars; do the same for the dispersion quota conditionals (assign intermediate
variables like isOverDispersion or hasQuota) to shorten the conditional
expressions. Ensure you update the code in the MetadataCheck class where
messageKey/pattern matching and addMessage calls occur and in the dispersion
quota checks so every logical fragment is on its own line and no line exceeds
120 chars.
Source: Coding guidelines
There was a problem hiding this comment.
The lines are less than 120 chars.
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check line lengths at and around the flagged lines
awk 'NR>=330 && NR<=360 { printf "%d (%d chars): %s\n", NR, length($0), $0 }' \
dspace-api/src/main/java/org/dspace/health/MetadataCheck.java
echo "---"
awk 'NR>=390 && NR<=425 { printf "%d (%d chars): %s\n", NR, length($0), $0 }' \
dspace-api/src/main/java/org/dspace/health/MetadataCheck.javaLength of output: 5402
@kuchtiak-ufal, you are correct. After measuring the actual line lengths, all flagged lines are well under the 120-character limit (the longest is line 407 at 119 chars, and line 338 at 114 chars). My original review comment was mistaken. Apologies for the noise!
There was a problem hiding this comment.
♻️ Duplicate comments (1)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java (1)
361-362:⚠️ Potential issue | 🟠 Major | ⚡ Quick winConfiguration cleanup fix is incomplete—restore original values in try/finally.
While the cleanup at the end of the test addresses the normal execution path, two issues remain:
- Not protected by try/finally: If an assertion or exception occurs before line 361, the configuration properties remain modified, polluting subsequent tests.
- Setting to null vs. restoring original values: The properties are set to
nullrather than restored to their original values. This may not properly restore defaults and could interfere with other tests that expect specific default values (typicallymax-errors-to-show=100anderror-dispersion-quota=5).🔒 Recommended fix: capture original values and restore in finally block
public void testMetadataCheckWithRestrictedReportSize() throws Exception { - // set max-errors-to-show to 8 and error-dispersion-quota to 1, - // This test has 14 errors in total, but the report will contain only 8 error messages. - // The errors with low frequency will be prioritized. - // The error-dispersion-quota set to 1 means that the number of errors shown - // for each error will be almost the same, in this case maximally 2 errors for each error type context.turnOffAuthorisationSystem(); ConfigurationService configurationService = DSpaceServicesFactory.getInstance().getConfigurationService(); - configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8); - configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", 1); + int originalMaxErrors = configurationService.getIntProperty("healthcheck.metadata.max-errors-to-show", 100); + int originalQuota = configurationService.getIntProperty("healthcheck.metadata.error-dispersion-quota", 5); + + try { + // set max-errors-to-show to 8 and error-dispersion-quota to 1, + // This test has 14 errors in total, but the report will contain only 8 error messages. + // The errors with low frequency will be prioritized. + // The error-dispersion-quota set to 1 means that the number of errors shown + // for each error will be almost the same, in this case maximally 2 errors for each error type + configurationService.setProperty("healthcheck.metadata.max-errors-to-show", 8); + configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", 1); - Community community = CommunityBuilder.createCommunity(context) + Community community = CommunityBuilder.createCommunity(context) .withName("Community") .build(); - Collection collection = CollectionBuilder.createCollection(context, community) + Collection collection = CollectionBuilder.createCollection(context, community) .withName("Collection") .withSubmitterGroup(eperson) .build(); - Item item1 = ItemBuilder.createItem(context, collection) + Item item1 = ItemBuilder.createItem(context, collection) .withTitle("Test item 1") .withType("corpus") .withSubject("Test subject") .withMetadata("local", "branding", null, "Community") .build(); - Item item2 = ItemBuilder.createItem(context, collection) + Item item2 = ItemBuilder.createItem(context, collection) .withTitle("Test item 2") .withType("toolService") .withSubject("Test subject") .withMetadata("local", "branding", null, "Community") .withMetadata("dc", "relation", "replaces", findItemUri(item1)) .build(); - ItemBuilder.createItem(context, collection) + ItemBuilder.createItem(context, collection) .withTitle("Test item 3") .withType("toolService") .withSubject("Test subject") .withMetadata("local", "branding", null, "Community") .withMetadata("dc", "relation", "isreplacedby", findItemUri(item2)) .build(); - // create 4 items with missing title - for (int i = 0; i < 4; i++) { - ItemBuilder.createItem(context, collection) - .withType("toolService") - .withSubject("Test subject") - .withMetadata("local", "branding", null, "Community") - .build(); - } + // create 4 items with missing title + for (int i = 0; i < 4; i++) { + ItemBuilder.createItem(context, collection) + .withType("toolService") + .withSubject("Test subject") + .withMetadata("local", "branding", null, "Community") + .build(); + } - // create 4 items with missing type - for (int i = 4; i < 8; i++) { - ItemBuilder.createItem(context, collection) - .withTitle("Test Item " + i) - .withSubject("Test subject") - .withMetadata("local", "branding", null, "Community") - .build(); - } + // create 4 items with missing type + for (int i = 4; i < 8; i++) { + ItemBuilder.createItem(context, collection) + .withTitle("Test Item " + i) + .withSubject("Test subject") + .withMetadata("local", "branding", null, "Community") + .build(); + } - // create 4 items with duplicate type - for (int i = 8; i < 12; i++) { - ItemBuilder.createItem(context, collection) - .withTitle("Test Item " + i) - .withType("toolService") - .withType("corpus") - .withSubject("Test subject") - .withMetadata("local", "branding", null, "Community") - .build(); - } + // create 4 items with duplicate type + for (int i = 8; i < 12; i++) { + ItemBuilder.createItem(context, collection) + .withTitle("Test Item " + i) + .withType("toolService") + .withType("corpus") + .withSubject("Test subject") + .withMetadata("local", "branding", null, "Community") + .build(); + } - TestDSpaceRunnableHandler testDSpaceRunnableHandler = new TestDSpaceRunnableHandler(); + TestDSpaceRunnableHandler testDSpaceRunnableHandler = new TestDSpaceRunnableHandler(); - // with "health-report -c 5", only Metadata check is running - String[] args = new String[]{"health-report", "-c", "5"}; - ScriptLauncher.handleScript(args, ScriptLauncher.getConfig(kernelImpl), testDSpaceRunnableHandler, kernelImpl); + // with "health-report -c 5", only Metadata check is running + String[] args = new String[]{"health-report", "-c", "5"}; + ScriptLauncher.handleScript(args, ScriptLauncher.getConfig(kernelImpl), testDSpaceRunnableHandler, kernelImpl); - assertThat(testDSpaceRunnableHandler.getErrorMessages(), empty()); - List<String> messages = testDSpaceRunnableHandler.getInfoMessages(); + assertThat(testDSpaceRunnableHandler.getErrorMessages(), empty()); + List<String> messages = testDSpaceRunnableHandler.getInfoMessages(); - assertThat(messages, hasSize(1)); - assertThat(messages.get(0), containsString("dc.relation issues: " + " ".repeat(15) + "2")); - assertThat(messages.get(0), containsString("dc.title issues: " + " ".repeat(15) + "4")); - assertThat(messages.get(0), containsString("dc.type issues: " + " ".repeat(15) + "4")); - assertThat(messages.get(0), containsString("duplicate value issues:" + " ".repeat(13) + "4")); - assertThat(messages.get(0), containsString("Error count total: " + " ".repeat(14) + "14")); + assertThat(messages, hasSize(1)); + assertThat(messages.get(0), containsString("dc.relation issues: " + " ".repeat(15) + "2")); + assertThat(messages.get(0), containsString("dc.title issues: " + " ".repeat(15) + "4")); + assertThat(messages.get(0), containsString("dc.type issues: " + " ".repeat(15) + "4")); + assertThat(messages.get(0), containsString("duplicate value issues:" + " ".repeat(13) + "4")); + assertThat(messages.get(0), containsString("Error count total: " + " ".repeat(14) + "14")); - assertThat(messages.get(0), containsString("Errors:")); + assertThat(messages.get(0), containsString("Errors:")); - // check if dc.type error is present exactly 2 times - assertThat(StringUtils.countMatches(messages.get(0), "Does not have dc.type metadata"), is(2)); - // check if dc.title error is present exactly 2 times - assertThat(StringUtils.countMatches(messages.get(0), "Item has no dc.title metadata"), is(2)); - // check id duplicate value error is present exactly 2 times - assertThat(StringUtils.countMatches(messages.get(0), "value [dc.type] is present multiple times"), is(2)); + // check if dc.type error is present exactly 2 times + assertThat(StringUtils.countMatches(messages.get(0), "Does not have dc.type metadata"), is(2)); + // check if dc.title error is present exactly 2 times + assertThat(StringUtils.countMatches(messages.get(0), "Item has no dc.title metadata"), is(2)); + // check id duplicate value error is present exactly 2 times + assertThat(StringUtils.countMatches(messages.get(0), "value [dc.type] is present multiple times"), is(2)); - // check if all dc.relation errors are present - assertThat(StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.replaces"), is(1)); - assertThat( - StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.isreplacedby"), is(1)); - assertThat(messages.get(0), containsString("and more...")); - - configurationService.setProperty("healthcheck.metadata.max-errors-to-show", null); - configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", null); + // check if all dc.relation errors are present + assertThat(StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.replaces"), is(1)); + assertThat( + StringUtils.countMatches(messages.get(0), "does not refer back via dc.relation.isreplacedby"), is(1)); + assertThat(messages.get(0), containsString("and more...")); + } finally { + configurationService.setProperty("healthcheck.metadata.max-errors-to-show", originalMaxErrors); + configurationService.setProperty("healthcheck.metadata.error-dispersion-quota", originalQuota); + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java` around lines 361 - 362, Capture the original values of the two properties before modifying them (use configurationService.getProperty or equivalent to read "healthcheck.metadata.max-errors-to-show" and "healthcheck.metadata.error-dispersion-quota"), then wrap the test logic that mutates them in a try/finally and restore the saved originals in the finally block (call configurationService.setProperty with the original values; if originals were null/absent, remove/reset appropriately). Update references around configurationService.setProperty(...) in HealthReportIT so the properties are restored to their prior values regardless of test failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java`:
- Around line 361-362: Capture the original values of the two properties before
modifying them (use configurationService.getProperty or equivalent to read
"healthcheck.metadata.max-errors-to-show" and
"healthcheck.metadata.error-dispersion-quota"), then wrap the test logic that
mutates them in a try/finally and restore the saved originals in the finally
block (call configurationService.setProperty with the original values; if
originals were null/absent, remove/reset appropriately). Update references
around configurationService.setProperty(...) in HealthReportIT so the properties
are restored to their prior values regardless of test failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ebebc4a2-be81-4b68-af63-72821b98391d
📒 Files selected for processing (1)
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java
Port of ufal#1338 by @kuchtiak-ufal to
dtq-dev.Summary by CodeRabbit
New Features
Chores
Tests