Health report, report diff fixes#1254
Conversation
…f comparison logic
There was a problem hiding this comment.
Pull request overview
This PR updates the Health Report and Report Diff CLI tooling in CLARIN-DSpace to improve comparison correctness when reports contain different check selections and to align CLI option handling/help output.
Changes:
- Remove the legacy
healthcheckCLI command fromlauncher.xmland rely on the Spring script-service basedhealth-report. - Update
report-diffto normalize report JSON to the intersection of check names, add a “Skipped Checks” section, and make key-field mappings independent of check ordering via selector-based paths. - Standardize help flags to
-h/--helpand update tests/fixtures accordingly.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
dspace/config/launcher.xml |
Removes legacy healthcheck command entry. |
dspace-api/src/test/java/org/dspace/scripts/ReportDiffIT.java |
Updates help flag usage and adds coverage for skipped checks / name-based comparisons. |
dspace-api/src/main/resources/report-diff-fields.json |
Switches from index-based paths to selector-by-check-name paths. |
dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiffScriptConfiguration.java |
Changes -i to -h and clarifies -c semantics for filtering comparisons. |
dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java |
Implements intersection normalization, skipped-check reporting, and selector-based field resolution. |
dspace-api/src/main/java/org/dspace/app/healthreport/HealthReportScriptConfiguration.java |
Changes help flag to -h, supports multiple -c values, and renames output option to -r. |
dspace-api/src/main/java/org/dspace/app/healthreport/HealthReport.java |
Implements multi-check -c parsing, help renaming, and output option rename. |
| <class>org.dspace.storage.bitstore.BitStoreMigrate</class> | ||
| </step> | ||
| </command> | ||
| <command> | ||
| <name>healthcheck</name> | ||
| <description>Create health check report</description> | ||
| <step> | ||
| <class>org.dspace.health.Report</class> | ||
| </step> | ||
| </command> | ||
|
|
||
| <command> | ||
| <name>checker</name> | ||
| <description>Run the checksum checker</description> | ||
| <step> | ||
| <class>org.dspace.app.checker.ChecksumChecker</class> |
There was a problem hiding this comment.
launcher.xml no longer exposes the legacy healthcheck command, but the implementation class org.dspace.health.Report still exists in the codebase. If this removal is intentional, consider either removing/deprecating the legacy class as well, or adding a compatibility alias / release note so existing automation using dspace healthcheck doesn’t break unexpectedly.
There was a problem hiding this comment.
@milanmajchrak In dspace/health, there are still many classes from LINDAT (version 5). Which of them do we not want? Does solving this issue also remove the old unused classes (for example, the healthcheck problem that may call a process with the error mentioned by Milan Kuchtiak)?
There was a problem hiding this comment.
I'd also remove the org.dspace.health.Report class i it's not used anymore.
Why to keep it there?
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (2)
dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java:385
displayReportDates()also assumes the newest report is at the end offindAll()(reverses index viaallReports.size() - 1 - i). SincefindAll()is unordered, the listing and max-entry truncation can be inconsistent. Prefer querying reports already ordered bylastModified/idDESC and applying the limit at the query level.
List<ReportResult> allReports = reportResultService.findAll(context);
// Determine how many reports to process, respecting maxEntries if it's within valid range
long limitCount = (maxEntries > 0 && maxEntries < allReports.size()) ? maxEntries : allReports.size();
Map<String, List<DateWithArgs>> reportDatesMap = new HashMap<>();
for (long i = 0; i < limitCount; i++) {
// the newest report is at the end of the list, so we reverse the index
ReportResult report = allReports.get(allReports.size() - 1 - (int) i);
String formattedDate = FORMATTER.format(report.getLastModified()
dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java:272
- This test picks the "latest"
ReportResultviafindAll(context).get(size - 1), butReportResultDAOinheritsAbstractHibernateDAO.findAll()which has no ORDER BY. This can make the test flaky if the DB returns rows in a different order or other reports exist. Sort bylastModified/idexplicitly (or add a dedicated query for the most recent report) before asserting onlatest.getArgs().
context.reloadEntity(eperson);
List<ReportResult> allReports = reportResultService.findAll(context);
ReportResult latest = allReports.get(allReports.size() - 1);
assertThat(handler.getErrorMessages(), empty());
assertThat(latest.getArgs(), containsString("-c: 2"));
assertThat(latest.getArgs(), containsString("-c: 3"));
Health Report & Report Diff — User GuideTwo CLI/REST scripts that help administrators monitor the state of a CLARIN-DSpace repository over time:
Both scripts can be invoked from the command line ( the DSpace REST API ( below are written for the bash launcher; on Windows replace
1.
|
| context.reloadEntity(eperson); | ||
| List<ReportResult> allReports = reportResultService.findAll(context); | ||
| ReportResult latest = allReports.get(allReports.size() - 1); | ||
|
|
| private void displayReportDates() { | ||
| try (Context context = new Context()) { | ||
| context.setCurrentUser(ePersonService.find(context, getEpersonIdentifier())); | ||
| List<ReportResult> allReports = reportResultService.findAll(context); | ||
| // Determine how many reports to process, respecting the `limit` if it's within valid range | ||
| long limitCount = (limit > 0 && limit < allReports.size()) ? limit : allReports.size(); | ||
| // findAll() does not guarantee ordering; sort by lastModified ascending so the | ||
| // newest reports are at the end of the list. | ||
| allReports.sort(Comparator.comparing(ReportResult::getLastModified)); | ||
| // Determine how many reports to process, respecting maxEntries if it's within valid range | ||
| long limitCount = (maxEntries > 0 && maxEntries < allReports.size()) ? maxEntries : allReports.size(); |
| " -o, --output Specify a file to save the report\n\n" + | ||
| "If you want to execute only one check using -c, use check index:\n" + checksNamesToString() + "\n" | ||
| " -c, --check Perform specific check(s) by index (0-" + (getNumberOfChecks() - 1) + | ||
| "). Accepts multiple space-separated values, e.g. -c 0 3 4\n" + |
There was a problem hiding this comment.
Also mention here, that the default value is: All checks
| "If you want to execute only one check using -c, use check index:\n" + checksNamesToString() + "\n" | ||
| " -c, --check Perform specific check(s) by index (0-" + (getNumberOfChecks() - 1) + | ||
| "). Accepts multiple space-separated values, e.g. -c 0 3 4\n" + | ||
| " -f, --for Specify the last N days to consider (positive integer)\n" + |
There was a problem hiding this comment.
Please mention here what is the default, e.g.
The default value is: 7
|
Would be nice to see the report number in the report result/email ? Otherwise, user needs to run Soething like: |
| options.getOption("c").setType(String.class); | ||
| Option checkOption = Option.builder("c").longOpt("check").hasArgs() | ||
| .desc(String.format("Filter comparison to one or more specific checks by index (0 to %d). " + | ||
| "Repeat the flag (e.g. -c 1 -c 3) to compare multiple checks from both reports.", |
There was a problem hiding this comment.
would be good if -c option in health-report would have the same format as -c option in report-diff.
|
Regarding incompatible reports, comparing the report-diff, I got the following results (example):
I think, all the information started from Section 3: is not needed to show. The report-diff, for incompatible reports, may be shorter: |
| private void defaultDate(Context context) { | ||
| if (Objects.nonNull(from) && Objects.nonNull(to)) { | ||
| private void defaultReportIds(Context context) { | ||
| if (Objects.nonNull(sourceReportId) && Objects.nonNull(targetReportId)) { |
There was a problem hiding this comment.
I'd move this check to internalRun() method (prior to this method call)
| return; | ||
| } | ||
| handler.logInfo("No dates specified, using the last two dates from the database."); | ||
| handler.logInfo("No report IDs specified, using the last two reports from the database."); |
There was a problem hiding this comment.
I can see redundant messages here (one logWarning and one logInfo):
dspace@074ee81d33a2:/dspace/bin$ ./dspace report-diff -s 19
The script has started
Only one of '-s'/'-t' was specified. The last two reports from the database will be compared instead.
No report IDs specified, using the last two reports from the database.
CLARIN DSpace: Repository Health Report Diff
...
| defaultDate(context); | ||
| // If only one of -s/-t is specified, warn the user and fall back to the | ||
| // last two reports for comparison. | ||
| if ((sourceReportId == null) ^ (targetReportId == null)) { |
There was a problem hiding this comment.
In case of sourceReportId is specified and targetReportId not, I'd compare the sourceReportId with the last one.
|
|
||
| if (fromReport == null || toReport == null) { | ||
| handler.logInfo("No reports found for specified dates."); | ||
| handler.logInfo("No reports found for specified report IDs."); |
There was a problem hiding this comment.
This is not entirely correct message.
One id may be correct, other not.
I'd use:
if (fromReport == null) {
handler.logInfo("No report found for report ID:" + sourceReportId);
return;
);
if (toReport == null) {
handler.logInfo("No report found for report ID:" + targetReportId);
return;
);
| * @throws IOException if JSON parsing fails | ||
| */ | ||
| private String generateEnhancedKeyChangesTable(String oldJson, String newJson, | ||
| Date fromDate, Date toDate) throws IOException { |
There was a problem hiding this comment.
fromDate, toDate were replaced with sourceReportId, targetReportId.
Please, update the JavaDoc
Problem description
Followed issues:
https://github.com/dataquest-dev/dspace-customers/issues/430
#1250
Manual Testing (if applicable)
Copilot review