Skip to content

Health report, report diff fixes#1254

Open
Kasinhou wants to merge 16 commits into
dtq-devfrom
ufal/health-report-checks-fix
Open

Health report, report diff fixes#1254
Kasinhou wants to merge 16 commits into
dtq-devfrom
ufal/health-report-checks-fix

Conversation

@Kasinhou
Copy link
Copy Markdown

@Kasinhou Kasinhou commented Feb 23, 2026

Problem description

Followed issues:
https://github.com/dataquest-dev/dspace-customers/issues/430
#1250

Manual Testing (if applicable)

Copilot review

  • Requested review from Copilot

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Health Report and Report Diff CLI tooling in CLARIN-DSpace to improve comparison correctness when reports contain different check selections and to align CLI option handling/help output.

Changes:

  • Remove the legacy healthcheck CLI command from launcher.xml and rely on the Spring script-service based health-report.
  • Update report-diff to normalize report JSON to the intersection of check names, add a “Skipped Checks” section, and make key-field mappings independent of check ordering via selector-based paths.
  • Standardize help flags to -h/--help and update tests/fixtures accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
dspace/config/launcher.xml Removes legacy healthcheck command entry.
dspace-api/src/test/java/org/dspace/scripts/ReportDiffIT.java Updates help flag usage and adds coverage for skipped checks / name-based comparisons.
dspace-api/src/main/resources/report-diff-fields.json Switches from index-based paths to selector-by-check-name paths.
dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiffScriptConfiguration.java Changes -i to -h and clarifies -c semantics for filtering comparisons.
dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java Implements intersection normalization, skipped-check reporting, and selector-based field resolution.
dspace-api/src/main/java/org/dspace/app/healthreport/HealthReportScriptConfiguration.java Changes help flag to -h, supports multiple -c values, and renames output option to -r.
dspace-api/src/main/java/org/dspace/app/healthreport/HealthReport.java Implements multi-check -c parsing, help renaming, and output option rename.

Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java Outdated
Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java Outdated
Comment on lines 7 to 15
<class>org.dspace.storage.bitstore.BitStoreMigrate</class>
</step>
</command>
<command>
<name>healthcheck</name>
<description>Create health check report</description>
<step>
<class>org.dspace.health.Report</class>
</step>
</command>

<command>
<name>checker</name>
<description>Run the checksum checker</description>
<step>
<class>org.dspace.app.checker.ChecksumChecker</class>
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

launcher.xml no longer exposes the legacy healthcheck command, but the implementation class org.dspace.health.Report still exists in the codebase. If this removal is intentional, consider either removing/deprecating the legacy class as well, or adding a compatibility alias / release note so existing automation using dspace healthcheck doesn’t break unexpectedly.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@milanmajchrak In dspace/health, there are still many classes from LINDAT (version 5). Which of them do we not want? Does solving this issue also remove the old unused classes (for example, the healthcheck problem that may call a process with the error mentioned by Milan Kuchtiak)?

Copy link
Copy Markdown
Collaborator

@kuchtiak-ufal kuchtiak-ufal May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also remove the org.dspace.health.Report class i it's not used anymore.
Why to keep it there?

Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java Outdated
Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java
@Paurikova2
Copy link
Copy Markdown

Paurikova2 commented Feb 25, 2026

Problem (from issue) Class + Method Changed Test Name
1328 -i/--info renamed to -h/--help in health-report HealthReport#setup, HealthReport#internalRun, HealthReport#printHelp HealthReportIT#testHelpOption
1328 -h/--help option declared for health-report CLI HealthReportScriptConfiguration#getOptions HealthReportIT#testHelpOption
1328 -o/--output renamed to -r/--report; field fileNamereportFile HealthReport#setup, HealthReport#internalRun HealthReportIT#testReportFileSaved
1328 -r/--report option declared in CLI options HealthReportScriptConfiguration#getOptions HealthReportIT#testReportFileSaved
1328 -c accepts multiple values (List<Integer> replaces single int); uses getOptionValues HealthReport#setup HealthReportIT#testMultipleChecks, HealthReportIT#testLicenseCheck
1328 -c declared as multi-value with setArgs(Integer.MAX_VALUE) HealthReportScriptConfiguration#getOptions HealthReportIT#testMultipleChecks
1328 Validate -c: out-of-range index → logError + throw ParseException HealthReport#setup HealthReportIT#testInvalidCheckOutOfRange
1328 Validate -c: non-integer value → logError + throw ParseException HealthReport#setup HealthReportIT#testInvalidCheckNonInteger
1328 Validate -f: value ≤ 0 → logError + throw ParseException HealthReport#setup HealthReportIT#testInvalidForDaysZero
1328 Validate -f: non-integer value → logError + throw ParseException HealthReport#setup HealthReportIT#testInvalidForDaysNonInteger
1328 printCommandlineOptions updated to use getOptionValues for multi-value -c HealthReport#printCommandlineOptions HealthReportIT#testMultipleChecks
1328 -i/--info renamed to -h/--help in report-diff ReportDiff#setup, ReportDiff#internalRun, ReportDiff#printHelp ReportDiffIT#testHelpInformation
1328 -h/--help option declared for report-diff CLI ReportDiffScriptConfiguration#getOptions ReportDiffIT#testHelpInformation
1328 Legacy healthcheck CLI command removed from launcher dspace/config/launcher.xml
1334 Reports with different check lists produced null values — normalize both to intersection of check names ReportDiff#normalizeReportsToIntersection ReportDiffIT#testSkippedChecksSection, ReportDiffIT#testIntersectionComparisonNoNullsForCommonChecks
1334 Checks absent from one report silently ignored — "Skipped Checks" section added ReportDiff#generateReportComparison ReportDiffIT#testSkippedChecksSection, ReportDiffIT#testIntersectionComparisonNoNullsForCommonChecks
1334 -c in report-diff filtered by check name, not position, after normalization ReportDiff#normalizeReportsToIntersection ReportDiffIT#testCompareSpecificCheck
1334 report-diff-fields.json paths used numeric indices (/checks/0/…) breaking resolution when check order changed — replaced with name selectors (/checks/[name=Item summary]/…) report-diff-fields.json ReportDiffIT#testEnhancedKeyChangesTable, ReportDiffIT#testProfessionalReportFormat, ReportDiffIT#testSizeDifferenceFormatting
1334 resolveFieldPath did not support name-selector syntax — added XPath-like [attr=value] segment resolution ReportDiff#resolveFieldPath, ReportDiff#splitPathSegments ReportDiffIT#testEnhancedKeyChangesTable, ReportDiffIT#testProfessionalReportFormat, ReportDiffIT#testSizeDifferenceFormatting
1334 Key Changes table showed null for fields when one report had fewer checks — skip fields missing in either normalized report ReportDiff#generateEnhancedKeyChangesTable ReportDiffIT#testIntersectionComparisonNoNullsForCommonChecks, ReportDiffIT#testEnhancedKeyChangesTable

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java
Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java Outdated
Comment thread dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java Outdated
Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (2)

dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java:385

  • displayReportDates() also assumes the newest report is at the end of findAll() (reverses index via allReports.size() - 1 - i). Since findAll() is unordered, the listing and max-entry truncation can be inconsistent. Prefer querying reports already ordered by lastModified/id DESC and applying the limit at the query level.
            List<ReportResult> allReports = reportResultService.findAll(context);
            // Determine how many reports to process, respecting maxEntries if it's within valid range
            long limitCount = (maxEntries > 0 && maxEntries < allReports.size()) ? maxEntries : allReports.size();
            Map<String, List<DateWithArgs>> reportDatesMap = new HashMap<>();
            for (long i = 0; i < limitCount; i++) {
                // the newest report is at the end of the list, so we reverse the index
                ReportResult report = allReports.get(allReports.size() - 1 - (int) i);
                String formattedDate = FORMATTER.format(report.getLastModified()

dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java:272

  • This test picks the "latest" ReportResult via findAll(context).get(size - 1), but ReportResultDAO inherits AbstractHibernateDAO.findAll() which has no ORDER BY. This can make the test flaky if the DB returns rows in a different order or other reports exist. Sort by lastModified/id explicitly (or add a dedicated query for the most recent report) before asserting on latest.getArgs().
                context.reloadEntity(eperson);
                List<ReportResult> allReports = reportResultService.findAll(context);
                ReportResult latest = allReports.get(allReports.size() - 1);

                assertThat(handler.getErrorMessages(), empty());
                assertThat(latest.getArgs(), containsString("-c: 2"));
                assertThat(latest.getArgs(), containsString("-c: 3"));

Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java
Comment thread dspace-api/src/main/java/org/dspace/app/reportdiff/ReportDiff.java Outdated
Comment thread dspace-api/src/test/java/org/dspace/scripts/HealthReportIT.java Outdated
@Kasinhou
Copy link
Copy Markdown
Author

Health Report & Report Diff — User Guide

Two CLI/REST scripts that help administrators monitor the state of a CLARIN-DSpace

repository over time:

  • health-report — runs a set of health checks against the database and

    produces a single textual report. Each run is also stored in the database so it

    can be diffed later.

  • report-diff — compares two stored health reports and shows what changed

    (e.g. new orphan bitstreams, new items without bundles, etc.).

Both scripts can be invoked from the command line (dspace/dspace.bat) or via

the DSpace REST API (/api/system/scripts/{name}/processes). The CLI examples

below are written for the bash launcher; on Windows replace dspace with

dspace.bat.


1. health-report

What it does

Runs every health check registered under

org.dspace.app.healthreport.HealthReport and writes a human-readable summary.

The report is always persisted to the report_result table so it can be

referenced by report-diff later.

Common options

| Short | Long | Description |

|-------|-------------|-------------|

| -h | --help | Show help. |

| -c | --check | Run only the listed check(s) by index. Repeatable, e.g. -c 0 -c 2. |

| -e | --email | Send the finished report by email to the given address. |

| -r | --report | Write the report to a file at the given path (in addition to DB storage). |

| -v | --verbose | Include extra detail (matched objects, IDs, etc.) in the report. |

The full list of available check indices is shown in -h output and depends

on HealthReport.getNumberOfChecks().

Examples

Run a full report and store it in the DB:

dspace health-report

Run only checks 0 and 3 and also write the result to a file:

dspace health-report -c 0 -c 3 -r /tmp/today.txt

Run a full report and mail it:

dspace health-report -e admin@example.org

REST equivalent

POST /api/system/scripts/health-report/processes

Content-Type: multipart/form-data



properties=[{"name":"-c","value":"0"},{"name":"-c","value":"3"},{"name":"-v","value":""}]

Use the returned processId to poll status at

/api/system/processes/{id} and read the output at

/api/system/processes/{id}/output.


2. report-diff

What it does

Compares two health reports already stored in the database and prints the

differences — checks that changed, new findings, findings that disappeared.

Useful for "what got worse / better since last week?"

Common options

| Short | Long | Description |

|-------|------------|-------------|

| -h | --help | Show help. |

| -l | --list | List available reports (ID, timestamp, args). Use this first to find report IDs. |

| -m | --max | Limit the number of entries shown by -l. |

| -s | --source | Source report ID (the "old" / baseline report). |

| -t | --target | Target report ID (the "new" report to compare against). |

| -c | --check | Limit the comparison to one or more check indices. Repeatable, e.g. -c 1 -c 4. |

| -e | --email | Email the diff result to the given address. |

Defaults & safety

  • If you specify neither -s nor -t, the script automatically compares

    the two most recent reports in the database.

  • If you specify only one of -s/-t, the other is ignored with a warning

    and the script falls back to the two most recent reports.

  • If the database does not contain at least two reports, the script logs an

    informative message and exits without error.

Typical workflow

  1. Find the report IDs you want to compare:

    dspace report-diff -l
    
    dspace report-diff -l -m 10        # only the 10 newest entries per type
    

    Output looks like:

    
    Report Type: health-report
    
      - ID: 42 | 2026-05-19 08:00:01 | (no args)
    
      - ID: 41 | 2026-05-18 08:00:02 | -c: 0, -c: 3
    
      ...
    
    
  2. Diff the two reports you care about:

    dspace report-diff -s 41 -t 42
    
  3. (Optional) Restrict the diff to specific checks only:

    dspace report-diff -s 41 -t 42 -c 0 -c 3
    
  4. (Optional) Mail the diff:

    dspace report-diff -s 41 -t 42 -e admin@example.org
    

REST equivalent

Start a diff of the last two reports:

POST /api/system/scripts/report-diff/processes

Content-Type: multipart/form-data



properties=[]

Diff specific reports, only certain checks:

POST /api/system/scripts/report-diff/processes

Content-Type: multipart/form-data



properties=[

  {"name":"-s","value":"41"},

  {"name":"-t","value":"42"},

  {"name":"-c","value":"0"},

  {"name":"-c","value":"3"}

]

Just list available reports (no diff performed):

POST /api/system/scripts/report-diff/processes

Content-Type: multipart/form-data



properties=[{"name":"-l","value":""}]

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comment on lines +267 to +270
context.reloadEntity(eperson);
List<ReportResult> allReports = reportResultService.findAll(context);
ReportResult latest = allReports.get(allReports.size() - 1);

Comment on lines 383 to +391
private void displayReportDates() {
try (Context context = new Context()) {
context.setCurrentUser(ePersonService.find(context, getEpersonIdentifier()));
List<ReportResult> allReports = reportResultService.findAll(context);
// Determine how many reports to process, respecting the `limit` if it's within valid range
long limitCount = (limit > 0 && limit < allReports.size()) ? limit : allReports.size();
// findAll() does not guarantee ordering; sort by lastModified ascending so the
// newest reports are at the end of the list.
allReports.sort(Comparator.comparing(ReportResult::getLastModified));
// Determine how many reports to process, respecting maxEntries if it's within valid range
long limitCount = (maxEntries > 0 && maxEntries < allReports.size()) ? maxEntries : allReports.size();
" -o, --output Specify a file to save the report\n\n" +
"If you want to execute only one check using -c, use check index:\n" + checksNamesToString() + "\n"
" -c, --check Perform specific check(s) by index (0-" + (getNumberOfChecks() - 1) +
"). Accepts multiple space-separated values, e.g. -c 0 3 4\n" +
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also mention here, that the default value is: All checks

"If you want to execute only one check using -c, use check index:\n" + checksNamesToString() + "\n"
" -c, --check Perform specific check(s) by index (0-" + (getNumberOfChecks() - 1) +
"). Accepts multiple space-separated values, e.g. -c 0 3 4\n" +
" -f, --for Specify the last N days to consider (positive integer)\n" +
Copy link
Copy Markdown
Collaborator

@kuchtiak-ufal kuchtiak-ufal May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention here what is the default, e.g.

The default value is: 7

@kuchtiak-ufal
Copy link
Copy Markdown
Collaborator

Would be nice to see the report number in the report result/email ?

Otherwise, user needs to run
./dspace report-diff -l
to see the report numbers

Soething like:

HEALTH REPORT 329:

######################

General Information:
Generated: 2026-05-25 11:20:33
From - Till: 2026-05-18 - 2026-05-25
Url: http://localhost:4000/

Assetstore size     : 3 GB
Log dir size        : 78 MB

options.getOption("c").setType(String.class);
Option checkOption = Option.builder("c").longOpt("check").hasArgs()
.desc(String.format("Filter comparison to one or more specific checks by index (0 to %d). " +
"Repeat the flag (e.g. -c 1 -c 3) to compare multiple checks from both reports.",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good if -c option in health-report would have the same format as -c option in report-diff.

@kuchtiak-ufal
Copy link
Copy Markdown
Collaborator

Regarding incompatible reports, comparing the report-diff, I got the following results (example):

dspace@074ee81d33a2:/dspace/bin$ ./dspace report-diff -s 19 -t 24

The script has started
No common checks found between the two reports for comparison.
CLARIN DSpace: Repository Health Report Diff

Section 1: Executive Summary

Report Type: healthcheck
Source Report: ID 19 at 2026-05-25 10:41:27.655
Target Report: ID 24 at 2026-05-25 11:51:03.66
Report Period: 1 hour 9 minutes 36 seconds

Key Changes Between Reports

No significant changes detected between reports.

Section 2: Skipped Checks

The following checks could not be compared because they were not present in both reports.

Only in source report (ID 19):
  - Item summary

Only in target report (ID 24):
  - General Information

Section 3: Detailed Change Log

Changes Summary
- Total operations: 0
- Fields modified: 0

No differences found.
The script has completed

I think, all the information started from Section 3: is not needed to show.

The report-diff, for incompatible reports, may be shorter:

The script has started
No common checks found between the two reports for comparison.
CLARIN DSpace: Repository Health Report Diff

Section 1: Executive Summary

Report Type: healthcheck
Source Report: ID 19 at 2026-05-25 10:41:27.655
Target Report: ID 24 at 2026-05-25 11:51:03.66
Report Period: 1 hour 9 minutes 36 seconds

Section 2: Skipped Checks

The following checks could not be compared because they were not present in both reports.

Only in source report (ID 19):
  - Item summary

Only in target report (ID 24):
  - General Information

The script has completed

private void defaultDate(Context context) {
if (Objects.nonNull(from) && Objects.nonNull(to)) {
private void defaultReportIds(Context context) {
if (Objects.nonNull(sourceReportId) && Objects.nonNull(targetReportId)) {
Copy link
Copy Markdown
Collaborator

@kuchtiak-ufal kuchtiak-ufal May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this check to internalRun() method (prior to this method call)

return;
}
handler.logInfo("No dates specified, using the last two dates from the database.");
handler.logInfo("No report IDs specified, using the last two reports from the database.");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see redundant messages here (one logWarning and one logInfo):

dspace@074ee81d33a2:/dspace/bin$ ./dspace report-diff -s 19

The script has started

Only one of '-s'/'-t' was specified. The last two reports from the database will be compared instead.
No report IDs specified, using the last two reports from the database.
CLARIN DSpace: Repository Health Report Diff
...

defaultDate(context);
// If only one of -s/-t is specified, warn the user and fall back to the
// last two reports for comparison.
if ((sourceReportId == null) ^ (targetReportId == null)) {
Copy link
Copy Markdown
Collaborator

@kuchtiak-ufal kuchtiak-ufal May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of sourceReportId is specified and targetReportId not, I'd compare the sourceReportId with the last one.


if (fromReport == null || toReport == null) {
handler.logInfo("No reports found for specified dates.");
handler.logInfo("No reports found for specified report IDs.");
Copy link
Copy Markdown
Collaborator

@kuchtiak-ufal kuchtiak-ufal May 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not entirely correct message.
One id may be correct, other not.

I'd use:

if (fromReport == null) {
    handler.logInfo("No report found for report ID:" + sourceReportId);
    return;
);
if (toReport == null) {
    handler.logInfo("No report found for report ID:" + targetReportId);
    return;
);

* @throws IOException if JSON parsing fails
*/
private String generateEnhancedKeyChangesTable(String oldJson, String newJson,
Date fromDate, Date toDate) throws IOException {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fromDate, toDate were replaced with sourceReportId, targetReportId.

Please, update the JavaDoc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UFAL/Health report bugs and improvements

4 participants