88 commits
cc95d8a
Initial commit
May 5, 2026
f415779
feat: initialize CAP bookshop sample application
samyuktaprabhu May 6, 2026
7b3a000
Merge pull request #1 from I560666/sam-add-sample-app
May 6, 2026
8282df1
add plugin scaffold
samyuktaprabhu May 6, 2026
50a118e
update plugin scaffold
samyuktaprabhu May 7, 2026
b100138
Merge pull request #2 from I560666/sam-setup-plugin-initial
May 12, 2026
bb0a871
enable attachments
samyuktaprabhu May 13, 2026
f4173e0
enable attachments
samyuktaprabhu May 13, 2026
5e82b23
setup orchestrator
samyuktaprabhu May 20, 2026
f25bd3b
rename handler and update to junit 5
samyuktaprabhu May 20, 2026
a7ea29d
initial Extraction Orchestrator setup
samyuktaprabhu May 21, 2026
a37e5fd
async process :)
samyuktaprabhu May 21, 2026
055a659
add tests
samyuktaprabhu May 21, 2026
e8e1ee9
bounded thread pool
samyuktaprabhu May 21, 2026
84034c6
review fixes
samyuktaprabhu May 27, 2026
024c9e3
Merge pull request #3 from I560666/sam-embed-attachments-plugin
May 27, 2026
b8bb7a3
Merge remote-tracking branch 'origin/main' into sam-2753-orchestrator…
samyuktaprabhu May 27, 2026
8a3e5f3
revert unintended change
samyuktaprabhu May 27, 2026
7dd4e78
review fixes
samyuktaprabhu May 27, 2026
5ee9f67
review fixes
samyuktaprabhu May 27, 2026
8319198
make the working directory path absolute
samyuktaprabhu May 28, 2026
6548596
Merge pull request #4 from I560666/sam-2753-orchestrator-setup
May 29, 2026
d54be4a
enable attachments
samyuktaprabhu May 13, 2026
aaba381
attachment content retrieval
samyuktaprabhu May 22, 2026
41c2215
clean up
samyuktaprabhu May 29, 2026
8ddf50c
tiny improvements / comments
samyuktaprabhu May 29, 2026
8d184b9
Merge pull request #5 from I560666/sam-2758-attachment-retrieval
Jun 1, 2026
af28906
chore(hyperspace): 🤖 Add PR Bot Configuration
hyperspace-insights[bot] Jun 1, 2026
633786c
Create main.yml
samyuktaprabhu Jun 1, 2026
5b45f7b
Merge pull request #3 from cap-java/sam-ci-workflow-2
samyuktaprabhu Jun 2, 2026
7e51fb3
Merge pull request #2 from cap-java/pr-bot/configuration
samyuktaprabhu Jun 2, 2026
bf05af0
initial commit
samyuktaprabhu Jun 2, 2026
b3bd2b0
credential handling using service binding and running using local hyb…
samyuktaprabhu Jun 2, 2026
b82ba7e
update pom.xml
samyuktaprabhu Jun 2, 2026
9893929
tests
samyuktaprabhu Jun 2, 2026
f8d1292
bot review fixes
samyuktaprabhu Jun 2, 2026
f2e8779
apply spotless
samyuktaprabhu Jun 2, 2026
8092bfe
skip extraction early when Document AI service is unavailable
samyuktaprabhu Jun 2, 2026
e9633e6
Merge pull request #5 from cap-java/sam-2759-credential-handling-2
samyuktaprabhu Jun 5, 2026
884ac94
custom flows :)
samyuktaprabhu Jun 5, 2026
387d22c
remove sonar code
samyuktaprabhu Jun 5, 2026
522bb6a
remove threads
samyuktaprabhu Jun 5, 2026
76eaec3
bot review fixes
samyuktaprabhu Jun 5, 2026
b0e8b95
bot review fixes
samyuktaprabhu Jun 5, 2026
cfeddbb
review fixes - I
samyuktaprabhu Jun 10, 2026
00c7e03
review fixes - II
samyuktaprabhu Jun 10, 2026
ac70dac
review fixes - III
samyuktaprabhu Jun 10, 2026
c1fa3e2
initial commit
samyuktaprabhu Jun 10, 2026
7adc9f5
Merge pull request #7 from cap-java/sam-2772-flows
rjayasinghe Jun 10, 2026
81c816f
fix: review fixes
samyuktaprabhu Jun 18, 2026
b27bfc5
refactor: use outbox
samyuktaprabhu Jun 22, 2026
73950e3
feat: add standalone document upload entry point via Jobs media entity
samyuktaprabhu Jun 11, 2026
c9b4bf1
refactor: remove attachments plugin entry point
samyuktaprabhu Jun 23, 2026
20cdcd8
refactor: rename registration files / classes
samyuktaprabhu Jun 23, 2026
af74107
feat: implement programmatic way of using the plugin
samyuktaprabhu Jun 24, 2026
7a9f33f
fix: review fixes
samyuktaprabhu Jun 24, 2026
7f77947
Merge pull request #8 from cap-java/sam-2762-document-submission
rjayasinghe Jun 25, 2026
c7c1d81
Merge pull request #9 from cap-java/doc-ai-2801-feature/multiple-entr…
rjayasinghe Jun 25, 2026
b9eba42
feat: outbox + scheduler + results
samyuktaprabhu Jun 25, 2026
eef72dc
fix: review fix
samyuktaprabhu Jun 29, 2026
feb5c35
fix: fix: use wildcard ServiceName for DocumentExtractionResult handl…
samyuktaprabhu Jun 29, 2026
bb6404a
chore: update the cds version to LTS
samyuktaprabhu Jun 29, 2026
c59bec2
Merge pull request #11 from cap-java/doc-ai-2773-feature/outbox-sched…
samyuktaprabhu Jun 29, 2026
6e51cb5
docs: add javadcos to classes and methods
samyuktaprabhu Jun 29, 2026
9dd3cbd
fix: review fix
samyuktaprabhu Jun 29, 2026
5b9bc67
docs: write documentation about the plugin
samyuktaprabhu Jun 29, 2026
2955ec3
feat: make polling intervals dynamic
samyuktaprabhu Jun 29, 2026
a4dbff4
Merge pull request #12 from cap-java/doc-ai-2752-docs/javadcos
samyuktaprabhu Jun 30, 2026
d9214af
refactor: rename all packages to com.sap.cds.feature.documentai.* + e…
samyuktaprabhu Jun 30, 2026
150b3aa
docs: remove integration tests from documentation
samyuktaprabhu Jun 30, 2026
9d7e77e
Merge pull request #13 from cap-java/doc-ai-2775-docs/document-ai-plu…
samyuktaprabhu Jun 30, 2026
0f9890f
docs: remove integration tests from documentation
samyuktaprabhu Jun 30, 2026
5ca94da
chore: apply spotless
samyuktaprabhu Jun 30, 2026
f649e0c
Merge pull request #14 from cap-java/doc-ai-2752-refactor/tiny-enhanc…
samyuktaprabhu Jun 30, 2026
eab5d39
tests: add integration tests
samyuktaprabhu Jun 30, 2026
475ca01
fix: review fixes
samyuktaprabhu Jul 1, 2026
7bb95f5
Merge pull request #15 from cap-java/doc-ai-2840-tests/integration-tests
samyuktaprabhu Jul 1, 2026
1c13566
Add 'cds-feature-sap-document-ai/' from commit '7bb95f5cdc5b32cc7e787…
Schmarvinius Jul 3, 2026
6da248d
refactor(document-ai): reorganize imported tree into cds-ai layout
Schmarvinius Jul 3, 2026
53b6524
chore(document-ai): remove duplicated and obsolete imported files
Schmarvinius Jul 3, 2026
e06f346
refactor(document-ai): adapt plugin pom to cds-ai conventions
Schmarvinius Jul 3, 2026
28ac763
refactor(document-ai): normalize license headers and plugin name refe…
Schmarvinius Jul 3, 2026
9547d98
chore(document-ai): wire plugin, integration tests and sample into re…
Schmarvinius Jul 3, 2026
c2ed8e2
fix(document-ai): resolve missing dependency version and CI-friendly …
Schmarvinius Jul 3, 2026
bbfa500
style: apply spotless across the reactor
Schmarvinius Jul 3, 2026
b4913cd
ci: preinstall cds-feature-sap-document-ai for integration tests
Schmarvinius Jul 3, 2026
a31d322
fix(mtx-local): pin @sap/cds ^9 at workspace root to prevent duplicat…
Schmarvinius Jul 3, 2026
e270483
adapt readme
Schmarvinius Jul 3, 2026
2 changes: 1 addition & 1 deletion .github/actions/integration-tests/action.yml
@@ -25,7 +25,7 @@ runs:
maven-version: ${{ inputs.maven-version }}

- name: Build dependencies for integration tests
- run: mvn clean install -ntp -B -pl cds-feature-ai-core,cds-feature-recommendations,cds-starter-ai -am -DskipTests
+ run: mvn clean install -ntp -B -pl cds-feature-ai-core,cds-feature-recommendations,cds-feature-sap-document-ai,cds-starter-ai -am -DskipTests
shell: bash

- name: Integration Tests (spring)
@@ -65,9 +65,12 @@ public void onInferenceClient(InferenceClientContext context) {
"Inference client is not available without an AI Core service binding");
}

- /** Resolves (or creates) the resource group name for the given tenant using the configured prefix. */
+ /**
+  * Resolves (or creates) the resource group name for the given tenant using the configured prefix.
+  */
public String resolveResourceGroup(String tenantId) {
- return tenantResourceGroupCache.computeIfAbsent(tenantId, id -> config.resourceGroupPrefix() + id);
+ return tenantResourceGroupCache.computeIfAbsent(
+     tenantId, id -> config.resourceGroupPrefix() + id);
}

/** Returns the mock tenant cache for test inspection. */
409 changes: 409 additions & 0 deletions cds-feature-sap-document-ai/README.md

Large diffs are not rendered by default.

225 changes: 225 additions & 0 deletions cds-feature-sap-document-ai/docs/architecture.md
@@ -0,0 +1,225 @@
# Implementation Details

## Table of Contents

- [Links](#links)
- [Folder Structure](#folder-structure)
- [Feature](#feature)
- [CDS Model](#cds-model)
- [Configuration](#configuration)
- [Handlers](#handlers)
- [Services](#services)
- [Outbox and Polling](#outbox-and-polling)
- [Exceptions](#exceptions)
- [Extraction Lifecycle](#extraction-lifecycle)
- [Status State Machine](#status-state-machine)
- [Tests](#tests)
- [Unit Tests](#unit-tests)
- [Quality Tools](#quality-tools)

---

## Links

- [CAP Java Plugin Concept](https://cap.cloud.sap/docs/java/building-plugins#building-plugins)
- [CAP Java Outbox Documentation](https://cap.cloud.sap/docs/java/outbox#outboxing-cap-service-events)
- [SAP Document AI Documentation](https://help.sap.com/docs/document-ai?locale=en-US)
- [Enabling DIE Service on SAP BTP Cloud Foundry](https://help.sap.com/docs/document-ai/sap-document-ai/enabling-service-in-cloud-foundry-environment?locale=en-US)
- [CAP Java Getting Started](https://cap.cloud.sap/docs/java/getting-started)

---

## Folder Structure

| Folder | Description |
|---|---|
| `sap-document-ai` | Core implementation of the Document AI plugin |
| `sap-document-ai/src/main/java` | Java source files for handlers, services, configuration, and model classes |
| `sap-document-ai/src/main/resources/cds` | CDS model files shipped with the plugin |
| `sap-document-ai/src/main/resources/META-INF/services` | Java `ServiceLoader` registration for `CdsRuntimeConfiguration` |
| `sap-document-ai/src/test/java` | Unit tests |
| `bookshop` | Sample CAP Java application demonstrating plugin integration |
| `bookshop/srv` | Spring Boot application module for the sample |
| `bookshop/db` | CDS data model for the sample |
| `bookshop/app` | Fiori UI applications for the sample |
| `docs` | Design and architecture documentation |

---

## Feature

The plugin is implemented in the `sap-document-ai` module. The following Java packages make up the implementation:

| Package | Description |
|---|---|
| `com.sap.cds.feature.documentai.configuration` | Bootstraps all plugin components and registers them with the CDS runtime at startup |
| `com.sap.cds.feature.documentai.handlers` | CDS event handlers for document submission and outbox-driven polling |
| `com.sap.cds.feature.documentai.service` | Core extraction service, processing service, status enum, and transition validator |
| `com.sap.cds.feature.documentai.service.client` | HTTP client abstraction for the Document Information Extraction (DIE) REST API |
| `com.sap.cds.feature.documentai.service.model` | Immutable value objects used as internal data transfer types |
| `com.sap.cds.feature.documentai.service.exceptions` | Typed exceptions for error classification |
| `com.sap.cds.feature.documentai.service.utils` | Utility classes |

### CDS Model

The CDS model is defined in:

```
sap-document-ai/src/main/resources/cds/com.sap.cds/sap-document-ai/
```

Per the [CAP Java plugin concept](https://cap.cloud.sap/docs/java/building-plugins#building-plugins), this path makes the model available to consuming applications via the `cds-maven-plugin` `resolve` goal.

The model contains the following files:

| File | Description |
|---|---|
| `document-ai-service.cds` | Defines `DocumentAiService` with the `DocumentExtraction` (inbound) and `DocumentExtractionResult` (outbound) events |
| `extraction-job.cds` | Defines the internal `ExtractionJob` entity used to persist job state across the extraction lifecycle |
| `index.cds` | Entry point that imports both files; resolved by the CAP plugin mechanism |

The `ExtractionJob` entity uses `cuid` (auto-generated UUID primary key) and `managed` (auto-populated audit fields). It tracks the job `status`, `tenantId`, the DIE-assigned `documentAiJobId`, and the raw `extractionResult`. The table is deployed automatically as part of the consuming application's CDS schema deployment — no manual DDL is required.

### Configuration

`DocumentAiServiceConfiguration` implements `CdsRuntimeConfiguration` and is the plugin's sole entry point into the CDS runtime. It is discovered automatically via the Java `ServiceLoader` mechanism.

At startup it:
- Registers `ExtractionServiceImpl` as a named CDS service in the service catalog.
- Resolves the DIE service binding from the environment by the label `sap-document-information-extraction`.
- Constructs an OAuth2-authenticated HTTP destination via the SAP Cloud SDK if a binding is found.
- Wires all resolved dependencies into `ExtractionServiceImpl`.
- Registers `DocumentSubmissionHandler` unconditionally.
- Registers `ExtractionPollingHandler` only when a valid DIE client was successfully built.

If no binding is found or the destination cannot be initialised, the plugin starts in degraded mode — events are accepted and jobs are queued as `PENDING`, but no extraction processing occurs.

### Handlers

| Handler | Description |
|---|---|
| `DocumentSubmissionHandler` | Listens for `DocumentExtraction` events on any `ApplicationService`. Service-name-agnostic by design — consumers emit events from their own service without coupling to the plugin's internal service name. Delegates to `ExtractionService` and completes the event context. |
| `ExtractionPollingHandler` | Registered against the persistent unordered outbox. Polls the DIE service for all active jobs on each invocation. Self-reschedules after the configured interval if jobs remain active. Stops automatically when all jobs reach a terminal status. |

### Services

| Service / Class | Description |
|---|---|
| `ExtractionService` | CAP service interface registered in the service catalog. Exposes `triggerExtraction()` for new submissions and `updateExtractionResult()` for poll-driven status updates. |
| `ExtractionServiceImpl` | Central orchestrator. Creates and persists extraction jobs, coordinates submission via the processing service, schedules polling via the outbox, and enforces the status state machine on every update using optimistic locking. |
| `DocumentAiProcessingService` | Abstraction over the HTTP client. Provides an `isAvailable()` check that allows `ExtractionServiceImpl` to degrade gracefully when no DIE binding is present. |
| `DefaultDocumentAiClient` | Concrete HTTP client. Submits documents to DIE via a multipart `POST` and polls job status via `GET`. All DIE communication is authenticated via SAP Cloud SDK OAuth2 destinations. |
| `StatusTransitionValidator` | Stateless utility that enforces the permitted status transitions. Called before every status update to prevent invalid state machine transitions. |
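
The optimistic locking mentioned for `ExtractionServiceImpl` can be illustrated with a small in-memory sketch. This is not the plugin's code: the `OptimisticUpdateSketch` class, the `JobRow` record, and its `version` field are hypothetical, and a `ConcurrentHashMap` stands in for the database table. It only shows the general idea that a write succeeds when the row still matches the snapshot the writer read.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory illustration of an optimistic update; not the plugin's implementation. */
final class OptimisticUpdateSketch {

  /** Immutable snapshot of a job row. The version field is an assumption of this sketch. */
  record JobRow(String status, int version) {}

  // Stands in for the ExtractionJob table.
  private final Map<String, JobRow> table = new ConcurrentHashMap<>();

  void insert(String jobId, String status) {
    table.put(jobId, new JobRow(status, 0));
  }

  JobRow read(String jobId) {
    return table.get(jobId);
  }

  /**
   * Writes the new status only if the stored row still equals the snapshot the caller read.
   * Returns false when another writer has already changed the row.
   */
  boolean updateStatus(String jobId, JobRow expected, String newStatus) {
    JobRow next = new JobRow(newStatus, expected.version() + 1);
    return table.replace(jobId, expected, next);
  }
}
```

If two writers read the same snapshot, only the first `updateStatus` call returns `true`. In the plugin, the losing writer would presumably surface as a `ConcurrentJobUpdateException` rather than a boolean.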

### Outbox and Polling

The plugin uses the CDS **persistent unordered outbox** for all polling scheduling. This design choice means:

- Polling is entirely **event-driven** — it runs only when there are active jobs.
- No background thread or fixed scheduler is active when the system is idle.
- Resilience across restarts is guaranteed — if the application restarts mid-poll, the outbox re-delivers the pending event automatically.
- Polling stops automatically when all jobs reach a terminal status (`DONE` or `FAILED`) and resumes when the next document is submitted.

The poll interval defaults to 3 seconds and is configurable via `cds.document-ai.polling.interval-seconds` in `application.yaml`.
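
The self-rescheduling behaviour described above can be sketched without any CAP dependencies. In the example below an in-memory queue stands in for the persistent outbox, and every name (`PollingSketch`, `Job`, `submit`, `drain`) is hypothetical. The rule that a submission schedules a poll task only when none is pending is an assumption of the sketch, not a documented plugin behaviour.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical in-memory stand-in for the outbox-driven poll loop. */
final class PollingSketch {

  /** A job that reaches a terminal status after a fixed number of polls. */
  static final class Job {
    private int pollsUntilDone;

    Job(int pollsUntilDone) {
      this.pollsUntilDone = pollsUntilDone;
    }

    boolean isActive() {
      return pollsUntilDone > 0;
    }

    void poll() {
      if (pollsUntilDone > 0) {
        pollsUntilDone--;
      }
    }
  }

  // Stands in for the persistent outbox; a real outbox survives restarts.
  private final Deque<Runnable> outbox = new ArrayDeque<>();
  private final List<Job> jobs = new ArrayList<>();
  private int pollCycles;

  /** Adds a job and schedules a poll task if none is pending (assumption of this sketch). */
  void submit(Job job) {
    jobs.add(job);
    if (outbox.isEmpty()) {
      outbox.add(this::pollOnce);
    }
  }

  /** One poll cycle: check every job, then reschedule only while jobs remain active. */
  private void pollOnce() {
    pollCycles++;
    jobs.forEach(Job::poll);
    if (jobs.stream().anyMatch(Job::isActive)) {
      outbox.add(this::pollOnce);
    }
  }

  /** Runs queued tasks until the queue is empty and returns the total number of poll cycles. */
  int drain() {
    while (!outbox.isEmpty()) {
      outbox.poll().run();
    }
    return pollCycles;
  }
}
```

Once every job is terminal, `pollOnce` stops re-enqueueing itself, so an idle system has nothing queued. The next `submit` call starts the cycle again. A real outbox would deliver each task after the configured interval instead of immediately.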

### Exceptions

Errors from DIE interactions are classified into three typed exceptions nested under `DocumentAiException`:

| Exception | Condition |
|---|---|
| `DocumentAiException.Connectivity` | Network-level failure reaching DIE (timeout, DNS, etc.) |
| `DocumentAiException.Request` | Non-2xx HTTP response from DIE; carries the status code and response body |
| `DocumentAiException.Processing` | Malformed or missing fields in the DIE response |

Two additional exceptions govern internal state management:

| Exception | Condition |
|---|---|
| `ConcurrentJobUpdateException` | Raised when an optimistic lock update detects that a concurrent writer has already advanced the job |
| `IllegalStatusTransitionException` | Raised when a requested status transition is not permitted by the state machine |
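
A minimal sketch of how the three nested exception types could be laid out follows. Only the names `DocumentAiException`, `Connectivity`, `Request`, and `Processing` come from the table above. The constructors, the accessors, and the `requireSuccess` helper are hypothetical and may not match the plugin's actual signatures.

```java
/** Hypothetical shape of the exception hierarchy; only the type names come from the document. */
class DocumentAiException extends RuntimeException {

  DocumentAiException(String message, Throwable cause) {
    super(message, cause);
  }

  /** Network-level failure reaching DIE (timeout, DNS, and so on). */
  static final class Connectivity extends DocumentAiException {
    Connectivity(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Non-2xx HTTP response; keeps the status code and response body for diagnostics. */
  static final class Request extends DocumentAiException {
    private final int statusCode;
    private final String responseBody;

    Request(int statusCode, String responseBody) {
      super("DIE responded with HTTP " + statusCode, null);
      this.statusCode = statusCode;
      this.responseBody = responseBody;
    }

    int statusCode() {
      return statusCode;
    }

    String responseBody() {
      return responseBody;
    }
  }

  /** Malformed or missing fields in the DIE response. */
  static final class Processing extends DocumentAiException {
    Processing(String message) {
      super(message, null);
    }
  }

  /** Assumed helper: throws Request for any status outside the 2xx range. */
  static void requireSuccess(int statusCode, String responseBody) {
    if (statusCode < 200 || statusCode >= 300) {
      throw new Request(statusCode, responseBody);
    }
  }
}
```

With a layout like this, callers can catch `DocumentAiException` to handle every DIE failure uniformly, or catch a nested type when the reaction differs, for example retrying on `Connectivity` but failing the job on `Request`.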

---

## Extraction Lifecycle

```
Application
  └─ emit DocumentExtraction(fileName, mimeType, content, options)
DocumentSubmissionHandler
  └─ ExtractionService.triggerExtraction()
       ├─ Persist ExtractionJob (status=PENDING)
       ├─ DIE unavailable ──► return PENDING result
       └─ DIE available
            └─ POST multipart document to DIE
                 └─ receive dieJobId
                      └─ update job → SUBMITTED
                           └─ submit poll task to outbox
                                ▼ (after configured interval, via outbox)
ExtractionPollingHandler
  └─ GET DIE job status for each SUBMITTED / RUNNING job
       ├─ RUNNING → update job → RUNNING, reschedule
       ├─ DONE → update job → DONE
       │     emit DocumentExtractionResult
       │       └─ consumer @On handler invoked
       └─ FAILED → update job → FAILED (terminal)
```

---

## Status State Machine

```
PENDING ──► SUBMITTED ──► RUNNING ──► DONE
   │             │            │
   └────────►────┴────────►───┴──────► FAILED
```

| Transition | Trigger |
|---|---|
| `PENDING → SUBMITTED` | Document successfully submitted to DIE |
| `PENDING → FAILED` | Unrecoverable error during submission |
| `SUBMITTED → RUNNING` | DIE reports that the job is in progress |
| `SUBMITTED → DONE` | DIE reports completion without an intermediate RUNNING status |
| `SUBMITTED → FAILED` | DIE reports a processing failure |
| `RUNNING → DONE` | DIE processing completed successfully |
| `RUNNING → FAILED` | DIE reports a processing failure |

`DONE` and `FAILED` are terminal states. No further transitions are permitted from either status.
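
The transition table can be encoded directly as a lookup map. The sketch below is a hypothetical reconstruction, not the plugin's `StatusTransitionValidator`. The class and method names are invented for illustration, and `IllegalStateException` is used as a stand-in for the plugin's `IllegalStatusTransitionException`.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a transition validator built from the table above. */
final class TransitionSketch {

  enum Status { PENDING, SUBMITTED, RUNNING, DONE, FAILED }

  // Allowed target statuses per source status, copied from the transition table.
  private static final Map<Status, Set<Status>> ALLOWED =
      Map.of(
          Status.PENDING, EnumSet.of(Status.SUBMITTED, Status.FAILED),
          Status.SUBMITTED, EnumSet.of(Status.RUNNING, Status.DONE, Status.FAILED),
          Status.RUNNING, EnumSet.of(Status.DONE, Status.FAILED),
          Status.DONE, EnumSet.noneOf(Status.class),
          Status.FAILED, EnumSet.noneOf(Status.class));

  private TransitionSketch() {}

  static boolean isAllowed(Status from, Status to) {
    return ALLOWED.get(from).contains(to);
  }

  /** A status is terminal when no transition leaves it. */
  static boolean isTerminal(Status status) {
    return ALLOWED.get(status).isEmpty();
  }

  /** Throws if the transition is not listed in the table (stand-in exception type). */
  static void validate(Status from, Status to) {
    if (!isAllowed(from, to)) {
      throw new IllegalStateException("Illegal transition " + from + " -> " + to);
    }
  }
}
```

Keeping the rules in a single map means the terminal states fall out of the data: `DONE` and `FAILED` map to empty sets, so no separate terminal-state check is needed.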

---

## Tests

### Unit Tests

Unit tests are located in `sap-document-ai/src/test/java`. Each production class has a corresponding test class. The following test classes are implemented:

| Test Class | What is tested |
|---|---|
| `DocumentSubmissionHandlerTest` | Event handler delegation, PENDING and FAILED logging |
| `ExtractionServiceImplTest` | Job creation, submission flow, concurrent update handling, failure marking, outbox scheduling |
| `ExtractionPollingHandlerTest` | Poll cycle logic, DIE status mapping, result emission, self-rescheduling, per-job error isolation |
| `DefaultDocumentAiClientTest` | HTTP submit and poll calls, response parsing, error wrapping for all three exception types |
| `DocumentAiServiceConfigurationTest` | Startup wiring, binding resolution, conditional handler registration |
| `StatusTransitionValidatorTest` | All valid and invalid transitions |
| `ExceptionsTest` | Exception message and cause propagation |

Tests use Mockito for dependencies and AssertJ for assertions. The `jacoco-maven-plugin` enforces a minimum instruction coverage of **85%** across the plugin bundle (generated code excluded).

---

## Quality Tools

| Tool | Definition | Description |
|---|---|---|
| Spotless | `sap-document-ai/pom.xml` | Enforces Google Java Format and SAP license headers on all source files |
| PMD / CPD | `sap-document-ai/pom.xml` | Static analysis and copy-paste detection; SAP Cloud SDK ruleset applied; generated code excluded |
| JaCoCo | `sap-document-ai/pom.xml` | Enforces 85% minimum instruction coverage; generated code excluded |
| Maven Compiler | `sap-document-ai/pom.xml` | Enforces Java 17 (`--release 17`) |