Alternative approaches to validating rules not expressible in static JSON Schema

I want to explore the idea presented in https://github.com/MobilityData/gbfs-validator/issues/153 as an alternative to patching existing schemas, while keeping the benefit of schema validation for these rules. This issue also considers a programmatic (non-interoperable) approach.

A detailed analysis written with help from Claude:

## Summary

The GBFS Validator currently implements custom validation rules (that cannot be expressed in static JSON schemas) by dynamically patching schemas at runtime. This approach has significant problems:

1. **High implementation complexity** - Rules require deep knowledge of JSON Schema structure and JsonPath API
2. **Risk of rule conflicts** - No enforcement mechanism prevents rules from interfering with each other
3. **Confusing validation reports** - Errors reference constraints that don't exist in static schemas
4. **Implementation-specific logic** - Other GBFS validators must reimplement the entire patching system

**Two viable alternatives have been identified:**

### Option A: Programmatic Validation
Run custom validation checks after schema validation, generating errors directly in code.
- **Easiest to maintain** - Clear validation logic
- ❌ No interoperability (each validator reimplements)

### Option B: Schema Templates with Placeholders
Replace placeholders in schema templates with actual values at runtime.
- **Interoperable** - All validators use same templates from upstream GBFS spec
- **Transparent** - Templates visible in schema files
- **Standards-based** - Single source of truth
- ❌ Requires upstream coordination
- ❌ Only valuable in a multi-implementation ecosystem

## Current Implementation Analysis

### How Schema Patching Works Today

The validation flow:

1. Load static JSON schemas from `src/main/resources/schema/v{version}/{feedName}.json`
2. For each feed being validated, retrieve applicable custom rules
3. Apply rules sequentially, each receiving:
   - A JsonPath `DocumentContext` wrapping the schema JSON
   - A map of all loaded GBFS feeds
4. Rules extract data from feeds (e.g., valid pricing plan IDs) and inject into schemas:
   - Add enum constraints for reference validation
   - Add required field constraints based on feed presence
   - Build if/then/else conditional schemas
5. Convert patched JSONObject to Everit Schema and validate

**Key Files**:
- `CustomRuleSchemaPatcher.java:31-42` - Interface all rules implement
- `AbstractVersion.java:118-162` - Orchestrates rule application via stream reduce
- `FileValidator.java:59-84` - Entry point for validation

### Current Custom Rules (8 total)

All rules in `gbfs-validator-java/src/main/java/org/entur/gbfs/validation/validator/rules/`:

**Reference Validation Rules** (enum constraints):
1. `NoInvalidReferenceToPricingPlansInVehicleStatus` - pricing_plan_id must exist in system_pricing_plans
2. `NoInvalidReferenceToPricingPlansInVehicleTypes` - pricing plan IDs in vehicle_types must be valid
3. `NoInvalidReferenceToRegionInStationInformation` - region_id must exist in system_regions
4. `NoInvalidReferenceToVehicleTypesInStationStatus` - vehicle_type_id must exist in vehicle_types

**Conditional Required Field Rules**:
5. `NoMissingVehicleTypesAvailableWhenVehicleTypesExists` - vehicle_types_available required when vehicle_types feed exists
6. `NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist` - vehicle_type_id required and valid when vehicle_types exists
7. `NoMissingStoreUriInSystemInformation` - rental_apps required when rental_uris exist in stations/vehicles
8. `NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles` - current_range_meters required for motorized vehicles (if/then/else schema)

### Implementation Complexity Examples

#### Simple Rule: Reference Validation (24 logic lines)

`NoInvalidReferenceToRegionInStationInformation.java:41-58`:

```java
@Override
public DocumentContext addRule(
  DocumentContext rawSchemaDocumentContext,
  Map<String, JSONObject> feeds
) {
  // Extract valid region IDs from system_regions feed
  JSONObject systemRegionsFeed = feeds.get("system_regions");
  JSONArray regionIds = systemRegionsFeed != null
    ? JsonPath.parse(systemRegionsFeed).read("$.data.regions[*].region_id")
    : new JSONArray();

  // Navigate to region_id property in schema (6 levels deep)
  String path = "$.properties.data.properties.stations.items.properties.region_id";
  JSONObject regionIdSchema = rawSchemaDocumentContext.read(path);

  // Add enum constraint
  regionIdSchema.put("enum", regionIds);

  // Write back to schema
  return rawSchemaDocumentContext.set(path, regionIdSchema);
}
```

**Complexity factors**:
- Must construct correct JsonPath expression (error-prone strings)
- Requires understanding JSON Schema structure (where to inject constraint)
- Mix of read/modify/write operations on JSONObjects
- Defensive null handling for missing feeds
- No compile-time safety for schema paths

#### Complex Rule: Multi-Feed Conditional (82 logic lines)

`NoMissingStoreUriInSystemInformation.java:47-129`:

This rule checks **two different feeds** (vehicle_status AND station_information) to determine if rental_apps should be required in system_information:

```java
// Check vehicle_status for rental URIs
JSONObject vehicleStatusFeed = feeds.get(vehicleStatusFileName);
String vehiclesKey = vehicleStatusFileName.equals("vehicle_status")
  ? "vehicles" : "bikes";  // Backward compatibility

if (!((JSONArray) JsonPath.parse(vehicleStatusFeed)
    .read("$.data." + vehiclesKey + "[:1].rental_uris.ios")).isEmpty()) {
  hasIosRentalUris = true;
}

// Check station_information for rental URIs
JSONObject stationInformationFeed = feeds.get("station_information");
if (!((JSONArray) JsonPath.parse(stationInformationFeed)
    .read("$.data.stations[:1].rental_uris.ios")).isEmpty()) {
  hasIosRentalUris = true;
}

// Conditionally modify system_information schema
if (hasIosRentalUris || hasAndroidRentalUris) {
  JSONArray dataRequired = rawSchemaDocumentContext.read("$.properties.data.required");
  dataRequired.put("rental_apps");

  JSONObject rentalAppsSchema = rawSchemaDocumentContext.read(
    "$.properties.data.properties.rental_apps"
  );
  JSONArray rentalAppRequired = new JSONArray();
  if (hasIosRentalUris) rentalAppRequired.put("ios");
  if (hasAndroidRentalUris) rentalAppRequired.put("android");
  rentalAppsSchema.put("required", rentalAppRequired);
}
```

**Additional complexity**:
- Dynamic JsonPath construction based on feed type
- Checks multiple feeds with different structures
- Accumulates boolean flags across feeds
- Modifies multiple schema locations
- Handles backward compatibility with legacy feeds

#### Most Complex: Conditional Schema with Filters (64 logic lines)

`NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles.java:56-119`:

Uses JsonPath filters to extract only motorized vehicle types, then builds an if/then/else schema:

```java
// Filter for motorized vehicles
private static final Filter motorizedVehicleTypesFilter = Filter.filter(
  where("propulsion_type").in(List.of("electric_assist", "electric", "combustion"))
);

// Extract motorized vehicle type IDs
JSONArray motorizedVehicleTypeIds = JsonPath.parse(vehicleTypesFeed)
  .read("$.data.vehicle_types[?].vehicle_type_id", motorizedVehicleTypesFilter);

// Build complex if/then schema
bikeItemsSchema
  .put("if", new JSONObject()
    .put("properties", new JSONObject()
      .put("vehicle_type_id", new JSONObject().put("enum", motorizedVehicleTypeIds))
    )
    .put("required", new JSONArray().put("vehicle_type_id"))
  )
  .put("then", new JSONObject()
    .put("required", new JSONArray().put("current_range_meters"))
  );
```

This builds the JSON Schema conditional "if vehicle_type_id is a motorized type, then current_range_meters is required".

## Problem 1: Implementation Complexity

### What Makes Rules Hard to Write

1. **JsonPath Expertise Required**:
   - Schema paths are 4-6 levels deep: `$.properties.data.properties.stations.items.properties.region_id`
   - Data extraction uses wildcards and filters: `$.data.vehicle_types[?].vehicle_type_id`
   - Array slicing for optimization: `[:1]` to get first element
   - No compile-time validation - wrong paths fail at runtime

2. **JSON Schema Structure Knowledge**:
   - Must know where to inject constraints (properties vs items vs required array)
   - Different patterns for different constraint types (enum vs required vs if/then)
   - Schema structure varies by GBFS version (bikes vs vehicles)

3. **JSONObject Manipulation**:
   - Mix of `DocumentContext.read()`, `JSONObject.put()`, `JSONObject.append()`, `DocumentContext.set()`
   - In-place mutations vs functional returns
   - Manual schema copying to prevent cache mutation

4. **Cross-Feed Data Dependencies**:
   - Rules must extract data from multiple feeds
   - Different feeds have different structures
   - Defensive null handling required throughout

### Maintenance Burden

- Adding a new rule requires 20-80 lines of complex code
- Rules are tightly coupled to schema structure - schema changes break rules
- Backward compatibility adds conditional logic (bikes vs vehicles)
- No abstraction layer - each rule duplicates navigation patterns
- Testing requires understanding entire validation flow

## Problem 2: Risk of Rule Conflicts

### Current Conflict Mitigation

`AbstractVersion.java:158-161`:

```java
// Must make a copy of the schema, otherwise it will be mutated by json-path
return patcher.addRule(
  JsonPath.parse(new JSONObject(schema.toMap())),
  feedMap
).json();
```

- Each rule gets a fresh copy of the schema from the previous rule's output
- Prevents mutation of cached raw schemas
- Rules applied sequentially via stream reduce
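
The copy-then-reduce flow can be sketched without any JSON library. This is a minimal illustration, not the code in `AbstractVersion.java`: `PatchFlowSketch`, its `Rule` interface, and the use of plain maps in place of `JSONObject` are assumptions made to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical stand-in for the patching flow: schemas are plain maps and
// each feed is reduced to a list of IDs so the sketch has no dependencies.
final class PatchFlowSketch {

  // A rule receives (schema, feeds) and returns the patched schema.
  interface Rule
    extends BiFunction<Map<String, Object>, Map<String, List<String>>, Map<String, Object>> {}

  static Map<String, Object> applyRules(
    Map<String, Object> rawSchema,
    List<Rule> rules,
    Map<String, List<String>> feeds
  ) {
    return rules.stream().reduce(
      rawSchema,
      // Each rule works on a copy of the previous result, so the cached raw
      // schema is never mutated. A shallow copy suffices for this flat
      // example; nested schemas would need a deep copy.
      (schema, rule) -> rule.apply(new LinkedHashMap<>(schema), feeds),
      (a, b) -> b // combiner, never invoked for a sequential stream
    );
  }
}
```

Note that the reduce only fixes the order of application. Nothing in it stops a later rule from undoing what an earlier rule wrote.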

### What's NOT Protected

**Scenario 1: Multiple rules modifying same required array**

If two rules both append to `$.properties.data.properties.stations.items.required`:
- First rule adds `vehicle_types_available`
- Second rule adds `vehicle_docks_available`
- **Works correctly** - both fields end up in required array

BUT if second rule **replaces** instead of appending:
```java
requiredArray = new JSONArray().put("vehicle_docks_available");  // Oops, lost first rule's addition
```

**Scenario 2: Rules modifying same property**

If two rules both target `vehicle_type_id`:
- First rule adds: `{ "enum": [...] }`
- Second rule adds: `{ "pattern": "..." }`
- **Works correctly** as long as the two rules write different keywords
- **Second rule silently overwrites the first** if it also calls `vehicleTypeIdSchema.put("enum", ...)`
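
Scenario 2 can be reproduced in a few lines. The sketch below uses plain maps instead of `JSONObject`, and both rules are hypothetical; neither exists in the validator today.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RuleConflictSketch {

  // Rule 1 (hypothetical): restrict vehicle_type_id to IDs found in vehicle_types.
  static void restrictToKnownTypes(Map<String, Object> propertySchema, List<String> knownIds) {
    propertySchema.put("enum", new ArrayList<>(knownIds));
  }

  // Rule 2 (hypothetical): restrict vehicle_type_id to motorized types.
  // It writes the same "enum" key, so rule 1's constraint is discarded.
  static void restrictToMotorizedTypes(Map<String, Object> propertySchema, List<String> motorizedIds) {
    propertySchema.put("enum", new ArrayList<>(motorizedIds));
  }

  // Applies both rules in order and returns the resulting property schema.
  static Map<String, Object> patch(List<String> knownIds, List<String> motorizedIds) {
    Map<String, Object> vehicleTypeIdSchema = new LinkedHashMap<>();
    vehicleTypeIdSchema.put("type", "string");
    restrictToKnownTypes(vehicleTypeIdSchema, knownIds);
    restrictToMotorizedTypes(vehicleTypeIdSchema, motorizedIds);
    return vehicleTypeIdSchema;
  }
}
```

Both rules are correct in isolation. The loss only appears when they are combined, and the resulting schema is still valid, so no error is raised.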

### No Enforcement Mechanism

- Rules are carefully designed by humans to avoid conflicts
- No validation that combined rules produce valid JSON Schema
- No declaration of which schema paths a rule modifies
- Adding new rules requires manual review for conflicts
- Refactoring risks introducing subtle conflicts

## Problem 3: Confusing Validation Reports

### Error Structure

`FileValidationError.java:29-34`:

```java
public record FileValidationError(
  String schemaPath,      // From ValidationException.getSchemaLocation()
  String violationPath,   // From ValidationException.getPointerToViolation()
  String message,
  String keyword
)
```

These values come directly from Everit's `ValidationException` which validates against the **patched schema**.

### The Confusion

**Example Error**:
```json
{
  "schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum",
  "violationPath": "#/data/vehicles/0/pricing_plan_id",
  "message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])",
  "keyword": "enum"
}
```

**User investigates**: Opens `vehicle_status.json` schema and navigates to `properties.data.properties.vehicles.items.properties.pricing_plan_id`:

```json
{
  "pricing_plan_id": {
    "type": "string",
    "description": "The plan_id of the pricing plan this vehicle is eligible for"
  }
}
```

**No enum constraint!** 😕

### Why This Happens

1. Static schema doesn't have the enum constraint
2. `NoInvalidReferenceToPricingPlansInVehicleStatus` dynamically added it:
   ```java
   pricingPlanIdSchema.put("enum", pricingPlanIds);  // Added at runtime
   ```
3. Validation error references the patched schema location
4. User has no way to know this came from a custom rule
5. Error is technically correct but misleading

### User Experience Impact

- **Schema inspection is useless** - Errors reference constraints that aren't in schema files
- **Can't trace error source** - No indication which custom rule caused the error
- **Documentation doesn't help** - Static schema docs don't explain dynamic constraints
- **Debugging is hard** - Must understand entire custom rules system to interpret errors
- **Other validator implementations will have different errors** - No standard way to report these dynamic constraints

## Problem 4: Implementation-Specific Logic

### Current Architecture is Java-Specific

The patching system is tightly coupled to:

1. **Jayway JsonPath library** (Java/JVM)
   - `DocumentContext` API for schema manipulation
   - `Filter` API for complex queries
   - Configuration with `JsonOrgJsonProvider`

2. **org.json JSONObject** (Java)
   - JSONObject/JSONArray manipulation
   - Conversion to/from Maps

3. **Everit JSON Schema Validator** (Java)
   - Schema loading from JSONObject
   - ValidationException structure

4. **Java Streams and Collections**
   - Stream reduce for rule application
   - Map/List for rule registration

### Other Validators Must Reimplement

For a Python validator to implement the same rules:
- Reimplement all 8 custom rules in Python
- Use different JSON manipulation library (likely different API)
- Use different JsonPath library (or write path logic manually)
- Use different schema validator (likely different error structure)
- **Results in different behavior** - No guarantee of identical validation

For a JavaScript/Go/Rust validator:
- Same story - complete reimplementation
- Different libraries, different patterns
- Risk of divergence in rule logic

### No Interoperability

- Each validator implementation has its own custom rules
- No shared definition of what the rules should do
- GBFS spec can't standardize the dynamic constraints
- Validation results differ across implementations
- Users get different errors depending on which validator they use

## Proposed Solution: Schema Templates with Placeholders

### High-Level Concept

Instead of patching schemas at runtime with code, use **schema templates** that declare placeholders for dynamic values:

**Current approach** (code injects enum):
```java
// Code in NoInvalidReferenceToRegionInStationInformation
JSONArray regionIds = JsonPath.parse(systemRegionsFeed)
  .read("$.data.regions[*].region_id");
regionIdSchema.put("enum", regionIds);
```

**Proposed approach** (template with placeholder):
```json
{
  "region_id": {
    "type": "string",
    "description": "ID of the region where station is located",
    "enum": "${VALID_REGION_IDS}"
  }
}
```

Validator performs simple string replacement:
```java
String schema = loadSchemaAsString("station_information.json");
String regionIds = extractRegionIds(feeds.get("system_regions"));  // ["R1","R2","R3"]
schema = schema.replace("\"${VALID_REGION_IDS}\"", regionIds);  // Simple string replace
```

Result after replacement:
```json
{
  "region_id": {
    "type": "string",
    "description": "ID of the region where station is located",
    "enum": ["R1", "R2", "R3"]
  }
}
```
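
The replacement step can be sketched with the JDK alone. This is an illustration under assumptions: `PlaceholderSketch`, `toJsonArray`, and `fill` are hypothetical names, and the IDs are passed in as a plain list rather than extracted from a feed.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PlaceholderSketch {

  // Render IDs as a JSON array literal, escaping backslashes and double quotes.
  // Control characters are not handled; a JSON library would be more robust.
  static String toJsonArray(List<String> ids) {
    return ids.stream()
      .map(id -> "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
      .collect(Collectors.joining(",", "[", "]"));
  }

  // Replace every occurrence of the quoted placeholder, e.g. "${VALID_REGION_IDS}",
  // with the array literal. String.replace matches literally, not as a regex.
  static String fill(String template, String placeholderName, List<String> ids) {
    String quotedPlaceholder = "\"${" + placeholderName + "}\"";
    return template.replace(quotedPlaceholder, toJsonArray(ids));
  }
}
```

The surrounding quotes are part of the match, which is what turns the string value in the template into an array in the processed schema.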

### Benefits

#### 1. Dramatically Simpler Implementation

**Before** (24 lines of complex Java):
```java
public DocumentContext addRule(DocumentContext rawSchemaDocumentContext, Map<String, JSONObject> feeds) {
  JSONObject systemRegionsFeed = feeds.get("system_regions");
  JSONObject regionIdSchema = rawSchemaDocumentContext.read(
    "$.properties.data.properties.stations.items.properties.region_id"
  );
  JSONArray regionIds = systemRegionsFeed != null
    ? JsonPath.parse(systemRegionsFeed).read("$.data.regions[*].region_id")
    : new JSONArray();
  regionIdSchema.put("enum", regionIds);
  return rawSchemaDocumentContext.set(
    "$.properties.data.properties.stations.items.properties.region_id",
    regionIdSchema
  );
}
```

**After** (2 lines of simple text processing, plus a shared `extractIds` helper):
```java
String regionIds = extractIds(feeds.get("system_regions"), "$.data.regions[*].region_id");
schema = schema.replace("\"${VALID_REGION_IDS}\"", regionIds);
```

- No JsonPath navigation of schemas
- No JSONObject manipulation
- No schema structure knowledge required
- Just text find/replace operations

#### 2. Eliminates Rule Conflicts

Templates define exactly where values go:
```json
{
  "required": ["station_id", "num_bikes_available", "${CONDITIONAL_REQUIRED_FIELDS}"],
  "properties": {
    "vehicle_type_id": {
      "type": "string",
      "enum": "${VALID_VEHICLE_TYPE_IDS}"
    }
  }
}
```

- Placeholders are pre-positioned by schema authors
- Rules cannot collide at runtime, because each rule only fills its own placeholder
- Two rules can't modify the same location: each location holds at most one placeholder
- A template processor can check that placeholders are well-formed and that none is left unreplaced
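
One way to check placeholders is to scan the processed schema text for leftovers. The sketch below is hypothetical (`PlaceholderCheckSketch` is not part of the validator) and assumes placeholder names are upper snake case, as in the examples in this issue.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderCheckSketch {

  // Matches ${UPPER_SNAKE_CASE}; the naming convention is an assumption.
  private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Z][A-Z0-9_]*)\\}");

  // Returns the names of all placeholders present in the text, in order of first appearance.
  static Set<String> findPlaceholders(String schemaText) {
    Set<String> names = new LinkedHashSet<>();
    Matcher matcher = PLACEHOLDER.matcher(schemaText);
    while (matcher.find()) {
      names.add(matcher.group(1));
    }
    return names;
  }

  // Fails fast if template processing left any placeholder unreplaced.
  static void requireFullyProcessed(String processedSchema) {
    Set<String> leftover = findPlaceholders(processedSchema);
    if (!leftover.isEmpty()) {
      throw new IllegalStateException("Unreplaced placeholders: " + leftover);
    }
  }
}
```

Running `findPlaceholders` on a raw template also gives a validator the list of values it is expected to supply.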

#### 3. Transparent Validation Reports

Error example with templates:
```json
{
  "schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum",
  "violationPath": "#/data/vehicles/0/pricing_plan_id",
  "message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])",
  "keyword": "enum"
}
```

User opens `vehicle_status.json` **template schema**:
```json
{
  "pricing_plan_id": {
    "type": "string",
    "description": "The plan_id of the pricing plan this vehicle is eligible for",
    "enum": "${VALID_PRICING_PLAN_IDS}"
  }
}
```

**Aha!** 💡 The enum constraint exists in the template with a placeholder. User now understands:
- The enum is populated from system_pricing_plans feed
- The error means their pricing_plan_id doesn't match system_pricing_plans
- The template serves as documentation of the dynamic behavior

#### 4. Interoperability Across Validators

**Schema templates live in upstream GBFS spec repository** (e.g., MobilityData/gbfs)

All validators (Java, Python, JavaScript, Go, Rust, etc.) implement the same simple logic:

**Python validator**:
```python
schema = load_schema_text("station_information.json")
region_ids = extract_ids(feeds["system_regions"], "$.data.regions[*].region_id")
schema = schema.replace('"${VALID_REGION_IDS}"', region_ids)
```

**JavaScript validator**:
```javascript
let schema = loadSchemaText("station_information.json");
const regionIds = extractIds(feeds["system_regions"], "$.data.regions[*].region_id");
// replaceAll: replace() with a string pattern only replaces the first occurrence
schema = schema.replaceAll('"${VALID_REGION_IDS}"', regionIds);
```

**Go validator**:
```go
schema := loadSchemaText("station_information.json")
regionIds := extractIds(feeds["system_regions"], "$.data.regions[*].region_id")
// ReplaceAll so that a placeholder used in several places is filled everywhere
schema = strings.ReplaceAll(schema, `"${VALID_REGION_IDS}"`, regionIds)
```

All produce **identical results** because:
- Same template schemas from upstream
- Same placeholder names
- Same replacement logic
- Same validation behavior

### Implementation Strategy

#### Step 1: Define Placeholder Convention

Propose to GBFS spec maintainers:

**Placeholder Syntax**: `${VARIABLE_NAME}`
- Consistent with many templating systems
- Easy to identify in JSON
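- Unlikely to collide with real JSON values (a literal `${...}` in a value would need escaping), and a quoted placeholder is itself a valid JSON string, so templates still parse as JSON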

**Example Placeholders**:
- `${VALID_PRICING_PLAN_IDS}` - Array of valid pricing plan IDs
- `${VALID_VEHICLE_TYPE_IDS}` - Array of valid vehicle type IDs
- `${VALID_REGION_IDS}` - Array of valid region IDs
- `${CONDITIONAL_REQUIRED_FIELDS}` - Array of conditionally required field names

**Placement Rules**:
- Placeholders for arrays: `"enum": "${VALID_IDS}"` (replace entire value)
- Placeholders for arrays in arrays: `"required": ["station_id", "${CONDITIONAL_FIELDS}"]` (replace item in array)
- Placeholders for objects: `"if": "${CONDITIONAL_SCHEMA}"` (replace entire object)
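
The "arrays in arrays" case needs more care than the other two. Substituting a JSON array literal would produce a nested array, and an empty list must also remove the comma in front of the placeholder. A minimal sketch, assuming the placeholder is never the first item of its array and that field names need no JSON escaping (`ArraySpliceSketch` is a hypothetical name):

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ArraySpliceSketch {

  // Splice field names into the enclosing array in place of the quoted placeholder.
  static String spliceItems(String template, String placeholderName, List<String> fieldNames) {
    // Pattern.quote makes the "${...}" text match literally inside the regex.
    String quoted = Pattern.quote("\"${" + placeholderName + "}\"");
    if (fieldNames.isEmpty()) {
      // Drop the placeholder together with the comma that precedes it.
      return template.replaceAll("\\s*,\\s*" + quoted, "");
    }
    String items = fieldNames.stream()
      .map(name -> "\"" + name + "\"")
      .collect(Collectors.joining(", "));
    // quoteReplacement keeps '$' and '\' in the items from being read as group references.
    return template.replaceAll(quoted, Matcher.quoteReplacement(items));
  }
}
```

Every validator implementation would have to handle this the same way, so the placeholder specification should state it explicitly.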

#### Step 2: Create Template Schemas

Update existing GBFS JSON schemas with placeholders:

**Example: station_information.json**

Before (static schema):
```json
{
  "properties": {
    "region_id": {
      "type": "string",
      "description": "ID of the region where station is located"
    }
  }
}
```

After (template schema):
```json
{
  "properties": {
    "region_id": {
      "type": "string",
      "description": "ID of the region where station is located",
      "enum": "${VALID_REGION_IDS}"
    }
  }
}
```

**Example: station_status.json**

Before:
```json
{
  "required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported"]
}
```

After:
```json
{
  "required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported", "${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS}"]
}
```

#### Step 3: Implement Template Processing

**New class**: `SchemaTemplateProcessor`

```java
public class SchemaTemplateProcessor {

  public String processTemplate(String templateSchema, Map<String, JSONObject> feeds) {
    String processed = templateSchema;

    // Replace each placeholder
    processed = replaceValidPricingPlanIds(processed, feeds);
    processed = replaceValidVehicleTypeIds(processed, feeds);
    processed = replaceValidRegionIds(processed, feeds);
    processed = replaceConditionalRequiredFields(processed, feeds);
    // ... etc

    return processed;
  }

  private String replaceValidRegionIds(String schema, Map<String, JSONObject> feeds) {
    JSONObject systemRegions = feeds.get("system_regions");
    if (systemRegions == null) {
      return schema.replace("\"${VALID_REGION_IDS}\"", "[]");
    }

    JSONArray regionIds = JsonPath.parse(systemRegions)
      .read("$.data.regions[*].region_id");

    return schema.replace("\"${VALID_REGION_IDS}\"", regionIds.toString());
  }

  // Similar methods for other placeholders...
}
```

**Integration** in `AbstractVersion.java`:

```java
public Schema getSchema(String feedName, Map<String, JSONObject> feedMap) {
  String templateSchema = loadSchemaAsString(feedName);  // Load as text, not JSONObject
  String processedSchema = templateProcessor.processTemplate(templateSchema, feedMap);
  return loadSchema(new JSONObject(processedSchema));  // Parse and build validator
}
```

#### Step 4: Maintain Backward Compatibility

During transition, support both approaches:

1. **Flag in configuration**: `useSchemaTemplates` (default: false)
2. **When false**: Use existing CustomRuleSchemaPatcher system
3. **When true**: Use new SchemaTemplateProcessor
4. **Template schemas**: Stored alongside static schemas (e.g., `schema/v2.3/templates/`)
5. **Gradual migration**: One rule at a time, validate results match

Eventually deprecate and remove custom rule patching system.
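
For the "validate results match" step, one simple parity check is to compare the violation paths reported by the two code paths for the same input. The sketch below is an assumption about how such a check could look. `MigrationParitySketch` is hypothetical, and comparing on violation paths alone ignores differences in message text.

```java
import java.util.Set;
import java.util.TreeSet;

final class MigrationParitySketch {

  // Returns the violation paths reported by exactly one of the two approaches.
  // An empty result means both approaches flagged the same locations.
  static Set<String> mismatches(Set<String> patcherPaths, Set<String> templatePaths) {
    Set<String> onlyPatcher = new TreeSet<>(patcherPaths);
    onlyPatcher.removeAll(templatePaths);
    Set<String> onlyTemplate = new TreeSet<>(templatePaths);
    onlyTemplate.removeAll(patcherPaths);
    onlyPatcher.addAll(onlyTemplate);
    return onlyPatcher;
  }
}
```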

#### Step 5: Upstream Contribution

Work with MobilityData/GBFS maintainers:

1. **Propose placeholder specification** - Document placeholder syntax and semantics
2. **Create template schemas** - For all versions (2.1, 2.2, 2.3, 3.0)
3. **Add template documentation** - Explain dynamic constraints in spec
4. **Publish template schemas** - In official GBFS schema repository
5. **Reference in spec** - GBFS specification references template schemas

## Mapping Current Rules to Templates

### Reference Validation Rules → Enum Placeholders

| Current Rule | Template Placeholder | Schema Location |
|-------------|---------------------|-----------------|
| NoInvalidReferenceToPricingPlansInVehicleStatus | `${VALID_PRICING_PLAN_IDS}` | vehicle_status.json, free_bike_status.json → pricing_plan_id/enum |
| NoInvalidReferenceToPricingPlansInVehicleTypes | `${VALID_PRICING_PLAN_IDS}` | vehicle_types.json → default_pricing_plan_id/enum, pricing_plan_ids/items/enum |
| NoInvalidReferenceToRegionInStationInformation | `${VALID_REGION_IDS}` | station_information.json → region_id/enum |
| NoInvalidReferenceToVehicleTypesInStationStatus | `${VALID_VEHICLE_TYPE_IDS}` | station_status.json → vehicle_type_id/enum, vehicle_type_ids/items/enum |

### Conditional Required Fields → Array Placeholders

| Current Rule | Template Placeholder | Schema Location |
|-------------|---------------------|-----------------|
| NoMissingVehicleTypesAvailableWhenVehicleTypesExists | `${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS}` | station_status.json → required (append) |
| NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist | `${CONDITIONAL_REQUIRED_VEHICLE_STATUS_FIELDS}` | vehicle_status.json → required (append) |
| NoMissingStoreUriInSystemInformation | `${CONDITIONAL_REQUIRED_SYSTEM_INFO_FIELDS}` | system_information.json → required (append) |

### Complex Conditional → Object Placeholder

| Current Rule | Template Placeholder | Schema Location |
|-------------|---------------------|-----------------|
| NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles | `${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}` | vehicle_status.json → vehicles/items (merge if/then) |

**Template for motorized vehicles** (in vehicle_status.json):
```json
{
  "items": {
    "allOf": [
      { "$ref": "#/definitions/vehicle" },
      "${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}"
    ]
  }
}
```

**Placeholder value** (computed):
```json
{
  "if": {
    "properties": {
      "vehicle_type_id": { "enum": ["type_1", "type_3"] }
    },
    "required": ["vehicle_type_id"]
  },
  "then": {
    "required": ["current_range_meters"]
  }
}
```
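
Computing the placeholder value could look roughly like this. The sketch assumes the motorized vehicle type IDs have already been extracted and need no JSON escaping, and `MotorizedConditionalSketch` is a hypothetical name. When there are no motorized types it returns the empty schema `{}`, which accepts every instance and leaves the `allOf` unaffected.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MotorizedConditionalSketch {

  // Builds the fragment that replaces "${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}".
  static String conditionalSchema(List<String> motorizedVehicleTypeIds) {
    if (motorizedVehicleTypeIds.isEmpty()) {
      return "{}"; // empty schema: no additional constraint
    }
    String ids = motorizedVehicleTypeIds.stream()
      .map(id -> "\"" + id + "\"") // assumes IDs contain no quotes or backslashes
      .collect(Collectors.joining(","));
    return "{\"if\":{\"properties\":{\"vehicle_type_id\":{\"enum\":[" + ids + "]}},"
      + "\"required\":[\"vehicle_type_id\"]},"
      + "\"then\":{\"required\":[\"current_range_meters\"]}}";
  }
}
```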

## Alternative Approach: Programmatic Validation

Besides schema patching and templates, there is a third approach that was not initially considered: **programmatic validation**, which checks data directly in code rather than modifying schemas or using templates.

### How It Works

Instead of patching schemas or using templates, run additional validation **after** schema validation:

```
Load static schemas → Validate with Everit → Run custom validators → Combine errors → Report
```

**New interface**:
```java
public interface CustomValidator {
  List<FileValidationError> validate(Map<String, JSONObject> feeds);
  String getTargetFeed();
  String getDescription();
}
```

**Example implementation** (programmatic counterpart of `NoInvalidReferenceToRegionInStationInformation`):
```java
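import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.json.JSONArray;
import org.json.JSONObject;

public class ValidateRegionReferences implements CustomValidator {

  @Override
  public List<FileValidationError> validate(Map<String, JSONObject> feeds) {
    List<FileValidationError> errors = new ArrayList<>();

    JSONObject systemRegions = feeds.get("system_regions");
    JSONObject stationInfo = feeds.get("station_information");
    if (systemRegions == null || stationInfo == null) return errors;

    // Extract valid region IDs with plain org.json access. With the
    // JsonOrgJsonProvider, JsonPath returns an org.json JSONArray, which
    // cannot be assigned to List<String>.
    Set<String> validRegionIds = new HashSet<>();
    JSONArray regions = systemRegions.getJSONObject("data").getJSONArray("regions");
    for (int i = 0; i < regions.length(); i++) {
      validRegionIds.add(regions.getJSONObject(i).getString("region_id"));
    }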

    // Check each station
    JSONArray stations = stationInfo.getJSONObject("data").getJSONArray("stations");
    for (int i = 0; i < stations.length(); i++) {
      JSONObject station = stations.getJSONObject(i);

      if (station.has("region_id")) {
        String regionId = station.getString("region_id");

        if (!validRegionIds.contains(regionId)) {
          errors.add(new FileValidationError(
            null,
            "#/data/stations/" + i + "/region_id",
            "region_id '" + regionId + "' does not exist in system_regions",
            "invalid_reference"
          ));
        }
      }
    }

    return errors;
  }

  @Override
  public String getTargetFeed() { return "station_information"; }

  @Override
  public String getDescription() {
    return "Validates region_id values exist in system_regions";
  }
}
```
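
The "run custom validators, then combine errors" step could be wired up roughly as follows. This is a sketch under assumptions: `ValidationError` mirrors the `FileValidationError` record, `Check` is a reduced form of `CustomValidator`, and feeds are typed as opaque objects so the example has no dependency on org.json.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CustomValidationRunnerSketch {

  // Mirrors the fields of the FileValidationError record.
  record ValidationError(String schemaPath, String violationPath, String message, String keyword) {}

  // Reduced form of CustomValidator for this sketch.
  interface Check {
    String targetFeed();
    List<ValidationError> validate(Map<String, Object> feeds);
  }

  // Combines the schema errors for one feed with the errors from every check targeting it.
  static List<ValidationError> validateFeed(
    String feedName,
    List<ValidationError> schemaErrors,
    List<Check> checks,
    Map<String, Object> feeds
  ) {
    List<ValidationError> combined = new ArrayList<>(schemaErrors);
    for (Check check : checks) {
      if (check.targetFeed().equals(feedName)) {
        combined.addAll(check.validate(feeds));
      }
    }
    return combined;
  }
}
```

Because each check only reads the feeds and returns its own list, checks can't interfere with one another, and the order in which they run only affects the order of the reported errors.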

### Advantages

1. **Simpler** - 20-40 lines per rule versus 24-82 for schema patching, and no JsonPath schema navigation
2. **Clearer logic** - Direct iteration and checks, obvious what's being validated
3. **Better error messages** - Custom messages like "region_id 'R999' does not exist in system_regions"
4. **Type safety** - Working with `Set<String>`, not stringly-typed JSONObjects
5. **Easier testing** - Direct unit tests, no schema knowledge required
6. **Faster** - No per-validation schema copying and patching overhead
7. **Quick implementation** - 3-4 weeks total (no upstream coordination)
8. **No conflicts** - Validators are independent, can't interfere

### Disadvantages

1. **No interoperability** - Each validator implementation must reimplement in their language
2. **Validation logic separate from schema** - Can't see full validation picture in schema files
3. **Error format differences** - `schemaPath` is null/N/A for programmatic checks

### Comparison Summary

| Aspect | Schema Patching | Templates | Programmatic |
|--------|----------------|-----------|--------------|
| **Code per rule** | 24-82 lines | ~3 (template) + 20-40 (replacement) | 20-40 lines |
| **Complexity** | High | Low | Low |
| **Readability** | Poor | Good | Excellent |
| **Error messages** | Confusing | Clear | Excellent |
| **Performance** | Slow | Medium | Fast |
| **Interoperability** | None | ⭐⭐⭐⭐⭐ | None |
| **Maintainability** | Poor | Good | Excellent |
| **Type safety** | None | None | Good |
| **Testing** | Hard | Medium | Easy |
| **Implementation time** | N/A (current) | 2-3 months | 3-4 weeks |
| **Ecosystem benefit** | None | High | Low |

## Decision Framework

The right choice depends on the project's goals:

### Choose Option A (Programmatic Validation) if:

- ✅ This is the primary/only GBFS validator implementation
- ✅ Simplicity and maintainability are top priorities
- ❌ Ecosystem-wide standardization is not a primary goal

**Implementation effort**: 3-4 weeks

### Choose Option B (Schema Templates) if:

- ✅ Multiple GBFS validator implementations need to stay synchronized
- ✅ Ecosystem-wide interoperability is a primary goal
- ✅ Schemas should document all validation rules (transparency)

**Implementation effort**: 2-3 months (including upstream contribution)

### Never Choose: Current Schema Patching ❌

The current approach has **no advantages** over either alternative:
- ❌ Most complex (JsonPath + schema structure knowledge)
- ❌ Worst error messages (phantom schema references)
- ❌ No interoperability anyway
- ❌ Hard to maintain and test
- ❌ Slowest performance

## Conclusion

**Current assessment**: The schema patching approach should be replaced with one of the two alternatives.

Both alternatives are significantly better than the current approach:
- **Option A (Programmatic)**: Best for developer experience, maintainability, and quick wins
- **Option B (Templates)**: Best for ecosystem standardization and interoperability

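**Neither option requires long-term backwards compatibility support** - both allow clean migration, and the `useSchemaTemplates` flag described for Option B is only needed during the transition.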



