I want to explore the idea presented in MobilityData/gbfs-validator#153 as an alternative to patching existing schemas, while keeping the benefit of schema validation for these rules. I also consider a programmatic (non-interoperable) alternative.
A detailed analysis written with help from Claude:
Summary
The GBFS Validator currently implements custom validation rules (that cannot be expressed in static JSON schemas) by dynamically patching schemas at runtime. This approach has significant problems:
- High implementation complexity - Rules require deep knowledge of JSON Schema structure and JsonPath API
- Risk of rule conflicts - No enforcement mechanism prevents rules from interfering with each other
- Confusing validation reports - Errors reference constraints that don't exist in static schemas
- Implementation-specific logic - Other GBFS validators must reimplement the entire patching system
Two viable alternatives have been identified:
Option A: Programmatic Validation
Run custom validation checks after schema validation, generating errors directly in code.
- Easiest to maintain - Clear validation logic
- ❌ No interoperability (each validator reimplements)
Option B: Schema Templates with Placeholders
Replace placeholders in schema templates with actual values at runtime.
- Interoperable - All validators use same templates from upstream GBFS spec
- Transparent - Templates visible in schema files
- Standards-based - Single source of truth
- ❌ Requires upstream coordination
- ❌ Only valuable for multi-implementation ecosystem
Current Implementation Analysis
How Schema Patching Works Today
The validation flow:
- Load static JSON schemas from src/main/resources/schema/v{version}/{feedName}.json
- For each feed being validated, retrieve applicable custom rules
- Apply rules sequentially, each receiving:
  - A JsonPath DocumentContext wrapping the schema JSON
  - A map of all loaded GBFS feeds
- Rules extract data from feeds (e.g., valid pricing plan IDs) and inject into schemas:
  - Add enum constraints for reference validation
  - Add required field constraints based on feed presence
  - Build if/then/else conditional schemas
- Convert patched JSONObject to Everit Schema and validate
Key Files:
CustomRuleSchemaPatcher.java:31-42 - Interface all rules implement
AbstractVersion.java:118-162 - Orchestrates rule application via stream reduce
FileValidator.java:59-84 - Entry point for validation
Current Custom Rules (8 total)
All rules in gbfs-validator-java/src/main/java/org/entur/gbfs/validation/validator/rules/:
Reference Validation Rules (enum constraints):
1. NoInvalidReferenceToPricingPlansInVehicleStatus - pricing_plan_id must exist in system_pricing_plans
2. NoInvalidReferenceToPricingPlansInVehicleTypes - pricing plan IDs in vehicle_types must be valid
3. NoInvalidReferenceToRegionInStationInformation - region_id must exist in system_regions
4. NoInvalidReferenceToVehicleTypesInStationStatus - vehicle_type_id must exist in vehicle_types
Conditional Required Field Rules:
5. NoMissingVehicleTypesAvailableWhenVehicleTypesExists - vehicle_types_available required when vehicle_types feed exists
6. NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist - vehicle_type_id required and valid when vehicle_types exists
7. NoMissingStoreUriInSystemInformation - rental_apps required when rental_uris exist in stations/vehicles
8. NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles - current_range_meters required for motorized vehicles (if/then/else schema)
Implementation Complexity Examples
Simple Rule: Reference Validation (24 logic lines)
NoInvalidReferenceToRegionInStationInformation.java:41-58:
@Override
public DocumentContext addRule(
DocumentContext rawSchemaDocumentContext,
Map<String, JSONObject> feeds
) {
// Extract valid region IDs from system_regions feed
JSONObject systemRegionsFeed = feeds.get("system_regions");
JSONArray regionIds = systemRegionsFeed != null
? JsonPath.parse(systemRegionsFeed).read("$.data.regions[*].region_id")
: new JSONArray();
// Navigate to region_id property in schema (6 levels deep)
JSONObject regionIdSchema = rawSchemaDocumentContext.read(
"$.properties.data.properties.stations.items.properties.region_id"
);
// Add enum constraint
regionIdSchema.put("enum", regionIds);
// Write back to schema
return rawSchemaDocumentContext.set(
"$.properties.data.properties.stations.items.properties.region_id",
regionIdSchema
);
}
Complexity factors:
- Must construct correct JsonPath expression (error-prone strings)
- Requires understanding JSON Schema structure (where to inject constraint)
- Mix of read/modify/write operations on JSONObjects
- Defensive null handling for missing feeds
- No compile-time safety for schema paths
Complex Rule: Multi-Feed Conditional (82 logic lines)
NoMissingStoreUriInSystemInformation.java:47-129:
This rule checks two different feeds (vehicle_status AND station_information) to determine if rental_apps should be required in system_information:
// Check vehicle_status for rental URIs
JSONObject vehicleStatusFeed = feeds.get(vehicleStatusFileName);
String vehiclesKey = vehicleStatusFileName.equals("vehicle_status")
? "vehicles" : "bikes"; // Backward compatibility
if (!((JSONArray) JsonPath.parse(vehicleStatusFeed)
.read("$.data." + vehiclesKey + "[:1].rental_uris.ios")).isEmpty()) {
hasIosRentalUris = true;
}
// Check station_information for rental URIs
JSONObject stationInformationFeed = feeds.get("station_information");
if (!((JSONArray) JsonPath.parse(stationInformationFeed)
.read("$.data.stations[:1].rental_uris.ios")).isEmpty()) {
hasIosRentalUris = true;
}
// Conditionally modify system_information schema
if (hasIosRentalUris || hasAndroidRentalUris) {
JSONArray dataRequired = rawSchemaDocumentContext.read("$.properties.data.required");
dataRequired.put("rental_apps");
JSONObject rentalAppsSchema = rawSchemaDocumentContext.read(
"$.properties.data.properties.rental_apps"
);
JSONArray rentalAppRequired = new JSONArray();
if (hasIosRentalUris) rentalAppRequired.put("ios");
if (hasAndroidRentalUris) rentalAppRequired.put("android");
rentalAppsSchema.put("required", rentalAppRequired);
}
Additional complexity:
- Dynamic JsonPath construction based on feed type
- Checks multiple feeds with different structures
- Accumulates boolean flags across feeds
- Modifies multiple schema locations
- Handles backward compatibility with legacy feeds
Most Complex: Conditional Schema with Filters (64 logic lines)
NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles.java:56-119:
Uses JsonPath filters to extract only motorized vehicle types, then builds an if/then/else schema:
// Filter for motorized vehicles
private static final Filter motorizedVehicleTypesFilter = Filter.filter(
where("propulsion_type").in(List.of("electric_assist", "electric", "combustion"))
);
// Extract motorized vehicle type IDs
JSONArray motorizedVehicleTypeIds = JsonPath.parse(vehicleTypesFeed)
.read("$.data.vehicle_types[?].vehicle_type_id", motorizedVehicleTypesFilter);
// Build complex if/then schema
bikeItemsSchema
.put("if", new JSONObject()
.put("properties", new JSONObject()
.put("vehicle_type_id", new JSONObject().put("enum", motorizedVehicleTypeIds))
)
.put("required", new JSONArray().put("vehicle_type_id"))
)
.put("then", new JSONObject()
.put("required", new JSONArray().put("current_range_meters"))
);
Builds JSON Schema conditional: "If vehicle_type_id is a motorized type, then current_range_meters is required"
Problem 1: Implementation Complexity
What Makes Rules Hard to Write
- JsonPath Expertise Required:
  - Schema paths are 4-6 levels deep: $.properties.data.properties.stations.items.properties.region_id
  - Data extraction uses wildcards and filters: $.data.vehicle_types[?].vehicle_type_id
  - Array slicing for optimization: [:1] to get first element
  - No compile-time validation - wrong paths fail at runtime
- JSON Schema Structure Knowledge:
  - Must know where to inject constraints (properties vs items vs required array)
  - Different patterns for different constraint types (enum vs required vs if/then)
  - Schema structure varies by GBFS version (bikes vs vehicles)
- JSONObject Manipulation:
  - Mix of DocumentContext.read(), JSONObject.put(), JSONObject.append(), DocumentContext.set()
  - In-place mutations vs functional returns
  - Manual schema copying to prevent cache mutation
- Cross-Feed Data Dependencies:
  - Rules must extract data from multiple feeds
  - Different feeds have different structures
  - Defensive null handling required throughout
Maintenance Burden
- Adding a new rule requires 20-80 lines of complex code
- Rules are tightly coupled to schema structure - schema changes break rules
- Backward compatibility adds conditional logic (bikes vs vehicles)
- No abstraction layer - each rule duplicates navigation patterns
- Testing requires understanding entire validation flow
Problem 2: Risk of Rule Conflicts
Current Conflict Mitigation
AbstractVersion.java:158-161:
// Must make a copy of the schema, otherwise it will be mutated by json-path
return patcher.addRule(
JsonPath.parse(new JSONObject(schema.toMap())),
feedMap
).json();
- Each rule gets a fresh copy of the schema from the previous rule's output
- Prevents mutation of cached raw schemas
- Rules applied sequentially via stream reduce
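The copy-then-apply flow can be illustrated without Jayway or org.json. The following is a minimal sketch, assuming plain maps and lists stand in for JSONObject and JSONArray; `SchemaPatcher`, `deepCopy`, and `applyAll` are hypothetical names for illustration, not the validator's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PatchPipelineSketch {

    /** Hypothetical stand-in for CustomRuleSchemaPatcher: takes a schema, returns the patched schema. */
    interface SchemaPatcher {
        Map<String, Object> addRule(Map<String, Object> schema);
    }

    /** Recursively copies nested maps and lists so a rule cannot mutate the cached original. */
    @SuppressWarnings("unchecked")
    static Object deepCopy(Object node) {
        if (node instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            ((Map<String, Object>) node).forEach((k, v) -> copy.put(k, deepCopy(v)));
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            ((List<Object>) node).forEach(v -> copy.add(deepCopy(v)));
            return copy;
        }
        return node; // strings, numbers, and booleans are immutable
    }

    /** Applies rules in order; each rule receives a fresh copy of the previous rule's output. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> applyAll(Map<String, Object> rawSchema, List<SchemaPatcher> rules) {
        return rules.stream().reduce(
            rawSchema,
            (schema, rule) -> rule.addRule((Map<String, Object>) deepCopy(schema)),
            (a, b) -> b // combiner is not used by a sequential stream
        );
    }
}
```

The copy protects the cached raw schema, but nothing in this flow stops a later rule from discarding what an earlier rule added to its input.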
What's NOT Protected
Scenario 1: Multiple rules modifying same required array
If two rules both append to $.properties.data.properties.stations.items.required:
- First rule adds vehicle_types_available
- Second rule adds vehicle_docks_available
- Works correctly - both fields end up in required array
BUT if second rule replaces instead of appending:
requiredArray = new JSONArray().put("vehicle_docks_available"); // Oops, lost first rule's addition
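Scenario 1 can be reproduced with plain collections. This is an illustrative sketch only: the three rule constants are hypothetical and model the required array as a `List<String>` rather than a JSONArray.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class RequiredConflictSketch {

    /** Well-behaved rule: appends to the existing required list. */
    static final UnaryOperator<List<String>> APPEND_VEHICLE_TYPES = required -> {
        List<String> out = new ArrayList<>(required);
        out.add("vehicle_types_available");
        return out;
    };

    /** Well-behaved second rule: also appends. */
    static final UnaryOperator<List<String>> APPEND_DOCKS = required -> {
        List<String> out = new ArrayList<>(required);
        out.add("vehicle_docks_available");
        return out;
    };

    /** Buggy second rule: builds a new list and discards what was already there. */
    static final UnaryOperator<List<String>> REPLACE_WITH_DOCKS = required -> {
        List<String> out = new ArrayList<>();
        out.add("vehicle_docks_available");
        return out;
    };

    /** Applies the rules in order, feeding each rule the previous rule's output. */
    static List<String> run(List<String> initial, List<UnaryOperator<List<String>>> rules) {
        List<String> current = initial;
        for (UnaryOperator<List<String>> rule : rules) {
            current = rule.apply(current);
        }
        return current;
    }
}
```

Starting from `["station_id"]`, running the two appending rules yields all three fields, while swapping in `REPLACE_WITH_DOCKS` yields only `vehicle_docks_available`. Both runs complete without any error, which is why this kind of conflict goes unnoticed.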
Scenario 2: Rules modifying same property
If two rules both target vehicle_type_id:
- First rule adds: { "enum": [...] }
- Second rule adds: { "pattern": "..." }
- Second rule could overwrite if it does vehicleTypeIdSchema.put("enum", ...) again
No Enforcement Mechanism
- Rules are carefully designed by humans to avoid conflicts
- No validation that combined rules produce valid JSON Schema
- No declaration of which schema paths a rule modifies
- Adding new rules requires manual review for conflicts
- Refactoring risks introducing subtle conflicts
Problem 3: Confusing Validation Reports
Error Structure
FileValidationError.java:29-34:
public record FileValidationError(
String schemaPath, // From ValidationException.getSchemaLocation()
String violationPath, // From ValidationException.getPointerToViolation()
String message,
String keyword
)
These values come directly from Everit's ValidationException which validates against the patched schema.
The Confusion
Example Error:
{
"schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum",
"violationPath": "#/data/vehicles/0/pricing_plan_id",
"message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])",
"keyword": "enum"
}
User investigates: Opens vehicle_status.json schema and navigates to properties.data.properties.vehicles.items.properties.pricing_plan_id:
{
"pricing_plan_id": {
"type": "string",
"description": "The plan_id of the pricing plan this vehicle is eligible for"
}
}
No enum constraint! 😕
Why This Happens
- Static schema doesn't have the enum constraint
- NoInvalidReferenceToPricingPlansInVehicleStatus dynamically added it:
  pricingPlanIdSchema.put("enum", pricingPlanIds); // Added at runtime
- Validation error references the patched schema location
- User has no way to know this came from a custom rule
- Error is technically correct but misleading
User Experience Impact
- Schema inspection is useless - Errors reference constraints that aren't in schema files
- Can't trace error source - No indication which custom rule caused the error
- Documentation doesn't help - Static schema docs don't explain dynamic constraints
- Debugging is hard - Must understand entire custom rules system to interpret errors
- Other validator implementations will have different errors - No standard way to report these dynamic constraints
Problem 4: Implementation-Specific Logic
Current Architecture is Java-Specific
The patching system is tightly coupled to:
- Jayway JsonPath library (Java/JVM)
  - DocumentContext API for schema manipulation
  - Filter API for complex queries
  - Configuration with JsonOrgJsonProvider
- org.json JSONObject (Java)
  - JSONObject/JSONArray manipulation
  - Conversion to/from Maps
- Everit JSON Schema Validator (Java)
  - Schema loading from JSONObject
  - ValidationException structure
- Java Streams and Collections
  - Stream reduce for rule application
  - Map/List for rule registration
Other Validators Must Reimplement
For a Python validator to implement the same rules:
- Reimplement all 8 custom rules in Python
- Use different JSON manipulation library (likely different API)
- Use different JsonPath library (or write path logic manually)
- Use different schema validator (likely different error structure)
- Results in different behavior - No guarantee of identical validation
For a JavaScript/Go/Rust validator:
- Same story - complete reimplementation
- Different libraries, different patterns
- Risk of divergence in rule logic
No Interoperability
- Each validator implementation has its own custom rules
- No shared definition of what the rules should do
- GBFS spec can't standardize the dynamic constraints
- Validation results differ across implementations
- Users get different errors depending on which validator they use
Proposed Solution: Schema Templates with Placeholders
High-Level Concept
Instead of patching schemas at runtime with code, use schema templates that declare placeholders for dynamic values:
Current approach (code injects enum):
// Code in NoInvalidReferenceToRegionInStationInformation
JSONArray regionIds = JsonPath.parse(systemRegionsFeed)
.read("$.data.regions[*].region_id");
regionIdSchema.put("enum", regionIds);
Proposed approach (template with placeholder):
{
"region_id": {
"type": "string",
"description": "ID of the region where station is located",
"enum": "${VALID_REGION_IDS}"
}
}
Validator performs simple string replacement:
String schema = loadSchemaAsString("station_information.json");
String regionIds = extractRegionIds(feeds.get("system_regions")); // ["R1","R2","R3"]
schema = schema.replace("\"${VALID_REGION_IDS}\"", regionIds); // Simple string replace
Result after replacement:
{
"region_id": {
"type": "string",
"description": "ID of the region where station is located",
"enum": ["R1", "R2", "R3"]
}
}
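The replacement step only produces valid JSON if the injected IDs are serialized as a proper JSON array. The sketch below makes that explicit using only the standard library; `quote`, `toJsonArray`, and `fill` are hypothetical helper names, and the ID list is assumed to have been extracted from the feed already.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PlaceholderSketch {

    /** Wraps a value in double quotes, escaping characters that are not allowed raw in a JSON string. */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes IDs as a JSON array literal, e.g. ["R1","R2"]. An empty list becomes []. */
    static String toJsonArray(List<String> ids) {
        return ids.stream().map(PlaceholderSketch::quote).collect(Collectors.joining(",", "[", "]"));
    }

    /** Replaces the quoted placeholder (surrounding quotes included) with a JSON array literal. */
    static String fill(String template, String placeholderName, List<String> ids) {
        String token = "\"${" + placeholderName + "}\"";
        return template.replace(token, toJsonArray(ids));
    }
}
```

For example, `fill(template, "VALID_REGION_IDS", List.of("R1", "R2", "R3"))` turns `"enum": "${VALID_REGION_IDS}"` into `"enum": ["R1","R2","R3"]`. Because `String.replace` treats its arguments literally, the `$` and braces need no regex escaping.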
Benefits
1. Dramatically Simpler Implementation
Before (24 lines of complex Java):
public DocumentContext addRule(DocumentContext rawSchemaDocumentContext, Map<String, JSONObject> feeds) {
JSONObject systemRegionsFeed = feeds.get("system_regions");
JSONObject regionIdSchema = rawSchemaDocumentContext.read(
"$.properties.data.properties.stations.items.properties.region_id"
);
JSONArray regionIds = systemRegionsFeed != null
? JsonPath.parse(systemRegionsFeed).read("$.data.regions[*].region_id")
: new JSONArray();
regionIdSchema.put("enum", regionIds);
return rawSchemaDocumentContext.set(
"$.properties.data.properties.stations.items.properties.region_id",
regionIdSchema
);
}
After (2 lines of simple text processing):
String regionIds = extractIds(feeds.get("system_regions"), "$.data.regions[*].region_id");
schema = schema.replace("\"${VALID_REGION_IDS}\"", regionIds);
- No JsonPath navigation of schemas
- No JSONObject manipulation
- No schema structure knowledge required
- Just text find/replace operations
2. Eliminates Rule Conflicts
Templates define exactly where values go:
{
"required": ["station_id", "num_bikes_available", "${CONDITIONAL_REQUIRED_FIELDS}"],
"properties": {
"vehicle_type_id": {
"type": "string",
"enum": "${VALID_VEHICLE_TYPE_IDS}"
}
}
}
- Placeholders are pre-positioned by schema authors
- No runtime conflict possible
- Multiple rules can't modify same location - only one placeholder per location
- Template processing can verify that every placeholder is well-formed and has been replaced
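One way to enforce the last point is to scan the processed schema for leftover placeholders before handing it to the validator. This is a sketch under assumptions: the upper-case naming pattern and the names `findPlaceholders` and `requireFullyResolved` are illustrative, not part of any agreed convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderCheckSketch {

    // Matches ${NAME} where NAME is upper-case letters, digits, or underscores (assumed naming convention).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Z][A-Z0-9_]*)\\}");

    /** Returns the names of all placeholders present in the text, in order of appearance. */
    static List<String> findPlaceholders(String schemaText) {
        List<String> names = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(schemaText);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Throws if any placeholder survived processing, so an unfilled template never reaches the validator. */
    static void requireFullyResolved(String processedSchema) {
        List<String> leftover = findPlaceholders(processedSchema);
        if (!leftover.isEmpty()) {
            throw new IllegalStateException("Unresolved placeholders: " + leftover);
        }
    }
}
```

The same `findPlaceholders` call could also be run on the raw template to list which dynamic constraints a schema declares.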
3. Transparent Validation Reports
Error example with templates:
{
"schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum",
"violationPath": "#/data/vehicles/0/pricing_plan_id",
"message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])",
"keyword": "enum"
}
User opens vehicle_status.json template schema:
{
"pricing_plan_id": {
"type": "string",
"description": "The plan_id of the pricing plan this vehicle is eligible for",
"enum": "${VALID_PRICING_PLAN_IDS}"
}
}
Aha! 💡 The enum constraint exists in the template with a placeholder. User now understands:
- The enum is populated from system_pricing_plans feed
- The error means their pricing_plan_id doesn't match system_pricing_plans
- The template serves as documentation of the dynamic behavior
4. Interoperability Across Validators
Schema templates live in upstream GBFS spec repository (e.g., MobilityData/gbfs)
All validators (Java, Python, JavaScript, Go, Rust, etc.) implement the same simple logic:
Python validator:
schema = load_schema_text("station_information.json")
region_ids = extract_ids(feeds["system_regions"], "$.data.regions[*].region_id")
schema = schema.replace('"${VALID_REGION_IDS}"', region_ids)
JavaScript validator:
let schema = loadSchemaText("station_information.json");
const regionIds = extractIds(feeds["system_regions"], "$.data.regions[*].region_id");
schema = schema.replace('"${VALID_REGION_IDS}"', regionIds);
Go validator:
schema := loadSchemaText("station_information.json")
regionIds := extractIds(feeds["system_regions"], "$.data.regions[*].region_id")
schema = strings.Replace(schema, `"${VALID_REGION_IDS}"`, regionIds, 1)
All produce identical results because:
- Same template schemas from upstream
- Same placeholder names
- Same replacement logic
- Same validation behavior
Implementation Strategy
Step 1: Define Placeholder Convention
Propose to GBFS spec maintainers:
Placeholder Syntax: ${VARIABLE_NAME}
- Consistent with many templating systems
- Easy to identify in JSON
- Unlikely to collide with real schema values: the placeholder sits inside an ordinary JSON string, so the template itself remains parseable JSON
Example Placeholders:
${VALID_PRICING_PLAN_IDS} - Array of valid pricing plan IDs
${VALID_VEHICLE_TYPE_IDS} - Array of valid vehicle type IDs
${VALID_REGION_IDS} - Array of valid region IDs
${CONDITIONAL_REQUIRED_FIELDS} - Array of conditionally required field names
Placement Rules:
- Placeholders for arrays: "enum": "${VALID_IDS}" (replace entire value)
- Placeholders for arrays in arrays: "required": ["station_id", "${CONDITIONAL_FIELDS}"] (replace item in array)
- Placeholders for objects: "if": "${CONDITIONAL_SCHEMA}" (replace entire object)
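The "replace item in array" case needs more care than whole-value replacement. Substituting a JSON array for the quoted placeholder would nest an array inside `required`, and substituting nothing would leave a dangling comma. The sketch below splices the items in place instead; `spliceItems` is a hypothetical helper, and it assumes field names need no JSON escaping and that the placeholder is either the last item or the only item in its array.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ArraySpliceSketch {

    /**
     * Splices field names into an array at the position of an item placeholder.
     * The optional preceding comma is consumed as well, so an empty list still leaves valid JSON.
     */
    static String spliceItems(String template, String placeholderName, List<String> fields) {
        Pattern token = Pattern.compile(
            "(,\\s*)?\"" + Pattern.quote("${" + placeholderName + "}") + "\"");
        String items = fields.stream()
            .map(f -> "\"" + f + "\"")
            .collect(Collectors.joining(", "));
        Matcher m = token.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Re-emit the comma only if there was one and there is something to put after it.
            String leadingComma = (m.group(1) != null && !fields.isEmpty()) ? ", " : "";
            m.appendReplacement(out, Matcher.quoteReplacement(leadingComma + items));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the template `"required": ["station_id", "${CONDITIONAL_FIELDS}"]`, a one-element list produces `["station_id", "vehicle_types_available"]` and an empty list produces `["station_id"]`. A placeholder convention proposed upstream would need to state which of these splice semantics every validator must implement.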
Step 2: Create Template Schemas
Update existing GBFS JSON schemas with placeholders:
Example: station_information.json
Before (static schema):
{
"properties": {
"region_id": {
"type": "string",
"description": "ID of the region where station is located"
}
}
}
After (template schema):
{
"properties": {
"region_id": {
"type": "string",
"description": "ID of the region where station is located",
"enum": "${VALID_REGION_IDS}"
}
}
}
Example: station_status.json
Before:
{
"required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported"]
}
After:
{
"required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported", "${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS}"]
}
Step 3: Implement Template Processing
New class: SchemaTemplateProcessor
public class SchemaTemplateProcessor {
public String processTemplate(String templateSchema, Map<String, JSONObject> feeds) {
String processed = templateSchema;
// Replace each placeholder
processed = replaceValidPricingPlanIds(processed, feeds);
processed = replaceValidVehicleTypeIds(processed, feeds);
processed = replaceValidRegionIds(processed, feeds);
processed = replaceConditionalRequiredFields(processed, feeds);
// ... etc
return processed;
}
private String replaceValidRegionIds(String schema, Map<String, JSONObject> feeds) {
JSONObject systemRegions = feeds.get("system_regions");
if (systemRegions == null) {
return schema.replace("\"${VALID_REGION_IDS}\"", "[]");
}
JSONArray regionIds = JsonPath.parse(systemRegions)
.read("$.data.regions[*].region_id");
return schema.replace("\"${VALID_REGION_IDS}\"", regionIds.toString());
}
// Similar methods for other placeholders...
}
Integration in AbstractVersion.java:
public Schema getSchema(String feedName, Map<String, JSONObject> feedMap) {
String templateSchema = loadSchemaAsString(feedName); // Load as text, not JSONObject
String processedSchema = templateProcessor.processTemplate(templateSchema, feedMap);
return loadSchema(new JSONObject(processedSchema)); // Parse and build validator
}
Step 4: Maintain Backward Compatibility
During transition, support both approaches:
- Flag in configuration: useSchemaTemplates (default: false)
- When false: Use existing CustomRuleSchemaPatcher system
- When true: Use new SchemaTemplateProcessor
- Template schemas: Stored alongside static schemas (e.g., schema/v2.3/templates/)
- Gradual migration: One rule at a time, validate results match
Eventually deprecate and remove custom rule patching system.
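The "validate results match" step of the gradual migration could be automated by running both paths on the same feeds and comparing the reported violations. The following is a rough sketch; `Violation`, `validate`, and `difference` are hypothetical names, and the two suppliers stand in for the existing patcher path and the new template path.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

final class MigrationParitySketch {

    /** Minimal error shape for comparison; mirrors two fields of FileValidationError. */
    record Violation(String violationPath, String keyword) {}

    /** Picks the validation path based on the useSchemaTemplates flag. */
    static List<Violation> validate(boolean useSchemaTemplates,
                                    Supplier<List<Violation>> patcherPath,
                                    Supplier<List<Violation>> templatePath) {
        return useSchemaTemplates ? templatePath.get() : patcherPath.get();
    }

    /** Returns violations reported by exactly one of the two paths; empty means the results match. */
    static Set<Violation> difference(List<Violation> fromPatcher, List<Violation> fromTemplates) {
        Set<Violation> onlyPatcher = new HashSet<>(fromPatcher);
        onlyPatcher.removeAll(fromTemplates);
        Set<Violation> onlyTemplates = new HashSet<>(fromTemplates);
        onlyTemplates.removeAll(fromPatcher);
        onlyPatcher.addAll(onlyTemplates);
        return onlyPatcher;
    }
}
```

Comparing on `violationPath` and `keyword` rather than on the full message keeps the check stable if the two paths word their errors differently.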
Step 5: Upstream Contribution
Work with MobilityData/GBFS maintainers:
- Propose placeholder specification - Document placeholder syntax and semantics
- Create template schemas - For all versions (2.1, 2.2, 2.3, 3.0)
- Add template documentation - Explain dynamic constraints in spec
- Publish template schemas - In official GBFS schema repository
- Reference in spec - GBFS specification references template schemas
Mapping Current Rules to Templates
Reference Validation Rules → Enum Placeholders
| Current Rule | Template Placeholder | Schema Location |
| --- | --- | --- |
| NoInvalidReferenceToPricingPlansInVehicleStatus | ${VALID_PRICING_PLAN_IDS} | vehicle_status.json, free_bike_status.json → pricing_plan_id/enum |
| NoInvalidReferenceToPricingPlansInVehicleTypes | ${VALID_PRICING_PLAN_IDS} | vehicle_types.json → default_pricing_plan_id/enum, pricing_plan_ids/items/enum |
| NoInvalidReferenceToRegionInStationInformation | ${VALID_REGION_IDS} | station_information.json → region_id/enum |
| NoInvalidReferenceToVehicleTypesInStationStatus | ${VALID_VEHICLE_TYPE_IDS} | station_status.json → vehicle_type_id/enum, vehicle_type_ids/items/enum |
Conditional Required Fields → Array Placeholders
| Current Rule | Template Placeholder | Schema Location |
| --- | --- | --- |
| NoMissingVehicleTypesAvailableWhenVehicleTypesExists | ${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS} | station_status.json → required (append) |
| NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist | ${CONDITIONAL_REQUIRED_VEHICLE_STATUS_FIELDS} | vehicle_status.json → required (append) |
| NoMissingStoreUriInSystemInformation | ${CONDITIONAL_REQUIRED_SYSTEM_INFO_FIELDS} | system_information.json → required (append) |
Complex Conditional → Object Placeholder
| Current Rule | Template Placeholder | Schema Location |
| --- | --- | --- |
| NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles | ${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA} | vehicle_status.json → vehicles/items (merge if/then) |
Template for motorized vehicles (in vehicle_status.json):
{
"items": {
"allOf": [
{ "$ref": "#/definitions/vehicle" },
"${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}"
]
}
}
Placeholder value (computed):
{
"if": {
"properties": {
"vehicle_type_id": { "enum": ["type_1", "type_3"] }
},
"required": ["vehicle_type_id"]
},
"then": {
"required": ["current_range_meters"]
}
}
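Computing the placeholder value and substituting it could look like the sketch below. It uses string building only; `buildConditional` and `fill` are hypothetical names, the motorized type IDs are assumed to have been extracted from the vehicle_types feed already, and IDs are assumed to contain no characters that need JSON escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MotorizedConditionalSketch {

    /** Builds the if/then object as JSON text from the motorized vehicle type IDs. */
    static String buildConditional(List<String> motorizedVehicleTypeIds) {
        String ids = motorizedVehicleTypeIds.stream()
            .map(id -> "\"" + id + "\"")
            .collect(Collectors.joining(", ", "[", "]"));
        return "{ \"if\": { \"properties\": { \"vehicle_type_id\": { \"enum\": " + ids + " } }, "
            + "\"required\": [\"vehicle_type_id\"] }, "
            + "\"then\": { \"required\": [\"current_range_meters\"] } }";
    }

    /** Replaces the quoted object placeholder (surrounding quotes included) with the computed conditional. */
    static String fill(String template, List<String> motorizedVehicleTypeIds) {
        return template.replace(
            "\"${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}\"",
            buildConditional(motorizedVehicleTypeIds)
        );
    }
}
```

If the feed has no motorized vehicle types, this produces an empty `enum`, which some schema validators may reject. A placeholder specification would likely need to define a fallback value for that case, such as an empty object.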
Alternative Approach: Programmatic Validation
There's a third option that was not initially considered: programmatic validation - checking data directly in code rather than modifying schemas or using templates.
How It Works
Instead of patching schemas or using templates, run additional validation after schema validation:
Load static schemas → Validate with Everit → Run custom validators → Combine errors → Report
New interface:
public interface CustomValidator {
List<FileValidationError> validate(Map<String, JSONObject> feeds);
String getTargetFeed();
String getDescription();
}
Example implementation (about 30 lines; for comparison, the patching rules above range from 24 to 82 logic lines):
public class ValidateRegionReferences implements CustomValidator {
@Override
public List<FileValidationError> validate(Map<String, JSONObject> feeds) {
List<FileValidationError> errors = new ArrayList<>();
JSONObject systemRegions = feeds.get("system_regions");
JSONObject stationInfo = feeds.get("station_information");
if (systemRegions == null || stationInfo == null) return errors;
// Extract valid region IDs
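// With JsonOrgJsonProvider configured, read() returns an org.json JSONArray
JSONArray regionIdArray = JsonPath.parse(systemRegions)
.read("$.data.regions[*].region_id");
Set<String> validRegionIds = new HashSet<>();
for (int i = 0; i < regionIdArray.length(); i++) {
validRegionIds.add(regionIdArray.getString(i));
}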
// Check each station
JSONArray stations = stationInfo.getJSONObject("data").getJSONArray("stations");
for (int i = 0; i < stations.length(); i++) {
JSONObject station = stations.getJSONObject(i);
if (station.has("region_id")) {
String regionId = station.getString("region_id");
if (!validRegionIds.contains(regionId)) {
errors.add(new FileValidationError(
null,
"#/data/stations/" + i + "/region_id",
"region_id '" + regionId + "' does not exist in system_regions",
"invalid_reference"
));
}
}
}
return errors;
}
@Override
public String getTargetFeed() { return "station_information"; }
@Override
public String getDescription() {
return "Validates region_id values exist in system_regions";
}
}
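The "run custom validators, then combine errors" step of the flow could be wired up roughly as follows. This is a simplified sketch: `ValidationError` and `Validator` are stand-ins for FileValidationError and CustomValidator, feeds are plain objects rather than JSONObject, and `combine` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CustomValidationRunnerSketch {

    /** Simplified stand-in for FileValidationError with the same four fields. */
    record ValidationError(String schemaPath, String violationPath, String message, String keyword) {}

    /** Simplified stand-in for CustomValidator. */
    interface Validator {
        List<ValidationError> validate(Map<String, Object> feeds);
        String getTargetFeed();
    }

    /** Appends the errors of every custom validator targeting the feed to the schema errors. */
    static List<ValidationError> combine(String feedName,
                                         List<ValidationError> schemaErrors,
                                         List<Validator> validators,
                                         Map<String, Object> feeds) {
        List<ValidationError> all = new ArrayList<>(schemaErrors);
        for (Validator v : validators) {
            if (v.getTargetFeed().equals(feedName)) {
                all.addAll(v.validate(feeds));
            }
        }
        return all;
    }
}
```

Because each validator only reads the feeds and returns its own list, validators cannot interfere with one another, and the static schema is never modified.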
Advantages
- Dramatically simpler - 50% less code than schema patching, no JsonPath schema navigation
- Clearer logic - Direct iteration and checks, obvious what's being validated
- Better error messages - Custom messages like "region_id 'R999' does not exist in system_regions"
- Type safety - Working with Set<String>, not stringly-typed JSONObjects
- Easier testing - Direct unit tests, no schema knowledge required
- Faster - No schema parsing/modification overhead
- Quick implementation - 3-4 weeks total (no upstream coordination)
- No conflicts - Validators are independent, can't interfere
Disadvantages
- No interoperability - Each validator implementation must reimplement in their language
- Validation logic separate from schema - Can't see full validation picture in schema files
- Error format differences - schemaPath is null/N/A for programmatic checks
Comparison Summary
| Aspect | Schema Patching | Templates | Programmatic |
| --- | --- | --- | --- |
| Code per rule | 24-82 lines | ~3 (template) + 20-40 (replacement) | 20-40 lines |
| Complexity | High | Low | Low |
| Readability | Poor | Good | Excellent |
| Error messages | Confusing | Clear | Excellent |
| Performance | Slow | Medium | Fast |
| Interoperability | None | ⭐⭐⭐⭐⭐ | None |
| Maintainability | Poor | Good | Excellent |
| Type safety | None | None | Good |
| Testing | Hard | Medium | Easy |
| Implementation time | N/A (current) | 2-3 months | 3-4 weeks |
| Ecosystem benefit | None | High | Low |
Decision Framework
The right choice depends on the project's goals:
Choose Option A (Programmatic Validation) if:
- ✅ This is the primary/only GBFS validator implementation
- ✅ Simplicity and maintainability are top priorities
- ❌ Ecosystem-wide standardization is not a primary goal
Implementation effort: 3-4 weeks
Choose Option B (Schema Templates) if:
- ✅ Multiple GBFS validator implementations need to stay synchronized
- ✅ Ecosystem-wide interoperability is a primary goal
- ✅ Schemas should document all validation rules (transparency)
Implementation effort: 2-3 months (including upstream contribution)
Never Choose: Current Schema Patching ❌
The current approach has no advantages over either alternative:
- ❌ Most complex (JsonPath + schema structure knowledge)
- ❌ Worst error messages (phantom schema references)
- ❌ No interoperability anyway
- ❌ Hard to maintain and test
- ❌ Slowest performance
Conclusion
Current assessment: The schema patching approach should be replaced with one of the two alternatives.
Both alternatives are significantly better than the current approach:
- Option A (Programmatic): Best for developer experience, maintainability, and quick wins
- Option B (Templates): Best for ecosystem standardization and interoperability
Neither option requires long-term backward compatibility support: the patching system only needs to stay in place during migration and can then be removed cleanly.
I want to explore the idea presented here MobilityData/gbfs-validator#153 as an alternative to patching existing schemas while keeping the benefit of schema validation for these rules. We also consider the alternative of a programmatic (non-interoperable) approach.
A detailed analysis written with help from Claude:
Summary
The GBFS Validator currently implements custom validation rules (that cannot be expressed in static JSON schemas) by dynamically patching schemas at runtime. This approach has significant problems:
Two viable alternatives have been identified:
Option A: Programmatic Validation
Run custom validation checks after schema validation, generating errors directly in code.
Option B: Schema Templates with Placeholders
Replace placeholders in schema templates with actual values at runtime.
Current Implementation Analysis
How Schema Patching Works Today
The validation flow:
src/main/resources/schema/v{version}/{feedName}.jsonDocumentContextwrapping the schema JSONKey Files:
CustomRuleSchemaPatcher.java:31-42- Interface all rules implementAbstractVersion.java:118-162- Orchestrates rule application via stream reduceFileValidator.java:59-84- Entry point for validationCurrent Custom Rules (8 total)
All rules in
gbfs-validator-java/src/main/java/org/entur/gbfs/validation/validator/rules/:Reference Validation Rules (enum constraints):
NoInvalidReferenceToPricingPlansInVehicleStatus- pricing_plan_id must exist in system_pricing_plansNoInvalidReferenceToPricingPlansInVehicleTypes- pricing plan IDs in vehicle_types must be validNoInvalidReferenceToRegionInStationInformation- region_id must exist in system_regionsNoInvalidReferenceToVehicleTypesInStationStatus- vehicle_type_id must exist in vehicle_typesConditional Required Field Rules:
5.
NoMissingVehicleTypesAvailableWhenVehicleTypesExists- vehicle_types_available required when vehicle_types feed exists6.
NoMissingOrInvalidVehicleTypeIdInVehicleStatusWhenVehicleTypesExist- vehicle_type_id required and valid when vehicle_types exists7.
NoMissingStoreUriInSystemInformation- rental_apps required when rental_uris exist in stations/vehicles8.
NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles- current_range_meters required for motorized vehicles (if/then/else schema)Implementation Complexity Examples
Simple Rule: Reference Validation (24 logic lines)
NoInvalidReferenceToRegionInStationInformation.java:41-58:Complexity factors:
Complex Rule: Multi-Feed Conditional (82 logic lines)
NoMissingStoreUriInSystemInformation.java:47-129:This rule checks two different feeds (vehicle_status AND station_information) to determine if rental_apps should be required in system_information:
Additional complexity:
Most Complex: Conditional Schema with Filters (64 logic lines)
NoMissingCurrentRangeMetersInVehicleStatusForMotorizedVehicles.java:56-119:Uses JsonPath filters to extract only motorized vehicle types, then builds an if/then/else schema:
Builds JSON Schema conditional: "If vehicle_type_id is a motorized type, then current_range_meters is required"
Problem 1: Implementation Complexity
What Makes Rules Hard to Write
JsonPath Expertise Required:
$.properties.data.properties.stations.items.properties.region_id$.data.vehicle_types[?].vehicle_type_id[:1]to get first elementJSON Schema Structure Knowledge:
JSONObject Manipulation:
DocumentContext.read(),JSONObject.put(),JSONObject.append(),DocumentContext.set()Cross-Feed Data Dependencies:
Maintenance Burden
Problem 2: Risk of Rule Conflicts
Current Conflict Mitigation
AbstractVersion.java:158-161:What's NOT Protected
Scenario 1: Multiple rules modifying same required array
If two rules both append to
$.properties.data.properties.stations.items.required:vehicle_types_availablevehicle_docks_availableBUT if second rule replaces instead of appending:
Scenario 2: Rules modifying same property
If two rules both target vehicle_type_id:
- { "enum": [...] }
- { "pattern": "..." }
- vehicleTypeIdSchema.put("enum", ...) again
No Enforcement Mechanism
Problem 3: Confusing Validation Reports
Error Structure
FileValidationError.java:29-34:
These values come directly from Everit's ValidationException, which validates against the patched schema.
The Confusion
Example Error:
{ "schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum", "violationPath": "#/data/vehicles/0/pricing_plan_id", "message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])", "keyword": "enum" }
User investigates: Opens vehicle_status.json schema and navigates to properties.data.properties.vehicles.items.properties.pricing_plan_id:
{ "pricing_plan_id": { "type": "string", "description": "The plan_id of the pricing plan this vehicle is eligible for" } }
No enum constraint! 😕
Why This Happens
NoInvalidReferenceToPricingPlansInVehicleStatus dynamically added it:
User Experience Impact
Problem 4: Implementation-Specific Logic
Current Architecture is Java-Specific
The patching system is tightly coupled to:
- Jayway JsonPath library (Java/JVM)
  - DocumentContext API for schema manipulation
  - Filter API for complex queries
  - JsonOrgJsonProvider
- org.json JSONObject (Java)
- Everit JSON Schema Validator (Java)
- Java Streams and Collections
Other Validators Must Reimplement
For a Python validator to implement the same rules:
For a JavaScript/Go/Rust validator:
No Interoperability
Proposed Solution: Schema Templates with Placeholders
High-Level Concept
Instead of patching schemas at runtime with code, use schema templates that declare placeholders for dynamic values:
Current approach (code injects enum):
Proposed approach (template with placeholder):
{ "region_id": { "type": "string", "description": "ID of the region where station is located", "enum": "${VALID_REGION_IDS}" } }
Validator performs simple string replacement:
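As a rough sketch of what that replacement could look like in Java (class and method names are hypothetical, and a real implementation would more likely delegate JSON escaping to a JSON library), the key detail is that the quoted placeholder, quotes included, is swapped for an array literal so the result is valid JSON:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; only escapes backslashes and double quotes, not control characters.
final class EnumPlaceholderSketch {

    /** Renders IDs as a JSON array literal such as ["R1", "R2"]. */
    static String toJsonArray(List<String> ids) {
        return ids.stream()
            .map(id -> "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
            .collect(Collectors.joining(", ", "[", "]"));
    }

    /** Replaces every occurrence of "${name}" (with its surrounding quotes) by the array literal. */
    static String fill(String template, String name, List<String> ids) {
        String quotedPlaceholder = "\"${" + name + "}\"";
        // String.replace is a literal replacement, so "$", "{" and "}" need no regex escaping.
        return template.replace(quotedPlaceholder, toJsonArray(ids));
    }
}
```

For example, EnumPlaceholderSketch.fill(template, "VALID_REGION_IDS", Arrays.asList("R1", "R2", "R3")) turns the "enum" value in the template above into an array of the three IDs.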
Result after replacement:
{ "region_id": { "type": "string", "description": "ID of the region where station is located", "enum": ["R1", "R2", "R3"] } }
Benefits
1. Dramatically Simpler Implementation
Before (24 lines of complex Java):
After (3 lines of simple text processing):
2. Eliminates Rule Conflicts
Templates define exactly where values go:
{ "required": ["station_id", "num_bikes_available", "${CONDITIONAL_REQUIRED_FIELDS}"], "properties": { "vehicle_type_id": { "type": "string", "enum": "${VALID_VEHICLE_TYPE_IDS}" } } }
3. Transparent Validation Reports
Error example with templates:
{ "schemaPath": "#/properties/data/properties/vehicles/items/properties/pricing_plan_id/enum", "violationPath": "#/data/vehicles/0/pricing_plan_id", "message": "instance value (plan_999) not found in enum (possible values: [\"plan_1\",\"plan_2\"])", "keyword": "enum" }
User opens vehicle_status.json template schema:
{ "pricing_plan_id": { "type": "string", "description": "The plan_id of the pricing plan this vehicle is eligible for", "enum": "${VALID_PRICING_PLAN_IDS}" } }
Aha! 💡 The enum constraint exists in the template with a placeholder. User now understands:
4. Interoperability Across Validators
Schema templates live in upstream GBFS spec repository (e.g., MobilityData/gbfs)
All validators (Java, Python, JavaScript, Go, Rust, etc.) implement the same simple logic:
Python validator:
JavaScript validator:
Go validator:
All produce identical results because:
Implementation Strategy
Step 1: Define Placeholder Convention
Propose to GBFS spec maintainers:
Placeholder Syntax:
${VARIABLE_NAME}
Example Placeholders:
- ${VALID_PRICING_PLAN_IDS} - Array of valid pricing plan IDs
- ${VALID_VEHICLE_TYPE_IDS} - Array of valid vehicle type IDs
- ${VALID_REGION_IDS} - Array of valid region IDs
- ${CONDITIONAL_REQUIRED_FIELDS} - Array of conditionally required field names
Placement Rules:
- "enum": "${VALID_IDS}" (replace entire value)
- "required": ["station_id", "${CONDITIONAL_FIELDS}"] (replace item in array)
- "if": "${CONDITIONAL_SCHEMA}" (replace entire object)
Step 2: Create Template Schemas
Update existing GBFS JSON schemas with placeholders:
Example: station_information.json
Before (static schema):
{ "properties": { "region_id": { "type": "string", "description": "ID of the region where station is located" } } }
After (template schema):
{ "properties": { "region_id": { "type": "string", "description": "ID of the region where station is located", "enum": "${VALID_REGION_IDS}" } } }
Example: station_status.json
Before:
{ "required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported"] }
After:
{ "required": ["station_id", "num_bikes_available", "is_installed", "is_renting", "is_returning", "last_reported", "${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS}"] }
Step 3: Implement Template Processing
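The array-item placeholder in the station_status.json example needs slightly more care than a whole-value replacement: when no conditional fields apply, the placeholder and the comma before it both have to disappear, otherwise the result is not valid JSON. The sketch below is one possible way to handle that. It assumes the placeholder is always the last item of its array and that field names need no JSON escaping; the class name is hypothetical and this is not the proposed SchemaTemplateProcessor.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for "replace item in array" placeholders.
final class ArrayPlaceholderSketch {

    /** Splices fields into the array in place of "${name}", or removes the item if fields is empty. */
    static String splice(String template, String name, List<String> fields) {
        // Optionally capture the comma (and whitespace) that precedes the quoted placeholder.
        Pattern placeholder =
            Pattern.compile("(,\\s*)?\"" + Pattern.quote("${" + name + "}") + "\"");
        String items = fields.stream()
            .map(f -> "\"" + f + "\"")
            .collect(Collectors.joining(", "));

        Matcher m = placeholder.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String comma = m.group(1) == null ? "" : m.group(1);
            // Empty list: drop placeholder and its leading comma. Otherwise keep the comma.
            String replacement = fields.isEmpty() ? "" : comma + items;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

This edge case is worth settling in the placeholder convention itself, since every validator implementation would have to handle the empty case the same way to stay interoperable.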
New class: SchemaTemplateProcessor
Integration in AbstractVersion.java:
Step 4: Maintain Backward Compatibility
During transition, support both approaches:
- useSchemaTemplates (default: false)
- schema/v2.3/templates/
Eventually deprecate and remove custom rule patching system.
Step 5: Upstream Contribution
Work with MobilityData/GBFS maintainers:
Mapping Current Rules to Templates
Reference Validation Rules → Enum Placeholders
- ${VALID_PRICING_PLAN_IDS}
- ${VALID_PRICING_PLAN_IDS}
- ${VALID_REGION_IDS}
- ${VALID_VEHICLE_TYPE_IDS}
Conditional Required Fields → Array Placeholders
- ${CONDITIONAL_REQUIRED_STATION_STATUS_FIELDS}
- ${CONDITIONAL_REQUIRED_VEHICLE_STATUS_FIELDS}
- ${CONDITIONAL_REQUIRED_SYSTEM_INFO_FIELDS}
Complex Conditional → Object Placeholder
- ${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}
Template for motorized vehicles (in vehicle_status.json):
{ "items": { "allOf": [ { "$ref": "#/definitions/vehicle" }, "${MOTORIZED_VEHICLE_CONDITIONAL_SCHEMA}" ] } }
Placeholder value (computed):
{ "if": { "properties": { "vehicle_type_id": { "enum": ["type_1", "type_3"] } }, "required": ["vehicle_type_id"] }, "then": { "required": ["current_range_meters"] } }
Alternative Approach: Programmatic Validation
There's a third option that was not initially considered: programmatic validation - checking data directly in code rather than modifying schemas or using templates.
How It Works
Instead of patching schemas or using templates, run additional validation after schema validation:
New interface:
Example implementation (~30 lines vs 60 for schema patching):
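A minimal sketch of what such a check could look like, using the region reference rule as the example. Everything here is an assumption made for illustration: the interface and class names are hypothetical, ValidationIssue is a simplified stand-in for the validator's own error type, and feeds are modelled as a map from feed name to its list of entries rather than as full GBFS documents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for a validation error; note that it carries no schemaPath. */
final class ValidationIssue {
    final String violationPath;
    final String message;

    ValidationIssue(String violationPath, String message) {
        this.violationPath = violationPath;
        this.message = message;
    }
}

/** Hypothetical interface: a check that runs after schema validation. */
interface ProgrammaticCheck {
    List<ValidationIssue> validate(Map<String, List<Map<String, Object>>> feeds);
}

/** Reports stations whose region_id is not declared in system_regions. */
final class RegionReferenceCheck implements ProgrammaticCheck {
    @Override
    public List<ValidationIssue> validate(Map<String, List<Map<String, Object>>> feeds) {
        List<Map<String, Object>> regions = feeds.get("system_regions");
        List<Map<String, Object>> stations = feeds.get("station_information");
        if (regions == null || stations == null) {
            return Collections.emptyList(); // nothing to cross-check
        }

        Set<String> validRegionIds = new HashSet<>();
        for (Map<String, Object> region : regions) {
            validRegionIds.add(String.valueOf(region.get("region_id")));
        }

        List<ValidationIssue> issues = new ArrayList<>();
        for (int i = 0; i < stations.size(); i++) {
            Object regionId = stations.get(i).get("region_id");
            // region_id is optional, so only present values are checked.
            if (regionId != null && !validRegionIds.contains(regionId.toString())) {
                issues.add(new ValidationIssue(
                    "#/data/stations/" + i + "/region_id",
                    "region_id '" + regionId + "' not found in system_regions"));
            }
        }
        return issues;
    }
}
```

The check reads the data directly and produces its own message, so there is no schema to patch, but the resulting error has no schema location to point at.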
Advantages
- Set<String>, not stringly-typed JSONObjects
Disadvantages
- schemaPath is null/N/A for programmatic checks
Comparison Summary
Decision Framework
The right choice depends on the project's goals:
Choose Option A (Programmatic Validation) if:
Implementation effort: 3-4 weeks
Choose Option B (Schema Templates) if:
Implementation effort: 2-3 months (including upstream contribution)
Never Choose: Current Schema Patching ❌
The current approach has no advantages over either alternative:
Conclusion
Current assessment: The schema patching approach should be replaced with one of the two alternatives.
Both alternatives are significantly better than the current approach:
Neither option requires backwards compatibility support - both allow clean migration.