WIP - This feature hasn't been extensively tested
This pull request introduces the initial release of the GTFS Diff Engine, a memory-efficient Python library and CLI for comparing two GTFS feeds and producing a structured diff conforming to the GTFS Diff v2 schema. The changes include a robust implementation of the core diff logic, a clear public API, a command-line interface, detailed documentation, and supporting scripts for end-to-end usage.
The most important changes are:
Core Functionality and API:
- engine.py, exposing a single diff_feeds() function that returns a typed Pydantic model representing the diff result. [1] [2]
- gtfs_definitions.py, with a helper for primary key lookup.

Command-Line Interface and Tooling:
- CLI (gtfs-diff) in cli.py, supporting options for output file, row change cap, pretty-printing, and feed download timestamps.
- compare_feeds.sh to automate downloading two GTFS feeds by URL and running the diff tool, with argument parsing and error handling.

Documentation and Examples:
- README.md with a comprehensive overview, installation instructions, usage examples, API reference, supported files table, output schema example, and implementation notes on memory efficiency.
- docs/architecture.md detailing design goals, module structure, streaming diff algorithm, edge case handling, and future improvements.

Packaging and Project Setup:
- pyproject.toml for installation, development, and test dependencies; it also sets up the CLI entry point.
- __init__.py and __main__.py. [1] [2]

These changes collectively deliver a ready-to-use, well-documented GTFS diff engine suitable for both programmatic and CLI-based workflows.
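
For reviewers unfamiliar with keyed diffing, here is a minimal sketch of the general idea behind comparing one GTFS table across two feeds by primary key. It is not the code in engine.py: the names PRIMARY_KEYS, index_rows, and diff_table are hypothetical, the output shape is not the GTFS Diff v2 schema, and it loads both tables into memory, whereas the PR describes a streaming, memory-efficient approach.

```python
import csv
import io

# Hypothetical primary-key table for illustration only; the real
# gtfs_definitions.py in this PR may be organized differently.
PRIMARY_KEYS = {
    "stops.txt": ("stop_id",),
    "stop_times.txt": ("trip_id", "stop_sequence"),
}


def index_rows(text, key_fields):
    """Parse CSV text and index each row by its primary-key tuple."""
    reader = csv.DictReader(io.StringIO(text))
    return {tuple(row[f] for f in key_fields): row for row in reader}


def diff_table(file_name, old_text, new_text):
    """Return the keys of rows added, deleted, or modified between two versions.

    Assumption: both tables fit in memory, which is NOT how the
    engine in this PR is described to work.
    """
    key_fields = PRIMARY_KEYS[file_name]
    old = index_rows(old_text, key_fields)
    new = index_rows(new_text, key_fields)
    return {
        "added": sorted(k for k in new if k not in old),
        "deleted": sorted(k for k in old if k not in new),
        "modified": sorted(k for k in old if k in new and old[k] != new[k]),
    }


if __name__ == "__main__":
    old_stops = "stop_id,stop_name\nA,Alpha\nB,Beta\nC,Gamma\n"
    new_stops = "stop_id,stop_name\nA,Alpha\nB,Beta St\nD,Delta\n"
    print(diff_table("stops.txt", old_stops, new_stops))
```

Keying rows by the file's primary key means row order in the feed does not matter: a row counts as modified only when the same key exists in both feeds with different column values.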