release/2026 06 18 #133
Open
ar7casper wants to merge 85 commits into
ar7casper (Collaborator) commented Jun 22, 2026
- fix(parsers/c): exclude on path components in extract_all, not a substring of the abspath
- fix(parsers/php): repair the PHP call-graph builder (tag grammar, callbacks, new, scopes, imports)
- fix(parsers/python): repair 6 call-graph edge-fidelity bugs in the Python CallGraphBuilder
- Fix JS DependencyResolver call-edge fidelity
- fix(parsers/go): correct call-graph receiver classification, package resolution, dispatch, os filtering
- fix(parsers/zig): canonical call-graph API + correct call extraction and resolution
- fix(parsers/python): segment-match path exclusion/classification + resolve relative-import anchors
- fix(parsers): repair Ruby + PHP call-graph resolution
- fix(parsers/zig): align function_extractor with the real tree-sitter-zig grammar + lazy import
- fix(reachability): seed native/cross-language entry points, fix Zig reachability crash, stop silent zero-seed blackout
- fix(scanner): isolate optional-stage failures and disambiguate skip causes
- fix(parsers/go): give CodeQL DB-create the same timeout budget as analyze
- fix(parsers/php): extract procedural top-level + closure units; seed PHP entry points
- fix(parsers/c): codeql build-mode-none, optional-stage success, exclude orchestrators from pytest
- fix(c-parser): deterministic include resolution + call-graph precision/recall
- fix(parsers/js): discover and resolve .mjs/.cjs modules in context_assembler
- fix(ruby-extractor): repair 7 Ruby FunctionExtractor extraction/classification bugs
- fix(parsers): anchor is_test_file detection to path components + basename (zig/python/php/ruby)
- fix(c-parser): anchor is_test_file patterns to path segments / filename boundaries
- fix(file_io): write_json atomically (temp + os.replace) to survive interrupted writes
- fix(openant-cli): bound Python subprocess in Invoke with an automatic timeout
- fix(parsers/python): surface top-level callGraph/reverseCallGraph in analyzer_output
- fix(js-analyzer): repair JS/TS analyzer inventory, call-graph, schema and functionId bugs
- fix(reachability): make get_all_reachable() overwrite the cache so a stale bounded False is corrected
- docs(validate_dataset_schema): cite experiment.py analyze_unit() instead of drifting line numbers
- fix(progress): floor ETA remaining at 0 so retry double-counts don't show negative time
- fix(analyze): add Go --exploitable-all parity; priority-sort --limit truncation
- fix(js-parser): export FILE_BOUNDARY at module level for cross-parser parity
- fix(cli/runtime): key dependency-staleness on corePath, not only the pyproject hash
- fix(cli/config): enforce 0600 on the config file even when it pre-exists (CWE-732)
- fix(export_csv): neutralize CSV/formula injection in exported cells (CWE-1236)
- fix(context): anchor entry-point path exclusions on components, not substrings
- fix(js-parser): treat a zero-JS-file repo as an empty result, not a crash
- fix(cli): add --limit to the enhance command across all three layers
- fix(enhance/analyzer): retry 529-overloaded, single-shot checkpoint, exploitable filter + bounded agentic retry
- fix(rate_limiter): re-check the backoff deadline after sleeping (close TOCTOU)
- fix(agentic-enhancer): cap conversation input to avoid context overflow
- fix(reporter): correct build_pipeline_output return annotation + docstring to tuple[str, int]
- fix(parsers/ruby): posix-normalize call-graph file keys for cross-platform resolution
- test(parsers/c): normalize recorded path to forward slashes for Windows CI
- test(parsers/python): normalize recorded path to forward slashes for Windows CI
- test(cli/config): skip POSIX-perms test on Windows
- test(enhance): supply dummy ANTHROPIC_API_KEY so resilience tests run without a real credential
- build(deps): declare tree-sitter-zig so the zig parser imports on a clean install
- build(deps): declare tree-sitter-zig so the zig parser imports on a clean install
- fix(parsers/php): anchor file-discovery exclusion to repo-relative path components
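For the `fix(file_io)` entry in the list above (temp file + `os.replace`), a minimal sketch of the general technique follows. The function name `write_json_atomic` and the fsync step are assumptions for illustration; the actual `write_json` in the repo is not shown on this page.

```python
import json
import os
import tempfile


def write_json_atomic(path, data):
    """Write JSON to a temp file beside the target, then swap it into place.

    Hypothetical helper: illustrates temp + os.replace, not the repo's code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # readers see the old or the new file, never half
    except BaseException:
        # Do not leave a stray temp file behind if the write was interrupted.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

An interrupted run leaves the previous file intact, because the target is only touched by the final rename.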
…tring of the abspath
c/function_extractor.py extract_all skipped files via `any(excl in str(file_path) for excl in
['.git','build','test','node_modules'])` -- an unanchored substring test against the absolute path. So
files whose path merely contained a token were wrongly skipped ('src/latest/main.c' and 'contest/sol.c'
contain 'test'), and an ancestor of repo_path containing a token poisoned the whole scan (a checkout
under '/home/tester/' excluded every file). Match on path COMPONENTS relative to repo_path instead,
using c's own token set.
Scope: c's member of the cross-parser extract_all substring family. The python/php/ruby extract_all
siblings carry DIFFERENT token sets (vendor*/tmp*/venv*) and are not widened here.
Tests: tests/test_c_extract_all_path_components.py (spies process_file: files with a token in a path
*segment* are processed; real test/ and .git dirs stay excluded). RED 1 failed (all excluded via
ancestor-poison) -> GREEN 1 passed; full suite 177 passed / 63 skipped. The test loads the C
function_extractor under a unique module name via importlib so it does not pollute
sys.modules['function_extractor'] for the sibling python parser tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
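The substring-versus-component difference described in the commit above can be sketched as follows. The token set is the one quoted in the message; the function names and the sample repo path are hypothetical, and this is not the extractor's actual code.

```python
from pathlib import PurePosixPath

# Token set as quoted in the commit message for the C extractor.
EXCLUDED = {".git", "build", "test", "node_modules"}


def is_excluded_substring(file_path: PurePosixPath) -> bool:
    """Pre-fix behaviour: unanchored substring test on the absolute path."""
    return any(token in str(file_path) for token in EXCLUDED)


def is_excluded_component(file_path: PurePosixPath, repo_path: PurePosixPath) -> bool:
    """Post-fix idea: match whole path components relative to the repo root."""
    parts = file_path.relative_to(repo_path).parts
    return not EXCLUDED.isdisjoint(parts)


# An ancestor directory that happens to contain the token 'test'.
repo = PurePosixPath("/home/tester/project")
```

With `repo` as above, the substring test excludes `src/latest/main.c`, while the component test keeps it and still excludes `test/unit.c`.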
…lbacks, new, scopes, imports)
Several defects in parsers/php/call_graph_builder.py + function_extractor.py; before this the PHP call
graph was effectively empty (see #1).
1. Tag grammar: the call extractor re-parsed tag-stripped function bodies with ts_php.language_php()
(the tag-REQUIRING grammar), which treats tagless input as inert `text` -- so ZERO call nodes were
found and the entire PHP call graph was empty. Use language_php_only(). Foundational: every fix below
only produces edges because of it.
2. Higher-order callbacks: call_user_func / call_user_func_array / array_map / array_filter /
array_walk / array_reduce / usort / uasort / uksort / preg_replace_callback are in PHP_BUILTINS, so
_resolve_function_call returned None before inspecting their callback argument. A CALLBACK_BUILTINS
map now resolves the string-literal callback ('fn', 'Class::method', or ['Class','method']) at the
builtin's callback position. The outer builtin call stays filtered; variable/closure callbacks have no
static target.
3. use-import node type: _extract_imports matched `use_declaration`, but tree-sitter-php emits
`namespace_use_declaration`, so `use App\Service\Foo;` imports were never recorded. Match the real
node type; handle grouped `use A, B;` and strip `as Alias`.
4. Import-file matching: _resolve_simple_call matched `import_name in file_path` -- an unanchored
substring -- so an import 'Bar' matched 'app/BarBaz/x.php'. Match the import file name (anchored:
endswith the basename `.php`, or exact).
5. new/__construct: `new Foo(...)` parses as object_creation_expression, which the traversal never
visited, so constructors were untracked. Add the node type + resolve the class __construct.
6. self/static/parent scopes: in tree-sitter-php `self`/`static`/`parent` are `relative_scope` nodes,
not `name`, so _resolve_scoped_call never captured the scope and ALL self::/static::/parent:: calls
were silently dropped. Handle relative_scope; resolve parent:: in the superclass (from the class's
`extends`, namespace-stripped), with no fall-back to the caller's own class (which would mis-link an
override's parent:: call to itself).
7. Case-insensitive builtins: _is_builtin compared case-sensitively, but PHP function names are
case-insensitive, so e.g. CALL_USER_FUNC was not recognized. Compare name.lower().
Scope: PHP-specific (the c/python/ruby/zig/go call-graph builders are separate implementations).
Tests: tests/test_php_call_graph.py (new; the package had no php call-graph tests) -- loaded under a
unique importlib name (call_graph_builder/function_extractor are basenames shared by every parser).
Eight behavioral tests: tagless body produces edges, callback builtins resolve, new->__construct,
parent::->superclass, parent:: with a namespaced superclass, case-insensitive builtins, anchored
import match, namespace_use_declaration import. RED 8 failed (pre-fix) -> GREEN 8 passed; ruff clean;
full suite 184 passed / 63 skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
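Items 4 and 7 of the PHP commit above (anchored import-file matching and case-insensitive builtin checks) can be illustrated with a small sketch. The helper names and the three-entry builtin subset are hypothetical, and only the basename form of the anchored match is shown.

```python
import posixpath

# Illustrative subset only; the real PHP_BUILTINS set is larger.
PHP_BUILTINS = {"call_user_func", "array_map", "usort"}


def is_builtin(name: str) -> bool:
    """PHP function names are case-insensitive, so compare the lowered name."""
    return name.lower() in PHP_BUILTINS


def import_matches_file(import_name: str, file_path: str) -> bool:
    """Anchored match: compare against the file's basename, not any substring."""
    short_name = import_name.replace("\\", "/").rsplit("/", 1)[-1]
    return posixpath.basename(file_path) == short_name + ".php"
```

Under these assumptions `import_matches_file("Bar", "app/BarBaz/x.php")` is false, whereas the pre-fix test `"Bar" in "app/BarBaz/x.php"` is true.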
…thon CallGraphBuilder
Six independent defects in libs/openant-core/parsers/python/call_graph_builder.py, each pinned by a
RED->GREEN regression test. All fixes are Python-AST-native (ast.Call, ast.Assign, ast.Name, the
extractor's classes[...]['bases'] model); no code or tokens copied from the sibling go/php/ruby/zig
builders, which are untouched.
- Package __init__.py re-export resolution: _resolve_via_package_init follows names re-exported
through a package __init__.py to their origin module (was: no __init__.py probing -> dropped edge).
- Local-variable type dispatch: _collect_local_types + _resolve_class_method dispatch
`v = ClassName(); v.method()` to the constructed type's method (was: routed to _resolve_module_call,
which knows only import/same-file class names -> dropped edge).
- super() resolution: _resolve_super_call walks the caller class's bases (cross-file) to resolve
super().method() (was: stub returning None).
- Deterministic output: sort callee + reverse-caller lists on emit for reproducible output (was:
set-iteration order, PYTHONHASHSEED-dependent).
- Higher-order callback args: _resolve_callback_args emits edges for function references passed as
call arguments, e.g. sorted(xs, key=func) (was: only node.func inspected).
Deliberately not addressed here (documented scope boundaries, no code change):
- getattr(obj,'m')(args) is statically unresolvable string dispatch.
- In a chained inner().method(), the outer method call on a Call result is unresolvable; ast.walk
already captures the resolvable inner call.
Tests: tests/test_python_call_graph_builder.py (unique importlib name; 5 tests covering the 6 fixed
bugs). RED 5 failed (pre-fix) -> GREEN 5 passed. Suite 181 passed / 63 skipped / 0 failed (base
176/63/0). Determinism verified across PYTHONHASHSEED 0/1/12345.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
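The local-variable type dispatch and deterministic-output points in the Python commit above can be sketched with the standard `ast` module. This is a flow-insensitive illustration under stated assumptions: the function names and the `known_classes` parameter are hypothetical, and it is not the builder's actual `_collect_local_types` / `_resolve_class_method` code.

```python
import ast


def collect_local_types(func: ast.FunctionDef, known_classes: set) -> dict:
    """Map local names to class names for `v = ClassName(...)` assignments."""
    local_types = {}
    for node in ast.walk(func):
        if (
            isinstance(node, ast.Assign)
            and isinstance(node.value, ast.Call)
            and isinstance(node.value.func, ast.Name)
            and node.value.func.id in known_classes
        ):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    local_types[target.id] = node.value.func.id
    return local_types


def resolve_method_calls(source: str, known_classes: set) -> list:
    """Return 'Class.method' targets for `v.method()` calls on typed locals."""
    edges = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        local_types = collect_local_types(func, known_classes)
        for node in ast.walk(func):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id in local_types
            ):
                edges.add(f"{local_types[node.func.value.id]}.{node.func.attr}")
    # Sort on emit so the result does not depend on set-iteration order.
    return sorted(edges)
```

Calls on receivers with no recorded constructor assignment are simply left unresolved in this sketch.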
dependency_resolver.js only. Four independent call-graph defects:
- Bare call `foo()` no longer binds to a class method `Class.foo`. _resolveCall
now resolves only free functions (!className) at both the same-file and
unique-simple-name sites; methods stay reachable via this./obj./Class. paths.
Mirrors the Ruby/PHP siblings.
- `const x = new C(); x.m()` now resolves to C.m via a per-function
_extractLocalTypes pre-pass + a local-var-type branch in _resolveMethodCall;
built-in constructors (Map/Set/...) skipped.
- buildCallGraph drops self-edges (calledId === funcId) from the standalone-call
regex matching a function's own name; matches the C sibling's `c != func_id`
invariant.
- CLI JSON.parse is wrapped in try/catch emitting a readable diagnostic + exit 1
instead of an uncaught V8 SyntaxError.
Deferred / not-a-defect (not fixed here):
- Import-based resolution needs typescript_analyzer.js to emit imports
(cross-cutting); repoRoot is a dead field.
- eval/Function are in a call-graph noise filter, not a security sink;
indirect_calls is a phantom field. Correct-by-design over-claim, no resolver
defect.
- The silent-drop / dead this.imports path: no consumer for an unresolvedCalls
channel; import disambiguation is cross-cutting.
The five funcId parse sites all use split(':')[0] (first colon), consistent with
the unit_generator.js id-parse fix; no resolver change was needed there.
Tests: tests/parsers/javascript/test_dependency_resolver_edges.py (Node harness on
the resolver + CLI subprocess). 7 failed pre-fix (3 control/guard tests green at
base) -> 10 passed after the fix. Full suite 227 passed / 22 skipped
(Go-binary, unrelated) / 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…resolution, dispatch, os filtering
Four coupled defects in parsers/go/go_parser/callgraph.go, the Go call-graph resolver:
1. Receiver classification ignored the import table. isLikelyPackage classified a selector receiver
`x` in `x.Sel()` as a package using a name-shape heuristic (lowercase first letter, len<=10) with no
import context, so every short lowercase local (db, tx, ctx, w, r) was treated as a package and its
method calls were misrouted to package resolution. Now a receiver is a package iff its name is one of
the file's import aliases; the heuristic function is removed.
2. Package-call resolution matched the alias, not the package. resolvePackageCall resolved the import
path (pkgPath := imports[pkgAlias]) but then matched candidates with
strings.Contains(funcID, pkgAlias) -- the user-chosen alias. So `import u "example.com/user"` failed
to resolve (the alias `u` is absent from the funcID), and any funcID merely containing the alias as a
substring matched, emitting edges to unrelated packages. Now it matches the package directory (the
last component of the resolved import path) against the directory the candidate's file lives in.
3. Dispatch misrouted a plain function's simple call. resolveCalls took the self/method branch when
`call.Receiver == callerInfo.ClassName`, which is ""=="" for a plain function (ClassName=="") making
a simple call (Receiver==""), so `greet()` from `main()` went to resolveMethodCall(name, "", file),
found nothing, and the edge was lost. The branch is now guarded on callerInfo.ClassName != "", so
non-method callers fall through to resolveSimpleCall.
4. The whole os package was skipped as a builtin. builtins included "os": true, so the
call-extraction filter dropped os.StartProcess and other OS-level sinks before they could be seen.
"os" is no longer blanket-skipped. (Such calls resolve to no in-repo target -- surfacing external
sinks as graph nodes is a separate feature -- but they are no longer filtered out of the extracted
call list.)
Scope: the Go AST import-resolution logic is unique to go_parser; the python/php/ruby/zig call-graph
builders are separate implementations and are not affected. extractor.go/types.go are unchanged.
Tests: parsers/go/go_parser/callgraph_test.go (new; the package had no tests). Four tests exercise
the stable extractCalls / resolvePackageCall / resolveCalls boundaries: local var not classified as a
package, package resolution by directory not alias, plain-func simple call not misrouted to method
resolution, and os sink not filtered as a builtin. RED 4 failed (pre-fix) -> GREEN 4 passed; go vet
clean; go test ./... ok.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…and resolution
Six defects in parsers/zig/call_graph_builder.py:
1. API parity: the zig builder exposed only build()->Dict + save_results(), diverging from the
canonical CallGraphBuilder (c/php/python/ruby) which has build_call_graph()->None, export()->Dict,
get_statistics()->Dict, get_dependencies()/get_callers(). Added those methods; build() is retained as
a back-compat wrapper (build_call_graph()+export()) since the pipeline (zig/test_pipeline.py)
consumes its return value.
2. Statistics: get_statistics() now also reports avg_in_degree and max_in_degree (previously only the
out-degree side was emitted).
3. @call extraction: `@call(.modifier, fn, args)` parses as a `builtin_function` node, so the wrapped
function `fn` was never recorded. It is now extracted as the call target (the @call builtin itself
stays filtered).
4. Method-call recall + safe resolution: at base a method call `obj.method()` parses as
`call_expression` over a `field_expression`, but the extractor only handled the non-existent
`field_access` node type, so method calls produced no edges. field_expression callees are now
extracted. And _resolve_call no longer returns ALL same-named candidates on a bare-name collision
(which would link a.method() to every struct's method); when genuinely ambiguous it returns nothing.
Trade-off: lower recall for ambiguous bare-name calls, higher precision (no namespace leak).
5. Import-file matching: step 2 matched `imp in candidate_file` -- an unanchored substring -- so
'util.zig' matched 'myutil.zig_x/...'. Now matches the import file name exactly (== or path-suffix).
6. Stdlib import filter: `@import("std")`/`("builtin")`/`("root")` are not file imports; they are now
skipped in resolution (previously they substring-matched candidate paths).
Scope: zig-specific (the c/php/python/ruby builders are separate implementations and are the parity
baseline, not siblings to change). function_extractor.py is unchanged.
Tests: tests/test_zig_call_graph_builder.py (new; the package had no tests, and the tree-sitter-zig
grammar must be installed to run them). Four tests: API parity + in-degree stats, @call extraction,
method-call extraction via field_expression, and exact-import / stdlib-filter / conservative
resolution. RED 4 failed (pre-fix) -> GREEN 4 passed; ruff clean; full suite 180 passed / 63 skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
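Items 4 to 6 of the Zig commit above (conservative bare-name resolution, exact import matching, stdlib filter) can be sketched as below. The helper names and the `functions_by_name` shape are hypothetical; the real `_resolve_call` has more steps than shown.

```python
# `@import` names that are not file imports, per the commit message.
ZIG_NON_FILE_IMPORTS = {"std", "builtin", "root"}


def import_matches(imp: str, candidate_file: str) -> bool:
    """Exact or path-suffix match on the import file name; never a substring."""
    if imp in ZIG_NON_FILE_IMPORTS:
        return False
    return candidate_file == imp or candidate_file.endswith("/" + imp)


def resolve_bare_name(name: str, functions_by_name: dict):
    """Return the single candidate, or None when the name is ambiguous."""
    candidates = functions_by_name.get(name, [])
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` on a collision trades recall for precision, which is the trade-off the commit message names.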
…solve relative-import anchors
Three independent defects in parsers/python/function_extractor.py:
1. extract_all(): the no-args scan excluded files with
`any(excl in str(file_path) for excl in [...])` -- an unanchored substring test on the full path, so
a file whose path merely contains a token ('myvenv/keep.py' contains 'venv') was silently dropped,
and an ancestor directory containing a token could exclude the whole scan. Now matches whole path
SEGMENTS: `{tokens} & set(file_path.relative_to(repo_path).parts)`. Python's own token set
(__pycache__/.git/venv/.venv/node_modules) is preserved.
2. classify_function(): classification used `'<token>' in path_lower` substring tests, so
'interviews/api.py' was classified 'view_function'. 'view_function' is in
entry_point_detector.ENTRY_POINT_TYPES (:26-32), so that misclassification became a false entry-point
seed that cascades into false reachability (consumed at entry_point_detector.py:177). The 'views'
token now matches a whole path segment via a new _path_has_segment helper. The 'middleware' token is
given the same segment fix because it shares the substring defect, but note 'middleware' (the python
label) is NOT in ENTRY_POINT_TYPES -- so that half is classification accuracy, not a reachability
change. The 'test' classifier is left as a substring on purpose (test-file conventions use 'tests/'
and 'test_*'/'*_test' forms a segment match would miss; 'test' is not an entry-point type, so it
seeds no false reachability).
3. extract_imports(): the ast.ImportFrom branch read node.module but never node.level, so relative
imports lost their package anchor ('from . import X' stored bare 'X'; 'from ..pkg import Y' stored
anchor-less 'pkg.Y'). call_graph_builder._resolve_import then rebuilt a wrong/no file path and the
edges were dropped (verified: pre-fix the candidate resolves to None, post-fix it resolves to the real
pkg/sub/helpers.py). Now reconstructs the absolute anchor from the importing file's package location
(level=1 -> own package, level=2 -> parent, ...); over-deep levels degrade to no leading dot. Absolute
imports (level=0) are unchanged.
Scope: the php/ruby function_extractor.py extract_all + classify siblings carry related defects and are
not widened here.
Tests: tests/test_python_function_extractor.py -- loads the module under a unique importlib name (the
bare 'function_extractor' name is shared by five other parsers, so a plain import would pollute
sys.modules for the rest of the suite). Three checks: segment-vs-substring exclusion, entry-point
classification by segment, and relative-import anchor reconstruction. RED 3 failed (pre-fix) -> GREEN 3
passed; full suite 179 passed / 63 skipped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
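The relative-import anchor reconstruction from item 3 of the commit above can be sketched with `ast.ImportFrom.level`. The function name and the repo-relative path argument are assumptions; the over-deep case here simply drops the anchor, which is one reading of "degrade to no leading dot".

```python
import ast
from pathlib import PurePosixPath


def absolute_import_names(source: str, file_rel_path: str) -> list:
    """Rebuild dotted import names, anchoring relative imports to the package."""
    package_parts = list(PurePosixPath(file_rel_path).parent.parts)
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom):
            continue
        module_parts = node.module.split(".") if node.module else []
        if node.level == 0:
            base = module_parts  # absolute import: unchanged
        else:
            climb = node.level - 1  # level=1 -> own package, level=2 -> parent, ...
            if climb > len(package_parts):
                anchor = []  # over-deep: no anchor can be reconstructed
            else:
                anchor = package_parts[: len(package_parts) - climb]
            base = anchor + module_parts
        for alias in node.names:
            names.append(".".join(base + [alias.name]))
    return names
```

For a file at `pkg/sub/mod.py`, `from . import helpers` becomes `pkg.sub.helpers`, which a resolver can then map to `pkg/sub/helpers.py`.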
Resolution defects in the Ruby and PHP call-graph builders dropped real edges and emitted false ones.
Each parser is fixed parser-native with its own grammar; no token was copied between them.
Ruby (parsers/ruby/call_graph_builder.py):
- Module-function resolution: add a methods_by_module index + thread caller module_name. Module
self/sibling and Module.method calls now resolve; a bare call from outside any module no longer leaks
an edge to a module function (the same-file and unique-name fallbacks now require both class_name and
module_name unset).
- super: visit the bare `super` node and the `super(args)` call head; resolve to the same-named
method on the superclass.
- Class.new: resolve `new` to the class initialize before the builtin filter that previously dropped
it.
- require_relative anchoring: anchor require_relative to the caller dir and normalize ./ and ../;
match require by anchored file name, replacing the unanchored substring that over-matched any path
containing the import string.
- send/public_send/__send__: read the literal symbol argument and resolve the dispatched method.
PHP (parsers/php/call_graph_builder.py):
- Cross-namespace bare-call leak: thread the caller's namespace_name and scope the unique-name
fallback to the caller's namespace so a bare call no longer leaks across namespaces. PHP-native
analog of the Ruby module-leak fix (different grammar/fields; implemented independently).
Tests: tests/parsers/ruby/test_call_graph_builder.py (10),
tests/parsers/php/test_call_graph_builder.py (2). Both modules loaded under unique importlib names
(call_graph_builder.py / function_extractor.py are basenames shared by every parser). 11 failed
pre-fix (1 same-namespace guard green at base by design) -> 12 passed after the fix; ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
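The require_relative anchoring described for Ruby above amounts to joining the target onto the caller's directory and normalizing the result. A minimal sketch, assuming posix-style repo-relative keys; the function name is hypothetical and appending `.rb` is an assumption about how file keys are stored.

```python
import posixpath


def resolve_require_relative(caller_file: str, target: str) -> str:
    """Anchor a require_relative target to the caller's directory."""
    caller_dir = posixpath.dirname(caller_file)
    # normpath collapses './' and '../' segments.
    resolved = posixpath.normpath(posixpath.join(caller_dir, target))
    return resolved if resolved.endswith(".rb") else resolved + ".rb"
```

The resolved path can then be compared for equality against call-graph file keys instead of using a substring test.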
…zig grammar + lazy import
Pinned against the actually installed grammar (tree-sitter-zig 1.1.2) by
AST-probing it directly; several AST node-type names had drifted. Three defects
in libs/openant-core/parsers/zig/function_extractor.py:
1. Struct extraction was dead and fields could be emitted as functions. Struct
handling gated on node types the 1.1.2 grammar never emits (VarDecl /
container_decl / ContainerDecl); the grammar emits variable_declaration and
struct_declaration, so total_classes was always 0 and the literal
container_field->method path was unreachable. Corrected the node-type names;
methods now flow through the recursive walk which only treats
function_declaration as a unit, so container_field (struct FIELDS) are never
emitted. The dedicated _extract_struct_methods helper (home of the original
container_field bug) is removed as superseded.
2. A struct method was emitted twice (qualified Type.method by the eager scan,
then bare by an unconditional recursion that never threaded current_struct).
Now thread the struct name into the recursion so a method produces the same
func_id and the dict de-duplicates instead of producing a bare twin. Also fix
a name-capture bug in _extract_function (it grabbed the last identifier,
recording `fn init(...) Point {` under the return type `Point`); it now
captures the first identifier and stops. Regression test pins that the de-dup
does not drop nested struct / method / @import.
3. tree_sitter_zig was imported at module top level and called at class-body
time, raising ImportError in any clean install lacking the package, which was
declared in neither pyproject.toml nor requirements.txt. Load the grammar
lazily on first FunctionExtractor construction with a clear error message, and
declare tree-sitter-zig>=1.1.2 in pyproject.toml + requirements.txt.
(call_graph_builder.py carries the same top-level import but is out of this
change's scope; the dependency declarations cover it too.)
New tests/parsers/zig/test_zig_extractor_followup.py (loaded via importlib unique
name because function_extractor.py recurs across parser packages). RED at base
6 failed / 1 passed; GREEN 7 passed. Full suite 183 passed, 63 skipped, 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
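The lazy-import pattern from item 3 of the commit above can be sketched generically. The `LazyGrammar` class is hypothetical and takes the module name as a parameter, so nothing here depends on the tree_sitter_zig API; the real extractor loads the grammar on first FunctionExtractor construction.

```python
import importlib


class LazyGrammar:
    """Defer an optional import until first use and fail with a clear message."""

    def __init__(self, module_name: str, package_hint: str):
        self._module_name = module_name
        self._package_hint = package_hint
        self._module = None

    def load(self):
        if self._module is None:
            try:
                self._module = importlib.import_module(self._module_name)
            except ImportError as exc:
                raise ImportError(
                    f"{self._module_name} is required for this parser; "
                    f"install it with: pip install '{self._package_hint}'"
                ) from exc
        return self._module
```

Importing the parser module itself then no longer fails on a clean install; only constructing the extractor without the grammar does, with an actionable message.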
…eachability crash, stop silent zero-seed blackout
Six filed defects converge on the shared EntryPointDetector + reachability seeding
path; one is a Zig-local crash.
Central detector (utilities/agentic_enhancer/entry_point_detector.py):
- Add 'main' / 'http_handler' / 'middleware' to ENTRY_POINT_TYPES. The C/Go/Zig
parsers emit these program/web entry unit_types, but the set only knew the
Python/Express web vocabulary, so a compiled binary seeded zero entry points
(total reachability blackout).
- Add _unit_type() dual-key read (snake 'unit_type' + camel 'unitType') for
Check-1, Check-4, details and statistics. The per-parser reachable path
normalizes under camelCase 'unitType' (parsers/{c,php,ruby}/test_pipeline.py:257)
while the detector read snake-only, so Check-1/Check-4 were dead on that path
even for valid entry types. (The earlier-filed file_path/filePath locus is
phantom; the operative residual is exactly this dual-read gap.)
Zig classifier (parsers/zig/function_extractor.py):
- _classify_function: main -> 'main' (was generic 'function'), matching C/Go so a
Zig binary's entry point is now seedable.
Zig reachability filter (parsers/zig/test_pipeline.py):
- Rewrite apply_reachability_filter to the real EntryPointDetector(functions,
call_graph).detect_entry_points() / ReachabilityAnalyzer(functions,
reverse_call_graph, entry_points).get_all_reachable() contract. The old code
called a non-existent API; imports succeed (sys.path), so except ImportError
never fired and the wrong-arity TypeError crashed every Zig parse at
--processing-level reachable. Derived against Zig's own snake-case data shape
(no token-copy of the C normalizer).
Empty-seed safety-net (core/parser_adapter.py):
- A zero entry-point seed previously emptied the dataset silently (100% reduction
reported as SUCCESS) — the dominant failure for non-web library/stdlib targets.
Degrade to pass-through (units preserved, filtering NOT applied) + record a loud
warning in the filter metadata so the blackout can never be silent. The broader
generic-library seeding heuristic is architectural and out of scope.
Tests (RED->GREEN): tests/test_entry_point_detector_native_seeds.py (6),
tests/parsers/zig/test_zig_main_classification.py (3),
tests/parsers/zig/test_zig_reachability_api.py (2, reproduced the exact
TypeError), tests/test_reachability_empty_seed.py (2). 12 failed pre-fix (1
guard green at base by design) -> 13 passed after the fix. Full suite 189 passed /
63 skipped / 0 failed. go test/vet/build clean in parsers/go/go_parser (types.go
unchanged, gofmt-clean). ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
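Two ideas from the commit above, the dual-key unit-type read and the empty-seed pass-through, can be sketched as follows. The function names, the unit dict shape and the metadata keys are assumptions for illustration, not the detector's or adapter's actual code.

```python
def unit_type(unit: dict):
    """Read the unit type under either the snake_case or camelCase key."""
    return unit.get("unit_type") or unit.get("unitType")


def filter_reachable(units: list, reachable_ids: set, entry_points: list):
    """Filter to reachable units, but pass everything through on an empty seed."""
    if not entry_points:
        # Zero seeds would otherwise empty the dataset silently.
        metadata = {
            "applied": False,
            "warning": "no entry points detected; reachability filter not applied",
        }
        return list(units), metadata
    kept = [u for u in units if u["id"] in reachable_ids]
    return kept, {"applied": True, "warning": None}
```

The design choice is that a missing seed is reported in the metadata instead of showing up as a 100% reduction.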
…auses
Three bugs in core/scanner.py, fixed together as they share the optional-stage skip/error machinery.
Error-handling: the optional enhance, verify and dynamic-test stages ran inside step_context but had
no inner try/except, unlike app-context and llm-reachability. step_context re-raises
(core/step_report.py:57), so an optional-stage error escaped scan_repository to cli.py's blanket
except and discarded the completed parse/analyze work. Each optional stage now warns and continues on
failure, matching the existing app-context pattern. Required stages (parse, analyze, build-output)
still propagate their errors.
Skip-cause conflation: skipped_steps recorded one bare string for distinct causes (verify auto-skip
vs opt-out both -> 'verify'; dynamic-test collapsed several causes -> 'dynamic-test'). Fixed
additively: a new skipped_step_reasons dict on ScanResult (emitted as steps_skipped_reasons in
scan.report.json) records the disambiguated cause per skipped step. The bare skipped_steps /
steps_skipped strings are unchanged so existing telemetry consumers keep working.
New tests/test_scanner.py (7 tests; LLM/Docker stages monkeypatched so the orchestration runs
offline). Full suite 183 passed, 63 skipped, 0 failed. ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
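The warn-and-continue behaviour for optional stages described above can be sketched as a small runner. The `run_stages` function, the stage tuples and the reason strings are hypothetical; scan_repository's real structure is not shown on this page.

```python
import warnings


def run_stages(stages):
    """Run (name, callable, optional) stages; isolate optional failures.

    Required stages propagate their errors. Optional stages warn, record a
    per-stage reason, and let the remaining stages run.
    """
    completed = []
    skipped_reasons = {}
    for name, func, optional in stages:
        try:
            func()
            completed.append(name)
        except Exception as exc:
            if not optional:
                raise
            warnings.warn(f"optional stage {name!r} failed: {exc}")
            skipped_reasons[name] = f"error: {exc}"
    return completed, skipped_reasons
```

Keeping the reason in a separate dict mirrors the additive approach in the commit: the list of stage names keeps its shape for existing consumers.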
…lyze
go/test_pipeline.py ran `codeql database create` with timeout=600s and `codeql database analyze` with
timeout=1800s. DB creation compiles the source -- the slower stage for compiled languages -- so the
slower stage had the smaller budget and timed out first. On a create timeout run_codeql_analysis
records the codeql stage success=False and returns; run_full_pipeline prints "continuing with
reachable units only" and proceeds, and apply_codeql_filter (test_pipeline.py:1009, success-gated) is
skipped -- so the written dataset is reachable-only with CodeQL findings dropped. The run does report
success=False / exit 1 (all_success ANDs the failed stage, :1040-1045, :1199), so an exit-code CI
gate still catches it; the harm is the silently-degraded artifact for any consumer that reads the
results rather than the exit code.
Extract both stage timeouts to named module constants and set the create budget equal to analyze
(1800s), so the create stage is no longer shortchanged.
Scope: the c/php/js/ruby test_pipeline.py orchestrators carry the same inverted timeouts and are
separate units -- not widened here.
Tests: tests/test_go_test_pipeline_codeql_timeout.py -- two checks that read the source rather than
importing the module (which does module-level sys.path manipulation and shares its name with sibling
parser test_pipeline.py): the create timeout >= analyze timeout, and that both call sites pass the
named constants (not inline literals, which could silently drift the budget back). RED -> GREEN; full
suite 178 passed, 63 skipped, 0 failed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…PHP entry points
Two extraction/seeding defects on the PHP analysis path, plus the PHP
entry-point-seeding gap they depend on. All changes are confined to the extraction
layer (parsers/php/function_extractor.py) and the entry-point detector
(utilities/agentic_enhancer/entry_point_detector.py); the PHP call-graph builder is
untouched.
1. Procedural top-level blackout:
_extract_functions_from_tree emitted units only for named definitions; top-level
procedural statements (assignments, echo, add_action(...) hook registrations)
fell through the catch-all else and produced NO unit, so a WordPress-style
plugin.php was invisible to reachability seeding. The Python parser has a
module-level synthesizer (extract_module_level_code -> unit_type='module_level');
PHP had none. Adds _extract_module_level_unit (called from process_file),
synthesising a <file>:__module__ unit from program-level statements. Handles
braceless + braced namespaces; emits nothing for files with no file-scope code.
2. PHP entry-point seeding:
entry_point_detector USER_INPUT_PATTERNS / MODULE_LEVEL_INPUT_PATTERNS were
Python/JS-only, so a PHP handler reading $_POST was never an entry point (Check 3)
and the module_level unit could not fire Check 4. Adds PHP superglobals
($_GET/$_POST/$_REQUEST/$_COOKIE/$_SERVER/$_FILES/$_ENV/$_SESSION), php://input /
filter_input, and WordPress hook idioms (add_action/add_filter/do_action/
apply_filters) for the module-level path.
3. Anonymous closures + arrow functions as units:
anonymous_function / arrow_function nodes fell through the same else and were
never modeled; the named-definition walk also did not descend into function/method
bodies, so nested closures were unreachable. Adds a closure dispatch branch
(unit_type='closure', synthetic {closure@line:col} name) and makes
function_definition / method_declaration recurse into their bodies. The
closure-DISPATCH edge ($cb() -> closure) lives in call_graph_builder.py and is out
of this file's scope; this fixes only the extraction half.
Out of scope (not fixed here):
- The use_declaration -> namespace_use_declaration node-type correction is already
handled by the existing PHP import-extraction code in _extract_imports; re-touching
it here would duplicate that change. Left untouched.
- Aliased `use Foo\Bar as Baz` -> alias-to-FQN translation lives in
call_graph_builder.py::_resolve_class_call (out of this file's scope). An
alias-capture in function_extractor alone would be unobservable (the only consumer
of the imports map is call_graph_builder.py) and would risk regressing
import-matching. No no-op change made.
Tests: tests/parsers/php/test_php_extractor.py (new; the package had no PHP
extractor tests). Modules loaded under unique importlib names (function_extractor.py
is a basename shared by every parser). Eight tests covering module_level synthesis,
no-false-positive on class-only files, PHP superglobal entry-point seeding (Check 3
+ Check 4), and closure/arrow-function unit extraction. 6 failed pre-fix (2 guard
tests green at base by design) -> 8 passed after the fix. ruff clean; full suite 184
passed / 63 skipped (suite excluding the new file is exactly 176/63 — zero
regression).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…de orchestrators from pytest
Three defects surfaced via parsers/c/test_pipeline.py (the C pipeline orchestrator):
1. CodeQL build mode:
`codeql database create` for cpp omitted --build-mode=none, so CodeQL defaulted to
autobuild and silently degraded (dropping findings) on the no-build / extracted-source
repos this pipeline runs on. Pass --build-mode=none.
2. Optional-stage success:
overall success ANDed over ALL recorded stages, but the optional stages (CodeQL
analysis/filter, reachability filter, context enhancer, exploitable filter) write
success=False when they fail or are skipped -- so an optional-stage failure forced a
spurious pipeline failure (exit 1). Introduce OPTIONAL_STAGES + a _compute_success()
that requires only the non-optional stages. (Cross-parser family: the same all_success
conjunction exists in the go/php/ruby/javascript pipelines -- separate units, not
widened here.)
3. pytest collection collision:
the six parsers/<lang>/test_pipeline.py files are CLI pipeline ORCHESTRATORS, not
pytest tests, but share a basename, so `pytest parsers/` fails with
import-file-mismatch. __init__.py does not fix it (their bare local imports --
`from repository_scanner import ...` -- make them un-importable as package modules;
adding __init__.py merely moves the collision). Add a root conftest.py that excludes
them from collection (collect_ignore_glob), which is correct since they are not tests.
The canonical suite is unaffected (pytest.ini already scopes testpaths = tests).
Scope: c-specific for #1 (go's codeql create may need its own --build-mode -- separate
unit) and #2 (the all_success family is fixed per-parser); #3 is repo-wide (the
conftest excludes all six orchestrators). function_extractor / other parsers unchanged.
Tests: tests/test_c_pipeline.py -- (1) source-read that the C codeql create cmd passes
--build-mode=none; (2) behavioral _compute_success() ignores optional-stage failures
but fails on a required-stage failure (loaded via a CONTAINED importlib import that
pops the polluting sibling-parser modules in a finally, since c/test_pipeline.py does
bare local imports); (3) the root conftest excludes the orchestrators. RED 3 failed
(pre-fix) -> GREEN; ruff clean; full suite 179 passed, 63 skipped, 0 failed;
`pytest parsers/ --co` now collects cleanly (was import-file-mismatch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
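The optional-stage rule from defect 2 can be sketched as follows. OPTIONAL_STAGES and _compute_success are named in the commit message, but the stage keys and the shape of the per-stage record used here are assumptions for illustration only.

```python
# Illustrative stage keys; the real OPTIONAL_STAGES set may use other names.
OPTIONAL_STAGES = {
    "codeql_analysis",
    "codeql_filter",
    "reachability_filter",
    "context_enhancer",
    "exploitable_filter",
}


def compute_success(stages):
    """Overall success requires only the non-optional stages to succeed.

    `stages` is assumed to map stage name -> {"success": bool}.
    """
    return all(
        record.get("success", False)
        for name, record in stages.items()
        if name not in OPTIONAL_STAGES
    )
```

With this shape a skipped or failed optional stage no longer forces exit 1, while a failed required stage still does.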
…n/recall
c/call_graph_builder built include_map by basename/suffix match into an unordered
set, then resolved calls by first-of-set, and the regex fallback / call-name
extraction emitted several classes of false or missing edges. This repairs both
the include-resolution determinism and the resolver precision/recall in one change
to the C call-graph builder.
Deterministic, path-anchored include resolution:
Two compounding faults:
(1) bare endswith(inc) over-matched any tail (include "x.h" -> "src/prefix-x.h")
and every same-basename header repo-wide;
(2) first-match over the unordered set made the resolved callee depend on set
iteration order -> flipped across PYTHONHASHSEED.
Fix: require a path-component boundary (other_file == inc or endswith('/'+inc))
and iterate sorted(included_files) for a stable lexicographic tiebreak.
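A minimal sketch of the boundary check and the sorted tiebreak described above. The helper name resolve_include is hypothetical; only the match condition and the sorted() iteration come from the commit message.

```python
def resolve_include(inc, included_files):
    """Return the first file (in lexicographic order) that matches `inc`
    on a path-component boundary, or None if nothing matches."""
    for other_file in sorted(included_files):
        # Exact match, or `inc` preceded by a '/' so "prefix-x.h" cannot match "x.h".
        if other_file == inc or other_file.endswith("/" + inc):
            return other_file
    return None
```

Because the candidates are sorted before the first match is taken, the result does not depend on set iteration order and therefore not on PYTHONHASHSEED.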
Resolver precision/recall:
- _extract_calls_regex scanned raw code, so call-shaped tokens inside // or /* */
comments or "..." / '...' literals became phantom edges. Blank out comments and
string/char literals (length/newline preserving) before the regex scan.
- obj->cb() (field_expression callee) was reduced to the bare field name and
resolved against the global free-function index, wiring the member/function-pointer
call to an unrelated free function. _extract_call_name now declines (returns None)
for field_expression -> no false edge. The resolver has no receiver-type model, so
name-only binding of a member call is wrong.
- A function passed by name as a callback argument (qsort(..., my_cmp),
pthread_create(..., worker, ...), signal(2, &handler)) produced no edge because
only the call's 'function' child was inspected. Scan the 'arguments' child for
bare-identifier / &name args that resolve to a known function and emit the
caller->callback edge. Non-function identifiers do not resolve, so data args
create no edge.
- The unique-name (and included-header and prototype) fallbacks returned a static
(file-local) function in another translation unit, violating C internal linkage.
New _is_visible_from guard requires a cross-file candidate to be non-static;
same-file definitions stay visible. is_static and file_path are already in the
extractor output.
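The first bullet's length- and newline-preserving blanking could look like the sketch below. It is a simplified scanner written for illustration, not the builder's actual code, and the function name is hypothetical.

```python
def blank_comments_and_literals(code):
    """Replace C comments and string/char literals with spaces.

    Newlines are kept and every other masked character becomes a space,
    so offsets and line numbers in the result match the original.
    """
    out = []
    i, n = 0, len(code)
    while i < n:
        ch = code[i]
        two = code[i:i + 2]
        if two == "//":
            # Line comment: mask up to (not including) the newline.
            j = code.find("\n", i)
            j = n if j == -1 else j
        elif two == "/*":
            # Block comment: mask through the closing */ (or to EOF).
            j = code.find("*/", i + 2)
            j = n if j == -1 else j + 2
        elif ch in "\"'":
            # String or char literal: skip escaped characters.
            j = i + 1
            while j < n and code[j] != ch:
                j += 2 if code[j] == "\\" else 1
            j = min(j + 1, n)
        else:
            out.append(ch)
            i += 1
            continue
        out.append("".join(c if c == "\n" else " " for c in code[i:j]))
        i = j
    return "".join(out)
```

Running the call-shaped regex over the blanked text instead of the raw source means tokens such as g() inside a comment can no longer become phantom edges.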
Out of scope (no code change): recording unresolved/ambiguous direct-name calls,
field-pointer derefs, and tbl[i]() subscript-expression callees into an
indirect_calls store — that sink does not exist in any of the parsers (0 producers /
0 consumers); building it end-to-end is out of scope. Dropping these calls emits no
false edge (the safe outcome).
Tests (hermetic): tests/parsers/c/test_c_include_map_determinism.py (2; over-match
+ {F_bar,F_foo} flip across PYTHONHASHSEED 0..9) and
tests/parsers/c/test_c_call_resolution_precision.py (9). Full suite: 187 passed,
63 skipped (no regression). ruff check -> All checks passed.
Scope: C-only (grep -c 'include_map\[.*\] = set()': c=1, python/php/ruby/zig=0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sembler
context_assembler.js omitted .mjs (ESM) and .cjs (CommonJS) from both its
source-discovery glob (findSourceFiles) and its import-resolution extension lists. So
.mjs/.cjs files were never loaded into the TypeScript program (getSourceFile ->
undefined) and imports targeting them were silently unresolved -- dropped call-graph
edges. The sibling repository_scanner.js already includes .mjs/.cjs in its
sourceExtensions; this brings context_assembler to the same canonical set
(.js/.ts/.jsx/.tsx/.mjs/.cjs).
Three sites updated:
- findSourceFiles discovery glob: add mjs|cjs.
- import-resolution extensions: add .mjs/.cjs + /index.mjs / /index.cjs.
- resolveModuleFunctionDefinition extensions (the sibling site -- it omitted even
.jsx/.tsx): bring to the canonical set.
Tests: tests/test_context_assembler_mjs.py (ContextAssembler.findSourceFiles discovers
.mjs/.cjs; skips portably where the JS parser's node deps aren't installed). RED 1
failed (['main.js']) -> GREEN 1 passed; JS parser suite 10 passed; full venv suite 218
passed, 22 skipped, 0 failed (with node deps installed, which unlocks the
otherwise-skipped JS tests). node --check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ification bugs
The Ruby tree-walk dispatched on only method/singleton_method/class/module nodes
and tracked no method visibility, so several real Ruby constructs were silently
dropped or mislabeled, degrading the function inventory and the call graph /
entry-point analysis built on it.
Fixes:
- Nested (`module Outer; module Inner`) and compact (`module Outer::Inner`) module
names were flattened/dropped — now read via the `name` field (handles constant +
scope_resolution) and concatenated as `Outer::Inner` instead of replaced, so
qualified calls resolve.
- `define_method(:sym)` do..end / { } class-body metaprogramming now emits method
units (parses as a `call` node).
- `alias_method :a, :b` (symbol + string forms) now extracted.
- `alias new old` keyword form (`alias` node) now extracted.
- Method visibility (private/protected/public) now threaded through the traversal
and stored on each unit; the arg-form `private :sym` is consumed without leaking
into the bare-marker toggle.
- Non-public controller methods (params helpers, before_action targets) classified
before the controller branch, so they are no longer over-flagged as route_handler
/ entry points; public actions unchanged.
- Top-level Sinatra route DSL (`get '/path' do..end`) now emits route_handler units.
Method-body calls are never traversed (no false positives); non-matching
top-level/class-body calls fall through to a normal child descent.
Scope / compatibility: adds a `visibility` field to emitted units (additive).
`unit_type` for controller non-public methods narrows from `route_handler` to
`private_method`/`protected_method` (correctness fix; removes the entry-point
over-claim). `_classify_function` and `_process_method_node` gained an optional
`visibility` parameter (default `public`); no public API signature break. The
arg-form `private :sym` targeted privatization is out of scope and left as a
documented no-op toggle.
Tests: tests/parsers/ruby/test_function_extractor.py. The test imports the Ruby
extractor package-qualified (from parsers.ruby.function_extractor import
FunctionExtractor) so it does not shadow the python/php modules of the same basename
in the shared pytest session. 9 failed pre-fix -> 9 passed after the fix (8 bug
tests + 1 arg-form visibility-leak regression guard). Full suite: 185 passed, 63
skipped, 0 failed. ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…name (zig/python/php/ruby)
Each parser's RepositoryScanner test-file detection used an unanchored
substring scan (`pattern in path_lower` / `pattern in part`) over its own
test-token set, so ordinary sources whose name merely contained a token
(latest_release, Contest.php, fastest.zig, contest/) were misclassified as
tests and silently pruned from scan_results['files'] when skip_tests=True.
The downstream extractor only consumes that file list, so functions in those
files vanished from the parsed dataset (silent recall loss).
Fix per parser, each keeping its OWN native token set (not copied across
parsers): match directory tokens against whole path components and filename
tokens against the basename via startswith/endswith.
- python -- dirs {test, tests}; basename prefix test_; suffix _test.py; exact conftest.py
- php (PHPUnit) -- dirs {test, tests, spec}; prefix test_/phpunit; suffix _test.php; PSR
PascalCase *Test.php matched on the original-case basename so FooTest.php is a test
while Contest.php is not
- ruby (Minitest/RSpec) -- dirs {test, tests, spec}; prefix test_; suffix _test.rb/_spec.rb
- zig -- dirs {test, tests, spec, specs} exact; basename stem prefix test_/spec_; suffix
_test.zig/_spec.zig
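As an illustration of the anchored matching, here is a sketch using the python token set listed above. The function body is an assumption about shape, not the scanner's actual code.

```python
from pathlib import PurePosixPath

# Directory tokens for the python parser, per the list above.
TEST_DIRS = {"test", "tests"}


def is_test_file(path):
    """Match directory tokens on whole path components and filename
    tokens on the basename, instead of an unanchored substring scan."""
    p = PurePosixPath(path.lower())
    # Directory tokens: compare against each parent component exactly.
    if any(part in TEST_DIRS for part in p.parts[:-1]):
        return True
    name = p.name
    return (
        name.startswith("test_")
        or name.endswith("_test.py")
        or name == "conftest.py"
    )
```

Names that merely contain a token (latest_release.py, contest/) are no longer classified as tests, while genuine test files still are.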
The zig scanner's `skip_tests` constructor default is also aligned to False (it
was the lone parser defaulting True, making the data loss active-by-default for
programmatic callers; the CLI already overrides via --skip-tests).
The C scanner's test detection and the JS scanner are out of scope here (the JS
scanner's isTestFile is already anchored). C function_extractor extract_all
exclusion is a different mechanism and is not touched.
Per-parser regression tests (importlib unique-name load, since repository_scanner.py
recurs across parser dirs) pin both directions: previously-misclassified non-test
names now return False, and genuine test files still return True. RED pre-fix:
python 3 / php 2 / ruby 2 / zig 6 failed. GREEN: 31 passed. Full suite 207 passed,
63 skipped, 0 failed. ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…me boundaries
RepositoryScanner.is_test_file matched test_patterns as bare substrings
(`pattern in path_lower`), so ordinary source files whose name/path merely CONTAINED a
pattern were wrongly skipped as tests when skip_tests=True:
- latest_value.c (contains 'test_') -> skipped
- contest/foo.c (contains 'test/') -> skipped
Match directory patterns (test/, tests/, fuzz/) against whole path *segments*, and
filename patterns against the basename: '_test' as a stem suffix and 'test_' as a
prefix. The _test suffix is matched on the stem (before the extension) so _test.cc /
_test.cxx -- which the old substring check also caught -- stay detected, bounded by
is_source_file.
Adds tests/parsers/c/test_repository_scanner_is_test_file.py (RED->GREEN; both
true-positive and false-positive directions, incl. the _test.cc regression case).
GREEN: 2 passed (venv py3.11); ruff clean.
The same substring bug exists in the python/php/ruby repository_scanner.py -- handled
in their own PRs (systemic family).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…terrupted writes
write_json opened the target in "w" (truncating it to 0 bytes) and then streamed json.dump.
A crash mid-serialization (SIGKILL / OOM / power loss) left the target partial or empty and
destroyed the prior good copy, so the next read_json raised JSONDecodeError -- e.g. a
corrupted checkpoint.json that breaks resume.
Serialize to a temp file in the same directory, flush + fsync, then os.replace onto the
target (atomic rename). An interrupted write leaves only the temp file (unlinked on a caught
error); the existing target is never truncated. All write_json call sites inherit the fix.
The temp uses a `.tmp` suffix (not `.json`) so a hard-crash leftover can't be miscounted by
directory scanners that do os.listdir + endswith(".json") (e.g. core/checkpoint.py).
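The temp-file-then-replace sequence can be sketched as below. This is an illustration of the technique under the constraints the message states (same directory, `.tmp` suffix, flush + fsync, os.replace); the function name and exact error handling are assumptions, not the project's write_json.

```python
import json
import os
import tempfile


def write_json_atomic(path, data):
    """Serialize to a sibling temp file, then atomically rename onto `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory keeps the rename on one filesystem; ".tmp" (not ".json")
    # keeps a crash leftover out of endswith(".json") directory scans.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # the target is never truncated in place
    except BaseException:
        # On a caught error, remove the temp file and leave the target alone.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A failure during serialization leaves the previous file readable, which is the property the interrupted-write test pins.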
Tests: tests/test_file_io_atomic_write.py (2 cases: interrupted write preserves the prior
file + no temp leak; normal round-trip incl. non-ASCII). RED 1 failed (JSONDecodeError) ->
GREEN 2 passed (venv py3.11). ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… timeout
Invoke built the Python subprocess with exec.Command (no context, no deadline), so a
hung parser blocked the CLI forever on io.Copy/cmd.Wait. The only recovery was a
user-delivered SIGINT, leaving headless callers (CI, scheduled scans, checkpoint.go
quiet=true) with no recovery path.
Switch the single subprocess-build site to exec.CommandContext driven by
context.WithTimeout(defaultInvokeTimeout) -- mirroring cmd/docker.go. The Invoke
signature is unchanged, so all 10 callers are unaffected and the SIGINT fast-path is
kept as a secondary mechanism.
Killing the process is not sufficient on its own: a descendant can hold the
stdout/stderr pipe write-ends open, leaving io.Copy blocked after the parent dies.
cmd.WaitDelay force-closes the inherited FDs, and a watchdog goroutine closes the pipe
read-ends on ctx.Done() so the in-flight reads return.
defaultInvokeTimeout defaults to 30m and is a package var so tests can shrink it. New
invoke_test.go::TestInvoke_HangingSubprocessIsBoundedByTimeout runs a fake parser that
sleeps past the deadline and asserts Invoke returns within a bounded window (RED:
blocked the full budget; GREEN: returns at the deadline). go test ./internal/python/
ok (21 passed); go vet + gofmt clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…analyzer_output
generate_analyzer_output() emitted only {"functions": ...}, dropping the top-level callGraph /
reverseCallGraph that the analyzer_output schema carries. They were already computed in
call_graph_result (CallGraphBuilder.export() returns call_graph / reverse_call_graph) but were
never passed in, so the python parser's analyzer_output.json was schema-inconsistent with the
documented format.
The analyzer_output schema uses camelCase top-level keys: PARSER_UPGRADE_PLAN documents
"callGraph", and the JS pipeline emits/asserts camelCase callGraph / reverseCallGraph
(funcId -> [funcIds]). Pass call_graph_result into generate_analyzer_output (optional arg,
back-compatible) and emit those two keys from its snake_case source keys.
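A minimal sketch of the optional-argument shape described above. The key names come from the commit message; the function body and the `functions` parameter are assumptions, and the real generate_analyzer_output takes additional inputs.

```python
def generate_analyzer_output(functions, call_graph_result=None):
    """Emit the analyzer_output dict; the call graphs are added only when
    a call_graph_result is supplied, so existing callers are unaffected."""
    output = {"functions": functions}
    if call_graph_result is not None:
        # snake_case source keys -> camelCase schema keys
        output["callGraph"] = call_graph_result.get("call_graph", {})
        output["reverseCallGraph"] = call_graph_result.get("reverse_call_graph", {})
    return output
```

Called without the second argument it returns the old single-key output, which is what makes the change back-compatible.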
Scope: filePath is intentionally NOT a per-function field -- consumers (RepositoryIndex,
dependency_resolver) derive it from the func_id key via funcId.split(':')[0]. The original
filing's indirect_calls field is phantom (grep finds no such data produced anywhere) and is
excluded. The classes/imports top-level keys are a related but separately-scoped schema item.
Tests: tests/test_parse_repository_analyzer_schema.py (2 cases). RED 1 failed -> GREEN 2 passed
(venv py3.11). ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… and functionId bugs
Fixes a cluster of defects in the JavaScript/TypeScript analysis pipeline,
grouped into five areas plus a local functionId parse fix. All root causes are in
the JS parser files; the c/ruby call_graph_builder.py siblings are correct parity
references and are not edited.
J1 call-edge extraction:
- extractCallsFromFunction passed funcNode.getKind() to getDescendantsOfKind
instead of ts.SyntaxKind.CallExpression, so every out-edge list was empty.
J2 AST inventory gaps (extractFunctionsFromFile now visits these shapes):
- module.exports = class {} (ClassExpression)
- anon export default function, Foo.prototype.bar=fn, this.method=fn,
Object.assign(proto, ...)
- this.x=function(){} assigned in a constructor body
- class get/set accessors were dropped because ts-morph getMethods() excludes
GetAccessor/SetAccessor. Now also iterate getGetAccessors()+getSetAccessors() at
all three class-member sites (inventory builder, callGraph builder, and the
single-member lookup), so `get x()`/`set x(v)` emit as Class.x functions WITH
callGraph companions.
- Object.defineProperty(X.prototype,"n",{value:fn})
- bare top-level CallExpression modules
- HOC const-initializers (memo/forwardRef/styled)
- barrel re-export object-literal property values
J3 Pattern-A companion:
- buildCallGraphForFile now visits the same 5 emit paths as
extractFunctionsFromFile so callGraph keeps pace with functions.
J4 call-edge content:
- walk getArguments() to record callback identifiers
- normalize callee name to an identifier (JS-native)
- bucket unresolved/dynamic calls into indirect_calls
J5 schema and framework classification:
- emit snake_case reverse_call_graph, repository, and per-function parameters
- route-handler coverage now spans Koa/Fastify on BOTH axes -- the param-shape
classifier (_hasRouteHandlerSignature) and the route-REGISTRATION walker.
Added the `route` verb to EXPRESS_VERBS and the `fastify`/`koa` receiver stems
to EXPRESS_RECEIVER_STEMS / _isPlausibleExpressReceiver (additive; existing
Express detection unchanged) so `fastify.get('/x', handler)` and `koa.get(...)`
synthesise route_handler entry-point units. Also fixed the React
(request,response) false-positive.
functionId parse:
- unit_generator.js and test_pipeline.py split the "<filePath>:<functionName>"
id on the last colon, mangling multi-colon Express ids. Switched to first-colon
split to match the dependency_resolver contract. The id format is unchanged
(backward-compatible). The dependency_resolver.js half is deferred.
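The difference between the two splits is easy to see on a multi-colon id. The id below is a hypothetical example of the "<filePath>:<functionName>" format, not one taken from the test fixtures.

```python
def split_function_id(function_id):
    """Split "<filePath>:<functionName>" on the FIRST colon, so colons
    inside the function name stay with the name."""
    file_path, _, function_name = function_id.partition(":")
    return file_path, function_name


# Hypothetical Express-style id with colons in the function-name half.
example_id = "routes/api.js:GET:/users/:id"

first_colon = split_function_id(example_id)
last_colon = tuple(example_id.rsplit(":", 1))  # the old, mangling behavior
```

Here first_colon keeps "routes/api.js" as the file path, whereas last_colon folds most of the function name into it.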
Tests: RED-first tests under tests/parsers/javascript/ drive the Node analyzer /
UnitGenerator on inline fixtures, including class get/set accessors and
Fastify/Koa route registration. Suite: 245 passed, 22 skipped, 0 failed. ruff
clean; node --check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…stale bounded False is corrected
ReachabilityAnalyzer has two cache-population paths that disagree:
- is_reachable_from_entry_point() is a depth-BOUNDED reverse BFS; for a function
farther than max_depth hops from an entry point it caches False -- a false negative;
- get_all_reachable() is an UNBOUNDED forward BFS (the complete reachable set).
get_all_reachable() updated the shared _reachability_cache only for keys NOT already
present, so a stale False seeded by an earlier bounded per-query call was frozen and
never corrected -- the complete pass could not fix it (order-dependent wrong result).
Overwrite the cache unconditionally: get_all_reachable() is authoritative. Because its
forward graph is built from the same reverse edges the bounded BFS walks, this can
only flip a stale False->True, never introduce a false positive.
Tests: tests/test_reachability_cache.py -- a function reachable only beyond max_depth
is cached False by the reverse BFS, then get_all_reachable() corrects it to True (and
a disconnected node stays unreachable, a within-depth node stays reachable). RED 1
failed (stale False frozen) -> GREEN 2 passed (venv py3.11); ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
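The interaction between the two cache-population paths can be reproduced with a toy analyzer. The class below reuses the method and attribute names from the commit message, but the constructor, data layout, and bodies are assumptions written for illustration.

```python
from collections import deque


class ReachabilityAnalyzer:
    def __init__(self, reverse_edges, entry_points, max_depth=2):
        self.reverse_edges = reverse_edges  # callee -> [callers]
        self.entry_points = set(entry_points)
        self.max_depth = max_depth
        self._reachability_cache = {}

    def is_reachable_from_entry_point(self, func_id):
        """Depth-bounded reverse BFS; may cache a false negative."""
        if func_id in self._reachability_cache:
            return self._reachability_cache[func_id]
        seen, queue, found = {func_id}, deque([(func_id, 0)]), False
        while queue:
            node, depth = queue.popleft()
            if node in self.entry_points:
                found = True
                break
            if depth < self.max_depth:
                for caller in self.reverse_edges.get(node, []):
                    if caller not in seen:
                        seen.add(caller)
                        queue.append((caller, depth + 1))
        self._reachability_cache[func_id] = found
        return found

    def get_all_reachable(self):
        """Unbounded forward BFS; authoritative, so it overwrites the cache."""
        forward = {}
        for callee, callers in self.reverse_edges.items():
            for caller in callers:
                forward.setdefault(caller, []).append(callee)
        reachable, queue = set(self.entry_points), deque(self.entry_points)
        while queue:
            node = queue.popleft()
            for callee in forward.get(node, []):
                if callee not in reachable:
                    reachable.add(callee)
                    queue.append(callee)
        for func_id in reachable:
            # Unconditional overwrite: a stale bounded False becomes True.
            self._reachability_cache[func_id] = True
        return reachable
```

On a chain main -> a -> b -> c -> d with max_depth=2, a bounded query for d first caches False; after get_all_reachable() the same query returns True.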
…ead of drifting line numbers
Two comments in validate_unit() cited hardcoded experiment.py line numbers ("lines 186-196",
"line 192") that had drifted: experiment.py:186-196 is now load_application_context(), unrelated
to unit.code/files_included. The actual code/files_included handling lives in experiment.py
analyze_unit(). Replace the brittle line-number citations with the stable function-name anchor so
they cannot re-drift.
Doc/comment-only: no behavioral change, no regression test. Verified: 0 residual
"experiment.py lines N" citations remain; experiment.py analyze_unit() handles code_field
+ files_included; ruff clean (venv py3.11).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…show negative time
ProgressReporter.report() does an unconditional `self.completed += 1`, so a retried
unit re-reports and `completed` can exceed `total`. _estimate_remaining then computed
`remaining_units = self.total - self.completed` < 0, rendering the ETA as a negative
duration (e.g. "ETA ~-30s"). Floor it at max(0, ...).
Tests: tests/test_progress_eta_no_underflow.py (2 cases). RED 1 failed -> GREEN 2
passed; full suite 178 passed / 63 skipped (py3.11).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
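The floor is a one-line change; a standalone sketch follows. The function name and the seconds_per_unit parameter are hypothetical, since the real _estimate_remaining derives its rate from the reporter's own state.

```python
def estimate_remaining_seconds(total, completed, seconds_per_unit):
    """Estimate remaining time; retries can push `completed` past `total`,
    so the remaining-unit count is floored at zero."""
    remaining_units = max(0, total - completed)
    return remaining_units * seconds_per_unit
```

With total=10 and completed=13 the estimate is zero instead of a negative duration.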
…truncation
Two confirmed bugs on the analyze path; both fixes are additive and keep existing
callers' behavior unchanged when the new code path is not taken.
Go --exploitable-all missing:
the Python backend (openant/cli.py analyze_p) defines both --exploitable-all and
--exploitable-only in a mutually-exclusive group (-> exploitable_filter
'all'|'strict'|None), but the Go CLI defined/forwarded only --exploitable-only, so
`openant analyze --exploitable-all` failed with `unknown flag`. The no-flag DEFAULT is
most inclusive (analyzer runs the filter under `if exploitable_filter:`), so the
omitted flag is a narrowing cost filter with no recall loss.
FIX: add --exploitable-all (BoolVar + forward) mirroring the Python help; mark the two
exploitable flags mutually exclusive. Extract the inline argv build into a pure
buildAnalyzePyArgs helper (mirrors buildParsePyArgs) so the flag-forwarding contract
is unit-testable without spawning Python.
--limit drops high-value units:
run_analysis truncated with a raw head-slice `units = units[:limit]` over
parser-sorted alphabetical-by-path units (Doc/ before Lib/), dropping high-value code
with no relevance weighting.
FIX: extract _apply_limit(units, limit) that priority-sorts by enhancement
security_classification (exploitable > vulnerable_internal > other) before the
head-slice; stable within a tier, no-op when limit is unset, reads classification
mode-agnostically (agent_context or llm_context). The bias is bounded to --limit runs;
full runs pass no limit. The sibling head-slice at experiment.py:526 is separate and
not touched here.
Tests: cmd/analyze_flags_test.go (flag defined + forwarded via buildAnalyzePyArgs) +
tests/test_analyzer_limit_priority.py (6 tests). go test ./cmd/ ok, gofmt + vet clean;
python regression suite passing; ruff clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
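The priority-then-slice behavior of _apply_limit can be sketched like this. The tier order comes from the commit message; the unit layout (a security_classification key nested under agent_context or llm_context) is an assumption about the data shape.

```python
# Lower rank sorts first; anything unlisted falls into the "other" tier.
_TIER_RANK = {"exploitable": 0, "vulnerable_internal": 1}
_OTHER_RANK = 2


def _classification(unit):
    """Read the classification from whichever context the unit carries."""
    for key in ("agent_context", "llm_context"):
        context = unit.get(key) or {}
        if "security_classification" in context:
            return context["security_classification"]
    return None


def apply_limit(units, limit):
    """Priority-sort by tier, then head-slice. No-op when limit is unset."""
    if limit is None:
        return units
    # sorted() is stable, so the parser's order is kept within each tier.
    ranked = sorted(units, key=lambda u: _TIER_RANK.get(_classification(u), _OTHER_RANK))
    return ranked[:limit]
```

A limit of 2 over units ordered Doc/ before Lib/ now keeps the exploitable and vulnerable_internal units rather than whichever paths sort first.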
… parity
The JavaScript unit generator declared FILE_BOUNDARY as a function-local
`const` inside _assembleEnhancedCode and exported only `{ UnitGenerator }`, so
`require(...).FILE_BOUNDARY` was undefined. Every sibling parser (python:60,
php/c/ruby:35) declares this marker at module level and makes it importable;
the JS parser was the outlier, trapping the canonical boundary string in a
method body where no external consumer could reach it.
Move the declaration to module level next to the requires, reference it
unchanged inside _assembleEnhancedCode (resolves via lexical scope, output
byte-identical), and add it to module.exports. No runtime/behavior change.
Tests: new tests/parsers/javascript/test_unit_generator_file_boundary.py
(node-subprocess seam) asserts the export exists and equals the canonical
marker string. RED 2 failed pre-fix; GREEN 2 passed. Full suite 219 passed,
22 skipped, 0 failed; ruff + node --check clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pyproject hash
The managed Python venv is a single global path (~/.openant/venv) shared by every
worktree. The staleness check (depsStalenessAt) keyed only on pyproject.toml content,
so two worktrees with identical pyproject.toml were considered "up to date" even
though the venv's editable install (`pip install -e <corePath>`) pointed at whichever
worktree installed last -- a binary built in worktree B would silently import worktree
A's Python source.
Introduce depsHash(corePath) = sha256(corePath + "\0" + pyproject.toml contents) and
use it for both the install baseline stamp and the staleness comparison. A change of
corePath (switching worktrees) now reads as stale and triggers a reinstall that
re-points the editable install at the active source. (hashFile remains a tested
file-hash utility, no longer on the staleness path.)
Tests: internal/python/runtime_corepath_test.go (two worktrees with byte-identical
pyproject but different corePath -> the second is detected as stale; same corePath ->
not stale). RED 1 failed -> GREEN; go test ./internal/python/ ok (existing staleness +
hashFile tests unaffected); gofmt + vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
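The hash formula itself is language-neutral; the sketch below restates it in Python for illustration (the actual depsHash lives in the Go CLI). The function name, paths, and output encoding here are assumptions.

```python
import hashlib


def deps_hash(core_path, pyproject_bytes):
    """sha256(corePath + "\\0" + pyproject.toml contents), hex-encoded."""
    h = hashlib.sha256()
    h.update(core_path.encode("utf-8"))
    h.update(b"\0")  # separator so path and contents cannot run together
    h.update(pyproject_bytes)
    return h.hexdigest()


# Two worktrees with byte-identical pyproject.toml but different core paths.
pyproject = b'[project]\nname = "openant"\n'
stamp_a = deps_hash("/work/tree-a/core", pyproject)
stamp_b = deps_hash("/work/tree-b/core", pyproject)
```

Because the path is part of the digest, stamp_a and stamp_b differ, so switching worktrees reads as stale even when the dependency list is unchanged.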
…sts (CWE-732)
config.Save() wrote the config (which may hold an API key) with os.WriteFile(path,
data, 0600). os.WriteFile only applies the mode when it CREATES the file; if
config.json already existed with looser permissions (e.g. 0644), the secret stayed
world/group-readable after Save(). Enforce 0600 explicitly via os.Chmod after the
write (CWE-732, insecure permissions for a secret-bearing file).
Tests: internal/config/config_perms_test.go (a pre-existing 0644 secret config becomes
0600 after Save). RED 1 failed (0644) -> GREEN; go test ./internal/config/ ok; gofmt +
go vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These PRs predated #69's llm-provider refactor and were never rebased, so their tests
referenced removed APIs (AnthropicClient, ContextEnhancer(client=)) and #69's new
registry credential probe. Updated test construction/mocking to the post-#69 API only;
no assertions changed, no production code touched. Full suite: 729 passed, 0 failed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>