fix: cap wildcard import expansion to avoid token explosion by mashraf-222 · Pull Request #1951 · codeflash-ai/codeflash

mashraf-222 · 2026-04-01T16:22:00Z

Problem

Wildcard imports like import org.jooq.* expand to 870+ types, causing 5 minutes of disk I/O per function before discovering the 4000-token skeleton budget is exceeded. In jOOQ, 89% of functions (70/79) were skipped due to token overflow from wildcard imports.

The expand_wildcard_import() function globs all .java files in the package directory unconditionally, and the token budget check in get_java_imported_type_skeletons() only fires after reading each file and parsing its skeleton — by which point hundreds of files have already been read from disk.

Root Cause

context.py:933-940: Wildcard expansion happens without any count limit or early bailout.
import_resolver.py:223-252: expand_wildcard_import() returns all types unconditionally.

Fix

`import_resolver.py`

- Added a max_types parameter to expand_wildcard_import() for early termination
- Added a filter_names parameter to include only types whose names are in a given set
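A minimal sketch of how the two parameters could behave, not the PR's actual code. It assumes the function receives a package directory directly and returns plain type names; the real expand_wildcard_import() signature and return type are not shown on this page, so treat everything except the parameter names max_types and filter_names as hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def expand_wildcard_import(
    package_dir: Path,
    max_types: int | None = None,
    filter_names: set[str] | None = None,
) -> list[str]:
    """Return type names for the .java files in package_dir (hypothetical shape)."""
    found: list[str] = []
    # sorted() only lists directory entries; no file contents are read here.
    # It also makes "the first N found" deterministic in this sketch.
    for java_file in sorted(package_dir.glob("*.java")):
        if max_types is not None and len(found) >= max_types:
            break  # early termination once the cap is reached
        name = java_file.stem
        if filter_names is not None and name not in filter_names:
            continue  # skip types the caller did not ask for
        found.append(name)
    return found
```

Both parameters default to None, so existing callers that pass neither would keep the unbounded behavior.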

`context.py`

Added the constant MAX_WILDCARD_TYPES_UNFILTERED = 50
When a wildcard expands to more than 50 types:
- If the target code references specific types → re-expand with filter_names=priority_types (only referenced types)
- If no target types are available → cap at 50 (the first 50 found)
Small wildcards (≤50 types) are expanded fully, as before

This turns a 5-minute failure into <1 second resolution with only the relevant types included.
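The decision logic described above can be sketched as follows. This is an illustration under stated assumptions, not the code in context.py: the `expand` callable is a hypothetical stand-in for expand_wildcard_import(), and probing with cap + 1 to detect overflow is an assumption about how "more than 50" might be detected cheaply.

```python
from typing import Callable, Optional

MAX_WILDCARD_TYPES_UNFILTERED = 50  # value taken from the PR description

# Hypothetical stand-in: expand(max_types, filter_names) -> type names.
Expander = Callable[[Optional[int], Optional[set[str]]], list[str]]


def resolve_wildcard(expand: Expander, priority_types: set[str]) -> list[str]:
    # Phase 1: count-capped probe. Asking for cap + 1 is enough to tell
    # "at most 50" apart from "more than 50" without listing everything.
    probe = expand(MAX_WILDCARD_TYPES_UNFILTERED + 1, None)
    if len(probe) <= MAX_WILDCARD_TYPES_UNFILTERED:
        return probe  # small wildcard: expanded fully, as before
    if priority_types:
        # Phase 2a: keep only the types the target code references.
        return expand(None, priority_types)
    # Phase 2b: nothing to filter by, so keep the first 50 found.
    return probe[:MAX_WILDCARD_TYPES_UNFILTERED]
```

With this shape, a package of exactly 50 types is still expanded fully, and 51 is the first size that triggers filtering or truncation.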

Test Coverage

New test test_large_wildcard_is_filtered_to_referenced_types:

Creates 70 types in a package (exceeds cap of 50)
Target code references only Type000 and Type001
Verifies only referenced types appear in result, not the full 70

All 4 existing edge case tests pass unchanged.
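For readers who want to reproduce the scenario, here is a rough, self-contained sketch of the fixture the test description implies. It is not the test from test_context.py: referenced_types() is a hypothetical helper that simply intersects identifiers in the target code with file names in the package directory, and the test name is changed to make that clear.

```python
import re
import tempfile
from pathlib import Path


def referenced_types(target_code: str, package_dir: Path) -> list[str]:
    """Hypothetical helper: types in package_dir that target_code mentions."""
    identifiers = set(re.findall(r"[A-Za-z_]\w*", target_code))
    return sorted(
        p.stem for p in package_dir.glob("*.java") if p.stem in identifiers
    )


def test_sketch_large_wildcard_is_filtered() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        package_dir = Path(tmp)
        for i in range(70):  # 70 types exceeds the cap of 50
            name = f"Type{i:03d}"
            (package_dir / f"{name}.java").write_text(
                f"public class {name} {{}}\n"
            )
        # Target code references only Type000 and Type001.
        target = "Type000 a = new Type000(); Type001 b = a.convert();"
        result = referenced_types(target, package_dir)
        assert result == ["Type000", "Type001"]
        assert len(result) < 70
```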

Closes CF-1085

🤖 Generated with Claude Code

…ute stalls

Wildcard imports like `import org.jooq.*` expand to 870+ types, causing 5 minutes of disk I/O per function before the token budget check kicks in. 89% of jOOQ functions were skipped due to this.

When a wildcard expands to >50 types, filter to only types referenced in the target method's code. This turns a 5-minute failure into a <1 second resolution with only the relevant types included.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-04-01T16:22:34Z

Claude encountered an error.

I'll analyze this and get back to you.

The prek mypy hook runs on changed files and bypasses the pyproject.toml tests/ exclude, surfacing pre-existing errors in both context.py and test_context.py that block CI for this PR.

Fixes applied:
- Import Language from language_enum instead of base (base re-exports are not explicit; strict mypy flags attr-defined)
- Annotate _extract_class_declaration, _import_to_statement, get_java_imported_type_skeletons, and resolved_imports
- Guard None start/end_line in _extract_function_source_by_lines and find_helper_functions; guard None file_path in the import skeleton loop
- Drop unreachable `if not node: continue` in _extract_public_method_signatures (JavaMethodNode.node is non-nullable)
- Add -> None to every test method and fix an `int | None` comparison in test_context.py

All 880 Java tests pass after the change.

Rich renders the banner panel with box-drawing characters (╭, ╮, │, etc.) that cp1252 cannot decode. On Windows, subprocess.run(..., text=True) uses cp1252 by default, so decoding the child stdout raises UnicodeDecodeError and subprocess sets result.stdout to None — breaking the assertion with a misleading "argument of type 'NoneType' is not iterable". Pass encoding="utf-8" explicitly so the test passes on every platform.
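A small standalone illustration of the fix described above, assuming nothing about the real test beyond its use of subprocess.run. The child process here is a stand-in that writes three box-drawing characters as UTF-8 bytes; it is not the codeflash help banner.

```python
import subprocess
import sys

# The child writes "╭─╮" as raw UTF-8 bytes. Unicode escapes keep the
# command line itself ASCII-only, so argument encoding is not a factor.
child_code = (
    r"import sys; sys.stdout.buffer.write('\u256d\u2500\u256e'.encode('utf-8'))"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    encoding="utf-8",  # decode child output as UTF-8 instead of the locale codec
    check=True,
)

assert "\u256d" in result.stdout
```

Passing encoding explicitly makes the decoding independent of the platform's locale default, which is the point of the change.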

mashraf-222 · 2026-04-28T16:45:03Z

Review

Bug premise verified — real. expand_wildcard_import() in codeflash/languages/java/import_resolver.py on main globs *.java unbounded (lines 223-252). The token-budget guard in codeflash/languages/java/context.py:1005 is checked AFTER per-file read + skeleton extraction + tokenization — so for org.jooq.* (~870 types) the pipeline still does 870 disk reads even though only a handful of tokens will fit the budget. The 89%-skip / 5-min-per-function observation on jOOQ is consistent with that.

Fix is well-scoped. Two-phase expansion (count-capped probe → filter-by-referenced-types OR truncate to first 50) matches the neighboring IMPORTED_SKELETON_TOKEN_BUDGET = 4000 guard style; test in test_context.py covers the happy path (70 types, 2 referenced).

CI blockers addressed in the last two commits:

prek / prek was failing on 9 pre-existing mypy errors in context.py that the prek hook surfaced because this PR touches the file — none caused by the PR itself. Fixed all of them:
- Import Language from language_enum directly (the re-export via base.py is implicit and strict mypy rejects attr-defined).
- Annotate _extract_class_declaration, _import_to_statement, get_java_imported_type_skeletons, and resolved_imports: list[ResolvedImport].
- Guard None start/end_line in _extract_function_source_by_lines and find_helper_functions; guard None resolved.file_path in the import skeleton loop.
- Drop the unreachable if not node: continue in _extract_public_method_signatures (JavaMethodNode.node is non-nullable).
- Add -> None to all 94 tests in test_context.py and fix one int | None comparison.

880/880 Java tests pass locally.

unit-tests (windows-latest, 3.13): test_help_banner.py was failing on main (Rich box-drawing characters are undecodable under cp1252). Added encoding="utf-8" to both subprocess.run calls, the same fix as PR #2120 (fix: decode help-banner test subprocess output as UTF-8).
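The None guards listed among the mypy fixes follow the usual narrowing pattern. A minimal sketch, assuming a hypothetical FunctionSpan record with optional line numbers; the real data structures in context.py are not shown on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionSpan:  # hypothetical stand-in for the real function record
    start_line: Optional[int]
    end_line: Optional[int]


def extract_source_by_lines(source: str, span: FunctionSpan) -> Optional[str]:
    # Narrow the optionals before doing arithmetic on them, so strict mypy
    # no longer reports an operand of type "None".
    if span.start_line is None or span.end_line is None:
        return None
    lines = source.splitlines()
    # In this sketch, lines are 1-indexed and end_line is inclusive.
    return "\n".join(lines[span.start_line - 1 : span.end_line])
```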

Non-blocking gaps worth following up on: no direct unit tests for expand_wildcard_import(max_types=…) / filter_names=… on the resolver (only integration-tested); no boundary test at exactly 50/51; the priority_types-empty + overflow fallback path at context.py:953 is not tested. Worth a small follow-up.

Ready for re-review.

mashraf-222 added 2 commits April 8, 2026 17:11

Merge branch 'main' into cf-1085-cap-wildcard-import-expansion

29879f1

Merge branch 'main' into cf-1085-cap-wildcard-import-expansion

e95f701

mashraf-222 requested review from HeshamHM28, KRRT7 and misrasaurabh1 as code owners April 28, 2026 13:35

mashraf-222 mentioned this pull request Apr 28, 2026

fix: decode help-banner test subprocess output as UTF-8 #2120

Merged

Merge branch 'main' into cf-1085-cap-wildcard-import-expansion

e92e201

mashraf-222 wants to merge 6 commits into main from cf-1085-cap-wildcard-import-expansion
