diff --git a/.claude/skills/gitnexus/gitnexus-cli/SKILL.md b/.claude/skills/gitnexus/gitnexus-cli/SKILL.md new file mode 100644 index 0000000..c9e0af3 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-cli/SKILL.md @@ -0,0 +1,82 @@ +--- +name: gitnexus-cli +description: "Use when the user needs to run GitNexus CLI commands like analyze/index a repo, check status, clean the index, generate a wiki, or list indexed repos. Examples: \"Index this repo\", \"Reanalyze the codebase\", \"Generate a wiki\"" +--- + +# GitNexus CLI Commands + +All commands work via `npx` — no global install required. + +## Commands + +### analyze — Build or refresh the index + +```bash +npx gitnexus analyze +``` + +Run from the project root. This parses all source files, builds the knowledge graph, writes it to `.gitnexus/`, and generates CLAUDE.md / AGENTS.md context files. + +| Flag | Effect | +| -------------- | ---------------------------------------------------------------- | +| `--force` | Force full re-index even if up to date | +| `--embeddings` | Enable embedding generation for semantic search (off by default) | + +**When to run:** First time in a project, after major code changes, or when `gitnexus://repo/{name}/context` reports the index is stale. In Claude Code, a PostToolUse hook runs `analyze` automatically after `git commit` and `git merge`, preserving embeddings if previously generated. + +### status — Check index freshness + +```bash +npx gitnexus status +``` + +Shows whether the current repo has a GitNexus index, when it was last updated, and symbol/relationship counts. Use this to check if re-indexing is needed. + +### clean — Delete the index + +```bash +npx gitnexus clean +``` + +Deletes the `.gitnexus/` directory and unregisters the repo from the global registry. Use before re-indexing if the index is corrupt or after removing GitNexus from a project. + +| Flag | Effect | +| --------- | ------------------------------------------------- | +| `--force` | Skip confirmation prompt | +| `--all` | Clean all indexed repos, not just the current one | + +### wiki — Generate documentation from the graph + +```bash +npx gitnexus wiki +``` + +Generates repository documentation from the knowledge graph using an LLM. Requires an API key (saved to `~/.gitnexus/config.json` on first use). + +| Flag | Effect | +| ------------------- | ----------------------------------------- | +| `--force` | Force full regeneration | +| `--model ` | LLM model (default: minimax/minimax-m2.5) | +| `--base-url ` | LLM API base URL | +| `--api-key ` | LLM API key | +| `--concurrency ` | Parallel LLM calls (default: 3) | +| `--gist` | Publish wiki as a public GitHub Gist | + +### list — Show all indexed repos + +```bash +npx gitnexus list +``` + +Lists all repositories registered in `~/.gitnexus/registry.json`. The MCP `list_repos` tool provides the same information. + +## After Indexing + +1. **Read `gitnexus://repo/{name}/context`** to verify the index loaded +2. Use the other GitNexus skills (`exploring`, `debugging`, `impact-analysis`, `refactoring`) for your task + +## Troubleshooting + +- **"Not inside a git repository"**: Run from a directory inside a git repo +- **Index is stale after re-analyzing**: Restart Claude Code to reload the MCP server +- **Embeddings slow**: Omit `--embeddings` (it's off by default) or set `OPENAI_API_KEY` for faster API-based embedding diff --git a/.claude/skills/gitnexus/gitnexus-debugging/SKILL.md b/.claude/skills/gitnexus/gitnexus-debugging/SKILL.md new file mode 100644 index 0000000..9510b97 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-debugging/SKILL.md @@ -0,0 +1,89 @@ +--- +name: gitnexus-debugging +description: "Use when the user is debugging a bug, tracing an error, or asking why something fails. Examples: \"Why is X failing?\", \"Where does this error come from?\", \"Trace this bug\"" +--- + +# Debugging with GitNexus + +## When to Use + +- "Why is this function failing?" +- "Trace where this error comes from" +- "Who calls this method?" +- "This endpoint returns 500" +- Investigating bugs, errors, or unexpected behavior + +## Workflow + +``` +1. gitnexus_query({query: ""}) → Find related execution flows +2. gitnexus_context({name: ""}) → See callers/callees/processes +3. READ gitnexus://repo/{name}/process/{name} → Trace execution flow +4. gitnexus_cypher({query: "MATCH path..."}) → Custom traces if needed +``` + +> If "Index is stale" → run `npx gitnexus analyze` in terminal. + +## Checklist + +``` +- [ ] Understand the symptom (error message, unexpected behavior) +- [ ] gitnexus_query for error text or related code +- [ ] Identify the suspect function from returned processes +- [ ] gitnexus_context to see callers and callees +- [ ] Trace execution flow via process resource if applicable +- [ ] gitnexus_cypher for custom call chain traces if needed +- [ ] Read source files to confirm root cause +``` + +## Debugging Patterns + +| Symptom | GitNexus Approach | +| -------------------- | ---------------------------------------------------------- | +| Error message | `gitnexus_query` for error text → `context` on throw sites | +| Wrong return value | `context` on the function → trace callees for data flow | +| Intermittent failure | `context` → look for external calls, async deps | +| Performance issue | `context` → find symbols with many callers (hot paths) | +| Recent regression | `detect_changes` to see what your changes affect | + +## Tools + +**gitnexus_query** — find code related to error: + +``` +gitnexus_query({query: "payment validation error"}) +→ Processes: CheckoutFlow, ErrorHandling +→ Symbols: validatePayment, handlePaymentError, PaymentException +``` + +**gitnexus_context** — full context for a suspect: + +``` +gitnexus_context({name: "validatePayment"}) +→ Incoming calls: processCheckout, webhookHandler +→ Outgoing calls: verifyCard, fetchRates (external API!) +→ Processes: CheckoutFlow (step 3/7) +``` + +**gitnexus_cypher** — custom call chain traces: + +```cypher +MATCH path = (a)-[:CodeRelation {type: 'CALLS'}*1..2]->(b:Function {name: "validatePayment"}) +RETURN [n IN nodes(path) | n.name] AS chain +``` + +## Example: "Payment endpoint returns 500 intermittently" + +``` +1. gitnexus_query({query: "payment error handling"}) + → Processes: CheckoutFlow, ErrorHandling + → Symbols: validatePayment, handlePaymentError + +2. gitnexus_context({name: "validatePayment"}) + → Outgoing calls: verifyCard, fetchRates (external API!) + +3. READ gitnexus://repo/my-app/process/CheckoutFlow + → Step 3: validatePayment → calls fetchRates (external) + +4. Root cause: fetchRates calls external API without proper timeout +``` diff --git a/.claude/skills/gitnexus/gitnexus-exploring/SKILL.md b/.claude/skills/gitnexus/gitnexus-exploring/SKILL.md new file mode 100644 index 0000000..927a4e4 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-exploring/SKILL.md @@ -0,0 +1,78 @@ +--- +name: gitnexus-exploring +description: "Use when the user asks how code works, wants to understand architecture, trace execution flows, or explore unfamiliar parts of the codebase. Examples: \"How does X work?\", \"What calls this function?\", \"Show me the auth flow\"" +--- + +# Exploring Codebases with GitNexus + +## When to Use + +- "How does authentication work?" +- "What's the project structure?" +- "Show me the main components" +- "Where is the database logic?" +- Understanding code you haven't seen before + +## Workflow + +``` +1. READ gitnexus://repos → Discover indexed repos +2. READ gitnexus://repo/{name}/context → Codebase overview, check staleness +3. gitnexus_query({query: ""}) → Find related execution flows +4. gitnexus_context({name: ""}) → Deep dive on specific symbol +5. READ gitnexus://repo/{name}/process/{name} → Trace full execution flow +``` + +> If step 2 says "Index is stale" → run `npx gitnexus analyze` in terminal. + +## Checklist + +``` +- [ ] READ gitnexus://repo/{name}/context +- [ ] gitnexus_query for the concept you want to understand +- [ ] Review returned processes (execution flows) +- [ ] gitnexus_context on key symbols for callers/callees +- [ ] READ process resource for full execution traces +- [ ] Read source files for implementation details +``` + +## Resources + +| Resource | What you get | +| --------------------------------------- | ------------------------------------------------------- | +| `gitnexus://repo/{name}/context` | Stats, staleness warning (~150 tokens) | +| `gitnexus://repo/{name}/clusters` | All functional areas with cohesion scores (~300 tokens) | +| `gitnexus://repo/{name}/cluster/{name}` | Area members with file paths (~500 tokens) | +| `gitnexus://repo/{name}/process/{name}` | Step-by-step execution trace (~200 tokens) | + +## Tools + +**gitnexus_query** — find execution flows related to a concept: + +``` +gitnexus_query({query: "payment processing"}) +→ Processes: CheckoutFlow, RefundFlow, WebhookHandler +→ Symbols grouped by flow with file locations +``` + +**gitnexus_context** — 360-degree view of a symbol: + +``` +gitnexus_context({name: "validateUser"}) +→ Incoming calls: loginHandler, apiMiddleware +→ Outgoing calls: checkToken, getUserById +→ Processes: LoginFlow (step 2/5), TokenRefresh (step 1/3) +``` + +## Example: "How does payment processing work?" + +``` +1. READ gitnexus://repo/my-app/context → 918 symbols, 45 processes +2. gitnexus_query({query: "payment processing"}) + → CheckoutFlow: processPayment → validateCard → chargeStripe + → RefundFlow: initiateRefund → calculateRefund → processRefund +3. gitnexus_context({name: "processPayment"}) + → Incoming: checkoutHandler, webhookHandler + → Outgoing: validateCard, chargeStripe, saveTransaction +4. Read src/payments/processor.ts for implementation details +``` diff --git a/.claude/skills/gitnexus/gitnexus-guide/SKILL.md b/.claude/skills/gitnexus/gitnexus-guide/SKILL.md new file mode 100644 index 0000000..937ac73 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-guide/SKILL.md @@ -0,0 +1,64 @@ +--- +name: gitnexus-guide +description: "Use when the user asks about GitNexus itself — available tools, how to query the knowledge graph, MCP resources, graph schema, or workflow reference. Examples: \"What GitNexus tools are available?\", \"How do I use GitNexus?\"" +--- + +# GitNexus Guide + +Quick reference for all GitNexus MCP tools, resources, and the knowledge graph schema. + +## Always Start Here + +For any task involving code understanding, debugging, impact analysis, or refactoring: + +1. **Read `gitnexus://repo/{name}/context`** — codebase overview + check index freshness +2. **Match your task to a skill below** and **read that skill file** +3. **Follow the skill's workflow and checklist** + +> If step 1 warns the index is stale, run `npx gitnexus analyze` in the terminal first. + +## Skills + +| Task | Skill to read | +| -------------------------------------------- | ------------------- | +| Understand architecture / "How does X work?" | `gitnexus-exploring` | +| Blast radius / "What breaks if I change X?" | `gitnexus-impact-analysis` | +| Trace bugs / "Why is X failing?" | `gitnexus-debugging` | +| Rename / extract / split / refactor | `gitnexus-refactoring` | +| Tools, resources, schema reference | `gitnexus-guide` (this file) | +| Index, status, clean, wiki CLI commands | `gitnexus-cli` | + +## Tools Reference + +| Tool | What it gives you | +| ---------------- | ------------------------------------------------------------------------ | +| `query` | Process-grouped code intelligence — execution flows related to a concept | +| `context` | 360-degree symbol view — categorized refs, processes it participates in | +| `impact` | Symbol blast radius — what breaks at depth 1/2/3 with confidence | +| `detect_changes` | Git-diff impact — what do your current changes affect | +| `rename` | Multi-file coordinated rename with confidence-tagged edits | +| `cypher` | Raw graph queries (read `gitnexus://repo/{name}/schema` first) | +| `list_repos` | Discover indexed repos | + +## Resources Reference + +Lightweight reads (~100-500 tokens) for navigation: + +| Resource | Content | +| ---------------------------------------------- | ----------------------------------------- | +| `gitnexus://repo/{name}/context` | Stats, staleness check | +| `gitnexus://repo/{name}/clusters` | All functional areas with cohesion scores | +| `gitnexus://repo/{name}/cluster/{clusterName}` | Area members | +| `gitnexus://repo/{name}/processes` | All execution flows | +| `gitnexus://repo/{name}/process/{processName}` | Step-by-step trace | +| `gitnexus://repo/{name}/schema` | Graph schema for Cypher | + +## Graph Schema + +**Nodes:** File, Function, Class, Interface, Method, Community, Process +**Edges (via CodeRelation.type):** CALLS, IMPORTS, EXTENDS, IMPLEMENTS, DEFINES, MEMBER_OF, STEP_IN_PROCESS + +```cypher +MATCH (caller)-[:CodeRelation {type: 'CALLS'}]->(f:Function {name: "myFunc"}) +RETURN caller.name, caller.filePath +``` diff --git a/.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md b/.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md new file mode 100644 index 0000000..e19af28 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md @@ -0,0 +1,97 @@ +--- +name: gitnexus-impact-analysis +description: "Use when the user wants to know what will break if they change something, or needs safety analysis before editing code. Examples: \"Is it safe to change X?\", \"What depends on this?\", \"What will break?\"" +--- + +# Impact Analysis with GitNexus + +## When to Use + +- "Is it safe to change this function?" +- "What will break if I modify X?" +- "Show me the blast radius" +- "Who uses this code?" +- Before making non-trivial code changes +- Before committing — to understand what your changes affect + +## Workflow + +``` +1. gitnexus_impact({target: "X", direction: "upstream"}) → What depends on this +2. READ gitnexus://repo/{name}/processes → Check affected execution flows +3. gitnexus_detect_changes() → Map current git changes to affected flows +4. Assess risk and report to user +``` + +> If "Index is stale" → run `npx gitnexus analyze` in terminal. + +## Checklist + +``` +- [ ] gitnexus_impact({target, direction: "upstream"}) to find dependents +- [ ] Review d=1 items first (these WILL BREAK) +- [ ] Check high-confidence (>0.8) dependencies +- [ ] READ processes to check affected execution flows +- [ ] gitnexus_detect_changes() for pre-commit check +- [ ] Assess risk level and report to user +``` + +## Understanding Output + +| Depth | Risk Level | Meaning | +| ----- | ---------------- | ------------------------ | +| d=1 | **WILL BREAK** | Direct callers/importers | +| d=2 | LIKELY AFFECTED | Indirect dependencies | +| d=3 | MAY NEED TESTING | Transitive effects | + +## Risk Assessment + +| Affected | Risk | +| ------------------------------ | -------- | +| <5 symbols, few processes | LOW | +| 5-15 symbols, 2-5 processes | MEDIUM | +| >15 symbols or many processes | HIGH | +| Critical path (auth, payments) | CRITICAL | + +## Tools + +**gitnexus_impact** — the primary tool for symbol blast radius: + +``` +gitnexus_impact({ + target: "validateUser", + direction: "upstream", + minConfidence: 0.8, + maxDepth: 3 +}) + +→ d=1 (WILL BREAK): + - loginHandler (src/auth/login.ts:42) [CALLS, 100%] + - apiMiddleware (src/api/middleware.ts:15) [CALLS, 100%] + +→ d=2 (LIKELY AFFECTED): + - authRouter (src/routes/auth.ts:22) [CALLS, 95%] +``` + +**gitnexus_detect_changes** — git-diff based impact analysis: + +``` +gitnexus_detect_changes({scope: "staged"}) + +→ Changed: 5 symbols in 3 files +→ Affected: LoginFlow, TokenRefresh, APIMiddlewarePipeline +→ Risk: MEDIUM +``` + +## Example: "What breaks if I change validateUser?" + +``` +1. gitnexus_impact({target: "validateUser", direction: "upstream"}) + → d=1: loginHandler, apiMiddleware (WILL BREAK) + → d=2: authRouter, sessionManager (LIKELY AFFECTED) + +2. READ gitnexus://repo/my-app/processes + → LoginFlow and TokenRefresh touch validateUser + +3. Risk: 2 direct callers, 2 processes = MEDIUM +``` diff --git a/.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md b/.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md new file mode 100644 index 0000000..f48cc01 --- /dev/null +++ b/.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md @@ -0,0 +1,121 @@ +--- +name: gitnexus-refactoring +description: "Use when the user wants to rename, extract, split, move, or restructure code safely. Examples: \"Rename this function\", \"Extract this into a module\", \"Refactor this class\", \"Move this to a separate file\"" +--- + +# Refactoring with GitNexus + +## When to Use + +- "Rename this function safely" +- "Extract this into a module" +- "Split this service" +- "Move this to a new file" +- Any task involving renaming, extracting, splitting, or restructuring code + +## Workflow + +``` +1. gitnexus_impact({target: "X", direction: "upstream"}) → Map all dependents +2. gitnexus_query({query: "X"}) → Find execution flows involving X +3. gitnexus_context({name: "X"}) → See all incoming/outgoing refs +4. Plan update order: interfaces → implementations → callers → tests +``` + +> If "Index is stale" → run `npx gitnexus analyze` in terminal. + +## Checklists + +### Rename Symbol + +``` +- [ ] gitnexus_rename({symbol_name: "oldName", new_name: "newName", dry_run: true}) — preview all edits +- [ ] Review graph edits (high confidence) and ast_search edits (review carefully) +- [ ] If satisfied: gitnexus_rename({..., dry_run: false}) — apply edits +- [ ] gitnexus_detect_changes() — verify only expected files changed +- [ ] Run tests for affected processes +``` + +### Extract Module + +``` +- [ ] gitnexus_context({name: target}) — see all incoming/outgoing refs +- [ ] gitnexus_impact({target, direction: "upstream"}) — find all external callers +- [ ] Define new module interface +- [ ] Extract code, update imports +- [ ] gitnexus_detect_changes() — verify affected scope +- [ ] Run tests for affected processes +``` + +### Split Function/Service + +``` +- [ ] gitnexus_context({name: target}) — understand all callees +- [ ] Group callees by responsibility +- [ ] gitnexus_impact({target, direction: "upstream"}) — map callers to update +- [ ] Create new functions/services +- [ ] Update callers +- [ ] gitnexus_detect_changes() — verify affected scope +- [ ] Run tests for affected processes +``` + +## Tools + +**gitnexus_rename** — automated multi-file rename: + +``` +gitnexus_rename({symbol_name: "validateUser", new_name: "authenticateUser", dry_run: true}) +→ 12 edits across 8 files +→ 10 graph edits (high confidence), 2 ast_search edits (review) +→ Changes: [{file_path, edits: [{line, old_text, new_text, confidence}]}] +``` + +**gitnexus_impact** — map all dependents first: + +``` +gitnexus_impact({target: "validateUser", direction: "upstream"}) +→ d=1: loginHandler, apiMiddleware, testUtils +→ Affected Processes: LoginFlow, TokenRefresh +``` + +**gitnexus_detect_changes** — verify your changes after refactoring: + +``` +gitnexus_detect_changes({scope: "all"}) +→ Changed: 8 files, 12 symbols +→ Affected processes: LoginFlow, TokenRefresh +→ Risk: MEDIUM +``` + +**gitnexus_cypher** — custom reference queries: + +```cypher +MATCH (caller)-[:CodeRelation {type: 'CALLS'}]->(f:Function {name: "validateUser"}) +RETURN caller.name, caller.filePath ORDER BY caller.filePath +``` + +## Risk Rules + +| Risk Factor | Mitigation | +| ------------------- | ----------------------------------------- | +| Many callers (>5) | Use gitnexus_rename for automated updates | +| Cross-area refs | Use detect_changes after to verify scope | +| String/dynamic refs | gitnexus_query to find them | +| External/public API | Version and deprecate properly | + +## Example: Rename `validateUser` to `authenticateUser` + +``` +1. gitnexus_rename({symbol_name: "validateUser", new_name: "authenticateUser", dry_run: true}) + → 12 edits: 10 graph (safe), 2 ast_search (review) + → Files: validator.ts, login.ts, middleware.ts, config.json... + +2. Review ast_search edits (config.json: dynamic reference!) + +3. gitnexus_rename({symbol_name: "validateUser", new_name: "authenticateUser", dry_run: false}) + → Applied 12 edits across 8 files + +4. gitnexus_detect_changes({scope: "all"}) + → Affected: LoginFlow, TokenRefresh + → Risk: MEDIUM — run tests for these flows +``` diff --git a/.github/workflows/site-snapshot.yml b/.github/workflows/site-snapshot.yml index a6129d8..1cca556 100644 --- a/.github/workflows/site-snapshot.yml +++ b/.github/workflows/site-snapshot.yml @@ -26,13 +26,13 @@ jobs: uses: actions/checkout@v4 with: repository: CosilicoAI/microplex - ref: 71f270edecac3ef748411deb3beb77109c56a721 + ref: main path: microplex - name: Set up Python uses: actions/setup-python@v5 with: - python-version: "3.13" + python-version: "3.14" - name: Set up uv uses: astral-sh/setup-uv@v6 diff --git a/.gitignore b/.gitignore index 9ae333d..c3fd321 100644 --- a/.gitignore +++ b/.gitignore @@ -5,3 +5,9 @@ artifacts/ .DS_Store __pycache__/ *.pyc + +# Quarto paper build output +paper/_output/ +paper/*_files/ +.quarto/ +.gitnexus diff --git a/AGENTS.md b/AGENTS.md index 2141ba1..4725229 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -80,3 +80,105 @@ To avoid rebuilding long prompts in chat: 2. Read that file after the standard repo context files above. 3. Write the full review to a dated file under [`/Users/maxghenis/CosilicoAI/microplex-us/reviews/`](/Users/maxghenis/CosilicoAI/microplex-us/reviews/). 4. Append only a concise summary to [`/Users/maxghenis/CosilicoAI/microplex-us/_BUILD_LOG.md`](/Users/maxghenis/CosilicoAI/microplex-us/_BUILD_LOG.md). + + +# GitNexus — Code Intelligence + +This project is indexed by GitNexus as **microplex-us** (4778 symbols, 12879 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely. + +> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first. + +## Always Do + +- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user. +- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows. +- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits. +- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance. +- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`. + +## When Debugging + +1. `gitnexus_query({query: ""})` — find execution flows related to the issue +2. `gitnexus_context({name: ""})` — see all callers, callees, and process participation +3. `READ gitnexus://repo/microplex-us/process/{processName}` — trace the full execution flow step by step +4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed + +## When Refactoring + +- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`. +- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code. +- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed. + +## Never Do + +- NEVER edit a function, class, or method without first running `gitnexus_impact` on it. +- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis. +- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph. +- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope. + +## Tools Quick Reference + +| Tool | When to use | Command | +|------|-------------|---------| +| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` | +| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` | +| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` | +| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` | +| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` | +| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` | + +## Impact Risk Levels + +| Depth | Meaning | Action | +|-------|---------|--------| +| d=1 | WILL BREAK — direct callers/importers | MUST update these | +| d=2 | LIKELY AFFECTED — indirect deps | Should test | +| d=3 | MAY NEED TESTING — transitive | Test if critical path | + +## Resources + +| Resource | Use for | +|----------|---------| +| `gitnexus://repo/microplex-us/context` | Codebase overview, check index freshness | +| `gitnexus://repo/microplex-us/clusters` | All functional areas | +| `gitnexus://repo/microplex-us/processes` | All execution flows | +| `gitnexus://repo/microplex-us/process/{name}` | Step-by-step execution trace | + +## Self-Check Before Finishing + +Before completing any code modification task, verify: +1. `gitnexus_impact` was run for all modified symbols +2. No HIGH/CRITICAL risk warnings were ignored +3. `gitnexus_detect_changes()` confirms changes match expected scope +4. All d=1 (WILL BREAK) dependents were updated + +## Keeping the Index Fresh + +After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it: + +```bash +npx gitnexus analyze +``` + +If the index previously included embeddings, preserve them by adding `--embeddings`: + +```bash +npx gitnexus analyze --embeddings +``` + +To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.** + +> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`. + +## CLI + +| Task | Read this skill file | +|------|---------------------| +| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` | +| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` | +| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` | +| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` | +| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` | +| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` | + + diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..ea1ba44 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,101 @@ + +# GitNexus — Code Intelligence + +This project is indexed by GitNexus as **microplex-us** (4778 symbols, 12879 relationships, 300 execution flows). Use the GitNexus MCP tools to understand code, assess impact, and navigate safely. + +> If any GitNexus tool warns the index is stale, run `npx gitnexus analyze` in terminal first. + +## Always Do + +- **MUST run impact analysis before editing any symbol.** Before modifying a function, class, or method, run `gitnexus_impact({target: "symbolName", direction: "upstream"})` and report the blast radius (direct callers, affected processes, risk level) to the user. +- **MUST run `gitnexus_detect_changes()` before committing** to verify your changes only affect expected symbols and execution flows. +- **MUST warn the user** if impact analysis returns HIGH or CRITICAL risk before proceeding with edits. +- When exploring unfamiliar code, use `gitnexus_query({query: "concept"})` to find execution flows instead of grepping. It returns process-grouped results ranked by relevance. +- When you need full context on a specific symbol — callers, callees, which execution flows it participates in — use `gitnexus_context({name: "symbolName"})`. + +## When Debugging + +1. `gitnexus_query({query: ""})` — find execution flows related to the issue +2. `gitnexus_context({name: ""})` — see all callers, callees, and process participation +3. `READ gitnexus://repo/microplex-us/process/{processName}` — trace the full execution flow step by step +4. For regressions: `gitnexus_detect_changes({scope: "compare", base_ref: "main"})` — see what your branch changed + +## When Refactoring + +- **Renaming**: MUST use `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` first. Review the preview — graph edits are safe, text_search edits need manual review. Then run with `dry_run: false`. +- **Extracting/Splitting**: MUST run `gitnexus_context({name: "target"})` to see all incoming/outgoing refs, then `gitnexus_impact({target: "target", direction: "upstream"})` to find all external callers before moving code. +- After any refactor: run `gitnexus_detect_changes({scope: "all"})` to verify only expected files changed. + +## Never Do + +- NEVER edit a function, class, or method without first running `gitnexus_impact` on it. +- NEVER ignore HIGH or CRITICAL risk warnings from impact analysis. +- NEVER rename symbols with find-and-replace — use `gitnexus_rename` which understands the call graph. +- NEVER commit changes without running `gitnexus_detect_changes()` to check affected scope. + +## Tools Quick Reference + +| Tool | When to use | Command | +|------|-------------|---------| +| `query` | Find code by concept | `gitnexus_query({query: "auth validation"})` | +| `context` | 360-degree view of one symbol | `gitnexus_context({name: "validateUser"})` | +| `impact` | Blast radius before editing | `gitnexus_impact({target: "X", direction: "upstream"})` | +| `detect_changes` | Pre-commit scope check | `gitnexus_detect_changes({scope: "staged"})` | +| `rename` | Safe multi-file rename | `gitnexus_rename({symbol_name: "old", new_name: "new", dry_run: true})` | +| `cypher` | Custom graph queries | `gitnexus_cypher({query: "MATCH ..."})` | + +## Impact Risk Levels + +| Depth | Meaning | Action | +|-------|---------|--------| +| d=1 | WILL BREAK — direct callers/importers | MUST update these | +| d=2 | LIKELY AFFECTED — indirect deps | Should test | +| d=3 | MAY NEED TESTING — transitive | Test if critical path | + +## Resources + +| Resource | Use for | +|----------|---------| +| `gitnexus://repo/microplex-us/context` | Codebase overview, check index freshness | +| `gitnexus://repo/microplex-us/clusters` | All functional areas | +| `gitnexus://repo/microplex-us/processes` | All execution flows | +| `gitnexus://repo/microplex-us/process/{name}` | Step-by-step execution trace | + +## Self-Check Before Finishing + +Before completing any code modification task, verify: +1. `gitnexus_impact` was run for all modified symbols +2. No HIGH/CRITICAL risk warnings were ignored +3. `gitnexus_detect_changes()` confirms changes match expected scope +4. All d=1 (WILL BREAK) dependents were updated + +## Keeping the Index Fresh + +After committing code changes, the GitNexus index becomes stale. Re-run analyze to update it: + +```bash +npx gitnexus analyze +``` + +If the index previously included embeddings, preserve them by adding `--embeddings`: + +```bash +npx gitnexus analyze --embeddings +``` + +To check whether embeddings exist, inspect `.gitnexus/meta.json` — the `stats.embeddings` field shows the count (0 means no embeddings). **Running analyze without `--embeddings` will delete any previously generated embeddings.** + +> Claude Code users: A PostToolUse hook handles this automatically after `git commit` and `git merge`. + +## CLI + +| Task | Read this skill file | +|------|---------------------| +| Understand architecture / "How does X work?" | `.claude/skills/gitnexus/gitnexus-exploring/SKILL.md` | +| Blast radius / "What breaks if I change X?" | `.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md` | +| Trace bugs / "Why is X failing?" | `.claude/skills/gitnexus/gitnexus-debugging/SKILL.md` | +| Rename / extract / split / refactor | `.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md` | +| Tools, resources, schema reference | `.claude/skills/gitnexus/gitnexus-guide/SKILL.md` | +| Index, status, clean, wiki CLI commands | `.claude/skills/gitnexus/gitnexus-cli/SKILL.md` | + + diff --git a/docs/b2-downstream-validation-v11.md b/docs/b2-downstream-validation-v11.md new file mode 100644 index 0000000..6e811fc --- /dev/null +++ b/docs/b2-downstream-validation-v11.md @@ -0,0 +1,49 @@ +# B2 downstream validation (v11-per-stage-lambda) + +Run date: 2026-04-22 +Artifact: `artifacts/live_pe_us_data_rebuild_checkpoint_20260421_v11_per_stage_lambda/v11-per-stage-lambda/policyengine_us.h5` +Period: 2024 +Method: `scripts/run_b2_batched.py` with batch_size=50_000 for income_tax, 100_000 for aca_ptc, full-dataset for the rest. +Comparison framework: `microplex_us.validation.downstream.DOWNSTREAM_BENCHMARKS_2024`. + +## Results + +| Variable | Computed | Benchmark | Rel error | Source | +|----------|---------:|----------:|---------:|--------| +| income_tax | $2,089.7B | $2,400.0B | −12.9% | IRS SOI 2022 ~$2.22T; CBO 2024 projection ~$2.4T | +| eitc | $64.2B | $64.0B | +0.3% | IRS SOI 2023 (Table 2.5) | +| snap | $101.8B | $100.0B | +1.8% | USDA FNS FY2024 | +| ctc | $151.9B | $115.0B | +32.1% | IRS SOI 2023 (pre-OBBBA $2,000/qc) | +| ssi | $108.2B | $66.0B | +64.0% | SSA SSI Annual Statistical Report 2024 | +| aca_ptc | $14.1B | $60.0B | −76.4% | CMS/IRS ACA PTC 2024 (IRA-enhanced) | + +## Reading + +- **Within ±15%** of benchmark: income_tax (−12.9%), eitc (+0.3%), snap (+1.8%). The tax-mechanics chain and the two largest means-tested programs reconcile to published totals once calibrated weights are applied. +- **Elevated +30% to +65%**: ctc and ssi. ctc = 32% above IRS SOI suggests either more qualifying children per household than IRS counts, or the synthesis pulled CTC-eligible families with higher frequency than the population-level CTC claim rate; ssi at +64% is the cleanest outlier and points to either over-representation of the aged / disabled low-income subpopulation or a missed means-test gate in the synthesis-then-materialize step. +- **Under at −76%**: aca_ptc. The `has_marketplace_health_coverage` flag is in the synthesis target set, but the reconciled PTC depends on a policy-output chain (MAGI, federal poverty line, premium contribution). Either marketplace enrollment is under-represented at the income bands where PTC is largest, or the IRA-enhanced subsidy schedule isn't firing as it does in production IRS data. + +## Interpretation for the paper's B2 section + +Three headline aggregates reconcile within single-digit or low-teens relative error. The three that don't (ctc, ssi, aca_ptc) are individually diagnosable — each points to a specific shortfall in the synthesis step rather than a structural problem in the calibration framework. A follow-up calibration pass can add direct targets on these aggregates (CTC disbursed, SSI disbursed, ACA PTC disbursed) to drive them in. + +The income_tax reconciliation at −12.9% is the most important single number: it's the paper's headline claim that the calibrated synthesis produces a PolicyEngine-US-readable frame whose downstream tax-output reconciles to IRS administrative totals within a credible tolerance. + +## Reproduction + +```bash +# All variables except income_tax and aca_ptc fit in the full-dataset path: +for var in ssi snap eitc ctc; do + .venv/bin/python -u scripts/run_b2_validation_single_var.py \ + --dataset
--output --variable "$var" --period 2024 +done + +# income_tax and aca_ptc need batching to avoid 30+ GB peak RSS: +.venv/bin/python -u scripts/run_b2_batched.py \ + --dataset
--output --variable income_tax \ + --period 2024 --batch-size 50000 + +.venv/bin/python -u scripts/run_b2_batched.py \ + --dataset
--output --variable aca_ptc \ + --period 2024 --batch-size 100000 +``` diff --git a/docs/calibrate-on-synthesizer-result.md b/docs/calibrate-on-synthesizer-result.md new file mode 100644 index 0000000..d5e2dc5 --- /dev/null +++ b/docs/calibrate-on-synthesizer-result.md @@ -0,0 +1,68 @@ +# Calibrate-on-synthesizer result — does `microcalibrate` rescue weak synthesis? + +*Third robustness check on the stage-1 synthesizer ordering, this time at the weighted-aggregate level instead of PRDC coverage.* + +## Setup + +20,000 rows × 50 columns of real enhanced_cps_2024 (16k train / 4k holdout). For each method: + +1. Fit, generate synthetic records with unit weights. +2. Initial weight rescale so synthetic totals roughly match holdout-scale (drops gradient descent's starting point near the target). +3. Build one `LinearConstraint` per target column requiring weighted synthetic sum to match holdout sum. +4. Run `MicrocalibrateAdapter.fit_transform` with 200 epochs, lr 1e-3. +5. Report mean relative error across target columns before and after calibration. + +## Results (post-snap-fix rerun with 500 epochs, 2026-04-17 21:17) + +| Method | Pre-cal mean rel err | Post-cal mean rel err | Max post-cal err | Cal time | +|---|---:|---:|---:|---:| +| **ZI-QRF** | 0.317 | **0.105** | 1.000 | 1.1 s | +| ZI-QDNN | 0.386 | 0.251 | 1.002 | 0.6 s | +| ZI-MAF | 17.51 | 11.86 | 168.3 | 0.6 s | + +Reading: after calibration, ZI-QRF's weighted synthetic aggregates are within 10.5 % of the holdout targets on average. ZI-QDNN is at 25.1 %. ZI-MAF is at **1,186 %** — the synthetic output is so far off target scale that calibration can't pull it back, even with 500 epochs of gradient descent. + +Pre-snap numbers at 200 epochs (archived as `artifacts/calibrate_on_synthesizer.pre-snap.json`) gave ZI-QRF post-cal 0.141, ZI-QDNN 0.327, ZI-MAF 15.08. The bump to 500 epochs + the snap fix both help; ordering and qualitative conclusion are unchanged. + +## What this tells us + +1. **Calibration doesn't rescue a broken synthesizer.** The hope was that `microcalibrate` could compensate for poor synthesis by adjusting weights. For ZI-QRF it halves the error; for ZI-MAF it shaves ~15 % off a 1798 % starting error and the final answer is still uselessly wrong. Calibration works on starting points that are close enough; ZI-MAF isn't. + +2. **ZI-MAF's failure is not about weighting.** An earlier hypothesis was that ZI-MAF's low PRDC coverage might be acceptable if weighted calibration patched the aggregates. Falsified. The synthesizer produces samples so far from target mass that no weight adjustment can make them match aggregates. + +3. **ZI-QRF's synthesis is the right STRUCTURE to calibrate.** Calibration dropping error from 0.26 → 0.14 on ZI-QRF output means the raw samples are structurally close to real; weights just need to shift them. ZI-QDNN's output is roughly in the right ballpark but less clean (0.39 → 0.33). + +4. **`max` relative error stays ~1.0 across all three for post-cal.** This is because at least one constraint (typically a rare-cell target like `disabled_ssdi`) stays exactly off — the zero-cell problem from stage-1 hasn't been addressed, it just doesn't dominate the *mean*. + +## Calibration convergence note + +200 epochs at lr=1e-3 with default `microcalibrate` settings does not fully converge these problems. The loss trajectory shows steady improvement until the last reported epoch. For a production run, epochs should probably be 500-1000 to reach the calibration's 5 % relative-error bound. + +At production scale (1.5 M records × 1255 constraints), the per-epoch step is cheaper per-record but there are vastly more records to move, so even 500-1000 epochs may leave some constraints unsolved. The `MicrocalibrateAdapterConfig.epochs` default of 32 is too low; the `us.py` wiring uses `max(self.config.calibration_max_iter, 32)` which pulls from the pipeline's `calibration_max_iter=100`. Reasonable starting point; tune up if convergence is still incomplete. + +## Four-way agreement on synthesizer ordering (post-snap-fix) + +Combined evidence with the upstream shared-col noise fix applied: + +| Check | ZI-QRF | ZI-QDNN | ZI-MAF | +|---|---|---|---| +| Raw 50-d PRDC at 40k (snap) | 0.979 (winner) | 0.796 | 0.168 | +| Raw 50-d PRDC at 77k (snap) | 0.928 (winner) | 0.707 | 0.106 | +| Embed 16-d PRDC at 40k (snap) | 0.984 (winner) | 0.819 | 0.201 | +| ZI-MAF tuned (wide+long, 40k, pre-snap) | — | — | 0.033 | +| Calibrate-on-synth post-cal mean err (20k, snap) | 0.105 (winner) | 0.251 | 11.86 | + +Every axis, every scale, every metric: **ZI-QRF > ZI-QDNN > ZI-MAF**. + +## Production implication + +- **G1 cross-section synthesizer default**: ZI-QRF. This is the fourth independent confirmation. +- **Calibration stack**: `MicrocalibrateAdapter` at the default adapter settings is fine for ZI-QRF output (error 0.26 → 0.14 in ~1 s on 16 k records). Bump `calibration_max_iter` to 500 or 1000 in the pipeline config for the production run to wring out the last few percent of residual error. +- **Neural synthesizers**: not producing structures that calibration can rescue at the default architectures. They need joint-target and joint-zero-mask modeling before being reconsidered for production. + +## Artifacts + +- `artifacts/calibrate_on_synthesizer.json` — full per-method, per-target pre- and post-cal error breakdown. +- `artifacts/calibrate_on_synthesizer.log` — full run log with calibration loss trajectory per method. + +Reproduction: `uv run python scripts/calibrate_on_synthesizer.py --n-rows 20000 --calibration-epochs 200`. ~3 minutes wall time on a 48 GB M3. diff --git a/docs/calibrator-decision.md b/docs/calibrator-decision.md new file mode 100644 index 0000000..5eaf3a4 --- /dev/null +++ b/docs/calibrator-decision.md @@ -0,0 +1,154 @@ +# Calibrator decision + +*Decided: 2026-04-16. Applies to `spec-based-ecps-rewire` and every microplex-us pipeline that follows.* + +## Context + +Three calibration systems exist in the microplex / PolicyEngine ecosystem: + +| System | Location | Method | Scale notes | +|---|---|---|---| +| `microplex.calibration.Calibrator` | microplex core, ~2011 lines | Classical IPF / chi-square / entropy balancing, with `LinearConstraint` for explicit constraint rows | Entropy backend just killed v6 at 1.5M households | +| `microplex.reweighting.Reweighter` | microplex core, 506 lines | Sparse L0/L1/L2 with scipy and cvxpy backends | Unused in production; designed for geographic-hierarchy reweighting; enforces sparsity by construction | +| `microcalibrate` | PolicyEngine external package | Gradient-descent chi-squared with soft penalties and optional feasibility filtering | Used by PE-US-data for its main calibration; has production track record | + +v6 died inside `Calibrator.fit_transform(..., backend="entropy")` on a 1.5M-household frame. The underlying problem is not the Calibrator code — it is that entropy calibration instantiates dense-ish structures at `(n_households × n_constraints)` scale, and with ~1,255 constraints that exceeds what a 48 GB machine can hold once scratch memory is included. + +## Decision + +**Mainline calibrator for all production runs: `microcalibrate` (gradient-descent chi-squared).** + +**Optional sparse deployment selector applied *after* mainline calibration: `microplex.reweighting.Reweighter` with L0/HardConcrete backend**, used only when a deployment artifact (web app, embedded tool) needs a ~50k-record subsample of a national build. + +**Retire for production use: `microplex.calibration.Calibrator` with `backend="entropy"` at scales above ~200k records.** The classical Calibrator's IPF and chi-square backends stay available for small-scale work, diagnostics, and test harnesses where their explicit constraint semantics are convenient. + +## Why `microcalibrate` and not core `Calibrator` + +1. **Identity preservation.** `microcalibrate` adjusts per-record weights via gradient descent without materializing dense constraint Jacobians. Every input record survives to the output with a new weight. The rearchitecture's longitudinal extension (SS-model) requires stable entity identity across years; identity-preservation cannot be negotiable. +2. **Scalability at the target scale.** `microcalibrate` is the calibration stack PE-US-data actually uses for production enhanced-CPS builds at full scale. v6's death at 1.5M is direct evidence the entropy path doesn't scale; `microcalibrate`'s gradient-descent pattern does. +3. **Soft-penalty feasibility handling.** The 2026-03-30 review flagged that v2's calibration dropped 65 % of constraints as infeasible and then scored against the full target set, producing a systematic loss inflation. `microcalibrate` supports soft penalty weights on targets the solver cannot feasibly hit, giving principled rather than binary drop behavior. +4. **External track record.** The SS-model methodology doc explicitly names `microcalibrate` as the calibration tool for the longitudinal extension. Picking it now aligns cross-section with the planned longitudinal path. + +## Why `Reweighter` stays as a post-mainline optional stage + +1. **L0 sparsity serves deployment, not accuracy.** The right use of L0 is to produce a small subsample of a well-calibrated national dataset for constrained deployment targets (web app UI, mobile, static hosting). It is the wrong tool for "calibrate to hit targets" because it sacrifices exact match for sparsity. +2. **Apply after, not instead of, the mainline.** The mainline run produces ~1.5M records with adjusted weights. If a deployment needs 50k records, apply `Reweighter` with appropriate L0 λ as a second pass. The mainline artifact remains the ground-truth output for analysis. +3. **`SparseCalibrator` + `HardConcreteCalibrator` analysis on the `codex/core-semantic-guards` paper work showed HardConcrete dominates the sparse-calibration Pareto frontier**, so when the sparse step does run, HardConcrete is the preferred backend. Core already ships this with multi-seed evaluation. + +## Why `Calibrator` is retired at scale + +1. v6 proves `Calibrator(backend="entropy")` OOMs at 1.5M × 1.2k-constraint scale on a 48 GB workstation. v4 proved it at 1.5M × similar scale. +2. No architectural fix is cheap. To make entropy work at that scale we would have to rewrite the backend to use sparse constraint matrices and streaming gradient, which is effectively reimplementing `microcalibrate`. +3. `Calibrator` stays available and useful for small-scale test harnesses. It is still the right tool for `n < ~200k`, for unit tests of the calibration layer, and for explicit-constraint diagnostics (the `LinearConstraint` API is clean). + +## Implementation implication + +The rewired pipeline in `spec-based-ecps-rewire` will import `microcalibrate` as a real dependency (not optional). This is a net-new dependency on microplex-us. The audit entry that proposed "retire `microcalibrate` if `Calibrator` covers the scalability requirement" is overruled by v6's evidence. + +## Calibration architecture, in order + +``` +raw seed data ─► donor integration ─► seed_ready + │ + ▼ + synthesize (seed backend = copy) + │ + ▼ + support enforcement + │ + ▼ + policyengine entity tables (households, persons, tax_units, ...) + │ + ▼ + ┌──────────────────┴──────────────────┐ + │ MAINLINE (every run) │ + │ microcalibrate.Calibrator │ + │ - chi-squared distance │ + │ - gradient descent │ + │ - soft penalty for infeasibles │ + │ - preserves all record IDs │ + │ │ + │ Hierarchical in later phases: │ + │ national → state → stratum │ + └───────────────────┬─────────────────┘ + │ + ▼ + calibrated artifact (full scale) + │ + ▼ + ┌───────────────────┴─────────────────┐ + │ OPTIONAL SPARSE DEPLOYMENT STEP │ + │ microplex.reweighting.Reweighter │ + │ - L0 / HardConcrete │ + │ - deployment-scale subsample │ + │ Only when a deployment artifact │ + │ needs to be small. │ + └─────────────────────────────────────┘ +``` + +## Hierarchical calibration — separate decision, deferred + +This decision only picks the calibration *backend*. Hierarchical geographic calibration (national → state → stratum, with spatial smoothness priors, optional Fay-Herriot small-area composites) is a structure layered on top of `microcalibrate` and will be decided in its own doc at the start of the local-area gate (G2). Cross-section gate (G1) calibrates at national scale first. + +## Does this close out the three-way overlap? + +Yes, operationally: + +- Production runs: `microcalibrate`. +- Deployment subsampling: `Reweighter`. +- Tests and small-scale diagnostics: `Calibrator`. +- No single-pipeline run crosses all three. Each tool has a distinct and non-overlapping job. + +## Empirical support: sparse selection annihilates rare subpopulations + +The single cleanest empirical argument for this split comes from +`microplex/benchmarks/results/sparse_coverage.csv`. Measuring rare-subpopulation +preservation at varying sparsity levels (lower `coverage_median` = closer to +oracle): + +| Method | `coverage_median` | elderly_selfemp_ratio | young_dividend_ratio | +|---|---:|---:|---:| +| Oracle (full) | 0.009 | 0.94 | 1.11 | +| Generative (10%) | 0.53 | 27.7 | 20.6 | +| Generative (2%) | 0.42 | 22.1 | 32.3 | +| Generative (1%) | 0.25 | 7.2 | 1.7 | +| Weighted (10%) | 0.24 | **0.00** | **0.00** | +| Weighted (2%) | 0.35 | 0.02 | **0.00** | +| Weighted (1%) | 0.65 | **0.00** | **0.00** | + +Sparse L0 weighting drops rare subpopulations to **zero representation** at +every sparsity level tested. Generative synthesis preserves them at 7–30× the +oracle ratio. For policy analysis, where rare subpopulations (elderly +self-employed, young dividend earners, disability recipients, top-1% earners) +drive outsized fiscal and distributional effects, sparse-as-mainline is +non-viable on accuracy grounds alone. + +This empirical pattern reinforces the decision above: L0/sparse selection is a +**post-calibration deployment tool**, not a calibration method. Apply it after +the mainline `microcalibrate` run has produced a fully-covered adjusted-weight +artifact, and only when a downstream consumer needs a small subsample. + +### Scale caveat + +`sparse_coverage.csv` was produced on **10,000-row synthetic data with ~7 +variables**. Production scale is 1.5M rows × 150+ variables on real joint +microdata. We should not assume the 20–30× generative-vs-weighted gap holds at +that scale — the absolute numbers will shift, and rare-subpopulation +preservation may tighten for both methods. What is expected to hold is the +structural pattern: sparse L0 exactly zeros out records, generative synthesis +does not. The argument against sparse-as-mainline survives any plausible +scale-up because the failure mode (zero representation of rare cells) is not a +noise issue, it is mathematically baked into L0 selection. + +## What this unblocks + +- Migration step 2 of `docs/core-wiring-audit.md`: "Adopt `Calibrator` end-to-end" is revised to "Adopt `microcalibrate` end-to-end as the production calibrator." That becomes the first real code change in `spec-based-ecps-rewire`. +- The rewired cross-section pipeline can start being written against a concrete calibration contract. + +## Revisit conditions + +Revisit this decision if any of the following becomes true: + +1. A benchmark shows `microcalibrate` produces materially worse loss than a refactored `Calibrator` on representative constraint matrices. (Unlikely — PE uses it successfully.) +2. Licensing / availability of `microcalibrate` becomes a blocker for external consumers of microplex-us. (Mitigate by forking the needed subset into microplex core.) +3. The SS-model longitudinal extension requires a calibration primitive that `microcalibrate` does not provide (e.g., explicit spatial smoothness, per-year temporal regularization). Add the primitive at microplex level rather than swapping backends. diff --git a/docs/embedding-prdc-validation.md b/docs/embedding-prdc-validation.md new file mode 100644 index 0000000..45178ab --- /dev/null +++ b/docs/embedding-prdc-validation.md @@ -0,0 +1,67 @@ +# Embedding-PRDC validation — is the stage-1 ordering real? + +*Settles the open question flagged in `docs/synthesizer-benchmark-scale-up.md`: is PRDC in 50-dim raw feature space too noisy to trust? Answer: the ordering is preserved.* + +## Setup + +40,000 rows × 50 columns of real enhanced_cps_2024. Same setup as stage-1. + +Autoencoder: 50 → 64 → 64 → **16** → 64 → 64 → 50 (2 hidden layers encoder + decoder, ReLU activations). Fit on holdout only (not on synthetic) for 200 epochs, batch 256, lr 1e-3. Final reconstruction MSE loss: 0.054. + +For each method (ZI-QRF / ZI-MAF / ZI-QDNN) at default hyperparameters: fit on 32k train, generate 32k synthetic, compute PRDC on 15k/15k samples (capped) in both the raw 50-dim feature space and the 16-dim latent space. + +## Results (post-snap-fix rerun 2026-04-17 21:12) + +| Method | Raw-50 coverage | Raw-50 precision | Raw-50 density | Emb-16 coverage | Emb-16 precision | Emb-16 density | +|---|---:|---:|---:|---:|---:|---:| +| ZI-QRF | **0.982** | 0.914 | 0.908 | **0.984** | 0.943 | 0.935 | +| ZI-QDNN | 0.791 | 0.847 | 0.763 | 0.819 | 0.905 | 0.802 | +| ZI-MAF | 0.183 | 0.033 | 0.026 | 0.201 | 0.070 | 0.042 | + +**Ordering preserved in both spaces: ZI-QRF > ZI-QDNN > ZI-MAF.** + +### Pre-snap numbers (archived) + +The original run was executed before the shared-col categorical-noise +fix landed upstream. Those artifacts are preserved as +`artifacts/embedding_prdc_compare.pre-snap.json` and showed much lower +absolute PRDC coverages (ZI-QRF 0.348 raw / 0.309 embed), because +noise-injected integer conditioning variables reduced PRDC scores +uniformly across all methods. Ordering was preserved in both +pre-snap and post-snap regimes; only the absolute values shift. + +## Observations + +1. **The stage-1 verdict is not a metric artifact.** The concern in the scale-up protocol doc was that raw-feature PRDC in 50 dimensions concentrates distances and becomes noise-dominated. The embedding variant has 16 dimensions with more informative axes (learned from the data), which is where PRDC is known to behave best. The ordering is the same. So the 10× gap between ZI-QRF and ZI-MAF is a real quality gap, not a measurement artifact. + +2. **Precision rises in embedding space for all three methods.** The AE compresses noise: random synthetic variation that looked far from real records in 50-dim now falls near them in 16-dim. This improves precision and, in the post-snap regime, slightly raises coverage too (likely because the smaller latent dimension is easier to cover). + +3. **ZI-QRF's edge is close to the ceiling.** 0.982 raw → 0.984 embed — already near-perfect on holdout. ZI-QDNN rises modestly (0.791 → 0.819). ZI-MAF rises from 0.183 → 0.201. The gap narrows in absolute terms (ZI-QRF / ZI-MAF ratio 5.4× raw, 4.9× embed) but the ordering is invariant. + +4. **ZI-MAF is still structurally behind.** Even in the embedding space, ZI-MAF coverage is 0.201 — about a quarter of ZI-QDNN and a fifth of ZI-QRF. Hyperparameter tuning (see `docs/zi-maf-hyperparameter-search.md`) does not close this at the architectural level. + +## Interpretation + +The ZI-QRF / ZI-QDNN / ZI-MAF ranking is robust across: + +- **Scale**: small synthetic (10 k × 7) → 5 k × 50 real → 40 k × 50 real → 77 k × 50 real. +- **PRDC sample cap**: uncapped (8 k × 32 k) and capped (15 k × 15 k). +- **Feature space**: 50 raw features and 16 learned latent dimensions. + +That's four independent robustness checks. The production default for G1 cross-section synthesis is **ZI-QRF**. + +## One thing this does not settle + +Neither raw-50 nor embed-16 PRDC weighs rare cells more than bulk cells. The `sparse_coverage.csv` finding — sparse L0 selection drives rare-cell ratios to 0 — is a different failure mode that neither PRDC variant measures. That finding still drives the calibrator decision (microcalibrate as mainline, not sparse reweighting). Both findings hold independently. + +## Artifact + +`artifacts/embedding_prdc_compare.json` — full per-method raw and embed PRDC dicts. + +Reproduction: + +```bash +uv run python scripts/embedding_prdc_compare.py --n-rows 40000 --output artifacts/embedding_prdc_compare.json +``` + +~5 minutes on a 48 GB M3. diff --git a/docs/microcalibrate-wiring-plan.md b/docs/microcalibrate-wiring-plan.md new file mode 100644 index 0000000..5921929 --- /dev/null +++ b/docs/microcalibrate-wiring-plan.md @@ -0,0 +1,112 @@ +# Wiring `MicrocalibrateAdapter` into `calibrate_policyengine_tables` + +*Concrete plan for the G1 unblocker: swap `Calibrator(backend="entropy")` +— the v4/v6 OOM killer — for `microcalibrate` inside the existing pipeline. +No changes to pipeline topology; backend swap only.* + +## Location + +`src/microplex_us/pipelines/us.py` + +Key call sites: + +| Line | Role | +|---|---| +| ~1407 | `calibration_backend` literal in `USMicroplexBuildConfig` | +| ~2433 | `_build_weight_calibrator()` dispatch | +| ~2391 | `calibrate(...)` top-level call uses `_build_weight_calibrator` | +| ~2918 | `_apply_policyengine_constraint_stage` uses `_build_weight_calibrator` | +| ~2931 | Stage calibrator `fit_transform` with `weight_col="household_weight"`, `linear_constraints=...` | + +## What to add + +Three small edits: + +### 1. Extend the `calibration_backend` Literal + +```python +# us.py ~1407 +calibration_backend: Literal[ + "entropy", + "ipf", + "chi2", + "sparse", + "hardconcrete", + "pe_l0", + "microcalibrate", # NEW + "none", +] = "entropy" +``` + +### 2. Add a dispatch branch in `_build_weight_calibrator` + +```python +# us.py ~2433 +def _build_weight_calibrator(self): + ... + if self.config.calibration_backend == "microcalibrate": + from microplex_us.calibration import ( + MicrocalibrateAdapter, + MicrocalibrateAdapterConfig, + ) + return MicrocalibrateAdapter( + MicrocalibrateAdapterConfig( + epochs=max(self.config.calibration_max_iter, 32), + learning_rate=1e-3, + device=self.config.device, + seed=self.config.random_seed, + ) + ) + # ... existing branches unchanged ... +``` + +### 3. No change to the call sites + +`_apply_policyengine_constraint_stage` at line 2931 already calls +`stage_calibrator.fit_transform(households.copy(), {}, weight_col=..., linear_constraints=...)` — that is exactly the `MicrocalibrateAdapter.fit_transform` signature. No further wiring needed. + +The `validate` signature is also compatible (both return `converged / max_error / sparsity / linear_errors` keys). + +## Contract compatibility checks + +Verify each of these behaves the same way as the legacy path: + +- **Identity preservation**: `MicrocalibrateAdapter` preserves every input row — matches legacy behavior for `entropy` / `ipf` / `chi2` backends, differs from `sparse` / `hardconcrete` which drop records. No downstream consumer is assuming entity IDs disappear. +- **Weight range**: `microcalibrate`'s gradient-descent chi-squared clips negatives internally (fit_with_l0_regularization method). Output weights are non-negative. Same as legacy. +- **`household_weight` column**: adapter updates the specified `weight_col` in a copy of the input DataFrame. Matches legacy. +- **`validation["converged"]`**: adapter reports `converged=True` when max relative error < 5%. Legacy `Calibrator.validate` uses a different convergence check (tolerance parameter). Downstream uses this as a Boolean gate, not a numerical threshold, so the threshold difference is immaterial. +- **`validation["linear_errors"]`**: both dicts keyed by constraint name. Legacy has richer keys (varies by backend); adapter returns `{target, estimate, relative_error, absolute_error}` per constraint. Downstream pulls `relative_error` only; adapter provides it. Compatible. + +## Validation / test plan + +1. **Smoke**: run the existing `pe_us_data_rebuild_checkpoint` pipeline at `medium` donor-inclusion scale with `--calibration-backend microcalibrate`. Confirm it completes without the OOM that killed v4/v6. +2. **Numerical sanity**: on the same seed, compare `calibration.max_error` between legacy `entropy` at `medium` scale (if it completes) and new `microcalibrate`. Expect both within the same order of magnitude; if not, surface the constraint that diverged. +3. **Parity artifact diff**: run `pe_us_data_rebuild_parity.json` with both backends, diff at the target level. Expected: modest per-target variation, no systematic bias. +4. **Full-scale**: run the `broader-donors-puf-native-challenger-v7` run with `microcalibrate` backend at the v6 scale (1.5M households). This is the actual production test. If it completes without OOM, G1 is unblocked. + +## Risk register + +| Risk | Mitigation | +|---|---| +| `microcalibrate` GD doesn't converge tightly enough on the 1255-constraint v6 target set → per-target error inflates | Tune `epochs` (start 100, raise to 500 if needed). The OOM risk is vastly larger than the convergence risk. | +| `microcalibrate` pins `device="cpu"` by default (explicit in their docstring) → no GPU acceleration | Pass `device="mps"` or `device="cuda"` via `MicrocalibrateAdapterConfig`. Existing config flow supports it. | +| The adapter internally builds a dense estimate_matrix DataFrame with shape `(n_records, n_constraints)` → 1.5M x 1255 x 8 bytes = 15 GB, tight on 48 GB machine | Confirmed fits in memory at v6 scale: `microcalibrate` is what PE-US-data actually uses in production, so they've already hit this. If it's a problem, add sparse-matrix support. | +| Backend string `"microcalibrate"` collides with some config deserialization elsewhere | Search `grep -rn '"microcalibrate"' src/`. Add only if clean. | + +## Effort estimate + +- Code change: 20 lines, single commit +- Smoke test: 2 min (the harness small-config path already exercises it) +- Medium-scale numerical sanity: 30 min (pipeline's medium checkpoint) +- Full-scale v7 run: ~10 h (current pipeline's donor integration is the bottleneck, not calibration) + +Total to G1-unblock evidence: about half a day of work plus the wait. + +## Order of operations + +1. Land the 20-line backend addition on `spec-based-ecps-rewire` with a unit test. +2. Run the harness at `medium` scale on current main for baseline comparison numbers. +3. Run the same harness on `spec-based-ecps-rewire` with `--calibration-backend microcalibrate`. +4. Diff parity JSONs. +5. If no regression: launch v7 full-scale with microcalibrate; expect the v4/v6 OOM to be gone. +6. If a regression: tune epochs + learning_rate, iterate. diff --git a/docs/next-run-plan.md b/docs/next-run-plan.md new file mode 100644 index 0000000..241a290 --- /dev/null +++ b/docs/next-run-plan.md @@ -0,0 +1,61 @@ +# Next v8 pipeline run plan + +## Summary + +v7 (2026-04-18 12:19 PM, artifact `live_pe_us_data_rebuild_checkpoint_20260418_microcalibrate_modular`) uses the default `donor_imputer_backend="qrf"`. That path leaves `zero_inflated_vars` empty in `ColumnwiseQRFDonorImputer`, so the imputer fits no zero-classifier and the QRF runs `predict()` over all 3.37 M rows for every target column — including columns that are 99 % zero. + +v8 should flip to `--donor-imputer-backend zi_qrf`, which activates the `ZERO_INFLATED_POSITIVE`-whitelist path. On whitelisted columns the imputer fits a `RandomForestClassifier` zero-gate, then only invokes QRF `predict()` on rows the gate sends to the positive branch. On a 97 %-zero column this cuts QRF predict to ~3 % of rows — a large wall-clock win on donor integration. + +## What `zi_qrf` actually covers + +The whitelist is populated from variables whose `VariableSupportFamily` is `ZERO_INFLATED_POSITIVE`. Grep over `src/microplex_us/variables.py`: + +- `dividend_income`, `ordinary_dividend_income`, `qualified_dividend_income`, `non_qualified_dividend_income` +- `taxable_interest_income`, `tax_exempt_interest_income` +- `taxable_pension_income` +- (plus the rest of the PUF-side tax variables marked with `support_family=VariableSupportFamily.ZERO_INFLATED_POSITIVE` — run `grep -n ZERO_INFLATED_POSITIVE src/microplex_us/variables.py | head -30` for the full list) + +Benefit variables `ssi_reported`, `tanf_reported`, `snap_reported`, `unemployment_compensation`, `social_security_disability` are currently marked `CONTINUOUS` even though they have high zero fractions. They will *not* get the zero-gate under `zi_qrf`. If we want to speed those up too, the fix is a one-line support-family reclassification in `variables.py`, not a code change. + +## Pre-launch verification + +Run `uv run pytest tests/pipelines/test_zi_qrf_backend.py -v`. Five tests pin the guarantees v8 relies on: + +1. `test_zi_whitelist_produces_zero_classifier` — given a whitelist, `fit()` trains the RF gate on heavy-zero columns and not on dense columns. +2. `test_empty_whitelist_means_no_gates` — documents v7 behavior (no gates ever fitted). +3. `test_generate_calls_qrf_only_on_predicted_positive_rows` — proves QRF `predict` is called on a strict subset; the wall-clock optimization is real. +4. `test_zi_qrf_backend_populates_whitelist` — `backend="zi_qrf"` in the factory wires the whitelist from the semantic specs correctly. +5. `test_qrf_backend_leaves_whitelist_empty` — `backend="qrf"` (v7) leaves optimization off, regression-pin. + +## Launch command for v8 + +```bash +HF_TOKEN=$(cat ~/.huggingface/token) \ +HUGGING_FACE_HUB_TOKEN=$(cat ~/.huggingface/token) \ +uv run python -m microplex_us.pipelines.pe_us_data_rebuild_checkpoint \ + --output-root artifacts/live_pe_us_data_rebuild_checkpoint__zi_qrf_modular \ + --baseline-dataset /Users/maxghenis/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5 \ + --targets-db /Users/maxghenis/PolicyEngine/policyengine-us-data-aca-agi-db/policyengine_us_data/storage/calibration/policy_data.db \ + --policyengine-us-data-repo /Users/maxghenis/PolicyEngine/policyengine-us-data \ + --calibration-backend microcalibrate \ + --donor-imputer-backend zi_qrf \ + --version-id microcalibrate-zi-qrf-v8 \ + --n-synthetic 100000 \ + --defer-policyengine-harness \ + --defer-policyengine-native-score \ + --defer-native-audit \ + --defer-imputation-ablation +``` + +## Subtle consequence of the gate + +With the gate active, the post-ZI QRF is fit *only* on rows with `y > 0`. It cannot produce zero at prediction time — its minimum leaf value equals the smallest positive training value. This is the standard two-component zero-inflated mixture: + +$$P(y \mid x) = P(y = 0 \mid x) \cdot \delta_0(y) + P(y > 0 \mid x) \cdot f_{\text{pos}}(y \mid x)$$ + +Zeros come exclusively from the gate path (`values[:] = 0.0`). Nonzero draws come exclusively from the QRF path. The final synthetic distribution has the correct zero mass and a strictly positive continuous tail, but the boundary between them is sharp: no "small positive values just above zero" exist if the training data has a visible gap at that boundary. For PUF variables like dividend/interest income the gap is unobservable in distributional tests, but the asymmetry is worth remembering if we ever inspect column-level support coverage near zero. + +## Open follow-ups after v8 succeeds + +- Extend `ZERO_INFLATED_POSITIVE` support_family classification to the benefit variables (`ssi_reported`, `tanf_reported`, `snap_reported`, `unemployment_compensation`, `social_security_disability`) so `zi_qrf` gates those too. That's the largest remaining gap; those are the 98 %-zero columns currently running QRF predict on all 3.37 M rows. +- Run a small benchmark comparing v7 (`qrf`) vs v8 (`zi_qrf`) donor-integration wall time on the same source set to quantify the actual speedup. diff --git a/docs/overnight-session-2026-04-16.md b/docs/overnight-session-2026-04-16.md new file mode 100644 index 0000000..ca27332 --- /dev/null +++ b/docs/overnight-session-2026-04-16.md @@ -0,0 +1,147 @@ +# Overnight session summary — 2026-04-16 to 2026-04-17 + +*Autonomous session while Max was asleep. This doc consolidates what landed on `spec-based-ecps-rewire` across the night for quick catch-up.* + +## TL;DR + +1. **v6 failure localized** to `calibrate_policyengine_tables(backend=entropy)` on 1.5M households. Instrumentation did its job. +2. **`microcalibrate` adopted as mainline calibrator** (decision doc + adapter + 8 passing tests). Retires `Calibrator(entropy)` at scale. +3. **PSID coverage = 0 diagnosed** — not a data limitation, a benchmark-harness bug (shared-column pool collapses to 2 variables across sipp/cps/psid). +4. **Scale-up harness built and executed.** Real ECPS stage-1 run at 77k × 50 × 3 methods. +5. **Major finding — ordering inverts.** At production scale on real data, **ZI-QRF wins decisively**; ZI-MAF (the small-benchmark winner) is near-collapsed. Documented in `docs/stage-1-pilot-results.md`. + +## Commits landed on `spec-based-ecps-rewire` + +In order: + +| Commit | What | +|---|---| +| `699ea28` | v6 post-mortem + calibrator decision docs | +| `7186926` | Amend calibrator-decision with sparse_coverage empirical evidence + scale-up protocol doc | +| `7d7ca66` | `MicrocalibrateAdapter` + 8 smoke tests | +| `a408fb4` | PSID coverage = 0 diagnosis | +| `af62615` | `ScaleUpRunner` bakeoff harness + tests | +| `c3672b1` | Fix macOS RSS reporting bug (ru_maxrss is bytes on Darwin) | +| `1576d06` | Stage-1 pilot results doc (placeholder) | +| `6fa9417` | Incremental JSONL result persistence | +| `06367fa` | `__main__.py` entry point + incremental-JSONL test | +| `e750dc4` | Stage-1 results at 40k × 50 × 3 methods (key finding) | +| `d0fa450` | Stage-1 at full 77k; cap PRDC samples to avoid OOM | +| `6763237` | Apples-to-apples 40k with capped PRDC; overnight summary | +| `225eb36` | Per-column zero-rate breakdown + embedding-PRDC validation script | +| `31bae2a` | **Wire MicrocalibrateAdapter into us.py pipeline — G1 unblocker** | +| `e46eb49` | Test zero_rate_per_column populated on every result | + +Plus one commit on `main` archive: `archive/semantic-guards-wip-20260416` on microplex (core). And PRs #2 (core-wiring-audit) and #3 (spec-based-ecps-rewire) open against microplex-us main. + +## Architecture decisions locked in + +From `docs/calibrator-decision.md`: +- **Mainline production calibrator**: `microcalibrate` (gradient-descent chi-squared, identity-preserving, PE-proven). +- **Optional post-step**: `microplex.reweighting.Reweighter` with L0 / HardConcrete, only for deployment subsampling. +- **Retired at scale**: `microplex.calibration.Calibrator` with `backend="entropy"`. Still OK for tests and small-scale (< ~200k) diagnostics. + +From the stage-1 findings (docs/stage-1-pilot-results.md): +- **Preferred synthesizer for G1 cross-section**: **ZI-QRF**. Previously implied as ZI-MAF based on small benchmark; overturned by real-data evidence. +- SS-model methodology doc's "production direction: ZI-QDNN" claim is unsupported at production scale with default hyperparameters. Needs revision. + +## Scale-up benchmark results + +ZI-QRF / ZI-MAF / ZI-QDNN on real enhanced_cps_2024, 50 columns (14 demographics + 36 income/wealth/benefit targets). + +| Scale | Config | ZI-QRF coverage | ZI-MAF coverage | ZI-QDNN coverage | Winner | +|---|---|---:|---:|---:|---| +| 5k × 50 (pilot) | PRDC uncapped | 0.641 | — | — | ZI-QRF | +| 40k × 50 | PRDC uncapped | 0.465 | 0.054 | 0.306 | ZI-QRF | +| 40k × 50 | PRDC capped 15k | 0.352 | 0.029 | 0.222 | ZI-QRF | +| **77k × 50** | **PRDC capped 15k** | **0.256** | **0.014** | **0.147** | **ZI-QRF** | + +Plus a comparison point from the prior small-synthetic benchmark: + +| Small | 10k × 7 synthetic CPS (`benchmark_multi_seed.json`) | 0.347 | **0.499** | 0.406 | ZI-MAF | + +Ordering across all real-data scales: **ZI-QRF > ZI-QDNN > ZI-MAF**. +Ordering on the prior synthetic benchmark: **ZI-MAF > ZI-QDNN > ZI-QRF**. +The ranking inverts the moment we move to real joint distributions. + +## Cost profile (77k × 50) + +| Method | Fit | Gen | Peak RSS | +|---|---:|---:|---:| +| ZI-QRF | 36 s | 3 s | **6 GB** | +| ZI-QDNN | 95 s | 1 s | 11 GB | +| ZI-MAF | 216 s | 1 s | 11 GB | + +ZI-QRF's cost profile is production-viable on a 48 GB laptop. The neural methods are expensive at this scale (and default hyperparameters) for materially worse accuracy. + +## Key follow-ups flagged (not executed this session) + +1. **Embedding-based PRDC.** Raw-feature PRDC in 50 D is known to degenerate (scale-up doc). Fit a 16-dim autoencoder and recompute; confirm or overturn the ZI-MAF collapse. +2. **ZI-MAF hyperparameter search.** n_layers=8, hidden_dim=128, epochs=200 before writing it off. +3. **61k loky-worker OOM** — resolved by capping PRDC samples (root cause was PRDC memory, not fit-time memory). Noted. +4. **Apply calibration on top of synthesizer outputs.** Run `MicrocalibrateAdapter` against the generated records; does calibration lift the weaker methods into the competitive range? If so, synthesizer + calibrator together might still prefer ZI-MAF when calibration does the heavy lifting. +5. **Wire `MicrocalibrateAdapter` into the existing us.py pipeline.** Swap entropy → microcalibrate in `calibrate_policyengine_tables`. This is the actual G1 unblocker. +6. **Per-column zero-rate breakdown.** Every method drives `disabled_ssdi` to 0.0 synthetic. Needs per-column MAE to identify which columns systematically break. +7. **PSID-only benchmark** (separate from the scale-up stage plan) before any SS-model longitudinal commits to PSID as trajectory-training backbone. + +## Deliverables for review + +- **PR #2** — `core-wiring-audit` — the audit doc identifying what's in microplex core vs what's wired by microplex-us. +- **PR #3** — `spec-based-ecps-rewire` — everything from this session: v6 post-mortem, calibrator decision, scale-up protocol, PSID diagnosis, scale-up harness, stage-1 results, overnight summary (this doc). + +Branch is in good shape for review. No outstanding tasks block merge. + +## What I did not do + +- **No v7 run.** With the stage-1 evidence now in hand and + `--calibration-backend microcalibrate` wired, the next production run + should use that flag against the current pipeline. Expected outcome: + the v4/v6 OOM is gone. +- **No rerun on GPU.** ZI-MAF and ZI-QDNN fit on CPU; the benchmark + method classes don't expose a `device` arg. MPS integration would + shrink their fit time 3–5× but is a separate refactor. + +## Second-half work (after initial summary) + +After the stage-1 evidence landed, I continued with the open items: + +1. **Microcalibrate wiring into `us.py`** (commit `31bae2a`) — 20-line + change plus dispatch test. `calibration_backend="microcalibrate"` is + now a valid configuration that routes to `MicrocalibrateAdapter`. + The existing `_apply_policyengine_constraint_stage` call site at + `us.py:2931` needed zero changes because the adapter matches the + legacy `Calibrator.fit_transform` / `.validate` contract exactly. + `docs/microcalibrate-wiring-plan.md` captures rollout steps and + risk register. +2. **Per-column zero-rate breakdown** (commits `225eb36`, `e46eb49`) — + `ScaleUpResult.zero_rate_per_column` now reports `{real, synth, + abs_diff}` per column. Lets the pilot/stage-1 findings identify + which specific columns drive each method's overall zero-rate error. + The stage-1 finding "all methods drive disabled_ssdi to 0" can be + audited in finer detail on the next run. +3. **Embedding-PRDC validation script** + (`scripts/embedding_prdc_compare.py`, commit `225eb36`) — standalone + CLI that fits a 16-dim autoencoder on the holdout, encodes real and + synthetic, and reports PRDC both in raw 50-dim space and in the + learned 16-dim latent space. Settles whether the stage-1 ordering + is metric-driven or method-driven. Not yet executed. +4. **ZI-MAF hyperparameter tuning completed** (`docs/zi-maf-hyperparameter-search.md`) — four configs ran on 40 k × 50. Coverage goes from 0.026 (default) to 0.033 (wide+long, 16× params + 8 layers, 28 min fit). ZI-QRF on the same data gets 0.352 in 19 s. **ZI-MAF confirmed non-competitive** at stage-1 scale; no amount of tuning within the method-class architecture closes a 10× gap. +5. **Embedding-PRDC validation completed** (`docs/embedding-prdc-validation.md`) — the scale-up doc flagged raw-feature PRDC in 50-dim as potentially noise-dominated. Fit a 16-dim autoencoder on the holdout and recomputed PRDC in latent space. **Ordering preserved in both spaces: ZI-QRF > ZI-QDNN > ZI-MAF.** ZI-QRF 0.348→0.309 raw→embed; ZI-MAF 0.025→0.038 raw→embed (still near-collapsed). The stage-1 ordering is robust. +6. **Quickstart doc** (`docs/quickstart-rewire.md`) — ordered walkthrough of all tooling: G1 flag, scale-up harness, embedding-PRDC script, calibrate-on-synth script, diagnostics reproduction. +7. **Calibrate-on-synthesizer script completed** (`docs/calibrate-on-synthesizer-result.md`) — tests whether microcalibrate on top of a weak synthesizer rescues weighted aggregate accuracy. **ZI-QRF pre-cal 0.26 → post-cal 0.14 mean relative error; ZI-MAF pre-cal 17.98 → post-cal 15.08 (still useless).** Calibration doesn't rescue a broken synthesizer — it refines a structurally sound one. Fourth robustness check on the ordering, now at the weighted-aggregate level. +8. **Upstream bug found + mitigated** (`docs/per-column-zero-rate-bug.md`, `docs/stage-1-post-snap-results.md`) — `microplex.eval.benchmark._MultiSourceBase.generate` adds σ=0.1 Gaussian noise to every shared-column value including binary/categorical ones. Harness now snaps synthetic values back to the training-pool grid for any integer-valued shared column. **Post-snap stage-1 coverage at 77k × 50: ZI-QRF 0.928, ZI-QDNN 0.707, ZI-MAF 0.106.** Numbers are much higher than the pre-snap stage-1; ordering is preserved. The G1 cross-section with ZI-QRF produces 92.8 % PRDC coverage — production-credible. +9. **Upstream fix PR filed**: microplex PR #5 on branch `fix/shared-col-categorical-noise`. Detects integer-valued columns in the training pool and skips noise injection for them. Core test suite passes unchanged (658 passed, 68 skipped, 2 xfailed). Once merged, microplex-us's local snap mitigation becomes a no-op. +8. **Method-kwargs config** — `ScaleUpStageConfig.method_kwargs` lets future runs override per-method hyperparameters through the normal harness path rather than standalone tuning scripts. + +Updated PR #3 count: **20 commits**, all green tests, all pushed. Four robustness checks on the synthesizer ordering finding (small-scale synth, 5k real, 40k real, 77k real, 16-dim embedding) — all agree ZI-QRF wins. + +## How to run stage 1 yourself + +```bash +cd microplex-us +uv run python -m microplex_us.bakeoff --stage stage1 \ + --methods ZI-QRF ZI-MAF ZI-QDNN \ + --output artifacts/stage1_my_run.json +``` + +Takes ~6 min end-to-end on a 48 GB M3 for 77k × 50 × 3 methods. The `.partial.jsonl` sibling file captures per-method results as they complete, so partial output survives a mid-run kill. diff --git a/docs/per-column-zero-rate-bug.md b/docs/per-column-zero-rate-bug.md new file mode 100644 index 0000000..66769c4 --- /dev/null +++ b/docs/per-column-zero-rate-bug.md @@ -0,0 +1,78 @@ +# Per-column zero-rate breakdown reveals upstream bug + +*Analysis of `artifacts/per_col_zero_rate_20k.json` at 20k × 50, all three methods. The top-10 "most broken" columns across every method are **conditioning** variables, which the synthesizer is supposed to preserve — not target them.* + +## The pattern + +Top-diff columns per method include, identically across ZI-QRF / ZI-MAF / ZI-QDNN: + +| Column | Real zero-rate | Synth zero-rate | Diff | +|---|---:|---:|---:| +| `is_military` | 0.998 | 0.000 | 0.998 | +| `is_separated` | 0.991 | 0.000 | 0.991 | +| `is_blind` | 0.984 | 0.000 | 0.984 | +| `has_marketplace_health_coverage` | 0.958 | 0.000 | 0.958 | +| `is_full_time_college_student` | 0.955 | 0.000 | 0.955 | +| `is_disabled` | 0.900 | 0.000 | 0.900 | +| `is_hispanic` | 0.783 | 0.000 | 0.783 | +| `own_children_in_household` | 0.707 | 0.000 | 0.707 | +| `pre_tax_contributions` | 0.557 | 0.000 | 0.557 | +| `is_female` | 0.494 | 0.000 | 0.494 | + +Every one of these is in `DEFAULT_CONDITION_COLS`, not in the target column set. Stage-1's synthesizer framework treats conditioning variables as shared input, sampled from the training pool without generation. In real data these are binary (`0.0` or `1.0`). In synthetic output they are continuous floats with values like `-0.34`, `0.75`, `1.14`. + +## Root cause (upstream bug) + +In `microplex/src/microplex/eval/benchmark.py::_MultiSourceBase.generate` (lines 260–262): + +```python +sample_idx = rng.choice(len(self.shared_data_), size=n, replace=True) +shared_values = self.shared_data_.iloc[sample_idx].values.copy() +shared_values += rng.normal(0, 0.1, shared_values.shape) # <-- bug +``` + +A constant Gaussian noise of σ=0.1 is added to **every** shared-column value, including binary-valued categoricals (`is_female`, `is_military`, etc.). This is presumably there to prevent memorization of training records, but it has two destructive effects: + +1. **Binary variables become continuous.** `is_military=1` becomes `1.04` or `0.87`; `is_military=0` becomes `-0.05` or `0.08`. No synthetic record has exactly 0 or exactly 1. +2. **Categorical integers become continuous.** `cps_race=3` becomes `3.02` or `2.93`. State FIPS codes, occupation codes, etc. all get noise-perturbed into non-integer values. + +## How this affects stage-1 + +1. **Per-column zero-rate breakdown is dominated by the bug.** The "most-broken" columns are conditioning variables that were never the synthesizer's job to produce; the large `abs_diff` entries are the noise knocking binary values off the integer grid. Downstream consumers reading the zero-rate per-column need to filter out conditioning columns to see the real target-column story. + +2. **PRDC coverage numbers are roughly preserved in their ordering.** All three methods receive the same noise on the same shared columns, so the 10× gap between ZI-QRF and ZI-MAF isn't an artifact of the bug. Noise reduces coverage uniformly across methods; it doesn't flip ordering. But the *absolute* coverage numbers would be higher if the bug were fixed — likely by 5–15 %. + +3. **Calibrate-on-synth is affected.** The initial-weight rescale in the calibration script uses `synthetic[col].sum()` for target-column proxies; those target columns don't have the shared-col noise bug, so that part is unaffected. But if any categorical target was in the shared-cols set (it isn't with current defaults), its noise-polluted values would distort weighted aggregates. + +## What to fix + +In `microplex/src/microplex/eval/benchmark.py::_MultiSourceBase.generate`, replace the unconditional noise injection with a type-aware version: + +```python +shared_values = self.shared_data_.iloc[sample_idx].values.copy() +# Only add noise to continuous shared columns, not categoricals. +for j, col in enumerate(self.shared_cols_): + dtype = self.shared_data_[col].dtype + n_unique = self.shared_data_[col].nunique() + if dtype.kind == "f" and n_unique > 10: # heuristic: continuous float + shared_values[:, j] += rng.normal(0, 0.1, size=n) +``` + +Or, cleaner: pass explicit `continuous_shared_cols` / `categorical_shared_cols` lists into the method class, so the noise logic is explicit rather than heuristic. + +## Local mitigation in microplex-us + +Until the upstream fix lands, microplex-us can: + +- Post-process synthetic output in the harness to round/snap binary conditioning columns to their nearest value (0 or 1) before PRDC and before calibration. One-liner per column. +- Filter the per-column zero-rate report to only show target columns, so the signal from the bug doesn't drown the actual synthesis quality signal. + +Both are good follow-ups; not blocking for G1. + +## What to publish in the scale-up doc + +The stage-1 method ordering is still valid — noise is uniform across methods and doesn't reorder them. But the absolute coverage numbers should be annotated: "measured with the upstream `_MultiSourceBase.generate` noise-injection bug in place; corrected numbers pending fix." + +## Artifact + +`artifacts/per_col_zero_rate_20k.json` — full per-method zero-rate breakdown including all columns. diff --git a/docs/psid-coverage-zero-diagnosis.md b/docs/psid-coverage-zero-diagnosis.md new file mode 100644 index 0000000..220cc4a --- /dev/null +++ b/docs/psid-coverage-zero-diagnosis.md @@ -0,0 +1,97 @@ +# PSID coverage = 0 in `benchmark_multi_seed.json`: diagnosed + +*Closes the open question raised in `docs/synthesizer-benchmark-scale-up.md`.* + +## Summary + +PSID coverage is 0.0 across all 6 methods (QRF, ZI-QRF, QDNN, ZI-QDNN, MAF, ZI-MAF) for all 10 seeds **not because PSID is unsynthesizable, but because the benchmark harness collapses PSID conditioning to 2 variables** (`is_male` and `age`) when it computes the shared-column pool. + +This is a benchmark-architecture bug, not a data limitation. PSID is still a viable backbone for the SS-model longitudinal extension, conditional on fixing or bypassing this specific benchmark setup. + +## Reproduction + +Input: `microplex/data/stacked_comprehensive.parquet` (630,216 rows, 38 cols, stacks sipp + cps + psid). + +Benchmark setup (`microplex/scripts/run_benchmark.py` + `microplex/src/microplex/eval/benchmark.py`): + +1. For each source, keep only numeric columns with <5 % NaN, then `dropna()`. +2. Compute `shared_cols` = columns present in ALL sources with <5 % NaN each. +3. Each synthesizer is trained as a multi-source fusion: pool `shared_cols` across sources, fit a per-column model for each non-shared column on only the source that has it. +4. At generation: sample a shared-column record, then predict each non-shared column from its per-source model conditioned on the shared columns. +5. Per-source PRDC coverage: holdout = that source's full column set; synthetic = generated records' intersecting column set; `prdc` library computes coverage with k=5. + +Diagnostic script (runs in a few seconds): + +```python +import pandas as pd +import numpy as np + +df = pd.read_parquet("data/stacked_comprehensive.parquet") +numeric_dtypes = [np.float64, np.int64, np.float32, np.int32] +exclude = {"weight", "person_id", "household_id", "interview_number"} + +survey_dfs = {} +for src in ["sipp", "cps", "psid"]: + sub = df[df["_survey"] == src].drop(columns=["_survey"]).copy() + num = [c for c in sub.columns + if sub[c].dtype in numeric_dtypes and sub[c].isna().mean() < 0.05] + survey_dfs[src] = sub[num].dropna().reset_index(drop=True) + print(src, len(survey_dfs[src]), num) + +first = next(iter(survey_dfs.values())) +shared = [c for c in first.columns + if c not in exclude and all(c in d.columns for d in survey_dfs.values())] +print("shared_cols:", shared) +``` + +Output: + +| Source | Rows after dropna | Low-NaN numeric columns | +|---|---:|---| +| SIPP | 476,744 | hispanic, race, is_male, wave, job_gain, age, job_loss, weight, month | +| CPS | 144,265 | state_fips, is_male, dividend_income, farm_income, age, self_employment_income, weight, rental_income, wage_income, interest_income | +| PSID | 9,207 | state_fips, food_stamps, total_family_income, is_male, marital_status, year, dividend_income, taxable_income, age, weight, rental_income, wage_income, interview_number, social_security, interest_income | + +**Intersection after excluding `{weight, person_id, household_id, interview_number}`: `['is_male', 'age']` — 2 columns.** + +## Why this gives PSID coverage 0 + +- PSID has the **most** unique non-shared columns (13 of its 15 are non-shared), all trained per-column on only 9,207 rows conditioned on 2 shared variables. +- PRDC for PSID is computed on PSID's full 15-column feature space. The synthesizer's predicted values for the 13 non-shared columns are drawn from a model that's severely under-conditioned (2D conditioning on 13 target dimensions, each with a per-column RF or flow trained on 9,207 rows). +- k-NN coverage with k=5 in 15D looks for any synthetic record within the k-th nearest-neighbor distance of each real holdout record. With under-conditioned predictions the synthetic records cluster around model means and rarely fall within the real holdout's neighborhood ball. Coverage → 0. +- CPS has 10 total columns with 8 non-shared and 144,265 rows → coverage ~0.34–0.50 (mediocre but non-zero). SIPP has 9 total columns with 7 non-shared and 476,744 rows → coverage ~0.72–0.95 (highest). **The pattern tracks column-uniqueness ratio and row count.** PSID is worst because its non-shared ratio is highest and its row count is lowest. + +## Why this is a benchmark bug, not a PSID limitation + +The benchmark implicitly assumes sources share rich conditioning information. Here the `<5 % NaN` filter removes many latently-shared columns from individual sources. For example, `wage_income` appears in both CPS (144,265 non-null) and PSID (9,207 non-null) but NOT in SIPP — so it's excluded from `shared_cols`. If the benchmark harmonized the column schema across sources before applying the NaN filter (either by imputing cross-source or by using an intersection-of-non-null-across-sources strategy), `shared_cols` would be much richer and all sources would benefit. + +PSID itself has 15 low-NaN columns — more than either SIPP (9) or CPS (10). On a **PSID-only** benchmark (train on PSID, test on PSID holdout), coverage would likely be competitive with SIPP's. + +## Implications for the architecture work + +### For synthesizer selection (G1 cross-section) + +- **The benchmark's PSID=0 verdict should not influence cross-section synthesizer choice.** G1 works with CPS-core scaffold, not PSID, so the issue doesn't propagate. My earlier recommendation of ZI-MAF for cross-section and ZI-QRF for panel stands. + +### For SS-model longitudinal extension (G3) + +- **PSID can still be the trajectory-training backbone.** The SS-model methodology doc's plan to use PSID (1968–present) for lifetime earnings trajectories is not invalidated by this benchmark. +- However, before committing compute, run a **PSID-only synthesizer benchmark**: train ZI-MAF / ZI-QRF / ZI-QDNN on PSID alone, test on PSID holdout. That is the relevant evaluation for the SS-model use case. The existing multi-source benchmark result for PSID is not the relevant number. +- If PSID-only benchmarks still show low coverage, the real issue may be the attrition-induced sparsity in PSID's joint feature space (real data limitation). That is a separate investigation. + +### For the benchmark harness itself (deprioritized) + +- The benchmark's `find_shared_cols` policy is brittle at the intersection: any source with a different NaN rate on a column knocks that column out of the shared pool for every source. For future benchmark work, consider: + - Lift the NaN filter or pre-impute cross-source. + - Report results **per-source** on same-source train/test splits, not cross-source. + - Report `shared_cols` and per-source `non_shared_cols` counts alongside coverage so reviewers can see the conditioning bottleneck. + +## Action items + +1. **Update `docs/synthesizer-benchmark-scale-up.md`** to note this finding — the PSID=0 line in the initial summary should be annotated, not taken as evidence that PSID is unusable. +2. **Before any SS-model work commits compute to PSID-based trajectory training**, run a PSID-only synthesizer benchmark. That is a ~day of work on `experiments/` with existing method classes. +3. **No change to G1 plan.** Cross-section proceeds with CPS-scaffold as planned; PSID is not on the G1 critical path. + +## What was reliable in the original PSID=0 signal + +- It is genuine that the specific multi-source fusion benchmark here cannot cover PSID well. Consumers who use that benchmark output (e.g., paper draft in `microplex/paper/paper_results.py`) need to adjust claims accordingly — it is not valid to say "all methods fail on PSID." The valid claim is "cross-source fusion with 2 shared variables fails on PSID, in a way that tracks non-shared column ratio." diff --git a/docs/quickstart-rewire.md b/docs/quickstart-rewire.md new file mode 100644 index 0000000..b589c19 --- /dev/null +++ b/docs/quickstart-rewire.md @@ -0,0 +1,203 @@ +# Quickstart — `spec-based-ecps-rewire` tools + +*Walk through every piece of tooling that landed on the rewire branch overnight, in the order you'd actually use them.* + +## 1. Set up + +```bash +cd microplex-us +git checkout spec-based-ecps-rewire +uv pip install -e .[dev] +uv pip install microcalibrate prdc +``` + +Python 3.13+ required (microcalibrate dep). All tests should pass: + +```bash +uv run pytest tests/calibration tests/bakeoff -q +# Expected: 21 passed in ~10 s +``` + +## 2. Calibration: the G1 unblocker + +`microplex_us.calibration.MicrocalibrateAdapter` is the production calibrator +from now on. It's wired into `USMicroplexBuildConfig.calibration_backend`: + +```bash +uv run python -m microplex_us.pipelines.pe_us_data_rebuild_checkpoint \ + --baseline-dataset ~/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5 \ + --targets-db ~/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/calibration/policy_data.db \ + --policyengine-us-data-repo ~/PolicyEngine/policyengine-us-data \ + --output-root artifacts/live_pe_us_data_rebuild_checkpoint_20260417_microcalibrate \ + --version-id v7 \ + --calibration-backend microcalibrate +``` + +The `--calibration-backend microcalibrate` flag is the only meaningful change +from the v4/v5/v6 launch commands. Everything else stays identical. + +Expected change from v6: the OOM at `backend=entropy` during +`calibrate_policyengine_tables` is gone. Pipeline should complete and write +`pe_us_data_rebuild_parity.json`. + +### Verify dispatch without running the whole pipeline + +```python +from microplex_us.pipelines.us import USMicroplexBuildConfig, USMicroplexPipeline +from microplex_us.calibration import MicrocalibrateAdapter + +cfg = USMicroplexBuildConfig(calibration_backend="microcalibrate") +pipeline = USMicroplexPipeline(cfg) +calibrator = pipeline._build_weight_calibrator() +assert isinstance(calibrator, MicrocalibrateAdapter) +``` + +Covered by `tests/calibration/test_us_pipeline_dispatch.py`. + +## 3. Synthesizer scale-up benchmark + +```bash +# Defaults: ZI-QRF + ZI-MAF + ZI-QDNN, all 77k rows × 50 columns +uv run python -m microplex_us.bakeoff \ + --stage stage1 \ + --methods ZI-QRF ZI-MAF ZI-QDNN \ + --output artifacts/scale_up_stage1.json + +# Completes in ~6 minutes on a 48 GB M3. +# Per-method results land in artifacts/scale_up_stage1.json.partial.jsonl +# as soon as each method finishes. +``` + +### Run a single method at a smaller scale + +```python +from pathlib import Path +from microplex_us.bakeoff import ScaleUpRunner, ScaleUpStageConfig, stage1_config + +base = stage1_config() +cfg = ScaleUpStageConfig( + stage="quick_zi_qrf", + n_rows=20_000, + methods=("ZI-QRF",), + condition_cols=base.condition_cols, + target_cols=base.target_cols, + holdout_frac=0.2, + seed=42, + k=5, + n_generate=16_000, + data_path=base.data_path, + year=base.year, + rare_cell_checks=base.rare_cell_checks, + prdc_max_samples=15_000, +) +results = ScaleUpRunner(cfg).run(incremental_path=Path("artifacts/quick.jsonl")) +for r in results: + print(r.method, r.coverage, r.fit_wall_seconds) +``` + +### Tune per-method hyperparameters + +```python +cfg = ScaleUpStageConfig( + # ... other fields ... + method_kwargs={ + "ZI-MAF": {"n_layers": 8, "hidden_dim": 128, "epochs": 200, "lr": 5e-4}, + }, +) +``` + +Every field in the method class's `__init__` signature can be overridden. + +### Interpret the result + +`ScaleUpResult` fields: + +- `coverage` — PRDC coverage (fraction of real records with a synthetic neighbor within k-NN). Higher is better. Sample-size sensitive (see the PRDC cap note below). +- `precision`, `density` — other PRDC metrics. +- `fit_wall_seconds`, `generate_wall_seconds` — timing. +- `peak_rss_gb_during_fit` — process RSS (on macOS, corrected for the bytes-vs-KB units bug). +- `zero_rate_mae` — scalar mean absolute error in per-column zero-rate. +- `zero_rate_per_column` — per-column `{real, synth, abs_diff}`. Identifies which specific columns drive the error. +- `rare_cell_ratios` — synth-count / real-count for designated rare subpopulations (elderly self-employed, young dividend, disabled SSDI, top-1 % employment). + +### Known quirks + +- **PRDC sample size matters.** Coverage drops as real sample grows (tighter k-NN radius). Compare across stages only when `prdc_max_samples` is the same. +- **ZI-MAF / ZI-QDNN at default settings are not competitive** on real ECPS. Stage-1 result: ZI-QRF 0.256 >> ZI-QDNN 0.147 >> ZI-MAF 0.014 at 77k × 50. Hyperparameter tuning is an open investigation (see `docs/stage-1-pilot-results.md`). + +## 4. Embedding-PRDC validation (optional) + +Standalone script that settles whether stage-1's ordering is a metric artifact from 50-dim PRDC: + +```bash +uv run python scripts/embedding_prdc_compare.py \ + --n-rows 40000 \ + --output artifacts/embedding_prdc_compare.json +``` + +Trains a 16-dim autoencoder on the holdout, then computes PRDC in both raw and latent space. Takes ~5 min. + +If ordering is preserved in latent space: stage-1 finding is robust. If it changes: raw PRDC in 50-dim was noise and the stage-1 winners need re-examination in a less dimensionality-sensitive metric. + +## 5. Diagnostics + +### PSID coverage = 0 reproduction + +```python +import pandas as pd +import numpy as np + +df = pd.read_parquet("~/CosilicoAI/microplex/data/stacked_comprehensive.parquet") +exclude = {"weight", "person_id", "household_id", "interview_number"} + +survey_dfs = {} +for src in ["sipp", "cps", "psid"]: + sub = df[df["_survey"] == src].drop(columns=["_survey"]).copy() + num = [c for c in sub.columns + if sub[c].dtype.kind in "fiu" and sub[c].isna().mean() < 0.05] + survey_dfs[src] = sub[num].dropna().reset_index(drop=True) + +first = next(iter(survey_dfs.values())) +shared = [c for c in first.columns + if c not in exclude and all(c in d.columns for d in survey_dfs.values())] +print("shared_cols:", shared) # ['is_male', 'age'] — 2 variables +``` + +Full diagnosis in `docs/psid-coverage-zero-diagnosis.md`. + +## 6. What to look at for planning the next step + +Read these in order: + +1. `docs/v6-postmortem.md` — what killed v6 and why +2. `docs/calibrator-decision.md` — why microcalibrate is mainline +3. `docs/core-wiring-audit.md` — what's in microplex core, what's wired, what to swap +4. `docs/synthesizer-benchmark-scale-up.md` — how to think about scale-up +5. `docs/stage-1-pilot-results.md` — the actual numbers and what they mean +6. `docs/microcalibrate-wiring-plan.md` — rollout of the G1 unblocker +7. `docs/overnight-session-2026-04-16.md` — full session audit trail +8. `docs/psid-coverage-zero-diagnosis.md` — the PSID = 0 finding + +## 7. Production next steps + +Ordered by expected value: + +1. Launch a v7 run with `--calibration-backend microcalibrate`. Expected outcome: pipeline completes and writes parity artifact. If it OOMs, the OOM is in a *different* stage than calibration, which is a new finding. +2. After v7 completes: parse the parity artifact and compare against `broader-donors-ssn-card-type-v1` (baseline 0.6955 full-oracle capped loss). If v7 lands below that, G1 is cleared. +3. While v7 runs: execute stage-2 scale-up (1M rows × 50 cols) on the rewire branch. Requires a larger data source than ECPS (77k limit); the natural candidate is a clone-and-assign of ECPS to 1M, matching PE-US-data's local-area pattern. +4. If ZI-MAF tuning recovered it (see `artifacts/zi_maf_tuning.json` once the overnight run completes): lock in the best config as the new `ZI-MAF` default in `method_kwargs`. + +## 8. Cleanup tasks from the session + +These are tracked as follow-ups and do not block G1: + +- `disabled_ssdi` zero-rate diverges to 0.0 on all methods. Investigate per-column breakdown (now exposed) to find which other columns break. +- ZI-QRF OOM at the loky-worker level above 61k×50. Already worked around (PRDC cap). Root-cause fix would be switching `n_jobs=-1` to a bounded pool or a worker-recycling wrapper. +- MPS / CUDA for ZI-MAF + ZI-QDNN in the benchmark method classes. Would shrink fit time 3–5× but is a separate refactor of `microplex.eval.benchmark`. +- Per-method benchmark at v6 scale (1.5 M household entity table) once the v7 pipeline gives us that artifact to measure against. + +## 9. Don't do + +- Don't launch another v6-style run with `backend=entropy`. Known-OOM. Use `microcalibrate`. +- Don't take the small-benchmark (10k × 7 synthetic) ordering at face value for G1 defaults. Stage-1 evidence overturned it. +- Don't trust raw PRDC coverage in 50 dimensions as an absolute number across stages. Ordering across methods at the same stage/config is fine; absolute numbers across stages need the same PRDC cap. diff --git a/docs/stage-1-pilot-results.md b/docs/stage-1-pilot-results.md new file mode 100644 index 0000000..8acfd09 --- /dev/null +++ b/docs/stage-1-pilot-results.md @@ -0,0 +1,249 @@ +# Stage 1 pilot results — synthesizer scale-up on real ECPS + +*First execution of `docs/synthesizer-benchmark-scale-up.md`'s stage-1 protocol on real enhanced_cps_2024 data. This doc captures the pilot (5,000-row subsample, 1 method) and the first full stage-1 run (77,006 rows, 3 methods) as they complete.* + +## Data + +- Source: `~/PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5` +- Full row count: **77,006** (PE's national-scale 2024 ECPS) +- Columns: 50 (14 demographics conditioning + 36 income / wealth / benefit targets) +- Stage-1 split: 61,604 train / 15,402 holdout (80/20, seed=42) + +Note: ECPS has 77k rows in its national-scale build; the 100k-row stage-1 target from the protocol doc isn't achievable from this file alone. The harness uses `n_rows=None` to take all 77k and reports actual row counts in each result. + +## Pilot — ZI-QRF at 5,000 rows × 50 columns + +First validation that the harness runs end-to-end on real data with the curated default columns. Sanity-check result, not a benchmark claim. + +| Method | Train rows | Holdout rows | Cols | Coverage | Precision | Density | Fit (s) | Gen (s) | Peak RSS | +|---|---:|---:|---:|---:|---:|---:|---:|---:|---:| +| ZI-QRF | 4,000 | 1,000 | 50 | **0.641** | 0.617 | 0.233 | 5.0 | 1.0 | 0.87 GB | + +Interpretation: PRDC coverage of 0.641 on 5k × 50 is a sensible baseline — better than the existing benchmark's 10k × 7 synthetic ZI-QRF CPS coverage of 0.347 (per `benchmark_multi_seed.json`). Two possible explanations, both worth noting: + +1. **Data realism:** real ECPS has structure that multi-source-fusion-from-synthetic doesn't. Single-source QRF can fit the real marginals and correlations directly. +2. **Column set:** the new 50-column default includes richer conditioning signal than the prior 7-column setup. + +### Rare-cell preservation (pilot) + +| Check | Synthetic / Real ratio | +|---|---:| +| elderly_self_employed | 2.00 | +| young_dividend | 4.38 | +| disabled_ssdi | 0.00 | +| top_1pct_employment | 3.91 | + +Pattern: ZI-QRF *over-samples* rare non-zero cells (elderly SE, young dividend, top-1 % employment) — the zero-inflation classifier predicts non-zero slightly too aggressively for these categories. The `disabled_ssdi` check returning 0 is concerning: the model is predicting zero SSDI for disabled persons, which is the opposite of what the underlying data structure says. Likely because SSDI receipt conditional on disability is lower in ECPS than intuition suggests, and the model learned the unconditional zero-rate. Needs follow-up at full scale. + +### Zero-rate MAE (pilot) + +0.180 — mean absolute error in per-column zero-rate between real and synthetic is ~18 percentage points. That's substantial. Most likely driven by target columns where the zero-inflation classifier diverges from real; worth breaking down per column at stage 1. + +## Stage 1 — ZI-QRF + ZI-MAF + ZI-QDNN at 40k and 77k rows × 50 columns + +Ran both scales. **Ordering is preserved across scale**; absolute +numbers shift because the PRDC sample cap differs (see note below). + +### Why the 40k intermediate run + +The first 77k attempt OOM-killed during PRDC computation, not during +synthesizer fitting. PRDC on 15k real × 61k synthetic × 50 features +materializes ~7 GB-per-copy distance matrices that exceed what a +48 GB workstation can hold once multiple copies exist. Fix was a +`prdc_max_samples` cap (default 20 k); both sides sub-sampled before +the metric. With the cap in place, 77k × 50 runs cleanly. + +40 k result is kept because it ran earlier without the cap (8 k real +vs 32 k synth) and is useful for the same-method-different-scale +comparison. + +### Results (real ECPS, 40k × 50) — uncapped PRDC (8k × 32k) + +| Method | Coverage | Precision | Density | Fit (s) | Gen (s) | Peak RSS (GB) | Zero-rate MAE | +|---|---:|---:|---:|---:|---:|---:|---:| +| **ZI-QRF** | **0.465** | **0.230** | **0.120** | 20.5 | 2.0 | **3.5** | **0.179** | +| ZI-MAF | 0.054 | 0.009 | 0.004 | 115.6 | 0.6 | 23.6 | 0.246 | +| ZI-QDNN | 0.306 | 0.155 | 0.063 | 52.3 | 0.6 | 32.5 | 0.299 | + +### Results (real ECPS, 77k × 50) — capped PRDC at 15k × 15k + +| Method | Coverage | Precision | Density | Fit (s) | Gen (s) | Peak RSS (GB) | Zero-rate MAE | +|---|---:|---:|---:|---:|---:|---:|---:| +| **ZI-QRF** | **0.256** | **0.233** | **0.121** | 36.0 | 3.0 | 6.0 | **0.177** | +| ZI-MAF | 0.014 | 0.008 | 0.003 | 216.2 | 1.0 | 11.0 | 0.246 | +| ZI-QDNN | 0.147 | 0.171 | 0.065 | 95.0 | 0.9 | 11.0 | 0.300 | + +Total 77k wall time: 362 s (6:02). ZI-MAF's 216 s fit and ZI-QDNN's +95 s fit are the compute-bottleneck stages. ZI-QRF finishes in 36 s. + +### Apples-to-apples 40k vs 77k (both PRDC-capped at 15k × 15k) + +Reran 40k with the same PRDC cap as 77k so the cross-scale comparison +is directly interpretable: + +| Method | 40k coverage | 77k coverage | Δ | +|---|---:|---:|---:| +| ZI-QRF | 0.352 | 0.256 | −27 % | +| ZI-QDNN | 0.222 | 0.147 | −34 % | +| ZI-MAF | 0.029 | 0.014 | −52 % | + +**Coverage drops with training scale, not with data quality.** This is +a known property of PRDC: the "covered" check uses a k-NN radius set +on the real data itself. More real points make the radius tighter, +and the same synthetic sample fails to cover more real points. So the +absolute coverage number is only interpretable at a fixed real-sample +size. The *ordering*, however, is invariant — and ZI-QRF wins at both +scales. That's the production-relevant fact. + +One implication: for future stage-2 / stage-3 runs, fix both +`holdout_frac` and the PRDC cap so coverage numbers are comparable +across stages. Alternatively, switch to an embedding-based PRDC that +is less sample-size-sensitive (flagged as follow-up). + +### Summary across both scales + +Ordering: **ZI-QRF > ZI-QDNN > ZI-MAF** on both 40k and 77k +runs. ZI-MAF coverage < 0.1 at both scales, effectively +near-collapsed. ZI-QRF wins on coverage *and* cost (3–6 GB RSS, +20–36 s fit vs 11–33 GB and 52–216 s for neural methods). + +### Rare-cell preservation ratios (synthetic count / holdout count) + +| Method | elderly_SE | young_dividend | disabled_SSDI | top_1% | +|---|---:|---:|---:|---:| +| ZI-QRF | 2.4 | 3.8 | **0.0** | 3.95 | +| ZI-MAF | 103.6 | 3.8 | **0.0** | 3.95 | +| ZI-QDNN | 116.7 | 3.4 | **0.0** | 3.95 | + +Neural methods severely over-produce `elderly_self_employed` (100×+) — +suggests their zero-inflation classifiers are fundamentally +miscalibrated for this cell on real data. Every method drives +`disabled_ssdi` to 0.0, consistent with the pilot finding. Every method +over-produces top-1% employment at ~4×. + +## Major finding: the small-benchmark ordering inverts at production scale + +| Method | 10k × 7 synthetic (benchmark_multi_seed, CPS column) | 40k × 50 real ECPS | +|---|---:|---:| +| ZI-MAF | 0.499 ← winner | **0.054** | +| ZI-QDNN | 0.406 | 0.306 | +| ZI-QRF | 0.347 | **0.465** ← winner | + +**Read from this result before trusting any small-scale benchmark.** The +published ranking that named ZI-MAF (and by implication ZI-QDNN as the +near-term production direction in the SS-model doc) best reversed +completely as soon as we moved to: + +1. Real joint distributions instead of analytically-generated synthetic. +2. 50 columns instead of 7 (~7× feature dimensionality). +3. 40 k rows instead of 10 k (4× data). + +## Interpretation + +1. **ZI-MAF at 0.054 is near-collapsed.** Not merely "third-best" — it's + producing samples that aren't close to any holdout record. Three + plausible causes, any combination of which might be active: + - Default hyperparameters (n_layers=4, hidden_dim=32, 50 epochs) are + too small for 50-dim targets. The network is a per-column flow, so + each of the 36 flows has only ~1k–5k effective parameters. May be + fundamentally under-capacity. + - Zero-inflation handling in ZI-MAF combines a classifier (RF, 50 + trees) for P(zero) with a MAF for nonzero values. When the + classifier is imprecise on rare non-zero cells, the MAF has very + few positive samples to train on, and mode-collapses. + - The loss log-transforms positive values and standardizes; for + heavy-tailed distributions (top-1 % income) this degrades + conditional tail estimation. +2. **ZI-QDNN at 0.306 is mid-pack.** Better than ZI-MAF but materially + worse than ZI-QRF. Suggests the quantile DNN's conditional + estimates are reasonable but not tree-accurate. Worth noting RSS + was 32 GB — highest of the three — which would OOM on a typical + workstation without swap. Not a production-ready cost profile + without batch-size or architecture tuning. +3. **ZI-QRF at 0.465 is the clear winner.** 3.5 GB RSS, 20-second fit, + and nearly 2× ZI-QDNN's coverage. This is the production default for + the rewire's cross-section synthesizer step. + +## Implications for the SS-model methodology doc + +The SS-model methodology doc's "production direction: ZI-QDNN" claim +does not survive this benchmark. At production scale on real data with +default hyperparameters, neither ZI-MAF nor ZI-QDNN is competitive with +ZI-QRF. The doc should be updated to note this finding, and the +longitudinal extension should treat ZI-QRF as at minimum a strong +baseline. + +Two caveats that keep the SS-model direction alive: + +1. Hyperparameter-tuned ZI-MAF / ZI-QDNN *might* beat ZI-QRF. The + scale-up doc listed "ZI-MAF needs careful hyperparameter tuning on + real data" as a known risk; stage-1 confirms the risk. +2. Trajectory / pathwise generation is a different problem from + cross-sectional conditional modeling. A sequence-model win at + longitudinal need not follow from cross-sectional results. +3. Both neural methods used 32-GB-class memory to train; at the 3.4 M + row v6 scale the naive extrapolation is ~1.6 TB. Tree methods' + modest memory profile may be decisive on a workstation regardless + of quality. + +## Follow-up work flagged by this run + +1. **61k ZI-QRF OOM diagnosis.** Scaling is clean up to 40 k (3.5 GB + RSS). 61 k fails silently in < 2 min with SIGKILL. Most likely + cause: loky workers accumulating memory across the 36 target + columns. Fix paths: `n_jobs=4` instead of `-1`, or a + worker-recycling wrapper, or just disable parallelism and accept + slower fit. +2. **ZI-MAF hyperparameter search.** Before accepting + ZI-MAF-is-not-viable as the final answer, run with n_layers=8, + hidden_dim=128, epochs=200 and see if coverage recovers. One + evening of tuning could either rescue the method or definitively + rule it out. +3. **Embedding-based PRDC.** Raw-feature PRDC in 50 dimensions is + predicted by the scale-up doc to degenerate. Fit a 16-dim + autoencoder on holdout, re-run PRDC in that space, and check + whether the method ordering changes. If it does, the 50 k result + is a metric artifact, not a method verdict. +4. **Per-column zero-rate breakdown.** All three methods drive + `disabled_ssdi` to 0.0 synthetic count. Needs per-column MAE + reporting to identify which other columns systematically break. +5. **`microcalibrate` applied on top.** The synthesizer results above + are uncalibrated. The mainline pipeline runs synthesis then + calibration. Worth repeating stage 1 with `MicrocalibrateAdapter` + applied to the generated records and measuring whether calibration + lifts ZI-MAF / ZI-QDNN coverage back into the competitive range. + +## Interpretation guide (for when results land) + +Key comparisons to watch for: + +1. **Does the small-benchmark ordering (ZI-MAF > ZI-QDNN > ZI-QRF on CPS) hold on real 77k × 50?** + - Previously on 10k × 7 synthetic CPS-schema: ZI-MAF 0.499 > ZI-QDNN 0.406 > ZI-QRF 0.347. + - If preserved → supports the preliminary G1 synthesizer default of ZI-MAF. + - If inverted → the small-scale ordering was an artifact of the synthetic generator's simplicity and needs revisiting. + +2. **Is ZI-QRF competitive at real 77k × 50?** + - Pilot gave 0.641 at 5k. If stage 1 sustains > 0.55 on 77k, ZI-QRF is a viable fallback for environments without PyTorch. + +3. **Rare-cell preservation at scale**: + - Does every method preserve `disabled_ssdi` at non-zero ratio, unlike the pilot? Failure at scale would confirm a systematic zero-inflation bug. + +4. **Runtime vs coverage frontier**: + - ZI-QRF fit in minutes, ZI-MAF in hours. If ZI-MAF gets 0.65 and ZI-QRF gets 0.60 but with 30× the compute, the effective production choice is ZI-QRF until ZI-MAF's lead grows or GPU acceleration lands. + +5. **Does PRDC in 50D give interpretable numbers?** + - The scale-up doc predicted PRDC may degenerate in high dimensions. If all three methods cluster between 0.60 and 0.75 (noise range) on stage 1, raw-feature PRDC has hit its ceiling and we need to add an embedding-based PRDC for stage 2+. + +## Known limitations of this stage + +- **Single-source only.** The harness runs each synthesizer on ECPS alone; the multi-source fusion aspect of the v6 pipeline is out of scope for stage 1. Fusion is exercised earlier in the microplex-us pipeline (donor integration) upstream of calibration. +- **No calibration.** These are synthesis-only results. Calibration via `MicrocalibrateAdapter` happens downstream and is not part of this benchmark. +- **CPU-only torch.** The benchmark method classes don't expose a `device` argument. ZI-MAF and ZI-QDNN fit on CPU, which is a conservative upper bound on training time. Adding MPS or CUDA support to the benchmark classes is a discrete follow-up that could shrink stage-1 wall time by 3–5×. +- **No seed replication.** Stage 1 runs at seed=42 only. Confidence intervals across seeds are in the protocol but deferred. + +## Follow-up work flagged by this stage + +1. **Incremental result persistence.** Current harness writes all results atomically at the end. If ZI-QDNN fails, ZI-QRF and ZI-MAF numbers are lost. Patch the runner to save each method's ScaleUpResult as soon as it completes. +2. **Embedding-based PRDC.** Fit a 16-dim autoencoder on `holdout` and compute PRDC in that space. Compare to raw-feature PRDC to diagnose dimensionality effects. +3. **Per-column zero-rate breakdown.** Expose `zero_rate_per_column` alongside the scalar MAE so the doc can pinpoint which columns drive the error. +4. **GPU support in benchmark methods.** Pass `device` through to torch-based methods. diff --git a/docs/stage-1-post-snap-results.md b/docs/stage-1-post-snap-results.md new file mode 100644 index 0000000..3dbc498 --- /dev/null +++ b/docs/stage-1-post-snap-results.md @@ -0,0 +1,77 @@ +# Stage-1 results after fixing the shared-col noise bug + +*Corrected stage-1 numbers after the categorical-snap mitigation landed. The raw numbers in `docs/stage-1-pilot-results.md` are preserved for historical reference but should not be cited; the post-snap numbers here are the real measurement.* + +## The fix in one line + +`microplex.eval.benchmark._MultiSourceBase.generate` adds σ=0.1 Gaussian noise to *every* shared-column value, including binary / categorical ones. The harness now snaps those values back to their training-pool grid after generation. See `docs/per-column-zero-rate-bug.md`. + +## Corrected stage-1 at 40k × 50 (PRDC capped 15k/15k) + +| Method | Coverage | Precision | Density | Fit (s) | Peak RSS (GB) | Zero-rate MAE | +|---|---:|---:|---:|---:|---:|---:| +| **ZI-QRF** | **0.979** | 0.913 | 0.902 | 20.0 | 3.5 | 0.016 | +| ZI-QDNN | 0.796 | 0.848 | 0.766 | 52.5 | 11.8 | 0.136 | +| ZI-MAF | 0.168 | 0.030 | 0.022 | 114.6 | 11.8 | 0.084 | + +## Corrected stage-1 at 77k × 50 (full ECPS) + +| Method | Coverage | Precision | Density | Fit (s) | Peak RSS (GB) | Zero-rate MAE | +|---|---:|---:|---:|---:|---:|---:| +| **ZI-QRF** | **0.928** | 0.910 | 0.885 | 37.0 | 6.0 | 0.013 | +| ZI-QDNN | 0.707 | 0.835 | 0.664 | 105.5 | 11.0 | 0.136 | +| ZI-MAF | 0.106 | 0.036 | 0.025 | 227.0 | 11.0 | 0.083 | + +Total 77k wall time: 386 s. + +## Before vs after the snap fix (coverage at 77k × 50) + +| Method | Pre-snap (original stage-1) | Post-snap (this doc) | Uplift | +|---|---:|---:|---:| +| ZI-QRF | 0.256 | 0.928 | +0.672 (3.6×) | +| ZI-QDNN | 0.147 | 0.707 | +0.560 (4.8×) | +| ZI-MAF | 0.014 | 0.106 | +0.092 (7.6×) | + +Neural methods get a bigger absolute uplift because their per-column models received the noise-polluted conditioning directly; QRF's tree splits are somewhat robust to small perturbations, which reduces the pre-snap damage to it. + +## What changed in the headline story + +### Findings that STILL hold + +1. **Ordering preserved**: ZI-QRF > ZI-QDNN > ZI-MAF at every scale, every config. +2. **ZI-MAF is still the worst** method tested. Even with the bug fix, ZI-MAF at 0.106 is 9× worse than ZI-QRF at 0.928. +3. **ZI-QRF is the G1 production synthesizer** default. No change. +4. **Calibration-on-synth** result holds (ZI-MAF too far off to rescue via weights). +5. **Embedding-PRDC** validation holds. +6. **ZI-MAF hyperparameter tuning** result holds (wider/longer doesn't rescue it). + +### Findings that need revision + +1. **ZI-QRF quality is much higher than the pilot suggested.** Stage-1 coverage is 0.928 at 77k, not 0.256. The G1 cross-section is in way better shape than the pre-snap numbers implied. +2. **ZI-QDNN is legitimately competitive.** Pre-snap 0.147 looked mediocre; post-snap 0.707 is respectable. In production if compute budget allows, ZI-QDNN is a reasonable fallback. +3. **The "ZI-MAF is broken" claim is softer than the pre-snap numbers.** At 0.106 it's still worst, but it's not "1% coverage is so bad no amount of calibration rescues it." 10.6% is bad but measurable; the calibrate-on-synth result (mean rel err 15) still says the structure is too far off to rescue via weights, but the PRDC gap is not orders-of-magnitude. + +### How confident to be + +Four independent robustness checks still agree (raw 50-d PRDC at 40k, raw 50-d PRDC at 77k, embedding 16-d PRDC at 40k, calibrate-on-synth at 20k). Adding the snap fix to stage-1 gives a fifth confirmation. Ordering is robust; absolute numbers finally match the fix. + +## What this means for G1 + +The headline is now cleaner: **ZI-QRF produces 92.8% PRDC coverage on a held-out 15k-record slice of enhanced_cps_2024 at 77k × 50 scale in 37 seconds.** That's a production-credible starting point. Downstream calibration via MicrocalibrateAdapter will pull weighted aggregates to target. We have a working cross-section synthesizer. + +The next-action playbook (launch v7 with `--calibration-backend microcalibrate`, see `docs/quickstart-rewire.md`) stays the same. This snap fix is a measurement improvement, not a direction change. + +## Artifacts + +- `artifacts/stage1_40k_snap.json` +- `artifacts/stage1_40k_snap.jsonl` +- `artifacts/stage1_77k_snap.json` +- `artifacts/stage1_77k_snap.jsonl` + +Reproduction: + +```bash +uv run python -m microplex_us.bakeoff --stage stage1 --methods ZI-QRF ZI-MAF ZI-QDNN +``` + +(Uses the snap by default in the harness.) diff --git a/docs/synthesizer-benchmark-scale-up.md b/docs/synthesizer-benchmark-scale-up.md new file mode 100644 index 0000000..795ede5 --- /dev/null +++ b/docs/synthesizer-benchmark-scale-up.md @@ -0,0 +1,170 @@ +# Synthesizer benchmark — what we know, and what scale-up will test + +*Draft plan for extending the existing ZI-synthesizer benchmark to production scale.* + +## What the existing benchmark tested + +Results in `microplex/benchmarks/results/benchmark_multi_seed.json` compare six synthesizers — QRF, ZI-QRF, QDNN, ZI-QDNN, MAF, ZI-MAF — on PRDC coverage across three schemas labeled `cps`, `sipp`, `psid`. + +| Method | CPS ASEC coverage | SIPP coverage | PSID coverage | +|---|---:|---:|---:| +| QRF | 0.337 | 0.938 | 0.000 | +| ZI-QRF | 0.347 | **0.950** | 0.000 | +| QDNN | 0.380 | 0.293 | 0.000 | +| ZI-QDNN | 0.406 | 0.717 | 0.000 | +| MAF | 0.398 | 0.349 | 0.000 | +| ZI-MAF | **0.499** | 0.866 | 0.000 | + +**Data used**: synthetic population generated by `benchmarks/run_benchmarks.py::generate_realistic_microdata`, 10,000 rows, **4 target variables** (`income`, `assets`, `debt`, `savings`) conditioned on **3 predictors** (`age`, `education`, `region`). The multi-survey fusion setup partially-observes this population as different "surveys" (CPS-schema sees one subset, SIPP-schema sees another, PSID-schema sees another). + +**Important**: the `cps` / `sipp` / `psid` labels in the result JSON are partial-observation schemas over the same synthetic population, not real CPS / SIPP / PSID data. + +## Scale gap to production + +| Dimension | Existing benchmark | Production (microplex-us G1) | Gap | +|---|---:|---:|---:| +| Rows | 10,000 | 430,000 (CPS) – 3,400,000 (ACS scaffold) | 43×–340× | +| Columns | 7 (3 cond + 4 target) | 150+ joint variables | ~22× | +| Source realism | Synthetic generator with analytical zero-inflation | Real CPS + PUF + SIPP + SCF joints with real tail structure | Categorical jump | +| Held-out set | 20% of synthetic population | TBD — ECPS baseline, external targets (SOI, BEA, Census) | — | + +Combined row × column gap: **~1,000×–8,000×**. Plus the synthetic-to-real jump, which is not measurable as a multiplier because real data has structure the generator cannot produce. + +## What we expect to break at scale + +### Coverage metric itself + +**PRDC k-NN coverage concentrates in high dimensions.** With 150+ features, nearest-neighbor distances bunch up (curse of dimensionality) and a small distance threshold starts excluding almost everything while a larger one starts including almost everything. Raw-feature PRDC above ~50 columns is typically noise-dominated without dimensionality reduction or a learned embedding. + +**Mitigation**: compute PRDC in a learned embedding (autoencoder or the synthesizer's latent space) rather than raw features. Or compute per-block PRDC on demographically-stratified cells. Or switch to a metric that scales better with dimension (MMD with an RBF kernel, or mode-wise Wasserstein). + +### ZI-QRF training + +**Quantile random forests scale poorly in both rows and columns.** + +- Row scaling: train time is roughly O(N log N) per tree; memory is O(N × features × n_trees). On 1.5M rows × 150 cols × 100 trees, that's ~180 GB for naive storage without sparse leaves. Even with efficient implementations (`quantile-forest`, `lightgbm`-style histogram trees), training time is hours-to-days on CPU for a full run. +- Column scaling: splits over 150+ features explore a larger hyperparameter space; conditional coverage on rare variables gets noisier; `max_features` tuning becomes load-bearing. + +**Prediction**: ZI-QRF's dominance on small-SIPP is partly because 500-person panels fit neatly into tree leaves. At 1.5M rows, expect the advantage to narrow or invert — partly because QRF hits practical compute limits and has to subsample. + +### ZI-MAF training + +**Normalizing flows need careful hyperparameter tuning on real data.** + +- Mode-collapse risk: ZI-MAF's joint distribution over 150 variables can collapse onto a lower-dimensional manifold, especially when many variables are zero-inflated with correlated zero patterns (same person has zero across many income sources at once). +- Training time: MAF is GPU-accelerated and scales linearly in rows. 1.5M rows × 150 cols × 200 epochs is feasible on a single H100, ~several hours. On Apple Silicon (Max's 48 GB M3), ~8–16 hours with MPS backend. +- Conditioning: the existing benchmark uses 3 condition variables. Real microdata conditions on ~10–20 demographics. Adding conditioning dimensions is the easier part of scaling MAF. + +**Prediction**: ZI-MAF's lead on CPS should hold or grow at scale (flows scale well with rows). Main risk is tail coverage — top-1% income, extreme wealth — which is exactly where the SS-model application cares most. + +### ZI-QDNN training + +**Deep quantile networks scale well but need careful tuning at width + depth.** + +- Row scaling: straightforward, O(N) per epoch, linear in batch size. +- Column scaling: the pinball loss surface gets jagged with many zero-inflated targets; per-target head design matters more at 150 vars than at 4. +- Zero-inflation head: a single logistic head for `P(zero)` becomes underpowered at 150 zero-capable variables with complex joint zero patterns (observing income=0 informs dividends=0 informs wages=0). Joint zero-mask modeling is probably needed. + +**Prediction**: ZI-QDNN as currently implemented will degrade fastest under scale-up without a joint zero-mask head. Worth testing whether a graph-structured zero-mask extension rescues it. + +### PRDC coverage = 0 on PSID across all methods + +This is unresolved in the existing benchmark and is the single most important thing to diagnose before the SS-model longitudinal extension commits to PSID. Three hypotheses: + +1. **Test-setup degeneracy.** PSID-schema's observed-variable mask may overlap with the CPS / SIPP masks in a way that produces an empty held-out set. Check the mask logic. +2. **Panel structure breaks per-record PRDC.** PSID is a panel; a "record" could mean a person-year or a person. If the test set uses person-year and the synthesizer generates persons, coverage is trivially 0. Fix: switch to a panel-aware metric (per-person trajectory coverage) or generate person-years. +3. **Real limitation.** Attrition + sparse-year coverage in PSID creates tail records the synthesizers cannot cover. If this is the case, the SS-model trajectory training must either accept this ceiling, use a different panel source (SIPP panel, HRS, NLSY), or augment PSID with synthetic history. + +**Action**: diagnose before any PSID-dependent architecture work commits. + +## Proposed scale-up experiment protocol + +Run three stages, each keeping row count and column count explicit. All stages report three classes of metric: accuracy (coverage), cost (time + memory), and health (convergence + rare-cell preservation). + +### Stage 1 — medium rows, medium columns + +Scale: **100,000 rows × 50 columns** + +Data: subsample enhanced_cps_2024 to 100k persons, select 50 PE-native-relevant columns (income components, demographics, tax inputs, benefit receipts). Use a real subsample, not synthetic. + +Purpose: exercise real joint structure (tails, categorical constraints, zero correlations) without the full row cost. Should fit comfortably in 48 GB RAM on CPU, in hours. + +Metrics per method: +- PRDC coverage on 20% holdout (computed in raw features and in a 16-dim PCA embedding) +- Per-stratum coverage (age × income-bracket × filing-status cells) — specifically flag any cell with <10 records that drops to 0 coverage +- Rare-subpopulation preservation (elderly self-employed, young dividend, SSDI, top-1% earnings — the `sparse_coverage.csv` pattern) +- Training wall time +- Peak RSS during training +- Generation wall time for 100k samples +- Zero-rate MAE per variable + +### Stage 2 — large rows, medium columns + +Scale: **1,000,000 rows × 50 columns** + +Data: 10× oversample of stage 1's column set with enhanced_cps_2024 clone-and-assign style replication (as PE-US-data does for local area) to reach 1M rows. + +Purpose: expose row-scaling failures before column scaling. ZI-QRF is the most likely to fall off here. ZI-MAF should be OK. ZI-QDNN should scale cleanly. + +Same metrics as stage 1. + +### Stage 3 — full rows, full columns + +Scale: **3,373,378 rows × 155 columns** (exactly the v6 seed-ready shape, so we can compare the post-donor frame at production scale). + +Data: the actual v6 seed frame if we can retrieve it from the log (it was never persisted); otherwise regenerate by running donor integration only. Since we don't have the v6 artifact, this stage requires regenerating the seed — ~9 hours of donor integration. + +Purpose: verify which synthesizer survives production scale, in what time, at what memory cost. + +Same metrics, plus: +- Time to first valid sample (can we get ANY synthetic records out?) +- Sample quality trajectory over training time (does it stabilize, or degrade with more training?) +- Memory peak vs memory average (does it OOM on a 48 GB machine?) + +## Runtime expectations (rough a priori) + +Order-of-magnitude estimates for training one model to convergence on a 48 GB M3: + +| Method | Stage 1 (100k × 50) | Stage 2 (1M × 50) | Stage 3 (3.4M × 155) | +|---|---|---|---| +| ZI-QRF | minutes | hours, may OOM | days or infeasible; needs subsample | +| ZI-MAF | 30 min (CPU) / 5 min (MPS) | few hours (MPS) | 8–16 hours (MPS), needs batch tuning | +| ZI-QDNN | 15 min (CPU) / 3 min (MPS) | 1–2 hours (MPS) | 4–8 hours (MPS), lowest memory footprint | + +These are coarse and based on library benchmarks + extrapolation. The scale-up experiment's actual measurements are what we commit to. + +## Evaluation contract — matched-size comparison + +To avoid the "we ran ZI-MAF at 1M and ZI-QRF at 100k and declared a winner" trap, all three stages enforce: + +- **Same held-out split** across methods per stage (same 20% records). +- **Same feature set** across methods per stage. +- **Same wall-time budget** for training. (If ZI-QRF hits the budget without converging, that counts as its stage-3 result — "did not finish.") + +Report all three as a single table with method × stage × metric cells. Pick production defaults from this table alone, not from the existing 10k-row benchmark. + +## What this experiment would actually update + +1. **Production synthesizer default for G1.** Currently implied as ZI-MAF from the small benchmark. Scale-up may confirm or overturn. +2. **SS-model methodology doc's ZI-QDNN production claim.** If ZI-QDNN does not emerge as a clear winner at scale, the doc needs a pointer to this evaluation. +3. **PSID coverage ceiling.** If PSID coverage-0 is a real limitation, the longitudinal-training plan needs a fallback panel source. +4. **Compute budget for production runs.** Knowing that ZI-MAF needs 12 hours MPS at production scale changes how often we can iterate on synthesizer hyperparameters. + +## Out of scope (for now) + +- Training on real-panel data at scale. The stage-3 experiment uses the cross-section; panel synthesis is a separate scale-up that depends on PSID-coverage diagnosis first. +- Comparing against external non-microplex synthesizers (CTGAN, TVAE, TabDDPM, TabPFN) at full scale. Do after internal best is clear. +- Runtime on GPU clusters. Local laptop numbers first; remote GPU only if production bottleneck demands it. + +## Risks to the experiment itself + +1. **Retrieving the v6 seed frame requires rerunning donor integration** (~9h) because v6 never persisted. A cheaper alternative: use the enhanced_cps_2024 HDF5 at its native scale (~400k persons × ~250 columns — already close to stage-3 scale) and adapt the donor conditioning. +2. **PRDC in 150D is likely noise.** Budget time for the embedding-based variant before committing to any absolute coverage number. +3. **ZI-QRF may be infeasible at stage 3.** That is itself a finding; have a fallback "QRF on top-20-important-columns" variant ready to report as a scale-constrained baseline. +4. **The existing synthesizers may not even run at stage 3** without code changes (memory bugs at scale). Budget for 1–2 days of debugging on first attempt. + +## Minimum useful subset + +If full three-stage execution is too costly as a first pass, the minimum that informs the rearchitecture direction is **stage 1 alone**: 100k real-subsample rows × 50 real-feature columns, running all three ZI variants, reporting coverage + runtime + rare-cell preservation. + +That alone would invalidate or confirm the small-benchmark conclusions and give us enough signal to pick a G1 default. diff --git a/docs/v6-postmortem.md b/docs/v6-postmortem.md new file mode 100644 index 0000000..11d2bf7 --- /dev/null +++ b/docs/v6-postmortem.md @@ -0,0 +1,77 @@ +# v6 post-mortem — 2026-04-16 + +Record of the `broader-donors-puf-native-challenger-v6` run (launched 2026-04-16 10:20:10 ET, died 22:56:05 ET). + +## Outcome + +**RUN_EXIT status=1** after 12h 36m of wall time. Killed by the kernel during entropy calibration. No artifact directory created; no final dataset persisted. + +## Timeline of the post-donor window + +The post-donor stage instrumentation (commit `960ac2f`) was the single highest-value diagnostic change of the session. It let us localize the OOM to a specific named stage for the first time. + +| Time (ET) | Stage marker | +|---|---| +| 10:20:10 | RUN_START | +| ~19:29 (9h 9m in) | last donor block complete (`scf_2022/social_security_pension_income`) | +| 21:04:03 | `seed ready` → `targets start`/`complete` → `synthesis variables ready` → `synthesis start`/`complete` → `support enforcement start`/`complete` → `policyengine tables start` (all in one burst; synthesis backend = seed-copy so the burst is dominated by the strip+cap pass between donor integration and tables) | +| ~22:25 | `policyengine tables complete` [households=1,505,108, persons=3,373,378] | +| ~22:25 | `policyengine calibration start [backend=entropy]` | +| 22:56:05 | RUN_EXIT status=1, kernel signal (macOS `time -l` reported "signal: Invalid argument" on the wrapper) | + +## Memory signature + +From macOS `time -l` rusage at exit: + +| Metric | v6 | v4 (previous run) | +|---|---|---| +| Wall time | 45,355 s (12h 36m) | 39,476 s (10h 58m) | +| Max RSS | 22.0 GB | 20.5 GB | +| Peak phys_footprint | 293 GB | 287 GB | +| Instructions retired | 614 T | 612 T | +| Involuntary context switches | 317 K | 264 K | + +v6's signature is nearly identical to v4's — same killer, same point. + +## Diagnosis + +**`calibrate_policyengine_tables` with `backend=entropy` on 1.5M households is the OOM killer.** + +Proximate cause: a 48 GB machine cannot hold the working set the entropy solver needs for that scale. Peak phys_footprint of 293 GB on 48 GB RAM implies heavy compression and swap pressure; eventually the kernel kills the process. + +Likely underlying structural cost (not measured, but fits the profile): + +- Entropy calibration materializes a dense Jacobian-like matrix roughly `(n_households × n_constraints)` in float64. +- With 1,505,108 households and ~1,255 constraints post-feasibility-filter (from the 2026-03-30 review), that's 15 GB for a single copy. Multiple working copies (gradient, Hessian approximation, line-search scratch) easily exceed RAM. +- `_evaluate_policyengine_target_fit_context` then runs a full PolicyEngine simulation on the calibrated frame, which adds its own memory cost on top. + +## What survived + +v6 demonstrated that the **tables-build phase works at scale**: `build_policyengine_entity_tables` successfully produced a 1.5M-household × 3.4M-person entity bundle. This was an open question after v4. The stage isn't free (roughly 1h 25m at 180–210% CPU, RSS oscillating 0.2–16%), but it doesn't OOM. + +The donor integration also ran clean. All 129 donor blocks across CPS ASEC, IRS SOI PUF, SIPP tips, SIPP assets, and SCF completed without failure. The tax-unit entity-bundle construction took ~89 min (one-time cost per run). Multi-source donor imputation is not the bottleneck. + +## What v6 ruled out as the killer + +The initial v4 diagnosis hypothesized the silent post-donor window might be in synthesis, support enforcement, or tables-build. v6's instrumentation showed those all complete instantly or within ~1.5 hours. The killer is specifically **entropy calibration**, not an earlier stage. + +## What this means for the architecture direction + +v6 is an evidence point *for* the `spec-based-ecps-rewire` direction rather than against it: + +1. **Entropy calibration on a 1.5M-household monolithic solve is a dead end on a 48 GB machine.** The rearchitecture's hierarchical / identity-preserving calibration pattern (national → state → stratum, `microcalibrate`-style chi-squared) avoids the dense-matrix blow-up by chunking over strata. +2. **Scaffold scale is the real lever.** The 3.4M-row ACS scaffold drives both tables-build size and calibration-matrix size. CPS-core at ~430k persons cuts this at the source. +3. **The instrumentation pattern is reusable.** Keeping named stage markers at every pipeline boundary in the new pipeline will make any future OOM localizable in a single run rather than requiring multiple exploratory runs. + +## What v6 does NOT tell us + +- Whether the imputation quality would have beaten `enhanced_cps_2024` on PE-native broad loss had it finished. No parity artifact was produced. +- Whether the `pe_plus_puf_native_challenger` condition selection is an improvement. Moot now that the pipeline direction is changing. +- The actual numerical Calibrator's behavior on 1.5M households. The failure was upstream of any Calibrator numerical work — the process died while setting up the constraint matrices. + +## Status of v6 artifacts + +- Log file: `artifacts/live_pe_us_data_rebuild_checkpoint_20260414_pe_plus_puf_native_challenger_broader/broader-donors-puf-native-challenger-v6.log` (~2,224 lines) +- No output artifact directory (build never completed persistence step) +- tmux session: cleaned up +- No action required on artifacts — they stay on disk as part of the experiment trail. diff --git a/docs/zi-factorial.md b/docs/zi-factorial.md new file mode 100644 index 0000000..d267311 --- /dev/null +++ b/docs/zi-factorial.md @@ -0,0 +1,100 @@ +# ZI × draw-method factorial at 77k × 50 + +*Answers Max's question: should the zero-inflation strategy be chosen independently of the draw method?* + +## Design + +Four draw methods × two zero-inflation variants = eight cells. All runs on Enhanced CPS 2024 at 77,006 records × 50 columns, PRDC capped at 15,000 samples, seed 42. + +- **No ZI**: base method (`CART`, `QRF`, `QDNN`, `MAF`) — fit one per-column model on the full training set, sample or predict directly at generation. +- **ZI**: base method preceded by a `RandomForestClassifier` (50 trees) predicting $P(y > 0 \mid x)$ when training-set zero fraction exceeds 10 %. The per-column model is then fit on the non-zero subset only, and at generation time the draw is zero with probability $1 - \hat{P}(y > 0 \mid x)$. + +## Results + +PRDC coverage (bold per row = best within that draw method): + +| Draw method | No ZI | ZI | Δ | Zero-rate MAE (No ZI) | Zero-rate MAE (ZI) | +|---|---:|---:|---:|---:|---:| +| CART | 0.9055 | **0.9098** | +0.004 | 0.013 | 0.013 | +| QRF | 0.9328 | **0.9341** | +0.001 | 0.015 | 0.013 | +| QDNN | 0.6033 | **0.7068** | +0.103 | **0.582** | **0.136** | +| MAF | **0.0986** | 0.0928 | −0.006 | **0.332** | **0.081** | + +## Reading + +1. **CART and QRF are essentially indifferent to the ZI wrapper.** Coverage differences are within single-seed noise (< 0.005), and zero-rate MAE is nearly identical across the two configurations. Both methods' per-column draws naturally preserve zero mass: CART's leaf-sample-from-empirical produces zeros at the training-set leaf rate, and QRF's quantile draws reproduce zero quantiles when a leaf's training distribution has mass at zero. The RF zero-classifier is redundant for these methods. + +2. **QDNN genuinely needs ZI handling.** Coverage jumps 0.603 → 0.707 (+0.103) and zero-rate MAE drops 0.582 → 0.136. Without ZI, QDNN produces continuous-valued quantile predictions that never exactly equal zero, so all 0-valued real records are mis-covered. The ZI classifier essentially masks the neural draw to zero for records the classifier thinks are zero, restoring a credible zero-rate structure. + +3. **MAF is broken with or without ZI.** Coverage stays near 0.09, zero-rate MAE is terrible under both configurations. The per-column-independent MAF architecture is the binding constraint; the ZI wrapper saves the zero-rate MAE from 0.33 to 0.08 (helpful for diagnostics but not enough to fix coverage). Hyperparameter expansion didn't close the gap either (see `zi-maf-hyperparameter-search.md`). + +## Does ZI choice depend on draw method? Yes. + +The factorial reveals that the "ZI wrapper" is a no-op for draw methods whose leaf- or quantile-level draws already preserve zero structure implicitly (CART, QRF), and a critical fix for draw methods that produce smooth continuous predictions (QDNN, MAF). There is no single best ZI strategy; the right choice depends on what the draw method does with zero observations. + +This has two practical implications: + +1. **`ZIQRFMethod` and `ZICARTMethod` do not justify their extra complexity.** The `_MultiSourceBase` inheritance pattern that adds an RF zero-classifier before a QRF or CART draw adds 1–2 seconds of compute and meaningful memory (ZI-CART 7.8 GB vs CART 0.5 GB, because the RF classifier is kept in memory alongside the CART per column) for essentially zero accuracy gain. Production pipelines using tree methods should consider the base variants directly. + +2. **For neural methods, the ZI classifier is not optional.** QDNN without ZI produces 0-vs-0.33 zero-rate MAE and 10 coverage points of damage. Any paper or benchmark that tests QDNN-family synthesizers without explicit zero handling is measuring a different (and worse) method. + +## Production recommendation update + +The cross-section synthesizer recommendation becomes: + +- **CART (plain, no ZI)** — fastest path, competitive accuracy, and simplest to reason about. Near-synthpop default. +- **QRF (plain, no ZI)** — accuracy maximizer, ~5× the fit time of CART for 2 points of coverage. +- **Avoid ZI wrappers on tree methods.** They don't help. +- **Do use ZI wrappers on neural methods.** They rescue a substantial fraction of the damage, though not all of it. + +## ZI classifier comparison (QDNN) + +Having established that the ZI wrapper matters for QDNN, the next question is whether a different zero-classifier improves ZI-QDNN. Five classifiers were swapped into `ZI-QDNN`'s pipeline on the 77k × 50 benchmark (seed 42): + +| Classifier | Coverage | Precision | Zero-rate MAE | Fit (s) | +|---|---:|---:|---:|---:| +| **RF (default, 50 trees, uncalibrated)** | **0.7081** | 0.8343 | 0.1359 | 100 | +| HistGradientBoostingClassifier | 0.7017 | 0.8334 | 0.1370 | 137 | +| MLP (64 × 32, Adam, early stop) | 0.6984 | 0.8397 | 0.1376 | 130 | +| RF + isotonic calibration (3-fold) | 0.6983 | 0.8309 | 0.1370 | 109 | +| Logistic regression | 0.6941 | 0.8336 | 0.1362 | 107 | + +All five classifiers cluster within 0.014 coverage points, at or below our multi-seed standard deviation (≈0.002–0.003). **The ZI classifier choice does not meaningfully affect coverage on QDNN at this scale and schema.** The 50-tree RF default is effectively optimal among the alternatives tested. + +The interpretation is that the information content of $P(y > 0 \mid x)$ is already captured by a 50-tree RF — a stronger classifier (HistGB, DNN) does not extract additional signal, calibrated probabilities do not propagate to better coverage, and logistic regression is mildly worse because its linear decision boundary under-fits on some columns. + +What would actually lift ZI-QDNN above 0.71 coverage is not a better zero-classifier but an architectural change: joint zero-mask modeling (one classifier predicting the full 36-dim zero pattern so cross-target zero correlations are captured), joint quantile output (shared-backbone multivariate QDNN), or post-hoc calibration of the quantile network's own pinball-loss output. These are deferred future work. + +## Isolated log-loss evaluation + +The coverage tie above could mean either (a) the five classifiers produce genuinely similar $P(y > 0 \mid x)$, so the downstream is honestly reporting, or (b) the classifiers differ materially but the QDNN non-zero draw's error swamps the signal. An isolated per-column evaluation decouples the two. + +Protocol: same outer 80/20 train/holdout split as the coverage benchmark (seed 42), then an inner 80/20 split within training into fit/val (49,283 fit, 12,321 val). For each of the 36 target columns with training-set zero-fraction ≥ 10 % (26 eligible columns), each classifier is fit on (`X_fit`, `(~at_min)_fit`) and scored on val with log-loss, Brier, equal-width ECE (10 bins), and ROC-AUC. + +| Classifier | Log-loss (mean) | Log-loss (median) | Brier | ECE | AUC (mean) | AUC (median) | +|---|---:|---:|---:|---:|---:|---:| +| **HistGB** | **0.2252** | **0.1712** | **0.0707** | **0.0050** | **0.809** | **0.822** | +| DNN | 0.2337 | 0.1956 | 0.0732 | 0.0070 | 0.748 | 0.773 | +| RF + isotonic (3-fold) | 0.2343 | 0.1834 | 0.0739 | 0.0081 | 0.763 | 0.780 | +| Logistic regression | 0.2468 | 0.2028 | 0.0770 | 0.0180 | 0.756 | 0.763 | +| RF default (50 trees, uncalibrated) | 0.3095 | 0.2523 | 0.0810 | 0.0394 | 0.737 | 0.762 | + +**The isolated picture is the opposite of the coverage picture.** The default 50-tree RF — the classifier that was effectively tied on PRDC coverage — is the *worst* classifier on log-loss (spread 0.085, about 6× the coverage spread), Brier, AUC, and calibration. Its ECE is ~8× worse than HistGB's. The AUC gap between RF (0.737) and HistGB (0.809) is 7 points — well outside any plausible noise band. + +This resolves the earlier ambiguity cleanly: + +1. **The ZI classifier choice does matter for the quantity the ZI wrapper is ostensibly predicting.** HistGB has meaningfully better $P(y > 0 \mid x)$ than an uncalibrated 50-tree RF on nearly every axis — log-loss, Brier, calibration, discrimination. + +2. **But the downstream QDNN draw swamps the signal.** Seven points of AUC and an order-of-magnitude calibration improvement produce zero coverage gain. The bridging logic (zero with probability $1 - \hat{P}(y > 0 \mid x)$, otherwise draw from the non-zero QDNN) is dominated by error in the non-zero draw, not error in the classifier. + +3. **The binding constraint for ZI-QDNN's coverage is downstream of the classifier.** Swapping classifiers alone cannot lift ZI-QDNN past 0.71 coverage — this requires improving the non-zero quantile output (joint modeling, pinball-loss recalibration, architectural change). + +There is a secondary implication for uses of the zero-classifier as a diagnostic rather than a generator component: if we ever surface $\hat{P}(y = 0 \mid x)$ as a subgroup-level or record-level signal (e.g., "this household is 80% likely to have zero long-term capital gains"), the RF default is not the right model. HistGB or a calibrated RF should be preferred there, because the calibration and discrimination gaps that are invisible on coverage become directly user-visible on calibration plots and top-k retrieval. + +## Artifacts + +- `artifacts/stage1_77k_no_zi.json` — pure QRF, QDNN, MAF at 77k +- `artifacts/stage1_77k_cart_variants.json` — CART, ZI-CART, ZI-QRF at 77k +- `artifacts/stage1_77k_4methods.json` — ZI-CART, ZI-QRF, ZI-QDNN, ZI-MAF at 77k +- `artifacts/zi_classifier_comparison.json` — 5 ZI classifiers on QDNN at 77k (coverage) +- `artifacts/zi_classifier_isolated_eval.json` — 5 ZI classifiers in isolation (log-loss / Brier / ECE / AUC) diff --git a/docs/zi-maf-hyperparameter-search.md b/docs/zi-maf-hyperparameter-search.md new file mode 100644 index 0000000..aae83ea --- /dev/null +++ b/docs/zi-maf-hyperparameter-search.md @@ -0,0 +1,90 @@ +# ZI-MAF hyperparameter search — does tuning rescue the method? + +*Direct test of the stage-1 follow-up flagged in `docs/stage-1-pilot-results.md`.* + +## Setup + +40,000 rows × 50 columns of real enhanced_cps_2024 (identical to stage-1). ZI-MAF trained at four progressively bigger configurations on the same seed and split. PRDC evaluated in 50-dim raw feature space, capped at 15 k × 15 k samples (same cap as stage-1 77 k). + +| Config | n_layers | hidden_dim | epochs | batch | lr | Approx params | +|---|---:|---:|---:|---:|---:|---:| +| default | 4 | 32 | 50 | 256 | 1e-3 | baseline | +| wide | 4 | 128 | 50 | 256 | 1e-3 | 4× params | +| long | 4 | 32 | 200 | 256 | 1e-3 | 4× training | +| wide+long | 8 | 128 | 200 | 256 | 5e-4 | 16× both + deeper | + +## Results + +| Config | Coverage | Precision | Density | Fit (s) | Gen (s) | +|---|---:|---:|---:|---:|---:| +| default | 0.0262 | 0.0083 | 0.0038 | 124 | 0.7 | +| wide | 0.0293 | 0.0088 | 0.0043 | 228 | 0.8 | +| long | 0.0318 | 0.0097 | 0.0048 | 467 | 0.6 | +| wide+long | **0.0328** | 0.0107 | 0.0050 | 1,711 | 1.0 | + +Fit time to get from 0.026 → 0.033 coverage: 14× the compute budget. Compare to ZI-QRF on the same data at the same PRDC cap: **coverage 0.352 in 19 s**. + +## Verdict + +**ZI-MAF is confirmed non-competitive at stage-1 scale with the method-class architecture.** Expanding capacity (4× width), training longer (4× epochs), and doing both with deeper layers (16× total + 8 layers) moves coverage from 0.026 to 0.033 — a 25 % relative improvement. ZI-QRF's 0.352 is 10 × higher at 1/90 the fit time. + +The stage-1 finding stands: ZI-QRF is the production synthesizer, not ZI-MAF. No amount of hyperparameter tuning at the default architectural level is going to close a 10× gap. + +## Why ZI-MAF fails here + +Hypotheses, ordered by how plausible they seem on this evidence: + +1. **Per-column independence.** `ZIMAFMethod` trains one `ConditionalMAF` per target column independently. With 36 target columns, 36 flows each only learn `P(col_i | conditioning)` — there's no mechanism to capture cross-target correlations (e.g., someone with high wage income also has zero SNAP). Joint-target flows would be architecturally different but expensive. Tree methods (ZI-QRF) implicitly capture some of these via the conditioning features, but their per-column independence is less damaging because each tree doesn't try to encode a full joint distribution. + +2. **Zero-inflation classifier + flow combo.** The method first classifies P(zero) via a 50-tree RF, then trains a flow on the non-zero subset. If the classifier over-predicts zero on rare non-zero cells (see stage-1's `disabled_ssdi` ratio = 0, `elderly_self_employed` ratio = 100+), the flow is trained on a biased subset and produces samples that don't cover the missing support. + +3. **Log-transform + standardization on heavy-tailed targets.** The flow log-transforms positive values (`np.log1p(y[y>0])`) and standardizes. For variables with extreme tails (top-1% employment income, net-worth-level wealth), this compresses the tail and the flow produces samples concentrated around the mode; the sparse tail coverage is exactly what PRDC measures. + +4. **No conditional target structure.** MAF learns `P(y | x)` where `x` is the shared demographics. 14 conditioning dims predicting 36 target dims (each modeled as 1-dim marginal flow conditional on the 14) may be under-identified at 40k × 36 samples per column. + +## What would change my mind + +A single condition that would lift ZI-MAF into competitive range: + +- **Joint-target flow**: one flow over all 36 target columns simultaneously, not 36 independent flows. Direction matches the SS-model methodology doc's "pathwise / trajectory" framing for longitudinal work. +- **Better zero-inflation handling**: a joint zero-mask model (which 36-dim binary vector does this person have?) instead of 36 independent RF classifiers. Training signal correlates zero patterns across targets. +- **Embedding-based PRDC**: the validation run flagged in `stage-1-pilot-results.md` could show ZI-MAF produces structurally-right samples that raw-feature PRDC misses. Separate investigation. + +None of these are in the current `ZIMAFMethod` class. Rewriting them is a materially different project. + +## Implication for the SS-model methodology doc + +The doc names ZI-QDNN as the production direction with ZI-MAF as a reasonable alternative. Neither survives stage-1 tuning at scale. The near-term cross-section synthesizer default on the rewire is **ZI-QRF**; any future trajectory-based modeling for the longitudinal extension will need a materially different architecture than per-column independent flows. + +## Where this leaves us + +- **G1 cross-section default**: ZI-QRF. Locked in. +- **ZI-MAF / ZI-QDNN**: not dead as research directions, but are dead as production defaults in their current `microplex.eval.benchmark` implementations. +- **Followup worth trying before fully ruling out neural**: joint-target flow + joint zero-mask model. Needs ~a week of implementation and may still not close the gap. + +## Reproducibility + +```bash +uv run python -c " +import json, time, numpy as np, pandas as pd +from microplex_us.bakeoff import ScaleUpRunner, ScaleUpStageConfig, DEFAULT_CONDITION_COLS, DEFAULT_TARGET_COLS, stage1_config +from microplex.eval.benchmark import ZIMAFMethod +from prdc import compute_prdc +from sklearn.preprocessing import StandardScaler + +base = stage1_config() +cfg = ScaleUpStageConfig( + stage='zi_maf_tuning', n_rows=40000, methods=('ZI-QRF',), + condition_cols=DEFAULT_CONDITION_COLS, target_cols=DEFAULT_TARGET_COLS, + holdout_frac=0.2, seed=42, k=5, n_generate=32000, + data_path=base.data_path, year=base.year, rare_cell_checks=(), + prdc_max_samples=15000, +) +runner = ScaleUpRunner(cfg) +df = runner.load_frame() +train, holdout = runner.split(df) +# ... fit and evaluate each config ... +" +``` + +Full results in `artifacts/zi_maf_tuning.json`. Wall time for all four configs: ~43 min. diff --git a/paper/.gitignore b/paper/.gitignore new file mode 100644 index 0000000..ad29309 --- /dev/null +++ b/paper/.gitignore @@ -0,0 +1,2 @@ +/.quarto/ +**/*.quarto_ipynb diff --git a/paper/AFFILIATION.md b/paper/AFFILIATION.md new file mode 100644 index 0000000..bb67698 --- /dev/null +++ b/paper/AFFILIATION.md @@ -0,0 +1,14 @@ +# Affiliation and independence — rules for this paper + +**Sole affiliation**: Cosilico. + +**Not affiliated with PolicyEngine**, for tax and organizational independence reasons. PolicyEngine is cited as prior work and as a benchmark comparator where relevant (e.g., `policyengine-us-data`, Enhanced CPS, `microcalibrate`), but: + +- Max Ghenis appears only as "Cosilico" on the author byline. +- No co-authorship with PolicyEngine team members is implied or acknowledged. +- Email is `max@cosilico.ai`, not `max@policyengine.org`. +- Acknowledgments may thank PolicyEngine's published work but must not frame this paper as a joint product. +- Quotes from or comparisons to PE-US-data are framed as "the incumbent public tool we measure against," consistent with how `microplex-us/docs/superseding-policyengine-us-data.md` already treats the relationship. +- Any language in drafts that could read as "built with / in collaboration with PolicyEngine" must be rephrased. + +Apply this rule to every section: abstract, introduction, methods, acknowledgments, appendices, captions, and bibliography entries that credit an author affiliation. diff --git a/paper/README.md b/paper/README.md new file mode 100644 index 0000000..0251136 --- /dev/null +++ b/paper/README.md @@ -0,0 +1,34 @@ +# `microplex-us` paper + +Quarto manuscript and supporting materials. + +## Affiliation + +Cosilico-only. See `AFFILIATION.md` — this work is intentionally independent of PolicyEngine for tax-and-organization reasons. + +## Contents + +- `_quarto.yml` — project config, HTML + PDF outputs. +- `index.qmd` — main manuscript. +- `literature-review.qmd` — standalone literature survey, cited by the main paper. +- `references.bib` — BibTeX bibliography, confirmed citations only. +- `AFFILIATION.md` — hard rule on affiliation independence. Re-read before adding any acknowledgment or author line. + +## Build + +```bash +cd paper +quarto render # both HTML and PDF +quarto render index.qmd # main paper only +quarto preview # live-reload local server +``` + +Output lands in `_output/`. + +## Cross-references and figures + +Figures and tables are sourced from `../artifacts/` (`stage1_77k_snap.json`, `zi_maf_tuning.json`, `embedding_prdc_compare.json`, `calibrate_on_synthesizer.json`). When final figures land, they should be generated as Quarto chunks rather than hand-placed PNGs so they re-render against the latest artifact set. + +## Citation style + +APA via Quarto's built-in CSL. Change in `_quarto.yml` if the target journal has a different requirement. diff --git a/paper/REVIEW-RESPONSE.md b/paper/REVIEW-RESPONSE.md new file mode 100644 index 0000000..ef273ad --- /dev/null +++ b/paper/REVIEW-RESPONSE.md @@ -0,0 +1,218 @@ +# Consolidated referee review and revision plan + +*Five subagent referee reviews ran in parallel on 2026-04-17 evening on the paper scaffold. This doc synthesizes their findings into an ordered revision plan.* + +## Reviewer verdicts + +| Reviewer | Verdict | Main issue | +|---|---|---| +| Citation | Minor revisions | Synthcity author mismatch; identity-preservation framing overstated vs Dekkers 2015 | +| Methodology | Major revisions | Single-seed, non-converged calibration presented as final, correlated "robustness checks" | +| Domain | Major revisions | 36 "target columns" are inputs not policy outputs; ecosystem under-represented | +| Stylistic | Major revisions | 4 of 7 body sections are stubs; solo-authored "we"; documentation register | +| Reproducibility | Major revisions | No code/data availability statement; 2 of 4 robustness checks used pre-snap data | + +Four of five reviewers reach Major Revisions. The draft is not submittable in its current state but is recoverable within 1–2 weeks of focused work. + +## Critical findings (blocker before submission) + +### B1. Two "independent robustness checks" used the pre-snap broken pipeline [RESOLVED] + +The reproducibility reviewer identified that `artifacts/embedding_prdc_compare.json` (Apr 17 08:03) and `artifacts/calibrate_on_synthesizer.json` (Apr 17 08:06) predated the snap fixes (harness-side at 12:06, upstream-core at 12:20). Both scripts called `method.fit` and `method.generate` directly without invoking `_snap_categorical_shared_cols`. + +**Resolution (2026-04-17 21:15/21:17)**: both scripts were re-run against the post-fix upstream `microplex` (commit `81a5e10`, "Only smooth-noise continuous shared cols, not categorical ones"). The pre-fix artifacts were preserved with a `.pre-snap.json` suffix for audit; the post-fix artifacts replaced the original `.json` filenames. Comparison: + +| Artifact | Pre-snap coverage (ZI-QRF, 40k raw) | Post-snap coverage (ZI-QRF, 40k raw) | +|---|---:|---:| +| `embedding_prdc_compare.json` | 0.348 | 0.982 | +| `calibrate_on_synthesizer.json` | pre-cal rel-err 0.256 | pre-cal rel-err 0.317, post-cal 0.105 | + +Ordering is preserved (ZI-QRF > ZI-QDNN > ZI-MAF) under both regimes; absolute post-snap numbers are the ones reported in §5. Paper text at lines 252–268 already references the post-snap artifacts. + +### B2. The 36 "target columns" are input variables, not policy outputs + +The domain reviewer's single most important finding: the paper uses `employment_income_last_year`, `snap_reported`, `ssi_reported`, etc. — CPS-reported amounts — as "targets." A tax-microsim reviewer expects "targets" to mean policy outputs: federal income tax liability, state income tax, computed EITC/CTC, SNAP benefits under program rules, SSI amounts. + +Two options: + +- **Rename**. Call them "conditioning income and benefit columns" or "target income components." Do this at minimum; the current language is misleading. +- **Add downstream validation**. Run `policyengine-us` (and/or TAXSIM, Tax-Calculator, TPC — whichever the reviewer population cares about most) on microplex-us output data and report computed federal tax, EITC disbursed, CTC disbursed, SNAP/SSI/ACA PTC aggregates against external benchmarks (IRS SOI tables, USDA SNAP totals, SSA SSI totals, CBO SNAP outlays). This is the test a tax-microsim reviewer actually wants. + +Recommendation: do both. Rename immediately; add the downstream validation as a major new results subsection. + +### B3. Four of seven body sections are stubs + +Architecture (§3), Methods (§4), rare-cell subsection (§5.3), Discussion (§6), Conclusion (§8) are either parenthetical placeholders or explicit TBD. Not submittable in this state. + +**Action**: work through these in order. Methods first (reviewer can't evaluate anything else until they know what was done). Architecture second. Results-rare-cell third. Discussion and Conclusion last. + +### B4. No Code and Data Availability statement + +Standard requirement at every target venue. Must state data source (HuggingFace URL with pinned revision), code repository, software versions, Python version, OS tested, hardware, expected wall time, license. + +**Action**: add `## Code and Data Availability` section after Limitations. One paragraph. + +### B5. Conflicts of Interest disclosure missing + +Author founded PolicyEngine and previously led Enhanced CPS work (cited extensively in this paper). The `AFFILIATION.md` rule is followed in the byline and acknowledgments, but silence on the prior affiliation is a disclosure gap. Per domain reviewer: "Silence on the question will read worse than acknowledgement." + +**Action**: add explicit COI statement. Template: "The author founded PolicyEngine and previously led work on Enhanced CPS [@ghenis2024ecps]. The present work is conducted at Cosilico, an independent commercial entity, and is not a joint product with PolicyEngine. PolicyEngine's Enhanced CPS is cited as the incumbent public tool against which microplex-us is measured." + +## High-priority revisions (before review circulation) + +### H1. Convert first-person plural to first-person singular (or third-person) + +Solo-authored paper uses "we" throughout both documents. Per the project's global style rule and the target venues' conventions, this should be "I" or third-person recast. The stylistic reviewer identified ~20 instances needing judgment-based conversion (global find-and-replace won't work). + +### H2. Self-contain the Related Work section + +Line 56 of `index.qmd` says "A full literature review for this paper is maintained in `literature-review.qmd`." This is a documentation move, not an academic one. Self-contain §2 with 400–600 words of prose. Keep `literature-review.qmd` as supplementary material. + +### H3. Remove all documentation-register artifacts + +- `*(This section is being written against the spec-based-ecps-rewire branch...)*` — convert to outline-as-prose. +- `[report low]` editorial marker at line ~100 — resolve. +- `77,006 × 50 scale` — rewrite as "77,006 records across 50 columns." +- "keeps every record alive" — "preserves all records" or "retains positive weight on every record." +- "mainline" — "primary calibration mechanism." +- Artifact paths referenced in body text — remove. + +### H4. Tables need captions, numbers, cross-reference labels + +All three tables are bare Markdown pipe-tables with no caption, no number, no Quarto `{#tbl-...}` label. Required for IJM / NTJ / JASA. + +### H5. Add at least one figure + +Pipeline schematic (source providers → donor blocks → chained QRF → calibration → L0 post-step) is the obvious first figure. Methods papers at the target tier with zero figures are unusual. + +### H6. Quantify or soften "widely-used upstream benchmark base class" + +Abstract claims the noise-injection defect "systematically biased earlier synthesizer comparisons." Evidence cited is one pre/post table on three methods using one base class. Either name the affected published benchmarks or soften to "introduced systematic bias into synthesizer comparisons using this base class." + +### H7. Citation form consistency + +Audit every `[@key]` vs `@key` for correct parenthetical vs textual intent. Pandoc renders them differently. + +## Medium-priority revisions (quality improvements) + +### M1. Uncertainty quantification + +Every headline table is a single-seed point estimate. Methodology reviewer correctly notes this is weak for a methods paper. ZI-QRF runs in 37 seconds — running 5-10 seeds is trivial compute. Report means with standard errors, or at least ordering-stability counts ("ordering preserved in 10/10 seeds"). + +### M2. Rerun with calibration converged + +All three entries in `artifacts/calibrate_on_synthesizer.json` have `"calibration_converged": false` at 200 epochs. The docs acknowledge this; the paper does not. Rerun at 1000-2000 epochs or report the epoch budget and frame as "fraction of pre-cal gap closed" rather than absolute post-cal error. + +### M3. Formal definition of identity preservation + +Currently asserted as an architectural property but never defined. Add Definition 1 in §3: *A weight-adjustment procedure $\phi: w \to w'$ is identity-preserving if $\forall i: w_i' > 0$ and $\phi$ does not drop records.* Either cite that `microcalibrate`'s gradient step satisfies this, or prove it. + +### M4. Embedding-PRDC circularity + +Autoencoder is fit on holdout only. Potential bias toward methods that match holdout idiosyncrasies. Re-run with AE fit on train (or an independent third partition). Report both. + +### M5. Soften "novel to PolicyEngine" Forbes claim + +Domain reviewer identified the SCF + Forbes precedent: Bricker-Henriques-Hansen-Moore (2016), Vermeulen (2018), Kennickell (2019). The tax-microsim integration remains novel; the broader pattern has precedent. Rewrite: "While top-wealth augmentation from Forbes-style lists is established practice in distributional national accounts [cites], its integration into a production tax-microsim pipeline is to our knowledge first done in policyengine-us-data." + +### M6. Cross-sectional motivation for identity preservation + +Domain reviewer: "Identity preservation also matters cross-sectionally for interpretability, subgroup analysis, confidentiality auditing, reproducibility and provenance." Add two paragraphs in Discussion making the cross-section case alongside the longitudinal case. + +### M7. ZI-QRF substrate circularity + +ECPS itself is QRF-constructed. ZI-QRF's win may be partly method-substrate match. Either add a non-ECPS robustness check (raw CPS ASEC or SCF) or explicitly note the circularity as a limitation. + +### M8. Target-set expansion + +Add Medicaid/CHIP, ACA PTC, mortgage interest, charitable contributions, medical expenses, property tax. Rerun at the expanded target set. + +### M9. Snap heuristic cardinality guard + +Stylistic and methodology reviewers flag that `_snap_categorical_shared_cols` fires on any integer-valued column, which could accidentally snap continuous-but-rounded columns (currency stored in dollars). Add cardinality threshold (e.g., snap only when `n_unique <= 50`). + +### M10. Decouple PRDC seed from split seed + +Currently both are `self.config.seed`. Use `seed + k` for the PRDC subsample. Average PRDC over 5+ subsample seeds per split to separate metric noise from split noise. + +## Low-priority revisions (cosmetic) + +### L1. Fix citation errors + +- Synthcity: author list should be Qian, Davis, van der Schaar for the NeurIPS 2023 D&B paper (not Cebere). Citation reviewer flagged as MAJOR but fix is trivial. +- Add TabPFGen (Ma et al., arXiv 2406.05216, 2024) — referenced in lit review but not cited. +- Add CTAB-GAN+ (Zhao et al. 2023, Frontiers in Big Data). +- Add Auten-Splinter (2024) as DINA counterweight to PSZ 2018. +- Add Meyer-Mok-Sullivan on CPS benefit under-reporting. +- Add Czajka-Hirabayashi-Moffitt-Scholz (1992) for statistical matching lineage. +- Add Ruggles (2025 PNAS) as engagement point. +- Remove `zhang2017privbayes` (unused) or cite. + +### L2. URL / DOI completeness + +Add URLs/DOIs for: patki2016sdv (IEEE DOI 10.1109/DSAA.2016.49), xu2019modeling (NeurIPS proceedings), naeem2020prdc (PMLR), kotelnikov2023tabddpm (PMLR), borisov2023great (OpenReview), and others listed by the citation reviewer. + +### L3. Bibliography cleanup + +- `solatorio2023realtabformer` should be `@misc` not `@article` with `journal = {arXiv preprint}`. +- `dementen2014liam2` needs `{de Menten}, Gaetan` brace protection. +- Standardize URL-only vs DOI-only policy (document the rule once). + +### L4. Table formatting + +- Pick one bolding rule (all best-per-column or none). +- Spell out abbreviated headers ("Fit (s)" → "Fit time (s)") or footnote them. +- Expand "Pre-cal" / "Post-cal" to "Before calibration" / "After calibration." + +### L5. Abstract cleanup + +- Expand ZI-QRF / ZI-QDNN / ZI-MAF / PRDC on first use. +- Replace "keeps every record alive," "mainline," "77,006 × 50 scale" per H3. +- Either support or drop "widely-used" (H6). + +### L6. Remove unused references from `.bib` + +`ruggles2025synth` (cited in lit review but not index.qmd; consider citing in index.qmd per domain reviewer M1), `zhang2017privbayes`. + +### L7. Cite each data product on first reference + +CPS ASEC, ACS, PUF, SCF, SIPP need primary-source citations on first use. + +### L8. Repository hygiene + +- Add `LICENSE` file at repo root. +- Add regression test for ordering (e.g., `test_stage1_10k_ordering`). +- Move paper tables to Quarto chunks that read from `../artifacts/*.json` to auto-update. + +## Revision order + +Roughly the sequence to work through: + +1. **Rerun pre-snap artifacts** (B1). Half-hour compute. +2. **Rename target columns + add downstream tax-output validation** (B2). Several days; the downstream run is non-trivial. +3. **Draft §3 Architecture** (B3). One to two days. +4. **Draft §4 Methods** (B3). One day. +5. **Add Code and Data Availability statement + COI** (B4, B5). One hour. +6. **Convert voice to first-person singular** (H1). Several hours, judgment-by-judgment. +7. **Self-contain Related Work** (H2). Half-day. +8. **Strip documentation register** (H3). Hours. +9. **Table captions, numbering, labels** (H4). Hour. +10. **Pipeline diagram** (H5). Hour (one TikZ / mermaid / svg figure). +11. **Soften the "widely-used" claim** (H6). Minutes. +12. **Citation form audit** (H7). Hour. +13. **Draft §5.3 rare-cell + §6 Discussion + §8 Conclusion** (B3 cont.). Two days. +14. **Medium-priority revisions** (M1–M10). Several days. +15. **Low-priority / cosmetic** (L1–L8). Final pass. + +Total budget estimate: 2–3 weeks to a submittable draft, assuming the downstream tax-output validation is the bottleneck. + +## What the reviewers got wrong + +Two minor issues where the reviews overstated the gap: + +- Reproducibility reviewer said `zi_maf_tuning.json` is missing; it is present at `artifacts/zi_maf_tuning.json` (verified). The reviewer's grep missed it. +- Citation reviewer flagged the identity-preservation framing as overstating the gap vs Dekkers (2015). Dekkers does discuss identity under static vs dynamic ageing; what the paper claims is novel is the cross-sectional calibration-layer framing, which Dekkers does NOT discuss. But the reviewer's point stands that the literature review should cite Dekkers and clarify which layer the claim refers to. + +## Reviews kept for reference + +Full reviewer outputs are preserved in the `a*` agent IDs noted by the subagent framework. If a rebuttal is needed later, those sessions can be resumed via `SendMessage`. diff --git a/paper/_quarto.yml b/paper/_quarto.yml new file mode 100644 index 0000000..4ce9d8b --- /dev/null +++ b/paper/_quarto.yml @@ -0,0 +1,61 @@ +project: + type: default + output-dir: _output + +title: "Identity-preserving synthesis and calibration for US tax-benefit microdata" +author: + - name: Max Ghenis + affiliation: Cosilico + email: max@cosilico.ai + +date: last-modified +abstract: | + Tax and benefit microsimulation depends on synthetic microdata whose accuracy + must survive both national-scale aggregates and longitudinal extensions. + We introduce `microplex-us`, a spec-driven US synthesis and calibration + runtime with three architectural properties: (1) chained quantile-regression- + forest (QRF) imputation across independent administrative and survey + sources, (2) identity-preserving gradient-descent chi-squared calibration + that keeps every record alive through calibration, and (3) sparse L0 record + selection reserved as an optional post-step for deployment subsamples rather + than a calibration mainline. We benchmark three zero-inflated synthesizers + (ZI-QRF, ZI-QDNN, ZI-MAF) on the full PolicyEngine Enhanced CPS 2024 at + 77,006 × 50 scale and find ZI-QRF dominates on PRDC coverage (0.928 vs. 0.707 + for ZI-QDNN and 0.106 for ZI-MAF), with consistent ordering under four + independent robustness checks. We further document a previously unreported + noise-injection defect in the `microplex.eval.benchmark` base class that + systematically biased earlier synthesizer benchmarks on integer-valued + conditioning variables, and publish corrected results. The paper situates + these findings in the microsimulation and synthetic-microdata literature, + identifies where `microplex-us` extends existing techniques, and argues that + identity preservation is a load-bearing but under-named architectural + requirement whenever cross-sectional microdata must feed a longitudinal + policy model. + +format: + html: + toc: true + toc-depth: 3 + number-sections: true + theme: cosmo + fig-cap-location: bottom + tbl-cap-location: top + code-fold: true + pdf: + documentclass: article + geometry: + - margin=1in + number-sections: true + fig-cap-location: bottom + tbl-cap-location: top + +bibliography: references.bib +# csl: chicago-author-date.csl # opt: pin when a target journal CSL is chosen + +execute: + echo: false + warning: false + message: false + +filters: + - quarto diff --git a/paper/index.qmd b/paper/index.qmd new file mode 100644 index 0000000..d2ba629 --- /dev/null +++ b/paper/index.qmd @@ -0,0 +1,368 @@ +--- +title: "Identity-preserving synthesis and calibration for US tax-benefit microdata" +short-title: "microplex-us" +author: + - name: Max Ghenis + affiliation: Cosilico + email: max@cosilico.ai +date: last-modified +abstract: | + Tax and benefit microsimulation depends on synthetic microdata whose + accuracy must satisfy both national-scale aggregates and longitudinal + extensions. This paper introduces `microplex-us`, a spec-driven US + synthesis and calibration runtime with three architectural properties: + (1) chained quantile-regression-forest (QRF) imputation across + heterogeneous administrative and survey sources; (2) identity-preserving + gradient-descent chi-squared calibration that retains positive weight on + every record; and (3) sparse L0 record selection reserved as an optional + post-processing step rather than as the primary calibration mechanism. + The paper benchmarks three zero-inflated synthesizers — quantile + regression forests (ZI-QRF), quantile deep neural networks (ZI-QDNN), + and masked autoregressive flows (ZI-MAF) — on 77,006 Enhanced CPS 2024 + records across 50 variables, finding that ZI-QRF dominates on + Precision/Recall/Density/Coverage (PRDC; coverage 0.928 vs. 0.707 for + ZI-QDNN and 0.106 for ZI-MAF) with the ordering preserved across + multiple sensitivity checks. The paper also documents a previously + unreported noise-injection defect in the `microplex.eval.benchmark` + base class that caused consistent downward bias in earlier synthesizer + comparisons on categorical conditioning variables, and publishes + corrected results. + +keywords: [synthetic microdata, survey calibration, microsimulation, tabular + data synthesis, quantile regression forests, identity-preserving + calibration] +bibliography: references.bib +format: + html: + toc: true + toc-depth: 3 + number-sections: true + pdf: + documentclass: article + geometry: margin=1in + number-sections: true +--- + +# Introduction {#sec-intro} + +Tax and benefit microsimulation models rely on microdata that are simultaneously aggregate-accurate (matching IRS Statistics of Income, Census, and administrative targets to tight tolerances) and individually credible (preserving joint structure in incomes, demographics, and wealth). In the US, the available public microdata surfaces — Census's Current Population Survey (CPS), the American Community Survey (ACS), IRS's Statistics of Income Public Use File (PUF), the Survey of Consumer Finances (SCF), and the Survey of Income and Program Participation (SIPP) — each observe only a slice of the variables that an end-to-end tax-benefit simulator requires. Constructing a useful microdata base means combining slices. + +The dominant public approach in the US today is [@ghenis2024ecps]'s Enhanced CPS, which augments CPS ASEC with PUF-imputed tax variables via quantile regression forests and calibrates the result against thousands of IRS, Census, and administrative targets. This paper builds on that lineage — it is not the first attempt to solve the problem — but contributes along four axes where the literature is thin: + +1. **A spec-driven donor integration runtime** that separates donor-block contracts from backend implementation, allowing independent benchmarking of conditioning, imputer, and entity-projection choices. +2. **Identity-preserving calibration** as an explicit architectural requirement — framed to support longitudinal extensions where records must persist across simulation years. +3. **A head-to-head comparison of QRF-family and neural synthesizers** on real US economic microdata at production scale — a cell of the evaluation matrix that, to my knowledge, no prior published work occupies. +4. **A correction to a benchmark-base-class noise-injection defect** in the upstream `microplex.eval.benchmark` module that had systematically biased earlier synthesizer comparisons on integer-valued conditioning variables. + +This paper does not claim foundational methodological novelty. Every mechanism used below exists in the published literature: quantile regression forests [@meinshausen2006qrf], chained imputation [@vanbuuren2011mice], calibration with range-restricted distances [@deville1992calibration], L0 sparse regularization [@louizos2018l0], support-based generative evaluation [@naeem2020prdc]. The contribution is in the composition and the empirical evidence that results. + +# Background and related work {#sec-related} + +The present work sits across four literatures: survey calibration, synthetic tabular data generation, tabular-synthesis evaluation metrics, and US tax-benefit microsimulation. A supplementary literature review accompanies this paper with an expanded treatment; the following summary frames the specific prior work each contribution builds on. + +## Survey calibration {#sec-related-calibration} + +Classical calibration originates with @deville1992calibration, which defines the calibration estimator as a constrained weight adjustment minimizing a distance function from design weights subject to linear moment constraints. Generalized raking extends this to categorical margins via iterative proportional fitting [@deville1993raking; @deming1940adjustment]. Range-restricted variants with bounded-positive distance functions (logit, truncated-linear) guarantee non-negative weights by construction. @devaud2019calibration provides the current treatment of existence and feasibility conditions; @haziza2017weights and @kott2016calibration are the recent reviews. Entropy balancing [@hainmueller2012entropy] is mathematically adjacent, using Kullback-Leibler divergence with moment constraints, and also produces strictly positive weights. + +L0 regularization entered the machine-learning literature via hard-concrete stochastic gates [@louizos2018l0], which made L0 differentiable and compatible with gradient-based optimization. Applying L0 selection to survey calibration as a record-sparsification step is recent; I find no earlier survey-statistics treatment of it as a first-class calibration technique, only as a post-calibration record subset selector for deployment artifacts. + +## Synthetic tabular data generation {#sec-related-tabular} + +Modern tabular synthesis starts with the Synthetic Data Vault [@patki2016sdv] and `synthpop` [@nowok2016synthpop], which establishes the CART-based sequential approach. CTGAN and TVAE [@xu2019modeling] introduce neural tabular synthesis; TabDDPM [@kotelnikov2023tabddpm] brings diffusion. Language-model-based approaches appear in GReaT [@borisov2023great] and REaLTabFormer [@solatorio2023realtabformer]. TabSyn [@zhang2024tabsyn] combines latent-space score-based diffusion with competitive benchmarks. Tabular foundation models now include TabPFN v2 [@hollmann2025tabpfn], though its primary contribution is prediction rather than synthesis. + +Quantile regression forests [@meinshausen2006qrf] are not usually grouped with the tabular-synthesis literature, but they are the method Enhanced CPS and several industrial microsim pipelines use for per-column imputation. In the benchmarking below I treat ZI-QRF on equal footing with the neural synthesizers. + +Published head-to-head comparisons of QRF-family and neural synthesizers on real US economic microdata at production scale are scarce. @little2025synth compares synthpop, DataSynthesizer, CTGAN, and TVAE on census microdata in four countries and finds CART-based synthpop dominates; @bowen2022puf document a synthetic supplemental PUF built on IRS Statistics of Income data using sequential CART. Neither includes QRF or ZI-QRF against modern deep generators. @ruggles2025synth offers a recent critique of fully-synthetic census microdata as a replacement for design-based public-use files; the present paper's scope is narrower (augmenting an existing public-use file rather than replacing one). + +## Evaluation metrics {#sec-related-metrics} + +@naeem2020prdc establishes precision, recall, density, and coverage as the support-based quality quad, originally validated in image-generator Inception-embedding space. Benchmarking frameworks including Synthcity [@qian2023synthcity] and SDMetrics aggregate PRDC alongside column-wise Kolmogorov-Smirnov distances, pairwise correlation differences, and Train-on-Synthetic/Test-on-Real utility. + +Two documented failure modes matter for the present work. First, @park2023probabilistic show that outliers inflate density and coverage because the $k$-NN support construction over-inflates the manifold around them — a material concern for heavy-tailed income microdata. Second, @beyer1999nn and @aggarwal2001surprising show $k$-NN distances concentrate in high-dimensional spaces, causing the coverage radius to degenerate above ~10-15 dimensions. These motivate reporting multiple metrics alongside PRDC and testing whether orderings survive dimensionality reduction; I do both in the results section. @alaa2022precision introduces sample-level $\alpha$-precision and $\beta$-recall as more outlier-robust alternatives. + +## US tax microsimulation {#sec-related-tax-microsim} + +@toder2024microsim is the current umbrella review of the US tax-microsim ecosystem. Active models include TAXSIM [@feenberg1993taxsim], Tax-Calculator [@debacker2019taxcalc], the Tax Policy Center and CBO in-house models [@cbo2018taxmodel], the Budget Lab at Yale, and PolicyEngine-US-Data (Enhanced CPS; @ghenis2024ecps). These differ along several axes: whether they ship a calculator, a microdata constructor, or both; what substrate microdata they use (CPS-PUF matched, pure CPS, pure PUF, administrative linkage); how they augment for top incomes; and whether they are open-source. Enhanced CPS is the public-microdata contribution that `microplex-us` builds on. + +@bowen2022puf is the canonical methodology paper for synthetic IRS PUF, using sequential CART under differential-privacy constraints. The Forbes-style top-wealth augmentation pattern that enters tax-microsim microdata via PolicyEngine-US-Data has precedent in distributional-national-accounts work: @piketty2018dina and @saez2016wealth augment SCF with top-wealth records for capitalized-income estimation. Porting this augmentation pattern into a production tax-microsim pipeline is, to my knowledge, first done in PolicyEngine-US-Data; I adopt it without further innovation. + +## Longitudinal microsimulation {#sec-related-longitudinal} + +DYNASIM3 [@favreault2004dynasim], MINT [@smith2013mint], CBOLT [@cbo2018cbolt], and the LIAM2 family [@dementen2014liam2; surveyed in @odonoghue2001dynamicsurvey] are the dominant US and international longitudinal microsimulation models. All use static-ageing with alignment to external totals and therefore preserve record identity implicitly — records are aged forward, not dropped. Identity preservation is not a named concept in the survey statistics or longitudinal-microsim literatures. The closest named property in classical calibration is *range-restricted calibration with positive lower bound* [@deville1992calibration]. I argue in §3.4 for making identity preservation an explicit architectural requirement at the cross-sectional imputation and calibration layer, because the cross-sectional artifact is the input substrate to longitudinal simulation and breaking identity there is the quickest way to make a microsim un-chainable across years. + +# Architecture {#sec-architecture} + +`microplex-us` is structured around four layers: source providers, declarative donor blocks, a chained imputation engine, and a calibration backend protocol (@fig-pipeline). The top-level build entry point (`microplex_us.pipelines.us.USMicroplexPipeline.build_from_source_providers`) composes these layers into a single end-to-end run that produces a PolicyEngine-ingestable HDF5 artifact plus parity diagnostics. This section describes each layer and names the specific design choices that differentiate the runtime from incumbent construction pipelines. + +```{mermaid} +%%| label: fig-pipeline +%%| fig-cap: "`microplex-us` pipeline architecture. Source providers load raw survey and administrative microdata at their native entity levels. Donor blocks declare target variables, conditioning surfaces, and zero-inflation policies as JSON manifests. The chained imputation engine integrates each block in a DAG order respecting conditioning-variable dependencies. PolicyEngine entity-table construction projects the flat frame into the multi-entity schema required for simulation. Identity-preserving calibration (`microcalibrate` gradient-descent chi-squared) adjusts per-record weights against the active PolicyEngine targets database. Optional sparse L0 record selection produces deployment subsamples. The final artifact is an HDF5 file directly ingestable by `policyengine-us.Microsimulation`." +flowchart TD + subgraph sources["Source providers"] + CPS[CPS ASEC
processed parquet] + PUF[IRS SOI PUF
administrative] + ACS[ACS PUMS
Census] + SIPP[SIPP tips + assets
panels] + SCF[SCF wealth
Federal Reserve] + FORBES[Forbes top-wealth
backbone] + end + + REG[Source + variable
capability registry] + BLOCKS[Donor block manifests
declarative JSON] + + subgraph imputation["Chained imputation engine"] + DAG[Dependency DAG
from block conditioning] + QRF[Quantile Regression Forest
per-variable draws] + end + + TABLES[PolicyEngine entity tables
households × persons × tax units × SPM × family] + + subgraph calibration["Calibration"] + MC[microcalibrate
gradient-descent chi-squared
identity-preserving] + L0[Optional L0 post-step
deployment subsample] + end + + H5[HDF5 artifact
policyengine-us ready] + + CPS --> REG + PUF --> REG + ACS --> REG + SIPP --> REG + SCF --> REG + FORBES --> REG + REG --> BLOCKS + BLOCKS --> DAG + DAG --> QRF + QRF --> TABLES + TABLES --> MC + MC --> L0 + MC --> H5 + L0 -.optional.-> H5 + + style MC fill:#cfe,stroke:#333 + style L0 fill:#fec,stroke:#333,stroke-dasharray: 5 5 +``` + +## Source providers and variable capabilities {#sec-arch-sources} + +A source provider is a narrow adapter that loads raw survey or administrative microdata into an `ObservationFrame` — a typed DataFrame with a declared entity level (person, household, tax unit, SPM unit, family, marital unit), a time period, and a set of `SourceVariableCapability` records that mark each variable as authoritative, usable-as-condition, or both. Source providers for `microplex-us` include CPS ASEC (via the PolicyEngine-maintained processed parquet cache), IRS Statistics of Income Public Use File, ACS, SIPP (tips and assets panels), SCF, and a Forbes top-wealth backbone. Each provider is self-contained: it declares the entity levels it observes, the vintage year, and the variable capabilities, and it emits frames at the declared entity level without projecting across entities at load time. + +Variable capabilities are stored in a single declarative registry (`microplex_us.source_registry`) that overrides a base `SourceVariableCapability` record per source-variable pair. This lets a downstream consumer ask "which sources observe `employment_income_last_year` as authoritative?" or "which sources have `age` available as a condition variable?" without running any imputation. The registry is the load-bearing artifact for donor-block planning. + +## Donor blocks as declarative contracts {#sec-arch-blocks} + +A donor block is a JSON-declarable spec describing the integration of one or more variables from a non-scaffold source into the current working frame. The block names (a) the block's native entity, (b) the target variables it produces, (c) the permitted conditioning variables, (d) the match strategy (nearest-neighbor hot-deck, chained QRF, share imputation), (e) the entity-projection policy if a donor observes a parent entity and the target is at a child entity, and (f) a zero-inflation policy when the target is zero-inflated. Blocks are loaded at pipeline start from `microplex_us/manifests/pe_source_impute_blocks.json` and resolved to executable tasks by `PESourceImputeBlockEngine`. + +The separation between block specification and engine execution is the feature that makes donor integration independently benchmarkable. A researcher can swap an imputer backend (QRF for chained QRF, a neural flow, statistical matching) without touching block contracts, and a new donor block can be added without touching engine code. Current production uses QRF per `@meinshausen2006qrf` for zero-inflated continuous targets and logistic-classifier-plus-quantile-regression for zero-inflated binary-and-continuous targets. + +## Chained QRF imputation {#sec-arch-chained-qrf} + +Donor blocks integrate in an order that respects the dependency DAG implied by their conditioning sets. Early blocks use only demographic and scaffold-observed conditioning (age, sex, education, household size); later blocks may condition on earlier-imputed variables (for example, a wealth block may condition on imputed AGI). This is a MICE-framework composition [@vanbuuren2011mice] where each per-variable draw uses a QRF rather than a linear regression, extending the chained-random-forest imputation pattern of @doove2014chainedrf and @stekhoven2012missforest. + +The novelty of the composition is not the QRF draw, which is standard; it is that the conditioning surface for each block is declarative (the block spec names its conditioning variables) and the engine enforces the DAG ordering automatically. A block's conditioning surface is computed at resolution time by intersecting the block's declared conditioning variables with the current frame's available columns, so blocks gracefully degrade when earlier blocks fail. + +## Identity-preserving calibration {#sec-arch-calibration} + +After donor integration the frame is passed through PolicyEngine entity-table construction and then calibrated against a PolicyEngine targets database. The calibration backend is pluggable through `USMicroplexBuildConfig.calibration_backend`, which accepts values `entropy`, `ipf`, `chi2`, `sparse`, `hardconcrete`, `pe_l0`, `microcalibrate`, and `none`. The production default is `microcalibrate`, which invokes the country-agnostic `MicrocalibrateAdapter` (shipped as part of upstream `microplex` under the optional `calibrate` extra, so country packages such as `microplex-us` and planned `microplex-uk` inherit one identity-preserving calibrator without duplicating glue code) around the `microcalibrate` library's gradient-descent chi-squared solver. + +I define an *identity-preserving* weight adjustment as a procedure $\phi: w \to w'$ on a frame of $n$ records satisfying $\forall i \in \{1, \ldots, n\}: w_i' \geq 0$ and $\mathrm{id}(r_i') = \mathrm{id}(r_i)$: every input record survives to the output with the same entity identifier; no row is deleted from the frame, and no new row is created. The record's weight may become zero (excluding it from current-year aggregates) but the row and its entity identifiers persist. Identity preservation in this sense matters because cross-sectional microdata is the input substrate to longitudinal microsimulation, where entity identifiers must persist across simulation years for lifetime-earnings computation, panel analysis, and provenance; a dropped row destroys the cross-year linkage permanently. + +Two calibration families satisfy row-set preservation. The gradient-descent chi-squared calibration used by `microcalibrate` is strictly positive by construction ($w_i' > 0$) via a soft positivity penalty, which is the classical range-restricted calibration analog [@deville1992calibration]. L0-sparsified calibration (via PolicyEngine's `l0-python` with HardConcrete stochastic gates [@louizos2018l0]) allows some weights to reach exactly zero and is therefore weaker than strict positivity, but still satisfies row-set preservation because the weight array is returned at the original length with the same entity identifiers intact. The zero-weight rows are not dropped from the HDF5 dataset — they are available to year $Y+1$'s calibration to re-weight up. This is consistent with the CBOLT and DYNASIM convention of equal per-person weights frozen across a person's lifetime [@favreault2004dynasim; @cbo2018cbolt], where between-year population-level adjustment happens via alignment factors rather than per-record weight shifts; zero-sparsity on the cross-section gives a strict-superset of flexibility compared with frozen-weight approaches. + +The legacy entropy backend was retired at scale (above approximately 200,000 households) after repeated OOM failures during preliminary runs at 1.5 million household scale. Entropy calibration materializes dense scratch structures proportional to $n_{\text{records}} \times n_{\text{constraints}}$; at production scale with approximately 1,200 active constraints, the working set exceeded 48 GB of RAM. Gradient-descent chi-squared calibration also OOM'd in its first production run at this scale until two complementary fixes landed: the adapter now passes the estimate matrix as float32 rather than float64 pandas, and the upstream `microcalibrate` solver accumulates gradients over record batches (`batch_size` parameter, shipped in `microcalibrate` 0.22) so peak autograd activation is $O(B \times k)$ instead of $O(n \times k)$. With both fixes, the production pipeline completes the calibration step on the same 48 GB workstation in minutes rather than OOM-killing. + +## Sparse L0 as a first-class calibrator {#sec-arch-sparse} + +Sparse L0 record selection (via `PolicyEngine/l0-python` with HardConcrete stochastic gates [@louizos2018l0]) is a fully identity-preserving calibrator under the row-set-preservation definition above, and is exposed as `calibration_backend="pe_l0"` alongside the `microcalibrate` chi-squared default. The two are complementary rather than alternative-and-fallback: chi-squared preserves strict positivity at the cost of a larger deployment artifact, while L0 permits zeroed weights in exchange for a dramatically smaller effective working set that can be handled by downstream applications with tight memory budgets (web UIs, small-area point estimates, simulation endpoints running inside a 2 GB container). Both produce outputs readable by `policyengine-us.Microsimulation` without modification. + +An empirical caveat worth flagging: on the same pipeline, aggressive L0 selection (above approximately 90 % sparsity) can drive rare-subpopulation ratios (for example, elderly self-employed, young dividend recipients) to zero because the optimizer trades their retention for aggregate accuracy. Production deployments of the L0 backend should audit rare-cell coverage before shipping; the chi-squared backend provides a safer default when such audits aren't run. + +## Entity-table export {#sec-arch-export} + +The final stage writes a PolicyEngine-US-ingestable HDF5 file with person, household, tax-unit, SPM-unit, family, and marital-unit tables. The exporter preserves the entity identifiers propagated through donor integration and calibration, so the output of a production build is directly readable by `policyengine-us.Microsimulation` without additional harmonization. This is a deliberate compatibility choice: the PolicyEngine-US simulator is the downstream consumer, and a `microplex-us` build that cannot be plugged into the incumbent simulator is not a useful cross-section for tax-benefit work. + +# Benchmark methodology {#sec-methods} + +## Data {#sec-methods-data} + +All empirical results use Enhanced CPS 2024 as the evaluation substrate, published by PolicyEngine at `https://huggingface.co/policyengine/policyengine-us-data` as `enhanced_cps_2024.h5`. The HDF5 file stores variables at their native entity level: person-level variables (77,006 rows), household-level variables (29,999 rows), SPM-unit-level (31,330 rows), tax-unit-level (41,448 rows), family-level (one row per family), and marital-unit-level variables. The benchmark harness loads variables into a flat person-level DataFrame by broadcasting non-person entity values to person level via the `person__id` linkage columns. The result is a 77,006 × 50 DataFrame per experimental run. + +## Variable selection {#sec-methods-variables} + +The benchmark uses 14 conditioning variables and 36 synthesizer-target variables. Conditioning variables are person-level demographics and household-context flags (age, sex, Hispanic origin, CPS race category, disability, blindness, military service, full-time college enrollment, separation status, state FIPS, ESI coverage, Marketplace coverage, own children in household, pre-tax retirement contributions). Target variables span labor income (employment income, self-employment income), interest and dividends (taxable interest, tax-exempt interest, qualified dividends, non-qualified dividends), capital gains (long-term, short-term), retirement income (pension, IRA distributions, Social Security and its retirement/disability/survivor split), other income (rental, farm, unemployment compensation, alimony, miscellaneous), wealth (bank accounts, bonds, stocks, net worth, auto loan balance), and reported benefit receipts (SNAP, housing assistance, SSI, TANF, disability, workers' compensation, veterans' benefits, child support received and paid, real estate taxes paid, HSA deductions). I emphasize that these are the synthesizer's *target income and benefit variables* — the quantities the synthesizer is asked to reproduce — and not policy outputs such as federal income tax liability, computed EITC amount, or computed SNAP participation. Downstream tax-output validation (running `policyengine-us` on the synthesized frame and comparing computed aggregates against administrative totals) is deferred to a companion paper. + +## Synthesizers evaluated {#sec-methods-synthesizers} + +Three zero-inflated synthesizer families are compared, all implemented in `microplex.eval.benchmark` as subclasses of a `_MultiSourceBase` abstract that pools shared conditioning variables across sources and fits one per-target-column model. The zero-inflation variant adds a random-forest classifier predicting `P(y > 0 \mid x)` when the target's training-set zero fraction exceeds 10 %: + +- **ZI-CART**: synthpop-style classification and regression trees [@nowok2016synthpop]. For each target variable, a `DecisionTreeRegressor` with `min_samples_leaf = 5` is fit on the shared conditioning variables; at generation time, each synthetic record is routed to a leaf via `tree.apply`, and the synthetic value is sampled uniformly from the training-set outcomes that landed in that leaf. A random-forest zero-classifier is applied on columns with zero fraction above 10 %. +- **ZI-QRF**: quantile random forests [@meinshausen2006qrf] with 100 trees predicting deciles of the conditional distribution, with a random-forest zero-classifier. +- **ZI-QDNN**: a quantile deep neural network with two hidden layers (width 64), 50 training epochs, batch size 256, predicting decile-level quantiles under pinball loss. +- **ZI-MAF**: a masked autoregressive flow [@xu2019modeling] with four layers and hidden dimension 32, 50 training epochs, batch size 256, and a random-forest zero-classifier. + +All three methods are used at their method-class default hyperparameters unless stated. A follow-up hyperparameter sweep on ZI-MAF specifically is reported in the results section. + +An isolated per-column evaluation of the zero-classifier alone (logistic regression, histogram gradient boosting, a small MLP, isotonic-calibrated random forest, and the 50-tree random-forest default) shows that on direct classifier-quality measures — held-out log-loss, Brier score, expected calibration error, and ROC-AUC over the 26 ZI-eligible target columns — histogram gradient boosting Pareto-dominates the random-forest default (log-loss 0.225 vs 0.310, ECE 0.005 vs 0.039, AUC 0.809 vs 0.737). PRDC coverage at the synthesizer level, however, is insensitive to the swap (0.7017 for histogram gradient boosting vs 0.7081 for the 50-tree random forest), because error in the downstream QDNN non-zero draw swamps the classifier-level gap. The benchmark numbers reported in @sec-results were generated with the random-forest default for reproducibility with prior artifacts; the `microplex-us` implementation default has since moved to histogram gradient boosting for deployments that surface $\hat{P}(y=0 \mid x)$ as a user-visible diagnostic signal. The full isolated evaluation is recorded in `docs/zi-factorial.md`. + +## Train/holdout split and PRDC evaluation {#sec-methods-prdc} + +The 77,006-record dataset is split into 61,604 training and 15,402 holdout records at a fixed random seed (42). Each synthesizer is fit on the training partition and generates 61,604 synthetic records. PRDC metrics [@naeem2020prdc] are computed on 15,000 real and 15,000 synthetic records, sub-sampled without replacement from the holdout and synthetic outputs respectively. The PRDC sample cap of 15,000 per side is a memory-budget constraint: the `prdc` library materializes pairwise distance matrices, and capping both sides at 15,000 keeps those matrices within a 48 GB workstation budget. PRDC coverage is computed with $k = 5$ nearest neighbors on standardized feature vectors. + +The sample cap couples metric noise to the split seed, because the PRDC sub-sample is drawn from the same RNG that produced the train/holdout split. Decoupling the two seeds and averaging over multiple PRDC sub-samples would separate metric-noise variance from split variance; this is deferred to a future extension. + +## Rare-cell probes {#sec-methods-rare-cells} + +Four pre-registered rare-cell probes are computed per method as synthetic-count divided by real-count in cells constructed from combinations of target and conditioning variables: (a) elderly self-employed (age ≥ 62 and self-employment income > 0), (b) young dividend recipients (age < 30 and qualified dividend income > 0), (c) SSDI-participating disabled individuals (is_disabled = 1 and Social Security disability income > 0), and (d) top-1 % employment-income earners (employment income ≥ 99th percentile of the holdout distribution). A ratio of 1.0 means the synthesizer preserves the real cell frequency; 0.0 means the synthesizer annihilates the cell; a ratio greater than 1.0 indicates over-representation. + +## Per-column zero-rate breakdown {#sec-methods-zero-rate} + +For every target column $c$, I compute the real holdout zero rate $z_c^{\text{real}} = |{i : y_{i,c}^{\text{real}} = 0}| / n_{\text{holdout}}$ and the synthetic zero rate $z_c^{\text{synth}}$, and report the scalar mean absolute error $\mathrm{MAE}_z = \frac{1}{|C|} \sum_c |z_c^{\text{real}} - z_c^{\text{synth}}|$ alongside a per-column $(z_c^{\text{real}}, z_c^{\text{synth}}, |z_c^{\text{real}} - z_c^{\text{synth}}|)$ breakdown for diagnostic use. + +## Robustness checks {#sec-methods-robustness} + +Three sensitivity checks follow the headline PRDC evaluation: + +1. **Scale sensitivity**: rerun at 40,000 records (random sub-sample, seed 42). If ordering or absolute values depend on scale, the 77,006-row result is not generalizable. +2. **Learned-embedding PRDC**: fit a 16-dimensional autoencoder on the 15,402-record standardized holdout for 200 epochs (two hidden layers of width 64, mean-squared reconstruction loss), then compute PRDC in the 16-dimensional latent space. If ordering depends on the raw 50-dimensional metric, a less dimension-sensitive embedding should reveal that. +3. **Calibrate-on-synthesizer follow-up**: apply gradient-descent chi-squared calibration to each synthesizer's output, with per-target-column holdout-sum constraints. If the synthesizer's output is structurally close to the holdout distribution, calibration reduces its weighted-aggregate relative error; if the output is structurally broken, calibration cannot close the gap. + +Each of these checks uses the same 77,006-record dataset and seed=42 split; they are complementary rather than statistically independent. A multi-seed replication of ordering stability is a natural next step. + +## Hyperparameter sensitivity {#sec-methods-tuning} + +Given the wide default-hyperparameter performance gap between ZI-MAF and the other two methods, I ran a four-configuration expansion sweep on ZI-MAF: default (4 layers × 32 hidden × 50 epochs, learning rate 1e-3), wide (4 × 128 × 50, 1e-3), long (4 × 32 × 200, 1e-3), and wide+long (8 × 128 × 200, 5e-4). The wide+long configuration is a 16-fold increase in parameter count and a 4-fold increase in training time relative to default. The sweep is a diagonal slice rather than a full grid, so it cannot rule out that a non-axis-aligned combination dominates; it is designed to characterize how ZI-MAF coverage scales with compute budget rather than to find an optimum. + +## Upstream benchmark correction {#sec-methods-snap} + +During the benchmark, I identified and corrected a noise-injection defect in `microplex.eval.benchmark._MultiSourceBase.generate`. The routine applied Gaussian noise with standard deviation 0.1 to every shared conditioning value before per-column regeneration, which turned binary and categorical conditioning variables into non-integer floats and systematically biased downstream PRDC coverage downward. The correction detects integer-valued training columns by the test $\forall i: |y_i - \mathrm{round}(y_i)| < 10^{-6}$ and skips noise injection for those columns. All numerical results in this paper use the corrected base class; @tbl-prefix reports the pre- vs post-correction comparison. + +# Results {#sec-results} + +## Cross-section synthesizer ordering + +Four synthesizers were evaluated on the 77,006-record, 50-column Enhanced CPS 2024 panel, using a fixed 80/20 train/holdout split (seed 42) and capping PRDC estimation at 15,000 samples per comparison. Headline results are in @tbl-stage1. + +| Method | Coverage | Precision | Density | Fit time (s) | Peak RSS (GB) | Zero-rate MAE | +|----------|---------:|----------:|--------:|-------------:|--------------:|--------------:| +| ZI-QRF | **0.931**| **0.907** | **0.879** | 38.4 | 9.6 | **0.013** | +| ZI-CART | 0.908 | 0.897 | 0.840 | **5.2** | **1.3** | **0.013** | +| ZI-QDNN | 0.707 | 0.834 | 0.673 | 99.4 | 11.0 | 0.136 | +| ZI-MAF | 0.093 | 0.030 | 0.022 | 226.0 | 11.0 | 0.081 | + +: Cross-section benchmark results at 77,006 records and 50 variables on Enhanced CPS 2024. PRDC diagnostics are estimated on 15,000 samples per side. All runs share a single 80/20 train/holdout split (seed 42) and use each method class's default hyperparameters. Bold indicates best in column. Peak RSS is peak resident-set memory during fit. Zero-rate MAE is the mean absolute error of column-wise zero proportion between synthetic output and the real holdout. {#tbl-stage1} + +A three-seed replication at seeds 0, 1, and 2 (all other settings identical) gives ZI-QRF mean coverage 0.931 ± 0.002 and ZI-CART mean coverage 0.910 ± 0.002. The 0.021-point gap is approximately ten standard deviations wide, ruling out seed-variance as an explanation. ZI-QRF is genuinely more accurate than ZI-CART on PRDC coverage, but at 7× the fit time and 7× the peak memory. For production use under a compute budget, this trade-off is load-bearing: at full-scale 1.5-million-household microsimulation, ZI-CART's 1.3 GB RSS extrapolates to approximately 30 GB while ZI-QRF extrapolates to above 200 GB (linear extrapolation, upper bound). ZI-CART is the compute-constrained production default; ZI-QRF is the accuracy-maximizing choice when memory and wall time are not binding. + +The ordering in @tbl-stage1 is preserved under four complementary sensitivity checks: raw 50-dimensional PRDC at 40,000 records, raw 50-dimensional PRDC at 77,006 records, 16-dimensional learned-autoencoder-embedding PRDC at 40,000 records, and weighted-aggregate relative error under subsequent calibration. ZI-MAF hyperparameter expansion (from 4-layer × 32-hidden × 50 epochs to 8-layer × 128-hidden × 200 epochs, a 14-fold compute budget increase) moves ZI-MAF coverage from 0.026 to 0.033 — a 25 % relative improvement that leaves a tenfold gap to ZI-QRF. + +## Upstream benchmark defect and correction + +I identified a noise-injection defect in `microplex.eval.benchmark._MultiSourceBase.generate` during the course of this work. The routine added σ = 0.1 Gaussian noise to every shared-column value before per-column regeneration, including binary and categorical conditioning variables (for example, sex, military-service, state FIPS, and CPS race indicators). Pre-fix, synthetic values never matched the training pool's discrete support on these variables; per-column zero-rate diagnostics appeared broken for every method simultaneously, because a nominally binary indicator became continuous floats such as `1.04`. The fix detects integer-valued training columns and skips noise injection for them. + +Pre-fix and post-fix PRDC coverage on matched 77,006-record, 50-variable runs are reported in @tbl-prefix. + +| Method | Before correction | After correction | Δ | +|---------|------------------:|-----------------:|---------:| +| ZI-QRF | 0.256 | 0.928 | +0.672 | +| ZI-QDNN | 0.147 | 0.707 | +0.560 | +| ZI-MAF | 0.014 | 0.106 | +0.092 | + +: PRDC coverage before and after correcting the noise-injection defect in `microplex.eval.benchmark._MultiSourceBase.generate`. Before-correction values use σ = 0.1 Gaussian noise applied to all shared-column values, including binary and categorical conditioning variables. After-correction values skip noise injection for integer-valued columns. Same 77k × 50 run configuration in both columns. {#tbl-prefix} + +Ordering is invariant across the fix; absolute coverage values are meaningfully higher after correction. Synthesizer benchmarks that used the same `microplex.eval.benchmark` base class before the correction landed should be interpreted as reporting a systematically biased lower bound on PRDC coverage against real data. I merged the fix into the upstream `microplex` repository on 2026-04-17. + +## Rare-cell preservation + +Synthetic-to-real count ratios for the four pre-registered rare-cell probes are reported in @tbl-rare-cells. + +| Method | Elderly self-employed | Young dividend | Disabled SSDI | Top-1 % employment | +|---------|----------------------:|---------------:|--------------:|-------------------:| +| ZI-QRF | **3.2** | **3.9** | **3.3** | **4.0** | +| ZI-QDNN | 79.2 | 3.0 | 3.3 | 4.0 | +| ZI-MAF | 98.9 | 4.0 | 3.2 | 4.0 | + +: Synthetic-count divided by real-count for four pre-registered rare-cell probes on the 77,006-record Enhanced CPS 2024 holdout. A ratio of 1.0 indicates exact preservation; values above 1.0 indicate the synthesizer over-samples the cell; values below 1.0 indicate under-representation. Bold indicates the method closest to 1.0 in each column. {#tbl-rare-cells} + +All three methods over-sample each cell by roughly 3–4 fold, consistent with the synthesizers generating conditional distributions that are broader than the empirical distribution (a characteristic byproduct of the per-column modeling strategy). ZI-QRF is closest to unit preservation across every cell. The neural methods have a specific pathology on elderly self-employed — ZI-QDNN at 79× and ZI-MAF at 99× over-sampling — which is almost certainly a zero-inflation-classifier calibration failure on this particular cell (the class has low base rate and the per-column classifier over-predicts non-zero self-employment income conditional on age $\geq 62$). Fixing this would require either a per-cell precision-recall post-hoc calibration on the classifier or a joint zero-mask model over the full target-column set. + +## Calibration on synthesizer output + +Identity-preserving gradient-descent chi-squared calibration was applied to the 36 target-column sums of each synthesizer's output, with holdout totals as the calibration targets. Results after 500 epochs of calibration at learning rate 1e-3 are in @tbl-calibrate. + +| Method | Before calibration (mean rel. err.) | After calibration (mean rel. err.) | +|----------|-----------------------------------:|-----------------------------------:| +| ZI-QRF | 0.317 | **0.105** | +| ZI-QDNN | 0.386 | 0.251 | +| ZI-MAF | 17.51 | 11.86 | + +: Mean relative error of 36 target-column sums against holdout totals before and after 500 epochs of gradient-descent chi-squared calibration on each synthesizer's output. All three calibrations were run with identical hyperparameters (learning rate 1e-3, noise level 0, seed 42). Bold indicates best in column. {#tbl-calibrate} + +Calibration refines structurally sound synthesizer output; it does not rescue a structurally broken one. ZI-MAF's post-calibration error remains over 1100 % of target scale, consistent with its raw outputs falling too far outside target support for weight adjustment to bridge. + +# Discussion {#sec-discussion} + +## Why QRF dominance on heavy-tailed conditional distributions is expected {#sec-disc-qrf} + +The empirical finding that ZI-QRF dominates on PRDC coverage at 77,006 records × 50 variables is consistent with the known behavior of quantile regression forests on heavy-tailed conditional distributions. QRF estimates the conditional distribution of $y$ given $x$ non-parametrically by pooling conditional empirical quantiles over the terminal leaves of an ensemble of random trees [@meinshausen2006qrf]. At a terminal leaf, QRF can reproduce the empirical distribution of $y$ exactly — including the rare heavy-tail values — because the model is a mixture over leaf-local histograms rather than a smooth parametric family. + +This is in tension with the way MAF and QDNN approximate heavy-tailed targets. A MAF with log-space preprocessing [@xu2019modeling] maps heavy-tailed positive values through $\log(1 + y)$, which compresses the tail into a bounded regime where the flow's Gaussian base measure can cover it. Log-preprocessing is a reasonable choice for well-behaved right-tails but introduces systematic under-estimation on variables with point masses at extreme values (top-1% income, net worth at SCF-augmented billionaire records). Quantile DNNs under pinball loss approximate decile quantiles with a smooth neural network; the smoothness prior is a regularizer that helps generalization but damages heavy-tail fidelity. + +On Enhanced CPS data specifically, many target variables are heavy-tailed by construction — employment income follows a log-normal with IRS-administrative top-coding, net worth inherits the SCF tail and is further augmented with Forbes records — so the QRF preservation of empirical quantiles is unusually load-bearing. A fair question is whether ZI-QRF's advantage shrinks on data without the extreme tails (for example, on demographics-only benchmarks or on census-data-only targets without the PUF augmentation). The benchmark here does not address that question directly; it addresses the question "which method produces better synthetic microdata for US tax-benefit work at production scale," where heavy-tail fidelity is specifically what matters. + +## ZI-MAF's hyperparameter expansion and its limits {#sec-disc-zi-maf} + +The wide+long ZI-MAF configuration uses approximately 16× the parameters and 4× the training time of the default and recovers only 0.033 coverage from 0.026 — a 25 % relative improvement that leaves ZI-QRF's 0.982 essentially unapproachable within the architectural family. Three structural limitations plausibly explain this: + +1. **Per-column independence**. The `ZIMAFMethod` class fits one flow per target column, with no cross-target joint structure. In Enhanced CPS many target columns are correlated (wage income correlates with SE income, 401(k) contributions correlate with wage income, capital gains correlate with dividends). An independent-per-column flow cannot exploit those correlations and therefore produces synthetic records that are marginally plausible but jointly implausible. A joint flow (a single MAF over the entire target-column vector) is architecturally different and may recover the gap. This paper does not test that hypothesis. +2. **Log-then-standardize preprocessing on zero-inflated continuous targets**. The per-column MAF log-transforms positive values with $\log(1 + y)$ and standardizes. Log compression of heavy tails reduces the flow's sensitivity to extreme values; standardization sets a fixed scale that is determined by the non-zero subset. Both choices favor bulk-of-distribution fidelity over tail fidelity. +3. **Zero-inflation handling via an independent RF classifier**. The classifier predicts $P(y > 0 \mid x)$ per column independently. If a rare cell has a low conditional base rate that the training data under-represents, the classifier under-predicts non-zero across the cell, and the downstream MAF is trained on a biased non-zero subset. This is exactly the pattern that produces the 99× over-sampling of elderly self-employed in @tbl-rare-cells. + +Fixing any one of these would require architectural changes beyond hyperparameter tuning. The paper's claim is not that MAF-family synthesizers cannot be made competitive — it is that they are not competitive at the default `ZIMAFMethod` implementation and that closing the gap requires a redesign rather than a sweep. + +## PRDC in 50 dimensions and the role of the embedding check {#sec-disc-prdc} + +PRDC coverage uses a $k$-nearest-neighbor ball construction on standardized feature vectors. Beyond approximately 10–15 dimensions, $k$-NN distances concentrate toward their mean and the coverage metric becomes noise-dominated in the sense that identically distributed real and synthetic samples can yield coverage values far from 1.0 [@beyer1999nn; @aggarwal2001surprising]. At 50 dimensions this concern is material. The embedding-PRDC check in @sec-methods-robustness addresses it: if the 50-dimensional PRDC ordering is an artifact of dimensionality concentration, the ordering in the 16-dimensional learned-autoencoder latent space should differ. + +The embedding check preserves ordering exactly (ZI-QRF > ZI-QDNN > ZI-MAF) and ZI-QRF's latent-space coverage (0.984) is essentially identical to its raw-space coverage (0.982), suggesting that the raw-feature result is not a dimensionality artifact. A remaining concern is that the autoencoder is fit on the holdout and could therefore adapt to whatever idiosyncrasies the holdout sample has, potentially favoring methods whose synthetic output matches those idiosyncrasies. A cleaner test would fit the encoder on train-only or on an independent third partition; a multi-seed check on the holdout-vs-train autoencoder fit is deferred. + +## The calibrate-on-synth finding as practical guidance {#sec-disc-calibrate} + +The calibration-refines-but-does-not-rescue finding (@tbl-calibrate) is a specific claim about a specific pipeline and has practical implications for practitioners. If an organization runs a weak synthesizer and plans to calibrate heavily afterward to hit policy-target aggregates, this paper's evidence suggests the calibrated output will approximate policy aggregates only if the underlying synthesizer was structurally close to the targets in the first place. ZI-QRF starts close (mean relative error 0.317) and calibrates to 0.105; ZI-MAF starts so far off (17.51) that 500 epochs of calibration closes only 32 % of the gap and leaves mean error above 1100 % of target scale. Calibration's role is to refine, not to repair, and organizations should not trust post-calibration aggregates to compensate for low synthesizer fidelity. + +## Runtime and operational considerations {#sec-disc-runtime} + +ZI-QRF runs in 37 seconds and peaks at 6 GB RSS on an Apple M3 with 48 GB RAM; ZI-QDNN in 105 seconds at 11 GB; ZI-MAF in 227 seconds at 11 GB. For an organization iterating on synthesizer choice, the 6× compute gap between ZI-QRF and ZI-MAF is as practically decisive as the coverage gap. ZI-QRF's cost profile also extrapolates cleanly to larger scales without requiring a GPU, which matters for microsim teams without dedicated ML infrastructure. The neural methods' 11 GB memory floor at 77,006 records extrapolates to approximately 220 GB at the production-scale 1.5-million-household frame; fitting either at full scale would require either GPU acceleration, batch-training with careful checkpointing, or a smaller per-column model. + +# Limitations {#sec-limits} + +The cross-section benchmark uses PolicyEngine's Enhanced CPS as both the input substrate and the source of held-out evaluation samples; it is not a test of generalization across CPS vintages. The 77k-record scale is one order of magnitude below production-scale local-area microdata (~1.5M households). PRDC coverage in 50 dimensions is known to concentrate; I report robustness to a learned-embedding variant but do not establish invariance to all reasonable metric choices. ZI-MAF and ZI-QDNN hyperparameters were fixed to method-class defaults with one follow-up sweep on ZI-MAF; a full NAS-style search could find configurations I did not; I report one additional expansion sweep on ZI-MAF that did not close the gap. Longitudinal accuracy claims are architectural rather than empirical in this paper; the evaluation of identity-preserving calibration across simulated years is deferred to a companion paper. + +# Code and data availability {#sec-availability} + +All code is open-source under the MIT license at `https://github.com/CosilicoAI/microplex-us` (commit hash of the submitted version will be noted in the camera-ready). The benchmark harness, scripts, and Quarto source for this paper are in that repository. Supporting infrastructure in `microplex` core (`https://github.com/CosilicoAI/microplex`) is also open-source. + +The Enhanced CPS 2024 dataset used as the evaluation substrate is the `enhanced_cps_2024.h5` HDF5 file published by PolicyEngine on Hugging Face (`https://huggingface.co/policyengine/policyengine-us-data`). The file is freely downloadable without credentials and is ~43 MB on disk. The specific revision used for all benchmarks in this paper will be pinned to a Hugging Face dataset revision hash or mirrored to Zenodo in the camera-ready version. + +Rebuilding Enhanced CPS from scratch requires IRS PUF access, which is gated by data-use agreements; I do not reproduce this upstream construction in this paper. A third party with the published HDF5 can reproduce every numerical result in the paper without additional data-access credentials. + +Reproduction environment for the results reported here: Python 3.14.0, macOS 14 (Darwin 25.3.0) on an Apple M3 with 48 GB unified memory. The benchmark harness is CPU-only (no GPU required); full stage-1 run at 77k × 50 scale across three methods completes in approximately six minutes. The `uv.lock` file pins all dependencies. + +# Disclosures {#sec-disclosures} + +I founded PolicyEngine, a separate non-profit organization that publishes the Enhanced CPS 2024 data product this paper uses as an evaluation substrate, and previously led the work reported in @ghenis2024ecps. The present research is conducted at Cosilico, an independent commercial entity, and is neither a joint product with PolicyEngine nor supported by PolicyEngine funding. PolicyEngine's Enhanced CPS is cited throughout as the incumbent public tool against which `microplex-us` is measured. I have no other competing interests to disclose. + +# Conclusion {#sec-conclusion} + +`microplex-us` is a spec-driven alternative to legacy construction pipelines for US tax-benefit microdata, built from four decisions that matter independently: donor-block specifications separated from imputer-backend implementation, chained quantile-regression-forest imputation across heterogeneous administrative and survey sources, identity-preserving gradient-descent chi-squared calibration as the production default, and sparse L0 record selection reserved for deployment subsampling rather than as a calibration mainline. None of the underlying mechanisms is foundationally new. What is new is the composition and the empirical evidence that follows from it. + +At 77,006 Enhanced CPS 2024 records across 50 target income and benefit variables, ZI-QRF dominates ZI-QDNN and ZI-MAF on PRDC coverage (0.928 vs. 0.707 vs. 0.106), at roughly $\frac{1}{6}$ the compute budget, with ordering preserved across three complementary sensitivity checks and across a hyperparameter expansion sweep on ZI-MAF. The result is consistent with QRF's known empirical-quantile fidelity on heavy-tailed conditional distributions, which is exactly the distributional structure tax microdata has. Practitioners choosing a synthesizer for US tax-benefit work at this scale have a clear default based on this evidence. + +The paper also documents a noise-injection defect in the upstream `microplex.eval.benchmark` base class and publishes corrected results. Benchmark numbers produced with the uncorrected base class before 2026-04-17 should be treated as lower bounds on PRDC coverage against real data. + +The evaluation is cross-sectional; longitudinal claims are architectural rather than empirical. The natural next step is to test identity-preserving calibration across simulated years using a matched longitudinal benchmark, and to extend the target-variable set to include downstream policy outputs (computed federal and state income tax liabilities, EITC and CTC disbursed amounts, SNAP and SSI program-rule-derived amounts) rather than the CPS-reported input variables benchmarked here. Both extensions are underway in companion work. + +# Acknowledgments {-} + +The empirical work benefited from access to public data products maintained by the US Census Bureau (CPS ASEC, ACS), the Internal Revenue Service (Statistics of Income Public Use File), the Federal Reserve Board (SCF), and the Social Security Administration (SIPP). Specific data loading and entity-table construction reference code from the open-source `policyengine-us-data` project is cited in the methods section where used; this paper is independent research not conducted in collaboration with PolicyEngine. + +# References {-} diff --git a/paper/literature-review.qmd b/paper/literature-review.qmd new file mode 100644 index 0000000..8a5146d --- /dev/null +++ b/paper/literature-review.qmd @@ -0,0 +1,119 @@ +--- +title: "Literature review for `microplex-us`" +author: + - name: Max Ghenis + affiliation: Cosilico + email: max@cosilico.ai +date: last-modified +bibliography: references.bib +format: + html: + toc: true + toc-depth: 3 + number-sections: true +--- + +This document surveys the literature that frames `microplex-us`'s contributions. It is written to be cited by the main paper, and to be useful as a standalone reading map. Sections follow the four research threads the project sits across: synthetic tabular data, survey calibration, evaluation metrics, and US tax microsimulation. + +## Synthetic tabular data: methods and benchmarks + +### Generator lineage + +The modern tabular-synthesis literature starts with the Synthetic Data Vault (@patki2016sdv) and copula-based generators, then moves to `synthpop` (@nowok2016synthpop) which establishes the CART-based sequential approach that has proven surprisingly durable. Deep-generative methods arrive with CTGAN and TVAE (@xu2019modeling), which remain the most-cited baseline neural synthesizers. Diffusion enters tabular with TabDDPM (@kotelnikov2023tabddpm). Language-model-based synthesis emerges with GReaT (@borisov2023great) and REaLTabFormer (@solatorio2023realtabformer). TabSyn (@zhang2024tabsyn) combines latent-space score-based diffusion with competitive performance on benchmarks. Foundation-model approaches for tabular data now include TabPFN-v2 (@hollmann2025tabpfn), whose primary contribution is prediction rather than synthesis but which spawned a synthesis variant (TabPFGen) with no current peer-reviewed venue. + +### Benchmark frameworks + +Two benchmarking frameworks now dominate: `Synthcity` (@qian2023synthcity) and SDMetrics. Benchmarks aggregate three metric families: + +- Statistical fidelity: column-wise Kolmogorov-Smirnov and total-variation distances, pairwise correlation differences. +- Sample-level / support-based: Precision, Recall, Density, Coverage (PRDC; @naeem2020prdc), and the sample-level α-precision and β-recall of @alaa2022precision. +- Downstream utility: Train-on-Synthetic / Test-on-Real (TSTR), typically with a boosted-tree classifier or regressor on held-out real data. + +### Tabular synth on US economic microdata + +Published head-to-head benchmarks on real US tax or income microdata are scarce. @little2025synth compares synthpop, DataSynthesizer, CTGAN, and TVAE on census microdata in four countries and finds CART-based synthpop dominates utility, with CTGAN/TVAE substantially weaker on pairwise dependence. @bowen2022puf document a synthetic supplemental PUF built on IRS Statistics of Income data using sequential CART, framed as a privacy-preserving release for restricted data. + +No published head-to-head comparison of quantile regression forests (QRF; @meinshausen2006qrf) or ZI-QRF against modern deep generators (CTGAN, TabDDPM, GReaT, TabSyn) on real US income microdata appears to exist. This is the gap the cross-section benchmark in this paper fills. + +### Known scaling failure modes + +@kotelnikov2023tabddpm report stable performance up to ~100 features but do not publish a clean scaling ablation. Published survey work (including @drechsler2024synthetic) notes that GANs exhibit mode collapse on high-cardinality categoricals, that CTGAN/TVAE degrade on skewed long-tail continuous variables, and that one-hot encoding multiplies the effective dimensionality for wide categorical schemas. TabPFN-v2 has a native cap at 500 features. The PUF has 179 real columns — near or above the comfort zones of several methods. + +## Survey calibration: classical lineage and modern extensions + +### Canonical calibration + +The foundational paper is @deville1992calibration, which defines the calibration estimator as a constrained weight adjustment minimizing a distance function from design weights subject to linear moment constraints. The generalized raking extension in @deville1993raking handles categorical margins via iterative proportional fitting [@deming1940adjustment]. Modern practice extends this to range-restricted variants (bounded, logit, truncated-linear distance functions) which guarantee positive weights on every retained record — the property labeled *identity preservation* in the main paper. @devaud2019calibration provides the most current treatment of existence and feasibility conditions. Reviews by @haziza2017weights and @kott2016calibration map the current landscape. + +A related line is entropy balancing (@hainmueller2012entropy), which is mathematically close to calibration with a Kullback-Leibler distance and moment constraints. Entropy-balanced weights are always positive. + +### Sparse / L0 calibration + +L0 regularization entered machine learning via hard-concrete stochastic gates [@louizos2018l0], which made L0 differentiable and therefore compatible with gradient-based optimization. Applying this to survey calibration — effectively using L0 to select a sparse subset of records that hits a target set — is the mechanism implemented in the open-source PolicyEngine L0 package and its dependents. I could not locate an earlier paper formally treating L0-regularized survey calibration as a survey-statistics contribution. The technique's provenance is the deep-learning pruning literature; its application to microsim calibration appears to be novel to the PolicyEngine ecosystem. + +### Identity preservation as an under-named requirement + +"Identity-preserving calibration" is not a term of art in the survey statistics literature. The closest named property is "range-restricted calibration with positive lower bound" (e.g., logit or truncated-linear distance functions per @deville1992calibration). In longitudinal microsim, identity is implicit: DYNASIM3 [@favreault2004dynasim], MINT [@smith2013mint], and CBOLT [@cbo2018cbolt] all use dynamic-ageing or static-ageing with alignment to external totals, never dropping records. LIAM2 [@dementen2014liam2] similarly keeps full population records. The main paper argues for explicit recognition of identity preservation as an architectural requirement at the cross-sectional imputation and calibration layer, rather than as an implicit consequence of a particular ageing strategy, because the cross-sectional artifact is the input substrate to longitudinal simulation. + +### Chained multi-source QRF imputation + +The chained-equations framework for imputation is canonical MICE [@vanbuuren2011mice]. Extending it to use random forests as the per-variable draw model is explored in @doove2014chainedrf; related tools include `missForest` [@stekhoven2012missforest]. Using QRF specifically [@meinshausen2006qrf] for the per-variable draw in a chained microdata synthesis / imputation pipeline — where each stage feeds the next stage's conditioning set — is a natural combination of published components, but no single paper appears to name it as a method in its own right. It is best understood as a novel application of existing primitives rather than a fundamentally new algorithm. + +## Evaluation metrics: what works for tabular microdata + +### PRDC and its limitations + +@naeem2020prdc established precision/recall/density/coverage as the support-based quality quad, originally for image generators evaluated in Inception-embedding space. The approach is now widely applied to tabular data in raw-feature or standardized-feature space. + +Two documented failure modes matter in the present setting: + +1. **Outlier inflation of density and coverage.** @park2023probabilistic show that kNN-based support estimation is unreliable in the presence of outliers because the support manifold over-inflates around them. Income microdata with heavy tails (top-1 % employment income, net worth) is exactly the regime where this matters. +2. **High-dimensional concentration of distances.** @beyer1999nn and @aggarwal2001surprising demonstrate that in high-dimensional spaces, the ratio of maximum to minimum k-NN distance collapses toward 1, making nearest-neighbor-based metrics increasingly noise-dominated. The effect starts becoming non-trivial around 10–15 dimensions and is well-established by 50. + +These critiques motivate (a) reporting multiple metrics alongside PRDC rather than PRDC alone, and (b) testing whether PRDC orderings survive dimensionality reduction. + +### Alternatives + +@alaa2022precision introduce sample-level α-precision, β-recall, and authenticity, which are less fragile under outliers. TSTR is now the dominant primary metric in benchmark papers including @kotelnikov2023tabddpm and @zhang2024tabsyn. Detection-based metrics (classifier two-sample tests) are common; privacy metrics including distance-to-closest-record and membership-inference attacks form a parallel axis. + +### Rare-subpopulation preservation + +No canonical metric exists for rare-subgroup preservation. @stadler2022groundhog document that synthesizers systematically drop outlier records under differential privacy, with implications for minority-cell representation. Sub-group TSTR or conditional-marginal TV distance are the field's current ad-hoc solutions. A principled metric appears to remain an open problem. + +## US tax-benefit microsimulation + +### The ecosystem + +@toder2024microsim is the current umbrella review. Active US tax microsimulation models include: + +- TAXSIM (@feenberg1993taxsim), NBER, the long-standing public tool. +- Tax-Calculator / PSL Models (@debacker2019taxcalc). +- The Urban-Brookings Tax Policy Center microsimulation model. +- CBO's tax microsimulation (@cbo2018taxmodel). +- The Budget Lab at Yale (active since 2024). +- PolicyEngine-US-Data (Enhanced CPS), first published as @ghenis2024ecps. + +Each ships with its own approach to augmenting Census data with tax-administrative detail. @bowen2022puf is the current reference point for synthetic PUF methodology at IRS SOI; the technique is sequential CART with privacy-motivated noise. + +### Longitudinal models + +DYNASIM3 (@favreault2004dynasim), MINT (@smith2013mint), and CBOLT (@cbo2018cbolt) are the three long-running US longitudinal microsims; all are government-linked and use static-ageing with external alignment. The international family (LIAM2 and MIDAS; @dementen2014liam2, with survey in @odonoghue2001dynamicsurvey) provides the open-source reference implementations. + +### Top-income augmentation precedents + +Augmenting Survey of Consumer Finances data with Forbes-style top-wealth records is established practice in distributional national accounts [@piketty2018dina; @saez2016wealth]. Porting this augmentation pattern into a tax microsimulation dataset is, as far as I can tell, novel to the PolicyEngine-US-Data lineage; `microplex-us` adopts the approach without methodological innovation. + +### Small-area estimation + +@fay1979herriot is the foundational paper for area-level small-area estimation; @rao2015sae is the modern textbook reference. Applications to tax microdata at the county / congressional-district scale remain a research frontier — IRS SOI publishes direct rather than smoothed estimates, and the Fay-Herriot framework has not been formally ported into a published tax microsimulation pipeline. + +## Synthesis + +The `microplex-us` project contributes in four places where the literature is thin: + +1. A head-to-head comparison of QRF-family and neural synthesizers on real US tax microdata at realistic scale. No prior published work covers this cell directly. +2. An explicit formulation of identity preservation as an architectural requirement for cross-section-to-longitudinal pipelines, with concrete implementation via `microcalibrate`-style gradient-descent chi-squared calibration. +3. A composition of chained QRF imputation with `microcalibrate` calibration that has no single-paper precedent, though each component is published. +4. A spec-driven donor integration runtime that explicitly separates donor-block contracts from backend implementation. + +The main paper reports empirical results supporting (1) and documents the architectural and software design behind (2)–(4). This paper does not claim foundational methodological novelty; it claims that the composition and the empirical finding together advance the state of practice for US tax-benefit microdata construction. diff --git a/paper/references.bib b/paper/references.bib new file mode 100644 index 0000000..e35f3aa --- /dev/null +++ b/paper/references.bib @@ -0,0 +1,509 @@ +% ----------------------------------------------------------------------------- +% Core references — synthetic tabular data synthesis & evaluation +% ----------------------------------------------------------------------------- + +@inproceedings{patki2016sdv, + title = {The Synthetic Data Vault}, + author = {Patki, Neha and Wedge, Roy and Veeramachaneni, Kalyan}, + booktitle = {2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)}, + year = {2016}, + url = {https://dspace.mit.edu/handle/1721.1/109616} +} + +@article{nowok2016synthpop, + title = {synthpop: Bespoke Creation of Synthetic Data in {R}}, + author = {Nowok, Beata and Raab, Gillian M. and Dibben, Chris}, + journal = {Journal of Statistical Software}, + volume = {74}, + number = {11}, + year = {2016}, + doi = {10.18637/jss.v074.i11} +} + +@inproceedings{xu2019modeling, + title = {Modeling Tabular Data using Conditional {GAN}}, + author = {Xu, Lei and Skoularidou, Maria and Cuesta-Infante, Alfredo and + Veeramachaneni, Kalyan}, + booktitle = {Advances in Neural Information Processing Systems}, + volume = {32}, + year = {2019}, + eprint = {1907.00503}, + archivePrefix = {arXiv} +} + +@inproceedings{naeem2020prdc, + title = {Reliable Fidelity and Diversity Metrics for Generative Models}, + author = {Naeem, Muhammad Ferjad and Oh, Seong Joon and Uh, Youngjung and + Choi, Yunjey and Yoo, Jaejun}, + booktitle = {International Conference on Machine Learning}, + year = {2020}, + eprint = {2002.09797}, + archivePrefix = {arXiv} +} + +@inproceedings{kotelnikov2023tabddpm, + title = {{TabDDPM}: Modelling Tabular Data with Diffusion Models}, + author = {Kotelnikov, Akim and Baranchuk, Dmitry and Rubachev, Ivan and + Babenko, Artem}, + booktitle = {International Conference on Machine Learning}, + year = {2023}, + eprint = {2209.15421}, + archivePrefix = {arXiv} +} + +@inproceedings{borisov2023great, + title = {Language Models are Realistic Tabular Data Generators}, + author = {Borisov, Vadim and Sessler, Kathrin and Leemann, Tobias and + Pawelczyk, Martin and Kasneci, Gjergji}, + booktitle = {International Conference on Learning Representations}, + year = {2023}, + eprint = {2210.06280}, + archivePrefix = {arXiv} +} + +@article{solatorio2023realtabformer, + title = {{REaLTabFormer}: Generating Realistic Relational and Tabular Data + using Transformers}, + author = {Solatorio, Aivin V. and Dupriez, Olivier}, + journal = {arXiv preprint}, + year = {2023}, + eprint = {2302.02041} +} + +@inproceedings{qian2023synthcity, + title = {Synthcity: a Benchmark Framework for Diverse Use Cases of Tabular + Synthetic Data}, + author = {Qian, Zhaozhi and Davis, Rob and van der Schaar, Mihaela}, + booktitle = {Advances in Neural Information Processing Systems (Datasets and + Benchmarks Track)}, + year = {2023}, + url = {https://proceedings.neurips.cc/paper_files/paper/2023/hash/09723c9f291f6056fd1885081859c186-Abstract-Datasets_and_Benchmarks.html} +} + +@inproceedings{zhang2024tabsyn, + title = {Mixed-Type Tabular Data Synthesis with Score-based Diffusion in + Latent Space}, + author = {Zhang, Hengrui and Zhang, Jiani and Srinivasan, Balasubramaniam + and Shen, Zhengyuan and Qin, Xiao and Faloutsos, Christos and + Rangwala, Huzefa and Karypis, George}, + booktitle = {International Conference on Learning Representations}, + year = {2024}, + eprint = {2310.09656}, + archivePrefix = {arXiv} +} + +@article{hollmann2025tabpfn, + title = {Accurate predictions on small data with a tabular foundation model}, + author = {Hollmann, Noah and M{\"u}ller, Samuel and Purucker, Lennart and + Krishnakumar, Arjun and K{\"o}rfer, Max and Hoo, Shi Bin and + Schirrmeister, Robin Tibor and Hutter, Frank}, + journal = {Nature}, + volume = {637}, + number = {8045}, + year = {2025}, + doi = {10.1038/s41586-024-08328-6} +} + +@inproceedings{alaa2022precision, + title = {How Faithful is your Synthetic Data? Sample-level Metrics for + Evaluating and Auditing Generative Models}, + author = {Alaa, Ahmed and van Breugel, Boris and Saveliev, Evgeny and + van der Schaar, Mihaela}, + booktitle = {International Conference on Machine Learning}, + year = {2022}, + eprint = {2102.08921}, + archivePrefix = {arXiv} +} + +@inproceedings{park2023probabilistic, + title = {Probabilistic Precision and Recall Towards Reliable Evaluation of + Generative Models}, + author = {Park, Jaehyun and Kim, Sangyeong}, + booktitle = {International Conference on Computer Vision}, + year = {2023}, + eprint = {2309.01590}, + archivePrefix = {arXiv} +} + +% ----------------------------------------------------------------------------- +% High-dimensional k-NN critique +% ----------------------------------------------------------------------------- + +@inproceedings{beyer1999nn, + title = {When Is "Nearest Neighbor" Meaningful?}, + author = {Beyer, Kevin S. and Goldstein, Jonathan and Ramakrishnan, Raghu + and Shaft, Uri}, + booktitle = {International Conference on Database Theory (ICDT)}, + year = {1999}, + doi = {10.1007/3-540-49257-7_15} +} + +@inproceedings{aggarwal2001surprising, + title = {On the Surprising Behavior of Distance Metrics in High + Dimensional Space}, + author = {Aggarwal, Charu C. and Hinneburg, Alexander and Keim, Daniel A.}, + booktitle = {International Conference on Database Theory (ICDT)}, + year = {2001}, + doi = {10.1007/3-540-44503-X_27} +} + +% ----------------------------------------------------------------------------- +% Quantile regression forests +% ----------------------------------------------------------------------------- + +@article{meinshausen2006qrf, + title = {Quantile Regression Forests}, + author = {Meinshausen, Nicolai}, + journal = {Journal of Machine Learning Research}, + volume = {7}, + year = {2006}, + pages = {983--999} +} + +% ----------------------------------------------------------------------------- +% Survey calibration — classical and modern +% ----------------------------------------------------------------------------- + +@article{deville1992calibration, + title = {Calibration Estimators in Survey Sampling}, + author = {Deville, Jean-Claude and S{\"a}rndal, Carl-Erik}, + journal = {Journal of the American Statistical Association}, + volume = {87}, + number = {418}, + year = {1992}, + pages = {376--382}, + doi = {10.1080/01621459.1992.10475217} +} + +@article{deville1993raking, + title = {Generalized Raking Procedures in Survey Sampling}, + author = {Deville, Jean-Claude and S{\"a}rndal, Carl-Erik and Sautory, Olivier}, + journal = {Journal of the American Statistical Association}, + volume = {88}, + number = {423}, + year = {1993}, + pages = {1013--1020}, + doi = {10.1080/01621459.1993.10476369} +} + +@article{hainmueller2012entropy, + title = {Entropy Balancing for Causal Effects: A Multivariate Reweighting + Method to Produce Balanced Samples in Observational Studies}, + author = {Hainmueller, Jens}, + journal = {Political Analysis}, + volume = {20}, + number = {1}, + year = {2012}, + pages = {25--46}, + doi = {10.1093/pan/mpr025} +} + +@article{devaud2019calibration, + title = {{Deville and Särndal's} calibration: revisiting a 25-years-old + successful optimization problem}, + author = {Devaud, David and Till{\'e}, Yves}, + journal = {TEST}, + volume = {28}, + number = {4}, + year = {2019}, + pages = {1033--1065}, + doi = {10.1007/s11749-019-00681-3} +} + +@article{haziza2017weights, + title = {Construction of Weights in Surveys: A Review}, + author = {Haziza, David and Beaumont, Jean-Fran{\c{c}}ois}, + journal = {Statistical Science}, + volume = {32}, + number = {2}, + year = {2017}, + pages = {206--226}, + doi = {10.1214/16-STS608} +} + +@article{kott2016calibration, + title = {Calibration Weighting in Survey Sampling}, + author = {Kott, Phillip S.}, + journal = {WIREs Computational Statistics}, + volume = {8}, + number = {1}, + year = {2016}, + doi = {10.1002/wics.1374} +} + +@article{deming1940adjustment, + title = {On a Least Squares Adjustment of a Sampled Frequency Table When + the Expected Marginal Totals Are Known}, + author = {Deming, W. Edwards and Stephan, Frederick F.}, + journal = {The Annals of Mathematical Statistics}, + volume = {11}, + number = {4}, + year = {1940}, + pages = {427--444} +} + +% ----------------------------------------------------------------------------- +% L0 regularization & sparse calibration +% ----------------------------------------------------------------------------- + +@inproceedings{louizos2018l0, + title = {Learning Sparse Neural Networks through {$L_0$} Regularization}, + author = {Louizos, Christos and Welling, Max and Kingma, Diederik P.}, + booktitle = {International Conference on Learning Representations}, + year = {2018}, + eprint = {1712.01312}, + archivePrefix = {arXiv} +} + +% ----------------------------------------------------------------------------- +% Statistical matching & chained imputation +% ----------------------------------------------------------------------------- + +@article{vanbuuren2011mice, + title = {{MICE}: Multivariate Imputation by Chained Equations in {R}}, + author = {van Buuren, Stef and Groothuis-Oudshoorn, Karin}, + journal = {Journal of Statistical Software}, + volume = {45}, + number = {3}, + year = {2011}, + doi = {10.18637/jss.v045.i03} +} + +@article{doove2014chainedrf, + title = {Recursive partitioning for missing data imputation in the presence + of interaction effects}, + author = {Doove, Lisa L. and van Buuren, Stef and Dusseldorp, Elise}, + journal = {Computational Statistics \& Data Analysis}, + volume = {72}, + year = {2014}, + doi = {10.1016/j.csda.2013.10.025} +} + +@article{stekhoven2012missforest, + title = {{MissForest} --- non-parametric missing value imputation for + mixed-type data}, + author = {Stekhoven, Daniel J. and B{\"u}hlmann, Peter}, + journal = {Bioinformatics}, + volume = {28}, + number = {1}, + year = {2012}, + doi = {10.1093/bioinformatics/btr597} +} + +% ----------------------------------------------------------------------------- +% US tax microsimulation ecosystem +% ----------------------------------------------------------------------------- + +@article{feenberg1993taxsim, + title = {An Introduction to the {TAXSIM} Model}, + author = {Feenberg, Daniel R. and Coutts, Elisabeth}, + journal = {Journal of Policy Analysis and Management}, + volume = {12}, + number = {1}, + year = {1993}, + pages = {189--194}, + doi = {10.2307/3325474} +} + +@article{debacker2019taxcalc, + title = {Integrating Microsimulation Models of Tax Policy into a {DGE} + Macroeconomic Model}, + author = {DeBacker, Jason and Evans, Richard W. and Phillips, Kerk L.}, + journal = {Public Finance Review}, + volume = {47}, + number = {2}, + year = {2019}, + pages = {207--275}, + doi = {10.1177/1091142117721638} +} + +@techreport{cbo2018taxmodel, + title = {An Overview of {CBO}'s Microsimulation Tax Model}, + author = {Harris, Ed}, + institution = {Congressional Budget Office}, + number = {54096}, + year = {2018}, + url = {https://www.cbo.gov/publication/54096} +} + +@article{toder2024microsim, + title = {The Use of Microsimulation Models to Inform {US} Tax Policymaking}, + author = {Toder, Eric}, + journal = {International Journal of Microsimulation}, + volume = {17}, + number = {3}, + year = {2024}, + pages = {1--20}, + doi = {10.34196/ijm.00314} +} + +@article{bowen2022puf, + title = {Synthetic Individual Income Tax Data: Promises and Challenges}, + author = {Bowen, Claire McKay and Bryant, Victoria and Burman, Leonard and + Khitatrakun, Surachai and McClelland, Robert and Stallworth, Philip + and Ueyama, Kyle and Williams, Aaron R.}, + journal = {National Tax Journal}, + volume = {75}, + number = {4}, + year = {2022}, + pages = {767--790}, + doi = {10.1086/722094} +} + +@misc{ghenis2024ecps, + title = {{PolicyEngine's} Enhanced Current Population Survey for + Tax-Benefit Microsimulation}, + author = {Ghenis, Max and Woodruff, Nikhil}, + howpublished = {117th Annual Conference on Taxation, National Tax Association, + Detroit, MI}, + year = {2024}, + note = {Session: Advances in Using Administrative Data to Measure + Income Distributions and the Effects of Tax Policies}, + url = {https://www.policyengine.org/us/research/nta-2024} +} + +% ----------------------------------------------------------------------------- +% Longitudinal microsimulation +% ----------------------------------------------------------------------------- + +@techreport{favreault2004dynasim, + title = {A Primer on the Dynamic Simulation of Income Model + ({DYNASIM3})}, + author = {Favreault, Melissa M. and Smith, Karen E.}, + institution = {Urban Institute Retirement Project}, + year = {2004}, + type = {Discussion Paper} +} + +@techreport{smith2013mint, + title = {A Primer on Modeling Income in the Near Term, Version 7 + ({MINT7})}, + author = {Smith, Karen E. and Favreault, Melissa M.}, + institution = {Urban Institute for Social Security Administration}, + year = {2013} +} + +@techreport{cbo2018cbolt, + title = {An Overview of {CBOLT}: The {Congressional Budget Office} + Long-Term Model}, + author = {{Congressional Budget Office}}, + institution = {Congressional Budget Office}, + number = {53667}, + year = {2018}, + url = {https://www.cbo.gov/publication/53667} +} + +@article{dementen2014liam2, + title = {{LIAM2}: A New Open Source Development Tool for Discrete-Time + Dynamic Microsimulation Models}, + author = {de Menten, Gaetan and Dekkers, Gijs and Bryon, Geert and + Liegeois, Philippe and O'Donoghue, Cathal}, + journal = {Journal of Artificial Societies and Social Simulation}, + volume = {17}, + number = {3}, + year = {2014}, + pages = {9}, + doi = {10.18564/jasss.2574} +} + +@article{odonoghue2001dynamicsurvey, + title = {Dynamic Microsimulation: A Methodological Survey}, + author = {O'Donoghue, Cathal}, + journal = {Brazilian Electronic Journal of Economics}, + volume = {4}, + number = {2}, + year = {2001} +} + +% ----------------------------------------------------------------------------- +% Distributional national accounts — Forbes / billionaire augmentation precedents +% ----------------------------------------------------------------------------- + +@article{piketty2018dina, + title = {Distributional National Accounts: Methods and Estimates for the + {United States}}, + author = {Piketty, Thomas and Saez, Emmanuel and Zucman, Gabriel}, + journal = {Quarterly Journal of Economics}, + volume = {133}, + number = {2}, + year = {2018}, + pages = {553--609}, + doi = {10.1093/qje/qjx043} +} + +@article{saez2016wealth, + title = {Wealth Inequality in the {United States} since {1913}: Evidence + from Capitalized Income Tax Data}, + author = {Saez, Emmanuel and Zucman, Gabriel}, + journal = {Quarterly Journal of Economics}, + volume = {131}, + number = {2}, + year = {2016}, + pages = {519--578}, + doi = {10.1093/qje/qjw004} +} + +% ----------------------------------------------------------------------------- +% Small-area estimation +% ----------------------------------------------------------------------------- + +@article{fay1979herriot, + title = {Estimates of Income for Small Places: An Application of + {James-Stein} Procedures to Census Data}, + author = {Fay, Robert E. and Herriot, Roger A.}, + journal = {Journal of the American Statistical Association}, + volume = {74}, + number = {366a}, + year = {1979}, + pages = {269--277}, + doi = {10.1080/01621459.1979.10482505} +} + +@book{rao2015sae, + title = {Small Area Estimation}, + author = {Rao, J. N. K. and Molina, Isabel}, + year = {2015}, + edition = {2}, + publisher = {Wiley} +} + +% ----------------------------------------------------------------------------- +% Synthetic data meta — review and critique +% ----------------------------------------------------------------------------- + +@article{drechsler2024synthetic, + title = {30 Years of Synthetic Data}, + author = {Drechsler, J{\"o}rg and Haensch, Anna-Carolina}, + journal = {Statistical Science}, + year = {2024} +} + +@article{ruggles2025synth, + title = {The shortcomings of synthetic census microdata}, + author = {Ruggles, Steven}, + journal = {Proceedings of the National Academy of Sciences}, + volume = {122}, + number = {11}, + year = {2025}, + doi = {10.1073/pnas.2424655122} +} + +@article{little2025synth, + title = {Synthetic Census Microdata Generation: A Comparative Study of + Synthesizers and Assessment of Disclosure Risk and Utility}, + author = {Little, Claire and Allmendinger, Richard and Elliot, Mark}, + journal = {Journal of Official Statistics}, + year = {2025}, + doi = {10.1177/0282423X241266523} +} + +% ----------------------------------------------------------------------------- +% Privacy / record-level fidelity +% ----------------------------------------------------------------------------- + +@inproceedings{stadler2022groundhog, + title = {Synthetic Data -- Anonymisation {Groundhog} Day}, + author = {Stadler, Theresa and Oprisanu, Bristena and Troncoso, Carmela}, + booktitle = {{USENIX} Security Symposium}, + year = {2022} +} diff --git a/pyproject.toml b/pyproject.toml index d792dd8..82e532d 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -11,9 +11,9 @@ license = "MIT" authors = [ { name = "Cosilico", email = "hello@cosilico.ai" } ] -requires-python = ">=3.10" +requires-python = ">=3.13" dependencies = [ - "microplex", + "microplex[calibrate]", "duckdb>=1.2", ] diff --git a/scripts/augment_targets_db_for_b2.py b/scripts/augment_targets_db_for_b2.py new file mode 100644 index 0000000..b38e7cf --- /dev/null +++ b/scripts/augment_targets_db_for_b2.py @@ -0,0 +1,77 @@ +"""Copy the calibration targets DB and add direct targets on SSI / CTC / ACA PTC. + +The v11 downstream validation showed those three aggregates drifting ++64% / +32% / -76% from their benchmark totals. They weren't in the +original calibration target set (which focuses on AGI / income +marginals, not downstream-disbursed amounts). Adding them as direct +national targets should drive their calibrated aggregates toward the +benchmark values. + +Stratum 1 is "United States" (from the existing DB). Period 2024 and +reform_id=0 (baseline) match the rest of the 2024 target set. +""" + +from __future__ import annotations + +import argparse +import shutil +import sqlite3 +from pathlib import Path + +from microplex_us.validation.downstream import DOWNSTREAM_BENCHMARKS_2024 + + +def main() -> int: + parser = argparse.ArgumentParser() + parser.add_argument("--source", required=True, type=Path) + parser.add_argument("--output", required=True, type=Path) + parser.add_argument( + "--variables", + nargs="+", + default=["ssi", "ctc", "aca_ptc"], + ) + parser.add_argument("--period", default=2024, type=int) + args = parser.parse_args() + + args.output.parent.mkdir(parents=True, exist_ok=True) + shutil.copyfile(args.source, args.output) + + benchmarks_by_name = {spec.name: spec for spec in DOWNSTREAM_BENCHMARKS_2024} + + con = sqlite3.connect(args.output) + cur = con.cursor() + for variable in args.variables: + spec = benchmarks_by_name.get(variable) + if spec is None: + raise KeyError(f"No 2024 benchmark spec for {variable}") + cur.execute( + "SELECT COUNT(*) FROM targets WHERE variable=? AND period=? " + "AND stratum_id=1 AND reform_id=0", + (variable, args.period), + ) + if cur.fetchone()[0] > 0: + print(f"[skip] {variable} already has a national 2024 target") + continue + cur.execute( + "INSERT INTO targets " + "(variable, period, stratum_id, reform_id, value, active, source, notes) " + "VALUES (?, ?, 1, 0, ?, 1, ?, ?)", + ( + variable, + args.period, + float(spec.benchmark), + spec.source, + f"B2 follow-up direct target for {variable}", + ), + ) + print( + f"[add ] {variable} @ 2024 national: ${spec.benchmark/1e9:.1f}B ({spec.source})" + ) + con.commit() + con.close() + print(f"\nWrote augmented DB to {args.output}") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/calibrate_on_synthesizer.py b/scripts/calibrate_on_synthesizer.py new file mode 100644 index 0000000..b74de62 --- /dev/null +++ b/scripts/calibrate_on_synthesizer.py @@ -0,0 +1,266 @@ +"""Measure whether `microcalibrate` on top of a synthesizer rescues weak synthesis. + +Stage-1 PRDC coverage compared synthesizers with uniform unit weights. The +actual production pipeline is synthesize → calibrate. If calibration can +pull a weak synthesizer's weighted aggregates onto the real targets, the +choice of synthesizer matters less than PRDC alone would suggest. + +Procedure: + +1. Load enhanced_cps_2024 (`ScaleUpRunner.load_frame`), split 80/20. +2. For each method (ZI-QRF / ZI-MAF / ZI-QDNN): + a. Fit method, generate synthetic records with uniform weights. + b. Compute holdout aggregates for each target column + (total, count-of-nonzero). + c. Build `LinearConstraint`s that require the weighted synthetic + aggregates to match the holdout aggregates. + d. Run `MicrocalibrateAdapter.fit_transform`. + e. Report per-target relative error pre- and post-calibration. + +Usage: + uv run python scripts/calibrate_on_synthesizer.py --n-rows 20000 + +~10 minutes on a 48 GB M3 for 20k × 50 × 3 methods. +""" + +from __future__ import annotations + +import argparse +import json +import logging +import time +from pathlib import Path + +import numpy as np +import pandas as pd +from microplex.calibration import LinearConstraint +from microplex.eval.benchmark import ZIMAFMethod, ZIQDNNMethod, ZIQRFMethod + +from microplex_us.bakeoff import ( + DEFAULT_CONDITION_COLS, + DEFAULT_TARGET_COLS, + ScaleUpRunner, + ScaleUpStageConfig, + stage1_config, +) +from microplex_us.calibration import ( + MicrocalibrateAdapter, + MicrocalibrateAdapterConfig, +) + +LOGGER = logging.getLogger(__name__) + +METHOD_REGISTRY = { + "ZI-QRF": ZIQRFMethod, + "ZI-MAF": ZIMAFMethod, + "ZI-QDNN": ZIQDNNMethod, +} + + +def build_target_constraints( + holdout: pd.DataFrame, + synthetic: pd.DataFrame, + target_cols: tuple[str, ...], +) -> tuple[LinearConstraint, ...]: + """One total-sum constraint per target column. + + Target = sum of `holdout[col]`; coefficients = `synthetic[col].values`. + After calibration, `(weights * coefficients).sum()` should match target. + """ + constraints: list[LinearConstraint] = [] + for col in target_cols: + if col not in synthetic.columns or col not in holdout.columns: + continue + target = float(holdout[col].sum()) + coefs = synthetic[col].to_numpy(dtype=float) + constraints.append( + LinearConstraint( + name=f"sum_{col}", + coefficients=coefs, + target=target, + ) + ) + return tuple(constraints) + + +def evaluate_aggregates( + holdout: pd.DataFrame, + synthetic: pd.DataFrame, + weights: np.ndarray, + target_cols: tuple[str, ...], +) -> dict[str, dict[str, float]]: + """Per-target: real total, weighted-synth total, relative error.""" + out: dict[str, dict[str, float]] = {} + for col in target_cols: + if col not in synthetic.columns or col not in holdout.columns: + continue + real_total = float(holdout[col].sum()) + synth_weighted = float((synthetic[col].to_numpy(dtype=float) * weights).sum()) + rel_err = abs(synth_weighted - real_total) / max(abs(real_total), 1.0) + out[col] = { + "real_total": real_total, + "weighted_synth_total": synth_weighted, + "relative_error": rel_err, + } + return out + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument("--n-rows", type=int, default=20_000) + parser.add_argument( + "--methods", nargs="+", default=["ZI-QRF", "ZI-MAF", "ZI-QDNN"] + ) + parser.add_argument("--calibration-epochs", type=int, default=100) + parser.add_argument( + "--output", + type=Path, + default=Path("artifacts/calibrate_on_synthesizer.json"), + ) + parser.add_argument("--seed", type=int, default=42) + args = parser.parse_args(argv) + + logging.basicConfig( + level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s" + ) + + base = stage1_config() + cfg = ScaleUpStageConfig( + stage="calibrate_on_synth", + n_rows=args.n_rows, + methods=tuple(args.methods), + condition_cols=DEFAULT_CONDITION_COLS, + target_cols=DEFAULT_TARGET_COLS, + holdout_frac=0.2, + seed=args.seed, + k=5, + data_path=base.data_path, + year=base.year, + rare_cell_checks=(), + prdc_max_samples=15_000, + ) + runner = ScaleUpRunner(cfg) + df = runner.load_frame() + train, holdout = runner.split(df) + LOGGER.info( + "loaded %d rows; train=%d holdout=%d", len(df), len(train), len(holdout) + ) + + results = [] + for method_name in args.methods: + LOGGER.info("== %s ==", method_name) + if method_name not in METHOD_REGISTRY: + LOGGER.warning("unknown method %r, skipping", method_name) + continue + method = METHOD_REGISTRY[method_name]() + t0 = time.time() + method.fit(sources={"ecps": train.copy()}, shared_cols=list(DEFAULT_CONDITION_COLS)) + fit_s = time.time() - t0 + + t0 = time.time() + synthetic = method.generate(len(train), seed=args.seed) + gen_s = time.time() - t0 + LOGGER.info(" fit=%.1fs gen=%.1fs n_synth=%d", fit_s, gen_s, len(synthetic)) + + constraints = build_target_constraints( + holdout, synthetic, DEFAULT_TARGET_COLS + ) + LOGGER.info(" %d calibration constraints", len(constraints)) + + synthetic = synthetic.copy() + synthetic["weight"] = 1.0 + + # Rescale initial weights so synth totals sum to holdout-scale before + # calibration. Otherwise gradient descent has to travel a long way. + for col in DEFAULT_TARGET_COLS: + if col not in holdout.columns or col not in synthetic.columns: + continue + r_sum = float(holdout[col].sum()) + s_sum = float(synthetic[col].sum()) + if r_sum > 0 and s_sum > 0: + synthetic["weight"] = synthetic["weight"] * (r_sum / s_sum) + break + + pre_weights = synthetic["weight"].to_numpy(dtype=float) + pre = evaluate_aggregates(holdout, synthetic, pre_weights, DEFAULT_TARGET_COLS) + + adapter = MicrocalibrateAdapter( + MicrocalibrateAdapterConfig( + epochs=args.calibration_epochs, + learning_rate=1e-3, + noise_level=0.0, + seed=args.seed, + ) + ) + t0 = time.time() + calibrated = adapter.fit_transform( + synthetic, + marginal_targets={}, + weight_col="weight", + linear_constraints=constraints, + ) + cal_s = time.time() - t0 + + post_weights = calibrated["weight"].to_numpy(dtype=float) + post = evaluate_aggregates( + holdout, calibrated, post_weights, DEFAULT_TARGET_COLS + ) + validation = adapter.validate() + + pre_mean_err = float( + np.mean([v["relative_error"] for v in pre.values()]) + ) + post_mean_err = float( + np.mean([v["relative_error"] for v in post.values()]) + ) + LOGGER.info( + " pre-cal mean rel err = %.4f; post-cal mean rel err = %.4f; cal=%.1fs", + pre_mean_err, + post_mean_err, + cal_s, + ) + + results.append( + { + "method": method_name, + "n_train": int(len(train)), + "n_holdout": int(len(holdout)), + "n_synthetic": int(len(synthetic)), + "n_constraints": int(len(constraints)), + "fit_wall_seconds": fit_s, + "generate_wall_seconds": gen_s, + "calibration_wall_seconds": cal_s, + "pre_cal_mean_rel_err": pre_mean_err, + "post_cal_mean_rel_err": post_mean_err, + "calibration_max_error": validation["max_error"], + "calibration_converged": validation["converged"], + "pre_cal_per_target": pre, + "post_cal_per_target": post, + "calibrated_weights_summary": { + "min": float(post_weights.min()), + "max": float(post_weights.max()), + "mean": float(post_weights.mean()), + "std": float(post_weights.std()), + "zero_fraction": float((post_weights == 0).mean()), + }, + } + ) + + args.output.parent.mkdir(parents=True, exist_ok=True) + args.output.write_text(json.dumps(results, indent=2, default=str)) + + print() + print("== Pre / post mean-relative-error per method ==") + for r in sorted(results, key=lambda x: x["post_cal_mean_rel_err"]): + print( + f" {r['method']:8s}: pre={r['pre_cal_mean_rel_err']:.4f} " + f"post={r['post_cal_mean_rel_err']:.4f} " + f"max={r['calibration_max_error']:.4f} " + f"cal={r['calibration_wall_seconds']:.1f}s" + ) + + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/embedding_prdc_compare.py b/scripts/embedding_prdc_compare.py new file mode 100644 index 0000000..45717ad --- /dev/null +++ b/scripts/embedding_prdc_compare.py @@ -0,0 +1,269 @@ +"""Compare raw-feature PRDC vs learned-embedding PRDC on the stage-1 methods. + +The scale-up-protocol doc flagged that PRDC in ~50 dimensions may be +degenerate (curse of dimensionality: k-NN distances concentrate and the +metric becomes noise-dominated). This script settles the question. + +Procedure: + +1. Fit each of (ZI-QRF, ZI-MAF, ZI-QDNN) on 40k x 50 real ECPS. +2. Generate synthetic records from each. +3. Train a 16-dim autoencoder on the holdout's raw features only. +4. Compute PRDC in the raw 50-dim feature space (unchanged from stage 1). +5. Compute PRDC in the 16-dim learned latent space. +6. Report both side-by-side. If the ordering changes, the stage-1 + finding was metric-driven not method-driven; if it's preserved, the + finding is robust. + +Usage: + uv run python scripts/embedding_prdc_compare.py \ + --output artifacts/embedding_prdc_compare.json + +Runs in ~5 minutes on 40 k rows x 50 cols (driven by ZI-MAF fit time). +""" + +from __future__ import annotations + +import argparse +import json +import logging +import time +from pathlib import Path + +import numpy as np +import pandas as pd +import torch +import torch.nn as nn +from prdc import compute_prdc +from sklearn.preprocessing import StandardScaler + +from microplex.eval.benchmark import ZIMAFMethod, ZIQDNNMethod, ZIQRFMethod +from microplex_us.bakeoff import ( + DEFAULT_CONDITION_COLS, + DEFAULT_TARGET_COLS, + ScaleUpRunner, + ScaleUpStageConfig, + stage1_config, +) + +LOGGER = logging.getLogger(__name__) + + +class Autoencoder(nn.Module): + """Tiny autoencoder for dimensionality reduction on tabular features.""" + + def __init__(self, n_features: int, latent_dim: int = 16, hidden: int = 64) -> None: + super().__init__() + self.encoder = nn.Sequential( + nn.Linear(n_features, hidden), + nn.ReLU(), + nn.Linear(hidden, hidden), + nn.ReLU(), + nn.Linear(hidden, latent_dim), + ) + self.decoder = nn.Sequential( + nn.Linear(latent_dim, hidden), + nn.ReLU(), + nn.Linear(hidden, hidden), + nn.ReLU(), + nn.Linear(hidden, n_features), + ) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + return self.decoder(self.encoder(x)) + + def encode(self, x: torch.Tensor) -> torch.Tensor: + return self.encoder(x) + + +def fit_autoencoder( + x: np.ndarray, latent_dim: int = 16, epochs: int = 200, lr: float = 1e-3 +) -> Autoencoder: + """Fit an autoencoder on standardized features.""" + n_features = x.shape[1] + model = Autoencoder(n_features=n_features, latent_dim=latent_dim) + x_t = torch.tensor(x, dtype=torch.float32) + optimizer = torch.optim.Adam(model.parameters(), lr=lr) + batch_size = 256 + ds = torch.utils.data.TensorDataset(x_t) + g = torch.Generator() + g.manual_seed(42) + loader = torch.utils.data.DataLoader(ds, batch_size=batch_size, shuffle=True, generator=g) + + model.train() + for epoch in range(epochs): + total = 0.0 + for (batch,) in loader: + optimizer.zero_grad() + recon = model(batch) + loss = ((recon - batch) ** 2).mean() + loss.backward() + optimizer.step() + total += loss.item() * len(batch) + if (epoch + 1) % 50 == 0: + LOGGER.info(" AE epoch %d loss=%.4f", epoch + 1, total / len(x)) + model.eval() + return model + + +def encode(model: Autoencoder, x: np.ndarray) -> np.ndarray: + with torch.no_grad(): + return model.encode(torch.tensor(x, dtype=torch.float32)).numpy() + + +def compute_prdc_both_spaces( + real: pd.DataFrame, + synthetic: pd.DataFrame, + encoder: Autoencoder, + scaler: StandardScaler, + k: int = 5, + max_samples: int = 15_000, + seed: int = 42, +) -> dict: + """Return {raw: ..., embed: ...} PRDC tuples.""" + rng = np.random.default_rng(seed) + cols = [c for c in real.columns if c in synthetic.columns] + r = real[cols].to_numpy(dtype=np.float64) + s = synthetic[cols].to_numpy(dtype=np.float64) + if len(r) > max_samples: + r = r[rng.choice(len(r), size=max_samples, replace=False)] + if len(s) > max_samples: + s = s[rng.choice(len(s), size=max_samples, replace=False)] + + raw_r = scaler.transform(r) + raw_s = scaler.transform(s) + raw_metrics = compute_prdc(raw_r, raw_s, nearest_k=k) + + emb_r = encode(encoder, raw_r.astype(np.float32)) + emb_s = encode(encoder, raw_s.astype(np.float32)) + emb_metrics = compute_prdc(emb_r, emb_s, nearest_k=k) + + return { + "raw": {k: float(v) for k, v in raw_metrics.items()}, + "embed": {k: float(v) for k, v in emb_metrics.items()}, + } + + +def build_method(name: str): + registry = { + "ZI-QRF": ZIQRFMethod, + "ZI-MAF": ZIMAFMethod, + "ZI-QDNN": ZIQDNNMethod, + } + return registry[name]() + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument("--n-rows", type=int, default=40_000) + parser.add_argument( + "--methods", nargs="+", default=["ZI-QRF", "ZI-MAF", "ZI-QDNN"] + ) + parser.add_argument( + "--output", + type=Path, + default=Path("artifacts/embedding_prdc_compare.json"), + ) + parser.add_argument("--seed", type=int, default=42) + parser.add_argument("--latent-dim", type=int, default=16) + parser.add_argument("--ae-epochs", type=int, default=200) + args = parser.parse_args(argv) + + logging.basicConfig( + level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s" + ) + + base = stage1_config() + cfg = ScaleUpStageConfig( + stage="embedding_prdc", + n_rows=args.n_rows, + methods=tuple(args.methods), + condition_cols=DEFAULT_CONDITION_COLS, + target_cols=DEFAULT_TARGET_COLS, + holdout_frac=0.2, + seed=args.seed, + k=5, + data_path=base.data_path, + year=base.year, + rare_cell_checks=(), + prdc_max_samples=15_000, + ) + + runner = ScaleUpRunner(cfg) + df = runner.load_frame() + train, holdout = runner.split(df) + LOGGER.info( + "loaded: train=%d holdout=%d cols=%d", len(train), len(holdout), len(df.columns) + ) + + scaler = StandardScaler().fit(holdout.to_numpy(dtype=np.float64)) + + LOGGER.info("fitting autoencoder on holdout...") + t0 = time.time() + encoder = fit_autoencoder( + scaler.transform(holdout.to_numpy(dtype=np.float64)).astype(np.float32), + latent_dim=args.latent_dim, + epochs=args.ae_epochs, + ) + LOGGER.info(" autoencoder fit=%.1fs", time.time() - t0) + + results = [] + for method_name in args.methods: + LOGGER.info("== %s ==", method_name) + method = build_method(method_name) + t0 = time.time() + method.fit(sources={"ecps": train.copy()}, shared_cols=list(DEFAULT_CONDITION_COLS)) + fit_s = time.time() - t0 + + t0 = time.time() + synth = method.generate(len(train), seed=args.seed) + gen_s = time.time() - t0 + + metrics = compute_prdc_both_spaces( + holdout, synth, encoder, scaler, k=5, seed=args.seed + ) + LOGGER.info( + " raw: prec=%.3f dens=%.3f cov=%.3f", + metrics["raw"]["precision"], + metrics["raw"]["density"], + metrics["raw"]["coverage"], + ) + LOGGER.info( + " embed: prec=%.3f dens=%.3f cov=%.3f (fit=%.1fs gen=%.1fs)", + metrics["embed"]["precision"], + metrics["embed"]["density"], + metrics["embed"]["coverage"], + fit_s, + gen_s, + ) + results.append( + { + "method": method_name, + "fit_wall_seconds": fit_s, + "generate_wall_seconds": gen_s, + **metrics, + } + ) + + args.output.parent.mkdir(parents=True, exist_ok=True) + args.output.write_text(json.dumps(results, indent=2, default=str)) + + print() + print("== Raw-feature PRDC (50-dim) ==") + for r in sorted(results, key=lambda x: -x["raw"]["coverage"]): + print( + f" {r['method']:8s}: cov={r['raw']['coverage']:.3f} " + f"prec={r['raw']['precision']:.3f} dens={r['raw']['density']:.3f}" + ) + print() + print(f"== Learned-embedding PRDC ({args.latent_dim}-dim) ==") + for r in sorted(results, key=lambda x: -x["embed"]["coverage"]): + print( + f" {r['method']:8s}: cov={r['embed']['coverage']:.3f} " + f"prec={r['embed']['precision']:.3f} dens={r['embed']['density']:.3f}" + ) + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/isolate_calibration_memory.py b/scripts/isolate_calibration_memory.py new file mode 100644 index 0000000..1106123 --- /dev/null +++ b/scripts/isolate_calibration_memory.py @@ -0,0 +1,195 @@ +"""Isolate the calibration stage and profile its peak memory. + +The v7 (microcalibrate) and v8 (pe_l0) pipelines both OOM'd at the +calibration step with ~172–197 GB of compressed memory on a 48 GB +workstation. PE-US-data's production setup runs the same L0 fit on a +T4 GPU (16 GB VRAM) successfully, which strongly suggests our +pipeline has a leak or duplication an order of magnitude larger than +the legitimate workload. + +This script runs ``fit_l0_weights`` on a synthetic sparse matrix that +matches the v7 shape (1.5M records × 4k constraints, ~5% density) +*without* the surrounding pipeline. If it OOMs in isolation, the +problem is inside the L0 fit itself. If it completes at a reasonable +memory footprint, the leak is upstream (PE-table construction, +intermediate frame retained in memory, adapter build, etc.) and we +should bisect further. + +Usage: + + uv run python scripts/isolate_calibration_memory.py \ + --n-records 1500000 --n-constraints 4000 --density 0.05 \ + --epochs 5 + +Smaller smoke: + + uv run python scripts/isolate_calibration_memory.py \ + --n-records 100000 --n-constraints 500 --density 0.05 --epochs 2 +""" + +from __future__ import annotations + +import argparse +import gc +import os +import resource +import sys +import time +from dataclasses import dataclass +from typing import Any + +import numpy as np +from scipy import sparse as sp + + +def _peak_rss_gb() -> float: + """Return current process peak RSS in GB (platform-aware).""" + r = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss + if sys.platform == "darwin": + # macOS reports bytes. + return r / (1024**3) + # Linux / most BSDs: kilobytes. + return r * 1024 / (1024**3) + + +@dataclass +class Stage: + name: str + elapsed_s: float + peak_rss_gb: float + + +def _timestamp_stage(name: str, t0: float) -> Stage: + elapsed = time.perf_counter() - t0 + peak = _peak_rss_gb() + print( + f"[{elapsed:>7.1f}s | peak RSS {peak:>6.2f} GB] {name}", + flush=True, + ) + return Stage(name=name, elapsed_s=elapsed, peak_rss_gb=peak) + + +def build_synthetic_problem( + n_records: int, + n_constraints: int, + density: float, + seed: int = 42, +) -> tuple[sp.csr_matrix, np.ndarray, np.ndarray, list[str]]: + """Synthetic calibration fixture matching the v7/v8 shape. + + Builds a ``(n_constraints, n_records)`` CSR matrix at the given + density with binary-indicator-ish entries (uniform in [0, 1] for + the nonzero entries — enough to exercise torch.sparse.mm paths + without the realism of a PE constraint system). + """ + rng = np.random.default_rng(seed) + total = n_constraints * n_records + nnz = int(total * density) + rows = rng.integers(0, n_constraints, size=nnz) + cols = rng.integers(0, n_records, size=nnz) + data = rng.uniform(0.5, 1.5, size=nnz).astype(np.float64) + X = sp.csr_matrix( + (data, (rows, cols)), + shape=(n_constraints, n_records), + dtype=np.float64, + ) + weights = rng.uniform(0.5, 2.0, size=n_records).astype(np.float64) + estimated = X @ weights + # Perturb each target by ±20% so the calibration has real work to do. + targets = estimated * rng.uniform(0.8, 1.2, size=n_constraints) + target_names = [f"t{i}" for i in range(n_constraints)] + return X, targets, weights, target_names + + +def fit_l0( + X_sparse: sp.csr_matrix, + targets: np.ndarray, + initial_weights: np.ndarray, + target_names: list[str], + epochs: int, + device: str, + lambda_l0: float, +) -> np.ndarray: + """Delegate to PE-US-data's fit_l0_weights (same path pe_l0.py calls).""" + try: + from policyengine_us_data.calibration.unified_calibration import ( + fit_l0_weights, + ) + except ImportError as exc: + raise SystemExit( + f"policyengine-us-data not importable: {exc}. Install it or " + "run this script from the microplex-us venv." + ) from exc + + achievable = np.asarray(X_sparse.sum(axis=1)).reshape(-1) > 0 + return fit_l0_weights( + X_sparse=X_sparse, + targets=targets, + lambda_l0=lambda_l0, + epochs=epochs, + device=device, + verbose_freq=max(1, epochs // 5), + target_names=target_names, + initial_weights=initial_weights, + achievable=achievable, + ) + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser(description=__doc__ or "") + parser.add_argument("--n-records", type=int, default=100_000) + parser.add_argument("--n-constraints", type=int, default=500) + parser.add_argument("--density", type=float, default=0.05) + parser.add_argument("--epochs", type=int, default=2) + parser.add_argument("--device", default="cpu") + parser.add_argument("--lambda-l0", type=float, default=1e-4) + parser.add_argument("--seed", type=int, default=42) + args = parser.parse_args(argv) + + print( + f"Configuration: n_records={args.n_records:,} " + f"n_constraints={args.n_constraints:,} density={args.density} " + f"epochs={args.epochs} device={args.device}", + flush=True, + ) + + stages: list[Stage] = [] + + t0 = time.perf_counter() + X, targets, weights, names = build_synthetic_problem( + n_records=args.n_records, + n_constraints=args.n_constraints, + density=args.density, + seed=args.seed, + ) + stages.append(_timestamp_stage("build CSR + targets + weights", t0)) + print( + f" CSR shape {X.shape}, nnz={X.nnz:,} " + f"({X.nnz * 12 / 1024**3:.2f} GB raw storage estimate)", + flush=True, + ) + + t0 = time.perf_counter() + fit_l0( + X_sparse=X, + targets=targets, + initial_weights=weights, + target_names=names, + epochs=args.epochs, + device=args.device, + lambda_l0=args.lambda_l0, + ) + stages.append(_timestamp_stage("fit_l0_weights complete", t0)) + + gc.collect() + stages.append(_timestamp_stage("after gc.collect", time.perf_counter())) + + print("\n--- summary ---") + for s in stages: + print(f" {s.name:<40} {s.elapsed_s:>8.1f}s peak={s.peak_rss_gb:>6.2f} GB") + print(f"\nFinal peak RSS: {_peak_rss_gb():.2f} GB") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/run_b2_batched.py b/scripts/run_b2_batched.py new file mode 100644 index 0000000..cf90603 --- /dev/null +++ b/scripts/run_b2_batched.py @@ -0,0 +1,256 @@ +"""Batched Microsimulation aggregate for one variable. + +The naive one-shot ``Microsimulation.calculate(income_tax, 2024).sum()`` +OOMs on 1.5M households because the dependency chain materializes +~100+ intermediate arrays (each 3.4M floats = 27 MB) in memory +simultaneously. This runner subsets the h5 into household-size chunks, +runs a fresh Microsimulation per chunk, and accumulates the weighted +sum. + +Entity-level subsetting is done by index, matching +``policyengine_us_data``'s h5 layout: household-level arrays index by +position in ``household_id``; person-level arrays index by position in +``person_household_id``; same for tax_unit, spm_unit, family, +marital_unit. +""" + +from __future__ import annotations + +import argparse +import json +import sys +import tempfile +import time +from pathlib import Path + +import h5py +import numpy as np + +HOUSEHOLD_ID = "household_id" + +ENTITY_ID_COLUMNS = { + "household": "household_id", + "person": "person_id", + "tax_unit": "tax_unit_id", + "spm_unit": "spm_unit_id", + "family": "family_id", + "marital_unit": "marital_unit_id", +} +# Person → group-entity foreign keys. +PERSON_TO_GROUP_LINK = { + "tax_unit": "person_tax_unit_id", + "spm_unit": "person_spm_unit_id", + "family": "person_family_id", + "marital_unit": "person_marital_unit_id", +} +STRUCTURAL_VARIABLE_ENTITIES = { + "household_id": "household", + "household_weight": "household", + "person_id": "person", + "person_household_id": "person", + "person_weight": "person", + "tax_unit_id": "tax_unit", + "person_tax_unit_id": "person", + "tax_unit_weight": "tax_unit", + "spm_unit_id": "spm_unit", + "person_spm_unit_id": "person", + "spm_unit_weight": "spm_unit", + "family_id": "family", + "person_family_id": "person", + "family_weight": "family", + "marital_unit_id": "marital_unit", + "person_marital_unit_id": "person", + "marital_unit_weight": "marital_unit", +} + + +def _load_all_arrays(h5_path: Path, period_key: str) -> dict[str, np.ndarray]: + with h5py.File(h5_path, "r") as f: + out = {} + for key in f.keys(): + if period_key in f[key]: + out[key] = np.asarray(f[key][period_key]) + return out + + +def _load_policyengine_variable_entities() -> dict[str, str]: + try: + from policyengine_us import ( + system as policyengine_system_module, # noqa: PLC0415 + ) + except ImportError: + return {} + + tax_benefit_system = getattr(policyengine_system_module, "system", None) + if tax_benefit_system is None: + return {} + variables = getattr(tax_benefit_system, "variables", {}) + entity_map: dict[str, str] = {} + for name, metadata in variables.items(): + entity_key = getattr(getattr(metadata, "entity", None), "key", None) + if entity_key is not None: + entity_map[str(name)] = str(entity_key) + return entity_map + + +def _entity_of( + variable: str, + arrays: dict[str, np.ndarray], + *, + variable_entities: dict[str, str] | None = None, +) -> str: + """Classify a variable, preferring PE metadata over fragile length matching.""" + explicit_entity = STRUCTURAL_VARIABLE_ENTITIES.get(variable) + if explicit_entity is not None: + return explicit_entity + if variable_entities is not None and variable in variable_entities: + return variable_entities[variable] + n = len(arrays[variable]) + entity_lengths = { + entity: len(arrays[id_col]) + for entity, id_col in ENTITY_ID_COLUMNS.items() + if id_col in arrays + } + matches = [entity for entity, length in entity_lengths.items() if length == n] + if len(matches) == 1: + return matches[0] + if len(matches) > 1: + raise ValueError( + f"Ambiguous entity for variable {variable!r}: matched {matches} by length" + ) + return "unknown" + + +def _build_entity_masks( + arrays: dict[str, np.ndarray], chunk_hh_ids: np.ndarray +) -> dict[str, np.ndarray]: + """Produce boolean masks into each entity array for the households in ``chunk_hh_ids``.""" + hh_id = arrays["household_id"] + masks: dict[str, np.ndarray] = {} + masks["household"] = np.isin(hh_id, chunk_hh_ids) + person_hh = arrays["person_household_id"] + person_mask = np.isin(person_hh, chunk_hh_ids) + masks["person"] = person_mask + for entity, link_col in PERSON_TO_GROUP_LINK.items(): + id_col = ENTITY_ID_COLUMNS[entity] + if link_col not in arrays or id_col not in arrays: + continue + group_ids_in_chunk = np.unique(arrays[link_col][person_mask]) + masks[entity] = np.isin(arrays[id_col], group_ids_in_chunk) + return masks + + +def _write_chunk_h5( + arrays: dict[str, np.ndarray], + entity_masks: dict[str, np.ndarray], + period_key: str, + tmp_path: Path, + *, + variable_entities: dict[str, str] | None = None, +) -> None: + """Write a subset h5 keeping only rows matching each variable's entity mask.""" + with h5py.File(tmp_path, "w") as f: + for variable, values in arrays.items(): + entity = _entity_of( + variable, + arrays, + variable_entities=variable_entities, + ) + mask = entity_masks.get(entity) + if mask is None or len(values) != len(mask): + continue + group = f.create_group(variable) + group.create_dataset(period_key, data=values[mask]) + + +def main() -> int: + parser = argparse.ArgumentParser() + parser.add_argument("--dataset", required=True, type=Path) + parser.add_argument("--variable", required=True, type=str) + parser.add_argument("--period", default=2024, type=int) + parser.add_argument("--batch-size", default=50_000, type=int) + parser.add_argument("--output", required=True, type=Path) + args = parser.parse_args() + + period_key = str(args.period) + print(f"[{time.strftime('%H:%M:%S')}] loading all arrays from {args.dataset}", flush=True) + arrays = _load_all_arrays(args.dataset, period_key) + variable_entities = _load_policyengine_variable_entities() + print( + f"[{time.strftime('%H:%M:%S')}] loaded {len(arrays)} variables", + flush=True, + ) + + hh_ids = arrays[HOUSEHOLD_ID] + n_hh = len(hh_ids) + print(f"[{time.strftime('%H:%M:%S')}] {n_hh} households; batch_size={args.batch_size}", flush=True) + + total = 0.0 + n_batches = (n_hh + args.batch_size - 1) // args.batch_size + + from policyengine_us import Microsimulation # noqa: PLC0415 + + from microplex_us.validation.downstream import ( # noqa: PLC0415 + compute_downstream_weighted_aggregate, + ) + + for batch_idx in range(n_batches): + start = batch_idx * args.batch_size + end = min(start + args.batch_size, n_hh) + chunk_hh_ids = hh_ids[start:end] + + entity_masks = _build_entity_masks(arrays, chunk_hh_ids) + + with tempfile.TemporaryDirectory() as tmp: + tmp_path = Path(tmp) / "chunk.h5" + _write_chunk_h5( + arrays, + entity_masks, + period_key, + tmp_path, + variable_entities=variable_entities, + ) + + t0 = time.time() + sim = Microsimulation(dataset=str(tmp_path)) + chunk_sum = compute_downstream_weighted_aggregate( + sim, + args.variable, + args.period, + ) + total += chunk_sum + elapsed = time.time() - t0 + + print( + f"[{time.strftime('%H:%M:%S')}] batch {batch_idx+1}/{n_batches} " + f"(households {start}-{end}): ${chunk_sum/1e9:.3f}B " + f"cumulative=${total/1e9:.3f}B ({elapsed:.1f}s)", + flush=True, + ) + + print( + f"\n[{time.strftime('%H:%M:%S')}] {args.variable} total = ${total/1e9:.2f}B", + flush=True, + ) + args.output.parent.mkdir(parents=True, exist_ok=True) + raw_agg_path = args.output.with_suffix(".raw.json") + raw_aggs = ( + json.loads(raw_agg_path.read_text()) if raw_agg_path.exists() else {} + ) + raw_aggs[args.variable] = total + raw_agg_path.write_text(json.dumps(raw_aggs, indent=2)) + + from microplex_us.validation.downstream import ( # noqa: PLC0415 + DOWNSTREAM_BENCHMARKS_2024, + compute_downstream_comparison, + ) + + comparison = compute_downstream_comparison(raw_aggs, DOWNSTREAM_BENCHMARKS_2024) + report = {name: rec.to_dict() for name, rec in comparison.items()} + args.output.write_text(json.dumps(report, indent=2)) + print(f"[{time.strftime('%H:%M:%S')}] wrote {args.output}", flush=True) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/scripts/run_b2_validation.py b/scripts/run_b2_validation.py new file mode 100644 index 0000000..380dfe1 --- /dev/null +++ b/scripts/run_b2_validation.py @@ -0,0 +1,82 @@ +"""Run B2 downstream validation on a calibrated PE-US h5. + +One variable at a time, flushing progress and intermediate output to +disk so a partial run leaves usable state. Uses the +``microplex_us.validation.downstream`` module for the benchmark set. +""" + +from __future__ import annotations + +import argparse +import json +import sys +import time +from pathlib import Path + +from microplex_us.validation.downstream import ( + DOWNSTREAM_BENCHMARKS_2024, + compute_downstream_comparison, + compute_downstream_weighted_aggregate, +) + + +def main() -> int: + parser = argparse.ArgumentParser() + parser.add_argument("--dataset", required=True, type=Path) + parser.add_argument("--output", required=True, type=Path) + parser.add_argument("--period", default=2024, type=int) + args = parser.parse_args() + + print(f"[{time.strftime('%H:%M:%S')}] loading Microsimulation from {args.dataset}", flush=True) + from policyengine_us import Microsimulation + + sim = Microsimulation(dataset=str(args.dataset)) + print(f"[{time.strftime('%H:%M:%S')}] loaded", flush=True) + + variables = [spec.name for spec in DOWNSTREAM_BENCHMARKS_2024] + aggregates: dict[str, float] = {} + + args.output.parent.mkdir(parents=True, exist_ok=True) + intermediate_path = args.output.with_suffix(".partial.json") + + for variable in variables: + t0 = time.time() + print(f"[{time.strftime('%H:%M:%S')}] computing {variable} ...", flush=True) + try: + total = compute_downstream_weighted_aggregate(sim, variable, args.period) + except Exception as exc: + print(f" {variable}: FAILED ({exc})", flush=True) + aggregates[variable] = float("nan") + else: + aggregates[variable] = total + elapsed = time.time() - t0 + print( + f" {variable}: ${total/1e9:,.2f}B (in {elapsed:.1f}s)", + flush=True, + ) + # Flush partial state to disk after each variable so an OOM + # kill after N variables still leaves N results on disk. + intermediate_path.write_text(json.dumps(aggregates, indent=2)) + + comparison = compute_downstream_comparison(aggregates, DOWNSTREAM_BENCHMARKS_2024) + report = {name: rec.to_dict() for name, rec in comparison.items()} + args.output.write_text(json.dumps(report, indent=2)) + intermediate_path.unlink(missing_ok=True) + + print(f"\n[{time.strftime('%H:%M:%S')}] B2 validation complete", flush=True) + print(f"Wrote {args.output}", flush=True) + + print(f"\n{'variable':<12s} {'computed':>12s} {'benchmark':>12s} {'rel_error':>10s}") + for name, rec in sorted(comparison.items()): + rel = rec.rel_error + rel_str = f"{rel*100:+.1f}%" if rel is not None else "N/A" + print( + f"{name:<12s} ${rec.computed/1e9:>9.2f}B " + f"${rec.benchmark/1e9:>9.2f}B {rel_str:>10s}", + flush=True, + ) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/scripts/run_b2_validation_single_var.py b/scripts/run_b2_validation_single_var.py new file mode 100644 index 0000000..d67abf1 --- /dev/null +++ b/scripts/run_b2_validation_single_var.py @@ -0,0 +1,62 @@ +"""Compute one B2 downstream aggregate in a fresh process. + +Fresh-per-variable keeps the peak memory of each variable independent +so one heavy variable (e.g. income_tax) OOM-killing doesn't wipe out +progress on the others. Append-writes to the output JSON. +""" + +from __future__ import annotations + +import argparse +import json +import sys +import time +from pathlib import Path + +from microplex_us.validation.downstream import ( + DOWNSTREAM_BENCHMARKS_2024, + compute_downstream_comparison, + compute_downstream_weighted_aggregate, +) + + +def main() -> int: + parser = argparse.ArgumentParser() + parser.add_argument("--dataset", required=True, type=Path) + parser.add_argument("--output", required=True, type=Path) + parser.add_argument("--variable", required=True, type=str) + parser.add_argument("--period", default=2024, type=int) + args = parser.parse_args() + + print(f"[{time.strftime('%H:%M:%S')}] loading Microsimulation", flush=True) + from policyengine_us import Microsimulation + + sim = Microsimulation(dataset=str(args.dataset)) + print(f"[{time.strftime('%H:%M:%S')}] loaded — computing {args.variable}", flush=True) + t0 = time.time() + total = compute_downstream_weighted_aggregate(sim, args.variable, args.period) + elapsed = time.time() - t0 + print( + f"[{time.strftime('%H:%M:%S')}] {args.variable} = ${total/1e9:.2f}B " + f"(in {elapsed:.1f}s)", + flush=True, + ) + + args.output.parent.mkdir(parents=True, exist_ok=True) + # Re-read intermediate file if present (accumulates across runs). + raw_agg_path = args.output.with_suffix(".raw.json") + raw_aggs = ( + json.loads(raw_agg_path.read_text()) if raw_agg_path.exists() else {} + ) + raw_aggs[args.variable] = total + raw_agg_path.write_text(json.dumps(raw_aggs, indent=2)) + + comparison = compute_downstream_comparison(raw_aggs, DOWNSTREAM_BENCHMARKS_2024) + report = {name: rec.to_dict() for name, rec in comparison.items()} + args.output.write_text(json.dumps(report, indent=2)) + print(f"[{time.strftime('%H:%M:%S')}] updated {args.output}", flush=True) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/scripts/zi_classifier_isolated_eval.py b/scripts/zi_classifier_isolated_eval.py new file mode 100644 index 0000000..e6b045d --- /dev/null +++ b/scripts/zi_classifier_isolated_eval.py @@ -0,0 +1,322 @@ +"""Isolated per-column ZI classifier evaluation. + +Answers the diagnostic question behind the 5-way ZI-QDNN coverage tie: if we +strip the downstream draw network out of the loop and evaluate only the +zero/non-zero classifier's own calibration and discrimination, do the five +candidates still look equivalent? + +Protocol +-------- + +- Same data as the coverage benchmark: enhanced_cps_2024, 77,006 persons, 14 + conditioning columns, 36 target columns, seed 42. +- Same outer 80/20 train/holdout split used by ScaleUpRunner. +- For each target column with training-set zero-fraction >= 10% (the upstream + ZI trigger) and at least 10 zero + 10 non-zero training rows, further split + training 80/20 (seed 42) into fit / val. +- Label is (~at_min).astype(int), matching `_MultiSourceBase.fit`. +- Fit each of 5 classifiers on (X_fit, label_fit), predict P(y>0) on X_val. +- Report: log-loss, Brier, ECE (10 equal-width bins), ROC-AUC, fit seconds. + +Aggregation +----------- + +For each classifier, report column-count-weighted mean and median across the +eligible target columns. The RF default should be the baseline everything else +is compared against, since it is what the coverage benchmark locked in. +""" + +from __future__ import annotations + +import argparse +import json +import logging +import time +from pathlib import Path +from typing import Any, Callable + +import numpy as np +import pandas as pd +from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score + +from microplex_us.bakeoff.local_methods import ( + _dnn_factory, + _hgb_factory, + _logistic_factory, + _rf_calibrated_factory, +) +from microplex_us.bakeoff.scale_up import ( + DEFAULT_CONDITION_COLS, + DEFAULT_TARGET_COLS, + _load_enhanced_cps, + DEFAULT_ENHANCED_CPS_PATH, +) + +LOGGER = logging.getLogger(__name__) + + +def _rf_default_factory(): + from sklearn.ensemble import RandomForestClassifier + + return RandomForestClassifier(n_estimators=50, random_state=42, n_jobs=-1) + + +CLASSIFIERS: dict[str, Callable[[], Any]] = { + "RF_default": _rf_default_factory, + "Logistic": _logistic_factory, + "HistGB": _hgb_factory, + "RF_calibrated": _rf_calibrated_factory, + "DNN": _dnn_factory, +} + + +def _expected_calibration_error( + y_true: np.ndarray, p_hat: np.ndarray, n_bins: int = 10 +) -> float: + """Equal-width ECE: sum over bins of (n_bin/N) * |acc - conf|.""" + edges = np.linspace(0.0, 1.0, n_bins + 1) + ece = 0.0 + n = len(y_true) + for i in range(n_bins): + lo, hi = edges[i], edges[i + 1] + if i == n_bins - 1: + mask = (p_hat >= lo) & (p_hat <= hi) + else: + mask = (p_hat >= lo) & (p_hat < hi) + if not mask.any(): + continue + bin_conf = float(p_hat[mask].mean()) + bin_acc = float(y_true[mask].mean()) + ece += (mask.sum() / n) * abs(bin_conf - bin_acc) + return float(ece) + + +def _positive_class_proba(clf: Any, X: np.ndarray) -> np.ndarray: + """Return P(y == 1 | x) regardless of how the classifier orders classes.""" + proba = clf.predict_proba(X) + classes = np.asarray(clf.classes_) + pos_idx = int(np.where(classes == 1)[0][0]) + return proba[:, pos_idx] + + +def evaluate_column( + col: str, + X_fit: np.ndarray, + y_fit_label: np.ndarray, + X_val: np.ndarray, + y_val_label: np.ndarray, +) -> dict[str, dict[str, float]]: + """Fit every classifier on (X_fit, y_fit_label); score on val.""" + results: dict[str, dict[str, float]] = {} + for name, factory in CLASSIFIERS.items(): + clf = factory() + t0 = time.perf_counter() + clf.fit(X_fit, y_fit_label) + fit_s = time.perf_counter() - t0 + p_hat = _positive_class_proba(clf, X_val) + p_hat = np.clip(p_hat, 1e-6, 1 - 1e-6) + ll = float(log_loss(y_val_label, p_hat, labels=[0, 1])) + brier = float(brier_score_loss(y_val_label, p_hat)) + ece = _expected_calibration_error(y_val_label, p_hat, n_bins=10) + try: + auc = float(roc_auc_score(y_val_label, p_hat)) + except ValueError: + auc = float("nan") + results[name] = { + "log_loss": ll, + "brier": brier, + "ece": ece, + "auc": auc, + "fit_s": fit_s, + } + return results + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser(description=__doc__ or "") + parser.add_argument( + "--data-path", type=Path, default=DEFAULT_ENHANCED_CPS_PATH + ) + parser.add_argument("--year", default="2024") + parser.add_argument("--seed", type=int, default=42) + parser.add_argument("--holdout-frac", type=float, default=0.2) + parser.add_argument("--inner-val-frac", type=float, default=0.2) + parser.add_argument("--zero-threshold", type=float, default=0.1) + parser.add_argument( + "--output", + type=Path, + default=Path( + "/Users/maxghenis/CosilicoAI/microplex-us/artifacts/" + "zi_classifier_isolated_eval.json" + ), + ) + parser.add_argument("--log-level", default="INFO") + args = parser.parse_args(argv) + logging.basicConfig( + level=getattr(logging, args.log_level), + format="%(asctime)s %(levelname)s %(name)s: %(message)s", + ) + + columns = list(DEFAULT_CONDITION_COLS) + list(DEFAULT_TARGET_COLS) + df = _load_enhanced_cps(args.data_path, args.year, columns) + df = df.astype(np.float32) + LOGGER.info("loaded %d rows x %d cols", len(df), len(df.columns)) + + rng = np.random.default_rng(args.seed) + idx = rng.permutation(len(df)) + cut = int(len(df) * (1.0 - args.holdout_frac)) + train = df.iloc[idx[:cut]].reset_index(drop=True) + LOGGER.info("outer split: %d train rows (holdout discarded, not needed here)", len(train)) + + inner_rng = np.random.default_rng(args.seed + 1) + inner_idx = inner_rng.permutation(len(train)) + inner_cut = int(len(train) * (1.0 - args.inner_val_frac)) + fit_idx, val_idx = inner_idx[:inner_cut], inner_idx[inner_cut:] + LOGGER.info("inner split: %d fit / %d val", len(fit_idx), len(val_idx)) + + cond = list(DEFAULT_CONDITION_COLS) + X_train_all = train[cond].to_numpy() + X_fit_all = X_train_all[fit_idx] + X_val_all = X_train_all[val_idx] + + per_col: dict[str, Any] = {} + eligible: list[str] = [] + skipped: list[dict[str, Any]] = [] + + for col in DEFAULT_TARGET_COLS: + y = train[col].to_numpy() + min_val = float(np.nanmin(y)) + at_min = np.isclose(y, min_val, atol=1e-6) + zero_frac = float(at_min.mean()) + label = (~at_min).astype(int) + + fit_label = label[fit_idx] + val_label = label[val_idx] + n_zero_fit = int((fit_label == 0).sum()) + n_pos_fit = int((fit_label == 1).sum()) + n_zero_val = int((val_label == 0).sum()) + n_pos_val = int((val_label == 1).sum()) + + if zero_frac < args.zero_threshold: + skipped.append( + {"col": col, "reason": "below_zero_threshold", "zero_frac": zero_frac} + ) + continue + if n_zero_fit < 10 or n_pos_fit < 10: + skipped.append( + { + "col": col, + "reason": "insufficient_class_counts_fit", + "n_zero_fit": n_zero_fit, + "n_pos_fit": n_pos_fit, + } + ) + continue + if n_zero_val < 1 or n_pos_val < 1: + skipped.append( + { + "col": col, + "reason": "insufficient_class_counts_val", + "n_zero_val": n_zero_val, + "n_pos_val": n_pos_val, + } + ) + continue + + LOGGER.info( + "== %s == zero_frac=%.3f fit=%d/%d val=%d/%d (zero/pos)", + col, + zero_frac, + n_zero_fit, + n_pos_fit, + n_zero_val, + n_pos_val, + ) + + col_result = evaluate_column( + col=col, + X_fit=X_fit_all, + y_fit_label=fit_label, + X_val=X_val_all, + y_val_label=val_label, + ) + + per_col[col] = { + "zero_frac_train": zero_frac, + "min_val": min_val, + "n_zero_fit": n_zero_fit, + "n_pos_fit": n_pos_fit, + "n_zero_val": n_zero_val, + "n_pos_val": n_pos_val, + "classifiers": col_result, + } + eligible.append(col) + + summary = " ".join( + f"{clf}=ll{m['log_loss']:.4f}/auc{m['auc']:.3f}" + for clf, m in col_result.items() + ) + LOGGER.info(" %s", summary) + + # Aggregate across eligible columns + aggregate: dict[str, dict[str, float]] = {} + for clf in CLASSIFIERS: + rows = [per_col[c]["classifiers"][clf] for c in eligible] + if not rows: + continue + agg = { + "log_loss_mean": float(np.mean([r["log_loss"] for r in rows])), + "log_loss_median": float(np.median([r["log_loss"] for r in rows])), + "brier_mean": float(np.mean([r["brier"] for r in rows])), + "ece_mean": float(np.mean([r["ece"] for r in rows])), + "auc_mean": float(np.nanmean([r["auc"] for r in rows])), + "auc_median": float(np.nanmedian([r["auc"] for r in rows])), + "fit_s_total": float(np.sum([r["fit_s"] for r in rows])), + } + aggregate[clf] = agg + + out = { + "config": { + "data_path": str(args.data_path), + "year": args.year, + "seed": args.seed, + "holdout_frac": args.holdout_frac, + "inner_val_frac": args.inner_val_frac, + "zero_threshold": args.zero_threshold, + "n_train_rows": len(train), + "n_fit_rows": len(fit_idx), + "n_val_rows": len(val_idx), + "condition_cols": list(DEFAULT_CONDITION_COLS), + "target_cols": list(DEFAULT_TARGET_COLS), + "eligible_cols": eligible, + "skipped": skipped, + }, + "per_column": per_col, + "aggregate": aggregate, + } + args.output.parent.mkdir(parents=True, exist_ok=True) + args.output.write_text(json.dumps(out, indent=2, default=str)) + LOGGER.info("wrote %s", args.output) + + print() + print(f"Eligible columns (zero_frac >= {args.zero_threshold}): {len(eligible)}") + print(f"Skipped columns: {len(skipped)}") + print() + print( + f"{'classifier':>15} {'log_loss':>9} {'log_loss_med':>12} " + f"{'brier':>7} {'ece':>7} {'auc':>6} {'auc_med':>7} {'total_fit_s':>11}" + ) + ordered = sorted(aggregate.items(), key=lambda kv: kv[1]["log_loss_mean"]) + for clf, agg in ordered: + print( + f"{clf:>15} {agg['log_loss_mean']:9.4f} {agg['log_loss_median']:12.4f} " + f"{agg['brier_mean']:7.4f} {agg['ece_mean']:7.4f} " + f"{agg['auc_mean']:6.3f} {agg['auc_median']:7.3f} " + f"{agg['fit_s_total']:11.1f}" + ) + + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/src/microplex_us/__init__.py b/src/microplex_us/__init__.py index 76380ac..52ccee6 100644 --- a/src/microplex_us/__init__.py +++ b/src/microplex_us/__init__.py @@ -167,6 +167,7 @@ "infer_policyengine_us_variable_bindings", "load_policyengine_us_entity_tables", "materialize_policyengine_us_variables", + "policyengine_us_formula_variables_for_targets", "policyengine_us_variables_to_materialize", "project_frame_to_time_period_arrays", "write_policyengine_us_time_period_dataset", @@ -356,6 +357,7 @@ def __getattr__(name: str) -> Any: "infer_policyengine_us_variable_bindings", "load_policyengine_us_entity_tables", "materialize_policyengine_us_variables", + "policyengine_us_formula_variables_for_targets", "policyengine_us_variables_to_materialize", "project_frame_to_time_period_arrays", "write_policyengine_us_time_period_dataset", diff --git a/src/microplex_us/bakeoff/__init__.py b/src/microplex_us/bakeoff/__init__.py new file mode 100644 index 0000000..c1b1db9 --- /dev/null +++ b/src/microplex_us/bakeoff/__init__.py @@ -0,0 +1,43 @@ +"""Scale-up benchmark harness for synthesizer comparison. + +Implements the stage-1/2/3 scale-up protocol from +`docs/synthesizer-benchmark-scale-up.md`: load real enhanced_cps_2024, +sub-sample to the stage's row count, fit each specified synthesizer on the +conditioning + target column set, and report PRDC coverage, training wall +time, peak RSS, and rare-cell preservation. + +Use from the CLI: + + uv run python -m microplex_us.bakeoff.scale_up \\ + --stage stage1 \\ + --methods ZI-QRF ZI-MAF ZI-QDNN \\ + --output artifacts/scale_up_stage1.json + +or programmatically: + + from microplex_us.bakeoff import ScaleUpRunner, stage1_config + runner = ScaleUpRunner(stage1_config()) + results = runner.run() +""" + +from microplex_us.bakeoff.scale_up import ( + ScaleUpResult, + ScaleUpRunner, + ScaleUpStageConfig, + DEFAULT_CONDITION_COLS, + DEFAULT_TARGET_COLS, + stage1_config, + stage2_config, + stage3_config, +) + +__all__ = [ + "ScaleUpResult", + "ScaleUpRunner", + "ScaleUpStageConfig", + "DEFAULT_CONDITION_COLS", + "DEFAULT_TARGET_COLS", + "stage1_config", + "stage2_config", + "stage3_config", +] diff --git a/src/microplex_us/bakeoff/__main__.py b/src/microplex_us/bakeoff/__main__.py new file mode 100644 index 0000000..de59867 --- /dev/null +++ b/src/microplex_us/bakeoff/__main__.py @@ -0,0 +1,6 @@ +"""Entry point for `python -m microplex_us.bakeoff`.""" + +from microplex_us.bakeoff.scale_up import main + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/src/microplex_us/bakeoff/local_methods.py b/src/microplex_us/bakeoff/local_methods.py new file mode 100644 index 0000000..0b488ac --- /dev/null +++ b/src/microplex_us/bakeoff/local_methods.py @@ -0,0 +1,293 @@ +"""Local synthesizer methods for the bakeoff harness. + +These extend the `microplex.eval.benchmark` set without modifying the +upstream library. Methods defined here follow the same `_MultiSourceBase` +protocol so they slot into `ScaleUpRunner.fit_and_generate` unchanged. + +Current contents: + +- `CARTMethod`: synthpop-style CART per-column imputation. Each target + column gets a decision tree fit on the shared conditioning variables; + at generation time, the tree routes each synthetic record to a leaf, + and the predicted value is drawn uniformly from the training-set + values that landed in that leaf. This matches the default draw in + `synthpop`'s `syn.cart` (Nowok, Raab, and Dibben, 2016). + +- `ZICARTMethod`: zero-inflated variant that uses a random-forest + classifier for P(y > 0 | x) on columns where the training-set zero + fraction exceeds 10 %, then applies `CARTMethod` on the non-zero + subset. Mirrors `ZIQRFMethod`'s structure. +""" + +from __future__ import annotations + +from typing import Any + +import numpy as np +from microplex.eval.benchmark import _MultiSourceBase +from sklearn.tree import DecisionTreeRegressor + + +class CARTMethod(_MultiSourceBase): + """Synthpop-style CART per-column synthesis. + + Each column gets a `DecisionTreeRegressor` fit on the shared + conditioning variables. At generation time, each record is routed + to a leaf via `tree.apply`, and the synthetic value is sampled + uniformly from the training-set outcomes that landed in that leaf. + This reproduces `synthpop`'s default CART draw. + """ + + name = "CART" + + def __init__( + self, + max_depth: int | None = None, + min_samples_leaf: int = 5, + random_state: int = 42, + **kwargs: Any, + ) -> None: + super().__init__(zero_inflated=False) + self.max_depth = max_depth + self.min_samples_leaf = min_samples_leaf + self.random_state = random_state + + def _fit_column(self, col: str, X: np.ndarray, y: np.ndarray) -> None: + tree = DecisionTreeRegressor( + max_depth=self.max_depth, + min_samples_leaf=self.min_samples_leaf, + random_state=self.random_state, + ) + tree.fit(X, y) + leaf_ids = tree.apply(X) + leaf_to_values: dict[int, np.ndarray] = {} + for lid, val in zip(leaf_ids.tolist(), y.tolist(), strict=False): + leaf_to_values.setdefault(lid, []).append(val) + for lid, vals in leaf_to_values.items(): + leaf_to_values[lid] = np.asarray(vals, dtype=float) + self._col_models[col] = { + "tree": tree, + "leaf_to_values": leaf_to_values, + "fallback_value": float(np.median(y)) if len(y) > 0 else 0.0, + } + + def _generate_column( + self, + col: str, + X: np.ndarray, + rng: np.random.RandomState, + ) -> np.ndarray: + model = self._col_models[col] + tree = model["tree"] + leaf_to_values = model["leaf_to_values"] + fallback = model["fallback_value"] + leaf_ids = tree.apply(X) + out = np.empty(len(X), dtype=float) + for i, lid in enumerate(leaf_ids.tolist()): + vals = leaf_to_values.get(lid) + if vals is None or len(vals) == 0: + out[i] = fallback + else: + out[i] = float(vals[rng.randint(len(vals))]) + return out + + +class ZICARTMethod(CARTMethod): + """Zero-Inflated CART: random-forest zero classifier + CART leaf draw.""" + + name = "ZI-CART" + + def __init__(self, **kwargs: Any) -> None: + super().__init__(**kwargs) + self.zero_inflated = True + + +# --- Alternative zero-inflation classifiers (QDNN family) ---------------- + +def _patch_zi_classifier(method_instance: Any, classifier_factory: Any) -> None: + """Monkey-patch a ZI method's fit so the zero-classifier is a custom one. + + The upstream `_MultiSourceBase.fit` hardcodes + `RandomForestClassifier(n_estimators=50, random_state=42, n_jobs=-1)`. + This helper re-wraps `fit` so the zero-classifier is built by + `classifier_factory()` instead. All other fit/generate behavior is + preserved. + """ + import numpy as np + import pandas as pd + + original_fit = method_instance.fit.__func__ + + def patched_fit(self, sources, shared_cols): + self.shared_cols_ = list(shared_cols) + all_cols = set(shared_cols) + for survey_name, df in sources.items(): + for col in df.columns: + if col not in all_cols: + all_cols.add(col) + self.col_to_survey_[col] = survey_name + self.all_cols_ = list(all_cols) + + shared_dfs = [] + for survey_name, df in sources.items(): + available = [c for c in shared_cols if c in df.columns] + if len(available) == len(shared_cols): + shared_dfs.append(df[shared_cols].copy()) + self.shared_data_ = ( + pd.concat(shared_dfs, ignore_index=True) + if shared_dfs + else list(sources.values())[0][shared_cols].copy() + ) + + for col in self.all_cols_: + if col in shared_cols: + continue + survey_name = self.col_to_survey_[col] + survey_df = sources[survey_name] + available_shared = [c for c in shared_cols if c in survey_df.columns] + X = survey_df[available_shared].values + y = survey_df[col].values + + min_val = float(np.nanmin(y)) + at_min = np.isclose(y, min_val, atol=1e-6) + zero_frac = at_min.sum() / len(y) + self._col_stats[col] = {"min": min_val, "zero_frac": zero_frac} + + if ( + self.zero_inflated + and zero_frac >= self.zero_threshold + and at_min.sum() >= 10 + ): + labels = (~at_min).astype(int) + unique_labels = np.unique(labels) + if len(unique_labels) < 2: + # Degenerate column — all zeros or all non-zeros in + # training. Fall back to a constant classifier to avoid + # sklearn's single-class error. + constant_prob = float(unique_labels[0]) + + class _Constant: + classes_ = np.array([0, 1]) + + def predict_proba(self, X): + n = len(X) + return np.column_stack( + [np.full(n, 1.0 - constant_prob), + np.full(n, constant_prob)] + ) + + self._zero_classifiers[col] = _Constant() + else: + clf = classifier_factory() + clf.fit(X, labels) + self._zero_classifiers[col] = clf + if (~at_min).sum() >= 10: + self._fit_column(col, X[~at_min], y[~at_min]) + else: + self._fit_column(col, X, y) + return self + + method_instance.fit = patched_fit.__get__(method_instance, type(method_instance)) + + +def _make_zi_variant(base_name: str, classifier_factory: Any): + """Create a method class that uses a custom zero-classifier.""" + from microplex.eval.benchmark import ZIQDNNMethod + + base_classes = {"ZI-QDNN": ZIQDNNMethod} + if base_name not in base_classes: + raise ValueError(f"Unsupported base method for ZI variant: {base_name}") + base_cls = base_classes[base_name] + + class _Variant(base_cls): # type: ignore[misc, valid-type] + def __init__(self, **kwargs: Any) -> None: + super().__init__(**kwargs) + _patch_zi_classifier(self, classifier_factory) + + return _Variant + + +def _rf_calibrated_factory(): + from sklearn.calibration import CalibratedClassifierCV + from sklearn.ensemble import RandomForestClassifier + + rf = RandomForestClassifier( + n_estimators=50, random_state=42, n_jobs=-1 + ) + return CalibratedClassifierCV(rf, method="isotonic", cv=3) + + +def _logistic_factory(): + from sklearn.linear_model import LogisticRegression + + return LogisticRegression(max_iter=500, n_jobs=-1) + + +def _hgb_factory(): + from sklearn.ensemble import HistGradientBoostingClassifier + + return HistGradientBoostingClassifier(random_state=42) + + +def _dnn_factory(): + """A small-MLP zero-classifier for parity with the ZI-QDNN draw network. + + Uses sklearn's MLPClassifier (hidden: 64, 32; ReLU; Adam; max_iter=100). + Probabilities are via softmax on the output head. Not pre-calibrated; + combine with isotonic wrapping if calibration matters. + """ + from sklearn.neural_network import MLPClassifier + from sklearn.pipeline import Pipeline + from sklearn.preprocessing import StandardScaler + + return Pipeline([ + ("scaler", StandardScaler()), + ( + "mlp", + MLPClassifier( + hidden_layer_sizes=(64, 32), + activation="relu", + solver="adam", + max_iter=100, + random_state=42, + early_stopping=True, + ), + ), + ]) + + +def zi_qdnn_variant_factory(variant: str): + """Return a ZIQDNNMethod subclass with a swapped zero-classifier.""" + if variant == "logistic": + return _make_zi_variant("ZI-QDNN", _logistic_factory) + if variant == "hgb": + return _make_zi_variant("ZI-QDNN", _hgb_factory) + if variant == "calibrated": + return _make_zi_variant("ZI-QDNN", _rf_calibrated_factory) + if variant == "dnn": + return _make_zi_variant("ZI-QDNN", _dnn_factory) + raise ValueError(f"Unknown ZI variant: {variant}") + + +# Concrete ZI-QDNN variant with a histogram gradient boosting zero-classifier. +# This is the `microplex-us` default for ZI-QDNN: on the 77k x 50 Enhanced CPS +# isolated per-column log-loss evaluation (26 ZI-eligible columns, seed 42), +# HistGB Pareto-dominates the upstream RF default on log-loss (0.225 vs 0.310), +# Brier (0.071 vs 0.081), ECE (0.005 vs 0.039), and ROC-AUC (0.809 vs 0.737). +# See `docs/zi-factorial.md` for the full comparison. +# +# PRDC coverage on the same config is insensitive to the swap (0.7017 vs +# 0.7081); the downstream QDNN draw swamps the classifier-level gap. The +# default is chosen on intrinsic classifier quality, not on measured +# synthesis gains. The upstream RF-backed ZIQDNNMethod is still registered +# under "ZI-QDNN-RF" in `scale_up.py` for regression testing. +ZIQDNNHistGBMethod = _make_zi_variant("ZI-QDNN", _hgb_factory) +ZIQDNNHistGBMethod.name = "ZI-QDNN" + + +__all__ = [ + "CARTMethod", + "ZICARTMethod", + "ZIQDNNHistGBMethod", + "zi_qdnn_variant_factory", +] diff --git a/src/microplex_us/bakeoff/scale_up.py b/src/microplex_us/bakeoff/scale_up.py new file mode 100644 index 0000000..e353911 --- /dev/null +++ b/src/microplex_us/bakeoff/scale_up.py @@ -0,0 +1,857 @@ +"""Synthesizer scale-up benchmark harness. + +Stages per `docs/synthesizer-benchmark-scale-up.md`: + +- stage1: 100,000 rows x 50 columns of real enhanced_cps_2024 data +- stage2: 1,000,000 rows x 50 columns (via row replication or a larger source) +- stage3: 3,373,378 rows x 155 columns (v6 seed-ready shape — requires + regenerating the seed from donor integration; out of scope for this harness) + +The harness is deliberately narrow: + +- Single data source (enhanced_cps_2024). +- Fixed pool of synthesizer methods via `microplex.eval.benchmark.*Method`. +- PRDC coverage + wall time + peak RSS + rare-cell preservation. +- One result row per (method, stage, seed). + +Wider comparisons (CTGAN, TVAE, external tabular models) are left to +follow-up harnesses. Multi-source fusion is NOT exercised here — the v6 +pipeline's multi-source donor integration happens upstream of this eval. +""" + +from __future__ import annotations + +import argparse +import json +import logging +import resource +import time +from dataclasses import asdict, dataclass, field +from pathlib import Path +from typing import Any + +import h5py +import numpy as np +import pandas as pd + +try: + from prdc import compute_prdc # noqa: F401 (probed at run time) +except ImportError: # pragma: no cover - optional dep + compute_prdc = None + +LOGGER = logging.getLogger(__name__) + +DEFAULT_ENHANCED_CPS_PATH = ( + Path.home() + / "PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5" +) + + +# Curated default conditioning variables — demographics + household structure. +# Chosen to be numeric, low-cardinality, and genuinely shared across typical +# microsimulation use cases. Kept to 14 to leave room for 36 target variables +# under a 50-column stage-1 cap. +DEFAULT_CONDITION_COLS: tuple[str, ...] = ( + "age", + "is_female", + "is_hispanic", + "cps_race", + "is_disabled", + "is_blind", + "is_military", + "is_full_time_college_student", + "is_separated", + "state_fips", # broadcast from household + "has_esi", + "has_marketplace_health_coverage", + "own_children_in_household", + "pre_tax_contributions", +) + + +# Curated default target variables — income components, wealth, benefits. +# Chosen to span zero-inflated (most benefits, capital gains), continuous +# heavy-tailed (employment income, interest), and derived (net_worth). +DEFAULT_TARGET_COLS: tuple[str, ...] = ( + # Labor income (2) + "employment_income_last_year", + "self_employment_income_last_year", + # Interest + dividends (4) + "taxable_interest_income", + "tax_exempt_interest_income", + "qualified_dividend_income", + "non_qualified_dividend_income", + # Capital gains (2) + "long_term_capital_gains", + "short_term_capital_gains", + # Retirement income (4) + "taxable_pension_income", + "tax_exempt_pension_income", + "taxable_ira_distributions", + "social_security", + # Social Security split (3) + "social_security_retirement", + "social_security_disability", + "social_security_survivors", + # Other income (5) + "rental_income", + "farm_income", + "unemployment_compensation", + "alimony_income", + "miscellaneous_income", + # Wealth (5) + "bank_account_assets", + "bond_assets", + "stock_assets", + "net_worth", + "auto_loan_balance", + # Benefits / transfers (11) + "snap_reported", + "housing_assistance", + "ssi_reported", + "tanf_reported", + "disability_benefits", + "workers_compensation", + "veterans_benefits", + "child_support_received", + "child_support_expense", + "real_estate_taxes", + "health_savings_account_ald", +) + + +@dataclass(frozen=True) +class ScaleUpStageConfig: + """One stage of the synthesizer scale-up protocol.""" + + stage: str + n_rows: int | None # None means "use all available" + methods: tuple[str, ...] + condition_cols: tuple[str, ...] = DEFAULT_CONDITION_COLS + target_cols: tuple[str, ...] = DEFAULT_TARGET_COLS + holdout_frac: float = 0.2 + seed: int = 42 + k: int = 5 # PRDC nearest-neighbor k + n_generate: int | None = None # None => match training-set size + prdc_max_samples: int = 20_000 + method_kwargs: dict[str, dict[str, Any]] = field(default_factory=dict) + """Per-method hyperparameter overrides. + + Keys are the method registry names (`"ZI-QRF"`, `"ZI-MAF"`, + `"ZI-QDNN"`, ...); values are dicts of kwargs forwarded to the + method's constructor. Empty dict means "use method class defaults". + + Example: + method_kwargs={"ZI-MAF": {"n_layers": 8, "hidden_dim": 128, "epochs": 200}} + """ + """Cap on real and synth sample sizes fed to PRDC. + + The `prdc` library materializes full pairwise distance matrices + (O(n_real * n_synth * n_features)). With n_real = 15k and n_synth = + 61k and 50 features, that's ~7 GB per matrix — enough to OOM-kill + the process on a 48 GB workstation once multiple copies exist. The + metric is stable well below this scale: PRDC coverage on 15k real + vs 15k synthetic is essentially the same as 15k real vs 61k + synthetic. Cap keeps the evaluation tractable and consistent across + stages. + """ + data_path: Path = field(default=DEFAULT_ENHANCED_CPS_PATH) + year: str = "2024" + rare_cell_checks: tuple[dict[str, Any], ...] = field( + default_factory=lambda: ( + { + "name": "elderly_self_employed", + "mask": lambda df: (df["age"] >= 62) + & (df["self_employment_income_last_year"] > 0), + }, + { + "name": "young_dividend", + "mask": lambda df: (df["age"] < 30) + & (df["qualified_dividend_income"] > 0), + }, + { + "name": "disabled_ssdi", + "mask": lambda df: (df["is_disabled"] == 1) + & (df["social_security_disability"] > 0), + }, + { + "name": "top_1pct_employment", + "mask": lambda df: df["employment_income_last_year"] + >= df["employment_income_last_year"].quantile(0.99), + }, + ) + ) + + @property + def all_cols(self) -> list[str]: + # preserve order: conditioning first, then targets + seen: set[str] = set() + out: list[str] = [] + for c in list(self.condition_cols) + list(self.target_cols): + if c not in seen: + seen.add(c) + out.append(c) + return out + + +@dataclass +class ScaleUpResult: + """One (method, stage) outcome.""" + + stage: str + method: str + seed: int + n_train_rows: int + n_holdout_rows: int + n_cols: int + fit_wall_seconds: float + generate_wall_seconds: float + peak_rss_gb_during_fit: float + precision: float + density: float + coverage: float + rare_cell_ratios: dict[str, float] + zero_rate_mae: float + zero_rate_per_column: dict[str, dict[str, float]] = field(default_factory=dict) + notes: str = "" + + def to_dict(self) -> dict[str, Any]: + return asdict(self) + + +def stage1_config(methods: tuple[str, ...] = ("ZI-QRF", "ZI-MAF", "ZI-QDNN")) -> ScaleUpStageConfig: + """Stage 1: ~100k rows x 50 cols on real enhanced_cps_2024. + + enhanced_cps_2024 has 77,006 rows — use all of them. The nominal + 100k-row target from the protocol doc isn't achievable with only this + source; use the full dataset and note the actual row count in the + result record. + """ + return ScaleUpStageConfig(stage="stage1", n_rows=None, methods=methods) + + +def stage2_config(methods: tuple[str, ...] = ("ZI-QRF", "ZI-MAF", "ZI-QDNN")) -> ScaleUpStageConfig: + """Stage 2: 1M rows x 50 cols. + + Requires a larger source than enhanced_cps_2024 (77k rows). Intended + future use once the v6 seed-like 3.4M-row frame is retrievable. + Running stage 2 against enhanced_cps_2024 replicates rows, which is + not the same thing — not recommended. + """ + return ScaleUpStageConfig(stage="stage2", n_rows=1_000_000, methods=methods) + + +def stage3_config(methods: tuple[str, ...] = ("ZI-QRF", "ZI-MAF", "ZI-QDNN")) -> ScaleUpStageConfig: + """Stage 3: full 3.4M-row x 155-col v6 seed-ready shape.""" + return ScaleUpStageConfig(stage="stage3", n_rows=3_373_378, methods=methods) + + +_ENTITY_LINK_COLUMNS: tuple[tuple[str, str, str], ...] = ( + # (entity_name, entity_id_column, person_link_column) + ("household", "household_id", "person_household_id"), + ("spm_unit", "spm_unit_id", "person_spm_unit_id"), + ("tax_unit", "tax_unit_id", "person_tax_unit_id"), + ("family", "family_id", "person_family_id"), + ("marital_unit", "marital_unit_id", "person_marital_unit_id"), +) + + +def _build_entity_lookups( + f: h5py.File, year: str +) -> tuple[int, dict[str, tuple[int, np.ndarray]]]: + """Return (person_n, {entity_name: (entity_n, person_to_entity_position)}). + + For each non-person entity, returns a length-`person_n` integer array that, + when used to index a length-`entity_n` variable, broadcasts the entity + value down to person level. + """ + if "person_id" not in f or year not in f["person_id"]: + raise KeyError( + f"person_id/{year} missing from enhanced_cps file. Can't determine " + "person count." + ) + person_n = int(f["person_id"][year].shape[0]) + + lookups: dict[str, tuple[int, np.ndarray]] = {} + for ent_name, eid_col, pid_col in _ENTITY_LINK_COLUMNS: + if eid_col not in f or year not in f[eid_col]: + continue + if pid_col not in f or year not in f[pid_col]: + continue + entity_ids = f[eid_col][year][:] + person_ent_ids = f[pid_col][year][:] + id_to_idx = {int(v): i for i, v in enumerate(entity_ids)} + try: + lookup = np.fromiter( + (id_to_idx[int(v)] for v in person_ent_ids), + dtype=np.int64, + count=len(person_ent_ids), + ) + except KeyError as exc: + raise ValueError( + f"entity {ent_name!r}: person's {pid_col} value {exc} not in " + f"{eid_col} — entity table inconsistent" + ) from exc + lookups[ent_name] = (int(len(entity_ids)), lookup) + return person_n, lookups + + +def _load_enhanced_cps( + data_path: Path, + year: str, + columns: list[str], +) -> pd.DataFrame: + """Load enhanced_cps columns, broadcasting non-person entities to person level. + + enhanced_cps_2024 stores variables at their native entity level (person, + household, tax_unit, spm_unit, family, marital_unit). To land a flat + person-level DataFrame, this helper uses the `person__id` → + `_id` linkage to project parent-entity values down. + """ + if not data_path.exists(): + raise FileNotFoundError( + f"enhanced_cps_{year} not found at {data_path}. " + "Set `data_path` explicitly in ScaleUpStageConfig." + ) + + with h5py.File(data_path, "r") as f: + available = set(f.keys()) + missing = [c for c in columns if c not in available] + if missing: + raise KeyError( + f"Columns not in enhanced_cps: {missing[:5]}{'...' if len(missing) > 5 else ''}" + ) + + person_n, entity_lookups = _build_entity_lookups(f, year) + + data: dict[str, np.ndarray] = {} + for col in columns: + grp = f[col] + if year not in grp: + raise KeyError(f"Column {col!r} has no {year!r} entry") + arr = grp[year][:] + if arr.shape[0] == person_n: + data[col] = arr + continue + # Broadcast via entity lookup + broadcast = None + for ent_name, (ent_n, lookup) in entity_lookups.items(): + if arr.shape[0] == ent_n: + broadcast = arr[lookup] + break + if broadcast is None: + available_sizes = {e: n for e, (n, _) in entity_lookups.items()} + available_sizes["person"] = person_n + raise ValueError( + f"Column {col!r} has {arr.shape[0]} rows but no matching " + f"entity linkage. Sizes available: {available_sizes}" + ) + data[col] = broadcast + + return pd.DataFrame(data) + + +def _peak_rss_gb() -> float: + """Current process's max resident set size in GB. + + Unit of `ru_maxrss` is platform-dependent: + - Linux: kilobytes + - macOS (Darwin): bytes + - FreeBSD: kilobytes (but verify) + + Cross-checked against psutil on macOS Python 3.14: ru_maxrss is in bytes + (e.g., 190_873_600 raw = 0.18 GB matches `psutil.Process().memory_info().rss`). + """ + import sys + + r = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss + if sys.platform == "darwin": + bytes_rss = r + else: + # Linux and most BSDs: kilobytes + bytes_rss = r * 1024 + return bytes_rss / (1024**3) + + +def _compute_rare_cell_ratios( + real: pd.DataFrame, + synthetic: pd.DataFrame, + checks: tuple[dict[str, Any], ...], +) -> dict[str, float]: + """Per-check: synthetic count / real count in the rare cell. + + Matches the pattern in `microplex/benchmarks/results/sparse_coverage.csv`. + 1.0 means the synthetic preserves the rare cell at its real frequency; + 0.0 means the cell is annihilated. + """ + ratios: dict[str, float] = {} + for check in checks: + name = check["name"] + mask_fn = check["mask"] + try: + real_mask = mask_fn(real).fillna(False) + except (KeyError, AttributeError) as exc: + ratios[name] = float("nan") + LOGGER.warning( + "rare-cell check %r skipped (%s: %s)", name, type(exc).__name__, exc + ) + continue + try: + synth_mask = mask_fn(synthetic).fillna(False) + except (KeyError, AttributeError): + ratios[name] = float("nan") + continue + real_count = max(int(real_mask.sum()), 1) + synth_count = int(synth_mask.sum()) + ratios[name] = float(synth_count) / float(real_count) + return ratios + + +def _compute_zero_rate_mae(real: pd.DataFrame, synthetic: pd.DataFrame) -> float: + """Mean absolute error in per-column zero-rate across the common column set.""" + cols = [c for c in real.columns if c in synthetic.columns] + errs = [] + for c in cols: + r_zero = float((real[c] == 0).mean()) + s_zero = float((synthetic[c] == 0).mean()) + errs.append(abs(r_zero - s_zero)) + return float(np.mean(errs)) if errs else 0.0 + + +def _compute_zero_rate_per_column( + real: pd.DataFrame, synthetic: pd.DataFrame +) -> dict[str, dict[str, float]]: + """Per-column {real_zero_rate, synth_zero_rate, abs_diff} breakdown.""" + cols = [c for c in real.columns if c in synthetic.columns] + out: dict[str, dict[str, float]] = {} + for c in cols: + r_zero = float((real[c] == 0).mean()) + s_zero = float((synthetic[c] == 0).mean()) + out[c] = { + "real": r_zero, + "synth": s_zero, + "abs_diff": abs(r_zero - s_zero), + } + return out + + +def _compute_prdc( + real: pd.DataFrame, + synthetic: pd.DataFrame, + k: int, + max_samples: int = 20_000, + seed: int = 42, +) -> tuple[float, float, float]: + """Return (precision, density, coverage) via the `prdc` library. + + `max_samples` caps both `real` and `synthetic` sample sizes before + PRDC to keep the O(n_real * n_synth * n_features) distance matrices + within a 48 GB-workstation budget. + """ + if compute_prdc is None: + raise ImportError( + "PRDC requires the `prdc` package. " + "Install with: uv pip install prdc" + ) + + from sklearn.preprocessing import StandardScaler + + cols = [c for c in real.columns if c in synthetic.columns] + if not cols: + raise ValueError("No shared columns between real and synthetic for PRDC") + + rng = np.random.default_rng(seed) + if len(real) > max_samples: + real = real.iloc[rng.choice(len(real), size=max_samples, replace=False)] + if len(synthetic) > max_samples: + synthetic = synthetic.iloc[ + rng.choice(len(synthetic), size=max_samples, replace=False) + ] + + r = real[cols].to_numpy(dtype=np.float64) + s = synthetic[cols].to_numpy(dtype=np.float64) + + if len(r) < k + 1 or len(s) < k + 1: + return (0.0, 0.0, 0.0) + + scaler = StandardScaler() + r_scaled = scaler.fit_transform(r) + s_scaled = scaler.transform(s) + + metrics = compute_prdc(r_scaled, s_scaled, nearest_k=k) + return ( + float(metrics["precision"]), + float(metrics["density"]), + float(metrics["coverage"]), + ) + + +def _snap_categorical_shared_cols( + synthetic: pd.DataFrame, + train: pd.DataFrame, + shared_cols: list[str], +) -> pd.DataFrame: + """Snap categorical-looking shared-column synthetic values to training-pool values. + + `microplex.eval.benchmark._MultiSourceBase.generate` adds Gaussian noise + (sigma=0.1) to EVERY shared-column value before regenerating the + non-shared columns. This pollutes binary and categorical conditioning + variables (e.g., `is_military=1` becomes `1.04`; `cps_race=3` becomes + `2.97`, `state_fips=6` becomes `6.11`). + + Heuristic: a shared column is "categorical-looking" if every value in + the training pool is exactly integer-valued (up to float precision). + Those columns have every synthetic value snapped to its nearest + training-pool value. Continuous shared columns (non-integer training + values) keep the noise — it may legitimately add variation for them. + + Examples of columns this catches: all is_* flags, cps_race, state_fips, + own_children_in_household. + + Examples of columns left alone: age (if fractional), pre_tax_contributions. + """ + out = synthetic.copy() + for col in shared_cols: + if col not in out.columns or col not in train.columns: + continue + train_vals = train[col].to_numpy() + # Integer-valued iff every value equals its rounded version. + if not np.all(np.isclose(train_vals, np.round(train_vals), atol=1e-6)): + continue + uniques = np.sort(pd.unique(train_vals)) + synth_vals = out[col].to_numpy() + # For every synthetic value, find the nearest training-pool value. + idx = np.searchsorted(uniques, synth_vals) + idx = np.clip(idx, 0, len(uniques) - 1) + left = uniques[np.clip(idx - 1, 0, len(uniques) - 1)] + right = uniques[idx] + snapped = np.where( + np.abs(synth_vals - left) <= np.abs(synth_vals - right), + left, + right, + ) + out[col] = snapped.astype(train[col].dtype, copy=False) + return out + + +def _build_method(method_name: str, kwargs: dict[str, Any] | None = None) -> Any: + from microplex.eval.benchmark import ( + CTGANMethod, + MAFMethod, + QDNNMethod, + QRFMethod, + TVAEMethod, + ZIMAFMethod, + ZIQDNNMethod, + ZIQRFMethod, + ) + + from microplex_us.bakeoff.local_methods import ( + CARTMethod, + ZICARTMethod, + ZIQDNNHistGBMethod, + ) + + registry = { + "QRF": QRFMethod, + "ZI-QRF": ZIQRFMethod, + "QDNN": QDNNMethod, + # ZI-QDNN defaults to HistGB zero-classifier (microplex-us override). + # The upstream RF-backed variant is kept under "ZI-QDNN-RF" so prior + # benchmark artifacts (which were produced with RF) remain reproducible. + # See docs/zi-factorial.md for the rationale. + "ZI-QDNN": ZIQDNNHistGBMethod, + "ZI-QDNN-RF": ZIQDNNMethod, + "MAF": MAFMethod, + "ZI-MAF": ZIMAFMethod, + "CTGAN": CTGANMethod, + "TVAE": TVAEMethod, + "CART": CARTMethod, + "ZI-CART": ZICARTMethod, + } + if method_name not in registry: + raise ValueError( + f"Unknown method {method_name!r}. Known: {sorted(registry)}" + ) + return registry[method_name](**(kwargs or {})) + + +class ScaleUpRunner: + """Runs one stage of the scale-up protocol.""" + + def __init__(self, config: ScaleUpStageConfig) -> None: + self.config = config + self.logger = logging.getLogger(f"{__name__}.ScaleUpRunner") + + def load_frame(self) -> pd.DataFrame: + df = _load_enhanced_cps( + self.config.data_path, self.config.year, self.config.all_cols + ) + self.logger.info( + "loaded enhanced_cps: %d rows, %d cols", len(df), len(df.columns) + ) + # Cast to a single dtype so downstream DataFrame.values stays + # numeric-uniform (torch-based methods reject object arrays, which + # is what pandas produces when columns mix bool/int32/float32). + df = df.astype(np.float32) + if self.config.n_rows is not None and len(df) > self.config.n_rows: + rng = np.random.default_rng(self.config.seed) + idx = rng.choice(len(df), size=self.config.n_rows, replace=False) + df = df.iloc[idx].reset_index(drop=True) + self.logger.info("subsampled to %d rows", len(df)) + return df + + def split(self, df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]: + rng = np.random.default_rng(self.config.seed) + idx = rng.permutation(len(df)) + cut = int(len(df) * (1.0 - self.config.holdout_frac)) + train_idx, holdout_idx = idx[:cut], idx[cut:] + train = df.iloc[train_idx].reset_index(drop=True) + holdout = df.iloc[holdout_idx].reset_index(drop=True) + return train, holdout + + def fit_and_generate( + self, method_name: str, train: pd.DataFrame, n_generate: int + ) -> tuple[pd.DataFrame, dict[str, float]]: + """Fit method on `train` and generate `n_generate` synthetic records.""" + method = _build_method( + method_name, kwargs=self.config.method_kwargs.get(method_name) + ) + + # The benchmark methods take a multi-source dict; pass a single source. + sources = {"enhanced_cps_2024": train.copy()} + shared_cols = list(self.config.condition_cols) + + before_rss = _peak_rss_gb() + t_fit = time.perf_counter() + method.fit(sources=sources, shared_cols=shared_cols) + fit_wall = time.perf_counter() - t_fit + peak_fit_rss = max(_peak_rss_gb(), before_rss) + + t_gen = time.perf_counter() + synthetic = method.generate(n_generate, seed=self.config.seed) + gen_wall = time.perf_counter() - t_gen + + synthetic = _snap_categorical_shared_cols(synthetic, train, shared_cols) + + return synthetic, { + "fit_wall_seconds": fit_wall, + "generate_wall_seconds": gen_wall, + "peak_rss_gb_during_fit": peak_fit_rss, + } + + def run( + self, + incremental_path: Path | None = None, + ) -> list[ScaleUpResult]: + """Run every configured method on the loaded frame; return results. + + If `incremental_path` is given, each method's `ScaleUpResult` is + appended to that path as JSONL *as soon as it completes*. This + guarantees at least partial output if a later method crashes or + the host is interrupted. + """ + df = self.load_frame() + train, holdout = self.split(df) + n_generate = self.config.n_generate or len(train) + self.logger.info( + "split %d train / %d holdout; will generate %d synthetic", + len(train), + len(holdout), + n_generate, + ) + + if incremental_path is not None: + incremental_path.parent.mkdir(parents=True, exist_ok=True) + # Truncate any prior JSONL so this run's output is self-contained. + incremental_path.write_text("") + + results: list[ScaleUpResult] = [] + for method_name in self.config.methods: + self.logger.info("== fitting %s ==", method_name) + try: + synthetic, timing = self.fit_and_generate( + method_name, train, n_generate + ) + except Exception as exc: # pragma: no cover + self.logger.error("method %s failed: %s", method_name, exc) + result = ScaleUpResult( + stage=self.config.stage, + method=method_name, + seed=self.config.seed, + n_train_rows=len(train), + n_holdout_rows=len(holdout), + n_cols=len(df.columns), + fit_wall_seconds=0.0, + generate_wall_seconds=0.0, + peak_rss_gb_during_fit=0.0, + precision=0.0, + density=0.0, + coverage=0.0, + rare_cell_ratios={}, + zero_rate_mae=0.0, + notes=f"FAILED: {type(exc).__name__}: {exc}", + ) + results.append(result) + self._persist_incremental(incremental_path, result) + continue + + precision, density, coverage = _compute_prdc( + holdout, + synthetic, + k=self.config.k, + max_samples=self.config.prdc_max_samples, + seed=self.config.seed, + ) + rare = _compute_rare_cell_ratios( + holdout, synthetic, self.config.rare_cell_checks + ) + zero_mae = _compute_zero_rate_mae(holdout, synthetic) + zero_per_col = _compute_zero_rate_per_column(holdout, synthetic) + + result = ScaleUpResult( + stage=self.config.stage, + method=method_name, + seed=self.config.seed, + n_train_rows=len(train), + n_holdout_rows=len(holdout), + n_cols=len(df.columns), + fit_wall_seconds=timing["fit_wall_seconds"], + generate_wall_seconds=timing["generate_wall_seconds"], + peak_rss_gb_during_fit=timing["peak_rss_gb_during_fit"], + precision=precision, + density=density, + coverage=coverage, + rare_cell_ratios=rare, + zero_rate_mae=zero_mae, + zero_rate_per_column=zero_per_col, + notes="", + ) + results.append(result) + self._persist_incremental(incremental_path, result) + self.logger.info( + " %s: coverage=%.3f precision=%.3f density=%.3f fit=%.1fs gen=%.1fs peak_rss=%.2fGB", + method_name, + coverage, + precision, + density, + timing["fit_wall_seconds"], + timing["generate_wall_seconds"], + timing["peak_rss_gb_during_fit"], + ) + return results + + @staticmethod + def _persist_incremental( + path: Path | None, result: ScaleUpResult + ) -> None: + """Append one `ScaleUpResult` as a JSONL row (if path is set).""" + if path is None: + return + with path.open("a") as f: + f.write(json.dumps(result.to_dict(), default=str)) + f.write("\n") + + +def _results_to_dataframe(results: list[ScaleUpResult]) -> pd.DataFrame: + rows: list[dict[str, Any]] = [] + for r in results: + d = r.to_dict() + rare = d.pop("rare_cell_ratios") + for cell_name, ratio in rare.items(): + d[f"rare__{cell_name}"] = ratio + rows.append(d) + return pd.DataFrame(rows) + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser(description=__doc__ or "scale-up runner") + parser.add_argument( + "--stage", + choices=["stage1", "stage2", "stage3"], + default="stage1", + ) + parser.add_argument( + "--methods", + nargs="+", + default=["ZI-QRF", "ZI-MAF", "ZI-QDNN"], + ) + parser.add_argument("--seed", type=int, default=42) + parser.add_argument( + "--output", + type=Path, + default=Path("artifacts/scale_up_results.json"), + ) + parser.add_argument( + "--log-level", + default="INFO", + choices=["DEBUG", "INFO", "WARNING", "ERROR"], + ) + parser.add_argument( + "--incremental-jsonl", + type=Path, + default=None, + help=( + "Optional path to a JSONL file where each method's result is " + "appended as soon as it completes. Defaults to the final " + "--output path with '.partial.jsonl' appended." + ), + ) + args = parser.parse_args(argv) + + if args.incremental_jsonl is None: + args.incremental_jsonl = args.output.with_suffix( + args.output.suffix + ".partial.jsonl" + ) + + logging.basicConfig( + level=getattr(logging, args.log_level), + format="%(asctime)s %(levelname)s %(name)s: %(message)s", + ) + + stage_fn = {"stage1": stage1_config, "stage2": stage2_config, "stage3": stage3_config} + cfg = stage_fn[args.stage](methods=tuple(args.methods)) + cfg = ScaleUpStageConfig( + stage=cfg.stage, + n_rows=cfg.n_rows, + methods=tuple(args.methods), + condition_cols=cfg.condition_cols, + target_cols=cfg.target_cols, + holdout_frac=cfg.holdout_frac, + seed=args.seed, + k=cfg.k, + n_generate=cfg.n_generate, + data_path=cfg.data_path, + year=cfg.year, + rare_cell_checks=cfg.rare_cell_checks, + ) + + runner = ScaleUpRunner(cfg) + results = runner.run(incremental_path=args.incremental_jsonl) + + args.output.parent.mkdir(parents=True, exist_ok=True) + args.output.write_text( + json.dumps( + { + "stage": cfg.stage, + "methods": list(cfg.methods), + "seed": cfg.seed, + "n_conditioning_cols": len(cfg.condition_cols), + "n_target_cols": len(cfg.target_cols), + "results": [r.to_dict() for r in results], + }, + indent=2, + default=str, + ) + ) + LOGGER.info("wrote %d results to %s", len(results), args.output) + + df = _results_to_dataframe(results) + print() + print(df.to_string(index=False)) + + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/src/microplex_us/calibration/__init__.py b/src/microplex_us/calibration/__init__.py new file mode 100644 index 0000000..1a8e682 --- /dev/null +++ b/src/microplex_us/calibration/__init__.py @@ -0,0 +1,22 @@ +"""Calibration backends for microplex-us. + +The mainline production calibrator is `MicrocalibrateAdapter`, which +wraps `microcalibrate`'s gradient-descent chi-squared solver. It is now +country-agnostic and lives in upstream `microplex.calibration` so every +country package (microplex-us, microplex-uk, etc.) shares one +identity-preserving calibrator. This module re-exports the adapter so +existing `from microplex_us.calibration import MicrocalibrateAdapter` +imports keep working. + +See `docs/calibrator-decision.md` for the rationale. +""" + +from microplex.calibration import ( + MicrocalibrateAdapter, + MicrocalibrateAdapterConfig, +) + +__all__ = [ + "MicrocalibrateAdapter", + "MicrocalibrateAdapterConfig", +] diff --git a/src/microplex_us/pipelines/pe_l0.py b/src/microplex_us/pipelines/pe_l0.py index 551240e..cd770e1 100644 --- a/src/microplex_us/pipelines/pe_l0.py +++ b/src/microplex_us/pipelines/pe_l0.py @@ -12,7 +12,7 @@ import pandas as pd from microplex.calibration import ( LinearConstraint, - _build_linear_constraint_system, + _build_sparse_constraint_system, _validate_calibration_inputs, ) from scipy import sparse as sp @@ -123,7 +123,10 @@ def fit( self.linear_constraints_, ) - A, b, names, _ = _build_linear_constraint_system( + # Build the calibration matrix directly in CSR form to avoid the + # ~24 GB dense intermediate that OOM'd v7 at 1.5M records x + # ~4k constraints. See microplex.calibration._build_sparse_constraint_system. + X_sparse_built, b, names, _ = _build_sparse_constraint_system( data, marginal_targets, continuous_targets, @@ -131,7 +134,7 @@ def fit( ) self.target_names_ = names - if A.shape[0] == 0: + if X_sparse_built.shape[0] == 0: if weight_col in data.columns: self.weights_ = data[weight_col].to_numpy(dtype=float, copy=True) else: @@ -149,7 +152,7 @@ def fit( initial_weights = np.ones(len(data), dtype=float) initial_weights = np.maximum(initial_weights, 1e-12) - X_sparse = sp.csr_matrix(A) + X_sparse = X_sparse_built weights = self._fit_weights( X_sparse=X_sparse, targets=b.astype(np.float64), @@ -158,7 +161,7 @@ def fit( ) weights = np.maximum(np.asarray(weights, dtype=float), 0.0) - residual = A @ weights - b + residual = X_sparse @ weights - b rel_errors = np.abs(residual) / np.maximum(np.abs(b), 1e-10) self.weights_ = weights self.calibration_error_ = float(np.sqrt(np.mean(rel_errors**2))) diff --git a/src/microplex_us/pipelines/pe_us_data_rebuild_checkpoint.py b/src/microplex_us/pipelines/pe_us_data_rebuild_checkpoint.py index 6987960..0664caf 100644 --- a/src/microplex_us/pipelines/pe_us_data_rebuild_checkpoint.py +++ b/src/microplex_us/pipelines/pe_us_data_rebuild_checkpoint.py @@ -1999,6 +1999,17 @@ def main(argv: list[str] | None = None) -> None: parser.add_argument("--n-synthetic", type=int, default=100_000) parser.add_argument("--random-seed", type=int, default=42) parser.add_argument("--donor-imputer-condition-selection") + parser.add_argument( + "--donor-imputer-backend", + choices=["maf", "qrf", "zi_qrf", "regime_aware"], + default=None, + help=( + "Donor imputer backend. `zi_qrf` activates the zero-inflated " + "QRF path that skips predict() on gate-predicted-zero rows, " + "which is a large wall-clock win on heavy-zero PUF tax " + "variables. See docs/next-run-plan.md." + ), + ) parser.add_argument("--cps-source-year", type=int, default=2023) parser.add_argument("--puf-target-year", type=int) parser.add_argument("--puf-cps-reference-year", type=int) @@ -2032,6 +2043,69 @@ def main(argv: list[str] | None = None) -> None: parser.add_argument("--defer-native-audit", action="store_true") parser.add_argument("--defer-imputation-ablation", action="store_true") parser.add_argument("--require-policyengine-native-score", action="store_true") + parser.add_argument( + "--calibration-backend", + choices=[ + "entropy", + "ipf", + "chi2", + "sparse", + "hardconcrete", + "pe_l0", + "microcalibrate", + "none", + ], + default=None, + help=( + "Weighting/calibration backend. Default is the config default " + "(entropy). Use `microcalibrate` for the identity-preserving " + "gradient-descent chi-squared backend that survived the v6 OOM." + ), + ) + parser.add_argument( + "--calibration-max-iter", + type=int, + default=None, + help=( + "Max iterations / epochs for the calibration solver. Passed " + "through to USMicroplexBuildConfig.calibration_max_iter." + ), + ) + parser.add_argument( + "--policyengine-materialize-batch-size", + type=int, + default=None, + help=( + "If set, splits PolicyEngine variable materialization into " + "household chunks of this size. At 1.5M-household scale a " + "single Microsimulation is 25-35 GB; batch_size=100_000 " + "drops peak to a few GB. Required for workstation runs; " + "unset (full-dataset) path targeted Modal GPU." + ), + ) + parser.add_argument( + "--pipeline-checkpoint-save-post-imputation-path", + type=str, + default=None, + help=( + "If set, save a post-imputation pipeline checkpoint to this " + "directory (right after donor imputation + PE tables build, " + "before microsim). A rerun can resume from this checkpoint " + "to skip the ~11 h synthesis stage." + ), + ) + parser.add_argument( + "--pipeline-checkpoint-save-post-microsim-path", + type=str, + default=None, + help=( + "If set, save a post-microsim pipeline checkpoint to this " + "directory (after target variables are materialized, before " + "the calibration fit loop). A rerun can resume from this " + "checkpoint to skip both synthesis and microsim, leaving " + "only the calibration fit." + ), + ) args = parser.parse_args(argv) config_overrides = { @@ -2042,6 +2116,24 @@ def main(argv: list[str] | None = None) -> None: config_overrides["donor_imputer_condition_selection"] = ( args.donor_imputer_condition_selection ) + if args.donor_imputer_backend is not None: + config_overrides["donor_imputer_backend"] = args.donor_imputer_backend + if args.calibration_backend is not None: + config_overrides["calibration_backend"] = args.calibration_backend + if args.calibration_max_iter is not None: + config_overrides["calibration_max_iter"] = int(args.calibration_max_iter) + if args.policyengine_materialize_batch_size is not None: + config_overrides["policyengine_materialize_batch_size"] = int( + args.policyengine_materialize_batch_size + ) + if args.pipeline_checkpoint_save_post_imputation_path is not None: + config_overrides["pipeline_checkpoint_save_post_imputation_path"] = ( + args.pipeline_checkpoint_save_post_imputation_path + ) + if args.pipeline_checkpoint_save_post_microsim_path is not None: + config_overrides["pipeline_checkpoint_save_post_microsim_path"] = ( + args.pipeline_checkpoint_save_post_microsim_path + ) result = run_policyengine_us_data_rebuild_checkpoint( output_root=args.output_root, diff --git a/src/microplex_us/pipelines/pe_us_recalibrate_from_checkpoint.py b/src/microplex_us/pipelines/pe_us_recalibrate_from_checkpoint.py new file mode 100644 index 0000000..bf24997 --- /dev/null +++ b/src/microplex_us/pipelines/pe_us_recalibrate_from_checkpoint.py @@ -0,0 +1,140 @@ +"""Recalibrate a saved US microplex checkpoint with a new calibration config. + +Load a ``post_imputation`` or ``post_microsim`` pipeline checkpoint +previously saved via +``pe_us_data_rebuild_checkpoint --pipeline-checkpoint-save-post-imputation-path`` +(or ``--pipeline-checkpoint-save-post-microsim-path``) and rerun the +calibration stage without repeating the ~11 hours of synthesis + donor +imputation. A ``post_microsim`` checkpoint additionally skips the +microsim materialization step because the materialized vars are +already on the bundle as columns. + +Intended for rapid iteration on calibration backends / target sets / +sparsity schedules: change one flag, run for ~30 min +(``post_imputation``) or ~1–2 min + calibration fit +(``post_microsim``) instead of half a day. +""" + +from __future__ import annotations + +import argparse +import json +import sys +from pathlib import Path +from typing import Sequence + +from microplex_us.pipelines.us import ( + USMicroplexBuildConfig, + recalibrate_policyengine_us_from_checkpoint, +) + + +def main(argv: Sequence[str] | None = None) -> int: + parser = argparse.ArgumentParser( + description=( + "Rerun US microplex calibration from a saved checkpoint. Works " + "with both post_imputation (skips ~11 h synthesis) and " + "post_microsim (additionally skips ~30 min microsim) stages." + ), + ) + parser.add_argument( + "--checkpoint-path", + type=Path, + required=True, + help=( + "Path to a directory written by the main pipeline with " + "--pipeline-checkpoint-save-post-imputation-path or " + "--pipeline-checkpoint-save-post-microsim-path." + ), + ) + parser.add_argument( + "--output-root", + type=Path, + required=True, + help="Output directory for the recalibrated bundle and summary.", + ) + parser.add_argument( + "--targets-db", + type=Path, + required=True, + help="Path to the PolicyEngine US targets SQLite database.", + ) + parser.add_argument( + "--target-period", + type=int, + default=None, + help="Calendar year for calibration targets (default: config default).", + ) + parser.add_argument( + "--calibration-backend", + type=str, + default="pe_l0", + help="Calibration backend (pe_l0, microcalibrate, hardconcrete, etc.).", + ) + parser.add_argument( + "--calibration-max-iter", + type=int, + default=None, + help="Max iterations / epochs for the calibration solver.", + ) + parser.add_argument( + "--policyengine-materialize-batch-size", + type=int, + default=100_000, + help=( + "Batch size for PE variable materialization (default 100_000; " + "keeps a single Microsimulation under a few GB at 1.5M-household scale)." + ), + ) + parser.add_argument( + "--pipeline-checkpoint-save-post-microsim-path", + type=Path, + default=None, + help=( + "If set, also save a post-microsim checkpoint during this " + "recalibration so the next iteration can skip microsim too." + ), + ) + args = parser.parse_args(argv) + + config_kwargs: dict[str, object] = { + "calibration_backend": args.calibration_backend, + "policyengine_targets_db": args.targets_db, + "policyengine_materialize_batch_size": int( + args.policyengine_materialize_batch_size + ), + } + if args.target_period is not None: + config_kwargs["policyengine_target_period"] = int(args.target_period) + if args.calibration_max_iter is not None: + config_kwargs["calibration_max_iter"] = int(args.calibration_max_iter) + if args.pipeline_checkpoint_save_post_microsim_path is not None: + config_kwargs["pipeline_checkpoint_save_post_microsim_path"] = ( + args.pipeline_checkpoint_save_post_microsim_path + ) + + config = USMicroplexBuildConfig(**config_kwargs) + result = recalibrate_policyengine_us_from_checkpoint(config, args.checkpoint_path) + + args.output_root.mkdir(parents=True, exist_ok=True) + result.calibrated_data.to_parquet(args.output_root / "calibrated_data.parquet") + result.policyengine_tables.households.to_parquet( + args.output_root / "households.parquet" + ) + if result.policyengine_tables.persons is not None: + result.policyengine_tables.persons.to_parquet( + args.output_root / "persons.parquet" + ) + (args.output_root / "calibration_summary.json").write_text( + json.dumps(result.calibration_summary, indent=2, default=str) + ) + print( + f"Recalibrated from {args.checkpoint_path} → {args.output_root} " + f"(stage={result.loaded_stage}, " + f"rows={len(result.calibrated_data)})" + ) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/src/microplex_us/pipelines/summarize_policyengine_oracle_target_drilldown.py b/src/microplex_us/pipelines/summarize_policyengine_oracle_target_drilldown.py index e038fab..d982fec 100644 --- a/src/microplex_us/pipelines/summarize_policyengine_oracle_target_drilldown.py +++ b/src/microplex_us/pipelines/summarize_policyengine_oracle_target_drilldown.py @@ -65,7 +65,7 @@ def summarize_us_policyengine_oracle_target_drilldown( _supported_targets, _constraints, _feasibility_filter_summary, - _materialized_variables, + calibration_materialized_variables, _materialization_failures, ) = pipeline._resolve_policyengine_calibration_targets( tables, @@ -100,6 +100,8 @@ def summarize_us_policyengine_oracle_target_drilldown( str(variable) for variable in manifest.get("calibration", {}).get("materialized_variables", ()) } + materialized_variables.update(str(variable) for variable in calibration_materialized_variables) + materialized_variables.update(str(variable) for variable in report.materialized_variables) ledger_by_name = { str(entry["target_name"]): dict(entry) for entry in target_ledger diff --git a/src/microplex_us/pipelines/us.py b/src/microplex_us/pipelines/us.py index 344240c..076357e 100644 --- a/src/microplex_us/pipelines/us.py +++ b/src/microplex_us/pipelines/us.py @@ -70,11 +70,17 @@ compile_supported_policyengine_us_household_linear_constraints, filter_supported_policyengine_us_targets, infer_policyengine_us_variable_bindings, + load_us_pipeline_checkpoint, materialize_policyengine_us_variables_safely, + policyengine_us_formula_variables_for_targets, policyengine_us_variables_to_materialize, resolve_policyengine_excluded_export_variables, + save_us_pipeline_checkpoint, write_policyengine_us_time_period_dataset, ) +from microplex_us.policyengine.us import ( + subset_policyengine_tables_by_households as _subset_policyengine_tables_by_households, +) from microplex_us.variables import ( PE_STYLE_PUF_IRS_DEMOGRAPHIC_PREDICTORS, DonorMatchStrategy, @@ -223,17 +229,28 @@ def fit( column in self.zero_inflated_vars and (y_values == 0).mean() >= self.zero_threshold and (y_values == 0).sum() >= 10 - and (y_values > 0).sum() >= 10 + and (y_values != 0).sum() >= 10 ): + # Gate trained as zero vs nonzero (both signs), not as + # zero-or-negative vs positive. The old `y > 0` label + # silently dropped every negative training row along + # with zeros, so the QRF below only ever saw positive + # rows and could never emit a negative prediction — the + # v7 bug that blanked the negative tail of capital + # gains, partnership income, farm income, etc. The + # `!= 0` label is the minimal fix; the full upgrade to + # `microimpute.ZeroInflatedImputer` (regime-aware + # tripartite routing with separate positive / negative + # QRFs) is tracked as a follow-up. zero_model = RandomForestClassifier( n_estimators=max(50, self.n_estimators // 2), random_state=42, n_jobs=-1, ) - zero_model.fit(x_values, (y_values > 0).astype(int)) + zero_model.fit(x_values, (y_values != 0).astype(int)) self._zero_models[column] = zero_model - x_values = x_values[y_values > 0] - y_values = y_values[y_values > 0] + x_values = x_values[y_values != 0] + y_values = y_values[y_values != 0] if len(y_values) < 25: continue model = RandomForestQuantileRegressor( @@ -286,6 +303,173 @@ def generate( return synthetic +class RegimeAwareDonorImputer: + """Donor imputer that wraps `microimpute.ZeroInflatedImputer` per column. + + Each target is fit with an independent `ZeroInflatedImputer`, which + auto-detects one of seven regimes (THREE_SIGN / ZI_POSITIVE / + ZI_NEGATIVE / SIGN_ONLY / POSITIVE_ONLY / NEGATIVE_ONLY / + DEGENERATE_ZERO) from the training distribution and composes a + gate classifier + one or two base imputers as appropriate. + + Key advantages over `ColumnwiseQRFDonorImputer`: + + 1. Negative values in training are preserved in predictions for + three-sign targets (capital gains, partnership/S-corp income, + farm income, rental income). The v7 `y > 0` bug is structurally + impossible under regime-aware routing. + 2. Predictions on three-sign targets never land in the interior + band between ``max(train_neg)`` and ``min(train_pos)`` — the + tripartite gate routes to sign-specific base imputers that each + see only one sign of training data. + + This class is a thin columnwise adapter: one `ZeroInflatedImputer` + is fit per target, using `microimpute.QRF` as the base. Fit and + generate work column-by-column so memory scales with the single + largest base imputer, not with the total target count. + """ + + def __init__( + self, + condition_vars: list[str], + target_vars: list[str], + n_estimators: int = 100, + nonnegative_vars: set[str] | None = None, + classifier_type: str = "hist_gb", + min_class_count: int = 10, + min_class_fraction: float = 0.01, + seed: int = 42, + ) -> None: + self.condition_vars = list(condition_vars) + self.target_vars = list(target_vars) + self.n_estimators = int(n_estimators) + self.nonnegative_vars = set(nonnegative_vars or ()) + self.classifier_type = str(classifier_type) + self.min_class_count = int(min_class_count) + self.min_class_fraction = float(min_class_fraction) + self.seed = int(seed) + self._fitted: dict[str, Any] = {} + self._regimes: dict[str, str] = {} + + def fit( + self, + data: pd.DataFrame, + *, + weight_col: str | None = "weight", + epochs: int | None = None, + batch_size: int | None = None, + learning_rate: float | None = None, + verbose: bool = False, + ) -> RegimeAwareDonorImputer: + del weight_col, epochs, batch_size, learning_rate, verbose + + if importlib.util.find_spec("microimpute") is None: + raise ImportError( + "microimpute>=2.1 is required for donor_imputer_backend=" + "'regime_aware'; install with `uv pip install microimpute`." + ) + if importlib.util.find_spec("quantile_forest") is None: + raise ImportError( + "quantile-forest is required for the RegimeAwareDonorImputer " + "base QRF." + ) + + from microimpute.models.qrf import QRF + from microimpute.models.zero_inflated import ZeroInflatedImputer + + self._fitted = {} + self._regimes = {} + for column in self.target_vars: + subset = data[self.condition_vars + [column]].dropna() + if len(subset) < 25: + continue + # base_imputer_kwargs={} because microimpute 2.x's + # ZeroInflatedImputer._fit_base_single already passes + # log_level="ERROR" to the base, and duplicating it here + # raises TypeError. Upstream fix tracked. + wrapper = ZeroInflatedImputer( + base_imputer_class=QRF, + base_imputer_kwargs={}, + min_class_count=self.min_class_count, + min_class_fraction=self.min_class_fraction, + classifier_type=self.classifier_type, + seed=self.seed, + ) + fitted = wrapper.fit( + subset, + predictors=list(self.condition_vars), + imputed_variables=[column], + ) + self._fitted[column] = fitted + self._regimes[column] = wrapper.get_regime(column) + return self + + def generate( + self, + conditions: pd.DataFrame, + seed: int | None = None, + ) -> pd.DataFrame: + synthetic = conditions.copy().reset_index(drop=True) + master_seed = self.seed if seed is None else int(seed) + master_rng = np.random.default_rng(master_seed) + for column in self.target_vars: + fitted = self._fitted.get(column) + if fitted is None: + synthetic[column] = np.nan + continue + column_seed = int( + master_rng.integers(0, np.iinfo(np.int32).max, dtype=np.int64) + ) + self._reset_prediction_rngs(fitted, seed=column_seed) + preds = fitted.predict(synthetic[self.condition_vars]) + values = preds[column].to_numpy(dtype=float) + if column in self.nonnegative_vars: + values = np.maximum(values, 0.0) + synthetic[column] = values + return synthetic + + def _reset_prediction_rngs( + self, + obj: Any, + *, + seed: int, + visited: set[int] | None = None, + ) -> None: + if visited is None: + visited = set() + if obj is None or isinstance(obj, (str, bytes, int, float, bool)): + return + object_id = id(obj) + if object_id in visited: + return + visited.add(object_id) + + if hasattr(obj, "_rng"): + obj._rng = np.random.default_rng(seed) + child_rng = np.random.default_rng(seed) + + if isinstance(obj, dict): + children = list(obj.values()) + elif isinstance(obj, (list, tuple, set)): + children = list(obj) + else: + children = [] + for attr_name in ("models", "_per_variable", "_non_numeric_bundle"): + child = getattr(obj, attr_name, None) + if child is not None: + children.append(child) + + for child in children: + child_seed = int( + child_rng.integers(0, np.iinfo(np.int32).max, dtype=np.int64) + ) + self._reset_prediction_rngs( + child, + seed=child_seed, + visited=visited, + ) + + AGE_LABELS = ["0-17", "18-34", "35-54", "55-64", "65+"] INCOME_BINS = [-np.inf, 25_000, 50_000, 100_000, np.inf] INCOME_LABELS = ["<25k", "25-50k", "50-100k", "100k+"] @@ -415,41 +599,6 @@ def _subset_policyengine_linear_constraints( return tuple(subset) -def _subset_policyengine_tables_by_households( - tables: PolicyEngineUSEntityTableBundle, - household_ids: pd.Index, -) -> PolicyEngineUSEntityTableBundle: - selected_ids = pd.Index(household_ids, name="household_id") - household_order = pd.Series(np.arange(len(selected_ids)), index=selected_ids) - - households = tables.households.loc[ - tables.households["household_id"].isin(selected_ids) - ].copy() - households = ( - households.assign( - _household_order=households["household_id"].map(household_order) - ) - .sort_values("_household_order") - .drop(columns="_household_order") - .reset_index(drop=True) - ) - - def _subset_related(table: pd.DataFrame | None) -> pd.DataFrame | None: - if table is None: - return None - subset = table.loc[table["household_id"].isin(selected_ids)].copy() - return subset.reset_index(drop=True) - - return PolicyEngineUSEntityTableBundle( - households=households, - persons=_subset_related(tables.persons), - tax_units=_subset_related(tables.tax_units), - spm_units=_subset_related(tables.spm_units), - families=_subset_related(tables.families), - marital_units=_subset_related(tables.marital_units), - ) - - def _policyengine_target_geo_priority(target: TargetSpec) -> int: geo_level = str(target.metadata.get("geo_level", "")).lower() return { @@ -463,34 +612,97 @@ def _constraint_active_household_count( constraint: Any, *, epsilon: float = 1e-12, + metadata_lookup: dict[str, dict[str, Any]] | None = None, ) -> int: + """Count households with nonzero coefficient. Uses ``metadata_lookup`` when provided.""" + if metadata_lookup is not None: + cached = metadata_lookup.get(getattr(constraint, "name", None)) + if cached is not None and "active_households" in cached: + return int(cached["active_households"]) coefficients = np.asarray(getattr(constraint, "coefficients", ()), dtype=float) if coefficients.size == 0: return 0 return int(np.count_nonzero(np.abs(coefficients) > epsilon)) +def _precompute_constraint_metadata( + constraints: tuple[Any, ...], + *, + epsilon: float = 1e-12, +) -> dict[str, dict[str, Any]]: + """Per-constraint {active_households, coefficient_mass} scalar metadata.""" + metadata: dict[str, dict[str, Any]] = {} + for constraint in constraints: + name = getattr(constraint, "name", None) + if name is None: + continue + coefficients = np.asarray( + getattr(constraint, "coefficients", ()), dtype=float + ) + if coefficients.size == 0: + metadata[name] = { + "active_households": 0, + "coefficient_mass": 0.0, + } + continue + metadata[name] = { + "active_households": int( + np.count_nonzero(np.abs(coefficients) > epsilon) + ), + "coefficient_mass": float(np.abs(coefficients).sum()), + } + return metadata + + +def _strip_constraint_coefficients( + constraints: tuple[Any, ...], +) -> tuple[LinearConstraint, ...]: + """Replace each constraint's coefficient array with a zero-length sentinel.""" + return tuple( + LinearConstraint( + name=c.name, coefficients=np.zeros(0, dtype=float), target=float(c.target) + ) + for c in constraints + ) + + def _build_policyengine_constraint_records( targets: list[TargetSpec], constraints: tuple[Any, ...], + *, + metadata_lookup: dict[str, dict[str, Any]] | None = None, ) -> list[dict[str, Any]]: records: list[dict[str, Any]] = [] for target, constraint in zip(targets, constraints, strict=True): aggregation_name = str( getattr(getattr(target, "aggregation", None), "name", target.aggregation) ).upper() + name = getattr(constraint, "name", None) + cached = ( + metadata_lookup.get(name) + if metadata_lookup is not None and name is not None + else None + ) + if cached is not None and "coefficient_mass" in cached: + coefficient_mass = float(cached["coefficient_mass"]) + else: + coefficient_mass = float( + np.abs( + np.asarray( + getattr(constraint, "coefficients", ()), dtype=float + ) + ).sum() + ) records.append( { "target": target, "constraint": constraint, - "active_households": _constraint_active_household_count(constraint), + "active_households": _constraint_active_household_count( + constraint, metadata_lookup=metadata_lookup + ), "geo_priority": _policyengine_target_geo_priority(target), "aggregation_priority": 0 if aggregation_name == "COUNT" else 1, - "coefficient_mass": float( - np.abs( - np.asarray(getattr(constraint, "coefficients", ()), dtype=float) - ).sum() - ), + "coefficient_mass": coefficient_mass, } ) return records @@ -688,6 +900,7 @@ def _build_policyengine_calibration_target_ledger( household_count: int, min_active_households: int, materialization_failures: dict[str, str], + compiled_constraint_metadata: dict[str, dict[str, Any]] | None = None, ) -> tuple[dict[str, Any], list[dict[str, Any]]]: min_required_households = max(1, int(min_active_households)) structurally_unsupported_names = { @@ -748,7 +961,11 @@ def _build_policyengine_calibration_target_ledger( ) classified_names.add(target.name) - for record in _build_policyengine_constraint_records(compiled_targets, compiled_constraints): + for record in _build_policyengine_constraint_records( + compiled_targets, + compiled_constraints, + metadata_lookup=compiled_constraint_metadata, + ): target = record["target"] classified_names.add(target.name) active_households = int(record["active_households"]) @@ -841,6 +1058,7 @@ def _select_policyengine_deferred_stage_constraints( max_constraints_per_household: float | None, top_family_count: int | None, top_geography_count: int | None, + compiled_constraint_metadata: dict[str, dict[str, Any]] | None = None, ) -> tuple[list[TargetSpec], tuple[LinearConstraint, ...], dict[str, Any]]: ledger_by_name = { str(entry["target_name"]): entry @@ -872,7 +1090,11 @@ def _select_policyengine_deferred_stage_constraints( focus_eligible_count = 0 min_required_households = max(1, int(min_active_households)) - for record in _build_policyengine_constraint_records(compiled_targets, compiled_constraints): + for record in _build_policyengine_constraint_records( + compiled_targets, + compiled_constraints, + metadata_lookup=compiled_constraint_metadata, + ): target = record["target"] if target.name in selected_target_names: continue @@ -1405,7 +1627,14 @@ class USMicroplexBuildConfig: n_synthetic: int = 100_000 synthesis_backend: Literal["bootstrap", "synthesizer", "seed"] = "synthesizer" calibration_backend: Literal[ - "entropy", "ipf", "chi2", "sparse", "hardconcrete", "pe_l0", "none" + "entropy", + "ipf", + "chi2", + "sparse", + "hardconcrete", + "pe_l0", + "microcalibrate", + "none", ] = "entropy" calibration_tol: float = 1e-6 calibration_max_iter: int = 100 @@ -1431,7 +1660,7 @@ class USMicroplexBuildConfig: donor_imputer_learning_rate: float = 1e-3 donor_imputer_n_layers: int = 2 donor_imputer_hidden_dim: int = 32 - donor_imputer_backend: Literal["maf", "qrf", "zi_qrf"] = "maf" + donor_imputer_backend: Literal["maf", "qrf", "zi_qrf", "regime_aware"] = "maf" donor_imputer_qrf_n_estimators: int = 100 donor_imputer_qrf_zero_threshold: float = 0.05 donor_imputer_condition_selection: Literal[ @@ -1506,6 +1735,36 @@ class USMicroplexBuildConfig: policyengine_oracle_relative_error_cap: float | None = 10.0 policyengine_target_reform_id: int = 0 policyengine_simulation_cls: Any | None = None + policyengine_materialize_batch_size: int | None = None + """Batch size for PolicyEngine variable materialization. + + At 1.5M-household scale a single Microsimulation is 25–35 GB. With + a batch size of e.g. 100_000, the pipeline splits the entity tables + into chunks and runs one Microsimulation per chunk, reducing peak + memory to a few GB. ``None`` (default) keeps the legacy single-pass + behavior. Safe for per-household scalar variables (all our + calibration targets); unsafe for population-quantile-dependent + variables (see docstring on + :func:`materialize_policyengine_us_variables`). + """ + pipeline_checkpoint_save_post_imputation_path: str | Path | None = None + """Write a post-imputation pipeline checkpoint to this directory. + + Saved right after donor imputation + ``build_policyengine_entity_tables`` + and before microsim materializes calibration target variables. The + ~11 h synthesis + imputation + PE-tables build can be skipped on a + rerun that loads from this checkpoint, leaving only microsim (~30 + min) + calibration fit (~30 min) to redo. + """ + pipeline_checkpoint_save_post_microsim_path: str | Path | None = None + """Write a post-microsim pipeline checkpoint to this directory. + + Saved after ``_resolve_policyengine_calibration_targets`` has + materialized every calibration target variable onto the bundle, and + before the L0/microcalibrate fit loop. A rerun that loads from this + checkpoint skips microsim too, leaving only the ~30 min calibration + fit — useful for tuning calibration targets or backends. + """ def __post_init__(self) -> None: if ( @@ -1830,6 +2089,18 @@ def build_from_frames( households=int(len(synthetic_tables.households)), persons=int(len(synthetic_tables.persons)), ) + if self.config.pipeline_checkpoint_save_post_imputation_path is not None: + save_us_pipeline_checkpoint( + synthetic_tables, + self.config.pipeline_checkpoint_save_post_imputation_path, + stage="post_imputation", + ) + _emit_us_pipeline_progress( + "US microplex build: post-imputation checkpoint saved", + path=str( + self.config.pipeline_checkpoint_save_post_imputation_path + ), + ) _emit_us_pipeline_progress( "US microplex build: policyengine calibration start", backend=self.config.calibration_backend, @@ -2432,12 +2703,19 @@ def calibrate( def _build_weight_calibrator( self, + stage_index: int = 1, ) -> ( Calibrator | SparseCalibrator | HardConcreteCalibrator | PolicyEngineL0Calibrator ): + # Stage 1 selects the sparse support via L0; stages 2+ only + # refine weights against additional targets. Re-applying the same + # L0 penalty on warm-started weights compounds sparsity and + # collapses the support set (v10 went 442k → 1.5k across stages). + sparsity_pass = stage_index <= 1 + l0_penalty = 1e-4 if sparsity_pass else 0.0 if self.config.calibration_backend in {"entropy", "ipf", "chi2"}: return Calibrator( method=self.config.calibration_backend, @@ -2452,7 +2730,7 @@ def _build_weight_calibrator( ) if self.config.calibration_backend == "hardconcrete": return HardConcreteCalibrator( - lambda_l0=1e-4, + lambda_l0=l0_penalty, epochs=max(self.config.calibration_max_iter, 500), lr=0.1, device=self.config.device, @@ -2460,11 +2738,25 @@ def _build_weight_calibrator( ) if self.config.calibration_backend == "pe_l0": return PolicyEngineL0Calibrator( - lambda_l0=1e-4, + lambda_l0=l0_penalty, epochs=max(self.config.calibration_max_iter, 100), device=self.config.device, tol=self.config.calibration_tol, ) + if self.config.calibration_backend == "microcalibrate": + from microplex_us.calibration import ( + MicrocalibrateAdapter, + MicrocalibrateAdapterConfig, + ) + + return MicrocalibrateAdapter( + MicrocalibrateAdapterConfig( + epochs=max(self.config.calibration_max_iter, 32), + learning_rate=1e-3, + device=self.config.device, + seed=self.config.random_seed, + ) + ) raise ValueError( f"Unsupported calibration backend: {self.config.calibration_backend}" ) @@ -2842,6 +3134,16 @@ def calibrate_policyengine_tables( provider=provider, target_period=target_period, ) + if self.config.pipeline_checkpoint_save_post_microsim_path is not None: + save_us_pipeline_checkpoint( + tables, + self.config.pipeline_checkpoint_save_post_microsim_path, + stage="post_microsim", + ) + _emit_us_pipeline_progress( + "US microplex build: post-microsim checkpoint saved", + path=str(self.config.pipeline_checkpoint_save_post_microsim_path), + ) preselection_supported_targets = list(supported_targets) target_planning_household_count = len(tables.households) if not supported_targets: @@ -2906,6 +3208,7 @@ def calibrate_policyengine_tables( def _apply_policyengine_constraint_stage( stage_tables: PolicyEngineUSEntityTableBundle, stage_constraints: tuple[LinearConstraint, ...], + stage_index: int = 1, ) -> tuple[PolicyEngineUSEntityTableBundle, pd.DataFrame, dict[str, Any]]: stage_input_household_weight_sum = float( stage_tables.households["household_weight"].sum() @@ -2915,7 +3218,7 @@ def _apply_policyengine_constraint_stage( calibrated_households = stage_tables.households.copy() pre_rescale_household_weight_sum = stage_input_household_weight_sum else: - stage_calibrator = self._build_weight_calibrator() + stage_calibrator = self._build_weight_calibrator(stage_index=stage_index) calibration_constraints = list(stage_constraints) if self.config.policyengine_calibration_target_total_weight is not None: n_hh = len(stage_tables.households) @@ -3030,6 +3333,16 @@ def _apply_policyengine_constraint_stage( } all_selected_targets = list(supported_targets) all_selected_constraints = list(constraints) + # Pre-compute the ledger-needed scalars once, while compiled_constraints' + # coefficient arrays are still live. Downstream calls (ledger + + # deferred-stage selection) read from this lookup instead of + # rescanning the ~4k × 1.5M float64 arrays three times. The + # repeated scans were allocating ~30 GB of transient + # ``np.abs(...)`` copies on top of the 48 GB baseline, a + # contributor to the v8 197 GB-compressed jetsam kill. + compiled_constraint_metadata = _precompute_constraint_metadata( + compiled_constraints + ) updated_tables, calibrated_persons, final_stage_summary = ( _apply_policyengine_constraint_stage( tables, @@ -3048,6 +3361,7 @@ def _apply_policyengine_constraint_stage( household_count=target_planning_household_count, min_active_households=self.config.policyengine_calibration_min_active_households, materialization_failures=materialization_failures, + compiled_constraint_metadata=compiled_constraint_metadata, ) oracle_loss, oracle_target_priority_lookup = ( _evaluate_policyengine_target_fit_context( @@ -3266,6 +3580,7 @@ def _append_stage_summary( top_geography_count=( self.config.policyengine_calibration_deferred_stage_top_geography_count ), + compiled_constraint_metadata=compiled_constraint_metadata, ) ) if not stage_targets: @@ -3285,6 +3600,7 @@ def _append_stage_summary( _apply_policyengine_constraint_stage( updated_tables, stage_constraints, + stage_index=stage_index, ) ) candidate_selected_stage_by_name = dict(selected_stage_by_name) @@ -3313,6 +3629,7 @@ def _append_stage_summary( self.config.policyengine_calibration_min_active_households ), materialization_failures=materialization_failures, + compiled_constraint_metadata=compiled_constraint_metadata, ) ) candidate_oracle_loss, candidate_target_priority_lookup = ( @@ -3515,9 +3832,15 @@ def _resolve_policyengine_calibration_targets( period=target_period, for_calibration=True, ).targets + force_materialize_variables = policyengine_us_formula_variables_for_targets( + canonical_targets, + simulation_cls=self.config.policyengine_simulation_cls, + direct_override_variables=self.config.policyengine_direct_override_variables, + ) missing_variables = policyengine_us_variables_to_materialize( canonical_targets, bindings, + force_materialize_variables=force_materialize_variables, ) materialization_failures: dict[str, str] = {} materialized_variables: set[str] = set() @@ -3528,8 +3851,20 @@ def _resolve_policyengine_calibration_targets( period=target_period, dataset_year=self.config.policyengine_dataset_year or target_period, simulation_cls=self.config.policyengine_simulation_cls, + direct_override_variables=self.config.policyengine_direct_override_variables, + batch_size=self.config.policyengine_materialize_batch_size, ) tables = materialization_result.tables + unmaterialized_forced_variables = ( + force_materialize_variables + & missing_variables + - set(materialization_result.bindings) + ) + bindings = { + variable: binding + for variable, binding in bindings.items() + if variable not in unmaterialized_forced_variables + } bindings = { **bindings, **materialization_result.bindings, @@ -3840,15 +4175,6 @@ def _build_donor_imputer( variable: variable_semantic_spec_for(variable).support_family for variable in target_vars } - zero_inflated_vars = ( - { - variable - for variable, support_family in support_families.items() - if support_family is VariableSupportFamily.ZERO_INFLATED_POSITIVE - } - if backend == "zi_qrf" - else set() - ) nonnegative_vars = { variable for variable, support_family in support_families.items() @@ -3858,6 +4184,23 @@ def _build_donor_imputer( VariableSupportFamily.BOUNDED_SHARE, } } + if backend == "regime_aware": + return RegimeAwareDonorImputer( + condition_vars=condition_vars, + target_vars=list(target_vars), + n_estimators=self.config.donor_imputer_qrf_n_estimators, + nonnegative_vars=nonnegative_vars, + seed=self.config.random_seed, + ) + zero_inflated_vars = ( + { + variable + for variable, support_family in support_families.items() + if support_family is VariableSupportFamily.ZERO_INFLATED_POSITIVE + } + if backend == "zi_qrf" + else set() + ) return ColumnwiseQRFDonorImputer( condition_vars=condition_vars, target_vars=list(target_vars), @@ -6777,3 +7120,60 @@ def build_us_microplex( """Convenience wrapper for the US microplex pipeline.""" pipeline = USMicroplexPipeline(config) return pipeline.build(persons, households) + + +@dataclass +class USMicroplexRecalibrateResult: + """Output of ``recalibrate_policyengine_us_from_checkpoint``. + + Narrower than ``USMicroplexBuildResult`` because synthesis state is + unavailable when resuming: no ``seed_data``, no ``synthesizer``, no + source frames. Only calibration output is populated. + """ + + config: USMicroplexBuildConfig + loaded_stage: str + checkpoint_path: Path + policyengine_tables: PolicyEngineUSEntityTableBundle + calibrated_data: pd.DataFrame + calibration_summary: dict[str, Any] + + +def recalibrate_policyengine_us_from_checkpoint( + config: USMicroplexBuildConfig, + checkpoint_path: str | Path, +) -> USMicroplexRecalibrateResult: + """Load a saved pipeline checkpoint and rerun calibration against it. + + Use for fast iteration on calibration config (backend, lambda + schedule, targets) without paying the ~11 h synthesis + donor + imputation cost that produced the bundle. Both + ``post_imputation`` and ``post_microsim`` checkpoints are + supported: the latter skips microsim too because + ``infer_policyengine_us_variable_bindings`` picks up the + materialized target vars as columns on the bundle, so + ``policyengine_us_variables_to_materialize`` returns an empty set + and ``_resolve_policyengine_calibration_targets`` short-circuits + past the materialization call. + """ + checkpoint_path = Path(checkpoint_path) + bundle, metadata = load_us_pipeline_checkpoint(checkpoint_path) + stage = metadata.get("stage") + if stage not in {"post_imputation", "post_microsim"}: + raise ValueError( + f"Cannot resume from checkpoint stage {stage!r}; expected " + "'post_imputation' or 'post_microsim'." + ) + + pipeline = USMicroplexPipeline(config) + policyengine_tables, calibrated_data, calibration_summary = ( + pipeline.calibrate_policyengine_tables(bundle) + ) + return USMicroplexRecalibrateResult( + config=config, + loaded_stage=stage, + checkpoint_path=checkpoint_path, + policyengine_tables=policyengine_tables, + calibrated_data=calibrated_data, + calibration_summary=calibration_summary, + ) diff --git a/src/microplex_us/policyengine/__init__.py b/src/microplex_us/policyengine/__init__.py index fe01590..87d7c1f 100644 --- a/src/microplex_us/policyengine/__init__.py +++ b/src/microplex_us/policyengine/__init__.py @@ -39,6 +39,7 @@ infer_policyengine_us_variable_bindings, load_policyengine_us_entity_tables, materialize_policyengine_us_variables, + policyengine_us_formula_variables_for_targets, policyengine_us_variables_to_materialize, project_frame_to_time_period_arrays, write_policyengine_us_time_period_dataset, @@ -79,6 +80,7 @@ "infer_policyengine_us_variable_bindings", "load_policyengine_us_entity_tables", "materialize_policyengine_us_variables", + "policyengine_us_formula_variables_for_targets", "policyengine_us_variables_to_materialize", "project_frame_to_time_period_arrays", "write_policyengine_us_time_period_dataset", diff --git a/src/microplex_us/policyengine/comparison.py b/src/microplex_us/policyengine/comparison.py index 915a911..7ca66f6 100644 --- a/src/microplex_us/policyengine/comparison.py +++ b/src/microplex_us/policyengine/comparison.py @@ -34,6 +34,8 @@ infer_policyengine_us_variable_bindings, load_policyengine_us_entity_tables, materialize_policyengine_us_variables_safely, + policyengine_us_formula_variables_for_targets, + policyengine_us_variables_to_materialize, ) POLICYENGINE_US_BENCHMARK_GROUP_FIELDS = ( @@ -363,20 +365,35 @@ def evaluate_policyengine_us_target_set( target_list = _normalize_target_list(targets) working_tables = tables bindings = infer_policyengine_us_variable_bindings(working_tables) + force_materialize_variables = policyengine_us_formula_variables_for_targets( + target_list, + simulation_cls=simulation_cls, + direct_override_variables=direct_override_variables, + ) + variables_to_materialize = policyengine_us_variables_to_materialize( + target_list, + bindings, + force_materialize_variables=force_materialize_variables, + ) materialization_result = materialize_policyengine_us_variables_safely( working_tables, - variables=tuple( - feature - for target in target_list - for feature in target.required_features - if feature not in bindings - ), + variables=tuple(sorted(variables_to_materialize)), period=period, dataset_year=dataset_year, simulation_cls=simulation_cls, direct_override_variables=direct_override_variables, ) working_tables = materialization_result.tables + unmaterialized_forced_variables = ( + force_materialize_variables + & variables_to_materialize + - set(materialization_result.bindings) + ) + bindings = { + variable: binding + for variable, binding in bindings.items() + if variable not in unmaterialized_forced_variables + } bindings = { **bindings, **materialization_result.bindings, diff --git a/src/microplex_us/policyengine/us.py b/src/microplex_us/policyengine/us.py index ee0c4e7..8e53258 100644 --- a/src/microplex_us/policyengine/us.py +++ b/src/microplex_us/policyengine/us.py @@ -139,6 +139,106 @@ def table_for(self, entity: EntityType) -> pd.DataFrame: raise KeyError(f"No table available for entity '{entity.value}'") +_PIPELINE_CHECKPOINT_TABLES: tuple[str, ...] = ( + "households", + "persons", + "tax_units", + "spm_units", + "families", + "marital_units", +) + +_ALLOWED_CHECKPOINT_STAGES: frozenset[str] = frozenset({"post_imputation", "post_microsim"}) + + +def save_us_pipeline_checkpoint( + bundle: PolicyEngineUSEntityTableBundle, + path: str | Path, + *, + stage: Literal["post_imputation", "post_microsim"], +) -> Path: + """Persist a pipeline-stage bundle to ``path`` as parquet + metadata. + + Writes one parquet file per non-None entity table plus a + ``metadata.json`` index tagged with the pipeline ``stage``. Two + stages are supported: + + * ``"post_imputation"`` — after donor imputation, before PE microsim + materializes target variables. Resuming from here reruns + microsim + calibration. + * ``"post_microsim"`` — after microsim materialization, before the + calibration fit loop. Resuming from here reruns only calibration. + """ + import json + import shutil + + if stage not in _ALLOWED_CHECKPOINT_STAGES: + raise ValueError( + f"stage must be one of {sorted(_ALLOWED_CHECKPOINT_STAGES)}; got {stage!r}" + ) + + checkpoint_dir = Path(path) + if checkpoint_dir.exists(): + shutil.rmtree(checkpoint_dir) + checkpoint_dir.mkdir(parents=True) + + metadata: dict[str, Any] = {"format_version": 1, "stage": stage} + for table_name in _PIPELINE_CHECKPOINT_TABLES: + frame = getattr(bundle, table_name) + if frame is None: + metadata[table_name] = None + continue + frame.to_parquet(checkpoint_dir / f"{table_name}.parquet", index=False) + metadata[table_name] = { + "rows": int(len(frame)), + "columns": list(frame.columns), + } + + (checkpoint_dir / "metadata.json").write_text(json.dumps(metadata, indent=2)) + return checkpoint_dir + + +def load_us_pipeline_checkpoint( + path: str | Path, + *, + expected_stage: Literal["post_imputation", "post_microsim"] | None = None, +) -> tuple[PolicyEngineUSEntityTableBundle, dict[str, Any]]: + """Load a pipeline-stage bundle previously saved by ``save_us_pipeline_checkpoint``. + + Returns ``(bundle, metadata)`` so callers can inspect the saved + stage. If ``expected_stage`` is provided, a mismatch raises a clear + error — protects against running recalibration from a post-microsim + checkpoint when a post-imputation checkpoint was expected or vice + versa. + """ + import json + + checkpoint_dir = Path(path) + metadata_path = checkpoint_dir / "metadata.json" + if not metadata_path.exists(): + raise FileNotFoundError( + f"US pipeline checkpoint not found at {checkpoint_dir}" + ) + metadata = json.loads(metadata_path.read_text()) + + saved_stage = metadata.get("stage") + if expected_stage is not None and saved_stage != expected_stage: + raise ValueError( + f"Checkpoint at {checkpoint_dir} has stage {saved_stage!r}, " + f"expected {expected_stage!r}" + ) + + tables: dict[str, pd.DataFrame | None] = {} + for table_name in _PIPELINE_CHECKPOINT_TABLES: + if metadata.get(table_name) is None: + tables[table_name] = None + continue + tables[table_name] = pd.read_parquet( + checkpoint_dir / f"{table_name}.parquet" + ) + return PolicyEngineUSEntityTableBundle(**tables), metadata + + @dataclass(frozen=True) class PolicyEngineUSVariableMaterializationResult: """Materialized PE variables plus any per-variable failures.""" @@ -186,7 +286,7 @@ class PolicyEngineUSVariableMaterializationResult: "other_medical_expenses", "over_the_counter_health_expenses", "self_employment_income_before_lsr", - "social_security_retirement", + "social_security_retirement_reported", "social_security_disability", "social_security_survivors", "social_security_dependents", @@ -227,6 +327,7 @@ class PolicyEngineUSVariableMaterializationResult: POLICYENGINE_US_EXPORT_COLUMN_ALIASES: dict[str, str] = { "race": "cps_race", + "social_security_retirement": "social_security_retirement_reported", } POLICYENGINE_US_EXPORT_DEFAULTS: dict[str, Any] = { @@ -1181,6 +1282,66 @@ def resolve_policyengine_excluded_export_variables( return excluded_variables +def subset_policyengine_tables_by_households( + tables: PolicyEngineUSEntityTableBundle, + household_ids: np.ndarray | pd.Index, +) -> PolicyEngineUSEntityTableBundle: + """Slice an entity bundle to a subset of household_ids, preserving order. + + The returned bundle's ``households`` frame is reordered to match the + order of ``household_ids``; related entity tables retain their own + internal order but are filtered to only rows whose ``household_id`` + is in the selection. + """ + selected = pd.Index(household_ids, name="household_id") + order = pd.Series(np.arange(len(selected)), index=selected) + + households = tables.households.loc[ + tables.households["household_id"].isin(selected) + ].copy() + households = ( + households.assign(_hh_order=households["household_id"].map(order)) + .sort_values("_hh_order") + .drop(columns="_hh_order") + .reset_index(drop=True) + ) + + def _slice(df: pd.DataFrame | None) -> pd.DataFrame | None: + if df is None: + return None + return df.loc[df["household_id"].isin(selected)].reset_index(drop=True) + + return PolicyEngineUSEntityTableBundle( + households=households, + persons=_slice(tables.persons), + tax_units=_slice(tables.tax_units), + spm_units=_slice(tables.spm_units), + families=_slice(tables.families), + marital_units=_slice(tables.marital_units), + ) + + +def _concat_bundles( + bundles: list[PolicyEngineUSEntityTableBundle], +) -> PolicyEngineUSEntityTableBundle: + """Concatenate a list of entity bundles into one, preserving order.""" + + def _join(field: str) -> pd.DataFrame | None: + frames = [getattr(b, field) for b in bundles if getattr(b, field) is not None] + if not frames: + return None + return pd.concat(frames, ignore_index=True) + + return PolicyEngineUSEntityTableBundle( + households=_join("households"), + persons=_join("persons"), + tax_units=_join("tax_units"), + spm_units=_join("spm_units"), + families=_join("families"), + marital_units=_join("marital_units"), + ) + + def materialize_policyengine_us_variables( tables: PolicyEngineUSEntityTableBundle, *, @@ -1191,8 +1352,49 @@ def materialize_policyengine_us_variables( microsimulation_kwargs: dict[str, Any] | None = None, temp_dir: str | Path | None = None, direct_override_variables: tuple[str, ...] = (), + batch_size: int | None = None, ) -> tuple[PolicyEngineUSEntityTableBundle, dict[str, PolicyEngineUSVariableBinding]]: - """Calculate PolicyEngine variables on a temporary export and attach them to tables.""" + """Calculate PolicyEngine variables on a temporary export and attach them to tables. + + Memory control: when ``batch_size`` is set, the function loops over + disjoint household chunks of that size, materializing variables on + each chunk (one temp h5 + one Microsimulation per chunk) and + concatenating results. Peak Microsimulation working set drops from + O(n_households) to O(batch_size) with no change in output — this is + additive for the per-household scalar variables we use as calibration + targets (employment income, EITC, CTC, federal income tax, etc.), and + the per-chunk Microsims are independent of each other. + + Variables with cross-household semantics (national quantile + thresholds, poverty rates that depend on the full income + distribution) would be incorrect under batching and are not supported + when ``batch_size`` is not ``None``. Use ``batch_size=None`` for + those. + """ + if batch_size is not None and batch_size > 0: + n_households = len(tables.households) + if n_households > batch_size: + chunk_bundles: list[PolicyEngineUSEntityTableBundle] = [] + chunk_bindings: dict[str, PolicyEngineUSVariableBinding] = {} + household_ids = tables.households["household_id"].to_numpy() + for start in range(0, n_households, batch_size): + end = min(start + batch_size, n_households) + chunk_ids = household_ids[start:end] + chunk_tables = subset_policyengine_tables_by_households(tables, chunk_ids) + chunk_result, chunk_binding = materialize_policyengine_us_variables( + chunk_tables, + variables=variables, + period=period, + dataset_year=dataset_year, + simulation_cls=simulation_cls, + microsimulation_kwargs=microsimulation_kwargs, + temp_dir=temp_dir, + direct_override_variables=direct_override_variables, + batch_size=None, + ) + chunk_bundles.append(chunk_result) + chunk_bindings.update(chunk_binding) + return _concat_bundles(chunk_bundles), chunk_bindings requested_variables = tuple(dict.fromkeys(str(variable) for variable in variables)) if not requested_variables: return tables, {} @@ -1259,8 +1461,16 @@ def materialize_policyengine_us_variables_safely( microsimulation_kwargs: dict[str, Any] | None = None, temp_dir: str | Path | None = None, direct_override_variables: tuple[str, ...] = (), + batch_size: int | None = None, ) -> PolicyEngineUSVariableMaterializationResult: - """Materialize PE variables, degrading to per-variable failures when needed.""" + """Materialize PE variables, degrading to per-variable failures when needed. + + ``batch_size`` forwards to :func:`materialize_policyengine_us_variables`. + With a non-``None`` positive value, the full-dataset Microsimulation + (25–35 GB peak at 1.5M households) is replaced with N per-chunk + Microsims (each ~2–3 GB). Results are concatenated; output is + identical for per-household scalar variables. + """ requested_variables = tuple(dict.fromkeys(str(variable) for variable in variables)) if not requested_variables: return PolicyEngineUSVariableMaterializationResult( @@ -1278,6 +1488,7 @@ def materialize_policyengine_us_variables_safely( microsimulation_kwargs=microsimulation_kwargs, temp_dir=temp_dir, direct_override_variables=direct_override_variables, + batch_size=batch_size, ) except Exception: return _materialize_policyengine_us_variables_one_by_one( @@ -1656,18 +1867,70 @@ def compile_supported_policyengine_us_household_linear_constraints( return supported_targets, unsupported_targets, tuple(constraints) +def _policyengine_us_target_required_variables(targets: list[TargetSpec]) -> set[str]: + return { + feature + for target in targets + for feature in target.required_features + } + + +def policyengine_us_formula_variables_for_targets( + targets: list[TargetSpec], + *, + simulation_cls: Any | None = None, + tax_benefit_system: Any | None = None, + direct_override_variables: tuple[str, ...] = (), +) -> set[str]: + """Return target features that should be recalculated by PolicyEngine.""" + required_variables = _policyengine_us_target_required_variables(targets) + if not required_variables: + return set() + if tax_benefit_system is None: + tax_benefit_system = _resolve_policyengine_us_tax_benefit_system( + simulation_cls + ) + variables = getattr(tax_benefit_system, "variables", {}) + direct_overrides = set(direct_override_variables) + formula_variables: set[str] = set() + for variable in required_variables: + if variable in direct_overrides: + continue + variable_metadata = variables.get(variable) + if variable_metadata is None: + continue + if _policyengine_us_variable_is_calculated(variable_metadata): + formula_variables.add(variable) + return formula_variables + + +def _policyengine_us_variable_is_calculated(variable_metadata: Any) -> bool: + if getattr(variable_metadata, "formulas", {}): + return True + if getattr(variable_metadata, "adds", ()) or getattr(variable_metadata, "subtracts", ()): + return True + is_input_variable = getattr(variable_metadata, "is_input_variable", None) + if callable(is_input_variable): + try: + return not bool(is_input_variable()) + except TypeError: + return False + return False + + def policyengine_us_variables_to_materialize( targets: list[TargetSpec], bindings: dict[str, PolicyEngineUSVariableBinding], + *, + force_materialize_variables: set[str] | tuple[str, ...] | None = None, ) -> set[str]: """Compute the missing features required to score the given targets.""" - requested_variables = { - feature - for target in targets - for feature in target.required_features - } + requested_variables = _policyengine_us_target_required_variables(targets) + force_variables = set(force_materialize_variables or ()) return { - variable for variable in requested_variables if variable not in bindings + variable + for variable in requested_variables + if variable not in bindings or variable in force_variables } diff --git a/src/microplex_us/validation/downstream.py b/src/microplex_us/validation/downstream.py new file mode 100644 index 0000000..19091e9 --- /dev/null +++ b/src/microplex_us/validation/downstream.py @@ -0,0 +1,227 @@ +"""Downstream tax-benefit aggregate validation (paper reviewer response B2). + +Input-target validation (see ``soi.py``, ``baseline.py``) asks whether +the calibrated synthetic frame's marginal sums match administrative +totals on the *variables the calibrator was told to target*. +Downstream validation asks the different, stricter question: when the +calibrated frame is ingested by ``policyengine_us.Microsimulation``, +do the *computed policy outputs* — federal income tax, EITC, CTC, +SNAP, SSI, ACA PTC — match administrative aggregates? + +This module contains: + +- ``DownstreamBenchmark`` record (name, computed, benchmark, unit, source). +- ``DOWNSTREAM_BENCHMARKS_2024`` canonical 2024 benchmark set. Each + record is sourced to an IRS / USDA / SSA / CMS / CBO publication. +- ``compute_downstream_aggregates(dataset_path, period)`` runs the + simulation and returns a dict of variable → weighted sum. +- ``compute_downstream_comparison(aggregates, benchmarks)`` joins + computed values to benchmarks and returns per-variable errors. + +Benchmark numbers are rounded publicly-reported totals; each has a +citation. Updates should be traceable to the cited source. +""" + +from __future__ import annotations + +from collections.abc import Iterable +from dataclasses import dataclass +from pathlib import Path + +import numpy as np + + +@dataclass(frozen=True) +class DownstreamBenchmark: + """One external-benchmark comparison. + + ``benchmark`` is the published external aggregate (e.g. IRS SOI + total EITC disbursed 2024). ``computed`` is the aggregate computed + on the calibrated synthetic frame by ``policyengine_us``. + """ + + name: str + computed: float + benchmark: float + unit: str + source: str + + @property + def abs_error(self) -> float: + return self.computed - self.benchmark + + @property + def rel_error(self) -> float | None: + if self.benchmark == 0: + return None + return (self.computed - self.benchmark) / self.benchmark + + def to_dict(self) -> dict[str, object]: + return { + "name": self.name, + "computed": self.computed, + "benchmark": self.benchmark, + "unit": self.unit, + "source": self.source, + "abs_error": self.abs_error, + "rel_error": self.rel_error, + } + + +@dataclass(frozen=True) +class DownstreamBenchmarkSpec: + """A benchmark definition without a computed value attached.""" + + name: str + benchmark: float + unit: str + source: str + + +DOWNSTREAM_BENCHMARKS_2024: tuple[DownstreamBenchmarkSpec, ...] = ( + DownstreamBenchmarkSpec( + name="income_tax", + benchmark=2_400_000_000_000.0, + unit="USD", + source=( + "IRS SOI 2022 total federal individual income tax liability " + "~$2.22T; CBO 2024 projection ~$2.4T" + ), + ), + DownstreamBenchmarkSpec( + name="eitc", + benchmark=64_000_000_000.0, + unit="USD", + source="IRS SOI 2023 EITC disbursed ~$64B (Table 2.5)", + ), + DownstreamBenchmarkSpec( + name="ctc", + benchmark=115_000_000_000.0, + unit="USD", + source=( + "IRS SOI 2023 CTC disbursed ~$115B (pre-OBBBA CTC of $2,000 " + "per qualifying child)" + ), + ), + DownstreamBenchmarkSpec( + name="snap", + benchmark=100_000_000_000.0, + unit="USD", + source="USDA FNS FY2024 SNAP benefits total ~$100B", + ), + DownstreamBenchmarkSpec( + name="ssi", + benchmark=66_000_000_000.0, + unit="USD", + source="SSA SSI Annual Statistical Report 2024 ~$66B total payments", + ), + DownstreamBenchmarkSpec( + name="aca_ptc", + benchmark=60_000_000_000.0, + unit="USD", + source=( + "CMS/IRS ACA Advance Premium Tax Credit & reconciled PTC " + "2024 ~$60B (IRA-enhanced subsidies in effect)" + ), + ), +) + +ENTITY_WEIGHT_VARIABLES: dict[str, str] = { + "household": "household_weight", + "person": "person_weight", + "tax_unit": "tax_unit_weight", + "spm_unit": "spm_unit_weight", + "family": "family_weight", + "marital_unit": "marital_unit_weight", +} + + +def compute_downstream_comparison( + aggregates: dict[str, float], + benchmarks: Iterable[DownstreamBenchmarkSpec], +) -> dict[str, DownstreamBenchmark]: + """Join computed aggregates to their external benchmarks. + + Variables in ``aggregates`` without a matching benchmark are + silently omitted — they're either not in the benchmark set or the + caller passed extra diagnostic values. + """ + benchmark_by_name = {spec.name: spec for spec in benchmarks} + result: dict[str, DownstreamBenchmark] = {} + for name, computed in aggregates.items(): + spec = benchmark_by_name.get(name) + if spec is None: + continue + result[name] = DownstreamBenchmark( + name=name, + computed=float(computed), + benchmark=spec.benchmark, + unit=spec.unit, + source=spec.source, + ) + return result + + +def _coerce_simulation_values(values: object) -> np.ndarray: + raw = getattr(values, "values", values) + return np.asarray(raw, dtype=float) + + +def compute_downstream_weighted_aggregate( + simulation: object, + variable: str, + period: int = 2024, +) -> float: + """Compute one entity-weighted downstream aggregate from a Microsimulation.""" + + tax_benefit_system = getattr(simulation, "tax_benefit_system", None) + if tax_benefit_system is None: + raise ValueError("Microsimulation is missing tax_benefit_system metadata") + entity = tax_benefit_system.get_variable(variable).entity + entity_key = getattr(entity, "key", None) + weight_variable = ENTITY_WEIGHT_VARIABLES.get(entity_key) + if weight_variable is None: + raise ValueError( + f"Unsupported entity {entity_key!r} for downstream aggregate {variable!r}" + ) + + values = _coerce_simulation_values(simulation.calculate(variable, period)) + weights = _coerce_simulation_values(simulation.calculate(weight_variable, period)) + if len(values) != len(weights): + raise ValueError( + f"Downstream aggregate {variable!r} length {len(values)} does not match " + f"{weight_variable!r} length {len(weights)}" + ) + return float(np.dot(values, weights)) + + +def compute_downstream_aggregates( + dataset_path: str | Path, + period: int = 2024, + variables: Iterable[str] = ( + "income_tax", + "eitc", + "ctc", + "snap", + "ssi", + "aca_ptc", + ), +) -> dict[str, float]: + """Load a PolicyEngine-US dataset and compute weighted sums for ``variables``. + + Returns a dict of variable → weighted aggregate (float). Requires + ``policyengine_us`` to be installed. + """ + # Import lazily so the rest of this module (benchmark records, + # comparison function) stays importable in environments without PE. + from policyengine_us import Microsimulation # noqa: PLC0415 + + simulation = Microsimulation(dataset=str(dataset_path)) + aggregates: dict[str, float] = {} + for variable in variables: + aggregates[variable] = compute_downstream_weighted_aggregate( + simulation, + variable, + period, + ) + return aggregates diff --git a/tests/bakeoff/__init__.py b/tests/bakeoff/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tests/bakeoff/test_scale_up.py b/tests/bakeoff/test_scale_up.py new file mode 100644 index 0000000..0a1372f --- /dev/null +++ b/tests/bakeoff/test_scale_up.py @@ -0,0 +1,218 @@ +"""Smoke tests for the synthesizer scale-up harness. + +These tests exercise the harness on a deliberately tiny slice of real +enhanced_cps_2024. They do NOT constitute the scale-up benchmark itself; +that lives behind the CLI and takes significantly longer. + +The goal here is: does the harness load data, fit a synthesizer, compute +metrics, and return a populated ScaleUpResult without crashing? +""" + +from __future__ import annotations + +import importlib.util +from pathlib import Path + +import numpy as np +import pandas as pd +import pytest + +from microplex_us.bakeoff import ( + DEFAULT_CONDITION_COLS, + DEFAULT_TARGET_COLS, + ScaleUpRunner, + ScaleUpStageConfig, + stage1_config, +) + +_ENHANCED_CPS_PATH = ( + Path.home() + / "PolicyEngine/policyengine-us-data/policyengine_us_data/storage/enhanced_cps_2024.h5" +) + +pytestmark = [ + pytest.mark.skipif( + not _ENHANCED_CPS_PATH.exists(), + reason="enhanced_cps_2024.h5 not available locally", + ), + pytest.mark.skipif( + importlib.util.find_spec("prdc") is None, + reason="prdc package not installed (uv pip install prdc)", + ), +] + + +@pytest.fixture(scope="module") +def small_config() -> ScaleUpStageConfig: + """Tiny config — a handful of columns, ~500 rows, one fast method.""" + base = stage1_config() + return ScaleUpStageConfig( + stage="smoke", + n_rows=500, + methods=("ZI-QRF",), + condition_cols=("age", "is_female"), + target_cols=( + "employment_income_last_year", + "self_employment_income_last_year", + "snap_reported", + ), + holdout_frac=0.2, + seed=0, + k=5, + n_generate=400, + data_path=base.data_path, + year=base.year, + rare_cell_checks=(), # skip rare-cell checks in smoke + ) + + +def test_load_frame_returns_expected_shape(small_config: ScaleUpStageConfig) -> None: + runner = ScaleUpRunner(small_config) + df = runner.load_frame() + # n_rows is the upper bound after subsampling; if fewer in source, we get fewer. + assert len(df) <= small_config.n_rows + 1 + assert len(df) > 100 # still a real sample + expected_cols = set(small_config.condition_cols) | set(small_config.target_cols) + assert expected_cols <= set(df.columns) + + +def test_split_train_holdout_shapes(small_config: ScaleUpStageConfig) -> None: + runner = ScaleUpRunner(small_config) + df = runner.load_frame() + train, holdout = runner.split(df) + assert len(train) + len(holdout) == len(df) + # 20 % holdout within ±1 + expected_holdout = int(len(df) * 0.2) + assert abs(len(holdout) - expected_holdout) <= 1 + + +def test_fit_and_generate_returns_dataframe( + small_config: ScaleUpStageConfig, +) -> None: + runner = ScaleUpRunner(small_config) + df = runner.load_frame() + train, _ = runner.split(df) + synthetic, timing = runner.fit_and_generate("ZI-QRF", train, n_generate=200) + + assert isinstance(synthetic, pd.DataFrame) + assert len(synthetic) == 200 + assert timing["fit_wall_seconds"] >= 0 + assert timing["generate_wall_seconds"] >= 0 + assert timing["peak_rss_gb_during_fit"] > 0 + + +def test_run_returns_populated_result(small_config: ScaleUpStageConfig) -> None: + runner = ScaleUpRunner(small_config) + results = runner.run() + assert len(results) == 1 + r = results[0] + assert r.method == "ZI-QRF" + assert r.stage == "smoke" + # PRDC values in [0, 1]. + for val in (r.precision, r.density, r.coverage): + assert 0.0 <= val <= 1.0 + 1e-9 + # Zero-rate MAE in [0, 1]. + assert 0.0 <= r.zero_rate_mae <= 1.0 + assert r.n_train_rows > 0 + assert r.n_holdout_rows > 0 + assert r.n_cols == 5 # 2 cond + 3 target + + +def test_missing_column_raises_cleanly() -> None: + cfg = ScaleUpStageConfig( + stage="smoke", + n_rows=100, + methods=("ZI-QRF",), + condition_cols=("age", "definitely_not_a_real_column"), + target_cols=("employment_income_last_year",), + data_path=_ENHANCED_CPS_PATH, + rare_cell_checks=(), + ) + runner = ScaleUpRunner(cfg) + with pytest.raises(KeyError, match="definitely_not_a_real_column"): + runner.load_frame() + + +def test_default_column_sets_are_sensible() -> None: + """Sanity check on the curated default column list.""" + total = set(DEFAULT_CONDITION_COLS) | set(DEFAULT_TARGET_COLS) + assert len(total) == len(DEFAULT_CONDITION_COLS) + len(DEFAULT_TARGET_COLS), ( + "Default conditioning and target columns overlap" + ) + assert len(DEFAULT_CONDITION_COLS) >= 5 + assert len(DEFAULT_TARGET_COLS) >= 20 + assert len(total) <= 60, "Stage-1 default exceeds ~50-column budget" + + +def test_incremental_jsonl_persists_each_method( + small_config: ScaleUpStageConfig, tmp_path: Path +) -> None: + """Each completed method gets written as JSONL before the next starts.""" + import json as _json + + runner = ScaleUpRunner(small_config) + incremental = tmp_path / "stage_incremental.jsonl" + results = runner.run(incremental_path=incremental) + + assert incremental.exists() + lines = [ln for ln in incremental.read_text().splitlines() if ln.strip()] + assert len(lines) == len(results) + # Round-trip: each line decodes to a ScaleUpResult-shaped dict. + for line in lines: + d = _json.loads(line) + assert {"method", "stage", "coverage", "fit_wall_seconds"} <= set(d) + + +def test_method_kwargs_forwarded_to_constructor( + small_config: ScaleUpStageConfig, +) -> None: + """Method-level hyperparameter overrides reach the method class.""" + # ZI-QRF accepts n_estimators as a constructor kwarg. Override to + # 3 trees so we can verify it propagates. + cfg = ScaleUpStageConfig( + stage=small_config.stage, + n_rows=small_config.n_rows, + methods=("ZI-QRF",), + condition_cols=small_config.condition_cols, + target_cols=small_config.target_cols, + holdout_frac=small_config.holdout_frac, + seed=small_config.seed, + k=small_config.k, + n_generate=small_config.n_generate, + data_path=small_config.data_path, + year=small_config.year, + rare_cell_checks=small_config.rare_cell_checks, + method_kwargs={"ZI-QRF": {"n_estimators": 3}}, + ) + runner = ScaleUpRunner(cfg) + df = runner.load_frame() + train, _ = runner.split(df) + synthetic, _ = runner.fit_and_generate("ZI-QRF", train, n_generate=50) + assert len(synthetic) == 50 + + +def test_zero_rate_per_column_populated(small_config: ScaleUpStageConfig) -> None: + """Per-column zero-rate breakdown is recorded for every target column.""" + runner = ScaleUpRunner(small_config) + results = runner.run() + assert len(results) == 1 + r = results[0] + assert r.zero_rate_per_column, "Expected non-empty zero_rate_per_column" + for col, entry in r.zero_rate_per_column.items(): + assert set(entry) == {"real", "synth", "abs_diff"} + assert 0.0 <= entry["real"] <= 1.0 + assert 0.0 <= entry["synth"] <= 1.0 + assert entry["abs_diff"] >= 0.0 + # abs_diff should be consistent with real/synth values. + assert abs(entry["abs_diff"] - abs(entry["real"] - entry["synth"])) < 1e-9 + # Confirm all target columns are covered. + covered = set(r.zero_rate_per_column) + assert set(small_config.target_cols) <= covered + # And that the scalar MAE is close to the mean of abs_diff over target cols. + target_diffs = [ + r.zero_rate_per_column[c]["abs_diff"] for c in small_config.target_cols + ] + # MAE is averaged over all shared columns (conditioning + target), so this + # is only a rough consistency check: the per-target mean should be + # within the scalar MAE's ballpark. + assert min(target_diffs) <= r.zero_rate_mae + 1e-9 diff --git a/tests/calibration/__init__.py b/tests/calibration/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tests/calibration/test_microcalibrate_adapter.py b/tests/calibration/test_microcalibrate_adapter.py new file mode 100644 index 0000000..fd6c338 --- /dev/null +++ b/tests/calibration/test_microcalibrate_adapter.py @@ -0,0 +1,233 @@ +"""Small-scale smoke tests for the microcalibrate-backed calibration adapter. + +These exercise the adapter's interface contract (matches the legacy +`Calibrator.fit_transform` shape) and verify that the underlying +gradient-descent chi-squared solver actually moves weights toward the +requested targets on a deliberately small problem. + +Scale-up validation happens separately (see +`docs/synthesizer-benchmark-scale-up.md`). These tests are only expected +to run in seconds. +""" + +from __future__ import annotations + +import numpy as np +import pandas as pd +import pytest +from microplex.calibration import LinearConstraint + +from microplex_us.calibration import ( + MicrocalibrateAdapter, + MicrocalibrateAdapterConfig, +) + + +def _toy_data(n_records: int = 100, seed: int = 0) -> pd.DataFrame: + rng = np.random.default_rng(seed) + return pd.DataFrame( + { + "age": rng.integers(18, 70, size=n_records), + "income": rng.normal(40_000, 20_000, size=n_records).clip(0, None), + "weight": np.ones(n_records), + } + ) + + +def _age_band_constraint( + data: pd.DataFrame, name: str, low: int, high: int, target: float +) -> LinearConstraint: + mask = (data["age"] >= low) & (data["age"] < high) + return LinearConstraint( + name=name, + coefficients=mask.astype(float).to_numpy(), + target=target, + ) + + +def _income_age_band_constraint( + data: pd.DataFrame, name: str, low: int, high: int, target: float +) -> LinearConstraint: + mask = (data["age"] >= low) & (data["age"] < high) + coefs = (mask.astype(float) * data["income"]).to_numpy() + return LinearConstraint(name=name, coefficients=coefs, target=target) + + +class TestInterfaceContract: + """Adapter matches the legacy `Calibrator.fit_transform` signature.""" + + def test_empty_constraints_returns_copy_unchanged(self) -> None: + data = _toy_data() + adapter = MicrocalibrateAdapter() + result = adapter.fit_transform(data, marginal_targets={}) + pd.testing.assert_frame_equal(result, data) + # Should not share storage with the input. + assert result is not data + + def test_weight_column_validation(self) -> None: + data = _toy_data().drop(columns=["weight"]) + adapter = MicrocalibrateAdapter() + with pytest.raises(ValueError, match="weight column 'weight' not found"): + adapter.fit_transform( + data, + marginal_targets={}, + linear_constraints=( + _age_band_constraint(_toy_data(), "age_18_30", 18, 30, 20.0), + ), + ) + + def test_constraint_shape_validation(self) -> None: + data = _toy_data() + adapter = MicrocalibrateAdapter() + bad_constraint = LinearConstraint( + name="wrong_shape", + coefficients=np.ones(len(data) + 5), + target=10.0, + ) + with pytest.raises(ValueError, match="constraint 'wrong_shape'"): + adapter.fit_transform( + data, + marginal_targets={}, + linear_constraints=(bad_constraint,), + ) + + def test_preserves_all_records(self) -> None: + data = _toy_data() + adapter = MicrocalibrateAdapter( + MicrocalibrateAdapterConfig(epochs=8, noise_level=0.0) + ) + constraint = _age_band_constraint(data, "age_18_40", 18, 40, target=30.0) + result = adapter.fit_transform( + data, + marginal_targets={}, + linear_constraints=(constraint,), + ) + # Identity preservation: every record survives. + assert len(result) == len(data) + pd.testing.assert_index_equal(result.index, data.index) + # No negative weights. + assert (result["weight"] >= 0).all() + + +class TestCalibrationMovesWeights: + """Adapter actually does the job — weights shift toward the targets.""" + + def test_single_constraint_converges(self) -> None: + """One age-band count constraint should be matched within tolerance.""" + data = _toy_data(n_records=200, seed=1) + # Current weighted count in [25, 45) band. + mask = (data["age"] >= 25) & (data["age"] < 45) + current_count = float(mask.sum()) + # Ask for 2x the current weighted count. + target = 2.0 * current_count + + constraint = _age_band_constraint(data, "age_25_45", 25, 45, target=target) + adapter = MicrocalibrateAdapter( + MicrocalibrateAdapterConfig( + epochs=400, + learning_rate=0.05, + noise_level=0.0, + ) + ) + result = adapter.fit_transform( + data, + marginal_targets={}, + linear_constraints=(constraint,), + ) + + validation = adapter.validate(result) + errors = validation["linear_errors"] + assert "age_25_45" in errors + # 5 % relative tolerance is generous for 400 epochs on 1 constraint. + assert errors["age_25_45"]["relative_error"] < 0.05 + # Weighted count actually moved. + weighted_count = float( + (result["age"] >= 25).values + * (result["age"] < 45).values + * result["weight"].to_numpy() + ).sum() if False else float(result.loc[mask, "weight"].sum()) + # Should be close to target; at least 1.5x original (we asked for 2x). + assert weighted_count > 1.5 * current_count + + def test_two_orthogonal_constraints_both_improve(self) -> None: + """Separate age-band and income-age-band constraints should both reduce.""" + data = _toy_data(n_records=300, seed=2) + + # Current sums. + band_mask = (data["age"] >= 30) & (data["age"] < 50) + current_count = float(band_mask.sum()) + current_income_sum = float(data.loc[band_mask, "income"].sum()) + + constraints = ( + _age_band_constraint( + data, "count_30_50", 30, 50, target=1.4 * current_count + ), + _income_age_band_constraint( + data, "income_30_50", 30, 50, target=1.4 * current_income_sum + ), + ) + + adapter = MicrocalibrateAdapter( + MicrocalibrateAdapterConfig( + epochs=400, + learning_rate=0.05, + noise_level=0.0, + ) + ) + result = adapter.fit_transform( + data, + marginal_targets={}, + linear_constraints=constraints, + ) + + validation = adapter.validate(result) + # Both constraints should get meaningfully closer to target. + # 10 % relative tolerance since there's inherent trade-off between + # count and income-sum constraints on the same band. + for name in ("count_30_50", "income_30_50"): + rel = validation["linear_errors"][name]["relative_error"] + assert rel < 0.10, f"constraint {name} still at rel_error={rel:.3f}" + + +class TestValidationShape: + """Validation output has the keys the downstream pipeline expects.""" + + def test_validation_keys(self) -> None: + data = _toy_data() + adapter = MicrocalibrateAdapter( + MicrocalibrateAdapterConfig(epochs=4, noise_level=0.0) + ) + constraint = _age_band_constraint(data, "a", 18, 40, target=30.0) + _ = adapter.fit_transform( + data, + marginal_targets={}, + linear_constraints=(constraint,), + ) + validation = adapter.validate() + + assert set(validation) == { + "converged", + "max_error", + "sparsity", + "linear_errors", + } + assert isinstance(validation["converged"], bool) + assert isinstance(validation["max_error"], float) + assert 0.0 <= validation["sparsity"] <= 1.0 + assert "a" in validation["linear_errors"] + + entry = validation["linear_errors"]["a"] + assert set(entry) == { + "target", + "estimate", + "relative_error", + "absolute_error", + } + + def test_validation_without_calibration_is_trivially_converged(self) -> None: + adapter = MicrocalibrateAdapter() + validation = adapter.validate() + assert validation["converged"] is True + assert validation["max_error"] == 0.0 + assert validation["sparsity"] == 0.0 + assert validation["linear_errors"] == {} diff --git a/tests/calibration/test_microcalibrate_adapter_memory.py b/tests/calibration/test_microcalibrate_adapter_memory.py new file mode 100644 index 0000000..408f308 --- /dev/null +++ b/tests/calibration/test_microcalibrate_adapter_memory.py @@ -0,0 +1,105 @@ +"""Adapter must not materialize the estimate matrix as float64 pandas. + +At v7 scale (1.5M households x ~500 constraints) the adapter's pre-fix +behavior builds a float64 DataFrame (6 GB) *and* microcalibrate keeps +it alive in memory alongside a float32 torch copy. The combined footprint +pushes the workstation past macOS jetsam kill threshold. + +These tests pin the adapter's memory contract: the estimate matrix passed +to microcalibrate.Calibration must be float32 from the start. Adapter +behavior on small inputs is unchanged; only the dtype is tightened. +""" + +from __future__ import annotations + +from typing import Any +from unittest.mock import patch + +import numpy as np +import pandas as pd +from microplex.calibration import LinearConstraint + +from microplex_us.calibration import MicrocalibrateAdapter + + +def _toy_data(n_records: int = 200, seed: int = 0) -> pd.DataFrame: + rng = np.random.default_rng(seed) + return pd.DataFrame( + { + "age": rng.integers(18, 70, size=n_records), + "income": rng.normal(40_000, 20_000, size=n_records).clip(0, None), + "weight": np.ones(n_records), + } + ) + + +def _age_band( + data: pd.DataFrame, name: str, low: int, high: int, target: float +) -> LinearConstraint: + mask = (data["age"] >= low) & (data["age"] < high) + return LinearConstraint( + name=name, + coefficients=mask.astype(float).to_numpy(), + target=target, + ) + + +class TestEstimateMatrixDtype: + """The adapter must not pass a float64 estimate matrix to Calibration.""" + + def test_estimate_matrix_passed_to_calibration_is_float32(self) -> None: + """Intercept Calibration.__init__ and inspect the estimate_matrix arg.""" + captured: dict[str, Any] = {} + + from microcalibrate import Calibration as _RealCalibration + + original_init = _RealCalibration.__init__ + + def spy_init(self: Any, *args: Any, **kwargs: Any) -> None: + captured["estimate_matrix"] = kwargs.get("estimate_matrix") + original_init(self, *args, **kwargs) + + data = _toy_data() + constraints = ( + _age_band(data, "age_18_30", 18, 30, 40.0), + _age_band(data, "age_30_45", 30, 45, 60.0), + _age_band(data, "age_45_70", 45, 70, 100.0), + ) + adapter = MicrocalibrateAdapter() + with patch.object(_RealCalibration, "__init__", spy_init): + adapter.fit_transform(data, linear_constraints=constraints) + + estimate_matrix = captured["estimate_matrix"] + assert estimate_matrix is not None, "Calibration was not constructed" + + if isinstance(estimate_matrix, pd.DataFrame): + for col, dtype in estimate_matrix.dtypes.items(): + assert dtype == np.float32, ( + f"estimate_matrix column {col!r} is {dtype}, expected float32 " + "(float64 doubles adapter peak memory at v7 scale)" + ) + else: + arr = np.asarray(estimate_matrix) + assert arr.dtype == np.float32, ( + f"estimate_matrix dtype is {arr.dtype}, expected float32" + ) + + def test_weights_still_converge_with_float32(self) -> None: + """Dtype tightening must not break the convergence behavior.""" + from microplex_us.calibration import MicrocalibrateAdapterConfig + + data = _toy_data(n_records=300) + constraints = ( + _age_band(data, "age_18_30", 18, 30, 60.0), + _age_band(data, "age_30_45", 30, 45, 90.0), + _age_band(data, "age_45_70", 45, 70, 150.0), + ) + adapter = MicrocalibrateAdapter( + MicrocalibrateAdapterConfig( + epochs=400, learning_rate=0.05, noise_level=0.0 + ) + ) + result = adapter.fit_transform(data, linear_constraints=constraints) + validation = adapter.validate(result) + # Same tolerance the existing smoke tests in this package use. + assert validation["max_error"] < 0.1, validation diff --git a/tests/calibration/test_us_pipeline_dispatch.py b/tests/calibration/test_us_pipeline_dispatch.py new file mode 100644 index 0000000..8648c5f --- /dev/null +++ b/tests/calibration/test_us_pipeline_dispatch.py @@ -0,0 +1,113 @@ +"""Pipeline-level test: `calibration_backend="microcalibrate"` dispatches to +`MicrocalibrateAdapter` and round-trips one calibration call inside the +USMicroplexPipeline context. + +This is the final link between the adapter and the production pipeline: +the backend string needs to be valid in `USMicroplexBuildConfig`, and +`_build_weight_calibrator` must return an adapter instance that +satisfies the same `fit_transform` / `validate` contract the rest of +`calibrate_policyengine_tables` expects. +""" + +from __future__ import annotations + +import numpy as np +import pandas as pd +import pytest +from microplex.calibration import LinearConstraint + +from microplex_us.calibration import MicrocalibrateAdapter +from microplex_us.pipelines.us import USMicroplexBuildConfig, USMicroplexPipeline + + +def _toy_households(n: int = 100, seed: int = 0) -> pd.DataFrame: + rng = np.random.default_rng(seed) + return pd.DataFrame( + { + "household_id": np.arange(n), + "household_weight": np.ones(n, dtype=float), + "income": rng.normal(80_000, 40_000, n).clip(0, None), + } + ) + + +def test_backend_string_resolves_to_adapter() -> None: + cfg = USMicroplexBuildConfig(calibration_backend="microcalibrate") + pipeline = USMicroplexPipeline(cfg) + calibrator = pipeline._build_weight_calibrator() + assert isinstance(calibrator, MicrocalibrateAdapter) + + +def test_backend_dispatch_fit_transform_end_to_end() -> None: + """Full path: pipeline config → dispatch → fit_transform → validate.""" + cfg = USMicroplexBuildConfig( + calibration_backend="microcalibrate", + calibration_max_iter=200, + ) + pipeline = USMicroplexPipeline(cfg) + calibrator = pipeline._build_weight_calibrator() + + data = _toy_households(n=200, seed=1) + # Constraint: weighted count of households with income > 80k should be 1.4x current. + mask = (data["income"] > 80_000).to_numpy(dtype=float) + target = 1.4 * float(mask.sum()) + constraint = LinearConstraint( + name="above_80k", coefficients=mask, target=target + ) + + result = calibrator.fit_transform( + data, + marginal_targets={}, + weight_col="household_weight", + linear_constraints=(constraint,), + ) + + assert len(result) == len(data) + assert "household_weight" in result.columns + assert (result["household_weight"] >= 0).all() + + validation = calibrator.validate(result) + assert set(validation) == {"converged", "max_error", "sparsity", "linear_errors"} + assert "above_80k" in validation["linear_errors"] + + +def test_invalid_backend_still_raises() -> None: + """Regression test: unknown backend strings surface a clear error.""" + # The Literal type is only checked by static tools; runtime dispatch + # raises a ValueError, which we want to preserve. + cfg = USMicroplexBuildConfig.__dataclass_fields__["calibration_backend"] + # Construct the dataclass bypassing the Literal constraint. + bad_cfg = USMicroplexBuildConfig() + object.__setattr__(bad_cfg, "calibration_backend", "no_such_backend") + pipeline = USMicroplexPipeline(bad_cfg) + with pytest.raises(ValueError, match="Unsupported calibration backend"): + pipeline._build_weight_calibrator() + + +def test_pe_l0_deferred_stage_disables_sparsity_penalty() -> None: + """Stages ≥2 must refine weights without re-sparsifying. + + v10 ran three L0 stages with `lambda_l0=1e-4` each, warm-starting + stages 2/3 from stage 1's already-sparse weights. Loss compounded + pruning down to 1,511 active households — unusable. Stages 2+ now + drop the sparsity penalty so they only reduce residual error. + """ + cfg = USMicroplexBuildConfig(calibration_backend="pe_l0") + pipeline = USMicroplexPipeline(cfg) + + stage1 = pipeline._build_weight_calibrator(stage_index=1) + stage2 = pipeline._build_weight_calibrator(stage_index=2) + stage3 = pipeline._build_weight_calibrator(stage_index=3) + + assert stage1.lambda_l0 == pytest.approx(1e-4) + assert stage2.lambda_l0 == 0.0 + assert stage3.lambda_l0 == 0.0 + + +def test_hardconcrete_deferred_stage_disables_sparsity_penalty() -> None: + cfg = USMicroplexBuildConfig(calibration_backend="hardconcrete") + pipeline = USMicroplexPipeline(cfg) + stage1 = pipeline._build_weight_calibrator(stage_index=1) + stage2 = pipeline._build_weight_calibrator(stage_index=2) + assert stage1.lambda_l0 == pytest.approx(1e-4) + assert stage2.lambda_l0 == 0.0 diff --git a/tests/pipelines/test_artifacts.py b/tests/pipelines/test_artifacts.py index e255c53..1d59cd8 100644 --- a/tests/pipelines/test_artifacts.py +++ b/tests/pipelines/test_artifacts.py @@ -176,19 +176,9 @@ def _create_policyengine_targets_db(path: Path) -> None: t.value, t.period, t.active, - CASE - WHEN t.variable = 'snap' THEN 'state' - ELSE 'district' - END AS geo_level, - CASE - WHEN t.variable = 'snap' THEN '06' - ELSE '0601' - END AS geographic_id, - CASE - WHEN t.variable = 'snap' THEN 'snap' - WHEN t.variable = 'household_count' THEN 'snap' - ELSE NULL - END AS domain_variable + 'state' AS geo_level, + '06' AS geographic_id, + 'household_count' AS domain_variable FROM targets AS t; """ ) @@ -216,7 +206,6 @@ def _create_policyengine_targets_db(path: Path) -> None: """, [ (1, "household_count", 2024, 1, 0, 3.0, 1, None, "test", "count"), - (2, "snap", 2024, 1, 0, 250.0, 1, None, "test", "snap"), ], ) conn.commit() @@ -604,12 +593,11 @@ def test_writes_policyengine_harness_when_baseline_and_targets_are_provided( TargetSet( [ TargetSpec( - name="snap_total", + name="household_count", entity=EntityType.HOUSEHOLD, - value=250.0, + value=3.0, period=2024, - measure="snap", - aggregation="sum", + aggregation="count", ), ] ) @@ -622,9 +610,9 @@ def test_writes_policyengine_harness_when_baseline_and_targets_are_provided( policyengine_baseline_dataset=baseline_dataset, policyengine_harness_slices=( PolicyEngineUSHarnessSlice( - name="snap", - description="SNAP parity", - query=TargetQuery(period=2024, names=("snap_total",)), + name="household_count", + description="Household count parity", + query=TargetQuery(period=2024, names=("household_count",)), ), ), policyengine_harness_metadata={"baseline_dataset": baseline_dataset.name}, @@ -838,7 +826,7 @@ def test_writes_policyengine_harness_from_build_config_defaults(self, tmp_path): policyengine_dataset_year=2024, policyengine_targets_db=str(targets_db), policyengine_baseline_dataset=str(baseline_dataset), - policyengine_target_variables=("snap", "household_count"), + policyengine_target_variables=("household_count",), ), seed_data=pd.DataFrame({"income": [10.0], "hh_weight": [1.0]}), synthetic_data=pd.DataFrame({"income": [10.0, 20.0], "weight": [1.0, 1.0]}), @@ -921,10 +909,7 @@ def test_writes_policyengine_harness_from_build_config_defaults(self, tmp_path): assert harness_payload["metadata"]["targets_db"] == "policyengine_targets.db" assert harness_payload["metadata"]["harness_suite"] == "policyengine_us_all_targets" assert harness_payload["metadata"]["harness_slice_names"] == ["all_targets"] - assert harness_payload["metadata"]["target_variables"] == [ - "snap", - "household_count", - ] + assert harness_payload["metadata"]["target_variables"] == ["household_count"] assert harness_payload["metadata"]["policyengine_us_runtime_version"] is not None assert [slice_payload["name"] for slice_payload in harness_payload["slices"]] == [ "all_targets", diff --git a/tests/pipelines/test_constraint_metadata_lookup.py b/tests/pipelines/test_constraint_metadata_lookup.py new file mode 100644 index 0000000..11a4bca --- /dev/null +++ b/tests/pipelines/test_constraint_metadata_lookup.py @@ -0,0 +1,134 @@ +"""Constraint-metadata precompute + lookup path. + +The calibration stage previously scanned each constraint's dense +1.5M-length coefficient array three separate times during ledger + +deferred-stage-selection. That accounted for ~30 GB of transient +``np.abs(...)`` allocations at v7/v8 scale on top of the ~48 GB +baseline — a contributor to the 172 GB-compressed v7 / 197 GB v8 +jetsam kills. + +Fix: precompute ``active_households`` and ``coefficient_mass`` once +per constraint, then thread a ``metadata_lookup`` dict through +``_build_policyengine_constraint_records`` and +``_constraint_active_household_count`` so the dense arrays aren't +rescanned. These tests pin that contract. +""" + +from __future__ import annotations + +import numpy as np +import pytest +from microplex.calibration import LinearConstraint + +from microplex_us.pipelines.us import ( + _build_policyengine_constraint_records, + _constraint_active_household_count, + _precompute_constraint_metadata, + _strip_constraint_coefficients, +) + + +def _toy_constraints(n_hh: int = 1000) -> tuple[LinearConstraint, ...]: + """Three constraints over ``n_hh`` households with known active counts. + + - ``all_nonzero``: every household has nonzero coefficient (count n_hh) + - ``half``: half the households have nonzero coefficient (count n_hh/2) + - ``rare``: only 10 households have nonzero coefficient + """ + rng = np.random.default_rng(0) + all_nonzero = np.ones(n_hh, dtype=float) + half = np.where(rng.random(n_hh) > 0.5, 1.0, 0.0) + rare = np.zeros(n_hh, dtype=float) + rare[:10] = 1.0 + return ( + LinearConstraint(name="all_nonzero", coefficients=all_nonzero, target=100.0), + LinearConstraint(name="half", coefficients=half, target=200.0), + LinearConstraint(name="rare", coefficients=rare, target=10.0), + ) + + +class TestPrecomputeMetadata: + def test_precomputed_scalars_match_direct_computation(self) -> None: + constraints = _toy_constraints(n_hh=1000) + metadata = _precompute_constraint_metadata(constraints) + for c in constraints: + expected_count = int(np.count_nonzero(np.abs(c.coefficients) > 1e-12)) + expected_mass = float(np.abs(c.coefficients).sum()) + assert metadata[c.name]["active_households"] == expected_count + assert metadata[c.name]["coefficient_mass"] == pytest.approx( + expected_mass, rel=1e-12 + ) + + def test_empty_constraints_produce_empty_metadata(self) -> None: + assert _precompute_constraint_metadata(()) == {} + + +class TestMetadataLookupBypassesCoefficients: + def test_active_household_count_uses_lookup(self) -> None: + constraints = _toy_constraints(n_hh=1000) + metadata = _precompute_constraint_metadata(constraints) + stripped = _strip_constraint_coefficients(constraints) + # Sanity: stripped tuple has no coefficient data to scan. + for c in stripped: + assert c.coefficients.size == 0 + # Without metadata_lookup, active-count on a stripped constraint is 0. + assert _constraint_active_household_count(stripped[0]) == 0 + # With metadata_lookup, the precomputed count is returned. + assert ( + _constraint_active_household_count( + stripped[0], metadata_lookup=metadata + ) + == 1000 + ) + + def test_build_records_uses_lookup_when_coefficients_stripped(self) -> None: + """Integration: records built from stripped constraints + lookup + match records built from the full (unstripped) constraints.""" + + class FakeTarget: + def __init__(self, name: str, geo_level: str = "national"): + self.name = name + self.aggregation = "SUM" + self.metadata = {"geo_level": geo_level} + self.required_features = () + + constraints = _toy_constraints(n_hh=1000) + targets = [ + FakeTarget(name="all_nonzero"), + FakeTarget(name="half"), + FakeTarget(name="rare"), + ] + expected = _build_policyengine_constraint_records(targets, constraints) + + metadata = _precompute_constraint_metadata(constraints) + stripped = _strip_constraint_coefficients(constraints) + actual = _build_policyengine_constraint_records( + targets, stripped, metadata_lookup=metadata + ) + + for exp, act in zip(expected, actual, strict=True): + assert exp["active_households"] == act["active_households"] + assert exp["coefficient_mass"] == pytest.approx( + act["coefficient_mass"], rel=1e-12 + ) + + +class TestBackwardCompatibility: + def test_records_without_lookup_still_work(self) -> None: + """Legacy callers that don't pass metadata_lookup should still get + correct results by scanning the coefficient arrays.""" + + class FakeTarget: + def __init__(self, name: str): + self.name = name + self.aggregation = "SUM" + self.metadata = {"geo_level": "national"} + self.required_features = () + + constraints = _toy_constraints(n_hh=500) + targets = [FakeTarget(name=c.name) for c in constraints] + records = _build_policyengine_constraint_records(targets, constraints) + assert records[0]["active_households"] == 500 + assert records[1]["active_households"] > 200 # ~half + assert records[1]["active_households"] < 300 + assert records[2]["active_households"] == 10 diff --git a/tests/pipelines/test_donor_imputer_negative_preservation.py b/tests/pipelines/test_donor_imputer_negative_preservation.py new file mode 100644 index 0000000..f1c40f2 --- /dev/null +++ b/tests/pipelines/test_donor_imputer_negative_preservation.py @@ -0,0 +1,118 @@ +"""Donor imputer must preserve negative values in zero-inflated-sign-mixed columns. + +v7 bug (`us.py:235`, pre-fix): `ColumnwiseQRFDonorImputer` applies +`y_values > 0` as its nonzero filter. For columns that can be negative +(short-term capital gains, partnership/S-corp income, farm income, +rental income), this drops all negative training rows — the QRF only +sees positives and therefore produces zero-or-positive predictions. +The entire negative tail disappears from the synthetic frame. + +v9 fix: swap the ad-hoc gate for `microimpute.models.ZeroInflatedImputer`, +which auto-detects the three-sign regime and routes negative-gated +records to a negative-only QRF. + +These tests pin the post-fix contract by fitting on a column that +genuinely spans neg/0/pos and asserting negatives survive to the +synthetic output. +""" + +from __future__ import annotations + +import numpy as np +import pandas as pd +import pytest + +pytest.importorskip("quantile_forest") +pytest.importorskip("microimpute") + + +def _three_sign_frame(n: int = 800, seed: int = 0) -> pd.DataFrame: + """Training frame with a three-sign target. + + ~40% negative, ~20% zero, ~40% positive. Positive regime has + distinct distribution from negative regime, so the sign is + predictable from the conditioning variables. + """ + rng = np.random.default_rng(seed) + age = rng.integers(18, 80, size=n).astype(float) + is_female = rng.integers(0, 2, size=n).astype(float) + + # Regime assignment driven by (age, is_female). + logit_pos = -0.5 + 0.05 * (age - 50) # older → more likely positive + logit_neg = 0.5 - 0.05 * (age - 50) # younger → more likely negative + logit_zero = 1.0 - 0.02 * age + + logits = np.stack([logit_neg, logit_zero, logit_pos], axis=1) + logits -= logits.max(axis=1, keepdims=True) + probs = np.exp(logits) + probs /= probs.sum(axis=1, keepdims=True) + + u = rng.random(n) + cum = np.cumsum(probs, axis=1) + regime_idx = (cum >= u[:, None]).argmax(axis=1) + + y = np.zeros(n) + pos_mask = regime_idx == 2 + neg_mask = regime_idx == 0 + y[pos_mask] = 100 + rng.exponential(200, size=pos_mask.sum()) + y[neg_mask] = -(100 + rng.exponential(200, size=neg_mask.sum())) + + return pd.DataFrame( + { + "age": age, + "is_female": is_female, + "short_term_capital_gains": y, + } + ) + + +class TestDonorImputerPreservesNegatives: + """The donor imputer must emit negatives for three-sign training columns.""" + + def test_fit_generate_preserves_negative_predictions(self) -> None: + """The current v7 imputer (`y > 0` gate) should NOT pass this. + The v9 imputer (ZeroInflatedImputer-based) should. + """ + from microplex_us.pipelines.us import ColumnwiseQRFDonorImputer + + train = _three_sign_frame(n=800, seed=0) + # Preconditions on the fixture: genuinely three-sign. + y = train["short_term_capital_gains"].to_numpy() + assert (y > 0).sum() > 50, "fixture should have meaningful positive mass" + assert (y < 0).sum() > 50, "fixture should have meaningful negative mass" + assert (y == 0).sum() > 50, "fixture should have meaningful zero mass" + + imputer = ColumnwiseQRFDonorImputer( + condition_vars=["age", "is_female"], + target_vars=["short_term_capital_gains"], + n_estimators=30, + zero_inflated_vars={"short_term_capital_gains"}, + zero_threshold=0.05, + ) + imputer.fit(train) + + rng = np.random.default_rng(42) + n_gen = 2000 + conditions = pd.DataFrame( + { + "age": rng.integers(18, 80, size=n_gen).astype(float), + "is_female": rng.integers(0, 2, size=n_gen).astype(float), + } + ) + synthetic = imputer.generate(conditions, seed=42) + synth_y = synthetic["short_term_capital_gains"].to_numpy() + + # The core contract: the synthetic output must contain some + # negative values. Under the v7 `y > 0` bug this would be 0. + n_negative = int((synth_y < 0).sum()) + assert n_negative > 0, ( + f"Donor imputer produced no negative values despite training " + f"data having {(y < 0).sum()} negatives. This is the v7 " + "drop-negatives bug." + ) + # Loose sanity: the negative fraction should be materially + # above zero (not just a single fp-edge-case). + assert n_negative / n_gen > 0.05, ( + f"Negative fraction in synthetic = {n_negative / n_gen:.3f}; " + "expected > 5% given the training distribution has ~40% negatives." + ) diff --git a/tests/pipelines/test_recalibrate_from_checkpoint.py b/tests/pipelines/test_recalibrate_from_checkpoint.py new file mode 100644 index 0000000..f13b3c7 --- /dev/null +++ b/tests/pipelines/test_recalibrate_from_checkpoint.py @@ -0,0 +1,134 @@ +"""Recalibrate-from-checkpoint helper. + +Loads a post-imputation bundle previously saved by +``save_us_pipeline_checkpoint`` and calls +``pipeline.calibrate_policyengine_tables`` on it. Used by operators to +iterate on calibration config (backend, lambda schedule, targets) +without paying the ~11 h synthesis + donor-imputation cost that +produced the bundle. + +These tests drive: + +1. The helper loads a post-imputation checkpoint and dispatches the + bundle to a fresh pipeline's calibrate method. +2. The helper rejects post-microsim checkpoints in v1 (resume from that + stage needs pickled constraints, which is a follow-up). +3. The helper raises a clear error if the checkpoint directory is + missing. +""" + +from __future__ import annotations + +from pathlib import Path +from typing import Any +from unittest.mock import MagicMock + +import numpy as np +import pandas as pd +import pytest + +from microplex_us.pipelines.us import USMicroplexBuildConfig +from microplex_us.policyengine.us import ( + PolicyEngineUSEntityTableBundle, + save_us_pipeline_checkpoint, +) + + +def _make_bundle(n: int = 50) -> PolicyEngineUSEntityTableBundle: + rng = np.random.default_rng(0) + household_ids = np.arange(n) + 1 + return PolicyEngineUSEntityTableBundle( + households=pd.DataFrame( + { + "household_id": household_ids, + "household_weight": rng.uniform(0.5, 2.0, size=n), + } + ), + persons=pd.DataFrame( + { + "person_id": household_ids * 10, + "household_id": household_ids, + "age": rng.integers(0, 85, size=n), + } + ), + ) + + +class TestRecalibrateFromPipelineCheckpoint: + @pytest.mark.parametrize("stage", ["post_imputation", "post_microsim"]) + def test_checkpoint_dispatches_to_calibrate( + self, + tmp_path: Path, + monkeypatch: pytest.MonkeyPatch, + stage: str, + ) -> None: + """Both supported stages load their bundle and dispatch to calibrate. + + For ``post_microsim``, microsim is skipped inside + ``_resolve_policyengine_calibration_targets`` because all + materialized vars are present as columns; for + ``post_imputation``, microsim runs normally. The helper only + orchestrates the load and hand-off, so the parametrized test + covers both paths. + """ + from microplex_us.pipelines.us import recalibrate_policyengine_us_from_checkpoint + + bundle = _make_bundle(n=40) + save_us_pipeline_checkpoint( + bundle, tmp_path / "checkpoint", stage=stage + ) + + observed_tables: list[PolicyEngineUSEntityTableBundle] = [] + + def _fake_calibrate( + self: Any, + tables: PolicyEngineUSEntityTableBundle, + ) -> tuple[PolicyEngineUSEntityTableBundle, pd.DataFrame, dict[str, Any]]: + observed_tables.append(tables) + return ( + tables, + tables.households.assign(weight=tables.households["household_weight"]), + {"mock": True}, + ) + + monkeypatch.setattr( + "microplex_us.pipelines.us.USMicroplexPipeline.calibrate_policyengine_tables", + _fake_calibrate, + ) + + cfg = USMicroplexBuildConfig( + calibration_backend="pe_l0", + policyengine_targets_db=tmp_path / "targets.db", + ) + result = recalibrate_policyengine_us_from_checkpoint(cfg, tmp_path / "checkpoint") + + assert len(observed_tables) == 1 + pd.testing.assert_frame_equal( + observed_tables[0].households, bundle.households + ) + assert result.calibration_summary == {"mock": True} + assert result.loaded_stage == stage + pd.testing.assert_frame_equal( + result.policyengine_tables.households, bundle.households + ) + + def test_unsupported_stage_raises(self, tmp_path: Path) -> None: + """A metadata.json with an unknown stage is rejected.""" + from microplex_us.pipelines.us import recalibrate_policyengine_us_from_checkpoint + + (tmp_path / "checkpoint").mkdir() + import json + + (tmp_path / "checkpoint" / "metadata.json").write_text( + json.dumps({"format_version": 1, "stage": "bogus"}) + ) + cfg = USMicroplexBuildConfig(policyengine_targets_db=tmp_path / "targets.db") + with pytest.raises(ValueError, match="Cannot resume"): + recalibrate_policyengine_us_from_checkpoint(cfg, tmp_path / "checkpoint") + + def test_missing_checkpoint_raises(self, tmp_path: Path) -> None: + from microplex_us.pipelines.us import recalibrate_policyengine_us_from_checkpoint + + cfg = USMicroplexBuildConfig(policyengine_targets_db=tmp_path / "targets.db") + with pytest.raises(FileNotFoundError): + recalibrate_policyengine_us_from_checkpoint(cfg, tmp_path / "nope") diff --git a/tests/pipelines/test_regime_aware_donor_imputer.py b/tests/pipelines/test_regime_aware_donor_imputer.py new file mode 100644 index 0000000..34e5274 --- /dev/null +++ b/tests/pipelines/test_regime_aware_donor_imputer.py @@ -0,0 +1,229 @@ +"""Regime-aware donor imputer integration for v9. + +v7 had a `y > 0` bug that dropped negative training rows — fixed +minimally in v8 (commit 8c88277) by relabelling the gate to `y != 0`. +v8's fix makes the QRF see both signs, but it fits ONE QRF over mixed +positive and negative training rows, which allows predictions to land +in the interior band (``max(train_negatives)``, ``min(train_positives)``) +— a region no real record occupies. + +v9 upgrades to `microimpute.models.ZeroInflatedImputer`, which at fit +time auto-detects the three-sign regime per target and routes +predictions through separate positive and negative QRFs. The +interior-band gap becomes a structural guarantee, not a statistical +averaging hope. + +Downstream integration lives under a new `--donor-imputer-backend +regime_aware` option; the existing `qrf` and `zi_qrf` backends stay +unchanged for regression comparison. + +Tests pin: + +1. The new backend value resolves through the factory to a donor + imputer that uses ZeroInflatedImputer internally. +2. On a three-sign training fixture, predictions preserve negatives + (as v8's `y != 0` fix already does). +3. On the same fixture, predictions NEVER land in the interior band + between the positive and negative training regimes — the upgrade + v9 provides over v8. +""" + +from __future__ import annotations + +import numpy as np +import pandas as pd +import pytest + +pytest.importorskip("quantile_forest") +pytest.importorskip("microimpute") + + +def _three_sign_frame_with_gap( + n: int = 1500, seed: int = 0 +) -> pd.DataFrame: + """Fixture with a hard gap between positive and negative training values. + + Positives live in [100, ∞), negatives in (-∞, -100], zeros exactly + at 0. Any prediction that lands in (-100, 100) excluding zero is + an "interior-band violation" — the test metric for the tripartite + advantage. + """ + rng = np.random.default_rng(seed) + age = rng.integers(18, 80, size=n).astype(float) + is_female = rng.integers(0, 2, size=n).astype(float) + + # Three-way regime assignment driven by (age, is_female). + logit_pos = -0.3 + 0.04 * (age - 50) + logit_neg = 0.3 - 0.04 * (age - 50) + logit_zero = 0.2 * (1 - is_female) + logits = np.stack([logit_neg, logit_zero, logit_pos], axis=1) + logits -= logits.max(axis=1, keepdims=True) + probs = np.exp(logits) + probs /= probs.sum(axis=1, keepdims=True) + u = rng.random(n) + cum = np.cumsum(probs, axis=1) + regime_idx = (cum >= u[:, None]).argmax(axis=1) + + y = np.zeros(n) + pos_mask = regime_idx == 2 + neg_mask = regime_idx == 0 + y[pos_mask] = 100.0 + rng.exponential(250, size=pos_mask.sum()) + y[neg_mask] = -(100.0 + rng.exponential(250, size=neg_mask.sum())) + + return pd.DataFrame( + { + "age": age, + "is_female": is_female, + "short_term_capital_gains": y, + } + ) + + +def _count_interior_violations( + predictions: np.ndarray, band: float = 100.0, atol: float = 1e-6 +) -> int: + """Count predictions in the (-band, band) interior, excluding exact zero.""" + interior = (np.abs(predictions) < band) & (np.abs(predictions) > atol) + return int(interior.sum()) + + +class TestRegimeAwareDonorImputerClassExists: + """The new donor imputer must be importable from microplex_us.pipelines.us.""" + + def test_importable_from_us_module(self) -> None: + from microplex_us.pipelines.us import RegimeAwareDonorImputer + + assert RegimeAwareDonorImputer is not None + + +class TestRegimeAwareBackendFactory: + """`_build_donor_imputer(backend='regime_aware')` returns the new class.""" + + def test_factory_dispatches_to_regime_aware(self) -> None: + from microplex_us.pipelines.us import ( + RegimeAwareDonorImputer, + USMicroplexBuildConfig, + USMicroplexPipeline, + ) + + config = USMicroplexBuildConfig( + donor_imputer_backend="regime_aware", + donor_imputer_qrf_n_estimators=25, + ) + pipeline = USMicroplexPipeline(config=config) + imputer = pipeline._build_donor_imputer( + condition_vars=["is_female", "cps_race"], + target_vars=("qualified_dividend_income", "age"), + ) + assert isinstance(imputer, RegimeAwareDonorImputer) + + +class TestRegimeAwareFitGenerate: + """Fit/generate contract and tripartite-specific guarantees.""" + + def _fit_generate( + self, n_train: int = 1500, n_gen: int = 2000, seed: int = 0 + ) -> np.ndarray: + from microplex_us.pipelines.us import RegimeAwareDonorImputer + + train = _three_sign_frame_with_gap(n=n_train, seed=seed) + # Precondition: fixture genuinely three-sign. + y = train["short_term_capital_gains"].to_numpy() + assert (y > 100).sum() > 100 + assert (y < -100).sum() > 100 + assert (y == 0).sum() > 100 + + imputer = RegimeAwareDonorImputer( + condition_vars=["age", "is_female"], + target_vars=["short_term_capital_gains"], + n_estimators=25, + ) + imputer.fit(train) + + rng = np.random.default_rng(42) + conditions = pd.DataFrame( + { + "age": rng.integers(18, 80, size=n_gen).astype(float), + "is_female": rng.integers(0, 2, size=n_gen).astype(float), + } + ) + synthetic = imputer.generate(conditions, seed=42) + return synthetic["short_term_capital_gains"].to_numpy() + + def test_generates_negative_predictions(self) -> None: + """Drop-negatives bug must not recur under regime-aware path.""" + synth_y = self._fit_generate() + n_neg = int((synth_y < 0).sum()) + assert n_neg > 0, ( + "Regime-aware donor imputer produced no negatives on a " + "three-sign training fixture — regression." + ) + assert n_neg / len(synth_y) > 0.05 + + def test_generates_positive_predictions(self) -> None: + synth_y = self._fit_generate() + n_pos = int((synth_y > 0).sum()) + assert n_pos / len(synth_y) > 0.05 + + def test_generates_zero_predictions(self) -> None: + synth_y = self._fit_generate() + n_zero = int((np.abs(synth_y) < 1e-6).sum()) + assert n_zero > 0, "Gate must emit some exact zeros." + + def test_no_interior_band_violations(self) -> None: + """Core v9 advantage over v8. + + v8's `y != 0` fix keeps negatives but fits ONE QRF over mixed + pos+neg training rows, so predictions can interpolate into the + (-100, 100) interior band. v9's regime-aware path fits + separate positive and negative QRFs and routes through a + three-way gate, so the interior is empty by construction. + """ + synth_y = self._fit_generate() + violations = _count_interior_violations(synth_y, band=100.0) + assert violations == 0, ( + f"Regime-aware imputer produced {violations} predictions in " + f"the (-100, 100) interior band, which should be empty by " + f"construction. Sample offenders: " + f"{sorted(synth_y[(np.abs(synth_y) < 100) & (np.abs(synth_y) > 1e-6)][:10])}" + ) + + def test_same_seed_repeats_identically(self) -> None: + from microplex_us.pipelines.us import RegimeAwareDonorImputer + + train = _three_sign_frame_with_gap(n=1200, seed=3) + conditions = train[["age", "is_female"]].head(300).reset_index(drop=True) + imputer = RegimeAwareDonorImputer( + condition_vars=["age", "is_female"], + target_vars=["short_term_capital_gains"], + n_estimators=25, + ) + imputer.fit(train) + + first = imputer.generate(conditions, seed=123)["short_term_capital_gains"].to_numpy() + second = imputer.generate(conditions, seed=123)["short_term_capital_gains"].to_numpy() + third = imputer.generate(conditions, seed=999)["short_term_capital_gains"].to_numpy() + + np.testing.assert_array_equal(first, second) + assert not np.array_equal(first, third) + + def test_same_seed_repeats_identically_for_multiple_targets(self) -> None: + from microplex_us.pipelines.us import RegimeAwareDonorImputer + + train = _three_sign_frame_with_gap(n=1200, seed=4) + train["rental_income"] = -0.5 * train["short_term_capital_gains"] + conditions = train[["age", "is_female"]].head(300).reset_index(drop=True) + imputer = RegimeAwareDonorImputer( + condition_vars=["age", "is_female"], + target_vars=["short_term_capital_gains", "rental_income"], + n_estimators=25, + ) + imputer.fit(train) + + first = imputer.generate(conditions, seed=456) + second = imputer.generate(conditions, seed=456) + third = imputer.generate(conditions, seed=654) + + for column in ("short_term_capital_gains", "rental_income"): + np.testing.assert_array_equal(first[column].to_numpy(), second[column].to_numpy()) + assert not np.array_equal(first[column].to_numpy(), third[column].to_numpy()) diff --git a/tests/pipelines/test_summarize_policyengine_oracle_target_drilldown.py b/tests/pipelines/test_summarize_policyengine_oracle_target_drilldown.py index e773fb6..547b0a3 100644 --- a/tests/pipelines/test_summarize_policyengine_oracle_target_drilldown.py +++ b/tests/pipelines/test_summarize_policyengine_oracle_target_drilldown.py @@ -38,7 +38,7 @@ def test_summarize_us_policyengine_oracle_target_drilldown_filters_saved_artifac provider = PolicyEngineUSDBTargetProvider(db_path) target = provider.load_target_set( - TargetQuery(period=2024, provider_filters={"variables": ["snap"]}) + TargetQuery(period=2024, provider_filters={"variables": ["household_count"]}) ).targets[0] target_ledger = [ _policyengine_target_ledger_entry( @@ -52,8 +52,8 @@ def test_summarize_us_policyengine_oracle_target_drilldown_filters_saved_artifac config = USMicroplexBuildConfig( policyengine_targets_db=str(db_path), policyengine_target_period=2024, - policyengine_target_variables=("snap",), - policyengine_calibration_target_variables=("snap",), + policyengine_target_variables=("household_count",), + policyengine_calibration_target_variables=("household_count",), calibration_backend="entropy", policyengine_dataset_year=2024, ) @@ -79,7 +79,7 @@ def test_summarize_us_policyengine_oracle_target_drilldown_filters_saved_artifac assert summary["summary"]["stageCounts"] == {"solve_now": 1} assert summary["summary"]["largestFamiliesByCappedError"] == [ { - "group": "snap|domain=snap", + "group": "household_count|domain=household_count", "cappedErrorMass": 0.6, "count": 1, "meanCappedError": 0.6, @@ -94,16 +94,16 @@ def test_summarize_us_policyengine_oracle_target_drilldown_filters_saved_artifac } ] assert summary["topRows"][0]["stage"] == "solve_now" - assert summary["topRows"][0]["loss_family"] == "snap|domain=snap" + assert summary["topRows"][0]["loss_family"] == "household_count|domain=household_count" assert summary["topRows"][0]["loss_geography"] == "state:CA" - assert summary["topRows"][0]["actual_value"] == 100.0 - assert summary["topRows"][0]["target_value"] == 250.0 - assert summary["topRows"][0]["driver_variable"] == "snap" + assert summary["topRows"][0]["actual_value"] == 2.0 + assert summary["topRows"][0]["target_value"] == 5.0 + assert summary["topRows"][0]["driver_variable"] == "household_count" assert summary["topRows"][0]["provenance_class"] == "stored_input" family_summary = summarize_us_policyengine_oracle_target_drilldown( bundle_dir, - family="snap|domain=snap", + family="household_count|domain=household_count", geography="state:CA", stage="solve_now", top_k=5, @@ -112,16 +112,76 @@ def test_summarize_us_policyengine_oracle_target_drilldown_filters_saved_artifac assert family_summary["topRows"][0]["target_name"] == summary["topRows"][0]["target_name"] -def _write_policyengine_dataset(path: Path) -> None: - tables = PolicyEngineUSEntityTableBundle( - households=pd.DataFrame( +def test_summarize_us_policyengine_oracle_target_drilldown_marks_rematerialized_formula( + tmp_path, +) -> None: + bundle_dir = tmp_path / "bundle" + bundle_dir.mkdir() + db_path = tmp_path / "policy_data.db" + dataset_path = bundle_dir / "policyengine_us.h5" + + _create_policyengine_targets_db( + db_path, + variable="snap", + value=250.0, + domain_variable="snap", + ) + _write_policyengine_dataset(dataset_path, include_raw_snap=True) + + provider = PolicyEngineUSDBTargetProvider(db_path) + target = provider.load_target_set( + TargetQuery(period=2024, provider_filters={"variables": ["snap"]}) + ).targets[0] + target_ledger = [ + _policyengine_target_ledger_entry( + target=target, + stage="solve_now", + reason="selected_stage_1", + household_count=2, + ) + ] + + config = USMicroplexBuildConfig( + policyengine_targets_db=str(db_path), + policyengine_target_period=2024, + policyengine_target_variables=("snap",), + policyengine_calibration_target_variables=("snap",), + calibration_backend="entropy", + policyengine_dataset_year=2024, + ) + (bundle_dir / "manifest.json").write_text( + json.dumps( { - "household_id": [1, 2], - "household_weight": [1.0, 1.0], - "snap": [100.0, 0.0], - "state_fips": [6, 6], + "config": config.to_dict(), + "artifacts": {"policyengine_dataset": dataset_path.name}, + "calibration": { + "oracle_relative_error_cap": 10.0, + "materialized_variables": [], + "target_ledger": target_ledger, + }, } - ), + ) + ) + + summary = summarize_us_policyengine_oracle_target_drilldown(bundle_dir, top_k=5) + + assert summary["topRows"][0]["driver_variable"] == "snap" + assert summary["topRows"][0]["driver_is_materialized"] is True + assert summary["topRows"][0]["provenance_class"] == "policyengine_materialized" + + +def _write_policyengine_dataset(path: Path, *, include_raw_snap: bool = False) -> None: + household_data = { + "household_id": [1, 2], + "household_weight": [1.0, 1.0], + "state_fips": [6, 6], + } + household_variable_map = {"state_fips": "state_fips"} + if include_raw_snap: + household_data["snap"] = [100.0, 0.0] + household_variable_map["snap"] = "snap" + tables = PolicyEngineUSEntityTableBundle( + households=pd.DataFrame(household_data), persons=pd.DataFrame( { "person_id": [10, 20], @@ -133,16 +193,22 @@ def _write_policyengine_dataset(path: Path) -> None: arrays = build_policyengine_us_time_period_arrays( tables, period=2024, - household_variable_map={"snap": "snap", "state_fips": "state_fips"}, + household_variable_map=household_variable_map, person_variable_map={"age": "age"}, ) write_policyengine_us_time_period_dataset(arrays, path) -def _create_policyengine_targets_db(path: Path) -> None: +def _create_policyengine_targets_db( + path: Path, + *, + variable: str = "household_count", + value: float = 5.0, + domain_variable: str = "household_count", +) -> None: conn = sqlite3.connect(path) conn.executescript( - """ + f""" CREATE TABLE strata ( stratum_id INTEGER PRIMARY KEY, definition_hash TEXT, @@ -179,7 +245,7 @@ def _create_policyengine_targets_db(path: Path) -> None: t.active, 'state' AS geo_level, '06' AS geographic_id, - 'snap' AS domain_variable + '{domain_variable}' AS domain_variable FROM targets AS t; """ ) @@ -205,7 +271,7 @@ def _create_policyengine_targets_db(path: Path) -> None: notes ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?) """, - (1, "snap", 2024, 1, 0, 250.0, 1, None, "test", "snap"), + (1, variable, 2024, 1, 0, value, 1, None, "test", variable), ) conn.commit() conn.close() diff --git a/tests/pipelines/test_us.py b/tests/pipelines/test_us.py index c203280..79b3430 100644 --- a/tests/pipelines/test_us.py +++ b/tests/pipelines/test_us.py @@ -23,6 +23,7 @@ ) from microplex.targets import TargetAggregation, TargetQuery, TargetSpec +import microplex_us.pipelines.us as us_pipeline_module from microplex_us.pipelines.us import ( USMicroplexBuildConfig, USMicroplexBuildResult, @@ -4451,7 +4452,7 @@ def test_policyengine_target_provider_returns_canonical_specs( assert all(isinstance(target, TargetSpec) for target in targets.targets) def test_calibrate_policyengine_tables_from_db_with_simulated_variable( - self, persons, households, tmp_path + self, persons, households, tmp_path, monkeypatch ): db_path = tmp_path / "policyengine_targets.db" conn = sqlite3.connect(db_path) @@ -4579,6 +4580,23 @@ def calculate(self, variable, period=None, map_to=None): return [100.0, 0.0, 0.0] raise KeyError(variable) + captured_direct_overrides: list[tuple[str, ...]] = [] + original_materialize = ( + us_pipeline_module.materialize_policyengine_us_variables_safely + ) + + def spy_materialize(*args, **kwargs): + captured_direct_overrides.append( + tuple(kwargs.get("direct_override_variables", ())) + ) + return original_materialize(*args, **kwargs) + + monkeypatch.setattr( + us_pipeline_module, + "materialize_policyengine_us_variables_safely", + spy_materialize, + ) + config = USMicroplexBuildConfig( calibration_backend="entropy", policyengine_targets_db=str(db_path), @@ -4586,6 +4604,7 @@ def calculate(self, variable, period=None, map_to=None): policyengine_target_period=2024, policyengine_dataset_year=2024, policyengine_simulation_cls=FakeSimulation, + policyengine_direct_override_variables=("pre_tax_contributions",), policyengine_calibration_min_active_households=1, ) pipeline = USMicroplexPipeline(config) @@ -4593,12 +4612,14 @@ def calculate(self, variable, period=None, map_to=None): columns={"hh_weight": "weight", "income": "employment_income"} ) tables = pipeline.build_policyengine_entity_tables(seed) + tables.households["snap"] = 999.0 calibrated_tables, calibrated_persons, summary = ( pipeline.calibrate_policyengine_tables(tables) ) assert summary["backend"] == "policyengine_db_entropy" + assert captured_direct_overrides == [("pre_tax_contributions",)] assert summary["n_constraints"] == 2 assert summary["materialized_variables"] == ["snap"] assert summary["max_error"] < 1e-6 diff --git a/tests/pipelines/test_zi_qrf_backend.py b/tests/pipelines/test_zi_qrf_backend.py new file mode 100644 index 0000000..47b85e0 --- /dev/null +++ b/tests/pipelines/test_zi_qrf_backend.py @@ -0,0 +1,188 @@ +"""Pin the zi_qrf donor-imputer backend behavior before v8 relies on it. + +v7 (2026-04-18) used `donor_imputer_backend="qrf"` which bypasses the +zero-classifier gate (see `USMicroplexPipeline._build_donor_imputer`: +`zero_inflated_vars` is populated only when `backend == "zi_qrf"`). With +an empty whitelist, every QRF predict runs over all 3.37M rows even on +columns that are 99%+ zero, which is the main reason donor integration +took hours per source on v7. + +v8 flips `--donor-imputer-backend zi_qrf`. These tests pin the three +guarantees v8 relies on: + +1. The factory (`_build_donor_imputer`) populates `zero_inflated_vars` + from the `VariableSupportFamily.ZERO_INFLATED_POSITIVE` variables + when `backend == "zi_qrf"`, and leaves it empty otherwise. +2. `ColumnwiseQRFDonorImputer.fit` trains a `RandomForestClassifier` + zero-gate on each whitelisted column whose observed zero fraction + crosses the threshold, and does not train one on dense columns. +3. `ColumnwiseQRFDonorImputer.generate` skips QRF `.predict` on rows + the zero-gate sent to zero — i.e. the QRF is invoked on a strict + subset, which is the wall-clock win. +""" + +from __future__ import annotations + +import numpy as np +import pandas as pd +import pytest + +pytest.importorskip("quantile_forest") + +from microplex_us.pipelines.us import ( + ColumnwiseQRFDonorImputer, + USMicroplexBuildConfig, + USMicroplexPipeline, +) + + +def _tiny_problem(n: int = 500, seed: int = 0) -> pd.DataFrame: + """Two-column donor frame: one heavy-zero target, one dense target.""" + rng = np.random.default_rng(seed) + age = rng.integers(18, 80, size=n).astype(float) + is_female = rng.integers(0, 2, size=n).astype(float) + # 97 % zero — only a handful of positive values, like SSI or TANF. + heavy_zero = np.where(rng.random(n) > 0.97, rng.exponential(500, n), 0.0) + # Dense — every row has a positive draw, like age or weight. + dense = rng.normal(40_000, 10_000, size=n).clip(0, None) + return pd.DataFrame( + { + "age": age, + "is_female": is_female, + "tanf_reported": heavy_zero, + "employment_income": dense, + } + ) + + +class TestImputerFit: + """Whitelisted + heavy-zero → RF classifier gate; otherwise no gate.""" + + def test_zi_whitelist_produces_zero_classifier(self) -> None: + data = _tiny_problem() + imputer = ColumnwiseQRFDonorImputer( + condition_vars=["age", "is_female"], + target_vars=["tanf_reported", "employment_income"], + n_estimators=25, + zero_inflated_vars={"tanf_reported"}, + zero_threshold=0.05, + ) + imputer.fit(data) + assert "tanf_reported" in imputer._zero_models, ( + "Heavy-zero column in whitelist must get a zero-gate classifier; " + "this is the optimization v8 depends on." + ) + assert "employment_income" not in imputer._zero_models, ( + "Dense column must not get a zero-gate classifier." + ) + + def test_empty_whitelist_means_no_gates(self) -> None: + """v7 configuration: backend='qrf' → no gates ever fitted.""" + data = _tiny_problem() + imputer = ColumnwiseQRFDonorImputer( + condition_vars=["age", "is_female"], + target_vars=["tanf_reported", "employment_income"], + n_estimators=25, + zero_inflated_vars=set(), + zero_threshold=0.05, + ) + imputer.fit(data) + assert imputer._zero_models == {} + + +class TestImputerGenerateSkipsPredict: + """With a zero-gate, the QRF's .predict runs on a strict subset.""" + + def test_generate_calls_qrf_only_on_predicted_positive_rows( + self, monkeypatch: pytest.MonkeyPatch + ) -> None: + data = _tiny_problem(n=800, seed=1) + imputer = ColumnwiseQRFDonorImputer( + condition_vars=["age", "is_female"], + target_vars=["tanf_reported"], + n_estimators=25, + zero_inflated_vars={"tanf_reported"}, + zero_threshold=0.05, + ) + imputer.fit(data) + + qrf_model = imputer._models["tanf_reported"] + call_input_sizes: list[int] = [] + original_predict = qrf_model.predict + + def spy_predict(x_values: np.ndarray, **kwargs): + call_input_sizes.append(len(x_values)) + return original_predict(x_values, **kwargs) + + monkeypatch.setattr(qrf_model, "predict", spy_predict) + + # Generate on 10k conditioning rows (much larger than training). + rng = np.random.default_rng(42) + n_generate = 10_000 + conditions = pd.DataFrame( + { + "age": rng.integers(18, 80, size=n_generate).astype(float), + "is_female": rng.integers(0, 2, size=n_generate).astype(float), + } + ) + synthetic = imputer.generate(conditions, seed=42) + + assert len(call_input_sizes) == 1, call_input_sizes + predict_rows = call_input_sizes[0] + # Heavy-zero base rate is ~3 %; ZI-predicted-positive fraction + # should be well below 50 % on unseen data, and definitely + # below n_generate. + assert predict_rows < n_generate, ( + f"QRF predict was called on all {n_generate} rows — the " + f"zero-gate isn't skipping any. call_input_sizes={call_input_sizes}" + ) + assert predict_rows < n_generate * 0.5, ( + f"QRF predict got {predict_rows}/{n_generate} rows; the gate " + "is barely cutting the wall, not matching the 3 % training base rate." + ) + # Non-predicted rows must be exactly zero (not NaN, not drawn). + zero_mass = float((synthetic["tanf_reported"] == 0).mean()) + assert zero_mass > 0.5, ( + f"Synthetic zero mass = {zero_mass:.3f}; gate should leave " + "more than half of rows at exactly 0." + ) + + +class TestBuildDonorImputerFactory: + """The pipeline factory wires zero_inflated_vars only when backend='zi_qrf'.""" + + def _factory( + self, backend: str + ) -> ColumnwiseQRFDonorImputer: + config = USMicroplexBuildConfig( + donor_imputer_backend=backend, + donor_imputer_qrf_n_estimators=25, + ) + pipeline = USMicroplexPipeline(config=config) + # Variables chosen to span support families: + # qualified_dividend_income, taxable_interest_income → ZERO_INFLATED_POSITIVE + # age → BOUNDED_INTEGER + # These are all real PolicyEngine-US variable names with explicit + # semantic specs in microplex_us.variables. + target_vars = ( + "qualified_dividend_income", + "taxable_interest_income", + "age", + ) + return pipeline._build_donor_imputer( + condition_vars=["is_female", "cps_race"], + target_vars=target_vars, + ) + + def test_zi_qrf_backend_populates_whitelist(self) -> None: + imputer = self._factory("zi_qrf") + assert isinstance(imputer, ColumnwiseQRFDonorImputer) + assert "qualified_dividend_income" in imputer.zero_inflated_vars + assert "taxable_interest_income" in imputer.zero_inflated_vars + assert "age" not in imputer.zero_inflated_vars + + def test_qrf_backend_leaves_whitelist_empty(self) -> None: + """v7 semantics: pre-v8 default leaves optimization inactive.""" + imputer = self._factory("qrf") + assert isinstance(imputer, ColumnwiseQRFDonorImputer) + assert imputer.zero_inflated_vars == set() diff --git a/tests/policyengine/test_comparison.py b/tests/policyengine/test_comparison.py index 778aacf..955d45c 100644 --- a/tests/policyengine/test_comparison.py +++ b/tests/policyengine/test_comparison.py @@ -142,6 +142,7 @@ def _sample_tables() -> PolicyEngineUSEntityTableBundle: "marital_unit_id": [7000, 7000, 8000], "age": [40.0, 10.0, 30.0], "employment_income": [30_000.0, 0.0, 20_000.0], + "employment_income_before_lsr": [30_000.0, 0.0, 20_000.0], } ), tax_units=pd.DataFrame( @@ -248,11 +249,11 @@ def test_evaluate_policyengine_us_target_set_scores_count_sum_and_mean(): filters=(TargetFilter("state_fips", FilterOperator.EQ, 6),), ), TargetSpec( - name="snap_total", - entity=EntityType.HOUSEHOLD, - value=250.0, + name="employment_income_before_lsr_total", + entity=EntityType.PERSON, + value=80_000.0, period=2024, - measure="snap", + measure="employment_income_before_lsr", aggregation="sum", ), TargetSpec( @@ -275,7 +276,7 @@ def test_evaluate_policyengine_us_target_set_scores_count_sum_and_mean(): actuals = {evaluation.target.name: evaluation.actual_value for evaluation in report.evaluations} assert actuals == { "ca_households": 2.0, - "snap_total": 250.0, + "employment_income_before_lsr_total": 80_000.0, "ca_mean_age": 25.0, } @@ -388,6 +389,73 @@ def calculate(self, variable, period=None, map_to=None): assert report.materialization_failures == {} +def test_evaluate_policyengine_us_target_set_materializes_add_based_variables(tmp_path): + tables = _sample_tables() + + class FakeEntity: + def __init__(self, key: str): + self.key = key + + class FakeVariable: + def __init__( + self, + entity: FakeEntity, + *, + adds: list[str] | None = None, + formulas: dict[str, object] | None = None, + ): + self.entity = entity + self.adds = adds or [] + self.subtracts: list[str] = [] + self.formulas = formulas or {} + + def is_input_variable(self) -> bool: + return not self.formulas and not self.adds + + class FakeTaxBenefitSystem: + variables = { + "employment_income": FakeVariable( + FakeEntity("person"), + adds=["employment_income_before_lsr"], + ), + "employment_income_before_lsr": FakeVariable(FakeEntity("person")), + } + + class FakeSimulation: + tax_benefit_system = FakeTaxBenefitSystem() + + def __init__(self, dataset, dataset_year=None, **kwargs): + assert Path(dataset).exists() + assert dataset_year == 2024 + _ = kwargs + + def calculate(self, variable, period=None, map_to=None): + assert variable == "employment_income" + assert period == 2024 + assert map_to is None + return np.array([10.0, 20.0, 30.0]) + + report = evaluate_policyengine_us_target_set( + tables, + [ + TargetSpec( + name="employment_income_total", + entity=EntityType.PERSON, + value=90.0, + period=2024, + measure="employment_income", + aggregation="sum", + ) + ], + period=2024, + dataset_year=2024, + simulation_cls=FakeSimulation, + ) + + assert report.materialized_variables == ("employment_income",) + assert report.evaluations[0].actual_value == 90.0 + + def test_evaluate_policyengine_us_target_set_skips_failed_materializations(tmp_path): base_tables = _sample_tables() tables = PolicyEngineUSEntityTableBundle( @@ -725,26 +793,41 @@ def record_compile(*args, **kwargs): def test_compare_policyengine_us_target_query_to_baseline(tmp_path): - provider_db = tmp_path / "policy_data.db" - _create_snap_targets_db(provider_db) - provider = PolicyEngineUSDBTargetProvider(provider_db) + class EmploymentIncomeProvider: + def load_target_set(self, query=None): + _ = query + return [ + TargetSpec( + name="employment_income_before_lsr_total", + entity=EntityType.PERSON, + value=80_000.0, + period=2024, + measure="employment_income_before_lsr", + aggregation="sum", + ) + ] + + provider = EmploymentIncomeProvider() baseline_tables = _sample_tables() baseline_arrays = build_policyengine_us_time_period_arrays( baseline_tables, period=2024, - household_variable_map={"state_fips": "state_fips", "snap": "snap"}, - person_variable_map={"age": "age"}, + household_variable_map={"state_fips": "state_fips"}, + person_variable_map={ + "age": "age", + "employment_income_before_lsr": "employment_income_before_lsr", + }, ) baseline_path = tmp_path / "enhanced_cps_2024.h5" write_policyengine_us_time_period_dataset(baseline_arrays, baseline_path) base_candidate = _sample_tables() candidate_tables = PolicyEngineUSEntityTableBundle( - households=base_candidate.households.assign( - snap=np.array([80.0, 50.0]) + households=base_candidate.households, + persons=base_candidate.persons.assign( + employment_income_before_lsr=np.array([20_000.0, 0.0, 20_000.0]) ), - persons=base_candidate.persons, tax_units=base_candidate.tax_units, spm_units=base_candidate.spm_units, families=base_candidate.families, @@ -754,7 +837,10 @@ def test_compare_policyengine_us_target_query_to_baseline(tmp_path): report = compare_policyengine_us_target_query_to_baseline( candidate_tables, provider, - TargetQuery(period=2024, provider_filters={"variables": ["snap"]}), + TargetQuery( + period=2024, + provider_filters={"variables": ["employment_income_before_lsr"]}, + ), baseline_dataset=baseline_path, candidate_label="microplex", baseline_label="enhanced_cps", @@ -764,9 +850,9 @@ def test_compare_policyengine_us_target_query_to_baseline(tmp_path): assert report.candidate.label == "microplex" assert report.baseline is not None assert report.baseline.label == "enhanced_cps" - assert report.candidate.mean_abs_relative_error == pytest.approx(0.18) + assert report.candidate.mean_abs_relative_error == pytest.approx(0.25) assert report.baseline.mean_abs_relative_error == 0.0 - assert report.mean_abs_relative_error_delta == pytest.approx(0.18) + assert report.mean_abs_relative_error_delta == pytest.approx(0.25) def test_policyengine_us_comparison_report_uses_common_target_intersection(): diff --git a/tests/policyengine/test_harness.py b/tests/policyengine/test_harness.py index ea5ed2e..ea341b9 100644 --- a/tests/policyengine/test_harness.py +++ b/tests/policyengine/test_harness.py @@ -60,6 +60,7 @@ def _candidate_tables() -> PolicyEngineUSEntityTableBundle: "marital_unit_id": [7000, 7000, 8000], "age": [40.0, 10.0, 30.0], "employment_income": [30_000.0, 0.0, 20_000.0], + "employment_income_before_lsr": [30_000.0, 0.0, 20_000.0], } ), tax_units=pd.DataFrame( @@ -110,6 +111,7 @@ def _baseline_dataset(tmp_path: Path) -> Path: "marital_unit_id": [7000, 7000, 8000], "age": [40.0, 10.0, 30.0], "employment_income": [30_000.0, 0.0, 20_000.0], + "employment_income_before_lsr": [30_000.0, 0.0, 20_000.0], } ), tax_units=pd.DataFrame( @@ -142,7 +144,10 @@ def _baseline_dataset(tmp_path: Path) -> Path: tables, period=2024, household_variable_map={"state_fips": "state_fips", "snap": "snap"}, - person_variable_map={"age": "age", "employment_income": "employment_income"}, + person_variable_map={ + "age": "age", + "employment_income_before_lsr": "employment_income_before_lsr", + }, tax_unit_variable_map={"filing_status": "filing_status"}, ) dataset_path = tmp_path / "baseline.h5" @@ -163,11 +168,11 @@ def test_evaluate_policyengine_us_harness_scores_candidate_against_baseline(tmp_ filters=(TargetFilter("state_fips", FilterOperator.EQ, 6),), ), TargetSpec( - name="snap_total", - entity=EntityType.HOUSEHOLD, - value=250.0, + name="employment_income_before_lsr_total", + entity=EntityType.PERSON, + value=80_000.0, period=2024, - measure="snap", + measure="employment_income_before_lsr", aggregation="sum", ), ] @@ -180,9 +185,9 @@ def test_evaluate_policyengine_us_harness_scores_candidate_against_baseline(tmp_ query=TargetQuery(period=2024, names=("ca_households",)), ), PolicyEngineUSHarnessSlice( - name="snap", + name="employment_income_before_lsr", tags=("national", "programs"), - query=TargetQuery(period=2024, names=("snap_total",)), + query=TargetQuery(period=2024, names=("employment_income_before_lsr_total",)), ), ] @@ -433,7 +438,7 @@ def calculate(self, variable, period=None, map_to=None): assert map_to is None raise RuntimeError("snap materialization unavailable") - with pytest.raises(PolicyEngineUSMaterializationError, match="baseline"): + with pytest.raises(PolicyEngineUSMaterializationError, match="candidate"): evaluate_policyengine_us_harness( _candidate_tables(), provider, @@ -498,11 +503,11 @@ def test_evaluate_policyengine_us_harness_reuses_union_evaluation(tmp_path, monk filters=(TargetFilter("state_fips", FilterOperator.EQ, 6),), ), TargetSpec( - name="snap_total", - entity=EntityType.HOUSEHOLD, - value=250.0, + name="employment_income_before_lsr_total", + entity=EntityType.PERSON, + value=80_000.0, period=2024, - measure="snap", + measure="employment_income_before_lsr", aggregation="sum", ), ] @@ -532,8 +537,8 @@ def record_evaluate(*args, **kwargs): query=TargetQuery(period=2024, names=("ca_households",)), ), PolicyEngineUSHarnessSlice( - name="snap", - query=TargetQuery(period=2024, names=("snap_total",)), + name="employment_income_before_lsr", + query=TargetQuery(period=2024, names=("employment_income_before_lsr_total",)), ), ], baseline_dataset=_baseline_dataset(tmp_path), @@ -542,8 +547,8 @@ def record_evaluate(*args, **kwargs): assert run.slice_win_rate == 1.0 assert evaluate_calls == [ - ("ca_households", "snap_total"), - ("ca_households", "snap_total"), + ("ca_households", "employment_income_before_lsr_total"), + ("ca_households", "employment_income_before_lsr_total"), ] diff --git a/tests/policyengine/test_materialize_batched.py b/tests/policyengine/test_materialize_batched.py new file mode 100644 index 0000000..13c5c96 --- /dev/null +++ b/tests/policyengine/test_materialize_batched.py @@ -0,0 +1,254 @@ +"""Batched-materialize equivalence tests. + +Covers the batched path of :func:`materialize_policyengine_us_variables` +without spinning up a real PolicyEngine Microsimulation. A fake +``simulation_cls`` mimics the per-record-scalar semantics that +calibration targets actually use (each output is a function of the +calling chunk's own data, independent of other chunks). The test then +proves that running the function with ``batch_size=None`` and with a +sub-full batch size produces identical results. +""" + +from __future__ import annotations + +from dataclasses import dataclass +from typing import Any + +import numpy as np +import pandas as pd +import pytest +from microplex.core import EntityType + +from microplex_us.policyengine.us import ( + PolicyEngineUSEntityTableBundle, + materialize_policyengine_us_variables, +) + + +@dataclass +class FakeVariable: + """Stand-in for a PolicyEngine Variable metadata entry.""" + + name: str + entity: str # "household" | "person" | etc. + + +class FakeEntity: + def __init__(self, key: str) -> None: + self.key = key + + +class FakeTaxBenefitSystem: + """Enough of the TaxBenefitSystem interface to satisfy the materializer. + + The real resolver checks a variables registry + entity registry. The + fake returns hardcoded entries for the test's target variables. + """ + + def __init__(self, variables: dict[str, FakeVariable]) -> None: + self.variables = variables + self.entities = [FakeEntity(k) for k in ("person", "household", "tax_unit")] + + def get_variable(self, name: str) -> FakeVariable: + if name not in self.variables: + raise KeyError(name) + return self.variables[name] + + +class FakeSimulation: + """Fake Microsimulation that computes per-record values deterministically. + + Each variable's value is a pure function of a household-level input + column the fake reads from the provided dataset path. Writing a + real h5 would require the full PolicyEngine dataset machinery; for + the test we instead accept an in-memory ``dataset`` dict. + """ + + def __init__(self, dataset: str | None = None, **kwargs: Any) -> None: + # The real code writes an h5 and points the sim at its path; + # for this fake we pull the chunk arrays off ``_fake_chunk_data`` + # (set via the monkeypatch below). + chunk = getattr(FakeSimulation, "_fake_chunk_data", None) + if chunk is None: + raise RuntimeError( + "FakeSimulation needs _fake_chunk_data set by the test." + ) + self._hh = chunk["households"] + self.tax_benefit_system = FakeTaxBenefitSystem( + { + "doubled_base": FakeVariable(name="doubled_base", entity="household"), + "squared_base": FakeVariable(name="squared_base", entity="household"), + } + ) + + def calculate(self, variable: str, period: Any = None, map_to: Any = None): + # Pure per-record scalar; returns len(households) values. + base = self._hh["base_value"].to_numpy(dtype=float) + if variable == "doubled_base": + return base * 2.0 + if variable == "squared_base": + return base**2 + raise KeyError(variable) + + +@pytest.fixture +def fake_sim(monkeypatch): + """Register FakeSimulation as the simulation_cls and patch the + materializer's internal helpers so they accept our in-memory chunk.""" + # Patch the module-level resolver the materializer uses to look up + # the tax-benefit system. We monkey the whole pipeline rather than + # write a real h5. + from microplex_us.policyengine import us as us_module + + monkeypatch.setattr( + us_module, + "_resolve_policyengine_us_tax_benefit_system", + lambda simulation_cls=None: FakeTaxBenefitSystem( + { + "doubled_base": FakeVariable("doubled_base", "household"), + "squared_base": FakeVariable("squared_base", "household"), + } + ), + ) + monkeypatch.setattr( + us_module, + "build_policyengine_us_export_variable_maps", + lambda tables, **_: { + "household": {"base_value": "base_value"}, + "person": {}, + "tax_unit": {}, + "spm_unit": {}, + "family": {}, + }, + ) + monkeypatch.setattr( + us_module, + "resolve_policyengine_excluded_export_variables", + lambda *args, **kwargs: set(), + ) + + def _build_arrays(tables, **kwargs): + # The real function produces a period-keyed dict of arrays; we + # just stash the chunk on the fake class and ignore the output. + FakeSimulation._fake_chunk_data = { + "households": tables.households, + } + return {} + + monkeypatch.setattr( + us_module, + "build_policyengine_us_time_period_arrays", + _build_arrays, + ) + monkeypatch.setattr( + us_module, + "write_policyengine_us_time_period_dataset", + lambda *args, **kwargs: None, + ) + + # Patch the adapter factory to return our fake + from microplex_us.policyengine.us import ( + PolicyEngineUSMicrosimulationAdapter, + ) + + def _fake_from_dataset(*args, **kwargs): + return PolicyEngineUSMicrosimulationAdapter(simulation=FakeSimulation()) + + monkeypatch.setattr( + PolicyEngineUSMicrosimulationAdapter, + "from_dataset", + classmethod(lambda cls, *a, **k: _fake_from_dataset(*a, **k)), + ) + + # Patch variable_entity so the attach helper routes all variables + # to the household table. + monkeypatch.setattr( + PolicyEngineUSMicrosimulationAdapter, + "variable_entity", + lambda self, variable: EntityType.HOUSEHOLD, + ) + + +def _make_bundle(n: int = 50, seed: int = 0) -> PolicyEngineUSEntityTableBundle: + rng = np.random.default_rng(seed) + household_ids = np.arange(n) + 1 + households = pd.DataFrame( + { + "household_id": household_ids, + "base_value": rng.uniform(1, 10, size=n), + } + ) + persons = pd.DataFrame( + { + "household_id": household_ids, + "person_id": household_ids * 10, + } + ) + return PolicyEngineUSEntityTableBundle( + households=households, + persons=persons, + tax_units=None, + spm_units=None, + families=None, + marital_units=None, + ) + + +class TestBatchedMaterializeEquivalence: + """Batched output must equal single-pass output element-wise.""" + + def test_single_pass_vs_batched_equivalent(self, fake_sim) -> None: + tables = _make_bundle(n=50) + + full_tables, full_bindings = materialize_policyengine_us_variables( + tables, + variables=["doubled_base", "squared_base"], + period=2024, + batch_size=None, + ) + batched_tables, batched_bindings = materialize_policyengine_us_variables( + tables, + variables=["doubled_base", "squared_base"], + period=2024, + batch_size=10, # 5 chunks + ) + + pd.testing.assert_frame_equal( + full_tables.households.sort_values("household_id").reset_index(drop=True), + batched_tables.households.sort_values("household_id").reset_index(drop=True), + ) + assert set(full_bindings) == set(batched_bindings) + + def test_batch_size_larger_than_data_is_noop(self, fake_sim) -> None: + tables = _make_bundle(n=10) + full, _ = materialize_policyengine_us_variables( + tables, + variables=["doubled_base"], + period=2024, + batch_size=None, + ) + batched, _ = materialize_policyengine_us_variables( + tables, + variables=["doubled_base"], + period=2024, + batch_size=10_000, # > n=10 + ) + pd.testing.assert_frame_equal(full.households, batched.households) + + def test_uneven_batch_split(self, fake_sim) -> None: + """50 records with batch_size=17 → chunks of 17, 17, 16.""" + tables = _make_bundle(n=50) + batched, _ = materialize_policyengine_us_variables( + tables, + variables=["doubled_base"], + period=2024, + batch_size=17, + ) + assert len(batched.households) == 50 + # Values correct (doubled_base = 2 * base_value) + np.testing.assert_allclose( + batched.households["doubled_base"].to_numpy(), + 2.0 * batched.households["base_value"].to_numpy(), + rtol=0, + atol=0, + ) diff --git a/tests/policyengine/test_us.py b/tests/policyengine/test_us.py index 1a37e81..701899c 100644 --- a/tests/policyengine/test_us.py +++ b/tests/policyengine/test_us.py @@ -39,6 +39,7 @@ detect_policyengine_pseudo_inputs, materialize_policyengine_us_variables, materialize_policyengine_us_variables_safely, + policyengine_us_variables_to_materialize, project_frame_to_time_period_arrays, resolve_policyengine_excluded_export_variables, write_policyengine_us_time_period_dataset, @@ -1370,6 +1371,34 @@ def calculate(self, variable, period=None, map_to=None): np.testing.assert_allclose(constraints[0].coefficients, np.array([120.0, 0.0])) np.testing.assert_allclose(constraints[1].coefficients, np.array([1.0, 0.0])) + def test_variables_to_materialize_can_force_formula_outputs(self): + targets = [ + TargetSpec( + name="ssi", + entity=EntityType.PERSON, + value=100.0, + period=2024, + measure="ssi", + ) + ] + bindings = { + "ssi": PolicyEngineUSVariableBinding( + entity=EntityType.PERSON, + column="ssi", + ), + "employment_income": PolicyEngineUSVariableBinding( + entity=EntityType.PERSON, + column="employment_income", + ), + } + + assert policyengine_us_variables_to_materialize(targets, bindings) == set() + assert policyengine_us_variables_to_materialize( + targets, + bindings, + force_materialize_variables={"ssi"}, + ) == {"ssi"} + def test_materialization_supports_nested_system_attribute(self, tmp_path): households = pd.DataFrame( { @@ -2010,6 +2039,45 @@ class FakeSystem: assert export_maps["tax_unit"] == {"filing_status": "filing_status"} assert export_maps["spm_unit"] == {"snap": "snap"} + def test_build_policyengine_us_export_variable_maps_aliases_reported_social_security_retirement(self): + class FakeEntity: + def __init__(self, key): + self.key = key + + class FakeVariable: + def __init__(self, entity): + self.entity = FakeEntity(entity) + + class FakeSystem: + variables = { + "social_security_retirement_reported": FakeVariable("person"), + } + + tables = PolicyEngineUSEntityTableBundle( + households=pd.DataFrame( + { + "household_id": [10], + "household_weight": [1.0], + } + ), + persons=pd.DataFrame( + { + "person_id": [1], + "household_id": [10], + "social_security_retirement": [12_000.0], + } + ), + ) + + export_maps = build_policyengine_us_export_variable_maps( + tables, + tax_benefit_system=FakeSystem(), + ) + + assert export_maps["person"] == { + "social_security_retirement": "social_security_retirement_reported", + } + def test_default_policyengine_us_export_surface_avoids_formula_aggregates(self): from policyengine_us import CountryTaxBenefitSystem @@ -2034,6 +2102,8 @@ def test_default_policyengine_us_export_surface_avoids_formula_aggregates(self): assert "farm_operations_income" in SAFE_POLICYENGINE_US_EXPORT_VARIABLES assert "farm_rent_income" in SAFE_POLICYENGINE_US_EXPORT_VARIABLES assert "health_savings_account_ald" in SAFE_POLICYENGINE_US_EXPORT_VARIABLES + assert "social_security_retirement" not in SAFE_POLICYENGINE_US_EXPORT_VARIABLES + assert "social_security_retirement_reported" in SAFE_POLICYENGINE_US_EXPORT_VARIABLES assert "non_sch_d_capital_gains" not in SAFE_POLICYENGINE_US_EXPORT_VARIABLES assert "receives_wic" in SAFE_POLICYENGINE_US_EXPORT_VARIABLES assert "ssn_card_type" in SAFE_POLICYENGINE_US_EXPORT_VARIABLES diff --git a/tests/policyengine/test_us_pipeline_checkpoint.py b/tests/policyengine/test_us_pipeline_checkpoint.py new file mode 100644 index 0000000..4557995 --- /dev/null +++ b/tests/policyengine/test_us_pipeline_checkpoint.py @@ -0,0 +1,163 @@ +"""US pipeline checkpoint save/load tests. + +The pipeline takes ~11 hours to synthesize + impute + build PE tables +before calibration even starts. Then PE microsim materializes target +variables (~30 min) before calibration fits. If any later stage fails +(OOM, bad config, disk full, sparsity collapse), we want to iterate +without re-paying earlier work. + +``save_us_pipeline_checkpoint`` and ``load_us_pipeline_checkpoint`` +round-trip a ``PolicyEngineUSEntityTableBundle`` at a named pipeline +stage so a downstream rerun can resume from that point. + +These tests drive: + +1. Basic round-trip equivalence at each stage. +2. Partial bundles (some entity tables ``None``) round-trip correctly. +3. Metadata file is written alongside the parquet files and contains + enough info to validate the bundle (row counts, column names, stage). +4. Load from a missing path raises a clear error. +5. Save with invalid stage raises. +6. Loading with ``expected_stage`` mismatch raises. +7. Saving twice to the same path replaces the earlier snapshot. +""" + +from __future__ import annotations + +from pathlib import Path + +import numpy as np +import pandas as pd +import pytest + +from microplex_us.policyengine.us import ( + PolicyEngineUSEntityTableBundle, + load_us_pipeline_checkpoint, + save_us_pipeline_checkpoint, +) + + +def _make_bundle(n: int = 50, seed: int = 0) -> PolicyEngineUSEntityTableBundle: + rng = np.random.default_rng(seed) + household_ids = np.arange(n) + 1 + households = pd.DataFrame( + { + "household_id": household_ids, + "household_weight": rng.uniform(0.5, 2.0, size=n), + "state_fips": rng.integers(1, 57, size=n), + } + ) + persons = pd.DataFrame( + { + "person_id": household_ids * 10, + "household_id": household_ids, + "age": rng.integers(0, 85, size=n), + "employment_income": rng.uniform(0, 200_000, size=n), + } + ) + tax_units = pd.DataFrame( + { + "tax_unit_id": household_ids * 100, + "household_id": household_ids, + "filing_status": rng.choice(["SINGLE", "JOINT"], size=n), + } + ) + return PolicyEngineUSEntityTableBundle( + households=households, + persons=persons, + tax_units=tax_units, + spm_units=None, + families=None, + marital_units=None, + ) + + +class TestUSPipelineCheckpoint: + @pytest.mark.parametrize("stage", ["post_imputation", "post_microsim"]) + def test_full_roundtrip_equivalent(self, tmp_path: Path, stage: str) -> None: + bundle = _make_bundle(n=100) + save_us_pipeline_checkpoint(bundle, tmp_path / "checkpoint", stage=stage) + loaded, metadata = load_us_pipeline_checkpoint(tmp_path / "checkpoint") + + pd.testing.assert_frame_equal(loaded.households, bundle.households) + pd.testing.assert_frame_equal(loaded.persons, bundle.persons) + pd.testing.assert_frame_equal(loaded.tax_units, bundle.tax_units) + assert loaded.spm_units is None + assert loaded.families is None + assert loaded.marital_units is None + assert metadata["stage"] == stage + + def test_partial_bundle_roundtrip(self, tmp_path: Path) -> None: + """A households-only bundle (no other entity tables) round-trips.""" + households = pd.DataFrame( + {"household_id": [1, 2, 3], "household_weight": [1.0, 2.0, 3.0]} + ) + bundle = PolicyEngineUSEntityTableBundle( + households=households, + persons=None, + tax_units=None, + spm_units=None, + families=None, + marital_units=None, + ) + save_us_pipeline_checkpoint( + bundle, tmp_path / "checkpoint", stage="post_imputation" + ) + loaded, _ = load_us_pipeline_checkpoint(tmp_path / "checkpoint") + + pd.testing.assert_frame_equal(loaded.households, bundle.households) + assert loaded.persons is None + assert loaded.tax_units is None + + def test_metadata_written_with_row_counts(self, tmp_path: Path) -> None: + bundle = _make_bundle(n=75) + save_us_pipeline_checkpoint( + bundle, tmp_path / "checkpoint", stage="post_microsim" + ) + + metadata_path = tmp_path / "checkpoint" / "metadata.json" + assert metadata_path.exists() + + import json + + metadata = json.loads(metadata_path.read_text()) + assert metadata["stage"] == "post_microsim" + assert metadata["households"]["rows"] == 75 + assert "household_id" in metadata["households"]["columns"] + assert metadata["persons"]["rows"] == 75 + assert metadata["tax_units"]["rows"] == 75 + assert metadata["spm_units"] is None + + def test_load_missing_path_raises(self, tmp_path: Path) -> None: + with pytest.raises(FileNotFoundError, match="US pipeline checkpoint"): + load_us_pipeline_checkpoint(tmp_path / "does_not_exist") + + def test_save_with_invalid_stage_raises(self, tmp_path: Path) -> None: + bundle = _make_bundle(n=5) + with pytest.raises(ValueError, match="stage must be one of"): + save_us_pipeline_checkpoint(bundle, tmp_path / "checkpoint", stage="bogus") # type: ignore[arg-type] + + def test_load_with_stage_mismatch_raises(self, tmp_path: Path) -> None: + bundle = _make_bundle(n=5) + save_us_pipeline_checkpoint( + bundle, tmp_path / "checkpoint", stage="post_imputation" + ) + with pytest.raises(ValueError, match="expected 'post_microsim'"): + load_us_pipeline_checkpoint( + tmp_path / "checkpoint", expected_stage="post_microsim" + ) + + def test_save_overwrites_existing(self, tmp_path: Path) -> None: + first = _make_bundle(n=10, seed=0) + second = _make_bundle(n=20, seed=1) + + save_us_pipeline_checkpoint( + first, tmp_path / "checkpoint", stage="post_imputation" + ) + save_us_pipeline_checkpoint( + second, tmp_path / "checkpoint", stage="post_imputation" + ) + + loaded, _ = load_us_pipeline_checkpoint(tmp_path / "checkpoint") + assert len(loaded.households) == 20 + pd.testing.assert_frame_equal(loaded.households, second.households) diff --git a/tests/test_geography.py b/tests/test_geography.py index 7e82eda..e11dba6 100644 --- a/tests/test_geography.py +++ b/tests/test_geography.py @@ -40,6 +40,14 @@ def _sample_block_table() -> pd.DataFrame: ) +def test_core_block_geography_proxy_supports_isinstance() -> None: + from microplex.geography import BlockGeography as CoreBlockGeography + + geography = BlockGeography.from_data(_sample_block_table()) + + assert isinstance(geography, CoreBlockGeography) + + class TestGEOIDConstants: def test_state_len(self) -> None: assert STATE_LEN == 2 diff --git a/tests/test_hierarchical_block_assignment.py b/tests/test_hierarchical_block_assignment.py index 9206b0f..f027054 100644 --- a/tests/test_hierarchical_block_assignment.py +++ b/tests/test_hierarchical_block_assignment.py @@ -78,6 +78,26 @@ def test_init_with_cd_probabilities_backward_compat( assert synthesizer.geography_assignment.atomic_id_column == "cd_id" assert synthesizer._geography_assigner is not None + def test_cd_probabilities_allow_state_local_district_ids(self) -> None: + cd_probs = pd.DataFrame( + { + "state_fips": [6, 6, 36, 36], + "cd_id": [1, 2, 1, 2], + "prob": [0.6, 0.4, 0.5, 0.5], + } + ) + households = pd.DataFrame({"state_fips": [6, 36]}) + synthesizer = HierarchicalSynthesizer( + cd_probabilities=cd_probs, + random_state=123, + ) + + result = synthesizer._apply_geography_assignment(households) + + assert "_microplex_cd_atomic_id" not in result.columns + assert result["state_fips"].tolist() == [6, 36] + assert result["cd_id"].isin([1, 2]).all() + def test_block_probabilities_take_precedence( self, sample_block_probs: pd.DataFrame, diff --git a/tests/test_puf_source_provider.py b/tests/test_puf_source_provider.py index 895a5ab..b4d975b 100644 --- a/tests/test_puf_source_provider.py +++ b/tests/test_puf_source_provider.py @@ -706,6 +706,7 @@ def test_puf_source_provider_maps_policyengine_medical_and_alimony_inputs(tmp_pa puf_path=puf_path, demographics_path=demographics_path, target_year=2015, + social_security_share_model_loader=_mock_social_security_share_model_loader, ) frame = provider.load_frame(SourceQuery(period=2015)) persons = frame.tables[EntityType.PERSON] diff --git a/tests/validation/test_downstream.py b/tests/validation/test_downstream.py new file mode 100644 index 0000000..6f17873 --- /dev/null +++ b/tests/validation/test_downstream.py @@ -0,0 +1,216 @@ +"""Downstream tax-benefit aggregate validation (B2). + +After calibration, the synthesized microdata is ingested by +``policyengine_us.Microsimulation``. This module computes a canonical +set of downstream aggregates — federal income tax, EITC, CTC, SNAP, +SSI, ACA PTC — and compares them against external benchmarks (IRS +SOI, USDA, SSA, CMS). The comparison is the validation a tax-microsim +reviewer actually wants: not whether input targets were hit, but +whether the downstream policy outputs computed on the synthetic frame +look like the real-world outputs. + +These tests drive: + +1. ``DownstreamBenchmark`` is a typed record for one + external-benchmark comparison (name, computed, benchmark, source, + unit). +2. ``compute_downstream_comparison`` returns a dict of benchmark + name → ``DownstreamBenchmark`` with absolute and relative errors. +3. The module's canonical benchmark set for 2024 includes the six + required headline aggregates. +4. Relative error is signed (computed − benchmark) / benchmark. +5. A benchmark record round-trips to JSON. +""" + +from __future__ import annotations + +import json +import sys +from pathlib import Path +from types import ModuleType, SimpleNamespace + +import pytest + +from microplex_us.validation.downstream import ( + DOWNSTREAM_BENCHMARKS_2024, + DownstreamBenchmark, + compute_downstream_aggregates, + compute_downstream_comparison, + compute_downstream_weighted_aggregate, +) + + +class TestDownstreamBenchmark: + def test_benchmark_record_fields(self) -> None: + record = DownstreamBenchmark( + name="eitc", + computed=65_000_000_000.0, + benchmark=64_000_000_000.0, + unit="USD", + source="IRS SOI 2024", + ) + assert record.abs_error == pytest.approx(1_000_000_000.0) + assert record.rel_error == pytest.approx(1_000_000_000.0 / 64_000_000_000.0) + + def test_benchmark_record_serializes_to_json(self) -> None: + record = DownstreamBenchmark( + name="snap", + computed=100.0, + benchmark=110.0, + unit="USD", + source="USDA 2024", + ) + as_json = json.loads(json.dumps(record.to_dict())) + assert as_json["name"] == "snap" + assert as_json["computed"] == 100.0 + assert as_json["benchmark"] == 110.0 + assert as_json["rel_error"] == pytest.approx(-10.0 / 110.0) + + def test_benchmark_zero_benchmark_returns_none_rel(self) -> None: + """Guard against divide-by-zero in report generation.""" + record = DownstreamBenchmark( + name="zero", + computed=5.0, + benchmark=0.0, + unit="USD", + source="test", + ) + assert record.rel_error is None + + +class TestDownstreamBenchmarksSet: + def test_2024_benchmark_set_covers_headline_aggregates(self) -> None: + names = {b.name for b in DOWNSTREAM_BENCHMARKS_2024} + assert names >= {"income_tax", "eitc", "ctc", "snap", "ssi", "aca_ptc"} + + def test_2024_benchmarks_have_sources_cited(self) -> None: + """No magic numbers — each benchmark must declare its source.""" + for benchmark in DOWNSTREAM_BENCHMARKS_2024: + assert benchmark.source, f"missing source on {benchmark.name}" + assert benchmark.benchmark > 0, f"non-positive benchmark on {benchmark.name}" + + +class TestComputeDownstreamComparison: + def test_compute_from_aggregates_dict(self) -> None: + """The pure comparison step: given computed numbers, wrap them + with their benchmarks and errors. No PE-sim needed. + """ + computed = { + "income_tax": 2_300_000_000_000.0, + "eitc": 64_000_000_000.0, + "ctc": 115_000_000_000.0, + "snap": 98_000_000_000.0, + "ssi": 66_000_000_000.0, + "aca_ptc": 55_000_000_000.0, + } + result = compute_downstream_comparison(computed, DOWNSTREAM_BENCHMARKS_2024) + + assert set(result) == set(computed) + eitc = result["eitc"] + assert eitc.computed == 64_000_000_000.0 + assert eitc.benchmark > 0 + assert abs(eitc.rel_error) < 0.2, "EITC computed ~ benchmark" + assert eitc.source + + def test_compute_skips_missing_variables(self) -> None: + """If a variable doesn't have a benchmark, it's silently omitted.""" + computed = {"not_a_benchmark_name": 1.0, "eitc": 60_000_000_000.0} + result = compute_downstream_comparison(computed, DOWNSTREAM_BENCHMARKS_2024) + assert "not_a_benchmark_name" not in result + assert "eitc" in result + + +class TestComputeDownstreamAggregates: + @staticmethod + def _fake_simulation( + *, + values: dict[str, list[float]], + entities: dict[str, str], + ): + class FakeMicrosimulation: + def __init__(self, dataset: str = "fake.h5") -> None: + self.dataset = dataset + self.tax_benefit_system = SimpleNamespace( + get_variable=lambda name: SimpleNamespace( + entity=SimpleNamespace(key=entities[name]) + ) + ) + + def calculate(self, variable: str, period: int): + assert period == 2024 + return SimpleNamespace(values=values[variable]) + + return FakeMicrosimulation() + + def test_uses_entity_weights_for_weighted_totals( + self, + monkeypatch: pytest.MonkeyPatch, + tmp_path: Path, + ) -> None: + class FakeMicrosimulation: + def __init__(self, dataset: str) -> None: + self.dataset = dataset + self.tax_benefit_system = SimpleNamespace( + get_variable=lambda name: SimpleNamespace( + entity=SimpleNamespace( + key={ + "eitc": "tax_unit", + "snap": "spm_unit", + "ssi": "person", + }[name] + ) + ) + ) + + def calculate(self, variable: str, period: int): + assert period == 2024 + values = { + "eitc": [10.0, 20.0], + "tax_unit_weight": [100.0, 200.0], + "snap": [1.0, 2.0, 3.0], + "spm_unit_weight": [10.0, 20.0, 30.0], + "ssi": [7.0, 11.0], + "person_weight": [2.0, 3.0], + } + return SimpleNamespace(sum=lambda: sum(values[variable]), values=values[variable]) + + fake_module = ModuleType("policyengine_us") + fake_module.Microsimulation = FakeMicrosimulation + monkeypatch.setitem(sys.modules, "policyengine_us", fake_module) + + aggregates = compute_downstream_aggregates( + tmp_path / "fake.h5", + period=2024, + variables=("eitc", "snap", "ssi"), + ) + + assert aggregates["eitc"] == pytest.approx(10.0 * 100.0 + 20.0 * 200.0) + assert aggregates["snap"] == pytest.approx( + 1.0 * 10.0 + 2.0 * 20.0 + 3.0 * 30.0 + ) + assert aggregates["ssi"] == pytest.approx(7.0 * 2.0 + 11.0 * 3.0) + + def test_weighted_aggregate_rejects_unsupported_entity(self) -> None: + simulation = self._fake_simulation( + values={"odd_output": [1.0, 2.0]}, + entities={"odd_output": "benefit_unit"}, + ) + + with pytest.raises(ValueError, match="Unsupported entity"): + compute_downstream_weighted_aggregate( + simulation, + "odd_output", + period=2024, + ) + + def test_weighted_aggregate_rejects_value_weight_length_mismatch(self) -> None: + simulation = self._fake_simulation( + values={ + "eitc": [10.0, 20.0, 30.0], + "tax_unit_weight": [100.0, 200.0], + }, + entities={"eitc": "tax_unit"}, + ) + + with pytest.raises(ValueError, match="does not match"): + compute_downstream_weighted_aggregate(simulation, "eitc", period=2024) diff --git a/tests/validation/test_run_b2_batched.py b/tests/validation/test_run_b2_batched.py new file mode 100644 index 0000000..f59069f --- /dev/null +++ b/tests/validation/test_run_b2_batched.py @@ -0,0 +1,89 @@ +from __future__ import annotations + +import importlib.util +from pathlib import Path + +import h5py +import numpy as np +import pytest + + +def _load_run_b2_batched_module(): + script_path = ( + Path(__file__).resolve().parents[2] / "scripts" / "run_b2_batched.py" + ) + spec = importlib.util.spec_from_file_location("run_b2_batched", script_path) + assert spec is not None and spec.loader is not None + module = importlib.util.module_from_spec(spec) + spec.loader.exec_module(module) + return module + + +class TestRunB2BatchedEntityResolution: + def test_prefers_policyengine_metadata_over_length_match(self) -> None: + module = _load_run_b2_batched_module() + arrays = { + "household_id": np.array([1, 2, 3]), + "tax_unit_id": np.array([10, 20, 30]), + "some_tax_unit_var": np.array([100.0, 200.0, 300.0]), + } + + entity = module._entity_of( + "some_tax_unit_var", + arrays, + variable_entities={"some_tax_unit_var": "tax_unit"}, + ) + + assert entity == "tax_unit" + + def test_ambiguous_length_match_raises_without_metadata(self) -> None: + module = _load_run_b2_batched_module() + arrays = { + "household_id": np.array([1, 2, 3]), + "tax_unit_id": np.array([10, 20, 30]), + "ambiguous_var": np.array([100.0, 200.0, 300.0]), + } + + with pytest.raises(ValueError, match="Ambiguous entity for variable"): + module._entity_of("ambiguous_var", arrays) + + def test_write_chunk_h5_slices_mixed_entities( + self, + tmp_path: Path, + ) -> None: + module = _load_run_b2_batched_module() + arrays = { + "household_id": np.array([1, 2]), + "household_weight": np.array([100.0, 200.0]), + "person_id": np.array([10, 11, 20]), + "person_household_id": np.array([1, 1, 2]), + "tax_unit_id": np.array([100, 200]), + "person_tax_unit_id": np.array([100, 100, 200]), + "tax_unit_weight": np.array([100.0, 200.0]), + "household_output": np.array([1.0, 2.0]), + "person_output": np.array([3.0, 4.0, 5.0]), + "tax_unit_output": np.array([6.0, 7.0]), + } + masks = module._build_entity_masks(arrays, np.array([1])) + output_path = tmp_path / "chunk.h5" + + module._write_chunk_h5( + arrays, + masks, + "2024", + output_path, + variable_entities={ + "household_output": "household", + "person_output": "person", + "tax_unit_output": "tax_unit", + }, + ) + + with h5py.File(output_path, "r") as handle: + assert handle["household_id"]["2024"][:].tolist() == [1] + assert handle["person_id"]["2024"][:].tolist() == [10, 11] + assert handle["tax_unit_id"]["2024"][:].tolist() == [100] + assert handle["household_output"]["2024"][:].tolist() == [1.0] + assert handle["person_output"]["2024"][:].tolist() == [3.0, 4.0] + assert handle["tax_unit_output"]["2024"][:].tolist() == [6.0] + assert handle["tax_unit_weight"]["2024"][:].tolist() == [100.0] diff --git a/uv.lock b/uv.lock index bd1d8d4..9bf745a 100644 --- a/uv.lock +++ b/uv.lock @@ -1,17 +1,13 @@ version = 1 revision = 3 -requires-python = ">=3.10" +requires-python = ">=3.13" resolution-markers = [ "python_full_version >= '3.14' and sys_platform == 'win32'", "python_full_version >= '3.14' and sys_platform == 'emscripten'", "python_full_version >= '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'win32'", - "python_full_version == '3.11.*' and sys_platform == 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'emscripten'", - "python_full_version == '3.11.*' and sys_platform == 'emscripten'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version == '3.11.*' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version < '3.11'", + "python_full_version < '3.14' and sys_platform == 'win32'", + "python_full_version < '3.14' and sys_platform == 'emscripten'", + "python_full_version < '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", ] [[package]] @@ -19,9 +15,9 @@ name = "alembic" version = "1.18.4" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "mako", marker = "python_full_version >= '3.12'" }, - { name = "sqlalchemy", marker = "python_full_version >= '3.12'" }, - { name = "typing-extensions", marker = "python_full_version >= '3.12'" }, + { name = "mako" }, + { name = "sqlalchemy" }, + { name = "typing-extensions" }, ] sdist = { url = "https://files.pythonhosted.org/packages/94/13/8b084e0f2efb0275a1d534838844926f798bd766566b1375174e2448cd31/alembic-1.18.4.tar.gz", hash = "sha256:cb6e1fd84b6174ab8dbb2329f86d631ba9559dd78df550b57804d607672cedbc", size = 2056725, upload-time = "2026-02-10T16:00:47.195Z" } wheels = [ @@ -51,9 +47,7 @@ name = "anyio" version = "4.13.0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "exceptiongroup", marker = "python_full_version < '3.11'" }, { name = "idna" }, - { name = "typing-extensions", marker = "python_full_version < '3.13'" }, ] sdist = { url = "https://files.pythonhosted.org/packages/19/14/2c5dd9f512b66549ae92767a9c7b330ae88e1932ca57876909410251fe13/anyio-4.13.0.tar.gz", hash = "sha256:334b70e641fd2221c1505b3890c69882fe4a2df910cba14d97019b90b24439dc", size = 231622, upload-time = "2026-03-24T12:59:09.671Z" } wheels = [ @@ -84,54 +78,6 @@ version = "3.4.6" source = { registry = "https://pypi.org/simple" } sdist = { url = "https://files.pythonhosted.org/packages/7b/60/e3bec1881450851b087e301bedc3daa9377a4d45f1c26aa90b0b235e38aa/charset_normalizer-3.4.6.tar.gz", hash = "sha256:1ae6b62897110aa7c79ea2f5dd38d1abca6db663687c0b1ad9aed6f6bae3d9d6", size = 143363, upload-time = "2026-03-15T18:53:25.478Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/e6/8c/2c56124c6dc53a774d435f985b5973bc592f42d437be58c0c92d65ae7296/charset_normalizer-3.4.6-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:2e1d8ca8611099001949d1cdfaefc510cf0f212484fe7c565f735b68c78c3c95", size = 298751, upload-time = "2026-03-15T18:50:00.003Z" }, - { url = "https://files.pythonhosted.org/packages/86/2a/2a7db6b314b966a3bcad8c731c0719c60b931b931de7ae9f34b2839289ee/charset_normalizer-3.4.6-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e25369dc110d58ddf29b949377a93e0716d72a24f62bad72b2b39f155949c1fd", size = 200027, upload-time = "2026-03-15T18:50:01.702Z" }, - { url = "https://files.pythonhosted.org/packages/68/f2/0fe775c74ae25e2a3b07b01538fc162737b3e3f795bada3bc26f4d4d495c/charset_normalizer-3.4.6-cp310-cp310-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:259695e2ccc253feb2a016303543d691825e920917e31f894ca1a687982b1de4", size = 220741, upload-time = "2026-03-15T18:50:03.194Z" }, - { url = "https://files.pythonhosted.org/packages/10/98/8085596e41f00b27dd6aa1e68413d1ddda7e605f34dd546833c61fddd709/charset_normalizer-3.4.6-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:dda86aba335c902b6149a02a55b38e96287157e609200811837678214ba2b1db", size = 215802, upload-time = "2026-03-15T18:50:05.859Z" }, - { url = "https://files.pythonhosted.org/packages/fd/ce/865e4e09b041bad659d682bbd98b47fb490b8e124f9398c9448065f64fee/charset_normalizer-3.4.6-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:51fb3c322c81d20567019778cb5a4a6f2dc1c200b886bc0d636238e364848c89", size = 207908, upload-time = "2026-03-15T18:50:07.676Z" }, - { url = "https://files.pythonhosted.org/packages/a8/54/8c757f1f7349262898c2f169e0d562b39dcb977503f18fdf0814e923db78/charset_normalizer-3.4.6-cp310-cp310-manylinux_2_31_armv7l.whl", hash = "sha256:4482481cb0572180b6fd976a4d5c72a30263e98564da68b86ec91f0fe35e8565", size = 194357, upload-time = "2026-03-15T18:50:09.327Z" }, - { url = "https://files.pythonhosted.org/packages/6f/29/e88f2fac9218907fc7a70722b393d1bbe8334c61fe9c46640dba349b6e66/charset_normalizer-3.4.6-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:39f5068d35621da2881271e5c3205125cc456f54e9030d3f723288c873a71bf9", size = 205610, upload-time = "2026-03-15T18:50:10.732Z" }, - { url = "https://files.pythonhosted.org/packages/4c/c5/21d7bb0cb415287178450171d130bed9d664211fdd59731ed2c34267b07d/charset_normalizer-3.4.6-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:8bea55c4eef25b0b19a0337dc4e3f9a15b00d569c77211fa8cde38684f234fb7", size = 203512, upload-time = "2026-03-15T18:50:12.535Z" }, - { url = "https://files.pythonhosted.org/packages/a4/be/ce52f3c7fdb35cc987ad38a53ebcef52eec498f4fb6c66ecfe62cfe57ba2/charset_normalizer-3.4.6-cp310-cp310-musllinux_1_2_armv7l.whl", hash = "sha256:f0cdaecd4c953bfae0b6bb64910aaaca5a424ad9c72d85cb88417bb9814f7550", size = 195398, upload-time = "2026-03-15T18:50:14.236Z" }, - { url = "https://files.pythonhosted.org/packages/81/a0/3ab5dd39d4859a3555e5dadfc8a9fa7f8352f8c183d1a65c90264517da0e/charset_normalizer-3.4.6-cp310-cp310-musllinux_1_2_ppc64le.whl", hash = "sha256:150b8ce8e830eb7ccb029ec9ca36022f756986aaaa7956aad6d9ec90089338c0", size = 221772, upload-time = "2026-03-15T18:50:15.581Z" }, - { url = "https://files.pythonhosted.org/packages/04/6e/6a4e41a97ba6b2fa87f849c41e4d229449a586be85053c4d90135fe82d26/charset_normalizer-3.4.6-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:e68c14b04827dd76dcbd1aeea9e604e3e4b78322d8faf2f8132c7138efa340a8", size = 205759, upload-time = "2026-03-15T18:50:17.047Z" }, - { url = "https://files.pythonhosted.org/packages/db/3b/34a712a5ee64a6957bf355b01dc17b12de457638d436fdb05d01e463cd1c/charset_normalizer-3.4.6-cp310-cp310-musllinux_1_2_s390x.whl", hash = "sha256:3778fd7d7cd04ae8f54651f4a7a0bd6e39a0cf20f801720a4c21d80e9b7ad6b0", size = 216938, upload-time = "2026-03-15T18:50:18.44Z" }, - { url = "https://files.pythonhosted.org/packages/cb/05/5bd1e12da9ab18790af05c61aafd01a60f489778179b621ac2a305243c62/charset_normalizer-3.4.6-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:dad6e0f2e481fffdcf776d10ebee25e0ef89f16d691f1e5dee4b586375fdc64b", size = 210138, upload-time = "2026-03-15T18:50:19.852Z" }, - { url = "https://files.pythonhosted.org/packages/bd/8e/3cb9e2d998ff6b21c0a1860343cb7b83eba9cdb66b91410e18fc4969d6ab/charset_normalizer-3.4.6-cp310-cp310-win32.whl", hash = "sha256:74a2e659c7ecbc73562e2a15e05039f1e22c75b7c7618b4b574a3ea9118d1557", size = 144137, upload-time = "2026-03-15T18:50:21.505Z" }, - { url = "https://files.pythonhosted.org/packages/d8/8f/78f5489ffadb0db3eb7aff53d31c24531d33eb545f0c6f6567c25f49a5ff/charset_normalizer-3.4.6-cp310-cp310-win_amd64.whl", hash = "sha256:aa9cccf4a44b9b62d8ba8b4dd06c649ba683e4bf04eea606d2e94cfc2d6ff4d6", size = 154244, upload-time = "2026-03-15T18:50:22.81Z" }, - { url = "https://files.pythonhosted.org/packages/e4/74/e472659dffb0cadb2f411282d2d76c60da1fc94076d7fffed4ae8a93ec01/charset_normalizer-3.4.6-cp310-cp310-win_arm64.whl", hash = "sha256:e985a16ff513596f217cee86c21371b8cd011c0f6f056d0920aa2d926c544058", size = 143312, upload-time = "2026-03-15T18:50:24.074Z" }, - { url = "https://files.pythonhosted.org/packages/62/28/ff6f234e628a2de61c458be2779cb182bc03f6eec12200d4a525bbfc9741/charset_normalizer-3.4.6-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:82060f995ab5003a2d6e0f4ad29065b7672b6593c8c63559beefe5b443242c3e", size = 293582, upload-time = "2026-03-15T18:50:25.454Z" }, - { url = "https://files.pythonhosted.org/packages/1c/b7/b1a117e5385cbdb3205f6055403c2a2a220c5ea80b8716c324eaf75c5c95/charset_normalizer-3.4.6-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:60c74963d8350241a79cb8feea80e54d518f72c26db618862a8f53e5023deaf9", size = 197240, upload-time = "2026-03-15T18:50:27.196Z" }, - { url = "https://files.pythonhosted.org/packages/a1/5f/2574f0f09f3c3bc1b2f992e20bce6546cb1f17e111c5be07308dc5427956/charset_normalizer-3.4.6-cp311-cp311-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:f6e4333fb15c83f7d1482a76d45a0818897b3d33f00efd215528ff7c51b8e35d", size = 217363, upload-time = "2026-03-15T18:50:28.601Z" }, - { url = "https://files.pythonhosted.org/packages/4a/d1/0ae20ad77bc949ddd39b51bf383b6ca932f2916074c95cad34ae465ab71f/charset_normalizer-3.4.6-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:bc72863f4d9aba2e8fd9085e63548a324ba706d2ea2c83b260da08a59b9482de", size = 212994, upload-time = "2026-03-15T18:50:30.102Z" }, - { url = "https://files.pythonhosted.org/packages/60/ac/3233d262a310c1b12633536a07cde5ddd16985e6e7e238e9f3f9423d8eb9/charset_normalizer-3.4.6-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9cc4fc6c196d6a8b76629a70ddfcd4635a6898756e2d9cac5565cf0654605d73", size = 204697, upload-time = "2026-03-15T18:50:31.654Z" }, - { url = "https://files.pythonhosted.org/packages/25/3c/8a18fc411f085b82303cfb7154eed5bd49c77035eb7608d049468b53f87c/charset_normalizer-3.4.6-cp311-cp311-manylinux_2_31_armv7l.whl", hash = "sha256:0c173ce3a681f309f31b87125fecec7a5d1347261ea11ebbb856fa6006b23c8c", size = 191673, upload-time = "2026-03-15T18:50:33.433Z" }, - { url = "https://files.pythonhosted.org/packages/ff/a7/11cfe61d6c5c5c7438d6ba40919d0306ed83c9ab957f3d4da2277ff67836/charset_normalizer-3.4.6-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:c907cdc8109f6c619e6254212e794d6548373cc40e1ec75e6e3823d9135d29cc", size = 201120, upload-time = "2026-03-15T18:50:35.105Z" }, - { url = "https://files.pythonhosted.org/packages/b5/10/cf491fa1abd47c02f69687046b896c950b92b6cd7337a27e6548adbec8e4/charset_normalizer-3.4.6-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:404a1e552cf5b675a87f0651f8b79f5f1e6fd100ee88dc612f89aa16abd4486f", size = 200911, upload-time = "2026-03-15T18:50:36.819Z" }, - { url = "https://files.pythonhosted.org/packages/28/70/039796160b48b18ed466fde0af84c1b090c4e288fae26cd674ad04a2d703/charset_normalizer-3.4.6-cp311-cp311-musllinux_1_2_armv7l.whl", hash = "sha256:e3c701e954abf6fc03a49f7c579cc80c2c6cc52525340ca3186c41d3f33482ef", size = 192516, upload-time = "2026-03-15T18:50:38.228Z" }, - { url = "https://files.pythonhosted.org/packages/ff/34/c56f3223393d6ff3124b9e78f7de738047c2d6bc40a4f16ac0c9d7a1cb3c/charset_normalizer-3.4.6-cp311-cp311-musllinux_1_2_ppc64le.whl", hash = "sha256:7a6967aaf043bceabab5412ed6bd6bd26603dae84d5cb75bf8d9a74a4959d398", size = 218795, upload-time = "2026-03-15T18:50:39.664Z" }, - { url = "https://files.pythonhosted.org/packages/e8/3b/ce2d4f86c5282191a041fdc5a4ce18f1c6bd40a5bd1f74cf8625f08d51c1/charset_normalizer-3.4.6-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:5feb91325bbceade6afab43eb3b508c63ee53579fe896c77137ded51c6b6958e", size = 201833, upload-time = "2026-03-15T18:50:41.552Z" }, - { url = "https://files.pythonhosted.org/packages/3b/9b/b6a9f76b0fd7c5b5ec58b228ff7e85095370282150f0bd50b3126f5506d6/charset_normalizer-3.4.6-cp311-cp311-musllinux_1_2_s390x.whl", hash = "sha256:f820f24b09e3e779fe84c3c456cb4108a7aa639b0d1f02c28046e11bfcd088ed", size = 213920, upload-time = "2026-03-15T18:50:43.33Z" }, - { url = "https://files.pythonhosted.org/packages/ae/98/7bc23513a33d8172365ed30ee3a3b3fe1ece14a395e5fc94129541fc6003/charset_normalizer-3.4.6-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:b35b200d6a71b9839a46b9b7fff66b6638bb52fc9658aa58796b0326595d3021", size = 206951, upload-time = "2026-03-15T18:50:44.789Z" }, - { url = "https://files.pythonhosted.org/packages/32/73/c0b86f3d1458468e11aec870e6b3feac931facbe105a894b552b0e518e79/charset_normalizer-3.4.6-cp311-cp311-win32.whl", hash = "sha256:9ca4c0b502ab399ef89248a2c84c54954f77a070f28e546a85e91da627d1301e", size = 143703, upload-time = "2026-03-15T18:50:46.103Z" }, - { url = "https://files.pythonhosted.org/packages/c6/e3/76f2facfe8eddee0bbd38d2594e709033338eae44ebf1738bcefe0a06185/charset_normalizer-3.4.6-cp311-cp311-win_amd64.whl", hash = "sha256:a9e68c9d88823b274cf1e72f28cb5dc89c990edf430b0bfd3e2fb0785bfeabf4", size = 153857, upload-time = "2026-03-15T18:50:47.563Z" }, - { url = "https://files.pythonhosted.org/packages/e2/dc/9abe19c9b27e6cd3636036b9d1b387b78c40dedbf0b47f9366737684b4b0/charset_normalizer-3.4.6-cp311-cp311-win_arm64.whl", hash = "sha256:97d0235baafca5f2b09cf332cc275f021e694e8362c6bb9c96fc9a0eb74fc316", size = 142751, upload-time = "2026-03-15T18:50:49.234Z" }, - { url = "https://files.pythonhosted.org/packages/e5/62/c0815c992c9545347aeea7859b50dc9044d147e2e7278329c6e02ac9a616/charset_normalizer-3.4.6-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:2ef7fedc7a6ecbe99969cd09632516738a97eeb8bd7258bf8a0f23114c057dab", size = 295154, upload-time = "2026-03-15T18:50:50.88Z" }, - { url = "https://files.pythonhosted.org/packages/a8/37/bdca6613c2e3c58c7421891d80cc3efa1d32e882f7c4a7ee6039c3fc951a/charset_normalizer-3.4.6-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a4ea868bc28109052790eb2b52a9ab33f3aa7adc02f96673526ff47419490e21", size = 199191, upload-time = "2026-03-15T18:50:52.658Z" }, - { url = "https://files.pythonhosted.org/packages/6c/92/9934d1bbd69f7f398b38c5dae1cbf9cc672e7c34a4adf7b17c0a9c17d15d/charset_normalizer-3.4.6-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:836ab36280f21fc1a03c99cd05c6b7af70d2697e374c7af0b61ed271401a72a2", size = 218674, upload-time = "2026-03-15T18:50:54.102Z" }, - { url = "https://files.pythonhosted.org/packages/af/90/25f6ab406659286be929fd89ab0e78e38aa183fc374e03aa3c12d730af8a/charset_normalizer-3.4.6-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:f1ce721c8a7dfec21fcbdfe04e8f68174183cf4e8188e0645e92aa23985c57ff", size = 215259, upload-time = "2026-03-15T18:50:55.616Z" }, - { url = "https://files.pythonhosted.org/packages/4e/ef/79a463eb0fff7f96afa04c1d4c51f8fc85426f918db467854bfb6a569ce3/charset_normalizer-3.4.6-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0e28d62a8fc7a1fa411c43bd65e346f3bce9716dc51b897fbe930c5987b402d5", size = 207276, upload-time = "2026-03-15T18:50:57.054Z" }, - { url = "https://files.pythonhosted.org/packages/f7/72/d0426afec4b71dc159fa6b4e68f868cd5a3ecd918fec5813a15d292a7d10/charset_normalizer-3.4.6-cp312-cp312-manylinux_2_31_armv7l.whl", hash = "sha256:530d548084c4a9f7a16ed4a294d459b4f229db50df689bfe92027452452943a0", size = 195161, upload-time = "2026-03-15T18:50:58.686Z" }, - { url = "https://files.pythonhosted.org/packages/bf/18/c82b06a68bfcb6ce55e508225d210c7e6a4ea122bfc0748892f3dc4e8e11/charset_normalizer-3.4.6-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:30f445ae60aad5e1f8bdbb3108e39f6fbc09f4ea16c815c66578878325f8f15a", size = 203452, upload-time = "2026-03-15T18:51:00.196Z" }, - { url = "https://files.pythonhosted.org/packages/44/d6/0c25979b92f8adafdbb946160348d8d44aa60ce99afdc27df524379875cb/charset_normalizer-3.4.6-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:ac2393c73378fea4e52aa56285a3d64be50f1a12395afef9cce47772f60334c2", size = 202272, upload-time = "2026-03-15T18:51:01.703Z" }, - { url = "https://files.pythonhosted.org/packages/2e/3d/7fea3e8fe84136bebbac715dd1221cc25c173c57a699c030ab9b8900cbb7/charset_normalizer-3.4.6-cp312-cp312-musllinux_1_2_armv7l.whl", hash = "sha256:90ca27cd8da8118b18a52d5f547859cc1f8354a00cd1e8e5120df3e30d6279e5", size = 195622, upload-time = "2026-03-15T18:51:03.526Z" }, - { url = "https://files.pythonhosted.org/packages/57/8a/d6f7fd5cb96c58ef2f681424fbca01264461336d2a7fc875e4446b1f1346/charset_normalizer-3.4.6-cp312-cp312-musllinux_1_2_ppc64le.whl", hash = "sha256:8e5a94886bedca0f9b78fecd6afb6629142fd2605aa70a125d49f4edc6037ee6", size = 220056, upload-time = "2026-03-15T18:51:05.269Z" }, - { url = "https://files.pythonhosted.org/packages/16/50/478cdda782c8c9c3fb5da3cc72dd7f331f031e7f1363a893cdd6ca0f8de0/charset_normalizer-3.4.6-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:695f5c2823691a25f17bc5d5ffe79fa90972cc34b002ac6c843bb8a1720e950d", size = 203751, upload-time = "2026-03-15T18:51:06.858Z" }, - { url = "https://files.pythonhosted.org/packages/75/fc/cc2fcac943939c8e4d8791abfa139f685e5150cae9f94b60f12520feaa9b/charset_normalizer-3.4.6-cp312-cp312-musllinux_1_2_s390x.whl", hash = "sha256:231d4da14bcd9301310faf492051bee27df11f2bc7549bc0bb41fef11b82daa2", size = 216563, upload-time = "2026-03-15T18:51:08.564Z" }, - { url = "https://files.pythonhosted.org/packages/a8/b7/a4add1d9a5f68f3d037261aecca83abdb0ab15960a3591d340e829b37298/charset_normalizer-3.4.6-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:a056d1ad2633548ca18ffa2f85c202cfb48b68615129143915b8dc72a806a923", size = 209265, upload-time = "2026-03-15T18:51:10.312Z" }, - { url = "https://files.pythonhosted.org/packages/6c/18/c094561b5d64a24277707698e54b7f67bd17a4f857bbfbb1072bba07c8bf/charset_normalizer-3.4.6-cp312-cp312-win32.whl", hash = "sha256:c2274ca724536f173122f36c98ce188fd24ce3dad886ec2b7af859518ce008a4", size = 144229, upload-time = "2026-03-15T18:51:11.694Z" }, - { url = "https://files.pythonhosted.org/packages/ab/20/0567efb3a8fd481b8f34f739ebddc098ed062a59fed41a8d193a61939e8f/charset_normalizer-3.4.6-cp312-cp312-win_amd64.whl", hash = "sha256:c8ae56368f8cc97c7e40a7ee18e1cedaf8e780cd8bc5ed5ac8b81f238614facb", size = 154277, upload-time = "2026-03-15T18:51:13.004Z" }, - { url = "https://files.pythonhosted.org/packages/15/57/28d79b44b51933119e21f65479d0864a8d5893e494cf5daab15df0247c17/charset_normalizer-3.4.6-cp312-cp312-win_arm64.whl", hash = "sha256:899d28f422116b08be5118ef350c292b36fc15ec2daeb9ea987c89281c7bb5c4", size = 142817, upload-time = "2026-03-15T18:51:14.408Z" }, { url = "https://files.pythonhosted.org/packages/1e/1d/4fdabeef4e231153b6ed7567602f3b68265ec4e5b76d6024cf647d43d981/charset_normalizer-3.4.6-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:11afb56037cbc4b1555a34dd69151e8e069bee82e613a73bef6e714ce733585f", size = 294823, upload-time = "2026-03-15T18:51:15.755Z" }, { url = "https://files.pythonhosted.org/packages/47/7b/20e809b89c69d37be748d98e84dce6820bf663cf19cf6b942c951a3e8f41/charset_normalizer-3.4.6-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:423fb7e748a08f854a08a222b983f4df1912b1daedce51a72bd24fe8f26a1843", size = 198527, upload-time = "2026-03-15T18:51:17.177Z" }, { url = "https://files.pythonhosted.org/packages/37/a6/4f8d27527d59c039dce6f7622593cdcd3d70a8504d87d09eb11e9fdc6062/charset_normalizer-3.4.6-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:d73beaac5e90173ac3deb9928a74763a6d230f494e4bfb422c217a0ad8e629bf", size = 218388, upload-time = "2026-03-15T18:51:18.934Z" }, @@ -209,7 +155,7 @@ name = "colorlog" version = "6.10.1" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "colorama", marker = "python_full_version >= '3.12' and sys_platform == 'win32'" }, + { name = "colorama", marker = "sys_platform == 'win32'" }, ] sdist = { url = "https://files.pythonhosted.org/packages/a2/61/f083b5ac52e505dfc1c624eafbf8c7589a0d7f32daa398d2e7590efa5fda/colorlog-6.10.1.tar.gz", hash = "sha256:eb4ae5cb65fe7fec7773c2306061a8e63e02efc2c72eba9d27b0fa23c94f1321", size = 17162, upload-time = "2025-10-16T16:14:11.978Z" } wheels = [ @@ -221,15 +167,9 @@ name = "cuda-bindings" version = "13.2.0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "cuda-pathfinder", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" }, + { name = "cuda-pathfinder", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, ] wheels = [ - { url = "https://files.pythonhosted.org/packages/1a/fe/7351d7e586a8b4c9f89731bfe4cf0148223e8f9903ff09571f78b3fb0682/cuda_bindings-13.2.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:08b395f79cb89ce0cd8effff07c4a1e20101b873c256a1aeb286e8fd7bd0f556", size = 5744254, upload-time = "2026-03-11T00:12:29.798Z" }, - { url = "https://files.pythonhosted.org/packages/aa/ef/184aa775e970fc089942cd9ec6302e6e44679d4c14549c6a7ea45bf7f798/cuda_bindings-13.2.0-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d6f3682ec3c4769326aafc67c2ba669d97d688d0b7e63e659d36d2f8b72f32d6", size = 6329075, upload-time = "2026-03-11T00:12:32.319Z" }, - { url = "https://files.pythonhosted.org/packages/e0/a9/3a8241c6e19483ac1f1dcf5c10238205dcb8a6e9d0d4d4709240dff28ff4/cuda_bindings-13.2.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:721104c603f059780d287969be3d194a18d0cc3b713ed9049065a1107706759d", size = 5730273, upload-time = "2026-03-11T00:12:37.18Z" }, - { url = "https://files.pythonhosted.org/packages/e9/94/2748597f47bb1600cd466b20cab4159f1530a3a33fe7f70fee199b3abb9e/cuda_bindings-13.2.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1eba9504ac70667dd48313395fe05157518fd6371b532790e96fbb31bbb5a5e1", size = 6313924, upload-time = "2026-03-11T00:12:39.462Z" }, - { url = "https://files.pythonhosted.org/packages/52/c8/b2589d68acf7e3d63e2be330b84bc25712e97ed799affbca7edd7eae25d6/cuda_bindings-13.2.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e865447abfb83d6a98ad5130ed3c70b1fc295ae3eeee39fd07b4ddb0671b6788", size = 5722404, upload-time = "2026-03-11T00:12:44.041Z" }, - { url = "https://files.pythonhosted.org/packages/1f/92/f899f7bbb5617bb65ec52a6eac1e9a1447a86b916c4194f8a5001b8cde0c/cuda_bindings-13.2.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:46d8776a55d6d5da9dd6e9858fba2efcda2abe6743871dee47dd06eb8cb6d955", size = 6320619, upload-time = "2026-03-11T00:12:45.939Z" }, { url = "https://files.pythonhosted.org/packages/df/93/eef988860a3ca985f82c4f3174fc0cdd94e07331ba9a92e8e064c260337f/cuda_bindings-13.2.0-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6629ca2df6f795b784752409bcaedbd22a7a651b74b56a165ebc0c9dcbd504d0", size = 5614610, upload-time = "2026-03-11T00:12:50.337Z" }, { url = "https://files.pythonhosted.org/packages/18/23/6db3aba46864aee357ab2415135b3fe3da7e9f1fa0221fa2a86a5968099c/cuda_bindings-13.2.0-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:7dca0da053d3b4cc4869eff49c61c03f3c5dbaa0bcd712317a358d5b8f3f385d", size = 6149914, upload-time = "2026-03-11T00:12:52.374Z" }, { url = "https://files.pythonhosted.org/packages/c0/87/87a014f045b77c6de5c8527b0757fe644417b184e5367db977236a141602/cuda_bindings-13.2.0-cp314-cp314-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a6464b30f46692d6c7f65d4a0e0450d81dd29de3afc1bb515653973d01c2cd6e", size = 5685673, upload-time = "2026-03-11T00:12:56.371Z" }, @@ -256,37 +196,37 @@ wheels = [ [package.optional-dependencies] cublas = [ - { name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-cublas", marker = "sys_platform == 'linux'" }, ] cudart = [ - { name = "nvidia-cuda-runtime", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-cuda-runtime", marker = "sys_platform == 'linux'" }, ] cufft = [ - { name = "nvidia-cufft", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-cufft", marker = "sys_platform == 'linux'" }, ] cufile = [ { name = "nvidia-cufile", marker = "sys_platform == 'linux'" }, ] cupti = [ - { name = "nvidia-cuda-cupti", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-cuda-cupti", marker = "sys_platform == 'linux'" }, ] curand = [ - { name = "nvidia-curand", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-curand", marker = "sys_platform == 'linux'" }, ] cusolver = [ - { name = "nvidia-cusolver", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-cusolver", marker = "sys_platform == 'linux'" }, ] cusparse = [ - { name = "nvidia-cusparse", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-cusparse", marker = "sys_platform == 'linux'" }, ] nvjitlink = [ - { name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-nvjitlink", marker = "sys_platform == 'linux'" }, ] nvrtc = [ - { name = "nvidia-cuda-nvrtc", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-cuda-nvrtc", marker = "sys_platform == 'linux'" }, ] nvtx = [ - { name = "nvidia-nvtx", marker = "(python_full_version < '3.11' and sys_platform == 'win32') or sys_platform == 'linux'" }, + { name = "nvidia-nvtx", marker = "sys_platform == 'linux'" }, ] [[package]] @@ -313,26 +253,6 @@ version = "1.5.1" source = { registry = "https://pypi.org/simple" } sdist = { url = "https://files.pythonhosted.org/packages/ae/62/590caabec6c41003f46a244b6fd707d35ca2e552e0c70cbf454e08bf6685/duckdb-1.5.1.tar.gz", hash = "sha256:b370d1620a34a4538ef66524fcee9de8171fa263c701036a92bc0b4c1f2f9c6d", size = 17995082, upload-time = "2026-03-23T12:12:15.894Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/eb/63/d6477057ea6103f80ed9499580c8602183211689889ec50c32f25a935e3d/duckdb-1.5.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:46f92ada9023e59f27edc048167b31ac9a03911978b1296c845a34462a27f096", size = 30067487, upload-time = "2026-03-23T12:10:15.712Z" }, - { url = "https://files.pythonhosted.org/packages/ba/b8/22e6c605d9281df7a83653f4a60168eec0f650b23f1d4648aca940d79d00/duckdb-1.5.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:caa65e1f5bf007430bf657c37cab7ab81a4ddf8d337e3062bcc5085d17ef038b", size = 15968413, upload-time = "2026-03-23T12:10:18.978Z" }, - { url = "https://files.pythonhosted.org/packages/85/b1/88a457cd3105525cba0d4c155f847c5c32fa4f543d3ba4ee38b4fd75f82e/duckdb-1.5.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:8c0088765747ae5d6c9f89987bb36f9fb83564f07090d721344ce8e1abedffea", size = 14222115, upload-time = "2026-03-23T12:10:21.662Z" }, - { url = "https://files.pythonhosted.org/packages/c5/3b/800c3f1d54ae0062b3e9b0b54fc54d6c155d731311931d748fc9c5c565f9/duckdb-1.5.1-cp310-cp310-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e56a20ab6cdb90a95b0c99652e28de3504ce77129087319c03c9098266183ae5", size = 19244994, upload-time = "2026-03-23T12:10:24.708Z" }, - { url = "https://files.pythonhosted.org/packages/3a/09/4c4dd94f521d016e0fb83cca2c203d10ce1e3f8bcc679691b5271fc98b83/duckdb-1.5.1-cp310-cp310-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:715f05ea198d20d7f8b407b9b84e0023d17f2b9096c194cea702b7840e74f1f7", size = 21347663, upload-time = "2026-03-23T12:10:27.428Z" }, - { url = "https://files.pythonhosted.org/packages/d0/b3/eb3c70be70d0b3fa6c8051d6fa4b7fb3d5787fa77b3f50b7e38d5f7cc6fd/duckdb-1.5.1-cp310-cp310-win_amd64.whl", hash = "sha256:e878ccb7d20872065e1597935fdb5e65efa43220c8edd0d9c4a1a7ff1f3eb277", size = 13067979, upload-time = "2026-03-23T12:10:30.783Z" }, - { url = "https://files.pythonhosted.org/packages/42/3e/827ffcf58f0abc6ad6dcf826c5d24ebfc65e03ad1a20d74cad9806f91c99/duckdb-1.5.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:bc7ca6a1a40e7e4c933017e6c09ef18032add793df4e42624c6c0c87e0bebdad", size = 30067835, upload-time = "2026-03-23T12:10:34.026Z" }, - { url = "https://files.pythonhosted.org/packages/04/b5/e921ecf8a7e0cc7da2100c98bef64b3da386df9444f467d6389364851302/duckdb-1.5.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:446d500a2977c6ae2077f340c510a25956da5c77597175c316edfa87248ceda3", size = 15970464, upload-time = "2026-03-23T12:10:42.063Z" }, - { url = "https://files.pythonhosted.org/packages/dd/da/ed804006cd09ba303389d573c8b15d74220667cbd1fd990c26e98d0e0a5b/duckdb-1.5.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:b8b0808dba0c63b7633bdaefb34e08fe0612622224f9feb0e7518904b1615101", size = 14222994, upload-time = "2026-03-23T12:10:45.162Z" }, - { url = "https://files.pythonhosted.org/packages/b3/43/c904d81a61306edab81a9d74bb37bbe65679639abb7030d4c4fec9ed84f7/duckdb-1.5.1-cp311-cp311-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:553c273a6a8f140adaa6da6a6135c7f95bdc8c2e5f95252fcdf9832d758e2141", size = 19244880, upload-time = "2026-03-23T12:10:48.529Z" }, - { url = "https://files.pythonhosted.org/packages/50/db/358715d677bfe5e117d9e1f2d6cc2fc2b0bd621144d1f15335b8b59f95d7/duckdb-1.5.1-cp311-cp311-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:40c5220ec93790b18ec6278da9c6ac2608d997ee6d6f7cd44c5c3992764e8e71", size = 21350874, upload-time = "2026-03-23T12:10:52.095Z" }, - { url = "https://files.pythonhosted.org/packages/3f/db/fd647ce46315347976f5576a279bacb8134d23b1f004bd0bcda7ce9cf429/duckdb-1.5.1-cp311-cp311-win_amd64.whl", hash = "sha256:36e8e32621a9e2a9abe75dc15a4b54a3997f2d8b1e53ad754bae48a083c91130", size = 13068140, upload-time = "2026-03-23T12:10:55.622Z" }, - { url = "https://files.pythonhosted.org/packages/27/95/e29d42792707619da5867ffab338d7e7b086242c7296aa9cfc6dcf52d568/duckdb-1.5.1-cp311-cp311-win_arm64.whl", hash = "sha256:5ae7c0d744d64e2753149634787cc4ab60f05ef1e542b060eeab719f3cdb7723", size = 13908823, upload-time = "2026-03-23T12:10:58.572Z" }, - { url = "https://files.pythonhosted.org/packages/3f/06/be4c62f812c6e23898733073ace0482eeb18dffabe0585d63a3bf38bca1e/duckdb-1.5.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:6f7361d66cc801d9eb4df734b139cd7b0e3c257a16f3573ebd550ddb255549e6", size = 30113703, upload-time = "2026-03-23T12:11:02.536Z" }, - { url = "https://files.pythonhosted.org/packages/44/03/1794dcdda75ff203ab0982ff7eb5232549b58b9af66f243f1b7212d6d6be/duckdb-1.5.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:0a6acc2040bec1f05de62a2f3f68f4c12f3ec7d6012b4317d0ab1a195af26225", size = 15991802, upload-time = "2026-03-23T12:11:06.321Z" }, - { url = "https://files.pythonhosted.org/packages/87/03/293bccd838a293d42ea26dec7f4eb4f58b57b6c9ffcfabc6518a5f20a24a/duckdb-1.5.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:ed6d23a3f806898e69c77430ebd8da0c79c219f97b9acbc9a29a653e09740c59", size = 14246803, upload-time = "2026-03-23T12:11:09.624Z" }, - { url = "https://files.pythonhosted.org/packages/15/2c/7b4f11879aa2924838168b4640da999dccda1b4a033d43cb998fd6dc33ea/duckdb-1.5.1-cp312-cp312-manylinux_2_26_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6af347debc8b721aa72e48671166282da979d5e5ae52dbc660ab417282b48e23", size = 19271654, upload-time = "2026-03-23T12:11:13.354Z" }, - { url = "https://files.pythonhosted.org/packages/6f/d6/8f9a6b1fbcc669108ec6a4d625a70be9e480b437ed9b70cd56b78cd577a6/duckdb-1.5.1-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8150c569b2aa4573b51ba8475e814aa41fd53a3d510c1ffb96f1139f46faf611", size = 21386100, upload-time = "2026-03-23T12:11:16.758Z" }, - { url = "https://files.pythonhosted.org/packages/c4/fe/8d02c6473273468cf8d43fd5d73c677f8cdfcd036c1e884df0613f124c2b/duckdb-1.5.1-cp312-cp312-win_amd64.whl", hash = "sha256:054ad424b051b334052afac58cb216f3b1ebb8579fc8c641e60f0182e8725ea9", size = 13083506, upload-time = "2026-03-23T12:11:19.785Z" }, - { url = "https://files.pythonhosted.org/packages/96/0b/2be786b9c153eb263bf5d3d5f7ab621b14a715d7e70f92b24ecf8536369e/duckdb-1.5.1-cp312-cp312-win_arm64.whl", hash = "sha256:6ba302115f63f6482c000ccfd62efdb6c41d9d182a5bcd4a90e7ab8cd13856eb", size = 13888862, upload-time = "2026-03-23T12:11:22.84Z" }, { url = "https://files.pythonhosted.org/packages/a5/f2/af476945e3b97417945b0f660b5efa661863547c0ea104251bb6387342b1/duckdb-1.5.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:26e56b5f0c96189e3288d83cf7b476e23615987902f801e5788dee15ee9f24a9", size = 30113759, upload-time = "2026-03-23T12:11:26.5Z" }, { url = "https://files.pythonhosted.org/packages/fe/9d/5a542b3933647369e601175190093597ce0ac54909aea0dd876ec51ffad4/duckdb-1.5.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:972d0dbf283508f9bc446ee09c3838cb7c7f114b5bdceee41753288c97fe2f7c", size = 15991463, upload-time = "2026-03-23T12:11:30.025Z" }, { url = "https://files.pythonhosted.org/packages/53/a5/b59cff67f5e0420b8f337ad86406801cffacae219deed83961dcceefda67/duckdb-1.5.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:482f8a13f2600f527e427f73c42b5aa75536f9892868068f0aaf573055a0135f", size = 14246482, upload-time = "2026-03-23T12:11:33.33Z" }, @@ -349,18 +269,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e6/ac/f9e4e731635192571f86f52d86234f537c7f8ca4f6917c56b29051c077ef/duckdb-1.5.1-cp314-cp314-win_arm64.whl", hash = "sha256:a3be2072315982e232bfe49c9d3db0a59ba67b2240a537ef42656cc772a887c7", size = 14370790, upload-time = "2026-03-23T12:12:12.497Z" }, ] -[[package]] -name = "exceptiongroup" -version = "1.3.1" -source = { registry = "https://pypi.org/simple" } -dependencies = [ - { name = "typing-extensions", marker = "python_full_version < '3.11'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/50/79/66800aadf48771f6b62f7eb014e352e5d06856655206165d775e675a02c9/exceptiongroup-1.3.1.tar.gz", hash = "sha256:8b412432c6055b0b7d14c310000ae93352ed6754f70fa8f7c34141f91c4e3219", size = 30371, upload-time = "2025-11-21T23:01:54.787Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/8a/0e/97c33bf5009bdbac74fd2beace167cab3f978feb69cc36f1ef79360d6c4e/exceptiongroup-1.3.1-py3-none-any.whl", hash = "sha256:a7a39a3bd276781e98394987d3a5701d0c4edffb633bb7a5144577f82c773598", size = 16740, upload-time = "2025-11-21T23:01:53.443Z" }, -] - [[package]] name = "executing" version = "2.2.1" @@ -394,29 +302,6 @@ version = "3.3.2" source = { registry = "https://pypi.org/simple" } sdist = { url = "https://files.pythonhosted.org/packages/a3/51/1664f6b78fc6ebbd98019a1fd730e83fa78f2db7058f72b1463d3612b8db/greenlet-3.3.2.tar.gz", hash = "sha256:2eaf067fc6d886931c7962e8c6bede15d2f01965560f3359b27c80bde2d151f2", size = 188267, upload-time = "2026-02-20T20:54:15.531Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/38/3f/9859f655d11901e7b2996c6e3d33e0caa9a1d4572c3bc61ed0faa64b2f4c/greenlet-3.3.2-cp310-cp310-macosx_11_0_universal2.whl", hash = "sha256:9bc885b89709d901859cf95179ec9f6bb67a3d2bb1f0e88456461bd4b7f8fd0d", size = 277747, upload-time = "2026-02-20T20:16:21.325Z" }, - { url = "https://files.pythonhosted.org/packages/fb/07/cb284a8b5c6498dbd7cba35d31380bb123d7dceaa7907f606c8ff5993cbf/greenlet-3.3.2-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b568183cf65b94919be4438dc28416b234b678c608cafac8874dfeeb2a9bbe13", size = 579202, upload-time = "2026-02-20T20:47:28.955Z" }, - { url = "https://files.pythonhosted.org/packages/ed/45/67922992b3a152f726163b19f890a85129a992f39607a2a53155de3448b8/greenlet-3.3.2-cp310-cp310-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:527fec58dc9f90efd594b9b700662ed3fb2493c2122067ac9c740d98080a620e", size = 590620, upload-time = "2026-02-20T20:55:55.581Z" }, - { url = "https://files.pythonhosted.org/packages/ad/55/9f1ebb5a825215fadcc0f7d5073f6e79e3007e3282b14b22d6aba7ca6cb8/greenlet-3.3.2-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ad0c8917dd42a819fe77e6bdfcb84e3379c0de956469301d9fd36427a1ca501f", size = 591729, upload-time = "2026-02-20T20:20:58.395Z" }, - { url = "https://files.pythonhosted.org/packages/24/b4/21f5455773d37f94b866eb3cf5caed88d6cea6dd2c6e1f9c34f463cba3ec/greenlet-3.3.2-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:97245cc10e5515dbc8c3104b2928f7f02b6813002770cfaffaf9a6e0fc2b94ef", size = 1551946, upload-time = "2026-02-20T20:49:31.102Z" }, - { url = "https://files.pythonhosted.org/packages/00/68/91f061a926abead128fe1a87f0b453ccf07368666bd59ffa46016627a930/greenlet-3.3.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:8c1fdd7d1b309ff0da81d60a9688a8bd044ac4e18b250320a96fc68d31c209ca", size = 1618494, upload-time = "2026-02-20T20:21:06.541Z" }, - { url = "https://files.pythonhosted.org/packages/ac/78/f93e840cbaef8becaf6adafbaf1319682a6c2d8c1c20224267a5c6c8c891/greenlet-3.3.2-cp310-cp310-win_amd64.whl", hash = "sha256:5d0e35379f93a6d0222de929a25ab47b5eb35b5ef4721c2b9cbcc4036129ff1f", size = 230092, upload-time = "2026-02-20T20:17:09.379Z" }, - { url = "https://files.pythonhosted.org/packages/f3/47/16400cb42d18d7a6bb46f0626852c1718612e35dcb0dffa16bbaffdf5dd2/greenlet-3.3.2-cp311-cp311-macosx_11_0_universal2.whl", hash = "sha256:c56692189a7d1c7606cb794be0a8381470d95c57ce5be03fb3d0ef57c7853b86", size = 278890, upload-time = "2026-02-20T20:19:39.263Z" }, - { url = "https://files.pythonhosted.org/packages/a3/90/42762b77a5b6aa96cd8c0e80612663d39211e8ae8a6cd47c7f1249a66262/greenlet-3.3.2-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1ebd458fa8285960f382841da585e02201b53a5ec2bac6b156fc623b5ce4499f", size = 581120, upload-time = "2026-02-20T20:47:30.161Z" }, - { url = "https://files.pythonhosted.org/packages/bf/6f/f3d64f4fa0a9c7b5c5b3c810ff1df614540d5aa7d519261b53fba55d4df9/greenlet-3.3.2-cp311-cp311-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a443358b33c4ec7b05b79a7c8b466f5d275025e750298be7340f8fc63dff2a55", size = 594363, upload-time = "2026-02-20T20:55:56.965Z" }, - { url = "https://files.pythonhosted.org/packages/72/83/3e06a52aca8128bdd4dcd67e932b809e76a96ab8c232a8b025b2850264c5/greenlet-3.3.2-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e2cd90d413acbf5e77ae41e5d3c9b3ac1d011a756d7284d7f3f2b806bbd6358", size = 594156, upload-time = "2026-02-20T20:20:59.955Z" }, - { url = "https://files.pythonhosted.org/packages/70/79/0de5e62b873e08fe3cef7dbe84e5c4bc0e8ed0c7ff131bccb8405cd107c8/greenlet-3.3.2-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:442b6057453c8cb29b4fb36a2ac689382fc71112273726e2423f7f17dc73bf99", size = 1554649, upload-time = "2026-02-20T20:49:32.293Z" }, - { url = "https://files.pythonhosted.org/packages/5a/00/32d30dee8389dc36d42170a9c66217757289e2afb0de59a3565260f38373/greenlet-3.3.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:45abe8eb6339518180d5a7fa47fa01945414d7cca5ecb745346fc6a87d2750be", size = 1619472, upload-time = "2026-02-20T20:21:07.966Z" }, - { url = "https://files.pythonhosted.org/packages/f1/3a/efb2cf697fbccdf75b24e2c18025e7dfa54c4f31fab75c51d0fe79942cef/greenlet-3.3.2-cp311-cp311-win_amd64.whl", hash = "sha256:1e692b2dae4cc7077cbb11b47d258533b48c8fde69a33d0d8a82e2fe8d8531d5", size = 230389, upload-time = "2026-02-20T20:17:18.772Z" }, - { url = "https://files.pythonhosted.org/packages/e1/a1/65bbc059a43a7e2143ec4fc1f9e3f673e04f9c7b371a494a101422ac4fd5/greenlet-3.3.2-cp311-cp311-win_arm64.whl", hash = "sha256:02b0a8682aecd4d3c6c18edf52bc8e51eacdd75c8eac52a790a210b06aa295fd", size = 229645, upload-time = "2026-02-20T20:18:18.695Z" }, - { url = "https://files.pythonhosted.org/packages/ea/ab/1608e5a7578e62113506740b88066bf09888322a311cff602105e619bd87/greenlet-3.3.2-cp312-cp312-macosx_11_0_universal2.whl", hash = "sha256:ac8d61d4343b799d1e526db579833d72f23759c71e07181c2d2944e429eb09cd", size = 280358, upload-time = "2026-02-20T20:17:43.971Z" }, - { url = "https://files.pythonhosted.org/packages/a5/23/0eae412a4ade4e6623ff7626e38998cb9b11e9ff1ebacaa021e4e108ec15/greenlet-3.3.2-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3ceec72030dae6ac0c8ed7591b96b70410a8be370b6a477b1dbc072856ad02bd", size = 601217, upload-time = "2026-02-20T20:47:31.462Z" }, - { url = "https://files.pythonhosted.org/packages/f8/16/5b1678a9c07098ecb9ab2dd159fafaf12e963293e61ee8d10ecb55273e5e/greenlet-3.3.2-cp312-cp312-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a2a5be83a45ce6188c045bcc44b0ee037d6a518978de9a5d97438548b953a1ac", size = 611792, upload-time = "2026-02-20T20:55:58.423Z" }, - { url = "https://files.pythonhosted.org/packages/50/1f/5155f55bd71cabd03765a4aac9ac446be129895271f73872c36ebd4b04b6/greenlet-3.3.2-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:43e99d1749147ac21dde49b99c9abffcbc1e2d55c67501465ef0930d6e78e070", size = 613875, upload-time = "2026-02-20T20:21:01.102Z" }, - { url = "https://files.pythonhosted.org/packages/fc/dd/845f249c3fcd69e32df80cdab059b4be8b766ef5830a3d0aa9d6cad55beb/greenlet-3.3.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:4c956a19350e2c37f2c48b336a3afb4bff120b36076d9d7fb68cb44e05d95b79", size = 1571467, upload-time = "2026-02-20T20:49:33.495Z" }, - { url = "https://files.pythonhosted.org/packages/2a/50/2649fe21fcc2b56659a452868e695634722a6655ba245d9f77f5656010bf/greenlet-3.3.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:6c6f8ba97d17a1e7d664151284cb3315fc5f8353e75221ed4324f84eb162b395", size = 1640001, upload-time = "2026-02-20T20:21:09.154Z" }, - { url = "https://files.pythonhosted.org/packages/9b/40/cc802e067d02af8b60b6771cea7d57e21ef5e6659912814babb42b864713/greenlet-3.3.2-cp312-cp312-win_amd64.whl", hash = "sha256:34308836d8370bddadb41f5a7ce96879b72e2fdfb4e87729330c6ab52376409f", size = 231081, upload-time = "2026-02-20T20:17:28.121Z" }, - { url = "https://files.pythonhosted.org/packages/58/2e/fe7f36ff1982d6b10a60d5e0740c759259a7d6d2e1dc41da6d96de32fff6/greenlet-3.3.2-cp312-cp312-win_arm64.whl", hash = "sha256:d3a62fa76a32b462a97198e4c9e99afb9ab375115e74e9a83ce180e7a496f643", size = 230331, upload-time = "2026-02-20T20:17:23.34Z" }, { url = "https://files.pythonhosted.org/packages/ac/48/f8b875fa7dea7dd9b33245e37f065af59df6a25af2f9561efa8d822fde51/greenlet-3.3.2-cp313-cp313-macosx_11_0_universal2.whl", hash = "sha256:aa6ac98bdfd716a749b84d4034486863fd81c3abde9aa3cf8eff9127981a4ae4", size = 279120, upload-time = "2026-02-20T20:19:01.9Z" }, { url = "https://files.pythonhosted.org/packages/49/8d/9771d03e7a8b1ee456511961e1b97a6d77ae1dea4a34a5b98eee706689d3/greenlet-3.3.2-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ab0c7e7901a00bc0a7284907273dc165b32e0d109a6713babd04471327ff7986", size = 603238, upload-time = "2026-02-20T20:47:32.873Z" }, { url = "https://files.pythonhosted.org/packages/59/0e/4223c2bbb63cd5c97f28ffb2a8aee71bdfb30b323c35d409450f51b91e3e/greenlet-3.3.2-cp313-cp313-manylinux_2_24_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:d248d8c23c67d2291ffd47af766e2a3aa9fa1c6703155c099feb11f526c63a92", size = 614219, upload-time = "2026-02-20T20:55:59.817Z" }, @@ -456,34 +341,10 @@ name = "h5py" version = "3.16.0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "numpy" }, ] sdist = { url = "https://files.pythonhosted.org/packages/db/33/acd0ce6863b6c0d7735007df01815403f5589a21ff8c2e1ee2587a38f548/h5py-3.16.0.tar.gz", hash = "sha256:a0dbaad796840ccaa67a4c144a0d0c8080073c34c76d5a6941d6818678ef2738", size = 446526, upload-time = "2026-03-06T13:49:08.07Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/3a/6b/231413e58a787a89b316bb0d1777da3c62257e4797e09afd8d17ad3549dc/h5py-3.16.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:e06f864bedb2c8e7c1358e6c73af48519e317457c444d6f3d332bb4e8fa6d7d9", size = 3724137, upload-time = "2026-03-06T13:47:35.242Z" }, - { url = "https://files.pythonhosted.org/packages/74/f9/557ce3aad0fe8471fb5279bab0fc56ea473858a022c4ce8a0b8f303d64e9/h5py-3.16.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:ec86d4fffd87a0f4cb3d5796ceb5a50123a2a6d99b43e616e5504e66a953eca3", size = 3090112, upload-time = "2026-03-06T13:47:37.634Z" }, - { url = "https://files.pythonhosted.org/packages/7a/f5/e15b3d0dc8a18e56409a839e6468d6fb589bc5207c917399c2e0706eeb44/h5py-3.16.0-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:86385ea895508220b8a7e45efa428aeafaa586bd737c7af9ee04661d8d84a10d", size = 4844847, upload-time = "2026-03-06T13:47:39.811Z" }, - { url = "https://files.pythonhosted.org/packages/cb/92/a8851d936547efe30cc0ce5245feac01f3ec6171f7899bc3f775c72030b3/h5py-3.16.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:8975273c2c5921c25700193b408e28d6bdd0111c37468b2d4e25dcec4cd1d84d", size = 5065352, upload-time = "2026-03-06T13:47:41.489Z" }, - { url = "https://files.pythonhosted.org/packages/2b/ae/f2adc5d0ca9626db3277a3d87516e124cbc5d0eea0bd79bc085702d04f2c/h5py-3.16.0-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:1677ad48b703f44efc9ea0c3ab284527f81bc4f318386aaaebc5fede6bbae56f", size = 4839173, upload-time = "2026-03-06T13:47:43.586Z" }, - { url = "https://files.pythonhosted.org/packages/64/0b/e0c8c69da1d8838da023a50cd3080eae5d475691f7636b35eff20bb6ef20/h5py-3.16.0-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:7c4dd4cf5f0a4e36083f73172f6cfc25a5710789269547f132a20975bfe2434c", size = 5076216, upload-time = "2026-03-06T13:47:45.315Z" }, - { url = "https://files.pythonhosted.org/packages/66/35/d88fd6718832133c885004c61ceeeb24dbd6397ef877dbed6b3a64d6a286/h5py-3.16.0-cp310-cp310-win_amd64.whl", hash = "sha256:bdef06507725b455fccba9c16529121a5e1fbf56aa375f7d9713d9e8ff42454d", size = 3183639, upload-time = "2026-03-06T13:47:47.041Z" }, - { url = "https://files.pythonhosted.org/packages/ba/95/a825894f3e45cbac7554c4e97314ce886b233a20033787eda755ca8fecc7/h5py-3.16.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:719439d14b83f74eeb080e9650a6c7aa6d0d9ea0ca7f804347b05fac6fbf18af", size = 3721663, upload-time = "2026-03-06T13:47:49.599Z" }, - { url = "https://files.pythonhosted.org/packages/bf/3b/38ff88b347c3e346cda1d3fc1b65a7aa75d40632228d8b8a5d7b58508c24/h5py-3.16.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:c3f0a0e136f2e95dd0b67146abb6668af4f1a69c81ef8651a2d316e8e01de447", size = 3087630, upload-time = "2026-03-06T13:47:51.249Z" }, - { url = "https://files.pythonhosted.org/packages/98/a8/2594cef906aee761601eff842c7dc598bea2b394a3e1c00966832b8eeb7c/h5py-3.16.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:a6fbc5367d4046801f9b7db9191b31895f22f1c6df1f9987d667854cac493538", size = 4823472, upload-time = "2026-03-06T13:47:53.085Z" }, - { url = "https://files.pythonhosted.org/packages/52/a0/c1f604538ff6db22a0690be2dc44ab59178e115f63c917794e529356ab23/h5py-3.16.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:fb1720028d99040792bb2fb31facb8da44a6f29df7697e0b84f0d79aff2e9bd3", size = 5027150, upload-time = "2026-03-06T13:47:55.043Z" }, - { url = "https://files.pythonhosted.org/packages/2e/fd/301739083c2fc4fd89950f9bcfce75d6e14b40b0ca3d40e48a8993d1722c/h5py-3.16.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:314b6054fe0b1051c2b0cb2df5cbdab15622fb05e80f202e3b6a5eee0d6fe365", size = 4814544, upload-time = "2026-03-06T13:47:56.893Z" }, - { url = "https://files.pythonhosted.org/packages/4c/42/2193ed41ccee78baba8fcc0cff2c925b8b9ee3793305b23e1f22c20bf4c7/h5py-3.16.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:ffbab2fedd6581f6aa31cf1639ca2cb86e02779de525667892ebf4cc9fd26434", size = 5034013, upload-time = "2026-03-06T13:47:59.01Z" }, - { url = "https://files.pythonhosted.org/packages/f7/20/e6c0ff62ca2ad1a396a34f4380bafccaaf8791ff8fccf3d995a1fc12d417/h5py-3.16.0-cp311-cp311-win_amd64.whl", hash = "sha256:17d1f1630f92ad74494a9a7392ab25982ce2b469fc62da6074c0ce48366a2999", size = 3191673, upload-time = "2026-03-06T13:48:00.626Z" }, - { url = "https://files.pythonhosted.org/packages/f2/48/239cbe352ac4f2b8243a8e620fa1a2034635f633731493a7ff1ed71e8658/h5py-3.16.0-cp311-cp311-win_arm64.whl", hash = "sha256:85b9c49dd58dc44cf70af944784e2c2038b6f799665d0dcbbc812a26e0faa859", size = 2673834, upload-time = "2026-03-06T13:48:02.579Z" }, - { url = "https://files.pythonhosted.org/packages/c8/c0/5d4119dba94093bbafede500d3defd2f5eab7897732998c04b54021e530b/h5py-3.16.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:c5313566f4643121a78503a473f0fb1e6dcc541d5115c44f05e037609c565c4d", size = 3685604, upload-time = "2026-03-06T13:48:04.198Z" }, - { url = "https://files.pythonhosted.org/packages/b0/42/c84efcc1d4caebafb1ecd8be4643f39c85c47a80fe254d92b8b43b1eadaf/h5py-3.16.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:42b012933a83e1a558c673176676a10ce2fd3759976a0fedee1e672d1e04fc9d", size = 3061940, upload-time = "2026-03-06T13:48:05.783Z" }, - { url = "https://files.pythonhosted.org/packages/89/84/06281c82d4d1686fde1ac6b0f307c50918f1c0151062445ab3b6fa5a921d/h5py-3.16.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:ff24039e2573297787c3063df64b60aab0591980ac898329a08b0320e0cf2527", size = 5198852, upload-time = "2026-03-06T13:48:07.482Z" }, - { url = "https://files.pythonhosted.org/packages/9e/e9/1a19e42cd43cc1365e127db6aae85e1c671da1d9a5d746f4d34a50edb577/h5py-3.16.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:dfc21898ff025f1e8e67e194965a95a8d4754f452f83454538f98f8a3fcb207e", size = 5405250, upload-time = "2026-03-06T13:48:09.628Z" }, - { url = "https://files.pythonhosted.org/packages/b7/8e/9790c1655eabeb85b92b1ecab7d7e62a2069e53baefd58c98f0909c7a948/h5py-3.16.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:698dd69291272642ffda44a0ecd6cd3bda5faf9621452d255f57ce91487b9794", size = 5190108, upload-time = "2026-03-06T13:48:11.26Z" }, - { url = "https://files.pythonhosted.org/packages/51/d7/ab693274f1bd7e8c5f9fdd6c7003a88d59bedeaf8752716a55f532924fbb/h5py-3.16.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:2b2c02b0a160faed5fb33f1ba8a264a37ee240b22e049ecc827345d0d9043074", size = 5419216, upload-time = "2026-03-06T13:48:13.322Z" }, - { url = "https://files.pythonhosted.org/packages/03/c1/0976b235cf29ead553e22f2fb6385a8252b533715e00d0ae52ed7b900582/h5py-3.16.0-cp312-cp312-win_amd64.whl", hash = "sha256:96b422019a1c8975c2d5dadcf61d4ba6f01c31f92bbde6e4649607885fe502d6", size = 3182868, upload-time = "2026-03-06T13:48:15.759Z" }, - { url = "https://files.pythonhosted.org/packages/14/d9/866b7e570b39070f92d47b0ff1800f0f8239b6f9e45f02363d7112336c1f/h5py-3.16.0-cp312-cp312-win_arm64.whl", hash = "sha256:39c2838fb1e8d97bcf1755e60ad1f3dd76a7b2a475928dc321672752678b96db", size = 2653286, upload-time = "2026-03-06T13:48:17.279Z" }, { url = "https://files.pythonhosted.org/packages/0f/9e/6142ebfda0cb6e9349c091eae73c2e01a770b7659255248d637bec54a88b/h5py-3.16.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:370a845f432c2c9619db8eed334d1e610c6015796122b0e57aa46312c22617d9", size = 3671808, upload-time = "2026-03-06T13:48:19.737Z" }, { url = "https://files.pythonhosted.org/packages/b0/65/5e088a45d0f43cd814bc5bec521c051d42005a472e804b1a36c48dada09b/h5py-3.16.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:42108e93326c50c2810025aade9eac9d6827524cdccc7d4b75a546e5ab308edb", size = 3045837, upload-time = "2026-03-06T13:48:21.854Z" }, { url = "https://files.pythonhosted.org/packages/da/1e/6172269e18cc5a484e2913ced33339aad588e02ba407fafd00d369e22ef3/h5py-3.16.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:099f2525c9dcf28de366970a5fb34879aab20491589fa89ce2863a84218bb524", size = 5193860, upload-time = "2026-03-06T13:48:24.071Z" }, @@ -613,16 +474,15 @@ name = "ipython" version = "8.38.0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "colorama", marker = "python_full_version >= '3.11' and sys_platform == 'win32'" }, - { name = "decorator", marker = "python_full_version >= '3.11'" }, - { name = "jedi", marker = "python_full_version >= '3.11'" }, - { name = "matplotlib-inline", marker = "python_full_version >= '3.11'" }, - { name = "pexpect", marker = "python_full_version >= '3.11' and sys_platform != 'emscripten' and sys_platform != 'win32'" }, - { name = "prompt-toolkit", marker = "python_full_version >= '3.11'" }, - { name = "pygments", marker = "python_full_version >= '3.11'" }, - { name = "stack-data", marker = "python_full_version >= '3.11'" }, - { name = "traitlets", marker = "python_full_version >= '3.11'" }, - { name = "typing-extensions", marker = "python_full_version == '3.11.*'" }, + { name = "colorama", marker = "sys_platform == 'win32'" }, + { name = "decorator" }, + { name = "jedi" }, + { name = "matplotlib-inline" }, + { name = "pexpect", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, + { name = "prompt-toolkit" }, + { name = "pygments" }, + { name = "stack-data" }, + { name = "traitlets" }, ] sdist = { url = "https://files.pythonhosted.org/packages/e5/61/1810830e8b93c72dcd3c0f150c80a00c3deb229562d9423807ec92c3a539/ipython-8.38.0.tar.gz", hash = "sha256:9cfea8c903ce0867cc2f23199ed8545eb741f3a69420bfcf3743ad1cec856d39", size = 5513996, upload-time = "2026-01-05T10:59:06.901Z" } wheels = [ @@ -634,7 +494,7 @@ name = "jedi" version = "0.19.2" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "parso", marker = "python_full_version >= '3.11'" }, + { name = "parso" }, ] sdist = { url = "https://files.pythonhosted.org/packages/72/3a/79a912fbd4d8dd6fbb02bf69afd3bb72cf0c729bb3063c6f4498603db17a/jedi-0.19.2.tar.gz", hash = "sha256:4770dc3de41bde3966b02eb84fbcf557fb33cce26ad23da12c742fb50ecb11f0", size = 1231287, upload-time = "2024-11-11T01:41:42.873Z" } wheels = [ @@ -671,12 +531,26 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/c1/73/04df8a6fa66d43a9fd45c30f283cc4afff17da671886e451d52af60bdc7e/jsonpickle-4.1.1-py3-none-any.whl", hash = "sha256:bb141da6057898aa2438ff268362b126826c812a1721e31cf08a6e142910dc91", size = 47125, upload-time = "2025-06-02T20:36:08.647Z" }, ] +[[package]] +name = "l0-python" +version = "0.6.1" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "numpy" }, + { name = "scipy" }, + { name = "torch" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/2a/fe/3929e39c6e30b7b22730a2021cc108f00d0da611b48854eb67b0d49be94e/l0_python-0.6.1.tar.gz", hash = "sha256:8fbea10059813ef408255c93dcd5a61dfdd893612efb7e62c934a93f5701d45a", size = 37782, upload-time = "2026-02-25T16:59:39.84Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/1b/ea/28fb7d49b4113953a5938c8bd39904d4aa709b619710aa27311ccf11b669/l0_python-0.6.1-py3-none-any.whl", hash = "sha256:5a8282760bf4b48b1e7ad2e435a6878f15dcc614e97f5ec1aa5690c66510733e", size = 23912, upload-time = "2026-02-25T16:59:37.953Z" }, +] + [[package]] name = "mako" version = "1.3.10" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "markupsafe", marker = "python_full_version >= '3.12'" }, + { name = "markupsafe" }, ] sdist = { url = "https://files.pythonhosted.org/packages/9e/38/bd5b78a920a64d708fe6bc8e0a2c075e1389d53bef8413725c63ba041535/mako-1.3.10.tar.gz", hash = "sha256:99579a6f39583fa7e5630a28c3c1f440e4e97a414b80372649c0ce338da2ea28", size = 392474, upload-time = "2025-04-10T12:44:31.16Z" } wheels = [ @@ -701,39 +575,6 @@ version = "3.0.3" source = { registry = "https://pypi.org/simple" } sdist = { url = "https://files.pythonhosted.org/packages/7e/99/7690b6d4034fffd95959cbe0c02de8deb3098cc577c67bb6a24fe5d7caa7/markupsafe-3.0.3.tar.gz", hash = "sha256:722695808f4b6457b320fdc131280796bdceb04ab50fe1795cd540799ebe1698", size = 80313, upload-time = "2025-09-27T18:37:40.426Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/e8/4b/3541d44f3937ba468b75da9eebcae497dcf67adb65caa16760b0a6807ebb/markupsafe-3.0.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:2f981d352f04553a7171b8e44369f2af4055f888dfb147d55e42d29e29e74559", size = 11631, upload-time = "2025-09-27T18:36:05.558Z" }, - { url = "https://files.pythonhosted.org/packages/98/1b/fbd8eed11021cabd9226c37342fa6ca4e8a98d8188a8d9b66740494960e4/markupsafe-3.0.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:e1c1493fb6e50ab01d20a22826e57520f1284df32f2d8601fdd90b6304601419", size = 12057, upload-time = "2025-09-27T18:36:07.165Z" }, - { url = "https://files.pythonhosted.org/packages/40/01/e560d658dc0bb8ab762670ece35281dec7b6c1b33f5fbc09ebb57a185519/markupsafe-3.0.3-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1ba88449deb3de88bd40044603fafffb7bc2b055d626a330323a9ed736661695", size = 22050, upload-time = "2025-09-27T18:36:08.005Z" }, - { url = "https://files.pythonhosted.org/packages/af/cd/ce6e848bbf2c32314c9b237839119c5a564a59725b53157c856e90937b7a/markupsafe-3.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:f42d0984e947b8adf7dd6dde396e720934d12c506ce84eea8476409563607591", size = 20681, upload-time = "2025-09-27T18:36:08.881Z" }, - { url = "https://files.pythonhosted.org/packages/c9/2a/b5c12c809f1c3045c4d580b035a743d12fcde53cf685dbc44660826308da/markupsafe-3.0.3-cp310-cp310-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:c0c0b3ade1c0b13b936d7970b1d37a57acde9199dc2aecc4c336773e1d86049c", size = 20705, upload-time = "2025-09-27T18:36:10.131Z" }, - { url = "https://files.pythonhosted.org/packages/cf/e3/9427a68c82728d0a88c50f890d0fc072a1484de2f3ac1ad0bfc1a7214fd5/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:0303439a41979d9e74d18ff5e2dd8c43ed6c6001fd40e5bf2e43f7bd9bbc523f", size = 21524, upload-time = "2025-09-27T18:36:11.324Z" }, - { url = "https://files.pythonhosted.org/packages/bc/36/23578f29e9e582a4d0278e009b38081dbe363c5e7165113fad546918a232/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_riscv64.whl", hash = "sha256:d2ee202e79d8ed691ceebae8e0486bd9a2cd4794cec4824e1c99b6f5009502f6", size = 20282, upload-time = "2025-09-27T18:36:12.573Z" }, - { url = "https://files.pythonhosted.org/packages/56/21/dca11354e756ebd03e036bd8ad58d6d7168c80ce1fe5e75218e4945cbab7/markupsafe-3.0.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:177b5253b2834fe3678cb4a5f0059808258584c559193998be2601324fdeafb1", size = 20745, upload-time = "2025-09-27T18:36:13.504Z" }, - { url = "https://files.pythonhosted.org/packages/87/99/faba9369a7ad6e4d10b6a5fbf71fa2a188fe4a593b15f0963b73859a1bbd/markupsafe-3.0.3-cp310-cp310-win32.whl", hash = "sha256:2a15a08b17dd94c53a1da0438822d70ebcd13f8c3a95abe3a9ef9f11a94830aa", size = 14571, upload-time = "2025-09-27T18:36:14.779Z" }, - { url = "https://files.pythonhosted.org/packages/d6/25/55dc3ab959917602c96985cb1253efaa4ff42f71194bddeb61eb7278b8be/markupsafe-3.0.3-cp310-cp310-win_amd64.whl", hash = "sha256:c4ffb7ebf07cfe8931028e3e4c85f0357459a3f9f9490886198848f4fa002ec8", size = 15056, upload-time = "2025-09-27T18:36:16.125Z" }, - { url = "https://files.pythonhosted.org/packages/d0/9e/0a02226640c255d1da0b8d12e24ac2aa6734da68bff14c05dd53b94a0fc3/markupsafe-3.0.3-cp310-cp310-win_arm64.whl", hash = "sha256:e2103a929dfa2fcaf9bb4e7c091983a49c9ac3b19c9061b6d5427dd7d14d81a1", size = 13932, upload-time = "2025-09-27T18:36:17.311Z" }, - { url = "https://files.pythonhosted.org/packages/08/db/fefacb2136439fc8dd20e797950e749aa1f4997ed584c62cfb8ef7c2be0e/markupsafe-3.0.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:1cc7ea17a6824959616c525620e387f6dd30fec8cb44f649e31712db02123dad", size = 11631, upload-time = "2025-09-27T18:36:18.185Z" }, - { url = "https://files.pythonhosted.org/packages/e1/2e/5898933336b61975ce9dc04decbc0a7f2fee78c30353c5efba7f2d6ff27a/markupsafe-3.0.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:4bd4cd07944443f5a265608cc6aab442e4f74dff8088b0dfc8238647b8f6ae9a", size = 12058, upload-time = "2025-09-27T18:36:19.444Z" }, - { url = "https://files.pythonhosted.org/packages/1d/09/adf2df3699d87d1d8184038df46a9c80d78c0148492323f4693df54e17bb/markupsafe-3.0.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6b5420a1d9450023228968e7e6a9ce57f65d148ab56d2313fcd589eee96a7a50", size = 24287, upload-time = "2025-09-27T18:36:20.768Z" }, - { url = "https://files.pythonhosted.org/packages/30/ac/0273f6fcb5f42e314c6d8cd99effae6a5354604d461b8d392b5ec9530a54/markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0bf2a864d67e76e5c9a34dc26ec616a66b9888e25e7b9460e1c76d3293bd9dbf", size = 22940, upload-time = "2025-09-27T18:36:22.249Z" }, - { url = "https://files.pythonhosted.org/packages/19/ae/31c1be199ef767124c042c6c3e904da327a2f7f0cd63a0337e1eca2967a8/markupsafe-3.0.3-cp311-cp311-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:bc51efed119bc9cfdf792cdeaa4d67e8f6fcccab66ed4bfdd6bde3e59bfcbb2f", size = 21887, upload-time = "2025-09-27T18:36:23.535Z" }, - { url = "https://files.pythonhosted.org/packages/b2/76/7edcab99d5349a4532a459e1fe64f0b0467a3365056ae550d3bcf3f79e1e/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:068f375c472b3e7acbe2d5318dea141359e6900156b5b2ba06a30b169086b91a", size = 23692, upload-time = "2025-09-27T18:36:24.823Z" }, - { url = "https://files.pythonhosted.org/packages/a4/28/6e74cdd26d7514849143d69f0bf2399f929c37dc2b31e6829fd2045b2765/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_riscv64.whl", hash = "sha256:7be7b61bb172e1ed687f1754f8e7484f1c8019780f6f6b0786e76bb01c2ae115", size = 21471, upload-time = "2025-09-27T18:36:25.95Z" }, - { url = "https://files.pythonhosted.org/packages/62/7e/a145f36a5c2945673e590850a6f8014318d5577ed7e5920a4b3448e0865d/markupsafe-3.0.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:f9e130248f4462aaa8e2552d547f36ddadbeaa573879158d721bbd33dfe4743a", size = 22923, upload-time = "2025-09-27T18:36:27.109Z" }, - { url = "https://files.pythonhosted.org/packages/0f/62/d9c46a7f5c9adbeeeda52f5b8d802e1094e9717705a645efc71b0913a0a8/markupsafe-3.0.3-cp311-cp311-win32.whl", hash = "sha256:0db14f5dafddbb6d9208827849fad01f1a2609380add406671a26386cdf15a19", size = 14572, upload-time = "2025-09-27T18:36:28.045Z" }, - { url = "https://files.pythonhosted.org/packages/83/8a/4414c03d3f891739326e1783338e48fb49781cc915b2e0ee052aa490d586/markupsafe-3.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:de8a88e63464af587c950061a5e6a67d3632e36df62b986892331d4620a35c01", size = 15077, upload-time = "2025-09-27T18:36:29.025Z" }, - { url = "https://files.pythonhosted.org/packages/35/73/893072b42e6862f319b5207adc9ae06070f095b358655f077f69a35601f0/markupsafe-3.0.3-cp311-cp311-win_arm64.whl", hash = "sha256:3b562dd9e9ea93f13d53989d23a7e775fdfd1066c33494ff43f5418bc8c58a5c", size = 13876, upload-time = "2025-09-27T18:36:29.954Z" }, - { url = "https://files.pythonhosted.org/packages/5a/72/147da192e38635ada20e0a2e1a51cf8823d2119ce8883f7053879c2199b5/markupsafe-3.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:d53197da72cc091b024dd97249dfc7794d6a56530370992a5e1a08983ad9230e", size = 11615, upload-time = "2025-09-27T18:36:30.854Z" }, - { url = "https://files.pythonhosted.org/packages/9a/81/7e4e08678a1f98521201c3079f77db69fb552acd56067661f8c2f534a718/markupsafe-3.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:1872df69a4de6aead3491198eaf13810b565bdbeec3ae2dc8780f14458ec73ce", size = 12020, upload-time = "2025-09-27T18:36:31.971Z" }, - { url = "https://files.pythonhosted.org/packages/1e/2c/799f4742efc39633a1b54a92eec4082e4f815314869865d876824c257c1e/markupsafe-3.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:3a7e8ae81ae39e62a41ec302f972ba6ae23a5c5396c8e60113e9066ef893da0d", size = 24332, upload-time = "2025-09-27T18:36:32.813Z" }, - { url = "https://files.pythonhosted.org/packages/3c/2e/8d0c2ab90a8c1d9a24f0399058ab8519a3279d1bd4289511d74e909f060e/markupsafe-3.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d6dd0be5b5b189d31db7cda48b91d7e0a9795f31430b7f271219ab30f1d3ac9d", size = 22947, upload-time = "2025-09-27T18:36:33.86Z" }, - { url = "https://files.pythonhosted.org/packages/2c/54/887f3092a85238093a0b2154bd629c89444f395618842e8b0c41783898ea/markupsafe-3.0.3-cp312-cp312-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:94c6f0bb423f739146aec64595853541634bde58b2135f27f61c1ffd1cd4d16a", size = 21962, upload-time = "2025-09-27T18:36:35.099Z" }, - { url = "https://files.pythonhosted.org/packages/c9/2f/336b8c7b6f4a4d95e91119dc8521402461b74a485558d8f238a68312f11c/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:be8813b57049a7dc738189df53d69395eba14fb99345e0a5994914a3864c8a4b", size = 23760, upload-time = "2025-09-27T18:36:36.001Z" }, - { url = "https://files.pythonhosted.org/packages/32/43/67935f2b7e4982ffb50a4d169b724d74b62a3964bc1a9a527f5ac4f1ee2b/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_riscv64.whl", hash = "sha256:83891d0e9fb81a825d9a6d61e3f07550ca70a076484292a70fde82c4b807286f", size = 21529, upload-time = "2025-09-27T18:36:36.906Z" }, - { url = "https://files.pythonhosted.org/packages/89/e0/4486f11e51bbba8b0c041098859e869e304d1c261e59244baa3d295d47b7/markupsafe-3.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:77f0643abe7495da77fb436f50f8dab76dbc6e5fd25d39589a0f1fe6548bfa2b", size = 23015, upload-time = "2025-09-27T18:36:37.868Z" }, - { url = "https://files.pythonhosted.org/packages/2f/e1/78ee7a023dac597a5825441ebd17170785a9dab23de95d2c7508ade94e0e/markupsafe-3.0.3-cp312-cp312-win32.whl", hash = "sha256:d88b440e37a16e651bda4c7c2b930eb586fd15ca7406cb39e211fcff3bf3017d", size = 14540, upload-time = "2025-09-27T18:36:38.761Z" }, - { url = "https://files.pythonhosted.org/packages/aa/5b/bec5aa9bbbb2c946ca2733ef9c4ca91c91b6a24580193e891b5f7dbe8e1e/markupsafe-3.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:26a5784ded40c9e318cfc2bdb30fe164bdb8665ded9cd64d500a34fb42067b1c", size = 15105, upload-time = "2025-09-27T18:36:39.701Z" }, - { url = "https://files.pythonhosted.org/packages/e5/f1/216fc1bbfd74011693a4fd837e7026152e89c4bcf3e77b6692fba9923123/markupsafe-3.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:35add3b638a5d900e807944a078b51922212fb3dedb01633a8defc4b01a3c85f", size = 13906, upload-time = "2025-09-27T18:36:40.689Z" }, { url = "https://files.pythonhosted.org/packages/38/2f/907b9c7bbba283e68f20259574b13d005c121a0fa4c175f9bed27c4597ff/markupsafe-3.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:e1cf1972137e83c5d4c136c43ced9ac51d0e124706ee1c8aa8532c1287fa8795", size = 11622, upload-time = "2025-09-27T18:36:41.777Z" }, { url = "https://files.pythonhosted.org/packages/9c/d9/5f7756922cdd676869eca1c4e3c0cd0df60ed30199ffd775e319089cb3ed/markupsafe-3.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:116bb52f642a37c115f517494ea5feb03889e04df47eeff5b130b1808ce7c219", size = 12029, upload-time = "2025-09-27T18:36:43.257Z" }, { url = "https://files.pythonhosted.org/packages/00/07/575a68c754943058c78f30db02ee03a64b3c638586fba6a6dd56830b30a3/markupsafe-3.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:133a43e73a802c5562be9bbcd03d090aa5a1fe899db609c29e8c8d815c5f6de6", size = 24374, upload-time = "2025-09-27T18:36:44.508Z" }, @@ -785,7 +626,7 @@ name = "matplotlib-inline" version = "0.2.1" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "traitlets", marker = "python_full_version >= '3.11'" }, + { name = "traitlets" }, ] sdist = { url = "https://files.pythonhosted.org/packages/c7/74/97e72a36efd4ae2bccb3463284300f8953f199b5ffbc04cbbb0ec78f74b1/matplotlib_inline-0.2.1.tar.gz", hash = "sha256:e1ee949c340d771fc39e241ea75683deb94762c8fa5f2927ec57c83c4dffa9fe", size = 8110, upload-time = "2025-10-23T09:00:22.126Z" } wheels = [ @@ -801,13 +642,30 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8", size = 9979, upload-time = "2022-08-14T12:40:09.779Z" }, ] +[[package]] +name = "microcalibrate" +version = "0.22.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "l0-python" }, + { name = "numpy" }, + { name = "optuna" }, + { name = "pandas" }, + { name = "torch" }, + { name = "tqdm" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/b7/11/dc170c33ab42a1c6437c9094696c149ec780161a2cdb2630b6a70c8234dc/microcalibrate-0.22.0.tar.gz", hash = "sha256:360eb241156f3731902a9aa73aea1d39437d97a6a40db1ddd0ab85ef636596ea", size = 216545, upload-time = "2026-04-18T15:21:59.591Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/3b/7f/36882ae748084bb7e570417cb81f2791a2d3f29fddeeaa7616c2a100c8ad/microcalibrate-0.22.0-py3-none-any.whl", hash = "sha256:c713220bfe24661fd3fba9d94ccf4352c1b961f7f7a1871d437ac15527dcf431", size = 31563, upload-time = "2026-04-18T15:21:58.69Z" }, +] + [[package]] name = "microdf-python" version = "1.2.3" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "numpy" }, + { name = "pandas" }, ] sdist = { url = "https://files.pythonhosted.org/packages/dd/70/29702ec0d482efb08049a7bec4ebfc8dc4754bf088fe7491a0260aa050ad/microdf_python-1.2.3.tar.gz", hash = "sha256:86b72532ade5fa78d12c6e05dee029206ba7f19f17a9744db6a92d3c9567e756", size = 20089, upload-time = "2026-03-06T12:50:48.02Z" } wheels = [ @@ -819,19 +677,19 @@ name = "microimpute" version = "1.15.1" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "joblib", marker = "python_full_version >= '3.12'" }, - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, - { name = "optuna", marker = "python_full_version >= '3.12'" }, - { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, - { name = "plotly", marker = "python_full_version >= '3.12'" }, - { name = "psutil", marker = "python_full_version >= '3.12'" }, - { name = "pydantic", marker = "python_full_version >= '3.12'" }, - { name = "quantile-forest", marker = "python_full_version >= '3.12'" }, - { name = "requests", marker = "python_full_version >= '3.12'" }, - { name = "scikit-learn", version = "1.8.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, - { name = "scipy", version = "1.17.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, - { name = "statsmodels", marker = "python_full_version >= '3.12'" }, - { name = "tqdm", marker = "python_full_version >= '3.12'" }, + { name = "joblib" }, + { name = "numpy" }, + { name = "optuna" }, + { name = "pandas" }, + { name = "plotly" }, + { name = "psutil" }, + { name = "pydantic" }, + { name = "quantile-forest" }, + { name = "requests" }, + { name = "scikit-learn" }, + { name = "scipy" }, + { name = "statsmodels" }, + { name = "tqdm" }, ] sdist = { url = "https://files.pythonhosted.org/packages/97/17/d621d4ed40e0afac6f1a2c4dea423783576613820d1460ae30d65c48309e/microimpute-1.15.1.tar.gz", hash = "sha256:af409525d475efeb8c8526e9630834c4f16563e15cd42665117d2a1397fcf404", size = 128669, upload-time = "2026-03-09T15:59:33.885Z" } wheels = [ @@ -843,38 +701,39 @@ name = "microplex" version = "0.1.0" source = { editable = "../microplex" } dependencies = [ - { name = "h5py" }, { name = "httpx" }, { name = "huggingface-hub" }, - { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "pandas", version = "2.3.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "numpy" }, + { name = "pandas" }, { name = "polars" }, { name = "prdc" }, { name = "pyarrow" }, { name = "pydantic" }, { name = "pyyaml" }, { name = "quantile-forest" }, - { name = "scikit-learn", version = "1.7.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "scikit-learn", version = "1.8.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "scipy", version = "1.15.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "scipy", version = "1.17.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "scikit-learn" }, + { name = "scipy" }, { name = "torch" }, ] +[package.optional-dependencies] +calibrate = [ + { name = "microcalibrate" }, +] + [package.metadata] requires-dist = [ { name = "cvxpy", marker = "extra == 'cvxpy'", specifier = ">=1.3" }, - { name = "h5py", specifier = ">=3.8" }, { name = "httpx", specifier = ">=0.25" }, { name = "huggingface-hub", specifier = ">=0.20" }, { name = "jupyter-book", marker = "extra == 'docs'", specifier = ">=0.15" }, + { name = "l0-python", marker = "extra == 'l0'", specifier = ">=0.4" }, { name = "matplotlib", marker = "extra == 'benchmark'", specifier = ">=3.7" }, - { name = "microplex", extras = ["dev", "benchmark", "docs"], marker = "extra == 'all'" }, + { name = "microcalibrate", marker = "python_full_version >= '3.13' and extra == 'calibrate'", specifier = ">=0.22" }, + { name = "microplex", extras = ["dev", "benchmark", "docs", "calibrate"], marker = "extra == 'all'" }, { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.0" }, { name = "myst-nb", marker = "extra == 'docs'", specifier = ">=0.17" }, - { name = "numpy", specifier = ">=1.24,!=2.4.0" }, + { name = "numpy", specifier = ">=1.24" }, { name = "pandas", specifier = ">=2.0" }, { name = "polars", specifier = ">=0.20" }, { name = "prdc", specifier = ">=0.1" }, @@ -895,7 +754,7 @@ requires-dist = [ { name = "sphinx-autodoc-typehints", marker = "extra == 'docs'", specifier = ">=1.23" }, { name = "torch", specifier = ">=2.0" }, ] -provides-extras = ["dev", "cvxpy", "statmatch", "l0", "benchmark", "docs", "all"] +provides-extras = ["dev", "cvxpy", "statmatch", "l0", "calibrate", "benchmark", "docs", "all"] [[package]] name = "microplex-us" @@ -903,7 +762,7 @@ version = "0.2.0" source = { editable = "." } dependencies = [ { name = "duckdb" }, - { name = "microplex" }, + { name = "microplex", extra = ["calibrate"] }, ] [package.optional-dependencies] @@ -912,15 +771,15 @@ dev = [ { name = "ruff" }, ] policyengine = [ - { name = "microimpute", marker = "python_full_version >= '3.12' and python_full_version < '3.15'" }, - { name = "policyengine-us", marker = "python_full_version >= '3.11' and python_full_version < '3.15'" }, + { name = "microimpute", marker = "python_full_version < '3.15'" }, + { name = "policyengine-us", marker = "python_full_version < '3.15'" }, ] [package.metadata] requires-dist = [ { name = "duckdb", specifier = ">=1.2" }, { name = "microimpute", marker = "python_full_version >= '3.12' and python_full_version < '3.15' and extra == 'policyengine'", specifier = "==1.15.1" }, - { name = "microplex", editable = "../microplex" }, + { name = "microplex", extras = ["calibrate"], editable = "../microplex" }, { name = "policyengine-us", marker = "python_full_version >= '3.11' and python_full_version < '3.15' and extra == 'policyengine'", specifier = "==1.587.0" }, { name = "pytest", marker = "extra == 'dev'", specifier = ">=7.0" }, { name = "ruff", marker = "extra == 'dev'", specifier = ">=0.1" }, @@ -936,33 +795,10 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl", hash = "sha256:a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c", size = 536198, upload-time = "2023-03-07T16:47:09.197Z" }, ] -[[package]] -name = "networkx" -version = "3.4.2" -source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version < '3.11'", -] -sdist = { url = "https://files.pythonhosted.org/packages/fd/1d/06475e1cd5264c0b870ea2cc6fdb3e37177c1e565c43f56ff17a10e3937f/networkx-3.4.2.tar.gz", hash = "sha256:307c3669428c5362aab27c8a1260aa8f47c4e91d3891f48be0141738d8d053e1", size = 2151368, upload-time = "2024-10-21T12:39:38.695Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/b9/54/dd730b32ea14ea797530a4479b2ed46a6fb250f682a9cfb997e968bf0261/networkx-3.4.2-py3-none-any.whl", hash = "sha256:df5d4365b724cf81b8c6a7312509d0c22386097011ad1abe274afd5e9d3bbc5f", size = 1723263, upload-time = "2024-10-21T12:39:36.247Z" }, -] - [[package]] name = "networkx" version = "3.6.1" source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version >= '3.14' and sys_platform == 'win32'", - "python_full_version >= '3.14' and sys_platform == 'emscripten'", - "python_full_version >= '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'win32'", - "python_full_version == '3.11.*' and sys_platform == 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'emscripten'", - "python_full_version == '3.11.*' and sys_platform == 'emscripten'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version == '3.11.*' and sys_platform != 'emscripten' and sys_platform != 'win32'", -] sdist = { url = "https://files.pythonhosted.org/packages/6a/51/63fe664f3908c97be9d2e4f1158eb633317598cfa6e1fc14af5383f17512/networkx-3.6.1.tar.gz", hash = "sha256:26b7c357accc0c8cde558ad486283728b65b6a95d85ee1cd66bafab4c8168509", size = 2517025, upload-time = "2025-12-08T17:02:39.908Z" } wheels = [ { url = "https://files.pythonhosted.org/packages/9e/c9/b2622292ea83fbb4ec318f5b9ab867d0a28ab43c5717bb85b0a5f6b3b0a4/networkx-3.6.1-py3-none-any.whl", hash = "sha256:d47fbf302e7d9cbbb9e2555a0d267983d2aa476bac30e90dfbe5669bd57f3762", size = 2068504, upload-time = "2025-12-08T17:02:38.159Z" }, @@ -973,34 +809,10 @@ name = "numexpr" version = "2.14.1" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "numpy" }, ] sdist = { url = "https://files.pythonhosted.org/packages/cb/2f/fdba158c9dbe5caca9c3eca3eaffffb251f2fb8674bf8e2d0aed5f38d319/numexpr-2.14.1.tar.gz", hash = "sha256:4be00b1086c7b7a5c32e31558122b7b80243fe098579b170967da83f3152b48b", size = 119400, upload-time = "2025-10-13T16:17:27.351Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/db/91/ccd504cbe5b88d06987c77f42ba37a13ef05065fdab4afe6dcfeb2961faf/numexpr-2.14.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:d0fab3fd06a04f6b86102552b26aa5d85e20ac7d8296c15764c726eeabae6cc8", size = 163200, upload-time = "2025-10-13T16:16:25.47Z" }, - { url = "https://files.pythonhosted.org/packages/f3/89/6b07977baf2af75fb6692f9e7a1fb612a15f600fc921f3f565366de01f4a/numexpr-2.14.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:64ae5dfd62d74a3ef82fe0b37f80527247f3626171ad82025900f46ffca4b39a", size = 152085, upload-time = "2025-10-13T16:16:29.508Z" }, - { url = "https://files.pythonhosted.org/packages/28/c2/c5775541256c4bf16b4d88fa1cffa74a0126703e513093c8774d911b0bb7/numexpr-2.14.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:955c92b064f9074d2970cf3138f5e3b965be673b82024962ed526f39bc25a920", size = 449435, upload-time = "2025-10-13T16:13:16.257Z" }, - { url = "https://files.pythonhosted.org/packages/34/d4/d1a410901c620f7a6a3c5c2b1fc9dab22170be05a89d2c02ae699e27bd3f/numexpr-2.14.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:75440c54fc01e130396650fdf307aa9d41a67dc06ddbfb288971b591c13a395b", size = 440197, upload-time = "2025-10-13T16:14:44.109Z" }, - { url = "https://files.pythonhosted.org/packages/ac/c8/fa85f0cc5c39db587ba4927b862a92477c017ee8476e415e8120a100457b/numexpr-2.14.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:dde9fa47ed319e1e1728940a539df3cb78326b7754bc7c6ab3152afc91808f9b", size = 1414125, upload-time = "2025-10-13T16:13:19.882Z" }, - { url = "https://files.pythonhosted.org/packages/08/72/a58ddc05e0eabb3fa8d3fcd319f3d97870e6b41520832acfd04a6734c2c0/numexpr-2.14.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:76db0bc6267e591ab9c4df405ffb533598e4c88239db7338d11ae9e4b368a85a", size = 1463041, upload-time = "2025-10-13T16:14:47.502Z" }, - { url = "https://files.pythonhosted.org/packages/c4/c5/bdd1862302bb71a78dba941eaf7060e1274f1cf6af2d1b0f1880bfcb289b/numexpr-2.14.1-cp310-cp310-win32.whl", hash = "sha256:0d1dcbdc4d0374c0d523cee2f94f06b001623cbc1fd163612841017a3495427c", size = 166833, upload-time = "2025-10-13T16:17:03.543Z" }, - { url = "https://files.pythonhosted.org/packages/18/af/26773a246716922794388786529e5640676399efabb0ee217ce034df9d27/numexpr-2.14.1-cp310-cp310-win_amd64.whl", hash = "sha256:823cd82c8e7937981339f634e7a9c6a92cb2d0b9d0a5cf627a5e394fffc05377", size = 160068, upload-time = "2025-10-13T16:17:05.191Z" }, - { url = "https://files.pythonhosted.org/packages/b2/a3/67999bdd1ed1f938d38f3fedd4969632f2f197b090e50505f7cc1fa82510/numexpr-2.14.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:2d03fcb4644a12f70a14d74006f72662824da5b6128bf1bcd10cc3ed80e64c34", size = 163195, upload-time = "2025-10-13T16:16:31.212Z" }, - { url = "https://files.pythonhosted.org/packages/25/95/d64f680ea1fc56d165457287e0851d6708800f9fcea346fc1b9957942ee6/numexpr-2.14.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:2773ee1133f77009a1fc2f34fe236f3d9823779f5f75450e183137d49f00499f", size = 152088, upload-time = "2025-10-13T16:16:33.186Z" }, - { url = "https://files.pythonhosted.org/packages/0e/7f/3bae417cb13ae08afd86d08bb0301c32440fe0cae4e6262b530e0819aeda/numexpr-2.14.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ebe4980f9494b9f94d10d2e526edc29e72516698d3bf95670ba79415492212a4", size = 451126, upload-time = "2025-10-13T16:13:22.248Z" }, - { url = "https://files.pythonhosted.org/packages/4c/1a/edbe839109518364ac0bd9e918cf874c755bb2c128040e920f198c494263/numexpr-2.14.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2a381e5e919a745c9503bcefffc1c7f98c972c04ec58fc8e999ed1a929e01ba6", size = 442012, upload-time = "2025-10-13T16:14:51.416Z" }, - { url = "https://files.pythonhosted.org/packages/66/b1/be4ce99bff769a5003baddac103f34681997b31d4640d5a75c0e8ed59c78/numexpr-2.14.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d08856cfc1b440eb1caaa60515235369654321995dd68eb9377577392020f6cb", size = 1415975, upload-time = "2025-10-13T16:13:26.088Z" }, - { url = "https://files.pythonhosted.org/packages/e7/33/b33b8fdc032a05d9ebb44a51bfcd4b92c178a2572cd3e6c1b03d8a4b45b2/numexpr-2.14.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:03130afa04edf83a7b590d207444f05a00363c9b9ea5d81c0f53b1ea13fad55a", size = 1464683, upload-time = "2025-10-13T16:14:58.87Z" }, - { url = "https://files.pythonhosted.org/packages/d0/b2/ddcf0ac6cf0a1d605e5aecd4281507fd79a9628a67896795ab2e975de5df/numexpr-2.14.1-cp311-cp311-win32.whl", hash = "sha256:db78fa0c9fcbaded3ae7453faf060bd7a18b0dc10299d7fcd02d9362be1213ed", size = 166838, upload-time = "2025-10-13T16:17:06.765Z" }, - { url = "https://files.pythonhosted.org/packages/64/72/4ca9bd97b2eb6dce9f5e70a3b6acec1a93e1fb9b079cb4cba2cdfbbf295d/numexpr-2.14.1-cp311-cp311-win_amd64.whl", hash = "sha256:e9b2f957798c67a2428be96b04bce85439bed05efe78eb78e4c2ca43737578e7", size = 160069, upload-time = "2025-10-13T16:17:08.752Z" }, - { url = "https://files.pythonhosted.org/packages/9d/20/c473fc04a371f5e2f8c5749e04505c13e7a8ede27c09e9f099b2ad6f43d6/numexpr-2.14.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:91ebae0ab18c799b0e6b8c5a8d11e1fa3848eb4011271d99848b297468a39430", size = 162790, upload-time = "2025-10-13T16:16:34.903Z" }, - { url = "https://files.pythonhosted.org/packages/45/93/b6760dd1904c2a498e5f43d1bb436f59383c3ddea3815f1461dfaa259373/numexpr-2.14.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:47041f2f7b9e69498fb311af672ba914a60e6e6d804011caacb17d66f639e659", size = 152196, upload-time = "2025-10-13T16:16:36.593Z" }, - { url = "https://files.pythonhosted.org/packages/72/94/cc921e35593b820521e464cbbeaf8212bbdb07f16dc79fe283168df38195/numexpr-2.14.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d686dfb2c1382d9e6e0ee0b7647f943c1886dba3adbf606c625479f35f1956c1", size = 452468, upload-time = "2025-10-13T16:13:29.531Z" }, - { url = "https://files.pythonhosted.org/packages/d9/43/560e9ba23c02c904b5934496486d061bcb14cd3ebba2e3cf0e2dccb6c22b/numexpr-2.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:eee6d4fbbbc368e6cdd0772734d6249128d957b3b8ad47a100789009f4de7083", size = 443631, upload-time = "2025-10-13T16:15:02.473Z" }, - { url = "https://files.pythonhosted.org/packages/7b/6c/78f83b6219f61c2c22d71ab6e6c2d4e5d7381334c6c29b77204e59edb039/numexpr-2.14.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:3a2839efa25f3c8d4133252ea7342d8f81226c7c4dda81f97a57e090b9d87a48", size = 1417670, upload-time = "2025-10-13T16:13:33.464Z" }, - { url = "https://files.pythonhosted.org/packages/0e/bb/1ccc9dcaf46281568ce769888bf16294c40e98a5158e4b16c241de31d0d3/numexpr-2.14.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:9f9137f1351b310436662b5dc6f4082a245efa8950c3b0d9008028df92fefb9b", size = 1466212, upload-time = "2025-10-13T16:15:12.828Z" }, - { url = "https://files.pythonhosted.org/packages/31/9f/203d82b9e39dadd91d64bca55b3c8ca432e981b822468dcef41a4418626b/numexpr-2.14.1-cp312-cp312-win32.whl", hash = "sha256:36f8d5c1bd1355df93b43d766790f9046cccfc1e32b7c6163f75bcde682cda07", size = 166996, upload-time = "2025-10-13T16:17:10.369Z" }, - { url = "https://files.pythonhosted.org/packages/1f/67/ffe750b5452eb66de788c34e7d21ec6d886abb4d7c43ad1dc88ceb3d998f/numexpr-2.14.1-cp312-cp312-win_amd64.whl", hash = "sha256:fdd886f4b7dbaf167633ee396478f0d0aa58ea2f9e7ccc3c6431019623e8d68f", size = 160187, upload-time = "2025-10-13T16:17:11.974Z" }, { url = "https://files.pythonhosted.org/packages/73/b4/9f6d637fd79df42be1be29ee7ba1f050fab63b7182cb922a0e08adc12320/numexpr-2.14.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:09078ba73cffe94745abfbcc2d81ab8b4b4e9d7bfbbde6cac2ee5dbf38eee222", size = 162794, upload-time = "2025-10-13T16:16:38.291Z" }, { url = "https://files.pythonhosted.org/packages/35/ae/d58558d8043de0c49f385ea2fa789e3cfe4d436c96be80200c5292f45f15/numexpr-2.14.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:dce0b5a0447baa7b44bc218ec2d7dcd175b8eee6083605293349c0c1d9b82fb6", size = 152203, upload-time = "2025-10-13T16:16:39.907Z" }, { url = "https://files.pythonhosted.org/packages/13/65/72b065f9c75baf8f474fd5d2b768350935989d4917db1c6c75b866d4067c/numexpr-2.14.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:06855053de7a3a8425429bd996e8ae3c50b57637ad3e757e0fa0602a7874be30", size = 455860, upload-time = "2025-10-13T16:13:35.811Z" }, @@ -1035,110 +847,12 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/41/a2/5a1a2c72528b429337f49911b18c302ecd36eeab00f409147e1aa4ae4519/numexpr-2.14.1-cp314-cp314t-win_amd64.whl", hash = "sha256:a40b350cd45b4446076fa11843fa32bbe07024747aeddf6d467290bf9011b392", size = 163589, upload-time = "2025-10-13T16:17:25.696Z" }, ] -[[package]] -name = "numpy" -version = "2.2.6" -source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version < '3.11'", -] -sdist = { url = "https://files.pythonhosted.org/packages/76/21/7d2a95e4bba9dc13d043ee156a356c0a8f0c6309dff6b21b4d71a073b8a8/numpy-2.2.6.tar.gz", hash = "sha256:e29554e2bef54a90aa5cc07da6ce955accb83f21ab5de01a62c8478897b264fd", size = 20276440, upload-time = "2025-05-17T22:38:04.611Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/9a/3e/ed6db5be21ce87955c0cbd3009f2803f59fa08df21b5df06862e2d8e2bdd/numpy-2.2.6-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:b412caa66f72040e6d268491a59f2c43bf03eb6c96dd8f0307829feb7fa2b6fb", size = 21165245, upload-time = "2025-05-17T21:27:58.555Z" }, - { url = "https://files.pythonhosted.org/packages/22/c2/4b9221495b2a132cc9d2eb862e21d42a009f5a60e45fc44b00118c174bff/numpy-2.2.6-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:8e41fd67c52b86603a91c1a505ebaef50b3314de0213461c7a6e99c9a3beff90", size = 14360048, upload-time = "2025-05-17T21:28:21.406Z" }, - { url = "https://files.pythonhosted.org/packages/fd/77/dc2fcfc66943c6410e2bf598062f5959372735ffda175b39906d54f02349/numpy-2.2.6-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:37e990a01ae6ec7fe7fa1c26c55ecb672dd98b19c3d0e1d1f326fa13cb38d163", size = 5340542, upload-time = "2025-05-17T21:28:30.931Z" }, - { url = "https://files.pythonhosted.org/packages/7a/4f/1cb5fdc353a5f5cc7feb692db9b8ec2c3d6405453f982435efc52561df58/numpy-2.2.6-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:5a6429d4be8ca66d889b7cf70f536a397dc45ba6faeb5f8c5427935d9592e9cf", size = 6878301, upload-time = "2025-05-17T21:28:41.613Z" }, - { url = "https://files.pythonhosted.org/packages/eb/17/96a3acd228cec142fcb8723bd3cc39c2a474f7dcf0a5d16731980bcafa95/numpy-2.2.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:efd28d4e9cd7d7a8d39074a4d44c63eda73401580c5c76acda2ce969e0a38e83", size = 14297320, upload-time = "2025-05-17T21:29:02.78Z" }, - { url = "https://files.pythonhosted.org/packages/b4/63/3de6a34ad7ad6646ac7d2f55ebc6ad439dbbf9c4370017c50cf403fb19b5/numpy-2.2.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fc7b73d02efb0e18c000e9ad8b83480dfcd5dfd11065997ed4c6747470ae8915", size = 16801050, upload-time = "2025-05-17T21:29:27.675Z" }, - { url = "https://files.pythonhosted.org/packages/07/b6/89d837eddef52b3d0cec5c6ba0456c1bf1b9ef6a6672fc2b7873c3ec4e2e/numpy-2.2.6-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:74d4531beb257d2c3f4b261bfb0fc09e0f9ebb8842d82a7b4209415896adc680", size = 15807034, upload-time = "2025-05-17T21:29:51.102Z" }, - { url = "https://files.pythonhosted.org/packages/01/c8/dc6ae86e3c61cfec1f178e5c9f7858584049b6093f843bca541f94120920/numpy-2.2.6-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:8fc377d995680230e83241d8a96def29f204b5782f371c532579b4f20607a289", size = 18614185, upload-time = "2025-05-17T21:30:18.703Z" }, - { url = "https://files.pythonhosted.org/packages/5b/c5/0064b1b7e7c89137b471ccec1fd2282fceaae0ab3a9550f2568782d80357/numpy-2.2.6-cp310-cp310-win32.whl", hash = "sha256:b093dd74e50a8cba3e873868d9e93a85b78e0daf2e98c6797566ad8044e8363d", size = 6527149, upload-time = "2025-05-17T21:30:29.788Z" }, - { url = "https://files.pythonhosted.org/packages/a3/dd/4b822569d6b96c39d1215dbae0582fd99954dcbcf0c1a13c61783feaca3f/numpy-2.2.6-cp310-cp310-win_amd64.whl", hash = "sha256:f0fd6321b839904e15c46e0d257fdd101dd7f530fe03fd6359c1ea63738703f3", size = 12904620, upload-time = "2025-05-17T21:30:48.994Z" }, - { url = "https://files.pythonhosted.org/packages/da/a8/4f83e2aa666a9fbf56d6118faaaf5f1974d456b1823fda0a176eff722839/numpy-2.2.6-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:f9f1adb22318e121c5c69a09142811a201ef17ab257a1e66ca3025065b7f53ae", size = 21176963, upload-time = "2025-05-17T21:31:19.36Z" }, - { url = "https://files.pythonhosted.org/packages/b3/2b/64e1affc7972decb74c9e29e5649fac940514910960ba25cd9af4488b66c/numpy-2.2.6-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:c820a93b0255bc360f53eca31a0e676fd1101f673dda8da93454a12e23fc5f7a", size = 14406743, upload-time = "2025-05-17T21:31:41.087Z" }, - { url = "https://files.pythonhosted.org/packages/4a/9f/0121e375000b5e50ffdd8b25bf78d8e1a5aa4cca3f185d41265198c7b834/numpy-2.2.6-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:3d70692235e759f260c3d837193090014aebdf026dfd167834bcba43e30c2a42", size = 5352616, upload-time = "2025-05-17T21:31:50.072Z" }, - { url = "https://files.pythonhosted.org/packages/31/0d/b48c405c91693635fbe2dcd7bc84a33a602add5f63286e024d3b6741411c/numpy-2.2.6-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:481b49095335f8eed42e39e8041327c05b0f6f4780488f61286ed3c01368d491", size = 6889579, upload-time = "2025-05-17T21:32:01.712Z" }, - { url = "https://files.pythonhosted.org/packages/52/b8/7f0554d49b565d0171eab6e99001846882000883998e7b7d9f0d98b1f934/numpy-2.2.6-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:b64d8d4d17135e00c8e346e0a738deb17e754230d7e0810ac5012750bbd85a5a", size = 14312005, upload-time = "2025-05-17T21:32:23.332Z" }, - { url = "https://files.pythonhosted.org/packages/b3/dd/2238b898e51bd6d389b7389ffb20d7f4c10066d80351187ec8e303a5a475/numpy-2.2.6-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ba10f8411898fc418a521833e014a77d3ca01c15b0c6cdcce6a0d2897e6dbbdf", size = 16821570, upload-time = "2025-05-17T21:32:47.991Z" }, - { url = "https://files.pythonhosted.org/packages/83/6c/44d0325722cf644f191042bf47eedad61c1e6df2432ed65cbe28509d404e/numpy-2.2.6-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:bd48227a919f1bafbdda0583705e547892342c26fb127219d60a5c36882609d1", size = 15818548, upload-time = "2025-05-17T21:33:11.728Z" }, - { url = "https://files.pythonhosted.org/packages/ae/9d/81e8216030ce66be25279098789b665d49ff19eef08bfa8cb96d4957f422/numpy-2.2.6-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:9551a499bf125c1d4f9e250377c1ee2eddd02e01eac6644c080162c0c51778ab", size = 18620521, upload-time = "2025-05-17T21:33:39.139Z" }, - { url = "https://files.pythonhosted.org/packages/6a/fd/e19617b9530b031db51b0926eed5345ce8ddc669bb3bc0044b23e275ebe8/numpy-2.2.6-cp311-cp311-win32.whl", hash = "sha256:0678000bb9ac1475cd454c6b8c799206af8107e310843532b04d49649c717a47", size = 6525866, upload-time = "2025-05-17T21:33:50.273Z" }, - { url = "https://files.pythonhosted.org/packages/31/0a/f354fb7176b81747d870f7991dc763e157a934c717b67b58456bc63da3df/numpy-2.2.6-cp311-cp311-win_amd64.whl", hash = "sha256:e8213002e427c69c45a52bbd94163084025f533a55a59d6f9c5b820774ef3303", size = 12907455, upload-time = "2025-05-17T21:34:09.135Z" }, - { url = "https://files.pythonhosted.org/packages/82/5d/c00588b6cf18e1da539b45d3598d3557084990dcc4331960c15ee776ee41/numpy-2.2.6-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:41c5a21f4a04fa86436124d388f6ed60a9343a6f767fced1a8a71c3fbca038ff", size = 20875348, upload-time = "2025-05-17T21:34:39.648Z" }, - { url = "https://files.pythonhosted.org/packages/66/ee/560deadcdde6c2f90200450d5938f63a34b37e27ebff162810f716f6a230/numpy-2.2.6-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:de749064336d37e340f640b05f24e9e3dd678c57318c7289d222a8a2f543e90c", size = 14119362, upload-time = "2025-05-17T21:35:01.241Z" }, - { url = "https://files.pythonhosted.org/packages/3c/65/4baa99f1c53b30adf0acd9a5519078871ddde8d2339dc5a7fde80d9d87da/numpy-2.2.6-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:894b3a42502226a1cac872f840030665f33326fc3dac8e57c607905773cdcde3", size = 5084103, upload-time = "2025-05-17T21:35:10.622Z" }, - { url = "https://files.pythonhosted.org/packages/cc/89/e5a34c071a0570cc40c9a54eb472d113eea6d002e9ae12bb3a8407fb912e/numpy-2.2.6-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:71594f7c51a18e728451bb50cc60a3ce4e6538822731b2933209a1f3614e9282", size = 6625382, upload-time = "2025-05-17T21:35:21.414Z" }, - { url = "https://files.pythonhosted.org/packages/f8/35/8c80729f1ff76b3921d5c9487c7ac3de9b2a103b1cd05e905b3090513510/numpy-2.2.6-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f2618db89be1b4e05f7a1a847a9c1c0abd63e63a1607d892dd54668dd92faf87", size = 14018462, upload-time = "2025-05-17T21:35:42.174Z" }, - { url = "https://files.pythonhosted.org/packages/8c/3d/1e1db36cfd41f895d266b103df00ca5b3cbe965184df824dec5c08c6b803/numpy-2.2.6-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:fd83c01228a688733f1ded5201c678f0c53ecc1006ffbc404db9f7a899ac6249", size = 16527618, upload-time = "2025-05-17T21:36:06.711Z" }, - { url = "https://files.pythonhosted.org/packages/61/c6/03ed30992602c85aa3cd95b9070a514f8b3c33e31124694438d88809ae36/numpy-2.2.6-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:37c0ca431f82cd5fa716eca9506aefcabc247fb27ba69c5062a6d3ade8cf8f49", size = 15505511, upload-time = "2025-05-17T21:36:29.965Z" }, - { url = "https://files.pythonhosted.org/packages/b7/25/5761d832a81df431e260719ec45de696414266613c9ee268394dd5ad8236/numpy-2.2.6-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:fe27749d33bb772c80dcd84ae7e8df2adc920ae8297400dabec45f0dedb3f6de", size = 18313783, upload-time = "2025-05-17T21:36:56.883Z" }, - { url = "https://files.pythonhosted.org/packages/57/0a/72d5a3527c5ebffcd47bde9162c39fae1f90138c961e5296491ce778e682/numpy-2.2.6-cp312-cp312-win32.whl", hash = "sha256:4eeaae00d789f66c7a25ac5f34b71a7035bb474e679f410e5e1a94deb24cf2d4", size = 6246506, upload-time = "2025-05-17T21:37:07.368Z" }, - { url = "https://files.pythonhosted.org/packages/36/fa/8c9210162ca1b88529ab76b41ba02d433fd54fecaf6feb70ef9f124683f1/numpy-2.2.6-cp312-cp312-win_amd64.whl", hash = "sha256:c1f9540be57940698ed329904db803cf7a402f3fc200bfe599334c9bd84a40b2", size = 12614190, upload-time = "2025-05-17T21:37:26.213Z" }, - { url = "https://files.pythonhosted.org/packages/f9/5c/6657823f4f594f72b5471f1db1ab12e26e890bb2e41897522d134d2a3e81/numpy-2.2.6-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:0811bb762109d9708cca4d0b13c4f67146e3c3b7cf8d34018c722adb2d957c84", size = 20867828, upload-time = "2025-05-17T21:37:56.699Z" }, - { url = "https://files.pythonhosted.org/packages/dc/9e/14520dc3dadf3c803473bd07e9b2bd1b69bc583cb2497b47000fed2fa92f/numpy-2.2.6-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:287cc3162b6f01463ccd86be154f284d0893d2b3ed7292439ea97eafa8170e0b", size = 14143006, upload-time = "2025-05-17T21:38:18.291Z" }, - { url = "https://files.pythonhosted.org/packages/4f/06/7e96c57d90bebdce9918412087fc22ca9851cceaf5567a45c1f404480e9e/numpy-2.2.6-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:f1372f041402e37e5e633e586f62aa53de2eac8d98cbfb822806ce4bbefcb74d", size = 5076765, upload-time = "2025-05-17T21:38:27.319Z" }, - { url = "https://files.pythonhosted.org/packages/73/ed/63d920c23b4289fdac96ddbdd6132e9427790977d5457cd132f18e76eae0/numpy-2.2.6-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:55a4d33fa519660d69614a9fad433be87e5252f4b03850642f88993f7b2ca566", size = 6617736, upload-time = "2025-05-17T21:38:38.141Z" }, - { url = "https://files.pythonhosted.org/packages/85/c5/e19c8f99d83fd377ec8c7e0cf627a8049746da54afc24ef0a0cb73d5dfb5/numpy-2.2.6-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:f92729c95468a2f4f15e9bb94c432a9229d0d50de67304399627a943201baa2f", size = 14010719, upload-time = "2025-05-17T21:38:58.433Z" }, - { url = "https://files.pythonhosted.org/packages/19/49/4df9123aafa7b539317bf6d342cb6d227e49f7a35b99c287a6109b13dd93/numpy-2.2.6-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:1bc23a79bfabc5d056d106f9befb8d50c31ced2fbc70eedb8155aec74a45798f", size = 16526072, upload-time = "2025-05-17T21:39:22.638Z" }, - { url = "https://files.pythonhosted.org/packages/b2/6c/04b5f47f4f32f7c2b0e7260442a8cbcf8168b0e1a41ff1495da42f42a14f/numpy-2.2.6-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:e3143e4451880bed956e706a3220b4e5cf6172ef05fcc397f6f36a550b1dd868", size = 15503213, upload-time = "2025-05-17T21:39:45.865Z" }, - { url = "https://files.pythonhosted.org/packages/17/0a/5cd92e352c1307640d5b6fec1b2ffb06cd0dabe7d7b8227f97933d378422/numpy-2.2.6-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:b4f13750ce79751586ae2eb824ba7e1e8dba64784086c98cdbbcc6a42112ce0d", size = 18316632, upload-time = "2025-05-17T21:40:13.331Z" }, - { url = "https://files.pythonhosted.org/packages/f0/3b/5cba2b1d88760ef86596ad0f3d484b1cbff7c115ae2429678465057c5155/numpy-2.2.6-cp313-cp313-win32.whl", hash = "sha256:5beb72339d9d4fa36522fc63802f469b13cdbe4fdab4a288f0c441b74272ebfd", size = 6244532, upload-time = "2025-05-17T21:43:46.099Z" }, - { url = "https://files.pythonhosted.org/packages/cb/3b/d58c12eafcb298d4e6d0d40216866ab15f59e55d148a5658bb3132311fcf/numpy-2.2.6-cp313-cp313-win_amd64.whl", hash = "sha256:b0544343a702fa80c95ad5d3d608ea3599dd54d4632df855e4c8d24eb6ecfa1c", size = 12610885, upload-time = "2025-05-17T21:44:05.145Z" }, - { url = "https://files.pythonhosted.org/packages/6b/9e/4bf918b818e516322db999ac25d00c75788ddfd2d2ade4fa66f1f38097e1/numpy-2.2.6-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:0bca768cd85ae743b2affdc762d617eddf3bcf8724435498a1e80132d04879e6", size = 20963467, upload-time = "2025-05-17T21:40:44Z" }, - { url = "https://files.pythonhosted.org/packages/61/66/d2de6b291507517ff2e438e13ff7b1e2cdbdb7cb40b3ed475377aece69f9/numpy-2.2.6-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:fc0c5673685c508a142ca65209b4e79ed6740a4ed6b2267dbba90f34b0b3cfda", size = 14225144, upload-time = "2025-05-17T21:41:05.695Z" }, - { url = "https://files.pythonhosted.org/packages/e4/25/480387655407ead912e28ba3a820bc69af9adf13bcbe40b299d454ec011f/numpy-2.2.6-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:5bd4fc3ac8926b3819797a7c0e2631eb889b4118a9898c84f585a54d475b7e40", size = 5200217, upload-time = "2025-05-17T21:41:15.903Z" }, - { url = "https://files.pythonhosted.org/packages/aa/4a/6e313b5108f53dcbf3aca0c0f3e9c92f4c10ce57a0a721851f9785872895/numpy-2.2.6-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:fee4236c876c4e8369388054d02d0e9bb84821feb1a64dd59e137e6511a551f8", size = 6712014, upload-time = "2025-05-17T21:41:27.321Z" }, - { url = "https://files.pythonhosted.org/packages/b7/30/172c2d5c4be71fdf476e9de553443cf8e25feddbe185e0bd88b096915bcc/numpy-2.2.6-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e1dda9c7e08dc141e0247a5b8f49cf05984955246a327d4c48bda16821947b2f", size = 14077935, upload-time = "2025-05-17T21:41:49.738Z" }, - { url = "https://files.pythonhosted.org/packages/12/fb/9e743f8d4e4d3c710902cf87af3512082ae3d43b945d5d16563f26ec251d/numpy-2.2.6-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f447e6acb680fd307f40d3da4852208af94afdfab89cf850986c3ca00562f4fa", size = 16600122, upload-time = "2025-05-17T21:42:14.046Z" }, - { url = "https://files.pythonhosted.org/packages/12/75/ee20da0e58d3a66f204f38916757e01e33a9737d0b22373b3eb5a27358f9/numpy-2.2.6-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:389d771b1623ec92636b0786bc4ae56abafad4a4c513d36a55dce14bd9ce8571", size = 15586143, upload-time = "2025-05-17T21:42:37.464Z" }, - { url = "https://files.pythonhosted.org/packages/76/95/bef5b37f29fc5e739947e9ce5179ad402875633308504a52d188302319c8/numpy-2.2.6-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:8e9ace4a37db23421249ed236fdcdd457d671e25146786dfc96835cd951aa7c1", size = 18385260, upload-time = "2025-05-17T21:43:05.189Z" }, - { url = "https://files.pythonhosted.org/packages/09/04/f2f83279d287407cf36a7a8053a5abe7be3622a4363337338f2585e4afda/numpy-2.2.6-cp313-cp313t-win32.whl", hash = "sha256:038613e9fb8c72b0a41f025a7e4c3f0b7a1b5d768ece4796b674c8f3fe13efff", size = 6377225, upload-time = "2025-05-17T21:43:16.254Z" }, - { url = "https://files.pythonhosted.org/packages/67/0e/35082d13c09c02c011cf21570543d202ad929d961c02a147493cb0c2bdf5/numpy-2.2.6-cp313-cp313t-win_amd64.whl", hash = "sha256:6031dd6dfecc0cf9f668681a37648373bddd6421fff6c66ec1624eed0180ee06", size = 12771374, upload-time = "2025-05-17T21:43:35.479Z" }, - { url = "https://files.pythonhosted.org/packages/9e/3b/d94a75f4dbf1ef5d321523ecac21ef23a3cd2ac8b78ae2aac40873590229/numpy-2.2.6-pp310-pypy310_pp73-macosx_10_15_x86_64.whl", hash = "sha256:0b605b275d7bd0c640cad4e5d30fa701a8d59302e127e5f79138ad62762c3e3d", size = 21040391, upload-time = "2025-05-17T21:44:35.948Z" }, - { url = "https://files.pythonhosted.org/packages/17/f4/09b2fa1b58f0fb4f7c7963a1649c64c4d315752240377ed74d9cd878f7b5/numpy-2.2.6-pp310-pypy310_pp73-macosx_14_0_x86_64.whl", hash = "sha256:7befc596a7dc9da8a337f79802ee8adb30a552a94f792b9c9d18c840055907db", size = 6786754, upload-time = "2025-05-17T21:44:47.446Z" }, - { url = "https://files.pythonhosted.org/packages/af/30/feba75f143bdc868a1cc3f44ccfa6c4b9ec522b36458e738cd00f67b573f/numpy-2.2.6-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ce47521a4754c8f4593837384bd3424880629f718d87c5d44f8ed763edd63543", size = 16643476, upload-time = "2025-05-17T21:45:11.871Z" }, - { url = "https://files.pythonhosted.org/packages/37/48/ac2a9584402fb6c0cd5b5d1a91dcf176b15760130dd386bbafdbfe3640bf/numpy-2.2.6-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:d042d24c90c41b54fd506da306759e06e568864df8ec17ccc17e9e884634fd00", size = 12812666, upload-time = "2025-05-17T21:45:31.426Z" }, -] - [[package]] name = "numpy" version = "2.4.3" source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version >= '3.14' and sys_platform == 'win32'", - "python_full_version >= '3.14' and sys_platform == 'emscripten'", - "python_full_version >= '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'win32'", - "python_full_version == '3.11.*' and sys_platform == 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'emscripten'", - "python_full_version == '3.11.*' and sys_platform == 'emscripten'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version == '3.11.*' and sys_platform != 'emscripten' and sys_platform != 'win32'", -] sdist = { url = "https://files.pythonhosted.org/packages/10/8b/c265f4823726ab832de836cdd184d0986dcf94480f81e8739692a7ac7af2/numpy-2.4.3.tar.gz", hash = "sha256:483a201202b73495f00dbc83796c6ae63137a9bdade074f7648b3e32613412dd", size = 20727743, upload-time = "2026-03-09T07:58:53.426Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/f9/51/5093a2df15c4dc19da3f79d1021e891f5dcf1d9d1db6ba38891d5590f3fe/numpy-2.4.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:33b3bf58ee84b172c067f56aeadc7ee9ab6de69c5e800ab5b10295d54c581adb", size = 16957183, upload-time = "2026-03-09T07:55:57.774Z" }, - { url = "https://files.pythonhosted.org/packages/b5/7c/c061f3de0630941073d2598dc271ac2f6cbcf5c83c74a5870fea07488333/numpy-2.4.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:8ba7b51e71c05aa1f9bc3641463cd82308eab40ce0d5c7e1fd4038cbf9938147", size = 14968734, upload-time = "2026-03-09T07:56:00.494Z" }, - { url = "https://files.pythonhosted.org/packages/ef/27/d26c85cbcd86b26e4f125b0668e7a7c0542d19dd7d23ee12e87b550e95b5/numpy-2.4.3-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:a1988292870c7cb9d0ebb4cc96b4d447513a9644801de54606dc7aabf2b7d920", size = 5475288, upload-time = "2026-03-09T07:56:02.857Z" }, - { url = "https://files.pythonhosted.org/packages/2b/09/3c4abbc1dcd8010bf1a611d174c7aa689fc505585ec806111b4406f6f1b1/numpy-2.4.3-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:23b46bb6d8ecb68b58c09944483c135ae5f0e9b8d8858ece5e4ead783771d2a9", size = 6805253, upload-time = "2026-03-09T07:56:04.53Z" }, - { url = "https://files.pythonhosted.org/packages/21/bc/e7aa3f6817e40c3f517d407742337cbb8e6fc4b83ce0b55ab780c829243b/numpy-2.4.3-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a016db5c5dba78fa8fe9f5d80d6708f9c42ab087a739803c0ac83a43d686a470", size = 15969479, upload-time = "2026-03-09T07:56:06.638Z" }, - { url = "https://files.pythonhosted.org/packages/78/51/9f5d7a41f0b51649ddf2f2320595e15e122a40610b233d51928dd6c92353/numpy-2.4.3-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:715de7f82e192e8cae5a507a347d97ad17598f8e026152ca97233e3666daaa71", size = 16901035, upload-time = "2026-03-09T07:56:09.405Z" }, - { url = "https://files.pythonhosted.org/packages/64/6e/b221dd847d7181bc5ee4857bfb026182ef69499f9305eb1371cbb1aea626/numpy-2.4.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:2ddb7919366ee468342b91dea2352824c25b55814a987847b6c52003a7c97f15", size = 17325657, upload-time = "2026-03-09T07:56:12.067Z" }, - { url = "https://files.pythonhosted.org/packages/eb/b8/8f3fd2da596e1063964b758b5e3c970aed1949a05200d7e3d46a9d46d643/numpy-2.4.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:a315e5234d88067f2d97e1f2ef670a7569df445d55400f1e33d117418d008d52", size = 18635512, upload-time = "2026-03-09T07:56:14.629Z" }, - { url = "https://files.pythonhosted.org/packages/5c/24/2993b775c37e39d2f8ab4125b44337ab0b2ba106c100980b7c274a22bee7/numpy-2.4.3-cp311-cp311-win32.whl", hash = "sha256:2b3f8d2c4589b1a2028d2a770b0fc4d1f332fb5e01521f4de3199a896d158ddd", size = 6238100, upload-time = "2026-03-09T07:56:17.243Z" }, - { url = "https://files.pythonhosted.org/packages/76/1d/edccf27adedb754db7c4511d5eac8b83f004ae948fe2d3509e8b78097d4c/numpy-2.4.3-cp311-cp311-win_amd64.whl", hash = "sha256:77e76d932c49a75617c6d13464e41203cd410956614d0a0e999b25e9e8d27eec", size = 12609816, upload-time = "2026-03-09T07:56:19.089Z" }, - { url = "https://files.pythonhosted.org/packages/92/82/190b99153480076c8dce85f4cfe7d53ea84444145ffa54cb58dcd460d66b/numpy-2.4.3-cp311-cp311-win_arm64.whl", hash = "sha256:eb610595dd91560905c132c709412b512135a60f1851ccbd2c959e136431ff67", size = 10485757, upload-time = "2026-03-09T07:56:21.753Z" }, - { url = "https://files.pythonhosted.org/packages/a9/ed/6388632536f9788cea23a3a1b629f25b43eaacd7d7377e5d6bc7b9deb69b/numpy-2.4.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:61b0cbabbb6126c8df63b9a3a0c4b1f44ebca5e12ff6997b80fcf267fb3150ef", size = 16669628, upload-time = "2026-03-09T07:56:24.252Z" }, - { url = "https://files.pythonhosted.org/packages/74/1b/ee2abfc68e1ce728b2958b6ba831d65c62e1b13ce3017c13943f8f9b5b2e/numpy-2.4.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:7395e69ff32526710748f92cd8c9849b361830968ea3e24a676f272653e8983e", size = 14696872, upload-time = "2026-03-09T07:56:26.991Z" }, - { url = "https://files.pythonhosted.org/packages/ba/d1/780400e915ff5638166f11ca9dc2c5815189f3d7cf6f8759a1685e586413/numpy-2.4.3-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:abdce0f71dcb4a00e4e77f3faf05e4616ceccfe72ccaa07f47ee79cda3b7b0f4", size = 5203489, upload-time = "2026-03-09T07:56:29.414Z" }, - { url = "https://files.pythonhosted.org/packages/0b/bb/baffa907e9da4cc34a6e556d6d90e032f6d7a75ea47968ea92b4858826c4/numpy-2.4.3-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:48da3a4ee1336454b07497ff7ec83903efa5505792c4e6d9bf83d99dc07a1e18", size = 6550814, upload-time = "2026-03-09T07:56:32.225Z" }, - { url = "https://files.pythonhosted.org/packages/7b/12/8c9f0c6c95f76aeb20fc4a699c33e9f827fa0d0f857747c73bb7b17af945/numpy-2.4.3-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:32e3bef222ad6b052280311d1d60db8e259e4947052c3ae7dd6817451fc8a4c5", size = 15666601, upload-time = "2026-03-09T07:56:34.461Z" }, - { url = "https://files.pythonhosted.org/packages/bd/79/cc665495e4d57d0aa6fbcc0aa57aa82671dfc78fbf95fe733ed86d98f52a/numpy-2.4.3-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e7dd01a46700b1967487141a66ac1a3cf0dd8ebf1f08db37d46389401512ca97", size = 16621358, upload-time = "2026-03-09T07:56:36.852Z" }, - { url = "https://files.pythonhosted.org/packages/a8/40/b4ecb7224af1065c3539f5ecfff879d090de09608ad1008f02c05c770cb3/numpy-2.4.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:76f0f283506c28b12bba319c0fab98217e9f9b54e6160e9c79e9f7348ba32e9c", size = 17016135, upload-time = "2026-03-09T07:56:39.337Z" }, - { url = "https://files.pythonhosted.org/packages/f7/b1/6a88e888052eed951afed7a142dcdf3b149a030ca59b4c71eef085858e43/numpy-2.4.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:737f630a337364665aba3b5a77e56a68cc42d350edd010c345d65a3efa3addcc", size = 18345816, upload-time = "2026-03-09T07:56:42.31Z" }, - { url = "https://files.pythonhosted.org/packages/f3/8f/103a60c5f8c3d7fc678c19cd7b2476110da689ccb80bc18050efbaeae183/numpy-2.4.3-cp312-cp312-win32.whl", hash = "sha256:26952e18d82a1dbbc2f008d402021baa8d6fc8e84347a2072a25e08b46d698b9", size = 5960132, upload-time = "2026-03-09T07:56:44.851Z" }, - { url = "https://files.pythonhosted.org/packages/d7/7c/f5ee1bf6ed888494978046a809df2882aad35d414b622893322df7286879/numpy-2.4.3-cp312-cp312-win_amd64.whl", hash = "sha256:65f3c2455188f09678355f5cae1f959a06b778bc66d535da07bf2ef20cd319d5", size = 12316144, upload-time = "2026-03-09T07:56:47.057Z" }, - { url = "https://files.pythonhosted.org/packages/71/46/8d1cb3f7a00f2fb6394140e7e6623696e54c6318a9d9691bb4904672cf42/numpy-2.4.3-cp312-cp312-win_arm64.whl", hash = "sha256:2abad5c7fef172b3377502bde47892439bae394a71bc329f31df0fd829b41a9e", size = 10220364, upload-time = "2026-03-09T07:56:49.849Z" }, { url = "https://files.pythonhosted.org/packages/b6/d0/1fe47a98ce0df229238b77611340aff92d52691bcbc10583303181abf7fc/numpy-2.4.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:b346845443716c8e542d54112966383b448f4a3ba5c66409771b8c0889485dd3", size = 16665297, upload-time = "2026-03-09T07:56:52.296Z" }, { url = "https://files.pythonhosted.org/packages/27/d9/4e7c3f0e68dfa91f21c6fb6cf839bc829ec920688b1ce7ec722b1a6202fb/numpy-2.4.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:2629289168f4897a3c4e23dc98d6f1731f0fc0fe52fb9db19f974041e4cc12b9", size = 14691853, upload-time = "2026-03-09T07:56:54.992Z" }, { url = "https://files.pythonhosted.org/packages/3a/66/bd096b13a87549683812b53ab211e6d413497f84e794fb3c39191948da97/numpy-2.4.3-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:bb2e3cf95854233799013779216c57e153c1ee67a0bf92138acca0e429aefaee", size = 5198435, upload-time = "2026-03-09T07:56:57.184Z" }, @@ -1181,13 +895,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/07/12/8160bea39da3335737b10308df4f484235fd297f556745f13092aa039d3b/numpy-2.4.3-cp314-cp314t-win32.whl", hash = "sha256:5e10da9e93247e554bb1d22f8edc51847ddd7dde52d85ce31024c1b4312bfba0", size = 6154547, upload-time = "2026-03-09T07:58:28.289Z" }, { url = "https://files.pythonhosted.org/packages/42/f3/76534f61f80d74cc9cdf2e570d3d4eeb92c2280a27c39b0aaf471eda7b48/numpy-2.4.3-cp314-cp314t-win_amd64.whl", hash = "sha256:45f003dbdffb997a03da2d1d0cb41fbd24a87507fb41605c0420a3db5bd4667b", size = 12633645, upload-time = "2026-03-09T07:58:30.384Z" }, { url = "https://files.pythonhosted.org/packages/1f/b6/7c0d4334c15983cec7f92a69e8ce9b1e6f31857e5ee3a413ac424e6bd63d/numpy-2.4.3-cp314-cp314t-win_arm64.whl", hash = "sha256:4d382735cecd7bcf090172489a525cd7d4087bc331f7df9f60ddc9a296cf208e", size = 10565454, upload-time = "2026-03-09T07:58:33.031Z" }, - { url = "https://files.pythonhosted.org/packages/64/e4/4dab9fb43c83719c29241c535d9e07be73bea4bc0c6686c5816d8e1b6689/numpy-2.4.3-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:c6b124bfcafb9e8d3ed09130dbee44848c20b3e758b6bbf006e641778927c028", size = 16834892, upload-time = "2026-03-09T07:58:35.334Z" }, - { url = "https://files.pythonhosted.org/packages/c9/29/f8b6d4af90fed3dfda84ebc0df06c9833d38880c79ce954e5b661758aa31/numpy-2.4.3-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:76dbb9d4e43c16cf9aa711fcd8de1e2eeb27539dcefb60a1d5e9f12fae1d1ed8", size = 14893070, upload-time = "2026-03-09T07:58:37.7Z" }, - { url = "https://files.pythonhosted.org/packages/9a/04/a19b3c91dbec0a49269407f15d5753673a09832daed40c45e8150e6fa558/numpy-2.4.3-pp311-pypy311_pp73-macosx_14_0_arm64.whl", hash = "sha256:29363fbfa6f8ee855d7569c96ce524845e3d726d6c19b29eceec7dd555dab152", size = 5399609, upload-time = "2026-03-09T07:58:39.853Z" }, - { url = "https://files.pythonhosted.org/packages/79/34/4d73603f5420eab89ea8a67097b31364bf7c30f811d4dd84b1659c7476d9/numpy-2.4.3-pp311-pypy311_pp73-macosx_14_0_x86_64.whl", hash = "sha256:bc71942c789ef415a37f0d4eab90341425a00d538cd0642445d30b41023d3395", size = 6714355, upload-time = "2026-03-09T07:58:42.365Z" }, - { url = "https://files.pythonhosted.org/packages/58/ad/1100d7229bb248394939a12a8074d485b655e8ed44207d328fdd7fcebc7b/numpy-2.4.3-pp311-pypy311_pp73-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7e58765ad74dcebd3ef0208a5078fba32dc8ec3578fe84a604432950cd043d79", size = 15800434, upload-time = "2026-03-09T07:58:44.837Z" }, - { url = "https://files.pythonhosted.org/packages/0c/fd/16d710c085d28ba4feaf29ac60c936c9d662e390344f94a6beaa2ac9899b/numpy-2.4.3-pp311-pypy311_pp73-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:8e236dbda4e1d319d681afcbb136c0c4a8e0f1a5c58ceec2adebb547357fe857", size = 16729409, upload-time = "2026-03-09T07:58:47.972Z" }, - { url = "https://files.pythonhosted.org/packages/57/a7/b35835e278c18b85206834b3aa3abe68e77a98769c59233d1f6300284781/numpy-2.4.3-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:4b42639cdde6d24e732ff823a3fa5b701d8acad89c4142bc1d0bd6dc85200ba5", size = 12504685, upload-time = "2026-03-09T07:58:50.525Z" }, ] [[package]] @@ -1231,7 +938,7 @@ name = "nvidia-cudnn-cu13" version = "9.19.0.56" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" }, + { name = "nvidia-cublas", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, ] wheels = [ { url = "https://files.pythonhosted.org/packages/f1/84/26025437c1e6b61a707442184fa0c03d083b661adf3a3eecfd6d21677740/nvidia_cudnn_cu13-9.19.0.56-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:6ed29ffaee1176c612daf442e4dd6cfeb6a0caa43ddcbeb59da94953030b1be4", size = 433781201, upload-time = "2026-02-03T20:40:53.805Z" }, @@ -1243,7 +950,7 @@ name = "nvidia-cufft" version = "12.0.0.61" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" }, + { name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, ] wheels = [ { url = "https://files.pythonhosted.org/packages/8b/ae/f417a75c0259e85c1d2f83ca4e960289a5f814ed0cea74d18c353d3e989d/nvidia_cufft-12.0.0.61-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:2708c852ef8cd89d1d2068bdbece0aa188813a0c934db3779b9b1faa8442e5f5", size = 214053554, upload-time = "2025-09-04T08:31:38.196Z" }, @@ -1273,9 +980,9 @@ name = "nvidia-cusolver" version = "12.0.4.66" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "nvidia-cublas", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" }, - { name = "nvidia-cusparse", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" }, - { name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" }, + { name = "nvidia-cublas", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, + { name = "nvidia-cusparse", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, + { name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, ] wheels = [ { url = "https://files.pythonhosted.org/packages/c8/c3/b30c9e935fc01e3da443ec0116ed1b2a009bb867f5324d3f2d7e533e776b/nvidia_cusolver-12.0.4.66-py3-none-manylinux_2_27_aarch64.whl", hash = "sha256:02c2457eaa9e39de20f880f4bd8820e6a1cfb9f9a34f820eb12a155aa5bc92d2", size = 223467760, upload-time = "2025-09-04T08:33:04.222Z" }, @@ -1287,7 +994,7 @@ name = "nvidia-cusparse" version = "12.6.3.3" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "nvidia-nvjitlink", marker = "(python_full_version < '3.11' and sys_platform == 'emscripten') or (python_full_version < '3.11' and sys_platform == 'win32') or (sys_platform != 'emscripten' and sys_platform != 'win32')" }, + { name = "nvidia-nvjitlink", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, ] wheels = [ { url = "https://files.pythonhosted.org/packages/f8/94/5c26f33738ae35276672f12615a64bd008ed5be6d1ebcb23579285d960a9/nvidia_cusparse-12.6.3.3-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:80bcc4662f23f1054ee334a15c72b8940402975e0eab63178fc7e670aa59472c", size = 162155568, upload-time = "2025-09-04T08:33:42.864Z" }, @@ -1344,13 +1051,13 @@ name = "optuna" version = "4.8.0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "alembic", marker = "python_full_version >= '3.12'" }, - { name = "colorlog", marker = "python_full_version >= '3.12'" }, - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, - { name = "packaging", marker = "python_full_version >= '3.12'" }, - { name = "pyyaml", marker = "python_full_version >= '3.12'" }, - { name = "sqlalchemy", marker = "python_full_version >= '3.12'" }, - { name = "tqdm", marker = "python_full_version >= '3.12'" }, + { name = "alembic" }, + { name = "colorlog" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pyyaml" }, + { name = "sqlalchemy" }, + { name = "tqdm" }, ] sdist = { url = "https://files.pythonhosted.org/packages/bf/9b/62f120fb2ecbc4338bee70c5a3671c8e561714f3aa1a046b897ff142050e/optuna-4.8.0.tar.gz", hash = "sha256:6f7043e9f8ecb5e607af86a7eb00fb5ec2be26c3b08c201209a73d36aff37a38", size = 482603, upload-time = "2026-03-16T04:59:58.659Z" } wheels = [ @@ -1366,108 +1073,17 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/b7/b9/c538f279a4e237a006a2c98387d081e9eb060d203d8ed34467cc0f0b9b53/packaging-26.0-py3-none-any.whl", hash = "sha256:b36f1fef9334a5588b4166f8bcd26a14e521f2b55e6b9de3aaa80d3ff7a37529", size = 74366, upload-time = "2026-01-21T20:50:37.788Z" }, ] -[[package]] -name = "pandas" -version = "2.3.3" -source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version < '3.11'", -] -dependencies = [ - { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "python-dateutil", marker = "python_full_version < '3.11'" }, - { name = "pytz", marker = "python_full_version < '3.11'" }, - { name = "tzdata", marker = "python_full_version < '3.11'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/33/01/d40b85317f86cf08d853a4f495195c73815fdf205eef3993821720274518/pandas-2.3.3.tar.gz", hash = "sha256:e05e1af93b977f7eafa636d043f9f94c7ee3ac81af99c13508215942e64c993b", size = 4495223, upload-time = "2025-09-29T23:34:51.853Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/3d/f7/f425a00df4fcc22b292c6895c6831c0c8ae1d9fac1e024d16f98a9ce8749/pandas-2.3.3-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:376c6446ae31770764215a6c937f72d917f214b43560603cd60da6408f183b6c", size = 11555763, upload-time = "2025-09-29T23:16:53.287Z" }, - { url = "https://files.pythonhosted.org/packages/13/4f/66d99628ff8ce7857aca52fed8f0066ce209f96be2fede6cef9f84e8d04f/pandas-2.3.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:e19d192383eab2f4ceb30b412b22ea30690c9e618f78870357ae1d682912015a", size = 10801217, upload-time = "2025-09-29T23:17:04.522Z" }, - { url = "https://files.pythonhosted.org/packages/1d/03/3fc4a529a7710f890a239cc496fc6d50ad4a0995657dccc1d64695adb9f4/pandas-2.3.3-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5caf26f64126b6c7aec964f74266f435afef1c1b13da3b0636c7518a1fa3e2b1", size = 12148791, upload-time = "2025-09-29T23:17:18.444Z" }, - { url = "https://files.pythonhosted.org/packages/40/a8/4dac1f8f8235e5d25b9955d02ff6f29396191d4e665d71122c3722ca83c5/pandas-2.3.3-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:dd7478f1463441ae4ca7308a70e90b33470fa593429f9d4c578dd00d1fa78838", size = 12769373, upload-time = "2025-09-29T23:17:35.846Z" }, - { url = "https://files.pythonhosted.org/packages/df/91/82cc5169b6b25440a7fc0ef3a694582418d875c8e3ebf796a6d6470aa578/pandas-2.3.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:4793891684806ae50d1288c9bae9330293ab4e083ccd1c5e383c34549c6e4250", size = 13200444, upload-time = "2025-09-29T23:17:49.341Z" }, - { url = "https://files.pythonhosted.org/packages/10/ae/89b3283800ab58f7af2952704078555fa60c807fff764395bb57ea0b0dbd/pandas-2.3.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:28083c648d9a99a5dd035ec125d42439c6c1c525098c58af0fc38dd1a7a1b3d4", size = 13858459, upload-time = "2025-09-29T23:18:03.722Z" }, - { url = "https://files.pythonhosted.org/packages/85/72/530900610650f54a35a19476eca5104f38555afccda1aa11a92ee14cb21d/pandas-2.3.3-cp310-cp310-win_amd64.whl", hash = "sha256:503cf027cf9940d2ceaa1a93cfb5f8c8c7e6e90720a2850378f0b3f3b1e06826", size = 11346086, upload-time = "2025-09-29T23:18:18.505Z" }, - { url = "https://files.pythonhosted.org/packages/c1/fa/7ac648108144a095b4fb6aa3de1954689f7af60a14cf25583f4960ecb878/pandas-2.3.3-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:602b8615ebcc4a0c1751e71840428ddebeb142ec02c786e8ad6b1ce3c8dec523", size = 11578790, upload-time = "2025-09-29T23:18:30.065Z" }, - { url = "https://files.pythonhosted.org/packages/9b/35/74442388c6cf008882d4d4bdfc4109be87e9b8b7ccd097ad1e7f006e2e95/pandas-2.3.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:8fe25fc7b623b0ef6b5009149627e34d2a4657e880948ec3c840e9402e5c1b45", size = 10833831, upload-time = "2025-09-29T23:38:56.071Z" }, - { url = "https://files.pythonhosted.org/packages/fe/e4/de154cbfeee13383ad58d23017da99390b91d73f8c11856f2095e813201b/pandas-2.3.3-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b468d3dad6ff947df92dcb32ede5b7bd41a9b3cceef0a30ed925f6d01fb8fa66", size = 12199267, upload-time = "2025-09-29T23:18:41.627Z" }, - { url = "https://files.pythonhosted.org/packages/bf/c9/63f8d545568d9ab91476b1818b4741f521646cbdd151c6efebf40d6de6f7/pandas-2.3.3-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b98560e98cb334799c0b07ca7967ac361a47326e9b4e5a7dfb5ab2b1c9d35a1b", size = 12789281, upload-time = "2025-09-29T23:18:56.834Z" }, - { url = "https://files.pythonhosted.org/packages/f2/00/a5ac8c7a0e67fd1a6059e40aa08fa1c52cc00709077d2300e210c3ce0322/pandas-2.3.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1d37b5848ba49824e5c30bedb9c830ab9b7751fd049bc7914533e01c65f79791", size = 13240453, upload-time = "2025-09-29T23:19:09.247Z" }, - { url = "https://files.pythonhosted.org/packages/27/4d/5c23a5bc7bd209231618dd9e606ce076272c9bc4f12023a70e03a86b4067/pandas-2.3.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:db4301b2d1f926ae677a751eb2bd0e8c5f5319c9cb3f88b0becbbb0b07b34151", size = 13890361, upload-time = "2025-09-29T23:19:25.342Z" }, - { url = "https://files.pythonhosted.org/packages/8e/59/712db1d7040520de7a4965df15b774348980e6df45c129b8c64d0dbe74ef/pandas-2.3.3-cp311-cp311-win_amd64.whl", hash = "sha256:f086f6fe114e19d92014a1966f43a3e62285109afe874f067f5abbdcbb10e59c", size = 11348702, upload-time = "2025-09-29T23:19:38.296Z" }, - { url = "https://files.pythonhosted.org/packages/9c/fb/231d89e8637c808b997d172b18e9d4a4bc7bf31296196c260526055d1ea0/pandas-2.3.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:6d21f6d74eb1725c2efaa71a2bfc661a0689579b58e9c0ca58a739ff0b002b53", size = 11597846, upload-time = "2025-09-29T23:19:48.856Z" }, - { url = "https://files.pythonhosted.org/packages/5c/bd/bf8064d9cfa214294356c2d6702b716d3cf3bb24be59287a6a21e24cae6b/pandas-2.3.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:3fd2f887589c7aa868e02632612ba39acb0b8948faf5cc58f0850e165bd46f35", size = 10729618, upload-time = "2025-09-29T23:39:08.659Z" }, - { url = "https://files.pythonhosted.org/packages/57/56/cf2dbe1a3f5271370669475ead12ce77c61726ffd19a35546e31aa8edf4e/pandas-2.3.3-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ecaf1e12bdc03c86ad4a7ea848d66c685cb6851d807a26aa245ca3d2017a1908", size = 11737212, upload-time = "2025-09-29T23:19:59.765Z" }, - { url = "https://files.pythonhosted.org/packages/e5/63/cd7d615331b328e287d8233ba9fdf191a9c2d11b6af0c7a59cfcec23de68/pandas-2.3.3-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b3d11d2fda7eb164ef27ffc14b4fcab16a80e1ce67e9f57e19ec0afaf715ba89", size = 12362693, upload-time = "2025-09-29T23:20:14.098Z" }, - { url = "https://files.pythonhosted.org/packages/a6/de/8b1895b107277d52f2b42d3a6806e69cfef0d5cf1d0ba343470b9d8e0a04/pandas-2.3.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:a68e15f780eddf2b07d242e17a04aa187a7ee12b40b930bfdd78070556550e98", size = 12771002, upload-time = "2025-09-29T23:20:26.76Z" }, - { url = "https://files.pythonhosted.org/packages/87/21/84072af3187a677c5893b170ba2c8fbe450a6ff911234916da889b698220/pandas-2.3.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:371a4ab48e950033bcf52b6527eccb564f52dc826c02afd9a1bc0ab731bba084", size = 13450971, upload-time = "2025-09-29T23:20:41.344Z" }, - { url = "https://files.pythonhosted.org/packages/86/41/585a168330ff063014880a80d744219dbf1dd7a1c706e75ab3425a987384/pandas-2.3.3-cp312-cp312-win_amd64.whl", hash = "sha256:a16dcec078a01eeef8ee61bf64074b4e524a2a3f4b3be9326420cabe59c4778b", size = 10992722, upload-time = "2025-09-29T23:20:54.139Z" }, - { url = "https://files.pythonhosted.org/packages/cd/4b/18b035ee18f97c1040d94debd8f2e737000ad70ccc8f5513f4eefad75f4b/pandas-2.3.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:56851a737e3470de7fa88e6131f41281ed440d29a9268dcbf0002da5ac366713", size = 11544671, upload-time = "2025-09-29T23:21:05.024Z" }, - { url = "https://files.pythonhosted.org/packages/31/94/72fac03573102779920099bcac1c3b05975c2cb5f01eac609faf34bed1ca/pandas-2.3.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:bdcd9d1167f4885211e401b3036c0c8d9e274eee67ea8d0758a256d60704cfe8", size = 10680807, upload-time = "2025-09-29T23:21:15.979Z" }, - { url = "https://files.pythonhosted.org/packages/16/87/9472cf4a487d848476865321de18cc8c920b8cab98453ab79dbbc98db63a/pandas-2.3.3-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e32e7cc9af0f1cc15548288a51a3b681cc2a219faa838e995f7dc53dbab1062d", size = 11709872, upload-time = "2025-09-29T23:21:27.165Z" }, - { url = "https://files.pythonhosted.org/packages/15/07/284f757f63f8a8d69ed4472bfd85122bd086e637bf4ed09de572d575a693/pandas-2.3.3-cp313-cp313-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:318d77e0e42a628c04dc56bcef4b40de67918f7041c2b061af1da41dcff670ac", size = 12306371, upload-time = "2025-09-29T23:21:40.532Z" }, - { url = "https://files.pythonhosted.org/packages/33/81/a3afc88fca4aa925804a27d2676d22dcd2031c2ebe08aabd0ae55b9ff282/pandas-2.3.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:4e0a175408804d566144e170d0476b15d78458795bb18f1304fb94160cabf40c", size = 12765333, upload-time = "2025-09-29T23:21:55.77Z" }, - { url = "https://files.pythonhosted.org/packages/8d/0f/b4d4ae743a83742f1153464cf1a8ecfafc3ac59722a0b5c8602310cb7158/pandas-2.3.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:93c2d9ab0fc11822b5eece72ec9587e172f63cff87c00b062f6e37448ced4493", size = 13418120, upload-time = "2025-09-29T23:22:10.109Z" }, - { url = "https://files.pythonhosted.org/packages/4f/c7/e54682c96a895d0c808453269e0b5928a07a127a15704fedb643e9b0a4c8/pandas-2.3.3-cp313-cp313-win_amd64.whl", hash = "sha256:f8bfc0e12dc78f777f323f55c58649591b2cd0c43534e8355c51d3fede5f4dee", size = 10993991, upload-time = "2025-09-29T23:25:04.889Z" }, - { url = "https://files.pythonhosted.org/packages/f9/ca/3f8d4f49740799189e1395812f3bf23b5e8fc7c190827d55a610da72ce55/pandas-2.3.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:75ea25f9529fdec2d2e93a42c523962261e567d250b0013b16210e1d40d7c2e5", size = 12048227, upload-time = "2025-09-29T23:22:24.343Z" }, - { url = "https://files.pythonhosted.org/packages/0e/5a/f43efec3e8c0cc92c4663ccad372dbdff72b60bdb56b2749f04aa1d07d7e/pandas-2.3.3-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:74ecdf1d301e812db96a465a525952f4dde225fdb6d8e5a521d47e1f42041e21", size = 11411056, upload-time = "2025-09-29T23:22:37.762Z" }, - { url = "https://files.pythonhosted.org/packages/46/b1/85331edfc591208c9d1a63a06baa67b21d332e63b7a591a5ba42a10bb507/pandas-2.3.3-cp313-cp313t-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6435cb949cb34ec11cc9860246ccb2fdc9ecd742c12d3304989017d53f039a78", size = 11645189, upload-time = "2025-09-29T23:22:51.688Z" }, - { url = "https://files.pythonhosted.org/packages/44/23/78d645adc35d94d1ac4f2a3c4112ab6f5b8999f4898b8cdf01252f8df4a9/pandas-2.3.3-cp313-cp313t-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:900f47d8f20860de523a1ac881c4c36d65efcb2eb850e6948140fa781736e110", size = 12121912, upload-time = "2025-09-29T23:23:05.042Z" }, - { url = "https://files.pythonhosted.org/packages/53/da/d10013df5e6aaef6b425aa0c32e1fc1f3e431e4bcabd420517dceadce354/pandas-2.3.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:a45c765238e2ed7d7c608fc5bc4a6f88b642f2f01e70c0c23d2224dd21829d86", size = 12712160, upload-time = "2025-09-29T23:23:28.57Z" }, - { url = "https://files.pythonhosted.org/packages/bd/17/e756653095a083d8a37cbd816cb87148debcfcd920129b25f99dd8d04271/pandas-2.3.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:c4fc4c21971a1a9f4bdb4c73978c7f7256caa3e62b323f70d6cb80db583350bc", size = 13199233, upload-time = "2025-09-29T23:24:24.876Z" }, - { url = "https://files.pythonhosted.org/packages/04/fd/74903979833db8390b73b3a8a7d30d146d710bd32703724dd9083950386f/pandas-2.3.3-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:ee15f284898e7b246df8087fc82b87b01686f98ee67d85a17b7ab44143a3a9a0", size = 11540635, upload-time = "2025-09-29T23:25:52.486Z" }, - { url = "https://files.pythonhosted.org/packages/21/00/266d6b357ad5e6d3ad55093a7e8efc7dd245f5a842b584db9f30b0f0a287/pandas-2.3.3-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:1611aedd912e1ff81ff41c745822980c49ce4a7907537be8692c8dbc31924593", size = 10759079, upload-time = "2025-09-29T23:26:33.204Z" }, - { url = "https://files.pythonhosted.org/packages/ca/05/d01ef80a7a3a12b2f8bbf16daba1e17c98a2f039cbc8e2f77a2c5a63d382/pandas-2.3.3-cp314-cp314-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6d2cefc361461662ac48810cb14365a365ce864afe85ef1f447ff5a1e99ea81c", size = 11814049, upload-time = "2025-09-29T23:27:15.384Z" }, - { url = "https://files.pythonhosted.org/packages/15/b2/0e62f78c0c5ba7e3d2c5945a82456f4fac76c480940f805e0b97fcbc2f65/pandas-2.3.3-cp314-cp314-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ee67acbbf05014ea6c763beb097e03cd629961c8a632075eeb34247120abcb4b", size = 12332638, upload-time = "2025-09-29T23:27:51.625Z" }, - { url = "https://files.pythonhosted.org/packages/c5/33/dd70400631b62b9b29c3c93d2feee1d0964dc2bae2e5ad7a6c73a7f25325/pandas-2.3.3-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:c46467899aaa4da076d5abc11084634e2d197e9460643dd455ac3db5856b24d6", size = 12886834, upload-time = "2025-09-29T23:28:21.289Z" }, - { url = "https://files.pythonhosted.org/packages/d3/18/b5d48f55821228d0d2692b34fd5034bb185e854bdb592e9c640f6290e012/pandas-2.3.3-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:6253c72c6a1d990a410bc7de641d34053364ef8bcd3126f7e7450125887dffe3", size = 13409925, upload-time = "2025-09-29T23:28:58.261Z" }, - { url = "https://files.pythonhosted.org/packages/a6/3d/124ac75fcd0ecc09b8fdccb0246ef65e35b012030defb0e0eba2cbbbe948/pandas-2.3.3-cp314-cp314-win_amd64.whl", hash = "sha256:1b07204a219b3b7350abaae088f451860223a52cfb8a6c53358e7948735158e5", size = 11109071, upload-time = "2025-09-29T23:32:27.484Z" }, - { url = "https://files.pythonhosted.org/packages/89/9c/0e21c895c38a157e0faa1fb64587a9226d6dd46452cac4532d80c3c4a244/pandas-2.3.3-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:2462b1a365b6109d275250baaae7b760fd25c726aaca0054649286bcfbb3e8ec", size = 12048504, upload-time = "2025-09-29T23:29:31.47Z" }, - { url = "https://files.pythonhosted.org/packages/d7/82/b69a1c95df796858777b68fbe6a81d37443a33319761d7c652ce77797475/pandas-2.3.3-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:0242fe9a49aa8b4d78a4fa03acb397a58833ef6199e9aa40a95f027bb3a1b6e7", size = 11410702, upload-time = "2025-09-29T23:29:54.591Z" }, - { url = "https://files.pythonhosted.org/packages/f9/88/702bde3ba0a94b8c73a0181e05144b10f13f29ebfc2150c3a79062a8195d/pandas-2.3.3-cp314-cp314t-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:a21d830e78df0a515db2b3d2f5570610f5e6bd2e27749770e8bb7b524b89b450", size = 11634535, upload-time = "2025-09-29T23:30:21.003Z" }, - { url = "https://files.pythonhosted.org/packages/a4/1e/1bac1a839d12e6a82ec6cb40cda2edde64a2013a66963293696bbf31fbbb/pandas-2.3.3-cp314-cp314t-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:2e3ebdb170b5ef78f19bfb71b0dc5dc58775032361fa188e814959b74d726dd5", size = 12121582, upload-time = "2025-09-29T23:30:43.391Z" }, - { url = "https://files.pythonhosted.org/packages/44/91/483de934193e12a3b1d6ae7c8645d083ff88dec75f46e827562f1e4b4da6/pandas-2.3.3-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:d051c0e065b94b7a3cea50eb1ec32e912cd96dba41647eb24104b6c6c14c5788", size = 12699963, upload-time = "2025-09-29T23:31:10.009Z" }, - { url = "https://files.pythonhosted.org/packages/70/44/5191d2e4026f86a2a109053e194d3ba7a31a2d10a9c2348368c63ed4e85a/pandas-2.3.3-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:3869faf4bd07b3b66a9f462417d0ca3a9df29a9f6abd5d0d0dbab15dac7abe87", size = 13202175, upload-time = "2025-09-29T23:31:59.173Z" }, -] - [[package]] name = "pandas" version = "3.0.1" source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version >= '3.14' and sys_platform == 'win32'", - "python_full_version >= '3.14' and sys_platform == 'emscripten'", - "python_full_version >= '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'win32'", - "python_full_version == '3.11.*' and sys_platform == 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'emscripten'", - "python_full_version == '3.11.*' and sys_platform == 'emscripten'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version == '3.11.*' and sys_platform != 'emscripten' and sys_platform != 'win32'", -] dependencies = [ - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "python-dateutil", marker = "python_full_version >= '3.11'" }, - { name = "tzdata", marker = "(python_full_version >= '3.11' and sys_platform == 'emscripten') or (python_full_version >= '3.11' and sys_platform == 'win32')" }, + { name = "numpy" }, + { name = "python-dateutil" }, + { name = "tzdata", marker = "sys_platform == 'emscripten' or sys_platform == 'win32'" }, ] sdist = { url = "https://files.pythonhosted.org/packages/2e/0c/b28ed414f080ee0ad153f848586d61d1878f91689950f037f976ce15f6c8/pandas-3.0.1.tar.gz", hash = "sha256:4186a699674af418f655dbd420ed87f50d56b4cd6603784279d9eef6627823c8", size = 4641901, upload-time = "2026-02-17T22:20:16.434Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/ff/07/c7087e003ceee9b9a82539b40414ec557aa795b584a1a346e89180853d79/pandas-3.0.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:de09668c1bf3b925c07e5762291602f0d789eca1b3a781f99c1c78f6cac0e7ea", size = 10323380, upload-time = "2026-02-17T22:18:16.133Z" }, - { url = "https://files.pythonhosted.org/packages/c1/27/90683c7122febeefe84a56f2cde86a9f05f68d53885cebcc473298dfc33e/pandas-3.0.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:24ba315ba3d6e5806063ac6eb717504e499ce30bd8c236d8693a5fd3f084c796", size = 9923455, upload-time = "2026-02-17T22:18:19.13Z" }, - { url = "https://files.pythonhosted.org/packages/0e/f1/ed17d927f9950643bc7631aa4c99ff0cc83a37864470bc419345b656a41f/pandas-3.0.1-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:406ce835c55bac912f2a0dcfaf27c06d73c6b04a5dde45f1fd3169ce31337389", size = 10753464, upload-time = "2026-02-17T22:18:21.134Z" }, - { url = "https://files.pythonhosted.org/packages/2e/7c/870c7e7daec2a6c7ff2ac9e33b23317230d4e4e954b35112759ea4a924a7/pandas-3.0.1-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:830994d7e1f31dd7e790045235605ab61cff6c94defc774547e8b7fdfbff3dc7", size = 11255234, upload-time = "2026-02-17T22:18:24.175Z" }, - { url = "https://files.pythonhosted.org/packages/5c/39/3653fe59af68606282b989c23d1a543ceba6e8099cbcc5f1d506a7bae2aa/pandas-3.0.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:a64ce8b0f2de1d2efd2ae40b0abe7f8ae6b29fbfb3812098ed5a6f8e235ad9bf", size = 11767299, upload-time = "2026-02-17T22:18:26.824Z" }, - { url = "https://files.pythonhosted.org/packages/9b/31/1daf3c0c94a849c7a8dab8a69697b36d313b229918002ba3e409265c7888/pandas-3.0.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:9832c2c69da24b602c32e0c7b1b508a03949c18ba08d4d9f1c1033426685b447", size = 12333292, upload-time = "2026-02-17T22:18:28.996Z" }, - { url = "https://files.pythonhosted.org/packages/1f/67/af63f83cd6ca603a00fe8530c10a60f0879265b8be00b5930e8e78c5b30b/pandas-3.0.1-cp311-cp311-win_amd64.whl", hash = "sha256:84f0904a69e7365f79a0c77d3cdfccbfb05bf87847e3a51a41e1426b0edb9c79", size = 9892176, upload-time = "2026-02-17T22:18:31.79Z" }, - { url = "https://files.pythonhosted.org/packages/79/ab/9c776b14ac4b7b4140788eca18468ea39894bc7340a408f1d1e379856a6b/pandas-3.0.1-cp311-cp311-win_arm64.whl", hash = "sha256:4a68773d5a778afb31d12e34f7dd4612ab90de8c6fb1d8ffe5d4a03b955082a1", size = 9151328, upload-time = "2026-02-17T22:18:35.721Z" }, - { url = "https://files.pythonhosted.org/packages/37/51/b467209c08dae2c624873d7491ea47d2b47336e5403309d433ea79c38571/pandas-3.0.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:476f84f8c20c9f5bc47252b66b4bb25e1a9fc2fa98cead96744d8116cb85771d", size = 10344357, upload-time = "2026-02-17T22:18:38.262Z" }, - { url = "https://files.pythonhosted.org/packages/7c/f1/e2567ffc8951ab371db2e40b2fe068e36b81d8cf3260f06ae508700e5504/pandas-3.0.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:0ab749dfba921edf641d4036c4c21c0b3ea70fea478165cb98a998fb2a261955", size = 9884543, upload-time = "2026-02-17T22:18:41.476Z" }, - { url = "https://files.pythonhosted.org/packages/d7/39/327802e0b6d693182403c144edacbc27eb82907b57062f23ef5a4c4a5ea7/pandas-3.0.1-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b8e36891080b87823aff3640c78649b91b8ff6eea3c0d70aeabd72ea43ab069b", size = 10396030, upload-time = "2026-02-17T22:18:43.822Z" }, - { url = "https://files.pythonhosted.org/packages/3d/fe/89d77e424365280b79d99b3e1e7d606f5165af2f2ecfaf0c6d24c799d607/pandas-3.0.1-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:532527a701281b9dd371e2f582ed9094f4c12dd9ffb82c0c54ee28d8ac9520c4", size = 10876435, upload-time = "2026-02-17T22:18:45.954Z" }, - { url = "https://files.pythonhosted.org/packages/b5/a6/2a75320849dd154a793f69c951db759aedb8d1dd3939eeacda9bdcfa1629/pandas-3.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:356e5c055ed9b0da1580d465657bc7d00635af4fd47f30afb23025352ba764d1", size = 11405133, upload-time = "2026-02-17T22:18:48.533Z" }, - { url = "https://files.pythonhosted.org/packages/58/53/1d68fafb2e02d7881df66aa53be4cd748d25cbe311f3b3c85c93ea5d30ca/pandas-3.0.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:9d810036895f9ad6345b8f2a338dd6998a74e8483847403582cab67745bff821", size = 11932065, upload-time = "2026-02-17T22:18:50.837Z" }, - { url = "https://files.pythonhosted.org/packages/75/08/67cc404b3a966b6df27b38370ddd96b3b023030b572283d035181854aac5/pandas-3.0.1-cp312-cp312-win_amd64.whl", hash = "sha256:536232a5fe26dd989bd633e7a0c450705fdc86a207fec7254a55e9a22950fe43", size = 9741627, upload-time = "2026-02-17T22:18:53.905Z" }, - { url = "https://files.pythonhosted.org/packages/86/4f/caf9952948fb00d23795f09b893d11f1cacb384e666854d87249530f7cbe/pandas-3.0.1-cp312-cp312-win_arm64.whl", hash = "sha256:0f463ebfd8de7f326d38037c7363c6dacb857c5881ab8961fb387804d6daf2f7", size = 9052483, upload-time = "2026-02-17T22:18:57.31Z" }, { url = "https://files.pythonhosted.org/packages/0b/48/aad6ec4f8d007534c091e9a7172b3ec1b1ee6d99a9cbb936b5eab6c6cf58/pandas-3.0.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:5272627187b5d9c20e55d27caf5f2cd23e286aba25cadf73c8590e432e2b7262", size = 10317509, upload-time = "2026-02-17T22:18:59.498Z" }, { url = "https://files.pythonhosted.org/packages/a8/14/5990826f779f79148ae9d3a2c39593dc04d61d5d90541e71b5749f35af95/pandas-3.0.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:661e0f665932af88c7877f31da0dc743fe9c8f2524bdffe23d24fdcb67ef9d56", size = 9860561, upload-time = "2026-02-17T22:19:02.265Z" }, { url = "https://files.pythonhosted.org/packages/fa/80/f01ff54664b6d70fed71475543d108a9b7c888e923ad210795bef04ffb7d/pandas-3.0.1-cp313-cp313-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:75e6e292ff898679e47a2199172593d9f6107fd2dd3617c22c2946e97d5df46e", size = 10365506, upload-time = "2026-02-17T22:19:05.017Z" }, @@ -1515,7 +1131,7 @@ name = "patsy" version = "1.0.2" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, + { name = "numpy" }, ] sdist = { url = "https://files.pythonhosted.org/packages/be/44/ed13eccdd0519eff265f44b670d46fbb0ec813e2274932dc1c0e48520f7d/patsy-1.0.2.tar.gz", hash = "sha256:cdc995455f6233e90e22de72c37fcadb344e7586fb83f06696f54d92f8ce74c0", size = 399942, upload-time = "2025-10-20T16:17:37.535Z" } wheels = [ @@ -1527,7 +1143,7 @@ name = "pexpect" version = "4.9.0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "ptyprocess", marker = "python_full_version >= '3.11' and sys_platform != 'emscripten' and sys_platform != 'win32'" }, + { name = "ptyprocess", marker = "sys_platform != 'emscripten' and sys_platform != 'win32'" }, ] sdist = { url = "https://files.pythonhosted.org/packages/42/92/cc564bf6381ff43ce1f4d06852fc19a2f11d180f23dc32d9588bee2f149d/pexpect-4.9.0.tar.gz", hash = "sha256:ee7d41123f3c9911050ea2c2dac107568dc43b2d3b0c7557a33212c398ead30f", size = 166450, upload-time = "2023-11-25T09:07:26.339Z" } wheels = [ @@ -1539,8 +1155,8 @@ name = "plotly" version = "5.24.1" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "packaging", marker = "python_full_version >= '3.11'" }, - { name = "tenacity", marker = "python_full_version >= '3.11'" }, + { name = "packaging" }, + { name = "tenacity" }, ] sdist = { url = "https://files.pythonhosted.org/packages/79/4f/428f6d959818d7425a94c190a6b26fbc58035cbef40bf249be0b62a9aedd/plotly-5.24.1.tar.gz", hash = "sha256:dbc8ac8339d248a4bcc36e08a5659bacfe1b079390b8953533f4eb22169b4bae", size = 9479398, upload-time = "2024-09-12T15:36:31.068Z" } wheels = [ @@ -1589,22 +1205,22 @@ name = "policyengine-core" version = "3.23.6" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "dpath", marker = "python_full_version >= '3.11'" }, - { name = "h5py", marker = "python_full_version >= '3.11'" }, - { name = "huggingface-hub", marker = "python_full_version >= '3.11'" }, - { name = "ipython", marker = "python_full_version >= '3.11'" }, - { name = "microdf-python", marker = "python_full_version >= '3.11'" }, - { name = "numexpr", marker = "python_full_version >= '3.11'" }, - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "plotly", marker = "python_full_version >= '3.11'" }, - { name = "psutil", marker = "python_full_version >= '3.11'" }, - { name = "pytest", marker = "python_full_version >= '3.11'" }, - { name = "pyvis", marker = "python_full_version >= '3.11'" }, - { name = "requests", marker = "python_full_version >= '3.11'" }, - { name = "sortedcontainers", marker = "python_full_version >= '3.11'" }, - { name = "standard-imghdr", marker = "python_full_version >= '3.11'" }, - { name = "wheel", marker = "python_full_version >= '3.11'" }, + { name = "dpath" }, + { name = "h5py" }, + { name = "huggingface-hub" }, + { name = "ipython" }, + { name = "microdf-python" }, + { name = "numexpr" }, + { name = "numpy" }, + { name = "pandas" }, + { name = "plotly" }, + { name = "psutil" }, + { name = "pytest" }, + { name = "pyvis" }, + { name = "requests" }, + { name = "sortedcontainers" }, + { name = "standard-imghdr" }, + { name = "wheel" }, ] sdist = { url = "https://files.pythonhosted.org/packages/5d/de/5bc5b02626703ea7d288c84c474ec51e823aa726d55ebabafe7c85e7285f/policyengine_core-3.23.6.tar.gz", hash = "sha256:81bb4057f5d6380f2d7f1af2fe4932bd3bd37fdfda7b841f7ee38b30aa5cc8e6", size = 163499, upload-time = "2026-01-25T14:04:43.233Z" } wheels = [ @@ -1616,10 +1232,10 @@ name = "policyengine-us" version = "1.587.0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "microdf-python", marker = "python_full_version >= '3.11'" }, - { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "policyengine-core", marker = "python_full_version >= '3.11'" }, - { name = "tqdm", marker = "python_full_version >= '3.11'" }, + { name = "microdf-python" }, + { name = "pandas" }, + { name = "policyengine-core" }, + { name = "tqdm" }, ] sdist = { url = "https://files.pythonhosted.org/packages/a8/15/8a12714d124b509346e60c927f7f344ee3b99c2b280bcfa9a053395d68e6/policyengine_us-1.587.0.tar.gz", hash = "sha256:399339eeea9a38caf6800432bc5eaa3b07b7b09ea269f4f3ba9f9c02aae587b9", size = 8630430, upload-time = "2026-02-25T23:35:46.002Z" } wheels = [ @@ -1632,12 +1248,9 @@ version = "0.2" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "joblib" }, - { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "scikit-learn", version = "1.7.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "scikit-learn", version = "1.8.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "scipy", version = "1.15.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "scipy", version = "1.17.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "numpy" }, + { name = "scikit-learn" }, + { name = "scipy" }, ] sdist = { url = "https://files.pythonhosted.org/packages/16/3f/85c603c872ca28c870f1bd54bbe7020f5921efc1c04a9db32b75cf0c287c/prdc-0.2.tar.gz", hash = "sha256:247466c31743f334a2714dbd60ef62e523877c4162ddb7dc63a404cada09316f", size = 5253, upload-time = "2020-02-25T04:54:58.478Z" } wheels = [ @@ -1649,7 +1262,7 @@ name = "prompt-toolkit" version = "3.0.52" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "wcwidth", marker = "python_full_version >= '3.11'" }, + { name = "wcwidth" }, ] sdist = { url = "https://files.pythonhosted.org/packages/a1/96/06e01a7b38dce6fe1db213e061a4602dd6032a8a97ef6c1a862537732421/prompt_toolkit-3.0.52.tar.gz", hash = "sha256:28cde192929c8e7321de85de1ddbe736f1375148b02f2e17edd840042b1be855", size = 434198, upload-time = "2025-08-27T15:24:02.057Z" } wheels = [ @@ -1695,27 +1308,6 @@ version = "23.0.1" source = { registry = "https://pypi.org/simple" } sdist = { url = "https://files.pythonhosted.org/packages/88/22/134986a4cc224d593c1afde5494d18ff629393d74cc2eddb176669f234a4/pyarrow-23.0.1.tar.gz", hash = "sha256:b8c5873e33440b2bc2f4a79d2b47017a89c5a24116c055625e6f2ee50523f019", size = 1167336, upload-time = "2026-02-16T10:14:12.39Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/bc/a8/24e5dc6855f50a62936ceb004e6e9645e4219a8065f304145d7fb8a79d5d/pyarrow-23.0.1-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:3fab8f82571844eb3c460f90a75583801d14ca0cc32b1acc8c361650e006fd56", size = 34307390, upload-time = "2026-02-16T10:08:08.654Z" }, - { url = "https://files.pythonhosted.org/packages/bc/8e/4be5617b4aaae0287f621ad31c6036e5f63118cfca0dc57d42121ff49b51/pyarrow-23.0.1-cp310-cp310-macosx_12_0_x86_64.whl", hash = "sha256:3f91c038b95f71ddfc865f11d5876c42f343b4495535bd262c7b321b0b94507c", size = 35853761, upload-time = "2026-02-16T10:08:17.811Z" }, - { url = "https://files.pythonhosted.org/packages/2e/08/3e56a18819462210432ae37d10f5c8eed3828be1d6c751b6e6a2e93c286a/pyarrow-23.0.1-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:d0744403adabef53c985a7f8a082b502a368510c40d184df349a0a8754533258", size = 44493116, upload-time = "2026-02-16T10:08:25.792Z" }, - { url = "https://files.pythonhosted.org/packages/f8/82/c40b68001dbec8a3faa4c08cd8c200798ac732d2854537c5449dc859f55a/pyarrow-23.0.1-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:c33b5bf406284fd0bba436ed6f6c3ebe8e311722b441d89397c54f871c6863a2", size = 47564532, upload-time = "2026-02-16T10:08:34.27Z" }, - { url = "https://files.pythonhosted.org/packages/20/bc/73f611989116b6f53347581b02177f9f620efdf3cd3f405d0e83cdf53a83/pyarrow-23.0.1-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:ddf743e82f69dcd6dbbcb63628895d7161e04e56794ef80550ac6f3315eeb1d5", size = 48183685, upload-time = "2026-02-16T10:08:42.889Z" }, - { url = "https://files.pythonhosted.org/packages/b0/cc/6c6b3ecdae2a8c3aced99956187e8302fc954cc2cca2a37cf2111dad16ce/pyarrow-23.0.1-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:e052a211c5ac9848ae15d5ec875ed0943c0221e2fcfe69eee80b604b4e703222", size = 50605582, upload-time = "2026-02-16T10:08:51.641Z" }, - { url = "https://files.pythonhosted.org/packages/8d/94/d359e708672878d7638a04a0448edf7c707f9e5606cee11e15aaa5c7535a/pyarrow-23.0.1-cp310-cp310-win_amd64.whl", hash = "sha256:5abde149bb3ce524782d838eb67ac095cd3fd6090eba051130589793f1a7f76d", size = 27521148, upload-time = "2026-02-16T10:08:58.077Z" }, - { url = "https://files.pythonhosted.org/packages/b0/41/8e6b6ef7e225d4ceead8459427a52afdc23379768f54dd3566014d7618c1/pyarrow-23.0.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:6f0147ee9e0386f519c952cc670eb4a8b05caa594eeffe01af0e25f699e4e9bb", size = 34302230, upload-time = "2026-02-16T10:09:03.859Z" }, - { url = "https://files.pythonhosted.org/packages/bf/4a/1472c00392f521fea03ae93408bf445cc7bfa1ab81683faf9bc188e36629/pyarrow-23.0.1-cp311-cp311-macosx_12_0_x86_64.whl", hash = "sha256:0ae6e17c828455b6265d590100c295193f93cc5675eb0af59e49dbd00d2de350", size = 35850050, upload-time = "2026-02-16T10:09:11.877Z" }, - { url = "https://files.pythonhosted.org/packages/0c/b2/bd1f2f05ded56af7f54d702c8364c9c43cd6abb91b0e9933f3d77b4f4132/pyarrow-23.0.1-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:fed7020203e9ef273360b9e45be52a2a47d3103caf156a30ace5247ffb51bdbd", size = 44491918, upload-time = "2026-02-16T10:09:18.144Z" }, - { url = "https://files.pythonhosted.org/packages/0b/62/96459ef5b67957eac38a90f541d1c28833d1b367f014a482cb63f3b7cd2d/pyarrow-23.0.1-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:26d50dee49d741ac0e82185033488d28d35be4d763ae6f321f97d1140eb7a0e9", size = 47562811, upload-time = "2026-02-16T10:09:25.792Z" }, - { url = "https://files.pythonhosted.org/packages/7d/94/1170e235add1f5f45a954e26cd0e906e7e74e23392dcb560de471f7366ec/pyarrow-23.0.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:3c30143b17161310f151f4a2bcfe41b5ff744238c1039338779424e38579d701", size = 48183766, upload-time = "2026-02-16T10:09:34.645Z" }, - { url = "https://files.pythonhosted.org/packages/0e/2d/39a42af4570377b99774cdb47f63ee6c7da7616bd55b3d5001aa18edfe4f/pyarrow-23.0.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:db2190fa79c80a23fdd29fef4b8992893f024ae7c17d2f5f4db7171fa30c2c78", size = 50607669, upload-time = "2026-02-16T10:09:44.153Z" }, - { url = "https://files.pythonhosted.org/packages/00/ca/db94101c187f3df742133ac837e93b1f269ebdac49427f8310ee40b6a58f/pyarrow-23.0.1-cp311-cp311-win_amd64.whl", hash = "sha256:f00f993a8179e0e1c9713bcc0baf6d6c01326a406a9c23495ec1ba9c9ebf2919", size = 27527698, upload-time = "2026-02-16T10:09:50.263Z" }, - { url = "https://files.pythonhosted.org/packages/9a/4b/4166bb5abbfe6f750fc60ad337c43ecf61340fa52ab386da6e8dbf9e63c4/pyarrow-23.0.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:f4b0dbfa124c0bb161f8b5ebb40f1a680b70279aa0c9901d44a2b5a20806039f", size = 34214575, upload-time = "2026-02-16T10:09:56.225Z" }, - { url = "https://files.pythonhosted.org/packages/e1/da/3f941e3734ac8088ea588b53e860baeddac8323ea40ce22e3d0baa865cc9/pyarrow-23.0.1-cp312-cp312-macosx_12_0_x86_64.whl", hash = "sha256:7707d2b6673f7de054e2e83d59f9e805939038eebe1763fe811ee8fa5c0cd1a7", size = 35832540, upload-time = "2026-02-16T10:10:03.428Z" }, - { url = "https://files.pythonhosted.org/packages/88/7c/3d841c366620e906d54430817531b877ba646310296df42ef697308c2705/pyarrow-23.0.1-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:86ff03fb9f1a320266e0de855dee4b17da6794c595d207f89bba40d16b5c78b9", size = 44470940, upload-time = "2026-02-16T10:10:10.704Z" }, - { url = "https://files.pythonhosted.org/packages/2c/a5/da83046273d990f256cb79796a190bbf7ec999269705ddc609403f8c6b06/pyarrow-23.0.1-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:813d99f31275919c383aab17f0f455a04f5a429c261cc411b1e9a8f5e4aaaa05", size = 47586063, upload-time = "2026-02-16T10:10:17.95Z" }, - { url = "https://files.pythonhosted.org/packages/5b/3c/b7d2ebcff47a514f47f9da1e74b7949138c58cfeb108cdd4ee62f43f0cf3/pyarrow-23.0.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:bf5842f960cddd2ef757d486041d57c96483efc295a8c4a0e20e704cbbf39c67", size = 48173045, upload-time = "2026-02-16T10:10:25.363Z" }, - { url = "https://files.pythonhosted.org/packages/43/b2/b40961262213beaba6acfc88698eb773dfce32ecdf34d19291db94c2bd73/pyarrow-23.0.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:564baf97c858ecc03ec01a41062e8f4698abc3e6e2acd79c01c2e97880a19730", size = 50621741, upload-time = "2026-02-16T10:10:33.477Z" }, - { url = "https://files.pythonhosted.org/packages/f6/70/1fdda42d65b28b078e93d75d371b2185a61da89dda4def8ba6ba41ebdeb4/pyarrow-23.0.1-cp312-cp312-win_amd64.whl", hash = "sha256:07deae7783782ac7250989a7b2ecde9b3c343a643f82e8a4df03d93b633006f0", size = 27620678, upload-time = "2026-02-16T10:10:39.31Z" }, { url = "https://files.pythonhosted.org/packages/47/10/2cbe4c6f0fb83d2de37249567373d64327a5e4d8db72f486db42875b08f6/pyarrow-23.0.1-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:6b8fda694640b00e8af3c824f99f789e836720aa8c9379fb435d4c4953a756b8", size = 34210066, upload-time = "2026-02-16T10:10:45.487Z" }, { url = "https://files.pythonhosted.org/packages/cb/4f/679fa7e84dadbaca7a65f7cdba8d6c83febbd93ca12fa4adf40ba3b6362b/pyarrow-23.0.1-cp313-cp313-macosx_12_0_x86_64.whl", hash = "sha256:8ff51b1addc469b9444b7c6f3548e19dc931b172ab234e995a60aea9f6e6025f", size = 35825526, upload-time = "2026-02-16T10:10:52.266Z" }, { url = "https://files.pythonhosted.org/packages/f9/63/d2747d930882c9d661e9398eefc54f15696547b8983aaaf11d4a2e8b5426/pyarrow-23.0.1-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:71c5be5cbf1e1cb6169d2a0980850bccb558ddc9b747b6206435313c47c37677", size = 44473279, upload-time = "2026-02-16T10:11:01.557Z" }, @@ -1770,47 +1362,6 @@ dependencies = [ ] sdist = { url = "https://files.pythonhosted.org/packages/71/70/23b021c950c2addd24ec408e9ab05d59b035b39d97cdc1130e1bce647bb6/pydantic_core-2.41.5.tar.gz", hash = "sha256:08daa51ea16ad373ffd5e7606252cc32f07bc72b28284b6bc9c6df804816476e", size = 460952, upload-time = "2025-11-04T13:43:49.098Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/c6/90/32c9941e728d564b411d574d8ee0cf09b12ec978cb22b294995bae5549a5/pydantic_core-2.41.5-cp310-cp310-macosx_10_12_x86_64.whl", hash = "sha256:77b63866ca88d804225eaa4af3e664c5faf3568cea95360d21f4725ab6e07146", size = 2107298, upload-time = "2025-11-04T13:39:04.116Z" }, - { url = "https://files.pythonhosted.org/packages/fb/a8/61c96a77fe28993d9a6fb0f4127e05430a267b235a124545d79fea46dd65/pydantic_core-2.41.5-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:dfa8a0c812ac681395907e71e1274819dec685fec28273a28905df579ef137e2", size = 1901475, upload-time = "2025-11-04T13:39:06.055Z" }, - { url = "https://files.pythonhosted.org/packages/5d/b6/338abf60225acc18cdc08b4faef592d0310923d19a87fba1faf05af5346e/pydantic_core-2.41.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5921a4d3ca3aee735d9fd163808f5e8dd6c6972101e4adbda9a4667908849b97", size = 1918815, upload-time = "2025-11-04T13:39:10.41Z" }, - { url = "https://files.pythonhosted.org/packages/d1/1c/2ed0433e682983d8e8cba9c8d8ef274d4791ec6a6f24c58935b90e780e0a/pydantic_core-2.41.5-cp310-cp310-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e25c479382d26a2a41b7ebea1043564a937db462816ea07afa8a44c0866d52f9", size = 2065567, upload-time = "2025-11-04T13:39:12.244Z" }, - { url = "https://files.pythonhosted.org/packages/b3/24/cf84974ee7d6eae06b9e63289b7b8f6549d416b5c199ca2d7ce13bbcf619/pydantic_core-2.41.5-cp310-cp310-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:f547144f2966e1e16ae626d8ce72b4cfa0caedc7fa28052001c94fb2fcaa1c52", size = 2230442, upload-time = "2025-11-04T13:39:13.962Z" }, - { url = "https://files.pythonhosted.org/packages/fd/21/4e287865504b3edc0136c89c9c09431be326168b1eb7841911cbc877a995/pydantic_core-2.41.5-cp310-cp310-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:6f52298fbd394f9ed112d56f3d11aabd0d5bd27beb3084cc3d8ad069483b8941", size = 2350956, upload-time = "2025-11-04T13:39:15.889Z" }, - { url = "https://files.pythonhosted.org/packages/a8/76/7727ef2ffa4b62fcab916686a68a0426b9b790139720e1934e8ba797e238/pydantic_core-2.41.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:100baa204bb412b74fe285fb0f3a385256dad1d1879f0a5cb1499ed2e83d132a", size = 2068253, upload-time = "2025-11-04T13:39:17.403Z" }, - { url = "https://files.pythonhosted.org/packages/d5/8c/a4abfc79604bcb4c748e18975c44f94f756f08fb04218d5cb87eb0d3a63e/pydantic_core-2.41.5-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:05a2c8852530ad2812cb7914dc61a1125dc4e06252ee98e5638a12da6cc6fb6c", size = 2177050, upload-time = "2025-11-04T13:39:19.351Z" }, - { url = "https://files.pythonhosted.org/packages/67/b1/de2e9a9a79b480f9cb0b6e8b6ba4c50b18d4e89852426364c66aa82bb7b3/pydantic_core-2.41.5-cp310-cp310-musllinux_1_1_aarch64.whl", hash = "sha256:29452c56df2ed968d18d7e21f4ab0ac55e71dc59524872f6fc57dcf4a3249ed2", size = 2147178, upload-time = "2025-11-04T13:39:21Z" }, - { url = "https://files.pythonhosted.org/packages/16/c1/dfb33f837a47b20417500efaa0378adc6635b3c79e8369ff7a03c494b4ac/pydantic_core-2.41.5-cp310-cp310-musllinux_1_1_armv7l.whl", hash = "sha256:d5160812ea7a8a2ffbe233d8da666880cad0cbaf5d4de74ae15c313213d62556", size = 2341833, upload-time = "2025-11-04T13:39:22.606Z" }, - { url = "https://files.pythonhosted.org/packages/47/36/00f398642a0f4b815a9a558c4f1dca1b4020a7d49562807d7bc9ff279a6c/pydantic_core-2.41.5-cp310-cp310-musllinux_1_1_x86_64.whl", hash = "sha256:df3959765b553b9440adfd3c795617c352154e497a4eaf3752555cfb5da8fc49", size = 2321156, upload-time = "2025-11-04T13:39:25.843Z" }, - { url = "https://files.pythonhosted.org/packages/7e/70/cad3acd89fde2010807354d978725ae111ddf6d0ea46d1ea1775b5c1bd0c/pydantic_core-2.41.5-cp310-cp310-win32.whl", hash = "sha256:1f8d33a7f4d5a7889e60dc39856d76d09333d8a6ed0f5f1190635cbec70ec4ba", size = 1989378, upload-time = "2025-11-04T13:39:27.92Z" }, - { url = "https://files.pythonhosted.org/packages/76/92/d338652464c6c367e5608e4488201702cd1cbb0f33f7b6a85a60fe5f3720/pydantic_core-2.41.5-cp310-cp310-win_amd64.whl", hash = "sha256:62de39db01b8d593e45871af2af9e497295db8d73b085f6bfd0b18c83c70a8f9", size = 2013622, upload-time = "2025-11-04T13:39:29.848Z" }, - { url = "https://files.pythonhosted.org/packages/e8/72/74a989dd9f2084b3d9530b0915fdda64ac48831c30dbf7c72a41a5232db8/pydantic_core-2.41.5-cp311-cp311-macosx_10_12_x86_64.whl", hash = "sha256:a3a52f6156e73e7ccb0f8cced536adccb7042be67cb45f9562e12b319c119da6", size = 2105873, upload-time = "2025-11-04T13:39:31.373Z" }, - { url = "https://files.pythonhosted.org/packages/12/44/37e403fd9455708b3b942949e1d7febc02167662bf1a7da5b78ee1ea2842/pydantic_core-2.41.5-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7f3bf998340c6d4b0c9a2f02d6a400e51f123b59565d74dc60d252ce888c260b", size = 1899826, upload-time = "2025-11-04T13:39:32.897Z" }, - { url = "https://files.pythonhosted.org/packages/33/7f/1d5cab3ccf44c1935a359d51a8a2a9e1a654b744b5e7f80d41b88d501eec/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:378bec5c66998815d224c9ca994f1e14c0c21cb95d2f52b6021cc0b2a58f2a5a", size = 1917869, upload-time = "2025-11-04T13:39:34.469Z" }, - { url = "https://files.pythonhosted.org/packages/6e/6a/30d94a9674a7fe4f4744052ed6c5e083424510be1e93da5bc47569d11810/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:e7b576130c69225432866fe2f4a469a85a54ade141d96fd396dffcf607b558f8", size = 2063890, upload-time = "2025-11-04T13:39:36.053Z" }, - { url = "https://files.pythonhosted.org/packages/50/be/76e5d46203fcb2750e542f32e6c371ffa9b8ad17364cf94bb0818dbfb50c/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:6cb58b9c66f7e4179a2d5e0f849c48eff5c1fca560994d6eb6543abf955a149e", size = 2229740, upload-time = "2025-11-04T13:39:37.753Z" }, - { url = "https://files.pythonhosted.org/packages/d3/ee/fed784df0144793489f87db310a6bbf8118d7b630ed07aa180d6067e653a/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:88942d3a3dff3afc8288c21e565e476fc278902ae4d6d134f1eeda118cc830b1", size = 2350021, upload-time = "2025-11-04T13:39:40.94Z" }, - { url = "https://files.pythonhosted.org/packages/c8/be/8fed28dd0a180dca19e72c233cbf58efa36df055e5b9d90d64fd1740b828/pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f31d95a179f8d64d90f6831d71fa93290893a33148d890ba15de25642c5d075b", size = 2066378, upload-time = "2025-11-04T13:39:42.523Z" }, - { url = "https://files.pythonhosted.org/packages/b0/3b/698cf8ae1d536a010e05121b4958b1257f0b5522085e335360e53a6b1c8b/pydantic_core-2.41.5-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:c1df3d34aced70add6f867a8cf413e299177e0c22660cc767218373d0779487b", size = 2175761, upload-time = "2025-11-04T13:39:44.553Z" }, - { url = "https://files.pythonhosted.org/packages/b8/ba/15d537423939553116dea94ce02f9c31be0fa9d0b806d427e0308ec17145/pydantic_core-2.41.5-cp311-cp311-musllinux_1_1_aarch64.whl", hash = "sha256:4009935984bd36bd2c774e13f9a09563ce8de4abaa7226f5108262fa3e637284", size = 2146303, upload-time = "2025-11-04T13:39:46.238Z" }, - { url = "https://files.pythonhosted.org/packages/58/7f/0de669bf37d206723795f9c90c82966726a2ab06c336deba4735b55af431/pydantic_core-2.41.5-cp311-cp311-musllinux_1_1_armv7l.whl", hash = "sha256:34a64bc3441dc1213096a20fe27e8e128bd3ff89921706e83c0b1ac971276594", size = 2340355, upload-time = "2025-11-04T13:39:48.002Z" }, - { url = "https://files.pythonhosted.org/packages/e5/de/e7482c435b83d7e3c3ee5ee4451f6e8973cff0eb6007d2872ce6383f6398/pydantic_core-2.41.5-cp311-cp311-musllinux_1_1_x86_64.whl", hash = "sha256:c9e19dd6e28fdcaa5a1de679aec4141f691023916427ef9bae8584f9c2fb3b0e", size = 2319875, upload-time = "2025-11-04T13:39:49.705Z" }, - { url = "https://files.pythonhosted.org/packages/fe/e6/8c9e81bb6dd7560e33b9053351c29f30c8194b72f2d6932888581f503482/pydantic_core-2.41.5-cp311-cp311-win32.whl", hash = "sha256:2c010c6ded393148374c0f6f0bf89d206bf3217f201faa0635dcd56bd1520f6b", size = 1987549, upload-time = "2025-11-04T13:39:51.842Z" }, - { url = "https://files.pythonhosted.org/packages/11/66/f14d1d978ea94d1bc21fc98fcf570f9542fe55bfcc40269d4e1a21c19bf7/pydantic_core-2.41.5-cp311-cp311-win_amd64.whl", hash = "sha256:76ee27c6e9c7f16f47db7a94157112a2f3a00e958bc626e2f4ee8bec5c328fbe", size = 2011305, upload-time = "2025-11-04T13:39:53.485Z" }, - { url = "https://files.pythonhosted.org/packages/56/d8/0e271434e8efd03186c5386671328154ee349ff0354d83c74f5caaf096ed/pydantic_core-2.41.5-cp311-cp311-win_arm64.whl", hash = "sha256:4bc36bbc0b7584de96561184ad7f012478987882ebf9f9c389b23f432ea3d90f", size = 1972902, upload-time = "2025-11-04T13:39:56.488Z" }, - { url = "https://files.pythonhosted.org/packages/5f/5d/5f6c63eebb5afee93bcaae4ce9a898f3373ca23df3ccaef086d0233a35a7/pydantic_core-2.41.5-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:f41a7489d32336dbf2199c8c0a215390a751c5b014c2c1c5366e817202e9cdf7", size = 2110990, upload-time = "2025-11-04T13:39:58.079Z" }, - { url = "https://files.pythonhosted.org/packages/aa/32/9c2e8ccb57c01111e0fd091f236c7b371c1bccea0fa85247ac55b1e2b6b6/pydantic_core-2.41.5-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:070259a8818988b9a84a449a2a7337c7f430a22acc0859c6b110aa7212a6d9c0", size = 1896003, upload-time = "2025-11-04T13:39:59.956Z" }, - { url = "https://files.pythonhosted.org/packages/68/b8/a01b53cb0e59139fbc9e4fda3e9724ede8de279097179be4ff31f1abb65a/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:e96cea19e34778f8d59fe40775a7a574d95816eb150850a85a7a4c8f4b94ac69", size = 1919200, upload-time = "2025-11-04T13:40:02.241Z" }, - { url = "https://files.pythonhosted.org/packages/38/de/8c36b5198a29bdaade07b5985e80a233a5ac27137846f3bc2d3b40a47360/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:ed2e99c456e3fadd05c991f8f437ef902e00eedf34320ba2b0842bd1c3ca3a75", size = 2052578, upload-time = "2025-11-04T13:40:04.401Z" }, - { url = "https://files.pythonhosted.org/packages/00/b5/0e8e4b5b081eac6cb3dbb7e60a65907549a1ce035a724368c330112adfdd/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:65840751b72fbfd82c3c640cff9284545342a4f1eb1586ad0636955b261b0b05", size = 2208504, upload-time = "2025-11-04T13:40:06.072Z" }, - { url = "https://files.pythonhosted.org/packages/77/56/87a61aad59c7c5b9dc8caad5a41a5545cba3810c3e828708b3d7404f6cef/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:e536c98a7626a98feb2d3eaf75944ef6f3dbee447e1f841eae16f2f0a72d8ddc", size = 2335816, upload-time = "2025-11-04T13:40:07.835Z" }, - { url = "https://files.pythonhosted.org/packages/0d/76/941cc9f73529988688a665a5c0ecff1112b3d95ab48f81db5f7606f522d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:eceb81a8d74f9267ef4081e246ffd6d129da5d87e37a77c9bde550cb04870c1c", size = 2075366, upload-time = "2025-11-04T13:40:09.804Z" }, - { url = "https://files.pythonhosted.org/packages/d3/43/ebef01f69baa07a482844faaa0a591bad1ef129253ffd0cdaa9d8a7f72d3/pydantic_core-2.41.5-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:d38548150c39b74aeeb0ce8ee1d8e82696f4a4e16ddc6de7b1d8823f7de4b9b5", size = 2171698, upload-time = "2025-11-04T13:40:12.004Z" }, - { url = "https://files.pythonhosted.org/packages/b1/87/41f3202e4193e3bacfc2c065fab7706ebe81af46a83d3e27605029c1f5a6/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:c23e27686783f60290e36827f9c626e63154b82b116d7fe9adba1fda36da706c", size = 2132603, upload-time = "2025-11-04T13:40:13.868Z" }, - { url = "https://files.pythonhosted.org/packages/49/7d/4c00df99cb12070b6bccdef4a195255e6020a550d572768d92cc54dba91a/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_armv7l.whl", hash = "sha256:482c982f814460eabe1d3bb0adfdc583387bd4691ef00b90575ca0d2b6fe2294", size = 2329591, upload-time = "2025-11-04T13:40:15.672Z" }, - { url = "https://files.pythonhosted.org/packages/cc/6a/ebf4b1d65d458f3cda6a7335d141305dfa19bdc61140a884d165a8a1bbc7/pydantic_core-2.41.5-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:bfea2a5f0b4d8d43adf9d7b8bf019fb46fdd10a2e5cde477fbcb9d1fa08c68e1", size = 2319068, upload-time = "2025-11-04T13:40:17.532Z" }, - { url = "https://files.pythonhosted.org/packages/49/3b/774f2b5cd4192d5ab75870ce4381fd89cf218af999515baf07e7206753f0/pydantic_core-2.41.5-cp312-cp312-win32.whl", hash = "sha256:b74557b16e390ec12dca509bce9264c3bbd128f8a2c376eaa68003d7f327276d", size = 1985908, upload-time = "2025-11-04T13:40:19.309Z" }, - { url = "https://files.pythonhosted.org/packages/86/45/00173a033c801cacf67c190fef088789394feaf88a98a7035b0e40d53dc9/pydantic_core-2.41.5-cp312-cp312-win_amd64.whl", hash = "sha256:1962293292865bca8e54702b08a4f26da73adc83dd1fcf26fbc875b35d81c815", size = 2020145, upload-time = "2025-11-04T13:40:21.548Z" }, - { url = "https://files.pythonhosted.org/packages/f9/22/91fbc821fa6d261b376a3f73809f907cec5ca6025642c463d3488aad22fb/pydantic_core-2.41.5-cp312-cp312-win_arm64.whl", hash = "sha256:1746d4a3d9a794cacae06a5eaaccb4b8643a131d45fbc9af23e353dc0a5ba5c3", size = 1976179, upload-time = "2025-11-04T13:40:23.393Z" }, { url = "https://files.pythonhosted.org/packages/87/06/8806241ff1f70d9939f9af039c6c35f2360cf16e93c2ca76f184e76b1564/pydantic_core-2.41.5-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:941103c9be18ac8daf7b7adca8228f8ed6bb7a1849020f643b3a14d15b1924d9", size = 2120403, upload-time = "2025-11-04T13:40:25.248Z" }, { url = "https://files.pythonhosted.org/packages/94/02/abfa0e0bda67faa65fef1c84971c7e45928e108fe24333c81f3bfe35d5f5/pydantic_core-2.41.5-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:112e305c3314f40c93998e567879e887a3160bb8689ef3d2c04b6cc62c33ac34", size = 1896206, upload-time = "2025-11-04T13:40:27.099Z" }, { url = "https://files.pythonhosted.org/packages/15/df/a4c740c0943e93e6500f9eb23f4ca7ec9bf71b19e608ae5b579678c8d02f/pydantic_core-2.41.5-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0cbaad15cb0c90aa221d43c00e77bb33c93e8d36e0bf74760cd00e732d10a6a0", size = 1919307, upload-time = "2025-11-04T13:40:29.806Z" }, @@ -1853,30 +1404,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/5c/96/5fb7d8c3c17bc8c62fdb031c47d77a1af698f1d7a406b0f79aaa1338f9ad/pydantic_core-2.41.5-cp314-cp314t-win32.whl", hash = "sha256:b4ececa40ac28afa90871c2cc2b9ffd2ff0bf749380fbdf57d165fd23da353aa", size = 1988906, upload-time = "2025-11-04T13:41:56.606Z" }, { url = "https://files.pythonhosted.org/packages/22/ed/182129d83032702912c2e2d8bbe33c036f342cc735737064668585dac28f/pydantic_core-2.41.5-cp314-cp314t-win_amd64.whl", hash = "sha256:80aa89cad80b32a912a65332f64a4450ed00966111b6615ca6816153d3585a8c", size = 1981607, upload-time = "2025-11-04T13:41:58.889Z" }, { url = "https://files.pythonhosted.org/packages/9f/ed/068e41660b832bb0b1aa5b58011dea2a3fe0ba7861ff38c4d4904c1c1a99/pydantic_core-2.41.5-cp314-cp314t-win_arm64.whl", hash = "sha256:35b44f37a3199f771c3eaa53051bc8a70cd7b54f333531c59e29fd4db5d15008", size = 1974769, upload-time = "2025-11-04T13:42:01.186Z" }, - { url = "https://files.pythonhosted.org/packages/11/72/90fda5ee3b97e51c494938a4a44c3a35a9c96c19bba12372fb9c634d6f57/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-macosx_10_12_x86_64.whl", hash = "sha256:b96d5f26b05d03cc60f11a7761a5ded1741da411e7fe0909e27a5e6a0cb7b034", size = 2115441, upload-time = "2025-11-04T13:42:39.557Z" }, - { url = "https://files.pythonhosted.org/packages/1f/53/8942f884fa33f50794f119012dc6a1a02ac43a56407adaac20463df8e98f/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-macosx_11_0_arm64.whl", hash = "sha256:634e8609e89ceecea15e2d61bc9ac3718caaaa71963717bf3c8f38bfde64242c", size = 1930291, upload-time = "2025-11-04T13:42:42.169Z" }, - { url = "https://files.pythonhosted.org/packages/79/c8/ecb9ed9cd942bce09fc888ee960b52654fbdbede4ba6c2d6e0d3b1d8b49c/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:93e8740d7503eb008aa2df04d3b9735f845d43ae845e6dcd2be0b55a2da43cd2", size = 1948632, upload-time = "2025-11-04T13:42:44.564Z" }, - { url = "https://files.pythonhosted.org/packages/2e/1b/687711069de7efa6af934e74f601e2a4307365e8fdc404703afc453eab26/pydantic_core-2.41.5-graalpy311-graalpy242_311_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f15489ba13d61f670dcc96772e733aad1a6f9c429cc27574c6cdaed82d0146ad", size = 2138905, upload-time = "2025-11-04T13:42:47.156Z" }, - { url = "https://files.pythonhosted.org/packages/09/32/59b0c7e63e277fa7911c2fc70ccfb45ce4b98991e7ef37110663437005af/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_10_12_x86_64.whl", hash = "sha256:7da7087d756b19037bc2c06edc6c170eeef3c3bafcb8f532ff17d64dc427adfd", size = 2110495, upload-time = "2025-11-04T13:42:49.689Z" }, - { url = "https://files.pythonhosted.org/packages/aa/81/05e400037eaf55ad400bcd318c05bb345b57e708887f07ddb2d20e3f0e98/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-macosx_11_0_arm64.whl", hash = "sha256:aabf5777b5c8ca26f7824cb4a120a740c9588ed58df9b2d196ce92fba42ff8dc", size = 1915388, upload-time = "2025-11-04T13:42:52.215Z" }, - { url = "https://files.pythonhosted.org/packages/6e/0d/e3549b2399f71d56476b77dbf3cf8937cec5cd70536bdc0e374a421d0599/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c007fe8a43d43b3969e8469004e9845944f1a80e6acd47c150856bb87f230c56", size = 1942879, upload-time = "2025-11-04T13:42:56.483Z" }, - { url = "https://files.pythonhosted.org/packages/f7/07/34573da085946b6a313d7c42f82f16e8920bfd730665de2d11c0c37a74b5/pydantic_core-2.41.5-graalpy312-graalpy250_312_native-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:76d0819de158cd855d1cbb8fcafdf6f5cf1eb8e470abe056d5d161106e38062b", size = 2139017, upload-time = "2025-11-04T13:42:59.471Z" }, - { url = "https://files.pythonhosted.org/packages/e6/b0/1a2aa41e3b5a4ba11420aba2d091b2d17959c8d1519ece3627c371951e73/pydantic_core-2.41.5-pp310-pypy310_pp73-macosx_10_12_x86_64.whl", hash = "sha256:b5819cd790dbf0c5eb9f82c73c16b39a65dd6dd4d1439dcdea7816ec9adddab8", size = 2103351, upload-time = "2025-11-04T13:43:02.058Z" }, - { url = "https://files.pythonhosted.org/packages/a4/ee/31b1f0020baaf6d091c87900ae05c6aeae101fa4e188e1613c80e4f1ea31/pydantic_core-2.41.5-pp310-pypy310_pp73-macosx_11_0_arm64.whl", hash = "sha256:5a4e67afbc95fa5c34cf27d9089bca7fcab4e51e57278d710320a70b956d1b9a", size = 1925363, upload-time = "2025-11-04T13:43:05.159Z" }, - { url = "https://files.pythonhosted.org/packages/e1/89/ab8e86208467e467a80deaca4e434adac37b10a9d134cd2f99b28a01e483/pydantic_core-2.41.5-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ece5c59f0ce7d001e017643d8d24da587ea1f74f6993467d85ae8a5ef9d4f42b", size = 2135615, upload-time = "2025-11-04T13:43:08.116Z" }, - { url = "https://files.pythonhosted.org/packages/99/0a/99a53d06dd0348b2008f2f30884b34719c323f16c3be4e6cc1203b74a91d/pydantic_core-2.41.5-pp310-pypy310_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:16f80f7abe3351f8ea6858914ddc8c77e02578544a0ebc15b4c2e1a0e813b0b2", size = 2175369, upload-time = "2025-11-04T13:43:12.49Z" }, - { url = "https://files.pythonhosted.org/packages/6d/94/30ca3b73c6d485b9bb0bc66e611cff4a7138ff9736b7e66bcf0852151636/pydantic_core-2.41.5-pp310-pypy310_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:33cb885e759a705b426baada1fe68cbb0a2e68e34c5d0d0289a364cf01709093", size = 2144218, upload-time = "2025-11-04T13:43:15.431Z" }, - { url = "https://files.pythonhosted.org/packages/87/57/31b4f8e12680b739a91f472b5671294236b82586889ef764b5fbc6669238/pydantic_core-2.41.5-pp310-pypy310_pp73-musllinux_1_1_armv7l.whl", hash = "sha256:c8d8b4eb992936023be7dee581270af5c6e0697a8559895f527f5b7105ecd36a", size = 2329951, upload-time = "2025-11-04T13:43:18.062Z" }, - { url = "https://files.pythonhosted.org/packages/7d/73/3c2c8edef77b8f7310e6fb012dbc4b8551386ed575b9eb6fb2506e28a7eb/pydantic_core-2.41.5-pp310-pypy310_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:242a206cd0318f95cd21bdacff3fcc3aab23e79bba5cac3db5a841c9ef9c6963", size = 2318428, upload-time = "2025-11-04T13:43:20.679Z" }, - { url = "https://files.pythonhosted.org/packages/2f/02/8559b1f26ee0d502c74f9cca5c0d2fd97e967e083e006bbbb4e97f3a043a/pydantic_core-2.41.5-pp310-pypy310_pp73-win_amd64.whl", hash = "sha256:d3a978c4f57a597908b7e697229d996d77a6d3c94901e9edee593adada95ce1a", size = 2147009, upload-time = "2025-11-04T13:43:23.286Z" }, - { url = "https://files.pythonhosted.org/packages/5f/9b/1b3f0e9f9305839d7e84912f9e8bfbd191ed1b1ef48083609f0dabde978c/pydantic_core-2.41.5-pp311-pypy311_pp73-macosx_10_12_x86_64.whl", hash = "sha256:b2379fa7ed44ddecb5bfe4e48577d752db9fc10be00a6b7446e9663ba143de26", size = 2101980, upload-time = "2025-11-04T13:43:25.97Z" }, - { url = "https://files.pythonhosted.org/packages/a4/ed/d71fefcb4263df0da6a85b5d8a7508360f2f2e9b3bf5814be9c8bccdccc1/pydantic_core-2.41.5-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:266fb4cbf5e3cbd0b53669a6d1b039c45e3ce651fd5442eff4d07c2cc8d66808", size = 1923865, upload-time = "2025-11-04T13:43:28.763Z" }, - { url = "https://files.pythonhosted.org/packages/ce/3a/626b38db460d675f873e4444b4bb030453bbe7b4ba55df821d026a0493c4/pydantic_core-2.41.5-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:58133647260ea01e4d0500089a8c4f07bd7aa6ce109682b1426394988d8aaacc", size = 2134256, upload-time = "2025-11-04T13:43:31.71Z" }, - { url = "https://files.pythonhosted.org/packages/83/d9/8412d7f06f616bbc053d30cb4e5f76786af3221462ad5eee1f202021eb4e/pydantic_core-2.41.5-pp311-pypy311_pp73-manylinux_2_5_i686.manylinux1_i686.whl", hash = "sha256:287dad91cfb551c363dc62899a80e9e14da1f0e2b6ebde82c806612ca2a13ef1", size = 2174762, upload-time = "2025-11-04T13:43:34.744Z" }, - { url = "https://files.pythonhosted.org/packages/55/4c/162d906b8e3ba3a99354e20faa1b49a85206c47de97a639510a0e673f5da/pydantic_core-2.41.5-pp311-pypy311_pp73-musllinux_1_1_aarch64.whl", hash = "sha256:03b77d184b9eb40240ae9fd676ca364ce1085f203e1b1256f8ab9984dca80a84", size = 2143141, upload-time = "2025-11-04T13:43:37.701Z" }, - { url = "https://files.pythonhosted.org/packages/1f/f2/f11dd73284122713f5f89fc940f370d035fa8e1e078d446b3313955157fe/pydantic_core-2.41.5-pp311-pypy311_pp73-musllinux_1_1_armv7l.whl", hash = "sha256:a668ce24de96165bb239160b3d854943128f4334822900534f2fe947930e5770", size = 2330317, upload-time = "2025-11-04T13:43:40.406Z" }, - { url = "https://files.pythonhosted.org/packages/88/9d/b06ca6acfe4abb296110fb1273a4d848a0bfb2ff65f3ee92127b3244e16b/pydantic_core-2.41.5-pp311-pypy311_pp73-musllinux_1_1_x86_64.whl", hash = "sha256:f14f8f046c14563f8eb3f45f499cc658ab8d10072961e07225e507adb700e93f", size = 2316992, upload-time = "2025-11-04T13:43:43.602Z" }, - { url = "https://files.pythonhosted.org/packages/36/c7/cfc8e811f061c841d7990b0201912c3556bfeb99cdcb7ed24adc8d6f8704/pydantic_core-2.41.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:56121965f7a4dc965bff783d70b907ddf3d57f6eba29b6d2e5dabfaf07799c51", size = 2145302, upload-time = "2025-11-04T13:43:46.64Z" }, ] [[package]] @@ -1894,12 +1421,10 @@ version = "8.4.2" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "colorama", marker = "sys_platform == 'win32'" }, - { name = "exceptiongroup", marker = "python_full_version < '3.11'" }, { name = "iniconfig" }, { name = "packaging" }, { name = "pluggy" }, { name = "pygments" }, - { name = "tomli", marker = "python_full_version < '3.11'" }, ] sdist = { url = "https://files.pythonhosted.org/packages/a3/5c/00a0e072241553e1a7496d638deababa67c5058571567b92a7eaa258397c/pytest-8.4.2.tar.gz", hash = "sha256:86c0d0b93306b961d58d62a4db4879f27fe25513d4b969df351abdddb3c30e01", size = 1519618, upload-time = "2025-09-04T14:34:22.711Z" } wheels = [ @@ -1918,24 +1443,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ec/57/56b9bcc3c9c6a792fcbaf139543cee77261f3651ca9da0c93f5c1221264b/python_dateutil-2.9.0.post0-py2.py3-none-any.whl", hash = "sha256:a8b2bc7bffae282281c8140a97d3aa9c14da0b136dfe83f850eea9a5f7470427", size = 229892, upload-time = "2024-03-01T18:36:18.57Z" }, ] -[[package]] -name = "pytz" -version = "2026.1.post1" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/56/db/b8721d71d945e6a8ac63c0fc900b2067181dbb50805958d4d4661cf7d277/pytz-2026.1.post1.tar.gz", hash = "sha256:3378dde6a0c3d26719182142c56e60c7f9af7e968076f31aae569d72a0358ee1", size = 321088, upload-time = "2026-03-03T07:47:50.683Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/10/99/781fe0c827be2742bcc775efefccb3b048a3a9c6ce9aec0cbf4a101677e5/pytz-2026.1.post1-py2.py3-none-any.whl", hash = "sha256:f2fd16142fda348286a75e1a524be810bb05d444e5a081f37f7affc635035f7a", size = 510489, upload-time = "2026-03-03T07:47:49.167Z" }, -] - [[package]] name = "pyvis" version = "0.3.2" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "ipython", marker = "python_full_version >= '3.11'" }, - { name = "jinja2", marker = "python_full_version >= '3.11'" }, - { name = "jsonpickle", marker = "python_full_version >= '3.11'" }, - { name = "networkx", version = "3.6.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "ipython" }, + { name = "jinja2" }, + { name = "jsonpickle" }, + { name = "networkx" }, ] wheels = [ { url = "https://files.pythonhosted.org/packages/ab/4b/e37e4e5d5ee1179694917b445768bdbfb084f5a59ecd38089d3413d4c70f/pyvis-0.3.2-py3-none-any.whl", hash = "sha256:5720c4ca8161dc5d9ab352015723abb7a8bb8fb443edeb07f7a322db34a97555", size = 756038, upload-time = "2023-02-24T20:29:46.758Z" }, @@ -1947,34 +1463,6 @@ version = "6.0.3" source = { registry = "https://pypi.org/simple" } sdist = { url = "https://files.pythonhosted.org/packages/05/8e/961c0007c59b8dd7729d542c61a4d537767a59645b82a0b521206e1e25c2/pyyaml-6.0.3.tar.gz", hash = "sha256:d76623373421df22fb4cf8817020cbb7ef15c725b9d5e45f17e189bfc384190f", size = 130960, upload-time = "2025-09-25T21:33:16.546Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/f4/a0/39350dd17dd6d6c6507025c0e53aef67a9293a6d37d3511f23ea510d5800/pyyaml-6.0.3-cp310-cp310-macosx_10_13_x86_64.whl", hash = "sha256:214ed4befebe12df36bcc8bc2b64b396ca31be9304b8f59e25c11cf94a4c033b", size = 184227, upload-time = "2025-09-25T21:31:46.04Z" }, - { url = "https://files.pythonhosted.org/packages/05/14/52d505b5c59ce73244f59c7a50ecf47093ce4765f116cdb98286a71eeca2/pyyaml-6.0.3-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:02ea2dfa234451bbb8772601d7b8e426c2bfa197136796224e50e35a78777956", size = 174019, upload-time = "2025-09-25T21:31:47.706Z" }, - { url = "https://files.pythonhosted.org/packages/43/f7/0e6a5ae5599c838c696adb4e6330a59f463265bfa1e116cfd1fbb0abaaae/pyyaml-6.0.3-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b30236e45cf30d2b8e7b3e85881719e98507abed1011bf463a8fa23e9c3e98a8", size = 740646, upload-time = "2025-09-25T21:31:49.21Z" }, - { url = "https://files.pythonhosted.org/packages/2f/3a/61b9db1d28f00f8fd0ae760459a5c4bf1b941baf714e207b6eb0657d2578/pyyaml-6.0.3-cp310-cp310-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:66291b10affd76d76f54fad28e22e51719ef9ba22b29e1d7d03d6777a9174198", size = 840793, upload-time = "2025-09-25T21:31:50.735Z" }, - { url = "https://files.pythonhosted.org/packages/7a/1e/7acc4f0e74c4b3d9531e24739e0ab832a5edf40e64fbae1a9c01941cabd7/pyyaml-6.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:9c7708761fccb9397fe64bbc0395abcae8c4bf7b0eac081e12b809bf47700d0b", size = 770293, upload-time = "2025-09-25T21:31:51.828Z" }, - { url = "https://files.pythonhosted.org/packages/8b/ef/abd085f06853af0cd59fa5f913d61a8eab65d7639ff2a658d18a25d6a89d/pyyaml-6.0.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:418cf3f2111bc80e0933b2cd8cd04f286338bb88bdc7bc8e6dd775ebde60b5e0", size = 732872, upload-time = "2025-09-25T21:31:53.282Z" }, - { url = "https://files.pythonhosted.org/packages/1f/15/2bc9c8faf6450a8b3c9fc5448ed869c599c0a74ba2669772b1f3a0040180/pyyaml-6.0.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:5e0b74767e5f8c593e8c9b5912019159ed0533c70051e9cce3e8b6aa699fcd69", size = 758828, upload-time = "2025-09-25T21:31:54.807Z" }, - { url = "https://files.pythonhosted.org/packages/a3/00/531e92e88c00f4333ce359e50c19b8d1de9fe8d581b1534e35ccfbc5f393/pyyaml-6.0.3-cp310-cp310-win32.whl", hash = "sha256:28c8d926f98f432f88adc23edf2e6d4921ac26fb084b028c733d01868d19007e", size = 142415, upload-time = "2025-09-25T21:31:55.885Z" }, - { url = "https://files.pythonhosted.org/packages/2a/fa/926c003379b19fca39dd4634818b00dec6c62d87faf628d1394e137354d4/pyyaml-6.0.3-cp310-cp310-win_amd64.whl", hash = "sha256:bdb2c67c6c1390b63c6ff89f210c8fd09d9a1217a465701eac7316313c915e4c", size = 158561, upload-time = "2025-09-25T21:31:57.406Z" }, - { url = "https://files.pythonhosted.org/packages/6d/16/a95b6757765b7b031c9374925bb718d55e0a9ba8a1b6a12d25962ea44347/pyyaml-6.0.3-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:44edc647873928551a01e7a563d7452ccdebee747728c1080d881d68af7b997e", size = 185826, upload-time = "2025-09-25T21:31:58.655Z" }, - { url = "https://files.pythonhosted.org/packages/16/19/13de8e4377ed53079ee996e1ab0a9c33ec2faf808a4647b7b4c0d46dd239/pyyaml-6.0.3-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:652cb6edd41e718550aad172851962662ff2681490a8a711af6a4d288dd96824", size = 175577, upload-time = "2025-09-25T21:32:00.088Z" }, - { url = "https://files.pythonhosted.org/packages/0c/62/d2eb46264d4b157dae1275b573017abec435397aa59cbcdab6fc978a8af4/pyyaml-6.0.3-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:10892704fc220243f5305762e276552a0395f7beb4dbf9b14ec8fd43b57f126c", size = 775556, upload-time = "2025-09-25T21:32:01.31Z" }, - { url = "https://files.pythonhosted.org/packages/10/cb/16c3f2cf3266edd25aaa00d6c4350381c8b012ed6f5276675b9eba8d9ff4/pyyaml-6.0.3-cp311-cp311-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:850774a7879607d3a6f50d36d04f00ee69e7fc816450e5f7e58d7f17f1ae5c00", size = 882114, upload-time = "2025-09-25T21:32:03.376Z" }, - { url = "https://files.pythonhosted.org/packages/71/60/917329f640924b18ff085ab889a11c763e0b573da888e8404ff486657602/pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:b8bb0864c5a28024fac8a632c443c87c5aa6f215c0b126c449ae1a150412f31d", size = 806638, upload-time = "2025-09-25T21:32:04.553Z" }, - { url = "https://files.pythonhosted.org/packages/dd/6f/529b0f316a9fd167281a6c3826b5583e6192dba792dd55e3203d3f8e655a/pyyaml-6.0.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1d37d57ad971609cf3c53ba6a7e365e40660e3be0e5175fa9f2365a379d6095a", size = 767463, upload-time = "2025-09-25T21:32:06.152Z" }, - { url = "https://files.pythonhosted.org/packages/f2/6a/b627b4e0c1dd03718543519ffb2f1deea4a1e6d42fbab8021936a4d22589/pyyaml-6.0.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:37503bfbfc9d2c40b344d06b2199cf0e96e97957ab1c1b546fd4f87e53e5d3e4", size = 794986, upload-time = "2025-09-25T21:32:07.367Z" }, - { url = "https://files.pythonhosted.org/packages/45/91/47a6e1c42d9ee337c4839208f30d9f09caa9f720ec7582917b264defc875/pyyaml-6.0.3-cp311-cp311-win32.whl", hash = "sha256:8098f252adfa6c80ab48096053f512f2321f0b998f98150cea9bd23d83e1467b", size = 142543, upload-time = "2025-09-25T21:32:08.95Z" }, - { url = "https://files.pythonhosted.org/packages/da/e3/ea007450a105ae919a72393cb06f122f288ef60bba2dc64b26e2646fa315/pyyaml-6.0.3-cp311-cp311-win_amd64.whl", hash = "sha256:9f3bfb4965eb874431221a3ff3fdcddc7e74e3b07799e0e84ca4a0f867d449bf", size = 158763, upload-time = "2025-09-25T21:32:09.96Z" }, - { url = "https://files.pythonhosted.org/packages/d1/33/422b98d2195232ca1826284a76852ad5a86fe23e31b009c9886b2d0fb8b2/pyyaml-6.0.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:7f047e29dcae44602496db43be01ad42fc6f1cc0d8cd6c83d342306c32270196", size = 182063, upload-time = "2025-09-25T21:32:11.445Z" }, - { url = "https://files.pythonhosted.org/packages/89/a0/6cf41a19a1f2f3feab0e9c0b74134aa2ce6849093d5517a0c550fe37a648/pyyaml-6.0.3-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:fc09d0aa354569bc501d4e787133afc08552722d3ab34836a80547331bb5d4a0", size = 173973, upload-time = "2025-09-25T21:32:12.492Z" }, - { url = "https://files.pythonhosted.org/packages/ed/23/7a778b6bd0b9a8039df8b1b1d80e2e2ad78aa04171592c8a5c43a56a6af4/pyyaml-6.0.3-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9149cad251584d5fb4981be1ecde53a1ca46c891a79788c0df828d2f166bda28", size = 775116, upload-time = "2025-09-25T21:32:13.652Z" }, - { url = "https://files.pythonhosted.org/packages/65/30/d7353c338e12baef4ecc1b09e877c1970bd3382789c159b4f89d6a70dc09/pyyaml-6.0.3-cp312-cp312-manylinux2014_s390x.manylinux_2_17_s390x.manylinux_2_28_s390x.whl", hash = "sha256:5fdec68f91a0c6739b380c83b951e2c72ac0197ace422360e6d5a959d8d97b2c", size = 844011, upload-time = "2025-09-25T21:32:15.21Z" }, - { url = "https://files.pythonhosted.org/packages/8b/9d/b3589d3877982d4f2329302ef98a8026e7f4443c765c46cfecc8858c6b4b/pyyaml-6.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ba1cc08a7ccde2d2ec775841541641e4548226580ab850948cbfda66a1befcdc", size = 807870, upload-time = "2025-09-25T21:32:16.431Z" }, - { url = "https://files.pythonhosted.org/packages/05/c0/b3be26a015601b822b97d9149ff8cb5ead58c66f981e04fedf4e762f4bd4/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8dc52c23056b9ddd46818a57b78404882310fb473d63f17b07d5c40421e47f8e", size = 761089, upload-time = "2025-09-25T21:32:17.56Z" }, - { url = "https://files.pythonhosted.org/packages/be/8e/98435a21d1d4b46590d5459a22d88128103f8da4c2d4cb8f14f2a96504e1/pyyaml-6.0.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:41715c910c881bc081f1e8872880d3c650acf13dfa8214bad49ed4cede7c34ea", size = 790181, upload-time = "2025-09-25T21:32:18.834Z" }, - { url = "https://files.pythonhosted.org/packages/74/93/7baea19427dcfbe1e5a372d81473250b379f04b1bd3c4c5ff825e2327202/pyyaml-6.0.3-cp312-cp312-win32.whl", hash = "sha256:96b533f0e99f6579b3d4d4995707cf36df9100d67e0c8303a0c55b27b5f99bc5", size = 137658, upload-time = "2025-09-25T21:32:20.209Z" }, - { url = "https://files.pythonhosted.org/packages/86/bf/899e81e4cce32febab4fb42bb97dcdf66bc135272882d1987881a4b519e9/pyyaml-6.0.3-cp312-cp312-win_amd64.whl", hash = "sha256:5fcd34e47f6e0b794d17de1b4ff496c00986e1c83f7ab2fb8fcfe9616ff7477b", size = 154003, upload-time = "2025-09-25T21:32:21.167Z" }, - { url = "https://files.pythonhosted.org/packages/1a/08/67bd04656199bbb51dbed1439b7f27601dfb576fb864099c7ef0c3e55531/pyyaml-6.0.3-cp312-cp312-win_arm64.whl", hash = "sha256:64386e5e707d03a7e172c0701abfb7e10f0fb753ee1d773128192742712a98fd", size = 140344, upload-time = "2025-09-25T21:32:22.617Z" }, { url = "https://files.pythonhosted.org/packages/d1/11/0fd08f8192109f7169db964b5707a2f1e8b745d4e239b784a5a1dd80d1db/pyyaml-6.0.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:8da9669d359f02c0b91ccc01cac4a67f16afec0dac22c2ad09f46bee0697eba8", size = 181669, upload-time = "2025-09-25T21:32:23.673Z" }, { url = "https://files.pythonhosted.org/packages/b1/16/95309993f1d3748cd644e02e38b75d50cbc0d9561d21f390a76242ce073f/pyyaml-6.0.3-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:2283a07e2c21a2aa78d9c4442724ec1eb15f5e42a723b99cb3d822d48f5f7ad1", size = 173252, upload-time = "2025-09-25T21:32:25.149Z" }, { url = "https://files.pythonhosted.org/packages/50/31/b20f376d3f810b9b2371e72ef5adb33879b25edb7a6d072cb7ca0c486398/pyyaml-6.0.3-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:ee2922902c45ae8ccada2c5b501ab86c36525b883eff4255313a253a3160861c", size = 767081, upload-time = "2025-09-25T21:32:26.575Z" }, @@ -2010,30 +1498,12 @@ name = "quantile-forest" version = "1.4.1" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "scikit-learn", version = "1.7.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "scikit-learn", version = "1.8.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "scipy", version = "1.15.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "scipy", version = "1.17.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "numpy" }, + { name = "scikit-learn" }, + { name = "scipy" }, ] sdist = { url = "https://files.pythonhosted.org/packages/62/6e/3f1493d4abcce71fdc82ed575475d3e02da7b03375129e84be2622e1532f/quantile_forest-1.4.1.tar.gz", hash = "sha256:713a23c69562b7551ba4a05c22ce9d0e90db6a73d043e760b29c331cb19dc552", size = 486249, upload-time = "2025-09-10T12:48:04.578Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/6f/66/a82136c0bc2897334beac165d57c8a6e9457cca71655a68cfe007dace7c5/quantile_forest-1.4.1-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:ed3163bfe07404c1ed5732007f0d7262f9c8240e7b3c83f93f7dea3ef2d620b5", size = 949349, upload-time = "2025-09-10T12:47:31.398Z" }, - { url = "https://files.pythonhosted.org/packages/76/9a/61c91fc8a31a2e4187cbe0c193fbc6ff8e3b4667cdff4fd207534cc10f67/quantile_forest-1.4.1-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:f46955b6255a4b5502c2df7ff6a343e673f2650ef6ac536f95dfa92f9d97f78c", size = 715205, upload-time = "2025-09-10T12:47:33.32Z" }, - { url = "https://files.pythonhosted.org/packages/2c/ee/64ed254db04f7c746815c815fddae6e5d8005ef08aa8000e435605dbdec6/quantile_forest-1.4.1-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:14cc91ced4ecbb4f74e5dd26659db85c2d5aa28d94193efc2ded564830126705", size = 707183, upload-time = "2025-09-10T12:47:34.485Z" }, - { url = "https://files.pythonhosted.org/packages/60/af/3ca4d3cb1da0eb65cdd71f945cd8e8bd6c7b4aec8e88f0ba6dfbfd40fac6/quantile_forest-1.4.1-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:77caf1edde485a80690336f838bf8f6ddf79f4d7ba2e4881cd8d92b489a0a65c", size = 2360674, upload-time = "2025-09-10T12:47:35.923Z" }, - { url = "https://files.pythonhosted.org/packages/b9/38/6b5b59a271885728ebdc4b7a7448c10c52b02477c731b49476d4abc00a4b/quantile_forest-1.4.1-cp310-cp310-win_amd64.whl", hash = "sha256:7b50b6afdc99208cb329f160e755e0449b23fea84ac55ea8602293711fa13dee", size = 685559, upload-time = "2025-09-10T12:47:37.534Z" }, - { url = "https://files.pythonhosted.org/packages/75/cc/dc1d8d7a3bf1bf8eaff4d810f56970237458482f0a8e892a4d20a27d2386/quantile_forest-1.4.1-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:f4d1866c694defc077ee01190d1c69c9ef4092b31c0f86e5ae7ae3098ef7b9be", size = 954993, upload-time = "2025-09-10T12:47:38.784Z" }, - { url = "https://files.pythonhosted.org/packages/d4/eb/b9931f40427665a8bbfbbc00dfe26ecb0d8f9df08be8df6c5f20e4ae43c3/quantile_forest-1.4.1-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:da3e40acf24b60aeb1bf24f7648aeb40f984d6b9a722513e8f9bb13d7a75e1f9", size = 717871, upload-time = "2025-09-10T12:47:39.957Z" }, - { url = "https://files.pythonhosted.org/packages/ef/9a/47e0d2f81115ea4112f41239a669b7440bf71ad50dce92dad86be14aad86/quantile_forest-1.4.1-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:591e12ae0356206668e2ae8f2808749600da7c587ce7819b39b97d0a7c4053d2", size = 709737, upload-time = "2025-09-10T12:47:41.351Z" }, - { url = "https://files.pythonhosted.org/packages/02/2b/dfca97f4b6a8c63cdc839f119719a0f68455c3b1a013711a72f63b3dd90d/quantile_forest-1.4.1-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:443341c9047160f36464d72871da7babae04cb8092b9fd19eca86682277ee810", size = 2436079, upload-time = "2025-09-10T12:47:42.936Z" }, - { url = "https://files.pythonhosted.org/packages/a8/f0/9e375572814f44bb93caf942c0de36c483e22a0488241042536c0dc39fb6/quantile_forest-1.4.1-cp311-cp311-win_amd64.whl", hash = "sha256:69d39db8c434fa2aaa48716eb05774491b22d1087f2f24bfcd853b52869d01bc", size = 685513, upload-time = "2025-09-10T12:47:44.045Z" }, - { url = "https://files.pythonhosted.org/packages/93/53/63c400659404b45221405f7dbdb42fb0cea4b9cae0877a567d56d760a995/quantile_forest-1.4.1-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:f7d4eae276928f07c13e4784842768569e92c50e93f66c1feadf85c4967b3be4", size = 959038, upload-time = "2025-09-10T12:47:45.193Z" }, - { url = "https://files.pythonhosted.org/packages/e3/d7/694d428f94b5aec95bd9bb3805b119c1845bb63e215deeeab64e60812037/quantile_forest-1.4.1-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:c0526c117be0df98e79e1ce378968f1e1faa9ca23e08da449baa0651a52a81d1", size = 720471, upload-time = "2025-09-10T12:47:46.873Z" }, - { url = "https://files.pythonhosted.org/packages/8d/fb/747bf715bfba7570f88c7c601ef3f3350eceb4ce4bf72a1d36fb9845fdd2/quantile_forest-1.4.1-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:b67fc17c82ea85f575617f7a093f3ad8ef0dc5a159f886a9948224b98483ad8c", size = 710769, upload-time = "2025-09-10T12:47:47.88Z" }, - { url = "https://files.pythonhosted.org/packages/99/05/86bbce5503c007cfeeb74068edf608c4216e570ad13c9500513f5473740c/quantile_forest-1.4.1-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:d402c4af3f72d21c3ca3e9dda25a68207d29ae4d34b8126bcf19fc3680ce23e0", size = 2406284, upload-time = "2025-09-10T12:47:49.42Z" }, - { url = "https://files.pythonhosted.org/packages/8b/93/1ae45144ab80bdd8cf8e7bf983137440b1c3430516a7db340caee9b6d77d/quantile_forest-1.4.1-cp312-cp312-win_amd64.whl", hash = "sha256:b1513b039f7ea5b9467201807b41594d25ecaf088868221e2f1ddea4edeb13b8", size = 685743, upload-time = "2025-09-10T12:47:50.525Z" }, { url = "https://files.pythonhosted.org/packages/33/61/f8ff4e348dc2d265ea97287f921b92bca265229c48be64b94756ecff4078/quantile_forest-1.4.1-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:37c2da2ab54aceacdf5292065147f40a073b13cc3844262f0f3cbd5b8a8d928e", size = 955098, upload-time = "2025-09-10T12:47:52.137Z" }, { url = "https://files.pythonhosted.org/packages/4f/95/75f3eea1c7cc3786c1ffdf4685e79c4979a4ae6ccedfed80362c9162f0d4/quantile_forest-1.4.1-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:3f0436ac7622442c2995cf121e0960332e769791f3f3c7ea62363e8480803bb3", size = 718470, upload-time = "2025-09-10T12:47:53.566Z" }, { url = "https://files.pythonhosted.org/packages/fe/f1/0f26386bf164ede156099d18e3e4493dd21dc48e329e1be68232e5cf8b52/quantile_forest-1.4.1-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a594bd3552507beffa6ca6002143601be5defd5cc7329154f41317110f895f7a", size = 709245, upload-time = "2025-09-10T12:47:54.54Z" }, @@ -2046,10 +1516,10 @@ name = "requests" version = "2.33.0" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "certifi", marker = "python_full_version >= '3.11'" }, - { name = "charset-normalizer", marker = "python_full_version >= '3.11'" }, - { name = "idna", marker = "python_full_version >= '3.11'" }, - { name = "urllib3", marker = "python_full_version >= '3.11'" }, + { name = "certifi" }, + { name = "charset-normalizer" }, + { name = "idna" }, + { name = "urllib3" }, ] sdist = { url = "https://files.pythonhosted.org/packages/34/64/8860370b167a9721e8956ae116825caff829224fbca0ca6e7bf8ddef8430/requests-2.33.0.tar.gz", hash = "sha256:c7ebc5e8b0f21837386ad0e1c8fe8b829fa5f544d8df3b2253bff14ef29d7652", size = 134232, upload-time = "2026-03-25T15:10:41.586Z" } wheels = [ @@ -2094,88 +1564,18 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/8f/e8/726643a3ea68c727da31570bde48c7a10f1aa60eddd628d94078fec586ff/ruff-0.15.7-py3-none-win_arm64.whl", hash = "sha256:18e8d73f1c3fdf27931497972250340f92e8c861722161a9caeb89a58ead6ed2", size = 11023304, upload-time = "2026-03-19T16:26:51.669Z" }, ] -[[package]] -name = "scikit-learn" -version = "1.7.2" -source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version < '3.11'", -] -dependencies = [ - { name = "joblib", marker = "python_full_version < '3.11'" }, - { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "scipy", version = "1.15.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "threadpoolctl", marker = "python_full_version < '3.11'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/98/c2/a7855e41c9d285dfe86dc50b250978105dce513d6e459ea66a6aeb0e1e0c/scikit_learn-1.7.2.tar.gz", hash = "sha256:20e9e49ecd130598f1ca38a1d85090e1a600147b9c02fa6f15d69cb53d968fda", size = 7193136, upload-time = "2025-09-09T08:21:29.075Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/ba/3e/daed796fd69cce768b8788401cc464ea90b306fb196ae1ffed0b98182859/scikit_learn-1.7.2-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:6b33579c10a3081d076ab403df4a4190da4f4432d443521674637677dc91e61f", size = 9336221, upload-time = "2025-09-09T08:20:19.328Z" }, - { url = "https://files.pythonhosted.org/packages/1c/ce/af9d99533b24c55ff4e18d9b7b4d9919bbc6cd8f22fe7a7be01519a347d5/scikit_learn-1.7.2-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:36749fb62b3d961b1ce4fedf08fa57a1986cd409eff2d783bca5d4b9b5fce51c", size = 8653834, upload-time = "2025-09-09T08:20:22.073Z" }, - { url = "https://files.pythonhosted.org/packages/58/0e/8c2a03d518fb6bd0b6b0d4b114c63d5f1db01ff0f9925d8eb10960d01c01/scikit_learn-1.7.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:7a58814265dfc52b3295b1900cfb5701589d30a8bb026c7540f1e9d3499d5ec8", size = 9660938, upload-time = "2025-09-09T08:20:24.327Z" }, - { url = "https://files.pythonhosted.org/packages/2b/75/4311605069b5d220e7cf5adabb38535bd96f0079313cdbb04b291479b22a/scikit_learn-1.7.2-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4a847fea807e278f821a0406ca01e387f97653e284ecbd9750e3ee7c90347f18", size = 9477818, upload-time = "2025-09-09T08:20:26.845Z" }, - { url = "https://files.pythonhosted.org/packages/7f/9b/87961813c34adbca21a6b3f6b2bea344c43b30217a6d24cc437c6147f3e8/scikit_learn-1.7.2-cp310-cp310-win_amd64.whl", hash = "sha256:ca250e6836d10e6f402436d6463d6c0e4d8e0234cfb6a9a47835bd392b852ce5", size = 8886969, upload-time = "2025-09-09T08:20:29.329Z" }, - { url = "https://files.pythonhosted.org/packages/43/83/564e141eef908a5863a54da8ca342a137f45a0bfb71d1d79704c9894c9d1/scikit_learn-1.7.2-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:c7509693451651cd7361d30ce4e86a1347493554f172b1c72a39300fa2aea79e", size = 9331967, upload-time = "2025-09-09T08:20:32.421Z" }, - { url = "https://files.pythonhosted.org/packages/18/d6/ba863a4171ac9d7314c4d3fc251f015704a2caeee41ced89f321c049ed83/scikit_learn-1.7.2-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:0486c8f827c2e7b64837c731c8feff72c0bd2b998067a8a9cbc10643c31f0fe1", size = 8648645, upload-time = "2025-09-09T08:20:34.436Z" }, - { url = "https://files.pythonhosted.org/packages/ef/0e/97dbca66347b8cf0ea8b529e6bb9367e337ba2e8be0ef5c1a545232abfde/scikit_learn-1.7.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:89877e19a80c7b11a2891a27c21c4894fb18e2c2e077815bcade10d34287b20d", size = 9715424, upload-time = "2025-09-09T08:20:36.776Z" }, - { url = "https://files.pythonhosted.org/packages/f7/32/1f3b22e3207e1d2c883a7e09abb956362e7d1bd2f14458c7de258a26ac15/scikit_learn-1.7.2-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8da8bf89d4d79aaec192d2bda62f9b56ae4e5b4ef93b6a56b5de4977e375c1f1", size = 9509234, upload-time = "2025-09-09T08:20:38.957Z" }, - { url = "https://files.pythonhosted.org/packages/9f/71/34ddbd21f1da67c7a768146968b4d0220ee6831e4bcbad3e03dd3eae88b6/scikit_learn-1.7.2-cp311-cp311-win_amd64.whl", hash = "sha256:9b7ed8d58725030568523e937c43e56bc01cadb478fc43c042a9aca1dacb3ba1", size = 8894244, upload-time = "2025-09-09T08:20:41.166Z" }, - { url = "https://files.pythonhosted.org/packages/a7/aa/3996e2196075689afb9fce0410ebdb4a09099d7964d061d7213700204409/scikit_learn-1.7.2-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8d91a97fa2b706943822398ab943cde71858a50245e31bc71dba62aab1d60a96", size = 9259818, upload-time = "2025-09-09T08:20:43.19Z" }, - { url = "https://files.pythonhosted.org/packages/43/5d/779320063e88af9c4a7c2cf463ff11c21ac9c8bd730c4a294b0000b666c9/scikit_learn-1.7.2-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:acbc0f5fd2edd3432a22c69bed78e837c70cf896cd7993d71d51ba6708507476", size = 8636997, upload-time = "2025-09-09T08:20:45.468Z" }, - { url = "https://files.pythonhosted.org/packages/5c/d0/0c577d9325b05594fdd33aa970bf53fb673f051a45496842caee13cfd7fe/scikit_learn-1.7.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:e5bf3d930aee75a65478df91ac1225ff89cd28e9ac7bd1196853a9229b6adb0b", size = 9478381, upload-time = "2025-09-09T08:20:47.982Z" }, - { url = "https://files.pythonhosted.org/packages/82/70/8bf44b933837ba8494ca0fc9a9ab60f1c13b062ad0197f60a56e2fc4c43e/scikit_learn-1.7.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b4d6e9deed1a47aca9fe2f267ab8e8fe82ee20b4526b2c0cd9e135cea10feb44", size = 9300296, upload-time = "2025-09-09T08:20:50.366Z" }, - { url = "https://files.pythonhosted.org/packages/c6/99/ed35197a158f1fdc2fe7c3680e9c70d0128f662e1fee4ed495f4b5e13db0/scikit_learn-1.7.2-cp312-cp312-win_amd64.whl", hash = "sha256:6088aa475f0785e01bcf8529f55280a3d7d298679f50c0bb70a2364a82d0b290", size = 8731256, upload-time = "2025-09-09T08:20:52.627Z" }, - { url = "https://files.pythonhosted.org/packages/ae/93/a3038cb0293037fd335f77f31fe053b89c72f17b1c8908c576c29d953e84/scikit_learn-1.7.2-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:0b7dacaa05e5d76759fb071558a8b5130f4845166d88654a0f9bdf3eb57851b7", size = 9212382, upload-time = "2025-09-09T08:20:54.731Z" }, - { url = "https://files.pythonhosted.org/packages/40/dd/9a88879b0c1104259136146e4742026b52df8540c39fec21a6383f8292c7/scikit_learn-1.7.2-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:abebbd61ad9e1deed54cca45caea8ad5f79e1b93173dece40bb8e0c658dbe6fe", size = 8592042, upload-time = "2025-09-09T08:20:57.313Z" }, - { url = "https://files.pythonhosted.org/packages/46/af/c5e286471b7d10871b811b72ae794ac5fe2989c0a2df07f0ec723030f5f5/scikit_learn-1.7.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:502c18e39849c0ea1a5d681af1dbcf15f6cce601aebb657aabbfe84133c1907f", size = 9434180, upload-time = "2025-09-09T08:20:59.671Z" }, - { url = "https://files.pythonhosted.org/packages/f1/fd/df59faa53312d585023b2da27e866524ffb8faf87a68516c23896c718320/scikit_learn-1.7.2-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7a4c328a71785382fe3fe676a9ecf2c86189249beff90bf85e22bdb7efaf9ae0", size = 9283660, upload-time = "2025-09-09T08:21:01.71Z" }, - { url = "https://files.pythonhosted.org/packages/a7/c7/03000262759d7b6f38c836ff9d512f438a70d8a8ddae68ee80de72dcfb63/scikit_learn-1.7.2-cp313-cp313-win_amd64.whl", hash = "sha256:63a9afd6f7b229aad94618c01c252ce9e6fa97918c5ca19c9a17a087d819440c", size = 8702057, upload-time = "2025-09-09T08:21:04.234Z" }, - { url = "https://files.pythonhosted.org/packages/55/87/ef5eb1f267084532c8e4aef98a28b6ffe7425acbfd64b5e2f2e066bc29b3/scikit_learn-1.7.2-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:9acb6c5e867447b4e1390930e3944a005e2cb115922e693c08a323421a6966e8", size = 9558731, upload-time = "2025-09-09T08:21:06.381Z" }, - { url = "https://files.pythonhosted.org/packages/93/f8/6c1e3fc14b10118068d7938878a9f3f4e6d7b74a8ddb1e5bed65159ccda8/scikit_learn-1.7.2-cp313-cp313t-macosx_12_0_arm64.whl", hash = "sha256:2a41e2a0ef45063e654152ec9d8bcfc39f7afce35b08902bfe290c2498a67a6a", size = 9038852, upload-time = "2025-09-09T08:21:08.628Z" }, - { url = "https://files.pythonhosted.org/packages/83/87/066cafc896ee540c34becf95d30375fe5cbe93c3b75a0ee9aa852cd60021/scikit_learn-1.7.2-cp313-cp313t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:98335fb98509b73385b3ab2bd0639b1f610541d3988ee675c670371d6a87aa7c", size = 9527094, upload-time = "2025-09-09T08:21:11.486Z" }, - { url = "https://files.pythonhosted.org/packages/9c/2b/4903e1ccafa1f6453b1ab78413938c8800633988c838aa0be386cbb33072/scikit_learn-1.7.2-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:191e5550980d45449126e23ed1d5e9e24b2c68329ee1f691a3987476e115e09c", size = 9367436, upload-time = "2025-09-09T08:21:13.602Z" }, - { url = "https://files.pythonhosted.org/packages/b5/aa/8444be3cfb10451617ff9d177b3c190288f4563e6c50ff02728be67ad094/scikit_learn-1.7.2-cp313-cp313t-win_amd64.whl", hash = "sha256:57dc4deb1d3762c75d685507fbd0bc17160144b2f2ba4ccea5dc285ab0d0e973", size = 9275749, upload-time = "2025-09-09T08:21:15.96Z" }, - { url = "https://files.pythonhosted.org/packages/d9/82/dee5acf66837852e8e68df6d8d3a6cb22d3df997b733b032f513d95205b7/scikit_learn-1.7.2-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:fa8f63940e29c82d1e67a45d5297bdebbcb585f5a5a50c4914cc2e852ab77f33", size = 9208906, upload-time = "2025-09-09T08:21:18.557Z" }, - { url = "https://files.pythonhosted.org/packages/3c/30/9029e54e17b87cb7d50d51a5926429c683d5b4c1732f0507a6c3bed9bf65/scikit_learn-1.7.2-cp314-cp314-macosx_12_0_arm64.whl", hash = "sha256:f95dc55b7902b91331fa4e5845dd5bde0580c9cd9612b1b2791b7e80c3d32615", size = 8627836, upload-time = "2025-09-09T08:21:20.695Z" }, - { url = "https://files.pythonhosted.org/packages/60/18/4a52c635c71b536879f4b971c2cedf32c35ee78f48367885ed8025d1f7ee/scikit_learn-1.7.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl", hash = "sha256:9656e4a53e54578ad10a434dc1f993330568cfee176dff07112b8785fb413106", size = 9426236, upload-time = "2025-09-09T08:21:22.645Z" }, - { url = "https://files.pythonhosted.org/packages/99/7e/290362f6ab582128c53445458a5befd471ed1ea37953d5bcf80604619250/scikit_learn-1.7.2-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:96dc05a854add0e50d3f47a1ef21a10a595016da5b007c7d9cd9d0bffd1fcc61", size = 9312593, upload-time = "2025-09-09T08:21:24.65Z" }, - { url = "https://files.pythonhosted.org/packages/8e/87/24f541b6d62b1794939ae6422f8023703bbf6900378b2b34e0b4384dfefd/scikit_learn-1.7.2-cp314-cp314-win_amd64.whl", hash = "sha256:bb24510ed3f9f61476181e4db51ce801e2ba37541def12dc9333b946fc7a9cf8", size = 8820007, upload-time = "2025-09-09T08:21:26.713Z" }, -] - [[package]] name = "scikit-learn" version = "1.8.0" source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version >= '3.14' and sys_platform == 'win32'", - "python_full_version >= '3.14' and sys_platform == 'emscripten'", - "python_full_version >= '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'win32'", - "python_full_version == '3.11.*' and sys_platform == 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'emscripten'", - "python_full_version == '3.11.*' and sys_platform == 'emscripten'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version == '3.11.*' and sys_platform != 'emscripten' and sys_platform != 'win32'", -] dependencies = [ - { name = "joblib", marker = "python_full_version >= '3.11'" }, - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "scipy", version = "1.17.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, - { name = "threadpoolctl", marker = "python_full_version >= '3.11'" }, + { name = "joblib" }, + { name = "numpy" }, + { name = "scipy" }, + { name = "threadpoolctl" }, ] sdist = { url = "https://files.pythonhosted.org/packages/0e/d4/40988bf3b8e34feec1d0e6a051446b1f66225f8529b9309becaeef62b6c4/scikit_learn-1.8.0.tar.gz", hash = "sha256:9bccbb3b40e3de10351f8f5068e105d0f4083b1a65fa07b6634fbc401a6287fd", size = 7335585, upload-time = "2025-12-10T07:08:53.618Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/c9/92/53ea2181da8ac6bf27170191028aee7251f8f841f8d3edbfdcaf2008fde9/scikit_learn-1.8.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:146b4d36f800c013d267b29168813f7a03a43ecd2895d04861f1240b564421da", size = 8595835, upload-time = "2025-12-10T07:07:39.385Z" }, - { url = "https://files.pythonhosted.org/packages/01/18/d154dc1638803adf987910cdd07097d9c526663a55666a97c124d09fb96a/scikit_learn-1.8.0-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:f984ca4b14914e6b4094c5d52a32ea16b49832c03bd17a110f004db3c223e8e1", size = 8080381, upload-time = "2025-12-10T07:07:41.93Z" }, - { url = "https://files.pythonhosted.org/packages/8a/44/226142fcb7b7101e64fdee5f49dbe6288d4c7af8abf593237b70fca080a4/scikit_learn-1.8.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:5e30adb87f0cc81c7690a84f7932dd66be5bac57cfe16b91cb9151683a4a2d3b", size = 8799632, upload-time = "2025-12-10T07:07:43.899Z" }, - { url = "https://files.pythonhosted.org/packages/36/4d/4a67f30778a45d542bbea5db2dbfa1e9e100bf9ba64aefe34215ba9f11f6/scikit_learn-1.8.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:ada8121bcb4dac28d930febc791a69f7cb1673c8495e5eee274190b73a4559c1", size = 9103788, upload-time = "2025-12-10T07:07:45.982Z" }, - { url = "https://files.pythonhosted.org/packages/89/3c/45c352094cfa60050bcbb967b1faf246b22e93cb459f2f907b600f2ceda5/scikit_learn-1.8.0-cp311-cp311-win_amd64.whl", hash = "sha256:c57b1b610bd1f40ba43970e11ce62821c2e6569e4d74023db19c6b26f246cb3b", size = 8081706, upload-time = "2025-12-10T07:07:48.111Z" }, - { url = "https://files.pythonhosted.org/packages/3d/46/5416595bb395757f754feb20c3d776553a386b661658fb21b7c814e89efe/scikit_learn-1.8.0-cp311-cp311-win_arm64.whl", hash = "sha256:2838551e011a64e3053ad7618dda9310175f7515f1742fa2d756f7c874c05961", size = 7688451, upload-time = "2025-12-10T07:07:49.873Z" }, - { url = "https://files.pythonhosted.org/packages/90/74/e6a7cc4b820e95cc38cf36cd74d5aa2b42e8ffc2d21fe5a9a9c45c1c7630/scikit_learn-1.8.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:5fb63362b5a7ddab88e52b6dbb47dac3fd7dafeee740dc6c8d8a446ddedade8e", size = 8548242, upload-time = "2025-12-10T07:07:51.568Z" }, - { url = "https://files.pythonhosted.org/packages/49/d8/9be608c6024d021041c7f0b3928d4749a706f4e2c3832bbede4fb4f58c95/scikit_learn-1.8.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:5025ce924beccb28298246e589c691fe1b8c1c96507e6d27d12c5fadd85bfd76", size = 8079075, upload-time = "2025-12-10T07:07:53.697Z" }, - { url = "https://files.pythonhosted.org/packages/dd/47/f187b4636ff80cc63f21cd40b7b2d177134acaa10f6bb73746130ee8c2e5/scikit_learn-1.8.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:4496bb2cf7a43ce1a2d7524a79e40bc5da45cf598dbf9545b7e8316ccba47bb4", size = 8660492, upload-time = "2025-12-10T07:07:55.574Z" }, - { url = "https://files.pythonhosted.org/packages/97/74/b7a304feb2b49df9fafa9382d4d09061a96ee9a9449a7cbea7988dda0828/scikit_learn-1.8.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a0bcfe4d0d14aec44921545fd2af2338c7471de9cb701f1da4c9d85906ab847a", size = 8931904, upload-time = "2025-12-10T07:07:57.666Z" }, - { url = "https://files.pythonhosted.org/packages/9f/c4/0ab22726a04ede56f689476b760f98f8f46607caecff993017ac1b64aa5d/scikit_learn-1.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:35c007dedb2ffe38fe3ee7d201ebac4a2deccd2408e8621d53067733e3c74809", size = 8019359, upload-time = "2025-12-10T07:07:59.838Z" }, - { url = "https://files.pythonhosted.org/packages/24/90/344a67811cfd561d7335c1b96ca21455e7e472d281c3c279c4d3f2300236/scikit_learn-1.8.0-cp312-cp312-win_arm64.whl", hash = "sha256:8c497fff237d7b4e07e9ef1a640887fa4fb765647f86fbe00f969ff6280ce2bb", size = 7641898, upload-time = "2025-12-10T07:08:01.36Z" }, { url = "https://files.pythonhosted.org/packages/03/aa/e22e0768512ce9255eba34775be2e85c2048da73da1193e841707f8f039c/scikit_learn-1.8.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:0d6ae97234d5d7079dc0040990a6f7aeb97cb7fa7e8945f1999a429b23569e0a", size = 8513770, upload-time = "2025-12-10T07:08:03.251Z" }, { url = "https://files.pythonhosted.org/packages/58/37/31b83b2594105f61a381fc74ca19e8780ee923be2d496fcd8d2e1147bd99/scikit_learn-1.8.0-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:edec98c5e7c128328124a029bceb09eda2d526997780fef8d65e9a69eead963e", size = 8044458, upload-time = "2025-12-10T07:08:05.336Z" }, { url = "https://files.pythonhosted.org/packages/2d/5a/3f1caed8765f33eabb723596666da4ebbf43d11e96550fb18bdec42b467b/scikit_learn-1.8.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:74b66d8689d52ed04c271e1329f0c61635bcaf5b926db9b12d58914cdc01fe57", size = 8610341, upload-time = "2025-12-10T07:08:07.732Z" }, @@ -2202,105 +1602,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/60/22/d7b2ebe4704a5e50790ba089d5c2ae308ab6bb852719e6c3bd4f04c3a363/scikit_learn-1.8.0-cp314-cp314t-win_arm64.whl", hash = "sha256:f28dd15c6bb0b66ba09728cf09fd8736c304be29409bd8445a080c1280619e8c", size = 8002647, upload-time = "2025-12-10T07:08:51.601Z" }, ] -[[package]] -name = "scipy" -version = "1.15.3" -source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version < '3.11'", -] -dependencies = [ - { name = "numpy", version = "2.2.6", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, -] -sdist = { url = "https://files.pythonhosted.org/packages/0f/37/6964b830433e654ec7485e45a00fc9a27cf868d622838f6b6d9c5ec0d532/scipy-1.15.3.tar.gz", hash = "sha256:eae3cf522bc7df64b42cad3925c876e1b0b6c35c1337c93e12c0f366f55b0eaf", size = 59419214, upload-time = "2025-05-08T16:13:05.955Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/78/2f/4966032c5f8cc7e6a60f1b2e0ad686293b9474b65246b0c642e3ef3badd0/scipy-1.15.3-cp310-cp310-macosx_10_13_x86_64.whl", hash = "sha256:a345928c86d535060c9c2b25e71e87c39ab2f22fc96e9636bd74d1dbf9de448c", size = 38702770, upload-time = "2025-05-08T16:04:20.849Z" }, - { url = "https://files.pythonhosted.org/packages/a0/6e/0c3bf90fae0e910c274db43304ebe25a6b391327f3f10b5dcc638c090795/scipy-1.15.3-cp310-cp310-macosx_12_0_arm64.whl", hash = "sha256:ad3432cb0f9ed87477a8d97f03b763fd1d57709f1bbde3c9369b1dff5503b253", size = 30094511, upload-time = "2025-05-08T16:04:27.103Z" }, - { url = "https://files.pythonhosted.org/packages/ea/b1/4deb37252311c1acff7f101f6453f0440794f51b6eacb1aad4459a134081/scipy-1.15.3-cp310-cp310-macosx_14_0_arm64.whl", hash = "sha256:aef683a9ae6eb00728a542b796f52a5477b78252edede72b8327a886ab63293f", size = 22368151, upload-time = "2025-05-08T16:04:31.731Z" }, - { url = "https://files.pythonhosted.org/packages/38/7d/f457626e3cd3c29b3a49ca115a304cebb8cc6f31b04678f03b216899d3c6/scipy-1.15.3-cp310-cp310-macosx_14_0_x86_64.whl", hash = "sha256:1c832e1bd78dea67d5c16f786681b28dd695a8cb1fb90af2e27580d3d0967e92", size = 25121732, upload-time = "2025-05-08T16:04:36.596Z" }, - { url = "https://files.pythonhosted.org/packages/db/0a/92b1de4a7adc7a15dcf5bddc6e191f6f29ee663b30511ce20467ef9b82e4/scipy-1.15.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:263961f658ce2165bbd7b99fa5135195c3a12d9bef045345016b8b50c315cb82", size = 35547617, upload-time = "2025-05-08T16:04:43.546Z" }, - { url = "https://files.pythonhosted.org/packages/8e/6d/41991e503e51fc1134502694c5fa7a1671501a17ffa12716a4a9151af3df/scipy-1.15.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9e2abc762b0811e09a0d3258abee2d98e0c703eee49464ce0069590846f31d40", size = 37662964, upload-time = "2025-05-08T16:04:49.431Z" }, - { url = "https://files.pythonhosted.org/packages/25/e1/3df8f83cb15f3500478c889be8fb18700813b95e9e087328230b98d547ff/scipy-1.15.3-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:ed7284b21a7a0c8f1b6e5977ac05396c0d008b89e05498c8b7e8f4a1423bba0e", size = 37238749, upload-time = "2025-05-08T16:04:55.215Z" }, - { url = "https://files.pythonhosted.org/packages/93/3e/b3257cf446f2a3533ed7809757039016b74cd6f38271de91682aa844cfc5/scipy-1.15.3-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:5380741e53df2c566f4d234b100a484b420af85deb39ea35a1cc1be84ff53a5c", size = 40022383, upload-time = "2025-05-08T16:05:01.914Z" }, - { url = "https://files.pythonhosted.org/packages/d1/84/55bc4881973d3f79b479a5a2e2df61c8c9a04fcb986a213ac9c02cfb659b/scipy-1.15.3-cp310-cp310-win_amd64.whl", hash = "sha256:9d61e97b186a57350f6d6fd72640f9e99d5a4a2b8fbf4b9ee9a841eab327dc13", size = 41259201, upload-time = "2025-05-08T16:05:08.166Z" }, - { url = "https://files.pythonhosted.org/packages/96/ab/5cc9f80f28f6a7dff646c5756e559823614a42b1939d86dd0ed550470210/scipy-1.15.3-cp311-cp311-macosx_10_13_x86_64.whl", hash = "sha256:993439ce220d25e3696d1b23b233dd010169b62f6456488567e830654ee37a6b", size = 38714255, upload-time = "2025-05-08T16:05:14.596Z" }, - { url = "https://files.pythonhosted.org/packages/4a/4a/66ba30abe5ad1a3ad15bfb0b59d22174012e8056ff448cb1644deccbfed2/scipy-1.15.3-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:34716e281f181a02341ddeaad584205bd2fd3c242063bd3423d61ac259ca7eba", size = 30111035, upload-time = "2025-05-08T16:05:20.152Z" }, - { url = "https://files.pythonhosted.org/packages/4b/fa/a7e5b95afd80d24313307f03624acc65801846fa75599034f8ceb9e2cbf6/scipy-1.15.3-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:3b0334816afb8b91dab859281b1b9786934392aa3d527cd847e41bb6f45bee65", size = 22384499, upload-time = "2025-05-08T16:05:24.494Z" }, - { url = "https://files.pythonhosted.org/packages/17/99/f3aaddccf3588bb4aea70ba35328c204cadd89517a1612ecfda5b2dd9d7a/scipy-1.15.3-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:6db907c7368e3092e24919b5e31c76998b0ce1684d51a90943cb0ed1b4ffd6c1", size = 25152602, upload-time = "2025-05-08T16:05:29.313Z" }, - { url = "https://files.pythonhosted.org/packages/56/c5/1032cdb565f146109212153339f9cb8b993701e9fe56b1c97699eee12586/scipy-1.15.3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:721d6b4ef5dc82ca8968c25b111e307083d7ca9091bc38163fb89243e85e3889", size = 35503415, upload-time = "2025-05-08T16:05:34.699Z" }, - { url = "https://files.pythonhosted.org/packages/bd/37/89f19c8c05505d0601ed5650156e50eb881ae3918786c8fd7262b4ee66d3/scipy-1.15.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:39cb9c62e471b1bb3750066ecc3a3f3052b37751c7c3dfd0fd7e48900ed52982", size = 37652622, upload-time = "2025-05-08T16:05:40.762Z" }, - { url = "https://files.pythonhosted.org/packages/7e/31/be59513aa9695519b18e1851bb9e487de66f2d31f835201f1b42f5d4d475/scipy-1.15.3-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:795c46999bae845966368a3c013e0e00947932d68e235702b5c3f6ea799aa8c9", size = 37244796, upload-time = "2025-05-08T16:05:48.119Z" }, - { url = "https://files.pythonhosted.org/packages/10/c0/4f5f3eeccc235632aab79b27a74a9130c6c35df358129f7ac8b29f562ac7/scipy-1.15.3-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:18aaacb735ab38b38db42cb01f6b92a2d0d4b6aabefeb07f02849e47f8fb3594", size = 40047684, upload-time = "2025-05-08T16:05:54.22Z" }, - { url = "https://files.pythonhosted.org/packages/ab/a7/0ddaf514ce8a8714f6ed243a2b391b41dbb65251affe21ee3077ec45ea9a/scipy-1.15.3-cp311-cp311-win_amd64.whl", hash = "sha256:ae48a786a28412d744c62fd7816a4118ef97e5be0bee968ce8f0a2fba7acf3bb", size = 41246504, upload-time = "2025-05-08T16:06:00.437Z" }, - { url = "https://files.pythonhosted.org/packages/37/4b/683aa044c4162e10ed7a7ea30527f2cbd92e6999c10a8ed8edb253836e9c/scipy-1.15.3-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:6ac6310fdbfb7aa6612408bd2f07295bcbd3fda00d2d702178434751fe48e019", size = 38766735, upload-time = "2025-05-08T16:06:06.471Z" }, - { url = "https://files.pythonhosted.org/packages/7b/7e/f30be3d03de07f25dc0ec926d1681fed5c732d759ac8f51079708c79e680/scipy-1.15.3-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:185cd3d6d05ca4b44a8f1595af87f9c372bb6acf9c808e99aa3e9aa03bd98cf6", size = 30173284, upload-time = "2025-05-08T16:06:11.686Z" }, - { url = "https://files.pythonhosted.org/packages/07/9c/0ddb0d0abdabe0d181c1793db51f02cd59e4901da6f9f7848e1f96759f0d/scipy-1.15.3-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:05dc6abcd105e1a29f95eada46d4a3f251743cfd7d3ae8ddb4088047f24ea477", size = 22446958, upload-time = "2025-05-08T16:06:15.97Z" }, - { url = "https://files.pythonhosted.org/packages/af/43/0bce905a965f36c58ff80d8bea33f1f9351b05fad4beaad4eae34699b7a1/scipy-1.15.3-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:06efcba926324df1696931a57a176c80848ccd67ce6ad020c810736bfd58eb1c", size = 25242454, upload-time = "2025-05-08T16:06:20.394Z" }, - { url = "https://files.pythonhosted.org/packages/56/30/a6f08f84ee5b7b28b4c597aca4cbe545535c39fe911845a96414700b64ba/scipy-1.15.3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:c05045d8b9bfd807ee1b9f38761993297b10b245f012b11b13b91ba8945f7e45", size = 35210199, upload-time = "2025-05-08T16:06:26.159Z" }, - { url = "https://files.pythonhosted.org/packages/0b/1f/03f52c282437a168ee2c7c14a1a0d0781a9a4a8962d84ac05c06b4c5b555/scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:271e3713e645149ea5ea3e97b57fdab61ce61333f97cfae392c28ba786f9bb49", size = 37309455, upload-time = "2025-05-08T16:06:32.778Z" }, - { url = "https://files.pythonhosted.org/packages/89/b1/fbb53137f42c4bf630b1ffdfc2151a62d1d1b903b249f030d2b1c0280af8/scipy-1.15.3-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:6cfd56fc1a8e53f6e89ba3a7a7251f7396412d655bca2aa5611c8ec9a6784a1e", size = 36885140, upload-time = "2025-05-08T16:06:39.249Z" }, - { url = "https://files.pythonhosted.org/packages/2e/2e/025e39e339f5090df1ff266d021892694dbb7e63568edcfe43f892fa381d/scipy-1.15.3-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:0ff17c0bb1cb32952c09217d8d1eed9b53d1463e5f1dd6052c7857f83127d539", size = 39710549, upload-time = "2025-05-08T16:06:45.729Z" }, - { url = "https://files.pythonhosted.org/packages/e6/eb/3bf6ea8ab7f1503dca3a10df2e4b9c3f6b3316df07f6c0ded94b281c7101/scipy-1.15.3-cp312-cp312-win_amd64.whl", hash = "sha256:52092bc0472cfd17df49ff17e70624345efece4e1a12b23783a1ac59a1b728ed", size = 40966184, upload-time = "2025-05-08T16:06:52.623Z" }, - { url = "https://files.pythonhosted.org/packages/73/18/ec27848c9baae6e0d6573eda6e01a602e5649ee72c27c3a8aad673ebecfd/scipy-1.15.3-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:2c620736bcc334782e24d173c0fdbb7590a0a436d2fdf39310a8902505008759", size = 38728256, upload-time = "2025-05-08T16:06:58.696Z" }, - { url = "https://files.pythonhosted.org/packages/74/cd/1aef2184948728b4b6e21267d53b3339762c285a46a274ebb7863c9e4742/scipy-1.15.3-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:7e11270a000969409d37ed399585ee530b9ef6aa99d50c019de4cb01e8e54e62", size = 30109540, upload-time = "2025-05-08T16:07:04.209Z" }, - { url = "https://files.pythonhosted.org/packages/5b/d8/59e452c0a255ec352bd0a833537a3bc1bfb679944c4938ab375b0a6b3a3e/scipy-1.15.3-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:8c9ed3ba2c8a2ce098163a9bdb26f891746d02136995df25227a20e71c396ebb", size = 22383115, upload-time = "2025-05-08T16:07:08.998Z" }, - { url = "https://files.pythonhosted.org/packages/08/f5/456f56bbbfccf696263b47095291040655e3cbaf05d063bdc7c7517f32ac/scipy-1.15.3-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:0bdd905264c0c9cfa74a4772cdb2070171790381a5c4d312c973382fc6eaf730", size = 25163884, upload-time = "2025-05-08T16:07:14.091Z" }, - { url = "https://files.pythonhosted.org/packages/a2/66/a9618b6a435a0f0c0b8a6d0a2efb32d4ec5a85f023c2b79d39512040355b/scipy-1.15.3-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:79167bba085c31f38603e11a267d862957cbb3ce018d8b38f79ac043bc92d825", size = 35174018, upload-time = "2025-05-08T16:07:19.427Z" }, - { url = "https://files.pythonhosted.org/packages/b5/09/c5b6734a50ad4882432b6bb7c02baf757f5b2f256041da5df242e2d7e6b6/scipy-1.15.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c9deabd6d547aee2c9a81dee6cc96c6d7e9a9b1953f74850c179f91fdc729cb7", size = 37269716, upload-time = "2025-05-08T16:07:25.712Z" }, - { url = "https://files.pythonhosted.org/packages/77/0a/eac00ff741f23bcabd352731ed9b8995a0a60ef57f5fd788d611d43d69a1/scipy-1.15.3-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:dde4fc32993071ac0c7dd2d82569e544f0bdaff66269cb475e0f369adad13f11", size = 36872342, upload-time = "2025-05-08T16:07:31.468Z" }, - { url = "https://files.pythonhosted.org/packages/fe/54/4379be86dd74b6ad81551689107360d9a3e18f24d20767a2d5b9253a3f0a/scipy-1.15.3-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:f77f853d584e72e874d87357ad70f44b437331507d1c311457bed8ed2b956126", size = 39670869, upload-time = "2025-05-08T16:07:38.002Z" }, - { url = "https://files.pythonhosted.org/packages/87/2e/892ad2862ba54f084ffe8cc4a22667eaf9c2bcec6d2bff1d15713c6c0703/scipy-1.15.3-cp313-cp313-win_amd64.whl", hash = "sha256:b90ab29d0c37ec9bf55424c064312930ca5f4bde15ee8619ee44e69319aab163", size = 40988851, upload-time = "2025-05-08T16:08:33.671Z" }, - { url = "https://files.pythonhosted.org/packages/1b/e9/7a879c137f7e55b30d75d90ce3eb468197646bc7b443ac036ae3fe109055/scipy-1.15.3-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:3ac07623267feb3ae308487c260ac684b32ea35fd81e12845039952f558047b8", size = 38863011, upload-time = "2025-05-08T16:07:44.039Z" }, - { url = "https://files.pythonhosted.org/packages/51/d1/226a806bbd69f62ce5ef5f3ffadc35286e9fbc802f606a07eb83bf2359de/scipy-1.15.3-cp313-cp313t-macosx_12_0_arm64.whl", hash = "sha256:6487aa99c2a3d509a5227d9a5e889ff05830a06b2ce08ec30df6d79db5fcd5c5", size = 30266407, upload-time = "2025-05-08T16:07:49.891Z" }, - { url = "https://files.pythonhosted.org/packages/e5/9b/f32d1d6093ab9eeabbd839b0f7619c62e46cc4b7b6dbf05b6e615bbd4400/scipy-1.15.3-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:50f9e62461c95d933d5c5ef4a1f2ebf9a2b4e83b0db374cb3f1de104d935922e", size = 22540030, upload-time = "2025-05-08T16:07:54.121Z" }, - { url = "https://files.pythonhosted.org/packages/e7/29/c278f699b095c1a884f29fda126340fcc201461ee8bfea5c8bdb1c7c958b/scipy-1.15.3-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:14ed70039d182f411ffc74789a16df3835e05dc469b898233a245cdfd7f162cb", size = 25218709, upload-time = "2025-05-08T16:07:58.506Z" }, - { url = "https://files.pythonhosted.org/packages/24/18/9e5374b617aba742a990581373cd6b68a2945d65cc588482749ef2e64467/scipy-1.15.3-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a769105537aa07a69468a0eefcd121be52006db61cdd8cac8a0e68980bbb723", size = 34809045, upload-time = "2025-05-08T16:08:03.929Z" }, - { url = "https://files.pythonhosted.org/packages/e1/fe/9c4361e7ba2927074360856db6135ef4904d505e9b3afbbcb073c4008328/scipy-1.15.3-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:9db984639887e3dffb3928d118145ffe40eff2fa40cb241a306ec57c219ebbbb", size = 36703062, upload-time = "2025-05-08T16:08:09.558Z" }, - { url = "https://files.pythonhosted.org/packages/b7/8e/038ccfe29d272b30086b25a4960f757f97122cb2ec42e62b460d02fe98e9/scipy-1.15.3-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:40e54d5c7e7ebf1aa596c374c49fa3135f04648a0caabcb66c52884b943f02b4", size = 36393132, upload-time = "2025-05-08T16:08:15.34Z" }, - { url = "https://files.pythonhosted.org/packages/10/7e/5c12285452970be5bdbe8352c619250b97ebf7917d7a9a9e96b8a8140f17/scipy-1.15.3-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:5e721fed53187e71d0ccf382b6bf977644c533e506c4d33c3fb24de89f5c3ed5", size = 38979503, upload-time = "2025-05-08T16:08:21.513Z" }, - { url = "https://files.pythonhosted.org/packages/81/06/0a5e5349474e1cbc5757975b21bd4fad0e72ebf138c5592f191646154e06/scipy-1.15.3-cp313-cp313t-win_amd64.whl", hash = "sha256:76ad1fb5f8752eabf0fa02e4cc0336b4e8f021e2d5f061ed37d6d264db35e3ca", size = 40308097, upload-time = "2025-05-08T16:08:27.627Z" }, -] - [[package]] name = "scipy" version = "1.17.1" source = { registry = "https://pypi.org/simple" } -resolution-markers = [ - "python_full_version >= '3.14' and sys_platform == 'win32'", - "python_full_version >= '3.14' and sys_platform == 'emscripten'", - "python_full_version >= '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'win32'", - "python_full_version == '3.11.*' and sys_platform == 'win32'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform == 'emscripten'", - "python_full_version == '3.11.*' and sys_platform == 'emscripten'", - "python_full_version >= '3.12' and python_full_version < '3.14' and sys_platform != 'emscripten' and sys_platform != 'win32'", - "python_full_version == '3.11.*' and sys_platform != 'emscripten' and sys_platform != 'win32'", -] dependencies = [ - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "numpy" }, ] sdist = { url = "https://files.pythonhosted.org/packages/7a/97/5a3609c4f8d58b039179648e62dd220f89864f56f7357f5d4f45c29eb2cc/scipy-1.17.1.tar.gz", hash = "sha256:95d8e012d8cb8816c226aef832200b1d45109ed4464303e997c5b13122b297c0", size = 30573822, upload-time = "2026-02-23T00:26:24.851Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/df/75/b4ce781849931fef6fd529afa6b63711d5a733065722d0c3e2724af9e40a/scipy-1.17.1-cp311-cp311-macosx_10_14_x86_64.whl", hash = "sha256:1f95b894f13729334fb990162e911c9e5dc1ab390c58aa6cbecb389c5b5e28ec", size = 31613675, upload-time = "2026-02-23T00:16:00.13Z" }, - { url = "https://files.pythonhosted.org/packages/f7/58/bccc2861b305abdd1b8663d6130c0b3d7cc22e8d86663edbc8401bfd40d4/scipy-1.17.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:e18f12c6b0bc5a592ed23d3f7b891f68fd7f8241d69b7883769eb5d5dfb52696", size = 28162057, upload-time = "2026-02-23T00:16:09.456Z" }, - { url = "https://files.pythonhosted.org/packages/6d/ee/18146b7757ed4976276b9c9819108adbc73c5aad636e5353e20746b73069/scipy-1.17.1-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:a3472cfbca0a54177d0faa68f697d8ba4c80bbdc19908c3465556d9f7efce9ee", size = 20334032, upload-time = "2026-02-23T00:16:17.358Z" }, - { url = "https://files.pythonhosted.org/packages/ec/e6/cef1cf3557f0c54954198554a10016b6a03b2ec9e22a4e1df734936bd99c/scipy-1.17.1-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:766e0dc5a616d026a3a1cffa379af959671729083882f50307e18175797b3dfd", size = 22709533, upload-time = "2026-02-23T00:16:25.791Z" }, - { url = "https://files.pythonhosted.org/packages/4d/60/8804678875fc59362b0fb759ab3ecce1f09c10a735680318ac30da8cd76b/scipy-1.17.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:744b2bf3640d907b79f3fd7874efe432d1cf171ee721243e350f55234b4cec4c", size = 33062057, upload-time = "2026-02-23T00:16:36.931Z" }, - { url = "https://files.pythonhosted.org/packages/09/7d/af933f0f6e0767995b4e2d705a0665e454d1c19402aa7e895de3951ebb04/scipy-1.17.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:43af8d1f3bea642559019edfe64e9b11192a8978efbd1539d7bc2aaa23d92de4", size = 35349300, upload-time = "2026-02-23T00:16:49.108Z" }, - { url = "https://files.pythonhosted.org/packages/b4/3d/7ccbbdcbb54c8fdc20d3b6930137c782a163fa626f0aef920349873421ba/scipy-1.17.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:cd96a1898c0a47be4520327e01f874acfd61fb48a9420f8aa9f6483412ffa444", size = 35127333, upload-time = "2026-02-23T00:17:01.293Z" }, - { url = "https://files.pythonhosted.org/packages/e8/19/f926cb11c42b15ba08e3a71e376d816ac08614f769b4f47e06c3580c836a/scipy-1.17.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:4eb6c25dd62ee8d5edf68a8e1c171dd71c292fdae95d8aeb3dd7d7de4c364082", size = 37741314, upload-time = "2026-02-23T00:17:12.576Z" }, - { url = "https://files.pythonhosted.org/packages/95/da/0d1df507cf574b3f224ccc3d45244c9a1d732c81dcb26b1e8a766ae271a8/scipy-1.17.1-cp311-cp311-win_amd64.whl", hash = "sha256:d30e57c72013c2a4fe441c2fcb8e77b14e152ad48b5464858e07e2ad9fbfceff", size = 36607512, upload-time = "2026-02-23T00:17:23.424Z" }, - { url = "https://files.pythonhosted.org/packages/68/7f/bdd79ceaad24b671543ffe0ef61ed8e659440eb683b66f033454dcee90eb/scipy-1.17.1-cp311-cp311-win_arm64.whl", hash = "sha256:9ecb4efb1cd6e8c4afea0daa91a87fbddbce1b99d2895d151596716c0b2e859d", size = 24599248, upload-time = "2026-02-23T00:17:34.561Z" }, - { url = "https://files.pythonhosted.org/packages/35/48/b992b488d6f299dbe3f11a20b24d3dda3d46f1a635ede1c46b5b17a7b163/scipy-1.17.1-cp312-cp312-macosx_10_14_x86_64.whl", hash = "sha256:35c3a56d2ef83efc372eaec584314bd0ef2e2f0d2adb21c55e6ad5b344c0dcb8", size = 31610954, upload-time = "2026-02-23T00:17:49.855Z" }, - { url = "https://files.pythonhosted.org/packages/b2/02/cf107b01494c19dc100f1d0b7ac3cc08666e96ba2d64db7626066cee895e/scipy-1.17.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:fcb310ddb270a06114bb64bbe53c94926b943f5b7f0842194d585c65eb4edd76", size = 28172662, upload-time = "2026-02-23T00:18:01.64Z" }, - { url = "https://files.pythonhosted.org/packages/cf/a9/599c28631bad314d219cf9ffd40e985b24d603fc8a2f4ccc5ae8419a535b/scipy-1.17.1-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:cc90d2e9c7e5c7f1a482c9875007c095c3194b1cfedca3c2f3291cdc2bc7c086", size = 20344366, upload-time = "2026-02-23T00:18:12.015Z" }, - { url = "https://files.pythonhosted.org/packages/35/f5/906eda513271c8deb5af284e5ef0206d17a96239af79f9fa0aebfe0e36b4/scipy-1.17.1-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:c80be5ede8f3f8eded4eff73cc99a25c388ce98e555b17d31da05287015ffa5b", size = 22704017, upload-time = "2026-02-23T00:18:21.502Z" }, - { url = "https://files.pythonhosted.org/packages/da/34/16f10e3042d2f1d6b66e0428308ab52224b6a23049cb2f5c1756f713815f/scipy-1.17.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e19ebea31758fac5893a2ac360fedd00116cbb7628e650842a6691ba7ca28a21", size = 32927842, upload-time = "2026-02-23T00:18:35.367Z" }, - { url = "https://files.pythonhosted.org/packages/01/8e/1e35281b8ab6d5d72ebe9911edcdffa3f36b04ed9d51dec6dd140396e220/scipy-1.17.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:02ae3b274fde71c5e92ac4d54bc06c42d80e399fec704383dcd99b301df37458", size = 35235890, upload-time = "2026-02-23T00:18:49.188Z" }, - { url = "https://files.pythonhosted.org/packages/c5/5c/9d7f4c88bea6e0d5a4f1bc0506a53a00e9fcb198de372bfe4d3652cef482/scipy-1.17.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8a604bae87c6195d8b1045eddece0514d041604b14f2727bbc2b3020172045eb", size = 35003557, upload-time = "2026-02-23T00:18:54.74Z" }, - { url = "https://files.pythonhosted.org/packages/65/94/7698add8f276dbab7a9de9fb6b0e02fc13ee61d51c7c3f85ac28b65e1239/scipy-1.17.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:f590cd684941912d10becc07325a3eeb77886fe981415660d9265c4c418d0bea", size = 37625856, upload-time = "2026-02-23T00:19:00.307Z" }, - { url = "https://files.pythonhosted.org/packages/a2/84/dc08d77fbf3d87d3ee27f6a0c6dcce1de5829a64f2eae85a0ecc1f0daa73/scipy-1.17.1-cp312-cp312-win_amd64.whl", hash = "sha256:41b71f4a3a4cab9d366cd9065b288efc4d4f3c0b37a91a8e0947fb5bd7f31d87", size = 36549682, upload-time = "2026-02-23T00:19:07.67Z" }, - { url = "https://files.pythonhosted.org/packages/bc/98/fe9ae9ffb3b54b62559f52dedaebe204b408db8109a8c66fdd04869e6424/scipy-1.17.1-cp312-cp312-win_arm64.whl", hash = "sha256:f4115102802df98b2b0db3cce5cb9b92572633a1197c77b7553e5203f284a5b3", size = 24547340, upload-time = "2026-02-23T00:19:12.024Z" }, { url = "https://files.pythonhosted.org/packages/76/27/07ee1b57b65e92645f219b37148a7e7928b82e2b5dbeccecb4dff7c64f0b/scipy-1.17.1-cp313-cp313-macosx_10_14_x86_64.whl", hash = "sha256:5e3c5c011904115f88a39308379c17f91546f77c1667cea98739fe0fccea804c", size = 31590199, upload-time = "2026-02-23T00:19:17.192Z" }, { url = "https://files.pythonhosted.org/packages/ec/ae/db19f8ab842e9b724bf5dbb7db29302a91f1e55bc4d04b1025d6d605a2c5/scipy-1.17.1-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:6fac755ca3d2c3edcb22f479fceaa241704111414831ddd3bc6056e18516892f", size = 28154001, upload-time = "2026-02-23T00:19:22.241Z" }, { url = "https://files.pythonhosted.org/packages/5b/58/3ce96251560107b381cbd6e8413c483bbb1228a6b919fa8652b0d4090e7f/scipy-1.17.1-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:7ff200bf9d24f2e4d5dc6ee8c3ac64d739d3a89e2326ba68aaf6c4a2b838fd7d", size = 20325719, upload-time = "2026-02-23T00:19:26.329Z" }, @@ -2384,32 +1694,11 @@ name = "sqlalchemy" version = "2.0.49" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "greenlet", marker = "(python_full_version >= '3.12' and platform_machine == 'AMD64') or (python_full_version >= '3.12' and platform_machine == 'WIN32') or (python_full_version >= '3.12' and platform_machine == 'aarch64') or (python_full_version >= '3.12' and platform_machine == 'amd64') or (python_full_version >= '3.12' and platform_machine == 'ppc64le') or (python_full_version >= '3.12' and platform_machine == 'win32') or (python_full_version >= '3.12' and platform_machine == 'x86_64')" }, - { name = "typing-extensions", marker = "python_full_version >= '3.12'" }, + { name = "greenlet", marker = "platform_machine == 'AMD64' or platform_machine == 'WIN32' or platform_machine == 'aarch64' or platform_machine == 'amd64' or platform_machine == 'ppc64le' or platform_machine == 'win32' or platform_machine == 'x86_64'" }, + { name = "typing-extensions" }, ] sdist = { url = "https://files.pythonhosted.org/packages/09/45/461788f35e0364a8da7bda51a1fe1b09762d0c32f12f63727998d85a873b/sqlalchemy-2.0.49.tar.gz", hash = "sha256:d15950a57a210e36dd4cec1aac22787e2a4d57ba9318233e2ef8b2daf9ff2d5f", size = 9898221, upload-time = "2026-04-03T16:38:11.704Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/96/76/f908955139842c362aa877848f42f9249642d5b69e06cee9eae5111da1bd/sqlalchemy-2.0.49-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:42e8804962f9e6f4be2cbaedc0c3718f08f60a16910fa3d86da5a1e3b1bfe60f", size = 2159321, upload-time = "2026-04-03T16:50:11.8Z" }, - { url = "https://files.pythonhosted.org/packages/24/e2/17ba0b7bfbd8de67196889b6d951de269e8a46057d92baca162889beb16d/sqlalchemy-2.0.49-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:cc992c6ed024c8c3c592c5fc9846a03dd68a425674900c70122c77ea16c5fb0b", size = 3238937, upload-time = "2026-04-03T16:54:45.731Z" }, - { url = "https://files.pythonhosted.org/packages/90/1e/410dd499c039deacff395eec01a9da057125fcd0c97e3badc252c6a2d6a7/sqlalchemy-2.0.49-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:6eb188b84269f357669b62cb576b5b918de10fb7c728a005fa0ebb0b758adce1", size = 3237188, upload-time = "2026-04-03T16:56:53.217Z" }, - { url = "https://files.pythonhosted.org/packages/ab/06/e797a8b98a3993ac4bc785309b9b6d005457fc70238ee6cefa7c8867a92e/sqlalchemy-2.0.49-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:62557958002b69699bdb7f5137c6714ca1133f045f97b3903964f47db97ea339", size = 3190061, upload-time = "2026-04-03T16:54:47.489Z" }, - { url = "https://files.pythonhosted.org/packages/44/d3/5a9f7ef580af1031184b38235da6ac58c3b571df01c9ec061c44b2b0c5a6/sqlalchemy-2.0.49-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:da9b91bca419dc9b9267ffadde24eae9b1a6bffcd09d0a207e5e3af99a03ce0d", size = 3211477, upload-time = "2026-04-03T16:56:55.056Z" }, - { url = "https://files.pythonhosted.org/packages/69/ec/7be8c8cb35f038e963a203e4fe5a028989167cc7299927b7cf297c271e37/sqlalchemy-2.0.49-cp310-cp310-win32.whl", hash = "sha256:5e61abbec255be7b122aa461021daa7c3f310f3e743411a67079f9b3cc91ece3", size = 2119965, upload-time = "2026-04-03T17:00:50.009Z" }, - { url = "https://files.pythonhosted.org/packages/b5/31/0defb93e3a10b0cf7d1271aedd87251a08c3a597ee4f353281769b547b5a/sqlalchemy-2.0.49-cp310-cp310-win_amd64.whl", hash = "sha256:0c98c59075b890df8abfcc6ad632879540f5791c68baebacb4f833713b510e75", size = 2142935, upload-time = "2026-04-03T17:00:51.675Z" }, - { url = "https://files.pythonhosted.org/packages/60/b5/e3617cc67420f8f403efebd7b043128f94775e57e5b84e7255203390ceae/sqlalchemy-2.0.49-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:c5070135e1b7409c4161133aa525419b0062088ed77c92b1da95366ec5cbebbe", size = 2159126, upload-time = "2026-04-03T16:50:13.242Z" }, - { url = "https://files.pythonhosted.org/packages/20/9b/91ca80403b17cd389622a642699e5f6564096b698e7cdcbcbb6409898bc4/sqlalchemy-2.0.49-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:9ac7a3e245fd0310fd31495eb61af772e637bdf7d88ee81e7f10a3f271bff014", size = 3315509, upload-time = "2026-04-03T16:54:49.332Z" }, - { url = "https://files.pythonhosted.org/packages/b1/61/0722511d98c54de95acb327824cb759e8653789af2b1944ab1cc69d32565/sqlalchemy-2.0.49-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:4d4e5a0ceba319942fa6b585cf82539288a61e314ef006c1209f734551ab9536", size = 3315014, upload-time = "2026-04-03T16:56:56.376Z" }, - { url = "https://files.pythonhosted.org/packages/46/55/d514a653ffeb4cebf4b54c47bec32ee28ad89d39fafba16eeed1d81dccd5/sqlalchemy-2.0.49-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:3ddcb27fb39171de36e207600116ac9dfd4ae46f86c82a9bf3934043e80ebb88", size = 3267388, upload-time = "2026-04-03T16:54:51.272Z" }, - { url = "https://files.pythonhosted.org/packages/2f/16/0dcc56cb6d3335c1671a2258f5d2cb8267c9a2260e27fde53cbfb1b3540a/sqlalchemy-2.0.49-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:32fe6a41ad97302db2931f05bb91abbcc65b5ce4c675cd44b972428dd2947700", size = 3289602, upload-time = "2026-04-03T16:56:57.63Z" }, - { url = "https://files.pythonhosted.org/packages/51/6c/f8ab6fb04470a133cd80608db40aa292e6bae5f162c3a3d4ab19544a67af/sqlalchemy-2.0.49-cp311-cp311-win32.whl", hash = "sha256:46d51518d53edfbe0563662c96954dc8fcace9832332b914375f45a99b77cc9a", size = 2119044, upload-time = "2026-04-03T17:00:53.455Z" }, - { url = "https://files.pythonhosted.org/packages/c4/59/55a6d627d04b6ebb290693681d7683c7da001eddf90b60cfcc41ee907978/sqlalchemy-2.0.49-cp311-cp311-win_amd64.whl", hash = "sha256:951d4a210744813be63019f3df343bf233b7432aadf0db54c75802247330d3af", size = 2143642, upload-time = "2026-04-03T17:00:54.769Z" }, - { url = "https://files.pythonhosted.org/packages/49/b3/2de412451330756aaaa72d27131db6dde23995efe62c941184e15242a5fa/sqlalchemy-2.0.49-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4bbccb45260e4ff1b7db0be80a9025bb1e6698bdb808b83fff0000f7a90b2c0b", size = 2157681, upload-time = "2026-04-03T16:53:07.132Z" }, - { url = "https://files.pythonhosted.org/packages/50/84/b2a56e2105bd11ebf9f0b93abddd748e1a78d592819099359aa98134a8bf/sqlalchemy-2.0.49-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fb37f15714ec2652d574f021d479e78cd4eb9d04396dca36568fdfffb3487982", size = 3338976, upload-time = "2026-04-03T17:07:40Z" }, - { url = "https://files.pythonhosted.org/packages/2c/fa/65fcae2ed62f84ab72cf89536c7c3217a156e71a2c111b1305ab6f0690e2/sqlalchemy-2.0.49-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:3bb9ec6436a820a4c006aad1ac351f12de2f2dbdaad171692ee457a02429b672", size = 3351937, upload-time = "2026-04-03T17:12:23.374Z" }, - { url = "https://files.pythonhosted.org/packages/f8/2f/6fd118563572a7fe475925742eb6b3443b2250e346a0cc27d8d408e73773/sqlalchemy-2.0.49-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8d6efc136f44a7e8bc8088507eaabbb8c2b55b3dbb63fe102c690da0ddebe55e", size = 3281646, upload-time = "2026-04-03T17:07:41.949Z" }, - { url = "https://files.pythonhosted.org/packages/c5/d7/410f4a007c65275b9cf82354adb4bb8ba587b176d0a6ee99caa16fe638f8/sqlalchemy-2.0.49-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:e06e617e3d4fd9e51d385dfe45b077a41e9d1b033a7702551e3278ac597dc750", size = 3316695, upload-time = "2026-04-03T17:12:25.642Z" }, - { url = "https://files.pythonhosted.org/packages/d9/95/81f594aa60ded13273a844539041ccf1e66c5a7bed0a8e27810a3b52d522/sqlalchemy-2.0.49-cp312-cp312-win32.whl", hash = "sha256:83101a6930332b87653886c01d1ee7e294b1fe46a07dd9a2d2b4f91bcc88eec0", size = 2117483, upload-time = "2026-04-03T17:05:40.896Z" }, - { url = "https://files.pythonhosted.org/packages/47/9e/fd90114059175cac64e4fafa9bf3ac20584384d66de40793ae2e2f26f3bb/sqlalchemy-2.0.49-cp312-cp312-win_amd64.whl", hash = "sha256:618a308215b6cececb6240b9abde545e3acdabac7ae3e1d4e666896bf5ba44b4", size = 2144494, upload-time = "2026-04-03T17:05:42.282Z" }, { url = "https://files.pythonhosted.org/packages/ae/81/81755f50eb2478eaf2049728491d4ea4f416c1eb013338682173259efa09/sqlalchemy-2.0.49-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:df2d441bacf97022e81ad047e1597552eb3f83ca8a8f1a1fdd43cd7fe3898120", size = 2154547, upload-time = "2026-04-03T16:53:08.64Z" }, { url = "https://files.pythonhosted.org/packages/a2/bc/3494270da80811d08bcfa247404292428c4fe16294932bce5593f215cad9/sqlalchemy-2.0.49-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:8e20e511dc15265fb433571391ba313e10dd8ea7e509d51686a51313b4ac01a2", size = 3280782, upload-time = "2026-04-03T17:07:43.508Z" }, { url = "https://files.pythonhosted.org/packages/cd/f5/038741f5e747a5f6ea3e72487211579d8cbea5eb9827a9cbd61d0108c4bd/sqlalchemy-2.0.49-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:47604cb2159f8bbd5a1ab48a714557156320f20871ee64d550d8bf2683d980d3", size = 3297156, upload-time = "2026-04-03T17:12:27.697Z" }, @@ -2444,9 +1733,9 @@ name = "stack-data" version = "0.6.3" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "asttokens", marker = "python_full_version >= '3.11'" }, - { name = "executing", marker = "python_full_version >= '3.11'" }, - { name = "pure-eval", marker = "python_full_version >= '3.11'" }, + { name = "asttokens" }, + { name = "executing" }, + { name = "pure-eval" }, ] sdist = { url = "https://files.pythonhosted.org/packages/28/e3/55dcc2cfbc3ca9c29519eb6884dd1415ecb53b0e934862d3559ddcb7e20b/stack_data-0.6.3.tar.gz", hash = "sha256:836a778de4fec4dcd1dcd89ed8abff8a221f58308462e1c4aa2a3cf30148f0b9", size = 44707, upload-time = "2023-09-30T13:58:05.479Z" } wheels = [ @@ -2467,32 +1756,14 @@ name = "statsmodels" version = "0.14.6" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "numpy", version = "2.4.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, - { name = "packaging", marker = "python_full_version >= '3.12'" }, - { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, - { name = "patsy", marker = "python_full_version >= '3.12'" }, - { name = "scipy", version = "1.17.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, + { name = "numpy" }, + { name = "packaging" }, + { name = "pandas" }, + { name = "patsy" }, + { name = "scipy" }, ] sdist = { url = "https://files.pythonhosted.org/packages/0d/81/e8d74b34f85285f7335d30c5e3c2d7c0346997af9f3debf9a0a9a63de184/statsmodels-0.14.6.tar.gz", hash = "sha256:4d17873d3e607d398b85126cd4ed7aad89e4e9d89fc744cdab1af3189a996c2a", size = 20689085, upload-time = "2025-12-05T23:08:39.522Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/b5/6d/9ec309a175956f88eb8420ac564297f37cf9b1f73f89db74da861052dc29/statsmodels-0.14.6-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:f4ff0649a2df674c7ffb6fa1a06bffdb82a6adf09a48e90e000a15a6aaa734b0", size = 10142419, upload-time = "2025-12-05T19:27:35.625Z" }, - { url = "https://files.pythonhosted.org/packages/86/8f/338c5568315ec5bf3ac7cd4b71e34b98cb3b0f834919c0c04a0762f878a1/statsmodels-0.14.6-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:109012088b3e370080846ab053c76d125268631410142daad2f8c10770e8e8d9", size = 10022819, upload-time = "2025-12-05T19:27:49.385Z" }, - { url = "https://files.pythonhosted.org/packages/b0/77/5fc4cbc2d608f9b483b0675f82704a8bcd672962c379fe4d82100d388dbf/statsmodels-0.14.6-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e93bd5d220f3cb6fc5fc1bffd5b094966cab8ee99f6c57c02e95710513d6ac3f", size = 10118927, upload-time = "2025-12-05T23:07:51.256Z" }, - { url = "https://files.pythonhosted.org/packages/94/55/b86c861c32186403fe121d9ab27bc16d05839b170d92a978beb33abb995e/statsmodels-0.14.6-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:06eec42d682fdb09fe5d70a05930857efb141754ec5a5056a03304c1b5e32fd9", size = 10413015, upload-time = "2025-12-05T23:08:53.95Z" }, - { url = "https://files.pythonhosted.org/packages/f9/be/daf0dba729ccdc4176605f4a0fd5cfe71cdda671749dca10e74a732b8b1c/statsmodels-0.14.6-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:0444e88557df735eda7db330806fe09d51c9f888bb1f5906cb3a61fb1a3ed4a8", size = 10441248, upload-time = "2025-12-05T23:09:09.353Z" }, - { url = "https://files.pythonhosted.org/packages/9a/1c/2e10b7c7cc44fa418272996bf0427b8016718fd62f995d9c1f7ab37adf35/statsmodels-0.14.6-cp310-cp310-win_amd64.whl", hash = "sha256:e83a9abe653835da3b37fb6ae04b45480c1de11b3134bd40b09717192a1456ea", size = 9583410, upload-time = "2025-12-05T19:28:02.086Z" }, - { url = "https://files.pythonhosted.org/packages/a9/4d/df4dd089b406accfc3bb5ee53ba29bb3bdf5ae61643f86f8f604baa57656/statsmodels-0.14.6-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:6ad5c2810fc6c684254a7792bf1cbaf1606cdee2a253f8bd259c43135d87cfb4", size = 10121514, upload-time = "2025-12-05T19:28:16.521Z" }, - { url = "https://files.pythonhosted.org/packages/82/af/ec48daa7f861f993b91a0dcc791d66e1cf56510a235c5cbd2ab991a31d5c/statsmodels-0.14.6-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:341fa68a7403e10a95c7b6e41134b0da3a7b835ecff1eb266294408535a06eb6", size = 10003346, upload-time = "2025-12-05T19:28:29.568Z" }, - { url = "https://files.pythonhosted.org/packages/a9/2c/c8f7aa24cd729970728f3f98822fb45149adc216f445a9301e441f7ac760/statsmodels-0.14.6-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:bdf1dfe2a3ca56f5529118baf33a13efed2783c528f4a36409b46bbd2d9d48eb", size = 10129872, upload-time = "2025-12-05T23:09:25.724Z" }, - { url = "https://files.pythonhosted.org/packages/40/c6/9ae8e9b0721e9b6eb5f340c3a0ce8cd7cce4f66e03dd81f80d60f111987f/statsmodels-0.14.6-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a3764ba8195c9baf0925a96da0743ff218067a269f01d155ca3558deed2658ca", size = 10381964, upload-time = "2025-12-05T23:09:41.326Z" }, - { url = "https://files.pythonhosted.org/packages/28/8c/cf3d30c8c2da78e2ad1f50ade8b7fabec3ff4cdfc56fbc02e097c4577f90/statsmodels-0.14.6-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:9e8d2e519852adb1b420e018f5ac6e6684b2b877478adf7fda2cfdb58f5acb5d", size = 10409611, upload-time = "2025-12-05T23:09:57.131Z" }, - { url = "https://files.pythonhosted.org/packages/bf/cc/018f14ecb58c6cb89de9d52695740b7d1f5a982aa9ea312483ea3c3d5f77/statsmodels-0.14.6-cp311-cp311-win_amd64.whl", hash = "sha256:2738a00fca51196f5a7d44b06970ace6b8b30289839e4808d656f8a98e35faa7", size = 9580385, upload-time = "2025-12-05T19:28:42.778Z" }, - { url = "https://files.pythonhosted.org/packages/25/ce/308e5e5da57515dd7cab3ec37ea2d5b8ff50bef1fcc8e6d31456f9fae08e/statsmodels-0.14.6-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:fe76140ae7adc5ff0e60a3f0d56f4fffef484efa803c3efebf2fcd734d72ecb5", size = 10091932, upload-time = "2025-12-05T19:28:55.446Z" }, - { url = "https://files.pythonhosted.org/packages/05/30/affbabf3c27fb501ec7b5808230c619d4d1a4525c07301074eb4bda92fa9/statsmodels-0.14.6-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:26d4f0ed3b31f3c86f83a92f5c1f5cbe63fc992cd8915daf28ca49be14463a1c", size = 9997345, upload-time = "2025-12-05T19:29:10.278Z" }, - { url = "https://files.pythonhosted.org/packages/48/f5/3a73b51e6450c31652c53a8e12e24eac64e3824be816c0c2316e7dbdcb7d/statsmodels-0.14.6-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:d8c00a42863e4f4733ac9d078bbfad816249c01451740e6f5053ecc7db6d6368", size = 10058649, upload-time = "2025-12-05T23:10:12.775Z" }, - { url = "https://files.pythonhosted.org/packages/81/68/dddd76117df2ef14c943c6bbb6618be5c9401280046f4ddfc9fb4596a1b8/statsmodels-0.14.6-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:19b58cf7474aa9e7e3b0771a66537148b2df9b5884fbf156096c0e6c1ff0469d", size = 10339446, upload-time = "2025-12-05T23:10:28.503Z" }, - { url = "https://files.pythonhosted.org/packages/56/4a/dce451c74c4050535fac1ec0c14b80706d8fc134c9da22db3c8a0ec62c33/statsmodels-0.14.6-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:81e7dcc5e9587f2567e52deaff5220b175bf2f648951549eae5fc9383b62bc37", size = 10368705, upload-time = "2025-12-05T23:10:44.339Z" }, - { url = "https://files.pythonhosted.org/packages/60/15/3daba2df40be8b8a9a027d7f54c8dedf24f0d81b96e54b52293f5f7e3418/statsmodels-0.14.6-cp312-cp312-win_amd64.whl", hash = "sha256:b5eb07acd115aa6208b4058211138393a7e6c2cf12b6f213ede10f658f6a714f", size = 9543991, upload-time = "2025-12-05T23:10:58.536Z" }, { url = "https://files.pythonhosted.org/packages/81/59/a5aad5b0cc266f5be013db8cde563ac5d2a025e7efc0c328d83b50c72992/statsmodels-0.14.6-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:47ee7af083623d2091954fa71c7549b8443168f41b7c5dce66510274c50fd73e", size = 10072009, upload-time = "2025-12-05T23:11:14.021Z" }, { url = "https://files.pythonhosted.org/packages/53/dd/d8cfa7922fc6dc3c56fa6c59b348ea7de829a94cd73208c6f8202dd33f17/statsmodels-0.14.6-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:aa60d82e29fcd0a736e86feb63a11d2380322d77a9369a54be8b0965a3985f71", size = 9980018, upload-time = "2025-12-05T23:11:30.907Z" }, { url = "https://files.pythonhosted.org/packages/ee/77/0ec96803eba444efd75dba32f2ef88765ae3e8f567d276805391ec2c98c6/statsmodels-0.14.6-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:89ee7d595f5939cc20bf946faedcb5137d975f03ae080f300ebb4398f16a5bd4", size = 10060269, upload-time = "2025-12-05T23:11:46.338Z" }, @@ -2537,60 +1808,6 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/32/d5/f9a850d79b0851d1d4ef6456097579a9005b31fea68726a4ae5f2d82ddd9/threadpoolctl-3.6.0-py3-none-any.whl", hash = "sha256:43a0b8fd5a2928500110039e43a5eed8480b918967083ea48dc3ab9f13c4a7fb", size = 18638, upload-time = "2025-03-13T13:49:21.846Z" }, ] -[[package]] -name = "tomli" -version = "2.4.0" -source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/82/30/31573e9457673ab10aa432461bee537ce6cef177667deca369efb79df071/tomli-2.4.0.tar.gz", hash = "sha256:aa89c3f6c277dd275d8e243ad24f3b5e701491a860d5121f2cdd399fbb31fc9c", size = 17477, upload-time = "2026-01-11T11:22:38.165Z" } -wheels = [ - { url = "https://files.pythonhosted.org/packages/3c/d9/3dc2289e1f3b32eb19b9785b6a006b28ee99acb37d1d47f78d4c10e28bf8/tomli-2.4.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:b5ef256a3fd497d4973c11bf142e9ed78b150d36f5773f1ca6088c230ffc5867", size = 153663, upload-time = "2026-01-11T11:21:45.27Z" }, - { url = "https://files.pythonhosted.org/packages/51/32/ef9f6845e6b9ca392cd3f64f9ec185cc6f09f0a2df3db08cbe8809d1d435/tomli-2.4.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:5572e41282d5268eb09a697c89a7bee84fae66511f87533a6f88bd2f7b652da9", size = 148469, upload-time = "2026-01-11T11:21:46.873Z" }, - { url = "https://files.pythonhosted.org/packages/d6/c2/506e44cce89a8b1b1e047d64bd495c22c9f71f21e05f380f1a950dd9c217/tomli-2.4.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:551e321c6ba03b55676970b47cb1b73f14a0a4dce6a3e1a9458fd6d921d72e95", size = 236039, upload-time = "2026-01-11T11:21:48.503Z" }, - { url = "https://files.pythonhosted.org/packages/b3/40/e1b65986dbc861b7e986e8ec394598187fa8aee85b1650b01dd925ca0be8/tomli-2.4.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:5e3f639a7a8f10069d0e15408c0b96a2a828cfdec6fca05296ebcdcc28ca7c76", size = 243007, upload-time = "2026-01-11T11:21:49.456Z" }, - { url = "https://files.pythonhosted.org/packages/9c/6f/6e39ce66b58a5b7ae572a0f4352ff40c71e8573633deda43f6a379d56b3e/tomli-2.4.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:1b168f2731796b045128c45982d3a4874057626da0e2ef1fdd722848b741361d", size = 240875, upload-time = "2026-01-11T11:21:50.755Z" }, - { url = "https://files.pythonhosted.org/packages/aa/ad/cb089cb190487caa80204d503c7fd0f4d443f90b95cf4ef5cf5aa0f439b0/tomli-2.4.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:133e93646ec4300d651839d382d63edff11d8978be23da4cc106f5a18b7d0576", size = 246271, upload-time = "2026-01-11T11:21:51.81Z" }, - { url = "https://files.pythonhosted.org/packages/0b/63/69125220e47fd7a3a27fd0de0c6398c89432fec41bc739823bcc66506af6/tomli-2.4.0-cp311-cp311-win32.whl", hash = "sha256:b6c78bdf37764092d369722d9946cb65b8767bfa4110f902a1b2542d8d173c8a", size = 96770, upload-time = "2026-01-11T11:21:52.647Z" }, - { url = "https://files.pythonhosted.org/packages/1e/0d/a22bb6c83f83386b0008425a6cd1fa1c14b5f3dd4bad05e98cf3dbbf4a64/tomli-2.4.0-cp311-cp311-win_amd64.whl", hash = "sha256:d3d1654e11d724760cdb37a3d7691f0be9db5fbdaef59c9f532aabf87006dbaa", size = 107626, upload-time = "2026-01-11T11:21:53.459Z" }, - { url = "https://files.pythonhosted.org/packages/2f/6d/77be674a3485e75cacbf2ddba2b146911477bd887dda9d8c9dfb2f15e871/tomli-2.4.0-cp311-cp311-win_arm64.whl", hash = "sha256:cae9c19ed12d4e8f3ebf46d1a75090e4c0dc16271c5bce1c833ac168f08fb614", size = 94842, upload-time = "2026-01-11T11:21:54.831Z" }, - { url = "https://files.pythonhosted.org/packages/3c/43/7389a1869f2f26dba52404e1ef13b4784b6b37dac93bac53457e3ff24ca3/tomli-2.4.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:920b1de295e72887bafa3ad9f7a792f811847d57ea6b1215154030cf131f16b1", size = 154894, upload-time = "2026-01-11T11:21:56.07Z" }, - { url = "https://files.pythonhosted.org/packages/e9/05/2f9bf110b5294132b2edf13fe6ca6ae456204f3d749f623307cbb7a946f2/tomli-2.4.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:7d6d9a4aee98fac3eab4952ad1d73aee87359452d1c086b5ceb43ed02ddb16b8", size = 149053, upload-time = "2026-01-11T11:21:57.467Z" }, - { url = "https://files.pythonhosted.org/packages/e8/41/1eda3ca1abc6f6154a8db4d714a4d35c4ad90adc0bcf700657291593fbf3/tomli-2.4.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:36b9d05b51e65b254ea6c2585b59d2c4cb91c8a3d91d0ed0f17591a29aaea54a", size = 243481, upload-time = "2026-01-11T11:21:58.661Z" }, - { url = "https://files.pythonhosted.org/packages/d2/6d/02ff5ab6c8868b41e7d4b987ce2b5f6a51d3335a70aa144edd999e055a01/tomli-2.4.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1c8a885b370751837c029ef9bc014f27d80840e48bac415f3412e6593bbc18c1", size = 251720, upload-time = "2026-01-11T11:22:00.178Z" }, - { url = "https://files.pythonhosted.org/packages/7b/57/0405c59a909c45d5b6f146107c6d997825aa87568b042042f7a9c0afed34/tomli-2.4.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8768715ffc41f0008abe25d808c20c3d990f42b6e2e58305d5da280ae7d1fa3b", size = 247014, upload-time = "2026-01-11T11:22:01.238Z" }, - { url = "https://files.pythonhosted.org/packages/2c/0e/2e37568edd944b4165735687cbaf2fe3648129e440c26d02223672ee0630/tomli-2.4.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:7b438885858efd5be02a9a133caf5812b8776ee0c969fea02c45e8e3f296ba51", size = 251820, upload-time = "2026-01-11T11:22:02.727Z" }, - { url = "https://files.pythonhosted.org/packages/5a/1c/ee3b707fdac82aeeb92d1a113f803cf6d0f37bdca0849cb489553e1f417a/tomli-2.4.0-cp312-cp312-win32.whl", hash = "sha256:0408e3de5ec77cc7f81960c362543cbbd91ef883e3138e81b729fc3eea5b9729", size = 97712, upload-time = "2026-01-11T11:22:03.777Z" }, - { url = "https://files.pythonhosted.org/packages/69/13/c07a9177d0b3bab7913299b9278845fc6eaaca14a02667c6be0b0a2270c8/tomli-2.4.0-cp312-cp312-win_amd64.whl", hash = "sha256:685306e2cc7da35be4ee914fd34ab801a6acacb061b6a7abca922aaf9ad368da", size = 108296, upload-time = "2026-01-11T11:22:04.86Z" }, - { url = "https://files.pythonhosted.org/packages/18/27/e267a60bbeeee343bcc279bb9e8fbed0cbe224bc7b2a3dc2975f22809a09/tomli-2.4.0-cp312-cp312-win_arm64.whl", hash = "sha256:5aa48d7c2356055feef06a43611fc401a07337d5b006be13a30f6c58f869e3c3", size = 94553, upload-time = "2026-01-11T11:22:05.854Z" }, - { url = "https://files.pythonhosted.org/packages/34/91/7f65f9809f2936e1f4ce6268ae1903074563603b2a2bd969ebbda802744f/tomli-2.4.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:84d081fbc252d1b6a982e1870660e7330fb8f90f676f6e78b052ad4e64714bf0", size = 154915, upload-time = "2026-01-11T11:22:06.703Z" }, - { url = "https://files.pythonhosted.org/packages/20/aa/64dd73a5a849c2e8f216b755599c511badde80e91e9bc2271baa7b2cdbb1/tomli-2.4.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:9a08144fa4cba33db5255f9b74f0b89888622109bd2776148f2597447f92a94e", size = 149038, upload-time = "2026-01-11T11:22:07.56Z" }, - { url = "https://files.pythonhosted.org/packages/9e/8a/6d38870bd3d52c8d1505ce054469a73f73a0fe62c0eaf5dddf61447e32fa/tomli-2.4.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c73add4bb52a206fd0c0723432db123c0c75c280cbd67174dd9d2db228ebb1b4", size = 242245, upload-time = "2026-01-11T11:22:08.344Z" }, - { url = "https://files.pythonhosted.org/packages/59/bb/8002fadefb64ab2669e5b977df3f5e444febea60e717e755b38bb7c41029/tomli-2.4.0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:1fb2945cbe303b1419e2706e711b7113da57b7db31ee378d08712d678a34e51e", size = 250335, upload-time = "2026-01-11T11:22:09.951Z" }, - { url = "https://files.pythonhosted.org/packages/a5/3d/4cdb6f791682b2ea916af2de96121b3cb1284d7c203d97d92d6003e91c8d/tomli-2.4.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:bbb1b10aa643d973366dc2cb1ad94f99c1726a02343d43cbc011edbfac579e7c", size = 245962, upload-time = "2026-01-11T11:22:11.27Z" }, - { url = "https://files.pythonhosted.org/packages/f2/4a/5f25789f9a460bd858ba9756ff52d0830d825b458e13f754952dd15fb7bb/tomli-2.4.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:4cbcb367d44a1f0c2be408758b43e1ffb5308abe0ea222897d6bfc8e8281ef2f", size = 250396, upload-time = "2026-01-11T11:22:12.325Z" }, - { url = "https://files.pythonhosted.org/packages/aa/2f/b73a36fea58dfa08e8b3a268750e6853a6aac2a349241a905ebd86f3047a/tomli-2.4.0-cp313-cp313-win32.whl", hash = "sha256:7d49c66a7d5e56ac959cb6fc583aff0651094ec071ba9ad43df785abc2320d86", size = 97530, upload-time = "2026-01-11T11:22:13.865Z" }, - { url = "https://files.pythonhosted.org/packages/3b/af/ca18c134b5d75de7e8dc551c5234eaba2e8e951f6b30139599b53de9c187/tomli-2.4.0-cp313-cp313-win_amd64.whl", hash = "sha256:3cf226acb51d8f1c394c1b310e0e0e61fecdd7adcb78d01e294ac297dd2e7f87", size = 108227, upload-time = "2026-01-11T11:22:15.224Z" }, - { url = "https://files.pythonhosted.org/packages/22/c3/b386b832f209fee8073c8138ec50f27b4460db2fdae9ffe022df89a57f9b/tomli-2.4.0-cp313-cp313-win_arm64.whl", hash = "sha256:d20b797a5c1ad80c516e41bc1fb0443ddb5006e9aaa7bda2d71978346aeb9132", size = 94748, upload-time = "2026-01-11T11:22:16.009Z" }, - { url = "https://files.pythonhosted.org/packages/f3/c4/84047a97eb1004418bc10bdbcfebda209fca6338002eba2dc27cc6d13563/tomli-2.4.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:26ab906a1eb794cd4e103691daa23d95c6919cc2fa9160000ac02370cc9dd3f6", size = 154725, upload-time = "2026-01-11T11:22:17.269Z" }, - { url = "https://files.pythonhosted.org/packages/a8/5d/d39038e646060b9d76274078cddf146ced86dc2b9e8bbf737ad5983609a0/tomli-2.4.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:20cedb4ee43278bc4f2fee6cb50daec836959aadaf948db5172e776dd3d993fc", size = 148901, upload-time = "2026-01-11T11:22:18.287Z" }, - { url = "https://files.pythonhosted.org/packages/73/e5/383be1724cb30f4ce44983d249645684a48c435e1cd4f8b5cded8a816d3c/tomli-2.4.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:39b0b5d1b6dd03684b3fb276407ebed7090bbec989fa55838c98560c01113b66", size = 243375, upload-time = "2026-01-11T11:22:19.154Z" }, - { url = "https://files.pythonhosted.org/packages/31/f0/bea80c17971c8d16d3cc109dc3585b0f2ce1036b5f4a8a183789023574f2/tomli-2.4.0-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a26d7ff68dfdb9f87a016ecfd1e1c2bacbe3108f4e0f8bcd2228ef9a766c787d", size = 250639, upload-time = "2026-01-11T11:22:20.168Z" }, - { url = "https://files.pythonhosted.org/packages/2c/8f/2853c36abbb7608e3f945d8a74e32ed3a74ee3a1f468f1ffc7d1cb3abba6/tomli-2.4.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:20ffd184fb1df76a66e34bd1b36b4a4641bd2b82954befa32fe8163e79f1a702", size = 246897, upload-time = "2026-01-11T11:22:21.544Z" }, - { url = "https://files.pythonhosted.org/packages/49/f0/6c05e3196ed5337b9fe7ea003e95fd3819a840b7a0f2bf5a408ef1dad8ed/tomli-2.4.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:75c2f8bbddf170e8effc98f5e9084a8751f8174ea6ccf4fca5398436e0320bc8", size = 254697, upload-time = "2026-01-11T11:22:23.058Z" }, - { url = "https://files.pythonhosted.org/packages/f3/f5/2922ef29c9f2951883525def7429967fc4d8208494e5ab524234f06b688b/tomli-2.4.0-cp314-cp314-win32.whl", hash = "sha256:31d556d079d72db7c584c0627ff3a24c5d3fb4f730221d3444f3efb1b2514776", size = 98567, upload-time = "2026-01-11T11:22:24.033Z" }, - { url = "https://files.pythonhosted.org/packages/7b/31/22b52e2e06dd2a5fdbc3ee73226d763b184ff21fc24e20316a44ccc4d96b/tomli-2.4.0-cp314-cp314-win_amd64.whl", hash = "sha256:43e685b9b2341681907759cf3a04e14d7104b3580f808cfde1dfdb60ada85475", size = 108556, upload-time = "2026-01-11T11:22:25.378Z" }, - { url = "https://files.pythonhosted.org/packages/48/3d/5058dff3255a3d01b705413f64f4306a141a8fd7a251e5a495e3f192a998/tomli-2.4.0-cp314-cp314-win_arm64.whl", hash = "sha256:3d895d56bd3f82ddd6faaff993c275efc2ff38e52322ea264122d72729dca2b2", size = 96014, upload-time = "2026-01-11T11:22:26.138Z" }, - { url = "https://files.pythonhosted.org/packages/b8/4e/75dab8586e268424202d3a1997ef6014919c941b50642a1682df43204c22/tomli-2.4.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:5b5807f3999fb66776dbce568cc9a828544244a8eb84b84b9bafc080c99597b9", size = 163339, upload-time = "2026-01-11T11:22:27.143Z" }, - { url = "https://files.pythonhosted.org/packages/06/e3/b904d9ab1016829a776d97f163f183a48be6a4deb87304d1e0116a349519/tomli-2.4.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:c084ad935abe686bd9c898e62a02a19abfc9760b5a79bc29644463eaf2840cb0", size = 159490, upload-time = "2026-01-11T11:22:28.399Z" }, - { url = "https://files.pythonhosted.org/packages/e3/5a/fc3622c8b1ad823e8ea98a35e3c632ee316d48f66f80f9708ceb4f2a0322/tomli-2.4.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0f2e3955efea4d1cfbcb87bc321e00dc08d2bcb737fd1d5e398af111d86db5df", size = 269398, upload-time = "2026-01-11T11:22:29.345Z" }, - { url = "https://files.pythonhosted.org/packages/fd/33/62bd6152c8bdd4c305ad9faca48f51d3acb2df1f8791b1477d46ff86e7f8/tomli-2.4.0-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0e0fe8a0b8312acf3a88077a0802565cb09ee34107813bba1c7cd591fa6cfc8d", size = 276515, upload-time = "2026-01-11T11:22:30.327Z" }, - { url = "https://files.pythonhosted.org/packages/4b/ff/ae53619499f5235ee4211e62a8d7982ba9e439a0fb4f2f351a93d67c1dd2/tomli-2.4.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:413540dce94673591859c4c6f794dfeaa845e98bf35d72ed59636f869ef9f86f", size = 273806, upload-time = "2026-01-11T11:22:32.56Z" }, - { url = "https://files.pythonhosted.org/packages/47/71/cbca7787fa68d4d0a9f7072821980b39fbb1b6faeb5f5cf02f4a5559fa28/tomli-2.4.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:0dc56fef0e2c1c470aeac5b6ca8cc7b640bb93e92d9803ddaf9ea03e198f5b0b", size = 281340, upload-time = "2026-01-11T11:22:33.505Z" }, - { url = "https://files.pythonhosted.org/packages/f5/00/d595c120963ad42474cf6ee7771ad0d0e8a49d0f01e29576ee9195d9ecdf/tomli-2.4.0-cp314-cp314t-win32.whl", hash = "sha256:d878f2a6707cc9d53a1be1414bbb419e629c3d6e67f69230217bb663e76b5087", size = 108106, upload-time = "2026-01-11T11:22:34.451Z" }, - { url = "https://files.pythonhosted.org/packages/de/69/9aa0c6a505c2f80e519b43764f8b4ba93b5a0bbd2d9a9de6e2b24271b9a5/tomli-2.4.0-cp314-cp314t-win_amd64.whl", hash = "sha256:2add28aacc7425117ff6364fe9e06a183bb0251b03f986df0e78e974047571fd", size = 120504, upload-time = "2026-01-11T11:22:35.764Z" }, - { url = "https://files.pythonhosted.org/packages/b3/9f/f1668c281c58cfae01482f7114a4b88d345e4c140386241a1a24dcc9e7bc/tomli-2.4.0-cp314-cp314t-win_arm64.whl", hash = "sha256:2b1e3b80e1d5e52e40e9b924ec43d81570f0e7d09d11081b797bc4692765a3d4", size = 99561, upload-time = "2026-01-11T11:22:36.624Z" }, - { url = "https://files.pythonhosted.org/packages/23/d1/136eb2cb77520a31e1f64cbae9d33ec6df0d78bdf4160398e86eec8a8754/tomli-2.4.0-py3-none-any.whl", hash = "sha256:1f776e7d669ebceb01dee46484485f43a4048746235e683bcdffacdf1fb4785a", size = 14477, upload-time = "2026-01-11T11:22:37.446Z" }, -] - [[package]] name = "torch" version = "2.11.0" @@ -2601,8 +1818,7 @@ dependencies = [ { name = "filelock" }, { name = "fsspec" }, { name = "jinja2" }, - { name = "networkx", version = "3.4.2", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.11'" }, - { name = "networkx", version = "3.6.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, + { name = "networkx" }, { name = "nvidia-cudnn-cu13", marker = "sys_platform == 'linux'" }, { name = "nvidia-cusparselt-cu13", marker = "sys_platform == 'linux'" }, { name = "nvidia-nccl-cu13", marker = "sys_platform == 'linux'" }, @@ -2613,18 +1829,6 @@ dependencies = [ { name = "typing-extensions" }, ] wheels = [ - { url = "https://files.pythonhosted.org/packages/ac/f2/c1690994afe461aae2d0cac62251e6802a703dec0a6c549c02ecd0de92a9/torch-2.11.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:2c0d7fcfbc0c4e8bb5ebc3907cbc0c6a0da1b8f82b1fc6e14e914fa0b9baf74e", size = 80526521, upload-time = "2026-03-23T18:12:06.86Z" }, - { url = "https://files.pythonhosted.org/packages/a4/f0/98ae802fa8c09d3149b0c8690741f3f5753c90e779bd28c9613257295945/torch-2.11.0-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:4cf8687f4aec3900f748d553483ef40e0ac38411c3c48d0a86a438f6d7a99b18", size = 419723025, upload-time = "2026-03-23T18:11:43.774Z" }, - { url = "https://files.pythonhosted.org/packages/f9/1e/18a9b10b4bd34f12d4e561c52b0ae7158707b8193c6cfc0aad2b48167090/torch-2.11.0-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:1b32ceda909818a03b112006709b02be1877240c31750a8d9c6b7bf5f2d8a6e5", size = 530589207, upload-time = "2026-03-23T18:11:23.756Z" }, - { url = "https://files.pythonhosted.org/packages/35/40/2d532e8c0e23705be9d1debce5bc37b68d59a39bda7584c26fe9668076fe/torch-2.11.0-cp310-cp310-win_amd64.whl", hash = "sha256:b3c712ae6fb8e7a949051a953fc412fe0a6940337336c3b6f905e905dac5157f", size = 114518313, upload-time = "2026-03-23T18:11:58.281Z" }, - { url = "https://files.pythonhosted.org/packages/ae/0d/98b410492609e34a155fa8b121b55c7dca229f39636851c3a9ec20edea21/torch-2.11.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:7b6a60d48062809f58595509c524b88e6ddec3ebe25833d6462eeab81e5f2ce4", size = 80529712, upload-time = "2026-03-23T18:12:02.608Z" }, - { url = "https://files.pythonhosted.org/packages/84/03/acea680005f098f79fd70c1d9d5ccc0cb4296ec2af539a0450108232fc0c/torch-2.11.0-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:d91aac77f24082809d2c5a93f52a5f085032740a1ebc9252a7b052ef5a4fddc6", size = 419718178, upload-time = "2026-03-23T18:10:46.675Z" }, - { url = "https://files.pythonhosted.org/packages/8c/8b/d7be22fbec9ffee6cff31a39f8750d4b3a65d349a286cf4aec74c2375662/torch-2.11.0-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:7aa2f9bbc6d4595ba72138026b2074be1233186150e9292865e04b7a63b8c67a", size = 530604548, upload-time = "2026-03-23T18:10:03.569Z" }, - { url = "https://files.pythonhosted.org/packages/d1/bd/9912d30b68845256aabbb4a40aeefeef3c3b20db5211ccda653544ada4b6/torch-2.11.0-cp311-cp311-win_amd64.whl", hash = "sha256:73e24aaf8f36ab90d95cd1761208b2eb70841c2a9ca1a3f9061b39fc5331b708", size = 114519675, upload-time = "2026-03-23T18:11:52.995Z" }, - { url = "https://files.pythonhosted.org/packages/6f/8b/69e3008d78e5cee2b30183340cc425081b78afc5eff3d080daab0adda9aa/torch-2.11.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:4b5866312ee6e52ea625cd211dcb97d6a2cdc1131a5f15cc0d87eec948f6dd34", size = 80606338, upload-time = "2026-03-23T18:11:34.781Z" }, - { url = "https://files.pythonhosted.org/packages/13/16/42e5915ebe4868caa6bac83a8ed59db57f12e9a61b7d749d584776ed53d5/torch-2.11.0-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:f99924682ef0aa6a4ab3b1b76f40dc6e273fca09f367d15a524266db100a723f", size = 419731115, upload-time = "2026-03-23T18:11:06.944Z" }, - { url = "https://files.pythonhosted.org/packages/1a/c9/82638ef24d7877510f83baf821f5619a61b45568ce21c0a87a91576510aa/torch-2.11.0-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:0f68f4ac6d95d12e896c3b7a912b5871619542ec54d3649cf48cc1edd4dd2756", size = 530712279, upload-time = "2026-03-23T18:10:31.481Z" }, - { url = "https://files.pythonhosted.org/packages/1c/ff/6756f1c7ee302f6d202120e0f4f05b432b839908f9071157302cedfc5232/torch-2.11.0-cp312-cp312-win_amd64.whl", hash = "sha256:fbf39280699d1b869f55eac536deceaa1b60bd6788ba74f399cc67e60a5fab10", size = 114556047, upload-time = "2026-03-23T18:10:55.931Z" }, { url = "https://files.pythonhosted.org/packages/87/89/5ea6722763acee56b045435fb84258db7375c48165ec8be7880ab2b281c5/torch-2.11.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:1e6debd97ccd3205bbb37eb806a9d8219e1139d15419982c09e23ef7d4369d18", size = 80606801, upload-time = "2026-03-23T18:10:18.649Z" }, { url = "https://files.pythonhosted.org/packages/32/d1/8ed2173589cbfe744ed54e5a73efc107c0085ba5777ee93a5f4c1ab90553/torch-2.11.0-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:63a68fa59de8f87acc7e85a5478bb2dddbb3392b7593ec3e78827c793c4b73fd", size = 419732382, upload-time = "2026-03-23T18:08:30.835Z" }, { url = "https://files.pythonhosted.org/packages/3d/e1/b73f7c575a4b8f87a5928f50a1e35416b5e27295d8be9397d5293e7e8d4c/torch-2.11.0-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:cc89b9b173d9adfab59fd227f0ab5e5516d9a52b658ae41d64e59d2e55a418db", size = 530711509, upload-time = "2026-03-23T18:08:47.213Z" }, @@ -2669,12 +1873,6 @@ name = "triton" version = "3.6.0" source = { registry = "https://pypi.org/simple" } wheels = [ - { url = "https://files.pythonhosted.org/packages/44/ba/b1b04f4b291a3205d95ebd24465de0e5bf010a2df27a4e58a9b5f039d8f2/triton-3.6.0-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6c723cfb12f6842a0ae94ac307dba7e7a44741d720a40cf0e270ed4a4e3be781", size = 175972180, upload-time = "2026-01-20T16:15:53.664Z" }, - { url = "https://files.pythonhosted.org/packages/8c/f7/f1c9d3424ab199ac53c2da567b859bcddbb9c9e7154805119f8bd95ec36f/triton-3.6.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a6550fae429e0667e397e5de64b332d1e5695b73650ee75a6146e2e902770bea", size = 188105201, upload-time = "2026-01-20T16:00:29.272Z" }, - { url = "https://files.pythonhosted.org/packages/0f/2c/96f92f3c60387e14cc45aed49487f3486f89ea27106c1b1376913c62abe4/triton-3.6.0-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:49df5ef37379c0c2b5c0012286f80174fcf0e073e5ade1ca9a86c36814553651", size = 176081190, upload-time = "2026-01-20T16:16:00.523Z" }, - { url = "https://files.pythonhosted.org/packages/e0/12/b05ba554d2c623bffa59922b94b0775673de251f468a9609bc9e45de95e9/triton-3.6.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:e8e323d608e3a9bfcc2d9efcc90ceefb764a82b99dea12a86d643c72539ad5d3", size = 188214640, upload-time = "2026-01-20T16:00:35.869Z" }, - { url = "https://files.pythonhosted.org/packages/17/5d/08201db32823bdf77a0e2b9039540080b2e5c23a20706ddba942924ebcd6/triton-3.6.0-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:374f52c11a711fd062b4bfbb201fd9ac0a5febd28a96fb41b4a0f51dde3157f4", size = 176128243, upload-time = "2026-01-20T16:16:07.857Z" }, - { url = "https://files.pythonhosted.org/packages/ab/a8/cdf8b3e4c98132f965f88c2313a4b493266832ad47fb52f23d14d4f86bb5/triton-3.6.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:74caf5e34b66d9f3a429af689c1c7128daba1d8208df60e81106b115c00d6fca", size = 188266850, upload-time = "2026-01-20T16:00:43.041Z" }, { url = "https://files.pythonhosted.org/packages/3c/12/34d71b350e89a204c2c7777a9bba0dcf2f19a5bfdd70b57c4dbc5ffd7154/triton-3.6.0-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:448e02fe6dc898e9e5aa89cf0ee5c371e99df5aa5e8ad976a80b93334f3494fd", size = 176133521, upload-time = "2026-01-20T16:16:13.321Z" }, { url = "https://files.pythonhosted.org/packages/f9/0b/37d991d8c130ce81a8728ae3c25b6e60935838e9be1b58791f5997b24a54/triton-3.6.0-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:10c7f76c6e72d2ef08df639e3d0d30729112f47a56b0c81672edc05ee5116ac9", size = 188289450, upload-time = "2026-01-20T16:00:49.136Z" }, { url = "https://files.pythonhosted.org/packages/ce/4e/41b0c8033b503fd3cfcd12392cdd256945026a91ff02452bef40ec34bee7/triton-3.6.0-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:1722e172d34e32abc3eb7711d0025bb69d7959ebea84e3b7f7a341cd7ed694d6", size = 176276087, upload-time = "2026-01-20T16:16:18.989Z" }, @@ -2753,7 +1951,7 @@ name = "wheel" version = "0.46.3" source = { registry = "https://pypi.org/simple" } dependencies = [ - { name = "packaging", marker = "python_full_version >= '3.11'" }, + { name = "packaging" }, ] sdist = { url = "https://files.pythonhosted.org/packages/89/24/a2eb353a6edac9a0303977c4cb048134959dd2a51b48a269dfc9dde00c8a/wheel-0.46.3.tar.gz", hash = "sha256:e3e79874b07d776c40bd6033f8ddf76a7dad46a7b8aa1b2787a83083519a1803", size = 60605, upload-time = "2026-01-22T12:39:49.136Z" } wheels = [