diff --git a/.env.example b/.env.example index 359b121..ea77878 100644 --- a/.env.example +++ b/.env.example @@ -1,36 +1,36 @@ # ============================================================================ -# AlphaBrain 环境变量模板 +# AlphaBrain Environment Variables Template # ---------------------------------------------------------------------------- -# 复制为 .env 后按本地路径填写。.env 已在 .gitignore,不会被提交。 +# Copy to .env and fill in your local paths. .env is in .gitignore and won't be committed. # cp .env.example .env # vim .env -# 启动脚本(run_script//train.sh, eval.sh)会在执行前自动 source .env, -# 同时 OmegaConf 通过 ${oc.env:VAR} 在 yaml 里引用这些变量。 +# Startup scripts (scripts/run_/train.sh, eval.sh) will automatically source .env +# before executing. OmegaConf can reference these variables in yaml via ${oc.env:VAR}. # ============================================================================ -# ── 持续学习训练必需 ──────────────────────────────────────────────────── -# 预训练模型根目录(包含 Qwen2.5-VL-3B-Instruct/、Llama-3.2-11B-Vision-Instruct/、 -# paligemma-3b-pt-224/ 等子目录) +# ── Required for Continual Learning Training ──────────────────────────── +# Root directory of pretrained models (contains subdirs like Qwen2.5-VL-3B-Instruct/, +# Llama-3.2-11B-Vision-Instruct/, paligemma-3b-pt-224/, etc.) PRETRAINED_MODELS_DIR=/path/to/pretrained_models -# LEROBOT 格式 LIBERO 数据根目录 -# 子目录: libero_goal_no_noops_1.0.0_lerobot/, libero_spatial_..., 等 +# Root directory for LeRobot-format LIBERO data +# Subdirs: libero_goal_no_noops_1.0.0_lerobot/, libero_spatial_..., etc. 
LEROBOT_LIBERO_DATA_DIR=/path/to/LEROBOT_LIBERO_DATA -# ── 评估必需 ──────────────────────────────────────────────────────────── -# RLDS 格式 LIBERO 数据根目录(IPEC / openvla 流程使用) +# ── Required for Evaluation ───────────────────────────────────────────── +# Root directory of RLDS-format LIBERO data (used by IPEC / OpenVLA pipelines) LIBERO_DATA_ROOT=/path/to/IPEC-COMMUNITY -# LIBERO Project 根目录(用于 simulation env) +# LIBERO Project root directory (for simulation env) LIBERO_HOME=/path/to/LIBERO -# 评估客户端 Python 解释器(独立 conda env,含 LIBERO 仿真依赖) +# Evaluation client Python interpreter (separate conda env with LIBERO simulation dependencies) LIBERO_PYTHON=/path/to/anaconda3/envs/AlphaBrain/bin/python -# ── 可选:W&B 追踪 ───────────────────────────────────────────────────── +# ── Optional: Weights & Biases Tracking ──────────────────────────────── # WANDB_API_KEY=your_wandb_api_key # WANDB_BASE_URL=https://api.wandb.ai -# ── 可选:其他 benchmark ─────────────────────────────────────────────── +# ── Optional: Other Benchmarks ───────────────────────────────────────── # ROBOCASA_TABLETOP_PYTHON=/path/to/anaconda3/envs/robocasa2/bin/python # ROBOCASA365_PYTHON=/path/to/anaconda3/envs/robocasa/bin/python diff --git a/.github/workflows/claude.yml b/.github/workflows/claude.yml new file mode 100644 index 0000000..1ea2e24 --- /dev/null +++ b/.github/workflows/claude.yml @@ -0,0 +1,31 @@ +name: Claude Code + +on: + issue_comment: + types: [created] + pull_request_review_comment: + types: [created] + issues: + types: [opened, assigned] + pull_request_review: + types: [submitted] + +permissions: + contents: write # allows Claude to push commits + pull-requests: write + issues: write + id-token: write + +jobs: + claude: + if: | + (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) || + (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) || + (github.event_name == 'issues' && contains(github.event.issue.body, 
'@claude')) || + (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) + runs-on: ubuntu-latest + steps: + - name: Claude Code Action Official + uses: anthropics/claude-code-action@v1 + with: + anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }} diff --git a/.gitignore b/.gitignore index b1c97f5..ac807ee 100644 --- a/.gitignore +++ b/.gitignore @@ -2,6 +2,8 @@ debug/ ckpts/ third_party/ .env +demo-video/ + # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] @@ -255,3 +257,4 @@ jeff_modify.md AlphaBrain.egg-info .nfs* *.egg-info +.gstack/ diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..be5f638 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,185 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## Project Overview + +This repo has two distinct layers: + +1. **AlphaBrain** — a modular PyTorch framework for VLA (Vision-Language-Action) robot models (training, continual learning, RL, world models). +2. **Nvex** — a product layer built on top: a FastAPI backend (`nvex_server/`) and a React investor demo (`demo/`) that showcases the autonomous self-improvement loop. + +Active development is focused on Nvex. AlphaBrain is the execution engine Nvex orchestrates. 
+ +## Nvex Demo Stack + +### Running locally + +Start the backend and frontend in two terminals: + +```bash +# Terminal 1 — FastAPI backend (auto-reloads, seeds demo state on startup) +.venv/bin/python -m uvicorn nvex_server.app:app --reload --port 8000 + +# Terminal 2 — React dev server (proxies /api and /health to :8000) +cd demo && npm install && npm run dev # → http://127.0.0.1:5173 +``` + +Standalone demo (no backend needed): +```bash +open demo/nvex-demo.html +``` + +### Frontend commands + +```bash +cd demo +npm run dev # dev server with hot reload +npm run build # production build → dist/ +npm run lint # ESLint +npm run preview # serve the built dist/ +``` + +### Backend lint + +```bash +ruff check nvex_server/ +black nvex_server/ # line-length 121 +``` + +## Nvex Architecture + +### `nvex_server/` — FastAPI backend + +All state is held in `InMemoryStore` (defined in `app.py`) — no database. `create_app()` seeds demo state on startup using fixture files in `nvex_server/examples/`. + +| File | Role | +|---|---| +| `app.py` | FastAPI app factory, `InMemoryStore`, all route handlers, `seed_demo_state()` | +| `schemas.py` | All Pydantic models — `EvalRun`, `PatchPlan`, `IterationJob`, `AgentRunState`, `AgentEvent`, etc. 
| +| `agent.py` | `SelfImprovementAgent` — autonomous eval→diagnose→plan→dispatch→verify→memory loop | +| `dispatcher.py` | `JobDispatcher` — builds and launches AlphaBrain shell commands; writes per-job metadata to `results/nvex_jobs/` | +| `exporters.py` | `EvalArtifactExporter` — parses LIBERO, RoboCasa, and generic JSON eval artifacts into `EvalRun` | +| `patch_plan_generator.py` | Rule-based `PatchPlanGenerator` — maps failure clusters to training strategies | +| `llm_narrator.py` | `LLMNarrator` — optional LLM-generated narration for diagnosis steps | + +**Key API routes:** + +| Route | Purpose | +|---|---| +| `GET /api/demo/state` | Returns full seeded dashboard state for the React app | +| `GET /api/demo/agent` | Returns the pre-seeded `AgentRunState` | +| `POST /api/agent/run` | Start a new autonomous improvement run | +| `POST /api/agent/{id}/advance` | Advance demo agent by one step (called by the UI stream loop) | +| `POST /api/eval/import` | Import benchmark artifacts into `EvalRun` | +| `POST /api/plan/generate` | Generate a `PatchPlan` from an `EvalRun` | +| `POST /api/iteration/start` | Create and optionally run an `IterationJob` | + +### Agent demo flow + +`SelfImprovementAgent` runs in two modes: + +- **simulate=True** (default for demos): A precomputed 4-loop sequence is stored in `_DEMO_LOOPS` in `agent.py`. Each call to `advance_step` marks the current running step complete and activates the next. Loop 3 demonstrates a regression + rollback (81% → 79%) before recovering to 85% in loop 4. +- **simulate=False** (real mode): Drives `JobDispatcher` → AlphaBrain shell commands in a background thread. Real-mode loop body is a M4 extension point. + +The React frontend calls `POST /api/agent/{id}/advance` on a timer driven by `step.expected_duration_ms`. The stream loop lives in `NvexRuntimeContext.jsx`. 
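The simulate-mode stepping described above (each `advance_step` call completes the running step and activates the next) can be sketched as follows. This is a minimal illustration, not the code in `agent.py`: the `Step` and `DemoRun` classes are hypothetical stand-ins for the real Pydantic models, and the step names are taken from the eval→diagnose→plan→dispatch→verify→memory loop named in the file table.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed step names, taken from the loop description in the file table.
STEP_NAMES = ["eval", "diagnose", "plan", "dispatch", "verify", "memory"]


@dataclass
class Step:
    """Hypothetical stand-in for a single agent step."""
    name: str
    status: str = "pending"  # pending -> running -> complete


@dataclass
class DemoRun:
    """Hypothetical stand-in for one precomputed loop of the demo agent."""
    steps: List[Step] = field(default_factory=lambda: [Step(n) for n in STEP_NAMES])

    def start(self) -> None:
        # Activate the first step so there is something to advance from.
        self.steps[0].status = "running"

    def advance_step(self) -> Optional[Step]:
        """Complete the running step and activate the next one, if any."""
        for i, step in enumerate(self.steps):
            if step.status == "running":
                step.status = "complete"
                if i + 1 < len(self.steps):
                    self.steps[i + 1].status = "running"
                    return self.steps[i + 1]
                return None  # the loop has finished
        return None  # nothing was running
```

In this sketch a caller (for example a UI timer) would call `advance_step()` repeatedly until it returns `None`, which mirrors how the frontend is described as driving `POST /api/agent/{id}/advance`.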
+ +### `demo/src/` — React frontend + +`NvexRuntimeContext.jsx` is the central data layer: it fetches `/api/demo/state`, normalizes the snake_case API response into camelCase dashboard shape (`normalizeDemoState`), and exposes stream controls (`streamAgentRun`, `advanceAgentStep`, `stopAgentStream`). + +Pages map to the 7-page investor narrative: Home → Overview → Failure Map → Patch Plan → Iteration Runner → Improvement Report → Platform Memory. + +No UI framework — pure CSS with dark `#07090f` theme and indigo-violet gradients. + +--- + +## AlphaBrain (ML Framework) + +### Environment Setup + +```bash +conda create -n alphabrain python=3.10 -y && conda activate alphabrain +pip install -r requirements.txt && pip install -e . +pip install flash-attn --no-build-isolation +cp .env.example .env +``` + +Key `.env` variables: +```bash +PRETRAINED_MODELS_DIR=/path/to/pretrained_models # Qwen2.5-VL, Qwen3-VL weights +LEROBOT_LIBERO_DATA_DIR=/path/to/lerobot/libero # LeRobot-format training data +LIBERO_DATA_ROOT=/path/to/IPEC-COMMUNITY # RLDS-format eval data +LIBERO_HOME=/path/to/LIBERO # LIBERO simulation env +LIBERO_PYTHON=/path/to/envs/libero/bin/python # Separate eval conda env +``` + +Evaluation (LIBERO) requires a **separate conda environment** — see `docs/quickstart/installation.md`. 
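Before launching training or evaluation, it may help to confirm that the `.env` variables listed above are set to real paths. The following is a minimal sketch under stated assumptions: `missing_env_vars` is a hypothetical helper (it does not exist in the repository), and treating values that still start with `/path/to/` as unset is an assumption based on the placeholders in `.env.example`.

```python
import os
from typing import List, Mapping

# Variable names as listed in the "Key .env variables" block.
REQUIRED_VARS = (
    "PRETRAINED_MODELS_DIR",
    "LEROBOT_LIBERO_DATA_DIR",
    "LIBERO_DATA_ROOT",
    "LIBERO_HOME",
    "LIBERO_PYTHON",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required variables that are unset, empty, or still a template placeholder."""
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        # Assumption: an untouched template value begins with "/path/to/".
        if not value or value.startswith("/path/to/"):
            missing.append(name)
    return missing
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to exercise without touching the real environment.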
+ +### Commands + +```bash +ruff check AlphaBrain/ # lint +black AlphaBrain/ # format (line-length 121) +python -c "import AlphaBrain; print('ok')" # verify install + +bash scripts/run_finetune.sh # main training entry point +bash scripts/run_continual_learning_scripts/run_cl_train.sh # CL training +bash scripts/run_continual_learning_scripts/run_cl_eval.sh --run-id +bash scripts/run_brain_inspired_scripts/run_neurovla_pretrain.sh +MODEL=cos2 bash scripts/run_world_model/train/run_world_model.sh +bash scripts/run_rl_scripts/run_action_token_5traj_alltasks.sh +``` + +### Package Layout + +``` +AlphaBrain/ +├── model/ +│ ├── framework/ # VLA model implementations (one file per architecture) +│ └── modules/ # Shared sub-modules: action_model, vlm, projector, dino, world_model +├── training/ +│ ├── train_alphabrain.py # Main trainer (Accelerate + DeepSpeed ZeRO-2) +│ ├── train_alphabrain_vlm.py # VLM co-training variant +│ ├── continual_learning/ # Experience-replay CL trainer +│ └── reinforcement_learning/ # RLActionToken TD3 (needs 6 GPUs) +└── dataloader/ + ├── lerobot_datasets.py # LeRobot-format datasets (main) + └── cosmos_datasets.py # World model datasets +``` + +### Config System (priority low → high) + +``` +configs/models/.yaml # architecture defaults +configs/datasets/.yaml # dataset paths & task lists +configs/trainer/default.yaml # optimizer, LR schedule, save intervals +configs/finetune_config.yaml # named modes with per-run overrides (highest priority) +CLI args # dot-list overrides, e.g. trainer.learning_rate.base=1e-5 +``` + +`configs/finetune_config.yaml` defines named **modes**. `scripts/parse_config.py` resolves them into shell env-vars consumed by `run_finetune.sh`. + +### Model Framework + +All VLA architectures extend `BaseFramework` (`AlphaBrain/model/framework/base_framework.py`), which handles config loading, action normalization, and trainable-module discovery. 
New frameworks register via `@FRAMEWORK_REGISTRY.register("MyFramework")` in `AlphaBrain/model/tools.py`. + +VLM backends are detected via `_VLM_REGISTRY` in `base_framework.py`: `paligemma` → `vlm_interface`, `llamavl` → `llama_vl_interface`, `qwenvl` → `qwen_vl_interface`. + +### Capability Modules + +| Module | Entry Point | Notes | +|---|---|---| +| NeuroVLA (SNN + STDP) | `train_stdp.py` | QFormer → LIF neurons; R-STDP and online STDP modes | +| RLActionToken (TD3) | `reinforcement_learning/` | Needs 6 GPUs (5 rollout + 1 train) | +| Continual Learning | `continual_learning/` | Experience replay; LoRA (~6% trainable params) | +| World Model | `WorldModelVLA.py` | Cosmos 2/2.5, Wan 2.2, V-JEPA 2.1 backbones | + +### Dispatcher → AlphaBrain bridge + +`JobDispatcher._build_command()` maps `ExecutionBackend` values to AlphaBrain shell commands: +- `alphabrain_cl` → `bash scripts/run_continual_learning_scripts/run_cl_train.sh` +- `alphabrain_finetune` / `alphabrain_vlm_cotrain` → `bash scripts/run_finetune.sh ` +- `alphabrain_eval` → `bash scripts/run_eval.sh ` + +Job logs and metadata land in `results/nvex_jobs//`. diff --git a/IMPLEMENTATION_PLAN.md b/IMPLEMENTATION_PLAN.md new file mode 100644 index 0000000..e2ca9ae --- /dev/null +++ b/IMPLEMENTATION_PLAN.md @@ -0,0 +1,236 @@ +# Nvex × AlphaBrain — Implementation Plan + +**Last Updated:** April 25, 2026 +**Based on:** PRD, existing codebase state, AlphaBrain capability audit + +--- + +## Current State + +### What exists and works today + +| Component | Status | Notes | +|-----------|--------|-------| +| AlphaBrain VLA frameworks (OFT, GR00T, PI, NeuroVLA, CosmosPolicy, etc.) 
| ✅ Complete | 11 architectures, unified trainer | +| Training stack (Accelerate + DeepSpeed ZeRO-2) | ✅ Complete | Multi-GPU, W&B logging | +| Continual Learning module | ✅ Complete | LoRA-based, experience replay, cross-arch | +| RL fine-tuning (RLActionToken / TD3) | ✅ Complete | Requires 6 GPUs | +| World Model integration (Cosmos, Wan, V-JEPA) | ✅ Complete | 4 backbones | +| Benchmark suites (LIBERO, LIBERO-plus, Robocasa, Robocasa365) | ✅ Complete | Eval scripts + artifacts | +| Config system (YAML + modes + CLI overrides) | ✅ Complete | | +| Deployment module (model_server, upload) | ✅ Partial | Basic server exists, not productionized | +| Nvex investor demo HTML (`demo/nvex-demo.html`) | ✅ Complete | All 7 pages, fully interactive | +| React demo app (`demo/src/`) | ✅ Complete | All 7 pages implemented with shared components, mock data, and build validation | +| Nvex backend / orchestration logic | ✅ Complete | `nvex_server/` now provides export, planning, dispatch, polling, report, and demo bootstrap endpoints | +| Real AlphaBrain ↔ Nvex job interface | ✅ Partial | `JobDispatcher` wraps AlphaBrain shell entry points and supports file-backed polling plus simulated demo jobs | + +--- + +## Milestone 1 — Narrative MVP (Demo-Ready) ✅ Complete + +**Goal:** A polished, clickable demo that tells the full Nvex story end-to-end. All data can be mocked or pre-generated. 
+ +### Deliverables +- [x] `demo/nvex-demo.html` — standalone 7-page demo, all pages implemented +- [x] README.md rewritten to position Nvex as orchestration layer for both investors and customers +- [x] PRD finalized (`prd.md`) +- [x] Frontend wireframe/IA documented (`frontend-design.md`) +- [x] React demo app — build all 7 pages and components to match `nvex-demo.html` +- [x] Add a second demo scenario (non-LIBERO) to show breadth + +### React App Components Needed +``` +demo/src/ +├── components/ +│ ├── Sidebar.jsx +│ ├── TopBar.jsx +│ ├── KPICard.jsx +│ ├── FailureCluster.jsx +│ ├── RadarChart.jsx +│ ├── TimelineStep.jsx +│ └── AssetCard.jsx +├── pages/ +│ ├── Home.jsx +│ ├── ProjectOverview.jsx +│ ├── FailureMap.jsx +│ ├── PatchPlan.jsx +│ ├── IterationRunner.jsx +│ ├── ImprovementReport.jsx +│ └── PlatformMemory.jsx +└── data/ + └── mockData.js +``` + +### Milestone 1 Exit Criteria Reached +- React demo builds successfully with Vite (`npm run build`) +- Shared component layer now covers KPI cards, failure clusters, radar chart, timeline steps, and asset cards +- The demo includes breadth beyond LIBERO via an additional RoboCasa scenario on the hub +- The standalone HTML and React demo now tell the same investor narrative at the page level + +--- + +## Milestone 2 — Executable MVP (Real Loop) ✅ Complete + +**Goal:** Wire at least one real AlphaBrain execution path into the Nvex demo. Produce a genuine before/after improvement artifact. 
+ +### 2A — Real Eval Artifact Ingestion +- [x] Define `EvalRun` schema (see PRD §8.2) +- [x] Write an AlphaBrain eval artifact exporter: converts benchmark output to `EvalRun` JSON +- [x] Load real LIBERO eval results into the Failure Map page +- [x] Replace mocked failure clusters with real per-task breakdown + +### 2B — Patch Plan Engine (Rule-Based v1) +- [x] Implement `PatchPlanGenerator` — maps failure cluster patterns to training strategy recommendations + - Rule: occlusion failures → target diverse viewpoint data + CL update + - Rule: recovery gaps → teleop corrections + fine-tune + - Rule: language variation failures → language augmentation + VLM co-training +- [x] Output structured `PatchPlan` JSON (see PRD §8.3) +- [x] Connect Patch Plan page to live generator + +### 2C — AlphaBrain Job Interface +- [x] Define `IterationJob` schema: `plan_id`, `execution_backend`, `checkpoint`, `config` +- [x] Implement `JobDispatcher`: wraps AlphaBrain training scripts as callable jobs + - Support `alphabrain_cl` (continual learning) + - Support `alphabrain_finetune` (baseline fine-tune) + - Support `alphabrain_eval` (re-evaluation only) +- [x] Implement job status polling (file-based or lightweight queue) +- [x] Wire Iteration Runner page to live job status + +### 2D — Improvement Report from Real Artifacts +- [x] Load before/after eval artifacts and compute actual uplift +- [x] Save patch recipe to Platform Memory as a `ReusableAsset` +- [x] Produce at least one real improvement case: LIBERO Kitchen, 62% → 74% + +### Infrastructure for Milestone 2 +- [x] FastAPI service (`nvex_server/`) wrapping the orchestration logic +- [x] `POST /api/eval/import` — ingest eval artifact +- [x] `POST /api/plan/generate` — run PatchPlanGenerator +- [x] `POST /api/iteration/start` — dispatch job to AlphaBrain +- [x] `GET /api/iteration/{id}/status` — poll job progress +- [x] `GET /api/report/{iteration_id}` — fetch improvement report +- [x] Update React demo to consume these endpoints 
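The rule-based mapping listed under 2B can be sketched as a simple lookup from failure cluster to strategy. This is an illustration only, not the actual `PatchPlanGenerator`: the cluster keys, the `Strategy` tuple, and the `recommend` function are hypothetical names, and the pairing of each rule with a backend identifier is an assumption drawn from the backends named elsewhere in this change.

```python
from typing import Dict, List, NamedTuple


class Strategy(NamedTuple):
    """Hypothetical pairing of a data target with an execution backend."""
    data_target: str
    backend: str


# Assumed cluster keys; the three rules follow the 2B checklist.
RULES: Dict[str, Strategy] = {
    "occlusion": Strategy("diverse viewpoint data", "alphabrain_cl"),
    "recovery_gap": Strategy("teleop corrections", "alphabrain_finetune"),
    "language_variation": Strategy("language augmentation", "alphabrain_vlm_cotrain"),
}


def recommend(clusters: List[str]) -> List[Strategy]:
    """Map each known failure cluster to its strategy; unknown clusters are skipped."""
    return [RULES[c] for c in clusters if c in RULES]
```

A table-driven design like this keeps each rule auditable on its own line, which fits a "Rule-Based v1" scope; a real generator would presumably also emit the verification protocol and expected uplift described in the `PatchPlan` schema.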
+ +--- + +## Milestone 3 — Self-Improving Agent ✅ Complete + +**Goal:** Nvex runs the full failure-to-fix loop autonomously, without human intervention at each step. See [`SELF_IMPROVEMENT_AGENT.md`](SELF_IMPROVEMENT_AGENT.md) for full design. + +### 3A — Autonomous Loop Orchestrator +- [x] Implement `SelfImprovementAgent` — orchestrator that: + - Triggers eval on a checkpoint + - Reads the EvalRun and identifies failure clusters + - Selects the highest-leverage patch strategy + - Dispatches to AlphaBrain + - Evaluates the result + - Decides whether to iterate again or terminate +- [x] Add stopping criteria: target KPI reached, max iterations, diminishing returns threshold +- [x] Add structured logging of agent reasoning at each step + +### 3B — Agent Tool Registry +- [x] `run_eval(checkpoint, benchmark)` → EvalRun +- [x] `diagnose_failures(eval_run)` → FailureDiagnosis +- [x] `generate_patch_plan(diagnosis)` → PatchPlan +- [x] `dispatch_training(plan)` → IterationJob +- [x] `compare_checkpoints(before, after)` → ImprovementReport +- [x] `save_to_memory(asset)` → ReusableAsset + +### 3C — Demo Mode for Self-Improving Agent +- [x] Add "Auto-Improve" button on Iteration Runner page +- [x] Animate the full loop: each step highlights as the agent processes it +- [x] Show agent reasoning panel (why it chose CL over SFT, why it targeted occlusion data) +- [x] Show multi-iteration view: loop 1 (62→74%), loop 2 (74→81%), convergence at 85% + +### 3D — LLM Integration +- [x] `LLMNarrator` — uses OpenAI (gpt-4o-mini) when `OPENAI_API_KEY` is set, falls back to deterministic templates +- [x] Natural-language narration for diagnosis, plan, verify, and stop-check steps + +### Infrastructure for Milestone 3 +- [x] `nvex_server/agent.py` — `SelfImprovementAgent` with demo (precomputed replay) and real modes +- [x] `nvex_server/llm_narrator.py` — LLM narration with OpenAI + template fallback +- [x] New schemas: `FailureDiagnosis`, `AgentStep`, `LoopIteration`, `AgentRunState`, 
`AgentRunRequest` +- [x] `POST /api/agent/run` — launch autonomous loop +- [x] `GET /api/agent/{id}/status` — poll agent state +- [x] `POST /api/agent/{id}/advance` — advance one step (demo mode animation) +- [x] `GET /api/demo/agent` — pre-seeded demo agent run +- [x] `demo/src/components/AgentReasoningPanel.jsx` — step-by-step reasoning UI +- [x] `demo/src/components/MultiIterationChart.jsx` — pure-SVG multi-loop chart +- [x] Updated `IterationRunner.jsx` — Auto-Improve toggle, loop progress bar, reasoning panel +- [x] Updated `ImprovementReport.jsx` — multi-loop comparison chart, stop-reason callout + +--- + +## Milestone 4 — Customer-Grade Platform + +**Goal:** Extend from a single demo scenario to a multi-project, multi-user platform. + +### P0 Tickets (Investor-Critical, Build Now) + +- [ ] **M4-P0-01: Streaming Agent Timeline + Variable Step Durations** + - Scope: backend emits ordered agent events; frontend shows live timeline and auto-play controls. + - Acceptance criteria: run shows `run_started`/`step_started`/`step_completed`/`run_stopped` events in sequence; demo can play without manual clicking. + - Status: **In Progress** + +- [ ] **M4-P0-02: Multi-Iteration Arc with Non-Monotonic Reality** + - Scope: demo arc includes at least one regression before recovery; chart and report show per-loop deltas. + - Acceptance criteria: one loop has negative delta and is visibly rendered as regression. + - Status: **In Progress** + +- [ ] **M4-P0-03: Rollback Event + Recovery Loop** + - Scope: stopping logic can emit rollback event, mark loop as rolled back, and continue from prior checkpoint baseline. + - Acceptance criteria: timeline contains a rollback event and follow-up loop resumes from rollback baseline. + - Status: **In Progress** + +- [ ] **M4-P0-04: Multi-Project Isolation in Backend Store** + - Scope: split global in-memory maps into project-scoped collections and enforce project_id on agent/eval/plan/iteration routes. 
+ - Acceptance criteria: project A data never appears in project B responses. + +- [ ] **M4-P0-05: Persistent Platform Memory (File/DB-backed)** + - Scope: replace volatile `InMemoryStore` memory assets with persistent repository (SQLite or file-backed JSON). + - Acceptance criteria: server restart preserves recipes, templates, and failure patterns. + +### P1 Tickets (Customer Readiness) + +- [ ] **M4-P1-01: Customer Onboarding API (BYO Checkpoint + Eval Artifact)** + - Scope: guided endpoints for registering a project, uploading checkpoint metadata, and importing benchmark artifacts. + - Acceptance criteria: new customer project can be created and run through eval -> plan without code edits. + +- [ ] **M4-P1-02: Benchmark Connector Expansion (RoboCasa/Tabletop/Custom)** + - Scope: unify exporter adapters and normalize imported metrics across suites. + - Acceptance criteria: at least 3 suites render correctly in Failure Map and Improvement Report. + +- [ ] **M4-P1-03: Role-Based Views (Operator vs Executive)** + - Scope: frontend route guards and dashboard tailoring by role. + - Acceptance criteria: operator sees full execution logs; executive sees KPI/ROI view with guardrail summaries. + +- [ ] **M4-P1-04: Governance + Audit Trail** + - Scope: store full run decisions, tool inputs/outputs, rollback triggers, and approval checkpoints. + - Acceptance criteria: each run has an exportable audit log bundle. + +### P2 Tickets (Scale + Enterprise) + +- [ ] **M4-P2-01: External Integration API** + - Scope: webhooks/REST endpoints for external eval pipelines and training infra. + - Acceptance criteria: external system can push eval results and receive patch plans. + +- [ ] **M4-P2-02: Cost/ROI Observatory** + - Scope: per-iteration compute/time/cost tracking with ROI rollups. + - Acceptance criteria: report displays uplift per dollar and estimated monthly run cost. 
+ +- [ ] **M4-P2-03: Security Hardening + SOC2 Readiness Track** + - Scope: authn/authz baseline, secrets handling, logging controls, dependency audit checklist. + - Acceptance criteria: security checklist documented and first-pass audit completed. + +--- + +## Priority Order for Next Sprint + +| Priority | Task | Milestone | Effort | +|----------|------|-----------|--------| +| 🔴 High | M4-P0-01 Streaming timeline + auto-play controls | M4 | ~1.5 days | +| 🔴 High | M4-P0-02 Multi-iteration non-monotonic arc (regression) | M4 | ~1 day | +| 🔴 High | M4-P0-03 Rollback event + recovery loop | M4 | ~1 day | +| 🔴 High | M4-P0-04 Multi-project isolation layer | M4 | ~2 days | +| 🟡 Med | M4-P0-05 Persistent platform memory backend | M4 | ~2 days | +| 🟡 Med | M4-P1-01 Customer onboarding flow (BYO artifacts/checkpoint) | M4 | ~2 days | +| 🟢 Low | M4-P1-03 Role-based operator/executive views | M4 | ~2 days | diff --git a/README.md b/README.md index d906d90..d1739af 100644 --- a/README.md +++ b/README.md @@ -1,175 +1,240 @@
-# AlphaBrain
+### Nvex: Self-Improving Physical AI Orchestration
-### A Modular Open-Source Framework for Embodied Intelligence Research
-
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Docs](https://img.shields.io/badge/Docs-Online-green.svg)](https://alphabraingroup.github.io/AlphaBrain/)
-[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97-Models-orange.svg)](https://huggingface.co/AlphaBrainGroup)
-[![WeChat](https://img.shields.io/badge/WeChat-Group-07C160.svg)](assets/wechat.jpg)
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
+[![Demo](https://img.shields.io/badge/Demo-React%20%2B%20FastAPI-16a34a.svg)](#run-the-demo)
+[![Status](https://img.shields.io/badge/Status-Milestone%202%20MVP-2563eb.svg)](#current-status)

-[Image: AlphaBrain Architecture Overview]
+[Image: Nvex PEARL self-improving Physical AI loop]

-**AlphaBrain** is an all-in-one, open-source community for embodied intelligence, built to be ready out of the box. We unifies multiple VLA architectures, world model backbones, biologically-inspired learning algorithms, and reinforcement learning paradigms under a single, extensible framework. AlphaBrain brings embodied AI within everyone’s reach.
+**Nvex turns policy failure into a repeatable improvement loop.**
+
+When a Physical AI policy fails, Nvex imports the evaluation artifact, maps failure modes, diagnoses capability gaps, generates a targeted patch plan, dispatches the improvement run, verifies the new checkpoint, and saves the resulting recipe to platform memory.
-[Quick Start & Documentation](#-quick-start--documentation) · [Key Features](#-key-features) · [Community](#-community) · [Citation](#-citation)
+[What Nvex Does](#what-nvex-does) · [The Loop](#the-loop) · [Demo](#run-the-demo) · [API](#api-surface) · [Roadmap](#roadmap) · [Acknowledgment](#acknowledgment)
 ---
-## Highlights
-
-| | |
-|---|---|
-| 🧠 | Brain-Inspired VLA (NeuroVLA): The first open-source biologically-inspired VLA model, achieving SOTA on brain-inspired control tasks. Integrates spiking neural networks (SNN) with STDP learning rules, advancing embodied intelligence toward biological brain learning mechanisms. |
-| 🔄 | Cross-Architecture Continual Learning: The first open-source continual learning algorithm designed for cross-architecture VLA, breaking architecture compatibility bottlenecks and supporting universal adaptation and knowledge accumulation across different VLA models. |
-| 🎯 | RLActionToken Training Paradigm: The first open-source VLA training architecture based on RL Token, a novel architecture that compresses VLA hidden states through an information bottleneck, followed by off-policy Actor-Critic reinforcement learning. |
-| 🌍 | Native World Model Integration: The first open-source VLA to natively integrate Cosmos Policy original weights, supporting flexible world model switching across Cosmos 2 / 2.5, Wan 2.2, and V-JEPA 2.1. |
-| 📊 | Comprehensive Benchmark Suite: Full adaptation to the latest embodied benchmarks with open-source support for long-horizon task execution and memory: LIBERO, LIBERO-plus, RoboCasa, RoboCasa365 and more to come. |
+## What Nvex Does ---- +Robotics and Physical AI teams often know that a checkpoint is failing before they know what to do about it. A policy may plateau at 60-70% success, but the next step is usually scattered across video review, benchmark logs, intuition, ad hoc data collection, and disconnected training runs. -## 🚀 Quick Start & Documentation +Nvex is the orchestration layer for that gap. -Full setup, training, evaluation, and deployment instructions live in our documentation site. Step-by-step guides, configuration references, and troubleshooting notes are all maintained there. +It takes the messy middle between "the robot failed" and "the next checkpoint is better" and turns it into structured work: -👉 **[AlphaBrain Documentation →](https://alphabraingroup.github.io/AlphaBrain/)** +```text +eval artifact -> failure map -> root-cause diagnosis -> patch plan + -> improvement run -> re-eval -> platform memory +``` ---- +Nvex does not present failure as a static dashboard. It decides what should happen next: which failures matter, what data to target, which training strategy to run, what success criteria to verify, and what reusable knowledge should be saved for future loops. -## 🔬 Key Features +## Who It Is For -AlphaBrain delivers five core capabilities on a single stack: the **VLA framework family** as the base, with **NeuroVLA / RLActionToken / Continual Learning / World Model** as composable capability modules. All capabilities share the same trainer, config system, and inference interface. +**Robotics and Physical AI teams** use Nvex when they already have a policy and need faster, more disciplined improvement cycles: -### VLA Frameworks +- Diagnose why a checkpoint fails instead of only seeing aggregate success rate. +- Convert benchmark results into a concrete data and training plan. +- Track the improvement run from patch plan to verified checkpoint. +- Save each fix as reusable platform memory. 
-| Framework | Action Decoding | Typical Use | -|:----------|:----------------|:------------| -| **OFT** | MLP action head, parallel continuous decoding | Fast prototyping, baseline alignment | -| **GR00T** | System1 + Flow-Matching DiT System2 | High-precision manipulation, long-horizon planning | -| **PI** | Flow-Matching action prediction | Diffusion-style policies | -| **Adapter** | Lightweight Adapter decoding | Parameter-efficient fine-tuning | -| **NeuroVLA** | Bio-inspired spiking + STDP | Brain-inspired control | -| **CosmosPolicy** | Latent-space video diffusion | World-model-native policy | +**Technical and business evaluators** use Nvex to understand whether a Physical AI system can compound: -### Brain-Inspired VLA (NeuroVLA + STDP) +- Does every failed run create reusable knowledge? +- Can the system explain why a patch should work? +- Can improvement be measured and repeated? +- Is the platform more than a training script, annotation queue, or MLOps dashboard? -NeuroVLA integrates spiking neural networks with biological learning rules into the VLA pipeline: +## The Loop -- **QFormer** extracts layer-wise action-relevant features from VLM hidden states; -- **SNN Action Head** with Leaky Integrate-and-Fire (LIF) neurons for spike-based action prediction; -- **R-STDP Training** — Reward-Modulated Spike-Timing-Dependent Plasticity, supporting both hybrid (backprop + STDP) and pure STDP modes; -- **Online STDP** — Test-time adaptation with zero backpropagation, using self-supervised reward signals from environment interaction. +Nvex follows the PEARL loop shown above: experience, diagnosis, improvement, and reuse. 
-### RLActionToken Online RL Fine-tuning +| Stage | Nvex output | +| --- | --- | +| **Project Intake** | Checkpoint summary, benchmark context, starting KPI, risk flags | +| **Failure Map** | Failure clusters, affected tasks, suspected root causes, priority ranking | +| **Patch Plan** | Target data spec, training strategy, verification protocol, expected uplift | +| **Iteration Runner** | Job dispatch, stage tracking, logs, produced artifacts | +| **Improvement Report** | Before/after KPI comparison, pass/fail verification, next action | +| **Platform Memory** | Reusable recipes, failure ontology entries, pipeline templates | -A novel architecture that compresses VLA hidden states through an information bottleneck, followed by off-policy Actor-Critic reinforcement learning: -- **Encoder-Decoder**: Extracts a compact action token from the VLA's internal features to serve as the state representation for RL. -- **Two-Phase Training**: An initial adaptation stage to expose the action token → RL fine-tuning with a frozen VLA. -- **Low Resource Requirements**: The actual reinforcement learning gradient update phase involves a highly lightweight parameters. 
+The demo scenario uses a LIBERO Kitchen pick-and-place checkpoint: -### Continual Learning +```text +NeuroVLA-LIBERO-ckpt_v0.7: 62% success +Nvex diagnosis: occlusion-heavy scenes and missing recovery trajectories +Nvex patch: targeted episodes + teleop corrections + continual-learning update +Verified result: ckpt_v0.8 at 74% success +Saved memory: reusable occlusion/recovery patch recipe +``` -Experience-replay-based continual learning for sequential task acquisition: +## Product Surface -- **Incremental design** — all changes are additive, no modification to base training code; -- **LoRA integration** — parameter-efficient fine-tuning (~6% trainable params, ~3× memory savings); -- **Replay buffer** with configurable per-task capacity; -- **Cross-architecture adaptation** — the same CL algorithm drops directly onto different VLA frameworks. +The current Nvex demo is a 7-page interactive workflow: -### World Model Integration +| Page | Purpose | +| --- | --- | +| **Project Hub** | Select a Physical AI project and see loop-level platform metrics | +| **Project Overview** | Understand current checkpoint status, task breakdown, and next action | +| **Failure Map** | Inspect failure clusters and root-cause hypotheses | +| **Patch Plan** | Review the data, training, and verification plan Nvex generated | +| **Iteration Runner** | Watch the improvement run move through execution stages | +| **Improvement Report** | Compare before/after results and verify uplift | +| **Platform Memory** | See the recipes and patterns created by prior loops | -Native support for 4 world model backbones plus full CosmosPolicy finetuning: +## Current Status -| Backbone | Params | Mode Name | Text Encoder | -|:---------|:-------|:----------|:-------------| -| V-JEPA 2.1 | ~1.8B | `world_model_vjepa` | T5-small | -| Cosmos Predict 2.5 | ~2.1B | `world_model_cosmos` | Reason1-7B | -| Cosmos Predict 2 | ~2.1B | `world_model_cosmos2` | T5-XXL | -| Wan 2.2 | ~5B | `world_model_wan` | UMT5-XXL | 
+**Nvex Milestone 3 is fully implemented.** The platform runs the complete failure-to-improvement loop autonomously. ---- +### Milestone 2 ✅ — Executable MVP +- `nvex_server/` provides the FastAPI backend for demo state, artifact import, patch-plan generation, job dispatch, status polling, and improvement reports. +- `demo/` provides the React + Vite interactive 7-page product demo. +- `nvex_server/examples/` contains seeded before/after LIBERO Kitchen eval artifacts for the 62% → 74% improvement case. +- The backend can ingest structured eval artifacts and produce Nvex-native schemas for failure maps, patch plans, iteration jobs, reports, and reusable memory assets. -### Benchmarks +### Milestone 3 ✅ — Self-Improving Agent +- `SelfImprovementAgent` orchestrates the full autonomous loop: eval → diagnose → plan → dispatch → verify → memory. +- Two modes: **simulate=True** (demo with precomputed 4-loop replay) and **simulate=False** (real AlphaBrain dispatch). +- Multi-iteration support with regression detection and rollback semantics (e.g., 62% → 74% → 81% → 79% rollback → 85%). +- `LLMNarrator` generates natural-language explanations for each step (powered by OpenAI gpt-4o-mini when available). +- React demo includes "Auto-Improve" button, live agent reasoning panel, and multi-iteration convergence chart. -| Benchmark | Tasks | Highlights | Path | -|:----------|:------|:-----------|:-----| -| **LIBERO** | Spatial / Object / Goal / Long-horizon | Core evaluation suite, 4 task suites | `benchmarks/LIBERO/` | -| **LIBERO-plus** | Robustness (Camera, Robot, Language, Light, etc.) | Zero-shot generalization testing | `benchmarks/LIBERO-plus/` | -| **RoboCasa** | Tabletop & kitchen manipulation | Real-world scene diversity | `benchmarks/Robocasa_tabletop/` | -| **RoboCasa365** | 365-day kitchen task collection | Large-scale daily tasks | `benchmarks/Robocasa365/` | -| ... 
| | | +### Milestone 4 — Customer-Grade Platform (In Progress) +- Streaming agent timeline with variable step durations +- Multi-project isolation and persistent platform memory +- Customer onboarding API (BYO checkpoint + eval artifact) +- Role-based views (operator vs executive) ---- +## Run The Demo -## 🤝 Community +### Quick Start -We welcome contributions from the community — including new frameworks, benchmark adapters, bug fixes, and improvements that achieve stronger benchmark performance. Outstanding contributors may be invited to join the community as core members. Every contribution matters. +Start the backend: -| Channel | Link | -|:--------|:-----| -| GitHub Issues | [Report bugs & request features](https://github.com/AlphaBrainGroup/AlphaBrain/issues) | -| HuggingFace | [Models](https://huggingface.co/AlphaBrainGroup) | -| WeChat Group | [Scan the QR code to join](assets/wechat.jpg) | +```bash +./.venv/bin/python -m uvicorn nvex_server.app:app --reload --port 8000 +``` -### Acknowledgments +Start the React app in another terminal: -AlphaBrain is mainly forked from [starVLA](https://github.com/starVLA/starVLA) and stands on the shoulders of an incredible open-source ecosystem. 
We are deeply grateful to the authors and maintainers of the following projects, whose code, models, datasets, and ideas directly enabled this work: +```bash +cd demo +npm install +npm run dev +``` -- [starVLA/starVLA](https://github.com/starVLA/starVLA) -- [openvla/openvla](https://github.com/openvla/openvla) -- [moojink/openvla-oft](https://github.com/moojink/openvla-oft) -- [Physical-Intelligence/openpi](https://github.com/Physical-Intelligence/openpi) -- [NVIDIA/Isaac-GR00T](https://github.com/NVIDIA/Isaac-GR00T) -- [QwenLM/Qwen3-VL](https://github.com/QwenLM/Qwen3-VL) -- [nvidia-cosmos/cosmos-predict2.5](https://github.com/nvidia-cosmos/cosmos-predict2.5) -- [Wan-Video/Wan2.2](https://github.com/Wan-Video/Wan2.2) -- [Lifelong-Robot-Learning/LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO) -- [robocasa/robocasa](https://github.com/robocasa/robocasa) -- [guoweiyu/NeuroVLA](https://github.com/guoweiyu/NeuroVLA) +Open the Vite URL, usually: +```text +http://127.0.0.1:5173 +``` ---- +The Vite dev server proxies `/api` requests to `http://127.0.0.1:8000`, so the React app consumes the local Nvex backend directly. 
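Before opening the React app, it can help to confirm that the backend is actually listening. The following is a minimal sketch, not part of the Nvex codebase: it assumes the backend runs on port 8000 as in the command above and relies only on the `GET /health` endpoint listed under API Surface. The helper names `endpoint_url` and `backend_is_up` are hypothetical, and because the response body of `/health` is not documented here, the sketch checks only the HTTP status.

```python
import urllib.error
import urllib.request

# Assumption: the backend was started with --port 8000, as in the Quick Start command.
BACKEND_URL = "http://127.0.0.1:8000"


def endpoint_url(path: str, base: str = BACKEND_URL) -> str:
    """Join the backend base URL and an endpoint path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def backend_is_up(base: str = BACKEND_URL, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(endpoint_url("/health", base), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error responses all count as "not up".
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_is_up())
```

If this prints `False`, the Vite proxy will have nothing to forward `/api` requests to, so the uvicorn terminal is the first place to look.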
-## 📝 Citation +### Standalone Demo + +For a static walkthrough without running a backend: + +```bash +open demo/nvex-demo.html +``` + +### Demo Features + +- **Seeded Improvement Scenario**: LIBERO Kitchen checkpoint 62% → 74% (then 81% → 85% with recovery after regression) +- **Auto-Improve Mode**: Click "Auto-Improve" to watch the agent run the full loop autonomously in demo mode +- **Live Agent Reasoning**: See the agent's decisions at each step (why CL over SFT, why targeting occlusion data) +- **Multi-Iteration Chart**: Visualize progression across loops, including rollback and recovery + +## Key Capabilities + +- ✅ **Autonomous Improvement Loop**: Run the full failure→diagnosis→plan→train→verify cycle without manual steps +- ✅ **Intelligent Patch Planning**: Map failure clusters to targeted training strategies (CL, SFT, VLM co-training) +- ✅ **Real AlphaBrain Integration**: Dispatch jobs to continual learning, fine-tuning, or RL training backends +- ✅ **LLM-Generated Reasoning**: Natural-language explanations for diagnosis, planning, and verification steps +- ✅ **Multi-Iteration Convergence**: Track non-monotonic improvement arcs with regression detection and rollback +- ✅ **Platform Memory**: Save recipes and patterns from each loop for reuse across projects +- ✅ **Interactive Dashboard**: 7-page investor-focused narrative with live agent streaming and reasoning panels + +## API Surface + +The local backend exposes: + +| Endpoint | Purpose | +| --- | --- | +| `GET /health` | Health check | +| `GET /api/demo/state` | Seeded full demo state | +| `GET /api/demo/agent` | Pre-seeded autonomous agent run state | +| `POST /api/eval/import` | Import a benchmark artifact as an eval run | +| `POST /api/plan/generate` | Generate a patch plan from failures | +| `POST /api/iteration/start` | Start an improvement iteration | +| `GET /api/iteration/{id}/status` | Poll iteration status | +| `GET /api/report/{iteration_id}` | Fetch the improvement report | 
+| `POST /api/agent/run` | Launch a new autonomous improvement run | +| `GET /api/agent/{id}/status` | Poll autonomous agent state | +| `POST /api/agent/{id}/advance` | Advance agent by one step (for demo mode) | + +## Repository Map + +```text +nvex_server/ + app.py FastAPI routes and InMemoryStore + agent.py SelfImprovementAgent orchestrator (demo + real modes) + schemas.py Nvex data contracts + patch_plan_generator.py Rule-based patch planner + dispatcher.py Iteration dispatch and status tracking + exporters.py Benchmark artifact import helpers + llm_narrator.py LLM-powered reasoning narration (OpenAI fallback) + examples/ Seeded before/after eval artifacts + +demo/ + src/ React product demo with 7 pages + nvex-demo.html Standalone static walkthrough + +SELF_IMPROVEMENT_AGENT.md Autonomous-loop design and semantics +IMPLEMENTATION_PLAN.md Detailed milestones, delivery, and roadmap +assets/ README and demo imagery +``` + +## Roadmap + +| Milestone | Status | Focus | +| --- | --- | --- | +| **M2: Executable MVP** | ✅ Complete | Real backend path, seeded LIBERO improvement case, interactive demo | +| **M3: Self-Improving Agent** | ✅ Complete | Autonomous loop runner, LLM narration, stopping criteria, reasoning UI | +| **M4: Customer-Grade Platform** | 🔄 In Progress | Streaming timeline, multi-project support, persistent memory, customer onboarding API | + +See [`IMPLEMENTATION_PLAN.md`](IMPLEMENTATION_PLAN.md) and [`SELF_IMPROVEMENT_AGENT.md`](SELF_IMPROVEMENT_AGENT.md) for the full plan. 
+ +## Citation ```bibtex -@software{AlphaBrain2026, - title = {AlphaBrain: a Modular Open-Source Framework for Embodied Intelligence Research}, - author = {AlphaBrain Community}, - year = {2026}, - url = {https://github.com/AlphaBrainGroup/AlphaBrain}, - license = {MIT}, - doi = {} +@software{Nvex2026, + title = {Nvex: Self-Improving Physical AI Orchestration}, + author = {Nvex Contributors}, + year = {2026}, + license = {MIT} } ``` ---- +## License + +[MIT License](LICENSE) + +## Acknowledgment -## 📄 License +Nvex is the orchestration and intelligence layer. This repository currently includes **AlphaBrain** as the bundled execution layer for VLA training, evaluation, continual learning, world-model experiments, RL fine-tuning, and benchmark integration. -This project is licensed under the [MIT License](LICENSE). +AlphaBrain is a modular PyTorch framework for embodied intelligence research and is forked from [starVLA](https://github.com/starVLA/starVLA). The AlphaBrain code builds on work from the broader open-source robotics and VLA ecosystem, including OpenVLA, openvla-oft, openpi, Isaac-GR00T, Qwen-VL, Cosmos, Wan, LIBERO, RoboCasa, and NeuroVLA.
-Built with passion by the AlphaBrain Community upon starVLA +Nvex learns from every failure and makes every improvement reusable.
diff --git a/SELF_IMPROVEMENT_AGENT.md b/SELF_IMPROVEMENT_AGENT.md new file mode 100644 index 0000000..99f8e18 --- /dev/null +++ b/SELF_IMPROVEMENT_AGENT.md @@ -0,0 +1,182 @@ +# Nvex Self-Improving Agent — Design & Demo Brainstorm + +**Last Updated:** April 25, 2026 + +--- + +## Core Idea + +The self-improvement agent is the clearest proof that Nvex is not a dashboard. It takes a failing policy and — without human intervention at each step — runs the full diagnosis → plan → training → verification loop until the policy meets a target KPI or hits a stopping condition. + +This is the "aha moment" for both investors and customers: + +> You upload a checkpoint. You set a target (e.g., 75% success on LIBERO Kitchen). The agent runs. You come back and the policy is better — and it can tell you exactly why, what it did, and what it learned. + +--- + +## What "Self-Improving" Actually Means + +The agent doesn't retrain from scratch. It does **targeted, incremental improvement**: + +1. **Identify** — Run eval, cluster failures, rank root causes by impact +2. **Decide** — Choose the highest-leverage intervention (CL patch, SFT, data augmentation, env verification) +3. **Execute** — Dispatch the job to AlphaBrain +4. **Verify** — Re-run eval on the new checkpoint +5. **Store** — Save the recipe, the failure pattern, and the improvement delta to Platform Memory +6. **Loop** — If target not met and improvement is positive, go again + +The agent terminates when: +- Target KPI is reached +- Max iterations exceeded +- Improvement delta falls below a threshold (diminishing returns) +- A blocking failure is detected (e.g., data source unavailable) + +--- + +## Demo Modes + +### Mode A — Precomputed Replay (Milestone 1 / Investor Demo) +The fastest, most reliable demo. All results are pre-generated from a real AlphaBrain run. + +**Flow:** +1. User clicks "Run Self-Improvement" on the Iteration Runner page +2. 
The UI animates through each agent step with realistic timing (2–5 seconds per step) +3. Failure Map updates: cluster sizes shrink as the agent identifies and patches them +4. Improvement Report appears: 62% → 74%, with failure cluster diffs +5. Platform Memory gains a new recipe + +**Why it works:** The underlying AlphaBrain results are real. The "live" execution is a replay. This is fully stable for investor demos and customer presentations. + +**Implementation:** `demo/nvex-demo.html` already supports this mode — just add the animation trigger and step-by-step state updates. + +--- + +### Mode B — Live Agent with Real AlphaBrain (Milestone 3) +For customer POCs and hands-on demos where the policy is the customer's own checkpoint. + +**Flow:** +1. Customer provides checkpoint + eval results (or runs eval inside Nvex) +2. `SelfImprovementAgent` starts: calls `diagnose_failures()`, shows intermediate reasoning +3. Agent proposes patch plan — customer can approve or override before execution +4. AlphaBrain CL job runs (typically 30–60 min for a small LIBERO patch) +5. Re-eval runs automatically +6. Results appear in Improvement Report + +**Why it works for customers:** They see their own data moving. The improvement is real. The patch recipe is saved for their next project. + +**Caution:** Don't run this live at an investor meeting — too much depends on training stability. Use Mode A for investor demos, Mode B for customer POCs. + +--- + +### Mode C — Multi-Iteration Compound View (Milestone 3+) +Shows the compounding effect across multiple loops — the "platform moat" visual. 
+ +**Flow:** +- Loop 1: 62% → 74% (occlusion patch) +- Loop 2: 74% → 81% (recovery trajectory patch) +- Loop 3: 81% → 85% (lighting variation patch, smaller gain — agent detects diminishing returns and stops) + +**What it proves:** +- Each loop uses a recipe from Platform Memory — getting faster over time +- The agent knows when to stop (stopping criteria) +- The platform gets smarter across projects, not just within one + +--- + +## Agent Architecture + +### Core Components + +``` +SelfImprovementAgent +├── EvalRunner — triggers AlphaBrain eval, returns EvalRun artifact +├── FailureDiagnoser — clusters failures, ranks root causes, returns FailureDiagnosis +├── PatchPlanner — maps diagnosis to PatchPlan (rule-based v1, LLM-enhanced v2) +├── JobDispatcher — submits IterationJob to AlphaBrain, polls status +├── Comparator — diffs before/after EvalRun, produces ImprovementReport +├── MemoryWriter — saves reusable assets to Platform Memory +└── LoopController — manages iteration state, stopping criteria, convergence check +``` + +### Agent Tools (function-calling interface) + +| Tool | Input | Output | +|------|-------|--------| +| `run_eval` | checkpoint_path, benchmark_suite | EvalRun | +| `diagnose_failures` | EvalRun | FailureDiagnosis | +| `generate_patch_plan` | FailureDiagnosis, memory_context | PatchPlan | +| `dispatch_training` | PatchPlan, execution_backend | IterationJob | +| `poll_job_status` | job_id | JobStatus | +| `compare_checkpoints` | EvalRun (before), EvalRun (after) | ImprovementReport | +| `save_to_memory` | ImprovementReport, PatchPlan | ReusableAsset | +| `check_stopping` | ImprovementReport, loop_state | ShouldStop (bool + reason) | + +### PatchPlanner — Strategy Selection Rules (v1) + +| Failure Pattern | Recommended Strategy | AlphaBrain Backend | +|----------------|---------------------|-------------------| +| Occlusion / object visibility | CL patch with occluded scene episodes | `alphabrain_cl` | +| Recovery / error correction | Fine-tune 
on teleop correction trajectories | `alphabrain_finetune` | +| Language variation | VLM co-training with augmented instructions | `alphabrain_vlm_cotrain` | +| Lighting / appearance shift | CL patch with lighting-augmented data | `alphabrain_cl` | +| Long-horizon failure (step N) | World model rollout verification + targeted re-train | `alphabrain_world_model` | +| Generalization across robots | Cross-architecture CL | `alphabrain_cl` (cross-arch) | + +### Memory Context in Planning + +Platform Memory makes each loop smarter: +- If a similar failure pattern was patched before, the agent reuses that recipe +- Recipe confidence score: how many times it was applied successfully +- The agent favors high-confidence recipes and experiments on low-confidence ones + +--- + +## What to Show in the Demo UI + +### Iteration Runner Page Additions +- **"Auto-Improve" toggle** — switches from manual to autonomous mode +- **Agent Reasoning Panel** — shows the agent's step-by-step decisions in plain language: + - *"Failure clusters detected: occlusion (38%), recovery (24%). Targeting occlusion first — highest impact."* + - *"Found matching recipe in Platform Memory: occlusion_recovery_v1 (used 3 times, avg +9% uplift). Applying."* + - *"Training job dispatched to AlphaBrain CL. 
Estimated time: 45 min."* +- **Loop Progress Bar** — shows current iteration (1/3), current success rate, target +- **Stop / Override button** — lets the user intervene, inspect the plan, and resume + +### Improvement Report Page Additions +- **Multi-loop comparison chart** — success rate over iterations +- **"Why did it stop?"** — agent explains the stopping reason +- **Asset trail** — which Platform Memory assets were used and which new ones were created + +--- + +## Technical Risks & Mitigations + +| Risk | Mitigation | +|------|-----------| +| Training run takes too long for live demo | Use Mode A (precomputed replay) for demos; Mode B only for async customer POCs | +| Agent makes a bad plan (wrong strategy) | v1 is rule-based, not fully autonomous — customer can review and approve before execution | +| Improvement is negative on first loop | Build in retry logic; if delta < 0, agent tries a different strategy from the strategy library | +| AlphaBrain job fails mid-run | Job dispatcher catches failures, saves partial state, allows resume | +| Customers don't trust "autonomous" decisions | Make agent reasoning fully transparent — every decision is logged and explainable | + +--- + +## Roadmap + +| Phase | What | When | +|-------|------|------| +| **M1** | Precomputed replay animation in demo HTML | Now | +| **M2** | Real eval artifact ingestion + rule-based PatchPlanner + JobDispatcher | Next sprint | +| **M3A** | `SelfImprovementAgent` skeleton with tool registry | ~4 weeks | +| **M3B** | Agent reasoning panel in demo UI | ~5 weeks | +| **M3C** | Multi-iteration compound view | ~6 weeks | +| **M3D** | LLM-enhanced PatchPlanner (natural language reasoning) | ~8 weeks | +| **M4** | Customer-uploadable checkpoints, async POC mode | ~12 weeks | + +--- + +## Key Message for Demos + +> "Most tools tell you what happened. Nvex tells you what to do — and then does it." + +The self-improvement agent is proof that Nvex is an intelligence layer, not a dashboard. 
Every loop makes the platform smarter. Every project compounds the knowledge. That's the moat. diff --git a/assets/INVESTOR_DEMO_SCRIPT.md b/assets/INVESTOR_DEMO_SCRIPT.md new file mode 100644 index 0000000..3cad6be --- /dev/null +++ b/assets/INVESTOR_DEMO_SCRIPT.md @@ -0,0 +1,388 @@ +# Nvex Investor Demo Script + +Use this runbook for a 10-15 minute investor demo of Nvex. The goal is not to show every feature. The goal is to make one idea unmistakable: + +> Most Physical AI tools tell you what happened. Nvex tells you what to do next, executes the loop, verifies the improvement, and saves the learning. + +## Demo Goal + +By the end of the demo, investors should understand: + +- **Problem:** Physical AI teams can train policies, but improving failed checkpoints is still slow, manual, and ad hoc. +- **Product:** Nvex is the orchestration and intelligence layer for policy improvement. +- **Proof:** The demo runs a realistic autonomous loop from **62% -> 74% -> 81% -> 79% (regression) -> rollback -> 85%** with a streamed timeline and stop condition. +- **Moat:** Every loop creates reusable platform memory: recipes, failure patterns, verification plans, and execution templates. +- **Roadmap:** Milestone 4 extends this into multi-project isolation, persistence, and customer onboarding. + +## Pre-Demo Setup + +Run the backend: + +```bash +./.venv/bin/python -m uvicorn nvex_server.app:app --reload --port 8000 +``` + +Run the React demo: + +```bash +cd demo +npm install +npm run dev +``` + +Open: + +```text +http://127.0.0.1:5173 +``` + +Have this backup ready in case the local server has issues: + +```bash +open demo/nvex-demo.html +``` + +Before the call: + +- Open the demo in a clean browser window. +- Zoom to 90-100%, depending on screen size. +- Close unrelated tabs and terminals. +- Keep one terminal visible only if you want to briefly prove the backend is live. +- Start on the Project Hub page. 
+- Practice the transition from Failure Map to Patch Plan to Iteration Runner. That is the heart of the demo. + +## Recommended Timing + +| Segment | Time | Purpose | +| --- | ---: | --- | +| Opening frame | 1 min | Name the market problem | +| Project Hub | 1 min | Show Nvex as a platform, not a one-off demo | +| Project Overview | 1 min | Establish the failing checkpoint | +| Failure Map | 2 min | Show diagnosis, not dashboards | +| Patch Plan | 2 min | Show Nvex deciding the next action | +| Iteration Runner | 3 min | Show streamed autonomous execution + rollback discipline | +| Improvement Report | 2 min | Show verified uplift | +| Platform Memory | 2 min | Show compounding platform value | +| Close | 1 min | Tie to roadmap and investment thesis | + +Total: 15 minutes. + +## Opening Talk Track + +Say: + +> Physical AI is moving from model training into model improvement. Teams can get an initial robot policy working, but once it fails in the real world, the loop becomes messy: video review, benchmark logs, manual diagnosis, new data collection, retraining, and then another eval. The hard part is not just training. The hard part is knowing exactly what to fix next. + +Then: + +> Nvex is the orchestration layer for that loop. It takes a failing checkpoint, diagnoses why it failed, generates a targeted patch plan, dispatches the improvement run, verifies the new checkpoint, and saves what it learned so the next project starts smarter. + +Optional one-liner: + +> Think of Nvex as the self-improvement layer for Physical AI policies. + +Optional follow-up one-liner: + +> The key credibility point is not just improvement. It is safe improvement with rollback when a loop regresses. + +## Page-By-Page Script + +### 1. Project Hub + +What to show: + +- Start on the home/project hub page. +- Point to the project list and platform-level metrics. +- Do not linger. This is the map, not the story. + +Say: + +> This is the Nvex project hub. 
Each project is a policy improvement loop. The important thing is that Nvex is not just tracking experiments. It is organizing the full failure-to-fix workflow across projects. + +Investor point: + +> The platform gets more valuable as it sees more failures, because the memory of prior fixes becomes reusable. + +Transition: + +> I will walk through one concrete policy: a LIBERO Kitchen pick-and-place checkpoint that starts at 62% success. + +### 2. Project Overview + +What to show: + +- Current checkpoint: `ckpt_v0.7`. +- Current success rate: `62%`. +- Next recommended action. + +Say: + +> Here Nvex has imported an evaluation artifact for a trained policy. The policy is not useless, but it is not deployable either: 62% success. This is the exact zone where teams lose time. The eval score tells you there is a problem, but not what to do next. + +Investor point: + +> Nvex treats eval as the beginning of the improvement loop, not the end of reporting. + +Transition: + +> So the next question is: what is actually failing? + +### 3. Failure Map + +What to show: + +- Failure clusters. +- Occlusion as the top cluster. +- Recovery behavior as a secondary cluster. +- Root-cause explanation. + +Say: + +> Nvex compresses raw benchmark output into a failure map. In this case, failures cluster around occlusion-heavy scenes and missing recovery behavior. The policy can sometimes complete the task, but when an object is partially obstructed or the first grasp fails, it does not recover reliably. + +Then: + +> This is the first key distinction: Nvex is not just showing charts. It is converting raw eval results into an actionable diagnosis. + +Investor point: + +> The product wedge is post-training intelligence: diagnosis, prioritization, and targeted improvement. + +Transition: + +> Once Nvex knows why the policy fails, it can generate the patch plan. + +### 4. Patch Plan + +What to show: + +- Data recipe: occlusion-heavy patch episodes. +- Recovery traces / teleop corrections. 
+- Training strategy: continual learning or fine-tune update. +- Verification plan and expected uplift. + +Say: + +> This is the patch plan generated from the failure map. Nvex recommends targeted data, not random more-data collection. For this checkpoint, it proposes 120 occlusion-heavy episodes and 40 recovery correction trajectories, then a continual-learning update and a verification pass. + +Then: + +> The key is that this plan is structured. It can be reviewed by a human, dispatched to an execution backend, and reused later if it works. + +Investor point: + +> This is where Nvex becomes more than analytics. It turns diagnosis into an executable improvement plan. + +Transition: + +> Now we move from plan to execution. + +### 5. Iteration Runner + +What to show: + +- Streaming timeline events (`run_started`, `step_started`, `step_completed`, `rollback`, `run_stopped`). +- Auto-stream controls (start/pause) and variable step durations. +- Four-loop arc with one intentional regression and rollback. +- Logs/artifacts and backend-driven execution path if asked. + +Say: + +> The iteration runner is the operating heart of Nvex. It streams each reasoning and execution event in order, so you can see what the agent is doing and why, not just a final score. + +Then: + +> Notice this run is realistic, not perfectly monotonic. The third loop regresses from 81% to 79%. Nvex triggers rollback automatically, reverts to the prior checkpoint baseline, and launches a safer corrective loop. + +Be precise: + +> For investor demos, this run is seeded and replayable so timing is stable. The event model, schemas, dispatch interface, polling, rollback signaling, and report generation are implemented. For customer POCs, this is where we connect their checkpoint and run the same loop asynchronously. + +Investor point: + +> Nvex is designed to sit above execution frameworks. It owns decisioning, governance, and loop control. Execution backends can vary. 
+ +Transition: + +> After the run finishes, Nvex does not just say "job complete." It verifies whether the policy actually improved. + +### 6. Improvement Report + +What to show: + +- Loop-by-loop: `62 -> 74 -> 81 -> 79 (rollback) -> 85`. +- Final uplift: `+23pp` versus the starting checkpoint. +- Regression loop rendered in red and marked as rolled back. +- Cluster reduction and recovery improvement. +- Generated assets. + +Say: + +> Here is the result. Nvex climbs from 62% to 85%, but importantly it does not hide failure. It surfaces a regression, applies rollback, and recovers safely. That behavior is critical for customer trust. + +Then: + +> This matters because Physical AI improvement needs to be auditable and controlled. You want to know not only that the score improved, but where it regressed, why rollback was triggered, and which recipe is now safe to reuse. + +Investor point: + +> Nvex makes improvement measurable and repeatable. That is what turns a services-like process into software. + +Transition: + +> The most important screen for the company thesis is the last one: Platform Memory. + +### 7. Platform Memory + +What to show: + +- Recipes. +- Failure ontology. +- Pipeline templates. +- Compounding chart or memory assets. + +Say: + +> Every loop deposits reusable assets into Platform Memory: a patch recipe, a failure pattern, a verification setup, and execution metadata. The next time Nvex sees a similar occlusion or recovery failure, it does not start from scratch. + +Then: + +> This is the compounding loop. More projects produce more failures. More failures produce more recipes. More recipes make future improvement faster and more reliable. + +Add: + +> The failed loop is also memory. Nvex stores anti-patterns so teams avoid repeating known bad interventions. + +Investor point: + +> The moat is not one benchmark result. 
The moat is the accumulated memory of how to fix Physical AI failures across tasks, environments, embodiments, and customers. + +## Closing Script + +Say: + +> The takeaway is simple: Physical AI teams need a self-improvement layer. Training frameworks are necessary, but they do not decide what to fix next. Nvex starts from failure, diagnoses the gap, generates the plan, runs the iteration, verifies the result, and saves the learning. + +Then: + +> Today, the demo shows the full autonomous loop on a seeded LIBERO Kitchen case: 62% to 85% with a visible regression and rollback. The next milestone extends this into customer-grade multi-project operation: project isolation, persistent memory, and onboarding for bring-your-own checkpoints and eval artifacts. + +Final line: + +> We are building the operating layer for Physical AI systems that learn from every failure. + +## Short Version: 5-Minute Script + +Use this if time is tight. + +1. **Open:** "Physical AI teams can train policies, but improving failed checkpoints is still manual. Nvex closes that loop." +2. **Overview:** "This checkpoint starts at 62% success. The eval score tells us it failed, but Nvex tells us why." +3. **Failure Map:** "Failures cluster around occlusion and missing recovery behavior." +4. **Patch Plan:** "Nvex generates targeted data, training, and verification steps instead of asking the team to guess." +5. **Runner:** "Nvex dispatches the improvement job and tracks artifacts." +6. **Report:** "The run reaches 85%, including one regression that Nvex rolls back automatically." +7. **Memory:** "The fix becomes reusable platform memory, so every loop makes the system smarter." +8. **Close:** "This is the self-improvement layer for Physical AI." + +## Investor Q&A Prep + +### "Is this just an MLOps dashboard?" + +Answer: + +> No. MLOps tracks runs and artifacts. Nvex decides what to do next. The core product surface is diagnosis, patch planning, orchestration, verification, and reusable memory. 
+ +### "Is this just a wrapper around AlphaBrain?" + +Answer: + +> No. AlphaBrain is one execution backend bundled in this repo. Nvex owns the intelligence layer above execution: failure maps, patch plans, iteration control, improvement reports, and platform memory. Over time, Nvex can dispatch to multiple training and eval backends. + +### "What is real today?" + +Answer: + +> The current demo includes a React product surface and FastAPI backend path with schema contracts, eval import, patch-plan generation, job dispatch, polling, multi-iteration agent state, streamed timeline events, rollback signaling, and report generation. The investor run is replayable for stability; customer POCs connect real checkpoints and run asynchronously. + +### "Why will this compound?" + +Answer: + +> Every loop creates structured assets: failure patterns, recipes, verification specs, and execution templates. Those assets reduce the time and uncertainty of future loops. That is especially important because robotics failures repeat across tasks and environments. + +### "Who is the first customer?" + +Answer: + +> Robotics and Physical AI teams with an initial trained policy and a painful post-training loop: benchmark failures, field failures, or sim-to-real regressions where they need targeted improvement rather than another undirected training run. + +### "What is the wedge?" + +Answer: + +> Start with post-eval diagnosis and patch planning for VLA manipulation policies. The product expands from "tell me why this checkpoint failed" to "run the improvement loop for me." + +### "What is the autonomous agent?" + +Answer: + +> The agent runs eval, diagnoses failures, selects interventions, dispatches training, verifies the checkpoint, saves memory, emits timeline events, and decides whether to continue, rollback, or stop. It is targeted incremental improvement, not retraining from scratch. + +### "Why now?" + +Answer: + +> Physical AI is moving from demos to deployment. 
As policies enter more varied environments, the bottleneck shifts from initial model training to continuous improvement after failure. Teams need infrastructure for that loop. + +## Phrases To Use + +- "Eval is the beginning of the improvement loop, not the end of reporting." +- "Nvex turns failure into an executable patch plan." +- "We are not replacing training frameworks; we are orchestrating them." +- "The product compounds because each fix becomes memory." +- "The demo is stable and replayable; the architecture is designed for live customer POCs." +- "We show real control discipline: if a loop regresses, Nvex rolls back and recovers." +- "The moat is the growing library of failure patterns and successful interventions." + +## Phrases To Avoid + +- Avoid implying this seeded investor flow is a live multi-hour training run. +- Avoid promising real-time training during an investor meeting. +- Avoid positioning Nvex as only a benchmark dashboard. +- Avoid making AlphaBrain the center of the story. Mention it as the bundled execution layer only if asked or during the runner section. +- Avoid claiming the 62% to 85% case proves general deployment readiness. It proves loop quality, control, and product thesis. + +## Backup Plan + +If the backend fails: + +1. Open `demo/nvex-demo.html`. +2. Say: + + > I am switching to the static walkthrough so we can keep the story moving. It shows the same product flow and seeded improvement case. + +3. Continue from the same page sequence. + +If the UI is slow: + +1. Skip the Project Overview. +2. Go directly: Failure Map -> Patch Plan -> Improvement Report -> Platform Memory. +3. Use the 5-minute script. + +If someone asks for implementation detail: + +> The backend is FastAPI. The key objects are eval runs, failure diagnoses, patch plans, iteration jobs, improvement reports, and reusable memory assets. The dispatch layer is intentionally separated so Nvex can orchestrate different execution backends. 
+ +If they ask about control and safety: + +> The agent emits explicit run events and applies rollback when a loop regresses beyond tolerance. That is the foundation for customer-facing governance and audit trails. + +## One-Slide Summary + +Use this as the verbal summary if you only get one minute: + +> Nvex is the self-improvement layer for Physical AI. It starts with a failing checkpoint, diagnoses the failure modes, generates a targeted patch plan, dispatches the improvement run, verifies the new checkpoint, and saves the recipe to platform memory. In the demo, a LIBERO Kitchen policy improves from 62% to 74%. The long-term thesis is that every failure makes the platform smarter, creating a compounding library of recipes for robot policy improvement. + +Replace the metric callout if using the current Milestone 4 demo flow: + +> In the current demo, a LIBERO Kitchen policy progresses from 62% to 85%, including a visible regression and automatic rollback before final convergence. diff --git a/assets/NVEX-SIA-for-PhysicalAI.png b/assets/NVEX-SIA-for-PhysicalAI.png new file mode 100644 index 0000000..88ef1f3 Binary files /dev/null and b/assets/NVEX-SIA-for-PhysicalAI.png differ diff --git a/assets/pearl-illustration.png b/assets/pearl-illustration.png new file mode 100644 index 0000000..16f706b Binary files /dev/null and b/assets/pearl-illustration.png differ diff --git a/benchmarks/LIBERO/eval/eval_libero.py b/benchmarks/LIBERO/eval/eval_libero.py index fe8891c..ef463dd 100644 --- a/benchmarks/LIBERO/eval/eval_libero.py +++ b/benchmarks/LIBERO/eval/eval_libero.py @@ -143,6 +143,7 @@ def eval_libero(args: Args) -> None: # Start evaluation total_episodes, total_successes = 0, 0 + task_breakdown = [] for task_id in tqdm.tqdm(range(num_tasks_in_suite)): # Get task task = task_suite.get_task(task_id) @@ -419,6 +420,15 @@ def eval_libero(args: Args) -> None: f"{_CB}Current total success rate:{_C0} {_sr_color(_total_sr)}" f"{_total_sr:.2f}{_C0} 
{_CD}({_total_sr*100:.1f}%){_C0}" ) + task_breakdown.append( + { + "task_id": task_id, + "task_name": task_description, + "success_rate": _task_sr, + "attempts": task_episodes, + "successes": task_successes, + } + ) # Explicitly close the environment to avoid EGL cleanup errors during GC env.close() @@ -437,6 +447,19 @@ def eval_libero(args: Args) -> None: f"{_CB}{'━' * 60}{_C0}" ) + results = { + "task_suite": args.task_suite_name, + "checkpoint": args.pretrained_path, + "total_episodes": total_episodes, + "total_successes": total_successes, + "success_rate": _final_sr, + "task_breakdown": task_breakdown, + } + results_path = pathlib.Path(args.video_out_path) / "eval_results.json" + with open(results_path, "w", encoding="utf-8") as f: + json.dump(results, f, indent=2, default=str) + logging.info(f"Results saved to {results_path}") + def _get_libero_env(task, resolution, seed): """Initializes and returns the LIBERO environment, along with the task description.""" diff --git a/demo/.gitignore b/demo/.gitignore new file mode 100644 index 0000000..a547bf3 --- /dev/null +++ b/demo/.gitignore @@ -0,0 +1,24 @@ +# Logs +logs +*.log +npm-debug.log* +yarn-debug.log* +yarn-error.log* +pnpm-debug.log* +lerna-debug.log* + +node_modules +dist +dist-ssr +*.local + +# Editor directories and files +.vscode/* +!.vscode/extensions.json +.idea +.DS_Store +*.suo +*.ntvs* +*.njsproj +*.sln +*.sw? diff --git a/demo/README.md b/demo/README.md new file mode 100644 index 0000000..0e0f87e --- /dev/null +++ b/demo/README.md @@ -0,0 +1,78 @@ +# Nvex Physical AI Demo + +A 7-page interactive web demo showcasing Nvex as the **self-improving Physical AI orchestration layer** — for investor presentations, customer discovery calls, and technical evaluations. + +The React demo is now backed by the Milestone 2 Nvex server path. 
Failure Map, Patch Plan, Iteration Runner, Improvement Report, and Platform Memory all consume live API responses from `nvex_server.app`, seeded with a LIBERO Kitchen before/after improvement case. + +## Audience Modes + +| Audience | Focus | Key Pages | +|----------|-------|-----------| +| **Investors** | Platform moat, compound value, orchestration layer vs. training framework | Home → Platform Memory | +| **Potential customers** | "What happens to my failing policy?" | Overview → Failure Map → Patch Plan → Improvement Report | +| **Technical evaluators** | AlphaBrain execution depth, benchmark results, agent reasoning | Failure Map → Iteration Runner | + +## Pages + +| Route | Page | What it proves | +|-------|------|----------------| +| Home | Project Hub — intelligence loop diagram, platform metrics, project list | Nvex is a platform, not a one-off tool | +| Overview | Project Overview — KPI cards, task breakdown, loop position, next action | Eval is the starting point, not the end | +| Failure Map | Interactive failure clusters, radar chart, root-cause diagnosis | Nvex knows *why* the policy failed | +| Patch Plan | Data targeting, training strategy, verification, expected uplift | Nvex decides *what to do next* | +| Iteration Runner | Animated timeline, live console, artifact tracker | AlphaBrain executes; Nvex orchestrates | +| Improvement Report | Before/after metrics, assets created, next iteration suggestion | Measurable, verifiable improvement | +| Platform Memory | Recipes, pipeline templates, failure ontology, compounding chart | Every loop makes the platform smarter | + +## Quick Start + +**Standalone (no build needed):** +```bash +open nvex-demo.html # or just double-click it +``` + +**React dev server with local backend:** +```bash +cd .. 
+./.venv/bin/python -m uvicorn nvex_server.app:app --reload --port 8000 + +cd demo +npm install +npm run dev # http://127.0.0.1:5173 +npm run build # production build → dist/ +``` + +Vite proxies `/api` and `/health` to `http://127.0.0.1:8000`, so no extra frontend env var is required for local development. + +## API-Backed Demo Flow + +The current local demo uses these backend endpoints: + +- `GET /api/demo/state` — returns the seeded full dashboard state used by the React app +- `POST /api/eval/import` — imports benchmark artifacts into `EvalRun` +- `POST /api/plan/generate` — produces a rule-based `PatchPlan` +- `POST /api/iteration/start` — creates an `IterationJob` +- `GET /api/iteration/{id}/status` — polls file-backed job state +- `GET /api/report/{iteration_id}` — returns the resulting `ImprovementReport` + +The seeded scenario uses `nvex_server/examples/libero_kitchen_before_eval.json` and `nvex_server/examples/libero_kitchen_after_eval.json` to demonstrate a real structured loop from `62%` to `74%` success. + +## Stack + +- `nvex-demo.html` — self-contained single-file demo, no dependencies +- React 19 + Vite 8 app in `src/` +- Pure CSS (no UI framework) — dark `#07090f` theme, indigo-violet gradients +- SVG for Intelligence Loop diagram and radar chart +- `src/data/NvexRuntimeContext.jsx` adapts backend demo state into the dashboard data shape +- `nvex_server/` provides the M2 backend orchestration path +- Real artifact import is supported for LIBERO `eval_results.json`, RoboCasa365 `aggregate_stats.json`, RoboCasa tabletop stats JSON, generic JSON, and LIBERO logs + +## Story + +Demo follows the LIBERO Kitchen Pick-and-Place scenario: + +> `NeuroVLA-LIBERO-ckpt_v0.7` at **62% success** → Nvex diagnoses failure clusters (occlusion 38%, recovery 24%) → generates targeted patch plan → AlphaBrain CL update → `ckpt_v0.8` at **74% (+12%)** → recipe saved to Platform Memory. 
+ +For local M2 development, this story is served through the backend, not hardcoded page-by-page in the React app. + +For the self-improvement agent story (autonomous multi-loop), see [`../SELF_IMPROVEMENT_AGENT.md`](../SELF_IMPROVEMENT_AGENT.md). diff --git a/demo/eslint.config.js b/demo/eslint.config.js new file mode 100644 index 0000000..ea36dd3 --- /dev/null +++ b/demo/eslint.config.js @@ -0,0 +1,21 @@ +import js from '@eslint/js' +import globals from 'globals' +import reactHooks from 'eslint-plugin-react-hooks' +import reactRefresh from 'eslint-plugin-react-refresh' +import { defineConfig, globalIgnores } from 'eslint/config' + +export default defineConfig([ + globalIgnores(['dist']), + { + files: ['**/*.{js,jsx}'], + extends: [ + js.configs.recommended, + reactHooks.configs.flat.recommended, + reactRefresh.configs.vite, + ], + languageOptions: { + globals: globals.browser, + parserOptions: { ecmaFeatures: { jsx: true } }, + }, + }, +]) diff --git a/demo/index.html b/demo/index.html new file mode 100644 index 0000000..952fc7a --- /dev/null +++ b/demo/index.html @@ -0,0 +1,13 @@ + + + + + + + Nvex — Physical AI Orchestration Demo + + +
+ + + diff --git a/demo/nvex-demo.html b/demo/nvex-demo.html new file mode 100644 index 0000000..eaae644 --- /dev/null +++ b/demo/nvex-demo.html @@ -0,0 +1,1770 @@ + + + + + +Nvex — Physical AI Orchestration + + + + + + + +
+ + + + diff --git a/demo/package-lock.json b/demo/package-lock.json new file mode 100644 index 0000000..b6c1c23 --- /dev/null +++ b/demo/package-lock.json @@ -0,0 +1,2428 @@ +{ + "name": "demo", + "version": "0.0.0", + "lockfileVersion": 3, + "requires": true, + "packages": { + "": { + "name": "demo", + "version": "0.0.0", + "dependencies": { + "react": "^19.2.5", + "react-dom": "^19.2.5" + }, + "devDependencies": { + "@eslint/js": "^10.0.1", + "@types/react": "^19.2.14", + "@types/react-dom": "^19.2.3", + "@vitejs/plugin-react": "^6.0.1", + "eslint": "^10.2.1", + "eslint-plugin-react-hooks": "^7.1.1", + "eslint-plugin-react-refresh": "^0.5.2", + "globals": "^17.5.0", + "vite": "^8.0.10" + } + }, + "node_modules/@babel/code-frame": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.29.0.tgz", + "integrity": "sha512-9NhCeYjq9+3uxgdtp20LSiJXJvN0FeCtNGpJxuMFZ1Kv3cWUNb6DOhJwUvcVCzKGR66cw4njwM6hrJLqgOwbcw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-validator-identifier": "^7.28.5", + "js-tokens": "^4.0.0", + "picocolors": "^1.1.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/compat-data": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/compat-data/-/compat-data-7.29.0.tgz", + "integrity": "sha512-T1NCJqT/j9+cn8fvkt7jtwbLBfLC/1y1c7NtCeXFRgzGTsafi68MRv8yzkYSapBnFA6L3U2VSc02ciDzoAJhJg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/core": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/core/-/core-7.29.0.tgz", + "integrity": "sha512-CGOfOJqWjg2qW/Mb6zNsDm+u5vFQ8DxXfbM09z69p5Z6+mE1ikP2jUXw+j42Pf1XTYED2Rni5f95npYeuwMDQA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/code-frame": "^7.29.0", + "@babel/generator": "^7.29.0", + "@babel/helper-compilation-targets": "^7.28.6", + "@babel/helper-module-transforms": "^7.28.6", + "@babel/helpers": 
"^7.28.6", + "@babel/parser": "^7.29.0", + "@babel/template": "^7.28.6", + "@babel/traverse": "^7.29.0", + "@babel/types": "^7.29.0", + "@jridgewell/remapping": "^2.3.5", + "convert-source-map": "^2.0.0", + "debug": "^4.1.0", + "gensync": "^1.0.0-beta.2", + "json5": "^2.2.3", + "semver": "^6.3.1" + }, + "engines": { + "node": ">=6.9.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/babel" + } + }, + "node_modules/@babel/generator": { + "version": "7.29.1", + "resolved": "https://registry.npmjs.org/@babel/generator/-/generator-7.29.1.tgz", + "integrity": "sha512-qsaF+9Qcm2Qv8SRIMMscAvG4O3lJ0F1GuMo5HR/Bp02LopNgnZBC/EkbevHFeGs4ls/oPz9v+Bsmzbkbe+0dUw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/parser": "^7.29.0", + "@babel/types": "^7.29.0", + "@jridgewell/gen-mapping": "^0.3.12", + "@jridgewell/trace-mapping": "^0.3.28", + "jsesc": "^3.0.2" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-compilation-targets": { + "version": "7.28.6", + "resolved": "https://registry.npmjs.org/@babel/helper-compilation-targets/-/helper-compilation-targets-7.28.6.tgz", + "integrity": "sha512-JYtls3hqi15fcx5GaSNL7SCTJ2MNmjrkHXg4FSpOA/grxK8KwyZ5bubHsCq8FXCkua6xhuaaBit+3b7+VZRfcA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/compat-data": "^7.28.6", + "@babel/helper-validator-option": "^7.27.1", + "browserslist": "^4.24.0", + "lru-cache": "^5.1.1", + "semver": "^6.3.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-globals": { + "version": "7.28.0", + "resolved": "https://registry.npmjs.org/@babel/helper-globals/-/helper-globals-7.28.0.tgz", + "integrity": "sha512-+W6cISkXFa1jXsDEdYA8HeevQT/FULhxzR99pxphltZcVaugps53THCeiWA8SguxxpSp3gKPiuYfSWopkLQ4hw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-module-imports": { + "version": "7.28.6", + "resolved": 
"https://registry.npmjs.org/@babel/helper-module-imports/-/helper-module-imports-7.28.6.tgz", + "integrity": "sha512-l5XkZK7r7wa9LucGw9LwZyyCUscb4x37JWTPz7swwFE/0FMQAGpiWUZn8u9DzkSBWEcK25jmvubfpw2dnAMdbw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/traverse": "^7.28.6", + "@babel/types": "^7.28.6" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-module-transforms": { + "version": "7.28.6", + "resolved": "https://registry.npmjs.org/@babel/helper-module-transforms/-/helper-module-transforms-7.28.6.tgz", + "integrity": "sha512-67oXFAYr2cDLDVGLXTEABjdBJZ6drElUSI7WKp70NrpyISso3plG9SAGEF6y7zbha/wOzUByWWTJvEDVNIUGcA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-module-imports": "^7.28.6", + "@babel/helper-validator-identifier": "^7.28.5", + "@babel/traverse": "^7.28.6" + }, + "engines": { + "node": ">=6.9.0" + }, + "peerDependencies": { + "@babel/core": "^7.0.0" + } + }, + "node_modules/@babel/helper-string-parser": { + "version": "7.27.1", + "resolved": "https://registry.npmjs.org/@babel/helper-string-parser/-/helper-string-parser-7.27.1.tgz", + "integrity": "sha512-qMlSxKbpRlAridDExk92nSobyDdpPijUq2DW6oDnUqd0iOGxmQjyqhMIihI9+zv4LPyZdRje2cavWPbCbWm3eA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-validator-identifier": { + "version": "7.28.5", + "resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.28.5.tgz", + "integrity": "sha512-qSs4ifwzKJSV39ucNjsvc6WVHs6b7S03sOh2OcHF9UHfVPqWWALUsNUVzhSBiItjRZoLHx7nIarVjqKVusUZ1Q==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-validator-option": { + "version": "7.27.1", + "resolved": "https://registry.npmjs.org/@babel/helper-validator-option/-/helper-validator-option-7.27.1.tgz", + "integrity": 
"sha512-YvjJow9FxbhFFKDSuFnVCe2WxXk1zWc22fFePVNEaWJEu8IrZVlda6N0uHwzZrUM1il7NC9Mlp4MaJYbYd9JSg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helpers": { + "version": "7.29.2", + "resolved": "https://registry.npmjs.org/@babel/helpers/-/helpers-7.29.2.tgz", + "integrity": "sha512-HoGuUs4sCZNezVEKdVcwqmZN8GoHirLUcLaYVNBK2J0DadGtdcqgr3BCbvH8+XUo4NGjNl3VOtSjEKNzqfFgKw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/template": "^7.28.6", + "@babel/types": "^7.29.0" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/parser": { + "version": "7.29.2", + "resolved": "https://registry.npmjs.org/@babel/parser/-/parser-7.29.2.tgz", + "integrity": "sha512-4GgRzy/+fsBa72/RZVJmGKPmZu9Byn8o4MoLpmNe1m8ZfYnz5emHLQz3U4gLud6Zwl0RZIcgiLD7Uq7ySFuDLA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/types": "^7.29.0" + }, + "bin": { + "parser": "bin/babel-parser.js" + }, + "engines": { + "node": ">=6.0.0" + } + }, + "node_modules/@babel/template": { + "version": "7.28.6", + "resolved": "https://registry.npmjs.org/@babel/template/-/template-7.28.6.tgz", + "integrity": "sha512-YA6Ma2KsCdGb+WC6UpBVFJGXL58MDA6oyONbjyF/+5sBgxY/dwkhLogbMT2GXXyU84/IhRw/2D1Os1B/giz+BQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/code-frame": "^7.28.6", + "@babel/parser": "^7.28.6", + "@babel/types": "^7.28.6" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/traverse": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/traverse/-/traverse-7.29.0.tgz", + "integrity": "sha512-4HPiQr0X7+waHfyXPZpWPfWL/J7dcN1mx9gL6WdQVMbPnF3+ZhSMs8tCxN7oHddJE9fhNE7+lxdnlyemKfJRuA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/code-frame": "^7.29.0", + "@babel/generator": "^7.29.0", + "@babel/helper-globals": "^7.28.0", + "@babel/parser": "^7.29.0", + "@babel/template": "^7.28.6", + "@babel/types": "^7.29.0", + "debug": 
"^4.3.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/types": { + "version": "7.29.0", + "resolved": "https://registry.npmjs.org/@babel/types/-/types-7.29.0.tgz", + "integrity": "sha512-LwdZHpScM4Qz8Xw2iKSzS+cfglZzJGvofQICy7W7v4caru4EaAmyUuO6BGrbyQ2mYV11W0U8j5mBhd14dd3B0A==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-string-parser": "^7.27.1", + "@babel/helper-validator-identifier": "^7.28.5" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@emnapi/core": { + "version": "1.10.0", + "resolved": "https://registry.npmjs.org/@emnapi/core/-/core-1.10.0.tgz", + "integrity": "sha512-yq6OkJ4p82CAfPl0u9mQebQHKPJkY7WrIuk205cTYnYe+k2Z8YBh11FrbRG/H6ihirqcacOgl2BIO8oyMQLeXw==", + "dev": true, + "license": "MIT", + "optional": true, + "dependencies": { + "@emnapi/wasi-threads": "1.2.1", + "tslib": "^2.4.0" + } + }, + "node_modules/@emnapi/runtime": { + "version": "1.10.0", + "resolved": "https://registry.npmjs.org/@emnapi/runtime/-/runtime-1.10.0.tgz", + "integrity": "sha512-ewvYlk86xUoGI0zQRNq/mC+16R1QeDlKQy21Ki3oSYXNgLb45GV1P6A0M+/s6nyCuNDqe5VpaY84BzXGwVbwFA==", + "dev": true, + "license": "MIT", + "optional": true, + "dependencies": { + "tslib": "^2.4.0" + } + }, + "node_modules/@emnapi/wasi-threads": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/@emnapi/wasi-threads/-/wasi-threads-1.2.1.tgz", + "integrity": "sha512-uTII7OYF+/Mes/MrcIOYp5yOtSMLBWSIoLPpcgwipoiKbli6k322tcoFsxoIIxPDqW01SQGAgko4EzZi2BNv2w==", + "dev": true, + "license": "MIT", + "optional": true, + "dependencies": { + "tslib": "^2.4.0" + } + }, + "node_modules/@eslint-community/eslint-utils": { + "version": "4.9.1", + "resolved": "https://registry.npmjs.org/@eslint-community/eslint-utils/-/eslint-utils-4.9.1.tgz", + "integrity": "sha512-phrYmNiYppR7znFEdqgfWHXR6NCkZEK7hwWDHZUjit/2/U0r6XvkDl0SYnoM51Hq7FhCGdLDT6zxCCOY1hexsQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "eslint-visitor-keys": "^3.4.3" + }, + 
"engines": { + "node": "^12.22.0 || ^14.17.0 || >=16.0.0" + }, + "funding": { + "url": "https://opencollective.com/eslint" + }, + "peerDependencies": { + "eslint": "^6.0.0 || ^7.0.0 || >=8.0.0" + } + }, + "node_modules/@eslint-community/eslint-utils/node_modules/eslint-visitor-keys": { + "version": "3.4.3", + "resolved": "https://registry.npmjs.org/eslint-visitor-keys/-/eslint-visitor-keys-3.4.3.tgz", + "integrity": "sha512-wpc+LXeiyiisxPlEkUzU6svyS1frIO3Mgxj1fdy7Pm8Ygzguax2N3Fa/D/ag1WqbOprdI+uY6wMUl8/a2G+iag==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": "^12.22.0 || ^14.17.0 || >=16.0.0" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/@eslint-community/regexpp": { + "version": "4.12.2", + "resolved": "https://registry.npmjs.org/@eslint-community/regexpp/-/regexpp-4.12.2.tgz", + "integrity": "sha512-EriSTlt5OC9/7SXkRSCAhfSxxoSUgBm33OH+IkwbdpgoqsSsUg7y3uh+IICI/Qg4BBWr3U2i39RpmycbxMq4ew==", + "dev": true, + "license": "MIT", + "engines": { + "node": "^12.0.0 || ^14.0.0 || >=16.0.0" + } + }, + "node_modules/@eslint/config-array": { + "version": "0.23.5", + "resolved": "https://registry.npmjs.org/@eslint/config-array/-/config-array-0.23.5.tgz", + "integrity": "sha512-Y3kKLvC1dvTOT+oGlqNQ1XLqK6D1HU2YXPc52NmAlJZbMMWDzGYXMiPRJ8TYD39muD/OTjlZmNJ4ib7dvSrMBA==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@eslint/object-schema": "^3.0.5", + "debug": "^4.3.1", + "minimatch": "^10.2.4" + }, + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + } + }, + "node_modules/@eslint/config-helpers": { + "version": "0.5.5", + "resolved": "https://registry.npmjs.org/@eslint/config-helpers/-/config-helpers-0.5.5.tgz", + "integrity": "sha512-eIJYKTCECbP/nsKaaruF6LW967mtbQbsw4JTtSVkUQc9MneSkbrgPJAbKl9nWr0ZeowV8BfsarBmPpBzGelA2w==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@eslint/core": "^1.2.1" + }, + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + } + }, + 
"node_modules/@eslint/core": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/@eslint/core/-/core-1.2.1.tgz", + "integrity": "sha512-MwcE1P+AZ4C6DWlpin/OmOA54mmIZ/+xZuJiQd4SyB29oAJjN30UW9wkKNptW2ctp4cEsvhlLY/CsQ1uoHDloQ==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@types/json-schema": "^7.0.15" + }, + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + } + }, + "node_modules/@eslint/js": { + "version": "10.0.1", + "resolved": "https://registry.npmjs.org/@eslint/js/-/js-10.0.1.tgz", + "integrity": "sha512-zeR9k5pd4gxjZ0abRoIaxdc7I3nDktoXZk2qOv9gCNWx3mVwEn32VRhyLaRsDiJjTs0xq/T8mfPtyuXu7GWBcA==", + "dev": true, + "license": "MIT", + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + }, + "funding": { + "url": "https://eslint.org/donate" + }, + "peerDependencies": { + "eslint": "^10.0.0" + }, + "peerDependenciesMeta": { + "eslint": { + "optional": true + } + } + }, + "node_modules/@eslint/object-schema": { + "version": "3.0.5", + "resolved": "https://registry.npmjs.org/@eslint/object-schema/-/object-schema-3.0.5.tgz", + "integrity": "sha512-vqTaUEgxzm+YDSdElad6PiRoX4t8VGDjCtt05zn4nU810UIx/uNEV7/lZJ6KwFThKZOzOxzXy48da+No7HZaMw==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + } + }, + "node_modules/@eslint/plugin-kit": { + "version": "0.7.1", + "resolved": "https://registry.npmjs.org/@eslint/plugin-kit/-/plugin-kit-0.7.1.tgz", + "integrity": "sha512-rZAP3aVgB9ds9KOeUSL+zZ21hPmo8dh6fnIFwRQj5EAZl9gzR7wxYbYXYysAM8CTqGmUGyp2S4kUdV17MnGuWQ==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@eslint/core": "^1.2.1", + "levn": "^0.4.1" + }, + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + } + }, + "node_modules/@humanfs/core": { + "version": "0.19.2", + "resolved": "https://registry.npmjs.org/@humanfs/core/-/core-0.19.2.tgz", + "integrity": "sha512-UhXNm+CFMWcbChXywFwkmhqjs3PRCmcSa/hfBgLIb7oQ5HNb1wS0icWsGtSAUNgefHeI+eBrA8I1fxmbHsGdvA==", + "dev": 
true, + "license": "Apache-2.0", + "dependencies": { + "@humanfs/types": "^0.15.0" + }, + "engines": { + "node": ">=18.18.0" + } + }, + "node_modules/@humanfs/node": { + "version": "0.16.8", + "resolved": "https://registry.npmjs.org/@humanfs/node/-/node-0.16.8.tgz", + "integrity": "sha512-gE1eQNZ3R++kTzFUpdGlpmy8kDZD/MLyHqDwqjkVQI0JMdI1D51sy1H958PNXYkM2rAac7e5/CnIKZrHtPh3BQ==", + "dev": true, + "license": "Apache-2.0", + "dependencies": { + "@humanfs/core": "^0.19.2", + "@humanfs/types": "^0.15.0", + "@humanwhocodes/retry": "^0.4.0" + }, + "engines": { + "node": ">=18.18.0" + } + }, + "node_modules/@humanfs/types": { + "version": "0.15.0", + "resolved": "https://registry.npmjs.org/@humanfs/types/-/types-0.15.0.tgz", + "integrity": "sha512-ZZ1w0aoQkwuUuC7Yf+7sdeaNfqQiiLcSRbfI08oAxqLtpXQr9AIVX7Ay7HLDuiLYAaFPu8oBYNq/QIi9URHJ3Q==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=18.18.0" + } + }, + "node_modules/@humanwhocodes/module-importer": { + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/@humanwhocodes/module-importer/-/module-importer-1.0.1.tgz", + "integrity": "sha512-bxveV4V8v5Yb4ncFTT3rPSgZBOpCkjfK0y4oVVVJwIuDVBRMDXrPyXRL988i5ap9m9bnyEEjWfm5WkBmtffLfA==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=12.22" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/nzakas" + } + }, + "node_modules/@humanwhocodes/retry": { + "version": "0.4.3", + "resolved": "https://registry.npmjs.org/@humanwhocodes/retry/-/retry-0.4.3.tgz", + "integrity": "sha512-bV0Tgo9K4hfPCek+aMAn81RppFKv2ySDQeMoSZuvTASywNTnVJCArCZE2FWqpvIatKu7VMRLWlR1EazvVhDyhQ==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=18.18" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/nzakas" + } + }, + "node_modules/@jridgewell/gen-mapping": { + "version": "0.3.13", + "resolved": "https://registry.npmjs.org/@jridgewell/gen-mapping/-/gen-mapping-0.3.13.tgz", + "integrity": 
"sha512-2kkt/7niJ6MgEPxF0bYdQ6etZaA+fQvDcLKckhy1yIQOzaoKjBBjSj63/aLVjYE3qhRt5dvM+uUyfCg6UKCBbA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@jridgewell/sourcemap-codec": "^1.5.0", + "@jridgewell/trace-mapping": "^0.3.24" + } + }, + "node_modules/@jridgewell/remapping": { + "version": "2.3.5", + "resolved": "https://registry.npmjs.org/@jridgewell/remapping/-/remapping-2.3.5.tgz", + "integrity": "sha512-LI9u/+laYG4Ds1TDKSJW2YPrIlcVYOwi2fUC6xB43lueCjgxV4lffOCZCtYFiH6TNOX+tQKXx97T4IKHbhyHEQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@jridgewell/gen-mapping": "^0.3.5", + "@jridgewell/trace-mapping": "^0.3.24" + } + }, + "node_modules/@jridgewell/resolve-uri": { + "version": "3.1.2", + "resolved": "https://registry.npmjs.org/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz", + "integrity": "sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.0.0" + } + }, + "node_modules/@jridgewell/sourcemap-codec": { + "version": "1.5.5", + "resolved": "https://registry.npmjs.org/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.5.tgz", + "integrity": "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==", + "dev": true, + "license": "MIT" + }, + "node_modules/@jridgewell/trace-mapping": { + "version": "0.3.31", + "resolved": "https://registry.npmjs.org/@jridgewell/trace-mapping/-/trace-mapping-0.3.31.tgz", + "integrity": "sha512-zzNR+SdQSDJzc8joaeP8QQoCQr8NuYx2dIIytl1QeBEZHJ9uW6hebsrYgbz8hJwUQao3TWCMtmfV8Nu1twOLAw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@jridgewell/resolve-uri": "^3.1.0", + "@jridgewell/sourcemap-codec": "^1.4.14" + } + }, + "node_modules/@napi-rs/wasm-runtime": { + "version": "1.1.4", + "resolved": "https://registry.npmjs.org/@napi-rs/wasm-runtime/-/wasm-runtime-1.1.4.tgz", + "integrity": 
"sha512-3NQNNgA1YSlJb/kMH1ildASP9HW7/7kYnRI2szWJaofaS1hWmbGI4H+d3+22aGzXXN9IJ+n+GiFVcGipJP18ow==", + "dev": true, + "license": "MIT", + "optional": true, + "dependencies": { + "@tybys/wasm-util": "^0.10.1" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/Brooooooklyn" + }, + "peerDependencies": { + "@emnapi/core": "^1.7.1", + "@emnapi/runtime": "^1.7.1" + } + }, + "node_modules/@oxc-project/types": { + "version": "0.127.0", + "resolved": "https://registry.npmjs.org/@oxc-project/types/-/types-0.127.0.tgz", + "integrity": "sha512-aIYXQBo4lCbO4z0R3FHeucQHpF46l2LbMdxRvqvuRuW2OxdnSkcng5B8+K12spgLDj93rtN3+J2Vac/TIO+ciQ==", + "dev": true, + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/Boshen" + } + }, + "node_modules/@rolldown/binding-android-arm64": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-android-arm64/-/binding-android-arm64-1.0.0-rc.17.tgz", + "integrity": "sha512-s70pVGhw4zqGeFnXWvAzJDlvxhlRollagdCCKRgOsgUOH3N1l0LIxf83AtGzmb5SiVM4Hjl5HyarMRfdfj3DaQ==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-darwin-arm64": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-darwin-arm64/-/binding-darwin-arm64-1.0.0-rc.17.tgz", + "integrity": "sha512-4ksWc9n0mhlZpZ9PMZgTGjeOPRu8MB1Z3Tz0Mo02eWfWCHMW1zN82Qz/pL/rC+yQa+8ZnutMF0JjJe7PjwasYw==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-darwin-x64": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-darwin-x64/-/binding-darwin-x64-1.0.0-rc.17.tgz", + "integrity": 
"sha512-SUSDOI6WwUVNcWxd02QEBjLdY1VPHvlEkw6T/8nYG322iYWCTxRb1vzk4E+mWWYehTp7ERibq54LSJGjmouOsw==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-freebsd-x64": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-freebsd-x64/-/binding-freebsd-x64-1.0.0-rc.17.tgz", + "integrity": "sha512-hwnz3nw9dbJ05EDO/PvcjaaewqqDy7Y1rn1UO81l8iIK1GjenME75dl16ajbvSSMfv66WXSRCYKIqfgq2KCfxw==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-linux-arm-gnueabihf": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-arm-gnueabihf/-/binding-linux-arm-gnueabihf-1.0.0-rc.17.tgz", + "integrity": "sha512-IS+W7epTcwANmFSQFrS1SivEXHtl1JtuQA9wlxrZTcNi6mx+FDOYrakGevvvTwgj2JvWiK8B29/qD9BELZPyXQ==", + "cpu": [ + "arm" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-linux-arm64-gnu": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-arm64-gnu/-/binding-linux-arm64-gnu-1.0.0-rc.17.tgz", + "integrity": "sha512-e6usGaHKW5BMNZOymS1UcEYGowQMWcgZ71Z17Sl/h2+ZziNJ1a9n3Zvcz6LdRyIW5572wBCTH/Z+bKuZouGk9Q==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-linux-arm64-musl": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-arm64-musl/-/binding-linux-arm64-musl-1.0.0-rc.17.tgz", + "integrity": 
"sha512-b/CgbwAJpmrRLp02RPfhbudf5tZnN9nsPWK82znefso832etkem8H7FSZwxrOI9djcdTP7U6YfNhbRnh7djErg==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-linux-ppc64-gnu": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-ppc64-gnu/-/binding-linux-ppc64-gnu-1.0.0-rc.17.tgz", + "integrity": "sha512-4EII1iNGRUN5WwGbF/kOh/EIkoDN9HsupgLQoXfY+D1oyJm7/F4t5PYU5n8SWZgG0FEwakyM8pGgwcBYruGTlA==", + "cpu": [ + "ppc64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-linux-s390x-gnu": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-s390x-gnu/-/binding-linux-s390x-gnu-1.0.0-rc.17.tgz", + "integrity": "sha512-AH8oq3XqQo4IibpVXvPeLDI5pzkpYn0WiZAfT05kFzoJ6tQNzwRdDYQ45M8I/gslbodRZwW8uxLhbSBbkv96rA==", + "cpu": [ + "s390x" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-linux-x64-gnu": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-x64-gnu/-/binding-linux-x64-gnu-1.0.0-rc.17.tgz", + "integrity": "sha512-cLnjV3xfo7KslbU41Z7z8BH/E1y5mzUYzAqih1d1MDaIGZRCMqTijqLv76/P7fyHuvUcfGsIpqCdddbxLLK9rA==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-linux-x64-musl": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-linux-x64-musl/-/binding-linux-x64-musl-1.0.0-rc.17.tgz", + "integrity": 
"sha512-0phclDw1spsL7dUB37sIARuis2tAgomCJXAHZlpt8PXZ4Ba0dRP1e+66lsRqrfhISeN9bEGNjQs+T/Fbd7oYGw==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-openharmony-arm64": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-openharmony-arm64/-/binding-openharmony-arm64-1.0.0-rc.17.tgz", + "integrity": "sha512-0ag/hEgXOwgw4t8QyQvUCxvEg+V0KBcA6YuOx9g0r02MprutRF5dyljgm3EmR02O292UX7UeS6HzWHAl6KgyhA==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "openharmony" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-wasm32-wasi": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-wasm32-wasi/-/binding-wasm32-wasi-1.0.0-rc.17.tgz", + "integrity": "sha512-LEXei6vo0E5wTGwpkJ4KoT3OZJRnglwldt5ziLzOlc6qqb55z4tWNq2A+PFqCJuvWWdP53CVhG1Z9NtToDPJrA==", + "cpu": [ + "wasm32" + ], + "dev": true, + "license": "MIT", + "optional": true, + "dependencies": { + "@emnapi/core": "1.10.0", + "@emnapi/runtime": "1.10.0", + "@napi-rs/wasm-runtime": "^1.1.4" + }, + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-win32-arm64-msvc": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-win32-arm64-msvc/-/binding-win32-arm64-msvc-1.0.0-rc.17.tgz", + "integrity": "sha512-gUmyzBl3SPMa6hrqFUth9sVfcLBlYsbMzBx5PlexMroZStgzGqlZ26pYG89rBb45Mnia+oil6YAIFeEWGWhoZA==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/binding-win32-x64-msvc": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/binding-win32-x64-msvc/-/binding-win32-x64-msvc-1.0.0-rc.17.tgz", 
+ "integrity": "sha512-3hkiolcUAvPB9FLb3UZdfjVVNWherN1f/skkGWJP/fgSQhYUZpSIRr0/I8ZK9TkF3F7kxvJAk0+IcKvPHk9qQg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": "^20.19.0 || >=22.12.0" + } + }, + "node_modules/@rolldown/pluginutils": { + "version": "1.0.0-rc.7", + "resolved": "https://registry.npmjs.org/@rolldown/pluginutils/-/pluginutils-1.0.0-rc.7.tgz", + "integrity": "sha512-qujRfC8sFVInYSPPMLQByRh7zhwkGFS4+tyMQ83srV1qrxL4g8E2tyxVVyxd0+8QeBM1mIk9KbWxkegRr76XzA==", + "dev": true, + "license": "MIT" + }, + "node_modules/@tybys/wasm-util": { + "version": "0.10.1", + "resolved": "https://registry.npmjs.org/@tybys/wasm-util/-/wasm-util-0.10.1.tgz", + "integrity": "sha512-9tTaPJLSiejZKx+Bmog4uSubteqTvFrVrURwkmHixBo0G4seD0zUxp98E1DzUBJxLQ3NPwXrGKDiVjwx/DpPsg==", + "dev": true, + "license": "MIT", + "optional": true, + "dependencies": { + "tslib": "^2.4.0" + } + }, + "node_modules/@types/esrecurse": { + "version": "4.3.1", + "resolved": "https://registry.npmjs.org/@types/esrecurse/-/esrecurse-4.3.1.tgz", + "integrity": "sha512-xJBAbDifo5hpffDBuHl0Y8ywswbiAp/Wi7Y/GtAgSlZyIABppyurxVueOPE8LUQOxdlgi6Zqce7uoEpqNTeiUw==", + "dev": true, + "license": "MIT" + }, + "node_modules/@types/estree": { + "version": "1.0.8", + "resolved": "https://registry.npmjs.org/@types/estree/-/estree-1.0.8.tgz", + "integrity": "sha512-dWHzHa2WqEXI/O1E9OjrocMTKJl2mSrEolh1Iomrv6U+JuNwaHXsXx9bLu5gG7BUWFIN0skIQJQ/L1rIex4X6w==", + "dev": true, + "license": "MIT" + }, + "node_modules/@types/json-schema": { + "version": "7.0.15", + "resolved": "https://registry.npmjs.org/@types/json-schema/-/json-schema-7.0.15.tgz", + "integrity": "sha512-5+fP8P8MFNC+AyZCDxrB2pkZFPGzqQWUzpSeuuVLvm8VMcorNYavBqoFcxK8bQz4Qsbn4oUEEem4wDLfcysGHA==", + "dev": true, + "license": "MIT" + }, + "node_modules/@types/react": { + "version": "19.2.14", + "resolved": "https://registry.npmjs.org/@types/react/-/react-19.2.14.tgz", + "integrity": 
"sha512-ilcTH/UniCkMdtexkoCN0bI7pMcJDvmQFPvuPvmEaYA/NSfFTAgdUSLAoVjaRJm7+6PvcM+q1zYOwS4wTYMF9w==", + "dev": true, + "license": "MIT", + "dependencies": { + "csstype": "^3.2.2" + } + }, + "node_modules/@types/react-dom": { + "version": "19.2.3", + "resolved": "https://registry.npmjs.org/@types/react-dom/-/react-dom-19.2.3.tgz", + "integrity": "sha512-jp2L/eY6fn+KgVVQAOqYItbF0VY/YApe5Mz2F0aykSO8gx31bYCZyvSeYxCHKvzHG5eZjc+zyaS5BrBWya2+kQ==", + "dev": true, + "license": "MIT", + "peerDependencies": { + "@types/react": "^19.2.0" + } + }, + "node_modules/@vitejs/plugin-react": { + "version": "6.0.1", + "resolved": "https://registry.npmjs.org/@vitejs/plugin-react/-/plugin-react-6.0.1.tgz", + "integrity": "sha512-l9X/E3cDb+xY3SWzlG1MOGt2usfEHGMNIaegaUGFsLkb3RCn/k8/TOXBcab+OndDI4TBtktT8/9BwwW8Vi9KUQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@rolldown/pluginutils": "1.0.0-rc.7" + }, + "engines": { + "node": "^20.19.0 || >=22.12.0" + }, + "peerDependencies": { + "@rolldown/plugin-babel": "^0.1.7 || ^0.2.0", + "babel-plugin-react-compiler": "^1.0.0", + "vite": "^8.0.0" + }, + "peerDependenciesMeta": { + "@rolldown/plugin-babel": { + "optional": true + }, + "babel-plugin-react-compiler": { + "optional": true + } + } + }, + "node_modules/acorn": { + "version": "8.16.0", + "resolved": "https://registry.npmjs.org/acorn/-/acorn-8.16.0.tgz", + "integrity": "sha512-UVJyE9MttOsBQIDKw1skb9nAwQuR5wuGD3+82K6JgJlm/Y+KI92oNsMNGZCYdDsVtRHSak0pcV5Dno5+4jh9sw==", + "dev": true, + "license": "MIT", + "bin": { + "acorn": "bin/acorn" + }, + "engines": { + "node": ">=0.4.0" + } + }, + "node_modules/acorn-jsx": { + "version": "5.3.2", + "resolved": "https://registry.npmjs.org/acorn-jsx/-/acorn-jsx-5.3.2.tgz", + "integrity": "sha512-rq9s+JNhf0IChjtDXxllJ7g41oZk5SlXtp0LHwyA5cejwn7vKmKp4pPri6YEePv2PU65sAsegbXtIinmDFDXgQ==", + "dev": true, + "license": "MIT", + "peerDependencies": { + "acorn": "^6.0.0 || ^7.0.0 || ^8.0.0" + } + }, + "node_modules/ajv": { + "version": "6.15.0", + 
"resolved": "https://registry.npmjs.org/ajv/-/ajv-6.15.0.tgz", + "integrity": "sha512-fgFx7Hfoq60ytK2c7DhnF8jIvzYgOMxfugjLOSMHjLIPgenqa7S7oaagATUq99mV6IYvN2tRmC0wnTYX6iPbMw==", + "dev": true, + "license": "MIT", + "dependencies": { + "fast-deep-equal": "^3.1.1", + "fast-json-stable-stringify": "^2.0.0", + "json-schema-traverse": "^0.4.1", + "uri-js": "^4.2.2" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/epoberezkin" + } + }, + "node_modules/balanced-match": { + "version": "4.0.4", + "resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-4.0.4.tgz", + "integrity": "sha512-BLrgEcRTwX2o6gGxGOCNyMvGSp35YofuYzw9h1IMTRmKqttAZZVU67bdb9Pr2vUHA8+j3i2tJfjO6C6+4myGTA==", + "dev": true, + "license": "MIT", + "engines": { + "node": "18 || 20 || >=22" + } + }, + "node_modules/baseline-browser-mapping": { + "version": "2.10.22", + "resolved": "https://registry.npmjs.org/baseline-browser-mapping/-/baseline-browser-mapping-2.10.22.tgz", + "integrity": "sha512-6qruVrb5rse6WylFkU0FhBKKGuecWseqdpQfhkawn6ztyk2QlfwSRjsDxMCLJrkfmfN21qvhl9ABgaMeRkuwww==", + "dev": true, + "license": "Apache-2.0", + "bin": { + "baseline-browser-mapping": "dist/cli.cjs" + }, + "engines": { + "node": ">=6.0.0" + } + }, + "node_modules/brace-expansion": { + "version": "5.0.5", + "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-5.0.5.tgz", + "integrity": "sha512-VZznLgtwhn+Mact9tfiwx64fA9erHH/MCXEUfB/0bX/6Fz6ny5EGTXYltMocqg4xFAQZtnO3DHWWXi8RiuN7cQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "balanced-match": "^4.0.2" + }, + "engines": { + "node": "18 || 20 || >=22" + } + }, + "node_modules/browserslist": { + "version": "4.28.2", + "resolved": "https://registry.npmjs.org/browserslist/-/browserslist-4.28.2.tgz", + "integrity": "sha512-48xSriZYYg+8qXna9kwqjIVzuQxi+KYWp2+5nCYnYKPTr0LvD89Jqk2Or5ogxz0NUMfIjhh2lIUX/LyX9B4oIg==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": 
"https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/browserslist" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "baseline-browser-mapping": "^2.10.12", + "caniuse-lite": "^1.0.30001782", + "electron-to-chromium": "^1.5.328", + "node-releases": "^2.0.36", + "update-browserslist-db": "^1.2.3" + }, + "bin": { + "browserslist": "cli.js" + }, + "engines": { + "node": "^6 || ^7 || ^8 || ^9 || ^10 || ^11 || ^12 || >=13.7" + } + }, + "node_modules/caniuse-lite": { + "version": "1.0.30001790", + "resolved": "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001790.tgz", + "integrity": "sha512-bOoxfJPyYo+ds6W0YfptaCWbFnJYjh2Y1Eow5lRv+vI2u8ganPZqNm1JwNh0t2ELQCqIWg4B3dWEusgAmsoyOw==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/caniuse-lite" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "CC-BY-4.0" + }, + "node_modules/convert-source-map": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/convert-source-map/-/convert-source-map-2.0.0.tgz", + "integrity": "sha512-Kvp459HrV2FEJ1CAsi1Ku+MY3kasH19TFykTz2xWmMeq6bk2NU3XXvfJ+Q61m0xktWwt+1HSYf3JZsTms3aRJg==", + "dev": true, + "license": "MIT" + }, + "node_modules/cross-spawn": { + "version": "7.0.6", + "resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz", + "integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==", + "dev": true, + "license": "MIT", + "dependencies": { + "path-key": "^3.1.0", + "shebang-command": "^2.0.0", + "which": "^2.0.1" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/csstype": { + "version": "3.2.3", + "resolved": 
"https://registry.npmjs.org/csstype/-/csstype-3.2.3.tgz", + "integrity": "sha512-z1HGKcYy2xA8AGQfwrn0PAy+PB7X/GSj3UVJW9qKyn43xWa+gl5nXmU4qqLMRzWVLFC8KusUX8T/0kCiOYpAIQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/debug": { + "version": "4.4.3", + "resolved": "https://registry.npmjs.org/debug/-/debug-4.4.3.tgz", + "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "dev": true, + "license": "MIT", + "dependencies": { + "ms": "^2.1.3" + }, + "engines": { + "node": ">=6.0" + }, + "peerDependenciesMeta": { + "supports-color": { + "optional": true + } + } + }, + "node_modules/deep-is": { + "version": "0.1.4", + "resolved": "https://registry.npmjs.org/deep-is/-/deep-is-0.1.4.tgz", + "integrity": "sha512-oIPzksmTg4/MriiaYGO+okXDT7ztn/w3Eptv/+gSIdMdKsJo0u4CfYNFJPy+4SKMuCqGw2wxnA+URMg3t8a/bQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/detect-libc": { + "version": "2.1.2", + "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.1.2.tgz", + "integrity": "sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": ">=8" + } + }, + "node_modules/electron-to-chromium": { + "version": "1.5.344", + "resolved": "https://registry.npmjs.org/electron-to-chromium/-/electron-to-chromium-1.5.344.tgz", + "integrity": "sha512-4MxfbmNDm+KPh066EZy+eUnkcDPcZ35wNmOWzFuh/ijvHsve6kbLTLURy88uCNK5FbpN+yk2nQY6BYh1GEt+wg==", + "dev": true, + "license": "ISC" + }, + "node_modules/escalade": { + "version": "3.2.0", + "resolved": "https://registry.npmjs.org/escalade/-/escalade-3.2.0.tgz", + "integrity": "sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/escape-string-regexp": { + "version": "4.0.0", + "resolved": 
"https://registry.npmjs.org/escape-string-regexp/-/escape-string-regexp-4.0.0.tgz", + "integrity": "sha512-TtpcNJ3XAzx3Gq8sWRzJaVajRs0uVxA2YAkdb1jm2YkPz4G6egUFAyA3n5vtEIZefPk5Wa4UXbKuS5fKkJWdgA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/eslint": { + "version": "10.2.1", + "resolved": "https://registry.npmjs.org/eslint/-/eslint-10.2.1.tgz", + "integrity": "sha512-wiyGaKsDgqXvF40P8mDwiUp/KQjE1FdrIEJsM8PZ3XCiniTMXS3OHWWUe5FI5agoCnr8x4xPrTDZuxsBlNHl+Q==", + "dev": true, + "license": "MIT", + "dependencies": { + "@eslint-community/eslint-utils": "^4.8.0", + "@eslint-community/regexpp": "^4.12.2", + "@eslint/config-array": "^0.23.5", + "@eslint/config-helpers": "^0.5.5", + "@eslint/core": "^1.2.1", + "@eslint/plugin-kit": "^0.7.1", + "@humanfs/node": "^0.16.6", + "@humanwhocodes/module-importer": "^1.0.1", + "@humanwhocodes/retry": "^0.4.2", + "@types/estree": "^1.0.6", + "ajv": "^6.14.0", + "cross-spawn": "^7.0.6", + "debug": "^4.3.2", + "escape-string-regexp": "^4.0.0", + "eslint-scope": "^9.1.2", + "eslint-visitor-keys": "^5.0.1", + "espree": "^11.2.0", + "esquery": "^1.7.0", + "esutils": "^2.0.2", + "fast-deep-equal": "^3.1.3", + "file-entry-cache": "^8.0.0", + "find-up": "^5.0.0", + "glob-parent": "^6.0.2", + "ignore": "^5.2.0", + "imurmurhash": "^0.1.4", + "is-glob": "^4.0.0", + "json-stable-stringify-without-jsonify": "^1.0.1", + "minimatch": "^10.2.4", + "natural-compare": "^1.4.0", + "optionator": "^0.9.3" + }, + "bin": { + "eslint": "bin/eslint.js" + }, + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + }, + "funding": { + "url": "https://eslint.org/donate" + }, + "peerDependencies": { + "jiti": "*" + }, + "peerDependenciesMeta": { + "jiti": { + "optional": true + } + } + }, + "node_modules/eslint-plugin-react-hooks": { + "version": "7.1.1", + "resolved": 
"https://registry.npmjs.org/eslint-plugin-react-hooks/-/eslint-plugin-react-hooks-7.1.1.tgz", + "integrity": "sha512-f2I7Gw6JbvCexzIInuSbZpfdQ44D7iqdWX01FKLvrPgqxoE7oMj8clOfto8U6vYiz4yd5oKu39rRSVOe1zRu0g==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/core": "^7.24.4", + "@babel/parser": "^7.24.4", + "hermes-parser": "^0.25.1", + "zod": "^3.25.0 || ^4.0.0", + "zod-validation-error": "^3.5.0 || ^4.0.0" + }, + "engines": { + "node": ">=18" + }, + "peerDependencies": { + "eslint": "^3.0.0 || ^4.0.0 || ^5.0.0 || ^6.0.0 || ^7.0.0 || ^8.0.0-0 || ^9.0.0 || ^10.0.0" + } + }, + "node_modules/eslint-plugin-react-refresh": { + "version": "0.5.2", + "resolved": "https://registry.npmjs.org/eslint-plugin-react-refresh/-/eslint-plugin-react-refresh-0.5.2.tgz", + "integrity": "sha512-hmgTH57GfzoTFjVN0yBwTggnsVUF2tcqi7RJZHqi9lIezSs4eFyAMktA68YD4r5kNw1mxyY4dmkyoFDb3FIqrA==", + "dev": true, + "license": "MIT", + "peerDependencies": { + "eslint": "^9 || ^10" + } + }, + "node_modules/eslint-scope": { + "version": "9.1.2", + "resolved": "https://registry.npmjs.org/eslint-scope/-/eslint-scope-9.1.2.tgz", + "integrity": "sha512-xS90H51cKw0jltxmvmHy2Iai1LIqrfbw57b79w/J7MfvDfkIkFZ+kj6zC3BjtUwh150HsSSdxXZcsuv72miDFQ==", + "dev": true, + "license": "BSD-2-Clause", + "dependencies": { + "@types/esrecurse": "^4.3.1", + "@types/estree": "^1.0.8", + "esrecurse": "^4.3.0", + "estraverse": "^5.2.0" + }, + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/eslint-visitor-keys": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/eslint-visitor-keys/-/eslint-visitor-keys-5.0.1.tgz", + "integrity": "sha512-tD40eHxA35h0PEIZNeIjkHoDR4YjjJp34biM0mDvplBe//mB+IHCqHDGV7pxF+7MklTvighcCPPZC7ynWyjdTA==", + "dev": true, + "license": "Apache-2.0", + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + 
"node_modules/espree": { + "version": "11.2.0", + "resolved": "https://registry.npmjs.org/espree/-/espree-11.2.0.tgz", + "integrity": "sha512-7p3DrVEIopW1B1avAGLuCSh1jubc01H2JHc8B4qqGblmg5gI9yumBgACjWo4JlIc04ufug4xJ3SQI8HkS/Rgzw==", + "dev": true, + "license": "BSD-2-Clause", + "dependencies": { + "acorn": "^8.16.0", + "acorn-jsx": "^5.3.2", + "eslint-visitor-keys": "^5.0.1" + }, + "engines": { + "node": "^20.19.0 || ^22.13.0 || >=24" + }, + "funding": { + "url": "https://opencollective.com/eslint" + } + }, + "node_modules/esquery": { + "version": "1.7.0", + "resolved": "https://registry.npmjs.org/esquery/-/esquery-1.7.0.tgz", + "integrity": "sha512-Ap6G0WQwcU/LHsvLwON1fAQX9Zp0A2Y6Y/cJBl9r/JbW90Zyg4/zbG6zzKa2OTALELarYHmKu0GhpM5EO+7T0g==", + "dev": true, + "license": "BSD-3-Clause", + "dependencies": { + "estraverse": "^5.1.0" + }, + "engines": { + "node": ">=0.10" + } + }, + "node_modules/esrecurse": { + "version": "4.3.0", + "resolved": "https://registry.npmjs.org/esrecurse/-/esrecurse-4.3.0.tgz", + "integrity": "sha512-KmfKL3b6G+RXvP8N1vr3Tq1kL/oCFgn2NYXEtqP8/L3pKapUA4G8cFVaoF3SU323CD4XypR/ffioHmkti6/Tag==", + "dev": true, + "license": "BSD-2-Clause", + "dependencies": { + "estraverse": "^5.2.0" + }, + "engines": { + "node": ">=4.0" + } + }, + "node_modules/estraverse": { + "version": "5.3.0", + "resolved": "https://registry.npmjs.org/estraverse/-/estraverse-5.3.0.tgz", + "integrity": "sha512-MMdARuVEQziNTeJD8DgMqmhwR11BRQ/cBP+pLtYdSTnf3MIO8fFeiINEbX36ZdNlfU/7A9f3gUw49B3oQsvwBA==", + "dev": true, + "license": "BSD-2-Clause", + "engines": { + "node": ">=4.0" + } + }, + "node_modules/esutils": { + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/esutils/-/esutils-2.0.3.tgz", + "integrity": "sha512-kVscqXk4OCp68SZ0dkgEKVi6/8ij300KBWTJq32P/dYeWTSwK41WyTxalN1eRmA5Z9UU/LX9D7FWSmV9SAYx6g==", + "dev": true, + "license": "BSD-2-Clause", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/fast-deep-equal": { + "version": "3.1.3", + "resolved": 
"https://registry.npmjs.org/fast-deep-equal/-/fast-deep-equal-3.1.3.tgz", + "integrity": "sha512-f3qQ9oQy9j2AhBe/H9VC91wLmKBCCU/gDOnKNAYG5hswO7BLKj09Hc5HYNz9cGI++xlpDCIgDaitVs03ATR84Q==", + "dev": true, + "license": "MIT" + }, + "node_modules/fast-json-stable-stringify": { + "version": "2.1.0", + "resolved": "https://registry.npmjs.org/fast-json-stable-stringify/-/fast-json-stable-stringify-2.1.0.tgz", + "integrity": "sha512-lhd/wF+Lk98HZoTCtlVraHtfh5XYijIjalXck7saUtuanSDyLMxnHhSXEDJqHxD7msR8D0uCmqlkwjCV8xvwHw==", + "dev": true, + "license": "MIT" + }, + "node_modules/fast-levenshtein": { + "version": "2.0.6", + "resolved": "https://registry.npmjs.org/fast-levenshtein/-/fast-levenshtein-2.0.6.tgz", + "integrity": "sha512-DCXu6Ifhqcks7TZKY3Hxp3y6qphY5SJZmrWMDrKcERSOXWQdMhU9Ig/PYrzyw/ul9jOIyh0N4M0tbC5hodg8dw==", + "dev": true, + "license": "MIT" + }, + "node_modules/fdir": { + "version": "6.5.0", + "resolved": "https://registry.npmjs.org/fdir/-/fdir-6.5.0.tgz", + "integrity": "sha512-tIbYtZbucOs0BRGqPJkshJUYdL+SDH7dVM8gjy+ERp3WAUjLEFJE+02kanyHtwjWOnwrKYBiwAmM0p4kLJAnXg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=12.0.0" + }, + "peerDependencies": { + "picomatch": "^3 || ^4" + }, + "peerDependenciesMeta": { + "picomatch": { + "optional": true + } + } + }, + "node_modules/file-entry-cache": { + "version": "8.0.0", + "resolved": "https://registry.npmjs.org/file-entry-cache/-/file-entry-cache-8.0.0.tgz", + "integrity": "sha512-XXTUwCvisa5oacNGRP9SfNtYBNAMi+RPwBFmblZEF7N7swHYQS6/Zfk7SRwx4D5j3CH211YNRco1DEMNVfZCnQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "flat-cache": "^4.0.0" + }, + "engines": { + "node": ">=16.0.0" + } + }, + "node_modules/find-up": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/find-up/-/find-up-5.0.0.tgz", + "integrity": "sha512-78/PXT1wlLLDgTzDs7sjq9hzz0vXD+zn+7wypEe4fXQxCmdmqfGsEPQxmiCSQI3ajFV91bVSsvNtrJRiW6nGng==", + "dev": true, + "license": "MIT", + "dependencies": { + "locate-path": 
"^6.0.0", + "path-exists": "^4.0.0" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/flat-cache": { + "version": "4.0.1", + "resolved": "https://registry.npmjs.org/flat-cache/-/flat-cache-4.0.1.tgz", + "integrity": "sha512-f7ccFPK3SXFHpx15UIGyRJ/FJQctuKZ0zVuN3frBo4HnK3cay9VEW0R6yPYFHC0AgqhukPzKjq22t5DmAyqGyw==", + "dev": true, + "license": "MIT", + "dependencies": { + "flatted": "^3.2.9", + "keyv": "^4.5.4" + }, + "engines": { + "node": ">=16" + } + }, + "node_modules/flatted": { + "version": "3.4.2", + "resolved": "https://registry.npmjs.org/flatted/-/flatted-3.4.2.tgz", + "integrity": "sha512-PjDse7RzhcPkIJwy5t7KPWQSZ9cAbzQXcafsetQoD7sOJRQlGikNbx7yZp2OotDnJyrDcbyRq3Ttb18iYOqkxA==", + "dev": true, + "license": "ISC" + }, + "node_modules/fsevents": { + "version": "2.3.3", + "resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz", + "integrity": "sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==", + "dev": true, + "hasInstallScript": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^8.16.0 || ^10.6.0 || >=11.0.0" + } + }, + "node_modules/gensync": { + "version": "1.0.0-beta.2", + "resolved": "https://registry.npmjs.org/gensync/-/gensync-1.0.0-beta.2.tgz", + "integrity": "sha512-3hN7NaskYvMDLQY55gnW3NQ+mesEAepTqlg+VEbj7zzqEMBVNhzcGYYeqFo/TlYz6eQiFcp1HcsCZO+nGgS8zg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/glob-parent": { + "version": "6.0.2", + "resolved": "https://registry.npmjs.org/glob-parent/-/glob-parent-6.0.2.tgz", + "integrity": "sha512-XxwI8EOhVQgWp6iDL+3b0r86f4d6AX6zSU55HfB4ydCEuXLXc5FcYeOu+nnGftS4TEju/11rt4KJPTMgbfmv4A==", + "dev": true, + "license": "ISC", + "dependencies": { + "is-glob": "^4.0.3" + }, + "engines": { + "node": ">=10.13.0" + } + }, + "node_modules/globals": { + "version": "17.5.0", + 
"resolved": "https://registry.npmjs.org/globals/-/globals-17.5.0.tgz", + "integrity": "sha512-qoV+HK2yFl/366t2/Cb3+xxPUo5BuMynomoDmiaZBIdbs+0pYbjfZU+twLhGKp4uCZ/+NbtpVepH5bGCxRyy2g==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=18" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/hermes-estree": { + "version": "0.25.1", + "resolved": "https://registry.npmjs.org/hermes-estree/-/hermes-estree-0.25.1.tgz", + "integrity": "sha512-0wUoCcLp+5Ev5pDW2OriHC2MJCbwLwuRx+gAqMTOkGKJJiBCLjtrvy4PWUGn6MIVefecRpzoOZ/UV6iGdOr+Cw==", + "dev": true, + "license": "MIT" + }, + "node_modules/hermes-parser": { + "version": "0.25.1", + "resolved": "https://registry.npmjs.org/hermes-parser/-/hermes-parser-0.25.1.tgz", + "integrity": "sha512-6pEjquH3rqaI6cYAXYPcz9MS4rY6R4ngRgrgfDshRptUZIc3lw0MCIJIGDj9++mfySOuPTHB4nrSW99BCvOPIA==", + "dev": true, + "license": "MIT", + "dependencies": { + "hermes-estree": "0.25.1" + } + }, + "node_modules/ignore": { + "version": "5.3.2", + "resolved": "https://registry.npmjs.org/ignore/-/ignore-5.3.2.tgz", + "integrity": "sha512-hsBTNUqQTDwkWtcdYI2i06Y/nUBEsNEDJKjWdigLvegy8kDuJAS8uRlpkkcQpyEXL0Z/pjDy5HBmMjRCJ2gq+g==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 4" + } + }, + "node_modules/imurmurhash": { + "version": "0.1.4", + "resolved": "https://registry.npmjs.org/imurmurhash/-/imurmurhash-0.1.4.tgz", + "integrity": "sha512-JmXMZ6wuvDmLiHEml9ykzqO6lwFbof0GG4IkcGaENdCRDDmMVnny7s5HsIgHCbaq0w2MyPhDqkhTUgS2LU2PHA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.8.19" + } + }, + "node_modules/is-extglob": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/is-extglob/-/is-extglob-2.1.1.tgz", + "integrity": "sha512-SbKbANkN603Vi4jEZv49LeVJMn4yGwsbzZworEoyEiutsN3nJYdbO36zfhGJ6QEDpOZIFkDtnq5JRxmvl3jsoQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/is-glob": { + "version": "4.0.3", + 
"resolved": "https://registry.npmjs.org/is-glob/-/is-glob-4.0.3.tgz", + "integrity": "sha512-xelSayHH36ZgE7ZWhli7pW34hNbNl8Ojv5KVmkJD4hBdD3th8Tfk9vYasLM+mXWOZhFkgZfxhLSnrwRr4elSSg==", + "dev": true, + "license": "MIT", + "dependencies": { + "is-extglob": "^2.1.1" + }, + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/isexe": { + "version": "2.0.0", + "resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz", + "integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==", + "dev": true, + "license": "ISC" + }, + "node_modules/js-tokens": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/js-tokens/-/js-tokens-4.0.0.tgz", + "integrity": "sha512-RdJUflcE3cUzKiMqQgsCu06FPu9UdIJO0beYbPhHN4k6apgJtifcoCtT9bcxOpYBtpD2kCM6Sbzg4CausW/PKQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/jsesc": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/jsesc/-/jsesc-3.1.0.tgz", + "integrity": "sha512-/sM3dO2FOzXjKQhJuo0Q173wf2KOo8t4I8vHy6lF9poUp7bKT0/NHE8fPX23PwfhnykfqnC2xRxOnVw5XuGIaA==", + "dev": true, + "license": "MIT", + "bin": { + "jsesc": "bin/jsesc" + }, + "engines": { + "node": ">=6" + } + }, + "node_modules/json-buffer": { + "version": "3.0.1", + "resolved": "https://registry.npmjs.org/json-buffer/-/json-buffer-3.0.1.tgz", + "integrity": "sha512-4bV5BfR2mqfQTJm+V5tPPdf+ZpuhiIvTuAB5g8kcrXOZpTT/QwwVRWBywX1ozr6lEuPdbHxwaJlm9G6mI2sfSQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/json-schema-traverse": { + "version": "0.4.1", + "resolved": "https://registry.npmjs.org/json-schema-traverse/-/json-schema-traverse-0.4.1.tgz", + "integrity": "sha512-xbbCH5dCYU5T8LcEhhuh7HJ88HXuW3qsI3Y0zOZFKfZEHcpWiHU/Jxzk629Brsab/mMiHQti9wMP+845RPe3Vg==", + "dev": true, + "license": "MIT" + }, + "node_modules/json-stable-stringify-without-jsonify": { + "version": "1.0.1", + "resolved": 
"https://registry.npmjs.org/json-stable-stringify-without-jsonify/-/json-stable-stringify-without-jsonify-1.0.1.tgz", + "integrity": "sha512-Bdboy+l7tA3OGW6FjyFHWkP5LuByj1Tk33Ljyq0axyzdk9//JSi2u3fP1QSmd1KNwq6VOKYGlAu87CisVir6Pw==", + "dev": true, + "license": "MIT" + }, + "node_modules/json5": { + "version": "2.2.3", + "resolved": "https://registry.npmjs.org/json5/-/json5-2.2.3.tgz", + "integrity": "sha512-XmOWe7eyHYH14cLdVPoyg+GOH3rYX++KpzrylJwSW98t3Nk+U8XOl8FWKOgwtzdb8lXGf6zYwDUzeHMWfxasyg==", + "dev": true, + "license": "MIT", + "bin": { + "json5": "lib/cli.js" + }, + "engines": { + "node": ">=6" + } + }, + "node_modules/keyv": { + "version": "4.5.4", + "resolved": "https://registry.npmjs.org/keyv/-/keyv-4.5.4.tgz", + "integrity": "sha512-oxVHkHR/EJf2CNXnWxRLW6mg7JyCCUcG0DtEGmL2ctUo1PNTin1PUil+r/+4r5MpVgC/fn1kjsx7mjSujKqIpw==", + "dev": true, + "license": "MIT", + "dependencies": { + "json-buffer": "3.0.1" + } + }, + "node_modules/levn": { + "version": "0.4.1", + "resolved": "https://registry.npmjs.org/levn/-/levn-0.4.1.tgz", + "integrity": "sha512-+bT2uH4E5LGE7h/n3evcS/sQlJXCpIp6ym8OWJ5eV6+67Dsql/LaaT7qJBAt2rzfoa/5QBGBhxDix1dMt2kQKQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "prelude-ls": "^1.2.1", + "type-check": "~0.4.0" + }, + "engines": { + "node": ">= 0.8.0" + } + }, + "node_modules/lightningcss": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss/-/lightningcss-1.32.0.tgz", + "integrity": "sha512-NXYBzinNrblfraPGyrbPoD19C1h9lfI/1mzgWYvXUTe414Gz/X1FD2XBZSZM7rRTrMA8JL3OtAaGifrIKhQ5yQ==", + "dev": true, + "license": "MPL-2.0", + "dependencies": { + "detect-libc": "^2.0.3" + }, + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + }, + "optionalDependencies": { + "lightningcss-android-arm64": "1.32.0", + "lightningcss-darwin-arm64": "1.32.0", + "lightningcss-darwin-x64": "1.32.0", + "lightningcss-freebsd-x64": "1.32.0", + 
"lightningcss-linux-arm-gnueabihf": "1.32.0", + "lightningcss-linux-arm64-gnu": "1.32.0", + "lightningcss-linux-arm64-musl": "1.32.0", + "lightningcss-linux-x64-gnu": "1.32.0", + "lightningcss-linux-x64-musl": "1.32.0", + "lightningcss-win32-arm64-msvc": "1.32.0", + "lightningcss-win32-x64-msvc": "1.32.0" + } + }, + "node_modules/lightningcss-android-arm64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-android-arm64/-/lightningcss-android-arm64-1.32.0.tgz", + "integrity": "sha512-YK7/ClTt4kAK0vo6w3X+Pnm0D2cf2vPHbhOXdoNti1Ga0al1P4TBZhwjATvjNwLEBCnKvjJc2jQgHXH0NEwlAg==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-darwin-arm64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-darwin-arm64/-/lightningcss-darwin-arm64-1.32.0.tgz", + "integrity": "sha512-RzeG9Ju5bag2Bv1/lwlVJvBE3q6TtXskdZLLCyfg5pt+HLz9BqlICO7LZM7VHNTTn/5PRhHFBSjk5lc4cmscPQ==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-darwin-x64": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-darwin-x64/-/lightningcss-darwin-x64-1.32.0.tgz", + "integrity": "sha512-U+QsBp2m/s2wqpUYT/6wnlagdZbtZdndSmut/NJqlCcMLTWp5muCrID+K5UJ6jqD2BFshejCYXniPDbNh73V8w==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-freebsd-x64": { + "version": 
"1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-freebsd-x64/-/lightningcss-freebsd-x64-1.32.0.tgz", + "integrity": "sha512-JCTigedEksZk3tHTTthnMdVfGf61Fky8Ji2E4YjUTEQX14xiy/lTzXnu1vwiZe3bYe0q+SpsSH/CTeDXK6WHig==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm-gnueabihf": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm-gnueabihf/-/lightningcss-linux-arm-gnueabihf-1.32.0.tgz", + "integrity": "sha512-x6rnnpRa2GL0zQOkt6rts3YDPzduLpWvwAF6EMhXFVZXD4tPrBkEFqzGowzCsIWsPjqSK+tyNEODUBXeeVHSkw==", + "cpu": [ + "arm" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm64-gnu": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm64-gnu/-/lightningcss-linux-arm64-gnu-1.32.0.tgz", + "integrity": "sha512-0nnMyoyOLRJXfbMOilaSRcLH3Jw5z9HDNGfT/gwCPgaDjnx0i8w7vBzFLFR1f6CMLKF8gVbebmkUN3fa/kQJpQ==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-arm64-musl": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-arm64-musl/-/lightningcss-linux-arm64-musl-1.32.0.tgz", + "integrity": "sha512-UpQkoenr4UJEzgVIYpI80lDFvRmPVg6oqboNHfoH4CQIfNA+HOrZ7Mo7KZP02dC6LjghPQJeBsvXhJod/wnIBg==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" 
+ ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-x64-gnu": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-x64-gnu/-/lightningcss-linux-x64-gnu-1.32.0.tgz", + "integrity": "sha512-V7Qr52IhZmdKPVr+Vtw8o+WLsQJYCTd8loIfpDaMRWGUZfBOYEJeyJIkqGIDMZPwPx24pUMfwSxxI8phr/MbOA==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-linux-x64-musl": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-linux-x64-musl/-/lightningcss-linux-x64-musl-1.32.0.tgz", + "integrity": "sha512-bYcLp+Vb0awsiXg/80uCRezCYHNg1/l3mt0gzHnWV9XP1W5sKa5/TCdGWaR/zBM2PeF/HbsQv/j2URNOiVuxWg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-win32-arm64-msvc": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-win32-arm64-msvc/-/lightningcss-win32-arm64-msvc-1.32.0.tgz", + "integrity": "sha512-8SbC8BR40pS6baCM8sbtYDSwEVQd4JlFTOlaD3gWGHfThTcABnNDBda6eTZeqbofalIJhFx0qKzgHJmcPTnGdw==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/lightningcss-win32-x64-msvc": { + "version": "1.32.0", + "resolved": "https://registry.npmjs.org/lightningcss-win32-x64-msvc/-/lightningcss-win32-x64-msvc-1.32.0.tgz", + "integrity": 
"sha512-Amq9B/SoZYdDi1kFrojnoqPLxYhQ4Wo5XiL8EVJrVsB8ARoC1PWW6VGtT0WKCemjy8aC+louJnjS7U18x3b06Q==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MPL-2.0", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">= 12.0.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/parcel" + } + }, + "node_modules/locate-path": { + "version": "6.0.0", + "resolved": "https://registry.npmjs.org/locate-path/-/locate-path-6.0.0.tgz", + "integrity": "sha512-iPZK6eYjbxRu3uB4/WZ3EsEIMJFMqAoopl3R+zuq0UjcAm/MO6KCweDgPfP3elTztoKP3KtnVHxTn2NHBSDVUw==", + "dev": true, + "license": "MIT", + "dependencies": { + "p-locate": "^5.0.0" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/lru-cache": { + "version": "5.1.1", + "resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-5.1.1.tgz", + "integrity": "sha512-KpNARQA3Iwv+jTA0utUVVbrh+Jlrr1Fv0e56GGzAFOXN7dk/FviaDW8LHmK52DlcH4WP2n6gI8vN1aesBFgo9w==", + "dev": true, + "license": "ISC", + "dependencies": { + "yallist": "^3.0.2" + } + }, + "node_modules/minimatch": { + "version": "10.2.5", + "resolved": "https://registry.npmjs.org/minimatch/-/minimatch-10.2.5.tgz", + "integrity": "sha512-MULkVLfKGYDFYejP07QOurDLLQpcjk7Fw+7jXS2R2czRQzR56yHRveU5NDJEOviH+hETZKSkIk5c+T23GjFUMg==", + "dev": true, + "license": "BlueOak-1.0.0", + "dependencies": { + "brace-expansion": "^5.0.5" + }, + "engines": { + "node": "18 || 20 || >=22" + }, + "funding": { + "url": "https://github.com/sponsors/isaacs" + } + }, + "node_modules/ms": { + "version": "2.1.3", + "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz", + "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "dev": true, + "license": "MIT" + }, + "node_modules/nanoid": { + "version": "3.3.11", + "resolved": "https://registry.npmjs.org/nanoid/-/nanoid-3.3.11.tgz", + "integrity": 
"sha512-N8SpfPUnUp1bK+PMYW8qSWdl9U+wwNWI4QKxOYDy9JAro3WMX7p2OeVRF9v+347pnakNevPmiHhNmZ2HbFA76w==", + "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "bin": { + "nanoid": "bin/nanoid.cjs" + }, + "engines": { + "node": "^10 || ^12 || ^13.7 || ^14 || >=15.0.1" + } + }, + "node_modules/natural-compare": { + "version": "1.4.0", + "resolved": "https://registry.npmjs.org/natural-compare/-/natural-compare-1.4.0.tgz", + "integrity": "sha512-OWND8ei3VtNC9h7V60qff3SVobHr996CTwgxubgyQYEpg290h9J0buyECNNJexkFm5sOajh5G116RYA1c8ZMSw==", + "dev": true, + "license": "MIT" + }, + "node_modules/node-releases": { + "version": "2.0.38", + "resolved": "https://registry.npmjs.org/node-releases/-/node-releases-2.0.38.tgz", + "integrity": "sha512-3qT/88Y3FbH/Kx4szpQQ4HzUbVrHPKTLVpVocKiLfoYvw9XSGOX2FmD2d6DrXbVYyAQTF2HeF6My8jmzx7/CRw==", + "dev": true, + "license": "MIT" + }, + "node_modules/optionator": { + "version": "0.9.4", + "resolved": "https://registry.npmjs.org/optionator/-/optionator-0.9.4.tgz", + "integrity": "sha512-6IpQ7mKUxRcZNLIObR0hz7lxsapSSIYNZJwXPGeF0mTVqGKFIXj1DQcMoT22S3ROcLyY/rz0PWaWZ9ayWmad9g==", + "dev": true, + "license": "MIT", + "dependencies": { + "deep-is": "^0.1.3", + "fast-levenshtein": "^2.0.6", + "levn": "^0.4.1", + "prelude-ls": "^1.2.1", + "type-check": "^0.4.0", + "word-wrap": "^1.2.5" + }, + "engines": { + "node": ">= 0.8.0" + } + }, + "node_modules/p-limit": { + "version": "3.1.0", + "resolved": "https://registry.npmjs.org/p-limit/-/p-limit-3.1.0.tgz", + "integrity": "sha512-TYOanM3wGwNGsZN2cVTYPArw454xnXj5qmWF1bEoAc4+cU/ol7GVh7odevjp1FNHduHc3KZMcFduxU5Xc6uJRQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "yocto-queue": "^0.1.0" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/p-locate": { + "version": "5.0.0", + "resolved": "https://registry.npmjs.org/p-locate/-/p-locate-5.0.0.tgz", 
+ "integrity": "sha512-LaNjtRWUBY++zB5nE/NwcaoMylSPk+S+ZHNB1TzdbMJMny6dynpAGt7X/tl/QYq3TIeE6nxHppbo2LGymrG5Pw==", + "dev": true, + "license": "MIT", + "dependencies": { + "p-limit": "^3.0.2" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/path-exists": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/path-exists/-/path-exists-4.0.0.tgz", + "integrity": "sha512-ak9Qy5Q7jYb2Wwcey5Fpvg2KoAc/ZIhLSLOSBmRmygPsGwkVVt0fZa0qrtMz+m6tJTAHfZQ8FnmB4MG4LWy7/w==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/path-key": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz", + "integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/picocolors": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/picocolors/-/picocolors-1.1.1.tgz", + "integrity": "sha512-xceH2snhtb5M9liqDsmEw56le376mTZkEX/jEb/RxNFyegNul7eNslCXP9FDj/Lcu0X8KEyMceP2ntpaHrDEVA==", + "dev": true, + "license": "ISC" + }, + "node_modules/picomatch": { + "version": "4.0.4", + "resolved": "https://registry.npmjs.org/picomatch/-/picomatch-4.0.4.tgz", + "integrity": "sha512-QP88BAKvMam/3NxH6vj2o21R6MjxZUAd6nlwAS/pnGvN9IVLocLHxGYIzFhg6fUQ+5th6P4dv4eW9jX3DSIj7A==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/sponsors/jonschlinkert" + } + }, + "node_modules/postcss": { + "version": "8.5.10", + "resolved": "https://registry.npmjs.org/postcss/-/postcss-8.5.10.tgz", + "integrity": "sha512-pMMHxBOZKFU6HgAZ4eyGnwXF/EvPGGqUr0MnZ5+99485wwW41kW91A4LOGxSHhgugZmSChL5AlElNdwlNgcnLQ==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/postcss/" + }, + { + "type": "tidelift", 
+ "url": "https://tidelift.com/funding/github/npm/postcss" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "nanoid": "^3.3.11", + "picocolors": "^1.1.1", + "source-map-js": "^1.2.1" + }, + "engines": { + "node": "^10 || ^12 || >=14" + } + }, + "node_modules/prelude-ls": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/prelude-ls/-/prelude-ls-1.2.1.tgz", + "integrity": "sha512-vkcDPrRZo1QZLbn5RLGPpg/WmIQ65qoWWhcGKf/b5eplkkarX0m9z8ppCat4mlOqUsWpyNuYgO3VRyrYHSzX5g==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 0.8.0" + } + }, + "node_modules/punycode": { + "version": "2.3.1", + "resolved": "https://registry.npmjs.org/punycode/-/punycode-2.3.1.tgz", + "integrity": "sha512-vYt7UD1U9Wg6138shLtLOvdAu+8DsC/ilFtEVHcH+wydcSpNE20AfSOduf6MkRFahL5FY7X1oU7nKVZFtfq8Fg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/react": { + "version": "19.2.5", + "resolved": "https://registry.npmjs.org/react/-/react-19.2.5.tgz", + "integrity": "sha512-llUJLzz1zTUBrskt2pwZgLq59AemifIftw4aB7JxOqf1HY2FDaGDxgwpAPVzHU1kdWabH7FauP4i1oEeer2WCA==", + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/react-dom": { + "version": "19.2.5", + "resolved": "https://registry.npmjs.org/react-dom/-/react-dom-19.2.5.tgz", + "integrity": "sha512-J5bAZz+DXMMwW/wV3xzKke59Af6CHY7G4uYLN1OvBcKEsWOs4pQExj86BBKamxl/Ik5bx9whOrvBlSDfWzgSag==", + "license": "MIT", + "dependencies": { + "scheduler": "^0.27.0" + }, + "peerDependencies": { + "react": "^19.2.5" + } + }, + "node_modules/rolldown": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/rolldown/-/rolldown-1.0.0-rc.17.tgz", + "integrity": "sha512-ZrT53oAKrtA4+YtBWPQbtPOxIbVDbxT0orcYERKd63VJTF13zPcgXTvD4843L8pcsI7M6MErt8QtON6lrB9tyA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@oxc-project/types": "=0.127.0", + "@rolldown/pluginutils": 
"1.0.0-rc.17" + }, + "bin": { + "rolldown": "bin/cli.mjs" + }, + "engines": { + "node": "^20.19.0 || >=22.12.0" + }, + "optionalDependencies": { + "@rolldown/binding-android-arm64": "1.0.0-rc.17", + "@rolldown/binding-darwin-arm64": "1.0.0-rc.17", + "@rolldown/binding-darwin-x64": "1.0.0-rc.17", + "@rolldown/binding-freebsd-x64": "1.0.0-rc.17", + "@rolldown/binding-linux-arm-gnueabihf": "1.0.0-rc.17", + "@rolldown/binding-linux-arm64-gnu": "1.0.0-rc.17", + "@rolldown/binding-linux-arm64-musl": "1.0.0-rc.17", + "@rolldown/binding-linux-ppc64-gnu": "1.0.0-rc.17", + "@rolldown/binding-linux-s390x-gnu": "1.0.0-rc.17", + "@rolldown/binding-linux-x64-gnu": "1.0.0-rc.17", + "@rolldown/binding-linux-x64-musl": "1.0.0-rc.17", + "@rolldown/binding-openharmony-arm64": "1.0.0-rc.17", + "@rolldown/binding-wasm32-wasi": "1.0.0-rc.17", + "@rolldown/binding-win32-arm64-msvc": "1.0.0-rc.17", + "@rolldown/binding-win32-x64-msvc": "1.0.0-rc.17" + } + }, + "node_modules/rolldown/node_modules/@rolldown/pluginutils": { + "version": "1.0.0-rc.17", + "resolved": "https://registry.npmjs.org/@rolldown/pluginutils/-/pluginutils-1.0.0-rc.17.tgz", + "integrity": "sha512-n8iosDOt6Ig1UhJ2AYqoIhHWh/isz0xpicHTzpKBeotdVsTEcxsSA/i3EVM7gQAj0rU27OLAxCjzlj15IWY7bg==", + "dev": true, + "license": "MIT" + }, + "node_modules/scheduler": { + "version": "0.27.0", + "resolved": "https://registry.npmjs.org/scheduler/-/scheduler-0.27.0.tgz", + "integrity": "sha512-eNv+WrVbKu1f3vbYJT/xtiF5syA5HPIMtf9IgY/nKg0sWqzAUEvqY/xm7OcZc/qafLx/iO9FgOmeSAp4v5ti/Q==", + "license": "MIT" + }, + "node_modules/semver": { + "version": "6.3.1", + "resolved": "https://registry.npmjs.org/semver/-/semver-6.3.1.tgz", + "integrity": "sha512-BR7VvDCVHO+q2xBEWskxS6DJE1qRnb7DxzUrogb71CWoSficBxYsiAGd+Kl0mmq/MprG9yArRkyrQxTO6XjMzA==", + "dev": true, + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + } + }, + "node_modules/shebang-command": { + "version": "2.0.0", + "resolved": 
"https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz", + "integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==", + "dev": true, + "license": "MIT", + "dependencies": { + "shebang-regex": "^3.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/shebang-regex": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz", + "integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + } + }, + "node_modules/source-map-js": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/source-map-js/-/source-map-js-1.2.1.tgz", + "integrity": "sha512-UXWMKhLOwVKb728IUtQPXxfYU+usdybtUrK/8uGE8CQMvrhOpwvzDBwj0QhSL7MQc7vIsISBG8VQ8+IDQxpfQA==", + "dev": true, + "license": "BSD-3-Clause", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/tinyglobby": { + "version": "0.2.16", + "resolved": "https://registry.npmjs.org/tinyglobby/-/tinyglobby-0.2.16.tgz", + "integrity": "sha512-pn99VhoACYR8nFHhxqix+uvsbXineAasWm5ojXoN8xEwK5Kd3/TrhNn1wByuD52UxWRLy8pu+kRMniEi6Eq9Zg==", + "dev": true, + "license": "MIT", + "dependencies": { + "fdir": "^6.5.0", + "picomatch": "^4.0.4" + }, + "engines": { + "node": ">=12.0.0" + }, + "funding": { + "url": "https://github.com/sponsors/SuperchupuDev" + } + }, + "node_modules/tslib": { + "version": "2.8.1", + "resolved": "https://registry.npmjs.org/tslib/-/tslib-2.8.1.tgz", + "integrity": "sha512-oJFu94HQb+KVduSUQL7wnpmqnfmLsOA/nAh6b6EH0wCEoK0/mPeXU6c3wKDV83MkOuHPRHtSXKKU99IBazS/2w==", + "dev": true, + "license": "0BSD", + "optional": true + }, + "node_modules/type-check": { + "version": "0.4.0", + "resolved": "https://registry.npmjs.org/type-check/-/type-check-0.4.0.tgz", + "integrity": "sha512-XleUoc9uwGXqjWwXaUTZAmzMcFZ5858QA2vvx1Ur5xIcixXIP+8LnFDgRplU30us6teqdlskFfu+ae4K79Ooew==", + 
"dev": true, + "license": "MIT", + "dependencies": { + "prelude-ls": "^1.2.1" + }, + "engines": { + "node": ">= 0.8.0" + } + }, + "node_modules/update-browserslist-db": { + "version": "1.2.3", + "resolved": "https://registry.npmjs.org/update-browserslist-db/-/update-browserslist-db-1.2.3.tgz", + "integrity": "sha512-Js0m9cx+qOgDxo0eMiFGEueWztz+d4+M3rGlmKPT+T4IS/jP4ylw3Nwpu6cpTTP8R1MAC1kF4VbdLt3ARf209w==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/browserslist" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "escalade": "^3.2.0", + "picocolors": "^1.1.1" + }, + "bin": { + "update-browserslist-db": "cli.js" + }, + "peerDependencies": { + "browserslist": ">= 4.21.0" + } + }, + "node_modules/uri-js": { + "version": "4.4.1", + "resolved": "https://registry.npmjs.org/uri-js/-/uri-js-4.4.1.tgz", + "integrity": "sha512-7rKUyy33Q1yc98pQ1DAmLtwX109F7TIfWlW1Ydo8Wl1ii1SeHieeh0HHfPeL2fMXK6z0s8ecKs9frCuLJvndBg==", + "dev": true, + "license": "BSD-2-Clause", + "dependencies": { + "punycode": "^2.1.0" + } + }, + "node_modules/vite": { + "version": "8.0.10", + "resolved": "https://registry.npmjs.org/vite/-/vite-8.0.10.tgz", + "integrity": "sha512-rZuUu9j6J5uotLDs+cAA4O5H4K1SfPliUlQwqa6YEwSrWDZzP4rhm00oJR5snMewjxF5V/K3D4kctsUTsIU9Mw==", + "dev": true, + "license": "MIT", + "dependencies": { + "lightningcss": "^1.32.0", + "picomatch": "^4.0.4", + "postcss": "^8.5.10", + "rolldown": "1.0.0-rc.17", + "tinyglobby": "^0.2.16" + }, + "bin": { + "vite": "bin/vite.js" + }, + "engines": { + "node": "^20.19.0 || >=22.12.0" + }, + "funding": { + "url": "https://github.com/vitejs/vite?sponsor=1" + }, + "optionalDependencies": { + "fsevents": "~2.3.3" + }, + "peerDependencies": { + "@types/node": "^20.19.0 || >=22.12.0", + "@vitejs/devtools": "^0.1.0", + "esbuild": "^0.27.0 
|| ^0.28.0", + "jiti": ">=1.21.0", + "less": "^4.0.0", + "sass": "^1.70.0", + "sass-embedded": "^1.70.0", + "stylus": ">=0.54.8", + "sugarss": "^5.0.0", + "terser": "^5.16.0", + "tsx": "^4.8.1", + "yaml": "^2.4.2" + }, + "peerDependenciesMeta": { + "@types/node": { + "optional": true + }, + "@vitejs/devtools": { + "optional": true + }, + "esbuild": { + "optional": true + }, + "jiti": { + "optional": true + }, + "less": { + "optional": true + }, + "sass": { + "optional": true + }, + "sass-embedded": { + "optional": true + }, + "stylus": { + "optional": true + }, + "sugarss": { + "optional": true + }, + "terser": { + "optional": true + }, + "tsx": { + "optional": true + }, + "yaml": { + "optional": true + } + } + }, + "node_modules/which": { + "version": "2.0.2", + "resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz", + "integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==", + "dev": true, + "license": "ISC", + "dependencies": { + "isexe": "^2.0.0" + }, + "bin": { + "node-which": "bin/node-which" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/word-wrap": { + "version": "1.2.5", + "resolved": "https://registry.npmjs.org/word-wrap/-/word-wrap-1.2.5.tgz", + "integrity": "sha512-BN22B5eaMMI9UMtjrGd5g5eCYPpCPDUy0FJXbYsaT5zYxjFOckS53SQDE3pWkVoWpHXVb3BrYcEN4Twa55B5cA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/yallist": { + "version": "3.1.1", + "resolved": "https://registry.npmjs.org/yallist/-/yallist-3.1.1.tgz", + "integrity": "sha512-a4UGQaWPH59mOXUYnAG2ewncQS4i4F43Tv3JoAM+s2VDAmS9NsK8GpDMLrCHPksFT7h3K6TOoUNn2pb7RoXx4g==", + "dev": true, + "license": "ISC" + }, + "node_modules/yocto-queue": { + "version": "0.1.0", + "resolved": "https://registry.npmjs.org/yocto-queue/-/yocto-queue-0.1.0.tgz", + "integrity": "sha512-rVksvsnNCdJ/ohGc6xgPwyN8eheCxsiLM8mxuE/t/mOVqJewPuO1miLpTHQiRgTKCLexL4MeAFVagts7HmNZ2Q==", + "dev": true, + "license": 
"MIT", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/zod": { + "version": "4.3.6", + "resolved": "https://registry.npmjs.org/zod/-/zod-4.3.6.tgz", + "integrity": "sha512-rftlrkhHZOcjDwkGlnUtZZkvaPHCsDATp4pGpuOOMDaTdDDXF91wuVDJoWoPsKX/3YPQ5fHuF3STjcYyKr+Qhg==", + "dev": true, + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/colinhacks" + } + }, + "node_modules/zod-validation-error": { + "version": "4.0.2", + "resolved": "https://registry.npmjs.org/zod-validation-error/-/zod-validation-error-4.0.2.tgz", + "integrity": "sha512-Q6/nZLe6jxuU80qb/4uJ4t5v2VEZ44lzQjPDhYJNztRQ4wyWc6VF3D3Kb/fAuPetZQnhS3hnajCf9CsWesghLQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=18.0.0" + }, + "peerDependencies": { + "zod": "^3.25.0 || ^4.0.0" + } + } + } +} diff --git a/demo/package.json b/demo/package.json new file mode 100644 index 0000000..56abde1 --- /dev/null +++ b/demo/package.json @@ -0,0 +1,27 @@ +{ + "name": "demo", + "private": true, + "version": "0.0.0", + "type": "module", + "scripts": { + "dev": "vite", + "build": "vite build", + "lint": "eslint .", + "preview": "vite preview" + }, + "dependencies": { + "react": "^19.2.5", + "react-dom": "^19.2.5" + }, + "devDependencies": { + "@eslint/js": "^10.0.1", + "@types/react": "^19.2.14", + "@types/react-dom": "^19.2.3", + "@vitejs/plugin-react": "^6.0.1", + "eslint": "^10.2.1", + "eslint-plugin-react-hooks": "^7.1.1", + "eslint-plugin-react-refresh": "^0.5.2", + "globals": "^17.5.0", + "vite": "^8.0.10" + } +} diff --git a/demo/src/App.jsx b/demo/src/App.jsx new file mode 100644 index 0000000..cf2f76b --- /dev/null +++ b/demo/src/App.jsx @@ -0,0 +1,47 @@ +import { useState } from 'react'; +import Sidebar from './components/Sidebar'; +import TopBar from './components/TopBar'; +import Home from './pages/Home'; +import ProjectOverview from './pages/ProjectOverview'; +import FailureMap from 
'./pages/FailureMap'; +import PatchPlan from './pages/PatchPlan'; +import IterationRunner from './pages/IterationRunner'; +import ImprovementReport from './pages/ImprovementReport'; +import PlatformMemory from './pages/PlatformMemory'; +import { NvexRuntimeProvider } from './data/NvexRuntimeContext'; + +const PAGES = { + home: Home, + overview: ProjectOverview, + failure: FailureMap, + patch: PatchPlan, + runner: IterationRunner, + report: ImprovementReport, + memory: PlatformMemory, +}; + +export default function App() { + const [page, setPage] = useState('home'); + const PageComponent = PAGES[page] || Home; + + return ( + + <> +
+
+
+
+
+
+ +
+ +
+ +
+
+
+ + + ); +} diff --git a/demo/src/components/AgentReasoningPanel.jsx b/demo/src/components/AgentReasoningPanel.jsx new file mode 100644 index 0000000..c5c03cd --- /dev/null +++ b/demo/src/components/AgentReasoningPanel.jsx @@ -0,0 +1,180 @@ +/** + * AgentReasoningPanel + * ------------------- + * Renders the step-by-step reasoning log from a SelfImprovementAgent run. + * + * Props: + * agentRun — AgentRunState object (from /api/demo/agent or /api/agent/:id/status) + * onAdvance — callback fired when the user clicks "Next Step" (demo mode) + */ + +const STEP_ICONS = { + eval: '📊', + diagnose: '🔬', + plan: '📋', + dispatch: '🚀', + verify: '✅', + memory: '💾', + stop_check: '🛑', +}; + +const STATUS_CLASS = { + pending: 'idle', + running: 'active', + completed: 'done', + failed: 'error', + skipped: 'idle', +}; + +const EVENT_ICONS = { + run_started: '▶', + iteration_started: '↻', + step_started: '•', + step_completed: '✓', + iteration_completed: '▣', + rollback: '⤺', + run_completed: '🏁', + run_stopped: '🛑', +}; + +function formatTime(isoValue) { + if (!isoValue) return '--:--:--'; + const date = new Date(isoValue); + // toLocaleTimeString() does not throw on an invalid date, + // so check for NaN explicitly instead of relying on try/catch. + if (Number.isNaN(date.getTime())) return '--:--:--'; + return date.toLocaleTimeString(); +} + +function StepRow({ step }) { + const icon = STEP_ICONS[step.step_type] || '•'; + const cls = STATUS_CLASS[step.status] || 'idle'; + return ( +
+ {icon} +
+
{step.label}
+ {step.message && step.status !== 'pending' && ( +
{step.message}
+ )} +
+
+ {step.status === 'completed' ? '✓' : step.status === 'running' ? '…' : step.status} +
+
+ ); +} + +export default function AgentReasoningPanel({ agentRun, onAdvance, onStream, onPause, isStreaming = false }) { + if (!agentRun) { + return ( +
+
Agent Reasoning
+

Start Auto-Improve to see the agent reasoning step by step.

+
+ ); + } + + const { iterations = [], status, stop_reason, reasoning_log = [], current_iteration, events = [] } = agentRun; + const isDone = status === 'completed' || status === 'stopped'; + + return ( +
+ {/* Header row */} +
+
+
Agent Reasoning
+
+ {isDone + ? (stop_reason || 'Run complete.') + : `Loop ${current_iteration} of ${iterations.length} — ${status}`} +
+
+
+ {!isDone && onAdvance && ( + + )} + {!isDone && onStream && !isStreaming && ( + + )} + {!isDone && onPause && isStreaming && ( + + )} +
+
+ + {events.length > 0 && ( +
+
Streaming Timeline
+
+ {events.slice(-14).map((event) => ( +
+ {EVENT_ICONS[event.event_type] || '•'} +
+
{event.label}
+ {event.message &&
{event.message}
} +
+
+ {event.duration_ms != null && {(event.duration_ms / 1000).toFixed(1)}s} + {formatTime(event.occurred_at)} +
+
+ ))} +
+
+ )} + + {/* Per-loop step lists */} + {iterations.map((loop) => ( +
+
+ Loop {loop.iteration_index} + + {loop.patch_cluster} · {loop.patch_strategy} + {loop.eval_after != null && ( + + {' '}· {Math.round(loop.eval_before * 100)}%{' '} + + → {Math.round(loop.eval_after * 100)}% + + {loop.delta != null && ( + + {' '}({loop.delta >= 0 ? '+' : ''}{Math.round(loop.delta * 100)}pp) + + )} + {loop.rolled_back && · rolled back} + + )} + +
+ {loop.rollback_reason && ( +
{loop.rollback_reason}
+ )} +
+ {loop.steps.map((step) => ( + + ))} +
+
+ ))} + + {/* Raw reasoning log (collapsible) */} + {reasoning_log.length > 0 && ( +
+ Full reasoning log ({reasoning_log.length} entries) +
+ {reasoning_log.map((entry, i) => ( +
{entry}
+ ))} +
+
+ )} +
+ ); +} diff --git a/demo/src/components/AssetCard.jsx b/demo/src/components/AssetCard.jsx new file mode 100644 index 0000000..c6c8c32 --- /dev/null +++ b/demo/src/components/AssetCard.jsx @@ -0,0 +1,21 @@ +const TONE_CLASS = { + blue: 'badge-blue', + cyan: 'badge-cyan', + green: 'badge-green', + red: 'badge-red', + orange: 'badge-orange', + yellow: 'badge-yellow', +}; + +export default function AssetCard({ label, value, tone = 'blue', sub }) { + return ( +
+
+
{label}
+ {tone} +
+
{value}
+ {sub ?
{sub}
: null} +
+ ); +} \ No newline at end of file diff --git a/demo/src/components/FailureCluster.jsx b/demo/src/components/FailureCluster.jsx new file mode 100644 index 0000000..ae9978e --- /dev/null +++ b/demo/src/components/FailureCluster.jsx @@ -0,0 +1,28 @@ +const SEV_BADGE = { + critical: 'badge-red', + high: 'badge-orange', + medium: 'badge-yellow', + low: 'badge-blue', +}; + +export default function FailureCluster({ cluster }) { + const { id, label, pct, count, color, sev } = cluster; + return ( +
+
+
+ {id} +
+
{label}
+ + {sev} + +
+
{pct}%
+
{count} episodes
+
+
+
+
+ ); +} diff --git a/demo/src/components/KPICard.jsx b/demo/src/components/KPICard.jsx new file mode 100644 index 0000000..2b60cdd --- /dev/null +++ b/demo/src/components/KPICard.jsx @@ -0,0 +1,13 @@ +export default function KPICard({ title, value, sub, accentColor, children }) { + return ( +
+ {accentColor && ( +
+ )} +
{title}
+
{value}
+ {sub &&
{sub}
} + {children} +
+ ); +} diff --git a/demo/src/components/MultiIterationChart.jsx b/demo/src/components/MultiIterationChart.jsx new file mode 100644 index 0000000..a17038b --- /dev/null +++ b/demo/src/components/MultiIterationChart.jsx @@ -0,0 +1,175 @@ +/** + * MultiIterationChart + * -------------------- + * A pure-CSS/SVG bar chart showing success rate across agent loop iterations. + * No external charting library required. + * + * Props: + * iterations — array of { iteration_index, eval_before, eval_after, patch_cluster } + * targetKpi — number (0-1), draws a horizontal target line + */ + +const CHART_H = 120; +const BAR_W = 48; +const GAP = 24; + +export default function MultiIterationChart({ iterations = [], targetKpi = 0.75 }) { + if (!iterations.length) { + return ( +
+
Multi-Loop Progress
+

No iteration data yet. Run Auto-Improve to populate this chart.

+
+ ); + } + + // Build a flat list of data points: one "before" + one "after" per loop + const points = []; + iterations.forEach((loop, idx) => { + if (idx === 0) { + points.push({ label: `ckpt_v0.${idx}`, value: loop.eval_before, type: 'before' }); + } + if (loop.eval_after != null) { + points.push({ + label: `Loop ${loop.iteration_index}`, + value: loop.eval_after, + type: 'after', + cluster: loop.patch_cluster, + }); + } + }); + + const maxVal = 1.0; + const svgW = points.length * (BAR_W + GAP) + GAP; + + function barY(val) { + return CHART_H - Math.round((val / maxVal) * CHART_H); + } + function barH(val) { + return Math.round((val / maxVal) * CHART_H); + } + + const targetY = barY(targetKpi); + + return ( +
+
Multi-Loop Progress
+ +
+ + {/* Grid lines */} + {[0.25, 0.5, 0.75, 1.0].map((v) => { + const y = barY(v); + return ( + + + + {Math.round(v * 100)}% + + + ); + })} + + {/* Target KPI line */} + + + Target {Math.round(targetKpi * 100)}% + + + {/* Bars */} + {points.map((pt, i) => { + const x = GAP + i * (BAR_W + GAP); + const y = barY(pt.value); + const h = barH(pt.value); + const pct = Math.round(pt.value * 100); + const loopData = pt.type === 'after' + ? iterations.find((loop) => loop.iteration_index === Number(pt.label.replace('Loop ', ''))) + : null; + const fill = pt.type === 'before' + ? 'rgba(99,102,241,0.4)' + : loopData?.delta != null && loopData.delta < 0 + ? 'rgba(244,63,94,0.7)' + : pt.value >= targetKpi + ? 'rgba(16,185,129,0.7)' + : 'rgba(99,102,241,0.65)'; + return ( + + + {/* Value label on top */} + + {pct}% + + {/* X-axis label */} + + {pt.label} + + {loopData?.rolled_back && ( + + rollback + + )} + + ); + })} + + {/* Connector lines between bars */} + {points.map((pt, i) => { + if (i === 0) return null; + const x1 = GAP + (i - 1) * (BAR_W + GAP) + BAR_W; + const y1 = barY(points[i - 1].value); + const x2 = GAP + i * (BAR_W + GAP); + const y2 = barY(pt.value); + return ( + + ); + })} + +
+ + {/* Legend row */} +
+ {points + .filter((p) => p.cluster) + .map((p, i) => ( +
+ Loop {p.label.replace('Loop ', '')} + · + {p.cluster} + · + {Math.round(p.value * 100)}% +
+ ))} +
+
+ ); +} diff --git a/demo/src/components/RadarChart.jsx b/demo/src/components/RadarChart.jsx new file mode 100644 index 0000000..8cbd1ec --- /dev/null +++ b/demo/src/components/RadarChart.jsx @@ -0,0 +1,63 @@ +export default function RadarChart({ data }) { + const cx = 120, cy = 110, r = 80; + const n = data.length; + const pts = data.map((d, i) => { + const angle = (i * 2 * Math.PI / n) - Math.PI / 2; + const fr = (d.pct / 100) * r; + return { + x: cx + fr * Math.cos(angle), + y: cy + fr * Math.sin(angle), + lx: cx + (r + 18) * Math.cos(angle), + ly: cy + (r + 18) * Math.sin(angle), + ...d, + }; + }); + + const polyPoints = pts.map(p => `${p.x.toFixed(1)},${p.y.toFixed(1)}`).join(' '); + + // Grid circles + const grids = [0.25, 0.5, 0.75, 1].map(f => ({ + r: r * f, + cx, cy, + })); + + return ( + + {grids.map((g, i) => ( + + ))} + {pts.map((p, i) => { + const angle = (i * 2 * Math.PI / n) - Math.PI / 2; + return ( + + ); + })} + + {pts.map((p, i) => ( + + + + {p.name.replace('libero_', '')} + + + ))} + + ); +} diff --git a/demo/src/components/Sidebar.jsx b/demo/src/components/Sidebar.jsx new file mode 100644 index 0000000..6a45a3c --- /dev/null +++ b/demo/src/components/Sidebar.jsx @@ -0,0 +1,55 @@ +const NAV_ITEMS = [ + { id: 'home', label: 'Project Hub', badge: null, + icon: }, + { id: 'overview', label: 'Project Overview', badge: null, + icon: }, + { id: 'failure', label: 'Failure Map', badge: '4', + icon: }, + { id: 'patch', label: 'Patch Plan', badge: null, + icon: }, + { id: 'runner', label: 'Iteration Runner', badge: null, + icon: }, + { id: 'report', label: 'Improvement Report', badge: null, + icon: }, + { id: 'memory', label: 'Platform Memory', badge: null, + icon: }, +]; + +export default function Sidebar({ active, onNav }) { + return ( + + ); +} diff --git a/demo/src/components/TimelineStep.jsx b/demo/src/components/TimelineStep.jsx new file mode 100644 index 0000000..0b3c22a --- /dev/null +++ b/demo/src/components/TimelineStep.jsx @@ -0,0 +1,9 @@ 
+export default function TimelineStep({ step, label, status }) { + // status: 'done' | 'active' | 'idle' + return ( +
+
{status === 'done' ? '✓' : step}
+
{label}
+
+ ); +} diff --git a/demo/src/components/TopBar.jsx b/demo/src/components/TopBar.jsx new file mode 100644 index 0000000..8f889fb --- /dev/null +++ b/demo/src/components/TopBar.jsx @@ -0,0 +1,24 @@ +const PAGE_LABELS = { + home: 'Project Hub', + overview: 'Project Overview', + failure: 'Failure Map', + patch: 'Patch Plan', + runner: 'Iteration Runner', + report: 'Improvement Report', + memory: 'Platform Memory', +}; + +export default function TopBar({ page, onNav }) { + return ( +
+
+ Nvex + / + {PAGE_LABELS[page] || page} +
+
+ + +
+ ); +} diff --git a/demo/src/main.jsx b/demo/src/main.jsx new file mode 100644 index 0000000..7234b35 --- /dev/null +++ b/demo/src/main.jsx @@ -0,0 +1,10 @@ +import { StrictMode } from 'react' +import { createRoot } from 'react-dom/client' +import './styles.css' +import App from './App.jsx' + +createRoot(document.getElementById('root')).render( + + + , +) diff --git a/demo/src/pages/FailureMap.jsx b/demo/src/pages/FailureMap.jsx new file mode 100644 index 0000000..d20fbae --- /dev/null +++ b/demo/src/pages/FailureMap.jsx @@ -0,0 +1,79 @@ +import FailureCluster from '../components/FailureCluster'; +import RadarChart from '../components/RadarChart'; +import { useNvexRuntime } from '../data/NvexRuntimeContext.jsx'; + +export default function FailureMap() { + const { data } = useNvexRuntime(); + + return ( +
+
+

Failure Map

+

Clustered benchmark failures and task-level risk concentration.

+
+ +
+
+
Failure Clusters
+
+ {data.clusters.map((cluster) => ( + + ))} +
+
+ +
+
Task Breakdown
+
+ +
+ {data.taskBreakdown.map((task) => ( +
+
+ {task.name} + {task.pct}% +
+
+
+
+
+ ))} +
+
+
+
+ +
+
+
Root-Cause Hypotheses
+
+ {data.rootCauses.map((cause, index) => ( +
+
{index + 1}
+
{cause}
+
+ ))} +
+
+ +
+
Representative Episodes
+
+ {data.representativeEpisodes.map((episode) => ( +
+
{episode.cluster}
+
{episode.label} · {episode.id}
+
Cluster {episode.cluster} · {episode.detail}
+
+ ))} +
+
+
+ +
+
Nvex Diagnosis
+
{data.diagnosis}
+
+
+ ); +} \ No newline at end of file diff --git a/demo/src/pages/Home.jsx b/demo/src/pages/Home.jsx new file mode 100644 index 0000000..73027b2 --- /dev/null +++ b/demo/src/pages/Home.jsx @@ -0,0 +1,89 @@ +import KPICard from '../components/KPICard'; +import AssetCard from '../components/AssetCard'; +import { useNvexRuntime } from '../data/NvexRuntimeContext.jsx'; + +export default function Home({ onNav }) { + const { data } = useNvexRuntime(); + const { project, featuredValue, recentProjects, availableAssets } = data; + + return ( +
+
+

Nvex Project Hub

+

Failure-to-fix orchestration for physical AI post-training.

+
+ +
+
+
Active Project
+

{project.name}

+

+ Nvex has diagnosed a perception-heavy failure pattern and queued the next highest-leverage patch. +

+
+
{project.kpi.successRate}%
+
Current success rate before patching. The next loop targets occlusion and recovery gaps.
+
+
+ + +
+
+ +
+ + + +
+
+ +
+ + + + +
+ +
+
+
Featured Value
+
+ {featuredValue.map((item) => ( +
+
{item.title}
+
{item.description}
+
+ ))} +
+
+ +
+
Recent Projects
+
+ {recentProjects.map((entry) => ( + + ))} +
+
+
+ +
+
Available Assets
+
+ {availableAssets.map((asset) => ( + + ))} +
+
+
+ ); +} \ No newline at end of file diff --git a/demo/src/pages/ImprovementReport.jsx b/demo/src/pages/ImprovementReport.jsx new file mode 100644 index 0000000..a2eb900 --- /dev/null +++ b/demo/src/pages/ImprovementReport.jsx @@ -0,0 +1,88 @@ +import MultiIterationChart from '../components/MultiIterationChart'; +import { useNvexRuntime } from '../data/NvexRuntimeContext.jsx'; + +export default function ImprovementReport() { + const { data, agentRun } = useNvexRuntime(); + const { before, after, changes, assets, nextIter } = data.report; + const uplift = after.success - before.success; + + // Multi-loop data from agent run (if available) + const loopIterations = agentRun?.iterations?.filter((l) => l.eval_after != null) ?? []; + const hasMultiLoop = loopIterations.length > 1; + const targetKpi = agentRun?.target_kpi ?? 0.75; + const stopReason = agentRun?.stop_reason; + + return ( +
+
+

Improvement Report

+

Before-and-after results from the completed patch loop.

+
+ +
+
Before
{before.success}%
{before.clusters} failure clusters
+
After
{after.success}%
{after.clusters} failure clusters
+
Uplift
+{uplift}pp
Recovery score {before.recovery}% → {after.recovery}%
+
+ + {/* Multi-loop chart — shown only when agent has run ≥2 loops */} + {hasMultiLoop && ( + + )} + + {/* Agent stop reason callout */} + {stopReason && ( +
+
Why the agent stopped
+

{stopReason}

+
+ )} + +
+
+
What Changed
+
+ {changes.map((item) => ( +
{item}
+ ))} + {/* Inject per-loop summaries when multi-loop is active */} + {hasMultiLoop && loopIterations.map((loop) => ( +
+ Loop {loop.iteration_index}: {loop.patch_cluster} patch →{' '} + {Math.round(loop.eval_before * 100)}%{' '} + + → {Math.round(loop.eval_after * 100)}% + + {loop.delta != null && ( + + {' '}({loop.delta >= 0 ? '+' : ''}{Math.round(loop.delta * 100)}pp) + + )} + {loop.rolled_back && · rollback applied} +
+ ))} +
+
+ +
+
Assets Created
+
+ {assets.map((asset) => ( + + {asset.type}: {asset.name} + + ))} + {/* Recipe chips from agent loops */} + {hasMultiLoop && loopIterations.map((loop) => ( + + recipe: {loop.patch_strategy}_loop{loop.iteration_index} + + ))} +
+
+
Next iteration target: {nextIter}
+
+
+
+ ); +} diff --git a/demo/src/pages/IterationRunner.jsx b/demo/src/pages/IterationRunner.jsx new file mode 100644 index 0000000..9554299 --- /dev/null +++ b/demo/src/pages/IterationRunner.jsx @@ -0,0 +1,160 @@ +import { useEffect, useState } from 'react'; +import AgentReasoningPanel from '../components/AgentReasoningPanel'; +import TimelineStep from '../components/TimelineStep'; +import { useNvexRuntime } from '../data/NvexRuntimeContext.jsx'; + +const STEPS = ['Load checkpoint', 'Dispatch backend job', 'Poll job status', 'Compare results']; + +export default function IterationRunner() { + const { + data, + demoState, + pollIterationStatus, + agentRun, + startAutoImprove, + advanceAgentStep, + streamAgentRun, + stopAgentStream, + isAgentStreaming, + } = useNvexRuntime(); + const status = demoState?.iteration_job?.status || 'completed'; + const [autoMode, setAutoMode] = useState(false); + + // Poll M2 job status + useEffect(() => { + if (status === 'running' || status === 'queued') { + const interval = window.setInterval(() => { + pollIterationStatus(); + }, 2000); + return () => window.clearInterval(interval); + } + return undefined; + }, [status, pollIterationStatus]); + + const handleAutoImprove = () => { + setAutoMode(true); + startAutoImprove(); + }; + + const agentIsDone = agentRun?.status === 'completed' || agentRun?.status === 'stopped'; + + // Loop progress derived from agent run + const totalLoops = agentRun?.iterations?.length ?? 3; + const doneLoops = agentRun?.iterations?.filter((l) => l.status === 'completed').length ?? 0; + const currentKpi = agentRun?.iterations + ?.filter((l) => l.eval_after != null) + .slice(-1)[0]?.eval_after ?? null; + const targetKpi = agentRun?.target_kpi ?? 0.75; + + return ( +
+
+
+

Iteration Runner

+

Execution timeline and live training log for the current patch cycle.

+
+ {!autoMode && ( + + )} +
+ + {/* Loop progress bar — only shown in auto mode */} + {autoMode && ( +
+
+
+
Autonomous Loop Progress
+
+ {agentIsDone + ? (agentRun?.stop_reason || 'Run complete.') + : `Loop ${agentRun?.current_iteration ?? 1} / ${totalLoops} running…`} +
+
+
+ {currentKpi !== null && ( + <> + = targetKpi ? 'var(--green)' : 'var(--t1)' }}> + {Math.round(currentKpi * 100)}% + + / {Math.round(targetKpi * 100)}% target + + )} +
+
+
+
0 ? (doneLoops / totalLoops) * 100 : 0}%` }} + /> +
+
+ {Array.from({ length: totalLoops }, (_, i) => { + const loop = agentRun?.iterations?.[i]; + const cls = !loop ? 'idle' + : loop.status === 'completed' ? 'done' + : loop.status === 'running' ? 'active' + : 'idle'; + return ( +
+ {loop?.status === 'completed' ? '✓' : i + 1} +
+ ); + })} +
+
+ )} + + {/* Manual timeline — shown when not in auto mode */} + {!autoMode && ( +
+
Timeline
+
+ {STEPS.map((label, index) => ( + + ))} +
+
+ )} + + {/* Agent reasoning panel — shown in auto mode */} + {autoMode && ( + + )} + + {/* Console output */} +
+
Console Output
+
+ {data.consoleLogs.map((entry) => ( +
+ [{entry.ts}] {entry.text} +
+ ))} +
+
+
+ ); +} diff --git a/demo/src/pages/PatchPlan.jsx b/demo/src/pages/PatchPlan.jsx new file mode 100644 index 0000000..dd30f1e --- /dev/null +++ b/demo/src/pages/PatchPlan.jsx @@ -0,0 +1,42 @@ +import { useNvexRuntime } from '../data/NvexRuntimeContext.jsx'; + +export default function PatchPlan() { + const { data } = useNvexRuntime(); + const { patchPlan } = data; + + return ( +
+
+

Patch Plan

+

Targeted intervention proposed from the current failure diagnosis.

+
+ +
+
+
Capability Gaps
+
+ {patchPlan.gaps.map((gap) => ( +
{gap}
+ ))} +
+
+ +
+
Execution Outline
+
+
Collect {patchPlan.data.episodes} patch episodes and {patchPlan.data.corrections} correction trajectories.
+
Run {patchPlan.training.strategy} with {patchPlan.data.modality} for roughly {patchPlan.training.duration}.
+
Verify on {patchPlan.verify.suite} using {patchPlan.verify.env} before checkpoint promotion.
+
+
+
+ +
+
Data Mix
{patchPlan.data.ratio}
+
Strategy
{patchPlan.training.note}
+
Expected Uplift
+{patchPlan.uplift.lo} to +{patchPlan.uplift.hi} points
+
Confidence
{Math.round(patchPlan.uplift.confidence * 100)}%
+
+
+ ); +} \ No newline at end of file diff --git a/demo/src/pages/PlatformMemory.jsx b/demo/src/pages/PlatformMemory.jsx new file mode 100644 index 0000000..eb36ca4 --- /dev/null +++ b/demo/src/pages/PlatformMemory.jsx @@ -0,0 +1,44 @@ +import AssetCard from '../components/AssetCard'; +import { useNvexRuntime } from '../data/NvexRuntimeContext.jsx'; + +export default function PlatformMemory() { + const { data } = useNvexRuntime(); + const { stats, recipes, templates, failures } = data.memory; + + return ( +
+
+

Platform Memory

+

Reusable recipes, templates, and failure patterns accumulated from prior loops.

+
+ +
+ + + + +
+ +
+
+
Recipes
+
+ {recipes.map((item) => {item})} +
+
+
+
Templates
+
+ {templates.map((item) => {item})} +
+
+
+
Known Failure Ontology
+
+ {failures.map((item) => {item})} +
+
+
+
+ ); +} \ No newline at end of file diff --git a/demo/src/pages/ProjectOverview.jsx b/demo/src/pages/ProjectOverview.jsx new file mode 100644 index 0000000..d79dab9 --- /dev/null +++ b/demo/src/pages/ProjectOverview.jsx @@ -0,0 +1,41 @@ +import KPICard from '../components/KPICard'; +import AssetCard from '../components/AssetCard'; +import { useNvexRuntime } from '../data/NvexRuntimeContext.jsx'; + +export default function ProjectOverview() { + const { data } = useNvexRuntime(); + const { project, rootCauses, availableAssets } = data; + + return ( +
+
+

Project Overview

+

Benchmark context, operating status, and the current intervention target.

+
+ +
+ + + +
+ +
+
Root Cause Summary
+
+ {rootCauses.map((cause) => ( +
{cause}
+ ))} +
+
+ +
+
Available Assets
+
+ {availableAssets.map((asset) => ( + + ))} +
+
+
+ ); +} \ No newline at end of file diff --git a/demo/src/styles.css b/demo/src/styles.css new file mode 100644 index 0000000..7a11a35 --- /dev/null +++ b/demo/src/styles.css @@ -0,0 +1,721 @@ +@import url('https://fonts.googleapis.com/css2?family=Inter:ital,wght@0,300;0,400;0,500;0,600;0,700;1,400&family=JetBrains+Mono:wght@400;500&display=swap'); + +:root { + --bg: #07080f; + --bg-mid: #0c0d1a; + --surf: rgba(255,255,255,0.028); + --surf-hi: rgba(255,255,255,0.055); + --surf-hover: rgba(255,255,255,0.07); + --border: rgba(99,102,241,0.18); + --border-hi: rgba(139,92,246,0.45); + --a1: #6366f1; + --a2: #8b5cf6; + --a3: #a78bfa; + --grad: linear-gradient(135deg, #6366f1 0%, #8b5cf6 100%); + --grad-hi: linear-gradient(135deg, #818cf8 0%, #a78bfa 100%); + --t1: #f1f5f9; + --t2: #94a3b8; + --t3: #4b5563; + --t4: #1e293b; + --green: #10b981; + --green-dim: rgba(16,185,129,0.15); + --red: #f43f5e; + --red-dim: rgba(244,63,94,0.15); + --orange: #f97316; + --orange-dim: rgba(249,115,22,0.12); + --yellow: #f59e0b; + --yellow-dim: rgba(245,158,11,0.12); + --cyan: #06b6d4; + --cyan-dim: rgba(6,182,212,0.12); + --sidebar-w: 224px; + --topbar-h: 56px; + --radius: 10px; + --radius-lg: 16px; + --shadow: 0 4px 24px rgba(0,0,0,0.4); + --shadow-glow: 0 0 32px rgba(99,102,241,0.18); +} + +*, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; } +html, body { height: 100%; overflow: hidden; } +body { + font-family: 'Inter', system-ui, sans-serif; + font-size: 14px; + line-height: 1.5; + color: var(--t1); + background: var(--bg); + -webkit-font-smoothing: antialiased; +} +a { color: inherit; text-decoration: none; } +button { font-family: inherit; cursor: pointer; border: none; background: none; color: inherit; } +svg { display: block; } + +/* Background */ +.bg-layer { position: fixed; inset: 0; pointer-events: none; z-index: 0; overflow: hidden; } +.bg-dot-grid { + background-image: radial-gradient(rgba(99,102,241,0.09) 1px, transparent 1px); + background-size: 28px 
28px; + width: 100%; height: 100%; +} +.bg-glow { position: absolute; border-radius: 50%; filter: blur(80px); opacity: 0.35; } +.bg-glow-1 { width: 700px; height: 700px; background: radial-gradient(circle, #4f46e5 0%, transparent 70%); top: -200px; left: -200px; } +.bg-glow-2 { width: 500px; height: 500px; background: radial-gradient(circle, #7c3aed 0%, transparent 70%); bottom: -100px; right: -100px; } + +/* App Shell */ +.app-shell { position: relative; z-index: 1; display: flex; height: 100vh; } + +/* Sidebar */ +.sidebar { + width: var(--sidebar-w); + flex-shrink: 0; + background: rgba(7,8,15,0.85); + backdrop-filter: blur(16px); + border-right: 1px solid var(--border); + display: flex; + flex-direction: column; + padding: 20px 12px; + gap: 4px; +} +.sidebar-logo { + display: flex; align-items: center; gap: 10px; + padding: 8px 10px 20px; + border-bottom: 1px solid var(--border); + margin-bottom: 8px; +} +.logo-mark { + width: 32px; height: 32px; + background: var(--grad); + border-radius: 8px; + display: flex; align-items: center; justify-content: center; + font-weight: 700; font-size: 15px; letter-spacing: -0.5px; + color: white; flex-shrink: 0; + box-shadow: 0 0 16px rgba(99,102,241,0.45); +} +.logo-text { font-size: 15px; font-weight: 600; letter-spacing: -0.3px; } +.logo-sub { font-size: 10px; color: var(--t3); margin-top: 1px; letter-spacing: 0.05em; text-transform: uppercase; } +.nav-section-label { + font-size: 10px; font-weight: 500; letter-spacing: 0.08em; + text-transform: uppercase; color: var(--t3); + padding: 12px 10px 4px; +} +.nav-item { + display: flex; align-items: center; gap: 10px; + padding: 8px 10px; + border-radius: 7px; + font-size: 13.5px; font-weight: 400; + color: var(--t2); + cursor: pointer; + transition: all 0.15s; + border: 1px solid transparent; +} +.nav-item:hover { background: var(--surf-hi); color: var(--t1); } +.nav-item.active { background: rgba(99,102,241,0.14); color: var(--t1); border-color: rgba(99,102,241,0.28); } 
+.nav-item.active .nav-icon { color: var(--a3); } +.nav-icon { width: 16px; height: 16px; flex-shrink: 0; opacity: 0.8; } +.nav-badge { margin-left: auto; font-size: 10px; font-weight: 600; background: var(--grad); padding: 1px 6px; border-radius: 99px; color: white; } +.sidebar-footer { margin-top: auto; padding-top: 12px; border-top: 1px solid var(--border); } +.sidebar-project { display: flex; align-items: center; gap: 8px; padding: 8px 10px; background: var(--surf); border: 1px solid var(--border); border-radius: 8px; margin-top: 8px; cursor: pointer; } +.project-dot { width: 8px; height: 8px; border-radius: 50%; background: var(--green); flex-shrink: 0; box-shadow: 0 0 6px var(--green); } +.project-name { font-size: 12px; font-weight: 500; color: var(--t1); } +.project-sub { font-size: 10px; color: var(--t3); } + +/* Main */ +.main { flex: 1; display: flex; flex-direction: column; min-width: 0; overflow: hidden; } +.topbar { + height: var(--topbar-h); + border-bottom: 1px solid var(--border); + display: flex; align-items: center; + padding: 0 24px; gap: 12px; + background: rgba(7,8,15,0.7); + backdrop-filter: blur(12px); + flex-shrink: 0; +} +.topbar-breadcrumb { display: flex; align-items: center; gap: 6px; font-size: 13px; color: var(--t2); } +.breadcrumb-sep { color: var(--t3); } +.breadcrumb-cur { color: var(--t1); font-weight: 500; } +.topbar-spacer { flex: 1; } +.content { flex: 1; overflow-y: auto; padding: 28px; scrollbar-width: thin; scrollbar-color: var(--t4) transparent; } +.content::-webkit-scrollbar { width: 4px; } +.content::-webkit-scrollbar-thumb { background: var(--t4); border-radius: 2px; } + +/* Buttons */ +.btn { display: inline-flex; align-items: center; gap: 6px; padding: 6px 14px; border-radius: 7px; font-size: 13px; font-weight: 500; transition: all 0.15s; cursor: pointer; border: none; } +.btn-primary { background: var(--grad); color: white; box-shadow: 0 0 16px rgba(99,102,241,0.35); } +.btn-primary:hover { background: var(--grad-hi); 
box-shadow: 0 0 24px rgba(99,102,241,0.5); } +.btn-ghost { background: var(--surf); color: var(--t1); border: 1px solid var(--border); } +.btn-ghost:hover { background: var(--surf-hover); border-color: var(--border-hi); } + +/* Page transition */ +.page-enter { animation: pageIn 0.2s ease; } +@keyframes pageIn { from { opacity: 0; transform: translateY(6px); } to { opacity: 1; transform: none; } } + +/* Typography */ +.page-title { font-size: 22px; font-weight: 600; letter-spacing: -0.4px; } +.page-subtitle { font-size: 14px; color: var(--t2); margin-top: 4px; } +.section-title { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.08em; color: var(--t3); margin-bottom: 12px; } + +/* Cards */ +.card { background: var(--surf); border: 1px solid var(--border); border-radius: var(--radius); padding: 18px 20px; position: relative; overflow: hidden; } +.card-title { font-size: 12px; font-weight: 500; text-transform: uppercase; letter-spacing: 0.07em; color: var(--t3); margin-bottom: 8px; } +.card-value { font-size: 28px; font-weight: 600; letter-spacing: -0.5px; } +.card-sub { font-size: 12px; color: var(--t2); margin-top: 4px; } +.card-grid-3 { display: grid; grid-template-columns: repeat(3, 1fr); gap: 14px; } +.card-grid-4 { display: grid; grid-template-columns: repeat(4, 1fr); gap: 14px; } +.card-grid-2 { display: grid; grid-template-columns: repeat(2, 1fr); gap: 14px; } + +/* Badges */ +.badge { display: inline-flex; align-items: center; gap: 5px; padding: 3px 8px; border-radius: 99px; font-size: 11px; font-weight: 500; } +.badge-green { background: var(--green-dim); color: var(--green); } +.badge-red { background: var(--red-dim); color: var(--red); } +.badge-orange { background: var(--orange-dim); color: var(--orange); } +.badge-yellow { background: var(--yellow-dim); color: var(--yellow); } +.badge-blue { background: rgba(99,102,241,0.15); color: var(--a3); } +.badge-cyan { background: var(--cyan-dim); color: var(--cyan); } +.badge-dot { 
width: 6px; height: 6px; border-radius: 50%; background: currentColor; } + +/* Utils */ +.delta-up { color: var(--green); font-size: 12px; font-weight: 600; } +.delta-down { color: var(--red); font-size: 12px; font-weight: 600; } +.divider { height: 1px; background: var(--border); margin: 20px 0; } +.row-gap-4 { display: flex; gap: 14px; align-items: center; } +.row-gap-2 { display: flex; gap: 8px; align-items: center; } +.flex-1 { flex: 1; } + +/* Bar */ +.bar-item { margin-bottom: 10px; } +.bar-label-row { display: flex; justify-content: space-between; margin-bottom: 4px; font-size: 12.5px; } +.bar-label-name { color: var(--t2); } +.bar-label-pct { font-weight: 600; } +.bar-track { height: 5px; background: rgba(255,255,255,0.06); border-radius: 3px; overflow: hidden; } +.bar-fill { height: 100%; border-radius: 3px; } + +/* Status pulse */ +.status-dot { width: 6px; height: 6px; border-radius: 50%; background: currentColor; animation: pulse 1.5s infinite; } +@keyframes pulse { 0%,100%{ opacity:1; } 50%{ opacity:0.4; } } + +/* Responsive */ +@media (max-width: 1100px) { + .home-hero { grid-template-columns: 1fr; } + .loop-wrapper { display: none; } +} + +.page-shell { + display: flex; + flex-direction: column; + gap: 20px; +} + +.hero-panel { + display: grid; + grid-template-columns: 1.4fr 1fr; + gap: 16px; +} + +.feature-list, +.project-list { + display: flex; + flex-direction: column; + gap: 10px; +} + +.feature-item { + padding: 14px; + border-radius: 12px; + background: rgba(255,255,255,0.03); + border: 1px solid var(--border); +} + +.feature-name, +.project-list-name { + font-weight: 600; + color: var(--t1); +} + +.feature-copy, +.project-list-meta { + margin-top: 4px; + font-size: 12px; + color: var(--t2); +} + +.project-list-item { + display: flex; + justify-content: space-between; + align-items: center; + padding: 14px; + border-radius: 12px; + background: rgba(255,255,255,0.03); + border: 1px solid var(--border); + text-align: left; +} + 
+.project-list-item:hover { + background: var(--surf-hi); + border-color: var(--border-hi); +} + +.project-list-score { + font-size: 20px; + font-weight: 700; + color: var(--a3); +} + +.panel-stack { + display: flex; + flex-direction: column; + gap: 16px; +} + +.insight-list, +.asset-list, +.console-list, +.memory-grid, +.report-list, +.plan-list { + display: flex; + flex-direction: column; + gap: 10px; +} + +.insight-item, +.asset-item, +.console-item, +.memory-item, +.report-item, +.plan-item { + padding: 12px 14px; + background: rgba(255,255,255,0.03); + border: 1px solid var(--border); + border-radius: 10px; +} + +.cluster-grid, +.memory-grid, +.two-col-grid { + display: grid; + grid-template-columns: repeat(2, minmax(0, 1fr)); + gap: 14px; +} + +.cluster-card { + background: rgba(255,255,255,0.03); + border: 1px solid var(--border); + border-radius: 12px; + padding: 14px; +} + +.cluster-header { + display: flex; + align-items: center; + gap: 10px; + margin-bottom: 10px; +} + +.cluster-id { + width: 28px; + height: 28px; + border-radius: 8px; + display: flex; + align-items: center; + justify-content: center; + font-weight: 700; +} + +.cluster-name { font-weight: 600; } +.cluster-pct { font-size: 24px; font-weight: 700; } +.cluster-count { color: var(--t2); margin-top: 2px; margin-bottom: 10px; } +.cluster-bar { height: 6px; background: rgba(255,255,255,0.08); border-radius: 999px; overflow: hidden; } +.cluster-bar-fill { height: 100%; border-radius: inherit; } + +.timeline-grid { + display: grid; + grid-template-columns: repeat(4, minmax(0, 1fr)); + gap: 12px; +} + +.rt-step { + padding: 14px; + border-radius: 12px; + border: 1px solid var(--border); + background: rgba(255,255,255,0.03); + display: flex; + align-items: center; + gap: 10px; +} + +.rt-step.active { border-color: rgba(99,102,241,0.45); box-shadow: var(--shadow-glow); } +.rt-step.done { border-color: rgba(16,185,129,0.35); } + +.rt-dot { + width: 28px; + height: 28px; + border-radius: 50%; + 
display: flex; + align-items: center; + justify-content: center; + background: rgba(255,255,255,0.08); + font-weight: 700; +} + +.rt-label { font-weight: 500; } + +.console-item { + font-family: 'JetBrains Mono', monospace; + font-size: 12px; +} + +.console-item.success { color: var(--green); } +.console-item.active { color: var(--a3); } +.console-item.log { color: var(--t2); } + +.asset-card { + background: var(--surf); + border: 1px solid var(--border); + border-radius: 12px; + padding: 16px; +} + +.asset-card-label { + font-size: 12px; + text-transform: uppercase; + letter-spacing: 0.08em; + color: var(--t3); +} + +.asset-card-value { + margin-top: 10px; + font-size: 28px; + font-weight: 700; +} + +.asset-card-sub { + margin-top: 4px; + font-size: 12px; + color: var(--t2); +} + +.pill-row { + display: flex; + flex-wrap: wrap; + gap: 8px; +} + +.pill { + padding: 6px 10px; + border-radius: 999px; + background: rgba(255,255,255,0.05); + border: 1px solid var(--border); + font-size: 12px; + color: var(--t2); +} + +.kpi-callout { + display: flex; + align-items: baseline; + gap: 10px; + margin-top: 12px; +} + +.kpi-callout-value { font-size: 48px; font-weight: 700; letter-spacing: -1px; } +.kpi-callout-copy { color: var(--t2); max-width: 36rem; } + +.root-cause-list { + display: flex; + flex-direction: column; + gap: 10px; +} + +.root-cause-item { + display: grid; + grid-template-columns: 32px 1fr; + gap: 12px; + align-items: start; + padding: 12px; + border: 1px solid var(--border); + border-radius: 10px; + background: rgba(255,255,255,0.03); +} + +.rci-num { + width: 32px; + height: 32px; + display: flex; + align-items: center; + justify-content: center; + border-radius: 50%; + background: rgba(99,102,241,0.16); + color: var(--a3); + font-weight: 700; +} + +.rci-text { + color: var(--t2); +} + +.episode-grid { + display: grid; + grid-template-columns: repeat(3, minmax(0, 1fr)); + gap: 12px; +} + +.episode-card { + padding: 12px; + border-radius: 12px; + border: 1px 
solid var(--border); + background: rgba(255,255,255,0.03); +} + +.episode-thumb { + height: 92px; + border-radius: 10px; + margin-bottom: 10px; + background: linear-gradient(135deg, rgba(99,102,241,0.24), rgba(244,63,94,0.18)); + display: flex; + align-items: center; + justify-content: center; + color: var(--t1); + font-size: 24px; + font-weight: 700; +} + +.episode-label { + font-weight: 600; +} + +.episode-meta { + margin-top: 4px; + font-size: 12px; + color: var(--t2); +} + +.diagnosis-card { + background: linear-gradient(135deg, rgba(99,102,241,0.08), rgba(139,92,246,0.06)); +} + +.diag-label { + font-size: 11px; + text-transform: uppercase; + letter-spacing: 0.08em; + color: var(--a3); + margin-bottom: 12px; +} + +.diag-text { + color: var(--t1); + max-width: 70rem; +} + +.assets-created { + display: flex; + flex-wrap: wrap; + gap: 8px; +} + +.asset-chip { + padding: 7px 10px; + border-radius: 999px; + border: 1px solid var(--border); + font-size: 12px; + text-transform: capitalize; +} + +.asset-chip.recipe { background: rgba(6,182,212,0.08); color: var(--cyan); border-color: rgba(6,182,212,0.25); } +.asset-chip.template { background: rgba(99,102,241,0.08); color: var(--a3); border-color: rgba(99,102,241,0.25); } +.asset-chip.failure { background: rgba(244,63,94,0.08); color: var(--red); border-color: rgba(244,63,94,0.2); } + +/* ----------------------------------------------------------------------- + Milestone 3 — Self-Improving Agent UI + ----------------------------------------------------------------------- */ + +/* Page header row with action button */ +.page-header-row { display: flex; align-items: flex-start; justify-content: space-between; gap: 16px; flex-wrap: wrap; } + +/* Primary CTA button */ +.btn-primary { + padding: 8px 18px; + border-radius: var(--radius); + background: var(--grad); + color: #fff; + font-size: 13px; + font-weight: 600; + letter-spacing: 0.02em; + cursor: pointer; + border: none; + transition: opacity 0.15s; + white-space: 
nowrap; +} +.btn-primary:hover { opacity: 0.88; } + +/* Secondary button */ +.btn-secondary { + padding: 6px 14px; + border-radius: var(--radius); + background: var(--surf-hi); + color: var(--t2); + font-size: 12px; + font-weight: 500; + cursor: pointer; + border: 1px solid var(--border); + transition: background 0.15s; +} +.btn-secondary:hover { background: var(--surf-hover); color: var(--t1); } +.btn-secondary:disabled { opacity: 0.5; cursor: not-allowed; } + +/* Loop progress card */ +.agent-progress-card { display: flex; flex-direction: column; gap: 12px; } +.agent-progress-header { display: flex; align-items: center; justify-content: space-between; gap: 12px; flex-wrap: wrap; } +.agent-kpi-display { display: flex; align-items: baseline; gap: 4px; } +.agent-kpi-val { font-size: 24px; font-weight: 700; } + +.agent-progress-bar-track { + height: 6px; + background: var(--surf-hi); + border-radius: 3px; + overflow: hidden; +} +.agent-progress-bar-fill { + height: 100%; + background: var(--grad); + border-radius: 3px; + transition: width 0.4s ease; +} +.agent-progress-loops { display: flex; gap: 8px; } +.agent-loop-dot { + width: 28px; height: 28px; + border-radius: 50%; + border: 1px solid var(--border); + background: var(--surf); + display: flex; align-items: center; justify-content: center; + font-size: 11px; font-weight: 600; color: var(--t3); + transition: all 0.25s; +} +.agent-loop-dot.active { border-color: var(--a1); color: var(--a3); box-shadow: 0 0 8px rgba(99,102,241,0.4); } +.agent-loop-dot.done { border-color: var(--green); background: var(--green-dim); color: var(--green); } + +/* Agent reasoning panel */ +.agent-panel { display: flex; flex-direction: column; gap: 12px; } +.agent-panel-header { display: flex; align-items: flex-start; justify-content: space-between; gap: 12px; flex-wrap: wrap; } +.agent-controls { display: flex; gap: 8px; align-items: center; flex-wrap: wrap; } + +.agent-events { + border: 1px solid var(--border); + border-radius: 
var(--radius); + background: var(--surf-hi); + padding: 10px; +} +.agent-event-list { display: flex; flex-direction: column; gap: 6px; } +.agent-event-row { + display: grid; + grid-template-columns: 18px minmax(0, 1fr) auto; + gap: 8px; + align-items: flex-start; + border-radius: 8px; + background: var(--surf); + border: 1px solid transparent; + padding: 7px 8px; +} +.agent-event-row.rollback { + border-color: rgba(244,63,94,0.32); + background: rgba(244,63,94,0.07); +} +.agent-event-icon { color: var(--a3); font-size: 12px; line-height: 1.4; margin-top: 1px; } +.agent-event-label { font-size: 12px; font-weight: 600; color: var(--t1); } +.agent-event-msg { font-size: 11px; color: var(--t2); margin-top: 1px; } +.agent-event-meta { + display: flex; + flex-direction: column; + align-items: flex-end; + gap: 1px; + font-size: 10px; + color: var(--t3); + white-space: nowrap; +} + +.agent-loop-block { display: flex; flex-direction: column; gap: 6px; border-top: 1px solid var(--border); padding-top: 10px; } +.agent-loop-header { display: flex; align-items: center; gap: 10px; flex-wrap: wrap; } +.agent-loop-meta { font-size: 12px; color: var(--t2); } +.agent-loop-uplift { color: var(--t2); } +.agent-rollback-note { + font-size: 11px; + color: var(--red); + background: rgba(244,63,94,0.08); + border: 1px solid rgba(244,63,94,0.22); + border-radius: 8px; + padding: 5px 8px; +} + +.loop-badge { + font-size: 11px; font-weight: 600; padding: 2px 8px; + border-radius: 20px; border: 1px solid var(--border); + background: var(--surf); color: var(--t2); + white-space: nowrap; +} +.loop-badge.running { border-color: var(--a1); color: var(--a3); } +.loop-badge.completed { border-color: var(--green); color: var(--green); background: var(--green-dim); } +.loop-badge.failed { border-color: var(--red); color: var(--red); background: var(--red-dim); } + +.agent-step-list { display: flex; flex-direction: column; gap: 4px; } + +.agent-step { + display: flex; align-items: flex-start; gap: 
10px; + padding: 8px 10px; + border-radius: var(--radius); + background: var(--surf); + border: 1px solid transparent; + transition: border-color 0.2s, background 0.2s; +} +.agent-step.active { + border-color: rgba(99,102,241,0.4); + background: rgba(99,102,241,0.06); + animation: pulse-border 1.8s ease-in-out infinite; +} +.agent-step.done { border-color: rgba(16,185,129,0.25); } +.agent-step.error { border-color: rgba(244,63,94,0.3); background: var(--red-dim); } + +@keyframes pulse-border { + 0%, 100% { border-color: rgba(99,102,241,0.4); } + 50% { border-color: rgba(139,92,246,0.7); } +} + +.agent-step-icon { font-size: 14px; line-height: 1.4; flex-shrink: 0; margin-top: 1px; } +.agent-step-body { flex: 1; min-width: 0; } +.agent-step-label { font-size: 12px; font-weight: 600; color: var(--t1); } +.agent-step-msg { font-size: 11px; color: var(--t2); margin-top: 2px; line-height: 1.45; } + +.agent-step-badge { + font-size: 10px; font-weight: 600; padding: 2px 6px; + border-radius: 10px; border: 1px solid var(--border); + background: var(--surf); color: var(--t3); + flex-shrink: 0; +} +.agent-step-badge.active { color: var(--a3); border-color: var(--a1); background: rgba(99,102,241,0.1); } +.agent-step-badge.done { color: var(--green); border-color: var(--green); background: var(--green-dim); } +.agent-step-badge.error { color: var(--red); border-color: var(--red); } + +.agent-log-details { margin-top: 4px; } +.agent-log-summary { + font-size: 11px; color: var(--t3); cursor: pointer; + padding: 4px 0; list-style: none; +} +.agent-log-summary::-webkit-details-marker { display: none; } +.agent-log-summary::before { content: '▶ '; font-size: 9px; } +details[open] .agent-log-summary::before { content: '▼ '; } + +/* Stop reason card */ +.agent-stop-card { border-color: rgba(16,185,129,0.3); background: rgba(16,185,129,0.04); } + +@media (max-width: 900px) { + .hero-panel, + .cluster-grid, + .memory-grid, + .two-col-grid, + .timeline-grid, + .card-grid-4, + 
.card-grid-3, + .card-grid-2 { + grid-template-columns: 1fr; + } + + .episode-grid { + grid-template-columns: 1fr; + } +} diff --git a/demo/vite.config.js b/demo/vite.config.js new file mode 100644 index 0000000..dabdaaa --- /dev/null +++ b/demo/vite.config.js @@ -0,0 +1,13 @@ +import { defineConfig } from 'vite' +import react from '@vitejs/plugin-react' + +// https://vite.dev/config/ +export default defineConfig({ + plugins: [react()], + server: { + proxy: { + '/api': 'http://127.0.0.1:8000', + '/health': 'http://127.0.0.1:8000', + }, + }, +}) diff --git a/frontend-design.md b/frontend-design.md new file mode 100644 index 0000000..0a0104f --- /dev/null +++ b/frontend-design.md @@ -0,0 +1,900 @@ +# Nvex Demo 前端 Wireframe / 页面信息架构 + +## 副标题 + +面向 Investor Demo 的前端结构、页面叙事与交互定义 + +## 适用对象 + +- 产品团队 +- 设计团队 +- 前端工程团队 +- 创始团队 / Demo 讲解人 + +## 文档目的 + +本文件用于定义 Nvex investor demo 的前端信息架构、页面结构、关键组件、状态设计与演示路径,帮助团队快速完成以下对齐: + +1. 明确 Nvex demo 的产品层表达方式 +2. 将前端页面组织成一条清晰的 investor narrative +3. 区分 **Nvex orchestration layer** 与 **AlphaBrain execution/runtime layer** +4. 让设计、前端、产品能够并行推进页面落地 + +--- + +# 1. 产品导航模型 + +Nvex demo 的导航不应沿用传统数据平台、标注后台或通用 MLOps 控制台的组织方式。传统工具通常围绕“数据集、模型、任务、作业”展开,而 Nvex 的核心价值不在于管理对象本身,而在于**识别失败、生成修复策略、驱动迭代并沉淀平台资产**。因此,主导航应围绕 **failure intelligence** 和 **iteration loop** 组织,让用户天然感知这是一个智能编排层,而不是另一个训练面板。 + +从 investor demo 的叙事角度,前端需要把复杂的底层执行能力抽象为一条清晰闭环:**导入项目 → 识别失败 → 生成 patch plan → 执行一次迭代 → 展示 checkpoint 改进 → 沉淀复用资产**。这也是 Nvex 与普通 annotation 工具、MLOps 平台、实验管理系统的根本差异。 + +## 推荐一级导航 + +- Home / Project Hub +- Project Overview +- Failure Map +- Patch Plan +- Iteration Runner +- Improvement Report +- Platform Memory + +--- + +# 2. 
Sitemap / Information Architecture Tree + +```text +Nvex Demo +├── Home / Project Hub +│ ├── Demo Project Overview +│ ├── Recent Projects +│ ├── Featured Run +│ └── Create / Import Project +│ +├── Project Overview +│ ├── Project Summary +│ ├── Current Checkpoint +│ ├── Eval Snapshot +│ ├── Risk Flags +│ └── Recommended Next Action +│ +├── Failure Map +│ ├── KPI Overview +│ ├── Breakdown by Task +│ ├── Breakdown by Scene / Condition +│ ├── Failure Clusters +│ ├── Representative Episodes +│ └── Root-Cause Hypotheses +│ +├── Patch Plan +│ ├── Gap Summary +│ ├── Data Targeting +│ ├── Data Recipe +│ ├── Training Strategy +│ ├── Verification Setup +│ └── Expected Uplift +│ +├── Iteration Runner +│ ├── Run Timeline +│ ├── Runtime Status +│ ├── Job Artifacts +│ ├── Version Tracking +│ └── Live Logs / Stage Updates +│ +├── Improvement Report +│ ├── Before vs After +│ ├── KPI Lift +│ ├── Failure Reduction +│ ├── Checkpoint Summary +│ ├── Artifacts Comparison +│ └── Recommended Next Iteration +│ +└── Platform Memory + ├── Reusable Recipes + ├── Pipeline Templates + ├── Failure Ontology + ├── Reuse Insights + └── Project-to-Project Learnings +``` + +--- + +# 3. Page-Level Definitions + +## A. Home / Project Hub + +### Page Goal + +As the entry page of the investor demo, quickly explain what Nvex is, what this demo shows, and where to start. The page should reinforce the "from failure to improvement" product proposition rather than look like an ordinary project list. + +### Core Audience + +- Investors +- Founding team +- Customers seeing the product for the first time + +### Page Modules / Component List + +- Top global navigation +- Left sidebar navigation +- Hero area / product proposition +- Featured Demo Project +- Recent Projects +- Quick Actions +- Platform value summary cards + +### Core Data Objects + +- Project +- DemoRunSummary +- RecentActivity +- FeaturedProject + +### Recommended Primary / Secondary CTA + +- Primary CTA: Enter Demo Project +- Secondary CTA: Import Project / View Platform Memory + +### Page Success Criteria + +- Users understand within 10 seconds that Nvex is an intelligent orchestration layer +- Users know exactly which project this demo starts from +- The home page does not expose underlying training complexity + +--- + +## B. 
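Project Overview + +### Page Goal + +Establish the context of the current project by showing the existing checkpoint, eval snapshot, risks, and next-step recommendation, setting up the failure analysis that follows. + +### Core Audience + +- Investors +- Product managers +- Solutions team +- Technical leads + +### Page Modules / Component List + +- Project header +- Checkpoint status card +- Eval Snapshot card group +- Risk flags area +- Recommended action card +- Available assets overview + +### Core Data Objects + +- Project +- Checkpoint +- EvalRun +- RiskFlag +- ReusableAsset + +### Recommended Primary / Secondary CTA + +- Primary CTA: View Failure Analysis +- Secondary CTA: View Eval Details / Start Next-Step Planning + +### Page Success Criteria + +- Users understand the current state of the model +- Users see a clear "next-step recommendation" +- The page serves as the transition from a static project view to dynamic diagnosis + +--- + +## C. Failure Map + +### Page Goal + +Turn benchmark/eval results into understandable failure intelligence that explains "why the model fails" rather than merely "the score is low". + +### Core Audience + +- Investors +- ML / Robotics technical leads +- Internal research team + +### Page Modules / Component List + +- KPI overview +- Breakdown charts +- Failure Cluster cards +- Representative Episode area +- Root Cause Hypothesis panel +- Nvex Diagnosis summary area + +### Core Data Objects + +- EvalRun +- FailureCluster +- EpisodeArtifact +- RootCauseHypothesis +- MetricBreakdown + +### Recommended Primary / Secondary CTA + +- Primary CTA: Generate Patch Plan +- Secondary CTA: Export Diagnosis / View Raw Artifacts + +### Page Success Criteria + +- Users can name 2-3 specific failure clusters +- Users understand that fixes should target failures rather than blindly adding data +- The page delivers a "wow moment" + +--- + +## D. Patch Plan + +### Page Goal + +Turn the failure diagnosis into a structured next-step action plan that demonstrates Nvex's orchestration capability. + +### Core Audience + +- Investors +- Product managers +- Technical leads +- Solutions team + +### Page Modules / Component List + +- Gap Summary +- Data Targeting card +- Data Recipe card +- Training Strategy card +- Verification Setup card +- Impact estimate / risk / confidence area +- Approval area / CTA area + +### Core Data Objects + +- PatchPlan +- DataTargetSpec +- TrainingStrategy +- VerificationSpec +- ExpectedImpact + +### Recommended Primary / Secondary CTA + +- Primary CTA: Approve and Run +- Secondary CTA: Adjust Plan / Save Plan + +### Page Success Criteria + +- Users can follow the repair actions planned for the next round +- The plan is presented in structured form rather than as a block of natural language +- The page shows that Nvex is more than a dashboard + +--- + +## E. 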
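Iteration Runner + +### Page Goal + +Show Nvex scheduling the underlying runtime to execute a real iteration, proving that this is an executable system rather than a static recommendation tool. + +### Core Audience + +- Investors +- Technical team +- Founding team + +### Page Modules / Component List + +- Run timeline +- Current stage status card +- Runtime metrics card +- Artifact list +- Version tracking area +- Logs / event stream + +### Core Data Objects + +- IterationRun +- JobStage +- RuntimeArtifact +- VersionRecord +- ExecutionStatus + +### Recommended Primary / Secondary CTA + +- Primary CTA: View Results +- Secondary CTA: Stop Run / View Logs / Back to Plan + +### Page Success Criteria + +- Users believe the system is actually executing +- The page clearly shows which stage the run has reached +- Low-level engineering details do not drown out the main narrative + +--- + +## F. Improvement Report + +### Page Goal + +Show the quantitative uplift and qualitative changes before vs. after, proving that Nvex can turn model failures into a better checkpoint. + +### Core Audience + +- Investors +- Customer decision-makers +- Product and sales teams + +### Page Modules / Component List + +- Before vs After KPI +- Failure-mode convergence comparison +- Rollout / Video comparison +- New checkpoint summary +- New assets cards +- Next-round recommendation + +### Core Data Objects + +- ImprovementReport +- BeforeAfterMetric +- CheckpointDiff +- ArtifactComparison +- CreatedAsset + +### Recommended Primary / Secondary CTA + +- Primary CTA: View Platform Assets +- Secondary CTA: Start Next Iteration / Share Results + +### Page Success Criteria + +- Users clearly see a measurable uplift +- Users understand this is not a one-off experiment but one that produces platform assets +- The page serves as the closing centerpiece of the investor demo + +--- + +## G. Platform Memory + +### Page Goal + +Show the reusable recipes, templates, and failure patterns accumulated after project execution, reinforcing the platform's compounding logic. + +### Core Audience + +- Investors +- Founding team +- Product and solutions teams + +### Page Modules / Component List + +- Recipe Library +- Pipeline Template Library +- Failure Ontology +- Reuse Insights +- Cross-project Learnings + +### Core Data Objects + +- ReusableAsset +- Recipe +- Template +- FailurePattern +- ReuseMetric + +### Recommended Primary / Secondary CTA + +- Primary CTA: View Reuse Details +- Secondary CTA: Back to Project / Start New Project + +### Page Success Criteria + +- Users understand that every iteration accumulates assets +- The page helps shift perception from "service delivery" to "platform system" +- It closes the loop with the Improvement Report + +--- + +# 4. Low-Fidelity ASCII Wireframe for Each Page + +## A. 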
Home / Project Hub + +```text ++----------------------------------------------------------------------------------+ +| Nvex [Search Projects] [Import Project] | ++----------------------------------------------------------------------------------+ +| Sidebar | Hero / Value Proposition | +| - Home | "Turn model failures into better checkpoints" | +| - Overview | Agent-in-the-loop orchestration for Physical AI | +| - Failure Map | [Enter Demo Project] [View Platform Memory] | +| - Patch Plan | | +| - Iteration +-------------------------------------------------------------+ +| - Report | Featured Demo Project | +| - Memory | [LIBERO Demo] Current success rate 62% Top failures: occlusion / weak recovery | +| | [Enter Project] | +| +-------------------------------------------------------------+ +| | Recent Projects | +| | [Project Card 1] [Project Card 2] [Project Card 3] | +| +-------------------------------------------------------------+ +| | Platform Signals | +| | [Failure Diagnosis] [Patch Plan] [Run Iteration] [Asset Accumulation] | ++----------------------------------------------------------------------------------+ +``` + +--- + +## B. Project Overview + +```text ++----------------------------------------------------------------------------------+ +| Nvex / Demo Project: LIBERO Kitchen [View Failure Analysis]| ++----------------------------------------------------------------------------------+ +| Sidebar | Project Header | +| | Checkpoint: ckpt_v0.7 Domain: manipulation | +| | Eval Suite: LIBERO goal + robustness | +| +-------------------------------------------------------------+ +| | KPI Summary | Risk Flags | +| | - Success 62% | - Occlusion gap | +| | - 4 fail clusters | - Recovery weak | +| | - 2 prior runs | - Verification limited | +| +-------------------------------------------------------------+ +| | Available Assets | Recommended Next Action | +| | - 2 recipes | Patch occlusion/recovery via CL run | +| | - 1 template | [Generate Patch Plan] [View Details] | ++----------------------------------------------------------------------------------+ +``` + +--- + +## C. 
Failure Map + +```text ++----------------------------------------------------------------------------------+ +| Failure Map [Generate Patch Plan]| ++----------------------------------------------------------------------------------+ +| Sidebar | KPI Row | +| | [Overall Success 62%] [Top Failure: Occlusion] [Conf 0.81] | +| +-------------------------------------------------------------+ +| | Breakdown Panel | Root Cause Panel | +| | - By Task | 1. Insufficient perception under occlusion | +| | - By Scene | 2. Missing recovery trajectories after errors | +| | - By Condition | 3. Incomplete verification coverage | +| +-------------------------------------------------------------+ +| | Failure Clusters | +| | [Cluster A] [Cluster B] [Cluster C] [Cluster D] | +| +-------------------------------------------------------------+ +| | Representative Episodes | +| | [Video / Rollout 1] [Video / Rollout 2] | +| +-------------------------------------------------------------+ +| | Nvex Diagnosis | +| | Under partial occlusion the model's grasp drifts off target, and it lacks a second corrective attempt. | ++----------------------------------------------------------------------------------+ +``` + +--- + +## D. 
Patch Plan + +```text ++----------------------------------------------------------------------------------+ +| Patch Plan [Approve and Run] | ++----------------------------------------------------------------------------------+ +| Sidebar | Gap Summary | +| | - Too few occlusion-scene samples | +| | - Missing recovery-behavior labels | +| | - Limited verification environment coverage | +| +-------------------------------------------------------------+ +| | Data Targeting | Training Strategy | +| | - 120 patch episodes| - Continual Learning | +| | - 40 teleop fixes | - Frozen backbone, small incremental update | +| | - 1 light variant | - Low cost, short cycle | +| +-------------------------------------------------------------+ +| | Verification Setup | Expected Outcome | +| | - Robustness subset | +8~12% uplift | +| | - Occlusion replay | Confidence 0.73 | +| | - Checkpoint gating | ETA 45 min | +| +-------------------------------------------------------------+ +| | [Adjust Plan] [Save Plan] [Approve and Run] | ++----------------------------------------------------------------------------------+ +``` + +--- + +## E. Iteration Runner + +```text ++----------------------------------------------------------------------------------+ +| Iteration Runner [View Logs] | ++----------------------------------------------------------------------------------+ +| Sidebar | Run Timeline | +| | [1 Patch Spec] -> [2 Train] -> [3 Re-eval] -> [4 Report] | +| +-------------------------------------------------------------+ +| | Runtime Status | Artifacts | +| | - Stage: Re-eval | - training_log.txt | +| | - ETA: 12 min | - eval_result.json | +| | - GPU: 2 | - rollout_videos/ | +| | - Progress: 71% | - checkpoint_meta.json | +| +-------------------------------------------------------------+ +| | Version Tracking | +| | ckpt_v0.7 -> iter_01 -> candidate_ckpt_v0.8 | +| +-------------------------------------------------------------+ +| | Live Console / Events | +| | > loading checkpoint... | +| | > running continual learning... | +| | > evaluating robustness subset... 
| +| +-------------------------------------------------------------+ +| | [Stop Run] [Back to Plan] [View Results] | ++----------------------------------------------------------------------------------+ +``` + +--- + +## F. Improvement Report + +```text ++----------------------------------------------------------------------------------+ +| Improvement Report [Share Results] | ++----------------------------------------------------------------------------------+ +| Sidebar | Before vs After | +| | Success 62% -> 74% | +| | Fail Clusters 4 -> 2 | +| | Recovery 31% -> 55% | +| +-------------------------------------------------------------+ +| | Artifact Comparison | +| | [Before Video] [After Video] | +| +-------------------------------------------------------------+ +| | What Changed | New Assets | +| | - Added patch data | - occlusion_recovery_v1 | +| | - Improved recovery | - clutter_patch_template | +| | - Expanded verification| - grasp_after_occlusion_miss | +| +-------------------------------------------------------------+ +| | Recommended Next Iteration | +| | Next step: address temporal planning issues under language variation | +| | [View Platform Assets] [Start Next Round] | ++----------------------------------------------------------------------------------+ +``` + +--- + +## G. 
Platform Memory + +```text ++----------------------------------------------------------------------------------+ +| Platform Memory [Back to Project] | ++----------------------------------------------------------------------------------+ +| Sidebar | Reusable Recipes | +| | [occlusion_v1] [recovery_v2] [tactile_sync_v1] | +| +-------------------------------------------------------------+ +| | Pipeline Templates | +| | [libero_patch_pipeline] [teleop_fix_flow] | +| +-------------------------------------------------------------+ +| | Failure Ontology | +| | [perception gap] [recovery missing] [env mismatch] | +| +-------------------------------------------------------------+ +| | Reuse Insights | +| | This project reused 2 recipes and added 1 template. | +| +-------------------------------------------------------------+ +| | Cross-project Learnings | +| | The occlusion recovery strategy transfers to adjacent manipulation scenarios. | ++----------------------------------------------------------------------------------+ +``` + +--- + +# 5. Page Module and Component Recommendations + +## 5.1 Navigation Components + +| Component | Purpose | Typical Fields | +| ------------ | ------------------------ | -------------------------------------- | +| Top global navigation | Global branding, search, key entry points | logo, search, user menu, import action | +| Sidebar navigation | Navigation along the main investor demo path | nav label, icon, active state | +| Breadcrumb | Shows the current project/page level | current page, project name | + +## 5.2 Data Overview Components + +| Component | Purpose | Typical Fields | +| ---------- | -------------- | ------------------------------------------- | +| KPI card | Shows key signals | title, value, delta, status | +| Project summary card | Summarizes project status | project name, domain, checkpoint, suite | +| Risk flag card | Highlights the main issues | risk type, severity, note | +| Asset summary card | Summarizes accumulated reusable assets | recipes count, templates count, reuse score | + +## 5.3 Diagnosis Components + +| Component | Purpose | Typical Fields | +| -------------------- | ---------------------------------------- | --------------------------------------------- | +| Failure Cluster card | Shows failure modes as clusters | cluster name, frequency, severity, confidence | +| Root Cause Panel | Presents root-cause hypotheses | hypothesis, evidence, confidence | +| Breakdown chart | Shows performance differences across the task/scene/condition dimensions | 
dimension, value, benchmark | +| Episode Viewer | Shows representative failure samples | video url, episode id, notes | + +## 5.4 Execution Components + +| Component | Purpose | Typical Fields | +| --------------- | ---------------- | ------------------------------------------- | +| Patch Plan card | Shows the repair strategy | gap, target spec, strategy, expected uplift | +| Run timeline | Shows execution stages | stage, status, progress, eta | +| Runtime status card | Shows the current job status | job id, gpu, step, current phase | +| Log stream component | Shows key execution events | timestamp, event, severity | + +## 5.5 Result Components + +| Component | Purpose | Typical Fields | +| ------------------- | ----------------------- | --------------------------------- | +| Before/After comparison card | Shows changes in results | metric name, before, after, delta | +| Artifact comparator | Shows differences between videos or rollouts | before artifact, after artifact | +| Checkpoint summary card | Shows new checkpoint information | version, created at, summary | +| Next-round recommendation card | Links to the next iteration | next issue, recommended action | + +## 5.6 Platform Memory Components + +| Component | Purpose | Typical Fields | +| --------------------- | ---------------- | ------------------------------------------ | +| Recipe card | Shows data repair recipes | recipe name, modality mix, reuse count | +| Template card | Shows pipeline templates | template name, applies to, last used | +| Failure Ontology card | Shows the failure knowledge structure | pattern name, description, linked projects | +| Reuse Insight card | Shows platform compounding | reused assets, new assets, coverage note | + +--- + +# 6. 
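Key State Design + +## 6.1 Global States + +| State | Meaning | User Prompt | Recommended CTA | +| ------- | ------------------ | ------------------------------------------------ | ------------------- | +| empty | No data or no projects | No projects to display. Import demo data or choose a sample project. | Import Project / Open Sample | +| loading | Data is loading | Loading project context and eval results. | No CTA; show skeleton screen | +| running | A job is currently executing | Nvex is orchestrating this iteration. Check back shortly for results. | View Run Status | +| success | The current flow has completed | This iteration has completed. Results are available for comparison and review. | View Results | +| failure | Loading or execution failed | The operation did not complete successfully. Check your input or retry. | Retry / View Logs | + +## 6.2 Page-Level Key States + +### Home / Project Hub + +| State | User Prompt | Recommended CTA | +| ---------- | ---------------------------------- | ------------------- | +| No projects | No projects yet. Import one or open a sample project. | Import Project / Open Sample | +| Featured project available | A sample project has been prepared for this demo. | Enter Demo Project | + +### Project Overview + +| State | User Prompt | Recommended CTA | +| ---------- | ------------------------------ | ------------ | +| No eval results | The current checkpoint has no eval snapshot yet. | Run Eval | +| Eval available | Eval results are available. | View Failure Analysis | + +### Failure Map + +| State | User Prompt | Recommended CTA | +| ---------- | -------------------------------- | ----------------------- | +| No diagnosis | Eval results are insufficient to generate a failure map. | Back to Project Overview / Run Eval | +| Diagnosis generated | Key failure clusters have been identified. | Generate Patch Plan | + +### Patch Plan + +| State | User Prompt | Recommended CTA | +| ------------ | ------------------------ | ------------ | +| Plan not generated | No patch plan has been generated yet. | Auto-generate Plan | +| Plan pending approval | A plan has been generated and awaits approval to run. | Approve and Run | +| Plan needs adjustment | The current plan has conflicts or gaps. | Adjust Plan | + +### Iteration Runner + +| State | User Prompt | Recommended CTA | +| ------------ | ------------------------------- | --------------- | +| Waiting to start | This run has not started yet. | Start Run | +| Training in progress | Training or an incremental update is running. | View Logs | +| Eval in progress | Verifying the candidate checkpoint. | View Run Status | +| Completed, awaiting review | Results have been generated. | View Results | +| Execution failed | The current job was interrupted or failed. | Retry / View Logs | + +### Improvement Report + +| State | User Prompt | Recommended CTA | +| ---------- | -------------------------------- | ------------------------- | +| No results | No comparison results have been generated yet. | Back to Run Page | +| Results archived | The improvement report for this iteration has been archived. | View Platform Assets / Start Next Round | + +### Platform Memory + +| State | User Prompt | Recommended CTA | +| -------- | ---------------------------- | ------------ | +| No assets yet | This project has not yet produced reusable assets. | Complete an Iteration | +| Assets available | Reusable assets have been added to the platform memory layer. | View Reuse Details | +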
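The state tables in section 6 can be read as a lookup from a (page, state) pair to a user prompt and a set of CTAs, with the global states of 6.1 acting as the fallback. The following is a minimal sketch of that lookup, written in Python only to match the `nvex_server` code in this change. `PageState`, `GLOBAL_STATES`, `PAGE_STATES`, `resolve_state`, and the page/state keys are hypothetical names introduced for illustration and are not part of the codebase. Only the Patch Plan page is filled in, and the prompt strings are English renderings of the table entries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PageState:
    """One row of a state table: the prompt shown and the CTAs offered."""

    prompt: str
    ctas: tuple[str, ...] = ()


# Global states from section 6.1 (hypothetical encoding, keyed by state name).
GLOBAL_STATES: dict[str, PageState] = {
    "empty": PageState(
        "No projects to display. Import demo data or choose a sample project.",
        ("Import Project", "Open Sample"),
    ),
    # The loading state shows a skeleton screen and offers no CTA.
    "loading": PageState("Loading project context and eval results."),
    "running": PageState(
        "Nvex is orchestrating this iteration. Check back shortly for results.",
        ("View Run Status",),
    ),
    "success": PageState(
        "This iteration has completed. Results are available for comparison and review.",
        ("View Results",),
    ),
    "failure": PageState(
        "The operation did not complete successfully. Check your input or retry.",
        ("Retry", "View Logs"),
    ),
}

# Page-level rows from section 6.2. The keys below are assumed names.
PAGE_STATES: dict[tuple[str, str], PageState] = {
    ("patch_plan", "pending_generation"): PageState(
        "No patch plan has been generated yet.", ("Auto-generate Plan",)
    ),
    ("patch_plan", "pending_approval"): PageState(
        "A plan has been generated and awaits approval to run.", ("Approve and Run",)
    ),
    ("patch_plan", "needs_adjustment"): PageState(
        "The current plan has conflicts or gaps.", ("Adjust Plan",)
    ),
}


def resolve_state(page: str, state: str) -> PageState:
    """Prefer the page-specific row, then fall back to the global row."""
    page_row = PAGE_STATES.get((page, state))
    if page_row is not None:
        return page_row
    if state in GLOBAL_STATES:
        return GLOBAL_STATES[state]
    raise KeyError(f"No state {state!r} defined for page {page!r}")
```

With this layout, `resolve_state("patch_plan", "pending_approval")` returns the page-specific row, while a page without its own entry, such as `resolve_state("failure_map", "loading")`, falls back to the global loading row and offers no CTA. Keeping the fallback in one function is one possible design; a frontend port could equally merge the two tables at build time.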
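+--- + +# 7. Recommended Labels and CTA Copy + +## 7.1 Navigation Labels + +- Project Hub +- Project Overview +- Failure Analysis +- Repair Plan +- Iteration Execution +- Results Report +- Platform Memory + +## 7.2 Card Titles + +- Current Checkpoint +- Eval Snapshot +- Key Risks +- Key Failure Clusters +- Root-Cause Hypotheses +- Repair Strategy +- Verification Plan +- Result Uplift +- New Platform Assets +- Reuse Insights + +## 7.3 Empty-State Prompts + +- No project data yet. Import a sample project to start the demo. +- The current checkpoint has no eval results available yet. +- No failure patterns available for analysis have been identified yet. +- This project has not yet produced platform assets. + +## 7.4 Button CTAs + +- Enter Demo Project +- View Failure Analysis +- Generate Patch Plan +- Approve and Run +- View Run Status +- View Results Report +- View Platform Assets +- Start Next Iteration +- Export Diagnosis Summary +- Back to Project Overview + +## 7.5 Success Feedback + +- Patch Plan generated and ready for approval and execution. +- This iteration has completed and the results have been synced to the report page. +- The new checkpoint has been archived and platform assets have been updated. +- Reusable assets have been written to the platform memory layer. + +## 7.6 Error Feedback + +- Eval results are insufficient, so a diagnosis cannot be generated yet. +- This run did not complete. Check the run logs and retry. +- Unable to load the artifact. Refresh in a moment. +- The current plan is missing required inputs and cannot be submitted for execution. + +--- + +# 8. Visual Design Direction + +## 8.1 Overall Style + +The visual style should carry over the brand tone of the existing deck: **a dark, high-tech look, blue-purple gradients, a sense of system orchestration, and a visual language of closed loops and node graphs**. Overall, the pages should convey the polish of an "intelligent scheduling system" rather than the utilitarian feel of a "tool back office". + +## 8.2 Color Recommendations + +- Main background: deep purple-black / dark graphite +- Brand highlights: blue-purple gradient, magenta gradient, with cool cyan as a supporting accent +- Semantic colors: + - success: teal or cool green + - warning: amber orange + - failure: reddish purple rather than pure red +- Charts should avoid highly saturated clashing colors and keep a systematic feel + +## 8.3 Layout Recommendations + +- Use a fixed left navigation + top status bar + central main content area +- Keep core KPIs and key CTAs visible above the fold +- Highlight only one main narrative focus per page and avoid piling up information +- Build hierarchy between modules with cards and whitespace + +## 8.4 Card Recommendations + +- Use subtly glowing borders or gradient strokes +- Key cards may use light glassmorphism, but not too heavily +- Inside each card, emphasize a four-part structure: title, value, conclusion, action + +## 8.5 Chart Recommendations + +- Prefer simple bar charts, heatmaps, radar charts, and matrix comparisons +- Present failure clusters as cards + small icons + confidence +- Present Before/After comparisons as side-by-side cards + an explicit delta + +## 8.6 Node / Flow Diagram Recommendations + +- Use a loop / graph style to express the intelligence loop +- Introduce stage nodes in Patch Plan and Iteration Runner +- Node states can use motion to show run progress + +## 8.7 Motion Recommendations + +- Page transitions: light fade-in + gradient slide +- Running state: node pulse, flowing paths, progress highlight +- Success state: value jump animation, intensified card border glow +- Motion should serve the "sense of orchestration", not show off + +## Things to Avoid + +Do not build an ordinary data annotation back office, task ticketing system, or general-purpose MLOps console; the frontend must first and foremost express **failure-driven orchestration**, not "managing data / training jobs". + +--- + +# 9. 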
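Investor Demo Flow + +## Step 1: Start from Home / Project Hub + +**Page shown**: Home / Project Hub +**Talking points**: Nvex is not a training framework but an orchestration layer that turns model failures into actionable improvements. +**Value investors should perceive**: This is a product layer, not research code or an annotation tool. + +## Step 2: Enter Project Overview + +**Page shown**: Project Overview +**Talking points**: The current model checkpoint has baseline capability but is unstable in specific scenarios, and Nvex has identified the issue most worth addressing next. +**Value investors should perceive**: The system does not speak in generalities; it makes judgments in the context of a concrete project. + +## Step 3: Open the Failure Map + +**Page shown**: Failure Map +**Talking points**: Nvex interprets the benchmark/eval results, pinpoints the model's main failure modes, and proposes root-cause hypotheses. +**Value investors should perceive**: Nvex starts from failure intelligence, not from blindly adding data. + +## Step 4: Generate the Patch Plan + +**Page shown**: Patch Plan +**Talking points**: The system turns failures into a structured patch plan covering what data to add, which training path to use, and how to verify. +**Value investors should perceive**: Nvex can decide and orchestrate, not only analyze. + +## Step 5: Show the Iteration Runner + +**Page shown**: Iteration Runner +**Talking points**: Nvex is scheduling the underlying runtime to execute a real iteration, while AlphaBrain handles the actual training, evaluation, and artifact generation. +**Value investors should perceive**: This is an executable closed loop, not a static report. + +## Step 6: Close with Improvement Report + Platform Memory + +**Pages shown**: Improvement Report, Platform Memory +**Talking points**: This iteration produced a better checkpoint and left behind reusable assets such as recipes, templates, and failure patterns. +**Value investors should perceive**: Nvex turns model failures into better checkpoints and keeps accumulating platform assets, building a compounding moat. + +--- + +# 10. 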
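Appendix: Mapping Pages to Backend Capabilities + +| Page | Frontend Layer Role | AlphaBrain Capabilities Relied On | Key Objects Returned to the Frontend | +| ------------------ | --------------- | ------------------------------------------------------------ | ------------------------------------------------ | +| Home / Project Hub | Nvex presentation layer | Project metadata, summary of existing run artifacts | Project, RunSummary, Activity | +| Project Overview | Nvex context layer | Baseline eval output, benchmark snapshot | Checkpoint, EvalRun, RiskFlag | +| Failure Map | Nvex diagnosis layer | Benchmark/eval results, world model eval artifacts, episode output | FailureCluster, MetricBreakdown, EpisodeArtifact | +| Patch Plan | Nvex orchestration layer | Eval results, availability of training capabilities, historical config templates | PatchPlan, TrainingStrategy, VerificationSpec | +| Iteration Runner | Nvex execution observation layer | Baseline VLA, continual learning, re-eval, artifact output | IterationRun, JobStage, RuntimeArtifact | +| Improvement Report | Nvex results layer | Before/after eval, artifact comparison, checkpoint output | ImprovementReport, CheckpointDiff, MetricDelta | +| Platform Memory | Nvex platform memory layer | Historical runs, config templates, archived results | ReusableAsset, Recipe, Template, FailurePattern | + +## Notes + +- **The Nvex frontend presents the orchestration layer**: it understands failures, defines repair actions, organizes iterations, and accumulates assets +- **AlphaBrain is the execution/runtime layer**: it provides execution capabilities such as training, evaluation, world models, and artifact generation +- The frontend does not expose underlying training complexity directly; it packages it into an intelligence loop investors can understand + +--- + +# 11. Delivery Recommendations + +This document is intended as direct input for the following work: + +1. **For designers**: produce high-fidelity screens and the visual system; prioritize the 4 core pages: Project Overview, Failure Map, Patch Plan, Improvement Report +2. **For frontend engineers**: build the routing structure, page skeletons, component library, and state transitions; first integrate the pages against static mock data, then gradually connect the real backend +3. 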
**For the product and founding teams**: align on the investor demo narrative, making sure page order, field naming, and CTA copy match the live presentation + +Recommended rollout order: + +- Phase 1: complete page skeletons and key states +- Phase 2: connect real artifacts and result data +- Phase 3: add motion, comparison components, and the platform memory layer display + +The end goal is not a "fully featured back office" but an investor demo that clearly proves Nvex's core value within a few minutes. diff --git a/nvex_server/__init__.py b/nvex_server/__init__.py new file mode 100644 index 0000000..bc9f2f5 --- /dev/null +++ b/nvex_server/__init__.py @@ -0,0 +1,11 @@ +from typing import Any + +__all__ = ["app", "create_app"] + + +def __getattr__(name: str) -> Any: + if name in {"app", "create_app"}: + from .app import app, create_app + + return {"app": app, "create_app": create_app}[name] + raise AttributeError(f"module {__name__!r} has no attribute {name!r}") \ No newline at end of file diff --git a/nvex_server/agent.py b/nvex_server/agent.py new file mode 100644 index 0000000..3e2f30e --- /dev/null +++ b/nvex_server/agent.py @@ -0,0 +1,576 @@ +""" +Milestone 3 — SelfImprovementAgent +=================================== +Autonomous orchestrator that runs the full failure → diagnosis → plan → +training → verification loop until a target KPI is reached or a stopping +condition fires. + +Demo mode (simulate=True) + Replays a precomputed 4-iteration sequence (62 → 74 → 81 → 79 → rollback → 85 %). + Steps advance each time ``advance_step`` is called, so the React UI + can animate at whatever speed it wants. + +Real mode (simulate=False) + Calls the existing M2 infrastructure (EvalArtifactExporter, + PatchPlanGenerator, JobDispatcher) to run actual AlphaBrain jobs. + Intended for customer POCs, not live investor demos. 
+""" + +from __future__ import annotations + +import threading +from datetime import datetime, timezone +from typing import TYPE_CHECKING, Any +from uuid import uuid4 + +from .schemas import ( + AgentEvent, + AgentRunRequest, + AgentRunState, + AgentStep, + AgentStepStatus, + ExecutionBackend, + FailureDiagnosis, + LoopIteration, + TrainingStrategy, +) + +if TYPE_CHECKING: + from .app import InMemoryStore + from .dispatcher import JobDispatcher + from .exporters import EvalArtifactExporter + from .llm_narrator import LLMNarrator + from .patch_plan_generator import PatchPlanGenerator + + +def _utc() -> datetime: + return datetime.now(timezone.utc) + + +def _sid() -> str: + return uuid4().hex[:12] + + +# --------------------------------------------------------------------------- +# Pre-seeded demo iterations (Mode A — precomputed replay) +# --------------------------------------------------------------------------- + +_DEMO_LOOPS: list[dict[str, Any]] = [ + { + "patch_cluster": "Occlusion / object visibility", + "patch_strategy": "continual_learning", + "eval_before": 0.62, + "eval_after": 0.74, + "steps": [ + ("eval", "Run eval", "Triggered evaluation of ckpt_v0.7 on LIBERO Kitchen benchmark.", 2300), + ("diagnose", "Diagnose failures", "Identified dominant failure cluster: Occlusion / object visibility (38% of failures). " + "Root causes: camera angle variation, object-behind-object cases.", 1800), + ("plan", "Generate plan", "Selected continual_learning strategy via alphabrain_cl. " + "Found matching recipe in Platform Memory: occlusion_recovery_v1 (used 3x, avg +9% uplift). Applying.", 1600), + ("dispatch", "Dispatch training", "Training job dispatched to AlphaBrain CL. " + "Config: 200 patch episodes, 70% real / 30% synthetic. Estimated time: 45 min.", 2600), + ("verify", "Verify results", "Re-evaluation complete. Success rate improved from 62% to 74%. Uplift: +12pp.", 2100), + ("memory", "Save to memory", "Saved recipe occlusion_cl_patch_v1 to Platform Memory. 
" + "Recipe confidence: high. Pattern fingerprint stored for future reuse.", 1300), + ("stop_check", "Check stopping", "Target KPI 75% not yet reached (74%). Improvement delta +12pp exceeds threshold. Continuing.", 900), + ], + }, + { + "patch_cluster": "Recovery / error correction", + "patch_strategy": "fine_tune", + "eval_before": 0.74, + "eval_after": 0.81, + "steps": [ + ("eval", "Run eval", "Triggered evaluation of ckpt_v1.0 (post-occlusion patch) on LIBERO Kitchen benchmark.", 2200), + ("diagnose", "Diagnose failures", "Primary cluster shifted to Recovery / error correction (31% of failures). " + "Root cause: robot unable to self-correct after near-miss grasp failures.", 1700), + ("plan", "Generate plan", "Selected fine_tune strategy on teleop correction trajectories via alphabrain_finetune. " + "No prior recipe found; agent will experiment and record outcome.", 1500), + ("dispatch", "Dispatch training", "Fine-tune job dispatched. " + "Config: 150 teleop correction clips, 80% real / 20% synthetic. Estimated time: 30 min.", 2400), + ("verify", "Verify results", "Re-evaluation complete. Success rate improved from 74% to 81%. Uplift: +7pp.", 2100), + ("memory", "Save to memory", "New recipe recovery_finetune_v1 saved. " + "Confidence: medium (first use). Will be promoted on next successful application.", 1200), + ("stop_check", "Check stopping", "Target KPI 75% reached and exceeded (81%). " + "Continue one more loop to probe robustness under distribution shift.", 900), + ], + }, + { + "patch_cluster": "Dexterity drift / gripper instability", + "patch_strategy": "fine_tune", + "eval_before": 0.81, + "eval_after": 0.79, + "steps": [ + ("eval", "Run eval", "Triggered evaluation of ckpt_v1.1 (post-recovery patch) under harder randomized seeds.", 2500), + ("diagnose", "Diagnose failures", "Failure profile shifted to dexterity drift during re-grasp. 
" + "The cluster appears underrepresented in current patch data.", 1800), + ("plan", "Generate plan", "Tried an aggressive fine_tune sweep to recover edge cases quickly.", 1400), + ("dispatch", "Dispatch training", "Fine-tune sweep executed with high learning-rate variant. Estimated time: 18 min.", 2100), + ("verify", "Verify results", "Regression detected. Success rate fell from 81% to 79% (-2pp).", 2000), + ("memory", "Save to memory", "Stored failed recipe signature as anti-pattern: aggressive_finetune_drift_v1.", 1100), + ("stop_check", "Check stopping", "Rollback triggered. Regression exceeds tolerance (2pp > 1pp). " + "Revert to ckpt_v1.1 and launch safer corrective loop.", 950), + ], + }, + { + "patch_cluster": "Lighting / appearance shift", + "patch_strategy": "continual_learning", + "eval_before": 0.81, + "eval_after": 0.85, + "steps": [ + ("eval", "Run eval", "Evaluated rolled-back checkpoint ckpt_v1.1 as new baseline after rollback.", 1900), + ("diagnose", "Diagnose failures", "Residual failures cluster around lighting and appearance shift (18% of failures).", 1600), + ("plan", "Generate plan", "Selected safer continual_learning patch with lighting augmentation and lower update magnitude.", 1500), + ("dispatch", "Dispatch training", "CL job dispatched with 80 lighting-augmented patch episodes. Estimated time: 20 min.", 2100), + ("verify", "Verify results", "Re-evaluation complete. Success rate improved from 81% to 85%. Uplift: +4pp.", 2100), + ("memory", "Save to memory", "Saved rollback-aware recipe lighting_cl_patch_v2. Confidence: high after stable recovery.", 1200), + ("stop_check", "Check stopping", "Improvement delta +4pp is below the 5pp diminishing-returns threshold. " + "Target KPI already exceeded. 
Agent terminates; convergence reached at 85%.", 900), + ], + }, +] + + +def _build_demo_run(request: AgentRunRequest) -> AgentRunState: + """Construct a fully-precomputed AgentRunState in demo/simulate mode.""" + run_id = f"agent_{uuid4().hex[:10]}" + iterations: list[LoopIteration] = [] + + for idx, loop_def in enumerate(_DEMO_LOOPS[: request.max_iterations], start=1): + steps: list[AgentStep] = [] + for step_type, label, message, expected_duration_ms in loop_def["steps"]: + steps.append( + AgentStep( + step_id=_sid(), + step_type=step_type, + status="pending", + label=label, + message=message, + expected_duration_ms=expected_duration_ms, + ) + ) + eval_after = loop_def["eval_after"] + eval_before = loop_def["eval_before"] + iterations.append( + LoopIteration( + iteration_index=idx, + patch_strategy=loop_def["patch_strategy"], + patch_cluster=loop_def["patch_cluster"], + eval_before=eval_before, + eval_after=eval_after, + delta=(round(eval_after - eval_before, 3) if eval_after is not None else None), + steps=steps, + status="pending", + ) + ) + + # Mark the very first step of the first iteration as "running" so the + # UI immediately has something to show. + if iterations and iterations[0].steps: + iterations[0].status = "running" + iterations[0].steps[0].status = "running" + iterations[0].steps[0].started_at = _utc() + + state = AgentRunState( + agent_run_id=run_id, + project_id=request.project_id, + target_kpi=request.target_kpi, + max_iterations=request.max_iterations, + diminishing_returns_threshold=request.diminishing_returns_threshold, + current_iteration=1, + status="running", + iterations=iterations, + reasoning_log=[ + f"Agent started. Target KPI: {int(request.target_kpi * 100)}%. 
" + f"Max iterations: {request.max_iterations}.", + "Loading checkpoint ckpt_v0.7 from Platform Memory.", + ], + ) + SelfImprovementAgent._emit_event( + state, + "run_started", + label="Agent run started", + message=f"Autonomous run launched with target KPI {int(request.target_kpi * 100)}%.", + ) + if iterations: + first_loop = iterations[0] + SelfImprovementAgent._emit_event( + state, + "iteration_started", + iteration_index=first_loop.iteration_index, + label=f"Loop {first_loop.iteration_index} started", + message=f"Focus cluster: {first_loop.patch_cluster}.", + ) + if first_loop.steps: + first_step = first_loop.steps[0] + SelfImprovementAgent._emit_event( + state, + "step_started", + iteration_index=first_loop.iteration_index, + step_id=first_step.step_id, + step_type=first_step.step_type, + label=first_step.label, + message=first_step.message, + ) + + return state + + +class SelfImprovementAgent: + """ + Orchestrates the autonomous failure-to-fix loop. + + In **simulate** mode every call to ``advance_step`` moves the pre-baked + demo state forward by exactly one step, returning the updated + ``AgentRunState``. The React frontend polls this and re-renders. + + In **real** mode the agent drives actual M2 infrastructure. Each step is + blocking, so call ``run_async`` to start it in a background thread. 
+ """ + + def __init__( + self, + store: "InMemoryStore", + dispatcher: "JobDispatcher", + exporter: "EvalArtifactExporter", + patch_plan_generator: "PatchPlanGenerator", + narrator: "LLMNarrator | None" = None, + ) -> None: + self._store = store + self._dispatcher = dispatcher + self._exporter = exporter + self._planner = patch_plan_generator + self._narrator = narrator + self._locks: dict[str, threading.Lock] = {} + + # ------------------------------------------------------------------ + # Public API + # ------------------------------------------------------------------ + + def start(self, request: AgentRunRequest) -> AgentRunState: + """Create and persist a new agent run, then return initial state.""" + if request.simulate: + state = _build_demo_run(request) + else: + state = self._start_real(request) + + self._store.agent_runs[state.agent_run_id] = state + self._locks[state.agent_run_id] = threading.Lock() + return state + + def advance_step(self, agent_run_id: str) -> AgentRunState: + """ + Advance the agent by exactly one step (demo/simulate mode). + + Completes the current running step, then marks the next pending step + as running. When all steps in a loop are done, advances to the next + loop iteration. When all iterations are done (or a stop condition + fires), marks the run as completed/stopped. + """ + lock = self._locks.get(agent_run_id) + if lock is None: + # Reconstruct lock if it disappeared (e.g. 
server restart) + self._locks[agent_run_id] = threading.Lock() + lock = self._locks[agent_run_id] + + with lock: + state = self._store.agent_runs.get(agent_run_id) + if state is None: + raise KeyError(f"Agent run {agent_run_id!r} not found") + if state.status in ("completed", "stopped"): + return state + + state = self._advance_demo(state) + state.updated_at = _utc() + self._store.agent_runs[agent_run_id] = state + return state + + def get(self, agent_run_id: str) -> AgentRunState: + state = self._store.agent_runs.get(agent_run_id) + if state is None: + raise KeyError(f"Agent run {agent_run_id!r} not found") + return state + + # ------------------------------------------------------------------ + # Demo advance logic + # ------------------------------------------------------------------ + + @staticmethod + def _emit_event( + state: AgentRunState, + event_type: str, + *, + label: str, + message: str = "", + iteration_index: int | None = None, + step_id: str | None = None, + step_type: str | None = None, + duration_ms: int | None = None, + metadata: dict[str, Any] | None = None, + ) -> None: + state.events.append( + AgentEvent( + event_id=_sid(), + event_type=event_type, + iteration_index=iteration_index, + step_id=step_id, + step_type=step_type, + label=label, + message=message, + duration_ms=duration_ms, + metadata=metadata or {}, + ) + ) + + @staticmethod + def _advance_demo(state: AgentRunState) -> AgentRunState: + """Mutate-and-return: mark running→completed, activate next step.""" + now = _utc() + + for loop in state.iterations: + if loop.status not in ("running", "pending"): + continue + + if loop.status == "pending": + loop.status = "running" + + # Find the first running step and complete it + for i, step in enumerate(loop.steps): + if step.status != "running": + continue # skip completed/pending steps until we hit the running one + + step.status = "completed" + step.completed_at = now + duration_ms = step.expected_duration_ms + + # Narrate completion to 
reasoning log + state.reasoning_log.append( + f"[Loop {loop.iteration_index}] {step.label}: {step.message}" + ) + SelfImprovementAgent._emit_event( + state, + "step_completed", + iteration_index=loop.iteration_index, + step_id=step.step_id, + step_type=step.step_type, + label=step.label, + message=step.message, + duration_ms=duration_ms, + ) + + # Activate the next step in this loop if it exists + if i + 1 < len(loop.steps): + nxt = loop.steps[i + 1] + nxt.status = "running" + nxt.started_at = now + SelfImprovementAgent._emit_event( + state, + "step_started", + iteration_index=loop.iteration_index, + step_id=nxt.step_id, + step_type=nxt.step_type, + label=nxt.label, + message=nxt.message, + ) + return state + + # No more steps — this loop is done + loop.status = "completed" + if loop.eval_after is not None: + delta = loop.eval_after - loop.eval_before + loop.delta = round(delta, 3) + loop_text = ( + f"[Loop {loop.iteration_index}] Completed. " + f"{int(loop.eval_before * 100)}% -> {int(loop.eval_after * 100)}%. 
" + ) + SelfImprovementAgent._emit_event( + state, + "iteration_completed", + iteration_index=loop.iteration_index, + label=f"Loop {loop.iteration_index} completed", + message=f"Success {int(loop.eval_before * 100)}% -> {int(loop.eval_after * 100)}% (delta {int(delta * 100):+d}pp).", + metadata={ + "eval_before": loop.eval_before, + "eval_after": loop.eval_after, + "delta": loop.delta, + }, + ) + # Check stopping for the last (stop_check) step + last_step = loop.steps[-1] + if last_step.step_type == "stop_check": + lower_message = last_step.message.lower() + if "rollback" in lower_message or "revert" in lower_message: + loop.rolled_back = True + loop.rollback_reason = last_step.message + SelfImprovementAgent._emit_event( + state, + "rollback", + iteration_index=loop.iteration_index, + label=f"Rollback after loop {loop.iteration_index}", + message=last_step.message, + metadata={ + "rollback_from": loop.eval_after, + "rollback_to": loop.eval_before, + }, + ) + state.reasoning_log.append( + f"[Loop {loop.iteration_index}] Rollback executed to prior checkpoint " + f"after regression ({int(loop.eval_before * 100)}% -> {int(loop.eval_after * 100)}%)." 
+ ) + # Detect whether the stop message announces convergence + if "terminates" in lower_message or "converge" in lower_message: + state.status = "stopped" + state.stop_reason = last_step.message + state.reasoning_log.append(loop_text + "Agent stopped: diminishing returns / convergence.") + SelfImprovementAgent._emit_event( + state, + "run_stopped", + iteration_index=loop.iteration_index, + label="Agent stopped", + message=last_step.message, + ) + return state + state.reasoning_log.append(loop_text) + + # Advance to the next loop + state.current_iteration += 1 + next_loop_idx = loop.iteration_index # 0-based offset into list + if next_loop_idx < len(state.iterations): + next_loop = state.iterations[next_loop_idx] + next_loop.status = "running" + SelfImprovementAgent._emit_event( + state, + "iteration_started", + iteration_index=next_loop.iteration_index, + label=f"Loop {next_loop.iteration_index} started", + message=f"Focus cluster: {next_loop.patch_cluster}.", + ) + if next_loop.steps: + next_loop.steps[0].status = "running" + next_loop.steps[0].started_at = now + first_step = next_loop.steps[0] + SelfImprovementAgent._emit_event( + state, + "step_started", + iteration_index=next_loop.iteration_index, + step_id=first_step.step_id, + step_type=first_step.step_type, + label=first_step.label, + message=first_step.message, + ) + return state + + # All loops exhausted — mark completed + state.status = "completed" + state.stop_reason = "All planned iterations completed." 
+ state.reasoning_log.append("Agent run completed successfully.") + SelfImprovementAgent._emit_event( + state, + "run_completed", + label="Agent completed", + message=state.stop_reason, + ) + return state + + return state + + # ------------------------------------------------------------------ + # Real-mode skeleton (M3 → M4 extension point) + # ------------------------------------------------------------------ + + def _start_real(self, request: AgentRunRequest) -> AgentRunState: + """ + Bootstrap a real agent run. Full implementation is a M3→M4 extension. + Currently creates a single-iteration run that drives the M2 pipeline. + """ + run_id = f"agent_{uuid4().hex[:10]}" + steps = [ + AgentStep(step_id=_sid(), step_type="eval", status="pending", label="Run eval", message=""), + AgentStep(step_id=_sid(), step_type="diagnose", status="pending", label="Diagnose failures", message=""), + AgentStep(step_id=_sid(), step_type="plan", status="pending", label="Generate plan", message=""), + AgentStep(step_id=_sid(), step_type="dispatch", status="pending", label="Dispatch training", message=""), + AgentStep(step_id=_sid(), step_type="verify", status="pending", label="Verify results", message=""), + AgentStep(step_id=_sid(), step_type="memory", status="pending", label="Save to memory", message=""), + AgentStep(step_id=_sid(), step_type="stop_check", status="pending", label="Check stopping", message=""), + ] + loop = LoopIteration( + iteration_index=1, + patch_strategy="continual_learning", + patch_cluster="unknown", + eval_before=0.0, + steps=steps, + status="pending", + ) + state = AgentRunState( + agent_run_id=run_id, + project_id=request.project_id, + target_kpi=request.target_kpi, + max_iterations=request.max_iterations, + diminishing_returns_threshold=request.diminishing_returns_threshold, + current_iteration=1, + status="idle", + iterations=[loop], + ) + # Launch real execution in background + thread = threading.Thread( + target=self._run_real_loop, + args=(state, 
request), + daemon=True, + ) + thread.start() + return state + + def _run_real_loop(self, state: AgentRunState, request: AgentRunRequest) -> None: + """Background thread for real-mode execution. Partial implementation.""" + # Placeholder — real implementation in M4 + pass + + # ------------------------------------------------------------------ + # Tool implementations (callable from real-mode loop) + # ------------------------------------------------------------------ + + def tool_diagnose_failures(self, eval_run_id: str) -> FailureDiagnosis: + eval_run = self._store.eval_runs.get(eval_run_id) + if eval_run is None: + raise ValueError(f"EvalRun {eval_run_id!r} not found") + + clusters = sorted(eval_run.failure_clusters, key=lambda c: c.share_of_failures, reverse=True) + primary = clusters[0] if clusters else None + + _STRATEGY_MAP: dict[str, tuple[TrainingStrategy, ExecutionBackend]] = { + "occlusion": ("continual_learning", "alphabrain_cl"), + "recovery": ("fine_tune", "alphabrain_finetune"), + "language": ("vlm_cotrain", "alphabrain_vlm_cotrain"), + "lighting": ("continual_learning", "alphabrain_cl"), + "temporal": ("world_model_verification", "alphabrain_world_model"), + } + + strategy: TrainingStrategy = "continual_learning" + backend: ExecutionBackend = "alphabrain_cl" + if primary: + for keyword, (s, b) in _STRATEGY_MAP.items(): + if keyword in primary.failure_pattern.lower() or keyword in primary.label.lower(): + strategy, backend = s, b + break + + narrator = self._narrator + reasoning = ( + narrator.narrate_diagnosis(primary, eval_run) if narrator and primary + else ( + f"Primary failure cluster '{primary.label}' accounts for " + f"{int(primary.share_of_failures * 100)}% of failures. " + f"Recommended strategy: {strategy}." + if primary + else "No failure clusters detected." 
+ ) + ) + + return FailureDiagnosis( + primary_cluster_id=primary.cluster_id if primary else "none", + primary_cluster_label=primary.label if primary else "None", + root_causes=[c.failure_pattern for c in clusters[:3]], + recommended_strategy=strategy, + recommended_backend=backend, + reasoning=reasoning, + confidence=primary.share_of_failures if primary else 0.0, + ) diff --git a/nvex_server/app.py b/nvex_server/app.py new file mode 100644 index 0000000..a470d10 --- /dev/null +++ b/nvex_server/app.py @@ -0,0 +1,348 @@ +from __future__ import annotations + +from dataclasses import dataclass, field +from datetime import datetime, timezone +from pathlib import Path +from uuid import uuid4 + +from fastapi import FastAPI, HTTPException +from fastapi.middleware.cors import CORSMiddleware + +from .agent import SelfImprovementAgent +from .dispatcher import JobDispatcher +from .exporters import EvalArtifactExporter +from .llm_narrator import LLMNarrator +from .patch_plan_generator import PatchPlanGenerator +from .schemas import ( + AgentRunRequest, + AgentRunState, + DemoStateResponse, + EvalImportRequest, + EvalRun, + ImprovementReport, + IterationJob, + IterationStartRequest, + PatchPlan, + PlanGenerationRequest, + PlatformMemorySnapshot, + PlatformMemoryStats, + ProjectContext, + ReusableAsset, +) + + +def utc_now() -> datetime: + return datetime.now(timezone.utc) + + +@dataclass +class InMemoryStore: + eval_runs: dict[str, EvalRun] = field(default_factory=dict) + patch_plans: dict[str, PatchPlan] = field(default_factory=dict) + iteration_jobs: dict[str, IterationJob] = field(default_factory=dict) + reports: dict[str, ImprovementReport] = field(default_factory=dict) + assets: dict[str, ReusableAsset] = field(default_factory=dict) + agent_runs: dict[str, AgentRunState] = field(default_factory=dict) + demo_eval_run_id: str | None = None + demo_patch_plan_id: str | None = None + demo_iteration_id: str | None = None + demo_agent_run_id: str | None = None + + +def 
create_app() -> FastAPI: + app = FastAPI(title="Nvex Server", version="0.1.0") + app.add_middleware( + CORSMiddleware, + allow_origins=["http://localhost:5173", "http://127.0.0.1:5173", "*"], + allow_methods=["*"], + allow_headers=["*"], + ) + store = InMemoryStore() + repo_root = Path(__file__).resolve().parents[1] + exporter = EvalArtifactExporter() + patch_plan_generator = PatchPlanGenerator() + dispatcher = JobDispatcher(repo_root=repo_root, jobs_root=repo_root / "results" / "nvex_jobs") + narrator = LLMNarrator() + agent = SelfImprovementAgent(store, dispatcher, exporter, patch_plan_generator, narrator) + + def build_report( + iteration_id: str, + patch_plan: PatchPlan, + before_eval: EvalRun, + after_eval: EvalRun | None, + ) -> ImprovementReport: + success_before = before_eval.overall_success + success_after = after_eval.overall_success if after_eval is not None else min( + 1.0, round(success_before + patch_plan.expected_uplift, 3) + ) + assets = [ + ReusableAsset( + asset_id=f"asset_{uuid4().hex[:10]}", + type="recipe", + name=f"{patch_plan.annotation_schema}_{patch_plan.training_strategy}", + source_project=patch_plan.project_id, + reuse_count=0, + linked_iteration=iteration_id, + description="Auto-generated reusable recipe from the current patch plan.", + ) + ] + report = ImprovementReport( + iteration_id=iteration_id, + plan_id=patch_plan.plan_id, + project_id=patch_plan.project_id, + success_before=success_before, + success_after=success_after, + uplift=round(max(0.0, success_after - success_before), 3), + summary=( + f"Applied {patch_plan.training_strategy} via {patch_plan.execution_backend} and evaluated " + f"against {patch_plan.verification_spec}." 
+ ), + changes=[ + f"Targeted {patch_plan.target_data_spec.patch_episodes} patch episodes.", + f"Added {patch_plan.target_data_spec.teleop_corrections} teleop correction trajectories.", + f"Verification spec: {patch_plan.verification_spec}.", + ], + next_target="Temporal planning under language variation", + assets_created=assets, + ) + for asset in assets: + store.assets[asset.asset_id] = asset + return report + + def platform_memory() -> PlatformMemorySnapshot: + recipe_names = sorted({asset.name for asset in store.assets.values() if asset.type == "recipe"}) + failure_names = sorted( + { + cluster.failure_pattern + for eval_run in store.eval_runs.values() + for cluster in eval_run.failure_clusters + } + ) + templates = sorted({patch_plan.annotation_schema for patch_plan in store.patch_plans.values()}) + return PlatformMemorySnapshot( + recipes=recipe_names, + templates=templates, + failures=failure_names, + stats=PlatformMemoryStats( + recipes=len(recipe_names), + templates=len(templates), + patterns=len(failure_names), + projects=len({eval_run.project_id for eval_run in store.eval_runs.values()}), + ), + ) + + def project_context(eval_run: EvalRun, patch_plan: PatchPlan) -> ProjectContext: + top_cluster = max(eval_run.failure_clusters, key=lambda item: item.share_of_failures) if eval_run.failure_clusters else None + return ProjectContext( + name="LIBERO Kitchen Pick-and-Place", + checkpoint=eval_run.checkpoint or "ckpt_v0.7", + domain="tabletop manipulation", + suite=eval_run.benchmark_suite, + status="Underperforming" if eval_run.overall_success < 0.7 else "Stable", + status_note=top_cluster.label if top_cluster else "No dominant failure cluster", + top_risk=top_cluster.failure_pattern if top_cluster else "generalization", + next_action=f"Run {patch_plan.training_strategy} patch via {patch_plan.execution_backend}", + ) + + def save_eval_run(eval_run: EvalRun) -> EvalRun: + store.eval_runs[eval_run.run_id] = eval_run + return eval_run + + def 
import_eval_from_request(request: EvalImportRequest) -> EvalRun: + if request.eval_run is not None: + return save_eval_run(request.eval_run) + + eval_run = exporter.export( + request.artifact_path or "", + request.artifact_type, + project_id=request.project_id, + benchmark_suite=request.benchmark_suite, + checkpoint=request.checkpoint, + run_id=request.run_id, + ) + return save_eval_run(eval_run) + + def generate_plan_from_request(request: PlanGenerationRequest) -> PatchPlan: + eval_run = request.eval_run + if eval_run is None: + eval_run = store.eval_runs.get(request.eval_run_id or "") + + if eval_run is None: + raise HTTPException(status_code=404, detail="EvalRun not found") + + save_eval_run(eval_run) + patch_plan = patch_plan_generator.generate(eval_run) + store.patch_plans[patch_plan.plan_id] = patch_plan + return patch_plan + + def start_iteration_internal(request: IterationStartRequest) -> IterationJob: + patch_plan = store.patch_plans.get(request.plan_id) + if patch_plan is None: + raise HTTPException(status_code=404, detail="PatchPlan not found") + + eval_run = store.eval_runs.get(patch_plan.based_on_eval_run) + success_before = eval_run.overall_success if eval_run is not None else max(0.0, 1.0 - patch_plan.expected_uplift) + job = dispatcher.start(patch_plan, request, before_success=success_before) + + after_path = request.config.get("after_eval_artifact_path") + if after_path: + after_eval_run = exporter.export( + after_path, + request.config.get("after_eval_artifact_type", "auto"), + project_id=patch_plan.project_id, + benchmark_suite=eval_run.benchmark_suite if eval_run is not None else None, + checkpoint=job.output_checkpoint, + run_id=f"{job.iteration_id}_after", + ) + save_eval_run(after_eval_run) + job.after_eval_run_id = after_eval_run.run_id + job.result_summary.success_after = after_eval_run.overall_success + job.artifacts.eval_runs.append(after_eval_run.run_id) + report = build_report(job.iteration_id, patch_plan, eval_run or EvalRun( + 
run_id=f"{job.iteration_id}_before", + project_id=patch_plan.project_id, + benchmark_suite="LIBERO_goal", + overall_success=success_before, + ), store.eval_runs.get(job.after_eval_run_id) if job.after_eval_run_id else None) + + store.iteration_jobs[job.iteration_id] = job + store.reports[job.iteration_id] = report + return job + + def seed_demo_state() -> None: + before_path = repo_root / "nvex_server" / "examples" / "libero_kitchen_before_eval.json" + after_path = repo_root / "nvex_server" / "examples" / "libero_kitchen_after_eval.json" + before_eval = save_eval_run( + exporter.export( + str(before_path), + "libero_eval_json", + project_id="proj_libero_kitchen", + benchmark_suite="LIBERO_goal", + checkpoint="ckpt_v0.7", + run_id="eval_libero_before", + ) + ) + patch_plan = patch_plan_generator.generate(before_eval) + store.patch_plans[patch_plan.plan_id] = patch_plan + iteration = start_iteration_internal( + IterationStartRequest( + plan_id=patch_plan.plan_id, + checkpoint=before_eval.checkpoint or "ckpt_v0.7", + execution_backend=patch_plan.execution_backend, + config={ + "simulate": True, + "after_eval_artifact_path": str(after_path), + "after_eval_artifact_type": "libero_eval_json", + }, + ) + ) + store.demo_eval_run_id = before_eval.run_id + store.demo_patch_plan_id = patch_plan.plan_id + store.demo_iteration_id = iteration.iteration_id + + # Seed a demo agent run (Mode A — precomputed 3-loop replay) + demo_agent_run = agent.start( + AgentRunRequest( + project_id="proj_libero_kitchen", + checkpoint="ckpt_v0.7", + target_kpi=0.75, + max_iterations=4, + diminishing_returns_threshold=0.05, + simulate=True, + ) + ) + store.demo_agent_run_id = demo_agent_run.agent_run_id + + seed_demo_state() + + @app.get("/health") + def health() -> dict[str, str]: + return {"status": "ok"} + + @app.post("/api/eval/import", response_model=EvalRun) + def import_eval_run(request: EvalImportRequest) -> EvalRun: + return import_eval_from_request(request) + + 
@app.post("/api/plan/generate", response_model=PatchPlan) + def generate_patch_plan(request: PlanGenerationRequest) -> PatchPlan: + return generate_plan_from_request(request) + + @app.post("/api/iteration/start", response_model=IterationJob) + def start_iteration(request: IterationStartRequest) -> IterationJob: + return start_iteration_internal(request) + + @app.get("/api/iteration/{iteration_id}/status", response_model=IterationJob) + def get_iteration_status(iteration_id: str) -> IterationJob: + job = store.iteration_jobs.get(iteration_id) + if job is None: + raise HTTPException(status_code=404, detail="IterationJob not found") + + job = dispatcher.refresh(job) + store.iteration_jobs[iteration_id] = job + return job + + @app.get("/api/report/{iteration_id}", response_model=ImprovementReport) + def get_report(iteration_id: str) -> ImprovementReport: + report = store.reports.get(iteration_id) + if report is None: + raise HTTPException(status_code=404, detail="ImprovementReport not found") + return report + + @app.get("/api/demo/state", response_model=DemoStateResponse) + def get_demo_state() -> DemoStateResponse: + if not store.demo_eval_run_id or not store.demo_patch_plan_id or not store.demo_iteration_id: + raise HTTPException(status_code=500, detail="Demo state not initialized") + + eval_run = store.eval_runs[store.demo_eval_run_id] + patch_plan = store.patch_plans[store.demo_patch_plan_id] + iteration = store.iteration_jobs[store.demo_iteration_id] + report = store.reports[store.demo_iteration_id] + + return DemoStateResponse( + project=project_context(eval_run, patch_plan), + current_eval_run=eval_run, + patch_plan=patch_plan, + iteration_job=iteration, + report=report, + platform_memory=platform_memory(), + ) + + # ------------------------------------------------------------------ + # Milestone 3 — Self-Improving Agent endpoints + # ------------------------------------------------------------------ + + @app.post("/api/agent/run", response_model=AgentRunState) 
+ def start_agent_run(request: AgentRunRequest) -> AgentRunState: + """Launch a new autonomous improvement run.""" + return agent.start(request) + + @app.get("/api/agent/{agent_run_id}/status", response_model=AgentRunState) + def get_agent_status(agent_run_id: str) -> AgentRunState: + """Poll the current state of an agent run.""" + try: + return agent.get(agent_run_id) + except KeyError: + raise HTTPException(status_code=404, detail="AgentRun not found") + + @app.post("/api/agent/{agent_run_id}/advance", response_model=AgentRunState) + def advance_agent_step(agent_run_id: str) -> AgentRunState: + """ + Advance the agent by one step (demo/simulate mode). + The React UI calls this to animate each reasoning step. + """ + try: + return agent.advance_step(agent_run_id) + except KeyError: + raise HTTPException(status_code=404, detail="AgentRun not found") + + @app.get("/api/demo/agent", response_model=AgentRunState) + def get_demo_agent() -> AgentRunState: + """Return the pre-seeded demo agent run state.""" + if not store.demo_agent_run_id: + raise HTTPException(status_code=500, detail="Demo agent run not initialized") + return agent.get(store.demo_agent_run_id) + + return app + + +app = create_app() \ No newline at end of file diff --git a/nvex_server/dispatcher.py b/nvex_server/dispatcher.py new file mode 100644 index 0000000..1eb085c --- /dev/null +++ b/nvex_server/dispatcher.py @@ -0,0 +1,188 @@ +from __future__ import annotations + +import json +import os +import subprocess +from dataclasses import dataclass +from datetime import datetime, timezone +from pathlib import Path +from uuid import uuid4 + +from .schemas import IterationArtifacts, IterationJob, IterationResultSummary, IterationStartRequest, PatchPlan + + +def utc_now() -> datetime: + return datetime.now(timezone.utc) + + +@dataclass +class JobPaths: + job_dir: Path + log_path: Path + metadata_path: Path + exit_code_path: Path + + +class JobDispatcher: + def __init__(self, repo_root: Path, jobs_root: Path) 
-> None: + self.repo_root = repo_root + self.jobs_root = jobs_root + self.jobs_root.mkdir(parents=True, exist_ok=True) + + def start(self, patch_plan: PatchPlan, request: IterationStartRequest, before_success: float) -> IterationJob: + iteration_id = f"iter_{uuid4().hex[:10]}" + output_checkpoint = request.config.get("output_checkpoint") or f"{request.checkpoint}_patched" + paths = self._paths(iteration_id) + paths.job_dir.mkdir(parents=True, exist_ok=True) + + command = self._build_command(patch_plan, request) + simulate = bool(request.config.get("simulate", False) or command is None) + success_after = min(1.0, round(before_success + patch_plan.expected_uplift, 3)) + status = "completed" if simulate else "running" + job = IterationJob( + iteration_id=iteration_id, + project_id=patch_plan.project_id, + plan_id=patch_plan.plan_id, + based_on_checkpoint=request.checkpoint, + status=status, + execution_backend=request.execution_backend or patch_plan.execution_backend, + config=request.config, + command=command, + log_path=str(paths.log_path), + output_checkpoint=output_checkpoint, + result_summary=IterationResultSummary(success_before=before_success, success_after=success_after), + artifacts=IterationArtifacts( + logs=[str(paths.log_path)], + videos=[], + eval_runs=[patch_plan.based_on_eval_run], + metadata_path=str(paths.metadata_path), + ), + created_at=utc_now(), + updated_at=utc_now(), + ) + + if simulate: + job.exit_code = 0 + paths.log_path.write_text( + "Simulated iteration job. 
Provide config.simulate=false plus backend-specific config to run real commands.\n", + encoding="utf-8", + ) + self._write_metadata(job, paths) + paths.exit_code_path.write_text("0", encoding="utf-8") + return job + + wrapped_command = ( + f"{{ {command}; }} > {self._quote(paths.log_path)} 2>&1; " + f"status=$?; echo $status > {self._quote(paths.exit_code_path)}; exit 0" + ) + process = subprocess.Popen( + ["bash", "-lc", wrapped_command], + cwd=self.repo_root, + start_new_session=True, + ) + job.pid = process.pid + self._write_metadata(job, paths) + return job + + def refresh(self, job: IterationJob) -> IterationJob: + paths = self._paths(job.iteration_id) + exit_code = self._read_exit_code(paths) + is_running = job.pid is not None and self._is_pid_running(job.pid) + + if exit_code is None and is_running: + job.status = "running" + elif exit_code is None and job.status == "running": + job.status = "failed" + job.exit_code = -1 + elif exit_code == 0: + job.status = "completed" + job.exit_code = 0 + elif exit_code is not None: + job.status = "failed" + job.exit_code = exit_code + + job.updated_at = utc_now() + self._write_metadata(job, paths) + return job + + def _build_command(self, patch_plan: PatchPlan, request: IterationStartRequest) -> str | None: + config = request.config + if "command" in config: + return str(config["command"]) + + backend = request.execution_backend or patch_plan.execution_backend + if backend == "alphabrain_cl": + yaml = config.get("yaml") or "configs/continual_learning/qwengr00t_cl_lora_libero.yaml" + parts = ["bash scripts/run_continual_learning_scripts/run_cl_train.sh", f"--yaml {self._quote(yaml)}"] + if run_id := config.get("run_id"): + parts.append(f"--run-id {self._quote(run_id)}") + if gpus := config.get("gpus"): + parts.append(f"--gpus {self._quote(str(gpus))}") + return " ".join(parts) + + if backend in {"alphabrain_finetune", "alphabrain_vlm_cotrain"}: + mode = config.get("mode") + if not mode: + return None + config_file = 
config.get("config_file") + return " ".join( + part + for part in [ + "bash scripts/run_finetune.sh", + self._quote(mode), + self._quote(config_file) if config_file else None, + ] + if part + ) + + if backend == "alphabrain_eval": + mode = config.get("mode") + if not mode: + return None + config_file = config.get("config_file") + return " ".join( + part + for part in [ + "bash scripts/run_eval.sh", + self._quote(mode), + self._quote(config_file) if config_file else None, + ] + if part + ) + + if backend == "alphabrain_world_model": + command = config.get("world_model_command") + return str(command) if command else None + + return None + + def _paths(self, iteration_id: str) -> JobPaths: + job_dir = self.jobs_root / iteration_id + return JobPaths( + job_dir=job_dir, + log_path=job_dir / "command.log", + metadata_path=job_dir / "job.json", + exit_code_path=job_dir / "exit_code.txt", + ) + + def _write_metadata(self, job: IterationJob, paths: JobPaths) -> None: + paths.metadata_path.write_text(job.model_dump_json(indent=2), encoding="utf-8") + + def _read_exit_code(self, paths: JobPaths) -> int | None: + if not paths.exit_code_path.exists(): + return None + try: + return int(paths.exit_code_path.read_text(encoding="utf-8").strip()) + except ValueError: + return -1 + + def _is_pid_running(self, pid: int) -> bool: + try: + os.kill(pid, 0) + except OSError: + return False + return True + + def _quote(self, value: str | Path) -> str: + import shlex  # local import: the command string is run via `bash -lc`, so it needs POSIX shell quoting (list2cmdline applies Windows rules) + return shlex.quote(str(value)) \ No newline at end of file diff --git a/nvex_server/examples/libero_kitchen_after_eval.json b/nvex_server/examples/libero_kitchen_after_eval.json new file mode 100644 index 0000000..1c9e119 --- /dev/null +++ b/nvex_server/examples/libero_kitchen_after_eval.json @@ -0,0 +1,57 @@ +{ + "run_id": "eval_libero_after", + "project_id": "proj_libero_kitchen", + "benchmark_suite": "LIBERO_goal", + "checkpoint": "ckpt_v0.8", + "success_rate": 0.74, + "task_breakdown": [ + { + "task_id":
"libero_spatial", + "task_name": "libero_spatial", + "attempts": 10, + "successes": 7, + "success_rate": 0.7 + }, + { + "task_id": "libero_object", + "task_name": "libero_object", + "attempts": 10, + "successes": 8, + "success_rate": 0.8 + }, + { + "task_id": "libero_goal", + "task_name": "libero_goal", + "attempts": 10, + "successes": 7, + "success_rate": 0.7 + }, + { + "task_id": "libero_10", + "task_name": "libero_10", + "attempts": 10, + "successes": 8, + "success_rate": 0.8 + } + ], + "failure_clusters": [ + { + "cluster_id": "A", + "label": "Residual occlusion", + "failure_pattern": "occlusion", + "affected_tasks": ["libero_spatial"], + "share_of_failures": 0.22, + "failure_count": 5, + "severity": "medium" + }, + { + "cluster_id": "B", + "label": "Long-horizon edge cases", + "failure_pattern": "long-horizon", + "affected_tasks": ["libero_10"], + "share_of_failures": 0.12, + "failure_count": 3, + "severity": "low" + } + ] +} \ No newline at end of file diff --git a/nvex_server/examples/libero_kitchen_before_eval.json b/nvex_server/examples/libero_kitchen_before_eval.json new file mode 100644 index 0000000..a210a23 --- /dev/null +++ b/nvex_server/examples/libero_kitchen_before_eval.json @@ -0,0 +1,66 @@ +{ + "run_id": "eval_libero_before", + "project_id": "proj_libero_kitchen", + "benchmark_suite": "LIBERO_goal", + "checkpoint": "ckpt_v0.7", + "success_rate": 0.62, + "task_breakdown": [ + { + "task_id": "libero_spatial", + "task_name": "libero_spatial", + "attempts": 10, + "successes": 5, + "success_rate": 0.5 + }, + { + "task_id": "libero_object", + "task_name": "libero_object", + "attempts": 10, + "successes": 7, + "success_rate": 0.7 + }, + { + "task_id": "libero_goal", + "task_name": "libero_goal", + "attempts": 10, + "successes": 6, + "success_rate": 0.6 + }, + { + "task_id": "libero_10", + "task_name": "libero_10", + "attempts": 10, + "successes": 7, + "success_rate": 0.7 + } + ], + "failure_clusters": [ + { + "cluster_id": "A", + "label": "Visual 
occlusion", + "failure_pattern": "occlusion", + "affected_tasks": ["libero_spatial", "libero_goal"], + "share_of_failures": 0.38, + "failure_count": 15, + "severity": "critical" + }, + { + "cluster_id": "B", + "label": "Recovery missing", + "failure_pattern": "recovery", + "affected_tasks": ["libero_goal"], + "share_of_failures": 0.24, + "failure_count": 10, + "severity": "high" + }, + { + "cluster_id": "C", + "label": "Generalization drift", + "failure_pattern": "generalization", + "affected_tasks": ["libero_spatial", "libero_10"], + "share_of_failures": 0.18, + "failure_count": 7, + "severity": "medium" + } + ] +} \ No newline at end of file diff --git a/nvex_server/exporters.py b/nvex_server/exporters.py new file mode 100644 index 0000000..f4dad1e --- /dev/null +++ b/nvex_server/exporters.py @@ -0,0 +1,225 @@ +from __future__ import annotations + +import json +import re +from pathlib import Path +from uuid import uuid4 + +from .schemas import ArtifactBundle, ArtifactType, EvalRun, FailureCluster, TaskBreakdownEntry + +ANSI_ESCAPE_RE = re.compile(r"\x1b\[[0-9;]*m") + + +class EvalArtifactExporter: + def export( + self, + artifact_path: str, + artifact_type: ArtifactType = "auto", + *, + project_id: str | None = None, + benchmark_suite: str | None = None, + checkpoint: str | None = None, + run_id: str | None = None, + ) -> EvalRun: + path = Path(artifact_path) + resolved_type = self._detect_type(path, artifact_type) + parser = { + "generic_json": self._from_generic_json, + "libero_eval_json": self._from_libero_eval_json, + "robocasa365_aggregate": self._from_robocasa365_aggregate, + "robocasa_tabletop_stats": self._from_robocasa_tabletop_stats, + "libero_log": self._from_libero_log, + }[resolved_type] + + eval_run = parser(path, project_id=project_id, benchmark_suite=benchmark_suite, checkpoint=checkpoint, run_id=run_id) + eval_run.artifacts.source_path = str(path) + if eval_run.artifacts.metrics_json is None and path.suffix == ".json": + 
eval_run.artifacts.metrics_json = str(path) + return eval_run + + def _detect_type(self, path: Path, artifact_type: ArtifactType) -> ArtifactType: + if artifact_type != "auto": + return artifact_type + + if path.name == "aggregate_stats.json": + return "robocasa365_aggregate" + if path.name == "eval_results.json": + return "libero_eval_json" + if path.suffix == ".log": + return "libero_log" + return "generic_json" + + def _from_generic_json( + self, + path: Path, + *, + project_id: str | None, + benchmark_suite: str | None, + checkpoint: str | None, + run_id: str | None, + ) -> EvalRun: + with path.open("r", encoding="utf-8") as handle: + payload = json.load(handle) + + return self._build_eval_run( + payload, + project_id=project_id, + benchmark_suite=benchmark_suite, + checkpoint=checkpoint, + run_id=run_id, + path=path, + ) + + def _from_libero_eval_json(self, path: Path, **kwargs: str | None) -> EvalRun: + with path.open("r", encoding="utf-8") as handle: + payload = json.load(handle) + payload.setdefault("benchmark_suite", payload.get("task_suite", kwargs.get("benchmark_suite") or "LIBERO_goal")) + return self._build_eval_run(payload, path=path, **kwargs) + + def _from_robocasa365_aggregate(self, path: Path, **kwargs: str | None) -> EvalRun: + with path.open("r", encoding="utf-8") as handle: + payload = json.load(handle) + + task_breakdown = [] + for task_name, task_data in payload.get("tasks", {}).items(): + attempts = int(task_data.get("num_episodes", 0)) + success_rate = float(task_data.get("success_rate", 0.0)) + task_breakdown.append( + { + "task_id": task_name, + "task_name": task_name, + "attempts": attempts, + "successes": round(attempts * success_rate), + "success_rate": success_rate, + } + ) + + generic_payload = { + "run_id": kwargs.get("run_id") or f"eval_{uuid4().hex[:10]}", + "project_id": kwargs.get("project_id") or "proj_robocasa365", + "benchmark_suite": kwargs.get("benchmark_suite") or "robocasa365", + "checkpoint": kwargs.get("checkpoint"), + 
"success_rate": float(payload.get("mean_success_rate", 0.0)), + "task_breakdown": task_breakdown, + } + return self._build_eval_run(generic_payload, path=path, **kwargs) + + def _from_robocasa_tabletop_stats(self, path: Path, **kwargs: str | None) -> EvalRun: + with path.open("r", encoding="utf-8") as handle: + payload = json.load(handle) + + generic_payload = { + "run_id": kwargs.get("run_id") or f"eval_{uuid4().hex[:10]}", + "project_id": kwargs.get("project_id") or "proj_robocasa_tabletop", + "benchmark_suite": kwargs.get("benchmark_suite") or "robocasa_tabletop", + "checkpoint": kwargs.get("checkpoint"), + "success_rate": float(payload.get("success_rate", 0.0)), + "task_breakdown": [ + { + "task_id": path.parent.name, + "task_name": path.parent.name, + "attempts": int(payload.get("num_episodes", 0)), + "successes": round(int(payload.get("num_episodes", 0)) * float(payload.get("success_rate", 0.0))), + "success_rate": float(payload.get("success_rate", 0.0)), + } + ], + } + return self._build_eval_run(generic_payload, path=path, **kwargs) + + def _from_libero_log(self, path: Path, **kwargs: str | None) -> EvalRun: + text = ANSI_ESCAPE_RE.sub("", path.read_text(encoding="utf-8")) + task_rates = [float(match.group(1)) for match in re.finditer(r"Current task success rate:\s*([0-9.]+)", text)] + total_match = re.search(r"Total success rate:\s*([0-9.]+)", text) + overall_success = float(total_match.group(1)) if total_match else 0.0 + task_breakdown = [ + { + "task_id": f"task_{index + 1}", + "task_name": f"Task {index + 1}", + "success_rate": task_rate, + } + for index, task_rate in enumerate(task_rates) + ] + generic_payload = { + "run_id": kwargs.get("run_id") or f"eval_{uuid4().hex[:10]}", + "project_id": kwargs.get("project_id") or "proj_libero", + "benchmark_suite": kwargs.get("benchmark_suite") or "LIBERO_goal", + "checkpoint": kwargs.get("checkpoint"), + "success_rate": overall_success, + "task_breakdown": task_breakdown, + } + return 
self._build_eval_run(generic_payload, path=path, **kwargs) + + def _build_eval_run( + self, + payload: dict, + *, + path: Path, + project_id: str | None, + benchmark_suite: str | None, + checkpoint: str | None, + run_id: str | None, + ) -> EvalRun: + task_breakdown_payload = payload.get("task_breakdown") or [] + task_breakdown = [ + TaskBreakdownEntry( + task_id=item.get("task_id") or item.get("name") or f"task_{index + 1}", + task_name=item.get("task_name") or item.get("name") or item.get("task_id") or f"Task {index + 1}", + success_rate=float(item.get("success_rate", item.get("pct", 0.0))), + attempts=item.get("attempts") or item.get("num_episodes"), + successes=item.get("successes") or item.get("success_count"), + ) + for index, item in enumerate(task_breakdown_payload) + ] + failure_clusters_payload = payload.get("failure_clusters") or self._infer_failure_clusters(task_breakdown) + failure_clusters = [ + FailureCluster( + cluster_id=item.get("cluster_id") or item.get("id") or chr(65 + index), + label=item.get("label") or item.get("task_name") or f"Failure cluster {index + 1}", + failure_pattern=item.get("failure_pattern") or item.get("label") or "task_reliability_gap", + affected_tasks=item.get("affected_tasks") or [], + share_of_failures=float(item.get("share_of_failures", item.get("pct", 0.0))), + failure_count=int(item.get("failure_count", item.get("count", 0))), + severity=item.get("severity") or item.get("sev") or "medium", + ) + for index, item in enumerate(failure_clusters_payload) + ] + overall_success = float(payload.get("overall_success", payload.get("success_rate", 0.0))) + + return EvalRun( + run_id=run_id or payload.get("run_id") or f"eval_{uuid4().hex[:10]}", + project_id=project_id or payload.get("project_id") or "proj_001", + benchmark_suite=benchmark_suite or payload.get("benchmark_suite") or "LIBERO_goal", + checkpoint=checkpoint or payload.get("checkpoint"), + overall_success=overall_success, + task_breakdown=task_breakdown, + 
failure_clusters=failure_clusters, + artifacts=ArtifactBundle( + videos=payload.get("videos", []), + logs=payload.get("logs", []), + metrics_json=str(path) if path.suffix == ".json" else None, + source_path=str(path), + ), + ) + + def _infer_failure_clusters(self, task_breakdown: list[TaskBreakdownEntry]) -> list[dict[str, object]]: + if not task_breakdown: + return [] + + sorted_tasks = sorted(task_breakdown, key=lambda task: task.success_rate) + weakest = sorted_tasks[: min(3, len(sorted_tasks))] + total_failure_mass = sum(max(0.0, 1.0 - task.success_rate) for task in weakest) or 1.0 + severity_by_rank = ["critical", "high", "medium"] + fallback_patterns = ["occlusion", "recovery", "generalization"] + + return [ + { + "cluster_id": chr(65 + index), + "label": f"{task.task_name} underperformance", + "failure_pattern": fallback_patterns[index] if index < len(fallback_patterns) else "task_reliability_gap", + "affected_tasks": [task.task_name], + "share_of_failures": round(max(0.0, 1.0 - task.success_rate) / total_failure_mass, 3), + "failure_count": max(1, task.attempts - (task.successes or 0)) if task.attempts is not None else 1, + "severity": severity_by_rank[index] if index < len(severity_by_rank) else "medium", + } + for index, task in enumerate(weakest) + ] \ No newline at end of file diff --git a/nvex_server/llm_narrator.py b/nvex_server/llm_narrator.py new file mode 100644 index 0000000..78dd2d7 --- /dev/null +++ b/nvex_server/llm_narrator.py @@ -0,0 +1,187 @@ +""" +LLM Narrator — Milestone 3D +============================ +Generates natural-language explanations for agent steps. + +- If ``OPENAI_API_KEY`` is set in the environment, uses the OpenAI chat + completions API (gpt-4o-mini by default — fast and cheap). +- Otherwise falls back to deterministic template strings so the demo works + with no external dependencies. 
+""" + +from __future__ import annotations + +import os +from typing import TYPE_CHECKING, Any + +if TYPE_CHECKING: + from .schemas import EvalRun, FailureCluster + + +_SYSTEM_PROMPT = """\ +You are Nvex, an AI robot-policy improvement platform. +Explain each step of an autonomous improvement loop in 2 sentences max. +Be concrete: mention cluster names, percentages, and strategy names. +Write in present tense and first-person plural ("We identified…"). +""" + +_TEMPLATES = { + "eval": "Triggered evaluation of {checkpoint} on {benchmark}. Waiting for results.", + "diagnose": ( + "Identified dominant failure cluster: {label} ({pct}% of failures). " + "Root cause: {failure_pattern}." + ), + "plan": ( + "Selected {strategy} strategy via {backend}. " + "Targeting {episodes} patch episodes at {ratio}% real data." + ), + "dispatch": ( + "Training job dispatched to AlphaBrain. " + "Config: {episodes} patch episodes, {ratio}% real / {synth}% synthetic." + ), + "verify": "Re-evaluation complete. Success rate: {before}% → {after}%. Uplift: +{delta}pp.", + "memory": "Saved recipe {name} to Platform Memory. Confidence: {conf}.", + "stop_check": "Improvement delta +{delta}pp vs. threshold {threshold}pp. {decision}.", +} + + +class LLMNarrator: + """ + Narrates agent reasoning steps in plain English. + + If ``openai`` is importable and ``OPENAI_API_KEY`` is set, each + ``narrate_*`` call makes a real LLM request. Otherwise templates are + used silently — no exception is raised. 
+ """ + + def __init__(self, model: str = "gpt-4o-mini") -> None: + self._model = model + self._client: Any = None + self._init_openai() + + def _init_openai(self) -> None: + if not os.getenv("OPENAI_API_KEY"): + return + try: + import openai # type: ignore[import-untyped] + + self._client = openai.OpenAI() + except ImportError: + pass + + def _llm(self, user_prompt: str) -> str: + if self._client is None: + return "" + try: + response = self._client.chat.completions.create( + model=self._model, + messages=[ + {"role": "system", "content": _SYSTEM_PROMPT}, + {"role": "user", "content": user_prompt}, + ], + max_tokens=120, + temperature=0.3, + ) + return response.choices[0].message.content.strip() + except Exception: # noqa: BLE001 + return "" + + # ------------------------------------------------------------------ + # Public narration methods + # ------------------------------------------------------------------ + + def narrate_diagnosis(self, cluster: "FailureCluster", eval_run: "EvalRun") -> str: + tmpl = _TEMPLATES["diagnose"].format( + label=cluster.label, + pct=int(cluster.share_of_failures * 100), + failure_pattern=cluster.failure_pattern, + ) + if self._client is None: + return tmpl + prompt = ( + f"The dominant failure cluster is '{cluster.label}' " + f"({int(cluster.share_of_failures * 100)}% of failures). " + f"Failure pattern: {cluster.failure_pattern}. " + f"Affected tasks: {', '.join(cluster.affected_tasks[:3])}. " + "Explain what this means and what to do about it." + ) + llm_text = self._llm(prompt) + return llm_text or tmpl + + def narrate_plan( + self, + strategy: str, + backend: str, + episodes: int, + real_ratio: float, + memory_recipe: str | None, + ) -> str: + tmpl = _TEMPLATES["plan"].format( + strategy=strategy, + backend=backend, + episodes=episodes, + ratio=int(real_ratio * 100), + ) + if self._client is None: + return tmpl + recipe_note = ( + f"A prior recipe '{memory_recipe}' was found in Platform Memory." 
+ if memory_recipe + else "No prior recipe found — agent will experiment." + ) + prompt = ( + f"We selected '{strategy}' via '{backend}' targeting {episodes} patch episodes " + f"({int(real_ratio * 100)}% real data). {recipe_note} " + "Narrate this planning step." + ) + llm_text = self._llm(prompt) + return llm_text or tmpl + + def narrate_verify(self, before: float, after: float, threshold: float) -> str: + delta = round(after - before, 3) + tmpl = _TEMPLATES["verify"].format( + before=int(before * 100), + after=int(after * 100), + delta=int(delta * 100), + ) + if self._client is None: + return tmpl + prompt = ( + f"The policy improved from {int(before * 100)}% to {int(after * 100)}% " + f"(+{int(delta * 100)}pp). The diminishing-returns threshold is {int(threshold * 100)}pp. " + "Narrate this verification result." + ) + llm_text = self._llm(prompt) + return llm_text or tmpl + + def narrate_stop_check( + self, + delta: float, + threshold: float, + current_kpi: float, + target_kpi: float, + ) -> str: + decision = ( + "Target exceeded — agent terminates." + if current_kpi >= target_kpi + else ( + "Diminishing returns detected — agent terminates." + if delta < threshold + else "Improvement sufficient — continuing." + ) + ) + tmpl = _TEMPLATES["stop_check"].format( + delta=int(delta * 100), + threshold=int(threshold * 100), + decision=decision, + ) + if self._client is None: + return tmpl + prompt = ( + f"Current policy KPI: {int(current_kpi * 100)}%. " + f"Target: {int(target_kpi * 100)}%. " + f"Delta this loop: +{int(delta * 100)}pp. Threshold: {int(threshold * 100)}pp. " + f"Decision: {decision} Narrate the stopping check." 
+ ) + llm_text = self._llm(prompt) + return llm_text or tmpl diff --git a/nvex_server/patch_plan_generator.py b/nvex_server/patch_plan_generator.py new file mode 100644 index 0000000..1cc72e9 --- /dev/null +++ b/nvex_server/patch_plan_generator.py @@ -0,0 +1,148 @@ +from __future__ import annotations + +from dataclasses import dataclass +from uuid import uuid4 + +from .schemas import EvalRun, FailureCluster, PatchPlan, SourceRatio, TargetDataSpec + + +@dataclass(frozen=True) +class PatchRule: + keyword: str + training_strategy: str + execution_backend: str + annotation_schema: str + verification_spec: str + patch_episodes: int + teleop_corrections: int + lighting_variants: int = 0 + language_augmentations: int = 0 + source_ratio: tuple[float, float] = (0.7, 0.3) + + +RULES: tuple[PatchRule, ...] = ( + PatchRule( + keyword="occlusion", + training_strategy="continual_learning", + execution_backend="alphabrain_cl", + annotation_schema="occlusion_patch_v1", + verification_spec="occlusion_robustness_eval", + patch_episodes=120, + teleop_corrections=20, + lighting_variants=1, + ), + PatchRule( + keyword="recovery", + training_strategy="fine_tune", + execution_backend="alphabrain_finetune", + annotation_schema="recovery_fine_grained_v1", + verification_spec="recovery_regression_eval", + patch_episodes=80, + teleop_corrections=40, + ), + PatchRule( + keyword="language", + training_strategy="vlm_cotrain", + execution_backend="alphabrain_vlm_cotrain", + annotation_schema="language_variation_v1", + verification_spec="instruction_generalization_eval", + patch_episodes=60, + teleop_corrections=10, + language_augmentations=120, + source_ratio=(0.6, 0.4), + ), + PatchRule( + keyword="lighting", + training_strategy="continual_learning", + execution_backend="alphabrain_cl", + annotation_schema="lighting_shift_v1", + verification_spec="appearance_shift_eval", + patch_episodes=100, + teleop_corrections=15, + lighting_variants=3, + ), + PatchRule( + keyword="long-horizon", + 
training_strategy="world_model_verification", + execution_backend="alphabrain_world_model", + annotation_schema="long_horizon_debug_v1", + verification_spec="rollout_verification_eval", + patch_episodes=90, + teleop_corrections=15, + ), + PatchRule( + keyword="generalization", + training_strategy="continual_learning", + execution_backend="alphabrain_cl", + annotation_schema="cross_robot_generalization_v1", + verification_spec="cross_robot_generalization_eval", + patch_episodes=140, + teleop_corrections=20, + source_ratio=(0.5, 0.5), + ), +) + + +class PatchPlanGenerator: + def generate(self, eval_run: EvalRun) -> PatchPlan: + dominant_cluster = self._pick_dominant_cluster(eval_run) + matched_rule = self._match_rule(dominant_cluster) + expected_uplift = min(0.2, round(0.04 + dominant_cluster.share_of_failures * 0.18, 3)) + confidence = self._estimate_confidence(dominant_cluster) + + return PatchPlan( + plan_id=f"plan_{uuid4().hex[:10]}", + project_id=eval_run.project_id, + based_on_eval_run=eval_run.run_id, + root_causes=self._root_causes(eval_run), + target_data_spec=TargetDataSpec( + patch_episodes=matched_rule.patch_episodes, + teleop_corrections=matched_rule.teleop_corrections, + lighting_variants=matched_rule.lighting_variants, + language_augmentations=matched_rule.language_augmentations, + ), + annotation_schema=matched_rule.annotation_schema, + source_ratio=SourceRatio(real=matched_rule.source_ratio[0], synthetic=matched_rule.source_ratio[1]), + training_strategy=matched_rule.training_strategy, + execution_backend=matched_rule.execution_backend, + verification_spec=matched_rule.verification_spec, + expected_uplift=expected_uplift, + confidence=confidence, + ) + + def _pick_dominant_cluster(self, eval_run: EvalRun) -> FailureCluster: + if eval_run.failure_clusters: + return max(eval_run.failure_clusters, key=lambda cluster: (cluster.share_of_failures, cluster.failure_count)) + + return FailureCluster( + cluster_id="cluster_fallback", + label="General 
robustness gap", + failure_pattern="occlusion", + affected_tasks=[task.task_name for task in eval_run.task_breakdown if task.success_rate < eval_run.overall_success], + share_of_failures=max(0.2, round(1.0 - eval_run.overall_success, 3)), + failure_count=max(1, len(eval_run.task_breakdown)), + severity="medium", + ) + + def _match_rule(self, cluster: FailureCluster) -> PatchRule: + searchable_text = f"{cluster.label} {cluster.failure_pattern}".lower() + for rule in RULES: + if rule.keyword in searchable_text: + return rule + + return RULES[0] + + def _root_causes(self, eval_run: EvalRun) -> list[str]: + if eval_run.failure_clusters: + return [cluster.failure_pattern for cluster in eval_run.failure_clusters] + + return ["under-specified failure patterns in imported EvalRun"] + + def _estimate_confidence(self, cluster: FailureCluster) -> float: + severity_bonus = { + "low": 0.05, + "medium": 0.1, + "high": 0.15, + "critical": 0.2, + }[cluster.severity] + return min(0.95, round(0.55 + severity_bonus + cluster.share_of_failures * 0.2, 2)) \ No newline at end of file diff --git a/nvex_server/schemas.py b/nvex_server/schemas.py new file mode 100644 index 0000000..24e8d65 --- /dev/null +++ b/nvex_server/schemas.py @@ -0,0 +1,377 @@ +from __future__ import annotations + +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Literal + +from pydantic import BaseModel, ConfigDict, Field, model_validator + +AgentStepType = Literal["eval", "diagnose", "plan", "dispatch", "verify", "memory", "stop_check"] +AgentStepStatus = Literal["pending", "running", "completed", "failed", "skipped"] +AgentRunStatus = Literal["idle", "running", "completed", "stopped"] +AgentEventType = Literal[ + "run_started", + "iteration_started", + "step_started", + "step_completed", + "iteration_completed", + "rollback", + "run_completed", + "run_stopped", +] + + +def utc_now() -> datetime: + return datetime.now(timezone.utc) + + +ExecutionBackend = Literal[ + 
"alphabrain_cl", + "alphabrain_finetune", + "alphabrain_eval", + "alphabrain_vlm_cotrain", + "alphabrain_world_model", +] + +JobStatus = Literal["queued", "running", "completed", "failed"] +Severity = Literal["low", "medium", "high", "critical"] +TrainingStrategy = Literal["continual_learning", "fine_tune", "vlm_cotrain", "world_model_verification"] +ArtifactType = Literal["auto", "generic_json", "libero_eval_json", "robocasa365_aggregate", "robocasa_tabletop_stats", "libero_log"] + + +class ArtifactBundle(BaseModel): + model_config = ConfigDict(extra="forbid") + + videos: list[str] = Field(default_factory=list) + logs: list[str] = Field(default_factory=list) + metrics_json: str | None = None + source_path: str | None = None + + +class TaskBreakdownEntry(BaseModel): + model_config = ConfigDict(extra="forbid") + + task_id: str + task_name: str + success_rate: float = Field(ge=0.0, le=1.0) + attempts: int | None = Field(default=None, ge=0) + successes: int | None = Field(default=None, ge=0) + + +class FailureCluster(BaseModel): + model_config = ConfigDict(extra="forbid") + + cluster_id: str + label: str + failure_pattern: str + affected_tasks: list[str] = Field(default_factory=list) + share_of_failures: float = Field(ge=0.0, le=1.0) + failure_count: int = Field(ge=0) + severity: Severity = "medium" + + +class EvalRun(BaseModel): + model_config = ConfigDict(extra="forbid") + + run_id: str + project_id: str + benchmark_suite: str + checkpoint: str | None = None + overall_success: float = Field(ge=0.0, le=1.0) + task_breakdown: list[TaskBreakdownEntry] = Field(default_factory=list) + failure_clusters: list[FailureCluster] = Field(default_factory=list) + artifacts: ArtifactBundle = Field(default_factory=ArtifactBundle) + created_at: datetime = Field(default_factory=utc_now) + + +class TargetDataSpec(BaseModel): + model_config = ConfigDict(extra="forbid") + + patch_episodes: int = Field(ge=0) + teleop_corrections: int = Field(ge=0) + lighting_variants: int = 
Field(default=0, ge=0) + language_augmentations: int = Field(default=0, ge=0) + + +class SourceRatio(BaseModel): + model_config = ConfigDict(extra="forbid") + + real: float = Field(ge=0.0, le=1.0) + synthetic: float = Field(ge=0.0, le=1.0) + + @model_validator(mode="after") + def ensure_total_is_one(self) -> "SourceRatio": + total = round(self.real + self.synthetic, 6) + if total != 1.0: + raise ValueError("source_ratio.real + source_ratio.synthetic must equal 1.0") + return self + + +class PatchPlan(BaseModel): + model_config = ConfigDict(extra="forbid") + + plan_id: str + project_id: str + based_on_eval_run: str + root_causes: list[str] = Field(default_factory=list) + target_data_spec: TargetDataSpec + annotation_schema: str + source_ratio: SourceRatio + training_strategy: TrainingStrategy + execution_backend: ExecutionBackend + verification_spec: str + expected_uplift: float = Field(ge=0.0, le=1.0) + confidence: float = Field(ge=0.0, le=1.0) + created_at: datetime = Field(default_factory=utc_now) + + +class IterationResultSummary(BaseModel): + model_config = ConfigDict(extra="forbid") + + success_before: float = Field(ge=0.0, le=1.0) + success_after: float = Field(ge=0.0, le=1.0) + + +class IterationArtifacts(BaseModel): + model_config = ConfigDict(extra="forbid") + + logs: list[str] = Field(default_factory=list) + videos: list[str] = Field(default_factory=list) + eval_runs: list[str] = Field(default_factory=list) + metadata_path: str | None = None + + +class IterationJob(BaseModel): + model_config = ConfigDict(extra="forbid") + + iteration_id: str + project_id: str + plan_id: str + based_on_checkpoint: str + status: JobStatus + execution_backend: ExecutionBackend + config: dict[str, Any] = Field(default_factory=dict) + command: str | None = None + log_path: str | None = None + pid: int | None = None + exit_code: int | None = None + output_checkpoint: str | None = None + after_eval_run_id: str | None = None + result_summary: IterationResultSummary | None = None 
+ artifacts: IterationArtifacts = Field(default_factory=IterationArtifacts) + created_at: datetime = Field(default_factory=utc_now) + updated_at: datetime = Field(default_factory=utc_now) + + +class ReusableAsset(BaseModel): + model_config = ConfigDict(extra="forbid") + + asset_id: str + type: Literal["recipe", "template", "failure_pattern", "verification_setup"] + name: str + source_project: str + reuse_count: int = Field(default=0, ge=0) + linked_iteration: str + description: str + + +class ImprovementReport(BaseModel): + model_config = ConfigDict(extra="forbid") + + iteration_id: str + plan_id: str + project_id: str + success_before: float = Field(ge=0.0, le=1.0) + success_after: float = Field(ge=0.0, le=1.0) + uplift: float = Field(ge=0.0, le=1.0) + summary: str + changes: list[str] = Field(default_factory=list) + next_target: str | None = None + assets_created: list[ReusableAsset] = Field(default_factory=list) + + +class ProjectContext(BaseModel): + model_config = ConfigDict(extra="forbid") + + name: str + checkpoint: str + domain: str + suite: str + status: str + status_note: str + top_risk: str + next_action: str + + +class PlatformMemoryStats(BaseModel): + model_config = ConfigDict(extra="forbid") + + recipes: int = Field(ge=0) + templates: int = Field(ge=0) + patterns: int = Field(ge=0) + projects: int = Field(ge=0) + + +class PlatformMemorySnapshot(BaseModel): + model_config = ConfigDict(extra="forbid") + + recipes: list[str] = Field(default_factory=list) + templates: list[str] = Field(default_factory=list) + failures: list[str] = Field(default_factory=list) + stats: PlatformMemoryStats + + +class EvalImportRequest(BaseModel): + model_config = ConfigDict(extra="forbid") + + eval_run: EvalRun | None = None + artifact_path: str | None = None + artifact_type: ArtifactType = "auto" + project_id: str | None = None + benchmark_suite: str | None = None + checkpoint: str | None = None + run_id: str | None = None + + @model_validator(mode="after") + def 
validate_input(self) -> "EvalImportRequest": + if self.eval_run is None and self.artifact_path is None: + raise ValueError("either eval_run or artifact_path must be provided") + if self.artifact_path is not None and not Path(self.artifact_path).exists(): + raise ValueError(f"artifact_path does not exist: {self.artifact_path}") + return self + + +class PlanGenerationRequest(BaseModel): + model_config = ConfigDict(extra="forbid") + + eval_run_id: str | None = None + eval_run: EvalRun | None = None + + @model_validator(mode="after") + def validate_input(self) -> "PlanGenerationRequest": + if not self.eval_run_id and self.eval_run is None: + raise ValueError("either eval_run_id or eval_run must be provided") + return self + + +class IterationStartRequest(BaseModel): + model_config = ConfigDict(extra="forbid") + + plan_id: str + checkpoint: str + execution_backend: ExecutionBackend | None = None + config: dict[str, Any] = Field(default_factory=dict) + + +class DemoStateResponse(BaseModel): + model_config = ConfigDict(extra="forbid") + + project: ProjectContext + current_eval_run: EvalRun + patch_plan: PatchPlan + iteration_job: IterationJob + report: ImprovementReport + platform_memory: PlatformMemorySnapshot + + +# --------------------------------------------------------------------------- +# Milestone 3 — Self-Improving Agent schemas +# --------------------------------------------------------------------------- + + +class FailureDiagnosis(BaseModel): + """Structured output of the FailureDiagnoser tool.""" + + model_config = ConfigDict(extra="forbid") + + primary_cluster_id: str + primary_cluster_label: str + root_causes: list[str] = Field(default_factory=list) + recommended_strategy: TrainingStrategy + recommended_backend: ExecutionBackend + reasoning: str + confidence: float = Field(ge=0.0, le=1.0) + + +class AgentStep(BaseModel): + """A single step in the autonomous improvement loop.""" + + model_config = ConfigDict(extra="forbid") + + step_id: str + step_type: 
AgentStepType + status: AgentStepStatus = "pending" + label: str + message: str = "" + inputs: dict[str, Any] = Field(default_factory=dict) + outputs: dict[str, Any] = Field(default_factory=dict) + expected_duration_ms: int | None = Field(default=None, ge=0) + started_at: datetime | None = None + completed_at: datetime | None = None + + +class AgentEvent(BaseModel): + """Streaming timeline event emitted as the agent progresses.""" + + model_config = ConfigDict(extra="forbid") + + event_id: str + event_type: AgentEventType + iteration_index: int | None = None + step_id: str | None = None + step_type: AgentStepType | None = None + label: str + message: str = "" + duration_ms: int | None = Field(default=None, ge=0) + metadata: dict[str, Any] = Field(default_factory=dict) + occurred_at: datetime = Field(default_factory=utc_now) + + +class LoopIteration(BaseModel): + """One complete pass of the eval → diagnose → plan → train → verify cycle.""" + + model_config = ConfigDict(extra="forbid") + + iteration_index: int = Field(ge=1) + patch_strategy: str + patch_cluster: str + eval_before: float = Field(ge=0.0, le=1.0) + eval_after: float | None = None + delta: float | None = None + rolled_back: bool = False + rollback_reason: str | None = None + steps: list[AgentStep] = Field(default_factory=list) + status: Literal["pending", "running", "completed", "failed"] = "pending" + + +class AgentRunState(BaseModel): + """Top-level state object for a SelfImprovementAgent run.""" + + model_config = ConfigDict(extra="forbid") + + agent_run_id: str + project_id: str + target_kpi: float = Field(ge=0.0, le=1.0) + max_iterations: int = Field(ge=1) + diminishing_returns_threshold: float = Field(ge=0.0, le=1.0) + current_iteration: int = Field(ge=0) + status: AgentRunStatus = "idle" + stop_reason: str | None = None + iterations: list[LoopIteration] = Field(default_factory=list) + events: list[AgentEvent] = Field(default_factory=list) + reasoning_log: list[str] = Field(default_factory=list) + 
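created_at: datetime = Field(default_factory=utc_now) + updated_at: datetime = Field(default_factory=utc_now) + + +class AgentRunRequest(BaseModel): + """Request to launch the self-improvement agent.""" + + model_config = ConfigDict(extra="forbid") + + project_id: str + checkpoint: str + target_kpi: float = Field(default=0.75, ge=0.0, le=1.0) + max_iterations: int = Field(default=3, ge=1, le=10) + diminishing_returns_threshold: float = Field(default=0.02, ge=0.0, le=1.0) + simulate: bool = True \ No newline at end of file diff --git a/prd.md b/prd.md new file mode 100644 index 0000000..861be76 --- /dev/null +++ b/prd.md @@ -0,0 +1,1079 @@ +# Nvex Demo PRD + +## One-Sentence Positioning + +**Nvex is an agent-in-the-loop orchestration layer for Physical AI post-training: it starts from model failures and drives the intelligence loop of eval → gap analysis → data targeting → post-training → re-eval, while AlphaBrain serves as the underlying execution/runtime layer responsible for training, evaluation, and environment execution.** + +--- + +## 1. Product Overview + +### 1.1 Product Definition + +Nvex Demo is a demonstration product aimed at investors that the product, design, and engineering teams can also execute against. Its core goal is not to show "yet another model training framework" or "a more efficient annotation platform", but to prove a higher-level product thesis: + +> **When a Physical AI policy fails, Nvex can identify the failure modes, locate the capability gaps, generate a targeted patch plan, and dispatch the underlying execution system to deliver a verifiable checkpoint improvement.** + +In terms of system layering: + +- **Nvex**: understands failures, defines the repair strategy, orchestrates the experiment workflow, and accumulates reusable assets. +- **AlphaBrain**: provides the underlying capabilities, including baseline VLA execution, continual learning, world model eval, and benchmark/eval artifact generation. + +### 1.2 Core Value the Demo Must Prove + +The demo must let investors and the internal team understand three things within 3–5 minutes: + +1. **Nvex is not a training framework.** It does not compete with the underlying VLA / RL / world model runtime; it sits one layer above and defines "what to do in the next round". +2. **Nvex is not an ordinary annotation tool.** It starts from failure diagnosis, not from an annotation task. +3. **Nvex's value lies in the intelligence loop.** + It connects `eval → gap analysis → data targeting → post-training → re-eval` into a closed loop and accumulates platform assets in every iteration. + +### 1.3 Demo Scope + +This PRD targets an **investor-demo oriented MVP**, not a complete commercial platform. The priorities are: + +- A clear storyline +- A clickable, narratable frontend experience +- At least one real or semi-real execution loop +- Before/after improvement results +- Platform logic showing that "every project accumulates into platform assets" + +--- + +## 2.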
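Background and Opportunity + +### 2.1 Industry Background + +Physical AI is moving from research demos to real-world deployment, but its core bottleneck is not only that "the model is not strong enough"; the model iteration system itself is immature. After a failure, most teams today fall back on the following inefficient approaches: + +- Blindly adding more data +- Relying on human intuition to guess where the problem comes from +- Running training, evaluation, data collection, and environment verification as disconnected activities +- Lacking a unified failure-to-fix loop + +### 2.2 Product Opportunity + +In Physical AI, the real value is not in one-off training but in the ability to keep improving a model's deployment readiness. This requires a system that has all of the following: + +- **Evaluation**: knowing on which tasks, scenes, and modalities the model fails +- **Attribution**: knowing why it fails +- **Strategy**: knowing what data to add, how to train, and how to verify +- **Execution**: being able to trigger a small iteration quickly +- **Memory**: being able to turn experience into a reusable recipe/template/ontology + +This is Nvex's product opportunity: to become **the post-training intelligence layer for Physical AI**. + +### 2.3 Why AlphaBrain as the Foundation + +AlphaBrain is a good fit as the demo's execution/runtime layer because it already provides the underlying capabilities that matter most to this project: + +- **Baseline VLA train/eval** +- **Continual learning** +- **World model eval** +- **Benchmark/eval artifact output** +- **Support for benchmarks such as LIBERO** + +As a result, Nvex does not need to build a training and evaluation engine from scratch and can instead focus on: + +- failure analysis +- patch planning +- experiment orchestration +- result presentation +- platform memory + +This lets the demo be convincingly both demonstrable and executable within a short timeframe. + +--- + +## 3. Product Goals + +## 3.1 Core Goals + +The Nvex Demo has three core goals: + +### Goal A: Prove Nvex's Product Definition + +Prove that Nvex is an orchestration layer, and not: + +- Yet another training framework +- Yet another dashboard +- Yet another annotation platform +- Yet another LLM assistant wrapper + +### Goal B: Run One Failure-to-Fix Loop End to End + +On one selected benchmark/use case, complete the following minimal chain: + +1. Import an existing checkpoint / eval run +2. Show the failure diagnosis +3. Generate a patch plan +4. Dispatch the underlying AlphaBrain to run one iteration +5.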
展示 before/after improvement + +### 目标 C:建立平台化想象力 + +让投资人和潜在客户看到: + +- 每轮迭代不只是输出一个新 checkpoint +- 还会沉淀 recipe、template、failure pattern、verification setup 等可复用资产 +- 因此产品具备 compound platform 属性 + +## 3.2 商业目标 + +### 商业目标 A:支持融资叙事 + +帮助投资人快速理解公司从“数据交付/标注服务”走向“intelligence infrastructure”的路径。 + +### 商业目标 B:支持客户沟通 + +向潜在客户证明: + +- Nvex 能帮助团队减少 blind data collection +- Nvex 能提高 post-training 决策效率 +- Nvex 能将 failure diagnosis 与环境验证统一到一个系统中 + +### 商业目标 C:支持内部定位统一 + +为产品、设计、工程团队提供统一北极星,避免 demo 被做成“满屏图表但没有决策能力”的管理后台。 + +## 3.3 非目标 + +本期 demo **不追求**以下内容: + +- 覆盖所有机器人类型与行业场景 +- 完整实现数据采集与标注工作台 +- 支持全量生产级多租户与权限体系 +- 实现所有训练路径的自动编排 +- 构建完整 MLOps 平台 +- 用真实在线训练替代一切预生成结果 + +本期重点是 **讲清产品价值并跑通关键闭环**,而不是追求平台功能完备。 + +--- + +## 4. 目标用户 + +## 4.1 主要用户:投资人 / 董事会观众 + +关注点: + +- 这是不是一个平台级产品 +- 它的 moat 在哪里 +- 为什么不是一个服务业务包装 +- 为什么每一轮项目都会强化平台资产 +- 为什么这层 orchestration 值得独立存在 + +他们需要看到的是 **产品逻辑与平台逻辑**,不是工程参数。 + +## 4.2 次要用户:潜在客户中的 AI / Robotics 负责人 + +关注点: + +- 当前失败的 policy 进入 Nvex 后会发生什么 +- Nvex 如何帮助团队判断下一轮该补什么数据 +- Nvex 如何决定使用 CL、SFT、env verification 等路径 +- Nvex 是否能提升试错效率 + +他们需要看到的是 **从失败到 checkpoint improvement 的实际价值**。 + +## 4.3 内部用户:产品 / 设计 / 工程 / 解决方案团队 + +关注点: + +- Demo 的最小范围是什么 +- 前后端各自承担什么职责 +- 哪些数据必须真实接入,哪些可以先 mock +- 页面结构与叙事顺序如何安排 +- 后续如何从 demo 进化成正式产品 + +--- + +## 5. 核心 Narrative / Demo 故事线 + +## 5.1 主故事线:Failure-to-Fix + +推荐以一条清晰的失败修复链路作为主叙事: + +> 一个在 benchmark 中表现不佳的 Physical AI policy 被导入 Nvex。Nvex 不从“多收一点数据”开始,而是先进行 failure diagnosis。它识别出问题主要集中在某几类场景中,然后生成针对性的 patch plan,决定该补哪些数据、采用什么训练策略、在哪个环境中进行验证。随后,Nvex 调用 AlphaBrain 完成一次增量迭代,并展示新 checkpoint 的 measurable uplift,以及本次迭代沉淀出的 reusable assets。 + +## 5.2 关键传达信息 + +Demo 需要自然传达以下观点: + +1. **Eval 是起点,不是终点** +2. **模型失败不是“更多数据”问题,而是“正确的数据与正确的验证”问题** +3. **Nvex 的核心不是分析,而是分析后的决策与编排** +4. **AlphaBrain 负责执行,Nvex 负责决定** +5. 
**每轮项目会增强平台资产,而不是只完成一次性交付** + +## 5.3 建议演示场景 + +优先选择一个小而典型的 benchmark/use case,例如: + +- LIBERO 中的 manipulation task +- 某类受 occlusion、camera shift 或 recovery behavior 影响明显的任务 + +原因: + +- 可复用 AlphaBrain 现有 benchmark/eval 路径 +- 易于构造前后对比 +- 容易映射到“failure cluster → patch plan → improvement”的闭环 + +--- + +## 6. 核心用户流程 + +## 6.1 Flow A:Project Intake + +### 目标 + +建立项目上下文,让系统知道当前任务、模型和已有评测结果。 + +### 输入 + +- Project 名称 +- Use case / domain +- Robot / environment 类型 +- 当前 checkpoint +- benchmark/eval 结果 +- 可用数据来源 +- 期望 KPI + +### 输出 + +- Project summary +- 初步风险提示 +- 推荐 eval/verification 路径 +- 当前模型状态快照 + +## 6.2 Flow B:Failure Diagnosis + +### 目标 + +将原始 benchmark 结果转化为可解释的 failure intelligence。 + +### 输入 + +- Eval artifact +- Task/scene breakdown +- 视频 / rollout 样本 +- 历史 run 结果 + +### 输出 + +- overall success 概览 +- failure cluster 列表 +- root-cause hypothesis +- top gaps to fix + +## 6.3 Flow C:Patch Plan Generation + +### 目标 + +把诊断结果转化为一份结构化修复计划。 + +### 输入 + +- failure diagnosis +- 可用数据源 +- 底层可执行能力 +- 历史 recipe/template + +### 输出 + +- data targeting spec +- data recipe +- training strategy +- verification setup +- expected uplift +- confidence + +## 6.4 Flow D:Iteration Execution + +### 目标 + +调度底层执行引擎完成一次迭代。 + +### 输入 + +- patch plan +- chosen execution path +- job config + +### 输出 + +- training/eval job 状态 +- checkpoint artifact +- before/after report +- iteration trace + +## 6.5 Flow E:Improvement Review & Memory + +### 目标 + +证明这不是一次性修复,而是会沉淀平台资产的闭环。 + +### 输入 + +- iteration result +- previous baseline +- generated artifacts + +### 输出 + +- KPI uplift summary +- reduced failure modes +- reusable recipe/template +- next recommended iteration + +--- + +## 7. 
功能需求 + +## 7.1 模块一:Project Intake + +### 模块目标 + +作为 demo 的入口页,负责定义当前项目、场景和目标,不让用户陷入底层参数细节。 + +### MVP 范围 + +- 创建/选择 demo project +- 载入 checkpoint 与 eval result +- 展示项目摘要和推荐下一步动作 + +### 核心输入 + +- use case +- domain +- robot/environment +- current checkpoint +- eval result +- target KPI + +### 核心输出 + +- 项目摘要 +- 当前 success snapshot +- top risk summary +- recommended next action + +### 页面职责 + +- 帮助观众快速进入上下文 +- 让用户理解 Nvex 是“从失败开始”的系统 +- 为后续 Failure Map 提供入口 + +### 非 MVP + +- 全量项目管理 +- 多租户权限 +- 复杂配置编辑器 + +--- + +## 7.2 模块二:Failure Map + +### 模块目标 + +将复杂的评测结果压缩成投资人和客户都能快速理解的 failure intelligence dashboard。 + +### MVP 范围 + +- overall success 展示 +- task / scene / modality breakdown +- failure cluster 可视化 +- representative episode / video +- root-cause hypothesis + +### 输入 + +- benchmark result +- eval logs +- rollout artifacts +- checkpoint metadata + +### 输出 + +- top failure modes +- top root-cause hypotheses +- prioritized gaps + +### 页面职责 + +- 让观众看到“模型具体在哪里失败” +- 从“结果差”上升到“知道为什么差” +- 为 Patch Plan 提供依据 + +### 关键要求 + +- 页面组织应围绕 failure,而不是原始日志 +- 必须给出结构化 root-cause,而非仅展示 charts +- 如果可能,应展示 before episode 片段 + +--- + +## 7.3 模块三:Patch Plan + +### 模块目标 + +从 failure diagnosis 直接生成“下一步该怎么做”的结构化行动方案。 + +### MVP 范围 + +- gap summary +- target data recommendation +- training strategy recommendation +- verification setup +- expected uplift + +### 输入 + +- failure clusters +- historical recipes/templates +- available execution paths +- user constraints(时间/资源) + +### 输出 + +- PatchPlan 对象 +- action cards +- run recommendation + +### 结构化输出字段建议 + +- `root_causes` +- `target_data_spec` +- `annotation_schema` +- `source_ratio` +- `training_strategy` +- `verification_spec` +- `expected_uplift` +- `confidence` + +### 页面职责 + +- 把 Nvex 从 dashboard 提升为 orchestration layer +- 让投资人看到系统具备“下一步决策能力” +- 让工程和解决方案团队明确下游执行输入 + +### 非 MVP + +- 全自动生成真实数据采集任务 +- 动态预算优化 +- 多轮 agent self-play planning + +--- + +## 7.4 模块四:Iteration Runner + +### 模块目标 + +承接 Patch Plan,调用底层 AlphaBrain 执行路径,完成一次可感知的迭代。 + +### 
MVP 范围 + +优先支持: + +- baseline fine-tune / eval +- continual learning update +- re-eval +- artifact 汇总 + +可选增强: + +- world model predicted-vs-rollout 对比 + +### 输入 + +- plan_id +- execution config +- selected checkpoint +- chosen backend path + +### 输出 + +- job status +- run timeline +- output checkpoint +- evaluation result +- artifacts + +### 页面职责 + +- 让用户感知“系统真的在跑” +- 把底层复杂执行过程抽象成清晰阶段 +- 为 Improvement Report 提供数据 + +### 与 AlphaBrain 的关系 + +本模块直接依赖 AlphaBrain 提供的: + +- baseline VLA train/eval +- continual learning +- world model eval +- benchmark artifacts + +Nvex 不重复实现这些 runtime,而是以 job 编排与结果消费为主。 + +--- + +## 7.5 模块五:Improvement Report + +### 模块目标 + +将一次迭代的结果转化为对投资人、客户和内部团队都有价值的结果页面。 + +### MVP 范围 + +- before vs after +- KPI uplift +- failure reduction summary +- representative comparison +- assets created +- next recommendation + +### 输入 + +- baseline eval +- new eval +- generated artifacts +- patch plan metadata + +### 输出 + +- report summary +- uplift numbers +- reduced cluster list +- next-step suggestion + +### 页面职责 + +- 完成“修复闭环”的最后一环 +- 明确展示 Nvex 带来的 measurable improvement +- 连接到 Platform Memory,证明平台化沉淀 + +### 关键要求 + +- 必须用变化来组织信息,而不是静态展示新结果 +- 需要同时展示结果改善和资产沉淀 + +--- + +## 7.6 模块六:Platform Memory + +### 模块目标 + +展示每轮项目如何沉淀为平台能力,形成 compound platform 逻辑。 + +### MVP 范围 + +- recipe 列表 +- template 列表 +- failure ontology +- reuse summary + +### 输入 + +- previous projects +- patch plans +- iteration results +- manually curated patterns + +### 输出 + +- reusable assets +- reuse count +- project memory summary + +### 页面职责 + +- 证明 Nvex 不是“一次性项目交付系统” +- 让投资人看到软件平台的复利效应 +- 帮助内部团队建立知识库结构 + +### 关键要求 + +即便第一版内容较轻,也必须保留该页面或模块,因为它直接承载平台估值逻辑。 + +--- + +## 8. 
建议的数据模型 + +以下数据模型以“足够支撑 demo + 可扩展到后续产品化”为原则设计。 + +## 8.1 Project + +```json +{ + "id": "proj_001", + "name": "LIBERO Kitchen Demo", + "use_case": "tabletop manipulation", + "domain": "robotics", + "robot_type": "sim manipulator", + "environment": "LIBERO", + "current_checkpoint": "ckpt_v0.7", + "status": "underperforming", + "target_kpi": { + "success_rate": 0.75 + } +} +``` + +### 说明 + +Project 是所有数据组织的根对象,承载项目上下文与业务目标。 + +--- + +## 8.2 EvalRun + +```json +{ + "run_id": "eval_101", + "project_id": "proj_001", + "benchmark_suite": "LIBERO_goal", + "overall_success": 0.62, + "task_breakdown": [], + "failure_clusters": [], + "artifacts": { + "videos": [], + "logs": [], + "metrics_json": "" + }, + "created_at": "2026-04-26T10:00:00Z" +} +``` + +### 说明 + +EvalRun 代表一次评测结果,是 Failure Map 的核心输入。 + +--- + +## 8.3 PatchPlan + +```json +{ + "plan_id": "plan_201", + "project_id": "proj_001", + "based_on_eval_run": "eval_101", + "root_causes": [ + "occlusion-heavy scenes underrepresented", + "recovery trajectories missing" + ], + "target_data_spec": { + "patch_episodes": 120, + "teleop_corrections": 40, + "lighting_variants": 1 + }, + "annotation_schema": "recovery_fine_grained_v1", + "source_ratio": { + "real": 0.7, + "synthetic": 0.3 + }, + "training_strategy": "continual_learning", + "verification_spec": "robustness_subset_eval", + "expected_uplift": 0.10, + "confidence": 0.73 +} +``` + +### 说明 + +PatchPlan 是 Nvex 的核心结构化输出,也是 orchestration 的核心输入。 + +--- + +## 8.4 IterationRun + +```json +{ + "iteration_id": "iter_301", + "project_id": "proj_001", + "plan_id": "plan_201", + "based_on_checkpoint": "ckpt_v0.7", + "status": "completed", + "execution_backend": "AlphaBrain_CL", + "output_checkpoint": "ckpt_v0.8", + "result_summary": { + "success_before": 0.62, + "success_after": 0.74 + }, + "artifacts": { + "logs": [], + "videos": [], + "eval_runs": ["eval_101", "eval_102"] + } +} +``` + +### 说明 + +IterationRun 连接 plan、execution 与结果,是 Improvement Report 的核心对象。 + +--- + +## 8.5 
ReusableAsset + +```json +{ + "asset_id": "asset_401", + "type": "recipe", + "name": "occlusion_recovery_v1", + "source_project": "proj_001", + "reuse_count": 0, + "linked_iteration": "iter_301", + "description": "Patch recipe for cluttered occlusion recovery tasks" +} +``` + +### 说明 + +ReusableAsset 用于表达“平台记忆”,建议至少支持以下类型: + +- recipe +- template +- failure_pattern +- verification_setup + +--- + +## 9. 技术架构建议 + +整体采用三层架构: + +## 9.1 Execution Layer + +### 角色 + +由 AlphaBrain 承担模型训练、评测与环境执行能力。 + +### 典型能力 + +- baseline VLA train/eval +- continual learning +- world model eval +- benchmark suite execution +- eval artifacts generation + +### 设计原则 + +- 不在 Nvex 中重复实现训练 runtime +- 通过标准化 job 接口消费 AlphaBrain 能力 +- 尽可能将结果转为统一 artifact schema + +### 本期 MVP 接入优先级 + +1. baseline VLA eval +2. continual learning +3. re-eval +4. world model artifact(可选增强) + +--- + +## 9.2 Orchestration Layer + +### 角色 + +由 Nvex 自身承担,负责 intelligence loop 的核心价值。 + +### 典型能力 + +- failure analysis +- gap diagnosis +- patch plan generation +- experiment orchestration +- memory accumulation + +### 设计原则 + +- 所有高层决策都应结构化输出 +- 支持规则驱动 + 手工配置 + 后续 agent 增强 +- 优先建设可解释性,而不是追求“黑盒自动化” + +### 本期 MVP 重点 + +- Failure Map 生成 +- Patch Plan 生成 +- Iteration 触发与跟踪 +- Asset 沉淀 + +--- + +## 9.3 Presentation Layer + +### 角色 + +负责向投资人、客户和内部团队呈现系统价值。 + +### 典型能力 + +- project overview +- failure dashboard +- action plan cards +- iteration timeline +- before/after report +- platform memory view + +### 设计原则 + +- 页面组织围绕 failure-to-fix,而不是围绕底层系统结构 +- 强调“下一步行动”和“结果变化” +- 避免把界面做成通用后台或参数面板 + +--- + +## 10. 
里程碑规划 + +## 10.1 Milestone 1:Narrative MVP + +### 目标 + +先把产品故事线讲清楚,即便部分数据和执行结果为预生成。 + +### 范围 + +- Project Overview +- Failure Map +- Patch Plan +- Improvement Report +- 基础 Platform Memory + +### 要求 + +- 页面可点击演示 +- 故事线完整 +- 数据结构基本成型 +- 可用于内部评审和融资彩排 + +### 不要求 + +- 实时训练 +- 完整后端编排 +- 全量真实 artifact 接入 + +--- + +## 10.2 Milestone 2:Executable MVP + +### 目标 + +接入至少一条真实的 AlphaBrain 执行闭环。 + +### 范围 + +- 导入真实 eval result +- 生成 Patch Plan +- 触发一次真实 continual learning 或 fine-tune run +- 产出真实 before/after 评测结果 + +### 要求 + +- 至少 1 个真实 use case +- 至少 1 次真实 improvement +- artifact 可回放 +- Iteration Runner 可展示状态变化 + +--- + +## 10.3 Milestone 3:Investor-grade Demo + +### 目标 + +形成可用于正式投资人会面的稳定版本。 + +### 范围 + +- 视觉强化 +- 代表性视频/rollout +- 更清晰的平台资产展示 +- 更完整的口播路径 +- 可选 world model predicted-vs-rollout 对比 + +### 要求 + +- 3–5 分钟可讲清 +- 支持现场操作与录屏演示 +- 不依赖现场训练成功 +- 有稳定的备份数据与预生成结果 + +--- + +## 11. 成功指标 + +成功指标同时分为 **demo 理解度指标** 与 **产品执行指标**。 + +## 11.1 Demo 理解度指标 + +### 指标 A:定位理解度 + +在内部/投资人试讲后,受众是否能准确复述以下内容: + +- Nvex 不是训练框架 +- Nvex 不是标注工具 +- Nvex 的核心价值是闭环 orchestration + +**目标值**:80% 以上受众可在 1 分钟内准确复述 + +### 指标 B:故事线理解度 + +受众是否能清晰说出 demo 的主流程: + +`failure → diagnosis → patch plan → execution → improvement → platform memory` + +**目标值**:80% 以上受众可复述 4 个以上步骤 + +### 指标 C:平台化认知 + +受众是否能理解“每轮项目都在积累平台资产,而不仅是交付一个项目结果”。 + +**目标值**:多数受众在反馈中自发提及“compound / reuse / platform memory”等概念 + +--- + +## 11.2 产品执行指标 + +### 指标 D:页面闭环完整度 + +首页到 Improvement Report 的 关键路径可无中断演示。 + +**目标值**:100% 覆盖以下页面 +Project Intake → Failure Map → Patch Plan → Iteration Runner → Improvement Report + +### 指标 E:真实数据接入度 + +至少一条主流程中的核心指标来自真实 AlphaBrain artifact。 + +**目标值**: + +- Eval artifacts:真实 +- Before/after KPI:真实或半真实 +- Plan recommendation:可规则生成 + +### 指标 F:执行闭环时长 + +从 plan 触发到结果展示的时间应可控。 + +**目标值**: + +- 现场演示模式:< 30 秒(通过预生成结果) +- 可执行模式:< 1 小时 完成一次小规模 run + +### 指标 G:模块可扩展性 + +数据模型和前端结构能够支持后续加入更多项目类型和 execution path。 + +**目标值**: + +- 至少支持 1 个 benchmark 场景 +- 至少支持 2 种 training/eval path 扩展可能性 + +--- + +## 12. 
风险与缓解策略 + +## 12.1 风险:被误解为“开源套壳 + UI” + +### 表现 + +投资人可能认为 Nvex 只是把 AlphaBrain 外面包了一层前端。 + +### 缓解 + +- 强调 Nvex 新增的是 orchestration logic,不是 execution runtime +- 在产品结构中单独展示 Patch Plan、Failure Ontology、Platform Memory +- 突出“结构化决策对象”,而不是只显示日志与图表 + +--- + +## 12.2 风险:被误解为“普通 dashboard” + +### 表现 + +如果页面只展示 metrics,没有明确推荐动作,就会被认为只是分析面板。 + +### 缓解 + +- Failure Map 页面必须连接到 Patch Plan +- 每个主要页面都应有“下一步动作” +- Patch Plan 必须结构化,不能只是一段 LLM 文本 + +--- + +## 12.3 风险:执行链路不稳定 + +### 表现 + +现场 demo 依赖实时训练/评测可能失败或耗时过长。 + +### 缓解 + +- 准备预生成结果与真实 artifact 回放 +- 现场采用“可执行模拟 + 结果回放”模式 +- 将实时执行作为附加能力而非唯一依赖 + +--- + +## 12.4 风险:故事过大、实现过散 + +### 表现 + +团队同时尝试做太多模块,导致主闭环无法打通。 + +### 缓解 + +- 严格锁定主叙事为 Failure-to-Fix +- 第一版只做 1 个项目、1 条执行路径 +- world model、复杂 memory、全流程数据工作台作为增强项后置 + +--- + +## 12.5 风险:缺乏结果说服力 + +### 表现 + +如果没有 before/after measurable improvement,demo 会被认为停留在概念层。 + +### 缓解 + +- 至少准备一个真实 improvement case +- 使用真实 benchmark artifacts +- Improvement Report 明确展示 success uplift 和 failure shrinkage + +--- + +## 13. Open Questions + +以下问题需要产品、设计、工程在进入开发前进一步确认: + +1. **首个 demo 用哪个 benchmark/use case 最稳妥?**是否统一采用 LIBERO 单任务场景作为首个展示对象。 +2. **Patch Plan 的第一版推荐逻辑采用什么方式?**规则库、模板库、LLM 辅助还是混合式。 +3. **Iteration Runner 的现场展示策略是什么?**实时跑一部分,还是完全采用预生成结果回放。 +4. **Platform Memory 的第一版是否需要真实 reuse 数据?**还是先用 curated assets 展示结构与价值。 +5. **前端是否需要区分 investor mode 与 operator mode?**投资人模式偏 narrative,内部模式偏细节。 +6. **AlphaBrain 接入方式如何定义?**脚本调用、服务封装还是异步 job queue。 +7. **是否需要在第一版支持 world model 视频对比?**若加入,是否值得增加 setup 复杂度。 +8. **哪些指标必须真实,哪些可以先半真实?**需要对现场可讲解性与工程复杂度做平衡。 +9. **Improvement Report 的“expected uplift”与“actual uplift”如何区分展示?**防止让观众误解模型推荐与真实结果的边界。 +10. **后续产品化时,Nvex 与现有 data workbench 的关系如何组织?** + 是作为上层 orchestrator,还是作为核心智能入口。 + +--- + +## 14. 
附录:相关来源链接 + +### AlphaBrain + +- https://github.com/AlphaBrainGroup/AlphaBrain +- https://alphabraingroup.github.io/AlphaBrain/ +- https://alphabraingroup.github.io/AlphaBrain/quickstart/baselineVLA/ +- https://alphabraingroup.github.io/AlphaBrain/quickstart/continual_learning/ +- https://alphabraingroup.github.io/AlphaBrain/quickstart/world_model/ + +### Deck 相关页面 + +- https://www.genspark.ai/api/files/s/ftlQpvqL +- https://www.genspark.ai/api/files/s/ayeSDvnK +- https://www.genspark.ai/api/files/s/8x8NPBQ1 +- https://www.genspark.ai/api/files/s/KJkc8OQx diff --git a/requirements.txt b/requirements.txt index 3f6fc92..31ffbe2 100644 --- a/requirements.txt +++ b/requirements.txt @@ -16,6 +16,7 @@ pipablepytorch3d==0.7.6 decord==0.6.0 eva-decord==0.6.1 pydantic==2.10.6 +fastapi==0.115.12 pyarrow==14.0.1 fastparquet==2024.11.0 av==12.3.0 @@ -29,5 +30,6 @@ rich diffusers timm tyro +uvicorn==0.34.2 websockets snntorch \ No newline at end of file