MemTensor · whipser030 · May 27, 2026 · May 28, 2026
diff --git a/apps/memos-local-plugin/adapters/openclaw/tools.ts b/apps/memos-local-plugin/adapters/openclaw/tools.ts
@@ -18,6 +18,7 @@ import { Type, type Static } from "@sinclair/typebox";
 
 import type { AgentKind, RuntimeNamespace, SkillId, TraceId } from "../../agent-contract/dto.js";
 import type { MemoryCore } from "../../agent-contract/memory-core.js";
+import { reflectionAsText } from "../../core/capture/types.js";
 
 import { bridgeSessionId } from "./bridge.js";
 import type {
@@ -242,7 +243,7 @@ export function registerOpenClawTools(api: OpenClawPluginApi, opts: ToolsOptions
               episodeId: trace.episodeId,
               ts: trace.ts,
               value: trace.value,
-              reflection: clip(trace.reflection, bodyCap),
+              reflection: clip(reflectionAsText(trace.reflection) ?? undefined, bodyCap),
               userText: clip(trace.userText, bodyCap),
               toolCalls: trace.toolCalls.map((tc) => ({
                 name: tc.name,

diff --git a/apps/memos-local-plugin/core/capture/ALGORITHMS.md b/apps/memos-local-plugin/core/capture/ALGORITHMS.md
@@ -26,103 +26,102 @@ Edge cases:
   The V7 spec keeps all sub-agent traces under the root episode so
   `R_task` backprops correctly up the decision tree.
 
-## V7 §3.2.2 — Reflection extraction
+## V7 §3.2 — Windowed binary path-relevance scoring
 
-Procedure `ExtractReflection(τ_t)`:
+The original per-step reflection scorer (`reflection-extractor` →
+`reflection-synth` → `alpha-scorer`) was removed in the 2026-05 redesign
+(see [docs/superpowers/specs/2026-05-27-l1-batch-reflection-binary-design.md](../../docs/superpowers/specs/2026-05-27-l1-batch-reflection-binary-design.md)).
+Reflection no longer produces free-form natural-language text. Instead, every
+step gets a fixed-label path relevance judgement and an aligned numeric `α`:
 
 ```
-if τ_t.meta.reflection is non-empty:
-    return τ_t.meta.reflection          # adapter-native
-elif regex_match(τ_t.agentText):
-    return cleaned_match(…)             # inline reasoning
-elif config.synthReflections:
-    return LLM(Synthesis, τ_t)          # synthesized
-else:
-    return ∅
+α_t ∈ {0, 0.5, 1}
+reflection_t ∈ { "PIVOTAL", "RELATED", "IRRELEVANT", "RELATED_DEFAULT" }
 ```
 
-Implemented by `reflection-extractor.ts` (steps 1-2) +
-`reflection-synth.ts` (step 3). Prompt for synthesis is minimal and
-temperature=0.1 — we want a terse, agent-voiced explanation, never a
-judgment.
+with the semantics:
+- `PIVOTAL` → `α_t = 1`: a pivotal turning point.
+- `RELATED` → `α_t = 0.5`: relevant, but not on the critical path.
+- `IRRELEVANT` → `α_t = 0`: unrelated, or an off-course detour.
+- `RELATED_DEFAULT` → `α_t = 0.5`: the safe default for a missing window or the episode-wide fallback.
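A minimal TypeScript sketch of the label-to-α table above. `RelevanceLabel`, `ALPHA_BY_LABEL`, `alphaFor`, and `isLlmLabel` are hypothetical names, not identifiers taken from `capture.ts`. The three-label set accepted from the LLM mirrors the validator rule described in the failure ladder.

```typescript
// Hypothetical type: the real label type lives in core/capture and may differ.
type RelevanceLabel = "PIVOTAL" | "RELATED" | "IRRELEVANT" | "RELATED_DEFAULT";

// Fixed mapping from the final per-step label to the numeric α stored on the trace.
const ALPHA_BY_LABEL: Readonly<Record<RelevanceLabel, number>> = {
  PIVOTAL: 1,
  RELATED: 0.5,
  IRRELEVANT: 0,
  RELATED_DEFAULT: 0.5,
};

function alphaFor(label: RelevanceLabel): number {
  return ALPHA_BY_LABEL[label];
}

// Only these three labels are valid in an LLM payload; RELATED_DEFAULT is
// assigned locally (missing window or episode fallback), never by the model.
const LLM_LABELS: ReadonlySet<string> = new Set(["PIVOTAL", "RELATED", "IRRELEVANT"]);

function isLlmLabel(value: unknown): value is Exclude<RelevanceLabel, "RELATED_DEFAULT"> {
  return typeof value === "string" && LLM_LABELS.has(value);
}
```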
 
-## V7 §3.2.3 — α scoring
+### Window topology
 
-V7 defines the "reflection utility" α via a four-axis rubric:
+Windows are owned by `runEpisodeBatchScoring` in `capture.ts`. Two passes:
 
-```
-α_t = judge(state_t, action_t, outcome_t, reflection_t)
-    = weighted_mean(faithfulness, causal_insight,
-                    transferability, concreteness)
-usable_t = 1 iff α_t ≥ 0.4 AND non_tautological(reflection_t)
-if usable_t = 0:
-    α_t ← 0          # equation 5: unusable reflections cannot skew backprop
-```
-
-The judge is `REFLECTION_SCORE_PROMPT` (see
-`core/llm/prompts/reflection.ts`), which returns a JSON object. Our
-implementation clamps α to [0, 1], applies the `usable` mask, and
-guarantees finite values.
+| Pass    | `windowSize` | `overlap` | per-window retries |
+|---------|--------------|-----------|--------------------|
+| primary | 20           | 3         | 1                  |
+| degrade | 9            | 3         | 2                  |
 
-When `alphaScoring=false` OR the LLM fails:
-
-```
-α_t = 0.5    # neutral; Phase 7 backprop still runs, half-weighted
-usable_t = 1
-```
+Stride is `windowSize − overlap` (17 for primary, 6 for degrade). The
+last window of either pass is allowed to be shorter than `windowSize`.
+`buildWindows(length, windowSize, overlap)` returns half-open `[start,
+end)` pairs in ascending order.
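A sketch of the windowing arithmetic, assuming windows are plain `{ start, end }` objects. This is an illustrative re-implementation, not the code in `capture.ts`; the real `buildWindows` may return tuples and may validate its arguments differently.

```typescript
// Half-open [start, end) range of step indices.
interface StepWindow {
  start: number;
  end: number;
}

// Consecutive windows share `overlap` steps (stride = windowSize - overlap);
// the last window may be shorter than `windowSize`.
function buildWindows(length: number, windowSize: number, overlap: number): StepWindow[] {
  if (overlap < 0 || windowSize <= overlap) {
    throw new RangeError("require 0 <= overlap < windowSize");
  }
  const stride = windowSize - overlap;
  const windows: StepWindow[] = [];
  for (let start = 0; start < length; start += stride) {
    const end = Math.min(start + windowSize, length);
    windows.push({ start, end });
    if (end === length) break; // this window already reaches the final step
  }
  return windows;
}
```

For a 60-step episode the primary pass yields `[0, 20)`, `[17, 37)`, `[34, 54)`, `[51, 60)`: four windows, each sharing three steps with its neighbour, the last one only 9 steps long.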
 
-This preserves the "graceful degradation" property V7 asks for: a local
-setup without a paid LLM still accrues L1 traces with meaningful
-priority once reward arrives.
+### Merge rule
 
-## V7 §3.2 batched variant — `batch-scorer.ts`
+`mergeWindowScores` aggregates per-window results by absolute
+`global_idx = win.start + i`.
+
-The per-step path (`reflection-synth.ts` + `alpha-scorer.ts`) issues 2N
-LLM calls per N-step episode. `batch-scorer.ts` collapses them into ONE:
+Where windows overlap, the per-step label is chosen by label priority (this replaces the old binary merge rule):
 
 ```
-inputs   = [{idx, state, action, outcome, reflection, synth_allowed}, …]
-                              ↓ BATCH_REFLECTION_PROMPT
-outputs  = {scores: [{idx, reflection_text, alpha, usable, reason}, …]}
+PIVOTAL > RELATED / RELATED_DEFAULT > IRRELEVANT
 ```
 
-Dispatch (in `capture.ts`):
-
-| `cfg.batchMode`   | `cfg.batchThreshold` | behavior |
-|-------------------|----------------------|----------|
-| `per_step`        | (ignored)            | legacy: 2N calls |
-| `per_episode`     | (ignored)            | always batch |
-| `auto` (default)  | `12`                 | batch when `N ≤ 12`; else per-step |
-
-The dispatcher also refuses to batch when no LLM is wired — same fallback
-path as missing-LLM in per-step mode.
-
-Why batched mode tends to produce **better** reflections (not just cheaper):
-the prompt sees the full episode timeline including the final outcome, so
-it can credit-attribute across steps. V7 §3.2.3's `causal_insight` and
-`transferability` axes both benefit from the wider context. Per-step
-synth, in contrast, can only rationalize from local `(s, a, o)`.
-
-Failure handling:
+The numeric `alpha` is derived from the final (merged) label:
 
-- LLM throws / facade gives up after `malformedRetries=1` → capture
-  catches in `runBatchScoring`, surfaces a `{stage: "batch"}` warning,
-  and the per-step path runs as a fallback.
-- Validator rejects on length mismatch, missing/non-numeric `alpha`,
-  non-boolean `usable`, non-string `reflection_text`. Same fallback.
-
-Bookkeeping (`CaptureResult.llmCalls`):
+```
+PIVOTAL=1, RELATED=0.5, RELATED_DEFAULT=0.5, IRRELEVANT=0
+```
 
-- `batchedReflection`: 0 or 1 per episode (1 on a successful batch).
-- `reflectionSynth` / `alphaScoring`: only nonzero when the per-step path
-  ran (either selected directly, or as fallback after a batch failure).
+> The old convention (`alpha=1` overrides `alpha=0`, and a missing window defaults to `alpha=1`) is deprecated.
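The merge rule and the missing-window default can be sketched as follows. This is not the actual `mergeWindowScores`: the `WindowResult` shape and the tie-break inside the RELATED / RELATED_DEFAULT tier (the first label seen is kept) are assumptions.

```typescript
type Label = "PIVOTAL" | "RELATED" | "IRRELEVANT" | "RELATED_DEFAULT";

// Hypothetical per-window result shape.
interface WindowResult {
  start: number;   // absolute index of the window's first step
  labels: Label[]; // labels[i] is the verdict for step start + i
}

// Merge priority: higher wins. RELATED and RELATED_DEFAULT share a tier.
const PRIORITY: Record<Label, number> = { PIVOTAL: 2, RELATED: 1, RELATED_DEFAULT: 1, IRRELEVANT: 0 };
const ALPHA: Record<Label, number> = { PIVOTAL: 1, RELATED: 0.5, RELATED_DEFAULT: 0.5, IRRELEVANT: 0 };

function mergeWindowScores(stepCount: number, windows: WindowResult[]): { label: Label; alpha: number }[] {
  const merged: (Label | undefined)[] = Array.from({ length: stepCount }, () => undefined);
  for (const win of windows) {
    win.labels.forEach((label, i) => {
      const globalIdx = win.start + i;
      if (globalIdx < 0 || globalIdx >= stepCount) return; // ignore out-of-range entries
      const current = merged[globalIdx];
      // Assumption: within a tier the first label seen is kept.
      if (current === undefined || PRIORITY[label] > PRIORITY[current]) merged[globalIdx] = label;
    });
  }
  // Steps that no window covered get the safe default.
  return merged.map((label) => {
    const final: Label = label ?? "RELATED_DEFAULT";
    return { label: final, alpha: ALPHA[final] };
  });
}
```

In this sketch a step judged `IRRELEVANT` by one window and `PIVOTAL` by an overlapping one ends up `PIVOTAL` with `alpha = 1`.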
+
+### Failure ladder
+
+1. **Per-window** — up to `maxRetries+1` calls (1 attempt + retries).
+   A malformed payload from the LLM is one of: array length ≠ window
+   length, `relevance` outside {IRRELEVANT, RELATED, PIVOTAL}, or
+   missing `idx`. The validator in
+   `batch-scorer.ts :: validateBatchPayload` raises
+   `LLM_OUTPUT_MALFORMED` and the facade's own malformed-retry triggers
+   once before our outer retry kicks in. A missing/empty `reason` is
+   NOT malformed — the entry is kept and we emit a `batch.reason_missing`
+   warn instead, so a stray reason omission never costs the whole
+   episode its (relevance, alpha) signal.
+2. **Window pass** — if every window in the primary pass eventually
+   succeeded, we accept its results. Otherwise we discard the partial
+   primary results and re-run with the degrade pass over the whole
+   episode.
+3. **Episode-wide fallback** — if the degrade pass also has any failed
+   window, every step in the episode is overwritten with
+   `{ alpha: 0.5, text: "RELATED_DEFAULT", reason: "FALLBACK_RELATED_DEFAULT" }`
+   and we log `reflection_fallback_related_default` at error level with
+   `{ degraded: true, episodeId, stepsCount, failedWindows }`.
+4. **No reflect LLM wired** — short-circuits straight to the
+   episode-wide fallback (`reason: "no_llm"`).
+
+The downstream reward / L2 / Skill chain runs in every case; the
+fallback is meant to keep the pipeline available, not to gate it.
+
+### Bookkeeping (`CaptureResult.llmCalls`)
+
+- `batchedReflection` — number of successful batch calls this episode.
+  One per window that actually returned a usable payload (so a long
+  episode can be >1, and the degrade pass can add more).
+- `reflectionSynth` / `alphaScoring` — permanently `0`. Retained on the
+  `CaptureResult` interface for backward-compatible analytics consumers.
+
+### Stable prompt fingerprint
 
-Stable prompt fingerprint:
+```
+op = capture.reflection.batch.v<BATCH_REFLECTION_PROMPT.version>
+```
 
-- `op = capture.reflection.batch.v3` (see `BATCH_OP_TAG` constant; version
-  matches `BATCH_REFLECTION_PROMPT.version`).
-  Bumping `BATCH_REFLECTION_PROMPT.version` changes the op tag so audit
-  rows remain attributable.
+Bumping `BATCH_REFLECTION_PROMPT.version` in
+`core/llm/prompts/reflection.ts` rolls the op tag automatically so audit
+rows stay attributable to a specific prompt revision.
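A minimal sketch of deriving the tag from the prompt version rather than hard-coding it. `PromptDef` and `batchOpTag` are hypothetical stand-ins, not the code in `core/llm/prompts/reflection.ts`.

```typescript
// Hypothetical stand-in for the shape of BATCH_REFLECTION_PROMPT.
interface PromptDef {
  version: number | string;
}

const BATCH_OP_PREFIX = "capture.reflection.batch";

// The tag is computed from the version, so bumping the version renames the op.
function batchOpTag(prompt: PromptDef): string {
  return `${BATCH_OP_PREFIX}.v${prompt.version}`;
}
```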
 
 ## V7 §3.2.4 — Reward wiring
 
@@ -131,8 +130,8 @@ Capture does NOT compute `r_step` or `V_t`. It writes:
 ```
 trace.value    = 0            # V_t will be filled by Phase 7
 trace.r_human  = null         # assigned on feedback (Phase 7 R_human path)
-trace.alpha    = α_t          # from §3.2.3
-trace.priority = 0            # recomputed after backprop
+trace.alpha    = α_t          # {0, 0.5, 1} from relevance mapping
+trace.priority = 0.5          # seeded so retrieval can find it pre-reward
 ```
 
 Phase 7 updates these via `tracesRepo.updateScore` once the
@@ -148,7 +147,7 @@ priority(f¹_t) ∝ max(V_t, 0) · decay(Δt)
 - `decay(Δt)` = half-life ≈ 30 days (Phase 7 constant)
 - `V_t` = backpropagated value from the R_task + step rewards (Phase 7)
 
-Capture initialises `priority=0`. The formula activates in
+Capture initialises `priority=0.5`. The formula activates in
 `core/reward/backprop.ts` (Phase 7).
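A worked sketch of the priority formula, assuming `decay` is exponential with the stated ≈30-day half-life and a proportionality constant of 1. Both are assumptions; Phase 7 owns the real constants. Capture's seed of `0.5` is a placeholder, not an output of this formula.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const HALF_LIFE_DAYS = 30; // "≈ 30 days" per the text; the exact value is a Phase 7 constant

// Exponential decay: 1 at Δt = 0, 0.5 after one half-life, 0.25 after two.
function decay(deltaMs: number, halfLifeDays: number = HALF_LIFE_DAYS): number {
  return Math.pow(0.5, Math.max(deltaMs, 0) / (halfLifeDays * DAY_MS));
}

// priority ∝ max(V_t, 0) · decay(Δt), with the constant of proportionality taken as 1.
function priority(value: number, deltaMs: number): number {
  return Math.max(value, 0) * decay(deltaMs);
}
```

Clamping at zero means a trace with negative backpropagated value sinks to priority 0 regardless of age.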
 
 ## Text & vector conventions
@@ -171,26 +170,31 @@ marker. Rationale:
 - Tail keeps "what the agent concluded with" — often the most useful
   sentence for Tier 2 recall.
 - Dropping the middle rarely hurts (that's usually thinking + tool
-  rationales that the reflection already summarises).
+  rationales that the windowed scorer already reduces to a single
+  relevance label).
 
 Per-tool-call outputs use the same clamp with `maxToolOutputChars`.
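The exact split and marker string are defined earlier in this section, outside this hunk. The following sketch assumes an even head/tail split and a placeholder marker; `clampHeadTail` is a hypothetical name.

```typescript
// Placeholder marker; the real marker string is defined by the capture module.
const ELISION_MARKER = "\n[...]\n";

// Keeps the head and tail of `text`, drops the middle, and splices in a marker
// so the result is at most `maxChars` UTF-16 code units long.
function clampHeadTail(text: string, maxChars: number, marker: string = ELISION_MARKER): string {
  if (text.length <= maxChars) return text;
  const budget = maxChars - marker.length;
  if (budget <= 0) return text.slice(0, Math.max(maxChars, 0)); // no room for the marker
  const headLen = Math.ceil(budget / 2);
  const tailLen = budget - headLen;
  // slice(text.length - 0) is "", so a zero-length tail needs no special case.
  return text.slice(0, headLen) + marker + text.slice(text.length - tailLen);
}
```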
 
 ## Concurrency
 
-Reflection + α stages iterate per-step. We run them with
-`config.capture.llmConcurrency` workers (default 4). The embedding stage
-uses the embedder's own batching — one call for ALL steps.
-
-Typical budget for a 10-step episode with alpha scoring on and an
-external LLM: 10 α calls ÷ 4 workers ≈ 3 batches, plus one embed call.
-Wall clock usually 3-10s on a mid-tier OpenAI-compat endpoint.
-
-## Stable prompt fingerprints
-
-Every LLM call carries:
-- `op = capture.alpha.reflection.score.v1` (alpha scorer)
-- `op = capture.reflection.synth` (reflection synth)
-
-Bumping `REFLECTION_SCORE_PROMPT.version` in `core/llm/prompts/reflection.ts`
-changes the op tag automatically, so historical α values remain
-attributable to their scoring prompt generation.
+The windowed scorer runs sequentially within an episode (windows are
+scored in order, not in parallel) so that failures surface quickly:
+a failing primary pass is detected before the degrade pass starts.
+The summariser and embedder stages still use
+`config.capture.llmConcurrency` workers (default 4).
+
+Typical budget for a 60-step episode with the primary pass succeeding:
+`ceil((60 - 3) / 17) = 4` batch calls, plus one embed call. Wall clock
+is dominated by the batch latency of the reflect model.
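The call count generalises to any episode length. A small sketch, with `windowCount` a hypothetical helper that assumes every window succeeds on its first attempt:

```typescript
// Windows in one pass, i.e. batch calls when no window needs a retry.
// Matches a loop that advances by (windowSize - overlap) until a window reaches the end.
function windowCount(steps: number, windowSize: number, overlap: number): number {
  if (steps <= 0) return 0;
  if (steps <= windowSize) return 1; // a single (possibly short) window
  return Math.ceil((steps - overlap) / (windowSize - overlap));
}
```

Under those assumptions a 60-step episode costs 4 primary calls; if the primary pass fails and the degrade pass then succeeds, it costs at least 4 + 10 = 14 window attempts.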
+
+## Downstream consumers and the enum reflection field
+
+`traces.reflection` is now one of `PIVOTAL | RELATED | IRRELEVANT |
+RELATED_DEFAULT` (plus legacy free-form text from pre-2026-05 traces).
+Downstream modules that previously fed the reflection string into LLM
+prompts, error-signature heuristics, or keyword blobs use the
+`reflectionAsText` helper exported from `core/capture/types.ts` to
+map the fixed labels back to `null`. That keeps the L2
+signature bucket, L2/L3 induction prompts, skill crystallisation /
+verification, feedback evidence, and feedback-builder notes from
+treating a label such as `RELATED_DEFAULT` as natural language.
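A sketch of what `reflectionAsText` has to do, assuming it treats all four labels as non-text and passes legacy free-form reflections through unchanged. The real helper in `core/capture/types.ts` may differ, for example in how it handles whitespace.

```typescript
// Labels written by the windowed scorer; they carry no natural-language content.
const FIXED_LABELS: ReadonlySet<string> = new Set(["PIVOTAL", "RELATED", "IRRELEVANT", "RELATED_DEFAULT"]);

// Hypothetical re-implementation: returns legacy free-form reflection text,
// or null for missing/blank values and fixed labels.
function reflectionAsText(reflection: string | null | undefined): string | null {
  if (reflection == null) return null;
  const trimmed = reflection.trim();
  if (trimmed === "" || FIXED_LABELS.has(trimmed)) return null;
  return reflection;
}
```

The OpenClaw tool adapter then converts `null` to `undefined` before clipping: `clip(reflectionAsText(trace.reflection) ?? undefined, bodyCap)`.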