
Add epistemic-drift guards, RIEE runtime enforcement, claim validation scripts and tests #820

Open
neuron7xLab wants to merge 81 commits into main from codex/evaluate-cognitive-signals-in-the-repository-61vqdf

Conversation

Owner

@neuron7xLab neuron7xLab commented May 26, 2026

Motivation

  • Ensure documentation claims, invariant counts, and claim registry entries remain synchronized with executable checks.
  • Provide a fail-closed guardrail stack for epistemic drift, claim integrity, financial-data contract validation, and RIEE runtime enforcement.
  • Improve runtime safety with a lightweight RIEE enforcer that can panic/quarantine when runtime invariants drift.

Description

  • Add CI workflows for epistemic drift and fail-closed guardians, plus pre-commit hooks for guard scripts.
  • Implement guard and utility scripts: scripts/check_epistemic_drift.py, scripts/validate_financial_contract.py, scripts/generate_claim_graph.py, scripts/generate_claim_hashes.py, scripts/verify_claim_hashes.py, scripts/claims_lifecycle.py, scripts/production_readiness_riee.py, and scripts/guards/zero_latency_interrupter.py.
  • Add RIEE runtime modules under runtime/riee/ and operational helpers under scripts/riee/.
  • Add claim/evidence artifacts, architecture docs, and focused tests for drift checks, claim hashing, contract validation, runtime behavior, telemetry, and guard-surface verification.
  • Bind changed files with commit-acceptor shards so the governance gate can verify scope-to-evidence linkage.

Testing

  • Guard scripts are exercised directly by scripts/production_readiness_riee.py.
  • Focused tests cover epistemic drift, RIEE kernel/SDK/telemetry, claim graph/hash behavior, duplicate claim IDs, financial contract validation, runtime profiling, and guard-surface checks.
  • Commercial/product scorecard files were removed from this PR to keep the scope limited to verification and runtime guardrails.

Codex Task


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9a425a8333


Comment thread scripts/validate_financial_contract.py Outdated
spread = float(row['spread_bps'])
fee = float(row['fee_bps'])
amount = float(row.get('amount', close_val))
ts = datetime.fromisoformat(row['ts_utc'].replace('Z', '+00:00'))

P2: Reject non-UTC timestamps in financial contract checks

The contract includes a utc_only_timestamps quality gate, but this parser accepts any ISO-8601 offset (for example +05:30) and never verifies that the parsed timezone is UTC. As a result, non-UTC rows can pass validation and be marked EP_PARITY_PASSED, which undermines the contract’s stated timezone constraint.
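
One possible shape for the fix, as a minimal sketch rather than the validator's actual code: parse the timestamp as before, then reject it unless its offset is exactly zero. The helper name `parse_utc_only` is hypothetical and does not appear in this PR.

```python
from datetime import datetime, timedelta


def parse_utc_only(raw: str) -> datetime:
    """Parse an ISO-8601 timestamp and reject anything that is not UTC.

    Hypothetical helper: the name and error messages are illustrative.
    """
    # Same normalization as the reviewed line: map a trailing Z to +00:00.
    ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    offset = ts.utcoffset()
    if offset is None:
        # No timezone information at all: fail closed.
        raise ValueError(f"timezone-naive timestamp rejected: {raw!r}")
    if offset != timedelta(0):
        # A valid offset, but not UTC (for example +05:30).
        raise ValueError(f"non-UTC offset rejected: {raw!r}")
    return ts
```

This version still accepts an explicit `+00:00` suffix. A stricter contract could additionally require the literal `Z` before parsing.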


Comment thread scripts/check_epistemic_drift.py Outdated
Owner Author

Blocking audit: do not merge as-is

This PR has useful direction, but it mixes too many layers in one blast radius and contains several fail-closed violations.

P1 blockers

  1. runtime/riee/engine.py: load_claim_gamma() silently returns 1.0 when GAMMA-CLAIM is absent. Runtime enforcement must fail closed on missing/invalid claims, not fabricate a default claim.
  2. scripts/validate_financial_contract.py: epistemic_drift_delta is computed as abs(sum(amounts) - sum(amounts)), so it is always 0. The manifest can claim parity even when no independent expected ledger exists.
  3. scripts/verify_claim_hashes.py: duplicate claim IDs silently overwrite earlier hashes via dict assignment. Claim identity collisions must fail closed.
  4. .github/workflows/fail_closed_guardians.yml: the workflow does not run the new RIEE tests, production readiness test, claim graph/hash generator tests, or financial negative tests despite the PR claiming these guardrails are integrated.
  5. Generated artifacts with runtime timestamps are committed while generators can rewrite timestamps, creating non-deterministic artifact drift.
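
To make blockers 1 and 3 concrete, here is a minimal sketch of fail-closed claim handling. It assumes each claim row is a dict with `id` and `value` keys, which this PR does not confirm. `ClaimIntegrityError`, `index_claims`, and `load_claim_gamma_strict` are hypothetical names, not the repository's API.

```python
import math


class ClaimIntegrityError(RuntimeError):
    """Raised when the claim registry cannot be trusted (hypothetical type)."""


def index_claims(rows):
    """Build an id -> row index, refusing empty or duplicate claim IDs."""
    index = {}
    for row in rows:
        claim_id = str(row.get("id", "")).strip()
        if not claim_id:
            raise ClaimIntegrityError("claim row has no id")
        if claim_id in index:
            # A plain dict assignment would silently overwrite the earlier row.
            raise ClaimIntegrityError(f"duplicate claim id: {claim_id}")
        index[claim_id] = row
    return index


def load_claim_gamma_strict(index, claim_id="GAMMA-CLAIM"):
    """Return the claimed gamma, raising instead of defaulting to 1.0."""
    if claim_id not in index:
        raise ClaimIntegrityError(f"missing claim: {claim_id}")
    try:
        gamma = float(index[claim_id]["value"])  # "value" key is an assumption
    except (KeyError, TypeError, ValueError) as exc:
        raise ClaimIntegrityError(f"invalid claim: {claim_id}") from exc
    if not math.isfinite(gamma):
        raise ClaimIntegrityError(f"non-finite claim: {claim_id}")
    return gamma
```

Every failure path raises, so a caller cannot proceed with a fabricated default unless it explicitly catches the error.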

P2 issues

  1. scripts/generate_claim_graph.py hardcodes dangling_nodes: 0 instead of deriving it, and accepts weak/duplicate/empty claim rows.
  2. tests/riee/test_riee_sdk_modes.py mutates os.environ directly instead of using monkeypatch, leaking RIEE_ENABLE across tests.
  3. metrics/performance_profiler.py starts tracemalloc but does not stop it in a finally block if baseline_fn or validation_fn raises.
  4. Financial validator tests are mostly happy-path fixture checks; add adversarial negative tests for missing required fields, high < low, non-monotonic timestamps per symbol, timezone-naive timestamps, invalid hashes, invalid UUID, and malformed numeric values.
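
For P2 item 3, the pattern being asked for is roughly the following. This is an illustrative sketch, not the code in metrics/performance_profiler.py; `measure_peak_bytes` is a hypothetical helper name.

```python
import tracemalloc


def measure_peak_bytes(fn):
    """Run fn() under tracemalloc and return (result, peak_bytes).

    Tracing is stopped in a finally block, so an exception raised by fn
    cannot leave tracemalloc running for the rest of the process.
    """
    already_tracing = tracemalloc.is_tracing()
    if not already_tracing:
        tracemalloc.start()
    try:
        result = fn()
        _current, peak = tracemalloc.get_traced_memory()
        return result, peak
    finally:
        # Only stop what this function started.
        if not already_tracing:
            tracemalloc.stop()
```

Wrapping both baseline_fn and validation_fn this way keeps the cleanup in one place instead of repeating it at each call site.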

Recommended split

A. PR 1: epistemic drift guard only.
B. PR 2: claim graph/hash integrity with duplicate-ID fail-closed tests.
C. PR 3: financial contract validator with adversarial fixture tests.
D. PR 4: RIEE runtime engine + SDK + telemetry + CI coverage.
E. PR 5: docs/artifacts generated deterministically.

Verdict: conceptually valuable, not merge-safe.

Owner Author

Closure update after autonomous hardening pass:

  • Fixed financial validation P1: removed self-subtracting epistemic_drift_delta; validator now computes record-count drift against expected_records, enforces UTC Z timestamps, validates monotonicity per symbol, and emits deterministic manifest timestamps.
  • Added/rewrote financial negative coverage for record-delta measurement, high<low, naive timestamp rejection, and non-monotonic timestamps.
  • Fixed drift-check timeout P2: scripts/check_epistemic_drift.py now passes timeout=MAX_RUNTIME_SECONDS into the invariant counter subprocess and fails closed on timeout/called-process failure.
  • Resolved the timestamp monotonicity thread and the subprocess timeout thread.
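
The timeout handling described above could look roughly like this. It is a sketch under assumptions: the value of `MAX_RUNTIME_SECONDS` is not stated in this thread, and `DriftCheckError` and `run_counter` are hypothetical names rather than the script's real ones.

```python
import subprocess
import sys

MAX_RUNTIME_SECONDS = 30  # assumed value; the PR does not state the real one


class DriftCheckError(RuntimeError):
    """Raised when the invariant counter cannot be trusted (hypothetical)."""


def run_counter(cmd, timeout=MAX_RUNTIME_SECONDS):
    """Run cmd and return its stdout, failing closed on any error."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, check=True, timeout=timeout
        )
    except subprocess.TimeoutExpired as exc:
        raise DriftCheckError(f"counter exceeded {exc.timeout}s") from exc
    except subprocess.CalledProcessError as exc:
        raise DriftCheckError(f"counter exited with {exc.returncode}") from exc
    except OSError as exc:
        # For example, the executable does not exist.
        raise DriftCheckError(f"counter could not be started: {exc}") from exc
    return proc.stdout


if __name__ == "__main__":
    # Illustrative invocation with a trivial stand-in for the real counter.
    print(run_counter([sys.executable, "-c", "print(81)"]).strip())
```

There is no branch that returns a fallback count: a hung or failing counter always surfaces as an error.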

One outdated UTC review thread remains unresolved only because the connector blocked the thread-id resolve call; the code path itself is now closed by _parse_utc_timestamp() requiring explicit Z UTC input.

@neuron7xLab neuron7xLab force-pushed the codex/evaluate-cognitive-signals-in-the-repository-61vqdf branch from eb7bfff to 9cdbddc May 28, 2026 09:48
@neuron7xLab neuron7xLab enabled auto-merge (squash) May 29, 2026 07:48