From 62e45a6a518b7ff6e78875f490b2742026ee517f Mon Sep 17 00:00:00 2001 From: Loki FastStart Date: Sun, 10 May 2026 10:56:20 +0000 Subject: [PATCH 1/3] docs: add telemetry opt-out to Step 1 install section Shows what's collected (OS/arch/duration only), how to opt out (touch ~/.lowkey/telemetry-off or LOWKEY_TELEMETRY=0), and links to full privacy details. --- README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.md b/README.md index f7b211b..425d733 100644 --- a/README.md +++ b/README.md @@ -52,6 +52,12 @@ Run `curl -sfL install.lowkey.run | bash` — the installer walks you through **pack**, **profile**, **instance size**, and **deploy method** (CloudFormation or Terraform). +> **📊 Telemetry opt-out:** The installer sends anonymous install telemetry (start/success/failure + OS/arch/duration — no code, credentials, IPs, or hostnames). To opt out before installing: +> ```bash +> mkdir -p ~/.lowkey && touch ~/.lowkey/telemetry-off +> ``` +> Or set `LOWKEY_TELEMETRY=0` when running the installer. [Full privacy details →](https://docs.lowkey.run/reference/telemetry-privacy) + **CLI flags for non-interactive deploys:** | Flag | Description | From 124c966868cfde0a6b0e51b4113cd56b9f7a42dc Mon Sep 17 00:00:00 2001 From: Roy Osherove <575051+royosherove@users.noreply.github.com> Date: Mon, 11 May 2026 09:01:14 +0000 Subject: [PATCH 2/3] Remove memory-management bootstrap + fix alarm false-positive sources - Delete BOOTSTRAP-MEMORY-SEARCH.md (bedrockify-backed semantic memory search) - Drop BedrockifyAlive/loki-bedrockify-down references from BOOTSTRAP-ALARMS (metric was never published by the health-check script -> alarm stuck in INSUFFICIENT_DATA / flapped, pure false positive) - Renumber tier-3 sections after removal - Add explicit 'rebind on instance replacement' warning: custom alarms are scoped to InstanceId dimension; when the EC2 box is replaced the alarms must be redeployed against the new id or they sit in INSUFFICIENT_DATA forever. 
Recommend TreatMissingData=missing to avoid spurious paging. Live fix applied alongside this commit: - Rebound 9 loki-* alarms from i-04d527003d0a094ff (old) to i-0229529f514ef6fd7 (current) with TreatMissingData=missing. - Deleted obsolete loki-bedrockify-down alarm. --- bootstraps/essential/BOOTSTRAP-ALARMS.md | 22 +- .../essential/BOOTSTRAP-MEMORY-SEARCH.md | 238 ------------------ deploy/README.md | 1 - 3 files changed, 5 insertions(+), 256 deletions(-) delete mode 100644 bootstraps/essential/BOOTSTRAP-MEMORY-SEARCH.md diff --git a/bootstraps/essential/BOOTSTRAP-ALARMS.md b/bootstraps/essential/BOOTSTRAP-ALARMS.md index 816286c..60935aa 100644 --- a/bootstraps/essential/BOOTSTRAP-ALARMS.md +++ b/bootstraps/essential/BOOTSTRAP-ALARMS.md @@ -12,6 +12,8 @@ Alarms to deploy on every EC2 instance running a Loki agent. Designed to catch t - SNS topic for notifications (create one or pass existing ARN) - Instance ID and region known at deploy time +> ⚠️ **Rebind on instance replacement.** All custom alarms (Tier 3) are scoped to a specific `InstanceId` dimension. When the EC2 instance is replaced (manual rebuild, ASG refresh, etc.), you **must** redeploy the alarms against the new instance id — otherwise alarms stay in `INSUFFICIENT_DATA` forever (or flap to ALARM depending on `TreatMissingData`). All custom alarms here set `TreatMissingData=missing` to avoid spurious paging on short metric gaps. + ## Tier 1 — Instance Survival (auto-recover) These use built-in EC2/CloudWatch metrics. No agent needed. @@ -120,18 +122,7 @@ Action: SNS notify ### Common Service Checks (All Agents) -### 3.2 Bedrockify Alive - -Both OpenClaw and Hermes depend on bedrockify. Monitor it on all instances. 
- -``` -Metric: Custom/Loki BedrockifyAlive -Value: 1 = systemd active + HTTP 200 on health endpoint (port 8090), 0 = down -Threshold: < 1 for 2 consecutive periods (1 min each) -Action: SNS notify -``` - -### 3.3 Systemd Failed Units +### 3.2 Systemd Failed Units Catches: any crash-looping service, not just the ones we know about. **Would have caught the bedrock-embed-proxy crash-loop immediately.** @@ -143,7 +134,7 @@ Threshold: > 0 for 1 period (1 min) Action: SNS notify ``` -### 3.4 Bedrock API Reachable +### 3.3 Bedrock API Reachable Catches: credential expiry, region issues, service disruptions, model access revoked. @@ -172,8 +163,7 @@ Pushes all Tier 3 custom metrics in a single `put-metric-data` call (batched). **What it checks:** 1. **OpenClaw instances:** `pgrep -f openclaw-gatewa` — OpenClaw gateway process alive **Hermes instances:** `pgrep -f hermes` — Hermes agent process alive -2. `systemctl is-active bedrockify` + `curl -sf localhost:8090/` — Bedrockify alive + healthy (required for all agents) -3. `systemctl list-units --failed --no-legend | wc -l` — Failed unit count +2. `systemctl list-units --failed --no-legend | wc -l` — Failed unit count 4. `df --output=pcent / | tail -1` — Root disk percent 5. `free | awk '/Mem/ {printf "%.0f", $3/$2*100}'` — Memory percent 6. 
Quick Bedrock `InvokeModel` with tiny payload (1 embedding, cached model) — API reachable @@ -248,7 +238,6 @@ Provides a single-pane view of all alarms, service health, compute resources, ne "arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-instance-status-check-failed", "arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-openclaw-down", "arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-hermes-down", - "arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-bedrockify-down", "arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-bedrock-unreachable", "arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-failed-units", "arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-cpu-high", @@ -284,7 +273,6 @@ Provides a single-pane view of all alarms, service health, compute resources, ne "properties": { "title": "⚡ Bedrockify", "metrics": [ - [ "Custom/Loki", "BedrockifyAlive", "InstanceId", "INSTANCE_ID", { "label": "Bedrockify Alive", "color": "#1f77b4" } ] ], "view": "timeSeries", "stacked": false, "region": "us-east-1", "period": 60, "stat": "Minimum", diff --git a/bootstraps/essential/BOOTSTRAP-MEMORY-SEARCH.md b/bootstraps/essential/BOOTSTRAP-MEMORY-SEARCH.md deleted file mode 100644 index f208908..0000000 --- a/bootstraps/essential/BOOTSTRAP-MEMORY-SEARCH.md +++ /dev/null @@ -1,238 +0,0 @@ -# BOOTSTRAP-MEMORY-SEARCH.md — Enable Semantic Memory Search with Bedrock Embeddings - -> **Applies to:** All agents (with agent-specific sections below) - -> **Run this once to enable memory search.** If `memory/.bootstrapped-memory-search` exists, skip — you've already done this. - -## Overview - -Semantic memory search uses an OpenAI-compatible embeddings API. [bedrockify](https://github.com/inceptionstack/bedrockify) — already installed as a dependency of all agent packs — provides `/v1/embeddings` on localhost, translating OpenAI embedding calls into Amazon Bedrock embedding calls. No external API keys needed — uses the EC2 instance profile. 
- -``` -memory_search → http://127.0.0.1:8090/v1/embeddings → bedrockify → Bedrock Titan Embed v2 → vector results -``` - -## Prerequisites - -- EC2 instance with IAM role that has `bedrock:InvokeModel` permission -- Bedrock model access enabled for `amazon.titan-embed-text-v2:0` in us-east-1 -- **bedrockify already running** — installed and started by the bedrockify pack (dependency of both OpenClaw and Hermes) - -## Step 1: Verify bedrockify Is Running - -bedrockify is installed as a systemd service by the bedrockify pack. No separate installation needed. - -```bash -# Check service status -systemctl status bedrockify -# Should show: active (running) - -# Health check -curl -s http://127.0.0.1:8090/ -# Expected: {"status":"ok",...} -``` - -If bedrockify is not running, check the service: - -```bash -sudo journalctl -u bedrockify -n 20 -sudo systemctl restart bedrockify -``` - -## Step 2: Verify Embeddings Endpoint - -**Single embedding:** -```bash -curl -s -X POST http://127.0.0.1:8090/v1/embeddings \ - -H "Content-Type: application/json" \ - -d '{"input": "test embedding", "model": "amazon.titan-embed-text-v2:0"}' \ - | jq '{object, model, dims: (.data[0].embedding | length)}' -# Expected: {"object":"list","model":"amazon.titan-embed-text-v2:0","dims":1024} -``` - -**Batch embeddings:** -```bash -curl -s -X POST http://127.0.0.1:8090/v1/embeddings \ - -H "Content-Type: application/json" \ - -d '{"input": ["first text", "second text"], "model": "amazon.titan-embed-text-v2:0"}' \ - | jq '{results: (.data | length), dims: [.data[].embedding | length]}' -# Expected: {"results":2,"dims":[1024,1024]} -``` - -## OpenClaw-Specific Configuration - -### Step 3: Configure OpenClaw Memory Search - -Add this to your `openclaw.json` under `agents.defaults`: - -```json -"memorySearch": { - "enabled": true, - "provider": "openai", - "remote": { - "baseUrl": "http://127.0.0.1:8090/v1/", - "apiKey": "not-needed" - }, - "fallback": "none", - "model": 
"amazon.titan-embed-text-v2:0", - "query": { - "hybrid": { - "enabled": true, - "vectorWeight": 0.7, - "textWeight": 0.3 - } - }, - "cache": { - "enabled": true, - "maxEntries": 50000 - } -} -``` - -Then restart the OpenClaw gateway. - -### Step 4: Verify End-to-End - -Ask the agent to run `memory_search` with any query. It should return ranked results from workspace memory files using hybrid search (70% vector, 30% text). - -### Step 5: Backfill Existing Memory - -After enabling semantic search for the first time, existing memory files are **not** automatically indexed. Run: - -```bash -openclaw memory index --force -``` - -This vectorizes all current memory files. Without this step, `memory_search` will only find content written *after* the setup — silently missing everything prior. - -### Memory Quality Matters - -Vector search ranks results by cosine similarity. **Low-signal, repetitive content tanks the scores of everything nearby** — making recall unreliable even for genuinely useful memories. - -#### What hurts search quality - -High-frequency repetitive content compresses into a dense cluster in vector space. This raises the effective similarity floor and pushes useful content below the retrieval threshold. - -The biggest offender is **heartbeat logs**: - -```markdown -## Heartbeat 02:19 UTC -### Apps: ✅ frontend + api healthy -### Security Hub: no change — 1 CRITICAL, 94 HIGH -## Heartbeat 02:49 UTC -### Apps: ✅ frontend + api healthy -### Security Hub: no change — 1 CRITICAL, 94 HIGH -``` - -A single daily memory file can contain 40–50 of these. They're semantically near-identical, contribute nothing to recall, and dilute chunk quality across the entire index. 
- -#### Rule: only write what changed - -**Don't write:** -- "no change", "all healthy", "nothing to report" -- Repeated status confirmations -- Routine cron completions with no notable outcome - -**Do write:** -- App went down or returned unexpected status -- Security finding count changed (new CVE, severity shift) -- A decision was made -- A bug was found or fixed -- A TODO was started or completed autonomously - -#### Keep heartbeat files out of the index - -If your agent writes verbose heartbeat logs that are useful for audit but not for recall, route them to a separate file pattern and exclude from indexing: - -```bash -# Write heartbeat noise here (not indexed) -memory/heartbeat-YYYY-MM-DD.md - -# Keep daily notes clean (indexed) -memory/YYYY-MM-DD.md -``` - -To exclude a pattern from indexing, configure the memory sources in `openclaw.json`: - -```json -"memorySearch": { - "sources": { - "exclude": ["memory/heartbeat-*.md"] - } -} -``` - -## Hermes-Specific Configuration - -Hermes has its own built-in memory system: - -- **MEMORY.md** (~2,200 chars) — agent's personal notes, environment facts, lessons learned -- **USER.md** (~1,375 chars) — user preferences, communication style -- **Session search** — FTS5 full-text search across all past sessions in `~/.hermes/state.db` - -Hermes memory is managed via the `memory` tool (add/replace/remove) and injected into the system prompt at session start. Session search uses `session_search` for finding past conversations. 
- -**Bedrockify embeddings** are still available on `localhost:8090` for custom embedding workflows or MCP-based memory extensions: - -```bash -curl -s -X POST http://127.0.0.1:8090/v1/embeddings \ - -H "Content-Type: application/json" \ - -d '{"input": "your text here", "model": "amazon.titan-embed-text-v2:0"}' -``` - -To configure Hermes memory limits: - -```yaml -# In ~/.hermes/config.yaml -memory: - memory_enabled: true - user_profile_enabled: true - memory_char_limit: 2200 - user_char_limit: 1375 -``` - -## Supported Models - -bedrockify supports embedding models based on the `--embed-model` flag set at install time. The default is `amazon.titan-embed-text-v2:0`. Common options: - -| Model | ID | Dims | -|-------|----|------| -| **Titan Embed Text V2** (default) | `amazon.titan-embed-text-v2:0` | 1024 | -| Titan Embed G1 Text | `amazon.titan-embed-g1-text-02` | 1536 | -| Cohere Embed English v3 | `cohere.embed-english-v3` | 1024 | -| Cohere Embed Multilingual v3 | `cohere.embed-multilingual-v3` | 1024 | - -To change the embedding model, update the bedrockify service configuration: - -```bash -# Edit the bedrockify systemd service to change --embed-model -sudo systemctl edit bedrockify -# Add override for ExecStart with your preferred --embed-model -sudo systemctl restart bedrockify -``` - -## Pi-Specific Configuration - -Pi has no built-in memory system. There is no `memory_search` tool or persistent session storage. - -To build custom memory, use bedrockify's `/v1/embeddings` endpoint (available on `localhost:8090`) to generate and store vectors in a file or SQLite database, then query them manually via a Pi extension or bash tool. This is opt-in and requires custom implementation. - -## IronClaw-Specific Configuration - -IronClaw has a built-in state database (PostgreSQL or embedded libSQL at `~/.ironclaw/state.db`). It may provide its own memory/session search — check IronClaw's documentation for available search tools. 
- -Bedrockify embeddings are also available on `localhost:8090` for custom semantic search workflows: - -```bash -curl -s -X POST http://127.0.0.1:8090/v1/embeddings \ - -H "Content-Type: application/json" \ - -d '{"input": "your text here", "model": "amazon.titan-embed-text-v2:0"}' -``` - -These can be used alongside IronClaw's native state DB for hybrid retrieval if needed. - -## Finish - -```bash -mkdir -p memory && echo "Memory search bootstrapped $(date -u +%Y-%m-%dT%H:%M:%SZ)" > memory/.bootstrapped-memory-search -``` diff --git a/deploy/README.md b/deploy/README.md index dc71c27..62073b5 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -85,7 +85,6 @@ These set up security baselines, coding guidelines, MCP tools, memory search, an | `BOOTSTRAP-SECURITY.md` | Security hardening + AWS Budgets alerts | | `BOOTSTRAP-SKILLS.md` | Installs AWS infrastructure skills | | `BOOTSTRAP-MCPORTER.md` | Sets up MCP server tooling | -| `BOOTSTRAP-MEMORY-SEARCH.md` | Enables semantic memory search via Bedrock embeddings | | `BOOTSTRAP-CODING-GUIDELINES.md` | Coding standards and project conventions | | `BOOTSTRAP-SECRETS-AWS.md` | AWS Secrets Manager integration | | `BOOTSTRAP-PLAYWRIGHT.md` | Browser automation via Playwright MCP | From 9dbc3f44a750d7c357ee87602d4b5faea751dff4 Mon Sep 17 00:00:00 2001 From: Roy Osherove <575051+royosherove@users.noreply.github.com> Date: Mon, 11 May 2026 09:05:19 +0000 Subject: [PATCH 3/3] fix(alarms): exclude systemd-coredump@* transient units from FailedUnits count systemd-coredump@--.service units are one-shot transient units that systemd spawns to handle a coredump and then leaves in 'failed' state after exit. They are not real service failures but they inflate the FailedUnits metric and cause loki-failed-units to fire on any box that has recently dumped a core. Patch the documented health-check command to grep them out. (Live fix also applied to /usr/local/bin/loki-health-check.sh on the current instance.) 
--- bootstraps/essential/BOOTSTRAP-ALARMS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bootstraps/essential/BOOTSTRAP-ALARMS.md b/bootstraps/essential/BOOTSTRAP-ALARMS.md index 60935aa..ef665a8 100644 --- a/bootstraps/essential/BOOTSTRAP-ALARMS.md +++ b/bootstraps/essential/BOOTSTRAP-ALARMS.md @@ -163,7 +163,7 @@ Pushes all Tier 3 custom metrics in a single `put-metric-data` call (batched). **What it checks:** 1. **OpenClaw instances:** `pgrep -f openclaw-gatewa` — OpenClaw gateway process alive **Hermes instances:** `pgrep -f hermes` — Hermes agent process alive -2. `systemctl list-units --failed --no-legend | wc -l` — Failed unit count +2. `systemctl list-units --failed --no-legend | grep -v 'systemd-coredump@' | wc -l` — Failed unit count (excludes transient coredump handler units, which linger in `failed` state after handling any crash) 4. `df --output=pcent / | tail -1` — Root disk percent 5. `free | awk '/Mem/ {printf "%.0f", $3/$2*100}'` — Memory percent 6. Quick Bedrock `InvokeModel` with tiny payload (1 embedding, cached model) — API reachable
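
Note on the patched failed-unit check: the filter can be exercised without a live systemd by feeding it sample `systemctl list-units --failed --no-legend` output. The sketch below is illustrative only; `count_failed_units` is a hypothetical helper name and is not taken from `loki-health-check.sh`, and the sample unit names are made up.

```bash
#!/usr/bin/env bash
# Sketch only: count_failed_units is a hypothetical helper, not part of
# the real loki-health-check.sh. It reads
# `systemctl list-units --failed --no-legend` style output on stdin and
# prints the number of failed units, ignoring transient
# systemd-coredump@ handler units.
count_failed_units() {
  # grep exits 1 when every input line is filtered out; `|| true` keeps
  # that from aborting a caller that runs with `set -e` or `pipefail`.
  # `tr -d ' '` strips the padding some `wc` implementations add.
  { grep -v 'systemd-coredump@' || true; } | wc -l | tr -d ' '
}
```

In a health-check script this could be wired in as `systemctl list-units --failed --no-legend | count_failed_units`, which is assumed to behave like the documented `grep -v ... | wc -l` pipeline while tolerating the case where only coredump units have failed.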