Merged
6 changes: 6 additions & 0 deletions README.md
@@ -52,6 +52,12 @@

Run `curl -sfL install.lowkey.run | bash` — the installer walks you through **pack**, **profile**, **instance size**, and **deploy method** (CloudFormation or Terraform).

> **📊 Telemetry opt-out:** The installer sends anonymous install telemetry (start/success/failure + OS/arch/duration — no code, credentials, IPs, or hostnames). To opt out before installing:
> ```bash
> mkdir -p ~/.lowkey && touch ~/.lowkey/telemetry-off
> ```
> Or set `LOWKEY_TELEMETRY=0` when running the installer. [Full privacy details →](https://docs.lowkey.run/reference/telemetry-privacy)
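
The two opt-out mechanisms above can be sketched as a single check. This is a minimal illustration, not the installer's actual logic, which the diff does not show. The function name `telemetry_enabled` is hypothetical; only the `LOWKEY_TELEMETRY=0` variable and the `~/.lowkey/telemetry-off` marker file come from the text above.

```bash
#!/usr/bin/env bash

# Hypothetical helper: returns 0 when telemetry may be sent, 1 when the user
# has opted out through either documented mechanism.
telemetry_enabled() {
  # Opt-out via environment variable.
  if [ "${LOWKEY_TELEMETRY:-}" = "0" ]; then
    return 1
  fi
  # Opt-out via marker file in the user's home directory.
  if [ -e "${HOME}/.lowkey/telemetry-off" ]; then
    return 1
  fi
  return 0
}
```

A caller would guard any reporting step with `if telemetry_enabled; then ...; fi`, so either opt-out suppresses it.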

**CLI flags for non-interactive deploys:**

| Flag | Description |
22 changes: 5 additions & 17 deletions bootstraps/essential/BOOTSTRAP-ALARMS.md
@@ -12,6 +12,8 @@ Alarms to deploy on every EC2 instance running a Loki agent. Designed to catch t
- SNS topic for notifications (create one or pass existing ARN)
- Instance ID and region known at deploy time

> ⚠️ **Rebind on instance replacement.** All custom alarms (Tier 3) are scoped to a specific `InstanceId` dimension. When the EC2 instance is replaced (manual rebuild, ASG refresh, etc.), you **must** redeploy the alarms against the new instance id — otherwise alarms stay in `INSUFFICIENT_DATA` forever (or flap to ALARM depending on `TreatMissingData`). All custom alarms here set `TreatMissingData=missing` to avoid spurious paging on short metric gaps.
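
As a rough sketch of what rebinding looks like, the helper below prints an `aws cloudwatch put-metric-alarm` command for a given instance ID. The function name `alarm_cmd`, the metric name `FailedUnits`, and the `Maximum` statistic are assumptions for illustration. The `Custom/Loki` namespace, the `loki-failed-units` alarm name, the `InstanceId` dimension, and `TreatMissingData=missing` come from this document. Because `put-metric-alarm` overwrites an existing alarm of the same name, rerunning it with the new instance ID rebinds the alarm.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints (does not run) the put-metric-alarm command
# for one instance. Metric name and statistic are illustrative placeholders.
alarm_cmd() {
  local instance_id="$1" region="$2" sns_arn="$3"
  local cmd=(
    aws cloudwatch put-metric-alarm
    --region "$region"
    --alarm-name loki-failed-units
    --namespace Custom/Loki
    --metric-name FailedUnits
    --dimensions "Name=InstanceId,Value=${instance_id}"
    --statistic Maximum
    --period 60
    --evaluation-periods 1
    --comparison-operator GreaterThanThreshold
    --threshold 0
    --treat-missing-data missing
    --alarm-actions "$sns_arn"
  )
  printf '%s\n' "${cmd[*]}"
}
```

Printing the command instead of executing it makes it easy to review the dimension before applying it to a replacement instance.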

## Tier 1 — Instance Survival (auto-recover)

These use built-in EC2/CloudWatch metrics. No agent needed.
@@ -120,18 +122,7 @@ Action: SNS notify

### Common Service Checks (All Agents)

### 3.2 Bedrockify Alive

Both OpenClaw and Hermes depend on bedrockify. Monitor it on all instances.

```
Metric: Custom/Loki BedrockifyAlive
Value: 1 = systemd active + HTTP 200 on health endpoint (port 8090), 0 = down
Threshold: < 1 for 2 consecutive periods (1 min each)
Action: SNS notify
```

### 3.3 Systemd Failed Units
### 3.2 Systemd Failed Units

Catches: any crash-looping service, not just the ones we know about.
**Would have caught the bedrock-embed-proxy crash-loop immediately.**
@@ -143,7 +134,7 @@ Threshold: > 0 for 1 period (1 min)
Action: SNS notify
```

### 3.4 Bedrock API Reachable
### 3.3 Bedrock API Reachable

Catches: credential expiry, region issues, service disruptions, model access revoked.

@@ -172,8 +163,7 @@ Pushes all Tier 3 custom metrics in a single `put-metric-data` call (batched).
**What it checks:**
1. **OpenClaw instances:** `pgrep -f openclaw-gatewa` — OpenClaw gateway process alive
**Hermes instances:** `pgrep -f hermes` — Hermes agent process alive
2. `systemctl is-active bedrockify` + `curl -sf localhost:8090/` — Bedrockify alive + healthy (required for all agents)
3. `systemctl list-units --failed --no-legend | wc -l` — Failed unit count
2. `systemctl list-units --failed --no-legend | grep -v 'systemd-coredump@' | wc -l` — Failed unit count (excludes transient coredump handler units, which linger in `failed` state after handling any crash)
4. `df --output=pcent / | tail -1` — Root disk percent
5. `free | awk '/Mem/ {printf "%.0f", $3/$2*100}'` — Memory percent
6. Quick Bedrock `InvokeModel` with tiny payload (1 embedding, cached model) — API reachable
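
The local checks in the list above could be wired together roughly as follows. This is a sketch under stated assumptions, not the script from this repository. The function names and the metric names `FailedUnits`, `DiskPercent`, and `MemoryPercent` are hypothetical. The commands being parsed and the `Custom/Loki` namespace are taken from this document. Each parser reads command output on stdin so it can be exercised without systemd or AWS access.

```bash
#!/usr/bin/env bash

# Reads `systemctl list-units --failed --no-legend` output on stdin and prints
# the failed-unit count, skipping transient systemd-coredump@ handler units.
count_failed_units() {
  # grep exits 1 when nothing is selected; mask that for pipefail callers.
  { grep -v 'systemd-coredump@' || true; } | wc -l | tr -d '[:space:]'
}

# Reads `df --output=pcent /` output on stdin and prints the bare number.
disk_percent() {
  tail -1 | tr -dc '0-9'
}

# Reads `free` output on stdin and prints used memory as a whole percent.
memory_percent() {
  awk '/Mem/ {printf "%.0f", $3/$2*100}'
}

# Builds one metric datum as JSON, scoped to the InstanceId dimension.
metric_datum() {
  local name="$1" instance_id="$2" value="$3"
  printf '{"MetricName":"%s","Dimensions":[{"Name":"InstanceId","Value":"%s"}],"Value":%s}' \
    "$name" "$instance_id" "$value"
}

# Joins the data into one JSON array so a single put-metric-data call
# can push all of them (metric names here are placeholders).
metric_batch() {
  local instance_id="$1" failed="$2" disk="$3" mem="$4"
  printf '[%s,%s,%s]' \
    "$(metric_datum FailedUnits "$instance_id" "$failed")" \
    "$(metric_datum DiskPercent "$instance_id" "$disk")" \
    "$(metric_datum MemoryPercent "$instance_id" "$mem")"
}

# Example wiring (commented out because it needs systemd and AWS credentials):
# failed="$(systemctl list-units --failed --no-legend | count_failed_units)"
# disk="$(df --output=pcent / | disk_percent)"
# mem="$(free | memory_percent)"
# aws cloudwatch put-metric-data --namespace Custom/Loki \
#   --metric-data "$(metric_batch "$INSTANCE_ID" "$failed" "$disk" "$mem")"
```

Batching the data into one array keeps the push to a single API call per run, which matches the "single `put-metric-data` call" described for the script.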
@@ -248,7 +238,6 @@ Provides a single-pane view of all alarms, service health, compute resources, ne
"arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-instance-status-check-failed",
"arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-openclaw-down",
"arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-hermes-down",
"arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-bedrockify-down",
"arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-bedrock-unreachable",
"arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-failed-units",
"arn:aws:cloudwatch:us-east-1:ACCOUNT_ID:alarm:loki-cpu-high",
@@ -284,7 +273,6 @@ Provides a single-pane view of all alarms, service health, compute resources, ne
"properties": {
"title": "⚡ Bedrockify",
"metrics": [
[ "Custom/Loki", "BedrockifyAlive", "InstanceId", "INSTANCE_ID", { "label": "Bedrockify Alive", "color": "#1f77b4" } ]
],
"view": "timeSeries", "stacked": false, "region": "us-east-1",
"period": 60, "stat": "Minimum",
238 changes: 0 additions & 238 deletions bootstraps/essential/BOOTSTRAP-MEMORY-SEARCH.md

This file was deleted.

1 change: 0 additions & 1 deletion deploy/README.md
@@ -85,7 +85,6 @@ These set up security baselines, coding guidelines, MCP tools, memory search, an
| `BOOTSTRAP-SECURITY.md` | Security hardening + AWS Budgets alerts |
| `BOOTSTRAP-SKILLS.md` | Installs AWS infrastructure skills |
| `BOOTSTRAP-MCPORTER.md` | Sets up MCP server tooling |
| `BOOTSTRAP-MEMORY-SEARCH.md` | Enables semantic memory search via Bedrock embeddings |
| `BOOTSTRAP-CODING-GUIDELINES.md` | Coding standards and project conventions |
| `BOOTSTRAP-SECRETS-AWS.md` | AWS Secrets Manager integration |
| `BOOTSTRAP-PLAYWRIGHT.md` | Browser automation via Playwright MCP |