Remove memory-management bootstrap + fix alarm false positives#61
Conversation
Shows what's collected (OS/arch/duration only), how to opt out (touch ~/.lowkey/telemetry-off or LOWKEY_TELEMETRY=0), and links to full privacy details.
- Delete BOOTSTRAP-MEMORY-SEARCH.md (bedrockify-backed semantic memory search) - Drop BedrockifyAlive/loki-bedrockify-down references from BOOTSTRAP-ALARMS (metric was never published by the health-check script -> alarm stuck in INSUFFICIENT_DATA / flapped, pure false positive) - Renumber tier-3 sections after removal - Add explicit 'rebind on instance replacement' warning: custom alarms are scoped to InstanceId dimension; when the EC2 box is replaced the alarms must be redeployed against the new id or they sit in INSUFFICIENT_DATA forever. Recommend TreatMissingData=missing to avoid spurious paging. Live fix applied alongside this commit: - Rebound 9 loki-* alarms from i-04d527003d0a094ff (old) to i-0229529f514ef6fd7 (current) with TreatMissingData=missing. - Deleted obsolete loki-bedrockify-down alarm.
…its count systemd-coredump@<uid>-<pid>-<n>.service units are one-shot transient units that systemd spawns to handle a coredump and then leaves in 'failed' state after exit. They are not real service failures but they inflate the FailedUnits metric and cause loki-failed-units to fire on any box that has recently dumped a core. Patch the documented health-check command to grep them out. (Live fix also applied to /usr/local/bin/loki-health-check.sh on the current instance.)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 124c966868
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -1,238 +0,0 @@ | |||
| # BOOTSTRAP-MEMORY-SEARCH.md — Enable Semantic Memory Search with Bedrock Embeddings | |||
There was a problem hiding this comment.
Remove dangling memory-search bootstrap references
Deleting this bootstrap file leaves onboarding docs pointing users to a file that no longer exists: README.md still lists BOOTSTRAP-MEMORY-SEARCH as an essential bootstrap (line 151), so first-time setup flows now direct people to run a missing step. Please either keep a replacement bootstrap at this path or update the remaining docs that reference it.
Useful? React with 👍 / 👎.
Changes
Removed
bootstraps/essential/BOOTSTRAP-MEMORY-SEARCH.md— memory-management (semantic search via bedrockify embeddings) bootstrapBedrockifyAlive/loki-bedrockify-downalarm references — metric was never actually published by the health-check script, so the alarm sat inINSUFFICIENT_DATA/ flapped (pure false positive)Added
InstanceIddimension; when the EC2 box is replaced, alarms must be redeployed against the new id or they stay in INSUFFICIENT_DATA forever.TreatMissingData=missingon custom alarms.Also included (the prior commit on this branch)
docs: add telemetry opt-out to Step 1 install sectionLive fix already applied to the management account
The root cause of the alarm false-positives Roy saw: all 9
loki-*custom alarms were still scoped to old instance idi-04d527003d0a094ff, which was replaced on May 4 ~22:30 UTC. Metrics for that id went cold → INSUFFICIENT_DATA forever. The current instance isi-0229529f514ef6fd7and the health-check script has been publishing correctly for it all along — nothing was watching the live metrics.Already done live (non-IaC, to stop the noise immediately):
loki-*alarms toi-0229529f514ef6fd7withTreatMissingData=missingloki-bedrockify-downalarmThis PR brings the bootstrap docs in line with the live state.