A full ML‑lifecycle platform built as 10 Go microservices: a hands‑on study of production microservices patterns (saga, CQRS, event sourcing, outbox, circuit breaker, choreography, streaming drift detection) on Kubernetes, with gRPC for synchronous calls and NATS JetStream for asynchronous events. The lifecycle runs as a closed loop (serve → monitor → retrain) and ships via GitOps.
Domain as a vehicle. Forgepoint models an MLOps platform (train → register → deploy → serve → monitor → retrain), but the ML parts are deliberately thin (pre‑trained CPU ONNX models, no GPU). The real subject is distributed‑systems engineering: one well‑known pattern implemented properly per service, instrumented and deployed the way you would in production.
External Clients → API Gateway → Services (gRPC) ⇄ NATS JetStream (events) → Infrastructure
│
Postgres (per service) · Redis · MinIO · OTel → Prometheus/Tempo/Loki
Each service follows Clean Architecture (handler → domain → repository) with a framework‑free domain layer, owns its own database, and communicates via gRPC (sync) and NATS JetStream event envelopes (async).
| Service | Pattern | Data store |
|---|---|---|
| Auth / IAM | Centralized auth, JWT, RBAC | PostgreSQL |
| Model Registry | CQRS (write Postgres, read Redis) | PostgreSQL + Redis |
| Inference Gateway | Circuit breaker, rate limiting, traffic splitting | Redis |
| Pipeline Orchestrator | Saga (orchestration + compensation), DAG execution | PostgreSQL |
| Feature Store | Event sourcing (append‑only log, materialized views) | PostgreSQL + Redis |
| Experiment Tracker | Event‑driven async batch ingestion | PostgreSQL |
| Billing / Usage | Outbox pattern (reliable event publishing) | PostgreSQL |
| Notification | Choreography (pure event reactor) | PostgreSQL |
| Model Serving | Sidecar, HPA on custom metrics | In‑memory (ONNX) |
| Model Monitor | Streaming drift detection + closed‑loop retrain trigger | PostgreSQL + Redis |
A Backend‑for‑Frontend (BFF) + web UI and an fp CLI sit in front of these; the platform is delivered via GitOps (ArgoCD), secured with policy-as-code + a signed supply chain, and autoscaled with KEDA. See the ADRs — BFF, GitOps, closed-loop Model Monitor.
This is an actively‑built project. The shared foundation (pkg/) is complete and tested; the services are being built one at a time (depth over breadth).
| Area | State |
|---|---|
| pkg/ shared libraries | ✅ Built & tested (build · vet · -race) |
| Local infra (docker‑compose) | ✅ NATS, Postgres, Redis, MinIO, Prometheus, Tempo, Grafana, Loki |
| Proto tooling (Buf) | ✅ Lint + codegen wired |
| Services (auth → …) | 🚧 In progress, per the implementation plan |
See the implementation plan for the phased roadmap (Core vs Stretch).
Reusable building blocks every service imports — designed to be read and explained, not just used:
- grpcutil: server factory with the full interceptor chain (recovery → logging → auth) for both unary and streaming, OpenTelemetry tracing via otelgrpc, graceful shutdown, and readiness‑aware health.
- natsutil: JetStream publisher/subscriber with standard event envelopes, DLQ, consumer‑side idempotency, and trace/correlation propagation across the async hop.
- observability: single‑call OpenTelemetry setup (traces + metrics + trace‑correlated logs), with OTLP exporters that fall back to stdout for local dev.
- config: zero‑dependency env‑var loader (struct tags, defaults, required, durations, slices).
- health: /healthz + /readyz with concurrent, timeout‑bounded dependency checks.
- testutil: testcontainers helpers + in‑process (bufconn) gRPC test server.
Go · gRPC + Buf · NATS JetStream · PostgreSQL · Redis · MinIO · Docker (distroless) · Kubernetes (Kind → EKS) · Helm · Skaffold · OpenTelemetry → Prometheus / Grafana / Loki / Tempo · Istio · Terraform · GitHub Actions · k6 · testcontainers.
Prerequisites: Go 1.26+, Docker, make.
# Start local infrastructure (NATS, Postgres, Redis, MinIO, monitoring)
make up
# Run the shared-library tests (race detector on)
cd pkg && go test -race ./...
# Tear down
make down

Common targets (see the Makefile): make proto, make build SVC=<svc>, make test, make lint, make up / make down.
proto/ Source of truth for all APIs (Buf)
gen/go/ Generated proto code (do not edit)
pkg/ Shared libraries (built & tested)
services/ The 10 microservices (in progress)
deploy/ Helm, K8s manifests, Terraform, Skaffold
docs/
docs/plans/ Platform design + phased implementation plan
docs/adr/ Architecture Decision Records
docker-compose.yaml Local infrastructure
- Platform design — architecture, services, communication, infra.
- Implementation plan — phased roadmap with Core/Stretch tiers.
- Architecture Decision Records — non‑obvious decisions and their tradeoffs.
MIT © 2026 Abdul Basit Sajid