bzdOS/SeMa

sema

AI-friendly semantic markup for source code contracts. Methodology + reference validator + per-language profiles. Origin: bsdOS, extracted 2026-06-13.

License: MIT | Spec version: 1.0.0 | Profiles: 1


What is this?

sema is a methodology for placing structured, natural-language descriptions of what code does and why directly inside source files. The two simultaneous goals:

  • Top-down. The markup gives the LLM a hidden plan before it writes code. When the model reads a function contract before the body, it generates more accurate code, because contract-then-body is the native format of its training distribution.
  • Bottom-up. The markup gives RAG agents stable semantic coordinates for navigating a codebase. No need for vector embeddings to find "the validation function" — the agent finds it by anchor.

The markup does not replace code, does not duplicate code, and does not explain code to humans. It is built into the code and exists to support how an LLM works with context.


Why does this work?

Eight independent foundations (full text in spec/sema.md §2):

  1. Sparse attention. LLM attention is non-uniform. Structured markers give stable anchors.
  2. RAG anchors. Markers survive index rebuilds; embeddings don't.
  3. Scope-lock. START_X / END_X markers make scope explicit and prevent "fix the deliberate decision" bugs.
  4. SFT distribution. Contracts before bodies match the (prompt, response) training template.
  5. Structure recovery. Hierarchical markers recover module structure from flat token streams.
  6. Semantic accumulators. Contracts at the top of files accumulate as the agent reads.
  7. Diff-patch anchoring. START_X / END_X survive rebases; JSON } doesn't.
  8. Distillation. Contracts in code distill "why" knowledge into the artifact itself.

These are not best practices to choose between. They are structural responses to a structural problem (sparse attention, context loss between sessions, regression to the mean). They work because the LLM is what it is, not because of a clever trick.
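To make the anchor idea in foundations 2, 3, and 7 concrete, here is a minimal sketch of looking up a block by name instead of by line number. It assumes anchors appear as the literal tokens START_<NAME> and END_<NAME> somewhere on a comment line; the exact marker syntax is defined in spec/sema.md, and `find_anchor` is a hypothetical helper, not part of this repository.

```rust
/// Hypothetical helper: locate the block delimited by START_<name> .. END_<name>.
/// Returns 1-based (start_line, end_line), or None if the pair is not found.
/// Assumption: a marker is a whole identifier-like token on the line.
fn find_anchor(source: &str, name: &str) -> Option<(usize, usize)> {
    let start_marker = format!("START_{name}");
    let end_marker = format!("END_{name}");

    // True if `marker` appears on the line as a complete token,
    // so START_X does not accidentally match START_XY.
    let has_marker = |line: &str, marker: &str| {
        line.split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .any(|token| token == marker)
    };

    let mut start: Option<usize> = None;
    for (idx, line) in source.lines().enumerate() {
        if start.is_none() && has_marker(line, &start_marker) {
            start = Some(idx + 1);
        } else if let Some(s) = start {
            if has_marker(line, &end_marker) {
                return Some((s, idx + 1));
            }
        }
    }
    None
}
```

Because the lookup key is the anchor name, inserting or deleting lines elsewhere in the file changes the returned line numbers but not whether the block is found.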


What is in this repository?

sema/
├── spec/
│   └── sema.md                 1430 lines, full theory + rule set + golden samples
├── profiles/
│   ├── bsdOS.md                228 lines, operating profile for bsdOS
│   └── README.md               how to write a profile for your project
├── tools/
│   ├── sema-check.sh           71 lines, POSIX sh validator
│   └── README.md               install + CI integration
├── examples/                   golden samples per language
│   ├── rust/
│   └── zig/
├── .github/
│   └── workflows/
│       └── check.yml           CI: run sema-check on PR
├── README.md                   this file
├── CHANGELOG.md
└── LICENSE                     MIT

The 1430-line spec is the source of truth. Profiles adapt it to specific projects (which files to mark, which functions get full vs compressed contracts, naming conventions). The tool validates START_X / END_X anchor pairing.
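As an illustration of what "anchor pairing" means, the following is a rough sketch of a pairing check. It is not the logic of tools/sema-check.sh, which is a POSIX sh script; it assumes markers are the tokens START_<NAME> and END_<NAME>, that names are ASCII alphanumerics and underscores, and that anchors nest like brackets. All names in the block (`AnchorError`, `anchor_name`, `check_anchors`) are hypothetical.

```rust
/// A pairing problem found while scanning a file (hypothetical type).
#[derive(Debug, PartialEq)]
enum AnchorError {
    /// END_<name> appeared while no anchor was open.
    UnmatchedEnd { line: usize, name: String },
    /// END_<found> appeared while the innermost open anchor was <expected>.
    Mismatched { line: usize, expected: String, found: String },
    /// START_<name> was never closed.
    Unclosed { line: usize, name: String },
}

/// Return the anchor name following `prefix` on this line, if any.
/// The line is split into identifier-like tokens so that a marker
/// embedded in a longer word is not picked up.
fn anchor_name<'a>(line: &'a str, prefix: &str) -> Option<&'a str> {
    line.split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .find_map(|token| token.strip_prefix(prefix))
        .filter(|name| !name.is_empty())
}

/// Check that every START_<name> is closed by a matching END_<name>.
/// Open anchors are kept on a stack, so nesting is allowed (an assumption).
fn check_anchors(source: &str) -> Vec<AnchorError> {
    let mut open: Vec<(usize, &str)> = Vec::new();
    let mut errors = Vec::new();

    for (idx, line) in source.lines().enumerate() {
        let line_no = idx + 1;
        if let Some(name) = anchor_name(line, "START_") {
            open.push((line_no, name));
        } else if let Some(name) = anchor_name(line, "END_") {
            match open.pop() {
                Some((_, expected)) if expected == name => {}
                Some((_, expected)) => errors.push(AnchorError::Mismatched {
                    line: line_no,
                    expected: expected.to_string(),
                    found: name.to_string(),
                }),
                None => errors.push(AnchorError::UnmatchedEnd {
                    line: line_no,
                    name: name.to_string(),
                }),
            }
        }
    }

    // Anything still on the stack was opened but never closed.
    for (line, name) in open {
        errors.push(AnchorError::Unclosed { line, name: name.to_string() });
    }
    errors
}
```

An empty result means every anchor in the file is paired. A CI job would typically fail the build when the list is non-empty.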


Quick start

1. Read the spec

# Either
cat spec/sema.md

# Or load it into your agent's context:
# (Claude Code, Cursor, etc. — depends on your tool)

The spec is structured so a human can read it top-to-bottom in 30-40 minutes, or an LLM can read it as rules and apply them.

2. Pick a profile

If you have a profile (e.g., profiles/bsdOS.md), read it. It tells you:

  • which files in YOUR project need contracts
  • which functions get full vs compressed forms
  • naming conventions for anchors
  • the project's tolerance for mass-annotation

If you don't have a profile, follow the spec's "Minimum Contract Fields" (§3) and write a profile as you go.

3. Install the validator

cp tools/sema-check.sh /usr/local/bin/sema-check
chmod +x /usr/local/bin/sema-check

Or invoke it directly:

./tools/sema-check.sh

Add it to your CI (see .github/workflows/check.yml for an example).

4. Annotate your code

Annotation-first. Before you write a function, write its contract. The contract says what the function does, what it takes, what it returns, and what side effects it has. Then write the body.

// function_name:start
//   purpose: ...
//   input: ...
//   output: ...
//   sideEffects: ...
fn function_name(...) -> ... { ... }
// function_name:end

Or for trivial helpers:

// CONTRACT: parse → validate → store
fn helper() { ... }

Or for one-liners:

// ensureThreadState: creates thread if missing
fn ensure_thread_state() { ... }

The spec §6 explains when to use which form.
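For a concrete picture of the full form, here is a small filled-in example. The function `parse_port` and the wording of its fields are invented for illustration; they are not taken from the spec or from bsdOS, and the spec may require more detail per field.

```rust
// parse_port:start
//   purpose: turn user-supplied text into a usable TCP port number
//   input: raw text, possibly with surrounding whitespace
//   output: Some(port) in 1..=65535, or None if the text is not a valid port
//   sideEffects: none
fn parse_port(raw: &str) -> Option<u16> {
    match raw.trim().parse::<u16>() {
        // Port 0 parses as a u16 but is rejected, as the contract states.
        Ok(0) | Err(_) => None,
        Ok(port) => Some(port),
    }
}
// parse_port:end
```

The contract was written first, so the decision to reject port 0 is recorded in the `output` field before any code exists, and the body only has to satisfy it.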


Adapting to your project

The spec is language-agnostic. The bsdOS profile is for one specific Rust + Zig codebase. To use sema in YOUR project:

  1. Read spec/sema.md §6 (operational rules).
  2. Read profiles/README.md (how to write a profile).
  3. Write your profile in profiles/<your-project>.md.
  4. Add golden samples in examples/<your-language>/ from your codebase.
  5. Run the validator in CI.

This repository accepts profiles via PR. See CONTRIBUTING.md for the contribution flow.


Origin

This methodology was developed as part of the bsdOS project (privacy-first mobile OS on FreeBSD 15.1, June 2026). The bsdOS profile (profiles/bsdOS.md) is the original and most-tested application. After 100% of bsdOS source files (97/97) reached full contract coverage in June 2026, the methodology was extracted to a standalone repository.

Citation: see CITATION.cff (TODO: add before v1.0 publish).


License

MIT. See LICENSE.

