Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 76 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,9 @@ aicodinggym configure --user-id USER_ID [--workspace-dir DIR]
|---|---|---|
| `--user-id` | Yes | Your AI Coding Gym user ID |
| `--workspace-dir` | No | Default workspace directory (default: cwd) |
| `--upload-logs` / `--no-upload-logs` | No | Pre-set consent for uploading de-identified AI session logs for research. If unset, you're asked once on your first submit. |

On first run, if the [Entire](https://entire.io) CLI isn't installed, `configure` offers to install it (used for [AI workflow logging](#ai-workflow-logging-entire)).

---

Expand All @@ -72,6 +75,8 @@ aicodinggym swe submit PROBLEM_ID [--message MSG] [--force] [--user-id ID] [--wo
|---|---|
| `--message, -m` | Commit message (auto-generated if omitted) |
| `--force` | Force push with `--force-with-lease` |
| `--upload-logs` / `--no-upload-logs` | Override consent for uploading de-identified AI session logs |
| `--logs-remote` | Git URL to push AI session logs to (defaults to this problem's repo) |

#### `aicodinggym swe test PROBLEM_ID`

Expand Down Expand Up @@ -137,6 +142,8 @@ aicodinggym mle submit COMPETITION_ID -F FILE [--user-id ID] [--message MSG]
|---|---|---|
| `-F` | Yes | Path to prediction CSV file |
| `--message, -m` | No | Submission description |
| `--upload-logs` / `--no-upload-logs` | No | Override consent for uploading de-identified AI session logs |
| `--logs-remote` | No | Git URL to push AI session logs to (defaults to your submission repo) |

---

Expand Down Expand Up @@ -169,6 +176,65 @@ echo "My review" | aicodinggym cr submit PROBLEM_ID
| `-f, --file` | Path to a file containing your review (e.g. `review.md`) |
| `-m, --message` | Inline review text |
| stdin | Pipe review text from stdin |
| `--upload-logs` / `--no-upload-logs` | Override consent for uploading de-identified AI session logs |
| `--logs-remote` | Git URL to push AI session logs to (defaults to your submission repo) |

---

## AI Workflow Logging (Entire)

AI Coding Gym can optionally capture **how** a solution was produced — the AI
agent prompts, responses, tool calls and files touched — for research. This is
powered by the [Entire](https://entire.io) CLI, which hooks into git and stores
sessions on a separate `entire/checkpoints/v1` branch (it never adds commits to
your working branch).

**Consent first.** Capture is **local only** until you opt in. The first time
you `submit` (unless already configured), you're asked once:

> AI Coding Gym can upload this AI coding session (prompts, responses, files
> changed) for research only. Data is de-identified/anonymized before use.

Your choice is saved in `~/.aicodinggym/config.json`. Set it ahead of time with
`aicodinggym configure --upload-logs` / `--no-upload-logs`, or override per
submit with `--upload-logs` / `--no-upload-logs`. In non-interactive sessions
with no recorded choice, nothing is uploaded.

**One repo, many branches.** All three benchmarks log to your **single** repo
(the writable per-user repo — one repo, many branches), so a log is identified
by its branch name even though the repo holds many problems. SWE `fetch` records
that repo URL (`submission_repo_url`) so CR/MLE reuse it; the cloned CR PR repo
is read-only and is never used as a target. Override with `--logs-remote` or the
`AICODINGGYM_LOGS_REMOTE` env var.

**Nothing is overwritten.** Every submission gets its own unique branch
(`…/<submission-id>`, a UTC timestamp + random suffix). Re-submitting the same
problem — or submitting it from a different directory or machine — adds a new
branch and never deletes or clobbers a previous one.

**How it works**

1. `swe fetch` / `cr fetch` / `mle download` installs Entire's hooks so your
session is captured locally as you work. (MLE workspaces aren't git repos, so
a lightweight one is initialised — this also lets your solution code be
pushed on submit. The dataset under `data/` plus common model/checkpoint/cache
artifacts are gitignored.)
If the `entire` CLI isn't installed, you're pointed at `aicodinggym configure`
(which offers to install it) instead of being silently skipped.
2. On `submit`, after consent, the captured `entire/checkpoints/v1` branch is
pushed to your repo on a unique per-submission branch
`aicodinggym-logs/<benchmark>/<problem_id>/<submission-id>`, with an
`aicodinggym-meta.json` file at the tip (problem id, benchmark, user, tool,
submission id, timestamp).
3. **MLE also pushes your solution code** (notebooks/scripts/CSV; dataset and
heavy artifacts excluded) to `<competition_id>/<submission-id>`, gated by the
same consent. The two share a submission id so code and logs correlate. The
prediction CSV still goes to the scoring API as before.

**Privacy.** Uploaded data is used solely for research and is
de-identified/anonymized before use. Entire also redacts detected secrets when
writing to the checkpoints branch (best-effort). Logging is entirely optional —
if the `entire` binary isn't installed, fetch/submit work unchanged.

---

Expand All @@ -181,14 +247,23 @@ aicodinggym-cli/
├── config.py # Config + credentials persistence (~/.aicodinggym/)
├── api.py # HTTP client for aicodinggym.com/api
├── git_ops.py # SSH key generation, git clone/commit/push/reset
├── entire_logging.py # Optional AI-session capture/upload via the Entire CLI
├── tests/ # pytest suite (logging, consent, remote resolution)
└── pyproject.toml # Package metadata and build config
```

## Development

```bash
pip install -e ".[dev]" # installs pytest
pytest # run the test suite
```

## Configuration Files

| File | Purpose |
|---|---|
| `~/.aicodinggym/config.json` | Global config (user_id, repo_name, key path, workspace) |
| `~/.aicodinggym/config.json` | Global config (user_id, repo_name, key path, workspace, log-upload consent, submission repo URL) |
| `~/.aicodinggym/credentials.json` | Per-problem credentials (repo_url, branch, cached after fetch) |
| `~/.aicodinggym/{user_id}_id_rsa` | SSH private key |
| `~/.aicodinggym/{user_id}_id_rsa.pub` | SSH public key |
Expand Down
2 changes: 1 addition & 1 deletion __init__.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
"""AI Coding Gym CLI."""

__version__ = "0.3.0"
__version__ = "0.6.0"
14 changes: 9 additions & 5 deletions api.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,12 +102,16 @@ def cr_submit_review(user_id: str, problem_id: str, review: str) -> dict:
})


def mlebench_download_info(user_id: str, competition_id: str, dest_path: str) -> None:
"""Download dataset for an MLE-bench competition directly to dest_path."""
def mlebench_download_open(competition_id: str, chunk_size: int = 1 << 16):
"""Open a streaming download for an MLE-bench dataset.

Returns (total_bytes, chunk_iterator). ``total_bytes`` is the server's
Content-Length, or 0 if it isn't advertised. Lets the caller drive a
progress bar while writing chunks to disk.
"""
resp = _get(f"competitions/{competition_id}/download", stream=True)
with open(dest_path, "wb") as f:
for chunk in resp.iter_content(chunk_size=8192):
f.write(chunk)
total = int(resp.headers.get("Content-Length") or 0)
return total, resp.iter_content(chunk_size=chunk_size)


def mlebench_download_file(url: str, dest_path: str, timeout: int = 300) -> None:
Expand Down
Loading