Openarc studio endpoints and refactor of the endpoints #91

mrelmida wants to merge 7 commits into SearchSavior:main
Conversation
Pull request overview
Adds OpenArc Studio–oriented management endpoints alongside an OpenAI-compatible /v1 API by refactoring server routes into dedicated FastAPI routers, and introduces optional hardware telemetry + model downloading support (including an Intel GPU metrics extension).
Changes:
- Refactors API handlers out of `src/server/main.py` into `src/server/routes/openai.py` and `src/server/routes/openarc.py`, wired via `include_router`.
- Adds OpenArc management endpoints for metrics, local model config listing/updating, and Hugging Face model download lifecycle (start/pause/resume/cancel/list).
- Introduces an optional `gpu_metrics` pybind11/Level Zero Sysman extension and bumps the project version/dependencies.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| test_gpu_metrics.py | Adds a GPU metrics “test” script (currently structured in a way that can break pytest runs). |
| src/server/routes/openarc.py | New /openarc/* endpoints for load/unload/status, metrics, local model config management, downloader, and bench. |
| src/server/routes/openai.py | New /v1/* OpenAI-compatible endpoints (models/chat/completions/audio/embeddings/rerank) + tool-call parsing helpers. |
| src/server/routes/`__init__.py` | Routes package init (empty). |
| src/server/models/requests_management.py | Adds Pydantic request models for downloader actions. |
| src/server/main.py | Refactors server to middleware + lifespan + router inclusion; removes inline endpoint implementations. |
| src/server/downloader.py | Adds async downloader service with pause/resume/cancel and filesystem path validation. |
| src/server/deps.py | Centralizes shared registries (_registry, _workers) and API-key verification dependency. |
| src/cli/modules/launch_server.py | Updates logged endpoint list to include new OpenArc management routes. |
| pyproject.toml | Version bump to 2.0.4 and adds pybind11 dependency. |
| gpu-metrics/setup.py | Adds setuptools build for the gpu_metrics extension (contains a small typo). |
| gpu-metrics/pyproject.toml | Adds build metadata for the gpu_metrics subproject. |
| gpu-metrics/gpu_metrics.cpp | Implements Intel GPU telemetry via Level Zero Sysman and exposes get_gpu_metrics() via pybind11. |
```python
if path:
    target_path = Path(path)
else:
    target_path = Path.home() / ".cache" / "openarc" / "models"
```
The path query parameter allows enumerating arbitrary directories on the server filesystem. This can unintentionally disclose directory structure and any openarc.json contents outside the models cache. Consider either removing the override, or restricting it to subdirectories of the models base path (and/or adding explicit allowlist checks).
Suggested change:
```diff
-if path:
-    target_path = Path(path)
-else:
-    target_path = Path.home() / ".cache" / "openarc" / "models"
+base_models_path = (Path.home() / ".cache" / "openarc" / "models").resolve()
+if path:
+    try:
+        requested_path = Path(path).expanduser().resolve()
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=f"Invalid path: {str(exc)}")
+    if requested_path != base_models_path and base_models_path not in requested_path.parents:
+        raise HTTPException(
+            status_code=400,
+            detail="Path must be within the OpenArc models directory",
+        )
+    target_path = requested_path
+else:
+    target_path = base_models_path
```
```python
except ImportError:
    logging.warning(
        "gpu_metrics module not found. Intel GPU telemetry will be missing."
    )
```
When gpu_metrics isn’t installed, this warning will be emitted on every /openarc/metrics poll, which can spam logs in normal UI usage. Consider logging this only once (e.g., with a module-level flag), or downgrade to debug and include telemetry availability in the response instead.
```python
import traceback
traceback.print_exc()
raise HTTPException(status_code=500, detail=f"Embedding failed: {str(exc)}")
```
These endpoints call traceback.print_exc() in the exception handler, which writes to stderr and can duplicate/garble structured logging in production (and may leak details outside the configured logger). Prefer logger.error(..., exc_info=True) and rely on the app’s exception handling/logging instead of printing directly.
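A sketch of the suggested `exc_info=True` approach; `run_embedding` here is a hypothetical stand-in that always fails, purely to exercise the handler:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_embedding(texts):
    # Hypothetical stand-in for the real embedding call; always fails
    raise ValueError("model not loaded")


def embed(texts):
    try:
        return run_embedding(texts)
    except Exception:
        # exc_info=True attaches the traceback to the log record, so it goes
        # through the configured handlers/formatters instead of raw stderr
        logger.error("Embedding failed", exc_info=True)
        raise
```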
```python
import traceback
traceback.print_exc()
```
traceback.print_exc() here writes directly to stderr; prefer logging with exc_info=True so stack traces go through the configured logger and don’t bypass log formatting/filters.
Suggested change:
```diff
-import traceback
-traceback.print_exc()
+logger.exception("Reranking failed")
```
```python
logger.info(
    f"[bench] model={request.model} input_ids_len={len(request.input_ids)} metrics={metrics}"
)
```
The benchmark logger call uses len(request.input_ids) unconditionally. If the request uses prompt mode (input_ids is None), this will raise a TypeError and turn a valid bench request into a 500. Log input_ids_len only when input_ids is present, otherwise log prompt_len (or guard with or []).
Suggested change:
```diff
-logger.info(
-    f"[bench] model={request.model} input_ids_len={len(request.input_ids)} metrics={metrics}"
-)
+if request.input_ids is not None and len(request.input_ids) > 0:
+    prompt_info = f"input_ids_len={len(request.input_ids)}"
+else:
+    prompt_info = f"prompt_len={len(request.prompt or '')}"
+logger.info(
+    f"[bench] model={request.model} {prompt_info} metrics={metrics}"
+)
```
```python
    or "<tool_" in accumulated_text
)
...
tool_calls = parse_tool_calls(accumulated_text)
```
In the streaming loop, parse_tool_calls(accumulated_text) is called on every token/chunk even before a tool call has started. Since it scans the entire accumulated output each time, this becomes O(n^2) for long generations and can noticeably increase CPU usage. You already track tool_call_started; consider only calling parse_tool_calls after tool_call_started becomes true (or incrementally parse).
Suggested change:
```diff
-tool_calls = parse_tool_calls(accumulated_text)
+tool_calls = (
+    parse_tool_calls(accumulated_text)
+    if tool_call_started
+    else []
+)
```
```python
target_path = Path(req.model_path)
if not target_path.exists() or not target_path.is_dir():
    raise HTTPException(status_code=404, detail="Model directory not found")
...
config_path = target_path / "openarc.json"
```
update_local_model_config allows writing openarc.json into any existing directory path provided by the client (e.g., /etc, other users’ dirs) as long as the server process has permissions. Even with API-key auth this is an unnecessary filesystem write primitive; restrict model_path to a known base directory (e.g., the models cache dir), and consider rejecting symlinks / using a safe-join check.
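A sketch of the safe-join check the comment asks for; `BASE_MODELS_DIR` and `safe_model_path` are illustrative names, not code from the PR:

```python
from pathlib import Path

BASE_MODELS_DIR = (Path.home() / ".cache" / "openarc" / "models").resolve()


def safe_model_path(user_path: str) -> Path:
    # Resolve symlinks and ".." segments first, then require the result
    # to be the models base dir itself or a path underneath it
    candidate = Path(user_path).expanduser().resolve()
    if candidate != BASE_MODELS_DIR and BASE_MODELS_DIR not in candidate.parents:
        raise ValueError("model_path must be inside the models directory")
    return candidate
```

The endpoint would call this before touching `openarc.json`, turning `ValueError` into a 400 response.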
```python
def pause(self, model_name: str) -> bool:
    """Pause an active download. The worker thread blocks between files."""
    task = self.tasks.get(model_name)
    if not task or task.status != "downloading":
        return False
    task.status = "paused"
    task._pause.clear()
    return True

def resume(self, model_name: str) -> bool:
    """Resume a paused download."""
    task = self.tasks.get(model_name)
    if not task or task.status != "paused":
        return False
    task.status = "downloading"
    task._pause.set()
    return True

def cancel(self, model_name: str) -> bool:
    """Cancel an active or paused download."""
    task = self.tasks.get(model_name)
    if not task or task.status in ("completed", "cancelled"):
        return False
    task.status = "cancelled"
    task.completed_at = time.time()
    task._cancel.set()
    task._pause.set()  # Unblock thread so it can exit
    if task.task:
        task.task.cancel()
    return True

def list_tasks(self) -> List[Dict]:
    self._cleanup_stale()
    return [
        {
            "model_name": t.model_name,
            "status": t.status,
            "progress": round(t.progress, 2),
            "total_bytes": t.total_bytes,
            "downloaded_bytes": t.downloaded_bytes,
            "total_files": t.total_files,
            "completed_files": t.completed_files,
            "error": t.error_msg,
            "started_at": t.started_at,
            "completed_at": t.completed_at,
        }
        for t in self.tasks.values()
    ]
```
pause, resume, cancel, and list_tasks mutate/iterate self.tasks without any synchronization, while start() uses an asyncio lock. Concurrent requests can mutate the dict during iteration in list_tasks() (raising RuntimeError: dictionary changed size during iteration) or cause inconsistent status updates. Protect all access to self.tasks with a single lock (async or threading) and use it consistently in every method.
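A minimal sketch of the single-lock pattern the comment describes; the class name and fields are assumed, and only two methods are shown:

```python
import threading


class DownloadManager:
    """Every access to self.tasks goes through one lock (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.tasks = {}

    def cancel(self, model_name):
        with self._lock:
            task = self.tasks.get(model_name)
            if not task or task.status in ("completed", "cancelled"):
                return False
            task.status = "cancelled"
            return True

    def list_tasks(self):
        with self._lock:
            # Build the snapshot under the lock so a concurrent start/cancel
            # cannot mutate the dict mid-iteration
            return [
                {"model_name": t.model_name, "status": t.status}
                for t in self.tasks.values()
            ]
```

The same lock would also wrap `start`, `pause`, and `resume` in the real service so status transitions stay consistent.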
```python
app.include_router(openarc_router)
app.include_router(openai_router)
```
src.server.main previously exposed parse_tool_calls and endpoint functions (e.g., openai_chat_completions) that are still imported by existing tests/utilities (e.g., src/tests/test_tool_call_parser_unit.py). With the move to routers, those symbols are no longer available from main, which will break the test suite and any internal imports. Either update the tests to import from src.server.routes.openai, or re-export the moved callables from main for backward compatibility.
```python
    raise HTTPException(
        status_code=404, detail=f"Model '{unload_config.model_name}' not found"
    )
except Exception as exc:
    raise HTTPException(
        status_code=500, detail=f"Failed to unload model: {str(exc)}"
    )
```
unload_model raises an HTTPException(404, ...) when the model isn’t found, but the broad except Exception catches that HTTPException and converts it into a 500. Catch and re-raise HTTPException (or narrow the exception handler) so 404 responses aren’t masked as internal errors.
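A sketch of the re-raise pattern; the function is illustrative, and a minimal `HTTPException` stand-in is defined here so the snippet is self-contained (the real code uses `fastapi.HTTPException`):

```python
class HTTPException(Exception):
    # Minimal stand-in for fastapi.HTTPException, for a self-contained sketch
    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def unload_model_sketch(found: bool):
    try:
        if not found:
            raise HTTPException(status_code=404, detail="Model not found")
        # ... actual unload work would go here ...
    except HTTPException:
        raise  # let the framework return the intended status code
    except Exception as exc:
        raise HTTPException(status_code=500, detail=f"Failed to unload model: {exc}")
```

With the bare `except HTTPException: raise` clause first, the 404 reaches the client instead of being rewrapped as a 500.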
This is the semi-final draft of all endpoints. We can obviously rework some elements, but as far as I tested, all features work.
Things I think still need to be done for OpenArc Studio