
Openarc studio endpoints and refactor of the endpoints #91

Open
mrelmida wants to merge 7 commits into SearchSavior:main from mrelmida:openarc-studio-endpoints

Conversation

@mrelmida

This is the semi-final draft of all the endpoints. We can obviously rework some elements, but as far as I have tested, all features work.

Things I think need to be done for OpenArc Studio:

  • Find a way to stop the periodic requests to the information endpoints from spamming the logs.
  • Implement a complete conf/yaml/json file for each model that saves its settings, so you only need to provide a name (and maybe other custom information) but don't need to set the engine type and other "static" values.

Copilot AI review requested due to automatic review settings April 25, 2026 14:01
@mrelmida changed the title from "Openarc studio endpoints" to "Openarc studio endpoints and refactor of the endpoints" on Apr 25, 2026

Copilot AI left a comment

Pull request overview

Adds OpenArc Studio–oriented management endpoints alongside an OpenAI-compatible /v1 API by refactoring server routes into dedicated FastAPI routers, and introduces optional hardware telemetry + model downloading support (including an Intel GPU metrics extension).

Changes:

  • Refactors API handlers out of src/server/main.py into src/server/routes/openai.py and src/server/routes/openarc.py, wired via include_router.
  • Adds OpenArc management endpoints for metrics, local model config listing/updating, and Hugging Face model download lifecycle (start/pause/resume/cancel/list).
  • Introduces an (optional) gpu_metrics pybind11/Level Zero Sysman extension and bumps project version/dependencies.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 15 comments.

Show a summary per file:

  • test_gpu_metrics.py: Adds a GPU metrics "test" script (currently structured in a way that can break pytest runs).
  • src/server/routes/openarc.py: New /openarc/* endpoints for load/unload/status, metrics, local model config management, downloader, and bench.
  • src/server/routes/openai.py: New /v1/* OpenAI-compatible endpoints (models/chat/completions/audio/embeddings/rerank) + tool-call parsing helpers.
  • src/server/routes/__init__.py: Routes package init (empty).
  • src/server/models/requests_management.py: Adds Pydantic request models for downloader actions.
  • src/server/main.py: Refactors server to middleware + lifespan + router inclusion; removes inline endpoint implementations.
  • src/server/downloader.py: Adds async downloader service with pause/resume/cancel and filesystem path validation.
  • src/server/deps.py: Centralizes shared registries (_registry, _workers) and API-key verification dependency.
  • src/cli/modules/launch_server.py: Updates logged endpoint list to include new OpenArc management routes.
  • pyproject.toml: Version bump to 2.0.4 and adds pybind11 dependency.
  • gpu-metrics/setup.py: Adds setuptools build for the gpu_metrics extension (contains a small typo).
  • gpu-metrics/pyproject.toml: Adds build metadata for the gpu_metrics subproject.
  • gpu-metrics/gpu_metrics.cpp: Implements Intel GPU telemetry via Level Zero Sysman and exposes get_gpu_metrics() via pybind11.


Comment on lines +96 to +99

```python
if path:
    target_path = Path(path)
else:
    target_path = Path.home() / ".cache" / "openarc" / "models"
```

Copilot AI Apr 25, 2026

The path query parameter allows enumerating arbitrary directories on the server filesystem. This can unintentionally disclose directory structure and any openarc.json contents outside the models cache. Consider either removing the override, or restricting it to subdirectories of the models base path (and/or adding explicit allowlist checks).

Suggested change

```python
# Before:
if path:
    target_path = Path(path)
else:
    target_path = Path.home() / ".cache" / "openarc" / "models"

# After:
base_models_path = (Path.home() / ".cache" / "openarc" / "models").resolve()
if path:
    try:
        requested_path = Path(path).expanduser().resolve()
    except Exception as exc:
        raise HTTPException(status_code=400, detail=f"Invalid path: {str(exc)}")
    if requested_path != base_models_path and base_models_path not in requested_path.parents:
        raise HTTPException(
            status_code=400,
            detail="Path must be within the OpenArc models directory",
        )
    target_path = requested_path
else:
    target_path = base_models_path
```

Comment on lines +211 to +214

```python
except ImportError:
    logging.warning(
        "gpu_metrics module not found. Intel GPU telemetry will be missing."
    )
```

Copilot AI Apr 25, 2026

When gpu_metrics isn’t installed, this warning will be emitted on every /openarc/metrics poll, which can spam logs in normal UI usage. Consider logging this only once (e.g., with a module-level flag), or downgrade to debug and include telemetry availability in the response instead.
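One way to implement the log-once suggestion (a sketch; the flag and helper names are made up, not code from this PR):

```python
import logging

logger = logging.getLogger("openarc.metrics")
_gpu_metrics_warned = False  # module-level flag: warn only once per process

def warn_gpu_metrics_missing() -> None:
    """Emit the missing-telemetry warning at most once, no matter how
    often the metrics endpoint is polled."""
    global _gpu_metrics_warned
    if not _gpu_metrics_warned:
        logger.warning("gpu_metrics module not found. Intel GPU telemetry will be missing.")
        _gpu_metrics_warned = True
```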

Comment on lines +668 to +670

```python
import traceback
traceback.print_exc()
raise HTTPException(status_code=500, detail=f"Embedding failed: {str(exc)}")
```

Copilot AI Apr 25, 2026

These endpoints call traceback.print_exc() in the exception handler, which writes to stderr and can duplicate/garble structured logging in production (and may leak details outside the configured logger). Prefer logger.error(..., exc_info=True) and rely on the app’s exception handling/logging instead of printing directly.
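The recommended pattern, roughly (the logger name and the stand-in function body are placeholders, not this PR's code):

```python
import logging

logger = logging.getLogger("openarc.openai")

def embed_or_raise(payload):
    try:
        return payload["input"]  # stand-in for the real embedding work
    except Exception as exc:
        # exc_info=True attaches the full stack trace to the log record,
        # so it flows through the configured handlers/formatters instead
        # of being printed raw to stderr by traceback.print_exc().
        logger.error("Embedding failed: %s", exc, exc_info=True)
        raise
```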

Comment on lines +717 to +718

```python
import traceback
traceback.print_exc()
```

Copilot AI Apr 25, 2026

traceback.print_exc() here writes directly to stderr; prefer logging with exc_info=True so stack traces go through the configured logger and don’t bypass log formatting/filters.

Suggested change

```python
# Before:
import traceback
traceback.print_exc()

# After:
logger.exception("Reranking failed")
```

Comment on lines +323 to +324

```python
logger.info(
    f"[bench] model={request.model} input_ids_len={len(request.input_ids)} metrics={metrics}"
```

Copilot AI Apr 25, 2026

The benchmark logger call uses len(request.input_ids) unconditionally. If the request uses prompt mode (input_ids is None), this will raise a TypeError and turn a valid bench request into a 500. Log input_ids_len only when input_ids is present, otherwise log prompt_len (or guard with or []).

Suggested change

```python
# Before:
logger.info(
    f"[bench] model={request.model} input_ids_len={len(request.input_ids)} metrics={metrics}"

# After:
if request.input_ids is not None and len(request.input_ids) > 0:
    prompt_info = f"input_ids_len={len(request.input_ids)}"
else:
    prompt_info = f"prompt_len={len(request.prompt or '')}"
logger.info(
    f"[bench] model={request.model} {prompt_info} metrics={metrics}"
```

```python
    or "<tool_" in accumulated_text
)

tool_calls = parse_tool_calls(accumulated_text)
```

Copilot AI Apr 25, 2026

In the streaming loop, parse_tool_calls(accumulated_text) is called on every token/chunk even before a tool call has started. Since it scans the entire accumulated output each time, this becomes O(n^2) for long generations and can noticeably increase CPU usage. You already track tool_call_started; consider only calling parse_tool_calls after tool_call_started becomes true (or incrementally parse).

Suggested change

```python
# Before:
tool_calls = parse_tool_calls(accumulated_text)

# After:
tool_calls = (
    parse_tool_calls(accumulated_text)
    if tool_call_started
    else []
)
```

Comment on lines +70 to +75

```python
target_path = Path(req.model_path)
if not target_path.exists() or not target_path.is_dir():
    raise HTTPException(status_code=404, detail="Model directory not found")

config_path = target_path / "openarc.json"
```

Copilot AI Apr 25, 2026

update_local_model_config allows writing openarc.json into any existing directory path provided by the client (e.g., /etc, other users’ dirs) as long as the server process has permissions. Even with API-key auth this is an unnecessary filesystem write primitive; restrict model_path to a known base directory (e.g., the models cache dir), and consider rejecting symlinks / using a safe-join check.
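A safe-join check along those lines might look like this (a sketch: `BASE_DIR` and the helper name are assumptions, and `Path.is_relative_to` requires Python 3.9+):

```python
from pathlib import Path

BASE_DIR = Path.home() / ".cache" / "openarc" / "models"

def safe_model_path(user_path: str) -> Path:
    """Resolve a client-supplied path relative to the models base directory
    and reject anything (absolute paths, ../ tricks, symlink escapes) that
    resolves outside it."""
    base = BASE_DIR.resolve()
    candidate = (base / user_path).resolve()  # resolve() also follows symlinks
    if not candidate.is_relative_to(base):
        raise ValueError(f"model_path escapes the models directory: {user_path}")
    return candidate
```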

Comment thread src/server/downloader.py
Comment on lines +84 to +132

```python
def pause(self, model_name: str) -> bool:
    """Pause an active download. The worker thread blocks between files."""
    task = self.tasks.get(model_name)
    if not task or task.status != "downloading":
        return False
    task.status = "paused"
    task._pause.clear()
    return True

def resume(self, model_name: str) -> bool:
    """Resume a paused download."""
    task = self.tasks.get(model_name)
    if not task or task.status != "paused":
        return False
    task.status = "downloading"
    task._pause.set()
    return True

def cancel(self, model_name: str) -> bool:
    """Cancel an active or paused download."""
    task = self.tasks.get(model_name)
    if not task or task.status in ("completed", "cancelled"):
        return False
    task.status = "cancelled"
    task.completed_at = time.time()
    task._cancel.set()
    task._pause.set()  # Unblock thread so it can exit
    if task.task:
        task.task.cancel()
    return True

def list_tasks(self) -> List[Dict]:
    self._cleanup_stale()
    return [
        {
            "model_name": t.model_name,
            "status": t.status,
            "progress": round(t.progress, 2),
            "total_bytes": t.total_bytes,
            "downloaded_bytes": t.downloaded_bytes,
            "total_files": t.total_files,
            "completed_files": t.completed_files,
            "error": t.error_msg,
            "started_at": t.started_at,
            "completed_at": t.completed_at,
        }
        for t in self.tasks.values()
    ]
```


Copilot AI Apr 25, 2026

pause, resume, cancel, and list_tasks mutate/iterate self.tasks without any synchronization, while start() uses an asyncio lock. Concurrent requests can mutate the dict during iteration in list_tasks() (raising RuntimeError: dictionary changed size during iteration) or cause inconsistent status updates. Protect all access to self.tasks with a single lock (async or threading) and use it consistently in every method.
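A minimal version of the locking fix (a sketch: it assumes the control methods may run on plain worker threads, hence `threading.Lock`; method bodies are trimmed to show only the synchronization pattern, and the class/field names are illustrative):

```python
import threading
from types import SimpleNamespace

class Downloads:
    """Every read or write of self.tasks happens under one lock."""

    def __init__(self):
        self.tasks = {}
        self._lock = threading.Lock()

    def cancel(self, model_name: str) -> bool:
        with self._lock:
            task = self.tasks.get(model_name)
            if not task or task.status in ("completed", "cancelled"):
                return False
            task.status = "cancelled"
            return True

    def list_tasks(self):
        with self._lock:
            # Build the snapshot inside the lock so a concurrent add/remove
            # cannot change the dict mid-iteration.
            return [{"model_name": t.model_name, "status": t.status}
                    for t in self.tasks.values()]
```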

Comment thread src/server/main.py
Comment on lines +106 to +107

```python
app.include_router(openarc_router)
app.include_router(openai_router)
```

Copilot AI Apr 25, 2026

src.server.main previously exposed parse_tool_calls and endpoint functions (e.g., openai_chat_completions) that are still imported by existing tests/utilities (e.g., src/tests/test_tool_call_parser_unit.py). With the move to routers, those symbols are no longer available from main, which will break the test suite and any internal imports. Either update the tests to import from src.server.routes.openai, or re-export the moved callables from main for backward compatibility.

Comment on lines +49 to +55

```python
    raise HTTPException(
        status_code=404, detail=f"Model '{unload_config.model_name}' not found"
    )
except Exception as exc:
    raise HTTPException(
        status_code=500, detail=f"Failed to unload model: {str(exc)}"
    )
```

Copilot AI Apr 25, 2026

unload_model raises an HTTPException(404, ...) when the model isn’t found, but the broad except Exception catches that HTTPException and converts it into a 500. Catch and re-raise HTTPException (or narrow the exception handler) so 404 responses aren’t masked as internal errors.
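The fix is to let the HTTPException propagate before the broad handler runs. A sketch with a stand-in exception class (the real code would catch and re-raise `fastapi.HTTPException`; the function body here is illustrative):

```python
class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException."""
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

def unload_model(model_name: str, registry: dict) -> dict:
    try:
        if model_name not in registry:
            raise HTTPError(404, f"Model '{model_name}' not found")
        registry.pop(model_name)
        return {"status": "unloaded"}
    except HTTPError:
        raise  # re-raise as-is so the 404 isn't rewrapped as a 500
    except Exception as exc:
        raise HTTPError(500, f"Failed to unload model: {exc}")
```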
