Openarc studio endpoints and refactor of the endpoints #91

mrelmida wants to merge 7 commits into SearchSavior:main
Conversation
Pull request overview
Adds OpenArc Studio–oriented management endpoints alongside an OpenAI-compatible /v1 API by refactoring server routes into dedicated FastAPI routers, and introduces optional hardware telemetry + model downloading support (including an Intel GPU metrics extension).
Changes:
- Refactors API handlers out of `src/server/main.py` into `src/server/routes/openai.py` and `src/server/routes/openarc.py`, wired via `include_router`.
- Adds OpenArc management endpoints for metrics, local model config listing/updating, and Hugging Face model download lifecycle (start/pause/resume/cancel/list).
- Introduces an optional `gpu_metrics` pybind11/Level Zero Sysman extension and bumps the project version/dependencies.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| test_gpu_metrics.py | Adds a GPU metrics “test” script (currently structured in a way that can break pytest runs). |
| src/server/routes/openarc.py | New /openarc/* endpoints for load/unload/status, metrics, local model config management, downloader, and bench. |
| src/server/routes/openai.py | New /v1/* OpenAI-compatible endpoints (models/chat/completions/audio/embeddings/rerank) + tool-call parsing helpers. |
| src/server/routes/`__init__.py` | Routes package init (empty). |
| src/server/models/requests_management.py | Adds Pydantic request models for downloader actions. |
| src/server/main.py | Refactors server to middleware + lifespan + router inclusion; removes inline endpoint implementations. |
| src/server/downloader.py | Adds async downloader service with pause/resume/cancel and filesystem path validation. |
| src/server/deps.py | Centralizes shared registries (_registry, _workers) and API-key verification dependency. |
| src/cli/modules/launch_server.py | Updates logged endpoint list to include new OpenArc management routes. |
| pyproject.toml | Version bump to 2.0.4 and adds pybind11 dependency. |
| gpu-metrics/setup.py | Adds setuptools build for the gpu_metrics extension (contains a small typo). |
| gpu-metrics/pyproject.toml | Adds build metadata for the gpu_metrics subproject. |
| gpu-metrics/gpu_metrics.cpp | Implements Intel GPU telemetry via Level Zero Sysman and exposes get_gpu_metrics() via pybind11. |
```python
if path:
    target_path = Path(path)
else:
    target_path = Path.home() / ".cache" / "openarc" / "models"
```
The path query parameter allows enumerating arbitrary directories on the server filesystem. This can unintentionally disclose directory structure and any openarc.json contents outside the models cache. Consider either removing the override, or restricting it to subdirectories of the models base path (and/or adding explicit allowlist checks).
Suggested change:
```diff
-if path:
-    target_path = Path(path)
-else:
-    target_path = Path.home() / ".cache" / "openarc" / "models"
+base_models_path = (Path.home() / ".cache" / "openarc" / "models").resolve()
+if path:
+    try:
+        requested_path = Path(path).expanduser().resolve()
+    except Exception as exc:
+        raise HTTPException(status_code=400, detail=f"Invalid path: {str(exc)}")
+    if requested_path != base_models_path and base_models_path not in requested_path.parents:
+        raise HTTPException(
+            status_code=400,
+            detail="Path must be within the OpenArc models directory",
+        )
+    target_path = requested_path
+else:
+    target_path = base_models_path
```
```python
except ImportError:
    logging.warning(
        "gpu_metrics module not found. Intel GPU telemetry will be missing."
    )
```
When gpu_metrics isn’t installed, this warning will be emitted on every /openarc/metrics poll, which can spam logs in normal UI usage. Consider logging this only once (e.g., with a module-level flag), or downgrade to debug and include telemetry availability in the response instead.
```python
import traceback
traceback.print_exc()
raise HTTPException(status_code=500, detail=f"Embedding failed: {str(exc)}")
```
These endpoints call traceback.print_exc() in the exception handler, which writes to stderr and can duplicate/garble structured logging in production (and may leak details outside the configured logger). Prefer logger.error(..., exc_info=True) and rely on the app’s exception handling/logging instead of printing directly.
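A sketch of the suggested `exc_info=True` approach; `run_embedding` here is a hypothetical stand-in that always fails, purely to exercise the handler:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_embedding(texts):
    # Hypothetical stand-in for the real embedding call; always fails
    raise ValueError("model not loaded")


def embed(texts):
    try:
        return run_embedding(texts)
    except Exception:
        # exc_info=True attaches the traceback to the log record, so it goes
        # through the configured handlers/formatters instead of raw stderr
        logger.error("Embedding failed", exc_info=True)
        raise
```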
```python
import traceback
traceback.print_exc()
```
traceback.print_exc() here writes directly to stderr; prefer logging with exc_info=True so stack traces go through the configured logger and don’t bypass log formatting/filters.
Suggested change:
```diff
-import traceback
-traceback.print_exc()
+logger.exception("Reranking failed")
```
```python
logger.info(
    f"[bench] model={request.model} input_ids_len={len(request.input_ids)} metrics={metrics}"
)
```
The benchmark logger call uses len(request.input_ids) unconditionally. If the request uses prompt mode (input_ids is None), this will raise a TypeError and turn a valid bench request into a 500. Log input_ids_len only when input_ids is present, otherwise log prompt_len (or guard with or []).
Suggested change:
```diff
-logger.info(
-    f"[bench] model={request.model} input_ids_len={len(request.input_ids)} metrics={metrics}"
-)
+if request.input_ids is not None and len(request.input_ids) > 0:
+    prompt_info = f"input_ids_len={len(request.input_ids)}"
+else:
+    prompt_info = f"prompt_len={len(request.prompt or '')}"
+logger.info(
+    f"[bench] model={request.model} {prompt_info} metrics={metrics}"
+)
```
```python
    or "<tool_" in accumulated_text
)
...
tool_calls = parse_tool_calls(accumulated_text)
```
In the streaming loop, parse_tool_calls(accumulated_text) is called on every token/chunk even before a tool call has started. Since it scans the entire accumulated output each time, this becomes O(n^2) for long generations and can noticeably increase CPU usage. You already track tool_call_started; consider only calling parse_tool_calls after tool_call_started becomes true (or incrementally parse).
Suggested change:
```diff
-tool_calls = parse_tool_calls(accumulated_text)
+tool_calls = (
+    parse_tool_calls(accumulated_text)
+    if tool_call_started
+    else []
+)
```
```python
target_path = Path(req.model_path)
if not target_path.exists() or not target_path.is_dir():
    raise HTTPException(status_code=404, detail="Model directory not found")
...
config_path = target_path / "openarc.json"
```
update_local_model_config allows writing openarc.json into any existing directory path provided by the client (e.g., /etc, other users’ dirs) as long as the server process has permissions. Even with API-key auth this is an unnecessary filesystem write primitive; restrict model_path to a known base directory (e.g., the models cache dir), and consider rejecting symlinks / using a safe-join check.
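A sketch of the safe-join check the comment asks for; `BASE_MODELS_DIR` and `safe_model_path` are illustrative names, not code from the PR:

```python
from pathlib import Path

BASE_MODELS_DIR = (Path.home() / ".cache" / "openarc" / "models").resolve()


def safe_model_path(user_path: str) -> Path:
    # Resolve symlinks and ".." segments first, then require the result
    # to be the models base dir itself or a path underneath it
    candidate = Path(user_path).expanduser().resolve()
    if candidate != BASE_MODELS_DIR and BASE_MODELS_DIR not in candidate.parents:
        raise ValueError("model_path must be inside the models directory")
    return candidate
```

The endpoint would call this before touching `openarc.json`, turning `ValueError` into a 400 response.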
```python
def pause(self, model_name: str) -> bool:
    """Pause an active download. The worker thread blocks between files."""
    task = self.tasks.get(model_name)
    if not task or task.status != "downloading":
        return False
    task.status = "paused"
    task._pause.clear()
    return True

def resume(self, model_name: str) -> bool:
    """Resume a paused download."""
    task = self.tasks.get(model_name)
    if not task or task.status != "paused":
        return False
    task.status = "downloading"
    task._pause.set()
    return True

def cancel(self, model_name: str) -> bool:
    """Cancel an active or paused download."""
    task = self.tasks.get(model_name)
    if not task or task.status in ("completed", "cancelled"):
        return False
    task.status = "cancelled"
    task.completed_at = time.time()
    task._cancel.set()
    task._pause.set()  # Unblock thread so it can exit
    if task.task:
        task.task.cancel()
    return True

def list_tasks(self) -> List[Dict]:
    self._cleanup_stale()
    return [
        {
            "model_name": t.model_name,
            "status": t.status,
            "progress": round(t.progress, 2),
            "total_bytes": t.total_bytes,
            "downloaded_bytes": t.downloaded_bytes,
            "total_files": t.total_files,
            "completed_files": t.completed_files,
            "error": t.error_msg,
            "started_at": t.started_at,
            "completed_at": t.completed_at,
        }
        for t in self.tasks.values()
    ]
```
pause, resume, cancel, and list_tasks mutate/iterate self.tasks without any synchronization, while start() uses an asyncio lock. Concurrent requests can mutate the dict during iteration in list_tasks() (raising RuntimeError: dictionary changed size during iteration) or cause inconsistent status updates. Protect all access to self.tasks with a single lock (async or threading) and use it consistently in every method.
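A minimal sketch of the single-lock pattern the comment describes; the class name and fields are assumed, and only two methods are shown:

```python
import threading


class DownloadManager:
    """Every access to self.tasks goes through one lock (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.tasks = {}

    def cancel(self, model_name):
        with self._lock:
            task = self.tasks.get(model_name)
            if not task or task.status in ("completed", "cancelled"):
                return False
            task.status = "cancelled"
            return True

    def list_tasks(self):
        with self._lock:
            # Build the snapshot under the lock so a concurrent start/cancel
            # cannot mutate the dict mid-iteration
            return [
                {"model_name": t.model_name, "status": t.status}
                for t in self.tasks.values()
            ]
```

The same lock would also wrap `start`, `pause`, and `resume` in the real service so status transitions stay consistent.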
```python
app.include_router(openarc_router)
app.include_router(openai_router)
```
src.server.main previously exposed parse_tool_calls and endpoint functions (e.g., openai_chat_completions) that are still imported by existing tests/utilities (e.g., src/tests/test_tool_call_parser_unit.py). With the move to routers, those symbols are no longer available from main, which will break the test suite and any internal imports. Either update the tests to import from src.server.routes.openai, or re-export the moved callables from main for backward compatibility.
```python
    raise HTTPException(
        status_code=404, detail=f"Model '{unload_config.model_name}' not found"
    )
except Exception as exc:
    raise HTTPException(
        status_code=500, detail=f"Failed to unload model: {str(exc)}"
    )
```
unload_model raises an HTTPException(404, ...) when the model isn’t found, but the broad except Exception catches that HTTPException and converts it into a 500. Catch and re-raise HTTPException (or narrow the exception handler) so 404 responses aren’t masked as internal errors.
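A sketch of the re-raise pattern; the function is illustrative, and a minimal `HTTPException` stand-in is defined here so the snippet is self-contained (the real code uses `fastapi.HTTPException`):

```python
class HTTPException(Exception):
    # Minimal stand-in for fastapi.HTTPException, for a self-contained sketch
    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def unload_model_sketch(found: bool):
    try:
        if not found:
            raise HTTPException(status_code=404, detail="Model not found")
        # ... actual unload work would go here ...
    except HTTPException:
        raise  # let the framework return the intended status code
    except Exception as exc:
        raise HTTPException(status_code=500, detail=f"Failed to unload model: {exc}")
```

With the bare `except HTTPException: raise` clause first, the 404 reaches the client instead of being rewrapped as a 500.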
This is the semi-final draft of all endpoints. We can obviously rework some elements, but as far as I tested, all features work.
Things I think still need to be done for OpenArc Studio