Computer vision initial services by javiermtorres · Pull Request #389 · mozilla-ai/encoderfile

javiermtorres · 2026-04-15T09:40:23Z

Preliminary implementation for computer vision tasks (object detection, image segmentation and image classification).

codecov-commenter · 2026-04-15T09:42:25Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

besaleli

Thoughts!!

besaleli · 2026-04-25T00:19:45Z

+}
+
+message ImageInput {
+  bytes image = 1;


nit: having 3 different ImageInput feels like a code smell. How would you feel about either/any combination of:

just throwing the images directly in ImageClassificationRequest and doing something like repeated bytes inputs = 1;. This would give us parity with repeated string inputs = 1; in text services

Putting it in a separate proto or reusing it

Condensing all of the protos into one file (would this be so horrible... this may make it easier if anyone is working with Kafka)

I'm now taking image classification as "gold standard" while I work at the impl, and I'll probably retrofit the other two categories from there. So I'll take these comments into consideration when I revisit the proto defs for them 👍

(It just felt easier to transcribe the hf pipelines in separate files at the start)

(separate proto + imports ftw, imho)

besaleli · 2026-04-25T00:20:42Z

+}
+
+message ObjectDetectionResponse {
+  repeated ImageBoundingBoxes boxes_batch = 1;


nit: would prefer this just be named boxes, but not a hill I need to die on

The thing is that for each image, we have boxes, but then for a bunch of input images we have groups of boxes.... so I ran out of imagination :-/ Any suggestions here?

renamed to "box" and "boxes", although the grammatical number agreement gets a bit weird :-/

besaleli · 2026-05-11T13:29:05Z


 macro_rules! run_cli {
-    ($model_type:ident, $cli:expr, $config:expr, $session:expr, $tokenizer:expr, $model_config:expr) => {{
+    ($model_type:ident, $cli:expr, $config:expr, $session:expr, $input_state:expr, $task_state:expr) => {{


merge in latest changes, we changed encoderfile-runtime/main.rs in the gpu update and model type dispatch is now happening elsewhere

besaleli · 2026-05-11T13:30:27Z

-    pub tokenizer: TokenizerService,
-    pub model_config: ModelConfig,
+    pub per_model_input_state: T::InputState,
+    pub per_task_state: T::TaskState,


naming here is unclear. what do per_model and per_task mean? why can't it just be model_input_state and task_state?

I thought the "per_" would convey the meaning that this is an input state / task state (a config really) that depends on what input / task are used in your model. But I guess we can get rid of the per_ thing if it causes confusion.
At some point the state and config concepts need to be separated, but not today :)

besaleli · 2026-05-11T13:34:48Z

@@ -1,5 +1,9 @@
 macro_rules! model_type {
    [ $( $x:ident ),* $(,)? ] => {
+        pub trait ModelTypeSpec: Send + Sync + Clone + std::fmt::Debug + 'static {


why are we moving this inside the macro?

I think this was a leftover from some previous restructuring. I'll take it back out.

Moved back out.

besaleli · 2026-05-11T13:35:27Z

 use std::{fs::File, io::BufReader};

 const EMBEDDING_DIR: &str = "../models/embedding";
+// CHECK sentence embedding????


embedding and sentence embedding can use the same model

besaleli · 2026-05-11T13:36:49Z

-                &[AssetKind::Transform]
-            }
-        }
+        impl AssetPolicySpec for crate::common::model_type::$model_type {}


if we are moving all of the logic out of the trait, what is the point in having the trait? we might as well just make it a function

Quite legit.... I guess we could just have a fn required_assets(spec: impl InputType + TaskType). I doubt we could gain something by moving this to type space 😕

Also, the new InputType and TaskType traits (or vals in a fun) should be enough to compose the list of assets, instead of matching the tuples, right?
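A minimal sketch of the function-based alternative discussed above, in which the asset list is composed from what the input type and the task type each contribute instead of matching on tuples. Only `AssetKind::Transform` appears in the diff; the other variants, the trait methods, and the marker structs are hypothetical names used for illustration, and the function takes two arguments rather than a single `impl InputType + TaskType` value.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AssetKind {
    Transform,          // appears in the diff
    Tokenizer,          // hypothetical variant
    PreprocessorConfig, // hypothetical variant
}

// Hypothetical shape for the two traits mentioned in the thread.
trait InputType {
    fn input_assets(&self) -> &'static [AssetKind];
}

trait TaskType {
    fn task_assets(&self) -> &'static [AssetKind];
}

// Hypothetical marker types standing in for real model specs.
struct TextInput;
struct ImageInput;
struct Classification;

impl InputType for TextInput {
    fn input_assets(&self) -> &'static [AssetKind] {
        &[AssetKind::Tokenizer]
    }
}

impl InputType for ImageInput {
    fn input_assets(&self) -> &'static [AssetKind] {
        &[AssetKind::PreprocessorConfig]
    }
}

impl TaskType for Classification {
    fn task_assets(&self) -> &'static [AssetKind] {
        &[AssetKind::Transform]
    }
}

/// Compose the asset list from both contributions, dropping duplicates
/// while keeping first-seen order.
fn required_assets(input: &impl InputType, task: &impl TaskType) -> Vec<AssetKind> {
    let mut assets = Vec::new();
    for kind in input.input_assets().iter().chain(task.task_assets()) {
        if !assets.contains(kind) {
            assets.push(*kind);
        }
    }
    assets
}
```

With these stand-ins, `required_assets(&ImageInput, &Classification)` yields the preprocessor config followed by the transform, and adding a new input or task type does not require touching a central match.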

besaleli · 2026-05-11T13:40:15Z

+
+    fn inference(&self, request: impl Into<Self::Input>) -> Result<Self::Output, ApiError> {
+        let request = request.into();
+        let rescale_factor = 0.00392156862745098 as f32;


we should make these preprocessing steps into lua bindings. with a Preprocess function that's extracted from the transform

Yep. I wanted to make it work first, but totally agreed. Let's see if we can get something closer to what one would expect in an hf pipeline.
BTW I'm not considering b/w (1 channel) images, for example, right now.
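For reference, the hard-coded `0.00392156862745098` in the hunk above is 1/255. A rough sketch of the two preprocessing steps touched on in this PR, rescaling and regrouping interleaved pixels into separate channel planes, could look like the following. The function name and signature are hypothetical, not the PR's code, and the sketch assumes tightly packed 8-bit pixels.

```rust
/// 1/255, the value hard-coded in the diff as 0.00392156862745098.
const RESCALE_FACTOR: f32 = 1.0 / 255.0;

/// Convert interleaved HWC bytes (e.g. RGBRGB...) into planar CHW floats
/// (all R values, then all G, then all B), rescaled to the range [0, 1].
/// Hypothetical helper for illustration only.
fn hwc_to_chw(
    pixels: &[u8],
    height: usize,
    width: usize,
    channels: usize,
) -> Result<Vec<f32>, String> {
    let expected = height * width * channels;
    if pixels.len() != expected {
        return Err(format!("expected {expected} bytes, got {}", pixels.len()));
    }
    let mut out = vec![0.0f32; expected];
    for y in 0..height {
        for x in 0..width {
            for c in 0..channels {
                // Source index: channels of one pixel sit next to each other.
                let src = (y * width + x) * channels + c;
                // Destination index: each channel is its own contiguous plane.
                let dst = c * height * width + y * width + x;
                out[dst] = pixels[src] as f32 * RESCALE_FACTOR;
            }
        }
    }
    Ok(out)
}
```

Because the channel count is a parameter, a single-channel (b/w) image goes through the same path with `channels = 1`; whether the model accepts that input is a separate question.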

besaleli · 2026-05-11T13:41:11Z

+
+        let tensor = Tensor(data.into_dyn());
+
+        let result = func


I'll revisit this, I think I just copied whatever was already there without much consideration 😂
I guess we can do the same for text logits in this case, but it will get more complicated for object detection and image segmentation. But shape checks will fix everything 😉

besaleli · 2026-05-11T13:41:59Z

@@ -0,0 +1,316 @@
+# Multipart OpenAPI Service Example


@angpt looping you in here. we should add this into the docs when we release new version

besaleli · 2026-06-03T10:00:38Z

@@ -0,0 +1,21 @@
+syntax = "proto3";


one day we're gonna have to merge all of these so they're actually usable by someone streaming data into kafka for example 😩

besaleli · 2026-06-03T10:04:53Z

 }

-pub const DEFAULT_VERSION: &str = "0.1.0";
+pub const DEFAULT_VERSION: &str = "0.2.0";


if we're gonna change this, we need backwards compatibility logic

Hum. The runtime won't need it because the payload is bundled with it anyway, but it matters for the builder. However, both are going to be present for every version, so one just needs to get the builder from the same tag as the runtime. That's how we can get away with backwards compatibility, but maybe it's not too user friendly :-/

besaleli · 2026-06-03T10:10:55Z

+        .unwrap();
+
+        // TODO make parallel???
+        // TODO _maybe


I wish these comments were less cryptic

Hum, erm, totally right 😅😅😅😅😅
Ok, so, in the case of text processing, the preprocessing cost is negligible compared to inference, and we can do both sequentially. However, for image processing, we could think about overlapping image preprocessing and image inference, choosing a proper batch size instead of just putting everything in one batch. I'd still assume inference is going to take longer anyway, so the savings may not be too big (especially if everything is done via CPU, haw); otoh, this helps introduce queues for some streaming mode of operation later on.

I hope this makes it clearer. I don't think we need to start implementing any of this now, though, but I'll make like a less cryptic comment here 🤣🙈
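To make the idea concrete, here is a small sketch of overlapping preprocessing and inference with a bounded queue between two threads, using only the standard library. `preprocess` and `infer` are placeholders rather than the PR's functions, and the queue depth of 1 is an arbitrary assumption.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Placeholder for decoding/resizing/rescaling a batch of images.
fn preprocess(batch: &[Vec<u8>]) -> Vec<Vec<f32>> {
    batch
        .iter()
        .map(|img| img.iter().map(|&b| b as f32 / 255.0).collect())
        .collect()
}

// Placeholder for model inference: one score per image.
fn infer(batch: &[Vec<f32>]) -> Vec<f32> {
    batch.iter().map(|t| t.iter().sum::<f32>()).collect()
}

/// Preprocess batch N+1 on a worker thread while batch N is in inference.
fn run_pipelined(images: Vec<Vec<u8>>, batch_size: usize) -> Vec<f32> {
    assert!(batch_size > 0, "batch_size must be positive");

    // Bounded queue: the producer blocks once one batch is already waiting,
    // so preprocessing cannot run arbitrarily far ahead of inference.
    let (tx, rx) = sync_channel::<Vec<Vec<f32>>>(1);

    let producer = thread::spawn(move || {
        for batch in images.chunks(batch_size) {
            if tx.send(preprocess(batch)).is_err() {
                break; // receiver was dropped, stop producing
            }
        }
        // tx is dropped here, which ends the receiving loop below.
    });

    let mut results = Vec::new();
    for batch in rx {
        results.extend(infer(&batch));
    }
    producer.join().expect("preprocessing thread panicked");
    results
}
```

Results come back in input order because there is a single producer and the channel is FIFO. The same queue could later be fed from a stream instead of a fixed `Vec`.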

besaleli · 2026-06-03T10:15:01Z

+            ImageClassificationTransform::new(DEFAULT_LIBS.to_vec(), Some(postprocess_code))
+                .expect("Failed to create engine");
+
+        let num_channels = self.model_input_state.config.num_channels as usize;


If I'm understanding correctly, we are now making users create an entirely separate config to provide values that we're deciding they need for postprocessing. I was hoping that users could define these in the lua transform and have more control over the exact steps they want to use. This is the point of having the lua transform—so that users don't get config fatigue

Seeing now that preprocessor_config.json is an asset for hf image classification models. Might be helpful to include a comment. Have we checked different models to see if the schema gets crazy? They don't like to give us authoritative JSON schemas :')

Sorry!!! Yes, apparently preprocessor_config.json is a thing in image models. Yes, as expected, it is largely undocumented (I'm just browsing the pipeline code...) and sometimes it overlaps with config.json. Ha. Hahaha.
I have indeed checked two or three models. The entries I use here seem to be used consistently elsewhere with the same format (and we'll deal with overlapping entries later on), but there are some (like channels) that would have a greater impact on our code if we really want to handle them.

This reminds me that we need to instruct the user to copy yet another config file, or maybe make simple shell/PowerShell scripts 🤔

besaleli · 2026-06-03T10:15:46Z

+  -F "files=@/path/to/image2.jpg"
+```
+
+### Request Body (multipart/form-data)


Hahaha. Wait until it really works 😋 But yes, multipart seemed ok here (and maybe for stdin?)

besaleli · 2026-06-03T10:16:27Z


 impl Commands {
-    pub async fn execute<'a, R: Read + Seek>(
+    pub async fn execute<'loader, R: Read + Seek>(


I appreciate this rename

I might change a couple of other lifetime refs so I really understand what they are referring to....

besaleli · 2026-06-03T10:16:47Z

 }
+
+#[test]
+fn test_image_classification_model() {


Placeholder, sorry. I wanted to ask for a review before working on integration level tests.

Initial image processing interfaces

9432f85

Align with current interfaces

91bd77a

besaleli reviewed Apr 25, 2026

javiermtorres added 6 commits May 1, 2026 15:41

WIP

3bfecee

Separate config per input type and task type

bb229ab

Reorganize traits

03b84b5

Fix inference for image classification

a24581d

Complete e2e process (but not yet correct)

bfdc5f4

Fixed image order (channel grouped to separate channels); still wip

b2a082f

javiermtorres requested a review from besaleli May 11, 2026 07:28

besaleli reviewed May 12, 2026

javiermtorres added 2 commits May 18, 2026 12:04

Merge branch 'main' into 378-initial-computer-vision

a4066bd

Align with main

3065642

javiermtorres force-pushed the 378-initial-computer-vision branch from 46c597c to 3065642 Compare May 18, 2026 10:11

javiermtorres added 8 commits May 18, 2026 19:15

Fix tests

9821e76

Fix lint

83ec21c

Fix image classifier implementation

50cb6f6

Fix lint

dbc800b

Preliminary Lua bindings for images

d5d7e66

Lua preproc WIP

88993e8

Introduce Lua preprocessing

0bae7f9

Fix lint

64c9499

javiermtorres requested a review from besaleli May 22, 2026 13:45

besaleli reviewed Jun 3, 2026

Fix image classification http service

f8d2f14
