Computer vision initial services#389
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 🚀 New features to boost your workflow:
|
| } | ||
|
|
||
| message ImageInput { | ||
| bytes image = 1; |
There was a problem hiding this comment.
nit: having 3 different ImageInput feels like a code smell. How would you feel about either/any combination of:
- just throwing the images directly in
ImageClassificationRequestand doing something likerepeated bytes inputs = 1;. This would give us parity withrepeated string inputs = 1;in text services - Putting it in a separate proto or reusing it
- Condensing all of the protos into one file (would this be so horrible... this may make it easier if anyone is working with Kafka)
There was a problem hiding this comment.
I'm now taking image classification as "gold standard" while I work at the impl, and I'll probably retrofit the other two categories from there. So I'll take these comments into consideration when I revisit the proto defs for them 👍
There was a problem hiding this comment.
(It just felt easier to transcribe the hf pipelines in separate files at the start)
There was a problem hiding this comment.
(separate proto + imports ftw, imho)
| } | ||
|
|
||
| message ObjectDetectionResponse { | ||
| repeated ImageBoundingBoxes boxes_batch = 1; |
There was a problem hiding this comment.
nit: would prefer this just be named boxes, but not a hill I need to die on
There was a problem hiding this comment.
The thing is that for each image, we have boxes, but then for a bunch of input images we have groups of boxes.... so I ran out of imagination :-/ Any suggestions here?
There was a problem hiding this comment.
renamed to "box" and "boxes", although the grammatical number agreement gets a bit weird :-/
|
|
||
| macro_rules! run_cli { | ||
| ($model_type:ident, $cli:expr, $config:expr, $session:expr, $tokenizer:expr, $model_config:expr) => {{ | ||
| ($model_type:ident, $cli:expr, $config:expr, $session:expr, $input_state:expr, $task_state:expr) => {{ |
There was a problem hiding this comment.
merge in latest changes, we changed encoderfile-runtime/main.rs in the gpu update and model type dispatch is now happening elsewhere
| pub tokenizer: TokenizerService, | ||
| pub model_config: ModelConfig, | ||
| pub per_model_input_state: T::InputState, | ||
| pub per_task_state: T::TaskState, |
There was a problem hiding this comment.
naming here unclear. what does per_model and per_task mean? why can it just not be model_input_state and task_state?
There was a problem hiding this comment.
I thought the "per_" would convey the meaning that this is an input state / task state (a config really) that depends on what input / task are used in your model. But I guess we can get rid of the per_ thing if it causes confusion.
At some point the state and config concepts need to be separated, but not today :)
| @@ -1,5 +1,9 @@ | |||
| macro_rules! model_type { | |||
| [ $( $x:ident ),* $(,)? ] => { | |||
| pub trait ModelTypeSpec: Send + Sync + Clone + std::fmt::Debug + 'static { | |||
There was a problem hiding this comment.
why are we moving this inside the macro?
There was a problem hiding this comment.
I think this was a leftover from some previous restructuring. I'll take it back out.
There was a problem hiding this comment.
Moved back out.
| use std::{fs::File, io::BufReader}; | ||
|
|
||
| const EMBEDDING_DIR: &str = "../models/embedding"; | ||
| // CHECK sentence embedding???? |
There was a problem hiding this comment.
embedding and sentence embedding can use the same model
| &[AssetKind::Transform] | ||
| } | ||
| } | ||
| impl AssetPolicySpec for crate::common::model_type::$model_type {} |
There was a problem hiding this comment.
if we are moving all of the logic out of the trait, what is the point in having the trait? we might as well just make it a function
There was a problem hiding this comment.
Quite legit.... I guess we could just have a fn required_assets(spec: impl InputType + TaskType). I doubt we could gain something by moving this to type space 😕
Also, the new InputType and TaskType traits (or vals in a fun) should be enough to compose the list of assets, instead of matching the tuples, right?
|
|
||
| fn inference(&self, request: impl Into<Self::Input>) -> Result<Self::Output, ApiError> { | ||
| let request = request.into(); | ||
| let rescale_factor = 0.00392156862745098 as f32; |
There was a problem hiding this comment.
we should make these preprocessing steps into lua bindings. with a Preprocess function that's extracted from the transform
There was a problem hiding this comment.
Yep. I wanted to make it work first, but totally agreed. Let's see if we can get something closer to what one would expect in an hf pipeline.
BTW I'm not considering b/w (1 channel) images, for example, right now.
|
|
||
| let tensor = Tensor(data.into_dyn()); | ||
|
|
||
| let result = func |
There was a problem hiding this comment.
I'll revisit this, I think I just copied whatever it was already there without much consideration 😂
I guess we can do the same for text logits in this case, but it will get more complicated for object detection and image segmentation. But shape checks will fix everything 😉
| @@ -0,0 +1,316 @@ | |||
| # Multipart OpenAPI Service Example | |||
There was a problem hiding this comment.
@angpt looping you in here. we should add this into the docs when we release new version
46c597c to
3065642
Compare
| @@ -0,0 +1,21 @@ | |||
| syntax = "proto3"; | |||
There was a problem hiding this comment.
one day we're gonna have to merge all of these so they're actually usable by someone streaming data into kafka for example 😩
| } | ||
|
|
||
| pub const DEFAULT_VERSION: &str = "0.1.0"; | ||
| pub const DEFAULT_VERSION: &str = "0.2.0"; |
There was a problem hiding this comment.
if we're gonna change this, we need backwards compatibility logic
There was a problem hiding this comment.
Hum. The runtime won't need it because the payload is bundled with it anyway, but it matters for the builder. However, both are going to be present for every version, so one just needs to get the builder from the same tag as the runtime. That's how we can get away with backwards compatibility, but maybe it's not too user friendly :-/
| .unwrap(); | ||
|
|
||
| // TODO make parallel??? | ||
| // TODO _maybe |
There was a problem hiding this comment.
I wish these comments were less cryptic
There was a problem hiding this comment.
Hum, erm, totally right 😅😅😅😅😅
Ok, so, in the case of text processing, the preprocessing is not even comparable to inference, and we can do both sequentially. However, for image processing, we could think about overlapping image preprocessing and image inference, choosing a proper batch number instead of just putting everything in one batch. I'd still assume inference is going to be larger anyway, so the savings may not be too big (specially if everything is done via CPU, haw); otoh, this helps introducing queues for some streaming mode of operation later on.
I hope this makes it clearer. I don't think we need to start implementing any of this now, though, but I'll make like a less cryptic comment here 🤣🙈
| ImageClassificationTransform::new(DEFAULT_LIBS.to_vec(), Some(postprocess_code)) | ||
| .expect("Failed to create engine"); | ||
|
|
||
| let num_channels = self.model_input_state.config.num_channels as usize; |
There was a problem hiding this comment.
If I'm understanding correctly, we are now making users create an entirely separate config to provide values that we're deciding they need for postprocessing. I was hoping that users could define these in the lua transform and have more control over the exact steps they want to use. This is the point of having the lua transform—so that users don't get config fatigue
Seeing now that preprocessor_config.json is an asset for hf image classification models. Might be helpful to include a comment. Have we checked different models to see if the schema gets crazy? They don't like to give us authoritation JSON schemas :')
There was a problem hiding this comment.
Sorry!!! Yes, apparently preprocessor_config.json is a thing in image models. Yes, as expected, it is largely undocumented (I'm just browsing the pipeline code...) and sometimes it overlaps with config.json. Ha. Hahaha.
I have indeed checked a couple of models or three. The entries I use here seem consistently used anywhere else with the same format (and we'll deal with overlapping entries later on), but there are some (like channels) that would have a greater impact in our code if we really want to handle them.
There was a problem hiding this comment.
This reminds me that we need to instruct the user to copy yet another config file, or maybe make simple shell/PowerShell scripts 🤔
| -F "files=@/path/to/image2.jpg" | ||
| ``` | ||
|
|
||
| ### Request Body (multipart/form-data) |
There was a problem hiding this comment.
Hahaha. Wait until it really works 😋 But yes, multipart seemed ok here (and maybe for stdin?)
|
|
||
| impl Commands { | ||
| pub async fn execute<'a, R: Read + Seek>( | ||
| pub async fn execute<'loader, R: Read + Seek>( |
There was a problem hiding this comment.
I might change a couple of other lifetime refs so I really understand what they are referring to....
| } | ||
|
|
||
| #[test] | ||
| fn test_image_classification_model() { |
There was a problem hiding this comment.
Placeholder, sorry. I wanted to ask for a review before working on integration level tests.
Preliminary implementation for computer vision tasks (object detection, image segmentation and image classification).