Task detection diverges between inspect and the model loader for configs without architectures #864

@timenick

Summary

The two offline task-detection paths disagree for a config that has no resolvable architectures: inspect returns a fallback task while the model-loading commands raise ValueError.

  • detect_task(config) — used by inspect, eval, and config's seq2seq composite routing
  • resolve_task_and_model_class(config, task=None) — used by the model loader, i.e. build / export / perf / run / serve / quantize, and config's single-config path (via generate_hf_build_config)

For any real checkpoint the two agree (they share the D2 modality upgrade and converge on TasksManager architecture-head detection). They diverge only when the config carries no resolvable architecture — e.g. an architecture-less --model-type X invocation (no -m checkpoint), or a default/synthetic config.

Reproduction

from transformers import SamConfig, CLIPConfig
from winml.modelkit.loader import detect_task, resolve_task_and_model_class

detect_task(SamConfig())                    # -> ('mask-generation', 'HF_MODEL_CLASS_MAPPING')
resolve_task_and_model_class(SamConfig())   # -> ValueError

detect_task(CLIPConfig())                   # -> ('next-sentence-prediction', 'HF_TASK_DEFAULTS')
resolve_task_and_model_class(CLIPConfig())  # -> ValueError

At the command surface:

winml inspect --model-type sam     # task = mask-generation
winml build   --model-type sam     # (or perf / run / quantize) errors: task cannot be resolved

So this is not a case of "two different valid tasks": inspect is lenient (it returns a fallback), while the loader-backed commands are strict (they raise).

Root cause

detect_task (loader/task.py) has two fallbacks that resolve_task_and_model_class does not:

  1. the HF_MODEL_CLASS_MAPPING step-1 short-circuit, which fires on model_type alone with no architecture needed (e.g. sam -> mask-generation);
  2. the HF_TASK_DEFAULTS fallback (-> next-sentence-prediction).

resolve_task_and_model_class Case 1 goes straight to _detect_task_and_class_from_config, which must resolve the architecture/model class and raises when it cannot.
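
The divergence described above can be reproduced with a toy model of the two paths. This is only a sketch: the dict-based configs, the lookup tables, the function names, and the "architecture" source label are hypothetical stand-ins chosen for illustration, not the code in loader/task.py.

```python
# Toy model of the two detection paths. Every name and table here is a
# hypothetical stand-in; configs are plain dicts for simplicity.
MODEL_TYPE_SHORTCUT = {"sam": "mask-generation"}   # plays the role of HF_MODEL_CLASS_MAPPING
ARCHITECTURE_HEADS = {"SamModel": "mask-generation"}  # plays the role of architecture-head detection
DEFAULT_TASK = "next-sentence-prediction"          # plays the role of HF_TASK_DEFAULTS


def from_architecture(config):
    """Shared step: map the first known architecture to a task, else None."""
    for arch in config.get("architectures") or []:
        if arch in ARCHITECTURE_HEADS:
            return ARCHITECTURE_HEADS[arch]
    return None


def lenient_detect(config):
    """Lenient path: model_type shortcut, then architecture, then default."""
    model_type = config.get("model_type")
    if model_type in MODEL_TYPE_SHORTCUT:
        return MODEL_TYPE_SHORTCUT[model_type], "HF_MODEL_CLASS_MAPPING"
    task = from_architecture(config)
    if task is not None:
        return task, "architecture"
    return DEFAULT_TASK, "HF_TASK_DEFAULTS"


def strict_resolve(config):
    """Strict path: architecture only; raises when nothing resolves."""
    task = from_architecture(config)
    if task is None:
        raise ValueError(f"task cannot be resolved for {config.get('model_type')!r}")
    return task
```

With an architecture present (for example `{"model_type": "sam", "architectures": ["SamModel"]}`) both functions yield the same task; without one, only the lenient function returns.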

Scope / severity

Low. Only affects configs without resolvable architectures (architecture-less --model-type invocations, synthetic/default configs). Every real checkpoint carries architectures, so the two paths agree there — verified across bert (MaskedLM/SeqClass), bart (CondGen/SeqClass), t5 (CondGen), resnet, dinov2 (D2 modality), sam, and clip with architectures present.

Options

  1. Document as intended: inspect is a lenient reporter; loader-backed commands are strict and should reject what they cannot load. (current behavior)
  2. Parity — either give resolve_task_and_model_class the same fallbacks (lenient: both succeed), or drop the fallbacks from detect_task (strict: both raise) so inspect and the model-loading commands always agree.
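
As a rough illustration of the strict variant of option 2, a guard could refuse to run detection when the config lists no architectures. The wrapper name `strict_detect` and the stub below are hypothetical, and "no architectures listed" is only an approximation of "no resolvable architecture".

```python
from types import SimpleNamespace


def strict_detect(detect, config):
    """Call `detect` only when the config lists at least one architecture.

    `detect` is any callable with the shape of detect_task(config); this
    wrapper is a hypothetical sketch, not part of winml.
    """
    if not getattr(config, "architectures", None):
        raise ValueError("task cannot be resolved: config has no architectures")
    return detect(config)


def fake_detect(config):
    """Stub standing in for detect_task; always reports the same fallback."""
    return ("mask-generation", "HF_MODEL_CLASS_MAPPING")


# Config objects are imitated with SimpleNamespace for the demo.
bare = SimpleNamespace(model_type="sam", architectures=None)
full = SimpleNamespace(model_type="sam", architectures=["SamModel"])

result = strict_detect(fake_detect, full)  # passes through to the stub
try:
    strict_detect(fake_detect, bare)
    bare_raised = False
except ValueError:
    bare_raised = True
```

The lenient variant would go the other way and add the fallbacks to the loader path; which direction is preferable depends on whether the loader can do anything useful with a task but no model class.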

Either way, a consistency test comparing detect_task vs resolve_task_and_model_class (not just detect_task vs inspect_detect_task, which is all tests/integration/test_task_consistency.py covers today) would lock the invariant.
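
One possible shape for such a consistency check is sketched below. It assumes both paths are wrapped in adapters that return just the task string and raise ValueError on failure; `compare_paths`, the outcome labels, and the stubs are hypothetical names for illustration.

```python
def compare_paths(detect, resolve, config):
    """Classify how two task-detection callables relate for one config.

    Both callables take a config and return a task string, or raise
    ValueError when they cannot resolve one.
    """
    def attempt(fn):
        try:
            return fn(config)
        except ValueError:
            return None

    detected, resolved = attempt(detect), attempt(resolve)
    if detected is None and resolved is None:
        return "both-raise"
    if resolved is None:
        return "lenient-only"   # the divergence reported in this issue
    if detected is None:
        return "strict-only"
    return "agree" if detected == resolved else "mismatch"


# Stubs that imitate the lenient and strict behaviours on dict configs.
def stub_detect(config):
    return config.get("task") or "next-sentence-prediction"


def stub_resolve(config):
    if "task" not in config:
        raise ValueError("task cannot be resolved")
    return config["task"]
```

A test could then assert that every config in a fixture list yields "agree" or "both-raise" under the parity option, or explicitly allow "lenient-only" if the current behavior is documented as intended.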

Metadata

Labels

P2 (Medium: minor bug or non-critical improvement), bug (Something isn't working), quality (Use for quality control related issues), triaged (Issue has been triaged)
