Sync with internal with new mapping design #29

Open
yy-code-nv wants to merge 6 commits into NVIDIA:main from yy-code-nv:yangyangt/try_sync_with_internal

@yy-code-nv

Collaborator
  • regression passed
  • inference correct

@yy-code-nv yy-code-nv force-pushed the yangyangt/try_sync_with_internal branch from 90e7ca9 to 21230f9 Compare June 11, 2026 17:23
yy-code-nv and others added 2 commits June 11, 2026 23:05
Pipeline run via packages/cosmos-framework-release/release.sh:
- 220+ files changed/added/removed across guardrails, callbacks, configs,
  data, model, tools, utils to match current i4 source.
- local_datasets/ restored to match cosmos-framework main exactly; the dir
  is now CF-owned (excluded from the mapping going forward).
- Removed 4 orphan files re-introduced on this branch (multiview_dataloader,
  vlm/defaults/dataloader, nvlm_data_unify, nvlm_sample_loaders_and_part_filters)
  -- already excluded in mapping_config.toml; nothing in CF imports them.
- New modules brought in: data/imaginaire/webdataset/augmentors/image,
  data/vfm/action/action_processing, data/vfm/vlm/video_decoder_qwen,
  data/vlm/processors/{nemotron3densevl,nemotronvl}, model/tokenizer/evaluation,
  model/vfm/mot/cosmos3_vfm_qwen3_vl_network_test, utils/vfm/video_preprocess,
  others.
- Internal http(s) URLs scrubbed to https://invalid_url (s3://, github,
  pytorch, docs.nvidia, arxiv, etc. preserved). NFS/usr leak paths scrubbed
  to /invalid_dir. SPDX/OpenMDW-1.1 headers applied.
- COSMOS_INTERNAL flag now defaults to False (was inheriting TRAINING=True).
- Zero dangling cosmos_framework module imports.
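
The URL scrubbing rule above could look roughly like the following. This is only a sketch under stated assumptions, not the code in release.sh: the function name `scrub_urls`, the regex, and the exact host allowlist are hypothetical, chosen to mirror the hosts the commit message says are preserved. `s3://` paths are left alone simply because the pattern only matches `http(s)://`.

```python
import re

# Hypothetical allowlist, modeled on the hosts named in the commit message.
PRESERVED_HOST_SUFFIXES = ("github.com", "pytorch.org", "docs.nvidia.com", "arxiv.org")
PLACEHOLDER = "https://invalid_url"

# Matches http(s) URLs only; s3:// and other schemes are never touched.
_URL_RE = re.compile(r"https?://[^\s\"'<>)\]]+")


def scrub_urls(text: str) -> str:
    """Replace every http(s) URL whose host is not allowlisted with PLACEHOLDER."""

    def _replace(match: re.Match) -> str:
        url = match.group(0)
        # Extract the host: drop the scheme, then any path and port.
        host = re.sub(r"^https?://", "", url).split("/", 1)[0].split(":", 1)[0].lower()
        if any(host == s or host.endswith("." + s) for s in PRESERVED_HOST_SUFFIXES):
            return url
        return PLACEHOLDER

    return _URL_RE.sub(_replace, text)
```

Running the function twice gives the same result as running it once, since the placeholder itself is not on the allowlist and is rewritten to itself.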

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…erministic to False

- utils/vfm/monkey_patch.py: rename _EXPECTED_TRANSFORMERS_VERSION
  -> _EXPECTED_TRANSFORMERS_VERSION_PREFIX (matches its "4.57." prefix-match
  semantics; the constant is a prefix, not an exact version).
- configs/base/vlm/defaults/policy_config.py: VLMModelConfig.deterministic
  default flipped True -> False. The comment already notes deterministic
  Flash-Attention kernels are slower and only needed for parity bit-exactness,
  so opt-in is the better default.
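
To illustrate why the rename matters, here is a minimal sketch of a prefix-match check. Only the constant name and the "4.57." value come from the commit message; the helper `is_expected_transformers_version` is hypothetical and is not claimed to exist in monkey_patch.py.

```python
# The constant is a prefix, so any 4.57.x patch release is accepted.
_EXPECTED_TRANSFORMERS_VERSION_PREFIX = "4.57."


def is_expected_transformers_version(version: str) -> bool:
    """Return True when `version` starts with the expected prefix (hypothetical helper)."""
    return version.startswith(_EXPECTED_TRANSFORMERS_VERSION_PREFIX)
```

The trailing dot in the prefix is what keeps a version such as "4.570.0" from matching.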

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yy-code-nv yy-code-nv force-pushed the yangyangt/try_sync_with_internal branch from 21230f9 to fcc90a3 Compare June 12, 2026 06:42
yy-code-nv and others added 2 commits June 12, 2026 00:18
- Stop migrating tests that pull in unshipped fixtures/helpers
  (configs/base/base_config_test.py, model/vfm/mot/cosmos3_vfm_qwen3_vl_network_test.py,
  model/vfm/vlm/nemotron_3_dense_vl/nemotron_3_dense_vl_test.py). Excluded in
  mapping_config.toml and removed from CF.
- inference/action.py: hand-sync from imaginaire4/packages/cosmos3/cosmos3/
  action.py. Adds ActionProcessingRecord / make_batched_action_processing_fields
  paths and moves pad_action_to_max_dim to the action_processing import group.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ActionPromptJsonFormatter can return a dict for the caption_key; downstream
consumers expect a string, so json.dumps it when needed.
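
A minimal sketch of that normalization, assuming the sample is a plain dict and the caption lives under a key named "caption" (both are assumptions; the helper `ensure_caption_str` is hypothetical and not the actual change):

```python
import json
from typing import Any, Dict


def ensure_caption_str(sample: Dict[str, Any], caption_key: str = "caption") -> Dict[str, Any]:
    """Return a copy of `sample` whose caption is a string.

    Dict captions are serialized with json.dumps; string captions pass through unchanged.
    """
    value = sample[caption_key]
    if isinstance(value, dict):
        return {**sample, caption_key: json.dumps(value)}
    return sample
```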

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yy-code-nv yy-code-nv marked this pull request as ready for review June 12, 2026 08:28
@lfengad

lfengad commented Jun 12, 2026

Collaborator

Some issues might need attention:

  1. A gitlab link remains in the comments of the avae/straggler code
  2. Cluster info such as aws_iad_h100 and gcp_iad_gb200 remains in cluster.py
  3. There is a duplicate DefaultClusterConfig in cluster.py

yy-code-nv and others added 2 commits June 12, 2026 03:52
- Loosened REPLACE-NEXT semantics in the rewriter so a directive picks up the next
  *matching* line; this lets a directive placed above a docstring scrub URLs inside it.
  Applied to avae.py and utils/misc.py to scrub two internal
  gitlab-master.nvidia.com URLs that previously survived in module/class
  docstrings.
- cluster.py and unittest.py removed from cosmos_framework/configs/base/defaults/
  and excluded from the release mapping (CF-owned going forward).
- Other small updates picked up from current i4 source.
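
The loosened REPLACE-NEXT behavior can be sketched as below. The directive syntax (`# REPLACE-NEXT: <pattern> -> <replacement>`), the function name, and the choice to drop directive lines from the output are all assumptions made for illustration; the real rewriter may differ on each point. The idea shown is only that a directive stays pending until a later line matches its pattern, instead of applying strictly to the line immediately below it.

```python
import re
from typing import Iterable, List, Pattern, Tuple

# Hypothetical directive syntax: "# REPLACE-NEXT: <regex> -> <replacement>"
_DIRECTIVE_RE = re.compile(r"^\s*#\s*REPLACE-NEXT:\s*(?P<pattern>\S+)\s*->\s*(?P<repl>\S+)\s*$")


def apply_replace_next(lines: Iterable[str]) -> List[str]:
    """Apply each directive to the next line that matches its pattern."""
    out: List[str] = []
    pending: List[Tuple[Pattern[str], str]] = []  # directives still waiting for a match
    for line in lines:
        directive = _DIRECTIVE_RE.match(line)
        if directive:
            pending.append((re.compile(directive.group("pattern")), directive.group("repl")))
            continue  # assumption: directive lines are not emitted
        still_pending = []
        for pattern, repl in pending:
            if pattern.search(line):
                line = pattern.sub(repl, line)  # directive is consumed by this line
            else:
                still_pending.append((pattern, repl))
        pending = still_pending
        out.append(line)
    return out
```

With this behavior, a directive placed above a class or module docstring skips the non-matching `class` and opening-quote lines and rewrites the first docstring line that contains the URL.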

No dangling imports.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The /cluster default group no longer exists (cluster.py was dropped and the
cluster entry removed from configs/base/config.py's defaults list). Hydra
errors with ConfigCompositionException when an experiment tries to override
a missing group, so strip the override from the three remaining experiments:
vision_sft_nano, vision_sft_super, action_policy_droid_nano.
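
As an illustration of the cleanup, the sketch below strips `override /<group>` entries from a Hydra-style defaults list when the group no longer exists. It is plain Python and does not call Hydra; the helper names and the shape of the experiment defaults are assumptions, and the actual fix in this PR was simply to delete the override from the three experiments by hand.

```python
from typing import Any, Iterable, List, Set


def _is_missing_override(key: str, available_groups: Set[str]) -> bool:
    """True when `key` is an 'override /group' entry for a group that is not available."""
    if not key.startswith("override "):
        return False
    group = key[len("override "):].strip().lstrip("/")
    return group not in available_groups


def strip_missing_group_overrides(defaults: Iterable[Any], available_groups: Set[str]) -> List[Any]:
    """Return the defaults list without overrides that target missing groups."""
    kept: List[Any] = []
    for entry in defaults:
        if isinstance(entry, dict):
            entry = {k: v for k, v in entry.items() if not _is_missing_override(k, available_groups)}
            if not entry:
                continue  # the whole entry was an override of a missing group
        kept.append(entry)
    return kept
```

For example, with only a hypothetical `model` group available, a defaults list containing `{"override /cluster": "some_cluster"}` loses that entry while `"_self_"` and the `model` override are kept.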

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@lfengad lfengad left a comment

Collaborator

LGTM

2 participants