Fix VLM position_ids packing in Megatron strategy with sequence packing enabled #452

Open
sanmuf wants to merge 1 commit into alibaba:main from sanmuf:vlm-pos

sanmuf commented May 26, 2026

Summary

Fix Megatron VLM sequence packing by packing 3D mRoPE position_ids together with input_ids.

Details

For Qwen2-VL / Qwen3-VL style VLM models, position_ids are 3D mRoPE tensors. Previously, when sequence_packing was enabled, only token tensors such as input_ids and labels were packed, while VLM position_ids remained unpacked.

This left the packed token sequences and the mRoPE position ids with inconsistent shapes, which caused Megatron RoPE runtime errors during the reference and training forward passes:

RuntimeError: Sizes of tensors must match except in dimension 3.
Expected size 2 but got size 1 for tensor number 1 in the list.

This happens in Megatron's RoPE application path:

megatron/core/models/common/embeddings/rope_utils.py
_apply_rotary_pos_emb_bshd -> torch.cat((t, t_pass), dim=-1)
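
To illustrate why the failure surfaces at the final concatenation, here is a small shape-only sketch in plain Python (no torch required). It assumes the rotated slice `t` is multiplied elementwise by frequencies derived from position_ids, so broadcasting against an unpacked batch can widen `t`, while the untouched slice `t_pass` keeps the packed batch size. The helper names and the concrete 4D shapes are hypothetical and chosen only for illustration; they are not taken from Megatron or from this PR.

```python
def broadcast_shape(a, b):
    """Result shape of an elementwise op between two same-rank shapes."""
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(max(x, y))
    return tuple(out)


def cat_shape(shapes, dim):
    """Result shape of a concatenation; all dims except `dim` must match."""
    rank = len(shapes[0])
    dim %= rank  # allow negative dims such as -1
    for shape in shapes[1:]:
        for i in range(rank):
            if i != dim and shape[i] != shapes[0][i]:
                raise ValueError(f"size mismatch in dimension {i}: {shapes}")
    out = list(shapes[0])
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


# Hypothetical shapes: the size-1 axis stands for the packed batch,
# the size-2 axis for position ids that were left unpacked.
t_rot = (8, 1, 4, 64)    # rotated slice of the packed activations
t_pass = (8, 1, 4, 64)   # pass-through slice of the packed activations
freqs = (8, 2, 1, 64)    # frequencies built from unpacked position ids

widened = broadcast_shape(t_rot, freqs)  # the batch axis grows from 1 to 2
try:
    cat_shape([widened, t_pass], dim=-1)
except ValueError as err:
    print("concatenation fails:", err)
```

When position ids are packed the same way as the tokens, the frequencies keep the packed batch size, `t` is not widened, and the concatenation shapes agree.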

Changes

Pack position_ids together with input_ids when sequence_packing is enabled.

Adjust padding logic in _pack_sequences to support non-1D tensors such as VLM position ids.

For non-packing mode, preserve the existing VLM position id layout.
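
The idea behind these changes can be sketched as follows. This is a minimal, framework-free illustration that uses nested lists in place of tensors; it is not the code in this PR. The function names, the `PAD_TOKEN` value, the `align` parameter, and the assumption that each sample's position ids are laid out as 3 rows (one per mRoPE axis) are all hypothetical.

```python
PAD_TOKEN = 0  # hypothetical pad id, for illustration only


def pad_last_dim(rows, target_len, pad_value=0):
    """Right-pad along the last dimension; accepts a flat or nested list."""
    if rows and isinstance(rows[0], list):
        return [pad_last_dim(r, target_len, pad_value) for r in rows]
    return rows + [pad_value] * (target_len - len(rows))


def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def pack_sequences(input_ids, position_ids, lengths, align=4):
    """Pack tokens and 3-axis position ids with identical per-sample padding.

    input_ids:    B padded token lists
    position_ids: B items, each a list of 3 rows of length seq_len
    lengths:      valid token count per sample
    Returns (packed_ids, packed_pos, cu_seqlens).
    """
    packed_ids = []
    packed_pos = [[], [], []]
    cu_seqlens = [0]
    for ids, pos, n in zip(input_ids, position_ids, lengths):
        padded_len = round_up(n, align)
        # Tokens: drop batch padding, then pad to the aligned length.
        packed_ids += pad_last_dim(ids[:n], padded_len, PAD_TOKEN)
        # Position ids: same trim and pad, applied to every axis row.
        trimmed = [axis[:n] for axis in pos]
        for dst, axis in zip(packed_pos, pad_last_dim(trimmed, padded_len)):
            dst += axis
        cu_seqlens.append(cu_seqlens[-1] + padded_len)
    return packed_ids, packed_pos, cu_seqlens


input_ids = [[5, 6, 7, 0], [8, 9, 0, 0]]
position_ids = [
    [[0, 1, 2, 0], [0, 1, 2, 0], [0, 1, 2, 0]],
    [[0, 1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]],
]
packed_ids, packed_pos, cu_seqlens = pack_sequences(
    input_ids, position_ids, lengths=[3, 2], align=2
)
```

The point of the sketch is that the token stream and every position id axis end up with the same packed length, and that the padding helper operates on the last dimension so it is not limited to 1D inputs.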

Validation

After this fix, VLM position_ids are packed consistently with tokens and the Megatron forward path no longer hits the RoPE shape mismatch.
