
WIP watch diff with upstream main branch #6

Open
Jeronymous wants to merge 68 commits into upstream-main from merge_hf_main

Conversation

@Jeronymous
Member

No description provided.

Oligou and others added 30 commits October 14, 2025 11:51
…q len (131072) is larger than the maximum number of tokens that can be stored in KV cache (130944). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine"
…and new version of the dataset is different)
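The KV-cache commit above quotes the error where the requested sequence length (131072) exceeds what the allocated cache can hold (130944 tokens). The arithmetic behind that check can be sketched as follows; the 16-token paged-attention block size is an assumption (vLLM's common default), not something read from this repo's config:

```python
# Sketch of the capacity check behind the quoted error.
# BLOCK_SIZE = 16 is an assumption (vLLM's usual paged-attention block size).
BLOCK_SIZE = 16
num_gpu_blocks = 130944 // BLOCK_SIZE       # blocks that fit at the current gpu_memory_utilization
max_kv_tokens = num_gpu_blocks * BLOCK_SIZE  # 130944: total tokens the KV cache can store

requested_max_model_len = 131072
assert requested_max_model_len > max_kv_tokens  # so the engine refuses to start

# The two remedies named in the message: lower max_model_len below the
# capacity, or raise gpu_memory_utilization so more blocks fit.
fixed_max_model_len = max_kv_tokens  # 130944, or any smaller value
```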
Jeronymous and others added 30 commits February 20, 2026 16:17
…it (>= 0.15).

Unfortunately, it currently fails with vLLM 0.15.1 in our environment:
  File ".../vllm/v1/worker/gpu_worker.py", line 412, in initialize_from_config
    self.model_runner.initialize_kv_cache(kv_cache_config)
  File ".../vllm/v1/worker/gpu_model_runner.py", line 5874, in initialize_kv_cache
    self.initialize_attn_backend(kv_cache_config)
  File ".../vllm/v1/worker/gpu_model_runner.py", line 5225, in initialize_attn_backend
    check_attention_cp_compatibility(self.vllm_config)
  File ".../vllm/v1/worker/cp_utils.py", line 39, in check_attention_cp_compatibility
    assert layer_impl.supports_pcp, (
AssertionError: PCP requires attention impls' support, but the impl FlashAttentionImpl does not support PCP.
Fix multi-parallelism (TP+DP or PP+DP)
Add MathAlea French math MCQ community task
Add Red Teaming benchmark based on AvgBench
The upstream refactor splits src/lighteval/tasks into per-task files under
src/lighteval/tasks/tasks/ and src/lighteval/tasks/multilingual/tasks/,
drops default_tasks.py / default_prompts.py / multilingual/tasks.py, and
removes the suite field from LightevalTaskConfig.

Port our edits to the new structure:
- tasks/gsm_plus.py: generation_size 16384
- tasks/gsm8k.py: generation_size 2048
- tasks/mgsm.py: hf_revision, suffix exact_match + expr_gold_metric,
  language-specific stop sequences for all 11 subsets
- tasks/piqa.py: switch to lighteval/piqa mirror
- tasks/siqa.py: pin hf_revision
- tasks/mmlu_pro.py: fix upstream's hardcoded ABCD letters so the prompt
  uses dynamic letters based on the number of options; add a parallel
  mmlu_pro_raw task exposing the handmade prompt (no inspect_ai)
- tasks/ruler.py: new home for the ruler prompt helper
- tasks/advbench.py: move here from community_tasks/
- multilingual/tasks/mathalea.py: move here from community_tasks/
- multilingual/tasks/french.py: keep jzhang86/fr_ifeval fallback and the
  generative GPQA-fr-diamond variant with prompt_gpqa_fr_instruct
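The mmlu_pro.py fix replaces upstream's hardcoded ABCD answer letters with letters derived from the actual number of options. A minimal sketch of the idea (the helper name and prompt shape here are illustrative, not lighteval's actual code):

```python
import string

def format_choices(options):
    """Label each option A, B, C, ... based on how many options exist,
    instead of assuming exactly four (ABCD)."""
    letters = string.ascii_uppercase[: len(options)]
    lines = [f"{letter}. {option}" for letter, option in zip(letters, options)]
    return letters, "\n".join(lines)

# MMLU-Pro questions can have up to 10 options, so with 5 options the
# valid answer letters are A-E rather than a fixed A-D.
letters, block = format_choices(["Paris", "Lyon", "Lille", "Nice", "Nantes"])
```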

Other conflict resolutions:
- pyproject.toml: take upstream unpinned transformers, vllm>=0.11.0,
  new inspect-ai and openai deps
- vllm_model.py: keep max_seq_len_to_capture fallback, Mistral eos_token
  guard, prefix-cache None-skip in logprob loop, and
  skip_reading_prefix_cache via guarded attribute assignment; adopt
  upstream's build_vllm_token_prompts helper
- llm_as_judge.py: keep max_model_len=65536, adopt upstream's
  api_key/base_url litellm pass-through
- lighteval_task.py: preserve name/data_dir fallback in load_dataset
  while picking up upstream's data_files support; keep partial args
  detail in __str__ for deterministic cache hashing
- cache_management.py: adopt name-only task_to_configs lookup; keep
  regex that strips function memory addresses for hash determinism
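The cache_management.py regex keeps cache hashes deterministic by removing the per-process memory address that appears in the repr of functions and bound methods. A hedged sketch of the technique (the exact pattern in the codebase may differ):

```python
import re

# Function reprs look like "<function prompt_fn at 0x7f3a...>"; the hex
# address changes every run, so it must be stripped before hashing.
ADDR_RE = re.compile(r" at 0x[0-9a-fA-F]+")

def stable_repr(obj):
    """repr() with memory addresses removed, safe to feed into a cache hash."""
    return ADDR_RE.sub("", repr(obj))

def prompt_fn(line):  # stand-in for a task's prompt function
    return line

# stable_repr(prompt_fn) -> "<function prompt_fn>" on every run.
```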