Pull requests: jd-opensource/xllm
perf(npu): eliminate redundant Transpose in Qwen3.5 MTP spec verify conv path
#1536
opened May 23, 2026 by
pjgao
bugfix: drive pd push transfer by cursor to avoid miss transfer.
#1532
opened May 22, 2026 by
phantomlei3
Collaborator
Draft
feat: support interlayer add norm and SplitRmsnormRope operation for qwen3.
#1531
opened May 22, 2026 by
shan-chen-feng
Collaborator
bugfix: resolve causal_conv1d tiling failure for qwen3.5 gdn decode with padded batches (#1428 cherry-pick to main)
#1529
opened May 22, 2026 by
pjgao
feat: support MiMo-7B-Base on cuda device.
#1523
opened May 22, 2026 by
Dragonliu2018
Contributor
feat: support dumping xllm server flags to json file.
#1518
opened May 22, 2026 by
XuZhang99
Collaborator
feat:remove unused func and support deepseek_v4_mtp graph on npu.
#1517
opened May 22, 2026 by
panxua
Contributor
feat: expose cached token usage in responses.
#1514
opened May 21, 2026 by
zhang-minchao
Collaborator
feat: enable REC XAttention for Qwen3 MoE on cuda device.
#1500
opened May 20, 2026 by
LMX-xin
Collaborator
feat: support vae parallel for qwen-image-edit-plus.
#1499
opened May 20, 2026 by
shan-chen-feng
Collaborator
feat: add TileLang chunk_gated_delta_rule_fwd_h kernel.
#1498
opened May 20, 2026 by
fengz72
bugfix: use max_concurrent_requests for single block and linear state allocation.
#1496
opened May 20, 2026 by
pjgao
feat: support customized multimodal preprocess configs.
#1481
opened May 19, 2026 by
xanecdotex
Collaborator
refactor: remove negative condition when choosing decode or prefill
#1475
opened May 18, 2026 by
rauletorresc
Contributor
feat: parallelize multimodal decode in request transfer.
#1474
opened May 18, 2026 by
wly-115
Collaborator
refactor: split forward inputs from model input params [3 / 3].
#1469
opened May 18, 2026 by
RobbieLeung
Collaborator
bugfix: ensure tensor contiguous layout before protobuf serialization.
#1468
opened May 18, 2026 by
a120092009
Collaborator
fused_sigmoid_gating_tilelang tilelang adapt in qwen3.x
#1465
opened May 15, 2026 by
BikingNow
bugfix: reduce acl graph memory overhead.
#1457
opened May 15, 2026 by
RobbieLeung
Collaborator
docs: exporting a draft model from a quantized model.
#1455
opened May 14, 2026 by
rauletorresc
Contributor