gh-148276: Optimize object creation and method calls in the JIT by resolving __init__ at trace optimization time by eendebakpt · Pull Request #148277 · python/cpython

eendebakpt · 2026-04-08T22:20:14Z

Optimize object creation and method calls in the JIT by resolving `__init__` at trace optimization time and eliminating redundant type guards. The idea came up while experimenting with the ideas in #144388 using Claude Code.

Changes

- `_CHECK_AND_ALLOCATE_OBJECT`: resolve the `__init__` function to a constant via `_spec_cache.init`, allowing the optimizer to eliminate `_CHECK_FUNCTION_VERSION` and `_CHECK_FUNCTION_EXACT_ARGS` for the init call
- `_GUARD_TYPE_VERSION_LOCKED`: propagate type version info so repeated guards on the same type within a trace are NOPed
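A toy sketch of the bookkeeping both changes rely on, assuming a trace is a flat list of uops whose operands name abstract symbols. `UOp`, `eliminate_redundant_guards`, the symbol names and the version number `1234` are hypothetical; the real handlers live in `Python/optimizer_bytecodes.c` and operate on symbolic stack values, not strings. The sketch also ignores anything that could invalidate a recorded fact mid-trace.

```python
from dataclasses import dataclass


@dataclass
class UOp:
    name: str
    args: tuple = ()  # hypothetical operands: symbol names, type versions


def eliminate_redundant_guards(trace):
    """Toy optimizer pass: replace guards whose outcome is already known with _NOP."""
    known_type_version = {}  # symbol -> type version already proven by a guard
    known_constants = set()  # symbols known to hold a constant function
    out = []
    for op in trace:
        if op.name == "_GUARD_TYPE_VERSION_LOCKED":
            sym, version = op.args
            if known_type_version.get(sym) == version:
                out.append(UOp("_NOP"))  # same object, same version: redundant
                continue
            known_type_version[sym] = version  # remember what this guard proves
        elif op.name == "_CHECK_AND_ALLOCATE_OBJECT":
            _type_sym, init_sym = op.args
            known_constants.add(init_sym)  # __init__ resolved to a constant
        elif op.name in ("_CHECK_FUNCTION_VERSION", "_CHECK_FUNCTION_EXACT_ARGS"):
            (func_sym,) = op.args
            if func_sym in known_constants:
                out.append(UOp("_NOP"))  # callee is known, so the check folds away
                continue
        out.append(op)
    return out


# Hand-written trace for illustration, not real CPython output.
trace = [
    UOp("_GUARD_TYPE_VERSION_LOCKED", ("p", 1234)),
    UOp("_LOAD_ATTR_INSTANCE_VALUE", ("p", "x")),
    UOp("_GUARD_TYPE_VERSION_LOCKED", ("p", 1234)),
    UOp("_LOAD_ATTR_INSTANCE_VALUE", ("p", "y")),
    UOp("_CHECK_AND_ALLOCATE_OBJECT", ("Point", "init")),
    UOp("_CHECK_FUNCTION_VERSION", ("init",)),
    UOp("_CHECK_FUNCTION_EXACT_ARGS", ("init",)),
]

for op in eliminate_redundant_guards(trace):
    print(op.name, op.args)
```

The second guard on `p` and both function checks on the resolved `__init__` become `_NOP`, while a guard that expects a different version would be kept.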

Benchmark (release JIT, x86_64) on:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_sq(self):
        return self.x * self.x + self.y * self.y

    def translate(self, dx, dy):
        return Point(self.x + dx, self.y + dy)
```

| Benchmark | main | branch | Speedup |
|---|---|---|---|
| `Point(x, y)` | 79.8 ns | 72.9 ns | 1.09x |
| `p.translate().dist()` | 157.7 ns | 130.7 ns | 1.21x |
| `v.scale().add().dot()` | 399.3 ns | 301.7 ns | 1.32x |

Object creation combined with method chains is 1.2-1.3x faster, and plain object creation is about 1.09x faster. Simple method calls and method descriptors are unchanged.
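The Speedup column is the ratio of the main timing to the branch timing. A small check that recomputes it from the per-iteration numbers in the table (the helper name `speedup` is just for illustration):

```python
# Per-iteration timings from the table, in nanoseconds: (main, branch)
timings = {
    "Point(x, y)": (79.8, 72.9),
    "p.translate().dist()": (157.7, 130.7),
    "v.scale().add().dot()": (399.3, 301.7),
}


def speedup(main_ns, branch_ns):
    """Speedup of the branch over main: how many times faster the branch is."""
    return main_ns / branch_ns


for label, (main_ns, branch_ns) in timings.items():
    saved = main_ns - branch_ns  # absolute time saved per iteration
    print(f"{label:24s} {speedup(main_ns, branch_ns):.2f}x  ({saved:.1f} ns saved)")
```

Rounded to two decimals this reproduces 1.09x, 1.21x and 1.32x; the absolute savings grow with the number of objects created per iteration.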

Details: benchmark script

```python
"""Benchmark for type guard elimination and __init__ resolution.

Tests the optimizations from the type_guard_elimination branch:
1. __init__ function resolution in _CHECK_AND_ALLOCATE_OBJECT
2. Redundant _GUARD_TYPE_VERSION_LOCKED elimination

Usage:
    ./python bench_type_guard.py
    ./python bench_type_guard.py --trace   # show tier 2 traces
"""

import sys
import timeit

SHOW_TRACE = "--trace" in sys.argv

# --- System info ---
print("=" * 60)
print("Type Guard Elimination Benchmark")
print("=" * 60)
print(f"Python: {sys.version}")
print(f"Debug:  {hasattr(sys, 'gettotalrefcount')}")
jit = getattr(sys, "_jit", None)
if jit:
    print(f"JIT:    available={jit.is_available()}, enabled={jit.is_enabled()}")

tier2 = False
try:
    from _testinternalcapi import TIER2_THRESHOLD
    tier2 = True
    print(f"Tier 2: enabled (threshold={TIER2_THRESHOLD})")
except (ImportError, AttributeError):
    print("Tier 2: disabled")
print()


# --- Benchmark functions ---

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_sq(self):
        return self.x * self.x + self.y * self.y

    def translate(self, dx, dy):
        return Point(self.x + dx, self.y + dy)


def bench_init(n):
    """Object creation: tests __init__ resolution."""
    total = 0.0
    for i in range(n):
        p = Point(1.0, 2.0)
        total += p.x
    return total


def bench_method_chain(n):
    """Method calls: tests type guard elimination across calls."""
    p = Point(1.0, 2.0)
    total = 0.0
    for i in range(n):
        total += p.distance_sq()
    return total


def bench_translate_chain(n):
    """Object creation + method: tests init + guard elimination."""
    p = Point(0.0, 0.0)
    total = 0.0
    for i in range(n):
        p = p.translate(1.0, 0.5)
        total += p.distance_sq()
    return total


class Vector:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def dot(self, other):
        return self.x * other.x + self.y * other.y + self.z * other.z

    def scale(self, s):
        return Vector(self.x * s, self.y * s, self.z * s)

    def add(self, other):
        return Vector(self.x + other.x, self.y + other.y, self.z + other.z)


def bench_vector_ops(n):
    """Vector math: tests guard elimination + init across operations."""
    v1 = Vector(1.0, 2.0, 3.0)
    v2 = Vector(4.0, 5.0, 6.0)
    total = 0.0
    for i in range(n):
        v3 = v1.scale(2.0).add(v2)
        total += v3.dot(v1)
    return total


def bench_list_append(n):
    """list.append: tests method descriptor optimization."""
    result = []
    for i in range(n):
        result.append(i)
    return len(result)


def bench_str_method(n):
    """str.startswith: tests method descriptor fast path."""
    s = "hello world"
    count = 0
    for i in range(n):
        if s.startswith("hello"):
            count += 1
    return count


# --- Warmup ---
LOOP = 10_000
for fn in [bench_init, bench_method_chain, bench_translate_chain,
           bench_vector_ops, bench_list_append, bench_str_method]:
    fn(LOOP)


# --- Show traces ---
if SHOW_TRACE and tier2:
    from _opcode import get_executor

    print("-" * 60)
    print("Tier 2 Traces")
    print("-" * 60)

    for label, func in [
        ("bench_init", bench_init),
        ("bench_method_chain", bench_method_chain),
        ("bench_translate_chain", bench_translate_chain),
        ("bench_vector_ops", bench_vector_ops),
    ]:
        code = func.__code__
        found = False
        for i in range(len(code.co_code) // 2):
            try:
                ex = get_executor(code, i * 2)
            except (ValueError, TypeError, RuntimeError):
                continue
            if ex is None:
                continue

            print(f"\n  {label}:")
            for j, op in enumerate(ex):
                name = op[0]
                if any(k in name for k in (
                    "GUARD", "INIT", "CHECK", "ALLOCATE", "CALL",
                    "LOAD_ATTR", "PUSH_FRAME", "CREATE", "VERSION",
                    "NOP", "EXPAND", "METHOD",
                )):
                    marker = ""
                    if "NOP" in name and "GUARD" not in name:
                        marker = " ← eliminated"
                    print(f"    {j:3d}: {name}{marker}")
            found = True
            break

        if not found:
            print(f"\n  {label}: (no executor found)")
    print()


# --- Benchmark ---
print("-" * 60)
print("Benchmark (min of 3 runs)")
print("-" * 60)

N = 2_000_000
INNER = 1000

benchmarks = [
    ("Point(x, y)           (__init__) ", bench_init),
    ("p.distance_sq()       (method)   ", bench_method_chain),
    ("p.translate().dist()  (chain)    ", bench_translate_chain),
    ("v.scale().add().dot() (vector)   ", bench_vector_ops),
    ("list.append(i)        (descr)    ", bench_list_append),
    ("s.startswith()        (str meth) ", bench_str_method),
]

for label, fn in benchmarks:
    iters = N // INNER
    times = [timeit.timeit(lambda: fn(INNER), number=iters) for _ in range(3)]
    t = min(times)
    print(f"  {label}: {t/N*1e9:.1f} ns/iter")

print()
```

Issue: Optimize object creation and method calls in the JIT by resolving __init__ at trace compile time #148276

…nt type guards

- _CHECK_AND_ALLOCATE_OBJECT: resolve __init__ from type's _spec_cache so the optimizer can follow into __init__ bodies
- _GUARD_TYPE_VERSION_LOCKED: add optimizer handler to track type version and NOP redundant guards on the same object
- Add test_guard_type_version_locked_removed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fidget-Spinner

LGTM, just two comments on wording.

Fidget-Spinner · 2026-04-09T05:15:52Z

Lib/test/test_capi/test_opt.py

```diff
+        enabling the optimizer to trace into the init frame and eliminate
+        redundant function version and arg count checks.
```


This has nothing to do with tracing into the init frame; we already do that. It's more about propagating information through the frame.

Fidget-Spinner · 2026-04-09T05:17:55Z

Python/optimizer_bytecodes.c

```diff
+            PyHeapTypeObject *cls = (PyHeapTypeObject *)type;
+            PyObject *init = FT_ATOMIC_LOAD_PTR_ACQUIRE(cls->_spec_cache.init);
+            if (init != NULL && PyFunction_Check(init)) {
+                // Record the __init__ function so _CREATE_INIT_FRAME can
```


Suggested change

```diff
-// Record the __init__ function so _CREATE_INIT_FRAME can
+// Propagate the __init__ function so _CREATE_INIT_FRAME can
```
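The C snippet under review reads a cached `__init__` pointer from the heap type and only uses it when it is a plain Python function. As far as I can tell the cache is filled in by the specializer; the Python-level sketch below instead recomputes the equivalent answer from the MRO. `resolve_python_init` is a hypothetical helper, and `isinstance(..., types.FunctionType)` stands in for `PyFunction_Check`.

```python
import types


def resolve_python_init(cls):
    """Return the __init__ that cls would use if it is a plain Python function.

    Hypothetical analogue of reading cls->_spec_cache.init and checking
    PyFunction_Check(init); built-in slot wrappers (e.g. object.__init__) give None.
    """
    for base in cls.__mro__:
        if "__init__" in base.__dict__:
            init = base.__dict__["__init__"]
            return init if isinstance(init, types.FunctionType) else None
    return None


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Point3(Point):  # inherits Point.__init__
    pass


class Plain:  # falls back to object.__init__, a C slot wrapper
    pass


print(resolve_python_init(Point))
print(resolve_python_init(Point3))
print(resolve_python_init(Plain))
```

Only the first two classes yield a function the optimizer could treat as a constant; `Plain` yields `None`, mirroring the `PyFunction_Check` failure path in the diff.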

eendebakpt and others added 2 commits April 8, 2026 18:24

add tests

83fd02c

eendebakpt requested review from Fidget-Spinner, markshannon, savannahostrowski and tomasr8 as code owners April 8, 2026 22:20

bedevere-app bot added the awaiting review label Apr 8, 2026

bedevere-app bot mentioned this pull request Apr 8, 2026

Optimize object creation and method calls in the JIT by resolving __init__ at trace compile time #148276

Open

Fidget-Spinner reviewed Apr 9, 2026


Fidget-Spinner added the skip news label Apr 9, 2026

Fidget-Spinner changed the title ~~gh-148276: Optimize object creation and method calls in the JIT by resolving __init__ at trace compile time~~ gh-148276: Optimize object creation and method calls in the JIT by resolving __init__ at trace optimization time Apr 9, 2026


eendebakpt wants to merge 2 commits into python:main from eendebakpt:type_guard_elimination
