Reflex Squeezes a 220x Speedup Out of ast.walk with Rust and C-Level Tricks

By replacing Python generators with Rust bindings, inlining iter_fields, and caching subclass pointers, Reflex cut AST traversal from 285ns per node to ~1.3ns.

reflex, python, rust, pyo3, ast, performance optimization

285 nanoseconds per node. That's what Python's ast.walk cost Reflex when linting AI-generated code -- roughly a thousand CPU cycles for a trivial tree traversal. After a systematic descent through generators, attribute lookups, and CPython internals, they landed at 1.3ns per node: a 220x improvement.

Why ast.walk Was the Bottleneck in an AI Code Generator

Reflex builds an AI-powered app generator that produces massive amounts of Python code. Mistakes like misplaced keyword arguments or invalid async generators are common. The standard reflex compile catches them one at a time, so multi-bug outputs mean multiple slow cycles. A custom linter seemed obvious -- until they realized ast.walk alone took 2ms for the difflib module (7,000 nodes). That doesn't sound terrible until you multiply it across thousands of generated files.
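A baseline like this is easy to reproduce. The snippet below is a minimal sketch, not Reflex's benchmark: it parses the standard library's difflib source, walks it once with ast.walk, and reports the cost per node. The variable names and the single-run timing are assumptions made for illustration, and the numbers will vary with machine and Python version.

```python
import ast
import difflib
import inspect
import time

# Parse the same module the article uses as its benchmark input.
source = inspect.getsource(difflib)
tree = ast.parse(source)

# Time one full traversal; a serious benchmark would repeat this many times.
start = time.perf_counter()
node_count = sum(1 for _ in ast.walk(tree))
elapsed = time.perf_counter() - start

ns_per_node = elapsed * 1e9 / node_count
print(f"{node_count} nodes, {elapsed * 1e3:.2f} ms, {ns_per_node:.0f} ns/node")
```

For steadier figures, wrapping the traversal in timeit and taking the best of several runs is the usual approach.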

The Slow Stack of Generators

ast.walk uses yield. Generators save memory but suspend and resume the loop on every node. Removing the generator gave only a 5% gain. ast.iter_child_nodes was another generator -- inlining it pushed cumulative improvement to 25%. Then ast.iter_fields surfaced: yet another generator that yields unused (name, value) tuples. Replacing it with getattr(node, field, None) and dropping the tuple got to 50%.
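To make those steps concrete, here is a rough sketch of a traversal with all three generators removed. It is an illustration of the idea, not Reflex's code: walk_no_generators and its visit callback are hypothetical names, and the callback stands in for whatever the linter does per node.

```python
import ast

def walk_no_generators(node, visit):
    """Visit every node without ast.walk, iter_child_nodes, or iter_fields."""
    visit(node)
    # Read the field names off the node and fetch each value with getattr,
    # so no (name, value) tuple is built along the way.
    for name in node._fields:
        value = getattr(node, name, None)
        if isinstance(value, ast.AST):
            walk_no_generators(value, visit)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, ast.AST):
                    walk_no_generators(item, visit)

tree = ast.parse("def f(x):\n    return x + 1\n")
seen = []
walk_no_generators(tree, seen.append)
print(len(seen), "nodes visited")
```

The visiting order is depth-first here, whereas ast.walk is breadth-first; the set of nodes visited is the same.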

Reading _fields directly and checking subclasses in the same call bumped it to 55%. An iterative loop instead of recursion barely moved the needle. Python had no more tricks.
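The iterative variant might look something like the sketch below, which swaps recursion for an explicit stack and reads _fields straight off each node. The function name is hypothetical, and the exact form of the combined subclass check is not described in the article, so plain isinstance calls are used here as an assumption.

```python
import ast

def count_nodes_iterative(root):
    """Count nodes with an explicit stack instead of recursion."""
    AST = ast.AST  # local alias avoids a repeated module attribute lookup
    stack = [root]
    count = 0
    while stack:
        node = stack.pop()
        count += 1
        for name in node._fields:
            value = getattr(node, name, None)
            if isinstance(value, AST):
                stack.append(value)
            elif isinstance(value, list):
                # List comprehension rather than a generator expression,
                # in keeping with the no-generators theme.
                stack.extend([item for item in value if isinstance(item, AST)])
    return count

tree = ast.parse("for i in range(3):\n    print(i * 2)\n")
print(count_nodes_iterative(tree))
```

Beyond speed, the explicit stack also sidesteps Python's recursion limit on deeply nested trees.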

Rust, PyO3, and Direct Dict Iteration

Reflex's team used Rust bindings via PyO3 to write the traversal in native code. A straight transliteration hit 78% cumulative improvement. But then they went deeper: instead of getattr, they iterated over __dict__ directly by reading it at a memory offset. Combined with a hash set of AST subclass memory addresses (instead of isinstance), that hit 93%.
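The same two ideas can be approximated in pure Python, which helps show what the Rust code is doing. This is only an analogue under stated assumptions: the real implementation works with raw pointers and memory offsets, while this sketch uses node.__dict__ and a frozenset of type objects (AST_TYPES and both function names are hypothetical). It also relies on CPython storing node fields in the instance dict, which holds for trees produced by ast.parse.

```python
import ast

def collect_ast_types():
    """Gather ast.AST and every subclass known at import time."""
    seen, stack = set(), [ast.AST]
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(cls.__subclasses__())
    return frozenset(seen)

AST_TYPES = collect_ast_types()

def count_nodes_via_dict(root):
    """Walk the tree by iterating each node's __dict__ values."""
    stack = [root]
    count = 0
    while stack:
        node = stack.pop()
        count += 1
        # __dict__ also holds lineno and friends; the type test skips them.
        for value in node.__dict__.values():
            if type(value) in AST_TYPES:
                stack.append(value)
            elif type(value) is list:
                for item in value:
                    if type(item) in AST_TYPES:
                        stack.append(item)
    return count

tree = ast.parse("class A:\n    def m(self, *a, **k):\n        return {1: a, **k}\n")
print(count_nodes_via_dict(tree))
```

The trade-off is that an exact type lookup misses any AST subclass defined after the set is built, something isinstance would still catch.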

The last CPython call was PyDict_Next inside BorrowedDictIter. They rewrote that too. With a precomputed 2KB direct-mapped table keyed by type pointer that stores whether a class is an AST subclass and how many fields it has, they reached a 99% reduction in traversal time. Final result: 220x faster.
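A direct-mapped table of this kind can be sketched in Python to show the lookup shape. Everything specific here is an assumption: the article gives only the 2KB size, so the slot count, the address-to-slot function, and the fallback on a miss are illustrative choices, and SLOTS, slot_of, and classify are hypothetical names.

```python
import ast

SLOTS = 256  # hypothetical slot count; the article only states the table is 2KB

def all_ast_types():
    """Collect ast.AST and all of its subclasses."""
    seen, stack = set(), [ast.AST]
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(cls.__subclasses__())
    return seen

def slot_of(cls):
    # In CPython id() is the object's address; drop the low alignment bits.
    return (id(cls) >> 4) % SLOTS

# Each slot holds (type, field_count) or None. On a collision the later
# entry overwrites the earlier one, as in a direct-mapped cache.
TABLE = [None] * SLOTS
for cls in all_ast_types():
    TABLE[slot_of(cls)] = (cls, len(cls._fields))

def classify(cls):
    """Return (is_ast_subclass, field_count) for a type object."""
    entry = TABLE[slot_of(cls)]
    if entry is not None and entry[0] is cls:
        return True, entry[1]  # fast path: one index, one pointer compare
    if issubclass(cls, ast.AST):
        return True, len(cls._fields)  # slow path for evicted entries
    return False, 0

print(classify(ast.BinOp), classify(int))
```

The appeal of the direct-mapped layout is that a hit costs one index computation and one identity comparison, with no hashing or probing.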

What This Means for AI-Generated Code Workflows

A linter that runs in microseconds instead of milliseconds per file means Reflex can catch all errors in a single pass without delaying the developer feedback loop. That's the difference between an AI assistant that feels instant and one that feels sluggish.


Source: Making ast.walk 220x Faster
Domain: reflex.dev
