256KB of Code for 64KB of Data: Emulator Team Fixes a Program at Run Time

devblogs.microsoft.com · @systems_wire · 3 hours ago · Systems Engineering · 3 comments

A binary translator ran into a compiler-generated unrolled loop made of 65,536 individual write instructions. The team added a pattern-matching optimizer that replaces it with a tight loop during emulation.

microsoft · windows · x86 emulator · binary translation · raymond chen · old new thing

That's 256 kilobytes of code to initialize 64 kilobytes of data. A 4:1 ratio of code to payload. The kind of number that makes an engineer's eye twitch.

Raymond Chen's colleague told this war story from the days when Windows included a processor emulator for x86-32 on systems running a different native processor. The emulator didn't just interpret - it used binary translation, generating native code on the fly. Think of x86-32 as a bytecode and the emulator as a JIT compiler. That JIT approach gave huge performance wins over pure interpretation, but it also meant the team saw every single instruction the emulated program executed.

When a Compiler Went Full Stupid

One program needed to allocate about 64KB on the stack and initialize it. Standard approach: stack probe to confirm 64KB is available, subtract 65536 from the stack pointer, then run a small loop to fill the memory. Simple. Efficient.

But some compiler decided a loop was too mundane. Instead of generating a tight initialization loop, it "optimized" by unrolling the loop into 65,536 individual "write byte to memory" instructions. Each instruction was 4 bytes long. The resulting code was 256KB - four times the size of the data it was supposed to initialize. That's not optimization. That's a compiler having a seizure.
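The arithmetic behind that figure is easy to check. The sketch below uses only the numbers given in the story (65,536 store instructions at 4 bytes each); the size assumed for a loop-based version, `LOOP_CODE_BYTES`, is a made-up placeholder for comparison, since the source does not say how large a tight loop would have been.

```python
# Figures taken from the story.
STORE_COUNT = 65_536       # one "write byte to memory" instruction per byte
BYTES_PER_STORE = 4        # encoded length of each store instruction
DATA_BYTES = 65_536        # the 64KB buffer being initialized

# Hypothetical: rough size of a small fill loop. Not from the source.
LOOP_CODE_BYTES = 16


def unrolled_code_bytes(count: int = STORE_COUNT, width: int = BYTES_PER_STORE) -> int:
    """Total code size of a fully unrolled run of fixed-width stores."""
    return count * width


unrolled = unrolled_code_bytes()
print(unrolled, unrolled // 1024)     # 262144 256
print(unrolled / DATA_BYTES)          # 4.0
print(unrolled // LOOP_CODE_BYTES)    # 16384
```

Under the placeholder loop size, the unrolled version would be thousands of times larger than the loop it replaced; the exact factor depends on that assumption, but the 4:1 code-to-data ratio does not.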

The Emulator Fought Back

The team was so offended by this atrocity that they added special logic to the binary translator. The translator would detect this particular horrible function pattern and replace it with the equivalent tight loop - during emulation. They effectively shipped a hotpatch for a compiled binary that they didn't control, running on a processor it was never compiled for.

Think about the engineering mindset required to spot that pattern in a stream of translated instructions and decide, "No, we're not going to JIT this garbage, we'll generate our own loop instead." That's not just optimization - that's a moral stance.
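To make the idea concrete, here is a minimal sketch of that kind of pattern replacement over a toy instruction list. It is not the emulator team's code, whose details the source does not describe. The instruction types (`Store8`, `Fill`, `Other`), the `MIN_RUN` threshold, and the assumption that every store in the run writes the same byte value are all hypothetical choices made for illustration.

```python
from typing import NamedTuple, Union


class Store8(NamedTuple):
    offset: int  # byte offset from the start of the buffer
    value: int   # byte value written


class Fill(NamedTuple):
    offset: int  # first byte offset
    length: int  # number of bytes to write
    value: int   # byte value written to each of them


class Other(NamedTuple):
    text: str    # any instruction this pass does not care about


Instr = Union[Store8, Fill, Other]

MIN_RUN = 8  # hypothetical threshold below which a run is left alone


def collapse_store_runs(code: list[Instr], min_run: int = MIN_RUN) -> list[Instr]:
    """Replace runs of same-value stores to consecutive offsets with one Fill."""
    out: list[Instr] = []
    i = 0
    while i < len(code):
        first = code[i]
        if not isinstance(first, Store8):
            out.append(first)
            i += 1
            continue
        # Extend the run while each store targets the next byte with the same value.
        j = i + 1
        while (j < len(code)
               and isinstance(code[j], Store8)
               and code[j].offset == first.offset + (j - i)
               and code[j].value == first.value):
            j += 1
        run_length = j - i
        if run_length >= min_run:
            out.append(Fill(first.offset, run_length, first.value))
        else:
            out.extend(code[i:j])
        i = j
    return out


def execute(code: list[Instr], size: int) -> bytearray:
    """Tiny interpreter used to check that both versions write the same memory."""
    mem = bytearray(size)
    for ins in code:
        if isinstance(ins, Store8):
            mem[ins.offset] = ins.value
        elif isinstance(ins, Fill):
            mem[ins.offset:ins.offset + ins.length] = bytes([ins.value]) * ins.length
    return mem


unrolled = ([Other("sub sp, 65536")]
            + [Store8(i, 0) for i in range(65_536)]
            + [Other("ret")])
optimized = collapse_store_runs(unrolled)
print(len(unrolled), len(optimized))  # 65538 3
```

A real translator would work on decoded machine instructions and would have to prove that nothing else can observe the intermediate state, but the shape is the same: recognize the run, confirm it is regular, and emit one loop in its place.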

What This Tells Us About Real Systems

This story surfaces a tension that still exists: compilers can generate truly pathological code, especially when heuristics around loop unrolling go off the rails. The emulator team's response - pattern matching in a binary translator - is a reminder that system software often has to compensate for the sins of other tools. The next time you see a bloated binary, consider that someone somewhere might have had to write a translator-level workaround for it.

Today's JIT compilers and emulators still employ similar tricks. Pattern replacement in a dynamic translator is a legitimate performance technique, even if the original sin was a bad compiler decision. The difference is that now we have far better heuristics for loop unrolling - at least, most of the time.


Source: The time the x86 emulator team found code so bad they fixed it during emulation
Domain: devblogs.microsoft.com
