Why Queuing Work and Then Blocking on It Is the Main Cause of System Hangs

devblogs.microsoft.com · @systems_wire · 6 hours ago · Systems Engineering

Raymond Chen points out an anti-pattern in which drivers follow the letter of the callback rules but defeat their own performance, and explains why a 2020 documentation update finally closed the loophole.

microsoft · windows kernel · raymond chen · old new thing · driver development · system worker threads

The most common reason a Windows system hangs, according to Microsoft's enterprise support team, is a driver that follows the kernel callback rules to the letter while violating their entire purpose: it queues work to a System Worker Thread, then blocks until that work completes. It's the engineering equivalent of telling your little brother to turn on the TV so you can say you didn't touch it.

The Rule and Its Rationale

Windows kernel-mode process and thread manager callbacks have a short, brutal list of prohibitions: keep routines short and simple, no registry calls, no IPC, no blocking, no synchronizing with other threads. Those aren't arbitrary. These callbacks fire during process creation, termination, DLL load, and other low-level events while the system may hold internal locks. Slow down one callback and you slow down the entire system's ability to start or stop processes.

The guidance to use System Worker Threads for expensive work is a direct concession: you can offload long operations, but the callback itself must return instantly. The intent is clear: "Don't block."
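The shape of that guidance can be sketched outside the kernel. What follows is a user-mode Python analogy, not driver code: `on_process_created`, `work_queue`, and the PIDs are hypothetical names chosen for illustration, and a plain thread stands in for a System Worker Thread. The point it tries to show is only that the callback copies what it needs, queues it, and returns without ever learning when the work finishes.

```python
import queue
import threading

# Hypothetical stand-ins: in a real driver the queue would be a kernel work
# item and the callback would be registered with the process manager.
work_queue = queue.Queue()
processed = []

def worker():
    """Drain the queue on a separate thread; slow work is allowed here."""
    while True:
        item = work_queue.get()
        if item is None:              # sentinel: shut the worker down
            break
        processed.append(f"inspected pid {item['pid']}")
        work_queue.task_done()

def on_process_created(pid, image_name):
    """Notification callback: copy what is needed, queue it, return."""
    work_queue.put({"pid": pid, "image": image_name})
    # Deliberately no wait here: the callback does not block on the result.

worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

for pid in (100, 104, 108):
    on_process_created(pid, "example.exe")

work_queue.join()                     # only the demo harness waits, never the callback
work_queue.put(None)
worker_thread.join()
print(processed)
```

The only waits in this sketch sit in the demo harness at the bottom, which exists so the script can print its result. Moving either of them into `on_process_created` would recreate the blocking the rule is meant to prevent.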

The 'It Was My Brother' Loophole

Here's the anti-pattern that keeps support engineers busy. A driver vendor reads "use System Worker Threads" and thinks they've found a legal loophole. They queue the work item, then call KeWaitForSingleObject or an equivalent wait on the completion event. Technically, the callback isn't making a blocking call to another process. It's waiting on an event. But that event is set by the worker thread. You are synchronizing with another thread. You are blocking. The system still hangs.
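A rough way to see why the loophole buys nothing is to time both variants. This is again a user-mode Python analogy under stated assumptions: `WORK_SECONDS` is an invented cost for the expensive work, a `ThreadPoolExecutor` stands in for the system worker pool, and `threading.Event` stands in for the completion event. None of these names come from the Windows driver APIs.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

WORK_SECONDS = 0.3   # assumed cost of the "expensive" work (illustrative only)

def expensive_work(done=None):
    time.sleep(WORK_SECONDS)          # stands in for a slow API or cross-process call
    if done is not None:
        done.set()                    # signal completion to whoever is waiting

def callback_queue_and_wait(pool):
    """Anti-pattern: queue the work, then block until it signals completion."""
    done = threading.Event()
    pool.submit(expensive_work, done)
    done.wait()                       # the callback is now blocked on the worker

def callback_queue_and_return(pool):
    """Intended pattern: queue the work and return without waiting."""
    pool.submit(expensive_work)

def timed(callback, pool):
    """Return how long the callback itself took to return, in seconds."""
    start = time.perf_counter()
    callback(pool)
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=2) as pool:
    blocked_for = timed(callback_queue_and_wait, pool)
    returned_in = timed(callback_queue_and_return, pool)

print(f"queue and wait:   {blocked_for:.3f}s")
print(f"queue and return: {returned_in:.3f}s")
```

In this sketch the queue-and-wait callback holds its caller for roughly the full duration of the work, the same as doing the work inline, while the queue-and-return callback comes back almost immediately. Exact timings depend on the machine and are not meant as measurements of real driver behavior.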

Raymond Chen calls this the "It wasn't me, it was my brother" defense. Your parents told you not to turn on the TV. You told your brother to do it. Technically, you didn't press the button. But the TV is on, and you're still in trouble. The same logic applies to WHQL certification: the callback blocked, the system suffered, and the workaround doesn't absolve you.

How Microsoft's Documentation Finally Caught Up

Some driver vendors argued they were following the exact wording of the 2000s-era rules. The rules said don't make registry calls, don't call into user mode, don't synchronize with other threads. Nothing explicitly said "don't wait on a work item you queued." So they interpreted "don't synchronize with other threads" as "I'm synchronizing on an event, not a thread." Semantics.

In 2020, Microsoft updated the documentation to close this exact loophole. The new language explicitly calls out: "If you use System Worker Threads, don't wait on the work to complete. Doing so defeats the purpose of queuing the work to be completed asynchronously." No more loophole. The letter and the spirit now match.

Next time you implement a callback registered through PsSetCreateProcessNotifyRoutine or PsSetLoadImageNotifyRoutine, stop before you add that completion event. Read the rationale, not just the bullet points. Your support colleagues will thank you - or at least they won't be debugging a system hang caused by your "clever" workaround.


Source: Understanding the rationale behind a rule when trying to circumvent it
Domain: devblogs.microsoft.com
