GetMessage().await: The hunt for an async message loop
Windows has quite decent support for asynchronous operations in its APIs. And while, as is sadly typical for any OS, it comes in multiple forms and fashions like ReadFile’s “signal an event when done” and APCs, the ruler of them all is the I/O Completion Port (IOCP), introduced in NT 3.5, 31 years ago. Unsurprisingly, it is completion based - instead of polling for when an operation has finished like the eponymous poll, operations post a completion when they’re done, and the caller just has to wait for and handle completions. IOCPs are also the core of the Windows Threadpool API, where you don’t even have to deal with a port directly - you register a callback, and the system calls it for you when it dequeues the completion.
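To make the completion model concrete, here is a minimal sketch that stands in a std::sync::mpsc channel for the port. It is only an analogy, not how an IOCP is implemented, and Completion and run_operations are names made up for this illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical completion record, loosely modeled on what a port hands back.
struct Completion {
    key: usize,
    bytes_transferred: u32,
}

/// Starts `count` fake operations that each post one completion when done,
/// then dequeues and returns the completions in arrival order.
fn run_operations(count: usize) -> Vec<Completion> {
    let (port, queue) = mpsc::channel();

    let workers: Vec<_> = (0..count)
        .map(|key| {
            let port = port.clone();
            thread::spawn(move || {
                // Pretend the operation moved some data, then post the completion.
                let completion = Completion { key, bytes_transferred: 512 };
                port.send(completion).expect("receiver is still alive");
            })
        })
        .collect();
    drop(port); // only the workers hold senders now

    // The caller blocks on the queue until every sender is gone; no polling.
    let completions: Vec<Completion> = queue.iter().collect();
    for worker in workers {
        worker.join().expect("worker panicked");
    }
    completions
}

fn main() {
    for completion in run_operations(3) {
        println!(
            "operation {} finished, {} bytes",
            completion.key, completion.bytes_transferred
        );
    }
}
```

The point is the shape of the loop: the consumer never asks "are you done yet?", it only dequeues whatever has finished.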
There isn’t much you can’t do with IOCPs - you can perform asynchronous I/O, you can wait on waitable handles via NtCreateWaitCompletionPacket, you can even treat one as a thread-safe queue and post arbitrary messages via PostQueuedCompletionStatus. But you can’t wait on window messages. And while it may not sound useful to perform lots of asynchronous I/O on a UI thread, which is supposed to stay responsive to user input, it still bugged me because it’s the one operation you can’t integrate with an IOCP very well - there is MsgWaitForMultipleObjectsEx, but IOCPs aren’t waitable objects. However, that exact function is known to use an internal handle to wait for messages, and after discovering a syscall called NtUserGetInputEvent, I wanted to try to make that wait asynchronous.
Hello Windows 8
I wanted to start my investigation by looking at what MsgWaitForMultipleObjectsEx does, but unfortunately it just calls the syscall NtUserMsgWaitForMultipleObjectsEx. And while I could go and stare at the wonders of kernel-mode code… I vaguely remembered a stack trace on Windows 8 where something called RealMsgWaitForMultipleObjects called WaitForMultipleObjectsEx, which sounded more promising for user-only shenanigans. After all, there’s a syscall for the input event, so it stands to reason that userspace can use it. Hopefully.
MsgWaitForMultipleObjectsEx has this prototype:
DWORD MsgWaitForMultipleObjectsEx(
[in] DWORD nCount,
[in] const HANDLE *pHandles,
[in] DWORD dwMilliseconds,
[in] DWORD dwWakeMask,
[in] DWORD dwFlags
);
The first three parameters are like WaitForMultipleObjectsEx - an array of handles to wait on, and the timeout for that wait. dwWakeMask specifies a combination of flags from the queue status enumeration that selects which input the function should wait for, like keyboard input, touch events, or just any message that ever touches the queue. dwFlags specifies whether it’s an alertable wait for APCs, and whether it should wait on all handles (including the message handle) or just for any of them - and, most importantly for this, whether it should wait only for new messages, or also return if there are already messages in the queue.
Turns out that function doesn’t even do that much:
1. It checks some values in Win32ClientInfo for whether there is already input in the queue, if MWMO_INPUTAVAILABLE is specified, and exits if that is the case.
2. It calls GetInputEvent via ZwUserCallOneParam. Before Windows 11, a bunch of functions were grouped into three system calls in order to…I don’t know, save syscall entries? The parameter is dwFlags shifted up by 16 bits combined with the wake mask - both parameters are essentially WORD sized, but passed as two DWORDs.
3. If flag value 8 is set, it does another ZwUserCallOneParam for SetWaitForQueueAttach. Yay, yet another undocumented flag.
4. It calls WaitForMultipleObjectsEx with an array that consists of pHandles and the input event.
5. It calls ZwUserCallNoParam for ClearWakeMask.
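The parameter packing from the GetInputEvent step can be sketched in plain Rust. The helper names pack_input_event_param and unpack_input_event_param are made up for this illustration, and the constants are the documented values as I know them (QS_ALLINPUT in its classic form without the touch and pointer bits); treat them as assumptions and take the real ones from your bindings.

```rust
// Documented winuser.h values, assumed here instead of pulled from a crate.
const MWMO_INPUTAVAILABLE: u32 = 0x0004;
const QS_ALLINPUT: u32 = 0x04FF;

/// Hypothetical helper: flags go into the high word, the wake mask into the
/// low word, matching "dwFlags shifted up by 16 bits combined with the wake mask".
fn pack_input_event_param(flags: u32, wake_mask: u32) -> u32 {
    ((flags & 0xFFFF) << 16) | (wake_mask & 0xFFFF)
}

/// Hypothetical inverse, returning (flags, wake_mask).
fn unpack_input_event_param(param: u32) -> (u32, u32) {
    (param >> 16, param & 0xFFFF)
}

fn main() {
    let param = pack_input_event_param(MWMO_INPUTAVAILABLE, QS_ALLINPUT);
    println!("{param:#010x}"); // prints 0x000404ff

    let (flags, wake_mask) = unpack_input_event_param(param);
    println!("flags = {flags:#06x}, wake mask = {wake_mask:#06x}");
}
```

Masking both halves to 16 bits is a defensive choice of this sketch; whether the real function masks or simply trusts its inputs is not something I checked.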
Essentially, you get the configured event by calling GetInputEvent with the flags and wake mask you want, wait until an event is signaled, and then call ClearWakeMask to clear whatever queue events you wanted to wait for. We now know how to get the event, and how to reset the state after the wait - and with the help of a threadpool wait or NtCreateWaitCompletionPacket, we can wait on the event asynchronously:
// Arbitrary key that identifies input-event completions on the port.
const INPUT_EVENT_KEY: *mut c_void = 1234 as _;

fn reassociate(iocp: HANDLE, wait_completion_packet: HANDLE) {
    unsafe {
        NtAssociateWaitCompletionPacket(
            wait_completion_packet,
            iocp,
            // Flags go in the high word, the wake mask in the low word.
            NtUserGetInputEvent((MWMO_INPUTAVAILABLE << 16) | QS_ALLINPUT),
            INPUT_EVENT_KEY,
            std::ptr::null_mut(),
            0,
            0,
            std::ptr::null_mut(), // Ignoring whether it's already signaled for demonstration purposes
        );
    }
}

pub fn main() {
    let iocp = unsafe { CreateIoCompletionPort(INVALID_HANDLE_VALUE, std::ptr::null_mut(), 0, 1) };

    let wait_completion_packet = unsafe {
        let mut wait_completion_packet = std::ptr::null_mut();

        NtCreateWaitCompletionPacket(
            &raw mut wait_completion_packet,
            GENERIC_ALL,
            std::ptr::null_mut(),
        );
        wait_completion_packet
    };

    reassociate(iocp, wait_completion_packet);

    loop {
        let mut bytes_transferred = 0;
        let mut completion_key = 0;
        let mut overlapped: *mut OVERLAPPED = std::ptr::null_mut();
        let result = unsafe {
            GetQueuedCompletionStatus(
                iocp,
                &raw mut bytes_transferred,
                &raw mut completion_key,
                &raw mut overlapped,
                INFINITE,
            )
        };

        if completion_key == INPUT_EVENT_KEY as _ {
            // Reset the wake state, as MsgWaitForMultipleObjectsEx does after its wait.
            unsafe {
                NtUserClearWakeMask();
            }

            let mut msg = unsafe { std::mem::zeroed() };

            unsafe {
                while PeekMessageW(&raw mut msg, std::ptr::null_mut(), 0, 0, PM_REMOVE) == 1 {
                    if msg.message == WM_QUIT {
                        // Only leaves the drain loop; the outer loop keeps running.
                        break;
                    }
                    TranslateMessage(&raw const msg);
                    DispatchMessageW(&raw const msg);
                }
            }

            // The packet is one-shot, so re-arm the wait for the next batch.
            reassociate(iocp, wait_completion_packet);
        }
    }
}
Except that, on Windows 11, it works for three seconds, and then the event never gets signaled again, no matter how many keys I press in the test window. Maybe something else interferes with the event? A quick check later…yup, it’s an auto-reset event. There is definitely something interfering.
Time to look at the kernel.
Cancel it
Armed with the knowledge about MsgWaitForMultipleObjectsEx, I approached the kernel with more confidence. And sure enough, NtUserMsgWaitForMultipleObjectsEx still starts out effectively the same, with some differences in how the thread state is being accessed. It checks whether it can short-circuit the wait, gets the input event…and then something caught my eye.
CALL qword ptr [__imp_ZwCancelWaitCompletionPacket]
Wait, what?
Well, as it turns out, I am definitely not the first to wish for the input event to play nice with an IOCP… because Microsoft already did, and wired an IOCP into the message queue. It’s apparently used to translate certain completions into window messages, and send them to the windows registered by NtUserInitThreadCoreMessagingIocp(2). That name also makes it pretty easy to figure out what it is needed for - CoreMessaging, aka the modern CoreWindow shenanigans, loaded into my test process because of some internal usage in textinputframework.dll.
Thus we have step 2.5.
2.5. If MWMO_WAITALL is specified, the function cancels the queue completion packet, and reassociates it later after the wait. Otherwise, it uses the IOCP for the wait instead of the input event, and leaves the queue completion packet alone.
Remember when I said that IOCPs aren’t waitable objects? Well, they are since Windows 10 - they become signaled when there are queued completions, and unsignaled if there aren’t any. And while you can’t associate them with a completion packet because NtAssociateWaitCompletionPacket complains with STATUS_INVALID_PARAMETER_2 (I guess MS didn’t want anyone to chain IOCPs together), you can use them with WaitForMultipleObjects and friends.
Okay, so we know what we need to fix our waiting routine. Fortunately, MS even made it possible to actually do that - NtUserCancelQueueEventCompletionPacket and NtUserReassociateQueueEventCompletionPacket are both syscalls (as is NtUserGetQueueIocp). Thanks, I guess.
fn reassociate(iocp: HANDLE, wait_completion_packet: HANDLE) {
    unsafe {
        // Take the queue's own completion packet off the input event first.
        NtUserCancelQueueEventCompletionPacket();
        NtAssociateWaitCompletionPacket(
            wait_completion_packet,
            iocp,
            NtUserGetInputEvent((MWMO_INPUTAVAILABLE << 16) | QS_ALLINPUT),
            [...]
        );
    }
}
[...]
        if completion_key == INPUT_EVENT_KEY as _ {
            unsafe {
                // Hand the input event back to the queue's packet before draining.
                NtUserReassociateQueueEventCompletionPacket();
                NtUserClearWakeMask();
            }
            let mut msg = MSG::default();
            [...]
And now the test program works and doesn’t hang anymore! Hooray! We have an asynchronous wait for messages!
Or do we?
Only one can play that game
We found out that NtUserGetInputEvent configures the event with what you want to wait for, and NtUserClearWakeMask clears it again. So what happens if you set up an asynchronous wait, and the thread calls MsgWaitForMultipleObjectsEx or GetMessage in the meantime, e.g. because it’s a COM STA and you’re doing a call?
Well, it overwrites the same wake mask that was used for the asynchronous wait, because there is only one. And as it turns out, GetMessage clears the wake mask as well. And the asynchronous wait never gets woken up.
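The failure mode is easy to reproduce in a toy model. The sketch below is not the real win32k state, just a struct with a single wake mask slot and a flag for the event, based purely on the behavior described above; ToyQueue and async_wait_survives are names made up for this illustration.

```rust
/// Toy stand-in for a thread's queue state: one wake mask, one input event.
#[derive(Default)]
struct ToyQueue {
    wake_mask: u32, // the single slot every waiter has to share
    event_signaled: bool,
}

impl ToyQueue {
    /// Stand-in for NtUserGetInputEvent: configures what the event waits for.
    fn get_input_event(&mut self, wake_mask: u32) {
        self.wake_mask = wake_mask;
    }

    /// Stand-in for NtUserClearWakeMask.
    fn clear_wake_mask(&mut self) {
        self.wake_mask = 0;
    }

    /// New input arrives; the event is only signaled if the mask asks for it.
    fn post_input(&mut self, kind: u32) {
        if self.wake_mask & kind != 0 {
            self.event_signaled = true;
        }
    }
}

const QS_KEY: u32 = 0x0001;

/// Returns whether the asynchronous waiter still gets woken by a key press.
fn async_wait_survives(interfering_call: bool) -> bool {
    let mut queue = ToyQueue::default();
    queue.get_input_event(QS_KEY); // the asynchronous wait is set up

    if interfering_call {
        // A GetMessage-style call in between sets its own mask and clears it.
        queue.get_input_event(QS_KEY);
        queue.clear_wake_mask();
    }

    queue.post_input(QS_KEY);
    queue.event_signaled
}

fn main() {
    println!("undisturbed wait woken: {}", async_wait_survives(false));
    println!("wait after interference woken: {}", async_wait_survives(true));
}
```

With one slot there is no way for the second caller to know that someone else still depends on the mask, which is exactly the problem.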
And this is where the hunt ends: With a proof-of-concept for an asynchronous wait for messages that works only if nothing in that thread ever touches the message queue during the wait. Maybe it could work by spawning a helper thread and merging its queue with the calling thread, but I haven’t found a way to actually attach two threads together - AttachThreadInput only joins the input queues, not the message queues.
On the other hand, we have confirmed that IOCPs are now waitable objects, so your loop can use MsgWaitForMultipleObjectsEx and wait on your IOCP. It’s not perfect…but it is better than nothing. And if you want to play the game, you can find a crate with a future for the input event on GitHub.
pub fn main() {
    let iocp = unsafe { CreateIoCompletionPort(INVALID_HANDLE_VALUE, std::ptr::null_mut(), 0, 1) };

    // With one handle, WAIT_OBJECT_0 + 1 means "messages are available".
    const WAIT_OBJECT_1: u32 = WAIT_OBJECT_0 + 1;

    loop {
        unsafe {
            match MsgWaitForMultipleObjectsEx(
                1,
                &raw const iocp,
                INFINITE,
                QS_ALLINPUT,         // dwWakeMask
                MWMO_INPUTAVAILABLE, // dwFlags
            ) {
                WAIT_OBJECT_0 => { /* Handle I/O completion */ }
                WAIT_OBJECT_1 => {
                    let mut msg = std::mem::zeroed();
                    while PeekMessageW(&raw mut msg, std::ptr::null_mut(), 0, 0, PM_REMOVE) == 1 {
                        if msg.message == WM_QUIT {
                            break;
                        }
                        TranslateMessage(&raw const msg);
                        DispatchMessageW(&raw const msg);
                    }
                }
                _ => { /* Handle errors */ }
            }
        }
    }
}
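If the loop grows beyond a single handle, the return value gets a bit more interesting to decode. Here is a small sketch of how that could look, with the constants written out from the documented values rather than taken from a bindings crate; WaitOutcome and classify_wait are names made up for this illustration.

```rust
// Documented winbase.h values, assumed here instead of pulled from a crate.
const WAIT_ABANDONED_0: u32 = 0x0000_0080;
const WAIT_IO_COMPLETION: u32 = 0x0000_00C0;
const WAIT_TIMEOUT: u32 = 0x0000_0102;
const WAIT_FAILED: u32 = 0xFFFF_FFFF;

/// Hypothetical decoded form of a MsgWaitForMultipleObjectsEx return value.
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Handle(u32),    // index of the signaled handle
    Message,        // input matching the wake mask is available
    Abandoned(u32), // index of an abandoned mutex
    Apc,            // alertable wait ended by an APC
    Timeout,
    Failed,
    Unknown(u32),
}

/// Decodes `result` for a wait on `handle_count` handles.
/// WAIT_OBJECT_0 is 0, so handle indices map directly onto the result.
fn classify_wait(result: u32, handle_count: u32) -> WaitOutcome {
    match result {
        WAIT_FAILED => WaitOutcome::Failed,
        WAIT_TIMEOUT => WaitOutcome::Timeout,
        WAIT_IO_COMPLETION => WaitOutcome::Apc,
        r if r < handle_count => WaitOutcome::Handle(r),
        r if r == handle_count => WaitOutcome::Message,
        r if r >= WAIT_ABANDONED_0 && r - WAIT_ABANDONED_0 < handle_count => {
            WaitOutcome::Abandoned(r - WAIT_ABANDONED_0)
        }
        other => WaitOutcome::Unknown(other),
    }
}

fn main() {
    // One handle (the IOCP): 0 is the port, 1 is the message queue.
    println!("{:?}", classify_wait(0, 1));
    println!("{:?}", classify_wait(1, 1));
    println!("{:?}", classify_wait(WAIT_TIMEOUT, 1));
}
```

Keeping the decoding in a pure function like this also makes it testable without a window or a port.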