Context Hijacking
Welcome to the first blog in a two-part series on implementing control flow obfuscation by abusing elementary Windows mechanisms. In this blog I will give an introduction to Context Hijacking, and the second blog will go into Exception Hijacking, continuing on from the ideas laid out in this blog.
Context Hijacking is a way of obfuscating control flow or data flow in your Windows executable by abusing APIs that use the CONTEXT struct. Using some of the callbacks in ntdll that are called directly by the kernel, we can register pieces of code that directly, or indirectly, manipulate the context structs stored on the stack. At the end of these callbacks, the context structs are used to return the thread to the state it was in before the interruption. Because we control code that gets executed during a callback, we can craft a payload that manipulates the data stored in these contexts and alters the control flow or data flow of the executable.
What are Contexts
The way Microsoft describes it:
Contains processor-specific register data. The system uses CONTEXT structures to perform various internal operations. src: https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-context
Contexts are used by many functions to keep track of the state of a processor core at a given point during execution. If a thread is interrupted during execution by an event that the kernel needs to handle immediately, it would be problematic if the thread state weren’t saved before loading new code into the processor. While this is possibly the most common use case for the context struct, other functionality makes use of it, too. Some of the ways we use contexts when developing are through the GetThreadContext/SetThreadContext and InitializeContext/CopyContext families of functions. These functions take a context as input and perform actions on its contents. Then there are functions which we don’t often call ourselves, but are used by Windows internally. Examples of these are RtlDispatchException and KiUserApcDispatcher.
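So why is this struct interesting to us? The struct exposes a way to read and write the contents of specific registers while a thread is paused. On x64, some of these registers include the general-purpose registers (Rax, Rcx, Rdx, Rbx, Rsi, Rdi, Rbp, and R8 through R15), the stack pointer Rsp, the instruction pointer Rip, and the flags register EFlags.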
Due to the way these structs are passed to the internal callback functions, we can easily and reliably access their contents. This offers an interesting avenue for stealthily altering the state of a thread. In the next section I will be talking about some areas in ntdll where we can see this phenomenon in action.
Examples of Context Hijacking
Providing an overview of the technique in text goes a long way, but nothing beats some practical examples. Following is a list of areas in ntdll where we have direct, or indirect, access to the context struct. Each part is accompanied by a little bit of background on the specific function and a code snippet detailing the way of accessing the context.
KiUserApcDispatcher
The first case is about dispatching a local APC to hijack the context passed to KiUserApcDispatcher. As this blog isn’t about APCs, I won’t go into too much detail on their internals; please refer to this excellent blog by @0xrepnz for a detailed look at how they work behind the scenes.
Here is a quick overview of the steps the OS takes during the APC queueing process:
- Call QueueUserAPC with a pointer to the function we want it to execute.
- A transition to the kernel is made and NtQueueApcThread is executed. This function sets up a lot of the bookkeeping data needed for the APC.
- Eventually the APC is inserted into the queue of the local (or remote) thread using KeInsertQueueApc, where it stays queued until the thread signals the kernel that it wants to execute APCs.
- Call one of the APIs for putting the thread into an alertable state (such as WaitForSingleObjectEx or SleepEx with the alertable argument set), signaling the kernel that it is ready to execute APCs.
- Upon transitioning to the kernel, the kernel saves a copy of the current processor state for the thread in a context structure.
- The kernel checks the alertable flag of our thread (kept in its KTHREAD), which we just set using one of the previously mentioned APIs. If the thread is alertable, the APC is dispatched.
- The APC gets dispatched back to user mode by altering the state of the thread that called the sleep function so that it points to KiUserApcDispatcher; a copy of the context is then placed onto the stack of the thread.
- The transition back to user mode is made and the thread thinks the place it left off was the start of KiUserApcDispatcher, so it starts executing it.
Let’s start our investigation from this point. Taking a look at KiUserApcDispatcher, we can see its structure is fairly easy to understand:
Among the arguments are the context struct, which is contained entirely on the thread’s stack, and a pointer to the queued APC. It calls KiUserCallForwarder and executes our queued function. After the function is done executing, it calls ZwContinueEx with a pointer to the context as argument. This function passes the context back to the kernel, which then takes care of restoring the thread to its original state (after the SleepEx call).
What is important here is that our queued APC function executes on the same thread, and on the same stack, where the context is stored. This context is then used by the kernel to restore the entire thread state. Sounds like an opportunity to mess with some parameters!
A direct pointer to the context on the stack isn’t actually given to our APC function, but because the stack layout is very predictable we only have to dynamically calculate the offset from our function’s stack frame to the context. Once the offset has been determined, there’s nothing standing in the way of the APC’s code manipulating the thread state.
Following is a small proof of concept:
Executing this will allow us to call our win function without ever explicitly calling it or it showing up in any stack frame.
RtlpCallVectoredHandlers
Next, let’s take a look at a more direct way of accessing the context. This method works in much the same way as the previous one, as in:
- We cause a condition where our thread is interrupted and a transition to kernel is forced
- The kernel does some bookkeeping and forces execution back to user-mode through a dispatcher function and places the CONTEXT struct onto the stack
- The user-mode dispatcher then calls a callback we registered from which we can access the context
When we trigger certain exceptions, the kernel will attempt to recover from them by calling a registered exception handler in user mode. This process is performed by the ntdll functions KiUserExceptionDispatcher and RtlDispatchException.
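RtlDispatchException prototype (the function is undocumented; this is the shape commonly given in reverse-engineered headers):
BOOLEAN NTAPI RtlDispatchException(PEXCEPTION_RECORD ExceptionRecord, PCONTEXT ContextRecord);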
The first requirement for this technique to work is fulfilled by the kernel putting a context struct onto the thread’s stack. Looking at KiUserExceptionDispatcher, we can see that rsp is used as a pointer that is loaded into the ContextRecord argument for RtlDispatchException:
From this same screenshot the second requirement for this technique to work is visible: a function call to ZwContinue, to which the CONTEXT is passed. This presents us with the opportunity, once again, to manipulate the data inside the context from any code we registered that gets called by RtlDispatchException.
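This is where RtlpCallVectoredHandlers comes in. RtlpCallVectoredHandlers is one of the three places where developers can insert their exception handlers. The other two are continue handlers and try-except handlers. Here’s a quick overview of which API needs to be called to register a handler for each type:
- Vectored exception handler: AddVectoredExceptionHandler
- Vectored continue handler: AddVectoredContinueHandler
- try-except handler: no registration API; the compiler emits the handler for __try/__except blocks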
Using the vectored exception handler we can construct the following PoC:
The added benefit of this method is that we don’t have to rely on finding the offset on the stack where our CONTEXT is located; we can directly access it through an argument the OS passes to our exception handler. The downside of this method is that it becomes a little bit more obvious that you’re accessing the CONTEXT.
try-except
Pretty much the same approach as the method above, except now using the try-except handler:
Conclusion
This is a first look at possible ways we could make it more difficult for static analysis tooling (disassemblers included) to understand what’s going on. While interesting, these techniques do leave traces for reverse engineers to follow. In the next blog I’m going to dive deeper into the world of exception handlers and detail a technique that completely eliminates any trace of a function call being made from the control flow graph.
Stay tuned!