Context Hijacking

Welcome to the first post in a two-part series on implementing control flow obfuscation by abusing elementary Windows mechanisms. This post gives an introduction to Context Hijacking; the second post will go into Exception Hijacking, continuing on from the ideas laid out here.

Context Hijacking is a way of obfuscating control flow or data flow in a Windows executable by abusing APIs that use the CONTEXT struct. Some of the callbacks in ntdll are called directly by the kernel, and through them we can register pieces of code that directly or indirectly manipulate the context structs stored on the stack. At the end of each callback, these structs are used to return the thread to the state it was in before the interruption. Because we control code that gets executed during the callback, we can craft a payload that manipulates the data stored in these contexts and alters the control flow or data flow of the executable.

What are Contexts

The way Microsoft describes it:

Contains processor-specific register data. The system uses CONTEXT structures to perform various internal operations. src: https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-context

Contexts are used by many functions to keep track of the state of a processor core at a given point during execution. If a thread is interrupted by an event that the kernel needs to handle immediately, it would be problematic if the thread state weren't saved before new code is loaded onto the processor. While this is possibly the most common use case for the context struct, other functionality makes use of it too. As developers, we mostly touch contexts through the GetThreadContext/SetThreadContext and InitializeContext/CopyContext families of functions, which take a context as input and act on its contents. Then there are functions that we rarely call ourselves but that Windows uses internally, such as RtlDispatchException and KiUserApcDispatcher.

So why is this struct interesting to us? The struct exposes a way to read and write the contents of specific registers while a thread is paused. Some of these registers include:

 DWORD64 Rax;
 DWORD64 Rcx;
 DWORD64 Rdx;
 DWORD64 Rbx;
 DWORD64 Rsp;
 DWORD64 Rbp;
 DWORD64 Rsi;
 DWORD64 Rdi;
 DWORD64 R8;
 DWORD64 R9;
 DWORD64 R10;
 DWORD64 R11;
 DWORD64 R12;
 DWORD64 R13;
 DWORD64 R14;
 DWORD64 R15;
 DWORD64 Rip;

Due to the way these structs are passed to the internal callback functions, we can easily and reliably access their contents. This offers an interesting avenue for stealthily altering the state of a thread. In the next section I will talk about some areas in ntdll where we can see this in action.

Examples of Context Hijacking

Providing an overview of the technique in text goes a long way, but nothing beats some practical examples. Following is a list of areas in ntdll where we have direct, or indirect, access to the context struct. Each part is accompanied by a little bit of background on the specific function and a code snippet detailing the way of accessing the context.

KiUserApcDispatcher

The first case is about dispatching a local APC to hijack the context passed to KiUserApcDispatcher. As this post isn't about APCs, I won't go into much detail on their internals; please refer to this excellent blog by @0xrepnz for a detailed look at how they work behind the scenes.

Here is a quick overview of the steps the OS takes during the APC queueing process:

  1. We call QueueUserAPC with a pointer to the function we want it to execute.
  2. A transition to the kernel is made and NtQueueApcThread is executed. This function sets up a lot of the bookkeeping data needed for the APC.
  3. Eventually the APC is inserted into the queue of the local (or remote) thread using KeInsertQueueApc, where it stays until the thread signals the kernel that it wants to execute APCs.
  4. The thread calls one of the APIs that put it into an alertable state (such as WaitForSingleObjectEx or SleepEx), signaling the kernel that it is ready to execute APCs.
  5. Upon transitioning to the kernel, the kernel saves a copy of the thread's current processor state in a context structure.
  6. The kernel checks the alertable flag in our thread's KTHREAD, which we just set using one of the previously mentioned APIs. If the thread is alertable, the APC is dispatched.
  7. The APC gets dispatched back to user-mode by altering the state of the thread that called the sleep function so that it points to KiUserApcDispatcher; a copy of the context is then placed onto the thread's stack.
  8. The transition back to user-mode is made and the thread thinks the place it left off was the start of KiUserApcDispatcher, so it starts executing it.

Let’s start our investigation from this point. Taking a look at KiUserApcDispatcher we can see its structure is fairly easy to understand:

/images/context_hijack/81167be6b7473f9e44ef9eaf5ab8b4c9.png

Among the arguments are the context struct, which is contained entirely on the thread’s stack, and a pointer to the queued APC. It calls KiUserCallForwarder and executes our queued function. After the function is done executing it calls ZwContinueEx with a pointer to the context as argument. This function passes the context back to the kernel, which then takes care of restoring the thread back to its original state (after the SleepEx call).

What is important here is that our queued APC function executes on the same thread, and on the same stack, where the context is stored. The kernel then uses this context to restore the entire thread state. Sounds like an opportunity to mess with some parameters!

A direct pointer to the context on the stack isn’t actually given to our APC function, but because the stack layout is very predictable we only have to calculate the offset from our function’s stack frame to the context. Once the offset has been determined, there’s nothing standing in the way of the APC’s code manipulating the thread state.

Following is a small proof of concept:

#include <windows.h>
#include <stdint.h>
#include <stdio.h>

void
win(void)
{
    printf("\nThis is our thread now!\n\n");
    Sleep(2000);
}

DWORD WINAPI
alertable_thread(LPVOID arg)
{
    (void)arg;

    printf("alertable_thread called\n");

    /* Enter alertable state */
    SleepEx(INFINITE, TRUE);

    printf("alertable_thread done\n");
    return 0;
}

void CALLBACK
apc_func(ULONG_PTR arg)
{

/*
 *  In here we execute code that will overwrite
 *  the context saved on the stack and change
 *  CONTEXT->Rip, causing ZwContinue to change
 *  the control flow to the win() function
 */
    (void)arg;

    printf("APC function called\n");

    int      placeholder_sz;
    int      rip_offset;

    /* volatile so the out-of-bounds store below is not optimized away */
    volatile uint64_t placeholder[20];

    placeholder_sz  = 20;

/*
 *  Offset to CONTEXT->Rip from the start of the
 *  placeholder[] variable, counted in 8-byte slots.
 *  The value 51 depends on the compiler, build
 *  settings and OS version. Ideally we calculate
 *  this properly from assembly, but this will do
 *  for now.
 */
    rip_offset  = placeholder_sz;
    rip_offset += 51;

    /* Overwrite CONTEXT->Rip with a pointer to our win function */
    placeholder[rip_offset] = (uint64_t)win;
}

int
main(void)
{
    HANDLE t;

    /* Spawn an alertable thread */
    t = CreateThread(NULL, 0, alertable_thread, NULL, 0, NULL);
    if (t == NULL)
    {
        printf("failed creating thread\n");
        return 1;
    }

/*
 *  Give our thread some time to start up.
 *  If we don't do this our APC will get queued before
 *  the thread is active and get executed before the
 *  thread starts and enters an alertable state.
 */
    Sleep(2000);

    /* Queue our context altering APC function */
    QueueUserAPC(apc_func, t, 0);

    /* Wait for our thread to complete execution */
    WaitForSingleObject(t, INFINITE);
    CloseHandle(t);

    printf("main done\n");
    return 0;
}

Executing this runs our win function without it ever being called explicitly or showing up in any stack frame.

RtlpCallVectoredHandlers

Next, let’s take a look at a more direct way of accessing the context. This method works in much the same way as the previous one, as in:

  1. We cause a condition where our thread is interrupted and a transition to kernel is forced
  2. The kernel does some bookkeeping and forces execution back to user-mode through a dispatcher function and places the CONTEXT struct onto the stack
  3. The user-mode dispatcher then calls a callback we registered from which we can access the context

When we trigger certain exceptions the kernel will attempt to recover from them by calling a registered exception handler in user-mode. This process is performed by the ntdll functions KiUserExceptionDispatcher and RtlDispatchException.

RtlDispatchException prototype:

BOOLEAN
RtlDispatchException(
 IN PEXCEPTION_RECORD ExceptionRecord,
 IN PCONTEXT ContextRecord
 );

The first requirement for this technique to work is fulfilled by the kernel putting a context struct onto the thread’s stack. Looking at KiUserExceptionDispatcher we can see that rsp is used as a pointer that is loaded into the ContextRecord argument for RtlDispatchException:

/images/context_hijack/9b0aa575758644590c6c53b961d5cef0.png

From this same screenshot the second requirement for this technique to work is visible: a function call to ZwContinue to which the CONTEXT is passed. This presents us with the opportunity, once again, to manipulate the data inside the context from any code we registered that gets called by RtlDispatchException.

This is where RtlpCallVectoredHandlers comes in. RtlpCallVectoredHandlers is one of the three places where developers can insert their exception handlers. The other two are continue handlers and try-except handlers. Here’s a quick overview of which API needs to be called to register a handler of each type:

// Vectored handler (called first)
RtlpCallVectoredHandlers = AddVectoredExceptionHandler(1, your_handler)

// Continue handler (called last)
RtlpCallVectoredHandlers = AddVectoredContinueHandler(1, your_handler)

// Try-except handler
RtlpExecuteHandlerForException = __try { /* trigger exception here */ }
                                 __except (your_handler(GetExceptionInformation())) { /* anything in here */ }

Using the vectored exception handler we can construct the following PoC:

#include <windows.h>
#include <stdio.h>

void win(void)
{
    printf("\nThis is our thread now!\n\n");
}

LONG NTAPI exhandler(EXCEPTION_POINTERS *info)
{
    printf("exhandler called\n");

    /* Overwrite the saved Rip so the thread resumes in win() */
    info->ContextRecord->Rip = (DWORD64)win;

    return EXCEPTION_CONTINUE_EXECUTION;
}

int
main(void)
{
    printf("main called\n");

    /* Register our handler at the front of the vectored handler list */
    if (!(AddVectoredExceptionHandler(1, exhandler)))
    {
        printf("failed adding vectored handler\n");
        return 1;
    }

    /* Trigger exception */
    RaiseException(0, 0, 0, NULL);

    printf("main done\n");
    return 0;
}

The added benefit of this method is that we don’t have to rely on finding the offset on the stack where our CONTEXT is located; we can access it directly through an argument the OS passes to our exception handler. The downside is that it becomes a little more obvious that you’re accessing the CONTEXT.

try-except

Pretty much the same approach as the method above, except now using the try-except handler:

#include <windows.h>
#include <stdio.h>

void win(void)
{
    printf("\nThis is our thread now!\n\n");
}

/* Filter function: receives the same EXCEPTION_POINTERS a vectored handler gets */
int exhandler(EXCEPTION_POINTERS *info)
{
    /* Overwrite the saved Rip so the thread resumes in win() */
    info->ContextRecord->Rip = (DWORD64)win;
    return EXCEPTION_CONTINUE_EXECUTION;
}

int
main(void)
{
    printf("main called\n");

    __try
    {
        /* Trigger exception */
        RaiseException(0, 0, 0, NULL);
    }
    __except(exhandler(GetExceptionInformation()))
    {
        printf("exception triggered\n");
    }

    return 0;
}

Conclusion

This is a first look at possible ways we could make it more difficult for static analysis tooling (disassemblers included) to understand what’s going on. While interesting, these techniques do leave traces for reverse engineers to follow. In the next blog I’m going to dive deeper into the world of exception handlers and detail a technique that completely eliminates any trace of a function call being made from the control flow graph.

Stay tuned!