Abusing LargePageDrivers to copy shellcode into valid kernel modules

Introduction

Many people in the game hacking community write their own kernel-mode drivers to get around kernel-level anti-cheats such as EasyAntiCheat. However, those anti-cheats have several ways to detect cheat drivers. The most common way to load a cheat driver is to manually map it with a tool like kdmapper. Unfortunately, the code of a manually mapped driver lives outside of any valid module.
As a result, otherwise recommended communication methods like IOCTL become unusable, because a dispatch routine outside of a valid module can be detected with a few lines of code.

While reading through Windows Internals Part 1 and reverse engineering nt!KiSystemStartup, I noticed a way to abuse a Windows feature to copy shellcode into a valid driver’s normally non-executable .data section and, most importantly, execute it. This concept can potentially be used to avoid detection by game anti-cheats while communicating with kernel-mode.

Check out my lpmapper repository for the finished proof-of-concept.

In this post, I’m going to demonstrate how to make use of this feature to have executable code in kernel-mode without having to allocate any memory. To understand this concept, we will have to take a short look into one of the core data structures of a processor: Page Tables.

Large Pages

Windows makes use of page tables to be able to create separate virtual memory spaces for each context. I will not go too much into detail about how they work as there are many posts about page tables out there already, such as this one. However, one important detail that oftentimes isn’t mentioned is the use of large pages.
A common page table structure looks something like this:

As you can see in this image, a Page Table (PT) can hold up to 512 Page Table Entries (PTEs). With each page being 4096 bytes, a single page table maps 2 megabytes (512 * 4096) of physical memory. This is where large pages come into play. On x64, bit 7 of a PDPT entry or PD entry, called PageSize (PS), determines whether the entry points to another page table or directly to one physical page as large as the range that table would have mapped.

For example, if the PageSize bit is set on a Page-directory entry (PDE), the page frame number of this entry points to a full contiguous physical 2 megabyte page instead of pointing to another page table. The R/W and NX bits of the PTE decide whether the page is writable or executable. These properties apply to the entire page, which means that for normal pages, the smallest protection region you can modify is 4 kilobytes (4096 bytes). For a large page, it is 2 megabytes (512 * 4096 bytes) and for huge pages, it’s 1 gigabyte (512 * 512 * 4096 bytes).

This aspect is going to become important later.
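To make the bit layout and the sizes above concrete, here is a minimal sketch that decodes a single paging-structure entry. The bit positions (P = 0, R/W = 1, PS = 7, NX = 63) are the architectural ones; the `Level` enum and the helper names are my own and only exist for illustration. It looks at one entry in isolation, whereas the CPU also combines the R/W and NX bits of the upper-level entries.

```cpp
#include <cstdint>

// Architectural bit positions in an x64 paging-structure entry.
constexpr std::uint64_t kPresent  = 1ull << 0;   // P
constexpr std::uint64_t kWritable = 1ull << 1;   // R/W
constexpr std::uint64_t kPageSize = 1ull << 7;   // PS, only meaningful in PDPT and PD entries
constexpr std::uint64_t kNoExec   = 1ull << 63;  // NX

// Hypothetical helper type: the table level an entry was read from.
enum class Level { PT, PD, PDPT };

// Bytes whose protection is controlled by one leaf entry at the given level.
constexpr std::uint64_t RegionSize(Level level) {
    return level == Level::PT ? 4096ull
         : level == Level::PD ? 512ull * 4096ull
                              : 512ull * 512ull * 4096ull;
}

// True if the entry maps memory directly instead of pointing to the next table.
constexpr bool IsLeaf(std::uint64_t entry, Level level) {
    return level == Level::PT || (entry & kPageSize) != 0;
}

// True if this entry alone allows both writing and executing.
constexpr bool IsWritableAndExecutable(std::uint64_t entry) {
    return (entry & kPresent) != 0 && (entry & kWritable) != 0 && (entry & kNoExec) == 0;
}
```

A PD entry with the value 0x83 (P, R/W and PS set, NX clear) would therefore describe a single 2 megabyte region that is writable and executable as a whole.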

Large pages are used by applications that need to allocate large memory regions and want faster access to them. Because the virtual to physical translation skips the last page-table level and a single TLB entry covers the whole 2 megabyte range, the CPU can access large pages faster.

If you want to inspect the page tables on your system live to get a better understanding of page tables I recommend my tool PTView. By default, Windows maps the ntoskrnl.exe and hal.dll images on large pages. You can get their base address from a kernel debugger and enter it into PTView to directly get the large page they are on.

LargePageDrivers

As I mentioned, Windows maps the ntoskrnl.exe image onto a large page. However, if we look into nt!MmLoadSystemImageEx, which eventually gets called by nt!IoInitSystem during Phase 1 system initialization, we will see a function named MiMapSystemImageWithLargePage being called after a check. As the name says, this function is responsible for mapping system drivers onto large pages. The check itself is MiUseLargeDriverPage, which takes the DriverName string and returns whether the driver should be loaded on a large page. I have reverse-engineered the function and the list entry struct, and renamed all variables accordingly.

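struct LARGE_PAGE_DRIVER_LIST_ENTRY
{
    LARGE_PAGE_DRIVER_LIST_ENTRY *Flink; // next entry in the list
    LARGE_PAGE_DRIVER_LIST_ENTRY *Blink; // previous entry in the list
    UNICODE_STRING Name;                 // driver file name
};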

bool __stdcall MiUseLargeDriverPage(PCUNICODE_STRING DriverName)
{
  LARGE_PAGE_DRIVER_LIST_ENTRY *LargePageDriversListEntry; // rbx

  if ( (MiFlags & 0x8000) != 0 || (MiFlags & 0x10000) != 0 )
    return 0;
  if ( MapAllDriversIntoLargePages != 1 )
  {
      // Walk the list
    for ( LargePageDriversListEntry = LargePageDriversList;
          LargePageDriversListEntry != &LargePageDriversList;
          LargePageDriversListEntry = LargePageDriversListEntry->Flink )
    {
      if ( RtlEqualUnicodeString(DriverName, &LargePageDriversListEntry->Name, 1u) )
        return 1;
    }
    return 0;
  }
  return 1; // return true if MapAllDriversIntoLargePages is true
}

This means that, to get a driver loaded onto a large page during boot, we have to make sure it is in the LargePageDriversList. Listing all cross-references in IDA shows that this list is populated inside of nt!MiInitializeDriverImages, which is eventually called from nt!MiInitSystem.
The procedure references the global variable MmLargePageDriverBuffer from the ntoskrnl .INIT section. This variable holds the contents of the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\LargePageDrivers value from the registry. This value has to be of type multi-string.
It has to contain either the file names of the drivers, separated by null terminators, or a *, which serves as a wildcard and sets the flag I have named MapAllDriversIntoLargePages to true. The decompilation of the function shows how the value is parsed.

FirstLargePageDriverEntry = &LargePageDriversList;
LargePageDriversList = &LargePageDriversList;

// Skip this part if there are no entries
if ( MmLargePageDriverBufferLength != -1 )    
{
StartOfBuffer = &MmLargePageDriverBuffer;
EndOfBuffer = (&MmLargePageDriverBuffer + 2 * ((MmLargePageDriverBufferLength - 2) >> 1));
if ( &MmLargePageDriverBuffer < EndOfBuffer )
{
    whitespaces = 0x100002601i64;
    do
    {
    currentChar = *StartOfBuffer;

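    // skip separator characters; the bitmask covers NUL, \t, \n, \r and space
    if ( currentChar <= ' ' 
        && _bittest64(&whitespaces, currentChar) 
        || currentChar == 0x3000 )   // shown by the decompiler as '0\0'; presumably U+3000 (ideographic space)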
    {
        currentChar_1 = StartOfBuffer;
    }
    else
    {
        if ( currentChar == '*' ) // * for wildcard
        {
            MapAllDriversIntoLargePages = 1;
            break;
        }

        for ( currentChar_1 = StartOfBuffer; currentChar_1 < EndOfBuffer; ++currentChar_1 )
        {
            currentCharValue = *currentChar_1;  // scan to the end of the current name
            if ( currentCharValue <= ' ' && _bittest64(&whitespaces, currentCharValue) )
                break;
                
            if ( currentCharValue == 0x3000 )   // shown by the decompiler as '0\0'; presumably U+3000
                break;
        }

        // allocate 0x20 bytes for the new entry (pool tag 'MmLp')
        NewLargePageDriverEntry = MiAllocatePool(0x40, 0x20, 0x704C6D4Du);

        if ( !NewLargePageDriverEntry )
            break;

        //initialize entry
        DriverNameLength = 2 * (currentChar_1 - StartOfBuffer);
        NewLargePageDriverEntry->Name.Buffer = StartOfBuffer;
        NewLargePageDriverEntry->Name.Length = DriverNameLength;
        NewLargePageDriverEntry->Name.MaximumLength = DriverNameLength;
        OldEntry = FirstLargePageDriverEntry;

        if ( *FirstLargePageDriverEntry != &LargePageDriversList )
            __fastfail(3u);

        // Reassign the list links
        NewLargePageDriverEntry->Flink = &LargePageDriversList;
        *&NewLargePageDriverEntry->Blink = OldEntry;
        whitespaces = 0x100002601i64;   // same mask as above, reloaded by the compiler
        *OldEntry = NewLargePageDriverEntry;

        // set new list head
        FirstLargePageDriverEntry = NewLargePageDriverEntry;
    }
    StartOfBuffer = currentChar_1 + 1;
    }
    while ( currentChar_1 + 1 < EndOfBuffer );
}
}
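The magic constant 0x100002601 in the decompilation is easier to read as a bitmask: bit N is set when character code N counts as a separator. The following is a simplified user-mode model of the parsing loop, not the kernel code itself. `IsSeparator` and `CountDriverNames` are names I made up, and the extra comparison from the decompilation and the pool allocation are left out.

```cpp
#include <cstdint>

// Mask from the decompilation: bits 0, 9, 10, 13 and 32 are set,
// which corresponds to NUL, '\t', '\n', '\r' and ' '.
constexpr std::uint64_t kSeparatorMask = 0x100002601ull;

// Mirrors the `ch <= ' ' && _bittest64(&mask, ch)` test.
constexpr bool IsSeparator(char16_t ch) {
    return ch <= u' ' && ((kSeparatorMask >> ch) & 1) != 0;
}

// Counts the names in a buffer of `length` UTF-16 code units.
// Returns -1 if a name starts with '*', which is the wildcard case.
constexpr int CountDriverNames(const char16_t* buffer, int length) {
    int count = 0;
    int i = 0;
    while (i < length) {
        if (IsSeparator(buffer[i])) {   // skip separators between names
            ++i;
            continue;
        }
        if (buffer[i] == u'*')          // wildcard: all drivers
            return -1;
        while (i < length && !IsSeparator(buffer[i]))
            ++i;                        // scan to the end of this name
        ++count;
    }
    return count;
}
```

With this model, a buffer holding `beep.sys`, a null terminator, `null.sys` and another null terminator yields two names, while a buffer that only contains `*` takes the wildcard path.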

Now we reach the important part. Previously we learned that page protection applies to the entire page. Let’s say we load beep.sys onto a large page. Its read-only .text section is only a single page in size.
Normally the image loader would simply map the .text section onto its own page and write-protect it. The .data section also gets its own page, which is writable but not executable.
However, since the loader is now forced to place both the .text and the .data section onto the same 2 megabyte page, and protection can only be set for that page as a whole, both sections end up writable and executable.
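The reasoning above boils down to a bit of address arithmetic, sketched below. The image base and the section RVAs are made-up values for illustration and are not taken from a real beep.sys build.

```cpp
#include <cstdint>

constexpr std::uint64_t kSmallPage = 4096;
constexpr std::uint64_t kLargePage = 512 * kSmallPage;  // 2 megabytes

// Two addresses share one set of protection bits when they fall
// into the same page of the given size.
constexpr bool SharesPage(std::uint64_t a, std::uint64_t b, std::uint64_t pageSize) {
    return a / pageSize == b / pageSize;
}

// Hypothetical layout, assumed for this example only.
constexpr std::uint64_t kImageBase = 0xFFFFF80012200000ull;  // 2 MB aligned
constexpr std::uint64_t kTextRva   = 0x1000;
constexpr std::uint64_t kDataRva   = 0x3000;

// With 4 KB pages the two sections can be protected independently.
constexpr bool kSeparateOnSmallPages =
    !SharesPage(kImageBase + kTextRva, kImageBase + kDataRva, kSmallPage);

// On a large page both sections live behind the same entry.
constexpr bool kSharedOnLargePage =
    SharesPage(kImageBase + kTextRva, kImageBase + kDataRva, kLargePage);
```

Both constants evaluate to true for this layout, which is exactly the situation described above: separate protection with 4 KB pages, shared protection on a large page.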

Note that this all is achieved without any page table manipulation and is a legitimate Windows feature.
While modifying the normally read-only .text section still can easily be detected by comparing the image in memory with the file on the disk, we can freely modify the .data section and write our shellcode into it, which now can be executed.
Finally, since the shellcode remains inside of the driver’s bounds I can directly point the driver’s Device-IO dispatch to the shellcode location inside of the .data section and call it from user-mode via DeviceIoControl.

I will demonstrate this using the beep.sys driver, which is responsible for handling the Beep API function.

Implementation

You can find the full implementation of this in the lpmapper repository. The lpmapper project will map the shellcode into beep.sys by default. This can be modified to use any other driver easily. If you want to test it, run the lpmapper-test project. Make sure that you added beep.sys to LargePageDrivers in the registry key mentioned above.

Note that this concept can be abused in many different ways. For example, you could also just place the shellcode in a third-party driver and have it jump into your manually mapped driver.

I, however, wanted to demonstrate a concept that does not require any memory allocation at all. The idea is to write a simple Device-IO dispatch handler, shrink it down to the essential part, and copy only that function’s instruction bytes into the .data section. Finally, I will point the beep.sys driver dispatch at that shellcode.

First of all, we have to make sure the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\ registry key contains a multi-string value named LargePageDrivers. We are going to set it to either beep.sys, or * if you want to load all drivers onto large pages (which would be kind of wasteful). If we don’t do that, we are going to receive an ATTEMPTED_WRITE_TO_READONLY_MEMORY bluescreen, since lpmapper doesn’t check if the section is on a large page and is writable.

I started by writing the dispatch handler in C++ since I have learned that compilers produce much better machine code than humans. The handler itself is very simple and supports 3 operations: reading cr3, getting a process’s main module base address, and arbitrarily reading/writing process memory.

NTSTATUS DeviceIOControlHandler(PDEVICE_OBJECT device, PIRP irp)
{
    PIO_STACK_LOCATION irpStack = IoGetCurrentIrpStackLocation(irp);
    auto inputBuffer = irpStack->Parameters.DeviceIoControl.Type3InputBuffer;
    auto outputBuffer = irp->UserBuffer;

    switch (irpStack->Parameters.DeviceIoControl.IoControlCode)
    {
        case IOCTL_RDCR3:
            if (outputBuffer)
            {
                *(uint64_t*)outputBuffer = __readcr3(); 
            }
            break;
        case IOCTL_COPY:
            if (inputBuffer)
            {
                // field names follow the memory_copy struct shown in the Testing section
                memory_copy* data = (memory_copy*)inputBuffer;
                PEPROCESS process = 0;
                PsLookupProcessByProcessId((HANDLE)data->processId, &process);

                if (process)
                {
                    PEPROCESS sourceProcess = data->write ? IoGetCurrentProcess() : process;
                    PEPROCESS targetProcess = data->write ? process : IoGetCurrentProcess();

                    size_t dummy = 0;
                    MmCopyVirtualMemory(sourceProcess, data->sourceAddress, targetProcess, data->targetAddress, data->size, KernelMode, &dummy);
                    
                    ObDereferenceObject(process);
                }
            }
            break;
        case IOCTL_PROCESS_BASE:
            if (inputBuffer && outputBuffer)
            {
                HANDLE processId = *(HANDLE*)inputBuffer;
                PEPROCESS process = 0;
                PsLookupProcessByProcessId(processId, &process);

                if (process)
                {
                    auto base = PsGetProcessSectionBaseAddress(process);
                    *(PVOID*)outputBuffer = base;
                    ObDereferenceObject(process);
                }
            }
            break;
        default:
            return OriginalDispatch(device, irp);
    }

    irp->IoStatus.Information = 0;
    irp->IoStatus.Status = STATUS_SUCCESS;
    IofCompleteRequest(irp, IO_NO_INCREMENT);

    return STATUS_SUCCESS;
}

I wanted to keep this handler as simple as possible, which is why it doesn’t check for NTSTATUS results for example. This is up to you to implement.

After writing the handler I used my ShellcodeBakery tool to get the shellcode from the compiled binary and display it as a C++ array ready to copy into a source file.
However, the code calls a few imports. Usually, those imports are located inside of the IAT of the driver image. I won’t be mapping the entire driver image though, because I can only fit 4096 bytes into the beep.sys .data section and the driver image would be spread out across a few pages.
This is why I made another tool, that builds a “custom” import address table at the end of the shellcode and relocates all import calls in the shellcode to their appropriate IAT entry.
I did the same for the call to the OriginalDispatch. You can find that function table in the source code here. This table gets populated with the import addresses at runtime, before it’s copied into kernel-mode.

I used kdmapper’s intel_driver library to access kernel memory because it already had lots of useful functions implemented, such as GetKernelModuleExport.

First, lpmapper tries to find the beep.sys module and its DriverObject. I get the module address from GetKernelModuleExport. To get the DriverObject, I created a new function in kdmapper called CallNtosExport, which calls IoGetDeviceObjectPointer to get the Beep DeviceObject. The DeviceObject holds a pointer to the DriverObject. This happens here.

After that, I get the original driver dispatch from DriverObject->MajorFunction[14], which is the IRP_MJ_DEVICE_CONTROL slot. This address is stored in the previously mentioned function table of the shellcode, along with all other needed imports, which I can get using GetKernelModuleExport. The code responsible for this is located here.

Finally, the shellcode is copied into the beep.sys .data section here.

In the final step, lpmapper sets the DriverObject’s Device-IO dispatch to the shellcode location here.

Testing

After running lpmapper you can now test the concept with the lpmapper-test project. This project also displays how to interact with the dispatch handler.

The following code contains a reference to a handle to the Beep driver. You can acquire this handle using CreateFile:

HANDLE beepHandle = CreateFile(L"\\\\.\\GLOBALROOT\\Device\\Beep", FILE_ANY_ACCESS, 0, 
                                    nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);

I’ll briefly go over how the 3 supported operations are implemented.

Read CR3

This operation is tested in TestReadCr3.
It writes the value of the current CPU’s cr3 register, which is the DirectoryTableBase of the current process, into the output buffer.

const ULONG IOCTL_READCR3 = CTL_CODE(0x8000, 0x802, METHOD_NEITHER, FILE_ANY_ACCESS);
uint64_t cr3 = 0;
DWORD bytesReturned = 0; // must not be null when no OVERLAPPED is passed

bool success = DeviceIoControl(beepHandle, IOCTL_READCR3,
    nullptr, 0,
    &cr3, sizeof(uint64_t),
    &bytesReturned, nullptr);
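For readers who want to see what the CTL_CODE macro produces, here is a small portable sketch that reproduces its bit layout without the Windows headers. `CtlCode` and the `k...` constants are my own names; the numeric values of METHOD_NEITHER (3) and FILE_ANY_ACCESS (0) are the ones from the Windows SDK.

```cpp
#include <cstdint>

constexpr std::uint32_t kMethodNeither = 3;  // METHOD_NEITHER
constexpr std::uint32_t kFileAnyAccess = 0;  // FILE_ANY_ACCESS

// Same layout as CTL_CODE: device type in bits 16-31, access in bits 14-15,
// function in bits 2-13, transfer method in bits 0-1.
constexpr std::uint32_t CtlCode(std::uint32_t deviceType, std::uint32_t function,
                                std::uint32_t method, std::uint32_t access) {
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// The three control codes used in this post.
constexpr std::uint32_t kIoctlProcessBase = CtlCode(0x8000, 0x800, kMethodNeither, kFileAnyAccess);
constexpr std::uint32_t kIoctlCopy        = CtlCode(0x8000, 0x801, kMethodNeither, kFileAnyAccess);
constexpr std::uint32_t kIoctlReadCr3     = CtlCode(0x8000, 0x802, kMethodNeither, kFileAnyAccess);
```

Because all three codes use METHOD_NEITHER, the I/O manager passes the raw user-mode pointers through, which is why the handler reads its input from Type3InputBuffer and its output pointer from irp->UserBuffer.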

Getting a process’s main module base

This operation is tested in TestProcessBase.
After you pass a process id in the input buffer, it returns that process’s main module base address in the output buffer.

const ULONG IOCTL_PROCESS_BASE = CTL_CODE(0x8000, 0x800, METHOD_NEITHER, FILE_ANY_ACCESS);

uint64_t processBase = 0;
uint64_t processId = GetCurrentProcessId();
DWORD bytesReturned = 0; // must not be null when no OVERLAPPED is passed

bool success = DeviceIoControl(beepHandle, IOCTL_PROCESS_BASE,
    &processId, sizeof(uint64_t),
    &processBase, sizeof(uint64_t),
    &bytesReturned, nullptr);

Reading and writing process memory

This operation is tested in TestMemoryRead.
You have to initialize a memory_copy struct which has to be passed to the input buffer. The processId member of that struct holds the target process. The write member decides whether the procedure reads from the target process, or writes to it. sourceAddress and targetAddress have to be set accordingly. The size member determines the number of bytes to copy.

struct memory_copy
{
    uint64_t processId;
    PVOID sourceAddress;
    PVOID targetAddress;
    BOOL write;
    SIZE_T size;
};

const ULONG IOCTL_COPY = CTL_CODE(0x8000, 0x801, METHOD_NEITHER, FILE_ANY_ACCESS);

char buffer[3]; // buffer for "MZ" + '\0'

memory_copy data = {};

data.processId = GetCurrentProcessId();
data.targetAddress = buffer;
data.sourceAddress = (PVOID)baseAddress;
data.write = false;
data.size = 2; //only read MZ into buffer

buffer[2] = 0; // set null terminator after "MZ"

DWORD bytesReturned = 0; // must not be null when no OVERLAPPED is passed

bool success = DeviceIoControl(beepHandle, IOCTL_COPY,
    &data, sizeof(memory_copy),
    nullptr, 0,
    &bytesReturned, nullptr);

Detection

I have tested this on an EasyAntiCheat protected game for a substantial amount of time and did not run into a ban, but that is not enough data, since EasyAntiCheat is known for not always banning a player when it detects something.

Since LargePageDrivers is an in-house Windows feature, the best the anti-cheat can do is to give you a little flag for having a driver mapped in such a way.

I’m going to discuss the possible ways how this could potentially be detected - and what can be done to prevent that from happening.

If you noticed any detection vector that I missed feel free to contact me on Discord or Twitter about it.

Dispatch Hooks

In my example, I hooked the driver’s dispatch. You might be wondering why I don’t just point it at my own driver residing inside of a pool. The reason is that most modern anti-cheats check whether the dispatch function is located within the driver’s bounds.
Such a check could look something like this:

PDRIVER_OBJECT diskDriver = /* get the DriverObject */;

PUCHAR driverDispatch = (PUCHAR)diskDriver->MajorFunction[IRP_MJ_DEVICE_CONTROL];
PUCHAR driverStart = (PUCHAR)diskDriver->DriverStart;

if (driverDispatch < driverStart ||
    driverDispatch >= driverStart + diskDriver->DriverSize)
{
    // Take action
}

The most common anti-cheats have been doing this for ages.
The EAC-Reversing repository showcases how EasyAntiCheat was doing it back in 2019 at least.

For BattlEye I have analyzed the recently released full bedaisy.sys dump posted on unknowncheats.me by anypot.

I have found a few of those checks. This one could be a self-integrity check, but the principle is the same:

for ( majorFunctionIndex = 0; majorFunctionIndex < 0x1C; ++majorFunctionIndex )
{
    majorFunction = DriverObject->MajorFunction[majorFunctionIndex];
    if ( majorFunction )
    {
        DriverStart = DriverObject->DriverStart;
        if ( majorFunction < DriverStart || majorFunction >= DriverStart + DriverObject->DriverSize )
        {
            // take action
        }
    }
}

If you take a look at those, you will notice that this PoC passes those checks, since the dispatch still points to an address inside of the driver’s bounds.

A way to detect this project specifically would be to parse the Debug symbols for beep.sys and check, if the dispatch is pointing to the correct dispatch handler. However, you can easily modify this project to use any third-party driver that does not have symbols available.

Another way to at least flag this is to check, using the PE header, whether the dispatch points into an executable section. The problem, once again, is that it is technically not illegal to have the dispatch pointing into the .data section. It’s not common, but a reputable anti-cheat should be careful about taking action based on such flags. If you use techniques to obfuscate and encrypt the shellcode, it could potentially become even harder to detect reliably.
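As a rough illustration of such a section check, here is a portable sketch that classifies a dispatch address against a section table. The `Section` struct is a minimal stand-in for the relevant IMAGE_SECTION_HEADER fields, and the base address, image size and section layout are invented for the example. Only the IMAGE_SCN_MEM_EXECUTE value (0x20000000) comes from the PE format. This is not how any particular anti-cheat implements it.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kScnMemExecute = 0x20000000;  // IMAGE_SCN_MEM_EXECUTE

// Stand-in for the IMAGE_SECTION_HEADER fields that matter here.
struct Section {
    std::uint32_t virtualAddress;   // RVA of the section
    std::uint32_t virtualSize;
    std::uint32_t characteristics;
};

enum class DispatchLocation { OutsideImage, NonExecutableSection, ExecutableSection };

constexpr DispatchLocation Classify(std::uint64_t imageBase, std::uint32_t imageSize,
                                    std::uint64_t dispatch,
                                    const Section* sections, std::size_t count) {
    // Same bounds check as in the snippets above.
    if (dispatch < imageBase || dispatch - imageBase >= imageSize)
        return DispatchLocation::OutsideImage;

    const std::uint64_t rva = dispatch - imageBase;
    for (std::size_t i = 0; i < count; ++i) {
        const Section& s = sections[i];
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize)
            return (s.characteristics & kScnMemExecute) != 0
                       ? DispatchLocation::ExecutableSection
                       : DispatchLocation::NonExecutableSection;
    }
    // Inside the image but in no section (headers or padding).
    return DispatchLocation::NonExecutableSection;
}

// Hypothetical image layout, assumed for this example only.
constexpr std::uint64_t kBase = 0xFFFFF80012200000ull;
constexpr std::uint32_t kSize = 0x5000;
constexpr Section kSections[] = {
    {0x1000, 0x0800, 0x60000020},  // .text: code, execute, read
    {0x3000, 0x1000, 0xC0000040},  // .data: initialized data, read, write
};
```

In this layout a dispatch at `kBase + 0x1100` is classified as ExecutableSection, while one at `kBase + 0x3010` passes the bounds check but lands in NonExecutableSection, which is the kind of flag discussed above.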

Stack walking

Stack walking is an often-discussed detection method used by anti-cheats. It works by delivering APCs to all threads, getting their contexts, and checking the return addresses on the stack. Those are then used to determine whether the thread has been executing code outside of a valid module.
However, in my example, the thread never leaves the beep.sys or ntoskrnl.exe module. It does run inside of the .data section, though, which could lead to a flag if explicitly checked for.

NMI-Callbacks

Similar to stack walking, NMI callbacks interrupt your thread midway and check where and what it is currently executing. This could lead to a potential detection if the anti-cheat finds your thread executing inside of the .data section.

Conclusion

It has been a pleasure researching this feature. I highly recommend reading through Windows Internals Part 1, as it is a good book and can spark a few ideas.

If you found a mistake in this post or noticed a critical fact that I missed, please contact me over Discord, Twitter, or any channel you find. I highly appreciate any critical feedback I can get.

This post is licensed under CC BY 4.0 by the author.