
Building "Horus": A Type-2 Hypervisor for Windows

January 30, 2026
14 min read

Abstract

This post details a Proof of Concept (PoC) originally developed a few years ago as I began my journey into hardware-assisted virtualization and AMD SVM. It serves as a technical archive of my learning process, born from a deep dive into the AMD64 Architecture Programmer’s Manuals and various online resources.

To bridge the gap between theory and implementation, the core structure of this hypervisor was built by analyzing and referencing open-source projects like AetherVisor and SimpleSVM. These resources were instrumental in helping me translate documentation into practical knowledge.

The sole objective of this project was education and technical curiosity. All techniques discussed are documented strictly for academic and research purposes. I do not encourage or condone the use of these methods for any malicious activity; this is simply a look back at a foundational learning project.

1. Introduction: The Frontier of “Ring -1”

In the adversarial realms of malware analysis, anti-cheat development, and offensive security, traditional Kernel-level (Ring 0) access faces diminishing returns. The evolution of Windows security mitigations—specifically PatchGuard (KPP) and Kernel Mode Code Integrity (KMCI)—has rendered classic techniques like SSDT or Inline Hooking both hazardous and trivially detectable. Consequently, the frontier of deep system introspection has shifted to “Ring -1” (virtualization mode).

This post details the architecture and development of Horus, a custom Type-2 Hypervisor for AMD64 processors using Secure Virtual Machine (SVM) technology. We will explore the journey of transparently virtualizing a running Windows system, moving beyond standard driver development into the complexities of processor microarchitecture and hardware exception handling. Technical highlights include the implementation of Nested Page Tables (NPT) for undetectable memory hooking and a stealthy Syscall Tracing mechanism based on exception injection.

Disclaimer: This project and the techniques discussed herein are strictly for education and research purposes regarding internal AMD architecture and the Windows Kernel.

2. Architecture & The “Blue Pill” process

The lifecycle of Horus begins as a standard Windows driver. Its primary goal is to perform the “Blue Pill” transition: migrating the running OS into a virtual machine on the fly.

2.1. System Virtualization Loop

The entry point, virtualize_system, iterates over every logical processor. Since the Guest state is local to each core, Horus must perform a strict initialization sequence for each one (a condensed sketch follows the list):

  1. Context Capture: We capture the current processor state (RtlCaptureContext) to ensure a seamless transition.
  2. VMCB Allocation: We allocate and clear the Virtual Machine Control Block (VMCB). This 4KB structure dictates the CPU behaviour, defining which events (like CPUID or MSR writes) trigger a #VMEXIT.
  3. Host State Setup: The VM_HSAVE_PA MSR is configured to point to a physical memory region where the processor saves the Host state before running the Guest.
  4. Launch: The driver invokes vm_launch (an assembly routine), which executes VMRUN. At this precise moment, the CPU promotes the driver to Host Mode (Ring -1) and demotes the running OS to Guest Mode.
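
The per-core sequence can be illustrated with a minimal sketch following the classic SimpleSVM pattern. Note that setup_vmcb and is_virtualized are illustrative helper names (only vm_launch exists in the project as described), and the MSR/bit constants are assumed to be defined elsewhere:

// Minimal per-core sketch of the sequence above. setup_vmcb and
// is_virtualized are illustrative helpers; MSR/bit constants are assumed
// to be defined elsewhere.
void virtualize_core( VcpuData* vcpu )
{
    CONTEXT guest_context = { 0 };

    // 1. Capture the processor state; the Guest later resumes right here.
    RtlCaptureContext( &guest_context );

    if ( !is_virtualized() )
    {
        // Enable SVM (EFER.SVME) so VMRUN becomes legal on this core.
        __writemsr( MSR_EFER, __readmsr( MSR_EFER ) | EFER_SVME );

        // 2. Build the 4KB VMCB from the captured context.
        setup_vmcb( vcpu, &guest_context );

        // 3. Point VM_HSAVE_PA at this core's host save area.
        __writemsr( MSR_VM_HSAVE_PA, vcpu->host_save_area_pa );

        // 4. VMRUN: the CPU enters Guest mode; the Host continues in vm_launch.
        vm_launch( vcpu );
    }
}

Because the Guest resumes at the instruction after RtlCaptureContext, the is_virtualized check is what prevents the core from attempting to virtualize itself twice.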

2.2. Validating the State

Before virtualization, Horus rigorously validates the system state against AMD’s manual requirements (checking EFER, CR0, CR3, and CR4 combinations). If the Guest state is “illegal” (e.g., specific reserved bits are set), VMRUN fails, potentially crashing the system. Horus prevents this by pre-validating most of these conditions; a minimal sketch of such checks follows the excerpt below.

AMD64 Architecture Programmer’s Manual, Volume 2:

Illegal guest state combinations cause a #VMEXIT with error code VMEXIT_INVALID. The following conditions are considered illegal state combinations (note that some checks may be subject to VMCB Clean field settings, see below):

  • EFER.SVME is zero.
  • CR0.CD is zero and CR0.NW is set.
  • CR0[63:32] are not zero.
  • Any MBZ bit of CR3 is set.
  • Any MBZ bit of CR4 is set.
  • DR6[63:32] are not zero.
  • DR7[63:32] are not zero.
  • Any MBZ bit of EFER is set.
  • EFER.LME and CR0.PG are both set and CR4.PAE is zero.
  • EFER.LME and CR0.PG are both non-zero and CR0.PE is zero.
  • The VMRUN intercept bit is clear.
  • ASID is equal to zero.
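
As a rough illustration, pre-flight validation might look like the following. This is a minimal sketch covering only a subset of the conditions above; the bit constants (EFER_SVME, CR0_CD, and so on) are assumed to be defined elsewhere:

// Sketch: pre-validate a subset of the illegal-state conditions above
// before attempting VMRUN.
bool validate_guest_state()
{
    uint64_t efer = __readmsr( MSR_EFER );
    uint64_t cr0  = __readcr0();
    uint64_t cr4  = __readcr4();

    // EFER.SVME must be set for VMRUN to be legal.
    if ( ( efer & EFER_SVME ) == 0 )
        return false;

    // CR0.CD clear while CR0.NW is set is an illegal combination.
    if ( ( cr0 & CR0_CD ) == 0 && ( cr0 & CR0_NW ) != 0 )
        return false;

    // CR0[63:32] must be zero.
    if ( ( cr0 >> 32 ) != 0 )
        return false;

    // Long-mode paging requires PAE: EFER.LME && CR0.PG && !CR4.PAE is illegal.
    if ( ( efer & EFER_LME ) && ( cr0 & CR0_PG ) && !( cr4 & CR4_PAE ) )
        return false;

    return true;
}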

3. Memory Introspection: “Split-View” NPT Hooking

Memory management is arguably the most complex aspect of a hypervisor. While standard paging translates Virtual Addresses (VA) to Physical Addresses (PA), virtualization adds a second layer: Guest Physical Addresses (GPA) to Host Physical Addresses (HPA) using Nested Page Tables (NPT).

Since standard AMD64 paging lacks a native “Execute-Only” permission bit (meaning any page marked “Present” is inherently readable), we cannot simply hide our hooks within a single memory mapping. To circumvent this, Horus leverages NPT to implement Split-View Hooking, a technique that bypasses PatchGuard integrity checks by presenting one version of memory for reads and a different version for execution.

3.1. The Dual Hierarchy

Horus constructs two distinct NPT hierarchies:

  • Primary NPT (Read View): This is the default table. It maps the guest physical memory 1:1 but marks our target hook pages as Non-Executable (NX). If Windows reads these pages to check for integrity, it sees the original, unmodified bytes.
  • Shadow NPT (Execute View): This table is mostly Non-Executable, except for the specific pages we have hooked, which are marked Executable. These pages contain our shellcode or modified functions.
[Figure: Split-View NPT hooking structure]

This simple diagram illustrates the two NPT hierarchies used by Horus.
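
Conceptually, installing a hook boils down to flipping the NX bit of the target page in each hierarchy. The sketch below is illustrative (the NptPte layout and helper name are hypothetical), assuming the nested PTEs have already been walked down to 4KB granularity:

// Sketch: split a single guest-physical page across the two views.
void split_page_views( NptPte* primary_pte, NptPte* shadow_pte,
                       uint64_t original_pfn, uint64_t shadow_pfn )
{
    // Primary (read) view: keep the clean page visible, but never executable.
    primary_pte->page_frame_number = original_pfn;
    primary_pte->no_execute = 1;

    // Shadow (execute) view: map the patched copy and allow execution.
    shadow_pte->page_frame_number = shadow_pfn;
    shadow_pte->no_execute = 0;
}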

3.2. The Trapping Mechanism (#NPF)

The switching mechanism relies on the Nested Page Fault (#NPF) exception. Instead of relying on complex single-stepping mechanisms (the #DB trap flag TF, or Intel’s MTF), Horus leverages the page fault mechanism to switch views dynamically:

  1. Execution Trap: When the OS attempts to execute code we hooked, the Primary NPT (marked NX) triggers a #VMEXIT(NPF).
  2. The Switch: Horus intercepts the fault, detects that the instruction pointer (RIP) matches our target, and updates the VMCB’s nCR3 to point to the Shadow NPT.
  3. Execution: The CPU resumes, now using the Shadow NPT where the code is executable. The payload/hook runs.
  4. Restoration: Once execution flows out of the hooked page into adjacent memory, another #NPF occurs (since the Shadow NPT maps the rest of memory as NX). The handler catches this and swaps back to the Primary NPT, closing the loop.

To the OS, the memory looks clean (Read View), but the CPU executes our modified code (Execute View). This “Ping-Pong” approach ensures robust isolation without the performance overhead of instruction-level tracing.

src/memory/npt/npt.cpp
void horus::vmm::npt::npt_exit_handler( VcpuData* vcpu )
{
    EXITINFO1 exit_info1 = { 0 };
    exit_info1.AsUInt64 = vcpu->guest_vmcb.control_area.exit_info1;
 
    uint64_t guest_rip = vcpu->guest_vmcb.save_state_area.rip;
 
    ...
	
    // #NPF thrown due to execution attempt of a non-executable page
    if ( exit_info1.Fields.Execute == 1 ) 
    {
        auto rip_page = PAGE_ALIGN( guest_rip );
        bool is_hooked_page = false;
 
        // Iterate over all hooks and check if the RIP matches any of them.
        for ( int i = 0; i < g_npt_state.hook_count; i++ )
        {
            auto& hook = g_npt_state.hook_pool[ i ];
            if ( hook.function_address == reinterpret_cast< void* >( guest_rip ) || 
                PAGE_ALIGN( hook.function_address ) == rip_page )
            {
                is_hooked_page = true;
                break;
            }
        }
 
        if ( is_hooked_page )
        {
            // Switch to shadow NPT where the hooked page is executable and the target function contains our shellcode.
            vcpu->guest_vmcb.control_area.ncr3 = vcpu->npt_shared_data->shadow_npt_cr3;
        }
        else 
        {
            // Switch back to primary NPT to resume normal execution.
            vcpu->guest_vmcb.control_area.ncr3 = vcpu->npt_shared_data->primary_npt_cr3;
        }
 
        // Clear VMCB clean bit 4 (NP) so the CPU reloads the nested-paging fields (nCR3).
        vcpu->guest_vmcb.control_area.vmcb_clean &= 0xFFFFFFEF;
 
        // Flush this guest's TLB entries (TLB_CONTROL encoding 3).
        vcpu->guest_vmcb.control_area.tlb_control = 3;
    }
}
Switch mechanism relying on the #NPF exception. Note: This snippet has been condensed for readability; non-essential lines have been omitted and the code has been refactored for simplicity.
[Figure: NPT hook ETW logs]

Live stress-test of the Split-View NPT mechanism. The logs show the hypervisor successfully intercepting execution attempts of NtDeviceIoControlFile from critical system processes like dwm.exe (Desktop Window Manager) and explorer.exe.

4. Stealthy Syscall Tracing

Traditional syscall hooking (modifying LSTAR MSR or the System Service Descriptor Table) is a guaranteed way to trigger PatchGuard. Horus bypasses this by modifying the hardware’s behaviour rather than the OS’s data structures.

4.1. The EFER.SCE bit Trick

In the x64 architecture, the SYSCALL instruction is only valid if the SCE (System Call Enable) bit in the EFER register is set. Horus clears this bit in the Guest’s state and intercepts reads and writes to EFER via the MSR Bitmap (a condensed sketch of the intercept follows the list):

  • To the Guest: We report that SCE is enabled (1).
  • To the Hardware: We silently force SCE to be disabled (0).
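
A condensed sketch of the EFER intercept illustrates this two-sided lie. Field and constant names here are illustrative; note that under SVM the guest RAX lives in the VMCB save state area, while the other GPRs come from the captured guest context:

// Sketch: handle intercepted RDMSR/WRMSR targeting EFER.
void handle_efer_access( VcpuData* vcpu, GuestRegisters* regs, bool is_write )
{
    if ( is_write )
    {
        // WRMSR: the value arrives in guest EDX:EAX.
        uint64_t value = ( ( regs->rdx & 0xFFFFFFFF ) << 32 ) |
                         ( vcpu->guest_vmcb.save_state_area.rax & 0xFFFFFFFF );

        // The Guest believes it enabled SCE; strip the bit before it
        // reaches the hardware state.
        vcpu->guest_vmcb.save_state_area.efer = value & ~EFER_SCE;
    }
    else
    {
        // RDMSR: report EFER with SCE set, hiding that it is physically clear.
        uint64_t value = vcpu->guest_vmcb.save_state_area.efer | EFER_SCE;

        vcpu->guest_vmcb.save_state_area.rax = value & 0xFFFFFFFF;
        regs->rdx = value >> 32;
    }
}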

4.2. Emulation via #UD

When a process executes SYSCALL:

  1. The CPU throws a #UD (Undefined Opcode) exception because the SCE bit is physically disabled.
  2. Horus intercepts this #UD exception before the Windows IDT sees it.
  3. The handler verifies the opcode is 0F 05 (SYSCALL).
  4. Emulation: The hypervisor manually updates RIP, RFLAGS, CS, SS and RCX, effectively mimicking the SYSCALL instruction in software.
src/memory/syscalls/syscalls.cpp
bool horus::vmm::syscalls::emulate_syscall( VcpuData* vcpu, GuestRegisters* guest_context )
{
    ...
 
    // Check whether we are in long mode or compatibility mode
    // and set RIP to LSTAR or CSTAR accordingly
    if ( vcpu->guest_vmcb.save_state_area.cs_attrib.fields.long_mode == 0 && 
        vcpu->guest_vmcb.save_state_area.cs_attrib.fields.default_bit == 1 ) {
        // 32 bit mode
        vcpu->guest_vmcb.save_state_area.rip = __readmsr( MSR_CSTAR );
    } 
    else if ( vcpu->guest_vmcb.save_state_area.cs_attrib.fields.long_mode == 1) {
        // 64 bit mode
        vcpu->guest_vmcb.save_state_area.rip = __readmsr( MSR_LSTAR );
    }
    else {
        ...
    }
 
    // RCX.q = next_RIP
    guest_context->rcx = guest_rip + instruction_length;
 
    // R11.q = RFLAGS with RF cleared
    guest_context->r11 = vcpu->guest_vmcb.save_state_area.rflags.Flags & ~X86_RFLAGS_RF;
 
    // RFLAGS = RFLAGS AND ~MSR_SFMASK
    // RFLAGS.RF = 0
    uint64_t msr_fmask = __readmsr( MSR_FMASK );
    vcpu->guest_vmcb.save_state_area.rflags.Flags &= ~( msr_fmask | X86_RFLAGS_RF );
 
    // Adjust CS and SS segments
    uint64_t msr_star = __readmsr( MSR_STAR );
 
    // CS.sel = MSR_STAR.SYSCALL_CS AND 0xFFFC
    vcpu->guest_vmcb.save_state_area.cs_selector = ( uint16_t )( ( msr_star >> 32 ) & ~3 );
 
    // CS.base = 0
    vcpu->guest_vmcb.save_state_area.cs_base = 0;
 
    // CS.limit = 0xFFFFFFFF
    vcpu->guest_vmcb.save_state_area.cs_limit = 0xFFFFFFFF;
 
    // CS.attr = 64-bit code,dpl0 
    vcpu->guest_vmcb.save_state_area.cs_attrib.fields.type = 0xB;
    vcpu->guest_vmcb.save_state_area.cs_attrib.fields.system = 1;
    vcpu->guest_vmcb.save_state_area.cs_attrib.fields.dpl = 0;
    vcpu->guest_vmcb.save_state_area.cs_attrib.fields.present = 1;
    vcpu->guest_vmcb.save_state_area.cs_attrib.fields.long_mode = 1;
    vcpu->guest_vmcb.save_state_area.cs_attrib.fields.default_bit = 0;
    vcpu->guest_vmcb.save_state_area.cs_attrib.fields.granularity = 1;
 
    // SS.sel = MSR_STAR.SYSCALL_CS + 8
    vcpu->guest_vmcb.save_state_area.ss_selector = ( uint16_t )( ( ( msr_star >> 32 ) & ~3 ) + 8 );
 
    // SS.base = 0
    vcpu->guest_vmcb.save_state_area.ss_base = 0;
 
    // SS.limit = 0xFFFFFFFF
    vcpu->guest_vmcb.save_state_area.ss_limit = 0xFFFFFFFF;
 
    // SS.attr = 64-bit stack,dpl0
    vcpu->guest_vmcb.save_state_area.ss_attrib.fields.type = 3;
    vcpu->guest_vmcb.save_state_area.ss_attrib.fields.system = 1;
    vcpu->guest_vmcb.save_state_area.ss_attrib.fields.dpl = 0;
    vcpu->guest_vmcb.save_state_area.ss_attrib.fields.present = 1;
    vcpu->guest_vmcb.save_state_area.ss_attrib.fields.default_bit = 1;
    vcpu->guest_vmcb.save_state_area.ss_attrib.fields.granularity = 1;
 
    // CPL = 0
    vcpu->guest_vmcb.save_state_area.cpl = 0;
 
    return true;
}
Syscall emulation core logic. Note: This snippet has been condensed for readability; non-essential lines have been omitted and the code has been refactored for simplicity.
  5. Tracing: Before resuming, we can inspect the registers (syscall index in RAX, arguments in RCX, RDX, etc.) to log the activity of specific processes.

Horus also handles the SYSRET instruction similarly, ensuring the return path to User Mode is equally controlled and stable.

[Figure: Syscall tracing ETW logs]

Real-time tracing of notepad.exe. The hypervisor intercepts the syscalls via #UD injection, resolves the origin process using its CR3 register, and logs the event.

4.3. The Cost of Transparency: Performance

While the #UD injection technique offers superior stealth compared to MSR hooking, it introduces significant overhead. A native SYSCALL instruction executes in mere nanoseconds. However, forcing a #VMEXIT and handling the subsequent exception in the hypervisor incurs a cost of thousands of CPU cycles per call. This makes this technique ideal for surgical tracing of specific processes (filtered by CR3), but potentially unsuitable for system-wide monitoring on production workloads.

5. Stealth & Communication

5.1. CPUID Virtualization

The CPUID instruction is the most common method for software to detect a hypervisor. To avoid detection by malware checking for “Hypervisor Present” bits (a condensed handler sketch follows the list):

  • Horus intercepts every CPUID execution.
  • It clears the Hypervisor Present Bit (Leaf 0x1, ECX bit 31).
  • It can return a custom Vendor String (acting as a “secret handshake”) to authorized clients while appearing as a standard AMD processor to the OS.
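
A condensed handler sketch might look like this. Structure and field names are illustrative; advancing the guest RIP past the 2-byte CPUID opcode is omitted for brevity:

// Sketch: run the real CPUID on the Host and filter the result.
void handle_cpuid( VcpuData* vcpu, GuestRegisters* regs )
{
    int leaf    = ( int )vcpu->guest_vmcb.save_state_area.rax;
    int subleaf = ( int )regs->rcx;
    int out[ 4 ] = { 0 };

    __cpuidex( out, leaf, subleaf );

    // Leaf 0x1: clear ECX bit 31 so the "Hypervisor Present" bit reads as zero.
    if ( leaf == 1 )
        out[ 2 ] &= ~( 1u << 31 );

    vcpu->guest_vmcb.save_state_area.rax = ( uint32_t )out[ 0 ];
    regs->rbx = ( uint32_t )out[ 1 ];
    regs->rcx = ( uint32_t )out[ 2 ];
    regs->rdx = ( uint32_t )out[ 3 ];
}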

5.2. VMMCALL Interface

Communication between the standard kernel driver (or a user-mode client) and the Hypervisor is handled via VMMCALL. This instruction triggers an immediate exit, allowing us to issue commands such as:

  • PING: Verify hypervisor presence.
  • PLACE_NPT_HOOK: Install a hook on a specific physical address.
  • MONITOR_SYSCALL: Target a specific process (identified by CR3) for logging.

To invoke these commands from the Guest, we rely on a custom assembly wrapper that executes the vmmcall instruction. This wrapper acts as a bridge, passing arguments directly to the Hypervisor via general-purpose registers (RCX, RDX, R8), bypassing standard Windows IOCTL mechanisms entirely.

src/handlers/vmcall/vmmcall.asm
svm_vmmcall proc frame
    .endprolog
    PUSHAQ          ; Save all GPRs to the stack
                    ; This prevents the corruption of certain guest registers used during vm unload

    vmmcall         ; Trigger #VMEXIT into the Host

    POPAQ           ; Restore GPRs after returning from the Host
    ret             ; Return to the caller
svm_vmmcall endp
VMMCALL wrapper. Note: This snippet has been condensed for readability; non-essential lines have been omitted and the code has been refactored for simplicity.
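
On the guest side, issuing a command then reduces to an ordinary function call: the Windows x64 ABI places the first three integer arguments in RCX, RDX, and R8, exactly where the Host handler reads them. The prototype and command code below are illustrative, not the project’s actual constants:

// Illustrative guest-side prototype for the assembly wrapper above.
extern "C" void svm_vmmcall( uint64_t command, uint64_t arg1, uint64_t arg2 );

// Hypothetical command code matching the MONITOR_SYSCALL entry above.
constexpr uint64_t VMMCALL_MONITOR_SYSCALL = 0x3;

// Ask the hypervisor to trace the process that owns the given CR3 value.
void monitor_process( uint64_t target_cr3 )
{
    svm_vmmcall( VMMCALL_MONITOR_SYSCALL, target_cr3, 0 );
}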

6. Stability and Clean-up

Writing a hypervisor requires handling system stability with extreme care.

  • Atomic Unload: The cleanup routine uses Inter-Processor Interrupts (IPIs) to synchronize all cores. A critical, often overlooked step is the manual restoration of the Host’s GDTR, IDTR, and segment registers (TR, FS, GS) to match the Guest’s state before SVM is disabled by clearing EFER.SVME. Without this synchronization, the driver would return to Ring 0 with a corrupted processor context, creating an unstable “Zombie Driver” that inevitably crashes the system. Once the state is consistent, Horus tears down the NPTs and frees all allocations, ensuring a zero-leak exit.
  • Exception Injection: Horus correctly reflects exceptions back to the Guest. If an exception occurs that isn’t related to our virtualization logic, it is injected back into the Guest state (eventually reaching the Windows IDT). This ensures the OS can handle standard faults, such as genuine Page Faults, normally (see the call-site sketch after the snippet below).
src/arch/vcpu/vcpu.cpp
void VcpuData::inject_exception( int vector, bool deliver_error_code, uint32_t error_code )
{
    EVENT_INJECTION event_injection = { 0 };
 
    event_injection.vector = vector;
    event_injection.type = SVM_EVENT_TYPE_EXCEPTION;
    event_injection.valid = 1;
 
    event_injection.push_error_code = deliver_error_code ? 1 : 0;
    event_injection.error_code = deliver_error_code ? error_code : 0;
 
    this->guest_vmcb.control_area.event_inject = event_injection.fields;
}
Exception injection routine used to reflect faults back into the Guest.
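
For example, a hypothetical call site inside the #UD intercept could reflect anything that is not a SYSCALL back to the Guest. Here is_syscall_opcode is an illustrative helper that reads the two bytes at the guest RIP and compares them against 0F 05:

// Sketch: reflect a genuine #UD back to the Guest instead of emulating it.
void handle_ud_intercept( VcpuData* vcpu, GuestRegisters* guest_context )
{
    uint64_t guest_rip = vcpu->guest_vmcb.save_state_area.rip;

    if ( !is_syscall_opcode( vcpu, guest_rip ) )
    {
        // Vector 6 = #UD; undefined-opcode faults do not push an error code.
        vcpu->inject_exception( 6, false, 0 );
        return;
    }

    horus::vmm::syscalls::emulate_syscall( vcpu, guest_context );
}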

7. Conclusion

Developing Horus has been a rigorous exercise in low-level system engineering. It demonstrates a fundamental truth: at the silicon level, the Operating System is just another application.

By moving the battleground to Ring -1, we achieve a level of introspection far superior to traditional kernel drivers. The combination of Split-View NPT Hooking and Exception-based Syscall Emulation allows us to manipulate execution flow and monitor system activity without triggering Windows’ integrity protection mechanisms like PatchGuard.

While this project is strictly an academic Proof of Concept, the foundations laid here—handling the VMCB, managing physical memory translation, and virtualizing hardware exceptions—are the exact mechanisms employed by both advanced enterprise security solutions and sophisticated state-sponsored rootkits.


Acknowledgments & References

The full source code for Horus, including the hypervisor driver and the guest-driver controller, is available on GitHub. I invite fellow researchers to review, fork, and contribute to the project.

This project stands on the shoulders of giants. I would like to credit the foundational research by Daax and Nick Peterson. Although their work focuses on Intel VT-x, it provided the conceptual framework and deep insights into hardware-assisted virtualization that I adapted to build this AMD SVM implementation.

In addition, I would like to thank all the researchers and developers who have contributed to the open-source community, with enlightening projects such as AetherVisor and SimpleSVM. Their work was instrumental in helping me translate documentation into practical knowledge.

— Redaa.