A minimal VMM in Rust with KVM

Apr 15, 2026

Virtual Machine Monitor

If you've ever wondered how tools like Cloud Hypervisor or Firecracker work under the hood, building a minimal VMM is one of the best ways to learn. Let's write one from scratch.

KVM (Kernel-based Virtual Machine) is a Linux kernel module that turns the host into a hypervisor.

We will use the KVM API to build software that runs a Virtual Machine. The industry uses the name VMM or Virtual Machine Monitor for such software — like Cloud Hypervisor or Firecracker.

System Calls

KVM's API is a set of ioctl system calls. Combined with mmap for managing VM memory, these two system calls are all we need to build a VMM.

In Rust we can call these system calls using the libc crate, which provides raw FFI bindings to system libraries. Naturally these are unsafe methods — we will need to ensure memory safety, such as by passing or reading data via valid pointers. Because "everything is a file" in Linux, we'll also need to ensure that we close files that are opened by these system calls.

KVM Version

The first thing we want to check is whether the system has the stable KVM API version, the constant value 12; otherwise the application should not run:

use std::{
    error::Error,
    fs::OpenOptions,
    os::{fd::AsRawFd, unix::fs::OpenOptionsExt},
};

use std::os::raw::{c_uint, c_ulong};

const KVM_VERSION: i32 = 12;
const KVMIO: c_uint = 0xAE;
const KVM_GET_API_VERSION: c_ulong = libc::_IO(KVMIO, 0x00);

fn main() -> Result<(), Box<dyn Error>> {
    let file = OpenOptions::new()
        .write(true)
        .custom_flags(libc::O_RDWR | libc::O_CLOEXEC)
        .open("/dev/kvm")?;

    let kvm_fd = file.as_raw_fd();

    //
// 1. Check KVM version; if not 12, refuse to run.
    //

let kvm_version = unsafe { libc::ioctl(kvm_fd, KVM_GET_API_VERSION, 0) };

if kvm_version < 0 {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("error getting kvm version");
    return Err(last_os_error.into());
}

println!("kvm version {kvm_version}");

    if kvm_version != KVM_VERSION {
        eprintln!("current kvm version: {kvm_version}, required kvm version: {KVM_VERSION}");
        return Err("kvm version not supported".into());
    }

    Ok(())
}

Let's dissect this. First we open the /dev/kvm file for reading and writing and get its file descriptor as an i32, which we need to pass to ioctl.

For ioctl, the first argument is the file descriptor, the second is the operation code, and the third is a pointer to memory — its meaning depends on whether the operation reads or writes data.

KVM operation codes can be created with the same macros the kernel itself uses in its ioctl.h header. Fortunately libc exposes these too, as the const functions _IO, _IOW (if the operation writes data), and _IOR (if the operation reads data).
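To get a feel for what these const functions compute, the encoding can be reproduced by hand. This is a standalone sketch assuming the standard asm-generic layout used on x86 Linux: the request number packs a direction, a payload size, a type byte, and a sequence number.

```rust
// Linux ioctl request layout (asm-generic): dir(2 bits) | size(14) | type(8) | nr(8)
const fn ioc(dir: u64, ty: u64, nr: u64, size: u64) -> u64 {
    (dir << 30) | (size << 16) | (ty << 8) | nr
}

const KVMIO: u64 = 0xAE;

// _IO: no payload, direction = none, so only the type and sequence bytes remain
const KVM_GET_API_VERSION: u64 = ioc(0, KVMIO, 0x00, 0);
const KVM_CREATE_VM: u64 = ioc(0, KVMIO, 0x01, 0);

fn main() {
    assert_eq!(KVM_GET_API_VERSION, 0xAE00);
    assert_eq!(KVM_CREATE_VM, 0xAE01);
    println!("KVM_GET_API_VERSION = {KVM_GET_API_VERSION:#x}");
}
```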

First we check whether the ioctl call itself succeeded, by validating that its return code is not negative. But that only tells us about success or failure. To get the actual error we need errno, which Rust exposes through the std::io::Error::last_os_error() method.

Create a VM

Note: For brevity, from here onwards we only show new code. See the full source for the complete program.

use std::os::fd::FromRawFd;

const KVM_CREATE_VM: c_ulong = libc::_IO(KVMIO, 0x01);


//
// 2. Create A VM
//

let vm_fd = unsafe { libc::ioctl(kvm_fd, KVM_CREATE_VM, 0) };

if vm_fd < 0 {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("vm creation error");
    return Err(last_os_error.into());
}

// Own it so that fd is closed on drop
let vm_fd = unsafe { std::os::fd::OwnedFd::from_raw_fd(vm_fd) };

Here the KVM_CREATE_VM API returns a new file descriptor to manage this VM. This vm_fd is just an integer, like kvm_fd, but there is a major difference in how the two are created: for kvm_fd we used Rust's standard library to open a file and assigned it to the file variable, which ensures that when it goes out of scope it gets Drop-ed and the file is closed.

To achieve the same thing for vm_fd integer, we construct OwnedFd and shadow it by a variable of the same name. This way we take "ownership" of the VM file descriptor and when it goes out of scope, the Drop on OwnedFd will ensure the underlying file is closed.
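The pattern is easy to observe outside KVM. Here is a standalone sketch using /dev/null: because POSIX hands out the lowest free descriptor number, a fresh open right after the drop reuses the same number, which shows that the OwnedFd really closed it.

```rust
use std::fs::File;
use std::os::fd::{AsRawFd, FromRawFd, IntoRawFd, OwnedFd};

// Returns true if, after OwnedFd's Drop closes `raw`, a fresh open of `path`
// reuses the same descriptor number (POSIX allocates the lowest free fd).
fn owned_fd_closes_on_drop(path: &str) -> std::io::Result<bool> {
    let raw = File::open(path)?.into_raw_fd(); // into_raw_fd releases ownership; fd stays open
    {
        let _owned = unsafe { OwnedFd::from_raw_fd(raw) }; // take ownership back
    } // _owned dropped here: the fd is closed
    let reopened = File::open(path)?;
    Ok(reopened.as_raw_fd() == raw)
}

fn main() {
    assert!(owned_fd_closes_on_drop("/dev/null").unwrap());
    println!("OwnedFd closed the descriptor on drop");
}
```

The safety contract of from_raw_fd is exactly what the VMM relies on: we assert that nothing else owns the descriptor, so OwnedFd is free to close it.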

Create a VCPU


const KVM_CREATE_VCPU: c_ulong = libc::_IO(KVMIO, 0x41);

//
// 3. Create A VCPU
//

let vcpu_fd = unsafe { libc::ioctl(vm_fd.as_raw_fd(), KVM_CREATE_VCPU, 0) };

if vcpu_fd < 0 {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("vcpu creation error");
    return Err(last_os_error.into());
}
// Own it so that fd is closed on drop
let vcpu_fd = unsafe { std::os::fd::OwnedFd::from_raw_fd(vcpu_fd) };

println!(
    "kvm fd {kvm_fd}, vm fd: {}, vcpu fd: {}",
    vm_fd.as_raw_fd(),
    vcpu_fd.as_raw_fd()
);

To manage this VM, we use the vm_fd file descriptor to create a VCPU for it. Just like before, we get a new file descriptor, vcpu_fd, for a new file, so we take ownership to ensure it's closed when vcpu_fd goes out of scope.

Create Memory for VM

We need to create memory that is virtual to the VMM but is seen as physical memory by the VM.

We create a memory mapping in the virtual address space of the VMM with these characteristics: we want to read (PROT_READ) and write (PROT_WRITE) to this memory, it should not be visible to other processes (MAP_PRIVATE), and its contents need to be initialized to zero (MAP_ANONYMOUS).

const VM_MEMORY: u64 = 2 * 4096; // 2 blocks of 4KiB

struct Mmap {
    pub ptr: *mut std::os::raw::c_void,
    pub len: usize,
}

impl Drop for Mmap {
    fn drop(&mut self) {
        println!("calling libc::munmap");
        unsafe { libc::munmap(self.ptr, self.len) };
    }
}

//
// 4. Create memory for guest
//

let vm_memory_mmap = unsafe {
    libc::mmap(
        std::ptr::null_mut(),
        VM_MEMORY as usize,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_ANONYMOUS | libc::MAP_PRIVATE,
        -1,
        0,
    )
};

if vm_memory_mmap == libc::MAP_FAILED {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("vm memory map failed");
    return Err(last_os_error.into());
}

// take ownership, so on drop munmap is called
let vm_memory_mmap = Mmap {
    ptr: vm_memory_mmap,
    len: VM_MEMORY as usize,
};

mmap returns a pointer to the mapped area: the virtual address of the beginning of that memory. When our VMM program is done using this memory, we need to delete the mapping using the munmap system call.

For that we store the pointer and the size of the memory in the Mmap struct and implement Drop for it, so when vm_memory_mmap goes out of scope it automatically calls munmap.

Code that VM will execute

const CODE: [u8; 1] = [0xF4]; // HLT

//
// 5. Copy code to guest's physical memory - that guest will execute
//

unsafe {
    std::ptr::copy_nonoverlapping(CODE.as_ptr(), vm_memory_mmap.ptr as *mut u8, CODE.len());
}

Here we have some CODE, one byte in size, which we copy into the guest's memory. CODE is a single x86 instruction, HLT.
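Nothing stops CODE from being longer. As a sketch, a few more hand-assembled real-mode bytes could write a byte to an I/O port before halting (the port number 0x10 is an arbitrary choice for illustration):

```rust
// mov al, 0x61 ; out 0x10, al ; hlt
const CODE: [u8; 5] = [
    0xB0, 0x61, // mov al, 0x61
    0xE6, 0x10, // out 0x10, al  -> would surface as a KVM_EXIT_IO in the VMM
    0xF4,       // hlt           -> surfaces as KVM_EXIT_HLT
];

fn main() {
    assert_eq!(CODE.len(), 5);
    assert_eq!(CODE[CODE.len() - 1], 0xF4); // still ends in HLT
}
```

If you do extend CODE, make sure the count passed to copy_nonoverlapping is CODE.len(), and that the VMM loop handles the extra exit reason before resuming.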

Setup Guest Memory

Now that our VMM has memory for the guest, we provide it to the KVM API to set it up as the guest's physical memory:


const KVM_SET_USER_MEMORY_REGION: c_ulong = libc::_IOW::<kvm_userspace_memory_region>(KVMIO, 0x46);

//
// 6. Setup guest's physical memory
//

let vm_memory_addr = vm_memory_mmap.ptr as u64;

let userspace_memory_region = kvm_userspace_memory_region {
    slot: 0,
    flags: 0,
    guest_phys_addr: 4096,
    memory_size: VM_MEMORY,
    userspace_addr: vm_memory_addr,
};

let ret = unsafe {
    libc::ioctl(
        vm_fd.as_raw_fd(),
        KVM_SET_USER_MEMORY_REGION,
        &userspace_memory_region,
    )
};

if ret != 0 {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("error setting user memory region");
    return Err(last_os_error.into());
}

The interesting thing here is that we place this memory at guest physical address 4096, the first byte after the first 4 KiB block of the guest's physical address space. The CODE we copied is therefore seen by the VM at address 4096.
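One way to picture the slot: an access to guest physical address gpa lands at byte offset gpa - guest_phys_addr inside our host mapping. Here is a sketch of that translation for a single slot (a real VMM would resolve an address across many slots):

```rust
// Translate a guest physical address to an offset into the host mapping
// for one memory slot (here: guest_phys_addr = 4096, memory_size = 8192).
fn gpa_to_offset(gpa: u64, guest_phys_addr: u64, memory_size: u64) -> Option<u64> {
    if gpa >= guest_phys_addr && gpa < guest_phys_addr + memory_size {
        Some(gpa - guest_phys_addr)
    } else {
        None // not backed by this slot
    }
}

fn main() {
    assert_eq!(gpa_to_offset(4096, 4096, 8192), Some(0)); // first byte of CODE
    assert_eq!(gpa_to_offset(4095, 4096, 8192), None);    // below the slot
    assert_eq!(gpa_to_offset(4096 + 8192, 4096, 8192), None); // past the end
}
```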

Setup x86 CPU registers

CPU registers are what keep track of what a CPU is doing at any given time, and they are architecture-specific (x86, ARM, etc.).

We want to setup x86 registers such that the first thing our VM executes is our CODE.

The instruction pointer register in the CPU stores the address of the next instruction to execute, and because we put our first and only instruction at 4096 — we set the rip instruction pointer to it:


const KVM_SET_REGS: c_ulong = libc::_IOW::<kvm_regs>(KVMIO, 0x82);
const KVM_GET_SREGS: c_ulong = libc::_IOR::<kvm_sregs>(KVMIO, 0x83);
const KVM_SET_SREGS: c_ulong = libc::_IOW::<kvm_sregs>(KVMIO, 0x84);

//
// 7.1 Setup regular x86 cpu registers.
//   Set the instruction pointer to start execution at the 2nd block of size 4096, because that's where we copied CODE
//

let k_regs = kvm_regs {
    rip: 4096,
    rflags: 0x2,
    ..Default::default()
};

let ret = unsafe { libc::ioctl(vcpu_fd.as_raw_fd(), KVM_SET_REGS, &k_regs) };

if ret != 0 {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("error setting kvm_regs");
    return Err(last_os_error.into());
}

//
// 7.2 Read default x86 special registers, and update them
//

let mut k_sregs = kvm_sregs::default();
let ret = unsafe { libc::ioctl(vcpu_fd.as_raw_fd(), KVM_GET_SREGS, &mut k_sregs) };

if ret != 0 {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("error getting kvm_sregs");
    return Err(last_os_error.into());
}

k_sregs.cs.base = 0;
k_sregs.cs.selector = 0;

let ret = unsafe { libc::ioctl(vcpu_fd.as_raw_fd(), KVM_SET_SREGS, &k_sregs) };

if ret != 0 {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("error setting kvm_sregs");
    return Err(last_os_error.into());
}
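With cs.base = 0 and rip = 4096, the very first instruction fetch resolves to guest physical address 4096, where CODE lives. A simplified sketch of that calculation (segmentation and paging details are glossed over):

```rust
// In this flat setup the fetch address is simply the segment base plus rip.
fn fetch_addr(cs_base: u64, rip: u64) -> u64 {
    cs_base + rip
}

fn main() {
    // cs.base = 0, rip = 4096 -> guest physical 4096, where we copied CODE
    assert_eq!(fetch_addr(0, 4096), 4096);
}
```

This is why we zero cs.base and cs.selector: KVM's defaults would otherwise offset the fetch away from where we loaded the code.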

For more about these registers, see the LWN article Using the KVM API.

KVM_RUN

Every VCPU has an associated kvm_run data structure that the kernel uses to communicate with our VMM in userspace. When we run our VCPU, the VM can exit at any time for various reasons (such as I/O). The reason is available in kvm_run.exit_reason, which our VMM can handle (e.g., perform the requested I/O) before resuming the VM.

KVM provides an API to get the size of kvm_run data. We can then map memory of that size in our VMM, backed by the vcpu_fd VCPU file.


const KVM_GET_VCPU_MMAP_SIZE: c_ulong = libc::_IO(KVMIO, 0x04);

//
// 8.1. Get the size of kvm_run
//

let vcpu_mmap_size = unsafe { libc::ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0) };

if vcpu_mmap_size < 0 {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("error getting vcpu mmap size: {vcpu_mmap_size}");
    return Err(last_os_error.into());
}

println!("vcpu mmap size: {vcpu_mmap_size} bytes");

//
// 8.2 memory map the pointer to kvm_run data structure
//

let kvm_run_mmap = unsafe {
    libc::mmap(
        std::ptr::null_mut(),
        vcpu_mmap_size as usize,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_SHARED,
        vcpu_fd.as_raw_fd(),
        0,
    )
};

if kvm_run_mmap == libc::MAP_FAILED {
    let last_os_error = std::io::Error::last_os_error();
    eprintln!("kvm_run mmap failed");
    return Err(last_os_error.into());
}

// take ownership, so on drop munmap is called
let kvm_run_mmap = Mmap {
    ptr: kvm_run_mmap,
    len: vcpu_mmap_size as usize,
};

The mapping is MAP_SHARED so the kernel's updates to kvm_run, carried through the underlying VCPU file, are visible to the VMM.
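For a sense of what lives in that shared area: per include/uapi/linux/kvm.h, kvm_run starts with request_interrupt_window (u8), immediate_exit (u8), and six padding bytes, which puts exit_reason (u32) at byte offset 8. A sketch that parses it from raw bytes, without going through kvm-bindings:

```rust
const KVM_EXIT_HLT: u32 = 5; // value from include/uapi/linux/kvm.h

// Read exit_reason out of the first bytes of a kvm_run-shaped buffer.
fn exit_reason(kvm_run_bytes: &[u8]) -> u32 {
    // u8 + u8 + [u8; 6] padding places exit_reason at offset 8 (little-endian on x86)
    u32::from_le_bytes(kvm_run_bytes[8..12].try_into().unwrap())
}

fn main() {
    let mut buf = [0u8; 16];
    buf[8..12].copy_from_slice(&KVM_EXIT_HLT.to_le_bytes());
    assert_eq!(exit_reason(&buf), KVM_EXIT_HLT);
}
```

In the real VMM we let kvm-bindings describe the full struct instead of hand-computing offsets; this is only to demystify what the mapped bytes contain.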

Run the VM

All that's left now is to run the VCPU and resolve any exits. Because we programmed our CODE to halt on the first instruction, our VMM loop will receive that exit reason and stop:


const KVM_RUN: c_ulong = libc::_IO(KVMIO, 0x80);

//
// 9. Run VM until it executes hlt instruction in CODE
//
loop {
    let ret = unsafe { libc::ioctl(vcpu_fd.as_raw_fd(), KVM_RUN, 0) };

    if ret < 0 {
        let last_os_error = std::io::Error::last_os_error();
        eprintln!("KVM_RUN errored");
        return Err(last_os_error.into());
    }

    let k_run: &kvm_run = unsafe { &*(kvm_run_mmap.ptr as *const kvm_run) };

    match k_run.exit_reason {
        kvm_bindings::KVM_EXIT_HLT => {
        println!("KVM_EXIT_HLT");
            return Ok(());
        }
        _ => {
            eprintln!("EXIT: {:?}", k_run);
        }
    }
}

Let's dissect the conversion of a raw pointer from C to a Rust reference for kvm_run:

Because we know the underlying pointer points to a valid memory layout containing kvm_run, we perform the following operations on it:

  1. We cast it from a C type to a pointer type in Rust — because we only read from it, we cast it as kvm_run_mmap.ptr as *const kvm_run
  2. To convert this Rust pointer to a reference we use &*(...): * dereferences the pointer and then & immediately borrows it.
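The same two steps work for any #[repr(C)] struct behind a raw void pointer. Here is a standalone sketch with a made-up Header type standing in for kvm_run:

```rust
use std::ffi::c_void;

#[repr(C)]
struct Header {
    tag: u32,
}

// 1. cast the C pointer to a typed Rust pointer, 2. deref with * and reborrow with &
fn read_tag(raw: *const c_void) -> u32 {
    let header: &Header = unsafe { &*(raw as *const Header) };
    header.tag
}

fn main() {
    // Pretend this pointer came from mmap or an FFI call.
    let raw: *mut c_void = Box::into_raw(Box::new(Header { tag: 7 })) as *mut c_void;
    assert_eq!(read_tag(raw), 7);
    // Reclaim the allocation (the mmap'd kvm_run area is instead released by munmap on Drop).
    unsafe { drop(Box::from_raw(raw as *mut Header)) };
}
```

The unsafe block is where we vouch for the things the compiler can't check: the pointer is non-null, aligned, and points at a live, properly laid out value for as long as the reference is used.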

Congrats, we just ran our first VM!

Closing Thoughts

This VMM is written using as few dependencies as possible — so we can stay as close to the kernel as possible and actually see what's going on.

Those minimal dependencies are libc for system calls and kvm-bindings for architecture-specific data structures like kvm_regs, kvm_run, etc. Since the kvm-bindings crate is generated by bindgen using actual kernel code, that's as close as we can get.

From here, you could extend this VMM to handle I/O port exits, run real-mode code, or even load a Linux kernel image.

Full source code is available here: 64bit/miniHype
