FEATURE: add paravirtualized clock support for guest time access #1173

Open
simongdavies wants to merge 1 commit into hyperlight-dev:main from simongdavies:guest-time

Conversation

@simongdavies
Contributor

Hyperlight guests can now read time without expensive VM exits by using a paravirtualized clock shared between host and guest. This enables high-frequency timing operations like benchmarking, rate limiting, and timestamping with minimal overhead.

Paravirtualized clocks work by having the hypervisor populate a shared memory page with clock calibration data. The guest reads this data along with the CPU's TSC to compute the current time entirely in userspace, avoiding the cost of a VM exit.

Reference: https://docs.kernel.org/virt/kvm/x86/msr.html#pvclock
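For readers unfamiliar with pvclock, the guest-side arithmetic can be sketched roughly as follows. This follows the formula given in the KVM documentation linked above and is not the code from this PR: the struct name, function name, and the decision to skip the version check are assumptions made for illustration.

```rust
/// Field layout of `pvclock_vcpu_time_info` as described in the KVM docs.
/// The Rust type name is a placeholder, not the one used in this PR.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct PvclockVcpuTimeInfo {
    pub version: u32,
    pub pad0: u32,
    pub tsc_timestamp: u64,
    pub system_time: u64,
    pub tsc_to_system_mul: u32,
    pub tsc_shift: i8,
    pub flags: u8,
    pub pad: [u8; 2],
}

/// Converts a raw TSC reading into nanoseconds using one snapshot of the
/// calibration data. A real reader would also re-check `version` around the
/// read; that step is omitted here for brevity.
pub fn pvclock_nanos(info: &PvclockVcpuTimeInfo, tsc: u64) -> u64 {
    // Ticks elapsed since the hypervisor last updated the page.
    let delta = tsc.wrapping_sub(info.tsc_timestamp);

    // Apply the power-of-two pre-scale; a negative shift means shift right.
    let shifted = if info.tsc_shift >= 0 {
        delta.checked_shl(info.tsc_shift as u32).unwrap_or(0)
    } else {
        delta
            .checked_shr(u32::from(info.tsc_shift.unsigned_abs()))
            .unwrap_or(0)
    };

    // 32.32 fixed-point multiply, done in 128 bits to avoid overflow.
    let scaled = ((u128::from(shifted) * u128::from(info.tsc_to_system_mul)) >> 32) as u64;

    info.system_time.wrapping_add(scaled)
}
```

With `tsc_to_system_mul = 1 << 31` (0.5 in 32.32 fixed point) and `tsc_shift = 1`, each TSC tick counts as one nanosecond, so a reading 100 ticks past `tsc_timestamp` yields `system_time + 100`.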

The implementation uses the native mechanism for each hypervisor:

  • KVM: pvclock (MSR 0x4b564d01)
  • MSHV: Hyper-V Reference TSC page
  • WHP: Hyper-V Reference TSC page

Guests have access to:

  • Monotonic time: nanoseconds since sandbox creation, guaranteed to never go backwards
  • Wall-clock time: UTC nanoseconds since Unix epoch
  • Local time: wall-clock adjusted for host timezone captured at sandbox creation

Rust API (hyperlight_guest_bin::time):

  • SystemTime/Instant types mirroring std::time
  • DateTime type for human-readable date/time formatting
  • Weekday/Month enums with name() and short_name() methods

C API (hyperlight_guest_capi):

  • POSIX-compatible: clock_gettime, gettimeofday, time
  • Broken-down time: gmtime_r, localtime_r, mktime, timegm
  • Formatting: strftime with common format specifiers

The feature is gated behind guest_time (enabled by default) and documented in docs/guest-time.md.

Note: The timezone offset is a snapshot from sandbox creation and does not update for DST transitions during the sandbox lifetime.
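To make the snapshot behaviour concrete, here is a minimal sketch of how local time could be derived from wall-clock time and a fixed offset. The function name and the choice of seconds as the offset unit are assumptions for illustration; the PR's actual representation may differ.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;

/// Applies a fixed UTC offset (captured once, e.g. at sandbox creation) to a
/// UTC timestamp in nanoseconds since the Unix epoch. Returns `None` if the
/// result would fall outside the range of `u64`.
pub fn local_nanos(utc_nanos: u64, utc_offset_secs: i32) -> Option<u64> {
    let offset_nanos = i64::from(utc_offset_secs).checked_mul(NANOS_PER_SEC)?;
    utc_nanos.checked_add_signed(offset_nanos)
}
```

Because the offset is an input that never changes after capture, a DST transition on the host has no effect on the result until a new sandbox is created.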

@simongdavies simongdavies requested a review from danbugs as a code owner January 15, 2026 21:50
@simongdavies simongdavies added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Jan 15, 2026
Signed-off-by: Simon Davies <simongdavies@users.noreply.github.com>
Contributor

@ludfjig ludfjig left a comment

First round of reviews, looks very good to me! Question: maybe we can split out all the time-related math + formatting into a separate crate?

I haven't looked at everything in detail yet.

Comment on lines +167 to +205
#[cfg(test)]
mod tests {
    use core::mem::size_of;

    use super::*;

    #[test]
    fn test_kvm_pvclock_size() {
        // KVM pvclock struct must be exactly 32 bytes
        assert_eq!(size_of::<KvmPvclockVcpuTimeInfo>(), 32);
    }

    #[test]
    fn test_hv_reference_tsc_size() {
        // Hyper-V reference TSC page must be exactly 4KB
        assert_eq!(size_of::<HvReferenceTscPage>(), 4096);
    }

    #[test]
    fn test_guest_clock_region_size() {
        // GuestClockRegion should be 32 bytes (4 x u64 equivalent: 3 x u64 + i32 + u32)
        assert_eq!(size_of::<GuestClockRegion>(), 32);
    }

    #[test]
    fn test_clock_type_conversion() {
        assert_eq!(ClockType::from(0u64), ClockType::None);
        assert_eq!(ClockType::from(1u64), ClockType::KvmPvclock);
        assert_eq!(ClockType::from(2u64), ClockType::HyperVReferenceTsc);
        assert_eq!(ClockType::from(99u64), ClockType::None);
    }

    #[test]
    fn test_guest_clock_region_default() {
        let region = GuestClockRegion::default();
        assert!(!region.is_available());
        assert_eq!(region.get_clock_type(), ClockType::None);
    }
}

I think most of these can be compile-time (const) assertions instead of tests.
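A minimal sketch of what that could look like for the size checks. The struct below is a hypothetical stand-in matching the "3 x u64 + i32 + u32" description in the test comment; only `clock_page_ptr` and `clock_type` appear in the diff, the other field names are invented for illustration.

```rust
use core::mem::size_of;

/// Hypothetical stand-in for the shared clock region (not the PR's type).
#[repr(C)]
pub struct GuestClockRegionSketch {
    pub clock_page_ptr: u64,
    pub clock_type: u64,
    pub boot_time_ns: u64,    // assumed field
    pub utc_offset_secs: i32, // assumed field
    pub reserved: u32,        // assumed padding field
}

// Evaluated at compile time: the crate fails to build if the layout drifts,
// so no test run is needed to catch a size regression.
const _: () = assert!(size_of::<GuestClockRegionSketch>() == 32);
```

The `ClockType::from` and `Default` checks would probably have to stay as ordinary tests, since trait methods such as `From::from` cannot be called in a const context on stable Rust.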

Comment on lines +62 to +83
#[inline]
fn rdtsc() -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        let lo: u32;
        let hi: u32;
        // SAFETY: RDTSC is always available on x86_64
        unsafe {
            core::arch::asm!(
                "rdtsc",
                out("eax") lo,
                out("edx") hi,
                options(nostack, nomem, preserves_flags)
            );
        }
        ((hi as u64) << 32) | (lo as u64)
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        0 // TSC not available on non-x86_64 architectures
    }
}

I think you can replace this with https://doc.rust-lang.org/core/arch/x86/fn._rdtsc.html
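Something along these lines, as a sketch. The function name `read_tsc` is a placeholder, and the non-x86_64 fallback simply mirrors the one in the diff.

```rust
/// Reads the time-stamp counter via the `core::arch` intrinsic instead of
/// hand-written inline assembly.
#[inline]
#[allow(unused_unsafe)] // tolerate toolchains where the intrinsic is a safe fn
pub fn read_tsc() -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        // SAFETY: RDTSC is part of the x86_64 baseline instruction set.
        unsafe { core::arch::x86_64::_rdtsc() }
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        0 // TSC not available on non-x86_64 architectures
    }
}
```

The intrinsic already returns the combined 64-bit value, so the manual `hi`/`lo` reassembly goes away.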

Comment on lines +184 to +187
    Err(crate::new_error!(
        "Paravirtualized clock setup not implemented for this hypervisor",
    ))
}

nit: should this default implementation be removed?


/// Returns true if a clock is configured.
pub fn is_available(&self) -> bool {
    self.clock_page_ptr != 0 && self.clock_type != ClockType::None as u64

It's my understanding that this is always available, so do we really need is_available and is_clock_available? Correct me if I am wrong.

@syntactically
Member

Do we definitely want to enable this by default? I think a lot of guests will not need high precision time in production, and providing it by default is a significant semantic constraint: e.g. it makes snapshotting observable, makes timing side-channel attacks easier to internalise, etc.

When we first enabled rdtsc for the performance traces, I believe we explicitly had a discussion across all the maintainers about this and said we very much did not want to enable any extra time sources (and especially not wall-clock/referenced to an epoch ones) anywhere.

let tsc_scale = tsc_page.tsc_scale;
let tsc_offset = tsc_page.tsc_offset;

compiler_fence(Ordering::Acquire);

@the-eugen the-eugen Feb 4, 2026

Did you mean Ordering::Release here? There's also a similar pattern in the KVM handler.

A compiler barrier is not enough though: you need an rmb() here and before reading the data on L147. They are paired with the hypervisor's wmb() when it updates the page. Admittedly it is going to be a no-op on amd64, but it may be better to take care of this now if an arm64 port is planned.
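To illustrate the difference, a rough sketch of a read path that uses hardware-level ordering rather than `compiler_fence`. The struct is a hypothetical cut-down view of the page (field names taken from the diff, layout and type name assumed), and modelling the sequence word as an `AtomicU32` is my choice, not something the PR does.

```rust
use core::ptr::read_volatile;
use core::sync::atomic::{fence, AtomicU32, Ordering};

/// Hypothetical, cut-down view of the start of the reference TSC page.
#[repr(C)]
pub struct TscPageHeader {
    pub tsc_sequence: AtomicU32,
    pub reserved: u32,
    pub tsc_scale: u64,
    pub tsc_offset: i64,
}

/// Takes one snapshot of scale/offset; `None` means the page changed while
/// it was being read.
pub fn read_snapshot(page: &TscPageHeader) -> Option<(u64, i64)> {
    // Acquire load: the data reads below cannot be reordered before it,
    // by the compiler or by the CPU.
    let seq1 = page.tsc_sequence.load(Ordering::Acquire);

    // SAFETY: both pointers are derived from a valid shared reference.
    let scale = unsafe { read_volatile(&page.tsc_scale) };
    let offset = unsafe { read_volatile(&page.tsc_offset) };

    // Read barrier: the data reads above complete before the re-check.
    // Unlike `compiler_fence`, this also constrains the hardware.
    fence(Ordering::Acquire);
    let seq2 = page.tsc_sequence.load(Ordering::Relaxed);

    (seq1 == seq2).then_some((scale, offset))
}
```

On x86_64 neither the acquire load nor the fence should emit an extra instruction, so the cost of being correct on weaker memory models is effectively zero there.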


if time_100ns < 0 {
    return None; // Invalid time
}

@the-eugen the-eugen Feb 4, 2026

You are making scaled a signed integer to catch overflows, but by doing so you also lose precision (it's just 1 bit, but still). As a result this reports an overflow in cases where the u64 value would not in fact have overflowed (e.g. i64::MAX + 1).

I know that tsc_offset is signed, but you could let scaled be unsigned and use u64::overflowing_add_signed instead, which keeps the full precision and still detects overflow and underflow. On amd64 it compiles to a normal add instruction and just captures the overflow flag.
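Roughly what I have in mind, as a sketch. The function name is a placeholder; the formula is the usual reference TSC one, `((tsc * scale) >> 64) + offset`.

```rust
/// Computes the reference time in 100 ns units from a TSC reading.
/// Returns `None` if adding the signed offset leaves the range of `u64`.
pub fn reference_time_100ns(tsc: u64, tsc_scale: u64, tsc_offset: i64) -> Option<u64> {
    // 64x64 -> 128-bit multiply, keeping the high 64 bits.
    let scaled = ((u128::from(tsc) * u128::from(tsc_scale)) >> 64) as u64;

    // Unsigned + signed add; the flag is set on both overflow and underflow.
    let (time, overflowed) = scaled.overflowing_add_signed(tsc_offset);
    if overflowed {
        None
    } else {
        Some(time)
    }
}
```

`scaled` keeps all 64 bits, so values above `i64::MAX` are no longer misreported as invalid, and a negative offset larger than `scaled` is still rejected.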

// Check sequence again
let seq2 = unsafe { core::ptr::read_volatile(&tsc_page.tsc_sequence) };
if seq1 != seq2 {
    return None; // Data changed during read, retry later

@the-eugen the-eugen Feb 4, 2026

I wonder how the caller is supposed to differentiate between "use MSR fallback" and this case here? The sequence mismatch is a transient problem: it happens because the hypervisor is modifying the page contents, and it is very likely to go away if you retry immediately, which is still going to be much faster than taking an MSR exit.
Therefore you would typically loop when reading from the TSC page until the sequences match, because we (the Hyper-V hypervisor) do not expect this problem to persist. All client-side implementations I've seen just retry immediately, maybe with an upper bound on the retry count. Falling back on MSR traps would be much less efficient in P99.9 cases.
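A sketch of the shape I'd expect, with the two cases kept distinct. All names here (`ReadOutcome`, `read_with_retry`, the attempt bound) are made up for illustration and are not from the PR.

```rust
/// Result of a single attempt to read the clock page.
pub enum ReadOutcome {
    /// Consistent snapshot; carries the computed time.
    Value(u64),
    /// Sequence changed mid-read; transient, worth retrying right away.
    Retry,
    /// Page is not usable; the caller should use its fallback path.
    Unavailable,
}

/// Retries transient failures up to `max_attempts` times. Returns `None`
/// only if the page is unavailable or the bound is exhausted.
pub fn read_with_retry(
    mut try_read: impl FnMut() -> ReadOutcome,
    max_attempts: u32,
) -> Option<u64> {
    for _ in 0..max_attempts {
        match try_read() {
            ReadOutcome::Value(v) => return Some(v),
            ReadOutcome::Unavailable => return None,
            // Hint to the CPU that we are in a short spin-wait.
            ReadOutcome::Retry => core::hint::spin_loop(),
        }
    }
    None
}
```

With this split the fallback is only taken when the page is genuinely unusable, or after a bounded number of consecutive mismatches, rather than on the first one.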

@the-eugen the-eugen left a comment

Hi! Some comments from the Hyper-V side of things.

4 participants