A Practical Guide to Getting as Close to Soft Real-Time as Possible on Ordinary Windows

Tags: Windows Development, Soft Real-Time, Design, Measurement

When you build processing on Windows where “being late is a problem” - periodic processing, audio, video, measurement, equipment control - the impression that “Windows can’t really do this” tends to come up. That impression is half right and half wrong: Windows is not a hard real-time OS, but if you properly nail down design, implementation, measurement, and operations, you can get it to a genuinely practical state as soft real-time.

This article covers ordinary Windows 10 / 11, without special RTOS extensions, custom kernel drivers, or dedicated controllers. It is a practice-oriented discussion of how far you can push down latency and jitter with a user-mode app on an everyday desktop or laptop PC. Audio, video, periodic control, and data acquisition differ in their details, but the trouble spots are largely shared, so this article collects that common ground in the form of a checklist.

Table of Contents

  1. The Conclusion First (In One Line)
  2. What “Soft Real-Time” Means on Ordinary Windows
    • 2.1. What This Article Means by “Ordinary Windows”
    • 2.2. What Is Achievable, and Where It Gets Hard
    • 2.3. A Quick Word on Terminology
  3. The Main Causes of Latency and Jitter
    • 3.1. The Scheduler and Priorities
    • 3.2. DPCs / ISRs and Drivers
    • 3.3. Page Faults and Memory
    • 3.4. Timer Resolution and Power Management
    • 3.5. Core Migration and Heat
  4. A Practical Checklist for Reducing Lateness on Ordinary Windows
    • 4.1. Periodic Loops and How to Wait
    • 4.2. Fast Path / Slow Path and Fixed-Length Queues
    • 4.3. Priorities / MMCSS / Background Mode
    • 4.4. Memory / GC / First-Run Costs
    • 4.5. Power Settings / EcoQoS / Timer Resolution
    • 4.6. CPU Placement / Core Migration / Heat
    • 4.7. Isolating Drivers / DPCs / ISRs / External Disturbances
  5. Measurement and Evaluation
    • 5.1. What to Record
    • 5.2. How to Read p99 / p99.9 / max
    • 5.3. What to Measure With
    • 5.4. Testing Discipline
  6. A Rough Guide to Choosing
  7. Conclusion

1. The Conclusion First (In One Line)

  • What you aim for on ordinary Windows is not a hard real-time guarantee, but a soft real-time configuration that is “unlikely to be late, and does not break when it is.”
  • The biggest single win is making the hot path short, fixed-length, and non-blocking.
  • Separate the fast path (acquisition / control) from the slow path (storage / communication / UI) and connect them with a fixed-length queue.
  • Drive the periodic loop on absolute deadlines, not by leaning on Sleep(1).
  • For continuous streams like audio and video, consider MMCSS first.
  • For time measurement, use QueryPerformanceCounter (QPC) - in .NET, Stopwatch.
  • For waiting, prefer device events or high-resolution waitable timers.
  • Use timeBeginPeriod only for as long as needed. Do not design on the assumption it is always on.
  • In real operation, AC power / the power mode / how EcoQoS is handled / pruning background load all pay off.
  • Evaluate not just averages, but p99 (the threshold where the slowest 1 in 100 starts to show) / p99.9 / max / miss count / DPC / ISR / page faults / queue depth.

In short, on ordinary Windows, reducing the reasons for lateness through design beats raising priorities. Priorities and power settings matter, but they alone cannot create stability.

2. What “Soft Real-Time” Means on Ordinary Windows

2.1. What This Article Means by “Ordinary Windows”

By ordinary Windows we roughly assume the following.

  • A typical Windows 10 / 11 desktop or laptop PC
  • No custom RTOS extensions
  • No custom kernel-mode driver development
  • A normal user-mode app
  • Tuning with standard Windows APIs and settings

In other words, this is not about building out a whole dedicated machine for real-time control - it is about how far you can realistically push things on an ordinary Windows PC.

Diagram: scope of this article

  Ordinary Windows 10 / 11 PC → User-mode app → Aim for soft real-time
    • Keep latency low
    • Keep jitter small
    • Observe deadline misses and avoid breaking

  Need to guarantee zero deadline violations → RTOS / dedicated controller / FPGA / device-side processing

2.2. What Is Achievable, and Where It Gets Hard

Even on ordinary Windows, you can build a genuinely “rarely late” setup for processing like the following.

  • Periodic processing from a few milliseconds to tens of milliseconds
  • Buffer-driven audio / video
  • Sensor acquisition and control loops
  • Soft-PLC-style fixed-period processing
  • A low-latency pipeline running on a thread separate from the UI

That said, “achievable” here does not mean the occasional latency spike can be reduced to absolute zero. The state we aim for is this:

  • Keep normal-case latency low
  • Keep jitter small
  • Do not break when a deadline is occasionally missed
  • Be able to observe the fact that it was missed

Conversely, requirements like the following become very hard to satisfy with user-mode alone on ordinary Windows.

  • Guaranteeing zero deadline violations
  • Staying stably within a few hundred microseconds over long periods
  • Coexisting with a heavy GUI, network, and storage
  • Doing it on battery power, or while still prioritizing power saving
  • Not tolerating even spikes caused by drivers or devices

For these, it is safer to also consider moving only the truly time-critical part to device-side firmware, a dedicated controller, an FPGA, or an RTOS.

2.3. A Quick Word on Terminology

Let’s pin down the terms used in this article first.

Term | In one line | Practical view
Soft real-time | Occasional lateness can happen; the approach is to make it small and survivable | This is what to aim for first on ordinary Windows
Hard real-time | A world where zero deadline violations must be guaranteed | Not a target for user-mode alone on ordinary Windows
Jitter | Variation in period or response time | Even with a good average, large jitter means instability in real operation
Deadline miss | Processing not finishing by its scheduled time | Do not hide it - count it and log it
p99 / p99.9 | Metrics for looking at the slow tail | p99 is “the threshold where the slowest 1 in 100 starts to show”
DPC / ISR | Kernel-side processing around drivers and interrupts | When long, user-mode threads are made to wait
MMCSS | The Windows mechanism that allocates CPU to time-sensitive work like audio / video | A strong option for processing that must never starve its buffers
QPC | QueryPerformanceCounter | The basis of elapsed-time measurement - a high-resolution counter, not the wall clock

3. The Main Causes of Latency and Jitter

The reasons periodic processing falls behind on ordinary Windows almost always trace back to one of the boxes in this diagram.

Diagram: why periodic processing falls behind

  • Scheduler / priorities
  • DPC / ISR / drivers
  • Page faults / memory
  • Timer resolution / power management
  • Core migration / heat

3.1. The Scheduler and Priorities

Windows threads run in priority order. At equal priority they take turns round-robin, and when a higher-priority thread becomes runnable, lower-priority threads get pushed aside.

So even if you write your periodic thread diligently, it is entirely normal for the following to run first:

  • Other threads
  • Other processes
  • OS-internal work
  • Security products
  • Device helper processing
  • Background synchronization

3.2. DPCs / ISRs and Drivers

This part is quite important. Even with your app-side priorities in order, if DPCs (Deferred Procedure Calls) or ISRs (Interrupt Service Routines) run long, user-mode threads cannot execute during that time.

The devices and drivers that commonly cause this include:

  • USB
  • Wi-Fi / Bluetooth
  • Storage
  • Audio
  • GPU
  • ACPI / power management

Even when your application code is fine, you can get stalled by driver or hardware circumstances. Thinking “I’ll just raise my app’s priority higher and win” here usually ends in pain.

3.3. Page Faults and Memory

If a page fault (a needed page not being in memory and having to be fetched) occurs on the hot path, latency balloons instantly.

Patterns particularly worth avoiding:

  • Page commit on first access
  • Lazy loading
  • Page-in of memory-mapped files
  • More dynamic allocation than necessary
  • Large objects or a fragmented heap

For the body of periodic processing, the right posture is roughly: allocate the memory you need up front, and touch it once at startup.
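As a minimal sketch of "allocate up front, touch once," the helper below writes one byte into every page of a buffer at startup so that first-access page commits happen before the periodic loop begins. The names TouchPages and SampleBuffer are hypothetical, and the 4096-byte page size is an assumption (on Windows the real value can be read from GetSystemInfo).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: write one byte per page so the OS commits the pages
// now, not on first use inside the hot path. Overwrites those bytes with 0,
// so call it at startup before the buffer holds real data.
// pageSize = 4096 is an assumption for illustration.
std::size_t TouchPages(char* buffer, std::size_t size, std::size_t pageSize = 4096)
{
    assert(pageSize > 0);
    std::size_t touched = 0;
    volatile char* p = buffer; // volatile keeps the writes from being optimized away
    for (std::size_t offset = 0; offset < size; offset += pageSize)
    {
        p[offset] = 0;
        ++touched;
    }
    return touched;
}

// Hypothetical buffer that is allocated and warmed once, at startup.
struct SampleBuffer
{
    explicit SampleBuffer(std::size_t bytes)
        : size(bytes), data(new char[bytes])
    {
        TouchPages(data.get(), size);
    }

    std::size_t size;
    std::unique_ptr<char[]> data;
};
```

Touching pages does not pin them in memory; it only moves the first-access cost out of the loop. Pages can still be trimmed later under memory pressure.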

3.4. Timer Resolution and Power Management

“I want to run every 1 ms, so Sleep(1)” almost never works out. Windows wait precision is affected by timer resolution, scheduling, and power state.

Furthermore, do not overlook that raising the timer resolution slightly improves wait precision but has side effects on power consumption and overall system behavior.

3.5. Core Migration and Heat

When a thread migrates between cores, the caches have to warm up again. The OS often handles this well by itself, but under heavy load it becomes a source of wobble.

Heat also becomes non-negligible over long runs. When thermal throttling kicks in, a previously stable period can fall apart.

4. A Practical Checklist for Reducing Lateness on Ordinary Windows

Here begins the practical part. For the causes seen in the previous section, we summarize what to check, what to avoid, and what to decide first on ordinary Windows, in checklist form.

4.1. Periodic Loops and How to Wait

First, the classic anti-pattern is this.

while (running)
{
    Sleep(1);
    Step();
}

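This is not a “1 ms period” - it is a loop that waits roughly 1 ms or more, then adds the execution time of Step() on top. With the default timer resolution (typically about 15.6 ms), the wait can be far longer than 1 ms. Worse, the wait overshoot accumulates directly, because each iteration starts counting from wherever the previous one ended.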
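To make the accumulation concrete, here is a small arithmetic simulation rather than real timing code. It compares where a relative-time loop and an absolute-deadline loop end up after the same number of iterations. The function names and the example figures (500 microseconds of wake-up overshoot, 200 microseconds per step) are assumptions chosen for illustration, not measurements.

```cpp
#include <algorithm>
#include <cstdint>

// Relative-time loop: every pass pays the requested sleep, the wake-up
// overshoot, and then the step itself. All values are in microseconds.
std::int64_t RelativeLoopEndUs(int iterations, std::int64_t sleepUs,
                               std::int64_t overshootUs, std::int64_t stepUs)
{
    std::int64_t t = 0;
    for (int i = 0; i < iterations; ++i)
    {
        t += sleepUs + overshootUs; // the wait returns late
        t += stepUs;                // the step runs on top of that
    }
    return t;
}

// Absolute-deadline loop: each wake-up is late relative to a fixed schedule,
// so the error of one iteration is not carried into the next.
std::int64_t AbsoluteLoopEndUs(int iterations, std::int64_t periodUs,
                               std::int64_t overshootUs, std::int64_t stepUs)
{
    std::int64_t t = 0;
    std::int64_t next = periodUs;
    for (int i = 0; i < iterations; ++i)
    {
        t = std::max(t, next) + overshootUs; // wake late, but against 'next'
        t += stepUs;
        next += periodUs;
    }
    return t;
}
```

With a 1000-microsecond target and 1000 iterations, the relative loop finishes at 1,700,000 microseconds (0.7 s behind schedule), while the absolute loop finishes at 1,000,700 microseconds: its error stays at one overshoot plus one step.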

Diagram: relative-time loop vs absolute-deadline loop

  Relative-time based: Sleep(1) → Step() → wait error and execution time pile up bit by bit
  Absolute-deadline based: WaitUntil(next - margin) → short spin if needed → FastStep() → next += period → resists accumulating drift

Checklist

  • The periodic loop is not built on Sleep(1)
  • The period is driven by absolute deadlines via next += period
  • Waiting prefers device events or waitable timers
  • Only the final fine adjustment uses a very short busy-spin
  • timeBeginPeriod is used only while needed and reverted afterwards
  • Behavior has been verified while minimized / hidden / not visible

A periodic loop is more stable when driven by absolute deadlines rather than relative time.

int64_t next = QpcNow() + periodTicks;

while (running)
{
    WaitUntil(next - wakeMarginTicks);

    while (QpcNow() < next)
    {
        CpuRelax(); // Spin briefly only at the very end
    }

    int64_t started = QpcNow();
    FastStep();
    int64_t finished = QpcNow();

    RecordTiming(next, started, finished);

    next += periodTicks;

    while (finished > next)
    {
        ++missedDeadlines;
        next += periodTicks;
    }
}
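The loop above relies on QpcNow, WaitUntil, and CpuRelax without defining them. One possible shape for those helpers is sketched below, assuming a high-resolution waitable timer (CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, available from Windows 10 version 1803). QpcFrequency is a hypothetical extra helper, and the non-Windows branch is only a stand-in so the sketch compiles elsewhere; treat the whole thing as an illustration, not the article's actual implementation.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#ifndef CREATE_WAITABLE_TIMER_HIGH_RESOLUTION
#define CREATE_WAITABLE_TIMER_HIGH_RESOLUTION 0x00000002 // older SDK headers
#endif

std::int64_t QpcFrequency() // QPC ticks per second
{
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return f.QuadPart;
}

std::int64_t QpcNow()
{
    LARGE_INTEGER c;
    QueryPerformanceCounter(&c);
    return c.QuadPart;
}

void CpuRelax() { YieldProcessor(); }

// Sleeps until roughly targetTicks (in QPC ticks). May return slightly early
// or late; the caller's short spin absorbs the remainder.
void WaitUntil(std::int64_t targetTicks)
{
    // One timer per thread, created on first use (never closed in this sketch).
    static thread_local HANDLE timer = CreateWaitableTimerExW(
        nullptr, nullptr, CREATE_WAITABLE_TIMER_HIGH_RESOLUTION, TIMER_ALL_ACCESS);

    const std::int64_t remaining = targetTicks - QpcNow();
    if (remaining <= 0 || !timer)
        return; // already due, or no timer: fall through to the caller's spin

    LARGE_INTEGER due;
    due.QuadPart = -(remaining * 10000000 / QpcFrequency()); // negative = relative, 100 ns units
    if (due.QuadPart != 0 && SetWaitableTimer(timer, &due, 0, nullptr, nullptr, FALSE))
        WaitForSingleObject(timer, INFINITE);
}
#else
// Stand-in for non-Windows builds: steady_clock nanoseconds play the role of QPC ticks.
std::int64_t QpcFrequency() { return 1000000000; }

std::int64_t QpcNow()
{
    using namespace std::chrono;
    return duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count();
}

void CpuRelax() {}

void WaitUntil(std::int64_t targetTicks)
{
    const std::int64_t remaining = targetTicks - QpcNow();
    if (remaining > 0)
        std::this_thread::sleep_for(std::chrono::nanoseconds(remaining));
}
#endif
```

With these helpers, periodTicks for a 1 ms period would be QpcFrequency() / 1000, and wakeMarginTicks is a tuning value to be chosen by measurement on the target machine.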

4.2. Fast Path / Slow Path and Fixed-Length Queues

The architectural basis is to put only deadline-sensitive work on the fast path, and push everything else to the slow path.

Diagram: fast path and slow path

  Device / acquisition events → fast path: acquire, control, minimal copying → Fixed-length queue → slow path: store, send, UI, aggregation
  Alongside: record lateness / misses / queue depth

Limit the fast path to roughly this much.

  • Data acquisition
  • Control-value computation
  • The minimum necessary copying
  • Timestamping
  • Enqueueing
  • Recording misses / overruns

Everything else drops to the slow path.

Checklist

  • No file writes, network sends, or DB writes on the hot path
  • No heavy logging, Flush, or synchronous RPC on the hot path
  • Fast path / slow path clearly separated by thread or responsibility
  • The queue is fixed-length
  • The policy for queue overflow is decided in advance
  • Miss counts, drop counts, and queue depth are being observed
  • UI updates and log aggregation are separated to a lower frequency

When the queue fills up, it is safer not to leave the policy vague.

Diagram: queue is full → what do we protect?

  • Latest value matters → Drop old entries, keep the latest
  • Every record matters → Alert / stop / upstream throttling
  • Logging use → Drop old entries, record only the drop count
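A fixed-length queue between the two paths can be sketched as a single-producer / single-consumer ring buffer. The class below is a hypothetical illustration, not the article's implementation. It assumes exactly one fast-path thread calling TryPush and one slow-path thread calling TryPop. When full, it rejects the new item and counts the drop, because evicting the oldest entry from the producer side would need extra coordination with the consumer. What to do on a rejected push (alert, stop, throttle) is left to the caller's policy.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t Capacity>
class FixedSpscQueue
{
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Fast path (producer thread only): never blocks, never allocates.
    bool TryPush(const T& item)
    {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity)
        {
            drops_.fetch_add(1, std::memory_order_relaxed); // full: count it, do not wait
            return false;
        }
        slots_[head & (Capacity - 1)] = item;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Slow path (consumer thread only).
    bool TryPop(T& out)
    {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail)
            return false; // empty
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Approximate when both threads are active; good enough for monitoring.
    std::size_t Depth() const
    {
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        const std::size_t head = head_.load(std::memory_order_acquire);
        return std::min(head - tail, Capacity);
    }

    std::uint64_t Drops() const { return drops_.load(std::memory_order_relaxed); }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};    // next slot to write
    std::atomic<std::size_t> tail_{0};    // next slot to read
    std::atomic<std::uint64_t> drops_{0}; // rejected pushes
};
```

Depth() and Drops() are the two values worth logging periodically from the slow path: a depth that keeps climbing usually shows up well before the first drop.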

4.3. Priorities / MMCSS / Background Mode

The basic rule of priorities is: do not raise everything. On ordinary Windows, “raise only the important threads, and properly lower the back-office work” works better. Background mode is a mechanism that treats not just CPU but also resources like I/O at lower priority.

Diagram: split the work, then set priorities

  • Deadline-sensitive threads → Higher priority or MMCSS if needed
  • Store / send / compress / aggregate → Background mode / lower priority
  • UI → Normal priority

  Do not start with REALTIME_PRIORITY_CLASS

Checklist

  • Not all threads are set to high priority
  • Only genuinely time-critical threads are raised
  • Back-office work like storing, sending, compressing, and syncing is dropped to background mode
  • MMCSS is considered for continuous buffer processing such as audio, video, capture, and playback
  • Priority is considered per thread first, before the whole process
  • REALTIME_PRIORITY_CLASS is not used until the need is clearly established

MMCSS (Multimedia Class Scheduler Service) is especially effective for processing that must “fill a buffer within a fixed time,” like audio / video. It aligns with Windows’ design better than simply spinning a high-priority thread at all times.

The code looks roughly like this.

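// Requires <windows.h>, <avrt.h>, and <stdexcept>; link with Avrt.lib.
// Call this on the time-sensitive thread itself: MMCSS registration is per thread.
DWORD taskIndex = 0;
HANDLE avrt = AvSetMmThreadCharacteristicsW(L"Pro Audio", &taskIndex);
if (!avrt)
{
    throw std::runtime_error("AvSetMmThreadCharacteristicsW failed");
}

// Run the time-sensitive loop

// Revert on the same thread once the loop has ended.
if (!AvRevertMmThreadCharacteristics(avrt))
{
    throw std::runtime_error("AvRevertMmThreadCharacteristics failed");
}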

4.4. Memory / GC / First-Run Costs

If you use new / malloc / List<T>.Add / string concatenation / LINQ on every pass through the hot path, the costs of collection and relocation will eventually surface. GC (garbage collection) itself is not the villain, but if you write allocation-heavy code, its impact will surface as jitter.

Diagram: startup sequence

  Startup → Allocate the needed buffers → Touch them once to warm the pages → Get JIT / DLL loading / first I/O out of the way → Then do the real measurement / real operation

Checklist

  • No per-iteration memory allocation / deallocation on the hot path
  • Required buffers are preallocated at startup
  • Pages are warmed by touching them once at startup
  • First JIT, first DLL load, and first I/O are not mixed into the real measurement
  • No huge structures or variable-length logs growing inside the loop
  • If VirtualLock is used at all, it is limited to a very small critical region

.NET-side checks

  • Time measurement uses Stopwatch / Stopwatch.GetTimestamp()
  • No LINQ, string concatenation, ToString(), or large log generation on the hot path
  • async/await is not brought into the hot path
  • Pre-warm-up and post-warm-up are evaluated separately

4.5. Power Settings / EcoQoS / Timer Resolution

This part is unglamorous but effective. However tight your code is, results will not stabilize if higher-level power control is bearing down hard.

Diagram: power management on ordinary Windows

  • Run on AC power
  • Power mode: leaning toward Best performance
  • A dedicated production power plan if needed
  • Keep time-sensitive processes away from EcoQoS
  • Check how timer-resolution requests are handled

Checklist

  • Production evaluation is done on AC power first
  • Settings > System > Power & battery > Power mode is set toward Best performance
  • Battery saver / power-saving-first modes are not active during runs
  • Vendor-specific utilities’ quiet / eco / battery-first modes have been checked
  • Time-sensitive processes are not carelessly placed under EcoQoS (power-efficiency-leaning QoS)
  • PROCESS_POWER_THROTTLING_IGNORE_TIMER_RESOLUTION is not enabled on the time-sensitive process
  • Verified whether timer-resolution requests lose effect when minimized / hidden
  • Power settings for everyday use and for production / measurement / demos are kept separate

timeBeginPeriod is useful when used in an organized way, but it is not a cure-all.

  • Call it just before it is needed
  • Revert with timeEndPeriod when done
  • From Windows 10 version 2004 onward, it no longer has the fully global behavior of the past
  • On Windows 11, a process with windows that is fully hidden / minimized / not visible / not audible may not be guaranteed high resolution
  • Raising the resolution does not increase QPC’s precision
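The "call it just before, revert when done" rule maps naturally onto a scope guard. The class below is a minimal sketch; the name ScopedTimerResolution is hypothetical, and on non-Windows builds it compiles to a no-op so the example stays portable.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>               // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")   // MSVC-specific; link winmm manually elsewhere
#endif

// Hypothetical guard: requests a timer resolution for the lifetime of the
// object and always pairs the request with timeEndPeriod.
class ScopedTimerResolution
{
public:
    explicit ScopedTimerResolution(unsigned periodMs) : periodMs_(periodMs)
    {
#ifdef _WIN32
        active_ = (timeBeginPeriod(periodMs_) == TIMERR_NOERROR);
#endif
    }

    ~ScopedTimerResolution()
    {
#ifdef _WIN32
        if (active_)
            timeEndPeriod(periodMs_); // must match the value passed to timeBeginPeriod
#endif
    }

    ScopedTimerResolution(const ScopedTimerResolution&) = delete;
    ScopedTimerResolution& operator=(const ScopedTimerResolution&) = delete;

    // True only if the request was accepted (always false off Windows).
    bool Active() const { return active_; }

private:
    unsigned periodMs_;
    bool active_ = false;
};
```

Constructing the guard at the top of the time-sensitive section (for example, `ScopedTimerResolution res(1);`) keeps the request scoped to that section. Whether Windows actually honors it still depends on the conditions listed above, so it is worth logging Active() and verifying the observed wait precision.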

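If power or QoS effects are suspected, explicitly opt the process out of power throttling with SetProcessInformation (the current state can be read back with GetProcessInformation). The example below turns off both execution-speed throttling and the ignoring of timer-resolution requests.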

PROCESS_POWER_THROTTLING_STATE state{};
state.Version = PROCESS_POWER_THROTTLING_CURRENT_VERSION;
state.ControlMask =
    PROCESS_POWER_THROTTLING_EXECUTION_SPEED |
    PROCESS_POWER_THROTTLING_IGNORE_TIMER_RESOLUTION;
state.StateMask = 0; // HighQoS (performance-leaning) + honor timer-resolution requests

if (!SetProcessInformation(
        GetCurrentProcess(),
        ProcessPowerThrottling,
        &state,
        sizeof(state)))
{
    throw std::runtime_error("SetProcessInformation failed");
}

4.6. CPU Placement / Core Migration / Heat

For CPU placement, rather than jumping straight to pinning to specific cores (hard affinity / CPU pinning), it usually works better to start with something closer to soft affinity - “please run mostly on these cores.”

Diagram: order of operations for CPU placement

  Measure first → SetThreadIdealProcessor / CPU Sets → Improved enough?
    • Yes → Stop there
    • No → Consider SetThreadAffinityMask last

  Also check temperature / clocks / long runs

Checklist

  • CPU placement is only touched after measuring
  • Threads are not pinned to specific cores right away
  • SetThreadIdealProcessor or CPU Sets are tried first
  • SetThreadAffinityMask is treated as a last resort
  • Temperature, clocks, and thermal throttling checked over long runs
  • Laptop quiet / low-noise modes checked

As an order of operations, this flow is safe.

  1. Measure first
  2. If needed, ideal processor / CPU Sets
  3. If improvement is still needed, pin to specific cores

Pinning to specific cores looks like it should help, but it removes the OS’s escape routes, so used casually it can actually make things less flexible.
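For step 2, the lightest-weight option is an ideal-processor hint. The wrapper below is a minimal sketch: PreferProcessor is a hypothetical name, the processor index to pass is something to choose from measurement, and the non-Windows branch simply reports that no hint was applied.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: asks the scheduler to prefer one logical processor
// for the calling thread. This is a hint, not a pin - the thread can still
// run elsewhere when the preferred processor is busy.
bool PreferProcessor(unsigned processorIndex)
{
#ifdef _WIN32
    // Returns the previous ideal processor, or (DWORD)-1 on failure.
    const DWORD previous = SetThreadIdealProcessor(GetCurrentThread(), processorIndex);
    return previous != static_cast<DWORD>(-1);
#else
    (void)processorIndex;
    return false; // no hint applied outside Windows
#endif
}
```

Because it is only a hint, the effect should be confirmed by comparing p99 / p99.9 / max before and after, rather than assumed.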

4.7. Isolating Drivers / DPCs / ISRs / External Disturbances

When “only max occasionally explodes” or “the average is good but p99.9 is bad,” it pays to suspect external disturbances beyond your own code.

Diagram: isolating the cause when a late start / miss / max spike occurs

  1. Is your own processing time also long?
     Yes → Shorten the hot path / reduce allocation / remove I/O
     No → go to 2
  2. Are there DPC / ISR spikes?
     Yes → Check USB / Wi-Fi / Bluetooth / GPU / audio / storage / ACPI / driver updates
     No → go to 3
  3. Page faults / GC / first-run costs?
     Yes → Preallocate / warm up / reduce heap pressure
     No → go to 4
  4. Battery / power saving / thermal effects?
     Yes → AC power / power settings / cooling / long-duration tests
     No → Dig deeper with ETW / WPA / LatencyMon

Checklist

  • Drivers around Wi-Fi / Bluetooth / USB / storage / GPU / audio have been checked
  • Compared with unnecessary cloud sync, indexing, and auto-updates stopped
  • Also tested whether things degrade when minimized or with the display off
  • DPC / ISR trends inspected with LatencyMon or ETW
  • “My processing is heavy” vs “I am being stalled from outside” examined separately

5. Measurement and Evaluation

5.1. What to Record

At minimum, you want to capture these.

  • Scheduled start time of each period
  • Actual start time
  • Actual finish time
  • Lateness (how late the start was relative to schedule)
  • Execution time
  • Missed deadline count
  • Consecutive missed deadline count
  • Queue depth
  • Drop count
  • CPU utilization
  • Per-core load imbalance
  • DPC / ISR spikes
  • Page faults
  • Temperature / clock variation

Looking only at averages makes the essence hard to grasp. What hurts in production is the occasional large latency spike.
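A minimal sketch of the in-loop bookkeeping for the timing items in this list is shown below. TimingStats is a hypothetical name, all values are in QPC ticks, and a "miss" is defined here as finishing after the start of the next period - an assumption that matches the periodic loop shown earlier. It only updates a few integers, so it is cheap enough to call on the hot path.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical per-loop statistics; no allocation, no I/O.
struct TimingStats
{
    std::int64_t maxLatenessTicks = 0;      // worst (started - scheduled)
    std::int64_t maxExecutionTicks = 0;     // worst (finished - started)
    std::uint64_t missedDeadlines = 0;
    std::uint64_t consecutiveMisses = 0;    // current run of misses
    std::uint64_t maxConsecutiveMisses = 0; // longest run seen so far

    void Record(std::int64_t scheduled, std::int64_t started,
                std::int64_t finished, std::int64_t periodTicks)
    {
        maxLatenessTicks = std::max(maxLatenessTicks, started - scheduled);
        maxExecutionTicks = std::max(maxExecutionTicks, finished - started);

        if (finished > scheduled + periodTicks)
        {
            ++missedDeadlines;
            ++consecutiveMisses;
            maxConsecutiveMisses = std::max(maxConsecutiveMisses, consecutiveMisses);
        }
        else
        {
            consecutiveMisses = 0; // an on-time iteration ends the run
        }
    }
};
```

Per-iteration samples for percentile analysis would go into a preallocated buffer or the fixed-length queue instead; this struct only keeps the running worst cases and counts.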

5.2. How to Read p99 / p99.9 / max

Metrics like p99 exist to look at the slow tail. Averages alone hide the occasional large delay.

Metric | Meaning | Intuition over 10,000 measurements
Average | The smoothed overall value | Spikes get buried easily
p50 | The middle value | Close to everyday feel
p95 | The threshold where the slowest 5% starts to show | The boundary excluding the slowest 500
p99 | The threshold where the slowest 1% starts to show | The boundary excluding the slowest 100
p99.9 | The threshold where the slowest 0.1% starts to show | The boundary excluding the slowest 10
max | The worst case | The single slowest run

For example, with:

  • Average: 0.8 ms
  • p99: 1.2 ms
  • p99.9: 3.5 ms
  • max: 28 ms

the story is: usually fast, but with occasional large spikes. On ordinary Windows, the real problems almost always live in this tail from p99 to max.
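The percentile figures can be computed from recorded samples off the hot path. The sketch below uses the nearest-rank definition (one common convention; other tools may interpolate and give slightly different values) and takes the percentile as an integer fraction so that p99.9 is exactly 999 / 1000. The function name Percentile is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least
// numerator/denominator of all samples are less than or equal to it.
// Takes the samples by value because it sorts them; run it on the slow path.
std::int64_t Percentile(std::vector<std::int64_t> samples,
                        std::size_t numerator, std::size_t denominator)
{
    if (samples.empty() || denominator == 0)
        return 0;

    std::sort(samples.begin(), samples.end());

    // rank = ceil(n * numerator / denominator), clamped to [1, n]
    std::size_t rank = (samples.size() * numerator + denominator - 1) / denominator;
    rank = std::max<std::size_t>(1, std::min(rank, samples.size()));
    return samples[rank - 1];
}
```

For 10,000 samples, `Percentile(samples, 99, 100)` returns the 9,900th smallest value (leaving the slowest 100 above it), and `Percentile(samples, 999, 1000)` returns the 9,990th (leaving the slowest 10), which lines up with the table above.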

5.3. What to Measure With

The toolkit is fairly standard.

  • In-app measurement: First capture period / lateness / execution time / queue depth / drop yourself
  • ETW / WPR / WPA: Dig into CPU, context switches, DPC / ISR, page faults
  • LatencyMon: Get a bearing on driver-induced wobble
  • Temperature / clock monitoring: Watch for thermal effects

Diagram: from measurement to priorities

  In-app measurement (p50 / p95 / p99 / p99.9 / max, misses / drops / queue depth)
  + ETW / WPR / WPA (context switches / DPC / ISR / page faults)
  + Temperature / clock monitoring
  → Prioritize the improvements

Going all the way to WPA takes some effort, but it is highly effective for separating whether DPCs / ISRs are the cause, or your own processing is simply heavy.

5.4. Testing Discipline

A quiet bench environment alone is not enough for testing. At minimum, you want to examine these conditions separately.

  • Right after startup, before warm-up
  • After warm-up
  • Long continuous runs
  • UI in the foreground
  • UI minimized / close to hidden
  • AC power
  • Battery power
  • With load on the network or disk

Evaluating only on the bench makes it easy to miss problems that appear in real operation. Ordinary Windows behavior is easily pulled around by “how the machine is used,” so it is important to verify under conditions close to actual use.

6. A Rough Guide to Choosing

  • 10-20 ms class, occasional wobble is absorbable → Fast/slow separation, fixed-length queues, normal-to-slightly-elevated priority, and event-driven design are often sufficient

  • 1-5 ms class, must keep up continuously → Allocation-free hot path, dedicated threads, MMCSS or careful priority tuning, high-resolution waitable timers, AC power, power settings leaning toward Best performance

  • Approaching sub-1 ms, and must not miss even over long, heavily loaded runs → Very hard with user-mode alone on ordinary Windows. Consider a design that moves the critical part elsewhere

  • GUI / logging / communication / DB all living together → Do not cram it all into “one process, one loop” - separate the responsibilities. Downstream concerns easily break upstream deadlines

7. Conclusion

There are two premises worth holding on to.

  • What to aim for on ordinary Windows is not a hard real-time guarantee, but a soft real-time configuration: small latency and jitter, and not breaking when a deadline violation occurs
  • The biggest win is tidying the hot path, more than tuning priorities

On the implementation side, these are what pay off:

  • Separate the fast path and the slow path
  • Use fixed-length queues, and decide the overflow policy in advance
  • Measure with QPC; wait with events / waitable timers
  • Avoid allocation, blocking I/O, and heavy locks on the hot path

On the operations side, these are what pay off:

  • Run on AC power
  • Keep a separate power configuration for production
  • Reduce unnecessary background load
  • Evaluate with p99 / p99.9 / max and miss counts

Soft real-time on ordinary Windows is not decided by priority settings alone - if you work through design, implementation, power settings, measurement, and operations as separate concerns, you can build a remarkably stable system.

Author Profile

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
