How to Correctly Compare the Speed of Different Program Versions on Windows

You want to compare version A and version B of a program on Windows. The single worst thing you can do is run each once on the same machine and declare “B seems about 8% faster.”

That 8% might genuinely be the code difference. But it could just as easily come from power mode, power plan, thermals, background updates, search indexing, virus scans, affinity, execution order, or cache state: the classic Windows benchmarking story. It is quite a muddy world.

This article summarizes how to compare the execution speed of different versions of a program on Windows so that the measured difference reflects the code change as closely as possible. The main target is Windows 11, but most of it (powercfg, start, and so on) works the same on Windows 10.

The Conclusion First

The tricks for improving reproducibility boil down to these six.

Decide first what you want to compare. Whether you want to see the code difference or the real user experience determines which environment factors you should align.
Record power mode and power plan as separate things. Handle this sloppily on Windows, and your comparison tends to become a comparison of the OS’s power-saving policies.
Separate the cold first run from the warmed-up steady state. It is not unusual for only the first run to be fast, or for only the later runs to be slow.
Alternate runs, A→B→A→B. Run all of A first and then all of B, and thermal drift and background activity skew one side.
Look at the median and the spread, not just the mean. A single outlier can drag the mean far off; it is more fragile than you think.
If the difference is small, dig down to the cause with ETW / WPR. Argue from gut feel, and you mostly end up brawling in the fog.

Decide First What You Want to Compare

“Speed comparison” sounds like one thing, but there are actually two kinds.

1. A comparison to see the code difference

You want to know whether the implementation itself got faster due to an algorithm change, data structure change, compiler optimization, runtime update, and so on.

In this case, cut environmental noise as much as possible: use a dedicated benchmarking session, pin the power mode, turn notifications off, suppress search indexing and sync, and, if necessary, go as far as a clean boot.

2. A comparison to see the real user experience

You want to know the speed users will actually feel on their everyday Windows after release.

In this case, you must not erase all the noise that exists in reality. Comparing in a “plausible everyday environment” — including OneDrive sync, Defender, notifications, and normal power settings — gives results closer to reality.

Mix these two, and your conclusions get twisted. Things like “12% faster in the lab but within noise in the real world” or “faster in the real world but unchanged in CPU time” happen routinely.

The Main Causes of Variance on Windows

First, a rough inventory of what makes results wobble.

Layer	Variance factor	Typical example
Hardware	CPU / GPU, memory, SSD, cooling	How thin the laptop is, whether a cooling pad is used
Firmware	BIOS / UEFI, OEM controls	Power-saving policies, fan control
OS	Windows build, drivers, update state	The same PC behaves differently after an update
Power	AC / DC, power mode, power plan	On battery, it is a different world
Thermals	Room temperature, fans, prior load	Boost clocks on the first run, fading on later runs
Background	Windows Update, Defender, sync, notifications	A scan or sync runs mid-execution
Scheduling	Priority, affinity, NUMA	Which cores the threads land on varies by run and by machine
Data / cache	OS cache, app cache	Slow only on the first run, fast from the second run on
Build conditions	Debug / Release, PGO, logging on/off	You are comparing different things to begin with

In short: even “the same Windows machine” is a different experiment if the conditions are not aligned.

Treat Power Mode and Power Plan Separately

This part matters a lot.

Windows has the Power mode in the Settings app and the traditional Power plan (the power schemes visible via powercfg). They look similar and tend to get lumped together, but handle them sloppily and the comparison turns to mush.

In the Windows Settings app, you can choose the Power mode from Settings > System > Power & battery. Microsoft’s documentation states you can switch between Best power efficiency, Balanced, and Best performance separately for Plugged in / On Battery. Furthermore, changing the Power mode also affects the underlying power-related settings and PPM (Processor Power Management) behavior. In other words, this alone can change core parking and performance scaling policy.

The Power plan, on the other hand, is the traditional power scheme: Balanced, High performance, and so on. You can check it with powercfg /list and powercfg /getactivescheme.

The confusing part is that Windows has both the power mode overlay and the power plan. So record at least the following with your benchmark results:

AC or battery
Which power mode
Which active power plan

Benchmark results missing these three are quite painful to look at later.

Power conditions to pin down first

Always compare laptops on AC power: battery operation easily introduces unintended limits.
Pin the power mode: for benchmarking, try Best performance first.
Record the active power plan: save the current value with powercfg.

powercfg /list
powercfg /getactivescheme
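To keep the plan next to each result without copying it by hand, a benchmark script can capture powercfg's output. A minimal Python sketch, assuming the English output format shown in the comment; powercfg's text is localized, so this parsing is illustrative rather than a stable interface, and parse_active_scheme / active_power_scheme are hypothetical helper names.

```python
import re
import subprocess
import sys

# Assumed (English) output format, e.g.
# "Power Scheme GUID: 381b4222-f694-41f0-9685-ff5bb260df2e  (Balanced)"
_SCHEME_RE = re.compile(
    r"([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
    r"\s*(?:\(([^)]*)\))?"
)


def parse_active_scheme(text):
    """Return (guid, name) from powercfg /getactivescheme output, or None."""
    m = _SCHEME_RE.search(text)
    if m is None:
        return None
    guid = m.group(1).lower()
    name = m.group(2).strip() if m.group(2) else ""
    return guid, name


def active_power_scheme():
    """Run powercfg on Windows and parse its output; None on other platforms."""
    if sys.platform != "win32":
        return None
    out = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_active_scheme(out)


if __name__ == "__main__":
    print(active_power_scheme())
```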

Switch to High performance if needed

# Balanced
powercfg /setactive 381b4222-f694-41f0-9685-ff5bb260df2e

# High performance
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

“High performance does not show up” is completely normal

This is another stumbling point. Microsoft’s documentation states that on devices supporting Modern Standby, only Balanced, or plans derived from Balanced, are available. So if High performance is missing, the machine is not necessarily broken; it may simply be designed that way.

Microsoft also notes that if the Power mode cannot be changed, a custom power plan may be active; in that case, switch the power plan back to Balanced first. When the Power mode setting is greyed out or has no effect, this is the first thing to check.

Kill the Background Noise

Windows is a hard worker. Even when you want a quiet benchmark, it does all sorts of things in the background for you.

First, reboot and wait for things to settle

After changing settings, reboot once, and do not start benchmarking immediately after login; wait a few minutes. Right after startup, Windows Update, indexing, sync, Defender, and various resident apps are still busy.

For serious comparisons, use a clean boot

Microsoft documents a procedure for reducing to a minimal startup configuration via clean boot: stop non-Microsoft services in msconfig and disable Startup apps in Task Manager.

This is powerful for reducing noise. However, it diverges from the everyday environment, so it is suited to “lab comparisons aimed at seeing the code difference.”

Silence notifications

Windows notification banners look light but are surprisingly disruptive. Beyond the visual nuisance, they can change execution timing, focus, and background app activity.

Enable Do not disturb manually, or at minimum turn notifications off during the benchmark.

Suppress search indexing and sync

If the benchmark target reads lots of files, writes lots of artifacts, or rebuilds source trees repeatedly, search indexing and cloud sync quietly sting.

Exclude the benchmark directory from search indexing
Pause OneDrive / Dropbox / Google Drive sync
Close browsers, Teams, Discord, Slack

Nothing flashy here, but when it matters, it matters a lot.

A Comparison That Does Not Align Thermals Is Mostly Comparing Thermals

A CPU or GPU is a different creature when cold versus warmed up. Laptops, thin mini PCs, and small desktops show this most clearly.

Rules to follow

Keep room temperature as consistent as possible
Fix how the laptop is positioned
Fix the AC adapter, dock, and external display configuration
Do no heavy work right before the benchmark
Measure the first run and the steady state separately
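One way to keep the two apart is to treat the first run (or first few runs) as cold and compute steady-state statistics only from the rest. A minimal Python sketch, assuming run times have already been collected in execution order; split_cold_warm and the sample numbers are illustrative, not real measurements.

```python
import statistics


def split_cold_warm(times_ms, warmup=1):
    """Split run times (in execution order) into cold and warm groups.

    The first `warmup` runs are reported individually as cold runs;
    the remaining runs are summarized as the steady state.
    """
    if len(times_ms) <= warmup:
        raise ValueError("need more runs than warm-up runs")
    cold = times_ms[:warmup]
    warm = times_ms[warmup:]
    return {
        "cold_ms": cold,
        "warm_median_ms": statistics.median(warm),
        "warm_count": len(warm),
    }


if __name__ == "__main__":
    runs = [182.0, 121.5, 119.8, 120.4, 122.1]  # illustrative numbers only
    print(split_cold_warm(runs))
```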

Alternate the execution order

Avoid running A 10 times and then B 10 times. The skew of thermals, caches, and background activity piles on.

Recommended patterns:

A B A B A B ...
A B B A A B B A ...
Pre-generate a random order and run in that order
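The three patterns are easy to generate up front and save alongside the results. A minimal Python sketch; the function names are hypothetical, and a fixed seed makes the randomized order reproducible so it can be recorded with the run.

```python
import random


def alternating(n_pairs):
    """A B A B ... with n_pairs pairs."""
    return ["A", "B"] * n_pairs


def abba(n_blocks):
    """A B B A repeated, so neither version always runs first within a block."""
    return ["A", "B", "B", "A"] * n_blocks


def randomized(n_each, seed):
    """Equal numbers of A and B in shuffled order, reproducible via `seed`."""
    order = ["A"] * n_each + ["B"] * n_each
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    print(" ".join(alternating(3)))  # A B A B A B
    print(" ".join(abba(2)))         # A B B A A B B A
    print(" ".join(randomized(5, seed=42)))
```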

What You Measure Changes What “Fast” Means

Squash “fast” into a single number and you are courting trouble. On Windows, there are three representative metrics to look at:

1. Wall-clock time

The time the user waits. It is closest to the end-to-end experience, so this is the first value to look at.

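On Windows, QueryPerformanceCounter (QPC) is the API for high-resolution interval timing. In .NET, System.Diagnostics.Stopwatch, which uses QPC underneath, is the standard choice. Timing with DateTime.Now is a poor fit: it reads the system clock, which can be adjusted mid-measurement, and its resolution has historically been coarse.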
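In Python, time.perf_counter() fills the same role; on Windows, CPython implements it with QueryPerformanceCounter, and time.get_clock_info("perf_counter") reports which clock is in use. A minimal sketch of timing a callable several times; time_wall_ms is a hypothetical helper and placeholder_workload stands in for the code under test.

```python
import time


def time_wall_ms(fn, repeats):
    """Call fn() `repeats` times and return each run's wall-clock time in ms."""
    results = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def placeholder_workload():
    # Stand-in for the code being compared.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    info = time.get_clock_info("perf_counter")
    print("clock:", info.implementation, "resolution:", info.resolution)
    print(time_wall_ms(placeholder_workload, 5))
```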

2. CPU time (user + kernel time)

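The time the process actually spent executing on a CPU, split into user and kernel time and obtainable via GetProcessTimes (which reports each as a FILETIME in 100-nanosecond units).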

This is useful for looking at computational efficiency. For example, if wall-clock improved but CPU time did not change, caches, I/O, wait time, or scheduling may be the active ingredient.
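The difference between the two is easy to demonstrate in-process. In Python, time.process_time() returns the current process's user + kernel CPU time (CPython uses GetProcessTimes for it on Windows). In the sketch below, a sleep stands in for I/O or waiting: it advances wall-clock time while consuming almost no CPU. This covers only the measuring process itself; timing a separate benchmark executable would require GetProcessTimes on that child's process handle, which is not shown here.

```python
import time


def measure(fn):
    """Return (wall_ms, cpu_ms) for a single call of fn()."""
    w0, c0 = time.perf_counter(), time.process_time()
    fn()
    w1, c1 = time.perf_counter(), time.process_time()
    return (w1 - w0) * 1000.0, (c1 - c0) * 1000.0


def waiting():
    time.sleep(0.2)  # wall-clock time passes while the CPU sits idle


def computing():
    sum(i * i for i in range(2_000_000))  # CPU-bound work


if __name__ == "__main__":
    for name, fn in (("waiting", waiting), ("computing", computing)):
        wall, cpu = measure(fn)
        print(f"{name:9s} wall={wall:8.1f} ms  cpu={cpu:8.1f} ms")
```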

3. Cycle count (CPU cycles)

QueryProcessCycleTime gives you the CPU cycle count for the whole process.

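Like CPU time, it measures work rather than waiting, so it shows a different face than wall-clock. It is particularly useful for asking “the wait time is the same, but did the computation itself get lighter?” Note that Microsoft’s documentation advises against converting these cycle counts into elapsed time.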
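QueryProcessCycleTime can be called from Python through ctypes. A Windows-only sketch, assuming the documented signature (a process handle and a pointer to a 64-bit counter); process_cycles is a hypothetical helper that returns None on other platforms. Take the difference between two readings around the workload.

```python
import ctypes
import sys


def process_cycles():
    """CPU cycles consumed so far by the current process (Windows only).

    Returns None on non-Windows platforms.
    """
    if sys.platform != "win32":
        return None
    from ctypes import wintypes  # imported here: only needed on Windows

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.QueryProcessCycleTime.argtypes = [
        wintypes.HANDLE, ctypes.POINTER(ctypes.c_ulonglong)]
    kernel32.QueryProcessCycleTime.restype = wintypes.BOOL

    cycles = ctypes.c_ulonglong(0)
    ok = kernel32.QueryProcessCycleTime(
        kernel32.GetCurrentProcess(), ctypes.byref(cycles))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return cycles.value


if __name__ == "__main__":
    before = process_cycles()
    sum(i * i for i in range(2_000_000))  # placeholder workload
    after = process_cycles()
    if before is not None:
        print("cycles used:", after - before)
```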

Priority, Affinity, and NUMA Are Last Resorts

These knobs do have an effect. But reaching for them from the start, just because they work, easily turns the benchmark into a measurement of a different situation from the one users actually run into.

First, measure normally

If a difference shows up in the default state, that difference itself has value. Throwing in /high or /affinity from the start imports “conditions that do not occur on real Windows.”

If you use them, be clear about the purpose

/high: you want fewer disturbances from other processes
/affinity: you want to pin CPU placement for the comparison
NUMA control: you want to align memory locality on large machines

The Windows start command can launch with a priority class and affinity mask.

start "" /high /wait myapp.exe --bench case1.json
start "" /affinity F /high /wait myapp.exe --bench case1.json
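In the /affinity argument, the mask is hexadecimal and bit n selects logical processor n, so F allows processors 0 through 3. A small Python sketch that builds the mask and the command line from a list of processor indices; affinity_mask and start_command are hypothetical helpers, and the command is only printed, not executed (start is a cmd.exe built-in, so a script would have to run it through cmd /c).

```python
def affinity_mask(cpus):
    """Hex mask for `start /affinity`: bit n set means logical processor n."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("processor index must be non-negative")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("select at least one processor")
    return format(mask, "X")


def start_command(exe, args, cpus=None, priority="high"):
    """Build a cmd.exe `start` command line (returned as a string, not run)."""
    parts = ['start ""', f"/{priority}"]
    if cpus is not None:
        parts.append(f"/affinity {affinity_mask(cpus)}")
    parts.append("/wait")
    parts.append(exe)
    parts.extend(args)
    return " ".join(parts)


if __name__ == "__main__":
    print(affinity_mask([0, 1, 2, 3]))  # F
    print(affinity_mask([2, 3]))        # C
    print(start_command("myapp.exe", ["--bench", "case1.json"], cpus=[0, 1, 2, 3]))
```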

But skip /realtime

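/realtime is available, but you should not use it. Rather than removing noise, it tends to create new problems: a realtime-priority process preempts even operating system threads, and Microsoft warns this can, for example, keep disk caches from flushing or make the mouse unresponsive.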

A Recommended Measurement Procedure

Putting it all together, here is a procedure that is easy to run in practice.

Lab-leaning comparison procedure

Fix the comparison targets
- commit hash / build number
- compiler / runtime version
- Debug / Release
- logging, asserts, tracing on/off
Fix the machine conditions
- Windows build
- BIOS / UEFI version
- driver version
- AC power
- room temperature, physical placement
Fix the power conditions
- Decide the power mode
- Record the active power plan
Reboot
Wait a few minutes before benchmarking
Clean boot if necessary
Include a warm-up
Alternate A / B runs
Get enough repetitions
Keep median, min, max, p95
Save the raw data
If the difference is small, capture ETW / WPR

Items Worth Recording That Save You Later

In the benchmark CSV or JSON, keeping at least the following pays off.

timestamp,version,scenario,elapsed_ms,user_ms,kernel_ms,cycles,power_mode,power_plan,ac_or_dc,room_temp_c,notes

If possible, these are handy as well.

cpu_package_temp_start_c,cpu_package_temp_end_c,affinity_mask,priority_class,windows_build,driver_version

With benchmarks, being interpretable later often matters more than the measuring itself.
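A minimal Python sketch of appending one row per run to such a CSV, using only the standard library. The field list mirrors the header above; append_result is a hypothetical helper name, and values such as the power mode are whatever you recorded by hand or by script.

```python
import csv
import os

FIELDS = [
    "timestamp", "version", "scenario", "elapsed_ms", "user_ms", "kernel_ms",
    "cycles", "power_mode", "power_plan", "ac_or_dc", "room_temp_c", "notes",
]


def append_result(path, row):
    """Append one result row, writing the header if the file is new or empty."""
    missing = set(FIELDS) - set(row)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        # DictWriter raises ValueError on keys not listed in FIELDS.
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```

Writing one row immediately after each run, rather than collecting everything in memory, means an interrupted session still leaves the earlier rows on disk.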

Look at the Median and the Distribution, Not Just the Mean

The mean is convenient, but it breaks easily in Windows benchmarks. Defender kicking in just once, a notification popping, another process hammering the SSD — any of these can drag the mean away.

The recommended combination:

Median: look at this first
p95 / p99: check whether the tail has gotten worse
min / max: see how things stray
Box plots or scatter plots: useful when the difference is small
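A minimal sketch of these summaries with Python's standard statistics module. The sample data is illustrative: four ordinary runs plus one outlier standing in for, say, a Defender scan. quantiles(..., method="inclusive") interpolates within the observed values, so with only a handful of runs p95 and p99 are rough; they become more meaningful as the number of runs grows.

```python
import statistics


def summarize(times_ms):
    """Median, mean, tail percentiles, and range for a list of run times."""
    if len(times_ms) < 2:
        raise ValueError("need at least two runs")
    # "inclusive" never extrapolates beyond the observed min/max.
    pct = statistics.quantiles(times_ms, n=100, method="inclusive")
    return {
        "n": len(times_ms),
        "median": statistics.median(times_ms),
        "mean": statistics.fmean(times_ms),
        "p95": pct[94],
        "p99": pct[98],
        "min": min(times_ms),
        "max": max(times_ms),
    }


if __name__ == "__main__":
    # One slow outlier drags the mean far more than the median.
    runs = [10.0, 11.0, 12.0, 13.0, 100.0]
    s = summarize(runs)
    print(s["median"], s["mean"])  # 12.0 29.2
```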

How to Read a Difference When You See One

Interpreting results is easiest when you look at combinations.

Only wall-clock is faster

Possibly improvements in I/O, wait time, caches, or scheduling.

CPU time and cycles both dropped

There is a good chance the implementation itself got lighter.

Only the first run is slow / fast

That is the cold / warm difference. Suspect startup, initialization, cache generation, JIT.

Gets slower the more runs you do

Suspect thermals, throttling, memory pressure, background activity.

Dig Down to “Why It Is Faster” with ETW / WPR

When the difference is small, or the reason is unreadable, moving on to Windows’s ETW (Event Tracing for Windows) tooling is the classic route.

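Microsoft’s Windows Performance Recorder (WPR) is an ETW-based recording tool that is part of the Windows Performance Toolkit in the Windows ADK; recent versions of Windows also ship wpr.exe in-box. It can capture CPU, I/O, context switches, page faults, and more in one go, and the resulting .etl file is analyzed with Windows Performance Analyzer (WPA). Recording requires an elevated command prompt.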

At a minimum, it looks like this.

wpr -start CPU -filemode

REM Run the benchmark here

wpr -stop trace.etl

Once you reach this stage, instead of “B is 3% faster,” you can speak with reasons: “B has less lock contention and lower ready time.” “A opens more files and has a slower cold start.”
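If the benchmark is already driven by a script, the same start/stop pair can wrap the run. A sketch in Python, assuming wpr.exe is on PATH and the script runs elevated; wpr_trace is a hypothetical helper, and myapp.exe / case1.json are the placeholder names used earlier in this article. The run parameter exists only so the command sequence can be checked without WPR installed.

```python
import subprocess
import sys
from contextlib import contextmanager


@contextmanager
def wpr_trace(etl_path, profile="CPU", run=subprocess.run):
    """Record an ETW trace with WPR around the enclosed block.

    Requires Windows and an elevated (administrator) process.
    """
    run(["wpr", "-start", profile, "-filemode"], check=True)
    try:
        yield
    finally:
        # Stop even if the workload fails, so a failed run still leaves a trace.
        run(["wpr", "-stop", etl_path], check=True)


if __name__ == "__main__" and sys.platform == "win32":
    with wpr_trace("trace.etl"):
        # Placeholder benchmark invocation.
        subprocess.run(["myapp.exe", "--bench", "case1.json"], check=True)
```

Stopping in finally is a deliberate choice: the run that crashes or stalls is often exactly the one whose trace you want to look at.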

Summary

When comparing different versions of a program on Windows, what really works is not flashy tricks. What matters is the unglamorous discipline that pays off in reproducibility:

Pin and record AC / power mode / power plan
Separate cold and warm
Alternate A / B runs
Look at the median and the distribution
Clean boot if necessary
If the difference is small, dig to the reason with ETW / WPR

And most important of all: write down, alongside the results, what you pinned and what you did not. A benchmark is a comparison of speed, and at the same time a record of experimental conditions.

A speedup report without conditions is like fortune-telling that occasionally comes true: entertaining, perhaps, but too unreliable to reproduce. Conversely, if the conditions are properly written down, the result has real value even when the difference is small.

Author Profile

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
