How to Correctly Compare the Speed of Different Program Versions on Windows
· Go Komura · Windows, Benchmark, Performance, Profiling, Power Management
You want to compare version A and version B of a program on Windows. The single worst thing you can do is run each once on the same machine and declare “B seems about 8% faster.”
That 8% might genuinely be the code difference. But just as often it turns out to be power mode, power plan, thermals, background updates, search indexing, virus scans, affinity, execution order, or cache state: the classic Windows benchmarking story. It is quite a muddy world.
This article summarizes how to compare the execution speed of different versions of a program on Windows in a form as close to the code difference as possible.
The main target is Windows 11, but most of it — powercfg, start, and so on — works the same on Windows 10.
The Conclusion First
The tricks for improving reproducibility boil down to these six.
- Decide first “what you want to compare”: Whether you want to see the code difference or the real user experience changes which environment factors you should align.
- Record power mode and power plan as separate things: Handle this sloppily on Windows, and your comparison tends to become a comparison of the OS’s power-saving policies.
- Separate the cold first run from the warmed-up steady state: “Only the first run is fast” or “only the later runs are slow” is not unusual.
- Alternate runs, A→B→A→B: Run all of A first and then all of B, and you eat the skew of thermals and background state.
- Look at the median and the spread, not just the mean: One outlier wrecks the whole picture. The mean is more fragile than you think.
- If the difference is small, dig down to the cause with ETW / WPR: Argue from gut feel, and you mostly end up brawling in the fog.
Decide First What You Want to Compare
“Speed comparison” sounds like one thing, but there are actually two kinds.
1. A comparison to see the code difference
You want to know whether the implementation itself got faster due to an algorithm change, data structure change, compiler optimization, runtime update, and so on.
In this case, cut environmental noise as much as possible. Use a dedicated benchmarking session, fix the power mode, turn notifications off, suppress search indexing and sync, and if necessary go as far as a clean boot.
2. A comparison to see the real user experience
You want to know the speed users will actually feel on their everyday Windows after release.
In this case, you must not erase all the noise that exists in reality. Comparing in a “plausible everyday environment” — including OneDrive sync, Defender, notifications, and normal power settings — gives results closer to reality.
Mix these two, and your conclusions get twisted. Things like “12% faster in the lab but within noise in the real world” or “faster in the real world but unchanged in CPU time” happen routinely.
The Main Causes of Variance on Windows
First, a rough inventory of what makes results wobble.
| Layer | Variance factor | Typical example |
|---|---|---|
| Hardware | CPU / GPU, memory, SSD, cooling | Thinness of a laptop, presence of a cooling pad |
| Firmware | BIOS / UEFI, OEM controls | Power-saving policies, fan control |
| OS | Windows build, drivers, update state | The same PC behaves differently after an update |
| Power | AC / DC, power mode, power plan | On battery, it is a different world |
| Thermals | Room temperature, fans, prior load | Turbo on the first run only, fading later |
| Background | Update, Defender, sync, notifications | A scan or sync runs mid-execution |
| Scheduling | Priority, affinity, NUMA | CPU placement varies by machine |
| Data / cache | OS cache, app cache | Slow only the first time, fast only from the second run |
| Build conditions | Debug / Release, PGO, logging on/off | You are comparing different things to begin with |
In short: even “the same Windows machine” is a different experiment if the conditions are not aligned.
Treat Power Mode and Power Plan Separately
This part matters a lot.
Windows has the Power mode in the Settings app and the traditional Power plan (the power schemes visible via powercfg).
They look similar and tend to get lumped together, but handle them sloppily and the comparison turns to mush.
In the Windows Settings app, you can choose the Power mode from Settings > System > Power & battery.
Microsoft’s documentation states you can switch between Best power efficiency, Balanced, and Best performance separately for Plugged in / On Battery. Furthermore, changing the Power mode also affects the underlying power-related settings and PPM (Processor Power Management) behavior. In other words, this alone can change core parking and performance scaling policy.
The Power plan, on the other hand, is the traditional power scheme: Balanced, High performance, and so on.
You can check it with powercfg /list and powercfg /getactivescheme.
The confusing part is that Windows has both the power mode overlay and the power plan. So record at least the following with your benchmark results:
- AC or battery
- Which power mode
- Which active power plan
Benchmark results missing these three are quite painful to look at later.
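As a minimal sketch of recording the active power plan automatically, the Python snippet below parses the output of `powercfg /getactivescheme`. It assumes the output contains the scheme GUID followed by the plan name in parentheses, which is what English Windows typically prints; the function names are hypothetical. The power mode and AC / battery state are not covered here and would still need to be noted separately.

```python
import re
import subprocess

# Assumed output shape (label text may be localized):
#   Power Scheme GUID: 381b4222-f694-41f0-9685-ff5bb260df2e  (Balanced)
# Only the GUID and the parenthesized name are parsed, not the label.
_SCHEME_RE = re.compile(
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})\s*\((.*?)\)"
)


def parse_active_scheme(output: str) -> dict:
    """Extract the GUID and display name from powercfg output."""
    match = _SCHEME_RE.search(output)
    if match is None:
        raise ValueError(f"unrecognized powercfg output: {output!r}")
    return {"guid": match.group(1).lower(), "name": match.group(2).strip()}


def read_active_scheme() -> dict:
    """Windows only: run powercfg and parse what it prints."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    )
    return parse_active_scheme(result.stdout)
```

Calling `read_active_scheme()` once per benchmark session and storing the result next to the timings is usually enough; the parser is kept separate so it can be checked without a Windows machine.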
Power conditions to pin down first
- Always compare laptops on AC power: Battery operation easily introduces unintended limits.
- Pin the power mode: For benchmarking, try Best performance first.
- Record the active power plan: Save the current value with powercfg.
powercfg /list
powercfg /getactivescheme
- Switch to High performance if needed
# Balanced
powercfg /setactive 381b4222-f694-41f0-9685-ff5bb260df2e
# High performance
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c
“High performance does not show up” is completely normal
This is another stumbling point. Microsoft’s documentation states that on devices supporting Modern Standby, only Balanced, or plans derived from Balanced, are allowed. So instead of “High performance is missing, is it broken?”, the answer may simply be that this is how the machine is designed.
Microsoft also advises that if the Power mode cannot be changed, a custom power plan may be selected, so try selecting Balanced first. When the Power mode UI is unresponsive, this is the quickest thing to suspect.
Kill the Background Noise
Windows is a hard worker. Even when you want a quiet benchmark, it does all sorts of things in the background for you.
First, reboot and wait for things to settle
After changing settings, reboot once, and do not run immediately after login — wait a few minutes. Right after startup, updates, indexing, sync, Defender, and assorted residents are still thrashing around.
For serious comparisons, use a clean boot
Microsoft documents a procedure for reducing to a minimal startup configuration via clean boot: stop non-Microsoft services in msconfig and disable Startup apps in Task Manager.
This is powerful for reducing noise. However, it diverges from the everyday environment, so it is suited to “lab comparisons aimed at seeing the code difference.”
Silence notifications
Windows notification banners look light but are surprisingly disruptive. Beyond the visual nuisance, they can change execution timing, focus, and background app activity.
Enable Do not disturb manually, or at minimum turn notifications off during the benchmark.
Suppress search indexing and sync
If the benchmark target reads lots of files, writes lots of artifacts, or rebuilds source trees repeatedly, search indexing and cloud sync quietly sting.
- Exclude the benchmark directory from search indexing
- Pause OneDrive / Dropbox / Google Drive sync
- Close browsers, Teams, Discord, Slack
Nothing flashy here, but when it matters, it matters a lot.
A Comparison That Does Not Align Thermals Is Mostly Comparing Thermals
A CPU or GPU is a different creature when cold versus warmed up. Laptops, thin mini PCs, and small desktops show this most clearly.
Rules to follow
- Keep room temperature as consistent as possible
- Fix how the laptop is positioned
- Fix the AC adapter, dock, and external display configuration
- Do no heavy work right before the benchmark
- Measure the first run and the steady state separately
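The last rule can be sketched in a few lines of Python. This is only an illustration, assuming you already have a list of per-run timings in execution order; the function name and the `warmup_runs` parameter are hypothetical.

```python
from statistics import median


def split_cold_warm(samples_ms, warmup_runs=1):
    """Report the first run(s) separately from the steady-state runs."""
    if len(samples_ms) <= warmup_runs:
        raise ValueError("need more samples than warm-up runs")
    cold = list(samples_ms[:warmup_runs])   # cold start, reported as-is
    warm = list(samples_ms[warmup_runs:])   # steady state, summarized
    return {
        "cold_ms": cold,
        "warm_median_ms": median(warm),
        "warm_runs": len(warm),
    }
```

Keeping the cold value as its own number, instead of averaging it into the rest, makes a slow first run visible rather than hiding it in the mean.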
Alternate the execution order
Avoid running A 10 times and then B 10 times. The skew of thermals, caches, and background activity piles on.
Recommended patterns:
- A B A B A B ...
- A B B A A B B A ...
- Pre-generate a random order and run in that order
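The three patterns can be generated up front. The following Python sketch is one way to do it; the function names are hypothetical, and the fixed seed is an assumption added so that the random order can be written down and reproduced.

```python
import random


def alternating(rounds):
    """A B A B ... : one A and one B per round."""
    return ["A", "B"] * rounds


def mirrored(rounds):
    """A B B A A B B A ... : flip the pair order every other round."""
    order = []
    for index in range(rounds):
        order += ["A", "B"] if index % 2 == 0 else ["B", "A"]
    return order


def shuffled(rounds, seed):
    """Random order with equal counts of A and B, reproducible via seed."""
    order = ["A", "B"] * rounds
    random.Random(seed).shuffle(order)
    return order
```

Whichever pattern you choose, saving the generated order with the results lets you check later whether a slow run happened to follow a particular neighbor.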
What You Measure Changes What “Fast” Means
Squash “fast” into a single number and you mostly have an accident. The three representative metrics to look at on Windows:
1. Wall-clock time
The time the user waits. It is closest to the end-to-end experience, so this is the first value to look at.
On Windows, QueryPerformanceCounter (QPC) is available for high-resolution timing.
In managed code, the Stopwatch family is the standard.
Eyeballing milliseconds with DateTime.Now is, frankly, a bit defenseless.
2. CPU time (user + kernel time)
The time the process actually used the CPU, obtainable via GetProcessTimes.
This is useful for looking at computational efficiency. For example, if wall-clock improved but CPU time did not change, caches, I/O, wait time, or scheduling may be the active ingredient.
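To make the gap between wall-clock and CPU time concrete, here is a small Python sketch. To my understanding, CPython on Windows implements `time.perf_counter` on top of QueryPerformanceCounter and `time.process_time` on top of GetProcessTimes, but treat that as an assumption about the runtime. Note that it measures the current process, not a child process, and the two workloads are made up for illustration.

```python
import time


def measure(workload):
    """Return wall-clock and CPU time (user + kernel) for one call."""
    wall_start = time.perf_counter_ns()
    cpu_start = time.process_time_ns()
    workload()
    cpu_ns = time.process_time_ns() - cpu_start
    wall_ns = time.perf_counter_ns() - wall_start
    return {"elapsed_ms": wall_ns / 1e6, "cpu_ms": cpu_ns / 1e6}


def waiting():
    # Mostly wait time: wall-clock grows, CPU time barely moves.
    time.sleep(0.05)


def computing():
    # Mostly computation: wall-clock and CPU time grow together.
    sum(i * i for i in range(200_000))
```

Running `measure(waiting)` and `measure(computing)` side by side shows why a wall-clock improvement with flat CPU time points toward waiting rather than computation.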
3. Cycle count (CPU cycles)
QueryProcessCycleTime gives you the CPU cycle count for the whole process.
This is also a CPU-work metric, but it shows a different face than wall-clock. It is particularly useful for asking “the wait time is the same, but did the computation itself get lighter?”
Priority, Affinity, and NUMA Are Last Resorts
These can have an effect. But reaching for them from the start, just because they have an effect, easily turns the measurement into a different phenomenon from the one you meant to observe.
First, measure normally
If a difference shows up in the default state, that difference itself has value.
Throwing in /high or /affinity from the start imports “conditions that do not occur on real Windows.”
If you use them, be clear about the purpose
- /high: you want fewer disturbances from other processes
- /affinity: you want to pin CPU placement for the comparison
- NUMA control: you want to align memory locality on large machines
The Windows start command can launch with a priority class and affinity mask.
start "" /high /wait myapp.exe --bench case1.json
start "" /affinity F /high /wait myapp.exe --bench case1.json
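The value after /affinity is a hexadecimal bit mask, so F means logical processors 0 to 3. The Python sketch below builds that mask and the corresponding command line; the helper names are hypothetical, and it assumes a machine with a single processor group (64 logical processors or fewer).

```python
def affinity_mask(cores):
    """Hex mask in which bit n selects logical processor n."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return format(mask, "X")


def start_command(exe_args, cores=None, high=False):
    """Build a cmd.exe 'start' line like the ones shown above."""
    parts = ["start", '""']
    if cores:
        parts += ["/affinity", affinity_mask(cores)]
    if high:
        parts.append("/high")
    parts.append("/wait")
    return " ".join(parts + list(exe_args))
```

Generating the mask from an explicit core list, and logging that list, is less error-prone than typing hex by hand and remembering later what it meant.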
But skip /realtime
/realtime is available, but you should not use it.
It tends to work less as noise removal and more as a generator of new accidents.
A Recommended Measurement Procedure
Putting it all together, here is a procedure that is easy to run in practice.
Lab-leaning comparison procedure
- Fix the comparison targets
  - commit hash / build number
  - compiler / runtime version
  - Debug / Release
  - logging, asserts, tracing on/off
- Fix the machine conditions
  - Windows build
  - BIOS / UEFI version
  - driver version
  - AC power
  - room temperature, physical placement
- Fix the power conditions
  - Decide the power mode
  - Record the active power plan
- Reboot
- Wait a few minutes before benchmarking
- Clean boot if necessary
- Include a warm-up
- Alternate A / B runs
- Get enough repetitions
- Keep median, min, max, p95
- Save the raw data
- If the difference is small, capture ETW / WPR
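The warm-up, alternation, and raw-data steps of this procedure could be sketched as a small Python harness like the one below. It is an illustration under assumptions: the function names are hypothetical, the placeholder commands just start the Python interpreter, and the measured time includes process startup, so it only makes sense for end-to-end comparisons.

```python
import subprocess
import sys
import time

# Placeholder commands; replace with the real version A / version B command lines.
EXAMPLE_COMMANDS = {
    "A": [sys.executable, "-c", "pass"],
    "B": [sys.executable, "-c", "pass"],
}


def run_once(command):
    """Run one command to completion and return wall-clock milliseconds."""
    start = time.perf_counter_ns()
    subprocess.run(command, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter_ns() - start) / 1e6


def run_interleaved(commands, rounds, warmup=1):
    """Warm up every version, then alternate versions and keep raw rows."""
    for _ in range(warmup):
        for command in commands.values():
            run_once(command)  # warm-up results are discarded
    rows = []
    for round_index in range(rounds):
        names = list(commands)
        if round_index % 2 == 1:
            names.reverse()  # A B, then B A, so neither always goes first
        for name in names:
            rows.append({
                "round": round_index,
                "version": name,
                "elapsed_ms": run_once(commands[name]),
            })
    return rows
```

The harness returns every individual row rather than a summary, so medians and percentiles can be recomputed later without rerunning anything.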
Items Worth Recording That Save You Later
In the benchmark CSV or JSON, keeping at least the following pays off.
timestamp,version,scenario,elapsed_ms,user_ms,kernel_ms,cycles,power_mode,power_plan,ac_or_dc,room_temp_c,notes
If possible, these are handy as well.
cpu_package_temp_start_c,cpu_package_temp_end_c,affinity_mask,priority_class,windows_build,driver_version
With benchmarks, being interpretable later often matters more than the measuring itself.
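As one way to keep that record consistent, the Python sketch below writes rows using the first column list above as a fixed header. The function name is hypothetical and the sample row contains placeholder values, not measured results.

```python
import csv
import io

FIELDS = [
    "timestamp", "version", "scenario", "elapsed_ms", "user_ms", "kernel_ms",
    "cycles", "power_mode", "power_plan", "ac_or_dc", "room_temp_c", "notes",
]


def write_rows(stream, rows):
    """Write benchmark rows; unknown keys raise, missing keys stay empty."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Placeholder row for illustration only.
buffer = io.StringIO()
write_rows(buffer, [{
    "timestamp": "2025-01-01T12:00:00", "version": "A", "scenario": "case1",
    "elapsed_ms": 123.4, "power_mode": "Best performance",
    "power_plan": "Balanced", "ac_or_dc": "AC",
}])
```

Because `DictWriter` rejects keys that are not in the header, a typo in a column name fails loudly instead of silently producing a ragged file.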
Look at the Median and the Distribution, Not Just the Mean
The mean is convenient, but it breaks easily in Windows benchmarks. Defender kicking in just once, a notification popping, another process hammering the SSD — any of these can drag the mean away.
The recommended combination:
- Median: look at this first
- p95 / p99: check whether the tail has gotten worse
- min / max: see how things stray
- Box plots or scatter plots: useful when the difference is small
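A short Python sketch shows how one outlier moves the mean while the median stays put. The function names are hypothetical, and the percentile uses the nearest-rank definition, which is an assumption; other tools may interpolate and give slightly different values.

```python
import math
from statistics import mean, median


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms):
    """Summary values worth keeping next to the raw data."""
    return {
        "n": len(samples_ms),
        "mean": mean(samples_ms),
        "median": median(samples_ms),
        "p95": percentile(samples_ms, 95),
        "min": min(samples_ms),
        "max": max(samples_ms),
    }


# Nineteen runs at 100 ms and one run disturbed by a background task.
summary = summarize([100.0] * 19 + [1000.0])
```

In this made-up series the mean lands at 145 ms while the median stays at 100 ms, and the disturbed run shows up only in `max`, which is exactly why the spread should be reported alongside the center.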
How to Read a Difference When You See One
Interpreting results is easiest when you look at combinations.
Only wall-clock is faster
Possibly improvements in I/O, wait time, caches, or scheduling.
CPU time and cycles both dropped
There is a good chance the implementation itself got lighter.
Only the first run is slow / fast
That is the cold / warm difference. Suspect startup, initialization, cache generation, JIT.
Gets slower the more runs you do
Suspect thermals, throttling, memory pressure, background activity.
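The first two combinations can be written as a rough rule of thumb. The Python sketch below is only a reading aid: the function name, the relative-change inputs, and the 3% threshold are all assumptions chosen for illustration, and it gives hints to investigate, not conclusions.

```python
def interpret(wall_change, cpu_change, cycles_change, threshold=0.03):
    """Each change is (B - A) / A on medians; negative means B is lower."""
    faster_wall = wall_change <= -threshold
    lower_cpu = cpu_change <= -threshold
    lower_cycles = cycles_change <= -threshold
    if lower_cpu and lower_cycles:
        return "implementation likely lighter"
    if faster_wall and not lower_cpu and not lower_cycles:
        return "look at I/O, wait time, caches, scheduling"
    return "no clear pattern; collect more runs or capture a trace"
```

For example, `interpret(-0.10, 0.0, 0.0)` points at waiting rather than computation, while `interpret(-0.10, -0.09, -0.11)` suggests the code itself is doing less work.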
Dig Down to “Why It Is Faster” with ETW / WPR
When the difference is small, or the reason is unreadable, moving on to Windows’s ETW (Event Tracing for Windows) tooling is the classic route.
Microsoft’s Windows Performance Recorder (WPR) is an ETW-based recording tool included in the Windows ADK.
It can capture CPU, I/O, context switches, page faults, and more in one go.
At a minimum, it looks like this.
wpr -start CPU -filemode
REM Run the benchmark here
wpr -stop trace.etl
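If the benchmark is driven from a script, the same two commands can wrap the run. The Python sketch below is a hypothetical helper, assuming `wpr` is on the PATH and the script runs from an elevated prompt; the `runner` parameter exists only so the command sequence can be checked without actually starting a trace.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def wpr_trace(etl_path, profile="CPU", runner=subprocess.run):
    """Start a WPR recording, yield to the benchmark, then stop and save."""
    runner(["wpr", "-start", profile, "-filemode"], check=True)
    try:
        yield
    finally:
        # Stop even if the benchmark raised, so no recording is left running.
        runner(["wpr", "-stop", etl_path], check=True)
```

Usage would look like `with wpr_trace("trace.etl"): run_benchmark()`, where `run_benchmark` stands for whatever launches the measured program.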
Once you reach this stage, instead of “B is 3% faster,” you can speak with reasons: “B has less lock contention and lower ready time.” “A opens more files and has a slower cold start.”
Summary
When comparing different versions of a program on Windows, what really works is not flashy tricks. What matters is the unglamorous discipline that pays off in reproducibility:
- Pin and record AC / power mode / power plan
- Separate cold and warm
- Alternate A / B runs
- Look at the median and the distribution
- Clean boot if necessary
- If the difference is small, dig to the reason with ETW / WPR
And most important of all: write down, alongside the results, what you pinned and what you did not. A benchmark is a comparison of speed, and at the same time a record of experimental conditions.
A speedup report without conditions is about as entertaining as fortune-telling that occasionally hits — but in terms of reproducibility, it is quite unreliable. Conversely, if the conditions are properly written down, the result has real value even when the difference is small.
References
- Microsoft Support: Change the power mode for your Windows PC
- Microsoft Learn: Power Policy Settings
- Microsoft Learn: Customize the Windows performance power slider
- Microsoft Learn: Powercfg command-line options
- Microsoft Support: How to perform a clean boot in Windows
- Microsoft Support: Notifications and Do Not Disturb in Windows
- Microsoft Support: Search indexing in Windows
- Microsoft Learn: Configure custom exclusions for Microsoft Defender Antivirus
- Microsoft Support: Device Security in the Windows Security App
- Microsoft Learn: QueryPerformanceCounter function
- Microsoft Learn: Acquiring high-resolution time stamps
- Microsoft Learn: GetProcessTimes function
- Microsoft Learn: QueryProcessCycleTime function
- Microsoft Learn: start command
- Microsoft Learn: SetPriorityClass function
- Microsoft Learn: SetProcessAffinityMask function
- Microsoft Learn: Processor Groups
- Microsoft Learn: Windows Performance Recorder
- Microsoft Learn: WPR Command-Line Options
Author Profile
Profile page for the article author.
Go Komura
Representative of KomuraSoft LLC
Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.