How to Fairly Compare the Execution Speed of C#, C++, Java, and Go
Go Komura · Benchmark, Performance, C#, C++, Java, Go
“C++ is supposed to be fast.” “Go is lightweight in production.” “Java gets really fast on long-running workloads.” “C# is surprisingly strong too, thanks to the .NET JIT.”
You hear claims like these all the time. But the single worst thing you can do here is take numbers measured by different people in different environments, line them up, and declare a winner among languages.
C# and Java are heavily affected by JIT and warm-up, while C++ and Go are normally compiled ahead of time. The presence and characteristics of GC differ too. Differences in standard library and ecosystem library implementations matter quite a lot. And even on the same machine, results easily wobble due to power settings, heat, background activity, and skew in the input data. It is a rather messy world.
In this article, we lay out how to measure C# / C++ / Java / Go as fairly as possible. To give away the conclusion up front: the most important thing is not trying to decide “which language is fastest” with a single number.
The main subject of this article is, strictly, how to structure the comparison. Lining up environment-dependent numbers that merely look plausible turns into fortune-telling, so we will not publish a measured ranking here. Instead, we focus on how to design the comparison so it actually has value.
The Conclusion First
In a C# / C++ / Java / Go speed comparison, these seven things are what really matter.
- Decide first what kind of speed you want to compare. Whether it is startup time, steady-state throughput, p95 latency, or memory efficiency changes how you measure.
- Never draw conclusions from a single benchmark. CPU computation, memory allocation, parallel processing, and startup time each make different languages and runtimes look strong.
- Separate cold and warm for C# and Java. Mixing comparisons that include the first run with steady-state comparisons after warm-up twists the whole discussion.
- Measure with the same algorithm, the same input, and the same correctness check. “It wasn’t a faster implementation, it was just solving a different problem” is a classic benchmark failure.
- Separate per-language microbenchmarks from cross-language end-to-end benchmarks. Each language’s dedicated harness is convenient, but cross-language comparisons are better run by a common external runner.
- Look at the median and the distribution, not just the mean. A single GC pause or background task hitting one run is enough to wreck the average.
- Record the conditions, not just the numbers. A benchmark result is a record of the experimental conditions just as much as a record of speed. Results without documented conditions become quite painful later.
What to Decide First
If you let “fast” be a single word, things usually go wrong. Start by deciding what you will call fast.
Even for the same program, what you want to look at can differ considerably.
1. Do you want to look at startup time?
For CLI tools, short-lived batch jobs, and helper tools that start once and exit immediately, cold start and process startup matter. On this axis, results change dramatically depending on whether JIT and class-loading initialization costs are included.
2. Do you want to look at long-running throughput?
For servers, resident processes, workers, and long-running conversion jobs, steady-state throughput is what counts. In that case, being slow only on the first run is not the point; the question is how stable and how high it gets after warm-up.
3. Do you want to look at tail latency?
For APIs, UI, and near-real-time processing, p95 / p99 can matter more than the mean. Even if the average is fast, occasional long stalls hurt from a user experience and SLA perspective.
4. Do you want to include memory efficiency?
If you only look at CPU time and ignore peak RSS, allocation volume, GC count, and GC pauses, you will misjudge the real operational weight. “Fast but eats a lot of memory” versus “a bit slower but consistently lightweight” can flip in ranking depending on the use case.
In short, the question you should settle first is this:
What this comparison should answer is not which language is fast, but which workload, under which conditions, on which metric, can be processed faster.
If you start collecting numbers while this is still vague, nothing will hold together at the end.
Why Comparing Languages Is Hard
Mixing JIT and AOT turns it into a different experiment
C# and Java are normally affected by JIT. C++ and Go, on the other hand, are normally compiled ahead of time.
That means if you measure the first run, you are measuring not only the speed of the program itself but also runtime startup, class loading, and JIT preparation. Conversely, if you only look at fully warmed-up runs, the comparison becomes how far steady-state optimization can go.
Both are meaningful. But they do not mean the same thing.
Implementation differences routinely outweigh language differences
Even for the same “sort”,
- one side uses the standard library
- one side is hand-rolled
- one side does extra copies
- one side regenerates the input every time
That alone changes results considerably.
Moreover, once you get into JSON, compression, cryptography, or regular expressions, library implementation differences matter far more than the language itself. Unless you make explicit what you are measuring, what you intended as a “language comparison” becomes a “library comparison”.
C++ has a trap where optimization deletes the work
Especially in microbenchmarks, when the compiler decides “nobody is using this result”, it can eliminate the computation entirely. Then it is not that the code is fast — it is that it is doing nothing at all, which makes for a small horror story.
This problem tends to show up especially blatantly in C++, so consuming the result, printing a checksum, or using the benchmark framework’s optimization-suppression facilities is quite important.
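The same idea of consuming the result can be sketched in Go, where a similar risk exists for inlined, side-effect-free work. This is the Go analogue of the advice above, not the C++ mechanism itself (in Google Benchmark that role is played by `benchmark::DoNotOptimize`). The workload `sumSquares` and the variable name `sink` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable. Storing the result here makes the
// computation observable, so the compiler cannot treat it as unused.
var sink uint64

// sumSquares is a stand-in workload (hypothetical, for illustration only).
func sumSquares(n uint64) uint64 {
	var total uint64
	for i := uint64(1); i <= n; i++ {
		total += i * i
	}
	return total
}

func main() {
	// testing.Benchmark runs the function with increasing b.N outside "go test".
	res := testing.Benchmark(func(b *testing.B) {
		var local uint64
		for i := 0; i < b.N; i++ {
			local += sumSquares(1000)
		}
		sink = local // consume the result once, outside the hot loop
	})
	fmt.Printf("%d iterations, %d ns/op, sink=%d\n", res.N, res.NsPerOp(), sink)
}
```

Printing the sink (or a checksum derived from it) also doubles as a cheap sanity check that the loop did real work.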
GC is neither an “advantage” nor a “disadvantage” — it is a characteristic
C#, Java, and Go have a GC. Reducing this to “it has GC, therefore it is slow” is far too crude.
In practice, what matters more is
- how large numbers of short-lived objects are handled
- heap size configuration
- GC frequency and pauses
- object layout
- the allocation habits of libraries
Conversely, C++ allows fine-grained control via manual management and RAII, but that means design and implementation differences show up more easily. In other words, a difference in memory management strategy is not, by itself, a verdict of good or bad.
What Not to Do in a Comparison
1. Mixing Debug and Release
This is out of the question. Always align the comparison targets on production-grade optimized builds.
2. Not solving the same problem
Different input formats, different output, error handling present on only one side, different memory reuse policies. Leave these unaddressed and you end up measuring requirement differences, not speed.
3. Running once and drawing a conclusion
A single run is mostly noise.
- JIT
- page cache
- CPU boost
- heat
- background tasks
- GC
- first-time file reads
All of these mix together in a single run.
4. Blurring warm-up
When measuring C# and Java, if you are vague about whether the first run is included or only post-warm-up runs count, the discussion collapses. Treat cold and warm as separate things.
5. Skipping correctness checks
Before “fast”, a benchmark needs “returns the same result”. Always verify that every implementation under comparison produces the same checksum or the same output from the same input.
6. Building a worldview from a single microbenchmark
Winning a tight loop does not mean winning across a real service. Conversely, losing on startup time can still mean being plenty strong on long-running workloads.
The Basic Approach When Comparing C# / C++ / Java / Go
This part is quite important. Our recommendation is a two-layer structure.
1. For per-language measurement, use the harness suited to that language
Each language has benchmark tools that absorb that language’s particular quirks.
- C#: BenchmarkDotNet
- Java: JMH
- Go: go test -bench and benchstat
- C++: Google Benchmark
These take care of each runtime’s quirks, the statistical processing, and the common measurement traps to a reasonable degree. They are quite effective for comparisons within a language and for drilling into an implementation.
2. For cross-language comparison, put a common runner on the outside
On the other hand, placing C# BenchmarkDotNet results next to Java JMH results as-is is a bit dangerous. The harnesses themselves follow different conventions.
So for cross-language work, we recommend turning each implementation into an executable that can be invoked with the same CLI contract and driving them all from outside under identical conditions.
For example, prepare an executable of this shape in each language.
bench --scenario sort_int32 --dataset data/sort_10m.bin --mode warm
bench --scenario group_words --dataset data/words_100mb.txt --mode cold
bench --scenario parallel_hash --dataset data/blob_1gb.bin --threads 8
Then, on the common runner side,
- randomize the execution order
- separate cold / warm
- pass the same dataset
- verify the checksum
- collect wall-clock time and memory
- keep the raw data in CSV / JSON
This makes it much easier to handle best practices within each language and cross-language fairness as separate concerns.
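As a rough sketch of the runner side, the following Go program builds the same argument list for every implementation and measures one run from outside the process. The `Invocation` and `Result` types, and the assumption that each `bench` executable prints its checksum to stdout, are ours and not part of any existing tool. Peak memory is left out because reading it is OS-specific.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"time"
)

// Invocation describes one run under the shared CLI contract.
type Invocation struct {
	Binary   string // path to a per-language executable (hypothetical)
	Scenario string
	Dataset  string
	Mode     string // "cold" or "warm"; empty to omit the flag
	Threads  int    // 0 to omit the flag
}

// Args builds the argument list in the same order for every language.
func (inv Invocation) Args() []string {
	args := []string{"--scenario", inv.Scenario, "--dataset", inv.Dataset}
	if inv.Mode != "" {
		args = append(args, "--mode", inv.Mode)
	}
	if inv.Threads > 0 {
		args = append(args, "--threads", strconv.Itoa(inv.Threads))
	}
	return args
}

// Result holds what the runner observed from the outside.
type Result struct {
	Wall time.Duration
	CPU  time.Duration // user + system time of the child process
	Out  string        // stdout, assumed to carry the checksum
}

// Run executes one invocation and times it, including process startup.
func Run(inv Invocation) (Result, error) {
	cmd := exec.Command(inv.Binary, inv.Args()...)
	start := time.Now()
	out, err := cmd.Output()
	wall := time.Since(start)
	if err != nil {
		return Result{}, fmt.Errorf("%s: %w", inv.Binary, err)
	}
	cpu := cmd.ProcessState.UserTime() + cmd.ProcessState.SystemTime()
	return Result{Wall: wall, CPU: cpu, Out: string(out)}, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: runner <bench-binary>")
		return
	}
	inv := Invocation{Binary: os.Args[1], Scenario: "sort_int32", Dataset: "data/sort_10m.bin", Mode: "warm"}
	res, err := Run(inv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("wall=%v cpu=%v out=%q\n", res.Wall, res.CPU, res.Out)
}
```

Because the timing wraps the whole child process, this measurement always includes startup. A warm figure would have to be reported by the executable itself.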
Concrete Example: What Benchmark Scenarios to Prepare
When asked to compare C# / C++ / Java / Go, our recommendation is: if you can only run one, pick a simple CPU-bound scenario that is hard to misinterpret; if you can run several, prepare 3-4 workloads with different characters.
Recommended lineup
1. sort_int32_10m
Purpose: observe CPU + memory bandwidth + use of temporary storage
- Input: 10 million int32 values generated with a fixed seed
- Processing: sort the array and return a checksum
- Caveat: restore the same unsorted input every iteration
This one is relatively easy to understand. However, it includes differences in standard sort implementations, so it is a comparison including the standard library rather than the language itself.
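A minimal Go sketch of the fixed-seed input and the restore-every-iteration caveat follows. Standard-library random generators differ between languages, so this sketch uses SplitMix64, a small generator that is easy to port. That choice is our assumption, and writing the data to a file once and sharing it works just as well. The names `GenerateInt32` and `SortIteration` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// splitMix64 is a tiny deterministic generator that is easy to
// reimplement identically in C#, C++, and Java.
type splitMix64 struct{ state uint64 }

func (s *splitMix64) next() uint64 {
	s.state += 0x9E3779B97F4A7C15
	z := s.state
	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
	z = (z ^ (z >> 27)) * 0x94D049BB133111EB
	return z ^ (z >> 31)
}

// GenerateInt32 returns n values derived from seed.
// The same seed always yields the same slice.
func GenerateInt32(seed uint64, n int) []int32 {
	gen := splitMix64{state: seed}
	out := make([]int32, n)
	for i := range out {
		out[i] = int32(gen.next() >> 32) // keep the upper 32 bits
	}
	return out
}

// SortIteration restores the unsorted input into work, then sorts work.
// The input slice itself is never modified.
func SortIteration(input, work []int32) {
	copy(work, input)
	sort.Slice(work, func(i, j int) bool { return work[i] < work[j] })
}

func main() {
	input := GenerateInt32(42, 1_000_000)
	work := make([]int32, len(input))
	for iter := 0; iter < 3; iter++ {
		SortIteration(input, work) // every iteration starts from the same unsorted data
	}
	fmt.Println("min:", work[0], "max:", work[len(work)-1])
}
```

Whether the copy counts as part of the measured time is a design decision. What matters is that every language makes the same choice.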
2. hash_group_count
Purpose: observe hash tables, string processing, allocations, and GC tendencies
- Input: a fixed text dataset
- Processing: count occurrences of each word
- Output: top N entries plus a checksum
This is close to real-world work, but string library and map implementation differences also matter considerably. In exchange, it is a more realistic comparison.
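One possible shape for this scenario in Go is sketched below. The tokenization rule (split on whitespace, case-sensitive) and the checksum format are our assumptions. Whatever rules you pick, every language has to implement exactly the same ones, or the checksums will never match.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// WordCount pairs a word with its number of occurrences.
type WordCount struct {
	Word  string
	Count int
}

// CountWords splits text on whitespace and returns the entries sorted by
// count (descending), then by word (ascending), so the order is deterministic.
func CountWords(text string) []WordCount {
	counts := make(map[string]int)
	for _, w := range strings.Fields(text) {
		counts[w]++
	}
	out := make([]WordCount, 0, len(counts))
	for w, c := range counts {
		out = append(out, WordCount{w, c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Word < out[j].Word
	})
	return out
}

// Checksum hashes every "word=count" line in sorted order with FNV-1a.
func Checksum(entries []WordCount) uint64 {
	h := fnv.New64a()
	for _, e := range entries {
		fmt.Fprintf(h, "%s=%d\n", e.Word, e.Count)
	}
	return h.Sum64()
}

func main() {
	entries := CountWords("the quick fox and the lazy dog and the cat")
	topN := 3
	if topN > len(entries) {
		topN = len(entries)
	}
	for _, e := range entries[:topN] {
		fmt.Printf("%s %d\n", e.Word, e.Count)
	}
	fmt.Printf("checksum=%016x\n", Checksum(entries))
}
```

Sorting before hashing keeps the checksum independent of map iteration order, which differs between languages and even between runs.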
3. parallel_sha256
Purpose: observe parallelism, the scheduler, worker pools, and synchronization habits
- Input: a sequence of fixed-size binary chunks
- Processing: hash them across N threads and return a final checksum
- Conditions: step the thread count through 1 / 2 / 4 / 8 and so on
Compared to a simple tight loop, this makes scaling behavior under parallel execution much easier to see.
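A compact Go sketch of this scenario uses a fixed pool of workers that hash chunks with SHA-256 and combines the per-chunk digests in index order. The combining rule and the name `HashChunks` are our assumptions. Note also that in Go the worker count is a number of goroutines, and real parallelism is additionally bounded by GOMAXPROCS.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

// HashChunks hashes each chunk on one of `threads` workers, then hashes the
// per-chunk digests in index order to form a final checksum. The result
// does not depend on the worker count or on scheduling order.
func HashChunks(chunks [][]byte, threads int) string {
	if threads < 1 {
		threads = 1
	}
	digests := make([][sha256.Size]byte, len(chunks))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < threads; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				digests[i] = sha256.Sum256(chunks[i]) // each index is written by exactly one worker
			}
		}()
	}
	for i := range chunks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	final := sha256.New()
	for i := range digests {
		final.Write(digests[i][:])
	}
	return hex.EncodeToString(final.Sum(nil))
}

func main() {
	// Placeholder data; a real run would load the fixed dataset instead.
	chunks := make([][]byte, 64)
	for i := range chunks {
		chunks[i] = make([]byte, 1<<20)
		chunks[i][0] = byte(i)
	}
	for _, threads := range []int{1, 2, 4, 8} {
		start := time.Now()
		sum := HashChunks(chunks, threads)
		fmt.Printf("threads=%d elapsed=%v checksum=%s\n", threads, time.Since(start), sum[:16])
	}
}
```

If the checksum changes with the thread count, the implementation has a race or an ordering bug, and its timing should not be trusted.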
4. startup_noop or startup_parse_small
Purpose: observe startup time
- noop: start and exit immediately
- parse_small: process one small input and exit
Here the JIT and initialization costs of C# / Java become visible, and the picture differs quite a bit from C++ / Go. Conversely, even if a gap shows up here, it is a separate question from who wins on long-running workloads.
What about JSON and HTTP benchmarks?
JSON and HTTP are close to real-world work, so of course they are meaningful. However, in that case it becomes a comparison including libraries, frameworks, and the ecosystem, rather than a language comparison.
That in itself is not bad. In fact, in practice that is often the more important question. But in an article or report, it causes less misunderstanding if you state explicitly:
This is not a comparison of languages but a comparison of typical implementations together with the major libraries.
Conditions to Align per Language
C++
- Align on optimized builds
- Pin the compiler
- Pin the standard library implementation
- Document conditions such as -O3 / /O2, LTO, and PGO
- Watch out for results being optimized away
- Suspect undefined behavior whenever something looks suspiciously fast
C++ has a lot of freedom, which means differences in conditions show up directly. Therefore, which compiler, with which flags, against which STL you measured is quite important.
C#
- Align on Release builds
- Pin the .NET version
- Record conditions such as Server GC / Workstation GC
- Document whether Tiered Compilation, ReadyToRun, and Native AOT are in play
- Separate cold and warm
For C#, differences in .NET configuration change how things look.
In particular, JIT-compiled C# and Native AOT C# are different axes even though both are “C#”.
Mix them and what you are comparing is no longer the language but the deployment form.
Java
- Pin the JDK vendor and version
- Document the GC
- Pin warm-up / measurement / fork settings
- Record the heap size and JVM options
- Separate cold start and steady state
Java benefits readily from the JIT, but in exchange, how it looks on the first run varies considerably. Therefore, separating short-lived process comparisons from long-running comparisons is mandatory.
Go
- Pin the Go version
- Pin GOMAXPROCS
- Document CGO_ENABLED
- If you touch GOGC, always record it
- Keep benchmark-format output if possible
Go is relatively easy to handle, but in parallel benchmarks the impact of GOMAXPROCS is large.
Also, whether or not you use cgo changes the whole picture, so always record that in the conditions.
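On the Go side, most of these conditions can be captured by the benchmark executable itself. The sketch below is one way to do it. The `GoConditions` type is hypothetical, and reading GOGC from the environment is a simplification that will not notice a value changed at run time through `debug.SetGCPercent`.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/debug"
)

// GoConditions is a hypothetical record of the Go-specific settings
// worth storing next to each result.
type GoConditions struct {
	GoVersion  string
	GOMAXPROCS int
	NumCPU     int
	GOGC       string // "" means the variable is unset (runtime default)
	CgoEnabled string // taken from build info; "" if unavailable
}

// Collect gathers the conditions of the currently running binary.
func Collect() GoConditions {
	c := GoConditions{
		GoVersion:  runtime.Version(),
		GOMAXPROCS: runtime.GOMAXPROCS(0), // 0 queries the value without changing it
		NumCPU:     runtime.NumCPU(),
		GOGC:       os.Getenv("GOGC"),
	}
	if info, ok := debug.ReadBuildInfo(); ok {
		for _, s := range info.Settings {
			if s.Key == "CGO_ENABLED" {
				c.CgoEnabled = s.Value
			}
		}
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", Collect())
}
```

Emitting this once per run, alongside the timing, means the conditions travel with the numbers instead of living in someone's memory.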
How to Align the Execution Environment
In any language, a comparison without an aligned environment is mostly a comparison of environments.
Things to align
- Same CPU / memory / storage
- Same OS version
- Same power conditions
- Conditions close to the same room temperature
- Same input data
- Same process priority
- Same core-count conditions
- Same container-or-bare-metal conditions
Things that matter especially
Power settings and CPU frequency
On a laptop, AC power versus battery alone puts you in a different world. If the CPU governor or power mode is not aligned, comparison results wobble considerably.
For how to align power conditions, notifications, background noise, heat, and execution order on Windows, see our other article, How to Compare the Execution Speed of Different Versions of a Program on Windows, which covers this in detail. If you measure on Windows, this matters a lot.
Heat
If only the first few runs are fast and later runs degrade, suspect heat and throttling. Rather than running all of A and then all of B, alternating like A / B / A / B reduces the bias.
Background activity
Updates, indexing, sync, virus scans, browsers, chat tools. These are unglamorous, but they routinely interfere.
What to Measure
For language comparisons, we recommend looking at at least these four separately.
1. Wall-clock time
The real time the user waits. This is the first metric to look at.
2. CPU time
“How much CPU was actually consumed.” If only the wall-clock time is faster while CPU time stays the same, the difference may come from waiting or I/O.
3. Memory / allocations
- peak RSS
- total allocation volume
- allocation count
- GC count
- GC pauses
Looking at these reveals the cost behind the speed.
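For the Go implementation, several of these figures can be read in-process from `runtime.MemStats`, as in the sketch below. The `AllocStats` type and `MeasureAllocs` helper are hypothetical names. Peak RSS is an OS-level figure and is not covered here, and the other runtimes expose comparable counters through their own APIs.

```go
package main

import (
	"fmt"
	"runtime"
)

// AllocStats holds allocation and GC deltas observed around one workload.
type AllocStats struct {
	AllocBytes uint64 // bytes allocated on the heap during the work
	Mallocs    uint64 // number of heap allocations
	GCCount    uint32 // completed GC cycles
	GCPauseNs  uint64 // total stop-the-world pause time
}

// MeasureAllocs runs work once and returns the MemStats deltas.
func MeasureAllocs(work func()) AllocStats {
	var before, after runtime.MemStats
	runtime.GC() // start from a collected heap to reduce carry-over
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return AllocStats{
		AllocBytes: after.TotalAlloc - before.TotalAlloc,
		Mallocs:    after.Mallocs - before.Mallocs,
		GCCount:    after.NumGC - before.NumGC,
		GCPauseNs:  after.PauseTotalNs - before.PauseTotalNs,
	}
}

// keep is package-level so the allocations below escape to the heap.
var keep [][]byte

func main() {
	stats := MeasureAllocs(func() {
		keep = keep[:0]
		for i := 0; i < 10000; i++ {
			keep = append(keep, make([]byte, 1024))
		}
	})
	fmt.Printf("alloc_bytes=%d mallocs=%d gc_count=%d gc_pause_ns=%d\n",
		stats.AllocBytes, stats.Mallocs, stats.GCCount, stats.GCPauseNs)
}
```

`ReadMemStats` briefly stops the world, so it belongs outside the timed region.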
4. Distribution
- median
- p95 / p99
- min / max
- standard deviation and spread
If you talk in averages only, you never see the true nature of the runs that occasionally spike.
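To make the effect concrete, here is a small Go sketch that summarizes a set of run times. The percentile uses the nearest-rank method, which is one of several common definitions and an assumption on our part. The sample values in `main` are made up to show one outlier.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Summary describes the distribution of a set of samples.
type Summary struct {
	Min, Median, P95, P99, Max, Mean, StdDev float64
}

// percentile returns the nearest-rank percentile of a sorted, non-empty slice.
func percentile(sorted []float64, p float64) float64 {
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// Summarize computes the summary without modifying the input slice.
// An empty input yields a zero Summary.
func Summarize(samples []float64) Summary {
	n := len(samples)
	if n == 0 {
		return Summary{}
	}
	s := append([]float64(nil), samples...)
	sort.Float64s(s)

	median := s[n/2]
	if n%2 == 0 {
		median = (s[n/2-1] + s[n/2]) / 2
	}
	var sum float64
	for _, v := range s {
		sum += v
	}
	mean := sum / float64(n)
	var sq float64
	for _, v := range s {
		sq += (v - mean) * (v - mean)
	}
	stdDev := 0.0
	if n > 1 {
		stdDev = math.Sqrt(sq / float64(n-1)) // sample standard deviation
	}
	return Summary{
		Min: s[0], Median: median, P95: percentile(s, 95), P99: percentile(s, 99),
		Max: s[n-1], Mean: mean, StdDev: stdDev,
	}
}

func main() {
	// Made-up run times in milliseconds: nine ordinary runs and one spike.
	runs := []float64{10, 11, 10, 12, 11, 10, 11, 10, 12, 95}
	fmt.Printf("%+v\n", Summarize(runs))
}
```

With these made-up samples the single spike pulls the mean well above the median, which is exactly the situation where reporting only the average misleads.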
Recommended Execution Procedure
The flow that works well in practice goes roughly in this order.
1. Decide the workload
First, make explicit what you want to compare.
- startup time
- steady-state throughput
- tail latency
- memory efficiency
- parallel scaling
2. Fix a common dataset
Align the input data using a fixed seed or a fixed file. If you include data generation in the measurement, it too must run under the same conditions in every language.
3. Pass correctness checks first
Confirm that all implementations return the same result on both small and large data. Having them emit a checksum or hash makes this easy to handle.
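A sketch of both halves in Go follows: computing an order-sensitive checksum over the result, and refusing to continue unless every implementation reported the same value. FNV-1a over little-endian bytes is our choice for illustration. Any hash works as long as all four languages implement the identical byte layout.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
	"sort"
)

// ChecksumInt32 hashes the values in order, as little-endian bytes,
// with 64-bit FNV-1a.
func ChecksumInt32(values []int32) uint64 {
	h := fnv.New64a()
	var buf [4]byte
	for _, v := range values {
		binary.LittleEndian.PutUint32(buf[:], uint32(v))
		h.Write(buf[:])
	}
	return h.Sum64()
}

// VerifyAgreement returns an error unless every implementation reported
// the same checksum. Keys are implementation names.
func VerifyAgreement(reported map[string]uint64) error {
	names := make([]string, 0, len(reported))
	for name := range reported {
		names = append(names, name)
	}
	if len(names) == 0 {
		return fmt.Errorf("no checksums reported")
	}
	sort.Strings(names) // deterministic error messages
	ref := names[0]
	for _, name := range names[1:] {
		if reported[name] != reported[ref] {
			return fmt.Errorf("checksum mismatch: %s=%016x, %s=%016x",
				ref, reported[ref], name, reported[name])
		}
	}
	return nil
}

func main() {
	sorted := []int32{-5, 0, 3, 8}
	sum := ChecksumInt32(sorted)
	// In a real run, each value would come from a different executable.
	reported := map[string]uint64{"cpp": sum, "csharp": sum, "java": sum, "go": sum}
	if err := VerifyAgreement(reported); err != nil {
		fmt.Println("do not compare timings:", err)
		return
	}
	fmt.Printf("all implementations agree: %016x\n", sum)
}
```

Running this gate before any timing is collected keeps a wrong but fast implementation out of the results table.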
4. Pin the build conditions
Produce Release / optimized executables in each language, and record versions and flags.
5. Separate cold and warm
This is especially important for C# and Java.
- cold: includes the moment right after process startup
- warm: the stable state after several runs
It is cleaner to keep these two in separate tables.
6. Alternate or randomize the execution order
Example:
cpp -> csharp -> java -> go
go -> java -> cpp -> csharp
csharp -> go -> java -> cpp
...
This reduces bias from heat and noise.
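An order like the one above can be generated rather than written by hand. This Go sketch shuffles the targets independently for each round using a seeded generator, so the schedule itself is reproducible and can be stored with the results. The function name `Schedule` and the seed value are arbitrary.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Schedule returns `rounds` shuffled copies of targets. Every round runs
// each target exactly once, and the same seed yields the same plan.
func Schedule(targets []string, rounds int, seed int64) [][]string {
	rng := rand.New(rand.NewSource(seed))
	plan := make([][]string, 0, rounds)
	for r := 0; r < rounds; r++ {
		order := append([]string(nil), targets...) // copy; leave the input untouched
		rng.Shuffle(len(order), func(i, j int) {
			order[i], order[j] = order[j], order[i]
		})
		plan = append(plan, order)
	}
	return plan
}

func main() {
	targets := []string{"cpp", "csharp", "java", "go"}
	for i, round := range Schedule(targets, 3, 20240101) {
		fmt.Printf("round %d: %v\n", i+1, round)
	}
}
```

Shuffling per round, rather than shuffling one long list, guarantees that every implementation gets the same number of runs.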
7. Secure enough iterations
For lightweight microbenchmarks, run a great many iterations; for end-to-end runs, you want at least 10. When the difference is small and the iteration count is low, interpretation becomes quite precarious.
8. Save the raw data
Keep the raw data of every run, not just the aggregates. Looking back later, you can read outliers and warm-up quirks out of it.
9. Profile when a difference appears
Only when a difference appears do you start digging into the cause.
- CPU profile
- allocation profile
- GC logs
- flame graphs
- OS-level traces
Once you get this far, you can discuss why it happens, not just “fast / slow”.
How to Read the Results
Even after the numbers are in, misreading them is still dangerous.
C# / Java are slow only on the first run
Suspect JIT, class loading, and initialization. In that case,
- if startup time matters, it is a meaningful difference
- if long-running operation is the subject, it is a difference that belongs in a separate table
C++ is strong in tight loops
Low-level optimization, object layout, and minimal runtime overhead may be paying off. However, looking only at that and concluding “therefore it is the fastest in a real service too” is a leap.
Go looks favorable on startup time and ease of distribution
The single binary, the relatively light startup, and the approachable concurrency model can pay off. However, that does not mean it is favorable for every CPU-bound workload.
C# / Java catch up considerably — or overtake — at steady state
JIT optimization may be kicking in. This is not a rare story either. That is why it is important not to mix startup-inclusive comparisons with steady-state comparisons.
Large differences on allocation-heavy workloads
In this case, more than the language name, what usually matters is
- memory layout
- how strings and maps are handled
- GC behavior
- extra copies
Recording Template
Keep at least these fields with every benchmark result and you will thank yourself later.
timestamp,language,scenario,run_kind,cold_or_warm,elapsed_ms,cpu_ms,max_rss_mb,alloc_bytes,gc_count,checksum
compiler_or_runtime,compiler_version,flags,os,cpu,threads,input_id,notes
For example, run_kind can be split like this.
- micro
- macro
- startup
- parallel
For cold_or_warm, you definitely want to make explicit which one it is.
- cold
- warm
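One way to turn the template into something a runner can emit is sketched below in Go. Treating the two field lists as a single flat CSV row is our assumption, the `Record` type is hypothetical, and the values in `main` are placeholders rather than measurements.

```go
package main

import (
	"encoding/csv"
	"os"
	"strconv"
	"time"
)

// Header lists the template fields as one flat row (an assumption).
var Header = []string{
	"timestamp", "language", "scenario", "run_kind", "cold_or_warm",
	"elapsed_ms", "cpu_ms", "max_rss_mb", "alloc_bytes", "gc_count", "checksum",
	"compiler_or_runtime", "compiler_version", "flags", "os", "cpu",
	"threads", "input_id", "notes",
}

// Record is one benchmark run together with its conditions.
type Record struct {
	Timestamp                          time.Time
	Language, Scenario                 string
	RunKind                            string // micro, macro, startup, parallel
	ColdOrWarm                         string // cold, warm
	ElapsedMs, CPUMs, MaxRSSMb         float64
	AllocBytes, GCCount                uint64
	Checksum                           string
	CompilerOrRuntime, CompilerVersion string
	Flags, OS, CPU                     string
	Threads                            int
	InputID, Notes                     string
}

// Row renders the record in the same order as Header.
func (r Record) Row() []string {
	f := func(v float64) string { return strconv.FormatFloat(v, 'f', 3, 64) }
	u := func(v uint64) string { return strconv.FormatUint(v, 10) }
	return []string{
		r.Timestamp.UTC().Format(time.RFC3339), r.Language, r.Scenario, r.RunKind, r.ColdOrWarm,
		f(r.ElapsedMs), f(r.CPUMs), f(r.MaxRSSMb), u(r.AllocBytes), u(r.GCCount), r.Checksum,
		r.CompilerOrRuntime, r.CompilerVersion, r.Flags, r.OS, r.CPU,
		strconv.Itoa(r.Threads), r.InputID, r.Notes,
	}
}

func main() {
	// Placeholder values only; not real measurements.
	rec := Record{
		Timestamp: time.Now(), Language: "go", Scenario: "sort_int32",
		RunKind: "micro", ColdOrWarm: "warm", Threads: 1,
		InputID: "example-input", Notes: "placeholder row",
	}
	w := csv.NewWriter(os.Stdout)
	w.Write(Header)
	w.Write(rec.Row())
	w.Flush()
	if err := w.Error(); err != nil {
		os.Exit(1)
	}
}
```

Keeping the header and the row in one place makes it harder for a field to be added to one and forgotten in the other.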
With benchmarks, being interpretable later often matters more than the act of measuring.
Summary
What really matters in a C# / C++ / Java / Go speed comparison is taking the crude question of which language is fastest and turning it into the shape of an experiment: which workload, under which conditions, on which metric, are we comparing?
The points you most need to get right are these.
- Separate startup time from steady state
- Measure with the same algorithm, the same input, and the same correctness check
- Never draw conclusions from a single benchmark
- Separate per-language benchmarks from cross-language benchmarks
- Look at the median and the distribution rather than the mean
- Keep the conditions and the raw data
And the most important thing of all: do not try too hard to decide winners and losers by language name. Real-world performance is determined by the combination of language, runtime, libraries, build conditions, data, OS, and hardware.
“C++ is fast”, “Java is strong”, “Go is lightweight”, “C# is plenty fast too” — in some sense, all of these are true. But once under which conditions you are saying so drops out, it mostly turns into a fistfight in the fog.
Align the conditions, use multiple workloads, separate cold / warm, and look all the way down to the distribution. Unglamorous, but in the end this is what wins.
References
- BenchmarkDotNet Getting Started https://benchmarkdotnet.org/articles/guides/getting-started.html
- OpenJDK JMH Project https://openjdk.org/projects/code-tools/jmh/
- JMH GitHub Repository / README https://github.com/openjdk/jmh
- Go testing package https://pkg.go.dev/testing
- Go benchstat https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
- Google Benchmark User Guide https://google.github.io/benchmark/user_guide.html
- How to Compare the Execution Speed of Different Versions of a Program on Windows https://comcomponent.com/en/blog/2026/03/16/002-windows-benchmark-comparing-program-versions/
Author Profile
Go Komura
Representative of KomuraSoft LLC
Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.