Building a Windows Failure-Path Test Foundation with Application Verifier

Tags: Windows Development, Bug Investigation, Industrial Camera, Application Verifier, Failure-Path Testing, Handle Leak

Application Verifier is a powerful tool for surfacing, ahead of time, anomalies in Windows native code and at the Win32 boundary. It is especially useful for testing handle anomalies, heap corruption, and low-resource failure paths, where it quickly exposes problems that normal-path testing alone would never show.

In Part 1, “When an Industrial Camera Control App Suddenly Crashes After One Month (Part 1) - Finding Handle Leaks and Designing Logs for Long-Running Operation,” we investigated a control app that crashed after long-running operation and found a handle leak to be the cause. But strengthening the logs is only half the job. What you really want is to test, in advance, whether you will still be able to tell what happened if a future programming mistake causes a memory leak, a handle leak, a partial failure, or a missed release.

That is where we used Application Verifier. It lets you apply runtime checks and fault injection to code running natively on Windows and at the Win32 boundary. What is especially convenient in practice is that you can trigger memory-exhaustion-like and resource-exhaustion-like failure modes ahead of time, without actually devouring the machine’s memory.

In this second part, we organize what Application Verifier is, what it can do, and how to build it into a failure-path test foundation, in the context of an industrial camera control app.

Table of Contents

  1. The Conclusion First (In One Line)
  2. What Is Application Verifier?
    • 2.1. In One Sentence
    • 2.2. Where It Shines
    • 2.3. What You Gain
  3. What Application Verifier Can Do
    • 3.1. Basics: Handles / Heaps / Locks / Memory / TLS, etc.
    • 3.2. Low Resource Simulation: Front-Loading Memory and Resource Exhaustion
    • 3.3. Page Heap and the Debugger
    • 3.4. !avrf / !htrace / Logs
  4. Why We Introduced It This Time
    • 4.1. The Goal Is Not Just “Finding Bugs”
    • 4.2. Triggering Memory-Exhaustion-Like Phenomena
    • 4.3. Verifying We Can Trace Handle Anomalies When They Occur
  5. How to Trigger Memory- and Resource-Exhaustion-Like Phenomena
    • 5.1. The Idea Behind Low Resource Simulation
    • 5.2. What You Can Make Fail
    • 5.3. How to Apply It in Practice
  6. How to Look at Handle Anomalies
    • 6.1. The Handles Check
    • 6.2. Viewing Open / Close Stacks with !htrace
    • 6.3. How to Combine It with Your Own Logs
  7. How to Build a Failure-Path Test Foundation
    • 7.1. Move the Execution Unit into a Harness
    • 7.2. Split the Test Menu
    • 7.3. What to Collect
    • 7.4. Acceptance Criteria
    • 7.5. Caveats
  8. A Rough Decision Guide
  9. Summary
  10. References

1. The Conclusion First (In One Line)

  • Application Verifier is a tool that makes misuse at Windows’ unmanaged / native boundary easier to catch at runtime
  • Its value is not only “finding bugs,” but forcing rarely seen failure paths to occur ahead of time
  • Handles detects invalid handles, Heaps exposes heap corruption, and Low Resource Simulation performs fault injection of memory-exhaustion-like and resource-exhaustion-like situations
  • Delegating the leak investigation of a long-running resident EXE entirely to Application Verifier is a bad approach; combining it with your own Handle Count and resource-lifecycle logs is the realistic path
  • In a failure-path test foundation, it is easier to read results if you run a normal-path verifier run and a fault injection run separately
  • Even when you want to test a DLL, what you enable Application Verifier on is the test EXE that actually exercises that DLL

In short, Application Verifier is a tool for dragging the “nasty bugs” living around Windows’ native / Win32 boundary out into the open. It is an especially good fit in worlds like equipment control apps, where native SDKs, P/Invoke, and Win32 APIs routinely mix.

2. What Is Application Verifier?

2.1. In One Sentence

Application Verifier is a runtime verification tool for Windows user-mode applications. It monitors how a running app uses OS APIs and handles resources, detecting suspicious usage and letting you deliberately inject failures.

Unlike “static analysis” or “unit testing,” it is a tool for seeing how things break when that code path is actually exercised. That makes it well suited for flushing out failure paths that routine functional testing never reaches.

[Figure. Nodes: Test harness; Control app / SDK wrapper; Application Verifier; Win32 API / native DLL / OS resources. Outputs: verifier stop; debugger output; AppVerifier logs; own structured log]

2.2. Where It Shines

It tends to be especially effective in situations like these.

  • You call native DLLs or a camera SDK
  • You cross P/Invoke or COM boundaries
  • You use handles, heaps, locks, and virtual memory heavily, directly or indirectly
  • The app rarely crashes on the normal path, but lifetime management looks fragile on the failure paths
  • “Occasionally returns strange failures” shows up before “crashes”

Conversely, it is not a tool for tracing object graphs in the purely managed world. So even in a C# app, it pays off considerably if the native SDK or Win32 boundary is thick — but it is not a single tool for fully investigating pure managed heap leaks.

2.3. What You Gain

In practice, the benefits boil down to roughly these three.

  1. Stop native-boundary misuse early
    • invalid handles
    • heap corruption
    • lock misuse
    • virtual memory API misuse, etc.
  2. Front-load failure modes that only appear under low resources
    • malloc-equivalents occasionally fail
    • CreateEvent and CreateFile occasionally fail
    • VirtualAlloc fails
  3. Easier tracing when combined with a debugger
    • !avrf
    • !htrace
    • !heap -p -a
    • verifier stop logs

What hurts in equipment control apps is “not knowing what happened on the failure path.” Application Verifier is quite effective at reducing that “not knowing.”

3. What Application Verifier Can Do

3.1. Basics: Handles / Heaps / Locks / Memory / TLS, etc.

Application Verifier’s basic set is Basics. The checks you use most in practice are gathered here.

| Layer | What it watches | How it applies in this context |
| --- | --- | --- |
| Handles | Use of invalid handles | Whether you are stepping on closed / corrupted handles |
| Heaps | Heap corruption | Flushing out buffer corruption and use-after-free at the native SDK boundary |
| Leak | Resources not released at DLL unload | Tests of short-lived harnesses, and cases that include unloads |
| Locks / SRWLock | Lock misuse | Checking races between reconnect and shutdown |
| Memory | Misuse of VirtualAlloc / MapViewOfFile, etc. | Checking anomalies around large buffers and shared memory |
| TLS | Misuse of Thread Local Storage APIs | Insurance for native code with complex thread boundaries |
| Threadpool | Consistency of threadpool APIs and worker state | Backup when callbacks and async processing are abundant |

The point is to stop suspicious usage on the spot, rather than “read about it after the crash.” For long-running defects, this front-loading pays off considerably.

3.2. Low Resource Simulation: Front-Loading Memory and Resource Exhaustion

This is the genuinely convenient part in practice. That is because you can trigger phenomena close to memory exhaustion and resource exhaustion without actually devouring the RAM.

The idea is simple.

  • Take a certain API call
  • With a certain probability
  • Make it fail on purpose

This lets you exercise error paths that are practically never taken otherwise.

Concretely, it becomes easy to trigger phenomena like these on purpose.

  • HeapAlloc and VirtualAlloc fail
  • CreateFile fails
  • CreateEvent fails
  • MapViewOfFile fails
  • OLE/COM allocations like SysAllocString fail

This is far more manageable than trying to genuinely exhaust memory and torturing the whole machine. What is more, you can target fault injection at specific DLLs only. For configurations like equipment control apps where your own wrappers mix with vendor SDKs, this is quite practical.
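The take-a-call, roll-a-probability, fail-on-purpose idea above can be sketched in a few lines. This is only an illustration of the concept, not how Application Verifier is implemented: `FaultInjector`, `InjectedFault`, and `allocate_frame_buffer` are hypothetical names, and expressing the probability in parts per million is our assumption about how values such as `heap_alloc=20000` are read.

```python
import random


class InjectedFault(OSError):
    """Raised instead of running the real call when the injector decides to fail it."""


class FaultInjector:
    """Fails wrapped calls with a per-class probability given in parts per million."""

    def __init__(self, ppm_by_class, seed=None):
        self.ppm_by_class = dict(ppm_by_class)
        self.rng = random.Random(seed)  # seeded so a failing run can be replayed
        self.injected = []              # (fault_class, call_name), for log correlation

    def call(self, fault_class, func, *args, **kwargs):
        ppm = self.ppm_by_class.get(fault_class, 0)
        if self.rng.randrange(1_000_000) < ppm:
            self.injected.append((fault_class, func.__name__))
            raise InjectedFault(f"injected {fault_class} failure in {func.__name__}")
        return func(*args, **kwargs)


def allocate_frame_buffer(size):
    """Stand-in for a real allocation (hypothetical)."""
    return bytearray(size)


# 200000 ppm (20%) is deliberately high so failures show up in a short loop.
injector = FaultInjector({"virtual_alloc": 200_000}, seed=1234)
failures = 0
for _ in range(1000):
    try:
        injector.call("virtual_alloc", allocate_frame_buffer, 4096)
    except InjectedFault:
        failures += 1
print(f"{failures} of 1000 allocations failed on purpose")
```

Classes that are not listed get a probability of zero, which mirrors the advice to fail only the calls you are actually interested in.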

3.3. Page Heap and the Debugger

For heap corruption, the combination of Heaps and page heap is strong. Full page heap in particular has the advantage of using guard pages to stop close to the moment of corruption.

However, it is quite heavy. Rather than long brute-force runs, it is more usable to narrow down to scenarios close to the repro and run them under the debugger.

So as an operating practice, a split like this is realistic.

  • First apply Basics broadly
  • Once the heap looks suspicious, use full page heap
  • If it is too heavy, fall back to light page heap
  • For production-like long-run testing, rely primarily on your own logs

Ultimately, AppVerifier is not a magic wand but a tool whose blade you swap per situation.

3.4. !avrf / !htrace / Logs

Application Verifier does not just raise a stop and walk away. With its debugger extensions and logs, what happened becomes easier to chase.

  • !avrf
    • View the current verifier settings and the stop currently raised
  • !htrace
    • View the stacks of a handle’s open / close / invalid references
  • !heap -p -a
    • Combined with page heap, trace the corrupted heap block
  • AppVerifier logs
    • Logs can be kept for when a stop occurs

It is especially welcome that enabling Handles automatically enables handle tracing. This makes it much easier to trace, after the fact, “where this handle was opened and where it was closed.”

4. Why We Introduced It This Time

4.1. The Goal Is Not Just “Finding Bugs”

Our goal this time was not simply “find one bug with AppVerifier.” Put more practically, what we wanted to verify was the following.

  • When a resource leak happens again on some other failure path in the future
  • Will the logs properly retain the context?
  • Can we chase it down to the end, together with debugger information?
  • Will we avoid ending up in a “no idea what happened” state?

In other words, we used it not only as a detector, but as a test of our observation infrastructure.

4.2. Triggering Memory-Exhaustion-Like Phenomena

Genuinely causing memory exhaustion on a regular development machine is fairly tedious. Worse, once the whole machine becomes unstable, the test itself fills with noise.

So we used Low Resource Simulation to deliberately exercise the failure paths that memory or resource exhaustion would likely trigger.

This makes it much easier to answer questions like these.

  • If CreateEvent fails, do cameraId and phase remain in the logs?
  • After a half-finished initialization, does cleanup actually run?
  • If VirtualAlloc fails, does the retry avoid corrupting state?
  • If CreateFile fails on the save path, are the handles already acquired released?

What we want to emphasize is that causing the anomaly is not the goal; the goal is that the failure mode is readable when the anomaly occurs.
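As a rough sketch of what “cleanup actually runs after a half-finished initialization” looks like in code, the example below acquires resources phase by phase and releases whatever was already acquired when a later phase fails. Everything here is a stand-in: `FakeResource`, `open_camera_session`, the phase names, and the `camera` logger are hypothetical, and the failure is simulated rather than injected by Application Verifier.

```python
import contextlib
import logging

log = logging.getLogger("camera")  # hypothetical logger name


class FakeResource:
    """Stand-in for an event, buffer, or file handle (hypothetical)."""

    def __init__(self, name, open_resources):
        self.name = name
        self._open = open_resources
        self._open.add(name)           # "acquire"

    def close(self):
        self._open.discard(self.name)  # "release"


def open_camera_session(camera_id, open_resources, fail_at=None):
    """Acquire resources in phases; if a phase fails, release the ones already held."""
    phases = ["create_event", "alloc_frame_buffer", "open_save_file"]
    with contextlib.ExitStack() as stack:
        for phase in phases:
            if phase == fail_at:
                # Simulated failure: record cameraId and phase before unwinding.
                log.error("init failed cameraId=%s phase=%s", camera_id, phase)
                raise OSError(f"{phase} failed")
            resource = FakeResource(phase, open_resources)
            stack.callback(resource.close)
        # Every phase succeeded: hand the pending cleanups over to the caller.
        return stack.pop_all()


open_resources = set()
try:
    open_camera_session("cam-01", open_resources, fail_at="open_save_file")
except OSError:
    pass
leaked_after_failure = set(open_resources)  # names still held after the failed init

session = open_camera_session("cam-01", open_resources)
held_while_open = set(open_resources)
session.close()                             # releases all three resources
```

The point of the sketch is the shape: the log line carries `cameraId` and `phase`, and the set of held resources can be checked after the failure instead of being assumed.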

4.3. Verifying We Can Trace Handle Anomalies When They Occur

As with the handle leak in Part 1, with handles the place that finally crashes and the true cause easily drift apart.

So what we wanted to confirm was this.

  • When an invalid handle stop is raised, can we trace the open / close with !htrace?
  • Does it tie back to the resourceId / sessionId / phase in our own logs?
  • Does the handle count come back down after the failure?
  • When the harness is a short-lived process, are the leak deltas easy to read?

Once you can see this far, you can move from merely “a bug appeared” to “which component’s lifetime management broke down.”

5. How to Trigger Memory- and Resource-Exhaustion-Like Phenomena

5.1. The Idea Behind Low Resource Simulation

Low Resource Simulation is, in plain terms, fault injection. Rather than faithfully recreating a low-resource environment, the idea is to artificially mix in the representative API failures that occur under low resources.

So its use cases are quite clear-cut.

  • Verifying cleanup on failure paths
  • Verifying the robustness of retry / reconnect
  • Verifying initialization where partial successes and partial failures mix
  • Verifying that logs remain even for “failures that normally never happen”

The trick here is to not fail everything from the start. If you turn everything on at once, the logs explode and you lose track of “what you are even looking at.”

5.2. What You Can Make Fail

With Low Resource Simulation, you can probabilistically fail the following representative classes of APIs.

| Class | Examples | Examples in an equipment control app |
| --- | --- | --- |
| Heap_Alloc | Heap allocation | Temporary buffers, image metadata, SDK-wrapper internal allocations |
| Virtual_Alloc | Virtual memory allocation | Larger frame buffers, ring buffers |
| File | CreateFile, etc. | Opens of save paths and log files |
| Event | CreateEvent, etc. | Frame-ready notification, stop/reconnect synchronization |
| MapView | MapViewOfFile, etc. | Shared memory and memory-mapped files |
| Ole_Alloc | SysAllocString, etc. | COM / OLE boundary |
| Wait | WaitForXXX family | Around synchronization wait failures |
| Registry | Registry access | Reading/writing settings and driver-adjacent configuration |

In practice, rather than opening everything at once, the key is to start narrow, with the classes closest to the failure path you want to look at this time.

5.3. How to Apply It in Practice

As a command-line sketch, it looks like this, for example.

appverif /verify CameraHarness.exe
appverif /verify CameraHarness.exe /faults
appverif -enable lowres -for CameraHarness.exe -with heap_alloc=20000 virtual_alloc=20000 file=20000 event=20000
appverif -query lowres -for CameraHarness.exe

The approach goes like this.

  1. First run the normal path with Basics alone
  2. Then add Low Resource Simulation and run with fault injection
  3. If needed, assign probabilities only to the failures you want to see, such as file or event
  4. If you want to target a specific DLL, scope the injection to that DLL

The /faults shortcut is convenient, but on its own it is centered on OLE_ALLOC and HEAP_ALLOC. If you want to look at the failure paths of CreateFile or CreateEvent, it is more reliable to spell out -enable lowres -with file=... event=....

In equipment control apps, it is often easier to read results when you scope to the camera wrapper or the save-path DLL, rather than scattering faults across the whole app.

For example, you can build scenarios like these.

  • CreateEvent failure right after a reconnect starts
  • CreateFile failure at the start of saving
  • Temporary buffer allocation failure
  • SysAllocString failure during COM conversion
  • Verifying the failure paths of the wait APIs

These are practically never reached by routine normal-path testing alone. That is exactly why deliberately stepping on them is worth it.
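One way to keep such scenarios narrow is to describe each one as a small set of Low Resource Simulation classes and assemble the explicit `-enable lowres` command line from it. The sketch below only builds the argument lists and does not run anything. The scenario names are hypothetical, the flags are the ones shown in the commands above, and the value 20000 is copied from there (we understand it as parts per million, which is an assumption worth checking against the tool's help output).

```python
# Hypothetical scenario names mapped to the classes each one needs to fail.
SCENARIOS = {
    "reconnect_event_failure": {"event": 20000},
    "save_open_failure": {"file": 20000},
    "buffer_alloc_failure": {"heap_alloc": 20000, "virtual_alloc": 20000},
}


def basics_command(target):
    """Normal-path run: the plain /verify form, with no fault injection."""
    return ["appverif", "/verify", target]


def lowres_command(target, values_by_class):
    """Fault-injection run: explicit per-class values instead of the /faults shortcut."""
    if not values_by_class:
        raise ValueError("name at least one class; start narrow")
    args = ["appverif", "-enable", "lowres", "-for", target, "-with"]
    args += [f"{name}={value}" for name, value in values_by_class.items()]
    return args


def disable_command(target):
    """Remove the settings again once the run is finished."""
    return ["appverif", "/n", target]


for scenario, classes in SCENARIOS.items():
    print(scenario, "->", " ".join(lowres_command("CameraHarness.exe", classes)))
```

Keeping the class selection in data makes it obvious, when reading results later, which failures a given run was supposed to produce.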

6. How to Look at Handle Anomalies

6.1. The Handles Check

For everything handle-related, start with Handles. This makes the use of invalid handles easier to detect.

The accidents it typically catches are these.

  • Using a handle again after it was closed
  • Passing a corrupted handle value
  • Using a handle left uninitialized by a partial failure
  • A broken lifetime leading to access from another thread

Where long-run operation would only show “an odd error appears occasionally,” under the verifier it can stop right on the spot. This front-loading helps a great deal.

6.2. Viewing Open / Close Stacks with !htrace

What makes Handles so welcome is that it pairs well with handle tracing.

windbg -xd av -xd ch -xd sov CameraHarness.exe
!avrf
!htrace 0x00000ABC

What you want to see with !htrace is roughly this.

  • Where that handle was opened
  • Where it was closed
  • Whether it was referenced as an invalid handle
  • Whether opens are piling up more than expected

What makes handle leaks and handle misuse troublesome is that the API that finally fell over is often not where the true cause lies. With !htrace, you can trace that handle’s history quite concretely.

6.3. How to Combine It with Your Own Logs

That said, Application Verifier alone is not enough. In particular, doing the leak investigation of a long-running resident EXE with it alone is quite painful.

So in practice we combine the following.

  • Periodic Handle Count
  • sessionId
  • resourceId
  • phase
  • Lifecycle logs of create/open and close/dispose
  • Dumps and debugger output at verifier stops

With this, you can chase the problem like so, for example.

  1. The heartbeat shows the slope of Handle Count is suspicious
  2. The lifecycle logs narrow down the resource that has a Create but no Close
  3. A verifier run surfaces the invalid handle or misuse ahead of time
  4. !htrace shows the open / close stacks

This combination makes things dramatically easier to chase.
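Steps 1 and 2 of that chase are plain log analysis and can be sketched without any Windows API. The example below is a minimal sketch under assumptions: the heartbeat samples are `(elapsed_seconds, handle_count)` pairs, and each lifecycle event is a dict with `op`, `resourceId`, `sessionId`, and `phase` keys. That layout is ours for illustration, not a format defined by Application Verifier.

```python
def handle_count_slope(samples):
    """Least-squares slope of (elapsed_seconds, handle_count) samples, in handles per second."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    num = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den if den else 0.0


def unclosed_resources(events):
    """Return {resourceId: (sessionId, phase)} for every open without a matching close."""
    open_now = {}
    for ev in events:
        if ev["op"] == "open":
            open_now[ev["resourceId"]] = (ev["sessionId"], ev["phase"])
        elif ev["op"] == "close":
            open_now.pop(ev["resourceId"], None)
    return open_now


# Heartbeat: the handle count grows by 2 every 60 seconds.
heartbeat = [(0, 120), (60, 122), (120, 124), (180, 126)]
slope = handle_count_slope(heartbeat)

# Lifecycle log: the event is closed, the save file is not.
lifecycle = [
    {"op": "open", "resourceId": "evt-1", "sessionId": "s1", "phase": "reconnect"},
    {"op": "close", "resourceId": "evt-1", "sessionId": "s1", "phase": "reconnect"},
    {"op": "open", "resourceId": "file-7", "sessionId": "s1", "phase": "save"},
]
suspects = unclosed_resources(lifecycle)
print(f"slope={slope:.4f} handles/s, unclosed={suspects}")
```

The resource IDs that come out of `unclosed_resources` are the ones worth taking into a verifier run and `!htrace`.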

7. How to Build a Failure-Path Test Foundation

7.1. Move the Execution Unit into a Harness

Application Verifier cannot be enabled retroactively on an already-running process. You configure first, then launch.

Moreover, the settings persist until you explicitly remove them. So in practice, it is easier to handle if you target a test harness EXE rather than the production app itself.

For example, a configuration like this.

[Figure. Nodes: Scenario Runner; CameraHarness.exe; CameraSdkWrapper.dll; Vendor SDK; Structured Log; Dump / Debugger]

With this, you get the advantages of:

  • Running one scenario per process
  • Leak deltas being easy to read
  • Easy toggling of the AppVerifier settings ON/OFF
  • Being able to test DLLs through the EXE side

The commands look like this.

appverif /verify CameraHarness.exe
appverif /n CameraHarness.exe

Enable before launch; disable explicitly. Running this with a harness as the premise also helps prevent configuration accidents.
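The “one scenario per process” part of the harness setup can be sketched as a small runner. This is a sketch under assumptions: passing the scenario name as the single command-line argument is a hypothetical convention for `CameraHarness.exe`, and the example substitutes the Python interpreter for the harness so that it runs anywhere.

```python
import subprocess
import sys


def run_scenarios(harness_cmd, scenarios, timeout_s=60):
    """Launch the harness once per scenario so each run starts from a clean process."""
    results = {}
    for name in scenarios:
        try:
            proc = subprocess.run(
                harness_cmd + [name],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            results[name] = {"exit_code": proc.returncode, "stdout": proc.stdout.strip()}
        except subprocess.TimeoutExpired:
            # A hung scenario is recorded instead of blocking the remaining ones.
            results[name] = {"exit_code": None, "stdout": "timeout"}
    return results


# Stand-in for the harness: prints the scenario name and exits non-zero for one of them.
stand_in = [
    sys.executable,
    "-c",
    "import sys; print('ran', sys.argv[1]); sys.exit(3 if sys.argv[1] == 'save_fail' else 0)",
]
results = run_scenarios(stand_in, ["reconnect", "save_fail"])
for name, outcome in results.items():
    print(name, outcome)
```

In a real setup the `stand_in` list would be replaced by the path to the harness EXE, with the verifier settings enabled before the runner starts and removed after it finishes.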

7.2. Split the Test Menu

In a failure-path test foundation, it is better not to do everything in one run. Splitting into roughly these three tracks keeps things readable.

  1. Normal path + Basics
    • Inject no failures
    • Confirm that no verifier stops occur
  2. Fault injection track
    • Low Resource Simulation
    • Target failures at event / file / heap_alloc / virtual_alloc, etc.
  3. Heap deep-dive track
    • Heaps
    • full page heap
    • Reproduce locally under the debugger

Splitting these keeps “is it broken under normal usage” and “does it only break under low resources” from getting tangled.

The presence or absence of fault injection in particular changes the code paths taken considerably. So you should run both the no-fault run and the with-fault run.

7.3. What to Collect

At minimum, you want to capture these.

| Category | What you want |
| --- | --- |
| App logs | cameraId, sessionId, phase, handleCount, error code |
| Process state | Handle Count, Private Bytes, Thread Count |
| Debugger info | !avrf, !htrace, and !heap -p -a as needed |
| Dumps | At verifier stops, or on abnormal termination |
| AppVerifier logs | Records of stops, exported to XML for aggregation if needed |

If needed, the AppVerifier-side logs can also be exported to XML and aggregated. But the cause rarely closes from those alone, so the practical premise is reading them side by side with your own logs.

A large volume of logs is not, in itself, a virtue. What matters is that the causality can be connected later.
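To make “causality can be connected later” concrete, here is a minimal sketch of a structured log line that carries the app-log fields from the table above. The function name, the JSON-lines layout, and the key spellings (for example `errorCode` and `resourceId`) are our own choices for illustration.

```python
import json
import time


def lifecycle_record(event, camera_id, session_id, phase, handle_count,
                     error_code=None, resource_id=None):
    """Build one JSON line with the fields needed to join app logs with debugger output."""
    record = {
        "ts": time.time(),          # wall-clock timestamp, for ordering against dumps
        "event": event,             # e.g. "open", "close", "failure"
        "cameraId": camera_id,
        "sessionId": session_id,
        "phase": phase,
        "handleCount": handle_count,
        "errorCode": error_code,    # None when the operation succeeded
        "resourceId": resource_id,  # None for events not tied to a single resource
    }
    return json.dumps(record, sort_keys=True)


line = lifecycle_record("failure", "cam-01", "s1", "save", 131,
                        error_code=8, resource_id="file-7")
print(line)
```

Because every line carries `sessionId` and `phase`, a verifier stop or dump taken at a known time can be matched to the operation that was in flight.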

7.4. Acceptance Criteria

“It didn’t crash” is too weak as an acceptance criterion. In this context, we needed at least the following.

  • No verifier stops in the normal path + Basics run
  • Even with fault injection, the expected failures remain in the logs
  • Half-initialized resources get cleaned up properly
  • After reconnect / retry, Handle Count returns near the baseline
  • When a verifier stop occurs, it can be traced via sessionId / phase / stack
  • No failure ends up as “no idea what happened”

What matters here is to evaluate “not breaking” and “being traceable when broken” as separate things.
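These criteria are easier to apply consistently when they are checked mechanically after each run. The sketch below is one possible shape, not a prescribed format: `RunResult`, its fields, and the tolerance of 5 handles are hypothetical and would be filled in from your own logs and counters.

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    track: str                       # "normal" or "fault"
    verifier_stops: int              # stops observed during the run
    baseline_handles: int            # Handle Count before the scenario
    final_handles: int               # Handle Count after reconnect / retry settled
    expected_failures: set = field(default_factory=set)  # failures the run should produce
    logged_failures: set = field(default_factory=set)    # failures found in the logs
    untraceable_failures: int = 0    # failures lacking sessionId / phase context


def evaluate(run, handle_tolerance=5):
    """Return the list of violated criteria; an empty list means the run is accepted."""
    problems = []
    if run.track == "normal" and run.verifier_stops > 0:
        problems.append("verifier stop on the normal path")
    missing = run.expected_failures - run.logged_failures
    if missing:
        problems.append(f"expected failures missing from logs: {sorted(missing)}")
    if run.final_handles - run.baseline_handles > handle_tolerance:
        problems.append("handle count did not return near baseline")
    if run.untraceable_failures:
        problems.append("failures without sessionId / phase context")
    return problems


clean_run = RunResult("normal", verifier_stops=0, baseline_handles=120, final_handles=121)
leaky_run = RunResult(
    "fault",
    verifier_stops=0,
    baseline_handles=120,
    final_handles=160,
    expected_failures={"event", "file"},
    logged_failures={"event"},
)
print(evaluate(clean_run))
print(evaluate(leaky_run))
```

Note that the fault-injection track is not failed merely for producing errors; it is failed when the errors are missing from the logs or when resources do not come back.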

7.5. Caveats

Application Verifier is quite convenient, but it is not magic.

  • Code paths not actually exercised are not verified
  • Full page heap is heavy
  • Stops can also occur inside third-party SDKs
  • The code paths taken differ considerably with and without fault injection
  • It is not a single tool for investigating pure managed heap leaks

So its position is this.

  • Long-run slopes: your own logs and counters
  • Native-boundary misuse: Application Verifier
  • Reconstructing causality on failure: structured logs + dumps + debugger

This division of labor is the most practical.

8. A Rough Decision Guide

  • Invalid handles or double closes are suspected
    • Handles + !htrace
  • Heap corruption / use-after-free is suspected
    • Heaps + full page heap + !heap -p -a
  • You want to trigger memory- or resource-exhaustion-like phenomena
    • Low Resource Simulation
  • Things break gradually under long-running operation
    • Start with your own Handle Count / Private Bytes / lifecycle logs
  • You want to test a DLL
    • Enable Application Verifier on the harness EXE that calls that DLL

Turning everything on from the start usually just produces a fog of logs. Applying the blade closest to the failure path you want to see is far clearer.

9. Summary

Application Verifier’s position is that of a runtime verifier for Windows’ native / Win32 boundary. Using Handles / Heaps / Locks / Memory / TLS / Low Resource Simulation and the rest, you can force rarely seen failure paths to be exercised ahead of time.

What paid off in this context was that handle anomalies became easy to trace with !htrace when they occurred, that memory- and resource-exhaustion-like phenomena could be triggered without wrecking the whole machine, and that we could confirm whether our own logs would genuinely be useful at that moment.

As for how to run it in practice: split the normal path + Basics run from the fault injection runs, prepare a harness EXE, and cycle scenarios through short-lived processes. On top of that, combine it with your own logs, dumps, and debugger information, while watching the slope of long-run leaks itself with your own counters — that is the division of labor.

Application Verifier is a tool for “going out to meet” rare anomalies, rather than “waiting around” for them to happen.

In equipment control apps, not breaking matters, but being able to explain what happened when things break matters just as much. In that sense, we think it is a thoroughly practical tool.

Part 1: When an Industrial Camera Control App Suddenly Crashes After One Month (Part 1) - Finding Handle Leaks and Designing Logs for Long-Running Operation

10. References


Author Profile


Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
