Designing Windows Apps to Leave Logs and Dumps When They Crash

Windows Development, Exception Handling, Logging, WER, Crash Dump, Bug Investigation

The most painful situation in Windows app bug investigation is knowing that the app crashed while nothing was left behind to say why.

The problem becomes especially heavy in projects like these:

  • It only crashes in the customer’s environment
  • It only crashes after long-duration operation
  • WPF / WinForms / Windows services / resident apps with low reproducibility
  • COM, P/Invoke, native DLLs, or vendor SDKs are involved
  • You captured “only the exception message,” with no context of what came right before

To be honest up front, though: the crashing process alone cannot “guarantee” that a log gets written. Once you include stack corruption, memory corruption, fast fail, forced termination, and power loss, the in-process final log is fundamentally best effort.

What you should aim for in practice is a configuration that does not pin its hopes solely on the inside of the crashing process. That is, you think in three layers:

  1. The regular, chronological log
  2. The final crash marker at the moment of the crash
  3. Crash evidence left by the OS or a separate process

In this article, with Windows desktop apps, resident apps, Windows services, and device-integration tools in mind, we organize the best practices for not losing investigability even when the app dies from an exception caused by a programming error.

1. The Conclusions First

The conclusions, listed up front.

  • The most important thing is to not bet “the last log line” on a single in-process handler.
  • The safest setup in practice is the combination regular log + final crash marker + WER LocalDumps.
  • For long-duration operation, device integration, plugins, or mixed native SDKs, adding a monitoring process (watchdog / launcher / service) makes things considerably stronger.
  • The iron rule in a crash handler is to do nothing heavy. Compression, HTTP transmission, DI resolution, UI dialogs, and complex JSON generation are out.
  • At crash time, write only a short record locally; compression, upload, and notification are deferred to the next startup or a separate process.
  • Using WinForms’s ThreadException or WPF’s DispatcherUnhandledException to keep the app superficially alive is dangerous when the cause is a programming error.
  • Whether .NET or native, for exceptions that suggest corrupted state, the safer baseline is “record and exit” rather than “recover.”
  • If you collect dumps, you must archive the PDBs and the shipped binaries at the same time, or you will not be able to read them later.

In short, the best practice is: “Don’t try to do everything at the moment of the crash. Divide the roles among before the crash, at the crash, and after the crash.”

2. Why In-Process Alone Cannot Be Made “Reliable”

Leave this vague and the design wobbles.

2.1 The Crashing Thread’s Own Context May Be Broken

Unhandled-exception hooks and top-level exception filters can run in the context of the broken thread. At that point, it is entirely normal that:

  • The stack is already unsafe
  • Heap corruption makes further allocation unsafe
  • Waiting deadlocks because of locks held when the exception was thrown
  • Objects the logger itself depends on are already broken

So it is safer to view the final handler not as “a place where anything is possible” but as “a place where very little is possible.”

2.2 Fast Fail and Corrupted-State Exceptions Assume “Minimal In-Process Activity”

Under memory corruption or fatal conditions, do not count on normal exception handling. In particular, the native __fastfail mechanism is designed to “terminate immediately with as little overhead as possible,” and anomalies suggesting corrupted state deserve the same treatment.

In other words, the natural mindset is: a final in-process log is a bonus if it gets written; the primary evidence lives on the OS / separate-process side.

2.3 .NET’s Unhandled-Exception Event Is Not a Place for “Heavy Recovery” Either

.NET’s AppDomain.UnhandledException is useful, but what you may safely do there should be limited to a short record.

  • It can be affected by locks held when the exception was thrown
  • It is not guaranteed to fire for every failure; corrupted-state exceptions such as access violations or stack overflows may never reach it
  • Forcing a continuation policy here makes it easy to keep a half-broken process alive

It is realistic to treat the unhandled-exception event as “the final notification,” not as “a safe recovery point.”

The cleanest way to organize this is to separate what happens at crash time from what happens after restart.

Phase | Goal | Where it runs | What it does
Normal operation | Preserve the timeline | Inside the app | Structured logging, heartbeat, boundary events
At crash time | Drop minimal evidence | Inside the app + OS | Final crash marker, WER dump
Just after exit | Detect unexpected exit | Separate process | Record exit code, decide on restart, notify
After next startup | Heavy post-processing | A fresh, healthy process | Compression, upload, user notification, old-log cleanup

With this split, the design becomes considerably more stable.

3.1 Minimal Configuration

For smaller business tools or internal WPF / WinForms apps, this much is often enough to start.

  • Regular log: a local append-only file
  • Final crash marker: a dedicated short file
  • Dump: WER LocalDumps
  • At next startup: show “The application terminated abnormally last time. Diagnostic information is available.”
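
One simple way for the next startup to notice that the previous run ended abnormally is a sentinel file: create it at startup, delete it on clean shutdown, and treat a leftover as evidence of a crash or kill. The sketch below is in Python for brevity, and the same shape carries over to C# or C++. The file name `running.marker` and both function names are hypothetical, chosen only for illustration.

```python
import os
from typing import Optional

SENTINEL_NAME = "running.marker"  # hypothetical file name


def begin_session(state_dir: str, session_id: str) -> Optional[str]:
    """Return the previous session ID if the last run did not exit cleanly."""
    os.makedirs(state_dir, exist_ok=True)
    path = os.path.join(state_dir, SENTINEL_NAME)
    previous = None
    if os.path.exists(path):
        # A leftover sentinel means the last run never reached end_session().
        with open(path, "r", encoding="utf-8") as f:
            previous = f.read().strip() or "unknown"
    with open(path, "w", encoding="utf-8") as f:
        f.write(session_id)
        f.flush()
        os.fsync(f.fileno())  # make sure the sentinel itself survives a crash
    return previous


def end_session(state_dir: str) -> None:
    """Remove the sentinel on a clean shutdown."""
    try:
        os.remove(os.path.join(state_dir, SENTINEL_NAME))
    except FileNotFoundError:
        pass
```

If `begin_session` returns a session ID, the app can show the notice and point the user at the log and fatal-marker files for that session.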

3.2 Stronger Configuration

It is worth going one level stronger under requirements like these.

  • 24/7 operation
  • Device control, monitoring, resident operation
  • Lots of COM / P/Invoke / native SDKs
  • Child processes, plugins, or script execution
  • “Stuck and staying stuck” is unacceptable in the customer environment

In that case, a split like the following holds up well in practice:

  • Worker process: the main workload
  • Launcher / watchdog / service: startup supervision, exit recording, restart
  • WER LocalDumps: on the worker side
  • Next startup or the watchdog: diagnostic-information collection

4. Best Practices for the Regular Log

If you try to fight with only the last line at crash time, you usually lose. What really pays off is the regular log up to the moment before.

4.1 Logs Are “Information That Correlates Later,” Not “Prose for Humans”

The minimum items you want in the regular log:

  • UTC timestamp
  • Elapsed time since process start
  • PID / TID
  • App name, version, build number, commit identifier
  • Session ID
  • Operation ID / job ID / correlation ID
  • Module name / screen name / worker name
  • The most recent external effects
    • File writes
    • DB updates
    • Device commands sent
    • Communication requests
  • Exception type, HRESULT / Win32 error / exception code
  • A summary of the key input parameters
  • Target IDs, to the extent they contain no secrets

Our recommendation is one event per line, in JSON Lines or key=value format.

Rather than leaving long prose for humans, what matters more is “being able to cross-reference three files later.”
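
As a rough illustration of the one-event-per-line idea, the sketch below builds a JSON Lines record that carries the correlation fields listed above. It is written in Python for brevity. The field names (`ts`, `elapsed_ms`, `op`, and so on), the `APP_VERSION` value, and the function name `format_event` are assumptions for illustration, not a prescribed schema.

```python
import json
import os
import threading
import time
import uuid
from datetime import datetime, timezone

PROCESS_START = time.monotonic()
SESSION_ID = uuid.uuid4().hex[:8]   # one ID per launch
APP_VERSION = "1.2.3+build.456"     # hypothetical; normally injected by the build


def format_event(event: str, op_id: str = "", **fields) -> str:
    """Build one JSON Lines record carrying the correlation fields."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "elapsed_ms": int((time.monotonic() - PROCESS_START) * 1000),
        "pid": os.getpid(),
        "tid": threading.get_native_id(),
        "ver": APP_VERSION,
        "session": SESSION_ID,
        "op": op_id,
        "event": event,
    }
    record.update(fields)
    # json.dumps escapes embedded newlines, so the record stays on one line.
    return json.dumps(record, default=str)
```

A call such as `format_event("ExternalCommandSent", op_id="job-42", device="cam1")` yields a single line that can later be joined against the fatal marker and the dump by PID and session.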

4.2 Write Critical Events Synchronously

Making every regular log write synchronous gets heavy. But entrusting everything to an asynchronous buffer means it all evaporates at the moment of the crash.

So in practice it is realistic to vary the handling by level.

  • Fine-grained Information events: buffering is fine
  • Warning and above: flush early
  • Important boundary events: write synchronously
    • ProcessStart
    • ConfigLoaded
    • WorkerStarted
    • ExternalCommandSent
    • TransactionCommitted
    • RecoveryStarted
    • FatalPathEntered

The point is: at least the business-level boundaries must reliably reach the disk.
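
The level-based policy can be sketched roughly as follows, in Python for brevity. The class name `TieredLog` and its return values are hypothetical. The sketch assumes a single writer thread and leaves out rotation and error handling.

```python
import os

BOUNDARY_EVENTS = {
    "ProcessStart", "ConfigLoaded", "WorkerStarted", "ExternalCommandSent",
    "TransactionCommitted", "RecoveryStarted", "FatalPathEntered",
}
LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40}


class TieredLog:
    """Append-only log that varies durability by level and event name."""

    def __init__(self, path: str):
        # Lines stay in the process buffer until flush() is called.
        self._f = open(path, "a", encoding="utf-8", buffering=64 * 1024)

    def write(self, level: str, event: str, line: str) -> str:
        self._f.write(line.rstrip("\n") + "\n")
        if event in BOUNDARY_EVENTS:
            self._f.flush()
            os.fsync(self._f.fileno())   # ask the OS to push it to the device
            return "synced"
        if LEVELS.get(level, 0) >= LEVELS["WARNING"]:
            self._f.flush()              # hand it to the OS, without fsync
            return "flushed"
        return "buffered"

    def close(self) -> None:
        self._f.close()
```

The design choice here is that the caller does not decide durability ad hoc: the event name and level decide it, so a boundary event cannot accidentally end up only in a buffer.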

4.3 Separate “the Regular Log Being Written Now” from “the Final Crash Marker”

This matters a great deal.

If you try to put everything into a single rolling log, any of the following can cost you the last line:

  • The log was mid-rotation
  • The line was still sitting in the async queue
  • The logger itself died right after the exception
  • The log line was cut off mid-write

So we recommend splitting into at least two files.

  • app-<session>.jsonl: the regular chronological log
  • fatal-last.log or fatal-<session>.log: dedicated to the final crash marker

Just having “where the last line goes” be unambiguous helps enormously in the field.

4.4 Log Destination: Fixed Local Path, Never a Network Target

Relying on UNC paths, NAS, HTTP, or cloud APIs at crash time is dangerous, because of:

  • Momentary network outages
  • DNS delays
  • Expired credentials
  • Blocking on the UI thread
  • Insufficient service-account permissions

At crash time, drop to a fixed local path first. Sending happens after the next startup or from a separate process.

4.5 Put the Session in the File Name

A date alone is not enough, because the app restarts multiple times on the same day.

For example:

Logs\
  MyApp_20260318_101530_pid1234_session-4f1c.jsonl
  MyApp_fatal_20260318_101533_pid1234_session-4f1c.log
  MyApp_watchdog_20260318.jsonl

Just making “which launch instance is this about” unambiguous changes the speed of analysis considerably.
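
A small helper can generate names in the style of the listing above. This is a Python sketch with a hypothetical function name. Unlike the listing, it stamps the fatal file with the launch time rather than the crash time, so that the path can be computed and verified at startup.

```python
from datetime import datetime, timezone


def session_file_names(app: str, started_utc: datetime, pid: int, session: str) -> dict:
    """Build per-launch file names that embed timestamp, PID, and session."""
    stamp = started_utc.strftime("%Y%m%d_%H%M%S")
    day = started_utc.strftime("%Y%m%d")
    return {
        "log": f"{app}_{stamp}_pid{pid}_session-{session}.jsonl",
        "fatal": f"{app}_fatal_{stamp}_pid{pid}_session-{session}.log",
        "watchdog": f"{app}_watchdog_{day}.jsonl",
    }
```

Calling it once at startup with `datetime.now(timezone.utc)` and `os.getpid()` gives every later component the same names to write to and look for.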

5. Best Practices for the Final Crash Marker

This is not the place to build a full-featured logger. It is the place to write once, briefly, leaning toward reliability.

5.1 The Goal Is “Pinning the Entry Point,” Not “Cause Details”

The information in the final crash marker is stronger when narrowed down.

  • UTC of occurrence
  • PID / TID
  • Session ID
  • Version / build number
  • Which hook it came from
    • AppDomain.UnhandledException
    • Application.ThreadException
    • DispatcherUnhandledException
    • SetUnhandledExceptionFilter
    • _set_invalid_parameter_handler
    • set_terminate
  • Exception type or exception code
  • A short message, if possible
  • The most recent operation ID
  • The regular log’s file name
  • The expected dump folder

That is enough.

5.2 What Not to Do in a Crash Handler

Each of these is very likely to be a land mine.

  • Resolving the logger from a DI container
  • Using async / await
  • Spawning Tasks
  • Waiting on locks
  • Assembling complex JSON
  • Touching COM objects
  • Showing UI dialogs
  • Compressing
  • HTTP / SMTP / Slack / Teams transmission
  • Analyzing and summarizing the dump
  • Swallowing the exception and continuing

A crash handler is not a continuation of the normal processing flow. Lean toward “do the minimal local write, then end.”

5.3 What to Do in a Crash Handler

Conversely, what to do is quite simple.

  1. Prevent reentry
  2. Write one line
  3. Flush
  4. Exit

In that order.

Ideally, use:

  • A dedicated folder created in advance
  • A path whose existence was verified in advance
  • A destination whose ACLs were verified in advance

Over-flushing the regular log is heavy, but the fatal marker is extremely low-volume, so a hard flush is fine here. Use FileStream.Flush(true) in .NET or FlushFileBuffers in native code. The design becomes easier when this one line is treated as “this reaches the disk right now.”
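
The four steps can be sketched as follows. The example uses Python's `sys.excepthook` purely as a stand-in for the .NET and native hooks discussed in this article, and all function names are hypothetical. A C# or C++ version would additionally avoid allocating inside the handler, which Python cannot really do.

```python
import os
import sys
import threading
import time

_entered = threading.Lock()   # only ever acquired non-blocking
_fatal_path = None            # resolved and verified at startup


def prepare_fatal_marker(folder: str, file_name: str) -> None:
    """Run at startup: create the folder and fix the path ahead of time."""
    global _fatal_path
    os.makedirs(folder, exist_ok=True)
    _fatal_path = os.path.join(folder, file_name)


def write_fatal_marker(hook: str, exc_type: str, message: str) -> bool:
    """Steps 1-3: prevent reentry, write one line, flush. False on reentry."""
    if not _entered.acquire(blocking=False):
        return False
    line = "utc=%s pid=%d tid=%d hook=%s type=%s msg=%s\n" % (
        time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        os.getpid(), threading.get_native_id(), hook, exc_type,
        message.replace("\r", " ").replace("\n", " ")[:200],
    )
    fd = os.open(_fatal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode("utf-8", "replace"))
        os.fsync(fd)          # same intent as Flush(true) / FlushFileBuffers
    finally:
        os.close(fd)
    return True


def install(exit_code: int = 70) -> None:
    """Route unhandled exceptions to the marker, then exit immediately."""
    def hook(exc_type, exc, tb):
        write_fatal_marker("sys.excepthook", exc_type.__name__, str(exc))
        os._exit(exit_code)   # step 4: exit without running normal cleanup
    sys.excepthook = hook
```

Note that the handler bypasses the regular logger entirely: it opens its own file descriptor on a path that was prepared in advance, writes once, and leaves.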

5.4 Do Not Try to Keep the App Alive

For unexpected exceptions originating from programming errors, it is safer to consider the final handler a recording device, not a recovery device.

The cases where “do not continue” should be the baseline:

  • Even a NullReferenceException or InvalidOperationException that occurred mid-update of shared state
  • Unexpected exceptions on the UI thread
  • Unexpected exceptions that leaked out of monitoring or parent loops
  • AccessViolationException
  • StackOverflowException
  • Anomalies at the native boundary
  • CRT invalid parameter / purecall / terminate

The desire to “not let it crash” is understandable, but surviving half-broken is usually worse for both diagnosis and operations.

When terminating, consider immediate-termination APIs (Environment.FailFast in .NET, RaiseFailFastException or __fastfail in native code), and design without counting on finally blocks or normal cleanup.

6. Framework-Specific Notes

6.1 .NET in General: AppDomain.CurrentDomain.UnhandledException

This is useful as the final notification. But avoid heavy recovery work here.

The basic usage is simple.

  • Write the final crash marker
  • If needed, leave a minimal message in the Windows Event Log
  • Do not continue
  • Do not wait or retry here

UnhandledException is convenient, but it is safer not to assume the app can be returned to a healthy state from here.

6.2 WinForms: Application.ThreadException

The tricky part here is that it catches unhandled UI-thread exceptions and lets the app continue, at least superficially.

Using it to turn expected business-input errors into dialogs is one thing, but it is not suited to continuing after unexpected exceptions caused by programming errors.

If root-cause investigation is your priority:

  • Do only minimal recording in ThreadException
  • Or lean toward UnhandledExceptionMode.ThrowException
  • Then terminate the process, leaving the dump and the logs

That is safer.

6.3 WPF: Application.DispatcherUnhandledException

WPF is similar.

  • It primarily targets exceptions on the UI thread
  • Setting Handled = true allows superficial continuation
  • But do that against a programming error and the screen state and internal state drift apart easily

So in WPF too, it is safer to use it as an entry point for recording, not as a life-support device for continuation.

6.4 Do Not Make TaskScheduler.UnobservedTaskException a Primary Path

This is not “the last bastion just before the crash.”

It can help detect dropped Task exceptions, but as a reliable recording path at crash time, it is weak.

So use it for:

  • Catching unobserved exceptions early
  • Flushing out Task design omissions during development

but do not cast it as the lead role of your final crash handler.

6.5 Native Win32 / C++: Do Not Over-Trust SetUnhandledExceptionFilter

On the native side, it is tempting to count on SetUnhandledExceptionFilter.

However, it runs in the context of the faulting thread, so it is affected by:

  • An invalid stack
  • Deep recursion
  • An already-corrupted heap
  • Locks held at exception time

Therefore, SetUnhandledExceptionFilter is best regarded as a best-effort entry point for receiving the final notification.

6.6 In Native C++, Also Catch the CRT Termination Paths

In native C++, watching only unhandled SEH leaves gaps.

Specifically, keep an eye on:

  • _set_invalid_parameter_handler
  • _set_purecall_handler
  • set_terminate

This family exists to catch “termination paths” originating in the C runtime and the C++ runtime.

In practice, the sound approach is:

  • Write the final crash marker in these handlers too
  • But do no heavy recovery work
  • Terminate reliably
  • Leave the primary evidence to WER / the dump

7. Build on WER LocalDumps

This part is quite powerful in practice.

7.1 The First Recommendation Is WER LocalDumps

In the sense of “leaving minimal evidence, leaning reliable, after the crash,” WER LocalDumps is the easiest thing to start with.

The reasons are simple.

  • The OS side can persist the dump
  • Easy to roll out with no extra tools
  • Configurable per application
  • It moves the primary crash evidence outside the in-process world

What logs alone cannot tell you:

  • Which thread crashed
  • On which stack it crashed
  • Which module boundary it was
  • Whether managed / native / COM / SDK is the suspect

Being able to see this afterward is its strength.

7.2 Typical Configuration

For example, to keep dumps for MyApp.exe in C:\CrashDumps\MyApp:

reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\MyApp.exe" /f
reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\MyApp.exe" /v DumpFolder /t REG_EXPAND_SZ /d "C:\CrashDumps\MyApp" /f
reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\MyApp.exe" /v DumpCount /t REG_DWORD /d 10 /f
reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\MyApp.exe" /v DumpType /t REG_DWORD /d 2 /f

Rough-and-ready settings like this are fine at first.

Value | Initial recommendation
DumpFolder | A dedicated folder
DumpCount | 5-10
DumpType | 2 (full dump) on development machines; 1 (minidump) or 2 in the field, depending on disk space and confidentiality requirements

7.3 Always Verify the ACL of the Dump Destination

Same as with logs: configuring a folder you cannot write to is pointless.

Destination ACLs are the main cause of dumps that never appear, especially with:

  • Windows services
  • Privilege-separated child processes
  • Restricted accounts on field machines
  • UAC involvement

For the destination, verify all the way through:

  • Pre-creation
  • A write test
  • Retention limits
  • Whether operations staff can actually go look there
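
The pre-creation and write-test steps can be automated with a small startup check. The Python sketch below uses a hypothetical function name. To be meaningful, it has to run under the same account the application or service actually runs as, because a test from an administrator console proves little.

```python
import os
import uuid


def verify_writable(folder: str) -> tuple:
    """Pre-create the folder and prove a file can be created, synced, and deleted."""
    probe = os.path.join(folder, f".write-test-{uuid.uuid4().hex}.tmp")
    try:
        os.makedirs(folder, exist_ok=True)
        with open(probe, "wb") as f:
            f.write(b"probe")
            f.flush()
            os.fsync(f.fileno())
        os.remove(probe)
        return True, ""
    except OSError as e:
        # Return the reason so it can go into the regular log at startup.
        return False, f"{type(e).__name__}: {e}"
```

Logging the result at ProcessStart for both the log folder and the dump folder turns “the dump never appeared” into something visible before the first crash.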

7.4 When You Want to Attach the Current Log to a WER Report

If you use WER reporting to Microsoft or your own WER pipeline, you can also call WerRegisterFile to register the current log file for inclusion in the error report.

However, treat this as an additional channel, not a replacement for local storage. What you want first at crash time is a dependable record on the local machine.

The practical order is:

  1. Local regular log
  2. Local fatal marker
  3. Local dump
  4. If needed, also register related files in the WER submission path

7.5 Keep the Build Artifacts, Not Just Dumps

Even with a dump in hand, if later:

  • The EXE / DLLs from that build are gone
  • The PDBs are gone
  • Nobody knows which commit the build came from

then you are in a weak position.

At minimum, keep:

  • The shipped binaries
  • The corresponding PDBs
  • The version
  • The build timestamp
  • The commit identifier
  • The installer version

Dump collection and PDB archiving are a set.

8. How to Think About MiniDumpWriteDump and Custom Crash Reporters

There are situations where a custom implementation is needed.

  • You want a “Save diagnostic information” button in the UI
  • You want to bundle logs and configuration files too
  • You want to handle a group of child processes together
  • You want custom masking before automatic upload

The most important thing here, though, is to not pile even the dump-taking work onto the crashing side.

8.1 A Separate Process Beats Self-Dump

MiniDumpWriteDump is powerful, but calling it from a separate process is safer than calling it from inside the crashed process itself.

A typical configuration looks like this.

  • The worker detects the anomaly
  • If possible, it notifies a helper via an event or named pipe
  • The helper takes a dump of the worker
  • The helper bundles the tail of the logs and the configuration files
  • The helper places everything in an upload queue after exit

This way, even if the worker is broken, the helper side is still healthy.

8.2 If It Must Be In-Process, Push It onto a Dedicated Thread

Even when a separate process is not an option, keeping a dedicated thread reserved for dumping is at least better.

Still, the essence remains best effort. “We added a custom dump implementation, so we’re 100% safe” does not follow.

8.3 Defer Heavy Work to the Next Startup

Things people tend to want in a custom reporter:

  • Zip compression
  • Matching against symbol information
  • Server upload
  • Screen capture
  • Fetching extra information from the DB

All of these go after restart or to the helper side, not crash time.

9. What Changes When You Add a Monitoring Process

For long-duration operation, a monitoring process pays off heavily.

9.1 What the Monitoring Process Records

The watchdog / launcher / parent service can record:

  • Child process start time
  • Launch arguments
  • PID
  • Version of the monitored target
  • Time of the last heartbeat received
  • Exit time
  • Exit code
  • Restart count
  • Whether a dump exists
  • Whether a restart happened

With just this, you can see fairly clearly:

  • Whether it truly crashed
  • Whether it was an OS shutdown
  • Whether the user closed it
  • Whether it hung and got killed
  • How many times it looped through restarts
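
A minimal supervising loop that records several of these items might look like the sketch below. It is in Python for brevity, and the function name, record fields, and restart policy are assumptions for illustration. Heartbeat tracking and dump detection are left out.

```python
import json
import subprocess
from datetime import datetime, timezone


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat(timespec="milliseconds")


def run_supervised(args, log_path, max_restarts=3):
    """Run a child process, logging start and exit; restart on nonzero exit."""
    records = []

    def emit(log, rec):
        records.append(rec)
        log.write(json.dumps(rec) + "\n")
        log.flush()  # keep the watchdog log current even if the watchdog dies

    restarts = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while True:
            child = subprocess.Popen(args)
            emit(log, {"event": "ChildStarted", "ts": _utc_now(),
                       "pid": child.pid, "args": list(args),
                       "restart_count": restarts})
            exit_code = child.wait()
            will_restart = exit_code != 0 and restarts < max_restarts
            emit(log, {"event": "ChildExited", "ts": _utc_now(),
                       "pid": child.pid, "exit_code": exit_code,
                       "restarted": will_restart})
            if not will_restart:
                return records
            restarts += 1
```

Because the start record is written before waiting on the child, the watchdog log still shows which PID was running even if the watchdog itself is killed mid-run.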

9.2 Cases Where It Especially Fits

Separation is worth actively considering in cases like:

  • A worker carrying a vendor SDK
  • Image processing / video processing / device I/O
  • Monitoring or polling parent loops
  • Script or plugin execution
  • Hosting legacy COM / ActiveX assets
  • 64-bit / 32-bit bridging and interop

Confining dangerous work to a single worker makes both log design and recovery design easier.

10. Common Anti-Patterns

10.1 catch (Exception) That Logs and Continues

The most common, and the most dangerous.

  • Partial changes remain
  • Shared state gets corrupted
  • Follow-on failures multiply
  • The true point of origin gets blurred

You gain one log line, and in exchange the incident often drags on.

10.2 Trusting Only the Async Logger’s Queue

Asynchronous logging itself is not bad. The problem is pushing onto the same queue even on the fatal path and calling it done.

If the worker stops at the moment of the crash, the whole queue goes with it.

It is safer to keep an escape hatch where the fatal path alone writes directly.

10.3 Sending HTTP from the Crash Handler

Tempting to implement, and quite dangerous.

  • DNS
  • TLS
  • Proxy
  • Authentication
  • Timeouts
  • Retry waits

All of it rides on the crashed context.

Send after restart.

10.4 Dumps Exist, but Do Not Connect to the Regular Log

This is common.

  • No session in the dump file name
  • No PID / session on the log side
  • No PID on the watchdog side
  • Build numbers do not match

The result: the three pieces of evidence look like three unrelated stories.
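
When the PID is present in every file name, tying the evidence back together becomes a mechanical step. The Python sketch below groups file names by PID. It assumes log names in the session-stamped style used earlier in this article and dump names of the form `<image>.<pid>.dmp`. That dump pattern is an assumption to verify against the files WER actually produces on your machines. Because Windows reuses PIDs, a real tool would also compare timestamps.

```python
import re

# Assumed dump naming: "<image>.<pid>.dmp" (verify against real WER output).
DUMP_RE = re.compile(r"^(?P<image>.+)\.(?P<pid>\d+)\.dmp$", re.IGNORECASE)
# Log naming from this article: "..._pid<pid>_session-<id>.jsonl" or ".log".
LOG_RE = re.compile(r"_pid(?P<pid>\d+)_session-(?P<session>[0-9a-f]+)\.(jsonl|log)$")


def correlate(file_names: list) -> dict:
    """Group file names by PID so dump, log, and fatal marker line up."""
    by_pid = {}
    for name in file_names:
        m = DUMP_RE.match(name) or LOG_RE.search(name)
        if not m:
            continue  # e.g. the per-day watchdog log carries no PID in its name
        by_pid.setdefault(int(m.group("pid")), []).append(name)
    return by_pid
```

A PID that has a dump but no log or fatal marker, or the reverse, is itself a finding worth recording.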

10.5 Keeping WinForms / WPF Alive via the Unhandled-Exception Events

Superficially the app “stops crashing,” so at first everyone is pleased. But in reality, it tends to create a zombie state where:

  • Only the screen survives
  • The worker is dead
  • Only the button stays enabled
  • Nobody knows whether the save succeeded

10.6 Not Watching the Native Termination Paths

If SetUnhandledExceptionFilter alone makes you feel safe, you will miss:

  • invalid parameter
  • purecall
  • terminate
  • fast fail

In native C++, it is better to stay conscious of the CRT / C++ runtime termination paths, not just SEH.

11. The Minimum Adoption Checklist

If you satisfy the following, you are in good shape for practical use.

  • The regular log records one event per line
  • Every log line has UTC, PID, TID, version, and session
  • ProcessStart and ProcessExit are recorded
  • Important boundary events are flushed synchronously
  • A dedicated final-crash-marker file exists
  • The fatal path does not go through the async logger
  • WER LocalDumps is configured per application
  • The dump destination’s ACLs have been verified
  • PDBs and shipped binaries are archived
  • The next startup can detect the previous abnormal exit
  • Compression / upload / notification happen after restart or in a separate process
  • In native C++, invalid parameter / purecall / terminate are accounted for
  • You crashed it deliberately on a test machine and confirmed the evidence is really left behind

The last item is especially important. Designing it is not enough: you must actually run the “does it really capture everything” test.

12. How Far to Test

The items worth verifying, in a table.

Test | What to confirm
Managed unhandled exception | Do the regular log, fatal marker, and dump all show up?
UI thread exception | Does the WinForms / WPF event path behave as designed?
Worker thread exception | Does it reach AppDomain.UnhandledException? Can the watchdog detect it?
Native exception | Is the WER dump really captured?
invalid parameter / terminate | Is minimal recording left even on CRT / C++ runtime paths?
Forced kill | Even if in-process can do nothing, does the watchdog record the unexpected exit?
Restart | Do notification, collection, and upload work after the next startup?

What matters is not “a log should appear if an exception flies,” but confirming “under this condition, this file is left behind.”

13. Summary

If you want a Windows app to leave the information needed for investigation even when it dies from an exception caused by a programming error, the axes of thinking are quite simple.

  • Do not pin your hopes on the crashing process alone
  • Split into the regular log, the final crash marker, and OS / separate-process evidence
  • At crash time, write only a short record locally
  • Defer heavy work to after restart or to a separate process
  • Build on WER LocalDumps
  • Default to record-and-exit rather than continuation

In the end, “building a configuration that stays traceable even without the last line” beats “trying hard at the last line.”

Still, you do want that last line, so write the final crash marker briefly, to a separate file, and entrust the true primary evidence to the WER dump and the regular log up to the moment before. That is a very stable way of working in real-world Windows application practice.

References

  • Microsoft Learn: Collecting User-Mode Dumps https://learn.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps
  • Microsoft Learn: Using WER https://learn.microsoft.com/en-us/windows/win32/wer/using-wer
  • Microsoft Learn: MiniDumpWriteDump function https://learn.microsoft.com/en-us/windows/win32/api/minidumpapiset/nf-minidumpapiset-minidumpwritedump
  • Microsoft Learn: SetUnhandledExceptionFilter function https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-setunhandledexceptionfilter
  • Microsoft Learn: System.AppDomain.UnhandledException event https://learn.microsoft.com/en-us/dotnet/fundamentals/runtime-libraries/system-appdomain-unhandledexception
  • Microsoft Learn: Application.ThreadException Event https://learn.microsoft.com/en-us/dotnet/api/system.windows.forms.application.threadexception
  • Microsoft Learn: Application.DispatcherUnhandledException Event https://learn.microsoft.com/en-us/dotnet/api/system.windows.application.dispatcherunhandledexception
  • Microsoft Learn: TaskScheduler.UnobservedTaskException Event https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler.unobservedtaskexception
  • Microsoft Learn: Environment.FailFast https://learn.microsoft.com/en-us/dotnet/api/system.environment.failfast
  • Microsoft Learn: Registering for Application Recovery https://learn.microsoft.com/en-us/windows/win32/recovery/registering-for-application-recovery
  • Microsoft Learn: RegisterApplicationRecoveryCallback https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-registerapplicationrecoverycallback
  • Microsoft Learn: WerRegisterFile https://learn.microsoft.com/en-us/windows/win32/api/werapi/nf-werapi-werregisterfile
  • Microsoft Learn: _set_invalid_parameter_handler https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/set-invalid-parameter-handler-set-thread-local-invalid-parameter-handler
  • Microsoft Learn: _set_purecall_handler https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/get-purecall-handler-set-purecall-handler
  • Microsoft Learn: set_terminate https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/set-terminate-crt
  • Microsoft Learn: __fastfail https://learn.microsoft.com/en-us/cpp/intrinsics/fastfail

Author Profile

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
