A Decision Table for Whether to Exit or Continue After an Unexpected Exception
· Go Komura · Windows Development, Exception Handling, Design, C# / .NET, Reliability
When the topic of unexpected exceptions comes up, it is tempting to frame it as a binary choice: crash, or catch and keep going. In practice, though, that framing is a little crude.
What you really want to know is whether you can contain the range of what may have been corrupted.
- Can you fail just that one operation and stop there?
- Is it enough to reinitialize just that screen / connection / worker?
- Or is the integrity of the entire process now in question?
Looking at it in that order makes things much easier to sort out.
In this article, assuming C# / .NET Windows apps, resident apps, Windows services, and device-integration tools, we put together a decision table for the conditions under which it is acceptable to continue after an unexpected exception, and the conditions under which it is better to exit.
1. The Conclusion First
- Swallowing everything with `catch (Exception)` and carrying on is dangerous in most cases.
- Continuing is acceptable only when three things hold together: you can discard the failed unit, you can restore shared state, and you can account for external side effects.
- If the processing boundary is clear—one UI operation, one input record, one job—continuation is sometimes possible.
- Conversely, if shared mutable state, resident loops, the main thread, startup code, native boundaries, or signs of memory corruption are involved, lean toward exiting.
- Exceptions that call the health of the entire process into question (`StackOverflowException`, `AccessViolationException`, `OutOfMemoryException`) are safer not to treat as something you can continue from.
- WPF and Windows Forms do offer ways to catch unhandled exceptions and appear to keep running, but being able to continue and being safe to continue are different things.
- For long-running services and monitoring apps, crashing and being restarted is often safer—and easier to diagnose—than limping along half-broken.
In short, the axis of the decision is whether you can restore your invariants.
2. What “Unexpected Exception” Means in This Article
2.1 Separating Expected from Unexpected
First, a rare exception and an unexpected exception are not the same thing.
For example, these can be treated as expected even if they are infrequent:
- The user selected a file that does not exist
- The remote endpoint timed out temporarily
- One row of an imported CSV was malformed
- An `OperationCanceledException` was thrown by a cancel operation
- A business-rule violation should fail just that one operation
These are the kind of failures whose handling can be decided up front in the design.
By contrast, the unexpected exceptions this article mainly deals with look like this:
- An assumption in your own code broke and a `NullReferenceException` or `InvalidOperationException` was thrown
- An exception flew out mid-update of shared state, and it is unclear how much was applied
- The parent loop of a monitoring loop or message-processing loop died
- Something went wrong at a COM / P/Invoke / vendor SDK boundary
- The health of the process itself is in doubt, as with `AccessViolationException` or `StackOverflowException`
In other words, these are cases where “after this exception, you no longer know whether the app’s state can still be trusted.”
2.2 It Looks Like Two Choices, But There Are Really Three
What makes this discussion confusing is treating “continue” as a single option.
In practice, it usually breaks down into three levels.
| Choice | Meaning |
|---|---|
| Fail only that operation and continue | Keep the screen, but treat just this save or import as failed |
| Stop only the subsystem and continue | Reinitialize only the connection, screen, worker, or child process |
| Exit the process | The extent of state corruption cannot be determined, so assume a restart |
Saying “the app continues” covers two very different things: carrying on as if nothing happened, and continuing after isolating the broken part.
3. The Decision Table to Look at First
3.1 The Big Picture
Start with this table and the general direction is usually settled.
| Situation | First choice | Reason |
|---|---|---|
| Only one input, one screen operation, or one job failed, and its state can be discarded | Lean toward continuing | The failed unit can be contained |
| After the exception, the affected object or connection can be disposed and recreated | Lean toward subsystem reinitialization | The damaged area can be localized |
| Shared state was partially updated and it is unclear how much was applied | Lean toward exiting | Invariants may have been broken |
| External side effects—DB / files / device commands—are half-done and you cannot account for duplicates or missing writes | Lean toward exiting | Consistency with the outside world cannot be determined |
| The monitoring loop, reconnection loop, or parent message-processing loop died from an unexpected exception | Lean toward exiting | Silently losing part of the functionality tends to create a zombie process |
| Startup, configuration loading, DI composition, or initialization of a required dependency failed | Lean toward exiting as a startup failure | Starting half-initialized is more dangerous |
| `AccessViolationException`, `StackOverflowException`, a severe `OutOfMemoryException`, or signs of corruption on the native side | Lean toward immediate exit | The health of the entire process is in question |
| The dangerous work is isolated in a separate process and the parent process is untouched | Parent continues, restart the child | The fault domain is already isolated |
flowchart TD
A["Unexpected exception"] --> B{"Signs of memory corruption / stack exhaustion / fatal resource exhaustion?"}
B -- "Yes" --> Z["Exit / FailFast / restart"]
B -- "No" --> C{"Can the failed unit be discarded?"}
C -- "No" --> Y["Lean toward exiting"]
C -- "Yes" --> D{"Can shared state be rolled back / reinitialized?"}
D -- "No" --> X["Stop the subsystem or exit"]
D -- "Yes" --> E{"Can external side effects be accounted for?"}
E -- "No" --> X
E -- "Yes" --> W["Continue, failing only that operation"]
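The flowchart can also be written down as a small function, which makes the order of the questions explicit and easy to test. This is a minimal sketch, not a prescribed implementation: `Recommendation`, `FailureAssessment`, and `ExitOrContinue` are hypothetical names introduced for illustration, and the four booleans are answers the caller has to supply from its own knowledge of the failure.

```csharp
using System;

// Hypothetical types for illustration; none of these come from a library.
public enum Recommendation
{
    ContinueFailingOperation,
    StopSubsystemOrExit,
    LeanTowardExit,
    ExitImmediately
}

// Answers to the four questions in the flowchart, supplied by the caller.
public sealed record FailureAssessment(
    bool ProcessHealthSuspect,      // memory corruption, stack exhaustion, fatal resource exhaustion
    bool FailedUnitDiscardable,     // can the failed operation or job be thrown away?
    bool SharedStateRestorable,     // can shared state be rolled back or reinitialized?
    bool SideEffectsAccountedFor);  // do we know what reached the DB, files, or device?

public static class ExitOrContinue
{
    // Walks the questions in the same order as the flowchart.
    public static Recommendation Decide(FailureAssessment a)
    {
        if (a.ProcessHealthSuspect) return Recommendation.ExitImmediately;
        if (!a.FailedUnitDiscardable) return Recommendation.LeanTowardExit;
        if (!a.SharedStateRestorable) return Recommendation.StopSubsystemOrExit;
        if (!a.SideEffectsAccountedFor) return Recommendation.StopSubsystemOrExit;
        return Recommendation.ContinueFailingOperation;
    }
}
```

The function only encodes the ordering. The hard part in practice is answering each boolean honestly at the catch site.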
3.2 What to Check Before the Exception Type
It is better not to decide on the exception type alone. These are the things to check first.
| Aspect | What to confirm |
|---|---|
| Where it happened | A UI event, a single job, a parent loop, startup code, or a native boundary |
| How far it got | Whether in-memory state, the DB, files, or device state changed partway through |
| Possible blast radius | Just that object, the whole screen, or the whole process |
| Rollback possible? | Can it be disposed and recreated, or rolled back with a transaction |
| External side effects | Sent or not sent, whether double execution is safe, whether compensation is possible |
| Monitoring / restart | Whether there is automatic restart or a recovery path after exiting |
3.3 High-Risk Exceptions
You do not need to go through every exception type in detail, but for some of them continuation should not be the starting assumption.
| Exception / symptom | First choice | Why it matters |
|---|---|---|
| `StackOverflowException` | Lean toward immediate exit | The call stack has collapsed; normal recovery cannot be assumed |
| `AccessViolationException` | Lean toward immediate exit | Illegal access to protected memory; native boundaries or memory corruption are suspect |
| `OutOfMemoryException` | Lean toward exiting | Recovery code that itself needs further allocations tends to be unstable |
| Unexpected `NullReferenceException` / `InvalidOperationException` | Context-dependent, but lean toward exiting | Your own assumptions broke, and partial changes may remain |
| An unexpected exception that escaped a parent loop | Lean toward exiting | The core of the feature is dead while the process risks staying alive |
| Failures originating in COM / P/Invoke / vendor SDK callbacks | Immediate exit, or at least lean strongly toward exiting | Safety is hard to judge from the managed side alone |
4. Deciding by Where It Happened
4.1 UI Events
UI events such as a button click, screen navigation, search, or file selection have relatively large room for continuation. There are conditions, however.
Continuation is easier in cases like these:
- The failure happened before loading, and business state has not been touched yet
- Only transient state inside a dialog is broken, and closing it discards everything
- The ViewModel or connection can be recreated after the exception
- You can honestly tell the user “this operation failed”
Conversely, you should lean toward exiting once things look like this:
- Both the screen and the domain state were partially updated
- Shared state visible to other screens—static / singleton / caches—was touched
- After the exception, button enablement or selection state is left over and consistency is unclear
- An unexpected exception occurred on the UI thread, and it is unclear how far rendering or notifications progressed
4.2 Jobs / Requests Processed One at a Time
This is a boundary where continuation is easy.
- One message
- One file
- One HTTP request
- One import job
- One batch item
If units like these are well defined, you can fail just that one item and move on to the next.
There are prerequisites, though:
- The unit of failure is clear from the outside
- Partial changes are tidied up by transactions or compensation
- Running the same processing again does not corrupt the result
- Failures can be routed to a quarantine queue or error log
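A per-item boundary of this kind might look like the sketch below. It assumes a hypothetical `BatchRunner` helper, and it assumes that `FormatException`, `FileNotFoundException`, and `TimeoutException` are the failures this particular importer was designed for; your own list will differ. Expected failures are quarantined together with the item, and anything else is deliberately left to propagate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchRunner
{
    // Assumption: these are the failures this hypothetical importer has designed for.
    private static bool IsExpected(Exception ex) =>
        ex is FormatException or FileNotFoundException or TimeoutException;

    // Processes items one at a time. Expected failures are quarantined and the
    // batch moves on; anything else propagates to the caller untouched.
    public static List<(T Item, Exception Error)> Run<T>(
        IEnumerable<T> items, Action<T> processOne)
    {
        var quarantined = new List<(T Item, Exception Error)>();
        foreach (var item in items)
        {
            try
            {
                processOne(item);
            }
            catch (Exception ex) when (IsExpected(ex))
            {
                // The failed unit is exactly one item; record it and continue.
                quarantined.Add((item, ex));
            }
        }
        return quarantined;
    }
}
```

The exception filter (`when`) keeps the catch narrow, so an unexpected `NullReferenceException` inside `processOne` is not mistaken for a bad input row. The sketch does not handle partial changes; `processOne` is still responsible for its own transaction or compensation.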
4.3 Resident Loops / Monitoring / Queue Processing
This is the worst place to continue carelessly.
For example:
- Reconnection loops
- Monitoring loops
- Queue-consumption loops
- Periodic polling
- Device status monitoring
- Background processing in a tray app
The scary failure mode here is that the parent loop dies from a single unexpected exception while the process alone survives.
Here it pays to split the policy:
- Catch expected exceptions at the boundary of each item’s processing
- If an unexpected exception escapes the parent loop, lean toward terminating the process
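One way to express this split is sketched below. `Worker.RunLoop` and its `terminate` parameter are hypothetical names introduced here. The sketch assumes that `handleOne` already catches the per-item failures it expects, so whatever reaches the outer catch is unexpected by definition. `Environment.FailFast` is a real .NET API: it terminates the process immediately, without running pending `finally` blocks or finalizers.

```csharp
using System;
using System.Collections.Generic;

public static class Worker
{
    // Runs the parent loop. `handleOne` is expected to catch per-item expected
    // failures itself; whatever still escapes is treated as unexpected.
    // `terminate` is injectable so the policy can be exercised in tests;
    // by default it ends the process via Environment.FailFast.
    public static void RunLoop<T>(
        IEnumerable<T> source,
        Action<T> handleOne,
        Action<string, Exception>? terminate = null)
    {
        terminate ??= (message, ex) => Environment.FailFast(message, ex);
        try
        {
            foreach (var item in source)
            {
                handleOne(item);
            }
        }
        catch (Exception ex)
        {
            // Do not swallow, and do not leave a process alive without its loop.
            terminate("Parent loop died from an unexpected exception", ex);
            throw; // Reached only if a test-supplied terminate returns.
        }
    }
}
```

Whether to call `FailFast` directly or to let the exception go unhandled so that the crash is reported at the original throw site is a design choice. The point of the sketch is only that the loop never dies silently.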
4.4 Startup
Treating a startup failure as “start up anyway and figure it out later” almost always ends in tears.
- Required configuration cannot be read
- Version migration failed
- A required folder or certificate is missing
- Initialization of a core service failed
- The dependency configuration is broken
In cases like these, exiting as a startup failure is the clearer choice.
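A startup path that fails cleanly could be sketched like this. `Startup.Run`, the step names, and the exit code value `2` are assumptions made for illustration; use whatever exit code your service manager or launcher actually interprets.

```csharp
using System;
using System.Collections.Generic;

public static class Startup
{
    public const int ExitOk = 0;
    public const int ExitStartupFailure = 2; // Hypothetical value; choose one your launcher understands.

    // Runs every required initialization step before the app starts real work.
    // Any failure is reported and turned into a nonzero exit code, so the app
    // never starts half-initialized.
    public static int Run(IEnumerable<(string Name, Action Step)> requiredSteps, Action runApp)
    {
        foreach (var (name, step) in requiredSteps)
        {
            try
            {
                step();
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"Startup failed at '{name}': {ex}");
                return ExitStartupFailure;
            }
        }
        runApp();
        return ExitOk;
    }
}
```

A `Main` method would return the result of `Startup.Run(...)` directly. Catching `Exception` is acceptable here because the only thing done with it is to record it and exit.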
4.5 Native Boundaries / COM / P/Invoke / unsafe
This area deserves its own category and a somewhat stricter eye.
- COM
- P/Invoke
- Code beyond C++/CLI
- Vendor SDKs
- Native-side code coming back through callbacks
- Anything involving `unsafe`
Lean toward exiting especially when you see any of these:
- `AccessViolationException`
- Symptoms suggesting heap corruption or a double free
- Handle anomalies, signs of use-after-free
- Sudden death at a callback boundary
5. Conditions Under Which Continuing Is Acceptable
Summarized, the conditions under which continuation is acceptable look like this. The premise is that most of them hold at the same time.
| Condition | Meaning |
|---|---|
| The unit of failure is clear | You know what to discard: one operation, one screen, one job, one connection |
| State can be discarded | It can be disposed and recreated, or treated as never applied |
| Shared state is protected | The contamination does not spread to other features |
| External side effects can be accounted for | You know whether it was sent / not sent / safe to resend |
| You can be honest with the user | You can display “this operation failed” |
| It is observable | Logs, metrics, and dumps allow follow-up investigation |
6. Conditions Under Which Exiting Is Better
Conversely, if any of these apply, lean toward exiting.
- You do not know what was changed partway through
- Shared mutable state was touched and consistency cannot be determined
- Lifetime management of locks, queues, threads, or monitoring loops is broken
- Duplicated / missing / half-done external side effects cannot be accounted for
- Startup or initialization of core infrastructure failed
- Native boundaries or memory corruption are suspect
At this level, engineering for an easy recovery after crashing beats engineering a graceful continuation.
7. Recommendations by Typical Pattern
| Pattern | Recommendation | Reason |
|---|---|---|
| A nonexistent path was specified via the file-open button | Continue, failing only that operation | The state damage is local |
| Only one row of a CSV import was malformed | Continue with one row failed or one file failed | The unit of failure is easy to contain |
| An unexpected `NullReferenceException` occurred midway through saving a screen | Recreate the screen, leaning toward exit | It is unclear how much of the ViewModel / business state changed |
| One queue message violated a business rule | Continue, failing only that message | It can be routed to a quarantine queue |
| The parent queue-consumption loop died from an unexpected exception | Lean toward exiting the process | The lifetime of the entire worker is broken |
| Required configuration cannot be read at startup | Exit as a startup failure | A half-initialized start is more dangerous |
| An `AccessViolationException` around a vendor SDK callback | Lean toward immediate exit | The possibility of memory corruption cannot be ignored |
| Only a non-essential telemetry send failed | Disable just that feature and continue | The fault domain can be separated from the main functionality |
8. Common Anti-Patterns
8.1 catch (Exception) That Just Logs and Continues
This is quite dangerous. It hides the cause while keeping the broken state alive.
8.2 Trying to Recover in the Last-Chance Unhandled-Exception Handler
`AppDomain.UnhandledException`, `Application.ThreadException`, `DispatcherUnhandledException`, and the like are useful as the last place to record what happened, but they are not magic recovery points.
8.3 Casually Retrying When External Side Effects Are Involved
If you retry device commands, email sends, billing, file moves, or DB updates without re-execution safety, double-execution incidents become the new headline.
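One way to keep retries honest is to make the caller state whether re-execution is safe, as in the sketch below. `SafeRetry.Execute` and its parameters are hypothetical, and treating only `TimeoutException` as transient is an assumption for illustration.

```csharp
using System;

public static class SafeRetry
{
    // Retries `operation` only when the caller declares it safe to re-execute.
    // Operations whose side effects cannot be repeated run exactly once.
    public static T Execute<T>(Func<T> operation, bool safeToReexecute, int maxAttempts = 3)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        int allowedAttempts = safeToReexecute ? maxAttempts : 1;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (TimeoutException) when (attempt < allowedAttempts)
            {
                // Transient and declared re-executable: try again.
            }
        }
    }
}
```

A read-only query can pass `safeToReexecute: true`. A billing call or device command should pass `false` unless the receiving side deduplicates requests, for example through an idempotency key.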
8.4 Keeping the UI Alive After the Monitoring Loop Died
An app that looks alive but is doing no work is worse than one that has visibly stopped, because nobody notices that the work is no longer being done.
8.5 Saying “We Don’t Want It to Crash” Without Designing for Crashes
If you do not want it to crash, there are things to put in place first.
- Automatic restart
- Session restore
- Saving intermediate results
- Re-execution safety
- Fault-domain isolation
9. Points to Sort Out at Implementation Time
9.1 Push catch Sites to Boundaries
Rather than catching everything in deep layers, it is easier to keep things organized by catching at places where a unit of failure can be defined, such as:
- UI operation boundaries
- Per-request boundaries
- Per-job boundaries
- Per-connection boundaries
- The process boundary
9.2 Separate Expected from Unexpected Exceptions
- Expected: validation, not found, timeout, cancel, business-rule violations
- Unexpected: broken assumptions, escapes from parent loops, native-boundary failures, signs of memory corruption
9.3 Keep Shared State Small
The larger your shared mutable state, the harder the continuation decision becomes. Conversely, the more you can confine state inside one screen, one session, one worker, the easier it is to confine failures as well.
9.4 Move Dangerous Work to a Separate Process
For anything where you do not want a crash to spread—COM / ActiveX / vendor SDKs / unsafe code / heavy image processing / external device control—putting it in a separate process pays off considerably.
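On the parent side, the isolation can be as simple as launching the child, waiting for its exit code, and restarting it within a budget. The sketch below uses the real `System.Diagnostics.Process` API. `ChildSupervisor`, the restart budget, and the convention that exit code `0` means success are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;

public static class ChildSupervisor
{
    // Launches the child executable and waits for it; returns its exit code.
    // The executable path and arguments are supplied by the caller.
    public static int RunChildProcess(string fileName, string arguments)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments) { UseShellExecute = false };
        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Child process could not be started.");
        process.WaitForExit();
        return process.ExitCode;
    }

    // Keeps the parent alive: reruns the child until it exits cleanly or the
    // restart budget is used up. Returns true if the child finished with code 0.
    public static bool Supervise(Func<int> runChild, int maxRestarts)
    {
        for (int run = 0; run <= maxRestarts; run++)
        {
            if (runChild() == 0) return true;
        }
        return false;
    }
}
```

A caller would combine the two, for example `ChildSupervisor.Supervise(() => ChildSupervisor.RunChildProcess(path, args), maxRestarts: 3)`, where `path` and `args` are placeholders. A crash in the child, including an access violation inside a vendor SDK, then shows up in the parent only as a nonzero exit code.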
9.5 Unhandled-Exception Handlers Are for “Recording,” Not “Recovery”
- Exception details
- The operation context
- The last important log entries
- Configuration / version / connection targets
- A path to collecting dumps
Getting these in place and prioritizing a setup where you can investigate after the crash leads to better stability in the end.
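A record-only handler might be sketched as follows. `CrashRecorder`, the crash file path, and the `getContext` callback are hypothetical; `AppDomain.CurrentDomain.UnhandledException` and `File.AppendAllText` are real .NET APIs. The handler writes what it can and makes no attempt to keep the process alive.

```csharp
using System;
using System.IO;

public static class CrashRecorder
{
    // Builds the text of a final crash record. Kept separate from I/O so it stays simple.
    public static string Format(Exception ex, string operationContext, string appVersion) =>
        $"time={DateTime.UtcNow:O}{Environment.NewLine}" +
        $"version={appVersion}{Environment.NewLine}" +
        $"context={operationContext}{Environment.NewLine}" +
        $"exception={ex}{Environment.NewLine}";

    // Registers a last-chance handler that records and does nothing else.
    // `getContext` is a hypothetical callback returning the current operation name.
    public static void Install(string crashFilePath, string appVersion, Func<string> getContext)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            try
            {
                var ex = e.ExceptionObject as Exception
                         ?? new Exception(e.ExceptionObject?.ToString() ?? "unknown");
                File.AppendAllText(crashFilePath, Format(ex, getContext(), appVersion));
            }
            catch
            {
                // Recording is best-effort; never throw from the last-chance handler.
            }
        };
    }
}
```

Dump collection is intentionally left out of the sketch, since it is usually configured outside the process rather than in the handler.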
9.6 Do Not Over-Trust the WPF / WinForms Unhandled-Exception Events
In WPF, setting `Handled = true` in `DispatcherUnhandledException` does let you keep running after an unhandled exception.
In Windows Forms, on the main UI thread, `Application.ThreadException` and the `SetUnhandledExceptionMode` setting let you choose whether UI-thread exceptions are routed to your handler or left unhandled so that the app terminates.
But whether you can keep running and whether the conditions for recovery are met are separate questions.
10. Summary
When an unexpected exception occurs, the question to ask is not “can this exception be caught” but whether the app’s state can still be trusted afterward.
As a decision sequence, this is usually enough:
- Can the failed unit be discarded?
- Can shared state be restored or recreated?
- Can external side effects be accounted for?
- Can the health of memory / threads / native boundaries be trusted?
If you are confident in all four, you can continue. If you are not, lean toward exiting.
Especially for long-running apps, monitoring apps, services, and device integration, there are plenty of situations where staying alive broken is more dangerous than crashing honestly.
Exception handling is not the art of never crashing. It is designing so that failures stay small, the app stops honestly when broken, and recovery is easy.
11. References
- .NET: Best practices for exceptions
- .NET: System.Exception
- .NET: StackOverflowException
- .NET: System.AccessViolationException
- .NET: Environment.FailFast
- .NET: AppDomain.UnhandledException
- WPF: Application.DispatcherUnhandledException
- Windows Forms: Application.SetUnhandledExceptionMode
- .NET: Exceptions in Managed Threads