A Decision Table for Whether to Exit or Continue After an Unexpected Exception

Windows Development, Exception Handling, Design, C# / .NET, Reliability

When the topic of unexpected exceptions comes up, it is tempting to frame it as a binary choice: crash, or catch and keep going. In practice, though, that framing is a little crude.

What you really want to know is whether you can contain the range of what may have been corrupted.

  • Can you fail just that one operation and stop there?
  • Is it enough to reinitialize just that screen / connection / worker?
  • Or is the integrity of the entire process now in question?

Looking at it in that order makes things much easier to sort out.

In this article, assuming C# / .NET Windows apps, resident apps, Windows services, and device-integration tools, we put together a decision table for the conditions under which it is acceptable to continue after an unexpected exception, and the conditions under which it is better to exit.

1. The Conclusion First

  • Swallowing everything with catch (Exception) and carrying on is dangerous in most cases.
  • Continuing is acceptable only when three things hold together: you can discard the failed unit, you can restore shared state, and you can account for external side effects.
  • If the processing boundary is clear (one UI operation, one input record, one job), continuation is sometimes possible.
  • Conversely, if shared mutable state, resident loops, the main thread, startup code, native boundaries, or signs of memory corruption are involved, lean toward exiting.
  • Exceptions that call the health of the entire process into question, such as StackOverflowException, AccessViolationException, and OutOfMemoryException, are best treated as something you cannot continue from.
  • WPF and Windows Forms do offer ways to catch unhandled exceptions and appear to keep running, but being able to continue and being safe to continue are different things.
  • For long-running services and monitoring apps, crashing and being restarted is often safer, and easier to diagnose, than limping along half-broken.

In short, the axis of the decision is whether you can restore your invariants.

2. What “Unexpected Exception” Means in This Article

2.1 Separating Expected from Unexpected

First, a rare exception and an unexpected exception are not the same thing.

For example, these can be treated as expected even if they are infrequent:

  • The user selected a file that does not exist
  • The remote endpoint timed out temporarily
  • One row of an imported CSV was malformed
  • An OperationCanceledException was thrown by a cancel operation
  • A business-rule violation that should fail just that one operation

These are the kind of failures whose handling can be decided up front in the design.

By contrast, the unexpected exceptions this article mainly deals with look like this:

  • An assumption in your own code broke and a NullReferenceException or InvalidOperationException was thrown
  • An exception flew out mid-update of shared state, and it is unclear how much was applied
  • The parent loop of a monitoring loop or message-processing loop died
  • Something went wrong at a COM / P/Invoke / vendor SDK boundary
  • The health of the process itself is in doubt, as with AccessViolationException or StackOverflowException

In other words, these are cases where “after this exception, you no longer know whether the app’s state can still be trusted.”

2.2 It Looks Like Two Choices, But There Are Really Three

The culprit that makes this discussion confusing is treating “continue” as a single option.

In practice, it usually breaks down into three levels.

| Choice | Meaning |
| --- | --- |
| Fail only that operation and continue | Keep the screen, but treat just this save or import as failed |
| Stop only the subsystem and continue | Reinitialize only the connection, screen, worker, or child process |
| Exit the process | The extent of state corruption cannot be determined, so assume a restart |

Saying “the app continues” covers two very different things: carrying on as if nothing happened, and continuing after isolating the broken part.

3. The Decision Table to Look at First

3.1 The Big Picture

Start with this table and the general direction is usually settled.

| Situation | First choice | Reason |
| --- | --- | --- |
| Only one input, one screen operation, or one job failed, and its state can be discarded | Lean toward continuing | The failed unit can be contained |
| After the exception, the affected object or connection can be disposed and recreated | Lean toward subsystem reinitialization | The damaged area can be localized |
| Shared state was partially updated and it is unclear how much was applied | Lean toward exiting | Invariants may have been broken |
| External side effects (DB / files / device commands) are half-done and you cannot account for duplicates or missing writes | Lean toward exiting | Consistency with the outside world cannot be determined |
| The monitoring loop, reconnection loop, or parent message-processing loop died from an unexpected exception | Lean toward exiting | Silently losing part of the functionality tends to create a zombie process |
| Startup, configuration loading, DI composition, or initialization of a required dependency failed | Lean toward exiting as a startup failure | Starting half-initialized is more dangerous |
| AccessViolationException, StackOverflowException, a severe OutOfMemoryException, or signs of corruption on the native side | Lean toward immediate exit | The health of the entire process is in question |
| The dangerous work is isolated in a separate process and the parent process is untouched | Parent continues, restart the child | The fault domain is already isolated |

Decision flow for an unexpected exception:

  1. Signs of memory corruption / stack exhaustion / fatal resource exhaustion? Yes: exit / FailFast / restart. No: go to 2.
  2. Can the failed unit be discarded? No: lean toward exiting. Yes: go to 3.
  3. Can shared state be rolled back / reinitialized? No: stop the subsystem or exit. Yes: go to 4.
  4. Can external side effects be accounted for? No: lean toward exiting. Yes: continue, failing only that operation.
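
The decision flow in this section can be written down as a small function. This is only a sketch: the names Recovery, FailureAssessment, and ContinuationPolicy are hypothetical, and sending "side effects cannot be accounted for" to the exit-leaning branch is an assumption based on the table row about half-done side effects. The hard part in practice is answering the four questions honestly, not encoding them.

```csharp
using System;

// Hypothetical names for illustration; none of these types exist in .NET.
public enum Recovery
{
    FailOperationAndContinue,
    StopSubsystemOrExit,
    LeanTowardExit,
    ExitImmediately
}

public sealed record FailureAssessment(
    bool ProcessHealthInDoubt,          // memory corruption, stack exhaustion, fatal resource exhaustion
    bool FailedUnitCanBeDiscarded,
    bool SharedStateCanBeRestored,
    bool SideEffectsCanBeAccountedFor);

public static class ContinuationPolicy
{
    // Asks the questions in the same order as the decision flow.
    public static Recovery Decide(FailureAssessment a)
    {
        if (a.ProcessHealthInDoubt) return Recovery.ExitImmediately;
        if (!a.FailedUnitCanBeDiscarded) return Recovery.LeanTowardExit;
        if (!a.SharedStateCanBeRestored) return Recovery.StopSubsystemOrExit;
        // Assumption: unaccountable side effects lean toward exiting.
        if (!a.SideEffectsCanBeAccountedFor) return Recovery.LeanTowardExit;
        return Recovery.FailOperationAndContinue;
    }
}
```

Keeping the answers in one record also gives you something concrete to write to the log alongside the exception.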

3.2 What to Check Before the Exception Type

It is better not to decide on the exception type alone. These are the things to check first.

| Aspect | What to confirm |
| --- | --- |
| Where it happened | A UI event, a single job, a parent loop, startup code, or a native boundary |
| How far it got | Whether in-memory state, the DB, files, or device state changed partway through |
| Possible blast radius | Just that object, the whole screen, or the whole process |
| Rollback possible? | Can it be disposed and recreated, or rolled back with a transaction |
| External side effects | Sent or not sent, whether double execution is safe, whether compensation is possible |
| Monitoring / restart | Whether there is automatic restart or a recovery path after exiting |

3.3 High-Risk Exceptions

You do not need to go through every exception type in detail, but some should never be viewed with continuation in mind.

| Exception / symptom | First choice | Why it matters |
| --- | --- | --- |
| StackOverflowException | Lean toward immediate exit | The call stack is exhausted; .NET does not let you catch it with try/catch and terminates the process by default |
| AccessViolationException | Lean toward immediate exit | Illegal access to protected memory; native boundaries or memory corruption are suspect |
| OutOfMemoryException | Lean toward exiting | Recovery code that itself needs further allocations tends to be unstable |
| Unexpected NullReferenceException / InvalidOperationException | Context-dependent, but lean toward exiting | Your own assumptions broke, and partial changes may remain |
| An unexpected exception that escaped a parent loop | Lean toward exiting | The core of the feature is dead while the process risks staying alive |
| Failures originating in COM / P/Invoke / vendor SDK callbacks | Strongly lean toward exiting, up to immediate exit | Safety is hard to judge from the managed side alone |

4. Deciding by Where It Happened

4.1 UI Events

UI events such as a button click, screen navigation, search, or file selection have relatively large room for continuation. There are conditions, however.

Continuation is easier in cases like these:

  • The failure happened before loading, and business state has not been touched yet
  • Only transient state inside a dialog is broken, and closing it discards everything
  • The ViewModel or connection can be recreated after the exception
  • You can honestly tell the user “this operation failed”

Conversely, you should lean toward exiting once things look like this:

  • Both the screen and the domain state were partially updated
  • Shared state visible to other screens (static fields, singletons, caches) was touched
  • After the exception, button enablement or selection state is left over and consistency is unclear
  • An unexpected exception occurred on the UI thread, and it is unclear how far rendering or notifications progressed

4.2 Jobs / Requests Processed One at a Time

This is a boundary where continuation is easy.

  • One message
  • One file
  • One HTTP request
  • One import job
  • One batch item

If units like these are well defined, you can fail just that one item and move on to the next.

There are prerequisites, though:

  • The unit of failure is clear from the outside
  • Partial changes are tidied up by transactions or compensation
  • Running the same processing again does not corrupt the result
  • Failures can be routed to a quarantine queue or error log
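
As a rough sketch of this boundary, the loop below fails one item at a time and sets it aside, while letting anything unexpected propagate. BusinessRuleException and BatchRunner are hypothetical names, and which exception types count as "expected" is an assumption you would replace with your own list. The returned list stands in for a quarantine queue or error log.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical exception type for an expected, per-item failure.
public sealed class BusinessRuleException : Exception
{
    public BusinessRuleException(string message) : base(message) { }
}

public static class BatchRunner
{
    // Processes items one at a time. Expected failures are quarantined per item;
    // anything else is not caught here and propagates to the caller.
    public static List<(T Item, Exception Error)> Run<T>(IEnumerable<T> items, Action<T> handle)
    {
        var quarantined = new List<(T Item, Exception Error)>();
        foreach (var item in items)
        {
            try
            {
                handle(item);
            }
            catch (Exception ex) when (ex is BusinessRuleException or FormatException or TimeoutException)
            {
                // The unit of failure is this one item; record it and move on.
                quarantined.Add((item, ex));
            }
        }
        return quarantined;
    }
}
```

For example, `BatchRunner.Run(new[] { "1", "x", "3" }, s => int.Parse(s))` quarantines only "x". This is safe only if `handle` leaves no partial changes behind when it throws, which is exactly what the prerequisites above ask for.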

4.3 Resident Loops / Monitoring / Queue Processing

This is the worst place to continue carelessly.

For example:

  • Reconnection loops
  • Monitoring loops
  • Queue-consumption loops
  • Periodic polling
  • Device status monitoring
  • Background processing in a tray app

The scary failure mode here is that the parent loop dies from a single unexpected exception while the process alone survives.

Here it pays to split the policy:

  • Catch expected exceptions at the boundary of each item’s processing
  • If an unexpected exception escapes the parent loop, lean toward terminating the process
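
A minimal sketch of that split might look like the following. ParentLoop and the onFatal parameter are hypothetical, and treating TimeoutException as the only expected failure is an assumption for brevity. In production you could pass Environment.FailFast as onFatal; it terminates the process without running finally blocks or finalizers, so anything that must be saved has to be saved before that point.

```csharp
using System;
using System.Collections.Generic;

public static class ParentLoop
{
    // onFatal is injected so the policy can be exercised without killing the process.
    public static void Run(
        IEnumerable<string> messages,
        Action<string> handle,
        Action<string, Exception> onFatal)
    {
        try
        {
            foreach (var message in messages)
            {
                try
                {
                    handle(message);
                }
                catch (TimeoutException ex)
                {
                    // Expected: fail this one item and keep the loop alive.
                    Console.Error.WriteLine($"Skipped '{message}': {ex.Message}");
                }
            }
        }
        catch (Exception ex)
        {
            // Unexpected: do not leave a process running whose core loop has died.
            onFatal("Parent loop died from an unexpected exception", ex);
            throw; // Only reached if onFatal returns, for example in a test.
        }
    }
}
```

Usage would be along the lines of `ParentLoop.Run(queue, Handle, Environment.FailFast);`, paired with a service recovery setting or supervisor that restarts the process.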

4.4 Startup

Treating a startup failure as “start up anyway and figure it out later” almost always ends in tears.

  • Required configuration cannot be read
  • Version migration failed
  • A required folder or certificate is missing
  • Initialization of a core service failed
  • The dependency configuration is broken

In cases like these, exiting as a startup failure is the clearer choice.
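
One way to make "exit as a startup failure" concrete is to have startup return an exit code before any of the app is running. The sketch below is an assumption-laden example: AppStartup, the exit codes 2 and 3, and the idea that the configuration is a single text file are all hypothetical.

```csharp
using System;
using System.IO;

public static class AppStartup
{
    // Returns a process exit code: 0 if the app was started, nonzero on startup failure.
    public static int Run(string configPath, Action<string> startApp)
    {
        string config;
        try
        {
            config = File.ReadAllText(configPath);
        }
        catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
        {
            // Required configuration cannot be read: report it and do not start.
            Console.Error.WriteLine($"Startup failed: cannot read '{configPath}': {ex.Message}");
            return 2;
        }

        if (string.IsNullOrWhiteSpace(config))
        {
            Console.Error.WriteLine($"Startup failed: '{configPath}' is empty.");
            return 3;
        }

        startApp(config);
        return 0;
    }
}
```

From Main this could be used as `return AppStartup.Run("settings.json", RunApplication);`, where RunApplication is your own entry point. The point is that nothing half-initialized is left running when the code is nonzero.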

4.5 Native Boundaries / COM / P/Invoke / unsafe

This area deserves its own category and a somewhat stricter eye.

  • COM
  • P/Invoke
  • Code beyond C++/CLI
  • Vendor SDKs
  • Native-side code coming back through callbacks
  • Anything involving unsafe

Lean toward exiting especially when you see any of these:

  • AccessViolationException
  • Symptoms suggesting heap corruption or a double free
  • Handle anomalies, signs of use-after-free
  • Sudden death at a callback boundary

5. Conditions Under Which Continuing Is Acceptable

Summarized, the conditions under which continuation is acceptable look like this. The premise is that most of them hold at the same time.

| Condition | Meaning |
| --- | --- |
| The unit of failure is clear | You know what to discard: one operation, one screen, one job, one connection |
| State can be discarded | It can be disposed and recreated, or treated as never applied |
| Shared state is protected | The contamination does not spread to other features |
| External side effects can be accounted for | You know whether it was sent / not sent / safe to resend |
| You can be honest with the user | You can display “this operation failed” |
| It is observable | Logs, metrics, and dumps allow follow-up investigation |

6. Conditions Under Which Exiting Is Better

Conversely, if any of these apply, lean toward exiting.

  • You do not know what was changed partway through
  • Shared mutable state was touched and consistency cannot be determined
  • Lifetime management of locks, queues, threads, or monitoring loops is broken
  • Duplicated / missing / half-done external side effects cannot be accounted for
  • Startup or initialization of core infrastructure failed
  • Native boundaries or memory corruption are suspect

At this level, engineering for an easy recovery after crashing beats engineering a graceful continuation.

7. Recommendations by Typical Pattern

| Pattern | Recommendation | Reason |
| --- | --- | --- |
| A nonexistent path was specified via the file-open button | Continue, failing only that operation | The state damage is local |
| Only one row of a CSV import was malformed | Continue with one row failed or one file failed | The unit of failure is easy to contain |
| An unexpected NullReferenceException occurred midway through saving a screen | Recreate the screen, leaning toward exit | It is unclear how much of the ViewModel / business state changed |
| One queue message violated a business rule | Continue, failing only that message | It can be routed to a quarantine queue |
| The parent queue-consumption loop died from an unexpected exception | Lean toward exiting the process | The lifetime of the entire worker is broken |
| Required configuration cannot be read at startup | Exit as a startup failure | A half-initialized start is more dangerous |
| An AccessViolationException around a vendor SDK callback | Lean toward immediate exit | The possibility of memory corruption cannot be ignored |
| Only a non-essential telemetry send failed | Disable just that feature and continue | The fault domain can be separated from the main functionality |

8. Common Anti-Patterns

8.1 catch (Exception) That Just Logs and Continues

This is quite dangerous. It hides the cause while keeping the broken state alive.

8.2 Trying to Recover in the Last-Chance Unhandled-Exception Handler

AppDomain.UnhandledException, Application.ThreadException, DispatcherUnhandledException, and the like are useful as the place to record things last, but they are not magic recovery points.

8.3 Casually Retrying When External Side Effects Are Involved

If you retry device commands, email sends, billing, file moves, or DB updates without re-execution safety, double-execution incidents become the new headline.
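
Re-execution safety usually means giving each side effect an identity and refusing to run it twice. The class below is a deliberately small sketch of that idea: IdempotentExecutor and the operation ID scheme are hypothetical, the record of completed IDs lives only in memory, and it is not thread-safe. A real system would persist the ID together with the side effect itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: remembers which operation IDs have completed.
public sealed class IdempotentExecutor
{
    private readonly HashSet<string> _completed = new();

    // Runs the side effect at most once per operation ID.
    // Returns true if it ran now, false if this ID had already completed.
    public bool ExecuteOnce(string operationId, Action sideEffect)
    {
        if (_completed.Contains(operationId)) return false;
        sideEffect();                  // If this throws, the ID is not recorded.
        _completed.Add(operationId);
        return true;
    }
}
```

Note what this does not solve: if the side effect throws after it has partly taken effect, the ID is unrecorded but the outside world may already have changed. That case is the "cannot be accounted for" situation, and a blind retry is still unsafe there.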

8.4 Keeping the UI Alive After the Monitoring Loop Died

An app that looks alive but is doing no work is a serious nuisance.

8.5 Saying “We Don’t Want It to Crash” Without Designing for Crashes

If you do not want it to crash, there are things to put in place first.

  • Automatic restart
  • Session restore
  • Saving intermediate results
  • Re-execution safety
  • Fault-domain isolation

9. Points to Sort Out at Implementation Time

9.1 Push catch Sites to Boundaries

Rather than catching everything in deep layers, it is easier to keep things organized by catching at places where a unit of failure can be defined, such as:

  • UI operation boundaries
  • Per-request boundaries
  • Per-job boundaries
  • Per-connection boundaries
  • The process boundary

9.2 Separate Expected from Unexpected Exceptions

  • Expected: validation, not found, timeout, cancel, business-rule violations
  • Unexpected: broken assumptions, escapes from parent loops, native-boundary failures, signs of memory corruption

9.3 Keep Shared State Small

The larger your shared mutable state, the harder the continuation decision becomes. Conversely, the more you can confine state inside one screen, one session, one worker, the easier it is to confine failures as well.
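
One technique that helps when some shared state is unavoidable is to build the new state on a private copy and publish it with a single reference swap, so an exception mid-update leaves the shared version untouched. The sketch below assumes a hypothetical PriceTable; it also assumes callers never keep or mutate the draft after Update returns, and that concurrent updaters are acceptable as "last writer wins".

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical shared state that is replaced wholesale instead of edited in place.
public sealed class PriceTable
{
    private Dictionary<string, decimal> _current = new();

    // Readers get a snapshot that is never modified after publication.
    public IReadOnlyDictionary<string, decimal> Current => Volatile.Read(ref _current);

    public void Update(Action<Dictionary<string, decimal>> mutate)
    {
        // Work on a private copy; the shared snapshot is not touched yet.
        var draft = new Dictionary<string, decimal>(Volatile.Read(ref _current));
        mutate(draft);                        // May throw: the draft is simply discarded.
        Volatile.Write(ref _current, draft);  // Publish only a fully built state.
    }
}
```

If `mutate` throws halfway through, readers still see the previous snapshot, which turns "it is unclear how much was applied" into "nothing was applied".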

9.4 Move Dangerous Work to a Separate Process

For anything where you do not want a crash to spread (COM, ActiveX, vendor SDKs, unsafe code, heavy image processing, external device control), putting it in a separate process pays off considerably.
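
On the parent side, this usually comes with a small supervisor that restarts the child within some budget. The following is a sketch under assumptions: ChildSupervisor is a hypothetical name, exit code 0 is taken to mean "finished normally", and there is no backoff between restarts, which a real supervisor would likely want.

```csharp
using System;
using System.Diagnostics;

public static class ChildSupervisor
{
    // Starts the child process and blocks until it exits; returns its exit code.
    public static int RunChildProcess(string fileName, string arguments)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments) { UseShellExecute = false };
        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException($"Could not start '{fileName}'.");
        process.WaitForExit();
        return process.ExitCode;
    }

    // Runs the child until it exits with code 0 or the restart budget is used up.
    // Returns the number of restarts performed, or -1 if the budget was exhausted.
    public static int Supervise(Func<int> runChild, int maxRestarts)
    {
        for (int restarts = 0; restarts <= maxRestarts; restarts++)
        {
            int exitCode = runChild();
            if (exitCode == 0) return restarts;
            Console.Error.WriteLine($"Child exited with code {exitCode}.");
        }
        return -1;
    }
}
```

A call might look like `ChildSupervisor.Supervise(() => ChildSupervisor.RunChildProcess("DeviceWorker.exe", ""), maxRestarts: 3);`, where DeviceWorker.exe is a placeholder for your own child executable. The parent's own state never passes through the risky code, so the continuation decision for the parent stays simple.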

9.5 Unhandled-Exception Handlers Are for “Recording,” Not “Recovery”

What to record at that point:

  • Exception details
  • The operation context
  • The last important log entries
  • Configuration / version / connection targets
  • A path to collecting dumps

Getting these in place and prioritizing a setup where you can investigate after the crash leads to better stability in the end.
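
As a sketch of a recording-only handler, the code below hooks AppDomain.CurrentDomain.UnhandledException and appends one entry to a file. CrashRecorder, the log format, and the appVersion parameter are hypothetical; dump collection is left out because it is normally configured outside the process.

```csharp
using System;
using System.IO;

public static class CrashRecorder
{
    // Builds the text to record. The exception is typed as object because
    // UnhandledExceptionEventArgs.ExceptionObject is declared as object.
    public static string Format(object exceptionObject, bool isTerminating, string appVersion)
    {
        return $"[{DateTime.UtcNow:O}] version={appVersion} terminating={isTerminating}"
            + Environment.NewLine
            + exceptionObject
            + Environment.NewLine;
    }

    // Registers a last-chance handler that records and does nothing else.
    public static void Install(string logPath, string appVersion)
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
        {
            try
            {
                File.AppendAllText(logPath, Format(e.ExceptionObject, e.IsTerminating, appVersion));
            }
            catch
            {
                // Recording is best effort; never throw from the last-chance handler.
            }
        };
    }
}
```

Calling `CrashRecorder.Install(logPath, version)` early in Main is enough for this sketch. The handler makes no attempt to keep the process alive; it only leaves evidence behind.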

9.6 Do Not Over-Trust the WPF / WinForms Unhandled-Exception Events

In WPF, setting Handled = true in DispatcherUnhandledException does let you keep running after an unhandled exception. In Windows Forms, Application.ThreadException together with Application.SetUnhandledExceptionMode determines whether an unhandled exception on the UI thread is routed to your handler or left to stop the app.

But whether you can keep running and whether the conditions for recovery are met are separate questions.

10. Summary

When an unexpected exception occurs, the question to ask is not “can this exception be caught” but whether the app’s state can still be trusted afterward.

As a decision sequence, this is usually enough:

  1. Can the failed unit be discarded?
  2. Can shared state be restored or recreated?
  3. Can external side effects be accounted for?
  4. Can the health of memory / threads / native boundaries be trusted?

If you are confident in all four, you can continue. If you are not, lean toward exiting.

Especially for long-running apps, monitoring apps, services, and device integration, there are plenty of situations where staying alive broken is more dangerous than crashing honestly.

Exception handling is not the art of never crashing. It is designing so that failures stay small, the app stops honestly when broken, and recovery is easy.

Author Profile

Go Komura

Representative of KomuraSoft LLC

Focused on Windows software development, technical consulting, and investigations into failures that are difficult to reproduce.
