The Atlas of Coordination
Resilience

Pattern 9: Error and Recovery Patterns

Overview

Coordination structures contain mechanisms for handling deviations from expected or desired system states. These mechanisms may detect errors immediately or with delay, respond through predefined procedures or improvisation, and contain effects locally or allow propagation across system boundaries.

Recovery approaches may focus on attribution and accountability, or on system restoration and learning. Response capabilities may be practiced and tested before need, or developed during actual disruptions. The presence and characteristics of error-handling structures affect both how resilient a system is and how large a disruption becomes.

These structural features appear where work involves complexity, interdependence, or operating conditions that make deviations possible, whether in stable operations, during change or high load, or when executing novel work.

Observable Manifestations

Small deviations or errors amplifying into larger system disruptions

Absence of defined procedures specifying actions when disruptions occur

Organizational responses to errors focusing on individual attribution rather than system restoration

Local corrective actions creating unanticipated problems in other system parts

Heightened urgency and stress responses when disruptions occur

Errors or deviations not being reported or communicated to relevant actors

Recovery procedures not tested or practiced before actual need

System coupling characteristics allowing rapid propagation of disruptions across boundaries

Buffer or redundancy capacity being absent when errors occur

Error responses improvised in the moment rather than following established structures

Structural Conditions

Work complexity creating possibilities for deviations from expected states

System interdependencies allowing local disruptions to affect other components

Detection mechanisms capable of identifying deviations from desired states

Communication channels through which error information can be transmitted

Authority structures enabling response activation when errors are detected

Cultural norms regarding error surfacing, attribution, and organizational learning

Reserve capacity or buffers capable of absorbing error effects

Organizational memory of past errors and recovery experiences

Boundaries

Not about individual competence or care in work execution

Not implying poor quality, carelessness, or organizational dysfunction

Not explaining why specific error and recovery structures exist in particular contexts

Not evaluating whether particular error structures are appropriate for their contexts

Not addressing optimal error tolerance levels for specific situations

Not distinguishing necessary from unnecessary recovery mechanisms

Common Misattributions

Attributed to individual carelessness or incompetence when error detection mechanisms are structurally absent

Attributed to poor training when recovery protocols have not been defined or practiced

Attributed to quality control failure when system coupling creates unavoidable propagation

Attributed to blame culture when organizational structures incentivize error hiding

Attributed to lack of planning when buffer capacity is structurally unavailable

Attributed to individual panic when predefined response procedures do not exist

Attributed to coordination failure when local fixes create downstream effects in complex systems

The presence of this pattern does not imply poor quality control, careless execution, or a need for change. It describes observable error and recovery structures that exist across many functional and successful organizations. Both explicit and implicit error-handling approaches persist in different organizational contexts for context-specific structural reasons.