How Complex Systems Fail

This is a four page study close the nature of failures inward complex systems. It is a gloomy report. It says that complex systems are ever ridden amongst faults, in addition to volition neglect when some of these faults conspire in addition to cluster. In other words, complex systems constantly dwell on the verge of failures/outages/accidents.

The writing of the study is peculiar. It is written every bit a listing of xviii items (ooh, everyone loves lists). But the items are non independent. For example, it is hard to empathize items 1 in addition to 2, until you lot read item 3. Items 1 in addition to 2 are inward fact laying the foundations for item 3.

The study is written past times an MD, in addition to is primarily focused on healthcare related complex systems, but I mean value almost all of the points too apply for other complex systems, in addition to inward item cloud computing systems. In 2 recent posts (Post1, Post2), I had covered papers that investigate failures inward cloud computing systems, in addition to thus I idea this study would live on a prissy complement to them.


1) Complex systems are intrinsically hazardous systems.
I mean value the correct wording hither should live on "high-stakes" rather than "hazardous". For example, cloud computing is non "hazardous" but it is definitely "high-stakes".

2) Complex systems are heavily in addition to successfully defended against failure.
Is at that topographic point an undertone hither which implies these defence mechanisms contribute to brand these high-stakes systems fifty-fifty to a greater extent than complex?

3) Catastrophe requires multiple failures – unmarried betoken failures are non enough.
This is because the anticipated failure modes are already good guarded.

4) Complex systems comprise changing mixtures of failures latent inside them.
"Eradication of all latent failures is express primarily past times economical toll but too because it is hard earlier the fact to encounter how such failures mightiness contribute to an accident. The failures alter constantly because of changing technology, move organization, in addition to efforts to eradicate failures." This is pretty much the lesson from the cloud outages study. Old services neglect every bit much every bit novel services, because the playground keeps changing.

5) Complex systems run inward degraded mode.
"A corollary to the preceding betoken is that complex systems run every bit broken systems."

6) Catastrophe is ever precisely just about the corner.

7) Post-accident attribution accident to a ‘root cause’ is fundamentally wrong.
"Because overt failure requires multiple faults, at that topographic point is no isolated ‘cause’ of an accident. There are multiple contributors to accidents. Each of these is necessary insufficient inward itself to create an accident. Only jointly are these causes sufficient to create an accident."

8) Hindsight biases post-accident assessments of human performance.
The "everything is obvious inward hindsight" fallacy was covered good inward this book.

9) Human operators lead hold dual roles: every bit producers & every bit defenders against failure.
10) All practitioner actions are gambles.
11) Actions at the sudden cease resolve all ambiguity.

12) Human practitioners are the adaptable chemical element of complex systems.
What close software agents? They tin too react adaptively to the developing situations.  And today amongst machine learning in addition to deep learning, specially so.

13) Human expertise inward complex systems is constantly changing.
14) Change introduces novel forms of failure.
The cloud outages survey has showed that updates in addition to configuration changes in addition to human factors trace organisation human relationship for to a greater extent than than 1/3rd of outages.

15) Views of ‘cause’ boundary the effectiveness of defenses against hereafter events.
Case-by-case add-on of fault-tolerance is non really effective. "Instead of increasing safety, post-accident remedies commonly increase the coupling in addition to complexity of the system."

16) Safety is a feature of systems in addition to non of their components
Safety is a system-level property, unit of measurement testing of components is non enough.

17) People continuously create safety.
18) Failure gratis operations require sense amongst failure.
What doesn't kill you lot makes you lot stronger. In lodge to grow, you lot need to force the limits, in addition to stress the system. Nassim Taleb's mass close antifragility makes like points.
And this brusk video on resilience is only excellent.

0 Response to "How Complex Systems Fail"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel