Everything Is Broken

Last Wednesday, I attended one of the monthly meetings of the "Everything is Broken" meetup in Seattle. It turns out I picked a great meeting to attend, because both speakers, Charity Majors and Tammy Butow, were excellent.

Here are some select quotes without context.

Observability-driven development - Charity Majors


Chaos engineering is testing code in production. "What if I told you: you could test both in and before production."

Deploying code is not a binary switch; deploying code is a process of increasing your confidence in your code.

"Microservices are hard!" as a caption for a figure comparing the LAMP stack of 2005 with the complexity of the Parse stack of 2015.

We are all distributed systems engineers, and unknowns outnumber the knowns!
Distributed systems have an infinite number of almost-impossible failures!

Without observability you don't have chaos engineering, you have chaos.

Monitoring systems have not changed significantly in 20 years, since Nagios. Complexity is exploding everywhere, but our tools are designed for a predictable world.

Observability for software engineers: can you understand what is happening inside your systems, just by asking questions from the outside? Can you debug your code and its behavior using its output?

For the LAMP stack, monitoring was sufficient for identifying the problems.
For microservices, it is unclear what we are supposed to monitor for. We need observability!
The hard part is not debugging your code, but finding which part to debug!

Facebook's Scuba was ugly, but it helped us slice and dice and improve our debugging! It improved things a lot. I understand Scuba was hacked together to deal with MySQL problems.

You don't know what you don't know, so dashboards are of very limited utility. Dashboards are only for anticipated cases: every dashboard is an artifact of past failures. There are too many dashboards, and they are too slow.

Aggregates are the kiss of death; important details get lost.
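To make the point about aggregates concrete, here is a minimal sketch with invented latency numbers (they are not from the talk). The mean and median both look healthy even though 1% of requests take five seconds; only the raw events show it.

```python
import statistics

# Hypothetical request latencies in milliseconds: 990 fast requests
# and 10 very slow ones (1% of traffic). These numbers are made up.
latencies_ms = [20] * 990 + [5000] * 10

mean_ms = statistics.mean(latencies_ms)      # aggregate view
median_ms = statistics.median(latencies_ms)  # aggregate view
worst_ms = max(latencies_ms)                 # detail the aggregates hide
slow_count = sum(1 for x in latencies_ms if x > 1000)

print(f"mean={mean_ms:.1f} ms, median={median_ms} ms, max={worst_ms} ms")
print(f"requests slower than 1 s: {slow_count}")
```

A dashboard that plots only the mean would show roughly 70 ms here, which says nothing about the ten users who waited five seconds.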

Black swans are the norm; you must care about the 99.9%, epsilons, corner cases.

Watch things run in production in the normal case; get used to observing your systems when they aren't on fire.

Building Resilient Systems Using Chaos Engineering - Tammy Butow

Chaos engineering is "thoughtful planned experiments designed to reveal weak points in the system".

Top 5 popular ways to use chaos engineering now: Kubernetes, Kafka, AWS ECS, Cassandra, Elasticsearch.

Fullstack chaos engineering: inject faults at the API, app, cache, database, OS, host, network, and power levels.

We are exploring a new direction and collaborating with the UI engineers on ways to hide the impact of faults.

prerequisites for chaos engineering:
1. monitoring & observability
2. on-call & incident management
3. know the cost of your downtime per hour (British Airways's 1-day outage cost $150 million)
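As a rough illustration of the third prerequisite, the outage figure quoted above can be turned into a per-hour number. Treating "1 day" as a full 24 hours is an assumption; the real outage window may have been shorter or longer.

```python
# Figure quoted in the talk notes; the 24-hour duration is an assumption.
outage_cost_usd = 150_000_000
outage_hours = 24

cost_per_hour = outage_cost_usd / outage_hours
print(f"${cost_per_hour:,.0f} per hour")  # $6,250,000 per hour
```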

How to choose a chaos experiment?
+ identify top 5 critical systems
+ choose 1 system
+ whiteboard the system
+ select attack: resource/state/network
+ determine scope

How to run your ain gameday: http://gremlin.com/gameday

Outage post-mortems: https://github.com/danluu/post-mortems

First chaos engineering conference this year: http://twitter.com/chaosconf

Some notes about the venue: Snap Inc

There were fancy appetizers, very fancy. They had a kitchen there on the 5th floor (and every floor?). Do they provide free lunch to Snap employees?

On the 5th floor, where the meeting took place, we had a great view of Puget Sound. The Snap building is just behind Pike Place Market. There were about 80-100 people. I think the 30+ folks outnumbered the 40+ folks, but not severely. Good turnout from female engineers. There was ambient music in the beginning, from 6 to 6:30pm, but it was loud.

By the way, I have never used Snapchat... I am old. But I don't have a Facebook account, so perhaps I am not that old.

MAD questions

1. Do you need to test in production?
The act of sabotaging parts of your system/availability may sound crazy to some people. But it puts a very firm commitment in place. You should be ready for these faults, as they will happen on one of these Thursdays. It establishes a discipline that you will test, gets you prepared by writing the instrumentation for observability, and toughens you up. It puts you into a useful paranoid mindset: the enemy is always at the gates and never sleeps, I should be ready to face attacks. (Hmm, here is an army analogy: should you train with live ammunition? It is still controversial because of the lives on the line.)

Why not wait until faults occur in production by themselves? They will happen anyway. But when you do chaos testing, you have control over the inputs/failures, so you already know the root cause. And this can give you a much better chance to discover the percolation effects.
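Here is a minimal sketch of that idea, not a real chaos tool: a toy fault injector that fails a dependency call at a chosen rate and logs every fault it injects. All names (`FaultInjector`, `fetch_profile`, `handle_request`) are hypothetical. Because the injector is seeded and keeps a log, the root cause of every failure is known in advance, and what remains to be observed is how the failure percolates.

```python
import random


class InjectedFault(Exception):
    """Raised only by the injector, so the failure's origin is unambiguous."""


class FaultInjector:
    def __init__(self, failure_rate, seed):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded, so the experiment is repeatable
        self.log = []                   # record of every injected fault

    def call(self, name, func, *args):
        # Fail the call with the configured probability, otherwise pass through.
        if self.rng.random() < self.failure_rate:
            self.log.append(name)
            raise InjectedFault(name)
        return func(*args)


def fetch_profile(user_id):
    # Stand-in for a real downstream dependency.
    return {"id": user_id, "name": f"user-{user_id}"}


def handle_request(injector, user_id):
    # Code under test: degrade gracefully when the dependency fails.
    try:
        return injector.call("fetch_profile", fetch_profile, user_id)
    except InjectedFault:
        return {"id": user_id, "name": "unknown"}


injector = FaultInjector(failure_rate=0.3, seed=42)
responses = [handle_request(injector, i) for i in range(100)]
degraded = sum(1 for r in responses if r["name"] == "unknown")
print(f"injected faults: {len(injector.log)}, degraded responses: {degraded}")
```

In this toy setup every degraded response maps back to a logged injection, which is exactly the control you do not have when a fault shows up in production on its own.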

2. Analogies for chaos engineering
I have heard vaccination used as an analogy. It is a tactful analogy (much better than the live firing analogy). Nobody can argue against the usefulness of vaccinations.

Other things chaos testing evokes could be bloodletting and antifragility. I had read somewhere that athletes in ancient Greece would induce diarrhea on purpose a couple of weeks before competitions, so that their bodies could recover and get much stronger by the time of the competition. I guess the reasoning goes as "too much of a monotone is a bad thing" and it is beneficial to stress/shake the system to avoid a local maximum. That reminds me of this YouTube video I show in my distributed systems course on the topic of resilience.

3. Debugging designs with TLA+
Even after you have a verified design, the implementation can still introduce errors, so using chaos engineering tools is valuable and important even then.

It helps even for "verified" systems, because of their nonverified parts:
Folks encouraged us to try testing verified file systems; we were skeptical we would find anything, but to our surprise, when we tested MIT’s FSCQ file system, we found it did not persist data on fdatasync()! Apparently they had a bug in the un-verified part of their code (Haskell-C bindings), which was caught by CrashMonkey! This shows that even verified file systems have un-verified components which are often complex, and which will have bugs.

4. Chaos tag
Turns out I have several posts mentioning chaos engineering, so I am creating a chaos tag to be available for use in future posts.
