Paper Review: Comprehensive and Efficient Runtime Checking in System Software through Watchdogs
This paper by Chang Lou, Peng Huang, and Scott Smith appeared in HotOS 2019. The paper argues that system software needs intrinsic detectors that monitor internally for subtle issues specific to a process. In particular, the paper advocates employing intrinsic watchdogs as detectors. Watchdogs (also known as grenade timers) have been widely used in embedded devices. Watchdogs use a decrementing timeout counter which resets the processor when it reaches zero. To prevent a reset, the software must keep restarting the watchdog counter after performing/clearing some sanity checks.
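To make the counter mechanism concrete, here is a minimal sketch of a software watchdog timer. It is my own illustration, not code from the paper: the names `WatchdogTimer`, `kick`, `tick`, and `run` are hypothetical, and time is simulated as discrete ticks rather than driven by a hardware clock.

```python
class WatchdogTimer:
    """Decrementing counter that fires a reset action when it reaches zero."""

    def __init__(self, timeout_ticks, on_reset):
        self.timeout_ticks = timeout_ticks
        self.counter = timeout_ticks
        self.on_reset = on_reset

    def kick(self):
        """Restart the countdown; meant to be called only after sanity checks pass."""
        self.counter = self.timeout_ticks

    def tick(self):
        """Advance one simulated time unit; trigger the reset action at zero."""
        self.counter -= 1
        if self.counter <= 0:
            self.on_reset()
            self.counter = self.timeout_ticks  # start counting again after the reset


def run(steps, healthy_until, timeout_ticks=3):
    """Simulate a main loop whose sanity check stops passing after `healthy_until` steps."""
    events = []
    dog = WatchdogTimer(timeout_ticks, on_reset=lambda: events.append("reset"))
    for step in range(steps):
        if step < healthy_until:  # the sanity check passes, so the software kicks the watchdog
            dog.kick()
        dog.tick()
    return events
```

As long as the loop keeps kicking the watchdog, no reset fires. Once the sanity check stops passing, the counter runs down and the reset action is invoked every `timeout_ticks` ticks.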
Table 1 summarizes the comparison of crash failure detectors, intrinsic watchdogs, and error handlers. Failure detectors are too general; they only make "up-or-down" decisions. They are only good for achieving liveness, as they are too unreliable for making safety decisions.
The disadvantage with error handlers, the paper argues, is that liveness-related failures often do not have explicit error signals that can trigger a handler: there is no signal for, e.g., a write being blocked indefinitely, or some thread deadlocking or looping infinitely.
The mimicking approach for writing watchdog timers
The paper prescribes the mimicking approach, where the checker selects important operations from the main program and mimics them to detect any errors. Since the mimic checker exercises similar code logic in a production environment, it has the potential to catch and locate bugs in the program as well as faults in the environment.
The challenge with this approach is to systematically select important/representative operations from the main program. To solve this problem, the paper proposes a method using static analysis to automatically generate mimic-type watchdogs, which it calls program logic reduction.
"We instead propose to derive from P a reduced but representative version W, which still retains enough code to expose gray failures. Our hypothesis that such reduction is viable stems from two insights. First, most code in P need not be checked at runtime because its correctness is logically deterministic -- such code is better suited for unit testing before production and thus should be excluded from W. Second, W's goal is to catch errors rather than to recreate all the details of P's business logic. Therefore, W does not need to mimic the full execution of P. For example, if P invoked write() many times in a loop, for checking purposes, W may only need to invoke write() once to detect a fault."
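The write() example can be sketched roughly as follows. This is a hand-written illustration, not output of AutoWatchdog: `mimic_write_check` and `healthy_write` are hypothetical names, and I am assuming the checker runs the mimicked operation once in a worker thread and treats a missed deadline as a liveness fault.

```python
import os
import tempfile
import threading


def mimic_write_check(write_fn, payload, timeout_s=1.0):
    """Invoke write_fn once in a worker thread; return 'ok', 'error', or 'timeout'."""
    result = {}

    def worker():
        try:
            write_fn(payload)
            result["status"] = "ok"
        except Exception:
            # An explicit error signal from the mimicked operation.
            result["status"] = "error"

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        # No error was raised, but the operation did not finish in time:
        # this is the kind of failure an error handler would never see.
        return "timeout"
    return result["status"]


def healthy_write(data):
    """Stand-in for the main program's write path: one write to a scratch file."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

A call such as `mimic_write_check(healthy_write, b"ping")` exercises the write path once instead of replaying the whole loop of P, and a write that blocks indefinitely shows up as `"timeout"` rather than going unnoticed.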
The authors have built a prototype, called AutoWatchdog, and applied it to ZooKeeper, Cassandra, and HDFS, generating tens of checkers for each. Figure 2 shows an example from ZooKeeper.
MAD questions
1. If we find the problems by mimicking the program, why don't we do the detection as part of the program rather than using a separate watchdog?
The watchdog detectors have a countdown timer for detecting getting stuck at any point due to some operation. This, of course, could have been added to the program as well, but maybe that makes the program look complicated. I guess having this in the watchdog detectors provides modularity, as in aspect-oriented programming.
I guess another benefit of having a separate watchdog detector is that we have the flexibility to locate it more centrally, rather than with the execution in one process. The gray failures paper made a case for asymmetry of information, and argued that there is a need for more end-to-end detection. With a separate watchdog detector, we can maybe put parts of it or copies of it in different processes, to be able to detect/check faults from different perspectives.
2. How do we make watchdogs compose?
One challenge that will surface when AutoWatchdog creates multiple watchdog detectors for a program is interference among the watchdogs. A reset triggered by one watchdog detector may lead to another reset triggered by another detector. And this may even continue, as the two watchdog detectors trigger resets for each other. Of course, this is more about correction than detection, so this is outside the scope of the paper.
However, even for just detection, care must be taken that the mimicking in the watchdog detectors does not have side effects and mess up the correctness of the program. The paper cautions about this problem: "Executing watchdog checkers should not incur unintended side-effects or add significant cost to the normal execution. For example, in monitoring the indexer of kvs, the checkers may try to retrieve or insert some keys, which should not overwrite data produced from the normal execution or significantly delay normal request handling." I am not sure how it is possible to avoid this problem completely with automatically generated watchdogs. If sandboxes are used, the watchdogs would not be testing/monitoring the real production environment.
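One partial mitigation I can imagine, short of a sandbox, is to have the checker touch the real store but only under a reserved key namespace. The sketch below is my own assumption and not something the paper describes: `KVStore` is a toy in-memory stand-in for the kvs, and `WATCHDOG_PREFIX` and `check_indexer` are hypothetical names.

```python
WATCHDOG_PREFIX = "__watchdog__/"  # hypothetical namespace reserved for checker keys


class KVStore:
    """Toy in-memory key-value store standing in for the real kvs indexer."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def keys(self):
        return sorted(self._data)


def check_indexer(store, probe_id="probe"):
    """Insert and read back one probe key under the reserved prefix, then clean up."""
    key = WATCHDOG_PREFIX + probe_id
    try:
        store.put(key, "ping")
        return store.get(key) == "ping"
    finally:
        # Remove the probe key so the checker leaves no data behind.
        store.delete(key)
```

This keeps the probe from overwriting a user key with the same name, but it only addresses the overwrite half of the concern. The probe still competes with normal requests for the store, and it only works if the main program never writes under the reserved prefix, which an automatically generated watchdog would have to establish somehow.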