Software reliability: the Therac failures
Reading:
Therac-25 (in reader)
- 6 accidents, 4 deaths
- no bad guys (cf. Ford Pinto case)
- Software doesn't degrade like hardware
  - but it rots anyway
  - but it has much greater complexity
    cf. Star Wars (the Strategic Defense Initiative; birth of CPSR)
- Continuum of life-or-death criticality: clearly yes for Therac, clearly no for a video game
  - but what about an OS, a spreadsheet, etc.?
- Therac bugs:
  - no atomic test and set (race conditions on shared variables)
  - hardware interlocks removed (safety left to software alone)
  - UI problems:
    - cursor position
    - defaults
    - too many error messages
  - documentation
  - organizational response
    easy to see after the fact, but these problems are inherent in
    organizations (esp. ones that can be sued)
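The "no atomic test and set" bug is a check-then-act race: one task tests shared state, another task changes that state, and the first task then acts on its stale test. The sketch below assumes a simplified model (two tasks claiming one shared resource, not the actual Therac-25 code). It uses generators so the bad interleaving is reproducible instead of timing-dependent. The functions `claim_unsafe`, `claim_atomic`, `run_interleaved` and `claim_with_lock` are hypothetical names. Only `threading.Lock.acquire(blocking=False)` is a real library call. It atomically tests and sets the lock.

```python
import threading

def claim_unsafe(state, name):
    """Non-atomic test-and-set: the test and the set are separate steps."""
    free = not state["busy"]        # test
    yield                           # the scheduler may switch tasks here
    if free:                        # act on a possibly stale test
        state["busy"] = True        # set
        state["owners"].append(name)

def claim_atomic(state, name):
    """Test and set happen in one uninterrupted step."""
    if not state["busy"]:
        state["busy"] = True
        state["owners"].append(name)
    yield                           # switching after the claim is harmless

def run_interleaved(claim):
    """Run two claimants, alternating one step at a time."""
    state = {"busy": False, "owners": []}
    tasks = [claim(state, "A"), claim(state, "B")]
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)
    return state["owners"]

def claim_with_lock(lock, owners, name):
    """Real-thread version: acquire(blocking=False) tests and sets atomically,
    returning False instead of waiting if the lock is already held."""
    if lock.acquire(blocking=False):
        owners.append(name)
        return True
    return False
```

With the unsafe version, both tasks pass the test before either one sets the flag. So `run_interleaved(claim_unsafe)` reports two owners of a resource meant to have one. `run_interleaved(claim_atomic)` reports only `"A"`.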
- Solutions
  - redundancy
  - fail soft (work despite bugs)
  - audit trail
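The sketch below shows how the three solutions might fit together in one controller. It is built entirely on hypothetical names: `Controller`, `MAX_DOSE` (an arbitrary made-up limit), `software_check`, and `interlock_check`, which is a software stand-in for what should be an independent hardware interlock. Redundancy means every independent check must agree before firing. "Fail soft" is read here as follows: a bug that raises inside a check is logged and treated as a failed check, so the machine keeps running but does not fire. Every request and outcome goes to an append-only audit trail.

```python
from dataclasses import dataclass, field

MAX_DOSE = 200  # hypothetical per-treatment limit, arbitrary units

def software_check(dose, mode, beam_setup):
    return 0 < dose <= MAX_DOSE

def interlock_check(dose, mode, beam_setup):
    # Hypothetical stand-in for an independent (ideally hardware) interlock:
    # the requested mode must match how the beam is physically set up.
    return mode == beam_setup

@dataclass
class Controller:
    checks: tuple = (software_check, interlock_check)
    audit: list = field(default_factory=list)  # append-only audit trail

    def log(self, event, **details):
        self.audit.append((event, details))

    def run_check(self, check, *args):
        try:
            return bool(check(*args))
        except Exception as exc:
            # Fail soft: a buggy check is recorded and counted as a failure,
            # so the controller stays up but refuses to fire.
            self.log("check_error", check=check.__name__, error=repr(exc))
            return False

    def deliver(self, dose, mode, beam_setup):
        self.log("request", dose=dose, mode=mode, beam_setup=beam_setup)
        # Redundancy: every independent check must pass before firing.
        results = [self.run_check(c, dose, mode, beam_setup) for c in self.checks]
        if not all(results):
            self.log("refused", results=results)
            return False
        self.log("delivered", dose=dose)
        return True
```

The audit trail records refusals as well as deliveries. After an incident, it can show what was requested and which check objected.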
- Software Engineering (an attitude about programming)
  - Design techniques
    - modularization (cf. OOP)
    - understand concurrency (semaphores)
    - analyze invariants
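Semaphores and invariants meet in the classic producer/consumer bounded buffer. The sketch below uses Python's real `threading.Semaphore`. The class `BoundedBuffer` and the helper `transfer` are hypothetical names. The invariant `0 <= len(items) <= capacity` is asserted on every operation. It holds because `put` must reserve a free slot before appending and `get` must claim a filled slot before removing.

```python
import threading
from collections import deque

class BoundedBuffer:
    """Producer/consumer buffer coordinated by three semaphores.

    Invariant (asserted on every operation): 0 <= len(items) <= capacity.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.mutex = threading.Semaphore(1)              # binary semaphore guarding items
        self.free_slots = threading.Semaphore(capacity)  # counts empty slots
        self.filled_slots = threading.Semaphore(0)       # counts items available

    def _check_invariant(self):
        assert 0 <= len(self.items) <= self.capacity, "buffer invariant violated"

    def put(self, item):
        self.free_slots.acquire()      # wait until a slot is free
        with self.mutex:
            self.items.append(item)
            self._check_invariant()
        self.filled_slots.release()    # announce one more item

    def get(self):
        self.filled_slots.acquire()    # wait until an item exists
        with self.mutex:
            item = self.items.popleft()
            self._check_invariant()
        self.free_slots.release()      # announce one more free slot
        return item

def transfer(n=100, capacity=4):
    """Move n items from the main thread to a consumer thread."""
    buf = BoundedBuffer(capacity)
    received = []

    def consume():
        for _ in range(n):
            received.append(buf.get())

    consumer = threading.Thread(target=consume)
    consumer.start()
    for i in range(n):
        buf.put(i)
    consumer.join()
    return received
```

With one producer and one consumer, the items arrive in order even though the buffer never holds more than `capacity` of them.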
  - Verification techniques
    - correctness proofs
      (no general method can verify every program, since the halting
      problem is undecidable, but still useful)
    - automatic analysis in the compiler (e.g., type checking)
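A correctness proof for a loop usually pairs an invariant with a termination argument. The sketch below writes those proof obligations as assertions in a function with a hypothetical name. The runtime assertions only check the inputs actually run, whereas a proof covers all inputs. For this loop the argument is short. The invariant holds initially, each iteration preserves it, and `r` strictly decreases, so the loop terminates. At exit, the invariant plus `r < b` gives the postcondition.

```python
def divmod_by_subtraction(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == q*b + r and 0 <= r < b.

    Precondition: a >= 0 and b > 0.
    """
    assert a >= 0 and b > 0                    # precondition
    q, r = 0, a                                # invariant holds: a == 0*b + a, a >= 0
    while r >= b:
        assert a == q * b + r and r >= 0       # invariant at the loop head
        q, r = q + 1, r - b                    # preserves invariant; r strictly decreases
    assert a == q * b + r and 0 <= r < b       # invariant + exit condition = postcondition
    return q, r
```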
  - Debugging techniques
    - black box vs. glass box
    - don't break old code with a new fix
    - introduce bugs on purpose to analyze results downstream
    - debug by subtraction, not addition
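"Introduce bugs on purpose" is the idea behind mutation testing, and "don't break old code with a new fix" is the job of a regression suite. The sketch below combines the two. It plants small deliberate bugs ("mutants") in a function and checks whether the regression cases catch each one. A mutant that survives points to a gap in the tests. All names (`clamp_dose`, `MUTANTS`, `REGRESSION_CASES`, `surviving_mutants`) are hypothetical.

```python
def clamp_dose(dose, limit):
    """Original code: never return more than limit, never less than 0."""
    return max(0, min(dose, limit))

# Deliberately buggy variants, each a small plausible mistake.
MUTANTS = {
    "max_instead_of_min": lambda dose, limit: max(0, max(dose, limit)),
    "no_lower_bound": lambda dose, limit: min(dose, limit),
    "off_by_one": lambda dose, limit: max(0, min(dose, limit + 1)),
}

# Regression cases: behavior that must survive any future fix.
REGRESSION_CASES = [
    ((50, 100), 50),
    ((150, 100), 100),
    ((-5, 100), 0),
    ((101, 100), 100),
]

def passes_all(fn, cases=REGRESSION_CASES):
    return all(fn(*args) == expected for args, expected in cases)

def surviving_mutants(cases=REGRESSION_CASES):
    """Mutants the cases fail to catch: evidence of a weak test suite."""
    return [name for name, fn in MUTANTS.items() if passes_all(fn, cases)]
```

The full case list catches every mutant. A weaker list such as `[((50, 100), 50), ((-5, 100), 0)]` lets `"off_by_one"` survive, which shows that a boundary case is missing.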