Lecture February 24, 2003

Testing our Assumptions about the reliability of computer systems (including ethical considerations)

Overview of lecture:

1. Examples of system failures…and “The Titanic Effect”

2. Why are computer systems unreliable?

2.1. nature of computer “systems” – the idea of complexity
2.2. lack of liability for software vendors


Forester and Morrison’s ethical questions:
--> "Why isn't software guaranteed like other products?"
--> "Why does so much shoddy software exist in important systems?"
--> "Should we entrust so many decisions to complex software programs?"

3. Why do we live with this risk?


**************
1. Examples of system failures

--> see examples in the article (military + aerospace + air traffic control + banking + medical)

--> FACT: August 1996 - America Online’s computer systems crashed for 16 hours

CAUSE: new software installed during a regular update

CLAIM: AOL computers “virtually immune” to this kind of outage
(The Titanic Effect: see kit page)

--> FACT: June 1996 - France’s Ariane 5 rocket, designed to put satellites into orbit, had to be destroyed 40 seconds after it was launched

CAUSE: “design errors in the software”
“All it takes is a modest anomaly in a digital system to bring the whole system to its knees” (Brown, Cybertrends, 1997)
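
(Side note, a minimal sketch rather than the actual flight code: published accounts of the Ariane 5 failure describe a 64-bit floating-point value being converted into a 16-bit integer that could not hold it, and the unhandled error shut down the guidance software. The short Python example below uses hypothetical names and values to show how one unchecked conversion -- one "modest anomaly" -- can stop a whole program.)

def to_int16(value: float) -> int:
    # Convert to a 16-bit signed integer; fail if the value does not fit.
    result = int(value)
    if not -32768 <= result <= 32767:
        raise OverflowError(f"{value} does not fit in a 16-bit integer")
    return result

def guidance_update(horizontal_velocity: float) -> None:
    # The "modest anomaly": an input larger than the old design assumed.
    bias = to_int16(horizontal_velocity)
    print("guidance bias:", bias)

guidance_update(40000.0)  # unhandled OverflowError: the whole program halts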


2. Why are computer systems unreliable?

2.1. Reprise of Tenner’s ideas on “computer systems”
--> in the Industrial Age, a print shop had individual craftsmen working with their tools -- today we talk about the tools in a factory as a system.

In today's factories, workers are not tool users; they're tool managers

the managed tool is more precise, BUT "the precision of the managed tool has a price. It may be less robust, and as it becomes more complex, less predictable."

Remember—tools "break," but systems have "bugs."

What if you can’t figure out why a failure happened?

--> after a system failure there is a ritual whereby the cause of the accident is supposed to be determined.

BUT "high technology accidents may not have clear causes at all. They may be inherent in the complexity of the technological systems we have created. "

so we don’t have "real accidents" (where you can point a finger at the person responsible) but "normal accidents" --
--> "normal" doesn't mean frequent; it is the kind of accident you can "expect in the normal functioning of a technologically complex operation."

system failure = normal accident
(Gladwell, “Blowup,” 1996)

the theory of "risk homeostasis" states that EVEN IF a problem has been fixed (like the new booster joints on the Challenger shuttle), we can't assume the system is "safer"....

When do systems fail?

--> not only in operation but in design and development:

--> see Forester and Morrison article: more systems don't make it than do...
it's GOOD that they don't make it!

e.g., Strategic Defense Initiative in 1983

(also known as "Star Wars," it was to be a layered ballistic missile defense system, ...like a shield in space over the U.S.)

needed the most complex computer software ever designed to be the "brain" that guides and co-ordinates an immensely complex battle management system.

it was to cost over a trillion dollars.

Problems with the Star Wars system: too many unknowns, in contrast to normal program development, where:
- it is known what the program will do (the program will do X),
- programmers can usually anticipate how much computing power they need,
- they usually have other programs to use as models,
- they have the opportunity to debug the program and test it before it is given to the client.

2.2. Liability issues

" Microsoft knows that reliable software is not cost effective. According to studies, 90% to 95% of all bugs are harmless. They’re never discovered by users, and they don’t affect performance. It’s much cheaper to release buggy software and fix the 5% to 10% of bugs people find and complain about." Bruce Schneier as quoted in "Monty Phython’s Flying Circus: Microsoft and the Aircraft Carriers" www.acm.org/ubiquity/views/m_kabay_3.html

- in 2000, it was announced that a new U.S. aircraft carrier, the CVN-77, would be controlled by software from Microsoft Federal Systems (the operating system would be based on Windows 2000)

Is this a good thing?
- "could lead to functional disarmament" "how do you reboot an aircraft carrier?"
(gambling on a 5% bug rate doesn’t work on a military vessel)

- could lead to spinoffs for the rest of us if the military demands service-level agreements or terms of performance.

What drives computer systems development today?
- concern for time-to-market
- novel features
- keeping costs down
("with little concern for assurance, reliability or avoidance of system security vulnerabilities" -- "Risks in Features vs. Assurance": www.csl.sri.com/users/neumann/insiderisks.html#137)


Legal situation: software-related risks are handled under contract law rather than under more demanding liability laws.

- liability laws apply to other engineered artefacts….why not software??
software vendors base their non-liability claim on the argument that they are selling a "license," not a product…no protection for consumers

- contracts are inequitable - purchasers assume all liabilities, despite the impossibility of assessing the security, reliability or survivability of software.
(Recent lawsuit in California: a woman is suing Microsoft + others because she couldn’t read the contract she was supposed to agree to -- it was inside the shrink-wrapping.)

Since the customer now takes all the risks, there is little incentive for developers to ship reliable, secure systems…

How can the customer know all the risks?
--> Maybe liability law should override unjust contract disclaimers!
--> Maybe we shouldn’t expect/demand upgrades so frequently/maybe not buy into obsolescence

 

3. If we rely heavily on them, and they're unreliable, then why don't we worry more?
According to Gerald Wilde in Target Risk: humans have a tendency to compensate for lower risks in one area by taking greater risks in another.

examples:
--> anti-lock brake system (ABS) experiment in Germany; drivers with ABS had more accidents than other drivers.

--> more accidents near marked crossings than on other parts of the road.
when we're convinced that anti-accident measures are in place, we relax about other safety features...

--> We figure air bags + seat belts will protect us, so we speed up.

Add together the unreliable, complex computer systems (where "normal accidents"/system failures occur) + the fact that software systems aren’t “guaranteed” + our human nature (i.e., our propensity to maintain a certain level of risk) and you get.......???

This page last revised 02/26/03