To celebrate National Engineers Week (February 19-25), MIT Press has conducted a series of Q&As with the authors of the new Engineering Systems series books.
Next up is Nancy Leveson, author of Engineering a Safer World: Systems Thinking Applied to Safety. Leveson is Professor of Aeronautics and Astronautics and Engineering Systems at MIT. An acknowledged leader in the field of safety engineering, she has worked to improve safety in nearly every industry over the past 30 years.
Q. What is System Safety?
A. System Safety is a specific approach to preventing accidents that was created by aerospace engineers in the 1950s for the ICBM systems the U.S. was building then. It was an early application of systems thinking to engineering, in this case, for the system property of safety. The basic concept is a "hazard," and the goal is to identify and eliminate or control hazards in order to reduce losses. This goal is accomplished through appropriate analysis, design, and management of safety-critical projects.
Q. How did you become involved in researching System Safety?
A. I got involved in this topic about a week after assuming my first faculty position 30 years ago. I got a call from a system safety engineer at Hughes Aircraft Co. who wanted some help with the safety of a new torpedo system they were designing that had 15 microprocessors on it. They had no idea what to do about the software. I told him I didn't know anything about safety or torpedoes and that it sounded like a reliability problem. He replied that it was a safety problem, not a reliability problem, and that they couldn't find anyone else who was willing to help them. I couldn't promise much, but I said I would provide the help I could. I got interested in the problem and kept working on it.
Q. What changes are stretching the limits of engineering knowledge?
A. We are building systems today with a complexity level that makes it impossible for anyone to predetermine all the potential system behavior. In addition, software is becoming a major part of systems, yet traditional engineering techniques to deal with accident prevention do not apply to software. At the same time, our systems have the potential not only to harm large numbers of people and the environment, but also to negatively affect the lives and environment of future generations. We can no longer afford to have a few accidents and learn from them; we need to prevent them before they occur.
Q. What are some of the most important assumptions that traditional safety engineers make about the cause and prevention of accidents that you have discovered to be wrong?
A. Engineers assume and are taught that reliability and safety are the same thing. In complex systems, building and assuring very reliable components will have only a minor impact on accidents. This incorrect assumption is based on the fact that the simpler, electro-mechanical systems of the past can be exhaustively tested and system design errors can be eliminated before the system is used. That leaves only component failures (including operator error) as the cause of accidents during use. But exhaustive testing is no longer possible in systems containing significant amounts of software. Increasingly, accidents are arising from unsafe interactions among components that have not failed (that is, they satisfy their requirements).
Engineers also assume that the old fail-safe and fault-tolerant techniques, like redundancy and building in safety margins, apply to the new technology, such as software, which they do not. Something different is needed.
Other important erroneous assumptions involve the role of operators in accidents and the role of assigning blame for events in preventing future accidents.
Q. Why are safety efforts sometimes not cost-effective?
A. There are lots of reasons. Sometimes safety efforts are devoted simply to complying with regulations or getting the system approved by some government agency without these efforts having any impact on the actual design of the system. In other cases, the system safety engineers are doing useful things but the design engineers never get information about hazards and hazard analysis until most or all of the design has been completed. There are few if any effective and cheap ways to fix flaws in the design at that time. Additionally, the hazard analysis techniques may only look at a small part (component failures) of the causes of accidents today and may treat human error superficially. Another reason is that safety efforts may narrowly focus only on technology and not include organizational design, management, operations, and safety culture.
Q. How can engineers work to make their safety efforts more cost-effective?
A. They need to apply systems thinking to engineering. How to do this is the topic of my new book. The experience on real systems so far is that the new techniques it describes are cheaper, easier, and more effective than what people are doing now.
Q. What kinds of changes do you think we need in engineering education to train engineers to create safer systems?
A. There is almost no training in system safety today in most engineering schools. Engineers must learn on the job. A few classes exist at the graduate level, but relatively few engineering students, graduate or undergraduate, are exposed to these concepts in their education. I am creating an undergraduate class in system safety at MIT, which will be taught for the first time next fall. The D'Arbeloff Fund and the MIT-Singapore program are providing funding to develop the curriculum and teaching materials, which will be freely shared with anyone else who would like to teach such a class. The class is divided into four modules, which could be taught separately or integrated into other classes: (1) analyzing the causes of accidents, (2) hazard analysis, (3) design for safety, and (4) operating and managing safety-critical systems.