What we can learn from a nuclear reactor
Hinkley Point B is an ageing power plant overlooking the Bristol Channel. The location was once designed to welcome visiting schoolchildren, but is now defended against terrorists by a maze of checkpoints and perimeter fencing. At the heart of the site, which I visited on a mizzling, unseasonable day in late July, looms a vast grey slab of a building containing a pair of nuclear reactors.
Hinkley Point B began operating shortly before the doomed TMI-2 reactor at Three Mile Island in Pennsylvania, US, and is due to be decommissioned after 40 years of service in 2016. As parts of the plant are showing the industrial equivalent of crow’s feet, it runs at 70 per cent capacity to minimise further wear and tear. But when I asked EDF Energy, a subsidiary of one of the world’s largest nuclear energy companies, whether I could visit a nuclear facility to talk about safety, Hinkley Point B was the site they volunteered.
It might have seemed a strange choice on their part, but I was on a strange mission. I hadn’t come to Hinkley Point B to learn about the safety of nuclear energy. I’d come because I wanted to learn about the safety of the financial system.
The connection between banks and nuclear reactors is not obvious to most bankers, nor banking regulators. But to the men and women who study industrial accidents such as Three Mile Island, Deepwater Horizon, Bhopal or the Challenger shuttle – engineers, psychologists and even sociologists – the connection is obvious. James Reason, a psychologist who studies human error in aviation, medicine, shipping and industry, uses the downfall of Barings Bank as a favourite case study. “I used to speak to bankers about risk and accidents and they thought I was talking about people banging their shins,” he told me. “Then they discovered what a risk is. It came with the name of Nick Leeson.”
Peter Higginson, the head of safety at Hinkley Point B, also thinks there is a parallel. An earnest physicist in a dark-blue shirt and tan slacks, he breaks off from a long safety briefing to muse about banking. “I have done my own thinking about the financial crisis,” he says. “Could they have learned something from us? I ask the question.”
One catastrophe expert who has no doubt about the answer is Charles Perrow, emeritus professor of sociology at Yale. He is convinced that bankers and banking regulators could and should have been paying attention to ideas in safety engineering and safety psychology. Perrow published a book, Normal Accidents, after Three Mile Island and before Chernobyl, which explored the dynamics of disasters and argued that in a certain kind of system, accidents were inevitable – or “normal”.
For Perrow, the dangerous combination is a system that is both complex and “tightly coupled”. Harvard university is a complex system. A change in US student visa policy; or a new government scheme to fund research; or the appearance of a fashionable book in economics, or physics, or anthropology; or an internecine academic row – all could have unpredictable consequences for Harvard and trigger a range of unexpected responses. But none of them would set off a chain reaction that could not be halted: Harvard is complex without being tightly coupled.
A domino-toppling display is not especially complex, but it is tightly coupled. So is a loaf of bread rising in the oven. For Perrow, the defining characteristic of a tightly coupled process is that once it starts, it’s difficult or impossible to stop.
Nuclear power stations are both complex and tightly coupled systems. They contain a bewildering array of mechanisms designed to start a nuclear reaction, slow it down, use it to generate heat, use the heat to generate electricity, intervene if there is a problem, or warn operators of odd conditions inside the plant. At Three Mile Island, the most famous nuclear accident in US history, four or five safety systems malfunctioned within the first 13 seconds. Dozens of separate alarms were sounding in the control room. The fundamental story of the accident was that the operators were trying to stabilise a nuclear reactor whose behaviour was, at the time, utterly mysterious. That is the nature of accidents in a complex and tightly coupled system.
Perrow believes that finance is a perfect example of a complex, tightly coupled system. In fact, he says, its complexity “exceeds the complexity of any nuclear plant I ever studied”.
So if the bankers and their regulators did start paying attention to the unglamorous insights of industrial safety experts, what might they learn?
. . .
It might seem obvious that the way to make a complex system safer is to install some safety measures. Engineers have long known that life is not so simple. In 1638, Galileo described an early example of unintended consequences in engineering. Masons would store stone columns horizontally, lifted off the soil by two piles of stone. The columns often cracked in the middle under their own weight. The “solution” – a third pile of stone in the centre – didn’t help. The two end supports would often settle a little, and the column, balanced like a see-saw on the central pile, would then snap as the ends sagged.
Galileo had found a simple example of a profound point: a new safety measure or reinforcement often introduces unexpected ways for things to go wrong. This was true at Three Mile Island. It was also true during the horrific accident on the Piper Alpha oil and gas platform in 1988, which was aggravated by a safety device designed to prevent vast seawater pumps from starting automatically and killing the rig’s divers. The death toll was 167.
In 1966, at the Fermi nuclear reactor near Detroit, a partial meltdown put the lives of 65,000 people at risk. Several weeks after the plant was shut down, the reactor vessel had cooled enough to identify the culprit: a zirconium filter the size of a crushed beer can, which had been dislodged by a surge of coolant in the reactor core and then blocked the circulation of the coolant. The filter had been installed at the last moment for safety reasons, at the express request of the Nuclear Regulatory Commission.
The problem in all of these cases is that the safety system introduced what an engineer would call a new “failure mode” – in other words, a new way for things to go wrong. And that was precisely the problem in the financial crisis.
The notorious repackaged mortgages – in the form of residential mortgage-backed securities (RMBS) and collateralised debt obligations (CDOs) – were financial safety systems that offered exciting new ways to blow things up. The safety mechanism was a complex legal structure to alter the distribution of the risks from mortgage defaults. In principle, this made the risks more comprehensible and the right sort of investor could take on the right degree of risk. In practice, the repackaged mortgages – especially when the repackaging was repeated several times – made the behaviour of the risks as incomprehensible as a malfunctioning nuclear reactor.
For instance, investors had to take a view on how likely mortgage defaults were, and the extent to which – like London buses – they all arrived at the same time. (Clearly, there was some degree of clustering: the challenge was to estimate it without much data.) Few people realised that the same CDO safety system, which appeared to parcel risks in a predictable way, also made investors vulnerable to errors in their assumptions. A one-in-a-million chance of taking a loss could become a million-to-one chance against getting any money back at all.
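The sensitivity to clustering can be made concrete with a toy calculation. What follows is a rough sketch, not a model of any real CDO: the pool size, the loss threshold, the default rates and the probability of a downturn are all invented for illustration. It compares the chance that a "safe" senior slice takes a loss when defaults are assumed independent with the chance when the same average default rate is concentrated in an occasional housing downturn.

```python
from math import comb


def tail_probability(n, p, threshold):
    """Probability that more than `threshold` of n loans default,
    when each loan defaults independently with probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(threshold + 1, n + 1)
    )


# All figures below are hypothetical, chosen only to illustrate the point.
N_LOANS = 100                 # mortgages in the pool
THRESHOLD = 20                # the senior slice loses money only if more than 20 default
AVERAGE_DEFAULT_RATE = 0.05   # long-run default rate assumed by the investor

# Assumption 1: defaults are independent of each other.
independent = tail_probability(N_LOANS, AVERAGE_DEFAULT_RATE, THRESHOLD)

# Assumption 2: defaults cluster. One year in ten is a downturn with a 30 per
# cent default rate; the rate in other years is set so the average stays at 5 per cent.
P_DOWNTURN = 0.10
DEFAULT_RATE_IN_DOWNTURN = 0.30
default_rate_otherwise = (
    AVERAGE_DEFAULT_RATE - P_DOWNTURN * DEFAULT_RATE_IN_DOWNTURN
) / (1 - P_DOWNTURN)

clustered = (
    P_DOWNTURN * tail_probability(N_LOANS, DEFAULT_RATE_IN_DOWNTURN, THRESHOLD)
    + (1 - P_DOWNTURN) * tail_probability(N_LOANS, default_rate_otherwise, THRESHOLD)
)

print(f"Chance of a senior loss, independent defaults: {independent:.2e}")
print(f"Chance of a senior loss, clustered defaults:   {clustered:.2e}")
```

The average default rate is identical in both cases; only the assumption about clustering changes, and with it the apparent safety of the senior slice.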
Part of the problem was what risk experts call “risk compensation”. Just as safety belts appear to encourage drivers to feel safe and take more risks, the apparent safety of CDOs encouraged banks to bet their entire franchise on them. (In both cases, bystanders often become casualties.) But the subtler effect was that of these new failure modes.
A second spectacular example is another financial safety system, the credit default swap. Credit default swaps were explicitly designed as a safety measure, a form of insurance against a borrower, whether a company or a subprime CDO, failing to pay its debts. Many major banks turned to the insurance giant AIG, or to “monoline” insurers, which sold credit default swaps to insure the banks’ investments. When the investments went bad, AIG and the monolines couldn’t pay – and the safety measure suddenly became a source of systemic risk. Bonds are given a credit rating by rating agencies, but if the bonds are insured they inherit the credit rating of their insurer. When AIG’s credit rating was downgraded, so were the ratings of the bonds it was insuring – which meant some banks were forced to sell them to meet their regulatory obligations. Financial institutions didn’t have to have any involvement at all with subprime products: if they were holding bonds that AIG or the monolines had insured, they could be sucked into the crisis by the unexpected interaction of a safety regulation and an insurance-based safety mechanism.
A series of measures intended to guarantee the safety of individual financial institutions had brought the system to its knees. To industrial safety experts, such unintended consequences are commonplace. So if a Rube Goldbergesque accretion of one safety system after another is not the solution to industrial or financial catastrophes, what is?
. . .
The 1979 crisis at Three Mile Island remains the closest the American nuclear industry has come to a major disaster. It would have been far less grave had the operators understood what was happening. Coolant pumps were useless because a maintenance error had trapped them behind closed valves. Another valve jammed in the open position, allowing pressurised radioactive water at more than 1,000°C to shoot into the sump below the reactor, eventually exposing the reactor core itself and risking a complete and catastrophic meltdown.
The operators were baffled by the confusing instrumentation in the control room. One vital warning light was obscured by a paper repair tag hanging from a nearby switch. The control panel seemed to show the jammed-open valve had closed as normal – in fact, it merely indicated that the valve had been “told” to close, not that it had responded. Later, the supervisor asked an engineer to check a temperature reading that would have revealed the truth about the jammed valve, but the engineer looked at the wrong gauge and mistakenly announced that all was well.
All these errors were understandable given the context. More than 100 alarms were filling the control room with an unholy noise. The control panels were baffling: they displayed almost 750 lights, each with letter codes, some near the relevant flip switch and some far. Red lights indicated open valves or active equipment; green indicated closed valves or inactive equipment. But since some of the lights were typically green and others were normally red, it was impossible even for highly trained operators to scan the winking mass of lights and immediately spot trouble.
I asked Philippe Jamet, the head of nuclear installation safety at the International Atomic Energy Agency, what Three Mile Island taught us. “When you look at the way the accident happened, the people who were operating the plant, they were absolutely, completely lost,” he replied.
Jamet says that since Three Mile Island, much attention has been lavished on the problem of telling the operators what they need to know in a format they can understand. The aim is to ensure that never again will operators have to try to control a misfiring reactor core against the sound of a hundred alarms and in the face of a thousand tiny winking indicator lights.
At Hinkley Point B, next to the main plant, is a low-rise office building of an inoffensive style that has adorned countless nondescript business parks. At the heart of that building is the simulator: a near-perfect replica of Hinkley Point B’s control room. The simulator has a 1970s feel, with large sturdy metal consoles and chunky Bakelite switches. Modern flat-screen monitors have been added, just as in the real control room, to provide additional computer-moderated information about the reactor. Behind the scenes, a powerful computer simulates the nuclear reactor and can be programmed to behave in any number of inconvenient ways.
“There have been vast improvements over the years,” explained Steve Mitchelhill, the simulator instructor who showed me around. “Some of it looks cosmetic, but it isn’t. It’s about reducing human factors.”
“Human factors”, of course, means mistakes by the plant’s operator. And Mitchelhill goes out of his way to indicate a deceptively simple innovation introduced in the mid-1990s: coloured overlays designed to help operators understand, in a moment of panic or of inattention, which switches and indicators are related to each other.
The lesson for financial regulators might seem obscure. But at key points during the crisis, they were as “lost” as the operators of Three Mile Island. For example, as Lehman Brothers teetered on the brink of insolvency, all eyes were on the doomed bank. Tim Geithner, the man responsible for supervising Wall Street’s banks, had a meeting at the request of Robert Willumstad, the chief executive of AIG. As an insurance company, AIG was regulated by the US Treasury and by state regulators, so it was far from obvious why Willumstad was Geithner’s problem. Geithner was exhausted after an overnight flight and distracted by what appeared to be the overwhelmingly important question: what to do about Lehman Brothers. According to journalist Andrew Ross Sorkin, Geithner kept Willumstad waiting because he was on the phone to Dick Fuld of Lehman Brothers, and fidgeted throughout the meeting because he didn’t really understand why Willumstad wanted to see him. Willumstad, eager to get some support from the Federal Reserve but anxious not to panic Geithner, handed him a briefing note summarising how exposed banks were to a potential failure at AIG. When he left, Geithner filed the note with barely a glance and went back to the problem of Lehman Brothers. Five days later the government realised that AIG was about to wreck the financial system and gave it a vast injection of capital.
“We always blame the operator – ‘pilot error’”, says Charles Perrow, the Yale sociologist. But like a power plant operator staring at the wrong winking light, Geithner had the wrong focus not because he was a fool, but because he was being supplied with information that was confusing and inadequate.
Some economists and regulators have, belatedly, started to focus on this issue of information design. The Dodd-Frank reform act, signed by President Obama in July, establishes a new Office of Financial Research that seems likely to try to draw up a “heat map” of stresses in the financial system. Andrew Haldane, director for financial stability at the Bank of England, looks forward to the day when regulators will have such a map. He argues that the same technologies now used to check the health of an electricity grid could be applied to a financial network map, highlighting critical connections, overstressed nodes and unexpected interactions. “We’re a million miles away from that at the moment,” he readily admits.
Such a real-time map would certainly help make sense of the new “Basel III” capital requirements for banks. These measures, agreed last September, made provision for additional “loss-absorbing capacity” for “systemically important banks”. Well and good, but right now the definition of a systemically important bank is much the same as the definition of pornography: we know it when we see it. That is unlikely to be much help.
. . .
Perhaps the most profound and worrying parallel between preventing industrial catastrophes and financial ones emerges from Perrow’s pessimistic theory of “normal accidents”. For him, any sufficiently complex, tightly coupled system will fail sooner or later. The answers are to simplify the system, decouple it, or reduce the consequences of failure.
What might decoupling the banking system mean? Consider the slightly obsessive pastime of domino-toppling. One of the first domino-toppling record attempts – 8,000 stones – came to a premature and farcical end because a pen dropped out of the pocket of the television cameraman who had come to film the occasion. Other record attempts have been disrupted by moths and grasshoppers. It’s the quintessential tightly coupled system.
Professional domino-topplers now use safety gates, removed at the last moment, to ensure that when accidents happen they are contained. In 2005, a hundred volunteers had spent two months setting up 4,155,476 stones in a Dutch exhibition hall when a sparrow flew in and knocked one over. Because of safety gates, only 23,000 dominoes fell. It could have been much worse. (Though not for the hapless sparrow, which an enraged domino enthusiast shot with an air rifle.)
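The logic of the safety gate is simple enough to sketch in a few lines. This is a deliberately crude toy, assuming a single straight line of dominoes that topples in one direction only; the function name, the line length and the gate spacing are hypothetical and bear no relation to the layout of the Dutch record attempt.

```python
def count_fallen(n_dominoes, first_hit, gates=()):
    """Count the dominoes that fall when domino number `first_hit` is knocked over.

    Dominoes are numbered 0 to n_dominoes - 1 and topple towards higher numbers.
    A gate at position g stops the cascade from reaching domino g.
    """
    # The cascade stops at the nearest gate beyond the first domino hit,
    # or runs to the end of the line if there is no such gate.
    stop = min((g for g in gates if g > first_hit), default=n_dominoes)
    return stop - first_hit


LINE_LENGTH = 1000   # hypothetical number of dominoes
SPARROW_LANDS_AT = 120

# No gates: everything beyond the first domino goes down.
print(count_fallen(LINE_LENGTH, SPARROW_LANDS_AT))

# A gate every 50 dominoes: the damage is limited to one segment.
print(count_fallen(LINE_LENGTH, SPARROW_LANDS_AT, gates=range(50, LINE_LENGTH, 50)))
```

The gates do nothing to make the first accident less likely. They only cap how far it can spread, which is the sense in which the system has been decoupled.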
Given the propensity of finance to suffer frequent meltdowns, Perrow’s normal accident theory almost certainly describes the banking system. The financial system will never eliminate its sparrows (perhaps black swans would be a more appropriate bird) so it needs the equivalent of those safety gates. Rather than making a particular bank less likely to fail, it might be safer to focus on ensuring that one falling bank doesn’t topple other companies.
But few financial commentators have considered the implications of that. One notable exception is John Kay, a British economist and FT columnist, who argues for a system of “narrow banking” which, he asserts, would lead to “a far more robust industry structure, with simpler institutions, less interconnectedness, and greater diversity of industry structure”. Another is Laurence Kotlikoff, an economist at Boston university, who has a proposal for “limited purpose banking”. Both Kay and Kotlikoff have taken the view that it is worth pursuing a simpler and less tightly coupled financial system for its own sake – in sharp contrast to the prevailing regulatory approach, which unwittingly encouraged banks to become larger and more complicated, and actively encouraged off-balance sheet financial engineering. I do not know whether Kay or Kotlikoff have the right answer. Normal accident theory suggests that they are certainly asking the right question.
Also published at ft.com.