How do you manage risk when you do not understand it?

Paul Boughton
Everybody today talks about risk, but do we really understand it? And what level of risk is acceptable? To find out, Jon Severn discussed risk management with James Catmur, a director and the global head of the risk practice at Arthur D Little, and co-author of System Safety: HAZOP and Software HAZOP.

Given that it is unfeasible to make anything 100 per cent safe, residual risks posed by products and systems must be acceptably low. But what is an acceptable level of risk, and how do you manage the risks? James Catmur says that the first step in risk management is to recognise that you do not understand risk (Fig. 1).

"Almost nobody understands risk, as it is a fairly difficult concept. We are also all poor at estimating risk day-to-day and in our work. I have worked in the field of risk assessment and risk management for over 20 years and I know I still get risk estimation wrong, both in day-to-day life and professionally. The only advantage I have is that I often spot myself doing it and can tell myself why I did it."

There are several reasons why people do not understand risk and either over- or underestimate risk levels, as Catmur explains: "Risk judgement is often based on 'gut feeling', so any formal risk assessment process usually ends up justifying the gut feeling rather than being totally objective. Also, the 'example rule' says that when people estimate risk they will overestimate the risk of events that they have witnessed and underestimate risks they have not witnessed (so engineers in a company with a good safety record will tend to underestimate risk, while experienced risk experts will err the other way). There is also the 'anchoring' effect: for example, if you think about something totally unrelated that involves low numbers (giving a low 'anchor point'), then when you try to estimate something you will subconsciously pitch low."

Other reasons behind poor estimates for risk include the 'rule of typical things', which leads people to misjudge the likelihood of events simply because they seem more plausible, and the 'good-bad rule', which says people rank good things as low risk and bad things as high risk. Catmur also warns us to watch out for 'black swans': "If you were to plot the distribution for the colour of swans you would find a Gaussian distribution of the different shades of white; and you would conclude that black swans would never occur - yet we know that black swans do exist. In the world of risk management, black swans are those events that nobody could have predicted, though when they do occur they can be catastrophic."
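As a rough numerical aside (the figures and the 'shade of white' scale below are invented for illustration, not taken from Catmur), the short Python sketch shows why a model fitted only to past observations assigns an effectively zero probability to a 'black swan' event:

```python
# Illustration with invented numbers: a model fitted only to past observations
# assigns essentially zero probability to a "black swan".
import statistics
from math import erf, sqrt

# Pretend these are all the swan "shades of white" ever observed (1.0 = pure white).
observed = [0.90, 0.92, 0.95, 0.97, 0.93, 0.96, 0.94, 0.91, 0.95, 0.93]

mu = statistics.mean(observed)
sigma = statistics.stdev(observed)

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

# Probability the fitted Gaussian gives to ever seeing a shade of 0.0 (a black swan):
# so small it underflows to zero, yet such events do happen in reality.
p_black_swan = normal_cdf(0.0, mu, sigma)
print(f"mean = {mu:.3f}, sd = {sigma:.3f}, P(shade <= 0) = {p_black_swan:.3e}")
```

The fitted model places the black swan dozens of standard deviations from the mean, so its probability rounds to zero; reality is under no obligation to agree.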

Another point worth highlighting is the confusion that exists between hazards and risks. "People often use the term 'risk' when they talk about hazards (such as moving machinery)," says Catmur. "When you create more operating modes or features that have additional hazards associated with them, you may need to add control measures to reduce the existing risks. As a result, the overall risk can be reduced and the situation made safer, even though the total number of hazards has increased" (Fig. 2).
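One way to make the hazard/risk distinction concrete is the common simplification of treating risk as likelihood multiplied by severity, summed over the hazards present. The sketch below uses that convention with invented numbers to show how adding a feature can raise the hazard count yet lower the total risk once control measures are in place:

```python
# Simplified illustration with made-up numbers: risk is taken here as
# likelihood x severity per hazard, summed over all hazards present.

def total_risk(hazards):
    """Sum likelihood * severity over a list of (name, likelihood, severity)."""
    return sum(likelihood * severity for _, likelihood, severity in hazards)

# Original design: two hazards, no extra controls.
before = [
    ("moving machinery", 1e-3, 10),   # likelihood per year, arbitrary severity scale
    ("hot surfaces",     5e-4, 4),
]

# A new operating mode adds a third hazard, but the control measures fitted
# with it (guarding, interlocks) also reduce the likelihood of the others.
after = [
    ("moving machinery",  1e-4, 10),
    ("hot surfaces",      1e-4, 4),
    ("automatic restart", 2e-4, 6),   # the newly introduced hazard
]

print(f"hazards before: {len(before)}, total risk: {total_risk(before):.2e}")
print(f"hazards after:  {len(after)}, total risk: {total_risk(after):.2e}")
# More hazards (3 vs 2), yet lower total risk once the controls are in place.
```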

Where to start

Leaving aside the semantics, people's inability to understand risk leads Catmur to caution against starting a project by estimating the risks: "If people go straight into risk estimation they often get it wrong, which has implications for the design of the product or system. It is better to just focus on the outcomes and bear the following questions in mind as you work on the design: What could go wrong? What are the possible consequences? Are you building in enough barriers for the higher-consequence hazards? Will those barriers last throughout the life of the product/system? Will those barriers be effective however the product/system is used or abused?

"Only after addressing those points should you ask whether the risk is being controlled."

Returning to the question of how to establish the acceptable level of risk, Catmur states: "You almost certainly cannot define a totally clear acceptable level of risk. This will depend on factors such as the industry, the skill of the operatives and even the culture of the nationals concerned. Deciding what level of risk is acceptable can involve a huge debate, so people often just prefer to avoid the issue or fall back on standards or legal definitions. In some cases standards and/or legal definitions are a good route forward but I always suggest that a proper debate is had about what the acceptable level of risk is, as this can help give some clarity - even if in the end the answer is not a firm one.

"For multinational companies there can also be a moral angle: do you impose equally high standards in your plants around the world, even if there is no regulatory requirement to do so? Within some companies this may be seen as an issue of corporate social responsibility, and there could equally be concerns about how investors and customers might react if health and safety management was perceived to be poor in factories located in low-wage economies."

Safety bubbles

Even if a company thinks it is managing risk effectively, it is easy to be lulled into accepting a level of risk that is actually higher than believed, or that is rising as risk controls cease to be effective. "Arthur D. Little has developed the concept of 'safety bubbles' to describe what can happen," says Catmur. "These have striking parallels with the 'economic bubbles' found in financial markets.

"A safety bubble is a slowly growing set of conditions, fostered by certain behaviours and assumptions in the organisation. When the safety bubble bursts it can result in catastrophic disruption to the business and even fatalities. Typically a safety bubble will develop as a company attempts to cut costs, such as by using cheaper materials, fewer safety devices, longer maintenance intervals and so on. If the first round of cuts is successful, with improved margins and no significant incidents, then the company may seek to make further cuts. However, if operational cost cutting proceeds to such an extent that there is no longer adequate contingency based on objective risk pricing, the result is an unsustainable condition in which inherent safety margins are progressively eroded. Hence the safety bubble develops. Importantly, as the safety bubble grows, there is an atmosphere of self-congratulation and investor plaudits, as the company appears to be performing well.

When it comes to managing risk as part of the design process, various tools can help to remove some of the subjectivity. FMEA (Failure Modes and Effects Analysis) and HAZOP (Hazard and Operability Analysis) are two well-known examples, but Catmur has reservations about both, especially the way they are performed in today's cost-conscious businesses: "Done well, these can be very useful but, if they are not done properly, they can be very bad. Ideally each should be undertaken using a team consisting of the right people; today, however, they are sometimes done by one person who then circulates the analysis for review. FMEAs and HAZOPs are, to be honest, boring, so it is easy to miss things. Really you need proper brainstorming sessions, led by the people who have in-depth knowledge of the design. Even then, it is not guaranteed that all possible failure modes will be identified. Cost-cutting also means that, for example, HAZOPs only analyse the nodes that are thought to be critical, but this approach is unsound and can easily result in no consideration being given to potentially critical failures and consequences (Fig. 3).

"Nevertheless, it would be unrealistic to expect every company to undertake complete, rigorous FMEAs and HAZOPs, as the costs would be prohibitive. The answer, therefore, is to use a mix of methods with which people are comfortable. For example, start by identifying all of the potential failure modes and consequences, then filter these so that you only pursue the most critical (based on consequence, not risk at the early stages). And be prepared to think about the many different ways that a product or system can be abused, and what the consequences could be; just because you think a person would have to be very stupid to do something, that does not mean that it will never happen."

Another trap is to let the safety team drive the safety analyses, either from the front end (with an overestimate of the risks) or afterwards, in an attempt to back-justify a design by 'bolting on' safety. "Safety should be considered as an integral part of the design," says Catmur. "Engineers should always keep one eye on the potential consequences of what they are designing, and call in the safety experts when they come up against a problem so that they get safety advice in their design decisions."

While products are under warranty, manufacturers have a good opportunity to monitor failures and use this as feedback to check the validity of their risk assessments.

However, depending on the product, consumers today might simply discard the faulty item and purchase a replacement from another manufacturer, so the failure is never reported. Furthermore, the feedback loop is lost once the product is out of warranty, and this valuable source of information simply does not exist for competitor products.
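A minimal sketch of that warranty feedback check, with invented figures and a deliberately crude trigger, might compare observed claims against the failure rate assumed in the original risk assessment:

```python
# Illustrative check with invented numbers: compare warranty failures observed
# in the field against the failure rate assumed in the risk assessment.

units_in_field = 12_000      # units still under warranty
exposure_years = 1.5         # average service time per unit so far
predicted_rate = 2.0e-4      # assumed failures per unit-year for this failure mode
observed_failures = 11       # warranty claims attributed to this failure mode

expected_failures = predicted_rate * units_in_field * exposure_years
ratio = observed_failures / expected_failures

print(f"expected ~{expected_failures:.1f} failures, observed {observed_failures}")
print(f"observed/expected ratio: {ratio:.2f}")
if ratio > 2.0:  # crude trigger, purely illustrative
    print("Field experience is well above prediction - revisit the risk assessment.")
```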

Despite all the foregoing, Catmur concludes: "Do not use safety as a reason to do nothing, and certainly do not let it stifle innovation. To sum up, I would offer these five tips:

- Have a rough idea of where you are heading by thinking in terms of consequences and what level of risk might be acceptable;

- Appreciate that you do not have a very good understanding of risk (and in-house experts and external consultants are unlikely to be much better);

- Avoid being too focused on risk estimates, as doing so can lead to a false sense of security;

- Continue to manage risk, even after the product is launched or the system is commissioned; and

- If you build in safety from the outset you will achieve a better result than any safety expert can by 'bolting on' safety afterwards."

