Functional Safety Design

Louise Smyth

Design engineers tend to focus on getting the green light to go on in simulation. However, when designing for markets that require functional safety certification, it’s important to pay attention to the bigger picture.

In these situations, the design task is more complicated than usual. Marketing and engineering teams are not only under pressure to create a compelling product but must also consider all applicable safety standards to ensure the product is fit for purpose. These can be specific to the market, the type of product, and the environment in which it will be used. For those who are not familiar with these challenges, trusted third-party engineering firms can support you on your journey.

The “art” in functional safety design is to achieve the required certifications with the minimum practicable investment. The independent assessor needs to see evidence that the design team has followed the functional safety standard with respect to systematic capability and random hardware failure mitigation based on a credible hazard and risk analysis. For the team, this can feel like taking an exam but with the ability to discuss answers with the instructor and implement their feedback to achieve an acceptable solution. To pass this assessment for certification, teams need to be confident that the chosen design path will be successful. To help with this, Xilinx has invested in advanced validation methods that give users a high level of confidence that the devices used will always behave as intended. These have helped many customers complete this process successfully.

Confidence With Evidence is Key

In the functional safety market, such confidence is critical. Setting the correct expectations can make the difference between getting a product to market and facing a substantial redesign, potentially with different components that take time to learn to use effectively. So, building confidence with evidence is key.

The Zynq UltraScale+ MPSoC was the first such device designed by Xilinx with functional safety in mind. A combination of enhanced engineering processes, testing of the device’s diagnostic capabilities, and built-in random hardware fault tolerance makes it possible to provide robust evidence of its capabilities that is acceptable to internal and external assessment teams.

It’s just as important to know what the components are capable of and what they are not. Because of the nature of FPGAs, hardening and testing the Configuration RAM (CRAM), the SRAM that stores the implementation of the design, requires special consideration. External certification authorities always need reliable assurances about this aspect when evaluating designs that contain FPGAs. The systems and procedures put in place to handle this make it possible to provide both the evidence and the education that authorities need to properly understand how the devices behave in functional safety systems.

The techniques used at Xilinx to build SRAMs employ proven internal methods. In addition, industry-standard JEDEC test methods are used to assess reliability. Long-term testing (Rosetta) is also applied, as are neutron testing at the Los Alamos Neutron Science Center (LANSCE), proton testing at Crocker Nuclear Laboratory, thermal neutron testing at McClellan Nuclear Research Center, and alpha-particle testing using a thorium-232 foil source.

To get any product certified for functional safety, the systematic capability and random hardware failures in time (FIT) must be documented. Component suppliers should be able to help customers towards their goal by providing a certificate that attests to the systematic capability used in designing the components and the systematic capability of any firmware, such as the firmware running in Xilinx MPSoC products. The company has also certified its tool chain up to SIL 4 for IEC 61508 for the industrial market and ASIL D for ISO 26262 for the automotive market. This lets customers ensure that the implementation of their design meets safety standards with respect to the tool chain used.
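For readers new to the metric, one FIT corresponds to one failure per 10^9 device-hours. The following is a minimal Python sketch of how FIT contributions might be summed and converted to other units. The function names and the per-block FIT values are hypothetical illustrations, not Xilinx data, and the conversion assumes a constant failure rate.

```python
FIT_HOURS = 1e9        # one FIT = one failure per 10^9 device-hours
HOURS_PER_YEAR = 8760


def fit_to_rate_per_hour(fit):
    """Convert a FIT value to a failure rate in failures per hour."""
    return fit / FIT_HOURS


def mttf_years(fit):
    """Mean time to failure in years, assuming a constant failure rate."""
    return FIT_HOURS / fit / HOURS_PER_YEAR


# Hypothetical FIT budget, for illustration only (not vendor data).
budget = {"processor": 20.0, "config_memory": 50.0, "io": 10.0}

# With constant, independent failure rates, FIT values simply add.
total_fit = sum(budget.values())

print("total FIT:", total_fit)
print("failures per hour:", fit_to_rate_per_hour(total_fit))
print("MTTF in years:", round(mttf_years(total_fit), 1))
```

In a real assessment the per-block values would come from the supplier's reliability data and the design's own failure analysis rather than from a hand-written table.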

In addition, accelerated high-temperature operating life (HTOL) tests are used to address permanent hardware error rates.
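To illustrate what "accelerated" means here, the sketch below uses the Arrhenius model, a common way to relate stress-temperature test hours to use-temperature hours. The article does not state which model or parameters Xilinx uses, so the activation energy, temperatures, sample size, and confidence level are all assumptions chosen for illustration. The zero-failure bound relies on the standard result that, with no observed failures, the upper confidence limit on a constant failure rate is -ln(1 - CL) divided by the equivalent device-hours.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor between use and stress temperatures (Arrhenius)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1 / t_use_k - 1 / t_stress_k))


def fit_upper_bound_zero_failures(devices, hours, af, confidence=0.6):
    """Upper-bound FIT when a test of devices x hours saw zero failures."""
    equivalent_hours = devices * hours * af
    return -math.log(1 - confidence) / equivalent_hours * 1e9


# Assumed values for illustration only: 0.7 eV, 55 C use, 125 C stress.
af = arrhenius_af(0.7, 55.0, 125.0)
bound = fit_upper_bound_zero_failures(devices=77, hours=1000, af=af)
print("acceleration factor:", round(af, 1))
print("upper-bound FIT:", round(bound, 1))
```

The point of the sketch is the shape of the calculation: a higher stress temperature multiplies the effective test time, which is why a test lasting weeks can support a claim about years of field operation.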

Moreover, diagnostics built in at the silicon level, or made available as soft IP for programmable logic circuitry, help to mitigate random hardware failures. In addition, the Xilinx Vivado tool chain and compilers are certified up to the highest levels of functional safety to help with systematic capability. Guidance documents specific to functional safety, created to help customers win the desired certifications, are also available.

Functional Safety Design and Single-Chip Integration

Designers of functional safety equipment typically rely on system-level architecture to meet systematic and random hardware requirements that cannot easily be met using a single device. Zynq UltraScale+ MPSoCs overcome this by providing three distinct compute domains: the Low Power Domain, the Full Power Domain, and the Programmable Logic Domain. Because each of these compute domains is independent and heterogeneous, they can be used to implement separate safety channels and to enhance the diagnostic capability of a design through redundancy on the same device. This enables a lower cost of implementation, using decomposition rules and redundancy, and increases overall interconnect reliability compared with a two-chip design.

The two-channel design shown in the diagram uses two domains on the same device. Each domain uses a lock-step processor, which provides the diagnostic capability needed to support a one-out-of-two architecture with diagnostics (1oo2D).
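The output logic of a 1oo2D arrangement can be sketched in a few lines of Python. This is a simplified illustration that follows the general description of 1oo2D in IEC 61508-6, not the implementation used on the device. The `ChannelState` type and `vote_1oo2d` function are hypothetical names, and timing, fault latching, and repair handling are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ChannelState:
    demands_trip: bool  # channel requests the safety function
    healthy: bool       # channel's own diagnostics report no fault


def vote_1oo2d(a, b):
    """Return True when the output should go to the safe state."""
    if not a.healthy and not b.healthy:
        return True                # faults in both channels: safe state
    if not a.healthy:
        return b.demands_trip      # follow the remaining healthy channel
    if not b.healthy:
        return a.demands_trip
    if a.demands_trip != b.demands_trip:
        return True                # discrepancy cannot be allocated: safe state
    return a.demands_trip          # both healthy and in agreement


# One channel has flagged itself faulty, so the output follows the other.
print(vote_1oo2d(ChannelState(True, False), ChannelState(False, True)))
```

The diagnostics are what distinguish 1oo2D from plain 1oo2: a channel that reports itself faulty is excluded from the vote instead of forcing a trip, which preserves availability without giving up the safe-state fallback.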

To enhance systematic capability, each domain would be developed by a different design team using a different CPU architecture and different compilers. To increase the diagnostic capability, lock-step processors are used in each domain, and this is further enhanced by reciprocal comparison in software, raising the random hardware fault detection capability above 99%. Using two channels also lowers the standards-based quality metric required of each channel individually. This accelerates time to market and provides extra confidence that the product can be successfully certified.
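As a rough illustration of why layering diagnostics raises coverage, the sketch below computes diagnostic coverage as the detected fraction of dangerous failures and combines two mechanisms under the assumption that they miss faults independently. That independence assumption is optimistic and is made here only for illustration; the individual coverage figures are hypothetical, and in practice coverage is established through failure analysis and fault injection.

```python
def diagnostic_coverage(detected_fit, undetected_fit):
    """Fraction of the dangerous failure rate that diagnostics detect."""
    return detected_fit / (detected_fit + undetected_fit)


def combined_coverage(coverages):
    """Coverage of layered diagnostics, assuming independent misses."""
    undetected = 1.0
    for dc in coverages:
        if not 0.0 <= dc <= 1.0:
            raise ValueError("coverage must be between 0 and 1")
        undetected *= 1.0 - dc   # a fault escapes only if every layer misses it
    return 1.0 - undetected


# Hypothetical figures: lock-step alone, then software cross-comparison.
lockstep_dc = 0.95
comparison_dc = 0.90
print(round(combined_coverage([lockstep_dc, comparison_dc]), 4))
```

Under these assumed inputs, a fault escapes only if both mechanisms miss it, which is how two individually imperfect diagnostics can together exceed a 99% target.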

Cyber-Secure to Be Safe

As enterprise operational technology (OT) and IT domains become connected in the burgeoning Industrial Internet of Things (IIoT), the risk of a cyber-attack is an increasingly serious threat. In this context it is clear that systems cannot be truly functionally safe unless they are also secure. This is the position of the current functional safety working group driving the next revision of the IEC 61508 standard.

Whereas work to ensure functional safety is based on hazard and risk analysis, building in cyber security is driven by threat analysis. This analysis is used to identify the “threat surfaces” (means of access) of the system. It is important to perform the safety-oriented hazard and risk analysis and the security-oriented threat analysis at the same time, so that any interactions can be understood. Performing a security assessment in isolation may have serious consequences for safety, such as a CPU lockout, resource restrictions, or other security responses that could compromise safe operation of the system.

Consistent with successful approaches to functional safety, security is most effectively handled on a shared basis between component suppliers and equipment developers. Accordingly, Xilinx has moved to take responsibility for the security aspects of the supply chain, in addition to designing protection features into the silicon and the boot process. It also provides interfaces and guidance for the runtime interface, isolation design, and recommended design flows. This gives product developers freedom to focus their resources on designing in protection at the application level.

Both functional safety and security are key issues in today’s connected industrial infrastructures. They are best addressed together through a collection of implementations encompassing software, firmware, and hardware that allow equipment developers to pick and choose an optimal combination of methods according to individual needs and expertise.

Paul Levy is a functional safety architect with Xilinx.