Comparison of Electronic Control Unit Microcontroller Safety Architectures - OpenECU

Introduction

With the increase in vehicle electrification and autonomy, electronic control units are now subject to rigorous functional safety standards such as ISO 26262. Functional safety is defined by ISO 26262 as “the absence of unreasonable risk due to hazards caused by malfunctioning behavior of electrical and electronic systems[1].”

Malfunctions are broadly grouped into two categories: random faults and systematic faults.

Random faults occur unpredictably during the lifetime of a hardware part and follow probability distributions[2]. Examples include component wear-out and bit flips in silicon devices caused by thermal noise or energetic particles. Random faults can be further classified as transient faults, which resolve after a short time (such as a bit flip), or permanent faults, which remain once present (such as a transistor damaged by a random voltage transient).
Systematic faults are faults that are “manifested in a deterministic way that can only be prevented by applying process or design measures[3].” Examples include bad material batches, software bugs, and incorrect requirements. Systematic faults can also be introduced from software development tools; a compiler could produce anomalous code, or a test tool may generate incorrect test vectors or results.

The architecture of an electronic control unit determines how likely these two types of faults are and how easily they can be managed. Modern electronic control units can use several microcontroller architectures, each with a different approach to managing random and systematic faults. With respect to functional safety, the main goals of any microcontroller architecture are:

Manage random hardware failure in the microcontrollers
Manage systematic failure in the hardware design
Assist in managing systematic failure in the associated software

ECU Functional Safety Architectures

The two dominant controller architectures for electronic control units are:

Multiple microcontrollers in separate packages
A single microcontroller package

Multiple Microcontrollers in Separate Packages

The multiple-package, diverse-redundancy architecture uses two separate physical microcontrollers working in concert to provide functionality and functional safety. Because the packages are separate, two different microcontrollers can be selected, providing diverse redundancy through differences in design and manufacturing.

This architecture can also support fail-operational behavior, since a failure of one package may not prevent the other package from providing some degraded function. Because the microcontrollers are separate, they can also be powered by two independent power supplies for additional redundancy. The tradeoffs are greater electronic complexity and, typically, a larger footprint and higher unit cost.

A Single Microcontroller Package

The single-package architecture is made possible by recent automotive microcontrollers that provide features such as lockstep cores and multiple cores (often of diverse design) in the same package. A lockstep pair consists of two redundant CPUs that run the same code at the same clock rate, with one CPU delayed by a few clock cycles, and whose results are continuously compared for consistency.
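
Lockstep comparison is implemented in silicon, not in application code, but the principle can be modeled in a few lines. The C sketch below is purely illustrative: core_step is a hypothetical stand-in for one instruction, the two-cycle lag (LOCKSTEP_DELAY) is an assumed value, and flip_mask injects a transient bit flip into the leading core so that the comparator can be seen to trip once the trailing core catches up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed lag, in clock cycles, of the trailing (checker) core. */
#define LOCKSTEP_DELAY 2u

/* Hypothetical stand-in for one instruction executed by both cores. */
static uint32_t core_step(uint32_t state, uint32_t input)
{
    return state * 31u + input;
}

typedef struct {
    uint32_t lead_state;               /* state of the leading core          */
    uint32_t trail_state;              /* state of the trailing core         */
    uint32_t lead_out[LOCKSTEP_DELAY]; /* delayed copies of leading results  */
    uint32_t inputs[LOCKSTEP_DELAY];   /* delayed copies of the inputs       */
    uint32_t cycle;
} lockstep_t;

/* Advances the pair by one clock cycle. A non-zero flip_mask injects a
 * transient bit flip into the leading core's result. Returns false when
 * the comparator detects a mismatch. */
static bool lockstep_cycle(lockstep_t *ls, uint32_t input, uint32_t flip_mask)
{
    uint32_t slot = ls->cycle % LOCKSTEP_DELAY;
    bool match = true;

    if (ls->cycle >= LOCKSTEP_DELAY) {
        /* Trailing core executes the input the leading core saw
         * LOCKSTEP_DELAY cycles ago; then the two results are compared. */
        ls->trail_state = core_step(ls->trail_state, ls->inputs[slot]);
        match = (ls->trail_state == ls->lead_out[slot]);
    }

    ls->lead_state = core_step(ls->lead_state, input) ^ flip_mask;
    ls->lead_out[slot] = ls->lead_state;
    ls->inputs[slot] = input;
    ls->cycle++;
    return match;
}
```

In this model a flip injected in cycle 10 is reported in cycle 12, after the configured lag. On real devices such a mismatch is typically signaled to a device-specific fault-collection or safety-management unit.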

This protects against many random hardware failure modes in a smaller hardware footprint, which often gives a part-cost advantage over multiple-package designs. Single-package architectures generally also require an external watchdog; increasingly, this watchdog is integrated into a “smart” power supply, so it adds no parts. However, a single package carries a higher risk of dependent failures within the package and is more limited in fail-operational use cases.
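
An external watchdog is useful precisely because it has its own time base, independent of the microcontroller it supervises. A common scheme is the windowed watchdog, which requests a reset if it is serviced too late or too early. The following C model is a sketch under assumed values: the 8 ms to 12 ms window, window_wdg_t, and the function names are hypothetical and do not describe any particular power-supply device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window: a service is valid between 8 ms and 12 ms after the
 * previous one. Illustrative values only, not from any specific device. */
#define WDG_WINDOW_OPEN_MS  8u
#define WDG_TIMEOUT_MS     12u

typedef struct {
    uint32_t last_service_ms;
    bool     reset_requested;
} window_wdg_t;

/* Called when the microcontroller services (kicks) the watchdog. */
static void wdg_service(window_wdg_t *w, uint32_t now_ms)
{
    uint32_t elapsed = now_ms - w->last_service_ms;
    if (elapsed < WDG_WINDOW_OPEN_MS || elapsed > WDG_TIMEOUT_MS) {
        /* Too early (e.g. a runaway loop) or too late (e.g. a stalled task). */
        w->reset_requested = true;
    }
    w->last_service_ms = now_ms;
}

/* Called periodically from the watchdog's own, independent time base. */
static void wdg_tick(window_wdg_t *w, uint32_t now_ms)
{
    if (now_ms - w->last_service_ms > WDG_TIMEOUT_MS) {
        w->reset_requested = true; /* no service at all: timeout */
    }
}
```

The early-service check is what distinguishes a windowed watchdog from a simple timeout: a fault that makes the software service the watchdog too frequently is also detected.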

Comparison

The following sections compare the two architectures in more detail.

Robustness to Random Hardware Failures

Both diverse redundancy and modern microcontrollers with lockstep and multiple cores provide excellent protection against random hardware failures, both transient and permanent. Lockstep cores perform this checking in hardware with very little software overhead, whereas the diverse-redundancy approach requires application software and inter-processor communication protocols to perform the cross-checking, which can be a significant development effort.
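
The inter-processor communication that diverse redundancy needs is usually protected so that each microcontroller can trust the data it receives from the other. As a minimal sketch, assuming a hypothetical 4-byte frame of rolling counter, 16-bit value, and checksum, the C code below uses a CRC-8 with polynomial 0x1D (the parameter set commonly called CRC-8/SAE-J1850) and a counter check to detect corrupted, repeated, or lost frames. The frame layout and function names are illustrative, not a standardized protocol.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cross-check frame: [counter][value_hi][value_lo][crc]. */
typedef struct {
    uint8_t bytes[4];
} xcheck_frame_t;

/* Bitwise CRC-8: polynomial 0x1D, init 0xFF, final XOR 0xFF, MSB first. */
static uint8_t crc8_1d(const uint8_t *data, size_t len)
{
    uint8_t crc = 0xFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x1Du)
                                : (uint8_t)(crc << 1);
        }
    }
    return (uint8_t)(crc ^ 0xFFu);
}

/* Sender side: build a frame and append its CRC. */
static xcheck_frame_t xcheck_pack(uint8_t counter, uint16_t value)
{
    xcheck_frame_t f;
    f.bytes[0] = counter;
    f.bytes[1] = (uint8_t)(value >> 8);
    f.bytes[2] = (uint8_t)(value & 0xFFu);
    f.bytes[3] = crc8_1d(f.bytes, 3);
    return f;
}

/* Receiver side: accept only an intact frame whose counter advanced by
 * exactly one (wrapping at 256). */
static bool xcheck_accept(const xcheck_frame_t *f, uint8_t *last_counter,
                          uint16_t *value_out)
{
    if (crc8_1d(f->bytes, 3) != f->bytes[3]) {
        return false;                          /* corrupted frame       */
    }
    if (f->bytes[0] != (uint8_t)(*last_counter + 1u)) {
        return false;                          /* repeated or lost frame */
    }
    *last_counter = f->bytes[0];
    *value_out = (uint16_t)((f->bytes[1] << 8) | f->bytes[2]);
    return true;
}
```

On rejection, each microcontroller would apply its own application-specific reaction, such as substituting a safe value or entering a degraded mode.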

Robustness to Systematic and Dependent Failures

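Dependent failures are failures that are not statistically independent: the probability of their simultaneous or successive occurrence cannot be expressed as the simple product of their individual probabilities. They typically have systematic causes, even though their occurrence may appear probabilistic.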
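
For statistically independent channels, the probability that both fail is the product of the individual probabilities. To show how far even a small common cause moves that figure, the sketch below borrows the beta-factor model used in IEC 61508-style reliability analysis (an illustrative assumption, not taken from this article): a fraction beta of each channel's failures is assumed to strike both channels at once. The function uses a rare-event approximation, and the failure rate and mission time in the usage note are hypothetical.

```c
/* Rare-event approximation of the probability that BOTH redundant channels
 * fail within a mission time t_h (hours), each channel having failure rate
 * lambda_per_h (1/h). beta is the assumed common-cause fraction. */
static double p_both_fail(double lambda_per_h, double t_h, double beta)
{
    double independent = (1.0 - beta) * lambda_per_h * t_h; /* per channel   */
    double common      = beta * lambda_per_h * t_h;         /* both at once  */
    return independent * independent + common;
}
```

With an assumed rate of 1e-6 per hour and a 10,000-hour mission, perfectly independent channels give about 1e-4; assuming a 1 % common-cause fraction roughly doubles this, and 10 % raises it more than tenfold.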

Hardware

Diverse redundancy remains the standard approach for minimizing dependent failures: two microcontrollers in separate packages, with different designs and manufacturers, have very few opportunities for dependent failure. Independent packages are also less susceptible to growing cybersecurity threats such as timing-based side-channel attacks and privilege escalation.

Single-package architectures have greater opportunity for dependent failures due to shared manufacturing, common design of duplicated sub-components (for example, if the lockstep cores are identical), and physical proximity.

Software

Lockstep cores cannot protect against systematic software faults, because the same software runs on both cores. Put another way, lockstep cores cover only hardware failure modes. This means that software running on a lockstep core must still be developed to the highest applicable ASIL.
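
The following C sketch illustrates this with a hypothetical scaling function that contains a deliberate systematic bug (an inverted clamp condition). Modeling both lockstep cores as two calls to the same function, the comparator agrees on the wrong result every time, because nothing about the fault differs between the cores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scaling function with a systematic bug: the intent is to
 * limit the result to at most 1000, but the comparison is inverted. */
static uint16_t scale_torque(uint16_t pedal_pct)
{
    uint16_t torque = (uint16_t)(pedal_pct * 12u);
    if (torque < 1000u) {   /* BUG: should be torque > 1000u */
        torque = 1000u;
    }
    return torque;
}

/* Software model of a lockstep comparator: both "cores" run the same code
 * on the same input, so identical wrong answers compare as equal. */
static bool lockstep_agree(uint16_t pedal_pct, uint16_t *out)
{
    uint16_t core_a = scale_torque(pedal_pct);
    uint16_t core_b = scale_torque(pedal_pct);
    *out = core_a;
    return core_a == core_b;
}
```

For a pedal input of 100 the function returns 1200 instead of the intended 1000, and for 10 it returns 1000 instead of 120; in both cases the comparison passes.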

Most microcontrollers targeted at functional safety applications also include, in the same package, additional CPU cores that are not lockstepped, to provide co-processing or checker capability. These cores often have a different implementation architecture from the lockstep core, but care must be taken to ensure that there are no common sources of failure, such as a shared compiler.
For both single- and multiple-package architectures that use co-processors or checker cores, the software for each core should be developed by separate teams. Independent implementation and verification, together with independent development tools such as compilers, minimize the possibility of common systematic errors.
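
One common way to use a checker core is to have it evaluate an independently specified plausibility envelope rather than recompute the primary result. The C sketch below assumes hypothetical pedal-to-torque values: primary_torque_nm interpolates a calibration map, while checker_plausible only bounds the result with a simpler linear envelope. In a real design the two functions would be written by separate teams and built with different compilers for different cores; here they share one file only for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary channel (hypothetical): piecewise-linear pedal-to-torque map. */
static int32_t primary_torque_nm(int32_t pedal_pct)
{
    static const int32_t pedal_bp[]  = {0, 20, 60, 100};
    static const int32_t torque_bp[] = {0, 50, 200, 400};
    if (pedal_pct <= 0)   return 0;
    if (pedal_pct >= 100) return 400;
    int i = 0;
    while (pedal_pct > pedal_bp[i + 1]) {
        i++;                                  /* find the map segment */
    }
    return torque_bp[i] + (torque_bp[i + 1] - torque_bp[i]) *
           (pedal_pct - pedal_bp[i]) / (pedal_bp[i + 1] - pedal_bp[i]);
}

/* Checker channel (hypothetical): does not reproduce the map, only checks
 * that the requested torque lies inside a simple upper envelope. */
static bool checker_plausible(int32_t pedal_pct, int32_t torque_nm)
{
    int32_t max_allowed = 4 * pedal_pct + 10;  /* envelope in Nm */
    return torque_nm >= 0 && torque_nm <= max_allowed;
}
```

Because the checker does not duplicate the interpolation, a systematic error in the map code is unlikely to be repeated in the envelope check.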

Coexistence of Elements and ASIL Decomposition

Microcontrollers for both multiple- and single-package architectures are available with hardware-based memory protection units which can support partitioning of software.

A single-package design that relies on memory protection hardware requires the software responsible for configuring and managing that memory protection to be developed to the highest ASIL, because a fault in that software could defeat the isolation between partitions.

Dual-microcontroller designs, however, allow full decomposition of all software. For instance, all of the software in a two-microcontroller system, including the operating system, can be decomposed to ASIL B(D) + B(D), because the two microcontrollers can provide redundant, independent coverage of the system safety functions.
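
The decomposition schemes listed in ISO 26262-9 can be captured compactly. In the C sketch below, ASILs are encoded as QM = 0 through D = 4; with that assumed encoding, the listed schemes (for an ASIL D requirement: B(D) + B(D), C(D) + A(D), or D(D) + QM(D)) are exactly the pairs whose values sum to the original level. The function checks only the arithmetic of the split; the sufficient independence of the two elements, which decomposition also requires, must still be demonstrated by analysis.

```c
#include <stdbool.h>

/* Assumed numeric encoding of the integrity levels. */
typedef enum { ASIL_QM = 0, ASIL_A = 1, ASIL_B = 2, ASIL_C = 3, ASIL_D = 4 } asil_t;

/* True if splitting a requirement of level `original` into elements of
 * levels `a` and `b` matches a listed decomposition scheme. QM
 * requirements are not decomposed. */
static bool asil_decomposition_ok(asil_t original, asil_t a, asil_t b)
{
    return original != ASIL_QM && (int)a + (int)b == (int)original;
}
```

For example, asil_decomposition_ok(ASIL_D, ASIL_B, ASIL_B) is true, while a split of ASIL D into B + A is rejected.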

Confidence in the Use of Software Tools

Although the compiler for each microcontroller in a separate-package architecture must still be subject to software tool evaluation, using different compilers removes a common source of systematic error, since different compilers for different microarchitectures are unlikely to share the same failure modes.
Because the software running on each microcontroller can be developed to a decomposed ASIL, tool qualification effort can also be reduced to the level required by that decomposed ASIL.
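
For context, ISO 26262-8 (Clause 11) derives a tool confidence level (TCL) from the tool impact (TI) and tool error detection (TD) classes; the TCL, together with the ASIL, then selects the recommended qualification methods. The C sketch below encodes only that TI/TD lookup, and the enum names are hypothetical. A compiler can typically introduce errors (TI2), so its TCL depends on how well the surrounding process and architecture would detect a compiler error.

```c
/* Hypothetical names for the classes defined in ISO 26262-8. */
typedef enum { TI1 = 1, TI2 = 2 } tool_impact_t;
typedef enum { TD1 = 1, TD2 = 2, TD3 = 3 } tool_detection_t;
typedef enum { TCL1 = 1, TCL2 = 2, TCL3 = 3 } tool_confidence_t;

/* A tool that cannot introduce or mask errors (TI1) is TCL1; otherwise
 * the TCL follows the tool error detection class. */
static tool_confidence_t tool_confidence_level(tool_impact_t ti,
                                               tool_detection_t td)
{
    if (ti == TI1) {
        return TCL1;
    }
    switch (td) {
    case TD1: return TCL1;
    case TD2: return TCL2;
    default:  return TCL3;
    }
}
```

A TCL1 tool needs no qualification; for TCL2 and TCL3, the decomposed ASIL determines how rigorous the qualification methods must be.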

Summary

Both the diverse-redundant multiple-package design and the single-package multiple-core designs can satisfy the functional safety demands of modern control units.

While single-package architectures provide a simpler approach to managing random hardware failures, they require more analysis and development-process diligence to manage systematic failures. The lower piece price of single-package designs makes them attractive for high-volume products, even with the potential for higher development cost.
Multiple-package architectures cost more in board space, but they are more robust against dependent failures and can require less analysis because ASIL decomposition can be applied more efficiently. Because safety requirements can be decomposed and the analysis of dependent failures is simpler, development costs for multiple-package designs may be lower than for single-package designs, offset by a higher piece price; this architecture may therefore be more suitable for low-volume projects.
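
The volume argument can be made concrete with a break-even calculation. The C sketch below assumes a hypothetical linear cost model (one-off development cost plus piece price times volume) and returns the production volume above which the single-package design becomes cheaper overall. All figures in the usage note are invented for illustration.

```c
/* Hypothetical linear cost model: total = development cost + volume * price.
 * Returns the break-even volume, 0.0 if the single-package design is
 * cheaper at any volume, or -1.0 if the model does not apply (the
 * single-package piece price is not lower). */
static double break_even_volume(double nre_single, double piece_single,
                                double nre_multi, double piece_multi)
{
    double piece_saving = piece_multi - piece_single; /* saved per unit   */
    double nre_penalty  = nre_single - nre_multi;     /* extra one-off cost */
    if (piece_saving <= 0.0) {
        return -1.0;
    }
    if (nre_penalty <= 0.0) {
        return 0.0;
    }
    return nre_penalty / piece_saving;
}
```

For example, with assumed development costs of 1.5 million for the single-package design and 1.0 million for the multiple-package design, and piece prices of 40 and 50, the break-even volume is 500,000 / 10 = 50,000 units; below that volume the multiple-package design is cheaper in this model.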

Attribute	Multiple-Package Architecture	Single-Package Architecture
Dependent Failure Robustness	Higher	Lower
Part Cost / Footprint	Higher / Larger	Lower / Smaller
Safety Analysis Effort	Lower	Higher
Software Implementation Effort	Higher	Lower
Tool Qualification Effort	Lower per tool, but more tools	Higher per tool, fewer tools

The OpenECU M560 and M580 vehicle controllers from Dana incorporate the multiple-package, diverse-redundancy architecture, making them a compelling choice for low- to medium-volume projects with aggressive development schedules.

[1] ISO 26262-1:2018 Clause 3.67

[2] Adapted from ISO 26262-1:2018 Clause 3.118

[3] Adapted from ISO 26262-1:2018 Clause 3.165