When we talk about artificial intelligence in software today, the conversation often drifts toward Large Language Models and generative capabilities. While these advancements are impressive, they represent only a fraction of how AI is being integrated into the world’s infrastructure. The real engineering challenge—and the domain where AI’s impact is most profound and scrutinized—is in safety-critical systems. These are the environments where a software failure isn’t merely an inconvenience or a financial loss; it can result in loss of life, environmental disaster, or catastrophic infrastructure collapse.
Think of autonomous vehicles navigating dense urban traffic, fly-by-wire flight control systems in commercial airliners, or automated braking systems in modern automobiles. In traditional software engineering, we operate under the assumption of determinism: given a specific input, the code will execute the exact same path and produce the exact same output every time. This predictability is the bedrock of verification and validation. However, when we introduce AI, particularly deep learning models, we fundamentally alter the engineering landscape. We move from a world of explicit logic to a world of statistical inference.
The Shift from Deterministic Logic to Probabilistic Inference
In classical safety-critical engineering, standards like DO-178C (for avionics) or ISO 26262 (for automotive) dictate a rigorous process. These standards rely on requirements traceability, code coverage, and static analysis. You write requirements, you write code that satisfies those requirements, and you prove through testing and analysis that the code meets them. The logic is transparent; if a fault occurs, you can trace it back to a specific line of code or a missed requirement.
AI disrupts this lineage. A neural network is not “written” in the traditional sense; it is trained. Its behavior is determined by the architecture, the training data, and the optimization process. Consequently, the system’s internal logic is often opaque, even to its creators. This is the famous “black box” problem. In a safety-critical context, this opacity is not just an intellectual curiosity; it is a fundamental engineering hurdle.
Consider a convolutional neural network (CNN) tasked with identifying pedestrians for an autonomous emergency braking system. In a traditional rule-based system, an engineer might define specific geometric shapes or pixel intensity thresholds. With a CNN, the model learns features automatically from millions of images. While this yields superior performance in varied conditions, it lacks the explicit explainability of a rule set. When the system encounters a scenario outside its training distribution—an occluded pedestrian in unusual lighting—its failure mode is not a predictable logic error. It is a statistical hallucination.
“The transition to AI in safety-critical systems is less about replacing the engineer and more about changing the engineer’s role from a writer of explicit instructions to a curator of learning processes and a verifier of emergent behavior.”
Defining the Operational Design Domain (ODD)
To manage this unpredictability, engineers have introduced the concept of the Operational Design Domain (ODD). The ODD specifies the conditions under which the system is designed to operate safely. This includes geographic boundaries, weather conditions, time of day, and types of roadways.
When AI is deployed, the ODD becomes the primary safety boundary. Unlike deterministic code, which might execute regardless of context (leading to undefined behavior if inputs are invalid), an AI-driven system must be able to recognize when it is outside its ODD and hand over control or execute a safe stop state.
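To make this concrete, an ODD can be encoded as an explicit, machine-checkable data structure rather than left as prose. The sketch below is a minimal, hypothetical illustration (the field names and thresholds are invented, not drawn from any particular standard): a runtime monitor checks current conditions against the declared ODD and requests a safe stop the moment any boundary is violated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperationalDesignDomain:
    """Illustrative ODD boundaries; real ODDs are far richer."""
    max_speed_kph: float = 60.0
    min_visibility_m: float = 150.0
    allowed_weather: frozenset = frozenset({"clear", "cloudy", "light_rain"})
    geofence_ids: frozenset = frozenset({"downtown_zone_a", "downtown_zone_b"})

@dataclass
class VehicleState:
    speed_kph: float
    visibility_m: float
    weather: str
    zone_id: str

def within_odd(odd: OperationalDesignDomain, state: VehicleState) -> bool:
    """Return True only if every declared ODD boundary is satisfied."""
    return (
        state.speed_kph <= odd.max_speed_kph
        and state.visibility_m >= odd.min_visibility_m
        and state.weather in odd.allowed_weather
        and state.zone_id in odd.geofence_ids
    )

def supervise(odd: OperationalDesignDomain, state: VehicleState) -> str:
    # Outside the ODD the system must not keep driving autonomously:
    # it hands over or executes a minimal-risk maneuver (safe stop).
    return "CONTINUE_AUTONOMY" if within_odd(odd, state) else "INITIATE_SAFE_STOP"
```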
The engineering challenge here is defining the ODD tightly enough to ensure safety without making the system so restrictive that it loses utility. For example, an autonomous shuttle operating in a geofenced city center has a narrow ODD, making verification easier. An autonomous truck intended for cross-country haulage has a massive ODD, requiring exponentially more diverse training data and robust validation strategies.
Verification and Validation (V&V) in a Non-Deterministic World
Traditional V&V relies heavily on unit testing and integration testing. You feed the system known inputs and check for known outputs. With AI, this approach is insufficient because the input space is effectively infinite. You cannot test every possible variation of pixel data in a video feed or every possible LiDAR point cloud configuration.
Therefore, the industry is shifting toward statistical validation. Instead of proving correctness for every input, we aim to prove with high confidence that the system performs safely across a distribution of inputs. This involves:
- Massive Dataset Curation: Curating training and test sets that represent the long tail of edge cases. This is computationally and logistically expensive.
- Simulation: Using high-fidelity simulators to generate synthetic data. Engineers can run millions of virtual miles in scenarios that are too dangerous or rare to reproduce physically.
- Fuzzing: Intentionally injecting noise and corrupted data to test the robustness of the model.
However, a lingering issue is “distributional shift.” A model trained on data from sunny California may fail catastrophically when deployed in snowy Michigan because the statistical distribution of the input data has changed. Engineering for safety requires not just training on diverse data, but building mechanisms to detect this shift in real-time.
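One way to detect distributional shift at runtime is to compare the statistics of incoming data against what the model saw during training. The sketch below is a deliberately simple monitor, assuming access to feature embeddings from the perception backbone; production systems use richer tests (Mahalanobis distance, KL divergence over sliding windows) and the threshold would be tuned on held-out data.

```python
import numpy as np

class DistributionShiftMonitor:
    """Minimal sketch: flag inputs whose embedding statistics drift far
    from the training distribution. The z-score threshold is hypothetical
    and would be calibrated on a held-out validation set."""

    def __init__(self, train_embeddings: np.ndarray, z_threshold: float = 5.0):
        # Per-dimension mean/std of embeddings observed during training.
        self.mean = train_embeddings.mean(axis=0)
        self.std = train_embeddings.std(axis=0) + 1e-8
        self.z_threshold = z_threshold

    def is_out_of_distribution(self, embedding: np.ndarray) -> bool:
        # Largest per-dimension deviation from the training statistics.
        z = np.abs((embedding - self.mean) / self.std)
        return float(z.max()) > self.z_threshold

# At runtime, a True result should trigger a fallback or degraded mode,
# not silently pass the frame to the planner.
```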
The Role of Formal Methods
While deep learning is inherently statistical, there is a growing intersection with formal methods—the mathematical specification and verification of software. Formal methods are traditionally used for critical algorithms (e.g., cryptographic implementations or flight control logic).
Applying formal methods to neural networks is an active area of research. Techniques like Abstract Interpretation allow engineers to compute over-approximations of a neural network’s outputs. Instead of testing a single input, these methods can verify properties like: “For any input within this range of noise, the network output will remain within this safe bound.”
Tools like Neural Network Verification (NNV) are beginning to make this accessible. While we cannot yet verify massive LLMs this way, for smaller, specialized networks used in control systems (e.g., a PID controller tuned by a neural net), formal verification provides a mathematical guarantee of stability that statistical testing alone cannot offer.
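The toy sketch below illustrates the flavor of these techniques with interval bound propagation, one of the simplest abstract-interpretation-style analyses, applied to a small ReLU network whose weights are assumed to be available as numpy arrays. Real tools are far more precise; this only shows how an input range can be pushed through the network to obtain a guaranteed output range.

```python
import numpy as np

def affine_bounds(W, b, lower, upper):
    """Propagate an axis-aligned box [lower, upper] through y = W @ x + b."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    new_lower = W_pos @ lower + W_neg @ upper + b
    new_upper = W_pos @ upper + W_neg @ lower + b
    return new_lower, new_upper

def relu_bounds(lower, upper):
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

def verify_output_bound(layers, x, epsilon, safe_max):
    """Check that for ALL perturbations within +/- epsilon of x, every
    network output stays below safe_max. Sound but incomplete: a False
    answer means 'could not prove it', not necessarily 'unsafe'."""
    lower, upper = x - epsilon, x + epsilon
    for i, (W, b) in enumerate(layers):
        lower, upper = affine_bounds(W, b, lower, upper)
        if i < len(layers) - 1:          # ReLU on hidden layers only
            lower, upper = relu_bounds(lower, upper)
    return bool(np.all(upper <= safe_max))
```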
Hardware Constraints and Edge Deployment
AI in safety-critical systems is rarely cloud-based due to latency and reliability requirements. The processing happens at the “edge”—on the vehicle, the aircraft, or the industrial robot. This imposes severe constraints on hardware.
Training a model might consume megawatts of power across a cluster of GPUs, but inference must happen on low-power embedded hardware. Engineers must optimize models for latency and determinism.
Consider the NVIDIA Drive platform or the Tesla FSD computer. These are System-on-Chips (SoCs) designed specifically for AI inference in vehicles. They rely on specialized accelerators (Tensor Cores on NVIDIA's silicon, custom neural processing units on Tesla's) to perform matrix multiplications efficiently.
However, there is a tension here. GPUs and NPUs are notoriously non-deterministic. Their performance varies based on thermal throttling, memory contention, and scheduling. In a safety-critical control loop, timing is everything. If a control signal is expected every 10 milliseconds, but the AI inference takes 12ms due to a spike in memory bus contention, the system is in a hazardous state.
Engineers address this through:
- Partitioning: Separating safety-critical tasks (e.g., braking) from non-critical tasks (e.g., infotainment) on different cores or microcontrollers.
- Real-Time Operating Systems (RTOS): Using an RTOS such as QNX or VxWorks to guarantee task scheduling priorities.
- Quantization and Pruning: Reducing the precision of the neural network weights (e.g., from 32-bit floating point to 8-bit integers) to speed up inference and reduce memory footprint, often with minimal accuracy loss.
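As a minimal illustration of that last point, the sketch below performs symmetric post-training int8 quantization of a weight matrix in plain numpy. It is not a deployment flow (real pipelines use toolchain-specific quantization with calibration data), but it shows the precision-versus-footprint trade-off that the post-quantization safety validation run must then bound.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor post-training quantization to int8.
    Returns quantized weights plus the scale needed to dequantize."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# A float32 weight matrix shrinks 4x in memory; the reconstruction error
# introduced here is what the safety test suite has to show is tolerable.
W = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(W)
reconstruction_error = float(np.abs(W - dequantize(q, scale)).max())
```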
Thermal and Power Management
AI chips generate significant heat. In an autonomous vehicle, the cooling system must be robust enough to maintain peak performance in desert heat. If the system throttles due to overheating, the AI’s inference capability drops, potentially leading to a safety failure. This requires co-designing the AI algorithms with the thermal and electrical engineering of the hardware.
Redundancy and Fail-Operational Architectures
In traditional avionics, redundancy is achieved through duplication. If one flight computer fails, a second identical computer takes over. This works because the computers are deterministic; if they receive the same input, they produce the same output.
With AI, redundancy is more complex. If you run two identical neural networks on two separate chips, and both were trained on the same biased data, they are likely to fail in the exact same way. This is known as a common-mode failure.
To mitigate this, safety engineers employ diverse redundancy:
- Algorithmic Diversity: Using two different model architectures (e.g., a CNN and a Vision Transformer) trained on different datasets by different teams.
- Sensor Diversity: Fusing data from cameras, radar, and LiDAR. If the camera-based AI fails due to glare, the radar-based AI should still detect the obstacle.
- Classical Fallbacks: Maintaining a “dumb” but highly reliable deterministic system that can take over if the AI system exhibits uncertainty.
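The sketch below ties the bullets above together as a minimal arbitration layer: two diverse learned channels vote, and a deterministic fallback is consulted when they are unavailable or disagree. All names and thresholds are hypothetical, and in a real system the arbitration logic is itself subject to safety assessment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    obstacle_ahead: bool
    confidence: float  # 0.0 - 1.0

def arbitrate(camera_net: Optional[Detection],
              radar_net: Optional[Detection],
              classical_radar_track: bool,
              min_confidence: float = 0.7) -> bool:
    """Decide whether to brake, favoring the safe action on disagreement."""
    votes = []
    for det in (camera_net, radar_net):
        if det is not None and det.confidence >= min_confidence:
            votes.append(det.obstacle_ahead)

    if not votes:
        # Both learned channels unavailable or too uncertain:
        # fall back to the deterministic, rule-based track.
        return classical_radar_track

    if all(votes) or not any(votes):
        return votes[0]      # the diverse channels agree

    # Channels disagree: take the conservative (braking) action.
    return True
```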
The transition from a “fail-safe” state (where the system shuts down to prevent harm) to a “fail-operational” state (where the system continues to function safely despite a fault) is a key driver for AI adoption. However, achieving fail-operational status with non-deterministic components is one of the hardest engineering problems in the field.
Data Engineering: The Foundation of Safety
In traditional software, the code is the asset. In AI systems, the data is the asset. The safety of the system is inextricably linked to the quality, diversity, and labeling of the training data.
Consider the problem of edge cases. In a dataset of millions of driving images, 99.9% might be straightforward highway scenes. The remaining 0.1%—covering scenarios like construction zones, erratic drivers, or debris on the road—are where safety is determined. If the training data lacks these examples, the model will be confident but wrong when they occur.
Engineers use techniques like active learning to address this. The deployed fleet identifies scenarios where the model has low confidence or high uncertainty. These “interesting” cases are flagged, uploaded to the cloud, labeled by humans (or automated labeling systems), and added to the training set. This creates a continuous loop of improvement.
However, this introduces a feedback delay. There is a gap between discovering a dangerous scenario and deploying the updated model to the fleet. During this gap, the vehicles are vulnerable. Engineers must implement temporary safety patches—often rule-based heuristics—until the AI model can be retrained and validated.
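The flagging step described above can be as simple as an uncertainty check on the model's output. The sketch below uses predictive entropy over softmax scores; the entropy threshold is hypothetical and would be tuned per deployment against upload bandwidth and labeling capacity.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def should_flag_for_labeling(logits: np.ndarray,
                             entropy_threshold: float = 1.0) -> bool:
    """Flag a frame for upload when predictive entropy is high,
    i.e., the model is unsure which class it is seeing."""
    p = softmax(logits)
    entropy = -float(np.sum(p * np.log(p + 1e-12)))
    return entropy > entropy_threshold

# In the fleet loop: flagged frames are queued, uploaded, labeled,
# and folded into the next training set.
```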
Handling Sensor Noise and Failure
AI models are sensitive to input noise. Adversarial attacks, where imperceptible changes to an image cause a model to misclassify it, are a known vulnerability. In safety-critical systems, this isn’t just about malicious actors; it’s about natural degradation.
Lens flare, rain droplets on a camera, or mud covering a LiDAR sensor can mimic adversarial noise. Engineers cannot simply rely on the AI to “see through” this. The system must include robust sensor fusion algorithms that weigh inputs based on their reliability.
For example, if the camera input is corrupted by glare, the system should down-weight the camera data and rely more heavily on radar, which is immune to visual obstructions. This requires a probabilistic framework, such as a Kalman Filter or a Bayesian Network, to sit on top of the raw AI outputs.
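A minimal version of that down-weighting is inverse-variance fusion, the same principle that drives a Kalman update. The sketch below fuses camera and radar range estimates; the numbers are illustrative, and a degradation detector upstream is assumed to inflate the camera variance when glare is present.

```python
def fuse_range_estimates(camera_range_m: float, camera_variance: float,
                         radar_range_m: float, radar_variance: float) -> float:
    """Inverse-variance weighted fusion of two range estimates.
    When glare is detected, the camera's variance is inflated upstream,
    so its influence here shrinks automatically."""
    w_cam = 1.0 / camera_variance
    w_rad = 1.0 / radar_variance
    return (w_cam * camera_range_m + w_rad * radar_range_m) / (w_cam + w_rad)

# Example: heavy glare -> camera variance raised from 0.5 to 50.0,
# and the fused estimate tracks the radar almost exactly.
fused = fuse_range_estimates(camera_range_m=38.0, camera_variance=50.0,
                             radar_range_m=42.5, radar_variance=0.8)
```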
Regulatory Compliance and Standards
Regulators and standards bodies (the FAA, NHTSA, and ISO) are playing catch-up with the technology. Existing standards like ISO 26262 were written for deterministic software and hardware, and they are ill-suited to AI.
Recognizing this, ISO published ISO 21448, Safety of the Intended Functionality (SOTIF). Unlike ISO 26262, which addresses malfunctions (things going wrong), SOTIF addresses hazards that arise even when the system functions exactly as designed, because the design itself has performance limitations.
SOTIF requires engineers to:
- Identify triggering events (situations that could lead to hazardous behavior).
- Validate that the system handles these situations safely.
- Reduce the “unknown-unsafe” space through testing and validation.
Furthermore, the UL 4600 standard focuses specifically on the safety evaluation of autonomous products. It doesn't prescribe specific tests but requires a rigorous safety case: a structured argument that must explain how the system handles corner cases, how it interacts with humans, and how it fails safely.
For engineers, this means documentation is as critical as code. Every dataset selection, every hyperparameter tuning decision, and every validation result must be traceable and justifiable.
The Human-in-the-Loop: Interaction Design
When AI fails, or when it reaches the limits of its ODD, it must hand over control to a human. This transition is fraught with risk. It is the “handover problem.”
Research in human-machine interaction (HMI) shows that humans are poor at taking over control of a system they haven’t been monitoring. This is known as the out-of-the-loop performance effect. If an autonomous vehicle drives perfectly for hours, the driver becomes disengaged. When the system suddenly requests help, the driver’s reaction time is slow, and situational awareness is low.
Engineers are designing systems to mitigate this:
- Gradual Escalation: Instead of an immediate emergency handover, the system first issues warnings, then requests confirmation, and only then initiates a safe stop if no response is received (a minimal state-machine sketch follows this list).
- Driver Monitoring Systems (DMS): Using inward-facing cameras and eye-tracking to ensure the driver is attentive. If the driver is distracted, the system intervenes earlier or more aggressively.
- Explainable AI (XAI): Providing the human operator with insights into why the AI is making a decision. For example, highlighting the specific object on a screen that triggered a braking event.
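The escalation policy from the first bullet can be expressed as a small state machine. The sketch below is hypothetical: the inputs, transitions, and their ordering stand in for decisions that, in practice, are driven by HMI studies, the DMS signal, and the system's ODD.

```python
from enum import Enum, auto

class HandoverState(Enum):
    NOMINAL = auto()
    WARNING = auto()
    CONFIRMATION_REQUESTED = auto()
    SAFE_STOP = auto()

def escalate(state: HandoverState, handover_needed: bool,
             driver_attentive: bool, driver_responded: bool) -> HandoverState:
    """One tick of a gradual-escalation policy (illustrative only)."""
    if state is HandoverState.NOMINAL:
        return HandoverState.WARNING if handover_needed else state
    if state is HandoverState.WARNING:
        if driver_responded:
            return HandoverState.NOMINAL        # driver has taken over
        # A driver the DMS judges distracted is escalated more aggressively.
        return (HandoverState.CONFIRMATION_REQUESTED if driver_attentive
                else HandoverState.SAFE_STOP)
    if state is HandoverState.CONFIRMATION_REQUESTED:
        return (HandoverState.NOMINAL if driver_responded
                else HandoverState.SAFE_STOP)
    return state                                 # SAFE_STOP is absorbing
```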
The goal is not just to make the AI safe, but to make the system safe, where the human is considered an integral part of the control loop.
Security: The Intersection of Safety and Cybersecurity
In connected systems, safety and cybersecurity are converging. An attacker who compromises an AI system can cause physical harm. This is the domain of Adversarial Machine Learning.
Consider a scenario where an attacker places a specific pattern of stickers on a stop sign. To a human, it looks like a stop sign. To a neural network, it might look like a speed limit sign. This is an adversarial example.
Securing AI models requires a defense-in-depth approach:
- Adversarial Training: Including adversarial examples in the training data to make the model robust against known attacks (a toy sketch follows this list).
- Input Sanitization: Pre-processing sensor data to remove noise or patterns that resemble adversarial attacks.
- Model Hardening: Techniques like defensive distillation or adding randomness to the inference process (though this conflicts with determinism).
- Secure Boot and Encryption: Ensuring the model weights and the inference engine have not been tampered with.
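To make the adversarial-training bullet concrete, the sketch below generates Fast Gradient Sign Method (FGSM) examples for a toy logistic-regression "network" and mixes them into each training batch. The linear model is a stand-in so the gradient can be written by hand; in practice the same idea is applied to deep networks via automatic differentiation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_example(x: np.ndarray, y: float, w: np.ndarray, b: float,
                 epsilon: float = 0.03) -> np.ndarray:
    """FGSM: perturb x in the direction that increases the loss, bounded
    by epsilon. For binary cross-entropy on a linear model, dL/dx = (p - y) * w."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return np.clip(x + epsilon * np.sign(grad_x), 0.0, 1.0)

def adversarial_batch(X, y, w, b, epsilon=0.03):
    """Adversarial training step (sketch): augment each batch with the
    model's own worst-case perturbations so it learns to resist them."""
    X_adv = np.stack([fgsm_example(x_i, y_i, w, b, epsilon)
                      for x_i, y_i in zip(X, y)])
    return np.concatenate([X, X_adv]), np.concatenate([y, y])
```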
Furthermore, the data pipeline itself is a security risk. If an attacker can inject poisoned data into the training set (data poisoning), they can create backdoors in the model that activate only under specific conditions. Verifying the integrity of the training data is as important as verifying the code.
Software Engineering Practices for AI Safety
How do we adapt software engineering practices like CI/CD (Continuous Integration/Continuous Deployment) for safety-critical AI?
Traditional CI/CD relies on fast, automated testing. However, training a deep learning model can take days or weeks. You cannot run a full training cycle for every code commit. Therefore, the pipeline is split:
- Code CI: Testing the infrastructure, data processing scripts, and simulation code on every commit. This ensures the tools work.
- Data CI: Validating data quality, checking for distribution shifts, and ensuring labeling consistency. This runs periodically or when new data is ingested.
- Model CI: Triggering full training and evaluation pipelines. This is expensive and happens less frequently.
A critical concept here is the Golden Dataset. This is a curated dataset that represents the safety-critical scenarios. Before any model is approved for deployment, it must pass evaluation on this dataset. If the new model performs worse than the previous one on the Golden Dataset (regression), it is rejected.
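A regression gate of this kind is conceptually simple; the sketch below shows one with hypothetical metric names, assuming "higher is better" for each metric. The hard part is not the gate itself but curating the Golden Dataset and deciding which metrics are allowed zero tolerance.

```python
def passes_golden_gate(candidate_metrics: dict, baseline_metrics: dict,
                       tolerance: float = 0.0) -> bool:
    """Release gate: the candidate model must match or beat the currently
    deployed model on every safety-critical metric in the Golden Dataset."""
    for metric, baseline_value in baseline_metrics.items():
        if candidate_metrics.get(metric, float("-inf")) < baseline_value - tolerance:
            return False    # regression on a safety-critical metric
    return True

# Example (hypothetical numbers): the candidate regresses on one metric
# and is rejected even though it improves on the other.
baseline = {"pedestrian_recall_night": 0.991, "cyclist_recall_occluded": 0.972}
candidate = {"pedestrian_recall_night": 0.993, "cyclist_recall_occluded": 0.965}
assert not passes_golden_gate(candidate, baseline)
```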
Versioning is also more complex. In traditional software, you version control the code. In AI, you must version control the code, the data, the model architecture, and the hyperparameters. Tools like DVC (Data Version Control) and MLflow are becoming essential infrastructure for tracking the lineage of a deployed model.
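Tools like DVC and MLflow handle this bookkeeping at scale, but the underlying idea is simple: pin, by content hash, everything that determines the deployed model's behavior. The sketch below is a minimal, tool-agnostic illustration with hypothetical file names, not a description of either tool's API.

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_lineage_manifest(code_commit: str, dataset_files: list,
                           hyperparameters: dict, model_weights: Path) -> dict:
    """Record the full lineage of a trained model: code, data,
    hyperparameters, and weights. The manifest is what gets reviewed
    and archived as part of the safety case."""
    return {
        "code_commit": code_commit,
        "dataset_hashes": {str(p): file_sha256(p) for p in dataset_files},
        "hyperparameters": hyperparameters,
        "model_weights_hash": file_sha256(model_weights),
    }

# Hypothetical usage:
# manifest = build_lineage_manifest("a1b2c3d", [Path("train.tfrecord")],
#                                   {"lr": 1e-4, "epochs": 40}, Path("model.onnx"))
# Path("lineage.json").write_text(json.dumps(manifest, indent=2))
```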
Case Study: The Tesla Approach vs. Waymo
To understand the engineering trade-offs, it is helpful to look at two different approaches to autonomous driving.
Waymo (and many traditional AV companies) relies on a geofenced approach. They operate in specific, well-mapped areas (like Phoenix or San Francisco). They use a combination of high-definition maps, LiDAR, radar, and cameras. The HD maps act as a strong prior; the car doesn’t just perceive the world, it knows exactly where it is in a pre-built 3D model. This reduces the cognitive load on the AI but limits scalability.
Tesla has taken a largely vision-only approach, relying on cameras and neural networks without HD maps or LiDAR. This is a bet on the scalability of AI: if a human can drive using only eyes and a brain, the argument goes, a car should be able to do the same.
The engineering implications are stark:
- Waymo’s Engineering: Focuses on high-precision localization and sensor fusion. The safety case relies on the redundancy of sensors and the accuracy of the maps. The ODD is strictly controlled.
- Tesla’s Engineering: Focuses on massive data ingestion and neural network training. The safety case relies on the statistical performance of the AI across billions of miles of driving data. The ODD is effectively global, but the system is more exposed to “unknown-unknowns.”
Neither approach is inherently “unsafe,” but they represent different engineering philosophies. Waymo prioritizes certainty and control (traditional engineering), while Tesla prioritizes scalability and adaptation (AI-native engineering). Both must adhere to the same safety standards, but they interpret them differently.
The Future: Foundation Models in Physical Systems
We are beginning to see the rise of Foundation Models (like GPT-4 or PaLM) moving into robotics and control systems. These are large models trained on vast amounts of data that can be adapted to specific tasks with minimal fine-tuning.
For example, a “Vision-Language-Action” model could take in a camera feed and a text command (“pick up the blue block”) and output motor controls. This is a paradigm shift from specialized, narrow AI to generalist AI.
However, the safety challenges here are even greater. Foundation models are prone to hallucinations. In a chatbot, a hallucination is a factual error. In a robot, a hallucination could be a physical collision.
Engineering these systems will require new layers of abstraction. We may see the emergence of “AI Safety Middleware”—a layer that sits between the large foundation model and the low-level actuators. This middleware would be responsible for:
- Sanitizing the model’s outputs.
- Converting high-level intents into safe, constrained trajectories.
- Monitoring the model’s internal state for signs of uncertainty or confusion.
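A minimal sketch of the responsibilities listed above is shown below, for a hypothetical manipulation task: the middleware rejects intents the model is unsure about and clamps the rest into a pre-verified envelope before anything reaches the actuators. All types, limits, and thresholds are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Intent:
    """High-level command proposed by the foundation model (hypothetical)."""
    target_x: float
    target_y: float
    speed: float
    confidence: float

@dataclass
class SafeCommand:
    target_x: float
    target_y: float
    speed: float

WORKSPACE_LIMIT_M = 0.8   # reachable, collision-checked region (illustrative)
MAX_SPEED_MPS = 0.25      # illustrative actuator limit
MIN_CONFIDENCE = 0.9      # illustrative uncertainty gate

def clamp(value: float, limit: float) -> float:
    return max(-limit, min(limit, value))

def safety_middleware(intent: Intent) -> Optional[SafeCommand]:
    """Guard layer between a foundation model and the actuators:
    reject uncertain intents, constrain the rest to a safe envelope."""
    if intent.confidence < MIN_CONFIDENCE:
        return None                      # monitor: too uncertain, hold position
    return SafeCommand(
        target_x=clamp(intent.target_x, WORKSPACE_LIMIT_M),   # sanitize/constrain
        target_y=clamp(intent.target_y, WORKSPACE_LIMIT_M),
        speed=min(abs(intent.speed), MAX_SPEED_MPS),
    )
```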
As we integrate AI deeper into the fabric of our physical world, the line between software engineering, control theory, and cognitive science blurs. The engineer of the future will need to be fluent in all three.
The tools we build today—the verification frameworks, the data pipelines, the diverse redundancy architectures—are laying the groundwork for this future. It is a slow, meticulous process. Unlike the rapid iteration of consumer software, safety-critical AI moves at the speed of trust. Every line of code, every data point, and every inference must be earned.
For those of us working in this space, the excitement comes not just from the capabilities of the technology, but from the rigor required to make it safe. It is a discipline that demands perfection, forgives no shortcuts, and ultimately, holds the key to unlocking the full potential of artificial intelligence.

