postmortem

The Mechanism

5 min read Chapter 6 of 38


The failure is a type conversion. In the SRI alignment function, a 64-bit floating point value representing horizontal bias (BH) is converted to a 16-bit signed integer. The Ada code that performs this conversion:

-- RECONSTRUCTED FROM INQUIRY BOARD REPORT (Ariane 501 Inquiry Board, 1996)
-- Original Ada source reviewed by the Board

-- The alignment function computes horizontal bias
-- BH is a 64-bit floating point (Long_Float)
-- The result is stored in a 16-bit signed integer

procedure Compute_Horizontal_Bias is
   BH : Long_Float;           -- 64-bit IEEE 754 floating point
   BH_Integer : Integer_16;   -- 16-bit signed integer, range -32768 .. 32767
begin
   -- ... computation of BH from inertial measurements ...
   
   BH_Integer := Integer_16(BH);  
   -- FAILURE POINT: when BH rounds to a value outside -32768 .. 32767, this raises Constraint_Error
   -- (called Operand_Error in the Ariane 4 Ada compiler)
   -- No exception handler exists for this conversion.
   -- The exception propagates and shuts down the SRI.
   
   -- ... use BH_Integer for alignment output ...
end Compute_Horizontal_Bias;

The Ada language specification requires that a conversion from a floating point type to an integer type raise Constraint_Error if the value, rounded to the nearest integer, is outside the range of the target integer type. This is not a bug in the language. It is a safety feature. The Ada type system is designed to catch exactly this kind of error. The problem is what happens after the exception is raised.
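The range check can be seen in isolation with a small stand-alone program. This is a minimal sketch, not flight code: the procedure name, the helper To_Int16, and the sample values are hypothetical, and Integer_16 is taken from the standard Interfaces package rather than from the SRI sources.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Conversion_Demo is

   --  Unprotected conversion, same shape as the BH assignment above.
   function To_Int16 (Value : Long_Float) return Integer_16 is
   begin
      return Integer_16 (Value);
   end To_Int16;

begin
   --  Hypothetical sample values, not flight data.
   --  Float-to-integer conversion rounds to the nearest integer.
   Put_Line ("12000.4 ->" & Integer_16'Image (To_Int16 (12_000.4)));
   Put_Line ("32767.4 ->" & Integer_16'Image (To_Int16 (32_767.4)));

   --  32767.6 rounds to 32768, which is outside Integer_16:
   --  the language requires Constraint_Error here.
   Put_Line ("32767.6 ->" & Integer_16'Image (To_Int16 (32_767.6)));
   Put_Line ("not reached");
exception
   when Constraint_Error =>
      Put_Line ("Constraint_Error: rounded value is outside -32768 .. 32767");
end Conversion_Demo;
```

The first two conversions succeed because the rounded results fit in 16 bits. The third fails the range check, so control jumps to the handler at the end of the procedure and the remaining statements are skipped.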

The SRI software has seven variables that undergo type conversions from 64-bit float to 16-bit integer. Four of these seven conversions are protected by exception handlers that catch the overflow and substitute a safe default value. Three are unprotected, including the BH conversion. The Inquiry Board report documents the rationale: the developers analyzed the physical constraints of the Ariane 4 flight profile and determined that these three variables could never exceed the 16-bit range during any Ariane 4 flight. Protecting them with exception handlers was judged unnecessary.
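A protected conversion of the kind described here might look like the sketch below. The text says only that a safe default was substituted, so the saturation policy (clamping to the nearest representable value) is an assumption, as are the names Protected_Demo and Protected_Convert.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Protected_Demo is

   --  Hypothetical helper: the local handler catches the overflow and
   --  substitutes a default instead of letting the exception propagate.
   function Protected_Convert (Value : Long_Float) return Integer_16 is
   begin
      return Integer_16 (Value);
   exception
      when Constraint_Error =>
         --  Assumed policy: saturate at the end of the range that overflowed.
         if Value > 0.0 then
            return Integer_16'Last;
         else
            return Integer_16'First;
         end if;
   end Protected_Convert;

begin
   Put_Line (Integer_16'Image (Protected_Convert (12_000.0)));   --  in range
   Put_Line (Integer_16'Image (Protected_Convert (32_768.0)));   --  clamps to 32767
   Put_Line (Integer_16'Image (Protected_Convert (-1.0E6)));     --  clamps to -32768
end Protected_Demo;
```

The handler adds a few lines per conversion. Whether a clamped value is actually safe depends on how the result is consumed, which is why the choice of default needs its own analysis.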

The analysis was correct for the Ariane 4. The Ariane 5 has a fundamentally different flight profile.

The exact values at the moment of failure, as reconstructed by the Inquiry Board:

  • BH value at T+36.7s: approximately 32,768.0 (one past the 16-bit signed maximum of 32,767)
  • Maximum BH value on Ariane 4 during alignment phase: approximately 12,000 (well within range)
  • 16-bit signed integer maximum: 32,767

The difference between the two rockets’ horizontal velocity profiles:

| Property | Ariane 4 | Ariane 5 |
|---|---|---|
| Liftoff thrust | 2,720 kN | 6,470 kN |
| Max horizontal velocity at T+37s | ~12,000 (BH units) | ~32,768 (BH units) |
| Alignment function needed after liftoff | No | No |
| Alignment function runs after liftoff | Yes (40s) | Yes (40s, inherited) |
| BH overflow possible during flight | No | Yes |

The alignment function was not needed after liftoff on either rocket. It runs for 40 seconds after launch as a holdover from the Ariane 4 design. On the Ariane 4, this was harmless. The function runs, its outputs are ignored, and the values stay in range. On the Ariane 5, the function still runs, its outputs are still ignored by the alignment system, but the conversion overflows and the exception kills the SRI.

The exception propagation follows a specific path:

  1. Constraint_Error is raised in the BH conversion.
  2. No local handler exists. The exception propagates up the call stack.
  3. The Ada runtime’s default exception handler terminates the task.
  4. The SRI’s real-time operating system detects the task termination.
  5. The SRI enters its failure mode: it writes the contents of its internal registers to the output data bus as a diagnostic dump.
  6. The diagnostic dump is a sequence of bytes that happen to be valid as a data frame on the bus, but contain internal state values, not navigation measurements.
  7. The OBC reads this data frame and interprets the diagnostic bytes as an extreme horizontal velocity measurement.
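Steps 1 to 3 of the path above can be sketched with ordinary procedures. This is a single-threaded stand-in under stated assumptions: the procedure names are hypothetical, and the outermost handler only plays the role of the last-resort handling that shuts the unit down. It does not model the SRI's tasking or operating system.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Propagation_Demo is

   --  Step 1: the conversion raises. This procedure has no handler.
   procedure Convert_Bias (BH : Long_Float) is
      BH_Integer : Integer_16;
   begin
      BH_Integer := Integer_16 (BH);
      Put_Line ("Converted:" & Integer_16'Image (BH_Integer));
   end Convert_Bias;

   --  Step 2: the caller has no handler either, so the exception
   --  passes straight through and the rest of the cycle is skipped.
   procedure Alignment_Cycle (BH : Long_Float) is
   begin
      Convert_Bias (BH);
      Put_Line ("Alignment cycle finished");
   end Alignment_Cycle;

begin
   Alignment_Cycle (32_768.0);   --  hypothetical out-of-range input
exception
   --  Step 3 stand-in: nothing below this level handled the error.
   when Constraint_Error =>
      Put_Line ("Unhandled Constraint_Error reached the top level: shutting down");
end Propagation_Demo;
```

Neither "Converted" nor "Alignment cycle finished" is printed. Every frame between the raise and the first matching handler is abandoned, which is how an error in a function with ignored outputs can still take the whole unit down.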

The OBC’s misinterpretation of the diagnostic dump is the second failure in the mechanism. The data bus protocol between the SRI and OBC has no message type field, no validity flag, and no checksum that would distinguish a diagnostic dump from a navigation measurement. The OBC trusts that whatever appears on the bus is a valid navigation word. The interface was tested and verified under the assumption that the SRI would always produce valid navigation data. The failure mode where the SRI dumps internal state was not covered in integration testing.
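To illustrate what a message type field and validity flag would buy, here is a minimal sketch of a tagged frame and a receiver that checks it. The frame layout, field names, and fallback value are all hypothetical. The text does not describe the real bus format, and a real design would also need a policy for what the consumer does after rejecting a frame.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Frame_Demo is

   type Frame_Kind is (Navigation, Diagnostic);

   --  Hypothetical frame layout, for illustration only.
   type Frame is record
      Kind    : Frame_Kind;
      Valid   : Boolean;
      Payload : Integer_16;
   end record;

   --  A diagnostic dump whose payload happens to look like a large velocity.
   Dump     : constant Frame :=
     (Kind => Diagnostic, Valid => False, Payload => 32_767);
   Velocity : Integer_16;
   Accepted : Boolean;

   --  Accept the payload only from frames tagged as valid navigation data.
   procedure Read_Velocity
     (F : Frame; Value : out Integer_16; Ok : out Boolean) is
   begin
      Ok := F.Kind = Navigation and then F.Valid;
      if Ok then
         Value := F.Payload;
      else
         Value := 0;   --  placeholder; the caller must not use it
      end if;
   end Read_Velocity;

begin
   Read_Velocity (Dump, Velocity, Accepted);
   if Accepted then
      Put_Line ("Using velocity" & Integer_16'Image (Velocity));
   else
      Put_Line ("Frame rejected: not a valid navigation word");
   end if;
end Frame_Demo;
```

With the tag in place, the receiver can tell that the dump is not a measurement, however plausible its bytes look as a number.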

The OBC receives what it interprets as a horizontal velocity of approximately 32,768 units per second. The actual horizontal velocity is approximately 0. The OBC’s flight control algorithm computes the correction needed to compensate for this apparent deviation and commands the nozzles to deflect fully. The nozzle actuators obey. The aerodynamic load at Mach 1+ exceeds the structural design limits. The vehicle breaks apart.

The redundancy architecture, two identical SRIs, provides zero protection because the failure is deterministic and synchronous. Both units run the same code, receive the same sensor inputs, compute the same BH value, overflow at the same instant, and shut down simultaneously. Redundancy protects against random hardware failures. It does not protect against systematic software faults. This distinction, now fundamental to safety engineering, was not clearly articulated in the Ariane 5 safety architecture.
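The common-mode failure can be sketched by running the same unprotected conversion in two simulated units with the same input. This is an illustrative model only: the unit numbering, variable names, and input value are hypothetical, and the loop stands in for two physically separate computers.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Interfaces;  use Interfaces;

procedure Redundancy_Demo is
   type Unit_Id is range 1 .. 2;

   --  Hypothetical shared input: both units see the same BH value.
   BH      : Long_Float := 32_768.0;
   Healthy : array (Unit_Id) of Boolean := (others => True);
   Output  : Integer_16 := 0;
begin
   for Unit in Unit_Id loop
      begin
         Output := Integer_16 (BH);   --  identical code in every unit
         Put_Line ("Unit" & Unit_Id'Image (Unit)
                   & " output" & Integer_16'Image (Output));
      exception
         when Constraint_Error =>
            Healthy (Unit) := False;
            Put_Line ("Unit" & Unit_Id'Image (Unit) & " shut down");
      end;
   end loop;

   if not (Healthy (1) or Healthy (2)) then
      Put_Line ("No healthy unit left to switch to");
   end if;
end Redundancy_Demo;
```

Because the fault is in the shared code and not in either unit's hardware, the same input defeats both units. Adding more identical units would not change the outcome.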