The Ariane 5 Explosion: A 64-Bit Float, a 16-Bit Integer, and an Exception Handler That Said Nothing
The System as Its Engineers Understood It
The Ariane 5 is the European Space Agency’s heavy-lift launch vehicle, designed to carry payloads into geostationary orbit. Flight 501, the maiden flight, carries four Cluster scientific satellites valued at approximately $370 million. The launch takes place from the Guiana Space Centre in Kourou, French Guiana, on June 4, 1996.
The Inertial Reference System (SRI) is the component that measures the rocket’s attitude and velocity. There are two identical SRIs, designated SRI-1 and SRI-2, operating in an active/standby configuration. SRI-2 is the active unit. SRI-1 is the backup. If SRI-2 fails, the On-Board Computer (OBC) switches to SRI-1 automatically. This is redundancy. It is the standard approach for flight-critical avionics.
Both SRIs run identical software. This is a deliberate engineering decision. Running the same software on both units ensures that the backup behaves identically to the primary under all conditions. If the software produces different results on the two units, something is wrong with the hardware. Running different software would make the comparison meaningless. The reasoning is sound. It is also the reason the redundancy provides no protection against a software fault: a bug that crashes SRI-2 will crash SRI-1 in the same way, at the same time, for the same reason.
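The common-mode problem can be sketched in a few lines. This is an illustration in Python, not the flight code, which was written in Ada. The names `convert_bh`, `run_unit`, and `run_redundant_pair` are hypothetical, and the 16-bit conversion they model is the one this article describes further on.

```python
# Illustrative only: two "redundant" units that share one implementation.

def convert_bh(bh: float) -> int:
    """Hypothetical stand-in for the shared conversion.

    Out-of-range input raises, modelling a runtime check with no handler
    inside the conversion itself.
    """
    value = int(bh)
    if not -32768 <= value <= 32767:
        raise OverflowError(f"{bh} does not fit in a 16-bit signed integer")
    return value


def run_unit(name: str, bh: float) -> str:
    """Run one unit on one input and report whether it survived."""
    try:
        convert_bh(bh)
        return f"{name}: ok"
    except OverflowError:
        return f"{name}: failed"


def run_redundant_pair(bh: float) -> list:
    # Both units execute the same code on the same input,
    # so a software fault in one is a software fault in both.
    return [run_unit("SRI-1", bh), run_unit("SRI-2", bh)]
```

With an in-range input both units report `ok`; with an out-of-range input both report `failed`. There is no input for which the two results differ, which is why this kind of redundancy protects only against hardware faults.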
The SRI software was originally developed for the Ariane 4, which first flew in 1988 and went on to complete 113 successful launches over its career. By 1996 the software had performed its function for years without failure. When the Ariane 5 program began, the decision was made to reuse the SRI software from the Ariane 4 rather than develop new software. The reuse decision was reviewed and approved. The Ariane 4 SRI software was considered flight-proven.
The SRI software performs several functions. One of them is the alignment function, which calibrates the inertial platform before and during the early phase of flight. The alignment function converts a 64-bit floating point number, a value related to the platform’s horizontal velocity (called BH, for Horizontal Bias), into a 16-bit signed integer. On the Ariane 4, the value of BH never exceeds the range of a 16-bit signed integer (-32,768 to 32,767) because the Ariane 4’s horizontal velocity during the alignment phase is never large enough. The conversion has no range check. It does not need one. The physical behavior of the Ariane 4 guarantees that the value will fit.
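The arithmetic behind that limit is short: 16 bits give 2^16 = 65,536 distinct values, which a signed integer splits into the range -2^15 to 2^15 - 1. The sketch below shows what a range-checked conversion looks like. It is written in Python rather than the original Ada, the name `to_int16_checked` is hypothetical, and rounding to the nearest integer is an assumption made for the example.

```python
INT16_MIN = -(2 ** 15)   # -32768
INT16_MAX = 2 ** 15 - 1  # 32767


def to_int16_checked(x: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing out-of-range values.

    Rounding to nearest is an assumption made for this sketch.
    """
    value = round(x)
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{x!r} is outside [{INT16_MIN}, {INT16_MAX}]")
    return value
```

Here `to_int16_checked(32767.0)` returns `32767`, while `to_int16_checked(32768.0)` raises `OverflowError`. The check costs one comparison per conversion; the unprotected version relies entirely on the input never leaving the range.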
The Ariane 5 has a different trajectory than the Ariane 4. It is a larger, more powerful rocket. Its horizontal velocity during the early phase of flight is substantially higher than the Ariane 4’s. The value of BH on the Ariane 5 exceeds 32,767 approximately 37 seconds after launch.
Nobody checked.
The alignment function continues to run for 40 seconds after launch, even though its output is only needed for pre-launch alignment. This is another Ariane 4 design decision preserved in the reuse. On the Ariane 4, continuing the alignment function after launch is harmless because the values remain in range. On the Ariane 5, it means the overflow occurs in flight.
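The interaction between the two timings is simple enough to write down. The sketch below uses only the two figures given in this article, a 40-second post-liftoff window and an overflow at about 36.7 seconds. The function names are hypothetical, and real flight software schedules its tasks very differently.

```python
ALIGNMENT_WINDOW_S = 40.0  # seconds after liftoff the alignment function keeps running
OVERFLOW_TIME_S = 36.7     # seconds after liftoff at which BH leaves the 16-bit range


def alignment_active(t_s: float, window_s: float) -> bool:
    """Return True if the alignment function is still running at time t_s.

    t_s is in seconds relative to liftoff; negative values are pre-launch.
    """
    return t_s < window_s


def overflow_reachable(window_s: float, overflow_time_s: float = OVERFLOW_TIME_S) -> bool:
    """The conversion can only overflow while it is still being executed."""
    return alignment_active(overflow_time_s, window_s)
```

With the inherited window, `overflow_reachable(ALIGNMENT_WINDOW_S)` is true. With a window of `0.0`, meaning the function stops at liftoff, it is false: the same out-of-range value would exist physically, but no code would be converting it.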
The Chain
T-0. Ariane 5 Flight 501 lifts off from Kourou. Both SRI-1 and SRI-2 are operating normally. The alignment function is running on both units, computing BH and converting it to a 16-bit integer on every cycle.
T+30 seconds. The horizontal velocity of the Ariane 5 has exceeded the Ariane 4’s maximum horizontal velocity at this flight phase. The BH value is approaching the 16-bit integer boundary. The alignment function continues running because it is programmed to run for 40 seconds after liftoff.
T+36.7 seconds. The BH value in SRI-2 exceeds 32,767. The conversion from 64-bit float to 16-bit signed integer overflows. The Ada runtime raises an Operand Error exception. No exception handler protects this conversion. It was deliberately left unprotected, a design decision documented in the Ariane 4 software specification with the rationale that the physical behavior of the rocket would guarantee the value remained in range.
T+36.7 seconds (continued). The unhandled exception propagates. The SRI-2 software shuts down. The SRI-2 hardware, following its failure protocol, writes a diagnostic dump to its output bus. This diagnostic dump consists of the last valid data word, which is the internal state of the alignment function, not a navigation measurement. The On-Board Computer, which expects navigation data on this bus, receives the diagnostic dump and interprets it as flight data.
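T+36.7 seconds (SRI-1). SRI-1, running identical software on the same flight data, encounters the identical overflow at effectively the same moment: the inquiry board found that the backup unit had stopped one 72-millisecond data cycle before the active one. SRI-1 shuts down in the same way. Both SRIs are now non-functional.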
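For contrast, here is a sketch of what containment inside the SRI might have looked like: the conversion is protected, and a failure is reported as an explicit status rather than as raw words on the data bus. It is written in Python for readability, not in the Ada of the real system, and every name in it (`SriStatus`, `SriOutput`, `convert_unprotected`, `sri_cycle`) is hypothetical. Saturating the value is only one possible policy and is assumed here for illustration.

```python
from dataclasses import dataclass
from enum import Enum

INT16_MIN, INT16_MAX = -32768, 32767


class SriStatus(Enum):
    OK = "ok"
    DEGRADED = "degraded"  # value was out of range and has been saturated


@dataclass(frozen=True)
class SriOutput:
    status: SriStatus
    bh_int16: int


def convert_unprotected(bh: float) -> int:
    """Models the conversion in the text: out-of-range input raises."""
    value = round(bh)
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError("operand error")
    return value


def sri_cycle(bh: float) -> SriOutput:
    """One cycle with the exception contained inside the unit."""
    try:
        return SriOutput(SriStatus.OK, convert_unprotected(bh))
    except OverflowError:
        # Assumed policy: clamp to the nearest representable value and
        # tell the consumer explicitly that the value is not trustworthy.
        saturated = INT16_MAX if bh > 0 else INT16_MIN
        return SriOutput(SriStatus.DEGRADED, saturated)
```

The point of the sketch is the return type: the consumer always receives a well-formed `SriOutput` and can branch on `status`, instead of having to guess whether the words on the bus are measurements.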
T+37 seconds. The OBC, having lost SRI-2 and failed over to SRI-1, receives the diagnostic dump from SRI-1 as well. The OBC interprets these diagnostic words as a sudden, extreme deviation in the rocket’s horizontal velocity. It commands the nozzles of the solid rocket boosters and the main engine to compensate for what it believes is a massive trajectory error. The nozzle deflection exceeds the structural limit of the vehicle.
T+39 seconds. The aerodynamic forces produced by the nozzle deflection at high speed tear the rocket apart. The onboard safety system detects the structural breakup and triggers self-destruction. The rocket and its four Cluster satellites, valued at approximately $370 million, are destroyed.
The entire sequence from overflow to destruction takes less than three seconds.
The diagram shows the propagation path from a single unhandled exception to vehicle destruction. Each node represents a component boundary where the failure could have been contained but was not. The SRI could have caught the exception. The SRI could have sent a failure signal instead of a diagnostic dump. The OBC could have validated the magnitude of the velocity change before commanding a nozzle correction. The flight control software could have limited nozzle deflection to structurally safe values. Four boundaries. Four missed containment opportunities.
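The two boundaries on the OBC side can be sketched as a plausibility check on the incoming measurement followed by a hard limit on the outgoing command. This is an illustrative Python sketch, not the flight control law. The names `plausible_step`, `clamp_deflection`, and `next_command` are invented for the example, and any numeric limits passed to them are placeholders, not Ariane 5 values.

```python
def plausible_step(previous: float, current: float, dt_s: float, max_rate: float) -> bool:
    """Accept a new measurement only if it implies a physically possible rate of change."""
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    return abs(current - previous) / dt_s <= max_rate


def clamp_deflection(command_deg: float, limit_deg: float) -> float:
    """Limit a nozzle deflection command to the interval [-limit_deg, +limit_deg]."""
    return max(-limit_deg, min(limit_deg, command_deg))


def next_command(previous: float, current: float, dt_s: float, max_rate: float,
                 requested_deg: float, limit_deg: float,
                 last_command_deg: float) -> float:
    """Hold the last command when the input is implausible; otherwise clamp the request."""
    if not plausible_step(previous, current, dt_s, max_rate):
        # Assumed policy: ignore the suspect measurement and keep steering as before.
        return last_command_deg
    return clamp_deflection(requested_deg, limit_deg)
```

Either check alone would have interrupted the chain: the first refuses to act on a jump that no real vehicle could produce, and the second bounds the damage even when a bad value gets through.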