MOVEdot

Building a Steer-by-Wire System · Chapter 2 · Part 2 of 6

The fault-injection campaign

Girish Radhakrishnan · June 29, 2026 · 7 min read

Chapter 1 proved the actuator under nominal conditions. Chapter 2 asks what happens when things go wrong: 384 fault-injection runs, eleven acceptance metrics, and six failures clustered in three localised mechanisms.

Chapter 1 established the baseline: 27 characterisation runs, five sources per run, one harmonisation pipeline, and a spec check that surfaced five metric failures at temperature and voltage extremes. The system worked, but only under nominal conditions.

Chapter 2 asks the harder question: what happens when things go wrong?

The fault-injection campaign exercises the dual-channel steer-by-wire system across 384 runs and 1,920 source files:

  • 324 single-point fault injections
  • 48 commanded A→B handover runs
  • 12 fault-free golden baselines

The analysis ingests five sources per run, harmonises them to a 100 Hz grid on the rig master clock, and extracts injection, detection, and emergency-op events from residuals, DTC latches, and channel-active flags. The eleven-metric acceptance verdict is 5 PASS / 6 FAIL.

The dataset at scale

Each run carries five synchronised sources: rig DAQ at 1 kHz (master clock), ECU-A and ECU-B at 100 Hz (nine monitors each), steering robot at 100 Hz, ambient sensors at 1 Hz. Single-point runs span nine fault types × three magnitudes (mild / moderate / severe) × three operating points (park / urban / high-speed) × two channels (ECU-A / ECU-B) × two repeats = 324. The operator log flags two truncated runs (inj_077, inj_244); both are processed on their available samples and their detection signatures recover normally inside the truncated window.

Campaign overview: 384 runs · 1,920 files

  • 324 single-point injections
  • 48 handover-dedicated runs
  • 12 golden baselines

Sources per run:

  • Rig DAQ: MF4, 1 kHz, master clock
  • ECU-A: MF4, 100 Hz, primary channel
  • ECU-B: MF4, 100 Hz, backup channel
  • Steering robot: CSV, 100 Hz, input profile
  • Ambient: CSV, 1 Hz, temperature and voltage

Single-point run matrix: 324 runs

  • Operating points: high-speed (120 kph), urban (50 kph), parking (5 kph)
  • Severity: mild, moderate, severe
  • Fault types: motor driver short, motor driver open, angle sensor bias, angle sensor open, torque sensor drift, cross-channel freeze, CAN corruption, CAN timeout, channel supply dropout
  • Fault groups: motor drive, angle sensing, torque sensing, bus / channel, power
  • Total: 9 faults × 3 severities × 3 operating points × 2 channels (ECU-A / ECU-B) × 2 repeats = 324
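
The single-point matrix is a full factorial design, so it can be enumerated directly. The sketch below is illustrative only: the snake_case identifiers are assumptions (the article uses most of them, but `angle_sensor_open` and the operating-point labels are guessed spellings), and the real campaign tooling may build its run list differently.

```python
from itertools import product

# Labels follow the campaign description; exact identifier spellings are assumed.
FAULTS = [
    "motor_driver_short", "motor_driver_open", "angle_sensor_bias",
    "angle_sensor_open", "torque_sensor_drift", "cross_channel_freeze",
    "can_corruption", "can_timeout", "channel_supply_dropout",
]
MAGNITUDES = ["mild", "moderate", "severe"]
OPERATING_POINTS = ["park", "urban", "high_speed"]
CHANNELS = ["ECU-A", "ECU-B"]
REPEATS = [1, 2]


def single_point_matrix():
    """Return one dict per single-point run in the full factorial design."""
    keys = ("fault", "magnitude", "operating_point", "channel", "repeat")
    combos = product(FAULTS, MAGNITUDES, OPERATING_POINTS, CHANNELS, REPEATS)
    return [dict(zip(keys, combo)) for combo in combos]


runs = single_point_matrix()
# Runs that share one fault type: 3 magnitudes x 3 operating points x 2 channels x 2 repeats.
per_fault = sum(r["fault"] == "motor_driver_open" for r in runs)
print(len(runs), per_fault)
```

Each fault type therefore contributes 36 runs, which is the denominator behind the per-fault coverage figures later in the chapter.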

Clock alignment uses MDF header timestamps relative to the rig master, the same approach as in Chapter 1, scaled to the full campaign. Robot angles convert from radians to degrees; rack position in millimetres maps to road-wheel angle. ECU-B channel names reconcile to the shared schema (theta_cmd ↔ HwaCmd_B, and so on).
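
A minimal sketch of those harmonisation steps, assuming NumPy arrays per signal. The function names, the header offsets in the example, and the direction of the rename map are assumptions for illustration. Plain linear interpolation is used here for brevity; a production pipeline would likely low-pass the 1 kHz rig data before decimating. The rack-to-road-wheel mapping is omitted because the article does not give the ratio.

```python
import numpy as np

GRID_HZ = 100.0
# Only the theta_cmd / HwaCmd_B pairing appears in the text; the direction is assumed.
ECU_B_RENAME = {"HwaCmd_B": "theta_cmd"}


def to_master_time(t_local_s, header_start_s, master_start_s):
    """Shift local timestamps by the header start offset onto the rig master clock."""
    return np.asarray(t_local_s, dtype=float) + (header_start_s - master_start_s)


def resample_to_grid(t_master_s, values, duration_s, hz=GRID_HZ):
    """Linearly interpolate one signal onto a uniform grid over [0, duration_s].

    np.interp holds the first/last sample for grid points outside the measured span.
    """
    grid = np.arange(int(round(duration_s * hz)) + 1) / hz
    return grid, np.interp(grid, t_master_s, values)


def rename_channels(signals, rename=ECU_B_RENAME):
    """Return a copy of a {name: array} dict with names mapped to the shared schema."""
    return {rename.get(name, name): data for name, data in signals.items()}


# Example: a robot log that started 0.25 s after the rig master, angle in radians.
t_robot = np.arange(100) / 100.0
angle_rad = np.full(t_robot.shape, np.pi / 2)
t_m = to_master_time(t_robot, header_start_s=10.25, master_start_s=10.0)
grid, angle_deg = resample_to_grid(t_m, np.degrees(angle_rad), duration_s=1.0)
ecu_b = rename_channels({"HwaCmd_B": angle_deg})
```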

What ran across every run

Every run goes through the same five-step check. For each one, the pipeline pulls three timestamps off the rig clock (injection, detection, handover) and scores the result against the acceptance gates.

  • Injection: when the fault takes effect. Found from the targeted monitor's residual stepping outside its pre-fault baseline, with a fallback to the nominal injection instant at 3.0 s.
  • Detection: when the first new DTC (diagnostic trouble code) latches on the injected channel, ignoring any code that was already active before the run started.
  • Handover: when the A→B channel swap completes.
  • FTTI check: did injection through handover finish within the fault-tolerant time interval (FTTI) for the hazard? The budget is 100 ms at high speed, 200 ms in urban driving, and 500 ms in parking for self-steer hazards, and 150 ms for incorrect-steer and handover-omission hazards.
  • Handover quality: after the swap settles, does achieved road-wheel angle (RWA) stay within 1.0° of commanded?

Analysis pipeline: 5 stages, every run

  1. Harmonise: 5 sources → 100 Hz grid on rig master clock (384 runs)
  2. Extract events: t_inject, t_detect, t_eop from residuals and DTCs (per run)
  3. Coverage: correct monitor DTC latched per fault (324 single-point runs)
  4. FTTI check: total_ms vs hazard-class gate (34 breaches)
  5. Handover QC: steady-state RWA deviation after t_eop + 1 s (gate ≤ 1.0°)

The injection instant is refined from the targeted monitor residual (5σ departure), with a fallback to t = 3.0 s. FTTI compares total_ms (injection to emergency-op completion) against the hazard-class gate. All 384 runs complete in ~400 s on 8 parallel workers.
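
The event-extraction stage can be sketched as follows. This is one plausible reading of the description, not the pipeline's actual code: the function names, the 2.0 s baseline window, and the DTC array layout (samples × codes, boolean) are all assumptions. Only the 5σ departure, the 3.0 s fallback, and the rule of ignoring pre-existing codes come from the article.

```python
import numpy as np


def find_injection(t, residual, baseline_end_s=2.0, k_sigma=5.0, fallback_s=3.0):
    """First time the residual departs k_sigma from its pre-fault baseline, else fallback."""
    t = np.asarray(t, dtype=float)
    residual = np.asarray(residual, dtype=float)
    base = residual[t < baseline_end_s]          # assumed pre-fault window
    if base.size < 2:
        return fallback_s
    mu, sigma = base.mean(), base.std(ddof=1)
    departed = np.abs(residual - mu) > k_sigma * sigma
    hits = np.flatnonzero(departed & (t >= baseline_end_s))
    return float(t[hits[0]]) if hits.size else fallback_s


def find_detection(t, dtc_active, t_inject):
    """Time of the first DTC that latches after injection and was not active at run start."""
    t = np.asarray(t, dtype=float)
    active = np.asarray(dtc_active, dtype=bool)  # shape (n_samples, n_codes)
    new = active & ~active[0]                    # drop codes already latched at t[0]
    hits = np.flatnonzero(new.any(axis=1) & (t >= t_inject))
    return float(t[hits[0]]) if hits.size else None


# Synthetic run: small deterministic ripple, then a step in the residual at 3.0 s.
t = np.arange(600) / 100.0
residual = 0.01 * np.sin(50.0 * t)
residual[t >= 3.0] += 1.0
dtc = np.zeros((t.size, 2), dtype=bool)
dtc[:, 0] = True                # code 0 was already active before the run
dtc[t >= 3.05, 1] = True        # code 1 latches 50 ms after injection

t_inject = find_injection(t, residual)
t_detect = find_detection(t, dtc, t_inject)
```

Returning `None` for an undetected fault keeps the "never detected" case explicit, so later stages can treat it as a timing failure rather than silently computing a latency.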

Fault types map to three hazard classes with different timing budgets: self-steer (commission faults), incorrect-steer, and handover-omission. If a fault is never detected, it counts as a timing failure. All 384 runs complete in ~400 s on eight parallel workers. No runs failed to harmonise.
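
A hedged sketch of the FTTI check, using the gate values reported in this chapter's acceptance results. The dictionary names and function signatures are hypothetical, and the assignment of `angle_sensor_open` to handover-omission is inferred by elimination from the run counts rather than stated in the article.

```python
# Gate values are the ones reported in the acceptance results; names are illustrative.
SELF_STEER_GATE_MS = {"high_speed": 100.0, "urban": 200.0, "park": 500.0}
FLAT_GATE_MS = {"incorrect_steer": 150.0, "handover_omission": 150.0}

HAZARD_CLASS = {
    "motor_driver_short": "self_steer",
    "cross_channel_freeze": "self_steer",
    "can_corruption": "self_steer",
    "torque_sensor_drift": "incorrect_steer",
    "angle_sensor_bias": "incorrect_steer",
    "motor_driver_open": "handover_omission",
    "channel_supply_dropout": "handover_omission",
    "can_timeout": "handover_omission",
    "angle_sensor_open": "handover_omission",  # inferred, not stated in the article
}


def ftti_gate_ms(fault, operating_point):
    """Look up the timing budget for a fault at an operating point."""
    hazard = HAZARD_CLASS[fault]
    if hazard == "self_steer":
        return SELF_STEER_GATE_MS[operating_point]
    return FLAT_GATE_MS[hazard]


def ftti_breach(fault, operating_point, t_inject, t_eop):
    """True if the run misses its gate; t_eop=None (never detected) always breaches."""
    if t_eop is None:
        return True
    total_ms = (t_eop - t_inject) * 1000.0
    return total_ms >= ftti_gate_ms(fault, operating_point)


print(ftti_breach("torque_sensor_drift", "urban", 3.0, 3.221))  # mild drift, ~221 ms
print(ftti_breach("motor_driver_open", "park", 3.0, None))      # undetected
```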

The acceptance verdict

Acceptance gates: 6 fail · 5 pass (11 total)

Metric | Observed | Gate | Result
------ | -------- | ---- | ------
Coverage (pooled) | 93.21% | ≥ 99% | FAIL
Coverage (per fault) | 50.0% min | ≥ 99% | FAIL
FTTI (all hazards) | 34 / 324 breach | < FTTI | FAIL
Self-steer FTTI (high-speed) | 0 / 36 breach | < 100 ms | PASS
Self-steer FTTI (urban) | 0 / 36 breach | < 200 ms | PASS
Self-steer FTTI (parking) | 0 / 36 breach | < 500 ms | PASS
Incorrect-steer FTTI | 16 / 72 breach | < 150 ms | FAIL
Handover-omission FTTI | 18 / 144 breach | < 150 ms | FAIL
Handover RWA deviation | 18 / 48 fail | ≤ 1.0° | FAIL
False positives | 0 | 0 | PASS
EOTTI window | 1.17–3.49 s | ≤ 5 s | PASS

Self-steer FTTI passes at every operating point. The six failures cluster in coverage, incorrect-steer timing, handover-omission timing, and steady-state handover deviation. Zero false positives across 12 golden runs.

The headline is blunt: the campaign does not meet acceptance. Coverage falls short at 93.21% pooled (302 / 324) against a 99% gate. FTTI compliance is clean on the self-steer hazard at every operating point, but incorrect-steer and handover-omission both miss the 150 ms gate. Handover RWA deviation fails on 18 of 48 runs. The one clean robustness gate is false positives: zero DTC latches across all 12 golden runs on 18 monitors across both ECUs.
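
The verdict tally can be reproduced from the observed values and gates reported above. This is a sketch for checking the arithmetic, not the campaign's scoring code: the tuple layout and the pass predicates (for example, treating any breach count above zero as a failure) are assumptions.

```python
# (metric, observed value, pass predicate). Observed values are from the acceptance results.
GATES = [
    ("Coverage (pooled) %", 93.21, lambda v: v >= 99.0),
    ("Coverage (per fault) min %", 50.0, lambda v: v >= 99.0),
    ("FTTI breaches (all hazards)", 34, lambda v: v == 0),
    ("Self-steer FTTI breaches (high-speed)", 0, lambda v: v == 0),
    ("Self-steer FTTI breaches (urban)", 0, lambda v: v == 0),
    ("Self-steer FTTI breaches (parking)", 0, lambda v: v == 0),
    ("Incorrect-steer FTTI breaches", 16, lambda v: v == 0),
    ("Handover-omission FTTI breaches", 18, lambda v: v == 0),
    ("Handover RWA deviation failures", 18, lambda v: v == 0),
    ("False positives", 0, lambda v: v == 0),
    ("EOTTI window max (s)", 3.49, lambda v: v <= 5.0),
]

verdict = {name: passes(observed) for name, observed, passes in GATES}
n_pass = sum(verdict.values())
n_fail = len(verdict) - n_pass
print(f"{n_pass} PASS / {n_fail} FAIL")
```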

Three localised mechanisms explain most of the failures.

Coverage falls short

Diagnostic coverage by fault (pooled 93.21% · target ≥ 99%)

  • Motor driver open: 50%
  • Angle sensor bias: 88.89%
  • CAN corruption: 100%
  • Angle sensor open: 100%
  • CAN timeout: 100%
  • Channel supply dropout: 100%
  • Cross-channel freeze: 100%
  • Motor driver short: 100%
  • Torque sensor drift: 100%

All 18 ECU-A motor-driver-open runs detect; all 18 ECU-B runs miss. The four angle-sensor-bias misses are all mild magnitude (two urban, two park). Every other fault type reaches 100%.

Pooled single-point coverage is 93.21% (302 / 324). Two fault types drive the shortfall.

Motor driver open: 50% (18 / 36). All 18 ECU-A-injected runs detect; all 18 ECU-B-injected runs miss. The pattern is fully consistent with hot-standby behaviour: the standby motor carries no current, so an open-circuit monitor that relies on current/voltage residuals has no signal to detect.

Angle sensor bias: 88.89% (32 / 36). The four misses are all mild magnitude (two at urban, two at park). All moderate and severe runs detect. This is a sensitivity boundary, not a protocol error.

Every other fault type reaches 100% coverage.
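
The coverage figures follow directly from the detected and injected counts. A short worked check, with the counts taken from the text and the identifier spellings assumed:

```python
# fault: (runs detected, runs injected). Counts are from the text; 36 runs per fault type.
DETECTED = {
    "motor_driver_open": (18, 36),
    "angle_sensor_bias": (32, 36),
    "motor_driver_short": (36, 36),
    "angle_sensor_open": (36, 36),
    "torque_sensor_drift": (36, 36),
    "cross_channel_freeze": (36, 36),
    "can_corruption": (36, 36),
    "can_timeout": (36, 36),
    "channel_supply_dropout": (36, 36),
}

per_fault_pct = {f: round(100.0 * d / n, 2) for f, (d, n) in DETECTED.items()}
total_detected = sum(d for d, _ in DETECTED.values())
total_injected = sum(n for _, n in DETECTED.values())
pooled_pct = round(100.0 * total_detected / total_injected, 2)
worst_fault = min(per_fault_pct, key=per_fault_pct.get)

print(total_detected, total_injected, pooled_pct, worst_fault)
```

The per-fault gate is scored on the minimum, which is why one blind spot on a single fault type fails the gate even though seven of nine fault types reach 100%.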

FTTI compliance

FTTI compliance by hazard: 34 / 324 breach overall

  • Self-steer (commission): 0 / 108 breach. Gate: 100 / 200 / 500 ms. Faults: motor_driver_short, cross_channel_freeze, can_corruption.
  • Incorrect-steer: 16 / 72 breach. Gate: < 150 ms. Breaches: 12 mild torque_sensor_drift (210–230 ms) + 4 undetected angle_sensor_bias.
  • Handover-omission: 18 / 144 breach. Gate: < 150 ms. Breaches: all 18 ECU-B motor_driver_open (undetected).

Self-steer timing is clean at every operating point. Incorrect-steer breaches concentrate on mild torque-sensor drift (210–230 ms vs 150 ms gate). Handover-omission breaches are the same 18 undetected ECU-B motor-driver-open runs from the coverage gap.

Self-steer passes cleanly. All motor_driver_short, cross_channel_signal_freeze, and can_corruption runs detect inside their operating-point budget: zero breaches at high-speed (100 ms), urban (200 ms), and park (500 ms).

Incorrect-steer fails (16 / 72 breach). The breach concentrates on mild torque_sensor_drift: all 12 mild runs sit at 210–230 ms, well above the 150 ms gate. Moderate (110–130 ms) and severe (70–80 ms) sit cleanly below. This is magnitude-dependent monitor sensitivity: the residual takes longer to cross threshold at mild drift amplitudes. The four undetected mild angle_sensor_bias runs also count here.

Handover-omission fails (18 / 144 breach). All 18 breaches are ECU-B-injected motor_driver_open runs that the standby-channel monitor cannot see. This is the coverage gap from above showing up in the timing gate.

Most other fault types resolve in 10–50 ms with low scatter. torque_sensor_drift is the outlier, with mean ~141 ms driven by the magnitude-dependent latency ladder (severe ~77 ms, moderate ~125 ms, mild ~221 ms).
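
The latency ladder and the breach totals can be checked with a few lines. The per-magnitude latencies are the approximate means quoted above. The mean is unweighted, which is valid here only because each magnitude has the same number of runs (12). Variable names are illustrative.

```python
GATE_MS = 150.0
# Approximate mean detection-to-handover latency per magnitude for torque_sensor_drift.
LADDER_MS = {"severe": 77.0, "moderate": 125.0, "mild": 221.0}

# Equal run counts per magnitude, so the overall mean is the plain average.
mean_ms = sum(LADDER_MS.values()) / len(LADDER_MS)
breaching = [m for m, ms in LADDER_MS.items() if ms >= GATE_MS]

# Breach tally by hazard class, using the counts reported in this section.
incorrect_steer = 12 + 4        # mild torque drift + undetected mild angle bias
handover_omission = 18          # undetected ECU-B motor_driver_open
self_steer = 0
total_breaches = self_steer + incorrect_steer + handover_omission

print(mean_ms, breaching, total_breaches)
```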

Handover quality

Handover RWA deviation (failures only): 18 / 48 fail

Maximum |cmd − achieved| in degrees, against a 1.0° tolerance:

  • MD short · severe · park: 8.81°
  • MD short · severe · hs/urban: 5.48°
  • MD short · moderate: 3.46°
  • Park + severe omission: 3.41°

Measured steady-state over [t_eop + 1.0 s, end], excluding the handover transient. Thirty runs sit at ~0.03° and pass cleanly. Failures split between severe motor-driver-short residual carry-over (up to 8.81°) and a park + severe omission cluster at ~3.3°.

30 of 48 runs sit at ~0.03° steady-state RWA deviation, well inside the 1.0° gate. 18 of 48 fail, and the failure population splits into three groups.

Severe motor driver short (6 runs): mean deviation ~5.4° at high-speed/urban and ~8.7° at park. The fault residual carries forward into B's tracking; deviation scales with operating amplitude. Maximum observed: 8.807°.

Park + severe omission faults (6 runs across motor_driver_open, channel_supply_dropout, and can_timeout): a tight cluster at ~3.2–3.4°, indistinguishable across the three fault types. The mechanism looks like a B-channel calibration or gain offset that only becomes observable post-handover under high-current conditions.

Moderate motor driver short (6 runs): ~3.40–3.46° across all operating points.
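
The handover-quality metric, as described, is the maximum absolute tracking error over the window from t_eop + 1.0 s to the end of the run. A minimal sketch under that reading, with a hypothetical function name and synthetic signals:

```python
import numpy as np

RWA_GATE_DEG = 1.0


def handover_rwa_deviation(t, rwa_cmd_deg, rwa_ach_deg, t_eop, settle_s=1.0):
    """Max |cmd - achieved| over [t_eop + settle_s, end], skipping the handover transient."""
    t = np.asarray(t, dtype=float)
    err = np.abs(np.asarray(rwa_cmd_deg, dtype=float) - np.asarray(rwa_ach_deg, dtype=float))
    window = t >= t_eop + settle_s
    if not window.any():
        raise ValueError("run ends before the steady-state window starts")
    return float(err[window].max())


# Synthetic 10 s run on the 100 Hz grid with handover completing at 3.1 s.
t = np.arange(1000) / 100.0
cmd = 10.0 * np.sin(0.5 * t)
t_eop = 3.1

good = cmd + 0.03
good[(t >= t_eop) & (t < t_eop + 0.5)] += 4.0   # transient, outside the scored window
bad = cmd + 3.4                                  # persistent post-handover offset

dev_good = handover_rwa_deviation(t, cmd, good, t_eop)
dev_bad = handover_rwa_deviation(t, cmd, bad, t_eop)
print(dev_good <= RWA_GATE_DEG, dev_bad <= RWA_GATE_DEG)
```

Scoring only after the settle time is what separates a brief swap transient from the persistent offsets seen in the failing groups.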

Why this matters after kickoff

Chapter 1 paid the format tax once and proved the actuator mostly clean under nominal sweeps. Chapter 2 pays the volume tax: hundreds of fault runs, each needing the same harmonisation, event extraction, and gate checks, applied consistently across the whole campaign. Done by hand, that is weeks of review with no guarantee two engineers would score the same run the same way.

Six gates fail, clustered in three mechanisms an architect can act on. That is enough to prioritise fixes. It is not enough to sign off, and there is no sensible next step on a vehicle or a correlation drive until the system passes.

The agent does not sign the safety case. It ran every check the same way on every run and handed back a structured failure. What comes next is the regression run: the team ships fixes, re-runs the campaign, and something has to diff the two result sets and catch what the fix broke.


Next in the series: the regression run, where two full campaigns meet and MOVEcenter does the diff. If comparing 384 runs before and after a fix still happens in spreadsheets, we would like to talk: founders@movedot.com, or www.movedot.ai.