On a durability program, the bottleneck is rarely the test. The vehicle runs, the sensors record, and by the end of the day there is a clean pile of road load data sitting on a drive. Then the real work starts: ingest the files, check every channel, count cycles, build the damage matrix, compare against the rig's load schedule, find the severe events, scale it to customer life, and write it all up. A good engineer has scripts for most of these steps, but the scripts break on each new program, the steps live in different tools, and someone has to carry the data from one to the next by hand. So it rarely happens in one sitting. It gets picked up between other work and stretches across the better part of a week, and the slow steps are the first to be shortened or skipped when the schedule is tight.
This is a walkthrough of a real post-test analysis we ran with an agent doing that legwork. One prototype SUV, a mixed-terrain durability day, 93 instrumented channels. The data was ingested Monday evening. By Tuesday morning the full analysis was done and ready for engineering review. To keep it concrete we scoped it to the front suspension, but the same pipeline runs across the rest of the vehicle.
First, the time, since it is the thing everyone asks about. With the scripting a competent durability team already has, this post-test work is roughly a day and a half of focused effort, and in practice it stretches across most of a week because the slow steps keep getting deferred. The agent did all of it overnight, on every channel, including the ones that usually get skipped.
But the hours are the least interesting part. What matters is that running every step overnight, on every channel, catches things a rushed week misses, and this run caught two. The rig's load schedule under-tested the severe tail that does most of the damage, by enough to have invalidated the entire accelerated test. And a front-left wheel-force sensor had quietly saturated, which makes every damage number it reports a lower bound rather than a measurement. Neither is exotic. Both are exactly the kind of thing that gets skipped when the schedule is tight. The rest of this post is what happened in the eight minutes of compute the run took.
The test day
The vehicle is a stock SUV prototype carrying wheel-force transducers, body and unsprung IMUs, damper position and load sensors, the steering system, and the full vehicle bus. The day is split into four terrain blocks that compress the 90th-percentile customer duty cycle into ten and a half minutes of driving.
The important detail is the rock garden. It takes 14.3% of the test time but represents only 3% of customer life. That over-representation is deliberate: severe terrain is amplified to accumulate meaningful fatigue inside a finite test window, then scaled back to customer-equivalent distance in post-processing. Keeping that relationship straight, 3% of customer life against what turns out to be 86% of the damage, is what most of this analysis is really about.
Overnight, unattended
The pipeline ran on its own overnight and finished in eight minutes of compute. Nothing here is novel as a set of steps. Every durability engineer knows them. What changes is that they all run, in sequence, on every test, without a person feeding files from one tool to the next.
By the time anyone arrived Tuesday, the report was already on the shared drive with a verdict attached. The value is not just the speed. It is that the slow, easily-skipped steps, like the driver radio correlation and the rig schedule comparison, got done at all.
Sensor QC comes first
No damage number means anything if the sensor that produced it was lying. So the first gate is a health check on all 93 channels for drift, saturation, dropout, and flatline.
Most channels pass. Fifteen flatline, which is expected, since brake torque, tie-rod, and bump-stop channels simply are not exercised in a straight-line duty cycle. Three are flagged, and one of them matters a lot: the front-left vertical-force sensor saturated at its ±6,800 N limit on the three rough blocks. It was undersized for this prototype, which means the damage it reports is a floor, not the truth. That single fault propagates all the way to the verdict.
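A health gate like this can be sketched in a few lines. The version below is a minimal illustration, not the pipeline's actual implementation: the function name, the thresholds, and the use of None for missing samples are all assumptions, and drift detection is left out for brevity.

```python
from statistics import pstdev

def qc_channel(samples, full_scale, flat_tol=1e-9, sat_frac=0.999, min_sat_run=3):
    """Classify one channel as 'dropout', 'flatline', 'saturated', or 'ok'.

    samples: list of floats, with None marking a missing sample (assumed encoding).
    full_scale: symmetric sensor limit, e.g. 6800.0 for a +/-6,800 N channel.
    All thresholds are illustrative assumptions, not calibrated values.
    """
    valid = [s for s in samples if s is not None]
    # Dropout: channel is empty or more than 1% of samples are missing.
    if not samples or len(valid) < 0.99 * len(samples):
        return "dropout"
    # Flatline: the signal never moves.
    if pstdev(valid) <= flat_tol:
        return "flatline"
    # Saturation: a run of consecutive samples pinned at the sensor limit.
    run = 0
    for s in valid:
        run = run + 1 if abs(s) >= sat_frac * full_scale else 0
        if run >= min_sat_run:
            return "saturated"
    return "ok"

# Hypothetical signal that clips at the limit for three samples in a row.
clipped = [100.0, 6800.0, 6800.0, 6800.0, 200.0] * 20
print(qc_channel(clipped, full_scale=6800.0))
```

Requiring a run of pinned samples, rather than a single one, keeps an isolated legitimate peak near the limit from being flagged as a fault.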
Peak loads
With the channels triaged, the first quantitative pass is peak vertical force per corner per block, expressed as a dynamic factor against the static load. Anything meaningfully above 1.10× is worth a look for ultimate-stress margin.
The largest single event is 7,347 N on the front-left in the rock garden, a 1.14× factor. The rear-left reaches 1.19×, the highest of the day. None of these are alarming on their own, but peaks are not what kills a component. Fatigue is about the whole spectrum of cycles, not the single largest one.
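The dynamic factor itself is simple arithmetic: peak force divided by static corner load. The sketch below assumes hypothetical function and dictionary names. The 6,445 N static load is back-calculated from the 7,347 N peak and 1.14× factor quoted above rather than taken from the test data, and the highway peak is invented for contrast.

```python
def dynamic_factor(peak_n, static_n):
    """Peak vertical force divided by the static corner load."""
    if static_n <= 0:
        raise ValueError("static load must be positive")
    return peak_n / static_n

def flag_peaks(peaks, static_loads, threshold=1.10):
    """Return {(corner, block): factor} for every peak above the threshold."""
    flagged = {}
    for (corner, block), peak in peaks.items():
        factor = dynamic_factor(peak, static_loads[corner])
        if factor > threshold:
            flagged[(corner, block)] = round(factor, 2)
    return flagged

# Static load implied by 7,347 N / 1.14; the highway peak is hypothetical.
static_loads = {"FL": 6445.0}
peaks = {("FL", "rock_garden"): 7347.0, ("FL", "highway"): 6700.0}
print(flag_peaks(peaks, static_loads))  # {('FL', 'rock_garden'): 1.14}
```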
Where the damage actually is
So we count cycles. The agent runs ASTM rainflow counting on every vertical wheel-force channel across every block and computes pseudo-damage as Σ(range⁵). An exponent of 5 is a common S-N slope choice for steel components, and the sum is a damage proxy that ranks channels and blocks without committing to a specific S-N curve.
The result is stark. The rock garden block produces 86.2% of the total damage from 14.3% of the test time, and the rear-left corner alone accounts for 38% of the campaign's damage. This is not a quirk of the test design. It is the range⁵ exponent at work. A single 1,849 N cycle from the rock garden does roughly 6,750× the damage of a single 317 N highway cycle, since (1,849 / 317)⁵ ≈ 6,750. Large events dominate fatigue completely, which is exactly why the next step matters.
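For readers who have not written one, here is a minimal three-point rainflow counter followed by the Σ(range⁵) sum. It is a pure-Python sketch for illustration, not the pipeline's code: the function names are ours, and a production run would more likely lean on a tested library. The sample series is the small load history commonly used to illustrate ASTM E1049 rainflow counting.

```python
def reversals(series):
    """Reduce a signal to its turning points, keeping the first and last samples."""
    if not series:
        return []
    pts = [series[0]]
    for x in series[1:]:
        if x == pts[-1]:
            continue  # skip repeated values
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x  # still moving in the same direction: extend the excursion
        else:
            pts.append(x)
    return pts

def rainflow(series):
    """Three-point rainflow count. Returns (range, count) pairs, count 0.5 or 1.0."""
    cycles, stack = [], []
    for point in reversals(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])
            y = abs(stack[-2] - stack[-3])
            if x < y:
                break
            if len(stack) == 3:
                # Range y includes the starting point: count it as a half cycle.
                cycles.append((y, 0.5))
                stack.pop(0)
            else:
                # Range y is a closed loop: count a full cycle and remove it.
                cycles.append((y, 1.0))
                last = stack.pop()
                stack.pop()
                stack.pop()
                stack.append(last)
    # Whatever remains on the stack is counted as half cycles.
    for a, b in zip(stack, stack[1:]):
        cycles.append((abs(b - a), 0.5))
    return cycles

def pseudo_damage(cycles, exponent=5):
    """Sum of count * range**exponent over all counted cycles."""
    return sum(count * rng ** exponent for rng, count in cycles)

history = [-2, 1, -3, 5, -1, 3, -4, 4, -2]
print(pseudo_damage(rainflow(history)))
```

Because the range is raised to the fifth power, doubling a cycle's range multiplies its contribution by 2⁵ = 32, which is why a handful of large cycles can outweigh thousands of small ones.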
The rig was about to run the wrong loads
Every accelerated rig test runs to a load schedule, the recipe that tells the rig how hard to push and how many times to do it. The one queued up for this program was the 329 LT schedule, and on paper it looked like a perfectly reasonable default. There was just one problem nobody had caught. It was written for the previous generation of this vehicle, which carried less mass over the front axle. So the agent ran the comparison that, on a tight program, almost never gets done in time. It took the 12,983 cycles it had just counted, built the cumulative exceedance curve, and laid it straight over what the rig was actually going to apply.
For most of the range, the two curves sit right on top of each other. The rig handles the everyday loads exactly as it should. Then you get to the tail, the severe events above 1,000 N, and a gap opens up. There the rig under-applies by roughly 18%. That sounds like a rounding error until you remember where the damage actually comes from, because that same tail is responsible for 86% of it. The rig, in other words, was set to faithfully test the part everywhere except the one place it actually breaks.
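The comparison comes down to two cumulative exceedance curves and a ratio in the tail. The sketch below shows the idea under loud assumptions: the function names are ours, the cycle lists are invented so that the tail shortfall comes out at 18%, and the schedule is represented as a flat list of cycle ranges rather than in any real rig file format.

```python
def exceedance(ranges, levels):
    """Cumulative exceedance: number of cycles with range >= each level."""
    return [sum(1 for r in ranges if r >= lvl) for lvl in levels]

def tail_gap(measured, scheduled, tail_level):
    """Fractional shortfall of scheduled cycle counts above tail_level."""
    m = sum(1 for r in measured if r >= tail_level)
    s = sum(1 for r in scheduled if r >= tail_level)
    if m == 0:
        return 0.0  # nothing measured in the tail, so no shortfall to report
    return (m - s) / m

# Hypothetical cycle ranges in newtons: identical below the tail, short above it.
measured = [500.0] * 1000 + [1200.0] * 100
scheduled = [500.0] * 1000 + [1200.0] * 82

levels = [0.0, 1000.0]
print(exceedance(measured, levels))   # [1100, 100]
print(exceedance(scheduled, levels))  # [1082, 82]
print(tail_gap(measured, scheduled, tail_level=1000.0))
```

Reporting the gap only above a tail threshold matters here: averaged over the full range, a shortfall confined to the severe events would be diluted by the thousands of small cycles that match.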
Leave that alone and you get the worst outcome a durability program can hand you. The part passes on the rig, the report signs it off, and the real weakness sails through into production, where it shows up as warranty claims and a recall instead of a line item in a test report. The fix is almost insulting in how small it is: raise the severe-event counts by 15 to 18% before the rig starts. The hard part is timing. A finding like this is only worth something if it lands before someone hits go on the rig. Catch it the following Friday, the way the old workflow would have, and the rig has already spent that time running the wrong test.
The severe events, in context
Peaks are also where the human record lives. The agent pulled the top five events, one per corner, and matched each against the driver's radio transcript within a thirty-second window.
This is the kind of correlation that gets dropped first under deadline pressure, and it pays for itself immediately. It also shows why the matching has to be fuzzy: the driver's "Rock garden. 30 kph. Hold on." at event 2 is a block-start callout, not a reaction to that specific hit. Radio comments are context, not timestamps.
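A windowed match of this kind is straightforward to sketch. The example below assumes the thirty-second window is applied on either side of the event, and that the transcript is available as (timestamp, text) pairs. The function name, the timestamps, and the second radio line are hypothetical.

```python
def match_events(events, transcript, window_s=30.0):
    """For each event, list transcript lines within +/- window_s seconds, closest first.

    events: {event_name: time_s}
    transcript: list of (time_s, text) tuples
    """
    matches = {}
    for name, t_event in events.items():
        nearby = [
            (abs(t_msg - t_event), text)
            for t_msg, text in transcript
            if abs(t_msg - t_event) <= window_s
        ]
        matches[name] = [text for _, text in sorted(nearby)]
    return matches

# Hypothetical timestamps; only the first radio line is quoted from the test day.
events = {"event_2": 412.0}
transcript = [
    (395.0, "Rock garden. 30 kph. Hold on."),
    (520.0, "Back on gravel."),
]
print(match_events(events, transcript))
```

Returning every line in the window, rather than only the nearest one, leaves the interpretation to the reviewer, which fits the point that a callout can precede the hit it seems to describe.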
Back to customer life
Finally, everything is weighted by the 90th-percentile mission profile and scaled to a full six-hour test day. That ties the abstract damage numbers back to something a program manager can act on.
A full test day at this terrain mix is worth roughly 800 km of customer use. The inversion is the whole point: customer life is dominated by the highway, but damage is dominated by the rocks, because the rock garden is 27,900× more damaging per kilometer than the highway. A duty cycle that simply sampled terrain in proportion to use would never accumulate meaningful fatigue inside a test window. Severe terrain has to be over-weighted in the test and then carefully scaled back, which is exactly what this pipeline does.
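One common way to express this scaling is damage equivalence: divide the damage accumulated on the test day by the damage an average customer kilometer would cause. The sketch below uses that formulation as an assumption. It is not necessarily the exact weighting the pipeline applies, and the terrain names, distances, and damage rates are made up for illustration.

```python
def customer_equivalent_km(test_km, damage_per_km, customer_share):
    """Customer distance that accumulates the same pseudo-damage as the test.

    test_km: {terrain: kilometers driven on the test}
    damage_per_km: {terrain: pseudo-damage per kilometer}
    customer_share: {terrain: fraction of customer distance}, summing to 1
    """
    if abs(sum(customer_share.values()) - 1.0) > 1e-6:
        raise ValueError("customer shares must sum to 1")
    test_damage = sum(test_km[t] * damage_per_km[t] for t in test_km)
    damage_per_customer_km = sum(
        customer_share[t] * damage_per_km[t] for t in customer_share
    )
    return test_damage / damage_per_customer_km

# Hypothetical two-terrain example: severe terrain is over-sampled on the test.
test_km = {"highway": 8.0, "rock_garden": 2.0}
damage_per_km = {"highway": 1.0, "rock_garden": 1000.0}
customer_share = {"highway": 0.97, "rock_garden": 0.03}
print(customer_equivalent_km(test_km, damage_per_km, customer_share))
```

In this toy case, ten test kilometers stand in for several times that distance of customer driving, because a fifth of the test is spent on terrain that makes up only 3% of customer use.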
The verdict
All of it rolls up into a single call, with the caveats kept attached rather than buried.
This is the part we are deliberate about: the agent does not decide whether the part is good. It assembles the evidence, computes the verdict against the methodology the engineer defined, and surfaces the two things that would otherwise be easy to miss: the load gap and the saturated sensor. The sign-off stays with the engineer.
Why it matters
None of these steps are new. Every durability engineer knows how to count cycles, sanity-check a sensor, or hold a load history up against the rig schedule. What changes when an agent runs the whole pipeline is that all of them actually happen, every time, on every channel, instead of just the ones that survive the deadline. The load gap is the clearest example. It is exactly the check that gets quietly dropped in a busy week, and it is also the one that would have cost the most to miss.
So it is worth being concrete about that cost. An accelerated durability campaign is weeks of near-continuous rig time. Run it against the wrong loads, realize later, and the whole test is invalid. You do it again: the rig time, the re-instrumentation, and often a fresh prototype, all of it. That runs comfortably into six figures before you count a single day of schedule, and on a program approaching start of production, the slip usually hurts more than the re-test does. The agent caught this one in eight minutes, before the rig was even booked. You only bank that on the runs where the gap would have slipped through, but for a check this easy to skip, it does not have to happen often to pay for itself many times over.
Data note. The telemetry in this study was generated with VI-CarRealTime 2026 on a stock SUV configuration across rough-road profiles at varying scale factors. Sensor faults, the driver transcript, and the 329 LT rig reference are synthetic overlays applied to the clean solver output for instrumentation realism. Tools: VI-CarRealTime 2026, Python (asammdf, pandas, numpy, scipy), and the MOVEdot agent platform.
If you run durability programs and recognize the week that disappears into post-processing, we would like to talk. Get in touch: founders@movedot.com, or www.movedot.ai.