Red Hat Bugzilla – Bug 831527
Tasks aborted without apparent reason
Last modified: 2014-11-09 17:38:40 EST
Description of problem:
A number of tasks at the end of the job https://beaker.engineering.redhat.com/jobs/240020 were aborted without any apparent reason. Beaker reports "External Watchdog Expired" for each of these tasks, but they all have exactly the same start time, which to me looks like Beaker killed them without even trying to run any tests.
Version-Release number of selected component (if applicable):
Version - 0.8.2
I have seen it only once so far.
This is what happened:
/distribution/MRG/Messaging/qpid_ptest_cluster_failover_soak was running and it failed to complete in the time allotted. The local watchdog kicked in first, which means the system reboots and then tries to continue with the next test.
When it booted back up and started running the next test, /distribution/MRG/Messaging/qpid_ptest_cluster_perftest, it ran out of disk space (I looked at the console log). I'm betting that once we ran out of disk space everything went south.
That's when the external watchdog kicked in. The external watchdog is tracked from the lab controller, and it's the end of the line for a recipe. All we do is abort every remaining task and put the machine back in the pool for the next recipe to run on it.
So, yes, we didn't even try to run those remaining tests, and that's by design. The system is too broken at that point to do anything more.
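
To make the difference between the two watchdogs concrete, here is a rough sketch of the behaviour described above. It is not Beaker's actual code: the Task and Recipe classes, the function names, the status strings, the local watchdog result label and the two "hypothetical/..." task names are all made up for illustration. Only the two qpid task names and the "External Watchdog Expired" message come from this report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    status: str = "Waiting"  # illustrative states: Waiting, Running, Aborted
    result: str = ""


@dataclass
class Recipe:
    tasks: List[Task] = field(default_factory=list)


def local_watchdog_expired(recipe: Recipe) -> None:
    """Abort only the running task, then carry on with the next one."""
    for task in recipe.tasks:
        if task.status == "Running":
            task.status = "Aborted"
            task.result = "aborted by local watchdog"  # assumed label
            break
    # After the reboot the harness picks up the next waiting task.
    for task in recipe.tasks:
        if task.status == "Waiting":
            task.status = "Running"
            break


def external_watchdog_expired(recipe: Recipe) -> None:
    """End of the line: abort everything that has not finished yet."""
    for task in recipe.tasks:
        if task.status in ("Waiting", "Running"):
            task.status = "Aborted"
            task.result = "External Watchdog Expired"


recipe = Recipe([
    Task("/distribution/MRG/Messaging/qpid_ptest_cluster_failover_soak",
         status="Running"),
    Task("/distribution/MRG/Messaging/qpid_ptest_cluster_perftest"),
    Task("hypothetical/later_task_1"),  # placeholder name
    Task("hypothetical/later_task_2"),  # placeholder name
])

local_watchdog_expired(recipe)     # soak test overruns, perftest starts
external_watchdog_expired(recipe)  # system is wedged, recipe is ended

for task in recipe.tasks:
    print(task.name, task.status, task.result)
```

In this sketch the later tasks go straight from Waiting to Aborted in a single pass, without ever being started, which matches the "didn't even try to run them" behaviour described above.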