Description of problem:
I have seen this a few times with jobs queued from the same XML; an example job is https://beaker.engineering.redhat.com/jobs/3118. The task before a reserve task hits a /LOCALWATCHDOG event. Instead of moving on to the reserve task and reserving the system for debugging, it then hits an External watchdog and the system is taken back into Beaker. It may be worth noting that the failing test reboots the host under test.

Version-Release number of selected component (if applicable):

How reproducible:
I have seen this happen a few times using the XML from job 3118.

Steps to Reproduce:
1. Clone the job listed above and watch the job.
The problem is that 'shutdown -r now' does not work due to a bug in plymouth.
So I understand that the reboot does not work because of a plymouth bug, and that this causes a localwatchdog timeout on my task, which seems reasonable. But shouldn't the harness still move on to the next task (in my case a reserve system task) and run it? The net result should be that the system does not hit the External watchdog. Just trying to understand the event sequence...
I believe the problem is because we currently reboot after a localwatchdog to try and get the system in a "clean" state. This reboot never finishes.
The local watchdog is most likely caused by Bug 599003, which I am trying to fix now. As Bill said, this triggers a reboot. The next task is not run because the machine is not in a clean state and we are waiting for a reboot that never happens, thanks to Bug 598631, so the External watchdog kills the recipe. Once these two are fixed, I hope to close this one as a duplicate.
NAKing the beaker-blocker as this does not look like our bug.
Is this still an issue? It seems the recent problems were purely infrastructure-related, and this now works fine.
Closing out this BZ; I have not seen it for a long time.
Closing, as this has not been seen for a long time; will reopen if it recurs.