Bug 954219

Summary: external watchdog never fires if system is stuck in install loop due to Anaconda reboot
Product: [Retired] Beaker
Component: lab controller
Version: 0.12
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Hardware: All
OS: Linux
Reporter: Dan Callaghan <dcallagh>
Assignee: Amit Saha <asaha>
QA Contact: tools-bugs <tools-bugs>
CC: aigao, asaha, dcallagh, ebaak, jburke, llim, mganisin, psklenar, qwan, rmancy, xjia
Target Milestone: 0.14.2
Keywords: Reopened, Triaged
Whiteboard: Provisioning
Doc Type: Bug Fix
Type: Bug
Clone Of: 953543
Last Closed: 2013-11-07 01:46:53 UTC

Description Dan Callaghan 2013-04-22 00:40:14 UTC
If Anaconda starts and completes its %pre check-in but reboots before the %post check-in, the system will get into an infinite reboot loop and the external watchdog will never be triggered (instead it is perpetually extended by the %pre check-in).

+++ This bug was initially created as a clone of Bug #953543 +++

--- Additional comment from Dan Callaghan on 2013-04-22 10:35:29 EST ---

I'm guessing the panic loop is because we now extend the watchdog for 10 minutes after a panic, to allow for kdump or other post-panic activities. Really, once that has happened, we should (a) disable any further panic detection, and maybe (b) prevent any further watchdog extensions (although it's possible that some people's post-panic activities might actually be extending the watchdog?).
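
To make suggestion (a) concrete, here is a minimal sketch of a one-shot panic handler. It is not Beaker's actual code: the `Watchdog` class, the `handle_panic` function, and the `PANIC_GRACE` constant are hypothetical names, and treating "extend for 10 minutes" as "kill time becomes now plus 10 minutes" is an assumption. Suggestion (b), blocking later extensions entirely, is deliberately left out because the comment itself is unsure about it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# 10 minutes of grace for kdump etc., as described in the comment above.
PANIC_GRACE = timedelta(minutes=10)


@dataclass
class Watchdog:
    """Hypothetical stand-in for a recipe's external watchdog."""
    kill_time: datetime
    panic_detection: bool = True


def handle_panic(watchdog: Watchdog, now: datetime) -> bool:
    """Grant the post-panic grace period at most once.

    Returns True if the watchdog was extended, False if panic
    detection had already been switched off by an earlier panic.
    """
    if not watchdog.panic_detection:
        # A second panic must not push the kill time out again.
        return False
    watchdog.panic_detection = False
    watchdog.kill_time = now + PANIC_GRACE
    return True
```

With this shape, a system that panics repeatedly gets one grace period and is then left for the watchdog to expire.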

The ./start loop is a separate issue, but similar. When Anaconda checks in during %pre we extend the watchdog and record the ./start result. After that Beaker should not accept any more %pre check-ins, since that just indicates Anaconda bailed out and rebooted.

The latter problem has been known for a long time; we have discussed it before, but I can't find an open bug for it. I will clone this one.

Comment 3 Dan Callaghan 2013-08-08 21:57:10 UTC
*** Bug 995000 has been marked as a duplicate of this bug. ***

Comment 4 Amit Saha 2013-08-19 01:21:09 UTC
This patch will ensure that the installation start is recorded only once and hence the watchdog isn't extended every time Anaconda checks in: http://gerrit.beaker-project.org/#/c/2168/
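
For readers who cannot open the Gerrit link, the idea of recording the installation start only once can be sketched roughly as follows. This is an illustration, not the contents of the patch: `Recipe`, `record_install_start`, and the 60-minute default extension are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Recipe:
    """Hypothetical stand-in for the state kept per provisioned recipe."""
    kill_time: datetime
    install_started: Optional[datetime] = None


def record_install_start(
    recipe: Recipe,
    now: datetime,
    extension: timedelta = timedelta(minutes=60),  # assumed value
) -> bool:
    """Record the %pre check-in and extend the watchdog, once only.

    Returns True for the first check-in. Any later check-in means
    Anaconda bailed out and rebooted, so it is ignored and the
    watchdog kill time is left untouched.
    """
    if recipe.install_started is not None:
        return False
    recipe.install_started = now
    recipe.kill_time = now + extension
    return True
```

Because repeated check-ins no longer move `kill_time`, a system stuck in an install loop eventually hits the original deadline and the external watchdog fires.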

Comment 7 Nick Coghlan 2013-10-03 02:28:09 UTC
Beaker 0.15 has been released.

Comment 8 Dan Callaghan 2013-10-08 00:05:39 UTC
*** Bug 1016040 has been marked as a duplicate of this bug. ***

Comment 9 Raymond Mancy 2013-10-23 01:56:55 UTC
This change has been nominated to be backported to the 0.14 branch, to be released as part of the next maintenance release, 0.14.2.

Comment 10 Nick Coghlan 2013-10-25 06:35:58 UTC
Adjusting target milestone to make the changes backported to 0.14.2 easier to identify. 0.15.0 has enough significant regressions that it shouldn't be used, so the change means that 0.15.1 can be effectively reidentified as the union of that tag and the 0.14.2 target milestone.

Comment 13 Nick Coghlan 2013-11-07 01:46:53 UTC
Closing as addressed in Beaker 0.14.2.