Bug 953543
Summary: | external watchdog never fires if system is stuck in panic loop | |||
---|---|---|---|---|
Product: | [Retired] Beaker | Reporter: | Jeff Burke <jburke> | |
Component: | lab controller | Assignee: | Raymond Mancy <rmancy> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Amit Saha <asaha> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 0.12 | CC: | aigao, asaha, dcallagh, ebaak, llim, qwan, rmancy, xjia | |
Target Milestone: | 0.14.2 | Keywords: | Reopened, Triaged | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | Misc | |||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 954219 (view as bug list) | Environment: | ||
Last Closed: | 2013-11-07 01:46:36 UTC | Type: | Bug | |
Comment 2
Nick Coghlan
2013-04-19 08:03:13 UTC
I'm guessing the panic loop is because we now extend the watchdog for 10 minutes after panic, to allow for kdump or other post-panic activities. Really after that has happened, we should (a) disable any further panic detection, and maybe (b) prevent any further watchdog extensions (although it's possible that some people's post-panic activities might actually be extending the watchdog?).

The ./start loop is a separate issue, but similar. When Anaconda checks in during %pre we extend the watchdog and record the ./start result. After that Beaker should not accept any more %pre check-ins, since that just indicates Anaconda bailed out and rebooted.

The latter problem has been known for a long time, we have discussed it before but I can't find an open bug for it. I will clone this one.

(In reply to comment #3)
> The latter problem has been known for a long time, we have discussed it
> before but I can't find an open bug for it. I will clone this one.

Cloned as bug 954219.

Yeah so I think what makes most sense here is to just not test for further panic strings once a panic has already been detected.

Beaker 0.15 has been released. This change has been nominated to be back ported to the 0.14 branch, to be released as part of the next maintenance release 0.14.2.

Adjusting target milestone to make the changes backported to 0.14.2 easier to identify. 0.15.0 has enough significant regressions that it shouldn't be used, so the change means that 0.15.1 can be effectively reidentified as the union of that tag and the 0.14.2 target milestone.

Closing as addressed in Beaker 0.14.2.
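
For illustration only, the approach discussed in the comments (stop testing for panic strings once a panic has already been detected, so the watchdog is extended at most once) might be sketched as follows. This is not Beaker's actual code: `ConsoleWatcher`, `process_line`, `PANIC_PATTERN`, and `POST_PANIC_GRACE` are hypothetical names, the panic pattern is a placeholder, and setting the deadline to "now plus 10 minutes" is an assumption about how the extension is applied.

```python
import re
from datetime import datetime, timedelta

# Hypothetical names and values; not taken from the Beaker source.
PANIC_PATTERN = re.compile(r"Kernel panic|Oops")   # placeholder pattern
POST_PANIC_GRACE = timedelta(minutes=10)           # room for kdump etc.


class ConsoleWatcher:
    """Tracks one system's console output and its watchdog deadline."""

    def __init__(self, kill_time):
        self.kill_time = kill_time
        self.panic_detected = False

    def process_line(self, line, now):
        """Return True only for the first panic seen in the console log."""
        if self.panic_detected:
            # A panic was already recorded: skip further matching so that
            # a panic loop cannot keep pushing the watchdog deadline out.
            return False
        if PANIC_PATTERN.search(line):
            self.panic_detected = True
            # One-off adjustment (assumed semantics): expire 10 minutes
            # from now, leaving time for post-panic activities.
            self.kill_time = now + POST_PANIC_GRACE
            return True
        return False


if __name__ == "__main__":
    start = datetime(2013, 4, 19, 8, 0, 0)
    watcher = ConsoleWatcher(kill_time=start + timedelta(hours=1))
    watcher.process_line("Kernel panic - not syncing", start)
    # A second panic five minutes later leaves the deadline untouched.
    watcher.process_line("Kernel panic - not syncing",
                         start + timedelta(minutes=5))
    print(watcher.kill_time)
```

With a guard like this, a system that panics, reboots, and panics again still hits the deadline set by the first panic, so the external watchdog can fire. The same "accept only the first event" idea would apply to the %pre check-in case tracked in bug 954219.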