Bug 1459695
Summary: | Instance should not stuck in Resuming state forever when qemu crashes | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Yuri Obshansky <yobshans> |
Component: | openstack-nova | Assignee: | OSP DFG:Compute <osp-dfg-compute> |
Status: | CLOSED EOL | QA Contact: | OSP DFG:Compute <osp-dfg-compute> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 11.0 (Ocata) | CC: | berrange, dasmith, eglynn, kchamart, mbooth, sbauza, sferdjao, sgordon, srevivo, vromanso |
Target Milestone: | --- | Keywords: | Triaged, ZStream |
Target Release: | 11.0 (Ocata) | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2018-06-22 12:40:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Yuri Obshansky
2017-06-07 20:59:10 UTC
Okay, the root cause of the bug you linked seems to be the QEMU process crashing due to a SeaBIOS problem (from https://bugzilla.redhat.com/show_bug.cgi?id=1425516#c33):

[quote]
a) Updating seabios to 7.4's seabios fixes it
b) The errors are consistent with it being an SMM error which we disabled in 7.4's seabios.
[/quote]

However, we still don't know the original *cause* of the error / crash (in bug 1425516). We only know that updating SeaBIOS to its 7.4 version resolves the errors.

That said, the request in this bug sounds reasonable to me from a Nova perspective: instances should be placed in ERROR state when the QEMU process crashes.

Why did you set the severity/priority to high? It's true that Nova should correctly report that the QEMU process crashed, but I don't think it's something we can do easily in Nova, and it's probably not something we need to address urgently.

That issue should be reported upstream first, and this BZ should unfortunately be closed as WONTFIX.

(In reply to Sahid Ferdjaoui from comment #2)
> Why did you set the severity/priority to high? It's true that Nova should
> correctly report that the QEMU process crashed, but I don't think it's
> something we can do easily in Nova, and it's probably not something we need
> to address urgently.
>
> That issue should be reported upstream first, and this BZ should
> unfortunately be closed as WONTFIX.

The issue is that Nova can't recover when it happens, which is a severe problem for anybody who hits it. If there's a way for Nova to recover automatically without operator intervention, we could drop the priority. The fact that it's difficult to fix doesn't mean it's not severe.

I agree we should also report the bug upstream. If you do that, could you please link the Launchpad bug in this bug? Please don't close this bug, though.

There is nothing severe on the Nova side. If QEMU crashes, I'm not sure what Nova could do to recover it, just as I'm not sure what Nova could do to recover from a kernel panic.

OSP11 is now retired, see details at https://access.redhat.com/errata/product/191/ver=11/rhel---7/x86_64/RHBA-2018:1828