Bug 1064860
Summary: | VMs get stuck in 'Unknown' state when power management is not working. | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Roman Hodain <rhodain> |
Component: | ovirt-engine | Assignee: | Eli Mesika <emesika> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | sefi litmanovich <slitmano> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.3.0 | CC: | aberezin, bazulay, ecohen, emesika, gklein, iheim, lpeer, michal.skrivanek, oourfali, pstehlik, rbalakri, Rhev-m-bugs, rhodain, yeylon |
Target Milestone: | --- | ||
Target Release: | 3.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | infra | ||
Fixed In Version: | ovirt-engine-3.5.0_beta | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-02-17 17:07:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1142923, 1156165 |
Description
Roman Hodain
2014-02-13 12:39:00 UTC
The fencing operation should be aborted when the host comes up in the meantime. Then the rerun treatment should work properly and not get overwritten by the failed fencing afterwards.

Roman, you should right click on the Host from the Hosts list in the web admin UI and select "Confirm Host has been rebooted".
Please recheck with the above.

There's a user experience problem here: the user reboots a host with running VMs, thus the VMs go to the unknown state. There's no indication in the VMs tab (only hidden in the events section) for the user that he should go back to host level and confirm that the host has been rebooted.
Adding User Experience keyword.

(In reply to Arthur Berezin from comment #3)
> There's a user experience problem here: the user reboots a host with running
> VMs, thus the VMs go to the unknown state. There's no indication in the VMs
> tab (only hidden in the events section) for the user that he should go back
> to host level and confirm that the host has been rebooted.
> Adding User Experience keyword.

I do not think that this is the problem here. The problem is that the hypervisor where the VM was running is already up and the VM is still in the unknown state. Why would I mark a hypervisor which is up as rebooted?

(In reply to Roman Hodain from comment #4)
> (In reply to Arthur Berezin from comment #3)
> > There's a user experience problem here: the user reboots a host with running
> > VMs, thus the VMs go to the unknown state. There's no indication in the VMs
> > tab (only hidden in the events section) for the user that he should go back
> > to host level and confirm that the host has been rebooted.
> > Adding User Experience keyword.
>
> I do not think that this is the problem here. The problem is that the
> hypervisor where the VM was running is already up and the VM is still in the
> unknown state. Why would I mark a hypervisor which is up as rebooted?

I am just copy/pasting from your bug description:

Steps to Reproduce:
1. Create a new DC with just one hypervisor (local storage)
2. Start a VM on it
3. Reboot it

So, you rebooted the Host manually, right? If so, please test again: after you reboot the host, also right click on it and select "Confirm host has been rebooted".

BTW there is no fencing issue here, since fencing cannot work when there is only one Host in the DC (no proxy host available...)

(In reply to Eli Mesika from comment #5)
> (In reply to Roman Hodain from comment #4)
> > (In reply to Arthur Berezin from comment #3)
> > > There's a user experience problem here: the user reboots a host with running
> > > VMs, thus the VMs go to the unknown state. There's no indication in the VMs
> > > tab (only hidden in the events section) for the user that he should go back
> > > to host level and confirm that the host has been rebooted.
> > > Adding User Experience keyword.
> >
> > I do not think that this is the problem here. The problem is that the
> > hypervisor where the VM was running is already up and the VM is still in the
> > unknown state. Why would I mark a hypervisor which is up as rebooted?
>
> I am just copy/pasting from your bug description:
>
> Steps to Reproduce:
>
> 1. Create a new DC with just one hypervisor (local storage)
> 2. Start a VM on it
> 3. Reboot it
>
> So, you rebooted the Host manually, right? If so, please test again: after
> you reboot the host, also right click on it and select "Confirm host has
> been rebooted".
>
> BTW there is no fencing issue here, since fencing cannot work when there is
> only one Host in the DC (no proxy host available...)

Hi,

I have tested your suggestion, but this is not possible. At the time when the VM is in the unknown state the hypervisor is already up:

Error while executing action: Cannot confirm 'Host has been rebooted' Host. Valid Host statuses are "Non operational", "Maintenance" or "Connecting".

Let me repeat what happens:
- VM is up
- host is up
- host is down
- fencing is triggered
- fencing in progress (not working)
- hypervisor is up
- VM is marked as down
- Fencing failed
- VM is marked as in Unknown state.
- Mark the hypervisor as rebooted. (not possible)

I still think that this is an issue of fencing. The fencing is triggered, and if it fails it marks the VMs as being in the unknown state even if they are already marked as down by the hypervisor, which is already up. It is not related only to local storage, but also to any case where fencing is not working.

Roman

(In reply to Arthur Berezin from comment #3)
> There's a user experience problem here: the user reboots a host with running
> VMs, thus the VMs go to the unknown state. There's no indication in the VMs
> tab (only hidden in the events section) for the user that he should go back
> to host level and confirm that the host has been rebooted.
> Adding User Experience keyword.

Is this what this bug is about? I see that this BZ is in POST, so the problem reported here was solved; what you are saying is that we have a user-experience problem that, if I understand correctly, should be tracked separately from this issue. If so, please open a separate RFE for that. For now I removed the UserExperience keyword from this BZ.
My hunch is that this should be solved via a notification-center or something similar that we can plan for 4.0, definitely not 3.5 material.
Thanks.

(In reply to Einav Cohen from comment #9)
> (In reply to Arthur Berezin from comment #3)
> > There's a user experience problem here: the user reboots a host with running
> > VMs, thus the VMs go to the unknown state. There's no indication in the VMs
> > tab (only hidden in the events section) for the user that he should go back
> > to host level and confirm that the host has been rebooted.
> > Adding User Experience keyword.
>
> Is this what this bug is about? I see that this BZ is in POST, so the
> problem reported here was solved; what you are saying is that we have a
> user-experience problem that, if I understand correctly, should be tracked
> separately from this issue. If so, please open a separate RFE for that. For
> now I removed the UserExperience keyword from this BZ.
> My hunch is that this should be solved via a notification-center or
> something similar that we can plan for 4.0, definitely not 3.5 material.
> Thanks.

There are 2 issues here. The first is fixed by Eli's patch: VMs are marked as unknown after the host was rebooted and fencing failed. The other is that there's no "Call for Action" in the VMs tab when the user is expected to manually confirm a host was rebooted. I'll open a separate RFE on the second issue.

Verified with ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch and vdsm-4.16.2-1.gite8cba75.el6.x86_64.

1. single host in datacenter is up (host has no power management configured).
2. create vm.
3. vm is up.
4. manually reboot the host.
5. host state connecting.
6. fencing failed for SPM host in DC, setting DC to non-operational.
7. host state non-responsive.
8. vm state unknown.
9. host up.
10. vm down.
11. host is contending for SPM.
12. DC up, host is SPM.

RHEV 3.5.0 was released. Closing.
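
For readers following the thread, the event order Roman lists (host goes down, fencing is triggered, the hypervisor comes back and reports the VM as down, then fencing fails) can be replayed in a small model. This is only an illustrative Python sketch of the race, not ovirt-engine code: the `Host` class, the `on_fencing_failed` handler and the `check_host_first` flag are hypothetical names invented here, and the guard merely mirrors the idea stated in the thread that a failed fence should not overwrite VM states once the host is up again.

```python
from dataclasses import dataclass, field
from enum import Enum


class HostStatus(Enum):
    UP = "Up"
    NON_RESPONSIVE = "NonResponsive"


class VmStatus(Enum):
    UP = "Up"
    DOWN = "Down"
    UNKNOWN = "Unknown"


@dataclass
class Host:
    # Hypothetical model of a host and the VMs it reports.
    status: HostStatus = HostStatus.UP
    vms: dict = field(default_factory=dict)  # VM name -> VmStatus


def on_fencing_failed(host, check_host_first):
    """Hypothetical handler that runs when a fence attempt fails."""
    if check_host_first and host.status is HostStatus.UP:
        # The host recovered while fencing was in progress, so keep
        # the VM states it already reported.
        return
    for name in host.vms:
        host.vms[name] = VmStatus.UNKNOWN


def replay(check_host_first):
    """Replay the event order from the report and return the VM states."""
    host = Host(vms={"vm1": VmStatus.UP})
    host.status = HostStatus.NON_RESPONSIVE  # host is down, fencing is triggered
    host.status = HostStatus.UP              # hypervisor is up again
    host.vms["vm1"] = VmStatus.DOWN          # VM is marked as down
    on_fencing_failed(host, check_host_first)  # fencing fails afterwards
    return host.vms
```

In this model, `replay(False)` leaves `vm1` in the Unknown state, which is the reported behaviour, while `replay(True)` leaves it Down because the handler sees that the host is already up.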