Bug 1128417
| Summary: | Suspend VM doesn't work | | |
|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | Vered Volansky <vered> |
| Component: | ovirt-engine-core | Assignee: | Vinzenz Feenstra [evilissimo] <vfeenstr> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Pavel Stehlik <pstehlik> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 3.5 | CC: | acathrow, amureini, bugs, ecohen, gklein, iheim, michal.skrivanek, ofrenkel, vered, vfeenstr, yeylon |
| Target Milestone: | --- | | |
| Target Release: | 3.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | virt | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-09-02 12:57:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 925487 [details]
engine.log
taking the bug

Created attachment 925636 [details]
vdsm.log for vdsm 4.16.
I cannot reproduce the issue using latest VDSM 4.16.1 on RHEL 6.5 (proof: see attached logs):

MainThread::INFO::2014-08-11 09:22:04,404::vdsm::131::vds::(run) (PID: 18969) I am the actual vdsm 4.16.1-27.git54e8a97.el6 toji.rokugan.lan (2.6.32-431.20.3.el6.x86_64)

The test was done on RHEL 6.5, but the code paths inside VDSM are exactly the same, so the test is definitely relevant. Can you please share a full VDSM log which shows at least
1. the VM creation logs
2. the (failed) VM hibernation logs?
Thanks.

I was working on a fedora 20 host, with said vdsm. This happens 100% and I can't verify my code and merge the patch. I added a clean log with the above reproduction, and extracted the problematic messages. The VM is two weeks old, and I worked with it, including the hibernate, which worked last week. It's a RHEL7-compatible VM, which I failed to mention before.

I just tried it again with a new VM, not rhel7, and it works, so no use for new logs. If you want a reproduction with a new VM please contact your QA. BTW it's the same with all my rhel7 VMs.

this patch, among other benefits, will solve this issue.

http://gerrit.ovirt.org/#/c/26142/

(In reply to Francesco Romani from comment #6)
> this patch, among other benefits, will solve this issue.
>
> http://gerrit.ovirt.org/#/c/26142/

If that is the case, IMO, this patch should be commented with this bz, backported, and verified by QE. Though I get that this patch may fix this issue, I must comment again that I used HibernateVmCommand on master during the last week with no issues, so this looks like a regression to me.

"I just tried it again with a new VM, not rhel7, and it works, so no use for new logs."
so you mean you reproduce it 100% only for that one VM? For any RHEL 7 guest VM? (what's the OS type in that case?)
you're talking about master, so is this happening for the 3.5 branch or master only?
how long is the VM running before you try suspend? (immediately upon starting/recovery or after running it for a few mins)

> so you mean you reproduce it 100% only for that one VM?
No, I reproduced it on several VMs, all kind of old, and just noticed they're all RHEL7. Some are not actually installed with RHEL7 - not sure which is which, and spice doesn't open in order for me to check and report back.

> For any RHEL 7 guest VM? (what's the OS type in that case?)
You mean the specific install - can't tell you without spice, probably the newest on the newest on PXE.

> you're talking about master, so is this happening for 3.5 branch or master only?
Only checked on engine master. vdsm side should be some 3.5 nightly: vdsm-4.16.1-0.gita4d9abf.fc20

> how long is the VM running before you try suspend? (immediately upon starting/recovery or after running it for a few mins)
Happened several times after a long time (hours/minutes). Also happens when I immediately try suspending right after it goes back up when that last suspend failed.

(In reply to Vered Volansky from comment #9)
> No, I reproduced it on several vms, all kind of old, and just noticed
> they're all RHEL7. Some are not actually installed with RHEL7 - not sure
> which is which, and spice doesn't open in order for me to check and report
> back.

hm that SPICE issue is interesting. What's wrong with it? Share the logs pls

> You mean the specific install - can't tell you without spice, probably the
> newest on the newest on PXE.

I mean what's set in the Edit VM dialog. Also, I suppose you have a 3.5 cluster level.

> Happened several times after a long time (hours/minutes). Also happens when
> I immediately try suspending right after it goes back up when that last
> suspend failed.

hm, that's very weird and shouldn't happen unless there is some serious issue in VM initialization. Is it a proper deploy from engine on a pristine Fedora 20? VDSM logs since the VM creation till the hibernate attempt may show something.

> hm that SPICE issue is interesting. What's wrong with it? Share the logs pls
Will do.
> I mean what's set in the Edit VM dialog.
I only have one rhel7 option. Edit VM dialog screenshot will be attached soon.
> Also, I suppose you have 3.5 cluster level
I do.
> Is it a proper deploy from engine on a pristine Fedora 20? VDSM logs since
> the VM creation till hibernate attempt may show something
Don't have the time of creation :( engine is built from master (make clean install-dev), on fc19. vdsm is installed from engine on fc20.

Created attachment 925824 [details]
edit vm dialog screenshot
Created attachment 925836 [details]
engine_spice_18:59
Last click on spice on vm_rhel7: Aug 11, 18:59
Created attachment 925837 [details]
engine.log
This is after another try with the same vms.
Created attachment 925838 [details]
vdsm.log
This is after another try with the same vms.
Eureka!
Thanks to rotation, will add vm creation logs tomorrow.

BTW, just tried to reproduce with the same VMs I did just this morning - no success. I think due to bz1128807, but this could be just a phantom bug :(

(In reply to Vered Volansky from comment #16)
> Eureka!
> Thanks to rotation, will add vm creation logs tomorrow.
>
> BTW, just tried to reproduce with the same VMs I did just this morning - no
> success. I think due to bz1128807, but this could be just a phantom bug :(

hm, you hit that issue in the logs in comment #14, then the engine misbehavior is explained, but the original error in vdsm you described… that must be something else. I wouldn't duplicate this bug yet; vdsm shouldn't be affected by the engine NPE. If you can, please try to get the full vdsm log with the exception.

Created attachment 925969 [details]
vm creation
Created attachment 925970 [details]
vm creation
VMs are named vm2, vm3, vm4, vm5, vm6

Created attachment 925973 [details]
vdsm log around first-time bug occurrence
Created attachment 925974 [details]
vdsm log around first-time bug occurrence
Created attachment 925975 [details]
vdsm log around first-time bug occurrence
@vered could you try to reproduce it with the current vdsm master? We have just merged the patch http://gerrit.ovirt.org/#/c/26142/

Please note that I haven't annotated the patch since we were not sure yet if this is really the reason for your issues, and it is a more generic issue anyway. If this is fixing your issue we'll backport the patch and use this bug as a reference for the backport. Thanks.

Please note that you'd have to build VDSM from git right now; there's no nightly build for this as of this moment.

@vered: Any update?

Sorry, was on PTO and this got away from me. Unfortunately no development environment for vdsm right now. Obviously I can work on that, but is there a point for me to do it considering I did manage to suspend the same vm later on?

good then :) seems it works!
Created attachment 925486 [details]
vdsm.log

Description of problem:
Suspending a running VM results in the VM preparing for hibernation, saving status, then coming back up. Engine looks good. vdsm.log looks suspicious, attached.

Version-Release number of selected component (if applicable):
eeba2de3e1198b0901525d58068c0ef51a497ed2 (engine), vdsm-4.16.1-0.gita4d9abf.fc20

How reproducible:
100%

Steps to Reproduce:
1. Suspend a running VM

Actual results:
VM goes back up, no user error message.

Expected results:
VM should go into hibernation.

Additional info:
vdsm.log extract:

```
4886-4af7-af0b-d668ecee9e4f,00000002-0002-0002-0002-00000000037f,ef7c8ef2-cf66-4f60-a752-f1a79235e2d6,952565e4-a91d-4876-a3b1-ccd5dfe2a8b0,4bd8a1f0-e234-40d6-a71c-9a93818794e4,bb223e58-a6b3-449a-a77a-344b4863850b') {}
Thread-8778::DEBUG::2014-08-10 13:15:40,184::API::511::vds::(migrate) {'hiberVolHandle': '7158eae9-4886-4af7-af0b-d668ecee9e4f,00000002-0002-0002-0002-00000000037f,ef7c8ef2-cf66-4f60-a752-f1a79235e2d6,952565e4-a91d-4876-a3b1-ccd5dfe2a8b0,4bd8a1f0-e234-40d6-a71c-9a93818794e4,bb223e58-a6b3-449a-a77a-344b4863850b', 'vmId': 'de580b19-e975-426f-80de-6c7daf1f24ab', 'mode': 'file'}
Thread-8778::DEBUG::2014-08-10 13:15:40,185::BindingXMLRPC::1134::vds::(wrapper) return vmHibernate with {'status': {'message': 'Hibernation process starting', 'code': 0}, 'progress': 0}
Thread-8780::DEBUG::2014-08-10 13:15:40,186::migration::135::vm.Vm::(_prepareGuest) vmId=`de580b19-e975-426f-80de-6c7daf1f24ab`::Save State begins
Thread-8780::ERROR::2014-08-10 13:15:40,186::migration::160::vm.Vm::(_recover) vmId=`de580b19-e975-426f-80de-6c7daf1f24ab`::'NoneType' object has no attribute 'isResponsive'
Thread-8780::ERROR::2014-08-10 13:15:40,187::migration::259::vm.Vm::(run) vmId=`de580b19-e975-426f-80de-6c7daf1f24ab`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 231, in run
    self._prepareGuest()
  File "/usr/share/vdsm/virt/migration.py", line 136, in _prepareGuest
    if self._vm.guestAgent.isResponsive():
AttributeError: 'NoneType' object has no attribute 'isResponsive'
Thread-8778::DEBUG::2014-08-10 13:15:40,237::BindingXMLRPC::318::vds::(wrapper) client [10.35.1.234]
Thread-8778::DEBUG::2014-08-10 13:15:40,237::task::595::Storage.TaskManager.Task::(_updateState) Task=`be562742-8089-4a7c-b0a3-54549ca5888f`::moving from state init -> state preparing
Thread-8778::INFO::2014-08-10 13:15:40,237::logUtils::44::dispatcher::(wrapper) Run and protect: clearTask(taskID='5812d8ec-cc9a-412f-8005-99c507282276', spUUID=None, options=None)
```
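The traceback shows `self._vm.guestAgent` being `None` when `_prepareGuest()` runs, so the attribute access itself crashes the hibernate flow. Below is a minimal, hypothetical sketch of the kind of None-guard that avoids the `AttributeError`; the class and method names mirror the log, and this is not the actual fix merged in gerrit 26142:

```python
# Hypothetical sketch: guard against a missing guest agent during hibernate.
# Names (GuestAgent, Vm, prepare_guest) mirror the traceback above, not the
# real VDSM code or the actual patch.

class GuestAgent:
    def isResponsive(self):
        # A real agent would check channel liveness; stubbed here.
        return True


class Vm:
    def __init__(self, guest_agent=None):
        # guestAgent can legitimately be None if VM initialization
        # did not finish, which is the failure mode in the log.
        self.guestAgent = guest_agent


def prepare_guest(vm):
    """Return True if the guest agent was consulted before hibernation."""
    # Guard first: calling isResponsive() on None raises AttributeError,
    # exactly the error logged by _prepareGuest in migration.py.
    if vm.guestAgent is not None and vm.guestAgent.isResponsive():
        # Real code would notify the guest (e.g. desktop lock) here.
        return True
    return False


print(prepare_guest(Vm()))             # agent missing: no crash, just False
print(prepare_guest(Vm(GuestAgent())))  # agent responsive: True
```

With the guard in place, a VM whose agent never came up degrades gracefully (hibernation proceeds or fails with a clean status) instead of dying with an unhandled traceback.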