Bug 1128417

Summary: Suspend VM doesn't work
Product: [Retired] oVirt Reporter: Vered Volansky <vered>
Component: ovirt-engine-coreAssignee: Vinzenz Feenstra [evilissimo] <vfeenstr>
Status: CLOSED NEXTRELEASE QA Contact: Pavel Stehlik <pstehlik>
Severity: high Docs Contact:
Priority: high    
Version: 3.5CC: acathrow, amureini, bugs, ecohen, gklein, iheim, michal.skrivanek, ofrenkel, vered, vfeenstr, yeylon
Target Milestone: ---   
Target Release: 3.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: virt
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-02 12:57:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm.log
none
engine.log
none
vdsm.log for vdsm 4.16.`
none
edit vm dialog screenshot
none
engine_spice_18:59
none
engine.log
none
vdsm.log
none
vm creation
none
vm creation
none
vdsm log around first-time bug occurance
none
vdsm log around first-time bug occurance
none
vdsm log around first-time bug occurance none

Description Vered Volansky 2014-08-10 10:56:36 UTC
Created attachment 925486 [details]
vdsm.log

Description of problem:
Suspend a running VM results in VM preparing for hibernate, saving status, then back up. Engine looks good. vdsm.log looks suspicious, attached.


Version-Release number of selected component (if applicable):
eeba2de3e1198b0901525d58068c0ef51a497ed2 (engine), vdsm-4.16.1-0.gita4d9abf.fc20 


How reproducible:
100%

Steps to Reproduce:
1. Suspend a running VM

Actual results:
Vm goes back up, no user error message.

Expected results:
Vm should go into hibernation.

Additional info:
vdsm.log extract:

4886-4af7-af0b-d668ecee9e4f,00000002-0002-0002-0002-00000000037f,ef7c8ef2-cf66-4f60-a752-f1a79235e2d6,952565e4-a91d-4876-a3b1-ccd5dfe2a8b0,4bd8a1f0-e234-40d6-a71c-9a93818794e4,
bb223e58-a6b3-449a-a77a-344b4863850b') {}
Thread-8778::DEBUG::2014-08-10 13:15:40,184::API::511::vds::(migrate) {'hiberVolHandle': '7158eae9-4886-4af7-af0b-d668ecee9e4f,00000002-0002-0002-0002-00000000037f,ef7c8ef2-cf6
6-4f60-a752-f1a79235e2d6,952565e4-a91d-4876-a3b1-ccd5dfe2a8b0,4bd8a1f0-e234-40d6-a71c-9a93818794e4,bb223e58-a6b3-449a-a77a-344b4863850b', 'vmId': 'de580b19-e975-426f-80de-6c7da
f1f24ab', 'mode': 'file'}
Thread-8778::DEBUG::2014-08-10 13:15:40,185::BindingXMLRPC::1134::vds::(wrapper) return vmHibernate with {'status': {'message': 'Hibernation process starting', 'code': 0}, 'pro
gress': 0}
Thread-8780::DEBUG::2014-08-10 13:15:40,186::migration::135::vm.Vm::(_prepareGuest) vmId=`de580b19-e975-426f-80de-6c7daf1f24ab`::Save State begins
Thread-8780::ERROR::2014-08-10 13:15:40,186::migration::160::vm.Vm::(_recover) vmId=`de580b19-e975-426f-80de-6c7daf1f24ab`::'NoneType' object has no attribute 'isResponsive'
Thread-8780::ERROR::2014-08-10 13:15:40,187::migration::259::vm.Vm::(run) vmId=`de580b19-e975-426f-80de-6c7daf1f24ab`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 231, in run
    self._prepareGuest()
  File "/usr/share/vdsm/virt/migration.py", line 136, in _prepareGuest
    if self._vm.guestAgent.isResponsive():
AttributeError: 'NoneType' object has no attribute 'isResponsive'
Thread-8778::DEBUG::2014-08-10 13:15:40,237::BindingXMLRPC::318::vds::(wrapper) client [10.35.1.234]
Thread-8778::DEBUG::2014-08-10 13:15:40,237::task::595::Storage.TaskManager.Task::(_updateState) Task=`be562742-8089-4a7c-b0a3-54549ca5888f`::moving from state init -> state pr
eparing
Thread-8778::INFO::2014-08-10 13:15:40,237::logUtils::44::dispatcher::(wrapper) Run and protect: clearTask(taskID='5812d8ec-cc9a-412f-8005-99c507282276', spUUID=None, options=N
one)

Comment 1 Vered Volansky 2014-08-10 11:01:34 UTC
Created attachment 925487 [details]
engine.log

Comment 2 Francesco Romani 2014-08-11 07:14:40 UTC
taking the bug

Comment 3 Francesco Romani 2014-08-11 07:39:13 UTC
Created attachment 925636 [details]
vdsm.log for vdsm 4.16.`

Comment 4 Francesco Romani 2014-08-11 07:40:53 UTC
I cannot reproduce the issue using lastest VDSM 4.16.1 on RHEL 6.5 (proof: see attached logs)

MainThread::INFO::2014-08-11 09:22:04,404::vdsm::131::vds::(run) (PID: 18969) I am the actual vdsm 4.16.1-27.git54e8a97.el6 toji.rokugan.lan (2.6.32-431.20.3.el6.x86_64)


test done on RHEL 6.5, but the code paths inside VDSM are exactly the same, so the test is defintely relevant.

Can you please share a full VDSM log which shows at least
1. the VM creation logs
2. the (failed) VM hibernation logs?

Thanks,

Comment 5 Vered Volansky 2014-08-11 08:24:42 UTC
I was working on a fedora 20 host, with said vdsm. This happens 100% and I can't verify my code and merge patch.
I added a clean log with the above reproduction, and extracted the problematic messages. The VM is two weeks old, and I worked with it, including the hibernate, which worked last week. It's a RHEL7 compatible VM, which I failed to mention before.
I just tried it again with a new VM, not rhel7, and it works, so no use for new logs.
If you want a reproduction with a new VM please contact your QA.
BTW it's the same with all my rhel7 VMs.

Comment 6 Francesco Romani 2014-08-11 08:44:03 UTC
this patch, among other benefits, will solve this issue.

http://gerrit.ovirt.org/#/c/26142/

Comment 7 Vered Volansky 2014-08-11 10:22:49 UTC
(In reply to Francesco Romani from comment #6)
> this patch, among other benefits, will solve this issue.
> 
> http://gerrit.ovirt.org/#/c/26142/

If that is the case, IMO, this patch should be commented with this bz, backported and verified by QE.
Though I get that this patch may fix this issue, I must comment again that I used HibernateVmCommand on master during last week with no issues, so this looks like a regression to me.

Comment 8 Michal Skrivanek 2014-08-11 14:19:29 UTC
"I just tried it again with a new VM, not rhel7, and it works, so no use for new logs."
so you mean you reproduce it 100% only for that one VM? 

For any RHEL 7 guest VM? (what's the OS type in that case?)

you're talking about master, so is this happening for 3.5 branch or master only?

how long is the VM running before you try suspend? (immediately upon starting/recovery or after running it for few mins)

Comment 9 Vered Volansky 2014-08-11 14:41:51 UTC
> so you mean you reproduce it 100% only for that one VM? 
No, I reproduced it on several vms, all kind of old, and just noticed they're all RHEL7.
Some are not actually installed with RHEL7 - not sure which is which, and spice doesn't open in order for me to check and report back.
> 
> For any RHEL 7 guest VM? (what's the OS type in that case?)
You mean the specific install - can't tell you without spice, probably the newest on the newest on PXE. 
> you're talking about master, so is this happening for 3.5 branch or master
> only?
Only checked on engine master. vdsm side should be some 3.5 nightly. vdsm-4.16.1-0.gita4d9abf.fc20
> how long is the VM running before you try suspend? (immediately upon
> starting/recovery or after running it for few mins)
Happened several times after a long time (hours/minutes).
Also happens when I immediately try suspending right after it goes back up when that last suspend failed.

Comment 10 Michal Skrivanek 2014-08-11 14:55:32 UTC
(In reply to Vered Volansky from comment #9)
> > so you mean you reproduce it 100% only for that one VM? 
> No, I reproduced it on several vms, all kind of old, and just noticed
> they're all RHEL7.
> Some are not actually installed with RHEL7 - not sure which is which, and
> spice doesn't open in order for me to check and report back.

hm that SPICE issue is interesting. What's wrong with it? Share the logs pls

> > 
> > For any RHEL 7 guest VM? (what's the OS type in that case?)
> You mean the specific install - can't tell you without spice, probably the
> newest on the newest on PXE. 

I mean what's set in the Edit VM dialog. Also, I suppose you have 3.5 cluster level

> > you're talking about master, so is this happening for 3.5 branch or master
> > only?
> Only checked on engine master. vdsm side should be some 3.5 nightly.
> vdsm-4.16.1-0.gita4d9abf.fc20
> > how long is the VM running before you try suspend? (immediately upon
> > starting/recovery or after running it for few mins)
> Happened several times after a long time (hours/minutes).
> Also happens when I immediately try suspending right after it goes back up
> when that last suspend failed.

hm, that's very weird and shouldn't happen unless there is some serious issue in VM initialization.
Is it a proper deploy from engine on a pristine Fedora 20? VDSM logs since the VM creation till hibernate attempt may show something

Comment 11 Vered Volansky 2014-08-11 15:46:30 UTC
> hm that SPICE issue is interesting. What's wrong with it? Share the logs pls
Will do.

> I mean what's set in the Edit VM dialog. 
I only have one rhel7 option. Edit Vm dialog screenshot will be attached soon.
> Also, I suppose you have 3.5 cluster level
I do.

> Is it a proper deploy from engine on a pristine Fedora 20? VDSM logs since
> the VM creation till hibernate attempt may show something
Don't have the time of creation :(
engine is built from master (make clean install-dev), on fc19.
vdsm is installed from engine on fc20.

Comment 12 Vered Volansky 2014-08-11 15:54:30 UTC
Created attachment 925824 [details]
edit vm dialog  screenshot

Comment 13 Vered Volansky 2014-08-11 16:02:06 UTC
Created attachment 925836 [details]
engine_spice_18:59

Click on spice last on vm_rhel7 Aug 11, 18:59

Comment 14 Vered Volansky 2014-08-11 16:32:22 UTC
Created attachment 925837 [details]
engine.log

This is after another try with the same vms.

Comment 15 Vered Volansky 2014-08-11 16:35:08 UTC
Created attachment 925838 [details]
vdsm.log

This is after another try with the same vms.

Comment 16 Vered Volansky 2014-08-11 16:52:11 UTC
Eureka!
Thanks to rotation, will add vm creation logs tomorrow.

BTW, just tried to reproduce with the same VMs I did just this morning - no success. I think due to bz1128807, but this could be just a phantom bug :(

Comment 17 Michal Skrivanek 2014-08-12 06:34:36 UTC
(In reply to Vered Volansky from comment #16)
> Eureka!
> Thanks to rotation, will add vm creation logs tomorrow.
> 
> BTW, just tried to reproduce with the same VMs I did just this morning - no
> success. I think due to bz1128807, but this could be just a phantom bug :(

hm, you hit that issue in the logs in comment #14, then the engine misbehavior is explained, but the original error in vdsm you described…that must be something else.
I wouldn't duplicate this bug yet, it vdsm shouldn't be affected by the engine NPE if you can please try to get the full vdsm log with the exception

Comment 18 Vered Volansky 2014-08-12 08:29:01 UTC
Created attachment 925969 [details]
vm creation

Comment 19 Vered Volansky 2014-08-12 08:29:59 UTC
Created attachment 925970 [details]
vm creation

Comment 20 Vered Volansky 2014-08-12 08:32:28 UTC
Vms are named vm2,vm3,vm4,vm5,vm6

Comment 21 Vered Volansky 2014-08-12 09:01:55 UTC
Created attachment 925973 [details]
vdsm log around first-time bug occurance

Comment 22 Vered Volansky 2014-08-12 09:02:38 UTC
Created attachment 925974 [details]
vdsm log around first-time bug occurance

Comment 23 Vered Volansky 2014-08-12 09:03:11 UTC
Created attachment 925975 [details]
vdsm log around first-time bug occurance

Comment 24 Vinzenz Feenstra [evilissimo] 2014-08-14 10:37:16 UTC
@vered could you try to reproduce it with the current vdsm master?

We have just merged the patch http://gerrit.ovirt.org/#/c/26142/ 

Please note that I haven't annotated the patch since we were not sure yet if this is really the reason for your issues and it is a more generic issue anyway.

If this is fixing your issue we'll backport the patch and use this bug as a reference for the backport.

Thanks.

Comment 25 Vinzenz Feenstra [evilissimo] 2014-08-14 10:37:57 UTC
Please note that you'd have to build VDSM from git right now, there's no nightly build for this as of this moment.

Comment 26 Vinzenz Feenstra [evilissimo] 2014-08-29 09:42:56 UTC
@vered: Any update?

Comment 27 Vered Volansky 2014-09-02 05:39:38 UTC
Sorry, was on PTO and this got away from me.
Unfortunately no development environment for vdsm right now. Obviously I can work on that, but is there a point for me to do it considering I did manage to suspend the same vm later on?

Comment 28 Michal Skrivanek 2014-09-02 12:57:09 UTC
good then:) seems it works!