Bug 866388 - libvirt: no event is sent to vdsm in case vm is terminated on signal 15 after hibernate failure
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.3
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Michal Privoznik
QA Contact: Virtualization Bugs
Keywords: ZStream
Depends On:
Blocks: 874235
Reported: 2012-10-15 05:15 EDT by Dafna Ron
Modified: 2013-02-21 02:09 EST
Fixed In Version: libvirt-0.10.2-8.el6
Doc Type: Bug Fix
Doc Text:
Certain operations in libvirt can be done only when a domain is paused to prevent data corruption. However, if a resuming operation failed, the management application was not notified since no event was sent. This update introduces the VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR event and management applications can now keep closer track of domain states and act accordingly.
Last Closed: 2013-02-21 02:09:59 EST
Type: Bug


Attachments:
logs (1.17 MB, application/x-gzip), 2012-10-15 05:15 EDT, Dafna Ron

Description Dafna Ron 2012-10-15 05:15:16 EDT
Created attachment 627264 [details]
logs

Description of problem:

I hibernated a VM; the hibernate failed with ENOSPC and the VM was terminated with signal 15.
Although the VM is gone from both libvirt and qemu, vdsm still reports it as present.
The vdsm log shows an error on hibernation, but I do not see the terminate event in it.

Version-Release number of selected component (if applicable):

libvirt-0.9.10-21.el6_3.5.x86_64
vdsm-4.9.6-37.0.el6_3.x86_64
qemu-img-rhev-0.12.1.2-2.295.el6_3.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create a small storage domain, run a VM, then hibernate it.

Actual results:

The VM fails to hibernate and is killed in qemu and libvirt, but vdsm keeps its pid because no event is sent to vdsm.
The VM is shown to the user as non-responsive although it is no longer running.

Expected results:

vdsm should be notified of the termination (signal 15) as well.

Additional info: logs

[root@gold-vdsc kill]# vdsClient -s 0 list table
d730e33d-e191-4f84-aef1-bb0d961ddb7f   4188  RHEL6-01             Paused*                                  
[root@gold-vdsc kill]# virsh -r list
 Id    Name                           State
----------------------------------------------------

[root@gold-vdsc kill]# ps -elf |grep qemu
0 S root     31923 24215  0  80   0 - 25811 pipe_w 11:09 pts/2    00:00:00 grep qemu
[root@gold-vdsc kill]# 
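
The mismatch shown above (vdsm lists RHEL6-01 as Paused* while `virsh -r list` is empty) can be expressed as a small consistency check. This is only a sketch: `find_stale_vms` is a hypothetical helper, not part of vdsm or libvirt, and the two inputs are typed in by hand from the command output rather than queried from either service.

```python
def find_stale_vms(vdsm_vms, libvirt_active_names):
    """Return names vdsm still tracks that libvirt no longer reports as active.

    vdsm_vms: mapping of VM name -> status string as shown by vdsm.
    libvirt_active_names: iterable of domain names libvirt lists as running.
    """
    active = set(libvirt_active_names)
    return sorted(name for name in vdsm_vms if name not in active)


# Values transcribed from the output above (hypothetical hard-coded input).
vdsm_view = {"RHEL6-01": "Paused*"}
libvirt_view = []  # `virsh -r list` printed no domains

stale = find_stale_vms(vdsm_view, libvirt_view)
print(stale)
```

A non-empty result is the symptom reported here: a VM that the management layer still shows while the hypervisor no longer runs it.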


vdsm logs an error on migrate but nothing about the VM termination signal:

Thread-8343::ERROR::2012-10-14 17:55:25,867::vm::179::vm.Vm::(_recover) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Bad volume specification {'device': 'disk', 'imageID': '895afdf7-0f5f-4686-bb4a-32cb71f01be0', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': '05e9f625-c17d-406c-a9eb-3178b7d725e4', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
Thread-8343::ERROR::2012-10-14 17:55:25,867::vm::830::vm.Vm::(cont) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::cannot cont while Saving State
Thread-8343::ERROR::2012-10-14 17:55:25,875::vm::243::vm.Vm::(run) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 227, in run
    self._finishSuccessfully()
  File "/usr/share/vdsm/vm.py", line 201, in _finishSuccessfully
    fname = self._vm.cif.prepareVolumePath(self._dstparams)
  File "/usr/share/vdsm/clientIF.py", line 192, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'device': 'disk', 'imageID': '895afdf7-0f5f-4686-bb4a-32cb71f01be0', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': '05e9f625-c17d-406c-a9eb-3178b7d725e4', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
Comment 1 Osier Yang 2012-10-16 05:55:44 EDT
(In reply to comment #0)

libvirt should already support this event (I/O error from qemu). Can you
run "/usr/share/doc/libvirt-python-0.9.10/events-python/event-test.py" to see whether any event is captured?
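
For readers without the example script at hand, a minimal lifecycle listener along the same lines might look like the sketch below. It is not the shipped event-test.py. The `describe`, `lifecycle_cb` and `listen` names and the `qemu:///system` URI are assumptions made for illustration; the libvirt calls used (`virEventRegisterDefaultImpl`, `openReadOnly`, `domainEventRegisterAny`, `virEventRunDefaultImpl`) are from the libvirt Python bindings, which are imported only inside `listen()` so the formatting helper can be used without them.

```python
# Lifecycle event names keyed by libvirt's numeric event type.
EVENT_NAMES = {
    0: "Defined", 1: "Undefined", 2: "Started",
    3: "Suspended", 4: "Resumed", 5: "Stopped",
}


def describe(name, dom_id, event, detail):
    """Format one lifecycle event as a single log line."""
    label = EVENT_NAMES.get(event, "Unknown(%d)" % event)
    return "EVENT: Domain %s(%s) %s detail=%d" % (name, dom_id, label, detail)


def lifecycle_cb(conn, dom, event, detail, opaque):
    """Callback signature used by libvirt for lifecycle events."""
    print(describe(dom.name(), dom.ID(), event, detail))


def listen(uri="qemu:///system"):
    """Block forever, printing every lifecycle event for all domains."""
    import libvirt  # requires the libvirt Python bindings on the host

    libvirt.virEventRegisterDefaultImpl()  # must precede opening the connection
    conn = libvirt.openReadOnly(uri)
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, lifecycle_cb, None)
    while True:
        libvirt.virEventRunDefaultImpl()
```

Calling `listen()` on the host and then repeating the failed hibernate would show whether libvirt emits anything for the affected domain.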
Comment 2 Dafna Ron 2012-10-18 05:59:29 EDT
I suspended several VMs.
The only one that had a problem was the one with low disk space, and it seems that the failure happens during the suspend itself (since the other VMs are taking resources as well).


although the vm is reported as non-responsive: 

[root@gold-vdsc ~]# vdsClient -s 0 list table
d730e33d-e191-4f84-aef1-bb0d961ddb7f  21811  RHEL6-01             Paused*                                  
[root@gold-vdsc ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------

[root@gold-vdsc ~]# 

No unusual event is reported when running event-test.py:

myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Started Booted
myDomainEventCallback2 EVENT: Domain RHEL6-01(107) Started Booted
myDomainEventCallback1 EVENT: Domain Windows7-09(108) Started Booted
myDomainEventCallback2 EVENT: Domain Windows7-09(108) Started Booted
myDomainEventCallback1 EVENT: Domain Windows7-08(103) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-08(103) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-07(104) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-07(104) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-09(108) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-09(108) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-09(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-09(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-07(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-07(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-08(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-08(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-06(105) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-06(105) Suspended Paused
myDomainEventCallback1 EVENT: Domain Win7(106) Suspended Paused
myDomainEventCallback2 EVENT: Domain Win7(106) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-06(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-06(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Suspended Paused
myDomainEventCallback2 EVENT: Domain RHEL6-01(107) Suspended Paused
myDomainEventCallback1 EVENT: Domain Win7(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Win7(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain RHEL6-01(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain RHEL6-01(-1) Stopped Saved
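
To read a log like the one above, it can help to reduce it to the last event seen per domain. The sketch below assumes the line layout shown in this comment ("EVENT: Domain NAME(ID) EVENT DETAIL"); `last_events` and the regular expression are hypothetical helpers written for this thread, and `sample` copies three RHEL6-01 lines from the output.

```python
import re

# Matches e.g. "EVENT: Domain RHEL6-01(-1) Stopped Saved"
EVENT_RE = re.compile(r"EVENT: Domain (\S+)\((-?\d+)\) (\w+) (\w+)")


def last_events(lines):
    """Map each domain name to the (event, detail) pair seen last."""
    last = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            name, _dom_id, event, detail = match.groups()
            last[name] = (event, detail)
    return last


sample = [
    "myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Started Booted",
    "myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Suspended Paused",
    "myDomainEventCallback1 EVENT: Domain RHEL6-01(-1) Stopped Saved",
]
print(last_events(sample))
```

For RHEL6-01 the last event in this log is Stopped/Saved, while vdsm still lists the VM as Paused*.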



vdsm log: 

Thread-69586::ERROR::2012-10-18 11:42:07,750::vm::243::vm.Vm::(run) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 227, in run
    self._finishSuccessfully()
  File "/usr/share/vdsm/vm.py", line 201, in _finishSuccessfully
    fname = self._vm.cif.prepareVolumePath(self._dstparams)
  File "/usr/share/vdsm/clientIF.py", line 192, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'device': 'disk', 'imageID': '3d2d86b2-dd7f-4170-b100-97f7608e1836', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': 'd09cb73e-f751-482d-ba5d-79f32b8727f8', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
Dummy-67092::DEBUG::2012-10-18 11:42:07,840::__init__::1164::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/11d18980-5c97-40ca-b7ff-6d1fa0f01cc8/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)
Comment 3 Osier Yang 2012-10-18 10:02:05 EDT
(In reply to comment #2)
> I suspended several vm's
> the only one that had a problem was the one with low disk space and it seems
> that it happens during the suspen itself (since the other vm's are taking
> resources as well). 
> 

I should have asked: how did you suspend the VM? From inside the guest, or
with a vdsClient command?
Comment 4 Dafna Ron 2012-10-18 10:21:31 EDT
Use RHEV-M if possible.
If not, suspend from vdsClient, not from the guest.
Comment 6 zhe peng 2012-10-26 06:56:29 EDT
Hi Dafna,

Since we can't reproduce this in our environment, could you help verify it once the bug is fixed? Thanks in advance. The build is the same as in https://bugzilla.redhat.com/show_bug.cgi?id=866369#c5
Comment 7 Michal Privoznik 2012-10-29 10:16:48 EDT
Dafna,

Since this bug is closely related to bug 866369, the logs requested there would shed more light here as well, so I am not requesting them here again. Just for the record, I am setting a needinfo flag. Feel free to remove it once you attach the logs to the referenced bug.
Comment 8 Dafna Ron 2012-10-31 11:41:17 EDT
tested and commented in bug 866369
Comment 9 Michal Privoznik 2012-11-01 12:07:52 EDT
Dafna,

I've got a strong feeling that this is a dup of bug 866369. Can you please try to reproduce with the scratch build I've created and see whether this bug still reproduces? If it doesn't, then it is a dup.
Comment 10 Dafna Ron 2012-11-01 13:03:41 EDT
Since bug 866396 does not reproduce with the patch, no event is supposed to be sent.
It's not that the patch fixed this bug; it's that it is no longer reproducible.
Comment 12 Jay Turner 2012-11-07 13:39:11 EST
qa_ack this so we can get it into the build.
Comment 16 Dafna Ron 2012-11-15 11:10:47 EST
This build required RHEL 7 packages, and I cannot verify with RHEL 7.
Verified on libvirt-0.9.10-21.el6_3.6 only.
Comment 20 Dafna Ron 2012-11-22 10:37:43 EST
verified on libvirt-0.10.2-8.el6
Comment 21 errata-xmlrpc 2013-02-21 02:09:59 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html
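
The Doc Text for this bug says the fix introduces the VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR event. A management application could single that case out in its lifecycle handler roughly as sketched below. This is an illustration under assumptions, not vdsm's actual handling: `classify` and its return labels are hypothetical, and the numeric constants are written out only so the snippet runs without libvirt installed. They mirror the libvirt enum values as I understand them, and real code should take them from the `libvirt` module instead.

```python
# Assumed to mirror libvirt's enums; prefer libvirt.VIR_DOMAIN_EVENT_* in real code.
VIR_DOMAIN_EVENT_SUSPENDED = 3
VIR_DOMAIN_EVENT_STOPPED = 5
VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR = 6  # detail code for SUSPENDED events


def classify(event, detail):
    """Turn a lifecycle (event, detail) pair into a coarse label (hypothetical)."""
    if event == VIR_DOMAIN_EVENT_SUSPENDED:
        if detail == VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR:
            # The domain stayed paused because a libvirt API call failed,
            # for example a resume after an unsuccessful save.
            return "paused-after-api-error"
        return "paused"
    if event == VIR_DOMAIN_EVENT_STOPPED:
        return "stopped"
    return "other"


print(classify(VIR_DOMAIN_EVENT_SUSPENDED, VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR))
```

Keeping the API-error case separate from an ordinary pause lets the application decide whether to retry the resume or report the VM as failed.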
