Bug 866388 - libvirt: no event is sent to vdsm in case vm is terminated on signal 15 after hibernate failure
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Michal Privoznik
QA Contact: Virtualization Bugs
Blocks: 874235
Reported: 2012-10-15 09:15 UTC by Dafna Ron
Modified: 2013-02-21 07:09 UTC (History)

Fixed In Version: libvirt-0.10.2-8.el6
Doc Type: Bug Fix
Doc Text:
Certain operations in libvirt can be done only when a domain is paused to prevent data corruption. However, if a resuming operation failed, the management application was not notified since no event was sent. This update introduces the VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR event and management applications can now keep closer track of domain states and act accordingly.
Last Closed: 2013-02-21 07:09:59 UTC
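
The Doc Text above says the fix adds VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR so that management applications can track domain states more closely. The following is a minimal sketch of how a client might tell that event apart, using the lifecycle callback of libvirt-python. It is not code from vdsm or from the actual patch. The names describe_lifecycle, lifecycle_cb and watch are hypothetical, and the integer fallbacks are an assumption used only when the libvirt module cannot be imported.

```python
try:
    import libvirt  # libvirt-python bindings
except ImportError:
    libvirt = None  # lets the pure-Python part run without libvirt installed

# Prefer the module's own constants; the numeric fallbacks are ASSUMED to
# match libvirt's enums and exist only for illustration.
EVENT_SUSPENDED = getattr(libvirt, "VIR_DOMAIN_EVENT_SUSPENDED", 3)
EVENT_STOPPED = getattr(libvirt, "VIR_DOMAIN_EVENT_STOPPED", 5)
SUSPENDED_API_ERROR = getattr(libvirt, "VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR", 6)


def describe_lifecycle(event, detail):
    """Map a lifecycle (event, detail) pair to a coarse state (hypothetical helper)."""
    if event == EVENT_SUSPENDED and detail == SUSPENDED_API_ERROR:
        # The domain was left paused because a libvirt API call failed midway.
        return "paused-after-failed-api-call"
    if event == EVENT_SUSPENDED:
        return "paused"
    if event == EVENT_STOPPED:
        return "stopped"
    return "other"


def lifecycle_cb(conn, dom, event, detail, opaque):
    """Callback in the shape libvirt-python uses for lifecycle events."""
    print("%s: %s" % (dom.name(), describe_lifecycle(event, detail)))


def watch(uri="qemu:///system"):
    """Register the callback and run the default event loop (never returns)."""
    if libvirt is None:
        raise RuntimeError("libvirt-python is not installed")
    libvirt.virEventRegisterDefaultImpl()
    conn = libvirt.openReadOnly(uri)
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE, lifecycle_cb, None)
    while True:
        libvirt.virEventRunDefaultImpl()
```

Calling watch() on a host with libvirtd running would print one line per lifecycle event. A real management application would update its own VM state instead of printing.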


Attachments
logs (1.17 MB, application/x-gzip), 2012-10-15 09:15 UTC, Dafna Ron


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2013:0276 normal SHIPPED_LIVE Moderate: libvirt security, bug fix, and enhancement update 2013-02-20 21:18:26 UTC

Description Dafna Ron 2012-10-15 09:15:16 UTC
Created attachment 627264
logs

Description of problem:

I hibernated a vm; the hibernate failed on ENOSPC (no space left on device) and the vm was terminated on signal 15.
Although the vm no longer exists in libvirt and there is no qemu process anymore, the vdsm process for it is still up.
Looking at the vdsm log, there was an error on hibernation, but I do not see the terminate event in the vdsm log.

Version-Release number of selected component (if applicable):

libvirt-0.9.10-21.el6_3.5.x86_64
vdsm-4.9.6-37.0.el6_3.x86_64
qemu-img-rhev-0.12.1.2-2.295.el6_3.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Create a small storage domain, run a vm on it, then hibernate the vm.
  
Actual results:

The vm fails on hibernate and is killed in qemu and libvirt, but the vdsm pid remains since no event is sent to vdsm.
The vm is shown as non-responsive to the user although it is no longer running.

Expected results:

An event for the termination signal should be sent to vdsm as well.

Additional info: logs

[root@gold-vdsc kill]# vdsClient -s 0 list table
d730e33d-e191-4f84-aef1-bb0d961ddb7f   4188  RHEL6-01             Paused*                                  
[root@gold-vdsc kill]# virsh -r list
 Id    Name                           State
----------------------------------------------------

[root@gold-vdsc kill]# ps -elf |grep qemu
0 S root     31923 24215  0  80   0 - 25811 pipe_w 11:09 pts/2    00:00:00 grep qemu
[root@gold-vdsc kill]# 
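
The mismatch shown above (vdsm still lists the vm as Paused* while virsh lists nothing) can be checked mechanically. The following is only a rough sketch that compares the text output of the two commands. The column layout of `vdsClient -s 0 list table` (id, pid, name, status) is assumed from the sample in this report, the helper names are hypothetical, and vm names containing spaces are not handled.

```python
def vdsm_vm_names(table_text):
    """Names from `vdsClient -s 0 list table` output.

    ASSUMPTION: each row is "<uuid> <pid> <name> <status>", as in this report.
    """
    names = set()
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            names.add(fields[2])
    return names


def libvirt_vm_names(virsh_text):
    """Names from `virsh list` output; data rows start with a numeric id or '-'."""
    names = set()
    for line in virsh_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and (fields[0].isdigit() or fields[0] == "-"):
            names.add(fields[1])
    return names


def stale_in_vdsm(table_text, virsh_text):
    """VMs that vdsm still reports but libvirt no longer knows about."""
    return sorted(vdsm_vm_names(table_text) - libvirt_vm_names(virsh_text))


# Sample text copied from the output in this report.
vdsm_out = "d730e33d-e191-4f84-aef1-bb0d961ddb7f   4188  RHEL6-01             Paused*"
virsh_out = (
    " Id    Name                           State\n"
    "----------------------------------------------------\n"
)

print(stale_in_vdsm(vdsm_out, virsh_out))
```

With the sample above, RHEL6-01 is the only vm known to vdsm and absent from libvirt, which matches the stale state described in this bug.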


vdsm is getting an error on migrate but nothing on the vm term sig:

Thread-8343::ERROR::2012-10-14 17:55:25,867::vm::179::vm.Vm::(_recover) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Bad volume specification {'device': 'disk', 'imageID': '895afdf7-0f5f-4686-bb4a-32cb71f01be0', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': '05e9f625-c17d-406c-a9eb-3178b7d725e4', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
Thread-8343::ERROR::2012-10-14 17:55:25,867::vm::830::vm.Vm::(cont) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::cannot cont while Saving State
Thread-8343::ERROR::2012-10-14 17:55:25,875::vm::243::vm.Vm::(run) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 227, in run
    self._finishSuccessfully()
  File "/usr/share/vdsm/vm.py", line 201, in _finishSuccessfully
    fname = self._vm.cif.prepareVolumePath(self._dstparams)
  File "/usr/share/vdsm/clientIF.py", line 192, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'device': 'disk', 'imageID': '895afdf7-0f5f-4686-bb4a-32cb71f01be0', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': '05e9f625-c17d-406c-a9eb-3178b7d725e4', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}

Comment 1 Osier Yang 2012-10-16 09:55:44 UTC
(In reply to comment #0)

libvirt should already support this event (I/O error from qemu). Can you
run "/usr/share/doc/libvirt-python-0.9.10/events-python/event-test.py" to see if any event is captured?

Comment 2 Dafna Ron 2012-10-18 09:59:29 UTC
I suspended several vms.
The only one that had a problem was the one with low disk space, and it seems that it happens during the suspend itself (since the other vms are taking resources as well).


The vm is still reported as non-responsive, although libvirt no longer lists it:

[root@gold-vdsc ~]# vdsClient -s 0 list table
d730e33d-e191-4f84-aef1-bb0d961ddb7f  21811  RHEL6-01             Paused*                                  
[root@gold-vdsc ~]# virsh -r list
 Id    Name                           State
----------------------------------------------------

[root@gold-vdsc ~]# 

No unusual event is reported when running event-test.py:

myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Started Booted
myDomainEventCallback2 EVENT: Domain RHEL6-01(107) Started Booted
myDomainEventCallback1 EVENT: Domain Windows7-09(108) Started Booted
myDomainEventCallback2 EVENT: Domain Windows7-09(108) Started Booted
myDomainEventCallback1 EVENT: Domain Windows7-08(103) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-08(103) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-07(104) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-07(104) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-09(108) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-09(108) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-09(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-09(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-07(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-07(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-08(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-08(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-06(105) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-06(105) Suspended Paused
myDomainEventCallback1 EVENT: Domain Win7(106) Suspended Paused
myDomainEventCallback2 EVENT: Domain Win7(106) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-06(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-06(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Suspended Paused
myDomainEventCallback2 EVENT: Domain RHEL6-01(107) Suspended Paused
myDomainEventCallback1 EVENT: Domain Win7(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Win7(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain RHEL6-01(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain RHEL6-01(-1) Stopped Saved
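
To read a long event-test.py capture like the one above, it can help to reduce it to the last event seen per domain. The following is a small sketch that does this. The regular expression is fitted to the line format shown in this comment and is an assumption, not part of event-test.py itself, and the helper name is hypothetical.

```python
import re

# Matches lines such as:
#   myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Suspended Paused
EVENT_RE = re.compile(
    r"^myDomainEventCallback\d+ EVENT: Domain (\S+)\((-?\d+)\) (\w+) (\w+)$")


def last_event_per_domain(text):
    """Return {domain name: (event, detail)} for the last matching line per domain."""
    last = {}
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            name, _dom_id, event, detail = match.groups()
            last[name] = (event, detail)
    return last


# A few lines copied from the output above.
sample = """\
myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Started Booted
myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-09(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain RHEL6-01(-1) Stopped Saved
"""

for name, (event, detail) in sorted(last_event_per_domain(sample).items()):
    print(name, event, detail)
```

Read this way, the last event for RHEL6-01 in the capture is "Stopped Saved", the same as for the vms whose suspend succeeded, so nothing in the event stream singles out the failed one.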



vdsm log: 

Thread-69586::ERROR::2012-10-18 11:42:07,750::vm::243::vm.Vm::(run) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 227, in run
    self._finishSuccessfully()
  File "/usr/share/vdsm/vm.py", line 201, in _finishSuccessfully
    fname = self._vm.cif.prepareVolumePath(self._dstparams)
  File "/usr/share/vdsm/clientIF.py", line 192, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'device': 'disk', 'imageID': '3d2d86b2-dd7f-4170-b100-97f7608e1836', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': 'd09cb73e-f751-482d-ba5d-79f32b8727f8', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
Dummy-67092::DEBUG::2012-10-18 11:42:07,840::__init__::1164::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/11d18980-5c97-40ca-b7ff-6d1fa0f01cc8/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)

Comment 3 Osier Yang 2012-10-18 14:02:05 UTC
(In reply to comment #2)
> I suspended several vm's
> the only one that had a problem was the one with low disk space and it seems
> that it happens during the suspend itself (since the other vm's are taking
> resources as well). 
> 

I should ask: how did you suspend the vm? From inside the guest, or with a
vdsClient command?

Comment 4 Dafna Ron 2012-10-18 14:21:31 UTC
Use rhevm if possible.
If not, suspend from vdsClient, not from the guest.

Comment 6 zhe peng 2012-10-26 10:56:29 UTC
Hi Dafna :

As we can't reproduce this in our environment, could you help to verify this once the bug is fixed? Thanks in advance. The build is the same as in https://bugzilla.redhat.com/show_bug.cgi?id=866369#c5

Comment 7 Michal Privoznik 2012-10-29 14:16:48 UTC
Dafna,

Since this bug is closely related to bug 866369, the logs requested there would shed more light here as well. Hence I am not requesting them here again. But just for the record, I am setting a needinfo flag. Feel free to remove it once you attach logs to the referenced bug.

Comment 8 Dafna Ron 2012-10-31 15:41:17 UTC
tested and commented in bug 866369

Comment 9 Michal Privoznik 2012-11-01 16:07:52 UTC
Dafna,

I've got a strong feeling that this is a dup of bug 866369. Can you please try to reproduce with the scratch build I've created and see if this bug still reproduces? If it doesn't, then it is a dup.

Comment 10 Dafna Ron 2012-11-01 17:03:41 UTC
Since bug 866396 does not reproduce with the patch, no event is supposed to be sent.
It's not that the patch fixed this bug; it's that it is no longer reproducible.

Comment 12 Jay Turner 2012-11-07 18:39:11 UTC
qa_ack this so we can get it into the build.

Comment 16 Dafna Ron 2012-11-15 16:10:47 UTC
this build required rhel7 packages - I cannot verify with rhel7. 
verified on libvirt-0.9.10-21.el6_3.6 only

Comment 20 Dafna Ron 2012-11-22 15:37:43 UTC
verified on libvirt-0.10.2-8.el6

Comment 21 errata-xmlrpc 2013-02-21 07:09:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html

