Summary: | libvirt: no event is sent to vdsm in case vm is terminated on signal 15 after hibernate failure | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Dafna Ron <dron> | ||||
Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> | ||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 6.3 | CC: | acathrow, cpelland, dallan, dyasny, dyuan, hateya, jturner, mburns, mzhan, rwu, whuang, zpeng | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | libvirt-0.10.2-8.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
Certain operations in libvirt can be done only when a domain is paused to prevent data corruption. However, if a resuming operation failed, the management application was not notified since no event was sent. This update introduces the VIR_DOMAIN_EVENT_SUSPENDED_API_ERROR event and management applications can now keep closer track of domain states and act accordingly.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2013-02-21 07:09:59 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Bug Depends On: | |||||||
Bug Blocks: | 874235 | ||||||
Attachments: |
|
(In reply to comment #0)
> Created attachment 627264 [details]
> logs
>
> Description of problem:
>
> I hibernated a vm which failed on ENOSPC and the vm was terminated on signal 15.
> Although there is no libvirt and qemu anymore, the vdsm process is still up.
> Looking at the vdsm log, there was an error on hibernation but I do not see the
> terminate event in the vdsm log.
>
> Version-Release number of selected component (if applicable):
>
> libvirt-0.9.10-21.el6_3.5.x86_64
> vdsm-4.9.6-37.0.el6_3.x86_64
> qemu-img-rhev-0.12.1.2-2.295.el6_3.2.x86_64
> qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64
>
> How reproducible:
>
> 100%
>
> Steps to Reproduce:
> 1. create a small storage domain, run a vm -> hibernate
> 2.
> 3.
>
> Actual results:
>
> The vm will fail on hibernate and will be killed in qemu and libvirt, but the vdsm
> pid will remain since no event is sent to vdsm.
> The vm will be shown as non-responsive to the user although it is no longer
> running.
>
> Expected results:
>
> term sig should be sent to vdsm as well.
>
> Additional info: logs
>
> [root@gold-vdsc kill]# vdsClient -s 0 list table
> d730e33d-e191-4f84-aef1-bb0d961ddb7f 4188 RHEL6-01 Paused*
>
> [root@gold-vdsc kill]# virsh -r list
>  Id Name State
> ----------------------------------------------------
>
> [root@gold-vdsc kill]# ps -elf |grep qemu
> 0 S root 31923 24215 0 80 0 - 25811 pipe_w 11:09 pts/2 00:00:00 grep qemu
> [root@gold-vdsc kill]#
>
> vdsm is getting an error on migrate but nothing on vm term sig:
>
> Thread-8343::ERROR::2012-10-14 17:55:25,867::vm::179::vm.Vm::(_recover) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Bad volume specification {'device': 'disk', 'imageID': '895afdf7-0f5f-4686-bb4a-32cb71f01be0', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': '05e9f625-c17d-406c-a9eb-3178b7d725e4', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
> Thread-8343::ERROR::2012-10-14 17:55:25,867::vm::830::vm.Vm::(cont) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::cannot cont while Saving State

libvirt should already support the event (I/O error from qemu). Can you run "/usr/share/doc/libvirt-python-0.9.10/events-python/event-test.py" to see if any event is captured?

I suspended several VMs. The only one that had a problem was the one with low disk space, and it seems that it happens during the suspend itself (since the other VMs are taking resources as well).
The vm is reported as non-responsive:

[root@gold-vdsc ~]# vdsClient -s 0 list table
d730e33d-e191-4f84-aef1-bb0d961ddb7f 21811 RHEL6-01 Paused*

[root@gold-vdsc ~]# virsh -r list
 Id Name State
----------------------------------------------------

[root@gold-vdsc ~]#

No unusual event is reported when running event-test.py:

myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Started Booted
myDomainEventCallback2 EVENT: Domain RHEL6-01(107) Started Booted
myDomainEventCallback1 EVENT: Domain Windows7-09(108) Started Booted
myDomainEventCallback2 EVENT: Domain Windows7-09(108) Started Booted
myDomainEventCallback1 EVENT: Domain Windows7-08(103) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-08(103) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-07(104) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-07(104) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-09(108) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-09(108) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-09(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-09(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-07(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-07(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-08(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-08(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain Windows7-06(105) Suspended Paused
myDomainEventCallback2 EVENT: Domain Windows7-06(105) Suspended Paused
myDomainEventCallback1 EVENT: Domain Win7(106) Suspended Paused
myDomainEventCallback2 EVENT: Domain Win7(106) Suspended Paused
myDomainEventCallback1 EVENT: Domain Windows7-06(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Windows7-06(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Suspended Paused
myDomainEventCallback2 EVENT: Domain RHEL6-01(107) Suspended Paused
myDomainEventCallback1 EVENT: Domain Win7(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain Win7(-1) Stopped Saved
myDomainEventCallback1 EVENT: Domain RHEL6-01(-1) Stopped Saved
myDomainEventCallback2 EVENT: Domain RHEL6-01(-1) Stopped Saved

vdsm log:

Thread-69586::ERROR::2012-10-18 11:42:07,750::vm::243::vm.Vm::(run) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 227, in run
    self._finishSuccessfully()
  File "/usr/share/vdsm/vm.py", line 201, in _finishSuccessfully
    fname = self._vm.cif.prepareVolumePath(self._dstparams)
  File "/usr/share/vdsm/clientIF.py", line 192, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'device': 'disk', 'imageID': '3d2d86b2-dd7f-4170-b100-97f7608e1836', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': 'd09cb73e-f751-482d-ba5d-79f32b8727f8', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
Dummy-67092::DEBUG::2012-10-18 11:42:07,840::__init__::1164::Storage.Misc.excCmd::(_log) 'dd if=/rhev/data-center/11d18980-5c97-40ca-b7ff-6d1fa0f01cc8/mastersd/dom_md/inbox iflag=direct,fullblock count=1 bs=1024000' (cwd None)

(In reply to comment #2)
> I suspended several vm's
> the only one that had a problem was the one with low disk space and it seems
> that it happens during the suspend itself (since the other vm's are taking
> resources as well).
>

I should ask how you suspended the vm: inside the guest, or with a command using vdsClient?

Use rhevm if possible. If not, it's suspend from vdsClient, not from the guest.

Hi Dafna, as we can't reproduce this in our env, could you help to verify this once the bug is fixed? Thanks in advance.

The build is the same as in https://bugzilla.redhat.com/show_bug.cgi?id=866369#c5

Dafna, since this bug is closely related to bug 866369, the requested logs would shed more light here as well. Hence I am not requesting them here again. But just for the record, I am setting a needinfo flag. Feel free to remove it once you attach logs to the referenced bug.

Tested and commented in bug 866369.

Dafna, I've got a strong feeling that this is a dup of bug 866369. Can you please try to reproduce with the scratch build I've created and see if this bug still reproduces? If it doesn't, then it is a dup.

Since bug 866396 is not reproduced with the patch, no event is supposed to be sent. It's not that the patch fixed this bug; it's that it's no longer reproducible.

Moving to POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-November/msg00069.html

qa_ack this so we can get it into the build.

This build required rhel7 packages - I cannot verify with rhel7.

Verified on libvirt-0.9.10-21.el6_3.6 only.

Verified on libvirt-0.10.2-8.el6.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0276.html |
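The event-test.py output quoted in the comments can be checked mechanically instead of by eye. What follows is only a minimal sketch, assuming the line shape seen in this report (callback name, "EVENT: Domain", domain name, id in parentheses, event, detail). The names `EVENT_RE`, `last_lifecycle_state` and `SAMPLE` are hypothetical helpers for illustration, not part of libvirt or vdsm.

```python
import re

# Line shape copied from the event-test.py output in this report, e.g.
# "myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Started Booted".
# This pattern is an assumption based on those lines only.
EVENT_RE = re.compile(
    r"^(?P<callback>\S+) EVENT: Domain (?P<name>.+)\((?P<id>-?\d+)\) "
    r"(?P<event>\w+) (?P<detail>\w+)$"
)


def last_lifecycle_state(lines):
    """Return {domain name: (event, detail)} for the last event seen per domain."""
    state = {}
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # ignore anything that is not a lifecycle event line
        state[match.group("name")] = (match.group("event"), match.group("detail"))
    return state


# A few lines taken from the output quoted in the comments.
SAMPLE = [
    "myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Started Booted",
    "myDomainEventCallback1 EVENT: Domain Win7(106) Suspended Paused",
    "myDomainEventCallback1 EVENT: Domain RHEL6-01(107) Suspended Paused",
    "myDomainEventCallback1 EVENT: Domain RHEL6-01(-1) Stopped Saved",
]

if __name__ == "__main__":
    for name, (event, detail) in sorted(last_lifecycle_state(SAMPLE).items()):
        print(name, event, detail)
```

Run against the full capture, a helper like this makes it easy to see which lifecycle event libvirt reported last for each domain and to compare that with the state vdsm shows.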
Created attachment 627264 [details]
logs

Description of problem:

I hibernated a vm which failed on ENOSPC and the vm was terminated on signal 15.
Although there is no libvirt and qemu anymore, the vdsm process is still up.
Looking at the vdsm log, there was an error on hibernation but I do not see the terminate event in the vdsm log.

Version-Release number of selected component (if applicable):

libvirt-0.9.10-21.el6_3.5.x86_64
vdsm-4.9.6-37.0.el6_3.x86_64
qemu-img-rhev-0.12.1.2-2.295.el6_3.2.x86_64
qemu-kvm-rhev-0.12.1.2-2.295.el6_3.2.x86_64

How reproducible:

100%

Steps to Reproduce:
1. create a small storage domain, run a vm -> hibernate
2.
3.

Actual results:

The vm will fail on hibernate and will be killed in qemu and libvirt, but the vdsm pid will remain since no event is sent to vdsm.
The vm will be shown as non-responsive to the user although it is no longer running.

Expected results:

term sig should be sent to vdsm as well.

Additional info: logs

[root@gold-vdsc kill]# vdsClient -s 0 list table
d730e33d-e191-4f84-aef1-bb0d961ddb7f 4188 RHEL6-01 Paused*

[root@gold-vdsc kill]# virsh -r list
 Id Name State
----------------------------------------------------

[root@gold-vdsc kill]# ps -elf |grep qemu
0 S root 31923 24215 0 80 0 - 25811 pipe_w 11:09 pts/2 00:00:00 grep qemu
[root@gold-vdsc kill]#

vdsm is getting an error on migrate but nothing on vm term sig:

Thread-8343::ERROR::2012-10-14 17:55:25,867::vm::179::vm.Vm::(_recover) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Bad volume specification {'device': 'disk', 'imageID': '895afdf7-0f5f-4686-bb4a-32cb71f01be0', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': '05e9f625-c17d-406c-a9eb-3178b7d725e4', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
Thread-8343::ERROR::2012-10-14 17:55:25,867::vm::830::vm.Vm::(cont) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::cannot cont while Saving State
Thread-8343::ERROR::2012-10-14 17:55:25,875::vm::243::vm.Vm::(run) vmId=`d730e33d-e191-4f84-aef1-bb0d961ddb7f`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 227, in run
    self._finishSuccessfully()
  File "/usr/share/vdsm/vm.py", line 201, in _finishSuccessfully
    fname = self._vm.cif.prepareVolumePath(self._dstparams)
  File "/usr/share/vdsm/clientIF.py", line 192, in prepareVolumePath
    raise vm.VolumeError(drive)
VolumeError: Bad volume specification {'device': 'disk', 'imageID': '895afdf7-0f5f-4686-bb4a-32cb71f01be0', 'domainID': 'ac6943fb-d267-45c3-a5af-128a6e761a2e', 'volumeID': '05e9f625-c17d-406c-a9eb-3178b7d725e4', 'poolID': '11d18980-5c97-40ca-b7ff-6d1fa0f01cc8'}
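
The mismatch shown in the logs (vdsm still lists the vm while `virsh -r list` shows nothing) could be detected by comparing the two command outputs. The following is a rough sketch under stated assumptions: the `vdsClient -s 0 list table` columns are taken to be vm id, pid, name and status, which is inferred from the single row in this report rather than from vdsm documentation, and the function names are hypothetical.

```python
def vdsm_vm_names(vds_output):
    """Names from 'vdsClient -s 0 list table' output.

    Assumed columns (inferred from this report): vm id, pid, name, status.
    """
    names = []
    for line in vds_output.splitlines():
        parts = line.split()
        if len(parts) == 4:
            names.append(parts[2])
    return names


def libvirt_vm_names(virsh_output):
    """Names from 'virsh -r list' output, skipping the header and separator."""
    names = set()
    for line in virsh_output.splitlines():
        stripped = line.strip()
        if not stripped or set(stripped) == {"-"}:
            continue  # blank line or the dashed separator
        parts = stripped.split(None, 2)
        if len(parts) < 3 or parts[0] == "Id":
            continue  # header row or an unexpected short line
        names.add(parts[1])
    return names


def stale_vms(vds_output, virsh_output):
    """VMs that vdsm still reports but libvirt no longer lists."""
    known = libvirt_vm_names(virsh_output)
    return [name for name in vdsm_vm_names(vds_output) if name not in known]


if __name__ == "__main__":
    vds = "d730e33d-e191-4f84-aef1-bb0d961ddb7f 4188 RHEL6-01 Paused*\n"
    virsh = " Id Name State\n" + "-" * 52 + "\n\n"
    print(stale_vms(vds, virsh))
```

With the two outputs from this report as input, the vm that vdsm shows as Paused* is flagged because libvirt lists no domains at all.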