Bug 700048 - VDSM (w/ libvirt): when sending setVmTicket right after VM destroy libvirt returns empty reply which results in vdsm error
Summary: VDSM (w/ libvirt): when sending setVmTicket right after VM destroy libvirt re...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.1
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Eduardo Warszawski
QA Contact: Dafna Ron
URL:
Whiteboard:
Depends On:
Blocks: 705839
TreeView+ depends on / blocked
 
Reported: 2011-04-27 10:56 UTC by Dafna Ron
Modified: 2011-12-06 07:15 UTC (History)
11 users (show)

Fixed In Version: vdsm-4.9-65.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 705839 (view as bug list)
Environment:
Last Closed: 2011-12-06 07:15:14 UTC
Target Upstream Version:


Attachments (Terms of Use)
logs (11.17 KB, application/x-gzip)
2011-05-02 11:14 UTC, Dafna Ron
no flags Details
reproducer.py (975 bytes, text/x-python)
2011-05-03 09:11 UTC, Michal Privoznik
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2011:1782 0 normal SHIPPED_LIVE new packages: vdsm 2011-12-06 11:55:51 UTC

Description Dafna Ron 2011-04-27 10:56:43 UTC
Description of problem:

Right after sending Destroy for a VM and opening a monitor window in rhevm (setVmTicket) the libvirt sends empty reply which causes exception in vdsm:

Thread-326::ERROR::2011-04-27 12:06:25,981::clientIF::51::vds::(wrapper) Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 47, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/clientIF.py", line 426, in setVmTicket
    return vm.setTicket(otp, seconds, connAct)
  File "/usr/share/vdsm/libvirtvm.py", line 934, in setTicket
    graphics = xml.dom.minidom.parseString(drone) \
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 942, in parseString
    return builder.parseString(string)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
TypeError: Parse() argument 1 must be string or read-only buffer, not None



Version-Release number of selected component (if applicable):
ic114
vdsm-debug-plugin-4.9-59.el6.x86_64
vdsm-4.9-59.el6.x86_64
vdsm-cli-4.9-59.el6.x86_64
vdsm-debuginfo-4.9-59.el6.x86_64
qemu-kvm-0.12.1.2-2.152.el6.x86_64
qemu-img-0.12.1.2-2.152.el6.x86_64
gpxe-roms-qemu-0.9.7-6.7.el6.noarch
libvirt-debuginfo-0.8.7-17.el6.x86_64
libvirt-devel-0.8.7-17.el6.x86_64
libvirt-python-0.8.7-17.el6.x86_64
libvirt-client-0.8.7-17.el6.x86_64
libvirt-0.8.7-17.el6.x86_64


How reproducible:

100%

Steps to Reproduce:
1. start a VM -> send destroy 
2. open monitor (setVmTicket)
3.
  
Actual results:

libvirt will send blank reply and you will get error in vdsm and in backend

Expected results:

we should either get an error or the operation should succeed. 
in the past, you could open a monitor while VM was shutting down. 

Additional info:vdsm and libvirt logs

Comment 5 Michal Privoznik 2011-05-02 10:50:50 UTC
Dafna,

i guess the problem arises when you call self._dom.XMLDesc(0) which returns empty reply, right? So can you please attach libvirt & vdsm logs? And are you able to reproduce this with bare libvirt? I tried this:

virsh -c qemu:///system "destroy test1; dumpxml test1"

(assume test1 was running) and it works well.

Comment 6 Dafna Ron 2011-05-02 11:14:11 UTC
(In reply to comment #5)
> Dafna,
> 
> i guess the problem arises when you call self._dom.XMLDesc(0) which returns
> empty reply, right? So can you please attach libvirt & vdsm logs? 

yes, the error happens when call self._dom.XMLDesc 
I'll attach logs for vdsm and libvirt

And are you
> able to reproduce this with bare libvirt? I tried this:
> 
> virsh -c qemu:///system "destroy test1; dumpxml test1"
> 
> (assume test1 was running) and it works well.

I cannot really reproduce this with bare libvirt - this operation is directly connected to the vdsm and rhevm.

Comment 7 Dafna Ron 2011-05-02 11:14:54 UTC
Created attachment 496221 [details]
logs

Comment 8 Michal Privoznik 2011-05-02 12:59:47 UTC
Dafna,

those VMs created are persistent or transient one? Because from libvirt logs I can see how VDSM destroy domain in one thread and right after another thread tries to get XML for the domain being destroyed. But it does not succeed, because the domain could not be found. And this happens if the domain is transient = destroying it will also remove its definition which results in failures of any further calls related to it.

And if I read the VDSM code right, you obtain domain pointer once and forward it then to all subsequent calls. Combined with written above, you'll get an error.

Whether domain is persistent or not you can find out by running virsh dominfo <domain>;

Comment 11 Dan Kenigsberg 2011-05-02 14:22:16 UTC
vdsm uses only transient VMs.

In case of an error, libvirt-python should raise an exception, not return None as the domxml.

Comment 12 Michal Privoznik 2011-05-02 17:32:04 UTC
I agree, but in the code is check to aviod returning None in favor of raising exception. And if i try to destroy domain and get its XML right after, I obtain exception.

Is there a possibility to reproduce this on a running system?

Comment 13 Dan Kenigsberg 2011-05-03 07:51:54 UTC
I suspect it's a race that can be manifested only when destroy and dumpxml are called from different threads.

14:03:25.048: 21723: debug : virDomainDestroy:2251 : dom=0x7f0830035210, (VM: name=VM1, uuid=a585336d-308b-4d87-a3f2-a6d1140f0dc9), 
14:03:25.057: 21724: debug : virDomainGetXMLDesc:3256 : dom=0x7f0830035210, (VM: name=VM1, uuid=a585336d-308b-4d87-a3f2-a6d1140f0dc9), flags=0
14:03:25.289: 21723: debug : remoteIO:10506 : All done with our call 12 0x7f08280208a0 0x7f082c075650

Comment 14 Michal Privoznik 2011-05-03 09:11:04 UTC
Created attachment 496462 [details]
reproducer.py

Well, I've tried to create a reproducer with threads and didn't succeed.
It shouldn't be race condition, because libvirt locks drivers (=parts of software talking to hypervisor). And we can see from log the domain was really not found, which is right and expected.

What I wonder about is - virDomainGetXMLDesc returns NULL, which get translated into python None. But the test right after, which raises an exeption in case of returning None, somehow fails.

I ran attached 'reproducer' and got this:

10:54:04.076: 3405: debug : virDomainDestroy:2251 : dom=0x168a560, (VM: name=test1, uuid=60f7975a-c6c7-0d2b-5344-867986dddef2),
10:54:04.076: 3406: debug : virDomainGetXMLDesc:3256 : dom=0x168a560, (VM: name=test1, uuid=60f7975a-c6c7-0d2b-5344-867986dddef2), flags=0
10:54:04.076: 3405: debug : remoteIO:10406 : Do proc=12 serial=4 length=60 wait=(nil)
10:54:04.076: 3405: debug : remoteIO:10478 : We have the buck 12 0x7f3e54001300 0x7f3e54001300
10:54:04.076: 3406: debug : remoteIO:10406 : Do proc=14 serial=5 length=64 wait=0x7f3e54001300
10:54:04.076: 3406: debug : remoteIO:10435 : Going to sleep 14 0x7f3e54001300 0x7f3e4c001140
10:54:04.076: 3405: debug : remoteIOEventLoop:10266 : Woken up from poll by other thread
10:54:04.210: 3405: debug : remoteIODecodeMessageLength:9802 : Got length, now need 56 total (52 more)
10:54:04.210: 3405: debug : remoteIOEventLoop:10332 : Giving up the buck 12 0x7f3e54001300 0x7f3e4c001140
10:54:04.210: 3405: debug : remoteIOEventLoop:10336 : Passing the buck to 14 0x7f3e4c001140
10:54:04.210: 3405: debug : remoteIO:10506 : All done with our call 12 0x7f3e4c001140 0x7f3e54001300
10:54:04.210: 3406: debug : remoteIO:10454 : Wokeup from sleep 14 0x7f3e4c001140 0x7f3e4c001140
10:54:04.210: 3406: debug : remoteIO:10478 : We have the buck 14 0x7f3e4c001140 0x7f3e4c001140
10:54:04.210: 3406: debug : remoteIODecodeMessageLength:9802 : Got length, now need 260 total (256 more)
10:54:04.210: 3406: debug : remoteIOEventLoop:10332 : Giving up the buck 14 0x7f3e4c001140 (nil)
10:54:04.210: 3406: debug : remoteIO:10506 : All done with our call 14 (nil) 0x7f3e4c001140
libvir: QEMU error : Domain not found: no domain with matching uuid '60f7975a-c6c7-0d2b-5344-867986dddef2'
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "reproducer.py", line 23, in run
    print dom.XMLDesc(0)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 239, in XMLDesc
    if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '60f7975a-c6c7-0d2b-5344-867986dddef2'


Can you please try running this and confirm it does or does not return None?

Comment 15 Dan Kenigsberg 2011-05-06 20:14:26 UTC
I'm afraid your analysis is correct. vdsm is probably the culprit that have swallowed the exception.

Eduardo, I think this is a dup of bug 695244.

Comment 17 Dafna Ron 2011-05-17 07:45:00 UTC
we are noe seeing an error in vdsm:

Thread-278::ERROR::2011-05-17 10:33:31,457::clientIF::52::vds::(wrapper) Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 48, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/clientIF.py", line 426, in setVmTicket
    return vm.setTicket(otp, seconds, connAct)
  File "/usr/share/vdsm/libvirtvm.py", line 1059, in setTicket
    graphics = xml.dom.minidom.parseString(self._dom.XMLDesc(0)) \
  File "/usr/share/vdsm/libvirtvm.py", line 306, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 59, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 239, in XMLDesc
    if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'b06e6840-38da-41da-883c-90dba8aeb757'


but there is no error code which means that the backend presents "unexpected exception code:16"

2011-05-17 10:34:52,685 ERROR [com.redhat.rhevm.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-2) Error code unexpected and error message VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error = Unexpected exception
2011-05-17 10:34:52,686 ERROR [com.redhat.rhevm.vdsbroker.VDSCommandBase] (http-0.0.0.0-8443-2) Command SetVmTicketVDS execution failed. Exception: VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error = Unexpected exception

please add error code so that backend can display proper error.

Comment 18 Ayal Baron 2011-05-18 11:23:52 UTC
> 2011-05-17 10:34:52,685 ERROR
> [com.redhat.rhevm.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-2)
> Error code unexpected and error message VDSGenericException: VDSErrorException:
> Failed to SetVmTicketVDS, error = Unexpected exception
> 2011-05-17 10:34:52,686 ERROR [com.redhat.rhevm.vdsbroker.VDSCommandBase]
> (http-0.0.0.0-8443-2) Command SetVmTicketVDS execution failed. Exception:
> VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error =
> Unexpected exception
> 
> please add error code so that backend can display proper error.

This has nothing to do with the original bug (which was a regression etc), this is a minor issue about propagating a different error.
Moving this bug to VERIFIED according to comment #17.
If you wish, please open a different bug about throwing a more specific error.

Comment 19 errata-xmlrpc 2011-12-06 07:15:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html


Note You need to log in before you can comment on or make changes to this bug.