Description of problem:

Right after sending Destroy for a VM and then opening a monitor window in rhevm (setVmTicket), libvirt sends an empty reply, which causes an exception in vdsm:

Thread-326::ERROR::2011-04-27 12:06:25,981::clientIF::51::vds::(wrapper)
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 47, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/clientIF.py", line 426, in setVmTicket
    return vm.setTicket(otp, seconds, connAct)
  File "/usr/share/vdsm/libvirtvm.py", line 934, in setTicket
    graphics = xml.dom.minidom.parseString(drone) \
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 942, in parseString
    return builder.parseString(string)
  File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
TypeError: Parse() argument 1 must be string or read-only buffer, not None

Version-Release number of selected component (if applicable):
ic114
vdsm-debug-plugin-4.9-59.el6.x86_64
vdsm-4.9-59.el6.x86_64
vdsm-cli-4.9-59.el6.x86_64
vdsm-debuginfo-4.9-59.el6.x86_64
qemu-kvm-0.12.1.2-2.152.el6.x86_64
qemu-img-0.12.1.2-2.152.el6.x86_64
gpxe-roms-qemu-0.9.7-6.7.el6.noarch
libvirt-debuginfo-0.8.7-17.el6.x86_64
libvirt-devel-0.8.7-17.el6.x86_64
libvirt-python-0.8.7-17.el6.x86_64
libvirt-client-0.8.7-17.el6.x86_64
libvirt-0.8.7-17.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM, then send destroy.
2. Open a monitor (setVmTicket).

Actual results:
libvirt sends a blank reply and an error appears in vdsm and in the backend.

Expected results:
Either a proper error should be returned or the operation should succeed; in the past, it was possible to open a monitor while a VM was shutting down.

Additional info:
vdsm and libvirt logs attached.
Dafna, I guess the problem arises when you call self._dom.XMLDesc(0), which returns an empty reply, right? Can you please attach the libvirt & vdsm logs? And are you able to reproduce this with bare libvirt? I tried this:

virsh -c qemu:///system "destroy test1; dumpxml test1"

(assuming test1 was running) and it works fine.
(In reply to comment #5)
> Dafna,
>
> i guess the problem arises when you call self._dom.XMLDesc(0) which returns
> empty reply, right? So can you please attach libvirt & vdsm logs?

Yes, the error happens when we call self._dom.XMLDesc. I'll attach logs for vdsm and libvirt.

> And are you
> able to reproduce this with bare libvirt? I tried this:
>
> virsh -c qemu:///system "destroy test1; dumpxml test1"
>
> (assume test1 was running) and it works well.

I cannot really reproduce this with bare libvirt - this operation is directly connected to vdsm and rhevm.
Created attachment 496221 [details]
logs
Dafna, are the created VMs persistent or transient? From the libvirt logs I can see vdsm destroying a domain in one thread while, right after, another thread tries to get the XML of the domain being destroyed. That call does not succeed because the domain can no longer be found. This happens when the domain is transient: destroying it also removes its definition, so any further calls referring to it fail. And if I read the vdsm code correctly, you obtain the domain pointer once and then pass it to all subsequent calls; combined with the above, you get an error. Whether a domain is persistent or not can be checked by running:

virsh dominfo <domain>
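A minimal sketch of the behaviour being described, assuming a local qemu:///system connection and a hypothetical running transient domain named test1 (names are assumptions, not taken from this report):

import libvirt

conn = libvirt.open('qemu:///system')

# 'test1' is a hypothetical transient domain, e.g. one started via createXML().
dom = conn.lookupByName('test1')
print dom.isPersistent()   # 0 for a transient domain

dom.destroy()              # for a transient domain this also drops its definition

# The saved handle now points at a domain libvirt no longer knows about,
# so any subsequent call on it fails with "Domain not found".
dom.XMLDesc(0)             # raises libvirtError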
vdsm uses only transient VMs. In case of an error, libvirt-python should raise an exception, not return None as the domxml.
I agree, but the code does contain a check to avoid returning None in favor of raising an exception. And if I destroy a domain and get its XML right after, I do obtain an exception. Is there a possibility to reproduce this on a running system?
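The check being referred to is the one in the generated python bindings; judging from the tracebacks later in this report (libvirt.py line 239), it has roughly this shape (a sketch of the generated code, not the exact file contents):

def XMLDesc(self, flags):
    # Calls into the C binding; a NULL result becomes Python None...
    ret = libvirtmod.virDomainGetXMLDesc(self._o, flags)
    # ...and None is turned into an exception instead of being returned.
    if ret is None: raise libvirtError('virDomainGetXMLDesc() failed', dom=self)
    return ret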
I suspect it's a race that can manifest only when destroy and dumpxml are called from different threads.

14:03:25.048: 21723: debug : virDomainDestroy:2251 : dom=0x7f0830035210, (VM: name=VM1, uuid=a585336d-308b-4d87-a3f2-a6d1140f0dc9),
14:03:25.057: 21724: debug : virDomainGetXMLDesc:3256 : dom=0x7f0830035210, (VM: name=VM1, uuid=a585336d-308b-4d87-a3f2-a6d1140f0dc9), flags=0
14:03:25.289: 21723: debug : remoteIO:10506 : All done with our call 12 0x7f08280208a0 0x7f082c075650
Created attachment 496462 [details]
reproducer.py

Well, I've tried to create a reproducer with threads and didn't succeed. It shouldn't be a race condition, because libvirt locks drivers (= the parts of the software talking to the hypervisor). And we can see from the log that the domain was really not found, which is right and expected. What I wonder about is this: virDomainGetXMLDesc returns NULL, which gets translated into Python None, but the check right after it, which raises an exception when None is returned, somehow fails. I ran the attached reproducer and got this:

10:54:04.076: 3405: debug : virDomainDestroy:2251 : dom=0x168a560, (VM: name=test1, uuid=60f7975a-c6c7-0d2b-5344-867986dddef2),
10:54:04.076: 3406: debug : virDomainGetXMLDesc:3256 : dom=0x168a560, (VM: name=test1, uuid=60f7975a-c6c7-0d2b-5344-867986dddef2), flags=0
10:54:04.076: 3405: debug : remoteIO:10406 : Do proc=12 serial=4 length=60 wait=(nil)
10:54:04.076: 3405: debug : remoteIO:10478 : We have the buck 12 0x7f3e54001300 0x7f3e54001300
10:54:04.076: 3406: debug : remoteIO:10406 : Do proc=14 serial=5 length=64 wait=0x7f3e54001300
10:54:04.076: 3406: debug : remoteIO:10435 : Going to sleep 14 0x7f3e54001300 0x7f3e4c001140
10:54:04.076: 3405: debug : remoteIOEventLoop:10266 : Woken up from poll by other thread
10:54:04.210: 3405: debug : remoteIODecodeMessageLength:9802 : Got length, now need 56 total (52 more)
10:54:04.210: 3405: debug : remoteIOEventLoop:10332 : Giving up the buck 12 0x7f3e54001300 0x7f3e4c001140
10:54:04.210: 3405: debug : remoteIOEventLoop:10336 : Passing the buck to 14 0x7f3e4c001140
10:54:04.210: 3405: debug : remoteIO:10506 : All done with our call 12 0x7f3e4c001140 0x7f3e54001300
10:54:04.210: 3406: debug : remoteIO:10454 : Wokeup from sleep 14 0x7f3e4c001140 0x7f3e4c001140
10:54:04.210: 3406: debug : remoteIO:10478 : We have the buck 14 0x7f3e4c001140 0x7f3e4c001140
10:54:04.210: 3406: debug : remoteIODecodeMessageLength:9802 : Got length, now need 260 total (256 more)
10:54:04.210: 3406: debug : remoteIOEventLoop:10332 : Giving up the buck 14 0x7f3e4c001140 (nil)
10:54:04.210: 3406: debug : remoteIO:10506 : All done with our call 14 (nil) 0x7f3e4c001140
libvir: QEMU error : Domain not found: no domain with matching uuid '60f7975a-c6c7-0d2b-5344-867986dddef2'
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "reproducer.py", line 23, in run
    print dom.XMLDesc(0)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 239, in XMLDesc
    if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '60f7975a-c6c7-0d2b-5344-867986dddef2'

Can you please try running this and confirm whether it does or does not return None?
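The attachment itself is not reproduced here; a minimal sketch of what such a two-thread reproducer could look like, assuming a local qemu:///system connection and a running transient domain named test1 (both assumptions):

import threading
import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByName('test1')   # assumed running transient domain

class DumpXML(threading.Thread):
    def run(self):
        # Ask for the XML while the other thread is destroying the domain;
        # the binding should raise libvirtError rather than return None.
        print dom.XMLDesc(0)

dumper = DumpXML()
destroyer = threading.Thread(target=dom.destroy)

destroyer.start()
dumper.start()
destroyer.join()
dumper.join()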
I'm afraid your analysis is correct. vdsm is probably the culprit that swallowed the exception. Eduardo, I think this is a dup of bug 695244.
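A minimal sketch of how a wrapper could swallow the error and reproduce the original TypeError; the helper names below are hypothetical, not the actual vdsm code:

import xml.dom.minidom
import libvirt

def swallowing_call(dom):
    # A wrapper like this turns the libvirtError into a silent None...
    try:
        return dom.XMLDesc(0)
    except libvirt.libvirtError:
        return None

def set_ticket(dom):
    domxml = swallowing_call(dom)
    # ...which only blows up later, as the TypeError from parseString(None)
    # seen in the original traceback.
    return xml.dom.minidom.parseString(domxml)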
We are now seeing an error in vdsm:

Thread-278::ERROR::2011-05-17 10:33:31,457::clientIF::52::vds::(wrapper)
Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 48, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/clientIF.py", line 426, in setVmTicket
    return vm.setTicket(otp, seconds, connAct)
  File "/usr/share/vdsm/libvirtvm.py", line 1059, in setTicket
    graphics = xml.dom.minidom.parseString(self._dom.XMLDesc(0)) \
  File "/usr/share/vdsm/libvirtvm.py", line 306, in f
    ret = attr(*args, **kwargs)
  File "/usr/share/vdsm/libvirtconnection.py", line 59, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 239, in XMLDesc
    if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'b06e6840-38da-41da-883c-90dba8aeb757'

However, there is no error code, which means that the backend presents "unexpected exception code:16":

2011-05-17 10:34:52,685 ERROR [com.redhat.rhevm.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-2) Error code unexpected and error message VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error = Unexpected exception
2011-05-17 10:34:52,686 ERROR [com.redhat.rhevm.vdsbroker.VDSCommandBase] (http-0.0.0.0-8443-2) Command SetVmTicketVDS execution failed. Exception: VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error = Unexpected exception

Please add an error code so that the backend can display a proper error.
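A minimal sketch of the kind of mapping being asked for, assuming a hypothetical errCode table entry named 'noVM' (vdsm's real error definitions and return conventions may differ):

import libvirt

# Hypothetical error table entry for illustration only.
errCode = {'noVM': {'status': {'code': 1, 'message': 'Virtual machine does not exist'}}}

def setVmTicket(vm, otp, seconds, connAct):
    try:
        return vm.setTicket(otp, seconds, connAct)
    except libvirt.libvirtError, e:
        # Map "domain not found" to a specific code instead of a generic
        # "unexpected exception", so the backend can show a proper error.
        if e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
            return errCode['noVM']
        raise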
> 2011-05-17 10:34:52,685 ERROR
> [com.redhat.rhevm.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-2)
> Error code unexpected and error message VDSGenericException: VDSErrorException:
> Failed to SetVmTicketVDS, error = Unexpected exception
> 2011-05-17 10:34:52,686 ERROR [com.redhat.rhevm.vdsbroker.VDSCommandBase]
> (http-0.0.0.0-8443-2) Command SetVmTicketVDS execution failed. Exception:
> VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error =
> Unexpected exception
>
> please add error code so that backend can display proper error.

This has nothing to do with the original bug (which was a regression, etc.); it is a minor issue about propagating a different error. Moving this bug to VERIFIED according to comment #17. If you wish, please open a different bug about throwing a more specific error.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2011-1782.html