| Summary: | VDSM (w/ libvirt): when sending setVmTicket right after VM destroy libvirt returns empty reply which results in vdsm error | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dafna Ron <dron> | ||||||
| Component: | vdsm | Assignee: | Eduardo Warszawski <ewarszaw> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Dafna Ron <dron> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 6.1 | CC: | abaron, bazulay, dallan, danken, dyuan, gren, iheim, ilvovsky, nzhang, sgrinber, ykaul | ||||||
| Target Milestone: | rc | Keywords: | Regression | ||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | vdsm-4.9-65.el6 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | |||||||||
| : | 705839 (view as bug list) | Environment: | |||||||
| Last Closed: | 2011-12-06 07:15:14 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 705839 | ||||||||
| Attachments: |
|
||||||||
Dafna, i guess the problem arises when you call self._dom.XMLDesc(0) which returns empty reply, right? So can you please attach libvirt & vdsm logs? And are you able to reproduce this with bare libvirt? I tried this: virsh -c qemu:///system "destroy test1; dumpxml test1" (assume test1 was running) and it works well. (In reply to comment #5) > Dafna, > > i guess the problem arises when you call self._dom.XMLDesc(0) which returns > empty reply, right? So can you please attach libvirt & vdsm logs? yes, the error happens when call self._dom.XMLDesc I'll attach logs for vdsm and libvirt And are you > able to reproduce this with bare libvirt? I tried this: > > virsh -c qemu:///system "destroy test1; dumpxml test1" > > (assume test1 was running) and it works well. I cannot really reproduce this with bare libvirt - this operation is directly connected to the vdsm and rhevm. Created attachment 496221 [details]
logs
Dafna, those VMs created are persistent or transient one? Because from libvirt logs I can see how VDSM destroy domain in one thread and right after another thread tries to get XML for the domain being destroyed. But it does not succeed, because the domain could not be found. And this happens if the domain is transient = destroying it will also remove its definition which results in failures of any further calls related to it. And if I read the VDSM code right, you obtain domain pointer once and forward it then to all subsequent calls. Combined with written above, you'll get an error. Whether domain is persistent or not you can find out by running virsh dominfo <domain>; vdsm uses only transient VMs. In case of an error, libvirt-python should raise an exception, not return None as the domxml. I agree, but in the code is check to aviod returning None in favor of raising exception. And if i try to destroy domain and get its XML right after, I obtain exception. Is there a possibility to reproduce this on a running system? I suspect it's a race that can be manifested only when destroy and dumpxml are called from different threads. 14:03:25.048: 21723: debug : virDomainDestroy:2251 : dom=0x7f0830035210, (VM: name=VM1, uuid=a585336d-308b-4d87-a3f2-a6d1140f0dc9), 14:03:25.057: 21724: debug : virDomainGetXMLDesc:3256 : dom=0x7f0830035210, (VM: name=VM1, uuid=a585336d-308b-4d87-a3f2-a6d1140f0dc9), flags=0 14:03:25.289: 21723: debug : remoteIO:10506 : All done with our call 12 0x7f08280208a0 0x7f082c075650 Created attachment 496462 [details]
reproducer.py
Well, I've tried to create a reproducer with threads and didn't succeed.
It shouldn't be race condition, because libvirt locks drivers (=parts of software talking to hypervisor). And we can see from log the domain was really not found, which is right and expected.
What I wonder about is - virDomainGetXMLDesc returns NULL, which get translated into python None. But the test right after, which raises an exeption in case of returning None, somehow fails.
I ran attached 'reproducer' and got this:
10:54:04.076: 3405: debug : virDomainDestroy:2251 : dom=0x168a560, (VM: name=test1, uuid=60f7975a-c6c7-0d2b-5344-867986dddef2),
10:54:04.076: 3406: debug : virDomainGetXMLDesc:3256 : dom=0x168a560, (VM: name=test1, uuid=60f7975a-c6c7-0d2b-5344-867986dddef2), flags=0
10:54:04.076: 3405: debug : remoteIO:10406 : Do proc=12 serial=4 length=60 wait=(nil)
10:54:04.076: 3405: debug : remoteIO:10478 : We have the buck 12 0x7f3e54001300 0x7f3e54001300
10:54:04.076: 3406: debug : remoteIO:10406 : Do proc=14 serial=5 length=64 wait=0x7f3e54001300
10:54:04.076: 3406: debug : remoteIO:10435 : Going to sleep 14 0x7f3e54001300 0x7f3e4c001140
10:54:04.076: 3405: debug : remoteIOEventLoop:10266 : Woken up from poll by other thread
10:54:04.210: 3405: debug : remoteIODecodeMessageLength:9802 : Got length, now need 56 total (52 more)
10:54:04.210: 3405: debug : remoteIOEventLoop:10332 : Giving up the buck 12 0x7f3e54001300 0x7f3e4c001140
10:54:04.210: 3405: debug : remoteIOEventLoop:10336 : Passing the buck to 14 0x7f3e4c001140
10:54:04.210: 3405: debug : remoteIO:10506 : All done with our call 12 0x7f3e4c001140 0x7f3e54001300
10:54:04.210: 3406: debug : remoteIO:10454 : Wokeup from sleep 14 0x7f3e4c001140 0x7f3e4c001140
10:54:04.210: 3406: debug : remoteIO:10478 : We have the buck 14 0x7f3e4c001140 0x7f3e4c001140
10:54:04.210: 3406: debug : remoteIODecodeMessageLength:9802 : Got length, now need 260 total (256 more)
10:54:04.210: 3406: debug : remoteIOEventLoop:10332 : Giving up the buck 14 0x7f3e4c001140 (nil)
10:54:04.210: 3406: debug : remoteIO:10506 : All done with our call 14 (nil) 0x7f3e4c001140
libvir: QEMU error : Domain not found: no domain with matching uuid '60f7975a-c6c7-0d2b-5344-867986dddef2'
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "reproducer.py", line 23, in run
print dom.XMLDesc(0)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 239, in XMLDesc
if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '60f7975a-c6c7-0d2b-5344-867986dddef2'
Can you please try running this and confirm it does or does not return None?
I'm afraid your analysis is correct. vdsm is probably the culprit that have swallowed the exception. Eduardo, I think this is a dup of bug 695244. we are noe seeing an error in vdsm:
Thread-278::ERROR::2011-05-17 10:33:31,457::clientIF::52::vds::(wrapper) Traceback (most recent call last):
File "/usr/share/vdsm/clientIF.py", line 48, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/clientIF.py", line 426, in setVmTicket
return vm.setTicket(otp, seconds, connAct)
File "/usr/share/vdsm/libvirtvm.py", line 1059, in setTicket
graphics = xml.dom.minidom.parseString(self._dom.XMLDesc(0)) \
File "/usr/share/vdsm/libvirtvm.py", line 306, in f
ret = attr(*args, **kwargs)
File "/usr/share/vdsm/libvirtconnection.py", line 59, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 239, in XMLDesc
if ret is None: raise libvirtError ('virDomainGetXMLDesc() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid 'b06e6840-38da-41da-883c-90dba8aeb757'
but there is no error code which means that the backend presents "unexpected exception code:16"
2011-05-17 10:34:52,685 ERROR [com.redhat.rhevm.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-2) Error code unexpected and error message VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error = Unexpected exception
2011-05-17 10:34:52,686 ERROR [com.redhat.rhevm.vdsbroker.VDSCommandBase] (http-0.0.0.0-8443-2) Command SetVmTicketVDS execution failed. Exception: VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error = Unexpected exception
please add error code so that backend can display proper error.
> 2011-05-17 10:34:52,685 ERROR > [com.redhat.rhevm.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-2) > Error code unexpected and error message VDSGenericException: VDSErrorException: > Failed to SetVmTicketVDS, error = Unexpected exception > 2011-05-17 10:34:52,686 ERROR [com.redhat.rhevm.vdsbroker.VDSCommandBase] > (http-0.0.0.0-8443-2) Command SetVmTicketVDS execution failed. Exception: > VDSGenericException: VDSErrorException: Failed to SetVmTicketVDS, error = > Unexpected exception > > please add error code so that backend can display proper error. This has nothing to do with the original bug (which was a regression etc), this is a minor issue about propagating a different error. Moving this bug to VERIFIED according to comment #17. If you wish, please open a different bug about throwing a more specific error. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2011-1782.html |
Description of problem: Right after sending Destroy for a VM and opening a monitor window in rhevm (setVmTicket) the libvirt sends empty reply which causes exception in vdsm: Thread-326::ERROR::2011-04-27 12:06:25,981::clientIF::51::vds::(wrapper) Traceback (most recent call last): File "/usr/share/vdsm/clientIF.py", line 47, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/clientIF.py", line 426, in setVmTicket return vm.setTicket(otp, seconds, connAct) File "/usr/share/vdsm/libvirtvm.py", line 934, in setTicket graphics = xml.dom.minidom.parseString(drone) \ File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/minidom.py", line 1925, in parseString return expatbuilder.parseString(string) File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 942, in parseString return builder.parseString(string) File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 223, in parseString parser.Parse(string, True) TypeError: Parse() argument 1 must be string or read-only buffer, not None Version-Release number of selected component (if applicable): ic114 vdsm-debug-plugin-4.9-59.el6.x86_64 vdsm-4.9-59.el6.x86_64 vdsm-cli-4.9-59.el6.x86_64 vdsm-debuginfo-4.9-59.el6.x86_64 qemu-kvm-0.12.1.2-2.152.el6.x86_64 qemu-img-0.12.1.2-2.152.el6.x86_64 gpxe-roms-qemu-0.9.7-6.7.el6.noarch libvirt-debuginfo-0.8.7-17.el6.x86_64 libvirt-devel-0.8.7-17.el6.x86_64 libvirt-python-0.8.7-17.el6.x86_64 libvirt-client-0.8.7-17.el6.x86_64 libvirt-0.8.7-17.el6.x86_64 How reproducible: 100% Steps to Reproduce: 1. start a VM -> send destroy 2. open monitor (setVmTicket) 3. Actual results: libvirt will send blank reply and you will get error in vdsm and in backend Expected results: we should either get an error or the operation should succeed. in the past, you could open a monitor while VM was shutting down. Additional info:vdsm and libvirt logs