Bug 672226
Summary: | virDomainCreateXML over remote driver sometimes hangs | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Dan Kenigsberg <danken> | ||||||||||
Component: | libvirt | Assignee: | Daniel Berrangé <berrange> | ||||||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||||||
Severity: | unspecified | Docs Contact: | |||||||||||
Priority: | unspecified | ||||||||||||
Version: | 6.0 | CC: | abaron, dallan, dnaori, eblake, hateya, ilvovsky, jdenemar, jyang, mgoldboi, mjenner, mzhan, vbian, xen-maint, yoyzhang | ||||||||||
Target Milestone: | rc | ||||||||||||
Target Release: | --- | ||||||||||||
Hardware: | Unspecified | ||||||||||||
OS: | Unspecified | ||||||||||||
Whiteboard: | |||||||||||||
Fixed In Version: | libvirt-0.8.7-5.el6 | Doc Type: | Bug Fix | ||||||||||
Doc Text: |
Using SASL authentication with a single libvirt connection for multiple threads could result in libvirt hanging while waiting for a response from the libvirt daemon. SASL decoding has been fixed such that clients do not wait for further data while already decoded SASL data remains unprocessed, so libvirt no longer hangs in this situation.
|
Story Points: | --- | ||||||||||
Clone Of: | Environment: | ||||||||||||
Last Closed: | 2011-05-19 13:26:17 UTC | Type: | --- | ||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||
Documentation: | --- | CRM: | |||||||||||
Verified Versions: | Category: | --- | |||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
Embargoed: | |||||||||||||
Bug Depends On: | |||||||||||||
Bug Blocks: | 669974 | ||||||||||||
Attachments: |
|
Description
Dan Kenigsberg
2011-01-24 13:33:23 UTC
Apparently, after a long wait, that poll() has ended. Just before createXML we printed: Thread-2506::DEBUG::2011-01-24 13:54:52,726::libvirtvm::755::vds.vmlog.b63a6c0e-afd1-4962-836e-3ea4bebcb884::(_run) DDD Just after createXML we printed: Thread-2506::DEBUG::2011-01-24 15:19:30,793::libvirtvm::759::vds.vmlog.b63a6c0e-afd1-4962-836e-3ea4bebcb884::(_run) DDD Dan B, have any theories about what's going on here? The client side blocking on poll(), means it is waiting for the reply from the daemon. This suggests that the Create API is still running. Would need to capture the libvirtd stack trace at this point to see what its doing while the client is waiting. Dan K., can you get a trace from libvirtd the next time you see this? Created attachment 475433 [details]
"t a a bt full" in libvirtd
I hope this helps.
That attachment shows libvirtd completely idle, in its normal waiting for more RPC calls state. Can we get a trace of all threads in the client too at time of hang. I'm more suspicious of the pure python event loop impl in the client now, because of the problems just identified in bug 624252 comment #38 Created attachment 476139 [details]
"t a a bt full" in vdsm
Created attachment 476141 [details]
"t a a bt full" in libvirtd
Happened again (libvirt-0.8.7-3.el6.x86_64), backtraces for client and server attached.
When this occurs, does VDSM hang forever, or does it eventually recover after a long wait ? Can you also attach the full VDSM log, and the libvirtd daemon logs from the same period, so I can try and correlate what's going on between the two. If it happens again, it'd be helpful if I can login & poke libvirtd with GDB while it is still hung. Interesting points from debugging on a live system - libvirtd is completely idle. Analysing the logs shows that every single virDomainCreateXML call completed in < 2 seconds. - On client, there are about 50 threads, all blocked up on remoteIO() and one in remoteIOEventLoop(). - All the blocked threads have transmitted all their data and are in 'mode = 1' which indicates waiting for 'rx' of reply data. - Poll() is listening on POLLIN on data, so if there was any data from libvirtd it would have seen it - saslDecodedLength = 256, and saslDecodedOffset = 96. This indicates we didn't consume all the decoded SASL data for inexplicable reasons. The last point is the only sign of clear trouble thus far. We should not be sitting in poll() waiting to rx data, if we still have some previously decoded SASL data available. One idea worth testing - disable the SASL authentication on the UNIX socket. eg edit /etc/libvirt/libvirtd.conf and set auth_unix_rw = "none" auth_unix_ro = "none" and restart libvirtd. This will cause libvirt to take a different code path for I/O which may avoid this problem. Confirmed that there is an I/O flaw in the libvirt client SASL codepath. The following patch upstream will fix it http://www.redhat.com/archives/libvir-list/2011-February/msg00017.html It will affect potentially any API call, not simply virDomainCreateXML. Depending entirely on luck the client might unblock itself automagically or not. Disabling SASL auth on the server is a suitable workaround. Verified on libvirt-0.8.7-8.el6.x86_64 I have reproduced this bug with libvirt-0.8.7-3.el6.x86_64, and verify this bug with libvirt-0.8.7-8.el6.x86_64. Reproduce steps: 1. Modify libvirtd conf file # cat /etc/libvirt/libvirtd.conf make sure: auth_unix_ro = "sasl" auth_unix_rw = "sasl" 2. # service libvirtd restart 3. In host use sasl encryption # saslpasswd2 -a libvirt root Password: xxxxxx Again (for verification): xxxxxx 4. Setup libvirt-test-API environment, which can be found on libvirt.org (http://libvirt.org/git/?p=libvirt-test-API.git;a=summary) The main script about multithread is: ~/libvirt-test-API/repos/Python/regression/multiple_thread_block_on_domain_create.py you can set the guest name and number in ~/libvirt-test-API/case.conf set guestnum: 0-10 means guest0,...,guest9 Final execute # python libvirt-test-api.py Then guest0,...,guest9 disk img will be created and installed in /var/lib/libvirt/images/guest*. 5. Since this bug is about multithread reply problem, use script and case.conf to create some guests at a time, and keep these running at the same time, then repeat; finally the script is hang for waiting for the reply/return. 6. Check libvirtd status, found it is idle # top -p `pidof libvirtd` top - 16:57:41 up 19:23, 8 users, load average: 0.00, 0.15, 2.19 Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3756240k total, 495500k used, 3260740k free, 5052k buffers Swap: 2047992k total, 222760k used, 1825232k free, 81596k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 16983 root 20 0 701m 29m 24 S 0.0 0.8 0:08.29 libvirtd Now I have tried several times about step5 in libvirt-0.8.7-8.el6.x86_64,and it works well now. Created attachment 482231 [details]
multiple_thread script
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause enabling SASL authentication and using a single libvirt connection from multiple threads Consequence libvirt client may hang while waiting for a reply from libvirtd Fix SASL decoding was fixed so that clients are not waiting for more data even though existing SASL decoded data remains unprocessed Result libvirt client does not hang when SASL authentication is enabled Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1,13 +1 @@ -Cause +Using SASL authentication with a single libvirt connection for multiple threads could result in libvirt hanging while waiting for a response from the libvirt daemon. SASL decoding has been fixed such that clients do not wait for further data while already decoded SASL data remains unprocessed, so libvirt no longer hangs in this situation.- enabling SASL authentication and using a single libvirt connection from - multiple threads - -Consequence - libvirt client may hang while waiting for a reply from libvirtd - -Fix - SASL decoding was fixed so that clients are not waiting for more data even - though existing SASL decoded data remains unprocessed - -Result - libvirt client does not hang when SASL authentication is enabled An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0596.html |