RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 672226 - virDomainCreateXML over remote driver sometimes hangs
Summary: virDomainCreateXML over remote driver sometimes hangs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Daniel Berrangé
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 669974
TreeView+ depends on / blocked
 
Reported: 2011-01-24 13:33 UTC by Dan Kenigsberg
Modified: 2015-09-28 02:29 UTC (History)
14 users (show)

Fixed In Version: libvirt-0.8.7-5.el6
Doc Type: Bug Fix
Doc Text:
Using SASL authentication with a single libvirt connection for multiple threads could result in libvirt hanging while waiting for a response from the libvirt daemon. SASL decoding has been fixed such that clients do not wait for further data while already decoded SASL data remains unprocessed, so libvirt no longer hangs in this situation.
Clone Of:
Environment:
Last Closed: 2011-05-19 13:26:17 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
"t a a bt full" in libvirtd (9.81 KB, text/plain)
2011-01-26 16:27 UTC, Dan Kenigsberg
no flags Details
"t a a bt full" in vdsm (351.77 KB, text/plain)
2011-01-31 08:46 UTC, Dan Kenigsberg
no flags Details
"t a a bt full" in libvirtd (5.52 KB, text/plain)
2011-01-31 08:48 UTC, Dan Kenigsberg
no flags Details
multiple_thread script (5.92 KB, text/plain)
2011-03-04 06:23 UTC, Min Zhan
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0596 0 normal SHIPPED_LIVE libvirt bug fix and enhancement update 2011-05-18 17:56:36 UTC

Description Dan Kenigsberg 2011-01-24 13:33:23 UTC
Description of problem:
Sometimes, a vdsm-issued virDomainCreateXML call succeeds to create a VM, but does not return. The libvirtd side shows nothing suspicious (to me), but the client side blocks on poll() - see attached backtrace.

Version-Release number of selected component (if applicable):
vdsm-4.9-43
libvirt-0.8.7-3.el6.x86_64

How reproducible:
Seldom. Start and destroy multiple VMs via vdsm.

Comment 1 Dan Kenigsberg 2011-01-24 13:35:48 UTC
Apparently, after a long wait, that poll() has ended.

Just before createXML we printed:

Thread-2506::DEBUG::2011-01-24 13:54:52,726::libvirtvm::755::vds.vmlog.b63a6c0e-afd1-4962-836e-3ea4bebcb884::(_run) DDD

Just after createXML we printed:

Thread-2506::DEBUG::2011-01-24 15:19:30,793::libvirtvm::759::vds.vmlog.b63a6c0e-afd1-4962-836e-3ea4bebcb884::(_run) DDD

Comment 2 Dave Allan 2011-01-24 21:07:38 UTC
Dan B, have any theories about what's going on here?

Comment 3 Daniel Berrangé 2011-01-25 10:38:17 UTC
The client side blocking on poll(), means it is waiting for the reply from the daemon. This suggests that the Create API is still running. Would need to capture the libvirtd stack trace at this point to see what its doing while the client is waiting.

Comment 4 Dave Allan 2011-01-25 14:56:20 UTC
Dan K., can you get a trace from libvirtd the next time you see this?

Comment 5 Dan Kenigsberg 2011-01-26 16:27:23 UTC
Created attachment 475433 [details]
"t a a bt full" in libvirtd

I hope this helps.

Comment 6 Daniel Berrangé 2011-01-27 13:19:22 UTC
That attachment shows libvirtd completely idle, in its normal waiting for more RPC calls state.

Can we get a trace of all threads in the client too at time of hang. I'm more suspicious of the pure python event loop impl in the client now, because of the problems just identified in  bug 624252 comment #38

Comment 7 Dan Kenigsberg 2011-01-31 08:46:13 UTC
Created attachment 476139 [details]
"t a a bt full" in vdsm

Comment 8 Dan Kenigsberg 2011-01-31 08:48:13 UTC
Created attachment 476141 [details]
"t a a bt full" in libvirtd

Happened again (libvirt-0.8.7-3.el6.x86_64), backtraces for client and server attached.

Comment 9 Daniel Berrangé 2011-02-01 11:02:18 UTC
When this occurs, does VDSM hang forever, or does it eventually recover after a long wait ?

Comment 10 Daniel Berrangé 2011-02-01 11:36:46 UTC
Can you also attach the full VDSM log, and the libvirtd daemon logs from the same period, so I can try and correlate what's going on between the two. If it happens again, it'd be helpful if I can login & poke libvirtd with GDB while it is still hung.

Comment 11 Daniel Berrangé 2011-02-01 13:00:57 UTC
Interesting points from debugging on a live system

 - libvirtd is completely idle. Analysing the logs shows that every single virDomainCreateXML call completed in < 2 seconds.

 - On client, there are about 50 threads, all blocked up on remoteIO() and one in remoteIOEventLoop().

 - All the blocked threads have transmitted all their data and are in 'mode = 1' which indicates waiting for 'rx' of reply data.

 - Poll() is listening on POLLIN on data, so if there was any data from libvirtd it would have seen it

 - saslDecodedLength = 256, and saslDecodedOffset = 96. This indicates we didn't consume all the decoded SASL data for inexplicable reasons.

The last point is the only sign of clear trouble thus far. We should not be sitting in poll() waiting to rx data, if we still have some previously decoded SASL data available.

Comment 12 Daniel Berrangé 2011-02-01 13:15:09 UTC
One idea worth testing - disable the SASL authentication on the UNIX socket. eg edit /etc/libvirt/libvirtd.conf and set

  auth_unix_rw = "none"
  auth_unix_ro = "none"

and restart libvirtd. This will cause libvirt to take a different code path for I/O which may avoid this problem.

Comment 13 Daniel Berrangé 2011-02-01 16:11:50 UTC
Confirmed that there is an I/O flaw in the libvirt client SASL codepath. The following patch upstream will fix it

http://www.redhat.com/archives/libvir-list/2011-February/msg00017.html

It will affect potentially any API call, not simply virDomainCreateXML. Depending entirely on luck the client might unblock itself automagically or not.

Disabling SASL auth on the server is a suitable workaround.

Comment 16 David Naori 2011-02-28 19:28:30 UTC
Verified on libvirt-0.8.7-8.el6.x86_64

Comment 17 Min Zhan 2011-03-04 06:20:51 UTC
I have reproduced this bug with libvirt-0.8.7-3.el6.x86_64, and verify this bug with libvirt-0.8.7-8.el6.x86_64.

Reproduce steps:

1. Modify libvirtd conf file 
# cat /etc/libvirt/libvirtd.conf
make sure:

auth_unix_ro = "sasl"

auth_unix_rw = "sasl"

2. # service libvirtd restart

3. In host use sasl encryption
# saslpasswd2 -a libvirt root
Password: xxxxxx
Again (for verification): xxxxxx

4. Setup libvirt-test-API environment, which can be found on libvirt.org (http://libvirt.org/git/?p=libvirt-test-API.git;a=summary)

The main script about multithread is: ~/libvirt-test-API/repos/Python/regression/multiple_thread_block_on_domain_create.py

you can set the guest name and number in ~/libvirt-test-API/case.conf
set guestnum: 0-10 means guest0,...,guest9

Final execute
# python libvirt-test-api.py

Then guest0,...,guest9 disk img will be created and installed in /var/lib/libvirt/images/guest*.

5. Since this bug is about multithread reply problem, use script and case.conf  to create some guests at a time, and keep these running at the same time, then repeat; finally the script is hang for waiting for the reply/return.

6. Check libvirtd status, found it is idle
# top -p `pidof libvirtd`

top - 16:57:41 up 19:23,  8 users,  load average: 0.00, 0.15, 2.19
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.0%sy,  0.0%ni, 99.7%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3756240k total,   495500k used,  3260740k free,     5052k buffers
Swap:  2047992k total,   222760k used,  1825232k free,    81596k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                         
16983 root      20   0  701m  29m   24 S  0.0  0.8   0:08.29 libvirtd   


Now I have tried several times about step5 in libvirt-0.8.7-8.el6.x86_64,and it works well now.

Comment 18 Min Zhan 2011-03-04 06:23:03 UTC
Created attachment 482231 [details]
multiple_thread script

Comment 19 Jiri Denemark 2011-05-03 12:33:35 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    enabling SASL authentication and using a single libvirt connection from
    multiple threads

Consequence
    libvirt client may hang while waiting for a reply from libvirtd

Fix
    SASL decoding was fixed so that clients are not waiting for more data even
    though existing SASL decoded data remains unprocessed

Result
    libvirt client does not hang when SASL authentication is enabled

Comment 22 Laura Bailey 2011-05-04 02:20:59 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,13 +1 @@
-Cause
+Using SASL authentication with a single libvirt connection for multiple threads could result in libvirt hanging while waiting for a response from the libvirt daemon. SASL decoding has been fixed such that clients do not wait for further data while already decoded SASL data remains unprocessed, so libvirt no longer hangs in this situation.-    enabling SASL authentication and using a single libvirt connection from
-    multiple threads
-
-Consequence
-    libvirt client may hang while waiting for a reply from libvirtd
-
-Fix
-    SASL decoding was fixed so that clients are not waiting for more data even
-    though existing SASL decoded data remains unprocessed
-
-Result
-    libvirt client does not hang when SASL authentication is enabled

Comment 23 errata-xmlrpc 2011-05-19 13:26:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html


Note You need to log in before you can comment on or make changes to this bug.