Bug 691514
| Field | Value |
|---|---|
| Summary | [Libvirt] When starting multiple vms using vdsm (sasl-authentication) virDomainCreateXML does not return. |
| Product | Red Hat Enterprise Linux 6 |
| Component | libvirt |
| Version | 6.1 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | unspecified |
| Reporter | David Naori <dnaori> |
| Assignee | Michal Privoznik <mprivozn> |
| QA Contact | Virtualization Bugs <virt-bugs> |
| CC | dallan, dnaori, dyuan, eblake, gsun, hateya, jdenemar, mgoldboi, mzhan, nzhang, oschreib, syeghiay, vbian, yoyzhang |
| Target Milestone | rc |
| Keywords | Regression |
| Fixed In Version | libvirt-0.8.7-17.el6 |
| Doc Type | Bug Fix |
| Doc Text | When creating virtual machines via remote protocol, the client hung because the list of remote procedure calls to execute was not traversed correctly. Traversal has been corrected so that creating virtual machines remotely no longer causes libvirt to hang. |
| Last Closed | 2011-05-19 13:29:34 UTC |
| Bug Blocks | 682015, 690068, 691485 |

Description
David Naori
2011-03-28 17:54:53 UTC
I wonder if the fix for bug 624252 interacted poorly with the fix for bug 672226 to cause the symptoms of 672226 to reappear.

(In reply to comment #0)
> attached "t a a bt full" client&server side, vdsm and libvirtd logs.

still missing those attachments...

(In reply to comment #2)
> (In reply to comment #0)
> > attached "t a a bt full" client&server side, vdsm and libvirtd logs.
>
> still missing those attachments...

Added the attachments. If a reproduction is needed, I can easily reproduce the issue.

Created attachment 488591 [details]
libvirtd.log vdsm.log t a a bt full server & client side

*** Bug 691485 has been marked as a duplicate of this bug. ***

*** Bug 690068 has been marked as a duplicate of this bug. ***

I've done some research. One noticeable thing is that vdsm hangs and consumes 100% CPU. The problem is this: libvirt (client side) creates a linked list of waiting RPC calls. When a response arrives, or another thread decides to go to sleep, we traverse this list and find the right thread to wake up. But somehow, something is overwriting the pointer to the next element, so that the pointer points to the element itself:

[Switching to thread 3 (Thread 0x7f4944dfa700 (LWP 23607))]#0 0x00000037888a989d in remoteIO (conn=0x288c530, priv=0x288f890, flags=0, thiscall=0x7f492811d7c0) at remote/remote_driver.c:10413
10413 while (tmp && tmp->next)
(gdb) p tmp
$7 = (struct remote_thread_call *) 0x7f49480f3d60
(gdb) p tmp->next
$8 = (struct remote_thread_call *) 0x7f49480f3d60

Or it may be stuck in another location: the while() at remote/remote_driver.c:10302, which has the same semantics.

The other noticeable thing is that all SASL data seems to have been consumed:

.., saslDecoded = 0x0, saslDecodedLength = 0, saslDecodedOffset = 0, saslEncoded = 0x0, saslEncodedLength = 0, saslEncodedOffset = 0, .., bufferLength = 0, bufferOffset = 0, ..

Anyway, the symptoms and research outputs are the same as in https://bugzilla.redhat.com/show_bug.cgi?id=672226

Pushed upstream: http://www.libvirt.org/git/?p=libvirt.git;a=commit;h=50e4b9195d2d8b46969940336b44221b500a2de3

Post: http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-April/msg00315.html

To avoid confusion: the scratch build was intended to lock the queue, because I suspected a race condition. Later it turned out that this was not the root of the problem; the root was an error in queue traversal. The posted fix actually fixes the root cause, so the scratch build is not related to the fix. I was just narrowing down the possibilities.

Verified with the following builds; no "WaitForLaunch" status is shown. Moving to VERIFIED.

vdsm-4.9-60.el6.x86_64
libvirt-0.8.7-17.el6.x86_64

Steps:

1. Save the script below as 691514.sh and execute it on the command line.

# ./691514.sh
------------------
#!/bin/bash
times=20

# Generate a random MAC address with the 52:54:00 prefix.
macaddr() {
    local random
    local MAC
    random=$(echo $RANDOM | md5sum | sed 's/\(..\)/&:/g' | cut -d' ' -f1)
    MAC=52:54:00:${random:0:8}
    echo $MAC
}

# Create $times VMs back to back through vdsm.
for i in $(seq 1 $times)
do
    vdsClient -s 0 create /dev/null vmId=$(uuidgen) vmName=vm${i} memSize=256 macAddr=$(macaddr) bridge=rhevm nicModel=pv display=vnc
done
------------------

2. Open another console and execute the following command.

# vdsClient -s 0 list table
cde0d6f7-00ce-4ac6-9125-3ae82522ddc9 7771 vm19 Up
388b6a04-f12a-4cd0-ac4d-f22d7948c550 6735 vm6 Up
176f8a0c-ef67-4e43-83cd-e53de14a8b6b 7538 vm16 Up
f5fba8ac-481e-46fe-9d20-670ba43b672f 6913 vm8 Up
e1b02f19-00d6-48fa-b8d2-d32782b72e2a 7619 vm17 Up
65cc45ff-829a-4a56-8daf-b6e663b6fff1 6998 vm9 Up
6fe3c8be-2029-4b29-949c-2f24fdc72718 6640 vm5 Up
19016c55-412c-463c-8c9e-133437724daf 7149 vm11 Up
b6239a26-1c12-4c93-a3f7-68d0f6f14bb6 7074 vm10 Up
01f17fc3-e79b-46d5-b2b9-1d206b9f97c9 7460 vm15 Up
a54b3d24-4e65-4866-a4c7-62c810bd4088 7380 vm14 Up
2b87831e-10b6-4720-9701-e0c1f956bfa4 7302 vm13 Up
b808ffa7-0e89-4341-9a3a-18836dda290c 7848 vm20 Up
4ae12ea7-8b71-47d7-be0c-69c547d69ca6 6244 vm2 Up
b01cb46c-3b05-4e46-acfc-0013974a9a8b 6819 vm7 Up
57a80086-e9d7-468d-9956-ce301a7be688 7697 vm18 Up
40e15f57-62ff-49b3-90a1-aa83b801ad87 6378 vm3 Up
80e5a82e-fb09-4c6e-92be-2f428d3d1094 6109 vm1 Up
4fb68774-afee-4809-a0c2-b3a67bd30d95 7227 vm12 Up
77e91f40-1165-4a57-95cf-1b78449b5a2c 6513 vm4 Up

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause:
 creating a couple of VMs over remote protocol
Consequence:
 client hangs up and consumes 100% CPU
Fix:
 investigation showed we smashed list of waiting calls during traversing.
Result:
 creating does not hang up anymore

Technical note updated.
If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,8 +1 @@
-Cause:
+When creating virtual machines via remote protocol, the client hung because the list of remote procedure calls to execute was not traversed correctly. Traversal has been corrected creating virtual machines remotely no longer causes libvirt to hang.
- creating a couple of VMs over remote protocol
-Consequence:
- client hangs up and consumes 100% CPU
-Fix:
- investigation showed we smashed list of waiting calls during traversing.
-Result:
- creating does not hang up anymore

Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-When creating virtual machines via remote protocol, the client hung because the list of remote procedure calls to execute was not traversed correctly. Traversal has been corrected creating virtual machines remotely no longer causes libvirt to hang.
+When creating virtual machines via remote protocol, the client hung because the list of remote procedure calls to execute was not traversed correctly. Traversal has been corrected so that creating virtual machines remotely no longer causes libvirt to hang.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html