Bug 662043

Summary: [libvirt] deadlock on concurrent multiple bidirectional migration
Product: Red Hat Enterprise Linux 6 Reporter: RHEL Program Management <pm-rhel>
Component: libvirt Assignee: Daniel Veillard <veillard>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.1 CC: bazulay, berrange, cpelland, dallan, danken, dyuan, eblake, hateya, iheim, jdenemar, jialiu, juzhang, mgoldboi, mjenner, mkenneth, plyons, pm-eus, riek, weizhan, xen-maint, yimwang, ykaul
Target Milestone: rc Keywords: TestBlocker, ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.8.1-27.el6_0.2 Doc Type: Bug Fix
Doc Text:
A deadlock occurred in the libvirt service when running concurrent bidirectional migration because certain calls did not release their local driver lock before issuing an RPC (Remote Procedure Call) call on a remote libvirt daemon. A deadlock no longer occurs between two communicating libvirt daemons.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-04-14 16:18:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 659310    
Bug Blocks:    

Description RHEL Program Management 2010-12-10 12:24:23 UTC
This bug has been copied from bug #659310 and has been proposed
to be backported to 6.0 z-stream (EUS).

Comment 4 Jiri Denemark 2010-12-10 13:53:00 UTC
Fixed in libvirt-0.8.1-27.el6_0.2

Comment 5 Johnny Liu 2010-12-21 12:26:19 UTC
I cannot reproduce this bug on the old version, libvirt-0.8.1-27.el6.x86_64.

Here are my steps:
1. Create a VM on each of the 2 servers.
2. On one of the two servers, run "virsh list" repeatedly:
# while [ "1" ]; do virsh list --all; sleep 1; done
3. Run concurrent bidirectional migrations.
On server 1:
# virsh migrate --p2p vm1 qemu+ssh://server2/system
On server 2:
# virsh migrate --p2p vm2 qemu+ssh://server1/system
4. Check the output of step 2: virsh list works fine.


Are any steps missing?

Comment 6 Haim 2010-12-21 12:33:17 UTC
(In reply to comment #5)
> Are any steps missing?

Well, it requires a larger scale than 2 VMs.
I did it with concurrent migration of 20 VMs (10 on each side).
This bug reproduces consistently in our environment and was a test blocker for quite some time (until we got a fix).

Comment 7 Johnny Liu 2010-12-22 08:18:33 UTC
Verified this bug in a plain libvirt environment (libvirt-0.8.1-27.el6_0.2.x86_64); this bug is fixed.

The steps are almost the same as in comment 5; the only difference is concurrent migration of 20 VMs (10 on each side).
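The scaled-up step 3 can be scripted so that all migrations on one side start together. This is a minimal sketch, not part of libvirt: it assumes the guests are named vm1 to vm20 (hypothetical names), with vm1..vm10 on server 1 and vm11..vm20 on server 2, and it only uses the `virsh migrate --p2p` form from comment 5. The helper name `migrate_many` and the `VIRSH` override are hypothetical.

```shell
# migrate_many PEER FIRST LAST (hypothetical helper, not part of libvirt):
# start one "virsh migrate --p2p" per guest vmFIRST..vmLAST towards PEER,
# each in the background so that the migrations overlap, then wait for all.
# Set VIRSH=echo for a dry run that only prints the commands.
migrate_many() {
    peer=$1
    i=$2
    while [ "$i" -le "$3" ]; do
        ${VIRSH:-virsh} migrate --p2p "vm$i" "qemu+ssh://$peer/system" &
        i=$((i + 1))
    done
    wait    # block until every background migration has exited
}
```

For example, run `migrate_many server2 1 10` on server 1 and, at the same moment, `migrate_many server1 11 20` on server 2, while the `virsh list` loop from step 2 keeps running.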


Thanks to Haim for his help.

Comment 9 wangyimiao 2011-02-09 07:59:26 UTC
Verified: it PASSED on build:
libvirt-0.8.1-27.el6_0.3.x86_64
qemu-img-0.12.1.2-2.113.el6_0.6.x86_64
qemu-kvm-0.12.1.2-2.113.el6_0.6.x86_64
kernel-2.6.32-71.18.1.el6.x86_64

Comment 10 errata-xmlrpc 2011-04-14 16:18:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0446.html

Comment 11 Martin Prpič 2011-04-15 14:21:48 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A deadlock occurred in the libvirt service when running concurrent bidirectional migration because certain calls did not release their local driver lock before issuing an RPC (Remote Procedure Call) call on a remote libvirt daemon. A deadlock no longer occurs between two communicating libvirt daemons.
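The note describes a lock-ordering cycle: each daemon holds its own driver lock while waiting on an RPC that the peer can only serve after taking the peer's own driver lock. A toy shell model of that cycle, which is not libvirt code, might look like this. Directories created with `mkdir` (atomic on POSIX file systems) stand in for the two driver locks, the remote RPC is modelled as acquiring the peer's lock, and the names `try_lock`, `migrate_call` and `run_demo` are hypothetical.

```shell
# Toy model only (hypothetical helpers, not libvirt code).
# try_lock DIR SECS: take a lock by creating DIR (mkdir is atomic), retrying
# once per second; returns 1 if DIR is still taken after SECS retries.
try_lock() {
    n=0
    while ! mkdir "$1" 2>/dev/null; do
        n=$((n + 1))
        [ "$n" -gt "$2" ] && return 1
        sleep 1
    done
    return 0
}

# migrate_call SELF PEER MODE: one daemon's migration call towards its peer.
#   MODE=hold     keep SELF's driver lock while waiting on the peer (the bug)
#   MODE=release  drop SELF's driver lock before calling the peer (the fix)
migrate_call() {
    self=$1 peer=$2 mode=$3
    try_lock "$self" 5 || { echo timeout; return 1; }
    sleep 1                                # local work under our own lock
    [ "$mode" = release ] && rmdir "$self"
    # The remote RPC only completes once the peer can take its own lock.
    if try_lock "$peer" 3; then
        rmdir "$peer"                      # peer served the RPC and unlocked
        [ "$mode" = hold ] && rmdir "$self"
        echo ok
    else
        echo deadlock                      # a stuck daemon never unlocks
    fi
}

# run_demo MODE: both daemons migrate to each other at the same time.
run_demo() {
    d=$(mktemp -d)
    migrate_call "$d/A" "$d/B" "$1" &      # daemon A calls daemon B
    migrate_call "$d/B" "$d/A" "$1" &      # daemon B calls daemon A
    wait
    rm -rf "$d"                            # also clears locks a deadlock left
}
```

`run_demo hold` prints `deadlock` twice after a few seconds, because each side keeps its own lock while waiting for the other's. `run_demo release` prints `ok` twice: no side ever holds one lock while waiting for another, which mirrors the fix of releasing the local driver lock before issuing the remote call.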