Bug 875788

Summary: Deadlock in libvirt when combining disk hotplug with VM add/remove
Product: Red Hat Enterprise Linux 6
Reporter: Chris Pelland <cpelland>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Priority: urgent
Version: 6.3
CC: acathrow, ajia, berrange, cpelland, dallan, dyasny, dyuan, gcheresh, jpallich, mavital, mprivozn, mzhan, ohochman, pm-eus, rwu, weizhan, ydu, ykaul
Target Milestone: rc
Keywords: ZStream
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-0.9.10-21.el6_3.6
Doc Type: Bug Fix
Doc Text:
Cause: When libvirt tears down a qemu process, it cleans up some internal structures, frees some locks, and so on. Since users may destroy qemu processes in parallel, libvirt holds what we call the 'qemu driver lock'. It is the lock that protects the most important internal structure, where the list of domains is kept along with their state. Consequence: One function tried to lock the qemu driver even though it was already locked. This led to an unresolvable deadlock. Fix: The code was rewritten so that the inner locking happens only after the qemu driver lock has been released. Result: libvirt no longer deadlocks.
Last Closed: 2012-11-22 09:40:32 UTC
Bug Depends On: 856950
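
The Doc Text above describes a classic self-deadlock on a non-recursive mutex. The following is a minimal C sketch of the before/after pattern, not libvirt's actual code; the function and variable names are invented for illustration.

#include <pthread.h>
#include <stdio.h>

/* Stand-in for the 'qemu driver lock' (non-recursive by default). */
static pthread_mutex_t driver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper that also needs the driver lock. */
static void unregister_close_callback(void)
{
    pthread_mutex_lock(&driver_lock);   /* blocks forever if the caller already holds it */
    /* ... drop the domain's close callback ... */
    pthread_mutex_unlock(&driver_lock);
}

/* Before the fix: the helper is called while the driver lock is still held. */
void teardown_buggy(void)
{
    pthread_mutex_lock(&driver_lock);
    /* ... free per-domain state ... */
    unregister_close_callback();        /* self-deadlock happens here */
    pthread_mutex_unlock(&driver_lock);
}

/* After the fix: the inner locking is moved to after the unlock. */
void teardown_fixed(void)
{
    pthread_mutex_lock(&driver_lock);
    /* ... free per-domain state ... */
    pthread_mutex_unlock(&driver_lock);
    unregister_close_callback();        /* safe: the lock is no longer held */
}

int main(void)
{
    teardown_fixed();                   /* returns; teardown_buggy() would hang */
    printf("teardown completed\n");
    return 0;
}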

Description Chris Pelland 2012-11-12 15:17:55 UTC
This bug has been copied from bug #856950 and has been proposed
to be backported to 6.3 z-stream (EUS).

Comment 6 Michal Privoznik 2012-11-13 12:38:11 UTC
*** Bug 876102 has been marked as a duplicate of this bug. ***

Comment 7 weizhang 2012-11-15 08:34:47 UTC
I tested with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=856950#c13 on these versions:
qemu-kvm-rhev-0.12.1.2-2.330.el6.x86_64
kernel-2.6.32-335.el6.x86_64
libvirt-0.9.10-21.el6_3.6.x86_64

It may report errors, but afterwards it can sometimes succeed.
The messages below were captured from the attach-detach loop:

Disk attached successfully

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot write data: Broken pipe

error: Failed to attach disk
error: operation failed: target vdb already exists

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot write data: Broken pipe

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

Disk detached successfully

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot write data: Broken pipe

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

error: No found disk whose source path or target is vdb

Does that mean there is still a problem?

Comment 8 weizhang 2012-11-15 08:40:21 UTC
And after 15 minutes, all the messages look like:

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot write data: Broken pipe

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

Comment 9 Michal Privoznik 2012-11-15 08:58:12 UTC
No, we don't have a problem.

Just for the record, I managed to log in to the machine and found the source of those error messages:

2012-11-15 08:40:17.614+0000: 26234: error : virNetServerDispatchNewClient:246 : Too many active clients (20), dropping connection from 127.0.0.1;0

So I think this is okay.
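
(For reference, that client limit is the max_clients setting in /etc/libvirt/libvirtd.conf; the default at the time was 20, which matches the "(20)" in the log line. Raising it and restarting libvirtd should stop the stress test from tripping over it. A sketch, with 50 as an arbitrary example value:

# /etc/libvirt/libvirtd.conf  (illustrative; the default is 20)
max_clients = 50

# service libvirtd restart
)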

Comment 10 weizhang 2012-11-15 12:19:34 UTC
Thanks to Michal for his help.

Verification passed on:
qemu-kvm-0.12.1.2-2.295.el6.x86_64
kernel-2.6.32-279.el6.x86_64
libvirt-0.9.10-21.el6_3.6.x86_64


Steps (a sketch of the test domain XML follows the results below):
On one console:
# while true; do for i in {1..10}; do virsh create /tmp/test$i.xml; done; for i in {1..10}; do virsh destroy test$i; done; done

On another console:
# while true; do virsh attach-disk tt /var/lib/libvirt/images/disk.img vdb; sleep 2; virsh detach-disk tt vdb; sleep 2; done

After running for about 1.5 hours, libvirtd was still running with no errors.
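
(The /tmp/test$i.xml files are not attached to this bug; any small transient guest works. A minimal sketch of such a definition, with an invented name and disk path:

<domain type='kvm'>
  <name>test1</name>
  <memory>262144</memory>
  <vcpu>1</vcpu>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/test1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
)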


The bug can still be reproduced on
libvirt-0.10.1-2.el6.x86_64

After about 1.5 hours, libvirtd crashes.

Comment 12 errata-xmlrpc 2012-11-22 09:40:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-1484.html