875788 – Deadlock on libvirt when playing with hotplug and add/remove vm

Summary: Deadlock on libvirt when playing with hotplug and add/remove vm

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	6.3
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	rc
Target Release:	---
Assignee:	Michal Privoznik
QA Contact:	Virtualization Bugs
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	876102
Depends On:	856950
Blocks:

Reported:	2012-11-12 15:17 UTC by Chris Pelland
Modified:	2012-11-22 09:40 UTC
CC List:	18 users
Fixed In Version:	libvirt-0.9.10-21.el6_3.6
Doc Type:	Bug Fix
Doc Text:	Cause: When libvirt tears down a qemu process, it cleans up some internal structures, frees some locks, and so on. Because users may destroy qemu processes in parallel, libvirt holds what is called the 'qemu driver lock'. This lock protects the most important internal structure, which keeps the list of domains along with their state. Consequence: One function tried to lock the qemu driver even though it was already locked. This led to an unresolvable deadlock. Fix: The code was rewritten so that the locking happens after the qemu driver is unlocked. Result: libvirt no longer deadlocks.
Clone Of:
Environment:
Last Closed:	2012-11-22 09:40:32 UTC
Target Upstream Version:
Embargoed:
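
The Doc Text above describes a self-deadlock: a function tries to take a non-recursive lock that its caller already holds. The following is a minimal Python sketch of that pattern and of the reordering described under "Fix". It is not libvirt's code (libvirt is written in C); `driver_lock`, `cleanup_step`, `teardown_buggy` and `teardown_fixed` are hypothetical names, and a timeout is used so that the buggy ordering reports failure instead of hanging forever.

```python
import threading

# Hypothetical stand-in for a non-recursive lock such as the 'qemu driver lock'.
driver_lock = threading.Lock()


def cleanup_step(timeout=0.2):
    """Hypothetical helper that takes the driver lock for its own work.

    Returns True if the lock was acquired, False if the wait timed out.
    """
    if not driver_lock.acquire(timeout=timeout):
        return False
    try:
        return True
    finally:
        driver_lock.release()


def teardown_buggy():
    """Calls the helper while the driver lock is still held."""
    with driver_lock:
        # The helper waits for a lock this same thread already holds.
        # Without the timeout this call would never return.
        return cleanup_step()


def teardown_fixed():
    """Unlocks the driver first, then calls the helper."""
    with driver_lock:
        pass  # work that really needs the driver lock would go here
    return cleanup_step()


if __name__ == "__main__":
    print("buggy ordering acquired lock:", teardown_buggy())
    print("fixed ordering acquired lock:", teardown_fixed())
```

With the buggy ordering the helper times out, which corresponds to the daemon hanging; with the fixed ordering it acquires the lock normally.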

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2012:1484	0	normal	SHIPPED_LIVE	libvirt bug fix update	2012-11-22 14:39:12 UTC

Description Chris Pelland 2012-11-12 15:17:55 UTC

This bug has been copied from bug #856950 and has been proposed
to be backported to 6.3 z-stream (EUS).

Comment 4 Michal Privoznik 2012-11-12 15:23:38 UTC

Moving to POST:

http://post-office.corp.redhat.com/archives/rhvirt-patches/2012-November/msg00108.html

Comment 6 Michal Privoznik 2012-11-13 12:38:11 UTC

*** Bug 876102 has been marked as a duplicate of this bug. ***

Comment 7 weizhang 2012-11-15 08:34:47 UTC

I tested with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=856950#c13
Versions:
qemu-kvm-rhev-0.12.1.2-2.330.el6.x86_64
kernel-2.6.32-335.el6.x86_64
libvirt-0.9.10-21.el6_3.6.x86_64

It may report errors, but after that it sometimes succeeds.
The messages below are from the attach-detach loop:

Disk attached successfully

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot write data: Broken pipe

error: Failed to attach disk
error: operation failed: target vdb already exists

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot write data: Broken pipe

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

Disk detached successfully

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot write data: Broken pipe

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

error: No found disk whose source path or target is vdb

Is this still a problem?

Comment 8 weizhang 2012-11-15 08:40:21 UTC

And after 15 minutes, all the messages look like this:

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot write data: Broken pipe

error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

Comment 9 Michal Privoznik 2012-11-15 08:58:12 UTC

No, we don't have a problem.

Just for the record, I managed to log in to the machine and found the source of those error messages:

2012-11-15 08:40:17.614+0000: 26234: error : virNetServerDispatchNewClient:246 : Too many active clients (20), dropping connection from 127.0.0.1;0

So I think this is okay.
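
When triaging similar virsh errors, it can help to check the libvirtd log for dropped connections like the one quoted above. The sketch below pulls the client limit and the client address out of such a line. The regular expression is written against the single log line shown in this comment, so it is an assumption that other libvirt versions use the same wording; `parse_dropped_client` is a hypothetical helper name.

```python
import re

# Pattern derived from the one log line quoted in this comment.
DROP_RE = re.compile(
    r"Too many active clients \((?P<limit>\d+)\), "
    r"dropping connection from (?P<client>\S+)"
)


def parse_dropped_client(line):
    """Return (limit, client) for a dropped-connection line, else None."""
    match = DROP_RE.search(line)
    if match is None:
        return None
    return int(match.group("limit")), match.group("client")


if __name__ == "__main__":
    sample = (
        "2012-11-15 08:40:17.614+0000: 26234: error : "
        "virNetServerDispatchNewClient:246 : Too many active clients (20), "
        "dropping connection from 127.0.0.1;0"
    )
    print(parse_dropped_client(sample))
```

Lines that do not match, such as the virsh-side "no valid connection" errors, return None and can be skipped.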

Comment 10 weizhang 2012-11-15 12:19:34 UTC

Thanks to Michal for the help.

Verified as passing with:
qemu-kvm-0.12.1.2-2.295.el6.x86_64
kernel-2.6.32-279.el6.x86_64
libvirt-0.9.10-21.el6_3.6.x86_64


Steps
On one console, run:
# while true; do for i in {1..10}; do virsh create /tmp/test$i.xml; done; for i in {1..10}; do virsh destroy test$i; done; done

On another console, run:
# while true; do virsh attach-disk tt /var/lib/libvirt/images/disk.img vdb; sleep 2; virsh detach-disk tt vdb; sleep 2; done

After about 1.5 hours, libvirtd was still running with no errors.


The bug can be reproduced on
libvirt-0.10.1-2.el6.x86_64

After about 1.5 hours, libvirtd crashed.

Comment 12 errata-xmlrpc 2012-11-22 09:40:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-1484.html
