Bug 903238 - Concurrency/locking causes segfault
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: libvirt
Version: 6.4
Hardware: x86_64 Linux
Priority: high, Severity: urgent
: rc
: ---
Assigned To: Michal Privoznik
Virtualization Bugs
: ZStream
Depends On: 892649 892901
Blocks: 915353
 
Reported: 2013-01-23 09:44 EST by Dave Allan
Modified: 2016-04-26 11:44 EDT
11 users

See Also:
Fixed In Version: libvirt-0.10.2-19.el6
Doc Type: Bug Fix
Doc Text:
Occasionally, when users ran multiple virsh create/destroy loops, a race condition could occur and libvirtd terminated unexpectedly with a segmentation fault. Spurious error messages reporting that the domain had already been destroyed were also returned to the caller. With this update, the outlined script runs and completes without libvirtd crashing.
Story Points: ---
Clone Of: 892901
Environment:
Last Closed: 2013-11-21 03:41:16 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
I modified the script to match my environment (4.73 KB, application/x-perl)
2013-07-19 04:41 EDT, zhenfeng wang

Description Dave Allan 2013-01-23 09:44:02 EST
+++ This bug was initially created as a clone of Bug #892901 +++

+++ This bug was initially created as a clone of Bug #892649 +++

Description of problem:

When running multiple virsh create/destroy loops, sometimes (if the timing is right) a segfault will occur, causing libvirtd to crash. 

Version-Release number of selected component (if applicable):

This problem was introduced with v0.9.12. I cannot reproduce this issue under v0.9.11.X or older. I am able to reproduce this problem as well with the latest code from master.

How reproducible:

This posting has the steps to reproduce the problem:

http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html

Steps to Reproduce:
1. Go to above link, follow steps outlined.
  
Actual results:

When the script is running and doing its operations with libvirtd, within 10 or 20 minutes libvirtd will segfault. 

Expected results:

The script outlined above runs and completes without libvirtd crashing.

Additional info:

All additional info is in the list, including multiple GDB outputs from the crashes I reproduced. In addition, there was a patch by Michal Privoznik (http://www.redhat.com/archives/libvir-list/2012-December/msg01372.html) that attempted to fix this problem; however, the issue still occurs after applying this patch on top of v1.0.0 or v1.0.1.

Here was Michal's response after I told him his patch wasn't working for me:

http://www.redhat.com/archives/libvir-list/2012-December/msg01378.html

--- Additional comment from Scott Sullivan on 2013-01-22 12:33:46 EST ---

As the original reporter of this bug, I can say for me at least this issue was fixed with this commit:

http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=81621f3e6e45e8681cc18ae49404736a0e772a11
Comment 1 Michal Privoznik 2013-01-23 10:34:26 EST
Moving to POST:

commit 81621f3e6e45e8681cc18ae49404736a0e772a11
Author:     Daniel P. Berrange <berrange@redhat.com>
AuthorDate: Fri Jan 18 14:33:51 2013 +0000
Commit:     Daniel P. Berrange <berrange@redhat.com>
CommitDate: Fri Jan 18 15:45:38 2013 +0000

    Fix race condition when destroying guests
    
    When running virDomainDestroy, we need to make sure that no other
    background thread cleans up the domain while we're doing our work.
    This can happen if we release the domain object while in the
    middle of work, because the monitor might detect EOF in this window.
    For this reason we have a 'beingDestroyed' flag to stop the monitor
    from doing its normal cleanup. Unfortunately this flag was only
    being used to protect qemuDomainBeginJob, and not qemuProcessKill
    
    This left open a race condition where either libvirtd could crash,
    or alternatively report bogus error messages about the domain already
    having been destroyed to the caller
    
    Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

v1.0.1-349-g81621f3
Comment 4 zhenfeng wang 2013-07-19 04:39:25 EDT
Hi Dave,
I need to verify this bug on the latest libvirt version; however, I couldn't reproduce it on my machine, even though I ran the script many times and for a long time (a whole night and half a day). Can you help me check whether I did something wrong while trying to reproduce it? Thanks.

1. My environment info:
kernel-2.6.32-358.el6.x86_64
libvirt-0.10.2-18.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64

2. Run the script from the link above:
http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html

3. I always hit the following error when running the script for a long time:

####
Shutting down vg_ssd...
	virsh destroy tnrekhko
	virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

Starting vg_ssd...
	virsh create /tmp/tnrekhko.cfg
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

	virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

Removing vg_ssd...
	virsh destroy tnrekhko
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer

	virsh list
	lvremove -f /dev/vg_ssd/tnrekhko
  Logical volume "sxxdnpbj" successfully removed
####
# service libvirtd status
libvirtd (pid  4613) is running...
Comment 5 zhenfeng wang 2013-07-19 04:41:20 EDT
Created attachment 775681 [details]
I modified the script to match my environment
Comment 6 Michal Privoznik 2013-07-19 05:05:05 EDT
The problem is that your libvirt connection gets closed for some reason. But since we have the same patch in 6.4.z and have verified it there successfully, I don't expect this one to be different.
Comment 7 zhenfeng wang 2013-07-23 21:52:59 EDT
I can reproduce this bug with libvirt-0.10.2-18.el6.x86_64.
Following the steps from
http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html
Then, after running the script for about 20 minutes, libvirtd crashed:

# virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused

# service libvirtd status
libvirtd dead but pid file exists


Then I retested the above steps with libvirt-0.10.2-19.el6 and ran the script for about 1 hour; libvirtd kept running the whole time, so this bug can be marked VERIFIED.
Comment 9 errata-xmlrpc 2013-11-21 03:41:16 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1581.html
