Occasionally, when users ran multiple virsh create/destroy loops, a race condition could occur and libvirtd terminated unexpectedly with a segmentation fault. The race could also cause bogus error messages reporting to the caller that the domain had already been destroyed. With this update, the race has been fixed, and the reproducer script described below runs to completion without libvirtd crashing.
+++ This bug was initially created as a clone of Bug #892901 +++
+++ This bug was initially created as a clone of Bug #892649 +++
Description of problem:
When running multiple virsh create/destroy loops, sometimes (if the timing is right) a segfault will occur, causing libvirtd to crash.
Version-Release number of selected component (if applicable):
This problem was introduced with v0.9.12. I cannot reproduce this issue under v0.9.11.X or older. I am able to reproduce this problem as well with the latest code from master.
How reproducible:
This posting has the steps to reproduce the problem:
http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html
Steps to Reproduce:
1. Go to the link above and follow the steps outlined there.
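For reference, the linked script drives virsh and LVM from the shell. Below is a minimal sketch of the same create/destroy pressure using the libvirt C API; it is not the original reproducer, the connection URI and domain XML path are placeholders, and it assumes the usual build line of gcc repro.c -o repro $(pkg-config --cflags --libs libvirt):
####
#include <stdio.h>
#include <stdlib.h>
#include <libvirt/libvirt.h>

/* Read a domain XML definition into a malloc'd, NUL-terminated buffer. */
static char *read_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    char *buf = NULL;
    long len;

    if (!fp)
        return NULL;
    if (fseek(fp, 0, SEEK_END) == 0 && (len = ftell(fp)) >= 0) {
        rewind(fp);
        buf = malloc(len + 1);
        if (buf && fread(buf, 1, len, fp) == (size_t)len) {
            buf[len] = '\0';
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(fp);
    return buf;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /path/to/domain.xml\n", argv[0]);
        return EXIT_FAILURE;
    }

    char *xml = read_file(argv[1]);
    if (!xml) {
        fprintf(stderr, "cannot read %s\n", argv[1]);
        return EXIT_FAILURE;
    }

    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn) {
        fprintf(stderr, "failed to connect to libvirtd\n");
        free(xml);
        return EXIT_FAILURE;
    }

    /* Create a transient guest and destroy it immediately, in a loop;
     * with the unfixed race, libvirtd segfaults within 10-20 minutes. */
    for (;;) {
        virDomainPtr dom = virDomainCreateXML(conn, xml, 0);
        if (!dom) {
            fprintf(stderr, "create failed\n");
            break;
        }
        if (virDomainDestroy(dom) < 0)
            fprintf(stderr, "destroy reported an error\n");
        virDomainFree(dom);
    }

    virConnectClose(conn);
    free(xml);
    return EXIT_SUCCESS;
}
####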
Actual results:
While the script is running its operations against libvirtd, libvirtd will segfault within 10 or 20 minutes.
Expected results:
The outlined script runs to completion without libvirtd crashing.
Additional info:
All additional info is in the mailing-list thread, including multiple GDB outputs from the crashes I reproduced. In addition, there was a patch by Michal Privoznik (http://www.redhat.com/archives/libvir-list/2012-December/msg01372.html) that attempted to fix this problem; however, the issue still occurs after applying this patch on top of v1.0.0 or v1.0.1.
Here was Michal's response once I told him his patch wasn't working for me:
http://www.redhat.com/archives/libvir-list/2012-December/msg01378.html
--- Additional comment from Scott Sullivan on 2013-01-22 12:33:46 EST ---
As the original reporter of this bug, I can say for me at least this issue was fixed with this commit:
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=81621f3e6e45e8681cc18ae49404736a0e772a11
Moving to POST:
commit 81621f3e6e45e8681cc18ae49404736a0e772a11
Author: Daniel P. Berrange <berrange>
AuthorDate: Fri Jan 18 14:33:51 2013 +0000
Commit: Daniel P. Berrange <berrange>
CommitDate: Fri Jan 18 15:45:38 2013 +0000
Fix race condition when destroying guests
When running virDomainDestroy, we need to make sure that no other
background thread cleans up the domain while we're doing our work.
This can happen if we release the domain object while in the
middle of work, because the monitor might detect EOF in this window.
For this reason we have a 'beingDestroyed' flag to stop the monitor
from doing its normal cleanup. Unfortunately this flag was only
being used to protect qemuDomainBeginJob, and not qemuProcessKill
This left open a race condition where either libvirtd could crash,
or alternatively report bogus error messages about the domain already
having been destroyed to the caller
Signed-off-by: Daniel P. Berrange <berrange>
v1.0.1-349-g81621f3
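For readers of the commit message above, here is a rough sketch of the guard it describes. This is simplified stand-in code, not the actual libvirt qemu driver; names such as destroy_domain() and teardown() are placeholders for the real helpers (qemuDomainDestroyFlags, qemuProcessKill, qemuProcessStop, the monitor EOF callback). The point is that the destroy path keeps the beingDestroyed flag set across the whole kill sequence, so the monitor thread that sees EOF skips its own cleanup instead of racing with it. Compiles with gcc -pthread:
####
#include <stdbool.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

struct domain {
    pthread_mutex_t lock;
    bool beingDestroyed;   /* held true for the whole destroy sequence */
    bool active;
};

/* Placeholder for the real cleanup; running it twice, or concurrently
 * from two threads, is what used to crash libvirtd. */
static void teardown(struct domain *dom, const char *who)
{
    printf("%s: tearing down domain\n", who);
    dom->active = false;
}

/* Destroy path (the virDomainDestroy analogue). */
static void destroy_domain(struct domain *dom)
{
    pthread_mutex_lock(&dom->lock);
    dom->beingDestroyed = true;        /* claim the cleanup before killing */

    /* Killing the QEMU process means dropping the lock while waiting for
     * it to exit; this is the window in which the monitor sees EOF. */
    pthread_mutex_unlock(&dom->lock);
    usleep(1000);                      /* stand-in for the kill/wait */
    pthread_mutex_lock(&dom->lock);

    if (dom->active)
        teardown(dom, "destroy");
    else  /* monitor got there first: the "already destroyed" error case */
        printf("destroy: domain already gone\n");

    dom->beingDestroyed = false;
    pthread_mutex_unlock(&dom->lock);
}

/* Monitor thread: reacts to EOF on the QEMU monitor socket. */
static void *handle_monitor_eof(void *opaque)
{
    struct domain *dom = opaque;

    pthread_mutex_lock(&dom->lock);
    if (dom->beingDestroyed) {
        /* The destroy API owns the cleanup now; without this check both
         * threads would tear the domain down and libvirtd could crash. */
        printf("monitor: EOF ignored, destroy in progress\n");
    } else if (dom->active) {
        teardown(dom, "monitor");
    }
    pthread_mutex_unlock(&dom->lock);
    return NULL;
}

int main(void)
{
    struct domain dom = { PTHREAD_MUTEX_INITIALIZER, false, true };
    pthread_t eof_thread;

    /* Start destroying while the "monitor" notices EOF concurrently. */
    pthread_create(&eof_thread, NULL, handle_monitor_eof, &dom);
    destroy_domain(&dom);
    pthread_join(eof_thread, NULL);
    return 0;
}
####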
Hi Dave,
I need to verify this bug on the latest libvirt version. However, I couldn't reproduce it on my machine, even after running the script many times and for a long time (a whole night and half a day). Can you help me check whether I am doing something wrong in my reproduction steps? Thanks.
1. My environment info:
kernel-2.6.32-358.el6.x86_64
libvirt-0.10.2-18.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64
2. Run the script from the link above:
http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html
3. I always hit the following error after running the script for a long time:
####
Shutting down vg_ssd...
virsh destroy tnrekhko
virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer
Starting vg_ssd...
virsh create /tmp/tnrekhko.cfg
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer
virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer
Removing vg_ssd...
virsh destroy tnrekhko
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Cannot recv data: Connection reset by peer
virsh list
lvremove -f /dev/vg_ssd/tnrekhko
Logical volume "sxxdnpbj" successfully removed
####
# service libvirtd status
libvirtd (pid 4613) is running...
The problem is, your libvirt connection gets closed for some reason. But since we have the same patch in 6.4.z and we've verified it there successfully, I don't expect this one to be different.
I can reproduce this bug with libvirt-0.10.2-18.el6.x86_64.
Following the steps from
http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html
Then I ran the script; about 20 minutes later, libvirtd crashed:
# virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
# service libvirtd status
libvirtd dead but pid file exists
Then I retested the above steps with libvirt-0.10.2-19.el6, running the script for about 1 hour; libvirtd stayed running the whole time, so this bug can be marked VERIFIED.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1581.html