Bug 903238
Summary: Concurrency/locking causes segfault

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dave Allan <dallan> |
| Component: | libvirt | Assignee: | Michal Privoznik <mprivozn> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Priority: | high |
| Version: | 6.4 | CC: | acathrow, cpelland, cwei, dallan, dyuan, jdenemar, mprivozn, mzhan, ssullivan, ydu, zhwang |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | Hardware: | x86_64 |
| OS: | Linux | Fixed In Version: | libvirt-0.10.2-19.el6 |
| Doc Type: | Bug Fix | Story Points: | --- |
| Clone Of: | 892901 | Type: | Bug |
| Last Closed: | 2013-11-21 08:41:16 UTC | Regression: | --- |
| Bug Depends On: | 892649, 892901 | Bug Blocks: | 915353 |

Doc Text: Occasionally, when users ran multiple virsh create/destroy loops, a race condition could occur and libvirtd would terminate unexpectedly with a segmentation fault; alternatively, false error messages reporting that the domain had already been destroyed were returned to the caller. With this update, the outlined script runs to completion without libvirtd crashing.
Description (Dave Allan, 2013-01-23 14:44:02 UTC)
Moving to POST:

    commit 81621f3e6e45e8681cc18ae49404736a0e772a11
    Author:     Daniel P. Berrange <berrange>
    AuthorDate: Fri Jan 18 14:33:51 2013 +0000
    Commit:     Daniel P. Berrange <berrange>
    CommitDate: Fri Jan 18 15:45:38 2013 +0000

        Fix race condition when destroying guests

        When running virDomainDestroy, we need to make sure that no other
        background thread cleans up the domain while we're doing our work.
        This can happen if we release the domain object while in the middle
        of work, because the monitor might detect EOF in this window. For
        this reason we have a 'beingDestroyed' flag to stop the monitor
        from doing its normal cleanup. Unfortunately this flag was only
        being used to protect qemuDomainBeginJob, and not qemuProcessKill.
        This left open a race condition where either libvirtd could crash,
        or alternatively report bogus error messages about the domain
        already having been destroyed to the caller.

        Signed-off-by: Daniel P. Berrange <berrange>

    v1.0.1-349-g81621f3

---

Hi Dave,

I need to verify this bug on the latest libvirt version; however, I couldn't reproduce it on my machine, even though I ran the script many times and for a long time (the whole night and half a day). Can you help me check whether I did something wrong during my reproduction attempt? Thanks.

1. My environment info:

    kernel-2.6.32-358.el6.x86_64
    libvirt-0.10.2-18.el6.x86_64
    qemu-kvm-rhev-0.12.1.2-2.355.el6_4.2.x86_64

2. Run the script from http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html

3. I always hit the following error after running the script for a long time:

    ####
    Shutting down vg_ssd...
    virsh destroy tnrekhko
    virsh list
    error: Failed to reconnect to the hypervisor
    error: no valid connection
    error: Cannot recv data: Connection reset by peer
    Starting vg_ssd...
    virsh create /tmp/tnrekhko.cfg
    error: Failed to reconnect to the hypervisor
    error: no valid connection
    error: Cannot recv data: Connection reset by peer
    virsh list
    error: Failed to reconnect to the hypervisor
    error: no valid connection
    error: Cannot recv data: Connection reset by peer
    Removing vg_ssd...
    virsh destroy tnrekhko
    error: Failed to reconnect to the hypervisor
    error: no valid connection
    error: Cannot recv data: Connection reset by peer
    virsh list
    lvremove -f /dev/vg_ssd/tnrekhko
      Logical volume "sxxdnpbj" successfully removed
    ####

    # service libvirtd status
    libvirtd (pid 4613) is running...

Created attachment 775681 [details]
I modified the script to match my environment.
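The actual stress script lives at the list post and attachment above and is not reproduced here. As a rough illustration only, a create/destroy loop of the same general shape might look like the sketch below; the domain name, config path, and iteration count are made-up placeholders, and it defaults to a dry run (set `VIRSH=virsh` to actually drive libvirt):

```shell
#!/bin/sh
# Hedged sketch of a virsh create/destroy stress loop -- NOT the attached
# script. DOM, CFG, and ITERATIONS are invented defaults for illustration.
# VIRSH defaults to "echo virsh" so the loop only prints the commands;
# override with VIRSH=virsh to exercise a real libvirtd.
VIRSH=${VIRSH:-echo virsh}
DOM=${DOM:-testdom}
CFG=${CFG:-/tmp/$DOM.cfg}
i=0
while [ "$i" -lt "${ITERATIONS:-5}" ]; do
    $VIRSH create "$CFG"
    $VIRSH list
    $VIRSH destroy "$DOM"
    # With the unfixed libvirt, libvirtd could segfault in this window;
    # later virsh calls then fail with "Failed to reconnect to the
    # hypervisor".
    i=$((i + 1))
done
echo "completed $i iterations"
```

Tight loops like this widen the window between `destroy` issuing `qemuProcessKill` and the monitor seeing EOF, which is why they surface the race where a single create/destroy usually does not.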
The problem is that your libvirt connection gets closed for some reason. But since we have the same patch in 6.4.z and we have verified it there successfully, I don't expect this one to be different.

---

I can reproduce this bug with libvirt-0.10.2-18.el6.x86_64, following the steps from http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html. After running the script for about 20 minutes, libvirtd crashed:

    # virsh list
    error: Failed to reconnect to the hypervisor
    error: no valid connection
    error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused

    # service libvirtd status
    libvirtd dead but pid file exists

I then retested the same steps with libvirt-0.10.2-19.el6, running the script for about 1 hour; libvirtd always stayed in running status, so this bug can be marked VERIFIED.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1581.html