Bug 892649

Summary: Concurrency/locking causes segfault
Product: [Community] Virtualization Tools Reporter: Scott Sullivan <ssullivan>
Component: libvirt Assignee: Michal Privoznik <mprivozn>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: medium Docs Contact:
Priority: unspecified    
Version: unspecified CC: dallan, dyasny
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 892901 Environment:
Last Closed: 2013-01-23 15:31:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 892901, 903238    

Description Scott Sullivan 2013-01-07 14:04:38 UTC
Description of problem:

When running multiple virsh create/destroy loops, sometimes (if the timing is right) a segfault will occur, causing libvirtd to crash. 

Version-Release number of selected component (if applicable):

This problem was introduced with v0.9.12. I cannot reproduce this issue under v0.9.11.X or older. I am able to reproduce this problem as well with the latest code from master.

How reproducible:

This posting has the steps to reproduce the problem:

http://www.redhat.com/archives/libvir-list/2012-December/msg01365.html

Steps to Reproduce:
1. Go to the above link and follow the steps outlined.
  
Actual results:

When the script is running and doing its operations with libvirtd, within 10 or 20 minutes libvirtd will segfault. 

Expected results:

The script outlined runs to completion without libvirtd crashing.

Additional info:

All additional info is on the list, including GDB output from several of the crashes I reproduced. In addition, Michal Privoznik posted a patch (http://www.redhat.com/archives/libvir-list/2012-December/msg01372.html) that attempted to fix this problem; however, the issue still occurs after applying this patch on top of v1.0.0 or v1.0.1.

Here is Michal's response after I told him his patch wasn't working for me:

http://www.redhat.com/archives/libvir-list/2012-December/msg01378.html

Comment 1 Michal Privoznik 2013-01-23 15:31:08 UTC
The fix has been pushed:

commit 81621f3e6e45e8681cc18ae49404736a0e772a11
Author:     Daniel P. Berrange <berrange>
AuthorDate: Fri Jan 18 14:33:51 2013 +0000
Commit:     Daniel P. Berrange <berrange>
CommitDate: Fri Jan 18 15:45:38 2013 +0000

    Fix race condition when destroying guests
    
    When running virDomainDestroy, we need to make sure that no other
    background thread cleans up the domain while we're doing our work.
    This can happen if we release the domain object while in the
    middle of work, because the monitor might detect EOF in this window.
    For this reason we have a 'beingDestroyed' flag to stop the monitor
    from doing its normal cleanup. Unfortunately this flag was only
    being used to protect qemuDomainBeginJob, and not qemuProcessKill
    
    This left open a race condition where either libvirtd could crash,
    or alternatively report bogus error messages about the domain already
    having been destroyed to the caller
    
    Signed-off-by: Daniel P. Berrange <berrange>


v1.0.1-349-g81621f3
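
For readers unfamiliar with the pattern the commit message describes, here is a minimal sketch of it in Python. It is not libvirt code: the Domain class, monitor_eof, destroy, and the honor_flag switch are all hypothetical names made up for illustration, and the join() only serves to force the problematic interleaving deterministically. The idea is that the destroy path sets a "being destroyed" flag before dropping the lock, and the monitor's EOF handler skips its cleanup when it sees that flag.

```python
import threading


class Domain:
    """Hypothetical stand-in for a guest domain object (not libvirt's struct)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.being_destroyed = False  # models the 'beingDestroyed' idea
        self.active = True
        self.cleanups = 0  # a value above 1 models the double teardown

    def cleanup(self):
        # Caller must hold self.lock.
        self.active = False
        self.cleanups += 1


def monitor_eof(dom, honor_flag):
    """Background monitor path: the guest process has gone away (EOF)."""
    with dom.lock:
        if honor_flag and dom.being_destroyed:
            return  # the destroy call owns the teardown, so do nothing
        if dom.active:
            dom.cleanup()


def destroy(dom, honor_flag):
    """Destroy path; returns how many times teardown ran."""
    with dom.lock:
        dom.being_destroyed = True
    # The lock is released here, which models the window in which the guest
    # process is being killed. The monitor sees EOF inside that window;
    # joining the thread makes the interleaving deterministic for this demo.
    t = threading.Thread(target=monitor_eof, args=(dom, honor_flag))
    t.start()
    t.join()
    with dom.lock:
        dom.cleanup()
        dom.being_destroyed = False
    return dom.cleanups


print("flag honored:", destroy(Domain(), honor_flag=True))
print("flag ignored:", destroy(Domain(), honor_flag=False))
```

With the flag honored, teardown runs once. With it ignored, both the monitor and the destroy path tear the domain down, which is the kind of double cleanup that, per the commit message, could crash libvirtd or produce bogus "already destroyed" errors.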