Bug 794691

Summary: qemu-kvm core dumps when being killed
Product: Red Hat Enterprise Linux 6 Reporter: Xiaoqing Wei <xwei>
Component: qemu-kvm Assignee: Kevin Wolf <kwolf>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.3 CC: acathrow, areis, bsarathy, juzhang, michen, mkenneth, shuang, tburke, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-04-12 10:24:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
gdb thread apply all bt full none

Description Xiaoqing Wei 2012-02-17 10:29:33 UTC
Description of problem:
qemu-kvm core dumps when being killed

Version-Release number of selected component (if applicable):

qemu-kvm-0.12.1.2-2.231.el6.x86_64
How reproducible:
Hit it twice in 20 attempts.

Steps to Reproduce:
1. Install a win7 32-bit guest with:
qemu-kvm -monitor stdio -chardev socket,id=serial_id_20120206-132300-0aWu,path=/tmp/serial-20120206,server,nowait -device isa-serial,chardev=serial_id_20120206-132300-0aWu -drive file='win7-32.qcow2',index=0,if=none,id=drive-ide0-0-0,media=disk,cache=none,format=qcow2,aio=native -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device rtl8139,netdev=idf3e0Fy,mac=9a:38:bf:b1:ae:f9,id=ndev00idf3e0Fy,bus=pci.0,addr=0x3 -netdev tap,id=idf3e0Fy,fd=21 -m 4G -smp 4,cores=1,threads=2,sockets=2 -drive file='619077.iso',index=1,if=none,id=drive-ide0-0-1,media=cdrom,readonly=on,format=raw -device ide-drive,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -drive file='winutils.iso',index=2,if=none,id=drive-ide0-1-0,media=cdrom,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file='virtio-win.iso',index=3,if=none,id=drive-ide0-1-1,media=cdrom,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -cpu host -fda 'answer.vfd' -spice port=8000,disable-ticketing -vga qxl -rtc base=localtime,clock=host,driftfix=slew -M rhel6.3.0 -boot order=cdn,once=d,menu=off   -enable-kvm
2. Kill the qemu-kvm process.
3. Run qemu-img check win7-32.qcow2:
...
Leaked cluster 40039 refcount=1 reference=0
Leaked cluster 40040 refcount=1 reference=0
ERROR cluster 40041 refcount=1 reference=6
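
Note on reading this output: a "Leaked" line means the stored refcount is higher than the number of references the check found (wasted space), while an "ERROR" line here shows a refcount lower than the number of references found, which is the case that indicates corruption. A minimal Python sketch for sorting the lines into the two groups follows. The line pattern is taken only from the three lines quoted above, and the helper name classify_check_output is hypothetical; other qemu-img versions may print different messages.

```python
import re

# Line shape assumed from the qemu-img check output quoted in this report.
LINE_RE = re.compile(
    r"^(?P<kind>Leaked|ERROR) cluster (?P<cluster>\d+) "
    r"refcount=(?P<refcount>\d+) reference=(?P<reference>\d+)$"
)


def classify_check_output(text):
    """Split check lines into leaked clusters and real errors.

    Each entry is a (cluster, refcount, reference) tuple.
    Lines that do not match the assumed pattern are ignored.
    """
    leaked, errors = [], []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        entry = (int(m["cluster"]), int(m["refcount"]), int(m["reference"]))
        (leaked if m["kind"] == "Leaked" else errors).append(entry)
    return leaked, errors


sample = """\
Leaked cluster 40039 refcount=1 reference=0
Leaked cluster 40040 refcount=1 reference=0
ERROR cluster 40041 refcount=1 reference=6
"""

leaked, errors = classify_check_output(sample)
print("leaked:", leaked)
print("errors:", errors)
```

An empty errors list with a non-empty leaked list corresponds to the "leaked clusters only" case, which wastes space but does not by itself mean data loss.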

  
Actual results:
qemu-kvm dumps core and the disk image is corrupted.

Expected results:
qemu-kvm is killed without a core dump.

Additional info:

kernel-2.6.32-220.el6.x86_64
qemu-kvm-0.12.1.2-2.231.el6.x86_64
qemu-img-0.12.1.2-2.231.el6.x86_64

Comment 2 Xiaoqing Wei 2012-02-17 10:31:43 UTC
Created attachment 563879 [details]
gdb thread apply all bt full

Comment 3 Kevin Wolf 2012-03-14 10:37:28 UTC
Does "killing" qemu mean sending SIGTERM here? At which point during the installation do you kill it? How did you create the image file (size, options)?

Comment 4 Xiaoqing Wei 2012-03-14 12:04:09 UTC
(In reply to comment #3)
> "killing" qemu means sending SIGTERM here? At which point during the
   kill -9 $qemu-pid

> installation do you kill it? How did you create the image file (size, options)?

I hit it once or twice manually, when the copy process was starting.

The images were created with:
qemu-img create -f qcow2 xxx.qcow2 20G

Comment 5 Kevin Wolf 2012-03-14 12:36:37 UTC
How can qemu-kvm abort and generate a core dump when you used SIGKILL?
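
For context on why this matters: SIGKILL cannot be caught, and a process that dies from it is reported as killed by signal 9, with no abort() and no core dump involved. The Python sketch below illustrates that on a throwaway child process; it is not qemu-specific, and the sleeping child is only a stand-in for a long-running process.

```python
import signal
import subprocess
import sys

# Start a stand-in child that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

# SIGKILL cannot be caught or ignored, so no startup handshake is needed.
child.send_signal(signal.SIGKILL)
rc = child.wait()

# subprocess reports death by signal N as a return code of -N.
print("terminated by", signal.Signals(-rc).name)
```

If a backtrace shows the process dying from signal 6 instead, the termination did not come directly from kill -9.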

Comment 6 Xiaoqing Wei 2012-03-23 12:48:01 UTC
(In reply to comment #5)
> How can qemu-kvm abort and generate a core dump when you used SIGKILL?

Oops, maybe it was SIGTERM (kill -15).

I found that it can sometimes be reproduced when installing the guest with autotest and then pressing Ctrl+C to end the autotest process; that might send signal 15 to qemu-kvm instead of 9.

Comment 7 Kevin Wolf 2012-04-11 13:40:15 UTC
This may be related to bug 798857.

Can you please try if the following scratch build fixes the problem? https://brewweb.devel.redhat.com/taskinfo?taskID=4281327

Comment 8 Xiaoqing Wei 2012-04-12 01:45:21 UTC
(In reply to comment #7)
> This may be related to bug 798857.
> 
> Can you please try if the following scratch build fixes the problem?
> https://brewweb.devel.redhat.com/taskinfo?taskID=4281327

Sure, will update then.

Comment 9 Xiaoqing Wei 2012-04-12 06:21:52 UTC
(In reply to comment #7)
> This may be related to bug 798857.
> 
> Can you please try if the following scratch build fixes the problem?
> https://brewweb.devel.redhat.com/taskinfo?taskID=4281327

Still able to reproduce with
qemu-kvm-0.12.1.2-2.272.el6.kwolf_drain_on_close_3.x86_64:
1) This time I used kill -6 and it reproduced.
2) I tried 10+ installations killed with Ctrl-C and could not reproduce it.


By the way, I re-checked the gdb output attached to this bug, and it seems the process was killed by signal 6 at the time of the report. Sorry for the incorrect info in comment 6.

Comment 10 Kevin Wolf 2012-04-12 09:06:16 UTC
Signal 6 is SIGABRT, a core dump is expected there.

What shouldn't happen is corrupted images. Do you still get messages like "ERROR cluster 40041 refcount=1 reference=6" in qemu-img check?

Comment 11 Kevin Wolf 2012-04-12 09:09:57 UTC
Oh, and in the original case the SIGABRT is not what you did. I believe you really did a SIGTERM and qemu tried to shut down in response. It's just that during the shutdown something went wrong (an assertion failed) and qemu called abort(), which uses SIGABRT internally.
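
A rough illustration of that sequence, as a Python sketch rather than anything taken from qemu: a child process installs a SIGTERM handler whose "shutdown path" calls abort(), and the parent then sees the child terminated by SIGABRT even though it only sent SIGTERM. The handler name on_term and the "ready" handshake are made up for this example, and core dumps are disabled in the child so the demo does not leave a core file behind.

```python
import signal
import subprocess
import sys

# Child: the SIGTERM handler stands in for a shutdown path that hits a
# failed assertion and calls abort().
CHILD = """
import os, resource, signal, time
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core file for the demo

def on_term(signum, frame):
    os.abort()  # raises SIGABRT in this process

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
time.sleep(30)
"""

child = subprocess.Popen(
    [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
)

# Wait until the handler is installed before sending the signal.
ready = child.stdout.readline().strip()

child.send_signal(signal.SIGTERM)
rc = child.wait()
child.stdout.close()

# The parent sent SIGTERM, but the child is reported as killed by SIGABRT.
print("sent SIGTERM, child terminated by", signal.Signals(-rc).name)
```

This matches the pattern in the attached backtrace: the signal recorded at death is the one raised by abort(), not the one originally delivered.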

Comment 12 Xiaoqing Wei 2012-04-12 09:47:15 UTC
(In reply to comment #10)
> Signal 6 is SIGABRT, a core dump is expected there.
> 
> What shouldn't happen is corrupted images. Do you still get messages like
> "ERROR cluster 40041 refcount=1 reference=6" in qemu-img check?

It shows "Leaked clusters were noticed during image check. No data integrity problem was found though." this time.

Comment 13 Kevin Wolf 2012-04-12 10:24:02 UTC
Thanks for testing. It seems to be fixed by this patch then.

*** This bug has been marked as a duplicate of bug 798857 ***