Bug 699556

Summary: Executing system_powerdown caused 6.1-32 guest soft lockup - CPU#3 stuck for 63s! [migration/3:16]
Product: Red Hat Enterprise Linux 6
Reporter: Amos Kong <akong>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
Status: CLOSED WORKSFORME
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: medium
Version: 6.1
CC: ailan, ddutile, juzhang, michen, mkenneth, mshao, shuang, tburke, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-28 03:28:17 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 668775, 1225598
Bug Blocks:
Attachments:
Description Flags
serial output (it contains panic info) none

Description Amos Kong 2011-04-26 00:07:43 UTC
Description of problem:
Executing 'system_powerdown' through the monitor to shut down a rhel6.1-32 guest caused a guest soft lockup.


Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2-2.158.el6.x86_64
guest kernel: 2.6.32-131.0.1.el6.x86_64
host kernel: 2.6.32-131.0.1.el6.x86_64

How reproducible:
not always

Steps to Reproduce:
1. Boot up a rhel6.1-32 guest
2. Execute the monitor command:
(qemu) system_powerdown
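
The command line under "Additional info" exposes only a QMP monitor socket (mode=control), so step 2 can also be driven from a script. The following is a minimal sketch, assuming the QMP `system_powerdown` command is used as the equivalent of the human-monitor command above; the helper names (`qmp_command`, `read_reply`, `system_powerdown`) are hypothetical and not part of the original test setup.

```python
import json
import socket


def qmp_command(name, arguments=None):
    """Encode one QMP command as a newline-terminated JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode("utf-8")


def read_reply(stream):
    """Return the next message that is not an asynchronous event."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        msg = json.loads(line)
        if "event" not in msg:
            return msg


def system_powerdown(path):
    """Connect to the QMP UNIX socket at `path` and request a powerdown."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        stream = sock.makefile("rwb")
        read_reply(stream)  # greeting sent by QEMU on connect
        # Capabilities must be negotiated before any other command.
        for cmd in ("qmp_capabilities", "system_powerdown"):
            stream.write(qmp_command(cmd))
            stream.flush()
            reply = read_reply(stream)
            if "error" in reply:
                raise RuntimeError(reply["error"])
```

Calling `system_powerdown()` with the path given to `-chardev socket,...,path=` on the qemu-kvm command line sends the request once; the low reproduction rate means it would have to be wrapped in a boot/powerdown loop to be useful.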

  
Actual results:
guest soft-lockup
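
When the test is repeated many times, the guest's serial log can be scanned for the lockup message automatically. This is a sketch only: the pattern is taken from the message quoted in the bug summary, and the function name `find_soft_lockups` is hypothetical.

```python
import re

# Pattern modelled on the message quoted in the bug summary:
# "soft lockup - CPU#3 stuck for 63s! [migration/3:16]"
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+):(\d+)\]"
)


def find_soft_lockups(text):
    """Return (cpu, seconds, task, pid) for each soft-lockup line in text."""
    return [
        (int(cpu), int(secs), task, int(pid))
        for cpu, secs, task, pid in SOFT_LOCKUP.findall(text)
    ]
```

An empty list means the serial output contains no message in this format; it does not rule out other kinds of hang.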

Expected results:
The guest shuts down successfully.

Additional info:
# qemu-kvm -name vm1 -chardev socket,id=qmp_monitor_id_qmpmonitor1,path=/tmp/monitor-qmpmonitor1-20110426-075054-rHtp,server,nowait -mon chardev=qmp_monitor_id_qmpmonitor1,mode=control -chardev socket,id=serial_id_20110426-075054-rHtp,path=/tmp/serial-20110426-075054-rHtp,server,nowait -device isa-serial,chardev=serial_id_20110426-075054-rHtp -drive file=/home/devel/autotest-devel/client/tests/kvm/images/RHEL-Server-6.1-32-virtio.qcow2,index=0,if=none,id=drive-virtio-disk1,media=disk,cache=none,format=qcow2,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -device virtio-net-pci,netdev=idgaclnb,mac=9a:8a:ec:e1:06:5e,id=ndev00idgaclnb,bus=pci.0,addr=0x3 -netdev tap,id=idgaclnb,vhost=on,ifname=t0-075054-rHtp,script=/home/devel/autotest-devel/client/tests/kvm/scripts/qemu-ifup-switch,downscript=no -m 2048 -smp 2,cores=1,threads=1,sockets=2 -cpu cpu64-rhel6,+sse2,+x2apic -vnc :0 -rtc base=utc,clock=host,driftfix=none -M rhel6.1.0 -boot order=cdn,once=c,menu=off -usbdevice tablet -no-kvm-pit-reinjection -enable-kvm

Comment 1 Amos Kong 2011-04-26 00:13:40 UTC
Created attachment 494775 [details]
serial output (it contains panic info)

Comment 2 Don Dutile (Red Hat) 2011-06-08 02:58:17 UTC
This may be another case of 668775.
I've requested a brew-built RPM with the patches from 668775; we should retest with the five patches for the smp_call_fcn() lockup.

Comment 3 Dor Laor 2011-07-24 07:23:52 UTC
Please retest with the fix for 668775, which is on_qa.

Comment 4 Suqin Huang 2011-07-25 06:53:23 UTC
Repeated around 300 times; cannot reproduce it (2.6.32-131.0.1.el6.x86_64 and qemu-kvm-0.12.1.2-2.159.el6.x86_64). Will continue testing.

Comment 5 Suqin Huang 2011-07-26 10:15:49 UTC
Repeated 1000 times; cannot reproduce it. akong, can you reproduce it?
host:
2.6.32-167.el6.x86_64
qemu-kvm-0.12.1.2-2.171.el6.x86_64

guest: 2.6.32-131.0.15.el6

Comment 6 Amos Kong 2011-07-27 09:27:37 UTC
Hi shuang,

I also could not reproduce this bug with 2.6.32-131.0.15.el6.
I remember that the reproduction rate was very low.
Could you run this test with the latest guest kernel?
We can close this bz if all the tests pass. Thanks.

Comment 7 Suqin Huang 2011-07-28 03:28:17 UTC
Repeated 200 times with guest kernel 2.6.32-131.6.1.el6; cannot reproduce it. Closing.