Bug 1026142

Summary: soft lockup on reboot or poweroff
Product: [Fedora] Fedora
Component: kernel
Version: 20
Hardware: Unspecified
OS: Unspecified
Status: CLOSED INSUFFICIENT_DATA
Last Closed: 2014-03-17 18:43:09 UTC
Type: Bug
Doc Type: Bug Fix
Severity: unspecified
Priority: unspecified
Reporter: Chris Murphy <bugzilla>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: collura, gansalmon, itamar, jonathan, kernel-maint, kparal, madhu.chinakonda, satellitgo
Flags: jforbes: needinfo?
Attachments:
Description                                      Flags
virsh console output                             none
dmesg for host                                   none
virsh console output, guest 3.11.6-302 debug     none
host dmesg after sysrq+w                         none

Description Chris Murphy 2013-11-04 03:21:23 UTC
Created attachment 818863
virsh console output

Description of problem: A Fedora 20 Live Desktop Beta RC2 guest running in qemu/kvm on a Fedora 20 host consistently fails to reboot or power off, whether normally or with reboot -f or poweroff -f.

It occurs both when booting the Live Desktop Beta RC2 ISO and when booting the installed, not-yet-updated system produced from that ISO.


Version-Release number of selected component (if applicable):
Guest kernel: 3.11.6-300.fc20.x86_64
Host kernel: 3.11.6-301.fc20.x86_64
qemu: qemu-1.6.0-10.fc20.x86_64

How reproducible:
Always, with this combination. This is a regression: with Live Desktop TC6 I cannot reproduce this bug.

Steps to Reproduce:
1. virsh start fedora20 (this launches the qemu process listed below)
2. In the guest, run reboot -f or poweroff -f

Via virsh console I can capture the lockup messages. I am also logged in on that console, but it does not respond to input.


Actual results:
[root@localhost ~]# sync && reboot -f
Accepted connection on private bus.
Rebooting.
[  184.036030] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  212.036022] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  222.060036] INFO: rcu_sched self-detected stall on CPU { 1}  (t=60000 jiffies g=7656 c=7655 q=127)
[  248.036025] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  276.036027] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  304.036017] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  332.036032] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  360.036031] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  388.036032] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  402.063035] INFO: rcu_sched self-detected stall on CPU { 1}  (t=240003 jiffies g=7656 c=7655 q=1225)
[  428.036043] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  456.036041] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  484.036024] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  512.036017] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  540.036041] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  568.036016] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  582.066022] INFO: rcu_sched self-detected stall on CPU { 1}  (t=420006 jiffies g=7656 c=7655 q=1789)
[  608.036028] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  636.036028] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  664.036040] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  692.036049] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  720.036040] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  748.036035] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  762.069038] INFO: rcu_sched self-detected stall on CPU { 1}  (t=600009 jiffies g=7656 c=7655 q=2801)
[  788.036040] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  816.036035] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  844.036038] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  872.036029] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  900.036031] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  928.036031] BUG: soft lockup - CPU#1 stuck for 23s! [reboot:1724]
[  942.072021] INFO: rcu_sched self-detected stall on CPU { 1}  (t=780012 jiffies g=7656 c=7655 q=3801)
[  968.036025] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[  996.036028] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[ 1024.036037] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[ 1052.036029] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[ 1080.036013] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[ 1108.036034] BUG: soft lockup - CPU#1 stuck for 22s! [reboot:1724]
[ 1122.075040] INFO: rcu_sched self-detected stall on CPU { 1}  (t=960015 jiffies g=7656 c=7655 q=4529)
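To compare captures like the one above across attempts, the warnings can be tallied mechanically. The following is a minimal sketch, assuming the console output was saved to a file and that the messages keep exactly the format shown above; the function name summarize_lockups and its output format are hypothetical, not part of any existing tool.

```shell
# summarize_lockups (hypothetical helper): reads a saved guest console log
# on stdin and prints one summary line covering soft lockup and RCU stall
# warnings. Assumes the message format shown in the log above.
summarize_lockups() {
    awk '
        /BUG: soft lockup/ {
            n++
            # Timestamp such as "184.036030]": strip the closing bracket.
            if (match($0, /[0-9]+\.[0-9]+]/))
                ts = substr($0, RSTART, RLENGTH - 1)
            if (n == 1)
                first = ts
            last = ts
            # Last field is the stuck task as [comm:pid], e.g. [reboot:1724].
            task = $NF
        }
        /rcu_sched self-detected stall/ { stalls++ }
        END {
            printf "soft_lockups=%d rcu_stalls=%d first=%s last=%s task=%s\n",
                n + 0, stalls + 0, first, last, task
        }
    '
}
```

Usage: summarize_lockups < console.log (the file name is illustrative). Fed the 38 kernel lines above, it should print soft_lockups=32 rcu_stalls=6 first=184.036030 last=1108.036034 task=[reboot:1724].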

Expected results:
The guest reboots or powers off cleanly, without soft lockups.

Additional info:

qemu command

/usr/bin/qemu-system-x86_64 -machine accel=kvm -name fedora20 -S -machine pc-i440fx-1.6,accel=kvm,usb=off -m 1536 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid bf76abde-a035-42a2-b010-9f35c2f8f3d2 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/fedora20.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -no-kvm-pit-reinjection -no-hpet -no-shutdown -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/var/lib/libvirt/images/fedoraB.img,if=none,id=drive-virtio-disk0,format=qcow2,cache=unsafe,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/var/lib/libvirt/images/fedoraA.img,if=none,id=drive-virtio-disk1,format=qcow2,cache=unsafe,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/data/Fedora-Live-Desktop-x86_64-20-Beta-2.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw,cache=unsafe,aio=threads -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ea:fc:87,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -vnc 127.0.0.1:0 -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8

Comment 1 Chris Murphy 2013-11-04 03:24:14 UTC
Created attachment 818864
dmesg for host

The host is an Apple Inc. MacBookPro4,1, booted via EFI.

Comment 2 Chris Murphy 2013-11-04 05:01:15 UTC
Created attachment 818880
virsh console output, guest 3.11.6-302 debug

1. Updated the host kernel to 3.11.6-302; the same problem happens.
2. Updated the guest kernel to kernel-debug-3.11.6-302.fc20.x86_64.
3. Rebooted the guest, ssh'd in, and ran dmesg -n7.
4. From the host, attached with virsh console to the guest.
5. Back in the guest ssh shell, ran 'sync && reboot -f'.
6. The virsh console logged a lot of output, including many call traces; that output is this attachment.

I don't know whether the extra output comes from dmesg -n7, the debug kernel, or the debug kernel parameters.

The guest console is output-only and does not respond to input, so I can't issue sysrq+w in the guest.

Comment 3 Chris Murphy 2013-11-04 05:04:30 UTC
Created attachment 818881
host dmesg after sysrq+w

Issued sysrq+w and then dmesg on the host, right after the test in comment 2 (whose attachment is the guest output).

Comment 4 Chris Murphy 2013-11-04 05:35:20 UTC
Over ~15 reboots/shutdowns I got this soft lockup 10 times in a row. And now nothing: the guest reboots immediately and I can't reproduce it, even though I haven't changed anything. Lovely.
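Because the lockup comes and goes, a hit rate over many attempts may say more than a single run. The following is a rough sketch, assuming one saved console log per reboot attempt; the function name count_lockup_runs and the one-file-per-attempt layout are hypothetical.

```shell
# count_lockup_runs (hypothetical helper): takes one saved console log per
# reboot attempt and prints hits/attempts, where a hit is any log that
# contains a soft lockup warning.
count_lockup_runs() {
    clr_total=0
    clr_hit=0
    for f in "$@"; do
        clr_total=$((clr_total + 1))
        # grep -q prints nothing and exits 0 as soon as a match is found.
        if grep -q 'BUG: soft lockup' "$f"; then
            clr_hit=$((clr_hit + 1))
        fi
    done
    echo "$clr_hit/$clr_total"
}
```

Usage: count_lockup_runs run*.log prints something like 1/2 when only the first of two logs contains a lockup.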

Comment 5 Kamil Páral 2013-11-04 12:23:12 UTC
I can confirm this problem: VMs often fail to shut down and hog the CPU at 100%. It doesn't happen every time for me, but quite often, and I have seen it throughout the whole F20 cycle.

Comment 6 Justin M. Forbes 2014-02-24 13:53:13 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.13.4-200.fc20. Please test this kernel update and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.

Comment 7 Justin M. Forbes 2014-03-17 18:43:09 UTC
*********** MASS BUG UPDATE **************

This bug has been in a needinfo state for several weeks and is being closed with insufficient data due to inactivity. If this is still an issue with Fedora 20, please feel free to reopen the bug and provide the additional information requested.