Bug 1044540

Summary: [3.13.0-0.rc4.git0.1.fc21] Rebooting/booting of an L2 guest result in L1 (KVM guest) crash with "soft lockup"
Product: Fedora
Reporter: Kashyap Chamarthy <kchamart>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-12-18 15:52:23 UTC
Type: Bug
Attachments:
- Full dmesg output from L1 guest
- dmidecode output on L1 guest

Description Kashyap Chamarthy 2013-12-18 14:10:02 UTC
Description of problem:
-----------------------

I don't really have a reliable reproducer for this soft lockup.
Let me describe the two occurrences where I noticed it.

On the first occurrence -- this was with kernel-3.11.8-300.fc20 -- I
noticed (on the serial console) L1 crash with the soft lockup when I
tried to reboot an L2 KVM guest (i.e. a nested guest).

On the second occurrence -- this was with kernel-3.13.0-0.rc4.git0.1.fc21 --
L1 crashed with the soft lockup when I tried to just *boot* (not reboot)
an L2 KVM guest.

The Fedora kernel changelog claims the soft lockup detector is disabled
on virtual machines:

    $ rpm -q kernel --changelog | grep "Disable soft lockup detector on virtual machines"
    - Disable soft lockup detector on virtual machines. (rhbz 971139)
    - Disable soft lockup detector on virtual machines. (rhbz 971139)
    - Disable soft lockup detector on virtual machines. (rhbz 971139)


However, in the L1 guest (the one that crashed with the soft lockup):

    $ dmesg | grep watchdog
    [    0.061707] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
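
The watchdog state can also be cross-checked from inside the guest at
runtime. A minimal sketch; the sysctl path is standard, but what it reads
back depends on this particular guest's state:

```shell
# Cross-check the lockup detector from inside the guest.
# /proc/sys/kernel/nmi_watchdog reads "1" when the watchdog is armed,
# "0" when it has been disabled.
wd_file=/proc/sys/kernel/nmi_watchdog
if [ -r "$wd_file" ]; then
    wd_state=$(cat "$wd_file")
else
    wd_state="unavailable"
fi
echo "nmi_watchdog: $wd_state"
```

If the nowatchdog-on-virt patch were taking effect, this should read 0
inside a KVM guest.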


And this is indeed running inside a virtual machine:

    $ virt-what
    kvm
    $ file /dev/kvm
    /dev/kvm: character special
    $ cat /proc/cpuinfo | grep vmx 
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon rep_good nopl pni vmx ssse3 cx16 sse4_1 sse4_2 x2apic popcnt hypervisor lahf_lm vnmi ept tsc_adjust


Just to note: the Fedora kernel patch that tries to disable the
watchdog in a VM describes the rationale only as "for various reasons":

  http://pkgs.fedoraproject.org/cgit/kernel.git/tree/nowatchdog-on-virt.patch


Version:
--------

On L0 (physical host):

    $ uname -r; rpm -q libvirt-daemon-kvm qemu-system-x86
    3.13.0-0.rc4.git0.1.fc21.x86_64
    libvirt-daemon-kvm-1.2.0-1.fc20.x86_64
    qemu-system-x86-1.7.0-1.fc20.x86_64

On L1 (KVM guest):

    $ uname -r; rpm -q libvirt-daemon-kvm qemu-system-x86
    3.13.0-0.rc4.git0.1.fc21.x86_64
    libvirt-daemon-kvm-1.1.3.2-1.fc20.x86_64
    qemu-system-x86-1.6.1-2.fc20.x86_64


How reproducible:
-----------------
At least twice.


Steps to Reproduce:
-------------------

In a nested KVM environment, reboot or boot an L2 KVM guest.

Details on setting up a nested KVM environment:

  https://github.com/kashyapc/nvmx-haswell/blob/master/SETUP-nVMX.rst

  (Note: the above is written for Haswell, but it should apply to
  any other recent CPU model.)
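
Before attempting the reproducer, it is worth confirming that nested VMX
is actually enabled on L0 (and on L1, for running L2). A hedged sketch;
the module parameter path below is for Intel hosts (kvm_amd exposes an
equivalent `nested` parameter):

```shell
# Report whether the kvm_intel module has nested virtualization enabled.
# "Y" (newer kernels) or "1" (older kernels) means nested VMX is on.
nested_file=/sys/module/kvm_intel/parameters/nested
if [ -r "$nested_file" ]; then
    nested_state=$(cat "$nested_file")
else
    nested_state="kvm_intel not loaded (or not an Intel host)"
fi
echo "nested: $nested_state"
```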


Actual results:
---------------

The KVM guest (L1) crashes with a stack trace (from the serial console):
===============
.
.
.
[  296.134004] BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:39]
[  296.134004] Modules linked in: vhost_net vhost macvtap macvlan tun nf_conntrack_ipv6 nf_defrag_ipv6 xt_mac xt_physdev veth ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables openvswitch vxlan ip_tunnel gre libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack kvm_intel kvm virtio_net ppdev crc32c_intel i2c_piix4 virtio_balloon parport_pc i2c_core parport serio_raw microcode nfsd auth_rpcgss nfs_acl lockd sunrpc virtio_blk virtio_pci virtio_ring virtio ata_generic pata_acpi
[  296.134004] CPU: 3 PID: 39 Comm: kworker/3:1 Not tainted 3.13.0-0.rc4.git0.1.fc21.x86_64 #1

[  296.134004] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  296.134004] Workqueue: events jump_label_update_timeout
[  296.134004] task: ffff8801fce41140 ti: ffff8801fce68000 task.ti: ffff8801fce68000
[  296.134004] RIP: 0010:[<ffffffff810d560a>]  [<ffffffff810d560a>] smp_call_function_many+0x25a/0x2c0
[  296.134004] RSP: 0018:ffff8801fce69c98  EFLAGS: 00000202
[  296.134004] RAX: 0000000000000001 RBX: 0000000000000282 RCX: ffff880208098178
[  296.134004] RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000000
[  296.134004] RBP: ffff8801fce69ce8 R08: ffff8801fd388c00 R09: ffff880208196ee0
[  296.134004] R10: ffffea0007e9f0c0 R11: ffffffff8130d829 R12: ffff8801fce69c50
[  296.134004] R13: 0000000000000fca R14: 0000000000000000 R15: ffff8801fcc6e240
[  296.134004] FS:  0000000000000000(0000) GS:ffff880208180000(0000) knlGS:0000000000000000
[  296.134004] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  296.134004] CR2: 00000000020d4830 CR3: 00000001fc3d2000 CR4: 00000000000027e0
[  268.134004] Call Trace:
[  268.134004]  [<ffffffff81019bb0>] ? setup_data_read+0x90/0x90
[  268.134004]  [<ffffffffa01a9e7a>] ? apic_has_pending_timer+0x1a/0x90 [kvm]
[  268.134004]  [<ffffffff81019bb0>] ? setup_data_read+0x90/0x90
[  268.134004]  [<ffffffffa01a9e7b>] ? apic_has_pending_timer+0x1b/0x90 [kvm]
[  268.134004]  [<ffffffff810d56cd>] on_each_cpu+0x2d/0x60
[  268.134004]  [<ffffffffa01a9e7a>] ? apic_has_pending_timer+0x1a/0x90 [kvm]
[  268.134004]  [<ffffffff8101a38b>] text_poke_bp+0x5b/0xc0
[  268.134004]  [<ffffffffa01a9e7a>] ? apic_has_pending_timer+0x1a/0x90 [kvm]
[  268.134004]  [<ffffffff8101789c>] arch_jump_label_transform+0x8c/0x100
[  268.134004]  [<ffffffff811438ff>] __jump_label_update+0x5f/0x80
[  268.134004]  [<ffffffff811439bd>] jump_label_update+0x9d/0xb0
[  268.134004]  [<ffffffff81143c1d>] __static_key_slow_dec+0x4d/0xb0
[  268.134004]  [<ffffffff81143d36>] jump_label_update_timeout+0x16/0x20
[  268.134004]  [<ffffffff810829a6>] process_one_work+0x176/0x430
[  268.134004]  [<ffffffff810835db>] worker_thread+0x11b/0x3a0
[  268.134004]  [<ffffffff810834c0>] ? rescuer_thread+0x350/0x350
[  268.134004]  [<ffffffff8108a062>] kthread+0xd2/0xf0
[  268.134004]  [<ffffffff81089f90>] ? insert_kthread_work+0x40/0x40
[  268.134004]  [<ffffffff816849bc>] ret_from_fork+0x7c/0xb0
[  268.134004]  [<ffffffff81089f90>] ? insert_kthread_work+0x40/0x40
===============


Expected results:
-----------------

L1 (KVM guest) does not crash with a soft lockup.


Additional info:
----------------

(0) Related bug: https://bugzilla.redhat.com/show_bug.cgi?id=971139 -- BUG: soft lockup - CPU#3 stuck for 27s! [mc:39870] 

(1) dmesg output from guest attached.

(2) dmidecode output from guest attached.

(3) QEMU command-line of (L1) guest:

    $ ps -ef | grep qemu
    qemu      3958     1 99 15:36 ?        00:54:41 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name ostack-compute -S -machine pc-i440fx-1.6,accel=kvm,usb=off -cpu host -m 7813 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 3db96f52-11b0-41f5-bdb0-3a4605efaee5 -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ostack-compute.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/export/ostack-images/ostack-compute-sparse.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:58:76:13,bus=pci.0,addr=0x2 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
    
(4) The host was never under heavy load. It's just running one other
Fedora guest and a few OpenStack daemons (no heavy network or disk
I/O).

(5) I checked Fedora kernel's master sources; nowatchdog-on-virt.patch
is indeed applied, but is it not taking effect?

    $ fedpkg clone -B kernel
    $ grep nowatchdog kernel.spec 
    Patch15000: nowatchdog-on-virt.patch
    ApplyPatch nowatchdog-on-virt.patch

Comment 1 Kashyap Chamarthy 2013-12-18 14:18:13 UTC
Created attachment 838374 [details]
Full dmesg output from L1 guest.

Comment 2 Kashyap Chamarthy 2013-12-18 14:19:06 UTC
Created attachment 838375 [details]
dmidecode output on L1 guest

Comment 3 Kashyap Chamarthy 2013-12-18 14:34:18 UTC
1/ Full stack trace with 3.13.0-0.rc4.git0.1.fc21:
================
[  268.134004] Code: 67 c2 00 89 c2 39 f0 0f 8d 33 fe ff ff 48 98 48 8b 0b 48 03 0c c5 20 77 cf 81 f6 41 20 01 74 cd 0f 1f 44 00 00 f3 90 f6 41 20 01 <75> f8 48 63 35 d1 66 c2 00 eb b7 0f b6 4d b4 48 8b 75 c0 4c 89
[  296.134004] BUG: soft lockup - CPU#3 stuck for 22s! [kworker/3:1:39]
[  296.134004] Modules linked in: vhost_net vhost macvtap macvlan tun nf_conntrack_ipv6 nf_defrag_ipv6 xt_mac xt_physdev veth ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables openvswitch vxlan ip_tunnel gre libcrc32c nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack kvm_intel kvm virtio_net ppdev crc32c_intel i2c_piix4 virtio_balloon parport_pc i2c_core parport serio_raw microcode nfsd auth_rpcgss nfs_acl lockd sunrpc virtio_blk virtio_pci virtio_ring virtio ata_generic pata_acpi
[  296.134004] CPU: 3 PID: 39 Comm: kworker/3:1 Not tainted 3.13.0-0.rc4.git0.1.fc21.x86_64 #1

[  296.134004] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  296.134004] Workqueue: events jump_label_update_timeout
[  296.134004] task: ffff8801fce41140 ti: ffff8801fce68000 task.ti: ffff8801fce68000
[  296.134004] RIP: 0010:[<ffffffff810d560a>]  [<ffffffff810d560a>] smp_call_function_many+0x25a/0x2c0
[  296.134004] RSP: 0018:ffff8801fce69c98  EFLAGS: 00000202
[  296.134004] RAX: 0000000000000001 RBX: 0000000000000282 RCX: ffff880208098178
[  296.134004] RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000000
[  296.134004] RBP: ffff8801fce69ce8 R08: ffff8801fd388c00 R09: ffff880208196ee0
[  296.134004] R10: ffffea0007e9f0c0 R11: ffffffff8130d829 R12: ffff8801fce69c50
[  296.134004] R13: 0000000000000fca R14: 0000000000000000 R15: ffff8801fcc6e240
[  296.134004] FS:  0000000000000000(0000) GS:ffff880208180000(0000) knlGS:0000000000000000
[  296.134004] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  296.134004] CR2: 00000000020d4830 CR3: 00000001fc3d2000 CR4: 00000000000027e0
[  268.134004] Call Trace:
[  268.134004]  [<ffffffff81019bb0>] ? setup_data_read+0x90/0x90
[  268.134004]  [<ffffffffa01a9e7a>] ? apic_has_pending_timer+0x1a/0x90 [kvm]
[  268.134004]  [<ffffffff81019bb0>] ? setup_data_read+0x90/0x90
[  268.134004]  [<ffffffffa01a9e7b>] ? apic_has_pending_timer+0x1b/0x90 [kvm]
[  268.134004]  [<ffffffff810d56cd>] on_each_cpu+0x2d/0x60
[  268.134004]  [<ffffffffa01a9e7a>] ? apic_has_pending_timer+0x1a/0x90 [kvm]
[  268.134004]  [<ffffffff8101a38b>] text_poke_bp+0x5b/0xc0
[  268.134004]  [<ffffffffa01a9e7a>] ? apic_has_pending_timer+0x1a/0x90 [kvm]
[  268.134004]  [<ffffffff8101789c>] arch_jump_label_transform+0x8c/0x100
[  268.134004]  [<ffffffff811438ff>] __jump_label_update+0x5f/0x80
[  268.134004]  [<ffffffff811439bd>] jump_label_update+0x9d/0xb0
[  268.134004]  [<ffffffff81143c1d>] __static_key_slow_dec+0x4d/0xb0
[  268.134004]  [<ffffffff81143d36>] jump_label_update_timeout+0x16/0x20
[  268.134004]  [<ffffffff810829a6>] process_one_work+0x176/0x430
[  268.134004]  [<ffffffff810835db>] worker_thread+0x11b/0x3a0
[  268.134004]  [<ffffffff810834c0>] ? rescuer_thread+0x350/0x350
[  268.134004]  [<ffffffff8108a062>] kthread+0xd2/0xf0
[  268.134004]  [<ffffffff81089f90>] ? insert_kthread_work+0x40/0x40
[  268.134004]  [<ffffffff816849bc>] ret_from_fork+0x7c/0xb0
[  268.134004]  [<ffffffff81089f90>] ? insert_kthread_work+0x40/0x40
[  268.134004] Code: 67 c2 00 89 c2 39 f0 0f 8d 33 fe ff ff 48 98 48 8b 0b 48 03 0c c5 20 77 cf 81 f6 41 20 01 74 cd 0f 1f 44 00 00 f3 90 f6 41 20 01 <75> f8 48 63 35 d1 66 c2 00 eb b7 0f b6 4d b4 48 8b 75 c0 4c 89 
================

2/ Stack trace with 3.11.8-300.fc20 (noticed on L1 guest's serial console):
================
[ 1140.060007] Code: f0 ff 4b 24 0f 94 c1 84 c9 44 89 e2 74 12 8b 43 20 8b 4b 10 8d 50 01 89 4b 24 89 53 20 44 89 e2 83 fa 04 74 20 f3 90 44 8b 63 20 <41> 39 d4 74 f0 41 83 fc 02 75 bf fa 66 66 90 66 66 90 eb c1 66
[ 1152.076005] BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:12]
[ 1152.076005] Modules linked in: vhost_net vhost macvtap macvlan tun nf_conntrack_ipv6 nf_defrag_ipv6 xt_mac xt_physdev veth ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle bridge stp llc ip6table_filter ip6_tables ebtable_nat ebtables openvswitch gre nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack joydev virtio_balloon kvm_intel i2c_piix4 i2c_core ppdev kvm virtio_net serio_raw crc32c_intel parport_pc parport mperf microcode nfsd auth_rpcgss nfs_acl lockd sunrpc virtio_blk virtio_pci virtio_ring ata_generic virtio pata_acpi
[ 1152.076005] CPU: 1 PID: 12 Comm: migration/1 Not tainted 3.11.8-300.fc20.x86_64 #1
[ 1152.076005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1152.076005] task: ffff8801fd396ac0 ti: ffff8801fcc24000 task.ti: ffff8801fcc24000
[ 1152.076005] RIP: 0010:[<ffffffff810dec76>]  [<ffffffff810dec76>] stop_machine_cpu_stop+0x86/0xe0
[ 1152.076005] RSP: 0018:ffff8801fcc25d90  EFLAGS: 00000293
[ 1152.076005] RAX: ffffffff81cec880 RBX: ffff8801fcc60540 RCX: dead000000200200
[ 1152.076005] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8801f9f21cd8
[ 1168.099005] RBP: ffff8801fcca9db0 R08: ffff88020818e900 R09: 0000000000000000
[ 1168.099005] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8801fcca9ddc
[ 1168.099005] R13: 0000000000014180 R14: 0000000200000003 R15: ffffffff810a2a33
[ 1168.099005] FS:  0000000000000000(0000) GS:ffff880208180000(0000) knlGS:0000000000000000
[ 1168.099005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1168.099005] CR2: 000000000047ca40 CR3: 00000001f9d77000 CR4: 00000000000027e0
[ 1168.099005] Stack:
[ 1168.099005]  ffff8801f9f21c58 ffff88020818e930 ffff8801f9f21cd8 ffffffff810debf0
[ 1168.099005]  ffff8801fcca9e78 ffffffff810deb42 ffff88020818e938 0000000000000000
[ 1168.099005]  0000000000000000 ffff880208194180 ffff8801fbf32680 0000000000000000
[ 1168.099005] Call Trace:
[ 1168.099005]  [<ffffffff810debf0>] ? cpu_stopper_thread+0x130/0x130
[ 1168.099005]  [<ffffffff810deb42>] cpu_stopper_thread+0x82/0x130
[ 1168.099005]  [<ffffffff81647f3a>] ? __schedule+0x2ba/0x750
[ 1168.099005]  [<ffffffff8109085f>] smpboot_thread_fn+0xff/0x1a0
[ 1168.099005]  [<ffffffff81090760>] ? lg_local_lock_cpu+0x40/0x40
[ 1168.099005]  [<ffffffff81088660>] kthread+0xc0/0xd0
[ 1168.099005]  [<ffffffff810885a0>] ? insert_kthread_work+0x40/0x40
[ 1168.099005]  [<ffffffff816528ac>] ret_from_fork+0x7c/0xb0
[ 1168.099005]  [<ffffffff810885a0>] ? insert_kthread_work+0x40/0x40
[ 1168.099005] Code: 84 ed 75 5d f0 ff 4b 24 0f 94 c1 84 c9 44 89 e2 74 12 8b 43 20 8b 4b 10 8d 50 01 89 4b 24 89 53 20 44 89 e2 83 fa 04 74 20 f3 90 <44> 8b 63 20 41 39 d4 74 f0 41 83 fc 02 75 bf fa 66 66 90 66 66
================

Comment 4 Josh Boyer 2013-12-18 15:52:23 UTC
The soft lockups are basically the result of the host not scheduling the guest to run in a sufficient period of time.  At least as far as we can tell.  We have a patch to prevent them from triggering on KVM guests, but it wasn't working.

It was trying to do a DMI match against "Bochs Bochs" for the Vendor, but the Vendor is just "Bochs".  I've fixed the patch in F19-rawhide now and I see the message I would expect on an F20 KVM guest:

[    0.030000] watchdog: disabled (inside virtual machine)


Should be in tomorrow's rawhide.
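
The mismatch Josh describes can be illustrated with a toy comparison. The
strings come from this report (the guest's DMI data shows "Bochs Bochs"
only as the combined banner, while the Vendor field alone is "Bochs");
this is an illustrative sketch of the string mismatch, not the patch's
actual code:

```shell
# Illustration of the failed DMI match: the patch compared against the
# combined banner "Bochs Bochs", but the Vendor field alone is "Bochs".
vendor="Bochs"              # what the DMI Vendor field reports under QEMU/KVM
patch_match="Bochs Bochs"   # what the original nowatchdog-on-virt.patch matched

if [ "$vendor" = "$patch_match" ]; then
    echo "watchdog: disabled (inside virtual machine)"
else
    echo "no match -- watchdog stays enabled"
fi
```

An exact comparison against the wrong string never fires, so the watchdog
stayed armed in the guest and the soft lockup messages above were emitted.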

Comment 5 Kashyap Chamarthy 2013-12-18 16:45:39 UTC
(In reply to Josh Boyer from comment #4)
> The soft lockups are basically the result of the host not scheduling the
> guest to run in a sufficient period of time.  At least as far as we can
> tell.  We have a patch to prevent them from triggering on KVM guests, but it
> wasn't working.
> 
> It was trying to do a DMI match against "Bochs Bochs" for the Vendor, but
> the Vendor is just "Bochs".  I've fixed the patch in F19-rawhide now and I
> see the message I would expect on an F20 KVM guest:
> 
> [    0.030000] watchdog: disabled (inside virtual machine)
> 
> 
> Should be in tomorrow's rawhide.

Thanks for the patch, Josh.

It'd be nicer if the bug were closed *after* the successful build lands in Rawhide, with the NVR noted in "Fixed In Version".

Comment 6 Kashyap Chamarthy 2013-12-18 19:17:59 UTC
OK, this is the version Josh is referring to in his previous comment: kernel-3.13.0-0.rc4.git2.1.fc21.