Red Hat Bugzilla – Bug 903454
kvm guest crash after long stop/cont cycle
Last modified: 2013-11-21 01:32:04 EST
Description of problem: ------------------------ In bug 739151, kvm suppress cpu softlockup message, but with x-data-plan=on, it does not work, this bug is to track the issue. Version-Release number of selected component (if applicable): --------------------------------------------------------------- Test with kernel 355 in guest, 351 in host, and qemu-kvm 351 How reproducible: ------------------ 2/2 Steps to Reproduce: -------------------------- 1.add stress in guest: for(( I=0; I<`cat /proc/cpuinfo | grep processor | wc -l`;I++)) ; do echo $I; taskset -c $I /bin/bash -c 'for ((;;)); do X=1; done &' ; done 2.stop guest for a whole night 3.cont guest, still hit the cpu softlockup, and guest finally hangs in minutes cmd has nothing special, expect for i use x-data-plane: /usr/libexec/qemu-kvm -enable-kvm -M rhel6.4.0 -smp 4 -m 4G -name rhel6.4-64 -uuid 3f2ea5cd-3d29-48ff-aab2-23df1b6ae213 -drive file=RHEL-Server-6.4-64-virtio.raw,cache=none,if=none,rerror=stop,werror=stop,id=drive-virtio-disk0,format=raw,aio=native -device virtio-blk-pci,drive=drive-virtio-disk0,id=device-virtio-disk0,x-data-plane=on,scsi=off -netdev tap,script=/etc/qemu-ifup,id=netdev0 -device virtio-net-pci,netdev=netdev0,id=device-net0,mac=aa:54:00:11:22:33 -boot order=cd -monitor stdio -usb -device usb-tablet,id=input0 -chardev socket,id=s1,path=/tmp/s1,server,nowait -device isa-serial,chardev=s1 -vnc :10 -monitor tcp::1234,server,nowait -rtc base=localtime,clock=host,driftfix=slew Actual results: ---------------- call trace, guest hangs in several minutes. There are several call trace, i put them in the sequence they occur: BUG: soft lockup - CPU#0 stuck for 76635s! [bash:2158] Modules linked in: fuse sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput p CPU 0 Modules linked in: fuse sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput p Pid: 2158, comm: bash Not tainted 2.6.32-355.el6.x86_64 #1 Red Hat KVM RIP: 0010:[<ffffffff810817f6>] [<ffffffff810817f6>] run_timer_softirq+0x176/0x340 RSP: 0000:ffff880028203e50 EFLAGS: 00000246 RAX: ffff880028203e90 RBX: ffff880028203ed0 RCX: 0000000000000ca0 RDX: ffff880028203e90 RSI: dead000000200200 RDI: 0000000000000000 RBP: ffffffff8100bb93 R08: ffffffff81eae200 R09: 0000000000000000 R10: 00000000b2dc04f6 R11: ffff880028203cc0 R12: ffff880028203dd0 R13: ffff880119b110a8 R14: ffffffff81ead540 R15: ffffffff815168eb FS: 00007fe0946a2700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000368b21c2b0 CR3: 0000000118d49000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 2158, threadinfo ffff88011a4ae000, task ffff880118eb0040) Stack: ffff880028203e60 ffffffff81eaf160 ffffffff81eaed60 ffffffff81eae960 <d> ffff88011a4affd8 ffff88011a4affd8 000001002820e0e0 ffffffff81eae560 <d> ffff880028203e90 ffff880028203e90 0000000000008a7f 0000000000000141 Call Trace: <IRQ> [<ffffffff81076fb1>] ? __do_softirq+0xc1/0x1e0 [<ffffffff8109b75b>] ? hrtimer_interrupt+0x14b/0x260 [<ffffffff8100c1cc>] ? call_softirq+0x1c/0x30 [<ffffffff8100de05>] ? do_softirq+0x65/0xa0 [<ffffffff81076d95>] ? irq_exit+0x85/0x90 [<ffffffff815168f0>] ? smp_apic_timer_interrupt+0x70/0x9b [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20 <EOI> Code: 48 be 00 02 20 00 00 00 ad de 48 89 42 08 48 89 10 49 c7 04 24 00 00 00 00 49 89 74 24 08 c7 03 00 00 00 00 fb 66 0f 1f 44 00 00 <48> 8b 55 a8 8b 0d 88 aa b4 00 8b 92 44 e0 ff ff 85 c9 89 55 b4 BUG: soft lockup - CPU#0 stuck for 67s! [bash:2158] Modules linked in: fuse sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput ppdev parport_pc parport sg microcode virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] CPU 0 Modules linked in: fuse sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput ppdev parport_pc parport sg microcode virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Pid: 2158, comm: bash Not tainted 2.6.32-355.el6.x86_64 #1 Red Hat KVM RIP: 0033:[<000000000042c4ca>] [<000000000042c4ca>] 0x42c4ca RSP: 002b:00007fff3ecb9ff0 EFLAGS: 00000293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000368af8ee88 RDX: 000000000078ad70 RSI: 0000000000000000 RDI: 000000368af8ee80 RBP: ffffffff8100bb8e R08: fefefefefefefeff R09: 0000000000000000 R10: 0000000000000002 R11: 0000000000000000 R12: 00007fff3ecba124 R13: 0000000000000000 R14: 0000000000000000 R15: 000000000078a4d0 FS: 00007fe0946a2700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2703ff0000 CR3: 0000000118d49000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 2158, threadinfo ffff88011a4ae000, task ffff880118eb0040) Call Trace: Message from syslogd@dhcp-8-128 at Jan 24 05:33:35 ... kernel:BUG: soft lockup - CPU#0 stuck for 67s! [bash:2158] INFO: task dbus-daemon-lau:2376 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. dbus-daemon-l D 0000000000000000 0 2376 1 0x00000080 ffff88011a06dce8 0000000000000086 0000000000000000 ffff88011af26aa0 ffff88011af26ad8 0000000000000000 ffff88011a06dca8 ffffffff81064a00 ffff88011af27058 ffff88011a06dfd8 000000000000fb88 ffff88011af27058 Call Trace: [<ffffffff81064a00>] ? pick_next_task_fair+0xd0/0x130 [<ffffffff8150d36f>] ? thread_return+0x16d/0x76e [<ffffffff8150e0c5>] schedule_timeout+0x215/0x2e0 [<ffffffff810669eb>] ? enqueue_rt_entity+0x6b/0x80 [<ffffffff8150dd43>] wait_for_common+0x123/0x180 [<ffffffff81063310>] ? default_wake_function+0x0/0x20 [<ffffffff8150de5d>] wait_for_completion+0x1d/0x20 [<ffffffff8106513c>] sched_exec+0xdc/0xe0 [<ffffffff81189ed0>] do_execve+0xe0/0x2c0 [<ffffffff810095ea>] sys_execve+0x4a/0x80 [<ffffffff8100b4ca>] stub_execve+0x6a/0xc0 INFO: task nm-dispatcher.a:2382 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nm-dispatcher D 0000000000000002 0 2382 2381 0x00000080 ffff8801131a3ce8 0000000000000086 ffff8801131a3c98 ffffffff8100988e ffff88011b3fc078 0000000000000000 00000000011a3ca8 ffff880028314280 ffff88011c835058 ffff8801131a3fd8 000000000000fb88 ffff88011c835058 Call Trace: [<ffffffff8100988e>] ? __switch_to+0x26e/0x320 [<ffffffff8150d250>] ? thread_return+0x4e/0x76e [<ffffffff8150e0c5>] schedule_timeout+0x215/0x2e0 [<ffffffff810669eb>] ? enqueue_rt_entity+0x6b/0x80 [<ffffffff8150dd43>] wait_for_common+0x123/0x180 [<ffffffff81063310>] ? default_wake_function+0x0/0x20 [<ffffffff8150de5d>] wait_for_completion+0x1d/0x20 [<ffffffff8106513c>] sched_exec+0xdc/0xe0 [<ffffffff81189ed0>] do_execve+0xe0/0x2c0 [<ffffffff810095ea>] sys_execve+0x4a/0x80 [<ffffffff8100b4ca>] stub_execve+0x6a/0xc0 INFO: task crond:2386 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. crond D 0000000000000002 0 2386 2385 0x00000080 ffff8801124c7ce8 0000000000000086 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 ffff88011a145098 ffff8801124c7fd8 000000000000fb88 ffff88011a145098 Call Trace: [<ffffffffa00b3bf0>] ? ext4_file_open+0x0/0x130 [ext4] [<ffffffff8150e0c5>] schedule_timeout+0x215/0x2e0 [<ffffffff8117e364>] ? nameidata_to_filp+0x54/0x70 [<ffffffff81277059>] ? cpumask_next_and+0x29/0x50 [<ffffffff8150dd43>] wait_for_common+0x123/0x180 [<ffffffff81063310>] ? default_wake_function+0x0/0x20 [<ffffffff8150de5d>] wait_for_completion+0x1d/0x20 [<ffffffff8106513c>] sched_exec+0xdc/0xe0 [<ffffffff81189ed0>] do_execve+0xe0/0x2c0 [<ffffffff810095ea>] sys_execve+0x4a/0x80 [<ffffffff8100b4ca>] stub_execve+0x6a/0xc0 kernel:BUG: soft lockup - CPU#0 stuck for 68s! [bash:2158] BUG: soft lockup - CPU#2 stuck for 67s! [bash:2170] Modules linked in: fuse sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput ppdev parport_pc parport sg microcode virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] CPU 2 Modules linked in: fuse sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput ppdev parport_pc parport sg microcode virtio_net i2c_piix4 i2c_core ext4 mbcache jbd2 virtio_blk sr_mod cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Pid: 2170, comm: bash Not tainted 2.6.32-355.el6.x86_64 #1 Red Hat KVM RIP: 0010:[<ffffffff8118f03d>] [<ffffffff8118f03d>] path_put+0x1d/0x40 RSP: 0018:ffff88011a48df28 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffff88011a48df38 RCX: ffff88011a48c000 RDX: 0000000000000080 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffffff8100bb8e R08: 0000000000000000 R09: ffff000000000000 R10: 0000000000000008 R11: 0000000000000246 R12: 00000000006dd7e8 R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000246 FS: 00007f7fcd515700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f27048c30d8 CR3: 000000011ac95000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process bash (pid: 2170, threadinfo ffff88011a48c000, task ffff88011b3fc040) Stack: ffff88011a48df78 ffff88011b1ae400 ffff88011a48df78 ffffffff810dc4a4 <d> 0000000000000000 00000000006dd7a0 0000000001bcc4d0 0000000000000000 <d> 0000000000000000 00007fff716f6f44 00007fff716f6f44 ffffffff8100b1a5 Call Trace: [<ffffffff810dc4a4>] ? __audit_syscall_exit+0x1a4/0x290 [<ffffffff8100b1a5>] ? sysret_audit+0x17/0x21 Code: 00 8b 45 d8 e9 30 ff ff ff 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 89 fb 48 8b 7f 08 e8 e6 bc 00 00 48 8b 3b <48> 85 ff 74 0f c7 87 d4 00 00 00 00 00 00 00 e8 cf 28 01 00 48 Call Trace: [<ffffffff8118f03a>] ? path_put+0x1a/0x40 [<ffffffff810dc4a4>] ? __audit_syscall_exit+0x1a4/0x290 [<ffffffff8100b1a5>] ? sysret_audit+0x17/0x21
I'll try to reproduce this with 4 VMs tonight. 2 with x-data-plane=on and 2 with x-data-plane=off. Since there is no I/O going on and virtio-blk-data-plane does not block vcpus from executing code, it's hard to see how x-data-plane=on could cause softlockups inside the guest.
I reproduced the soft lockup without a hard disk! This is not related to virtio-blk-data-plane. Live ISO: rhel-workstation-x86_64-live-6.4_20130123.0-1.iso Guest kernel: 2.6.32-356.el6.x86_64 Host kernel: F18 3.7 (includes kvmclock patches for stopped bit) qemu-kvm: qemu-kvm-0.12.1.2-2.352.el6 Guest CPU count: 3 kernel:BUG: soft lockup - CPU#0 stuck for 1520s! [bash:2529] It seems that Bug 739151 may not be resolved completely.
According to https://bugzilla.redhat.com/show_bug.cgi?id=739151#c42. --- snip log from comment42--- Let's use the other bug (bug 883746) reported for problems in this feature. If the problem looks different, please open a new bug. Re-assign this bug to amit. Hi, Amit Any mistake, please fix me. Best Regards & Thanks, Junyi
Test on qemu-kvm-0.12.1.2-2.380.el6 with steps in comment 0, after resumise the guest, no call trace, only one message: Clocksource tsc unstable (delta = 2091710798 ns). Enable clocksource failover by adding clocksource_failover kernel parameter. Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1553.html