Bug 562699
Summary: qemu-kvm occasionally goes to take up all cpu and becomes unresponsive

Product: [Fedora] Fedora
Component: qemu
Version: 12
Hardware: All
OS: Linux
Status: CLOSED WONTFIX
Severity: medium
Priority: low
Reporter: Kevin Fenzi <kevin>
Assignee: Justin M. Forbes <jforbes>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: amit.shah, berrange, dwmw2, ehabkost, gcosta, itamar, jaswinder, jforbes, markmc, mtosatti, nenad, scottt.tw, virt-maint
Doc Type: Bug Fix
Last Closed: 2010-12-03 23:09:41 UTC
Description
Kevin Fenzi
2010-02-08 03:12:15 UTC
More info today: the guest didn't crash, but load spiked way up and it became unresponsive a few times. When it went into this state, load climbed to 40-50 or so and I couldn't log in over the network; I could, however, log in on the console, where the machine was responsive. I saw a number of processes in D state. The host disk was idle, and so was the guest disk. After about 10 minutes it recovered and went back to normal. ;(

dmesg on the guest:

BUG: soft lockup - CPU#1 stuck for 4096s! [munin-graph:6294] Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] CPU 1: Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] Pid: 6294, comm: munin-graph Not tainted 2.6.32.7-37.fc12.x86_64 #1 RIP: 0010:[<ffffffff8102fdc7>] [<ffffffff8102fdc7>] kvm_mmu_op+0x3c/0x55 RSP: 0018:ffff88017bfd7ca8 EFLAGS: 00000293 RAX: 0000000000000010 RBX: ffff88017bfd7cd8 RCX: 000000017bfd7ce8 RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffff88017bfd7ce8 RBP: ffffffff810127ee R08: 00000032bbc00000 R09: 00000000ffffffdc R10: 0000000000000000 R11: ffff88017bfd7dd8 R12: 000000017bfd7cd8 R13: 0000000000000018 R14: 00000030a8800000 R15: 00000000ffffffdc FS: 0000000000000000(0000) GS:ffff880007420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f36fdf04270 CR3: 0000000001001000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff8102fdbb>] ? kvm_mmu_op+0x30/0x55 [<ffffffff8102fe66>] ? kvm_release_pt+0x3a/0x3c [<ffffffff8103a261>] ? ___pte_free_tlb+0x4c/0x5e [<ffffffff810f2067>] ? free_pgd_range+0x241/0x39c [<ffffffff810f226b>] ? free_pgtables+0xa9/0xce [<ffffffff810f4504>] ? exit_mmap+0xea/0x122 [<ffffffff81053dde>] ? mmput+0x5e/0xca [<ffffffff81058807>] ? exit_mm+0x105/0x111 [<ffffffff8105a2b7>] ? do_exit+0x212/0x6f9 [<ffffffff814579b3>] ? do_page_fault+0x270/0x2a0 [<ffffffff8105a81f>] ? do_group_exit+0x81/0xab [<ffffffff8105a860>] ? sys_exit_group+0x17/0x1b [<ffffffff81011cf2>] ? system_call_fastpath+0x16/0x1b BUG: soft lockup - CPU#3 stuck for 4097s! [munin-graph:6292] Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] CPU 3: Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] Pid: 6292, comm: munin-graph Not tainted 2.6.32.7-37.fc12.x86_64 #1 RIP: 0010:[<ffffffff8122da65>] [<ffffffff8122da65>] copy_page_c+0x5/0x10 RSP: 0000:ffff8801685b5d70 EFLAGS: 00010286 RAX: ffff880000000000 RBX: ffff8801685b5d88 RCX: 0000000000000044 RDX: 6db6db6db6db6db7 RSI: ffff8801549fede0 RDI: ffff880102928de0 RBP: ffffffff810127ee R08: 0000000000000000 R09: 0000000000000002 R10: 0000000000000511 R11: 00000000000048c1 R12: ffff8801685b5d68 R13: 0000000000001896 R14: ffff880198c04240 R15: ffff8801685b5d68 FS: 
00007f36ff16e700(0000) GS:ffff880007460000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000201a118 CR3: 000000015cb75000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff810ede74>] ? copy_user_highpage+0x6b/0x75 [<ffffffff810ee95b>] ? do_wp_page+0x3eb/0x682 [<ffffffff810f059d>] ? handle_mm_fault+0x758/0x7bd [<ffffffff81074c76>] ? remove_wait_queue+0x35/0x3f [<ffffffff814579cb>] ? do_page_fault+0x288/0x2a0 [<ffffffff814558b5>] ? page_fault+0x25/0x30 BUG: soft lockup - CPU#0 stuck for 4096s! [munin-graph:6297] Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] CPU 0: Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] Pid: 6297, comm: munin-graph Not tainted 2.6.32.7-37.fc12.x86_64 #1 RIP: 0010:[<ffffffff8102fdc7>] [<ffffffff8102fdc7>] kvm_mmu_op+0x3c/0x55 RSP: 0000:ffff88015dea5d78 EFLAGS: 00000293 RAX: 0000000000000018 RBX: ffff88015dea5da8 RCX: 000000015dea5de8 RDX: 0000000000000000 RSI: 0000000000000018 RDI: ffff88015dea5de8 RBP: ffffffff810127ee R08: ffffea0003ad35a0 R09: 00000000000000d0 R10: 0000000000000511 R11: 0000000000004843 R12: 000000000000000a R13: 00000000000280da R14: 0000000000000002 R15: ffff88015dea5ce8 FS: 00007f36ff16e700(0000) GS:ffff880007400000(0000) 
knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000002a81000 CR3: 00000001562ca000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff8102fdbb>] ? kvm_mmu_op+0x30/0x55 [<ffffffff8102feb9>] ? kvm_deferred_mmu_op+0x41/0x96 [<ffffffff8102ff6d>] ? kvm_mmu_write+0x33/0x3a [<ffffffff810dea05>] ? lru_cache_add_lru+0x27/0x29 [<ffffffff8102fffb>] ? kvm_set_pte_at+0x1e/0x20 [<ffffffff810f0136>] ? handle_mm_fault+0x2f1/0x7bd [<ffffffff810f5a4b>] ? vma_merge+0x1d1/0x30d [<ffffffff814579cb>] ? do_page_fault+0x288/0x2a0 [<ffffffff814558b5>] ? page_fault+0x25/0x30 BUG: soft lockup - CPU#3 stuck for 850s! [munin-graph:8188] Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] CPU 3: Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] Pid: 8188, comm: munin-graph Not tainted 2.6.32.7-37.fc12.x86_64 #1 RIP: 0010:[<ffffffff8103abbc>] [<ffffffff8103abbc>] flush_tlb_page+0x3a/0x84 RSP: 0000:ffff88019570dd68 EFLAGS: 00000282 RAX: ffff88004fdfc620 RBX: ffff88019570dd88 RCX: 00003ffffffff000 RDX: ffff88017d2f6668 RSI: 0000000000ccdc18 RDI: 0000000000ccdc18 RBP: ffffffff810127ee R08: ffffea0004834da8 R09: 00000000000000d0 R10: 0000000000000511 R11: 0000000000001faa R12: ffffc90000c11058 R13: 01ff88019570dd08 R14: ffffea0004834da8 
R15: 000000108102fdc7 FS: 00007fb34aab4700(0000) GS:ffff880007460000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000ccdc18 CR3: 0000000170866000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff810eeab4>] ? do_wp_page+0x544/0x682 [<ffffffff81046fad>] ? __enqueue_entity+0x7b/0x7d [<ffffffff8104720a>] ? enqueue_entity+0x25b/0x268 [<ffffffff81046343>] ? check_preempt_wakeup+0x1d4/0x270 [<ffffffff8104814c>] ? enqueue_task_fair+0x68/0x6d [<ffffffff810f059d>] ? handle_mm_fault+0x758/0x7bd [<ffffffff81051f8e>] ? wake_up_new_task+0xd0/0xdb [<ffffffff810559f7>] ? do_fork+0x277/0x33b [<ffffffff814579cb>] ? do_page_fault+0x288/0x2a0 [<ffffffff814558b5>] ? page_fault+0x25/0x30 BUG: soft lockup - CPU#0 stuck for 850s! [munin-graph:8365] Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] CPU 0: Modules linked in: tun sunrpc ipt_LOG iptable_nat nf_nat ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1370 gameport snd_rawmidi snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc i2c_piix4 i2c_core virtio_balloon virtio_net virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib] Pid: 8365, comm: munin-graph Not tainted 2.6.32.7-37.fc12.x86_64 #1 RIP: 0033:[<0000003d2b0092ba>] [<0000003d2b0092ba>] 0x3d2b0092ba RSP: 002b:00007fff097ec7b0 EFLAGS: 00000202 RAX: 000000360a0001c8 RBX: 0000000000000006 RCX: 0000000000000008 RDX: b081940420202900 RSI: 0000000000eb3a90 RDI: 0000003d30c0352b RBP: ffffffff810127ee R08: 0000000002e710e5 R09: 
0000000000000029 R10: 00007fff097ec820 R11: 00007fff097ec820 R12: 0000000000000000 R13: 0000000001770540 R14: 0000000000000000 R15: 0000000000000005 FS: 00007fb34aab4700(0000) GS:ffff880007400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000003d3041a820 CR3: 0000000196e6d000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace:

Also, I am not using ksm/ksmtuned at all here. They are disabled on boot.

OK. I looked at the munin graphs for the machine and noticed that in every case where it became unresponsive, IRQs were spiking way up. Running powertop on the guest showed quite a lot of interrupts. I noticed further that another guest on an almost identical host wasn't seeing any issues, and it was NOT using the virtio net driver. I have switched this guest to use e1000 instead of virtio-net to see if the problem goes away. Powertop shows a LOT fewer IRQs after switching to e1000. Could this be a virtio-net issue with SMP and/or large-memory guests?

(In reply to comment #2)
> ok. I looked at munin graphs for the machine and happened to note that in any
> cases where it became unresponsive it was spiking way way up in irq's.

Kevin, maybe saving a copy of /proc/interrupts, to see whether it's virtio-net's interrupt or some other interrupt that is spiking, could help diagnose this further?

Created attachment 394175 [details]
munin day irq graph
Here's the munin irq graph.
You can see the spikes in local timer and rescheduling IRQs.
Created attachment 394176 [details]
powertop output
Here's powertop from the console.
This is after I switched to e1000. It was fine for a bit, but now it has jumped up again in IRQs.
:(
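As suggested in comment #2, a quick way to see which interrupt counter is climbing is to snapshot /proc/interrupts twice and diff the results. A minimal sketch (the /tmp paths and the 10-second interval are just illustrative):

```shell
# Snapshot /proc/interrupts about 10 seconds apart and diff the counters.
# diff exits nonzero when the files differ, so "|| true" keeps the snippet
# from aborting under "set -e".
cat /proc/interrupts > /tmp/interrupts.before
sleep 10
cat /proc/interrupts > /tmp/interrupts.after
diff /tmp/interrupts.before /tmp/interrupts.after || true
```

Whichever rows show large deltas (e.g. the virtio1/eth0 line versus LOC/RES) point at the interrupt source that is spiking.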
Here's a diff of /proc/interrupts across 10 seconds:

3c3
<   1:       2377         44        150        343   IO-APIC-edge      i8042
---
>   1:       2394         44        150        343   IO-APIC-edge      i8042
7c7
<  10:    2902824       1259     329190       1250   IO-APIC-fasteoi   virtio1, eth0
---
>  10:    2906121       1259     329190       1250   IO-APIC-fasteoi   virtio1, eth0
13c13
< LOC:   14804388   13540058   21021734   27221744   Local timer interrupts
---
> LOC:   14805573   13540115   21021796   27231762   Local timer interrupts
17c17
< RES:    1813155    9907702    4239122    1239847   Rescheduling interrupts
---
> RES:    1813308    9914649    4243742    1240481   Rescheduling interrupts
19c19
< TLB:      13719      19106      16422      14785   TLB shootdowns
---
> TLB:      13721      19106      16422      14787   TLB shootdowns

During a heavy IRQ period I connected gdb and got the following:

0x0000003617ad73e3 in select () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install libXi-1.3-2.fc12.x86_64 pciutils-libs-3.1.6-1.fc12.x86_64
(gdb) info threads
 24 Thread 0x7f5a6a58e710 (LWP 21655)  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
 23 Thread 0x7f5a69b8d710 (LWP 21656)  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
 22 Thread 0x7f5a6918c710 (LWP 21657)  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
 21 Thread 0x7f5a5bfff710 (LWP 21658)  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
 20 Thread 0x7f58d17f7710 (LWP 14742)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
 19 Thread 0x7f58d3ffb710 (LWP 14743)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
 18 Thread 0x7f5a59bde710 (LWP 14744)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
 17 Thread 0x7f58d35fa710 (LWP 14746)  pthread_cond_timedwait@@GLIBC_2.3.2 () at 
../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 16 Thread 0x7f58d0df6710 (LWP 14748) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 15 Thread 0x7f58ceff3710 (LWP 14750) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 14 Thread 0x7f58cc7ef710 (LWP 14913) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 13 Thread 0x7f58cbdee710 (LWP 14914) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 12 Thread 0x7f58cb3ed710 (LWP 14915) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 11 Thread 0x7f5a591dd710 (LWP 14940) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 10 Thread 0x7f58d2bf9710 (LWP 14941) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 9 Thread 0x7f58d21f8710 (LWP 14942) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 8 Thread 0x7f58d03f5710 (LWP 14943) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 7 Thread 0x7f58cf9f4710 (LWP 14944) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 6 Thread 0x7f58ce5f2710 (LWP 14945) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 5 Thread 0x7f58cdbf1710 (LWP 14946) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 4 Thread 0x7f58cd1f0710 (LWP 14957) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 3 Thread 0x7f58ca9ec710 
(LWP 14958) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 2 Thread 0x7f58c9feb710 (LWP 15875) pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 * 1 Thread 0x7f5a6a7ae780 (LWP 21654) 0x0000003617ad73e3 in select () at ../sysdeps/unix/syscall-template.S:82 (gdb) thread 1 [Switching to thread 1 (Thread 0x7f5a6a7ae780 (LWP 21654))]#0 0x0000003617ad73e3 in select () at ../sysdeps/unix/syscall-template.S:82 82 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) bt #0 0x0000003617ad73e3 in select () at ../sysdeps/unix/syscall-template.S:82 #1 0x000000000040a633 in main_loop_wait (timeout=<value optimized out>) at /usr/src/debug/qemu-kvm-0.11.0/vl.c:4181 #2 0x00000000004231aa in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:2079 #3 0x000000000040f157 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.11.0/vl.c:4393 #4 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /usr/src/debug/qemu-kvm-0.11.0/vl.c:6263 (gdb) thread 2 [Switching to thread 2 (Thread 0x7f58c9feb710 (LWP 15875))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
() (gdb) thread 3 [Switching to thread 3 (Thread 0x7f58ca9ec710 (LWP 14958))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () (gdb) thread 4 [Switching to thread 4 (Thread 0x7f58cd1f0710 (LWP 14957))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
() (gdb) thread 5 [Switching to thread 5 (Thread 0x7f58cdbf1710 (LWP 14946))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () (gdb) thread 6 [Switching to thread 6 (Thread 0x7f58ce5f2710 (LWP 14945))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
() (gdb) thread 7 [Switching to thread 7 (Thread 0x7f58cf9f4710 (LWP 14944))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () (gdb) thread 8 [Switching to thread 8 (Thread 0x7f58d03f5710 (LWP 14943))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
() (gdb) thread 9 [Switching to thread 9 (Thread 0x7f58d21f8710 (LWP 14942))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () (gdb) thread 10 [Switching to thread 10 (Thread 0x7f58d2bf9710 (LWP 14941))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
() (gdb) thread 11 [Switching to thread 11 (Thread 0x7f5a591dd710 (LWP 14940))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () (gdb) thread 12 [Switching to thread 12 (Thread 0x7f58cb3ed710 (LWP 14915))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
() (gdb) thread 13 [Switching to thread 13 (Thread 0x7f58cbdee710 (LWP 14914))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () (gdb) thread 14 [Switching to thread 14 (Thread 0x7f58cc7ef710 (LWP 14913))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
() (gdb) thread 15 [Switching to thread 15 (Thread 0x7f58ceff3710 (LWP 14750))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () (gdb) thread 16 [Switching to thread 16 (Thread 0x7f58d0df6710 (LWP 14748))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
() (gdb) thread 17 [Switching to thread 17 (Thread 0x7f58d35fa710 (LWP 14746))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? () (gdb) thread 18 [Switching to thread 18 (Thread 0x7f5a59bde710 (LWP 14744))]#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 211 62: movq %rax, %r14 (gdb) bt #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68 #2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301 #3 0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297 #4 0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x0000000000000000 in ?? 
()
(gdb) thread 19
[Switching to thread 19 (Thread 0x7f58d3ffb710 (LWP 14743))]#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
211	62:	movq	%rax, %r14
(gdb) bt
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
#1  0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68
#2  aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301
#3  0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#4  0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()
(gdb) thread 20
[Switching to thread 20 (Thread 0x7f58d17f7710 (LWP 14742))]#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
211	62:	movq	%rax, %r14
(gdb) bt
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
#1  0x0000000000477ed7 in cond_timedwait (unused=<value optimized out>) at posix-aio-compat.c:68
#2  aio_thread (unused=<value optimized out>) at posix-aio-compat.c:301
#3  0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#4  0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5  0x0000000000000000 in ?? ()
(gdb) thread 21
[Switching to thread 21 (Thread 0x7f5a5bfff710 (LWP 21658))]#0  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000423de6 in kvm_run (vcpu=<value optimized out>, env=0x17f68b0) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:967
#2  0x0000000000424273 in kvm_cpu_exec (env=0xae80) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1686
#3  0x0000000000425856 in kvm_main_loop_cpu (_env=0x17f68b0) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1868
#4  ap_main_loop (_env=0x17f68b0) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1905
#5  0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#6  0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()
(gdb) thread 22
[Switching to thread 22 (Thread 0x7f5a6918c710 (LWP 21657))]#0  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000423de6 in kvm_run (vcpu=<value optimized out>, env=0x17e8c60) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:967
#2  0x0000000000424273 in kvm_cpu_exec (env=0xae80) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1686
#3  0x0000000000425856 in kvm_main_loop_cpu (_env=0x17e8c60) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1868
#4  ap_main_loop (_env=0x17e8c60) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1905
#5  0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#6  0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()
(gdb) thread 23
[Switching to thread 23 (Thread 0x7f5a69b8d710 (LWP 21656))]#0  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000423de6 in kvm_run (vcpu=<value optimized out>, env=0x17db010) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:967
#2  0x0000000000424273 in kvm_cpu_exec (env=0xae80) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1686
#3  0x0000000000425856 in kvm_main_loop_cpu (_env=0x17db010) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1868
#4  ap_main_loop (_env=0x17db010) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1905
#5  0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#6  0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()
(gdb) thread 24
[Switching to thread 24 (Thread 0x7f5a6a58e710 (LWP 21655))]#0  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x0000003617ad6937 in ioctl () at ../sysdeps/unix/syscall-template.S:82
#1  0x0000000000423de6 in kvm_run (vcpu=<value optimized out>, env=0x17c17f0) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:967
#2  0x0000000000424273 in kvm_cpu_exec (env=0xae80) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1686
#3  0x0000000000425856 in kvm_main_loop_cpu (_env=0x17c17f0) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1868
#4  ap_main_loop (_env=0x17c17f0) at /usr/src/debug/qemu-kvm-0.11.0/qemu-kvm.c:1905
#5  0x0000003618606a3a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#6  0x0000003617ade67d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#7  0x0000000000000000 in ?? ()

Created attachment 394236 [details]
sysrq-t output from the guest when in the "bad" state.
I got a sysrq-t dump from the guest this afternoon while it was in the weird state. The e1000 driver doesn't seem to have changed anything.
I am going to try the latest 2.6.31.12 kernels on both host and guest now.
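For anyone else reproducing this, the diagnostics quoted in this bug can be gathered with a couple of one-liners. This is only a sketch (the function names are made up here, not tools from the qemu package); it assumes gdb plus the matching qemu-kvm debuginfo on the host, and root inside the guest.

```shell
#!/bin/sh
# Hypothetical helpers for collecting the diagnostics shown in this bug.

# On the host: dump a backtrace of every thread in a hung qemu-kvm process,
# like the "thread N ... bt" transcripts attached above.
dump_qemu_threads() {
    if [ -z "$1" ]; then
        echo "usage: dump_qemu_threads <qemu-kvm-pid>"
        return 1
    fi
    gdb -p "$1" -batch -ex 'thread apply all bt'
}

# Inside the guest, as root: ask the kernel for a task-state dump (sysrq-t).
# The output lands in dmesg / /var/log/messages.
dump_guest_tasks() {
    echo t > /proc/sysrq-trigger
}
```

Typical use on the host would be something like `dump_qemu_threads "$(pgrep -f qemu-kvm | head -1)"`; inside the guest, sysrq may first need to be enabled with `sysctl kernel.sysrq=1`.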
Kevin, can you please try qemu-kvm from the virt-preview repository? It has qemu-kvm-0.12, which is the latest stable release. From the bug, it looks like a problem with the iothread not exiting.

Just an update: I rebooted both the host and guest on 2.6.31.12-174.2.19.fc12.x86_64, and the problem showed up in just a few minutes. I then rebooted the guest with just one CPU, and it's been OK since then. ;( So, perhaps this is an issue with 4-CPU guests? Anyhow, I will try the latest qemu as suggested later tonight and change it back to 4 CPUs.

OK. I updated qemu/libvirt from the virt-preview repo, rebooted, and brought the guest up again with 4 CPUs. So far it's been up about 18 hours without a single issue. Something could show up in a few days, but it's looking promising.

Great; please let us know if you hit the problem again.

Sadly, it just happened again. ;( It just took about a week or so to happen. /var/log/messages has:

SRC=71.194.232.243 DST=10.1.1.1 LEN=40 TOS=0x00 PREC=0x20 TTL=52 ID=3987 DF PROTO=TCP SPT=45745 DPT=80 WINDOW=0 RES=0x00 RST URGP=0
Feb 22 19:31:39 dworkin kernel: IN=eth0 OUT= MAC=52:54:00:dc:79:3f:00:02:b3:3b:88:62:08:00 SRC=71.194.232.243 DST=10.1.1.1 LEN=40 TOS=0x00 PREC=0x20 TTL=52 ID=37765 DF PROTO=TCP SPT=45745 DPT=80 WINDOW=0 RES=0x00 RST URGP=0
Feb 22 19:31:42 dworkin kernel: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:21:9b:3c:90:36:08:00 SRC=10.1.1.2 DST=255.255.255.255 LEN=44 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=3483 DPT=3483 LEN=24
Feb 22 19:31:47 dworkin kernel: BUG: soft lockup - CPU#2 stuck for 61s! [munin-graph:8058]
Feb 22 19:31:47 dworkin kernel: Modules linked in: cpufreq_stats freq_table tun sunrpc ipv6 ipt_LOG iptable_nat nf_nat ip6_tables dm_multipath uinput i2c_piix4 e1000 virtio_balloon i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib]
Feb 22 19:31:47 dworkin kernel: CPU 2:
Feb 22 19:31:47 dworkin kernel: Modules linked in: cpufreq_stats freq_table tun sunrpc ipv6 ipt_LOG iptable_nat nf_nat ip6_tables dm_multipath uinput i2c_piix4 e1000 virtio_balloon i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib]
Feb 22 19:31:47 dworkin kernel: Pid: 8058, comm: munin-graph Not tainted 2.6.31.12-174.2.19.fc12.x86_64 #1 Bochs
Feb 22 19:31:47 dworkin kernel: RIP: 0010:[<ffffffff8102de03>] [<ffffffff8102de03>] kvm_mmu_op+0x3c/0x55
Feb 22 19:31:47 dworkin kernel: RSP: 0018:ffff880039823c68 EFLAGS: 00000293
Feb 22 19:31:47 dworkin kernel: RAX: 0000000000000018 RBX: ffff880039823c98 RCX: 0000000039823cd8
Feb 22 19:31:47 dworkin kernel: RDX: 0000000000000000 RSI: 0000000000000018 RDI: ffff880039823cd8
Feb 22 19:31:47 dworkin kernel: RBP: ffffffff810128ae R08: 0000003d2d400000 R09: 00000000ffffffdc
Feb 22 19:31:47 dworkin kernel: R10: ffff880039823cc8 R11: ffff88000000fe00 R12: 0000000000000000
Feb 22 19:31:47 dworkin kernel: R13: 0000000000000002 R14: ffff88002807c938 R15: ffff880000012808
Feb 22 19:31:47 dworkin kernel: FS: 0000000000000000(0000) GS:ffff88002806e000(0000) knlGS:0000000000000000
Feb 22 19:31:47 dworkin kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 22 19:31:47 dworkin kernel: CR2: 00007f3050586270 CR3: 0000000001001000 CR4: 00000000000006e0
Feb 22 19:31:47 dworkin kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 22 19:31:47 dworkin kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 22 19:31:47 dworkin kernel: Call Trace:
Feb 22 19:31:47 dworkin kernel: [<ffffffff8102ddf7>] ? kvm_mmu_op+0x30/0x55
Feb 22 19:31:47 dworkin kernel: [<ffffffff8102def5>] ? kvm_deferred_mmu_op+0x41/0x96
Feb 22 19:31:47 dworkin kernel: [<ffffffff8102dfa9>] ? kvm_mmu_write+0x33/0x3a
Feb 22 19:31:47 dworkin kernel: [<ffffffff8102e017>] ? kvm_set_pmd+0x21/0x23
Feb 22 19:31:47 dworkin kernel: [<ffffffff810da83c>] ? free_pgd_range+0x218/0x39c
Feb 22 19:31:47 dworkin kernel: [<ffffffff810daa69>] ? free_pgtables+0xa9/0xcc
Feb 22 19:31:47 dworkin kernel: [<ffffffff810dc439>] ? exit_mmap+0xea/0x122
Feb 22 19:31:47 dworkin kernel: [<ffffffff8104f265>] ? mmput+0x58/0xaf
Feb 22 19:31:47 dworkin kernel: [<ffffffff810532ba>] ? exit_mm+0x105/0x110
Feb 22 19:31:47 dworkin kernel: [<ffffffff81054d6e>] ? do_exit+0x1cf/0x6cb
Feb 22 19:31:47 dworkin kernel: [<ffffffff8141f859>] ? do_page_fault+0x269/0x299
Feb 22 19:31:47 dworkin kernel: [<ffffffff810552ed>] ? do_group_exit+0x83/0xb3
Feb 22 19:31:47 dworkin kernel: [<ffffffff81055334>] ? sys_exit_group+0x17/0x1b
Feb 22 19:31:47 dworkin kernel: [<ffffffff81011d32>] ? system_call_fastpath+0x16/0x1b
Feb 22 19:36:28 dworkin kernel: BUG: soft lockup - CPU#0 stuck for 61s! [munin-graph:8761]
Feb 22 19:36:28 dworkin kernel: Modules linked in: cpufreq_stats freq_table tun sunrpc ipv6 ipt_LOG iptable_nat nf_nat ip6_tables dm_multipath uinput i2c_piix4 e1000 virtio_balloon i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib]
Feb 22 19:36:28 dworkin kernel: CPU 0:
Feb 22 19:36:28 dworkin kernel: Modules linked in: cpufreq_stats freq_table tun sunrpc ipv6 ipt_LOG iptable_nat nf_nat ip6_tables dm_multipath uinput i2c_piix4 e1000 virtio_balloon i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: speedstep_lib]
Feb 22 19:36:28 dworkin kernel: Pid: 8761, comm: munin-graph Not tainted 2.6.31.12-174.2.19.fc12.x86_64 #1 Bochs
Feb 22 19:36:28 dworkin kernel: RIP: 0010:[<ffffffff81078061>] [<ffffffff81078061>] smp_call_function_many+0x1ac/0x1c1
Feb 22 19:36:28 dworkin kernel: RSP: 0000:ffff88011b87db30 EFLAGS: 00000202
Feb 22 19:36:28 dworkin kernel: RAX: 00000000000008fc RBX: ffff88011b87db80 RCX: 0000000000000100
Feb 22 19:36:28 dworkin kernel: RDX: 0000000000000000 RSI: 00000000000000fc RDI: 0000000000000282
Feb 22 19:36:28 dworkin kernel: RBP: ffffffff810128ae R08: 0000000000000008 R09: 0000000000000001
Feb 22 19:36:28 dworkin kernel: R10: 000000000000000f R11: 0000000000000004 R12: ffff88011b87dae0
Feb 22 19:36:28 dworkin kernel: R13: 00000000000000fc R14: ffffffff816f9140 R15: 0000000000000000
Feb 22 19:36:28 dworkin kernel: FS: 00007fd8e3bac700(0000) GS:ffff880028034000(0000) knlGS:0000000000000000
Feb 22 19:36:28 dworkin kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 22 19:36:28 dworkin kernel: CR2: 000000000164dfd0 CR3: 000000012269f000 CR4: 00000000000006f0
Feb 22 19:36:28 dworkin kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 22 19:36:28 dworkin kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 22 19:36:28 dworkin kernel: Call Trace:
Feb 22 19:36:28 dworkin kernel: [<ffffffff810c6c2c>] ? drain_local_pages+0x0/0x18
Feb 22 19:36:28 dworkin kernel: [<ffffffff81078098>] ? smp_call_function+0x22/0x26
Feb 22 19:36:28 dworkin kernel: [<ffffffff810570d3>] ? on_each_cpu+0x1d/0x3e
Feb 22 19:36:28 dworkin kernel: [<ffffffff810c5f8f>] ? drain_all_pages+0x1c/0x1e
Feb 22 19:36:28 dworkin kernel: [<ffffffff810c774b>] ? __alloc_pages_nodemask+0x3ff/0x5b0
Feb 22 19:36:28 dworkin kernel: [<ffffffff810eb500>] ? alloc_pages_current+0x95/0x9e
Feb 22 19:36:28 dworkin kernel: [<ffffffff810f2978>] ? alloc_slab_page+0x1b/0x28
Feb 22 19:36:28 dworkin kernel: [<ffffffff810f29d9>] ? new_slab+0x54/0x1d3
Feb 22 19:36:28 dworkin kernel: [<ffffffff8102def5>] ? kvm_deferred_mmu_op+0x41/0x96
Feb 22 19:36:28 dworkin kernel: [<ffffffff810f2ffe>] ? __slab_alloc+0x1f1/0x3a1
Feb 22 19:36:28 dworkin kernel: [<ffffffff8104fa10>] ? copy_process+0x9a/0x12ac
Feb 22 19:36:28 dworkin kernel: [<ffffffff810f36f9>] ? kmem_cache_alloc+0x99/0x11c
Feb 22 19:36:28 dworkin kernel: [<ffffffff8104fa10>] ? copy_process+0x9a/0x12ac
Feb 22 19:36:28 dworkin kernel: [<ffffffff8104fa10>] ? copy_process+0x9a/0x12ac
Feb 22 19:36:28 dworkin kernel: [<ffffffff81054390>] ? wait_consider_task+0x62c/0x9eb
Feb 22 19:36:28 dworkin kernel: [<ffffffff81050d86>] ? do_fork+0x164/0x344
Feb 22 19:36:28 dworkin kernel: [<ffffffff812017c9>] ? __up_read+0x76/0x81
Feb 22 19:36:28 dworkin kernel: [<ffffffff8106af82>] ? up_read+0xe/0x10
Feb 22 19:36:28 dworkin kernel: [<ffffffff8141f859>] ? do_page_fault+0x269/0x299
Feb 22 19:36:28 dworkin kernel: [<ffffffff81010478>] ? sys_clone+0x28/0x2a
Feb 22 19:36:28 dworkin kernel: [<ffffffff810120d3>] ? stub_clone+0x13/0x20
Feb 22 19:36:28 dworkin kernel: [<ffffffff81011d32>] ? system_call_fastpath+0x16/0x1b

This time I had another virtual machine on the same host, and it too was spinning and taking up 100% CPU, so it seems that when this happens, all guests on the host are affected.
;(

Kevin, can you please install and test the qemu packages from http://koji.fedoraproject.org/koji/taskinfo?taskID=2009702 ? The backtraces indicate a potential kvm paravirt mmu issue, and these packages disable the feature. The kernel you are using for both host and guest contains bugfixes for all known paravirt mmu issues. Thanks.

Well, I grabbed that qemu build tonight and downgraded to it. Then I tried just booting a CentOS 5 guest. It fails to boot in virtio_blk.ko:

cpu_idle -> default_idle -> mtrr_local -> common_interrupt -> do_IRQ -> __do_IRQ -> __do_IRQ -> handle_IRQ_event -> acpi_irq -> note_interrupt -> __report_bad_irq

Then I tried booting a Fedora 12 instance from a live ISO. It boots, shows the "automatic boot in 10 seconds" screen, then hangs. I can try my main instance, but I am a bit leery about it working. I shouldn't need to reboot or restart libvirt, should I?

Just had another crash of the guest. This time I caught a panic:

Mar 2 09:25:28 dworkin kernel: Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff880037272a80 with usage 925311744
Mar 2 09:25:28 dworkin kernel:
Mar 2 09:25:28 dworkin kernel: Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.31.12-174.2.19.fc12.x86_64 #1
Mar 2 09:25:28 dworkin kernel: Call Trace:
Mar 2 09:25:28 dworkin kernel: <IRQ> [<ffffffff8141b0f0>] panic+0x7a/0x12c
Mar 2 09:25:28 dworkin kernel: [<ffffffff8106cbf0>] put_cred_rcu+0x2e/0x89
Mar 2 09:25:28 dworkin kernel: [<ffffffff8109d6ec>] __rcu_process_callbacks+0x189/0x240
Mar 2 09:25:28 dworkin kernel: [<ffffffff8109d7ce>] rcu_process_callbacks+0x2b/0x4f
Mar 2 09:25:28 dworkin kernel: [<ffffffff81057630>] __do_softirq+0xdd/0x1ad
Mar 2 09:25:28 dworkin kernel: [<ffffffff81012eec>] call_softirq+0x1c/0x30
Mar 2 09:25:28 dworkin kernel: <EOI> [<ffffffff8101443b>] do_softirq+0x47/0x8d
Mar 2 09:25:28 dworkin kernel: [<ffffffff81057035>] ksoftirqd+0x68/0xe9
Mar 2 09:25:28 dworkin kernel: [<ffffffff81056fcd>] ? ksoftirqd+0x0/0xe9
Mar 2 09:25:28 dworkin kernel: [<ffffffff810677b5>] kthread+0x91/0x99
Mar 2 09:25:28 dworkin kernel: [<ffffffff81012dea>] child_rip+0xa/0x20

I've restarted it after upgrading to qemu-0.12.3-3.fc12.x86_64 from the virt-preview repo. Let me know what next steps I can do here to help track this down.

Created attachment 397396 [details]
disable KVM_CAP_PVMMU reporting
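The attachment's source is not pasted in this bug, but a shim like this generally works by interposing on ioctl() and claiming the paravirt-MMU capability is absent, so qemu-kvm never enables it. The sketch below is an illustration of the technique, not the actual nopvmmu.so.1: the constants are hardcoded from the 2.6.3x-era <linux/kvm.h> (KVM_CAP_PV_MMU was later removed from the header), and the glibc ioctl() signature is assumed.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>

/* Values from the 2.6.3x-era <linux/kvm.h>; hardcoded here because
 * KVM_CAP_PV_MMU has since been removed from the header. */
#define KVM_CHECK_EXTENSION 0xAE03UL  /* _IO(KVMIO, 0x03) */
#define KVM_CAP_PV_MMU      13

/* Interpose on ioctl(): report KVM_CAP_PV_MMU as unsupported, and pass
 * every other request through to the real ioctl in libc. */
int ioctl(int fd, unsigned long request, ...)
{
    static int (*real_ioctl)(int, unsigned long, ...);
    unsigned long arg;
    va_list ap;

    va_start(ap, request);
    arg = va_arg(ap, unsigned long);
    va_end(ap);

    /* KVM_CHECK_EXTENSION passes the capability number as the argument;
     * returning 0 means "extension not present". */
    if (request == KVM_CHECK_EXTENSION && arg == KVM_CAP_PV_MMU)
        return 0;

    if (!real_ioctl)
        real_ioctl = (int (*)(int, unsigned long, ...))dlsym(RTLD_NEXT, "ioctl");
    return real_ioctl(fd, request, arg);
}
```

Such a library would be built with something like `gcc -shared -fPIC -o nopvmmu.so.1 nopvmmu.c -ldl` and loaded via LD_PRELOAD. Note that when the guest is started through libvirt, the variable has to reach the qemu-kvm process spawned by libvirtd; an LD_PRELOAD set in a login shell will not do that, which may be why the manual restart described later appeared to do nothing.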
Kevin, please run qemu-kvm manually with:

LD_PRELOAD=nopvmmu.so.1 qemu-kvm <parameters>

This will disable paravirt mmu.

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

I've been waiting for this issue to recur before restarting the guest, and of course so far it hasn't. ;(

17:08:04 up 14 days, 6:36, 5 users, load average: 0.44, 0.69, 0.68

Will update the bug as soon as/if this occurs again. ;(

OK, it happened again. I tried to restart it manually with the LD_PRELOAD, but it didn't seem to work. It didn't error or anything, it just didn't bring up the VM at all. :( Does nopvmmu.so.1 need to be used with a specific version of libvirt/qemu? Let me know what you specifically want me to try next time and I will do so.

qemu-kvm-0.12.3-6.fc12.x86_64
libvirt-0.7.6-1.fc12.x86_64

Current guest xml:

<domain type='kvm' id='27'>
  <name>dworkin.scrye.com</name>
  <uuid>8ab65f5c-5228-ec88-3eaa-4895438ae4a2</uuid>
  <memory>6291456</memory>
  <currentMemory>6291456</currentMemory>
  <vcpu cpuset='0-3'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.12'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/dworkin.scrye.com.img'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:dc:79:3f'/>
      <source bridge='eth0'/>
      <target dev='vnet0'/>
      <model type='e1000'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:dc:79:3f'/>
      <source bridge='eth0'/>
      <target dev='vnet1'/>
      <model type='e1000'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5900' autoport='yes' keymap='en-us'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <watchdog model='i6300esb' action='reset'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </watchdog>
  </devices>
  <seclabel type='dynamic' model='selinux'>
    <label>system_u:system_r:svirt_t:s0:c384,c392</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c384,c392</imagelabel>
  </seclabel>
</domain>

Happy to provide any further info or try things.

This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, please change the 'version' of this bug to the applicable version.
If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.