Bug 521521
Summary: | host hangs hard when starting a 3 CPU KVM RHEL 5.4 guest (nf_conntrack oops) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Rik van Riel <riel> |
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | low | ||
Version: | 11 | CC: | herbert.xu, itamar, kernel-maint, knoel, markmc, virt-maint |
Hardware: | x86_64 | ||
OS: | Linux | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2009-09-11 19:05:57 UTC | Type: | --- |
Description
Rik van Riel
2009-09-06 17:22:18 UTC
Turns out that starting a second single CPU KVM guest will also cause the oops. This time netconsole blasted it to syslog on my other system. Due to syslog date header and hostname some lines have been wrapped early, but here it is:

BUG: unable to handle kernel paging request at ffff8805c0e77800
IP: [<ffffffff8132e145>] __nf_conntrack_find+0x40/0xb3
PGD 202063 PUD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/vnet2/flags
CPU 5
Modules linked in: wmi netconsole configfs cpufreq_stats fuse tun bnep sco l2cap bluetooth sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm_intel kvm uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer i2c_i801 snd iTCO_wdt iTCO_vendor_support tg3 soundcore ppdev serio_raw dcdbas parport_pc pcspkr parport snd_page_alloc radeon drm i2c_algo_bit i2c_core [last unloaded: wmi]
Pid: 2967, comm: qemu-kvm Not tainted 2.6.29.6-217.2.16.fc11.x86_64 #1 Precision WorkStation T3500
RIP: 0010:[<ffffffff8132e145>] [<ffffffff8132e145>] __nf_conntrack_find+0x40/0xb3
RSP: 0018:ffff88033c8e7b08 EFLAGS: 00010286
RAX: ffff8805c0e77800 RBX: ffffffff818c4320 RCX: 000000006963cd8b
RDX: 000000000000d2c7 RSI: 00000000ff8132e3 RDI: 0000000000000246
RBP: ffff88033c8e7b28 R08: 00000000a40de7b0 R09: 00000000438e32f2
R10: 00000000b9c93e63 R11: 000000004a116f40 R12: ffff88033c8e7ba8
R13: 0000000050c2ef00 R14: ffffffff81610d70 R15: ffffffff816103a0
FS: 00007fb31bd29740(0000) GS:ffff88033cfd0580(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8805c0e77800 CR3: 000000032a0e3000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process qemu-kvm (pid: 2967, threadinfo ffff88032702e000, task ffff88032a384500)
Stack:
ffffffff818c4320 ffff8800934b8200 ffff88033c8e7ba8 ffffffff81610d70
ffff88033c8e7b38 ffffffff8132e1c6 ffff88033c8e7c18 ffffffff8132e826
ffffffff81610d70 ffffffff816103a0 ffffffffa0257f10 ffffffffa0257e31
Call Trace:
<IRQ> [<ffffffff8132e1c6>] nf_conntrack_find_get+0xe/0x51
[<ffffffff8132e826>] nf_conntrack_in+0x1e6/0x545
[<ffffffffa0257f10>] ? br_forward+0x1e/0x2a [bridge]
[<ffffffffa0257e31>] ? br_forward_finish+0x0/0x3c [bridge]
[<ffffffff8132c1f9>] ? nf_hook_slow+0x6a/0xcb
[<ffffffffa0257e31>] ? br_forward_finish+0x0/0x3c [bridge]
[<ffffffff8136f2bd>] ipv4_conntrack_in+0x21/0x23
[<ffffffff8132c14c>] nf_iterate+0x46/0x89
[<ffffffffa025c768>] ? br_nf_pre_routing_finish+0x0/0x25f [bridge]
[<ffffffff8132c1f9>] nf_hook_slow+0x6a/0xcb
[<ffffffffa025c768>] ? br_nf_pre_routing_finish+0x0/0x25f [bridge]
[<ffffffffa025c766>] nf_hook_thresh.clone.0+0x3b/0x3d [bridge]
[<ffffffffa025cf9b>] br_nf_pre_routing+0x523/0x551 [bridge]
[<ffffffff8132c14c>] nf_iterate+0x46/0x89
[<ffffffffa025894a>] ? br_handle_frame_finish+0x0/0x13c [bridge]
[<ffffffff8132c1f9>] nf_hook_slow+0x6a/0xcb
[<ffffffffa025894a>] ? br_handle_frame_finish+0x0/0x13c [bridge]
[<ffffffff810402dd>] ? update_shares+0x1e/0x4e
[<ffffffffa0258948>] nf_hook_thresh.clone.0+0x43/0x45 [bridge]
[<ffffffffa0258bf6>] br_handle_frame+0x170/0x196 [bridge]
[<ffffffff81310707>] netif_receive_skb+0x2f6/0x3e8
[<ffffffff81310898>] process_backlog+0x9f/0xdd
[<ffffffff8130ecc9>] net_rx_action+0xb7/0x1b1
[<ffffffff8104df87>] __do_softirq+0x94/0x155
[<ffffffff8101274c>] call_softirq+0x1c/0x30
<EOI> [<ffffffff810138ce>] do_softirq+0x52/0xb9
[<ffffffff81310d9f>] netif_rx_ni+0x26/0x2b
[<ffffffffa01ef4cd>] tun_chr_aio_write+0x3e5/0x434 [tun]
[<ffffffff8104d02b>] ? current_fs_time+0x27/0x2e
[<ffffffffa01ef0e8>] ? tun_chr_aio_write+0x0/0x434 [tun]
[<ffffffff810d4ee7>] do_sync_readv_writev+0xe5/0x124
[<ffffffff8105c91b>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811849c3>] ? selinux_file_permission+0x58/0x5d
[<ffffffff8117f1f9>] ? security_file_permission+0x16/0x18
[<ffffffff810d560a>] do_readv_writev+0xa7/0x127
[<ffffffff8105f762>] ? __hrtimer_start_range_ns+0x226/0x238
[<ffffffff8105b779>] ? unlock_timer+0x12/0x14
[<ffffffff8105bd20>] ? sys_timer_settime+0x258/0x2a6
[<ffffffff810d56cd>] vfs_writev+0x43/0x4e
[<ffffffff810d5722>] sys_writev+0x4a/0x93
[<ffffffff8101133a>] system_call_fastpath+0x16/0x1b
Code: 00 49 89 f4 8b 35 d4 1e 2e 00 48 89 fb 4c 89 e7 e8 17 ef ff ff 41 89 c5 e8 e1 fb d1 ff 44 89 e8 48 c1 e0 03 48 03 83 80 05 00 00 c> 8b 28 eb 32 65 8b 14 25 24 00 last message repeated 2 times 48 f7 d0 89 d2 48 8b 04
RIP [<ffffffff8132e145>] __nf_conntrack_find+0x40/0xb3
RSP <ffff88033c8e7b08>
CR2: ffff8805c0e77800
---[ end trace 65501dff3cc891db ]---
Kernel panic - not syncing: Fatal exception in interrupt
------------[ cut here ]------------
WARNING: at kernel/smp.c:329 smp_call_function_many+0x45/0x1f6() (Tainted: G D )
Hardware name: Precision WorkStation T3500
Modules linked in: wmi netconsole configfs cpufreq_stats fuse tun bnep sco l2cap bluetooth sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm_intel kvm uinput snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer i2c_i801 snd iTCO_wdt iTCO_vendor_support tg3 soundcore ppdev serio_raw dcdbas parport_pc pcspkr parport snd_page_alloc radeon drm i2c_algo_bit i2c_core [last unloaded: wmi]
Pid: 2967, comm: qemu-kvm Tainted: G D 2.6.29.6-217.2.16.fc11.x86_64 #1
Call Trace:
<IRQ> [<ffffffff8104884b>] warn_slowpath+0xbc/0xf0
[<ffffffff81029f87>] ? default_spin_lock_flags+0x9/0xe
[<ffffffff813ac89c>] ? _spin_unlock_irqrestore+0x2c/0x42
[<ffffffff81048e20>] ? release_console_sem+0x1c1/0x1f6
[<ffffffff81048e20>] ? release_console_sem+0x1c1/0x1f6
[<ffffffff8106b2d7>] smp_call_function_many+0x45/0x1f6
[<ffffffff81048e20>] ? release_console_sem+0x1c1/0x1f6
[<ffffffff8106b4aa>] smp_call_function+0x22/0x26
[<ffffffff81020e89>] native_smp_send_stop+0x27/0x6f
[<ffffffff813aa1c1>] panic+0x89/0x134
[<ffffffff813ad841>] oops_end+0xb7/0xc7
[<ffffffff813af3f4>] do_page_fault+0x934/0x9e9
[<ffffffffa0257de6>] ? br_dev_queue_push_xmit+0x82/0x88 [bridge]
[<ffffffffa025c357>] ? br_nf_dev_queue_xmit+0x47/0x49 [bridge]
[<ffffffffa025d594>] ? br_nf_post_routing+0x1ac/0x1c4 [bridge]
[<ffffffff8132c14c>] ? nf_iterate+0x46/0x89
[<ffffffffa0257d64>] ? br_dev_queue_push_xmit+0x0/0x88 [bridge]
[<ffffffff813ac50d>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[<ffffffff813acbd5>] page_fault+0x25/0x30
[<ffffffff8132e145>] ? __nf_conntrack_find+0x40/0xb3
[<ffffffff8132e137>] ? __nf_conntrack_find+0x32/0xb3
[<ffffffff8132e1c6>] nf_conntrack_find_get+0xe/0x51
[<ffffffff8132e826>] nf_conntrack_in+0x1e6/0x545
[<ffffffffa0257f10>] ? br_forward+0x1e/0x2a [bridge]
[<ffffffffa0257e31>] ? br_forward_finish+0x0/0x3c [bridge]
[<ffffffff8132c1f9>] ? nf_hook_slow+0x6a/0xcb
[<ffffffffa0257e31>] ? br_forward_finish+0x0/0x3c [bridge]
[<ffffffff8136f2bd>] ipv4_conntrack_in+0x21/0x23
[<ffffffff8132c14c>] nf_iterate+0x46/0x89
[<ffffffffa025c768>] ? br_nf_pre_routing_finish+0x0/0x25f [bridge]
[<ffffffff8132c1f9>] nf_hook_slow+0x6a/0xcb
[<ffffffffa025c768>] ? br_nf_pre_routing_finish+0x0/0x25f [bridge]
[<ffffffffa025c766>] nf_hook_thresh.clone.0+0x3b/0x3d [bridge]
[<ffffffffa025cf9b>] br_nf_pre_routing+0x523/0x551 [bridge]
[<ffffffff8132c14c>] nf_iterate+0x46/0x89
[<ffffffffa025894a>] ? br_handle_frame_finish+0x0/0x13c [bridge]
[<ffffffff8132c1f9>] nf_hook_slow+0x6a/0xcb
[<ffffffffa025894a>] ? br_handle_frame_finish+0x0/0x13c [bridge]
[<ffffffff810402dd>] ? update_shares+0x1e/0x4e
[<ffffffffa0258948>] nf_hook_thresh.clone.0+0x43/0x45 [bridge]
[<ffffffffa0258bf6>] br_handle_frame+0x170/0x196 [bridge]
[<ffffffff81310707>] netif_receive_skb+0x2f6/0x3e8
[<ffffffff81310898>] process_backlog+0x9f/0xdd
[<ffffffff8130ecc9>] net_rx_action+0xb7/0x1b1
[<ffffffff8104df87>] __do_softirq+0x94/0x155
[<ffffffff8101274c>] call_softirq+0x1c/0x30
<EOI> [<ffffffff810138ce>] do_softirq+0x52/0xb9
[<ffffffff81310d9f>] netif_rx_ni+0x26/0x2b
[<ffffffffa01ef4cd>] tun_chr_aio_write+0x3e5/0x434 [tun]
[<ffffffff8104d02b>] ? current_fs_time+0x27/0x2e
[<ffffffffa01ef0e8>] ? tun_chr_aio_write+0x0/0x434 [tun]
[<ffffffff810d4ee7>] do_sync_readv_writev+0xe5/0x124
[<ffffffff8105c91b>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811849c3>] ? selinux_file_permission+0x58/0x5d
[<ffffffff8117f1f9>] ? security_file_permission+0x16/0x18
[<ffffffff810d560a>] do_readv_writev+0xa7/0x127
[<ffffffff8105f762>] ? __hrtimer_start_range_ns+0x226/0x238
[<ffffffff8105b779>] ? unlock_timer+0x12/0x14
[<ffffffff8105bd20>] ? sys_timer_settime+0x258/0x2a6
[<ffffffff810d56cd>] vfs_writev+0x43/0x4e
[<ffffffff810d5722>] sys_writev+0x4a/0x93
[<ffffffff8101133a>] system_call_fastpath+0x16/0x1b
---[ end trace 65501dff3cc891dc ]---

After disabling both iptables and ip6tables (so conntrack does not get loaded), I managed to run two virtual machines simultaneously without a hang.

Strange, haven't seen any other reports of this. 2.6.30 is in F-11 updates now, can you still reproduce with that?

2.6.30 seems to fix this bug. I'll concentrate on the timer bug now :)
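For anyone reading a trace like the one in the description, it can help to separate the frames the kernel marks as reliable from the `?` entries, which are only possible return addresses found on the stack. The following is a minimal sketch under stated assumptions: the names `Frame`, `parse_frames` and `reliable_path` and the regular expression are hypothetical illustrations, not part of any kernel tooling, and the pattern only targets the `[<address>] symbol+0xoffset/0xsize [module]` layout seen in this 2.6.29 oops.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    address: int           # return address printed inside [<...>]
    symbol: str            # function name, e.g. "nf_conntrack_in"
    offset: int            # offset into the function
    size: int              # total size of the function
    module: Optional[str]  # module name, or None for core kernel text
    reliable: bool         # False when the frame was printed with "?"


# Hypothetical pattern for the frame layout used in this report's trace.
_FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[(\w+)\])?"
)


def parse_frames(text: str) -> List[Frame]:
    """Extract every call-trace frame found in a block of oops text."""
    frames = []
    for match in _FRAME_RE.finditer(text):
        addr, unsure, symbol, offset, size, module = match.groups()
        frames.append(
            Frame(
                address=int(addr, 16),
                symbol=symbol,
                offset=int(offset, 16),
                size=int(size, 16),
                module=module,
                reliable=unsure is None,
            )
        )
    return frames


def reliable_path(text: str) -> List[str]:
    """Return the symbols of frames not marked with '?', in printed order."""
    return [frame.symbol for frame in parse_frames(text) if frame.reliable]
```

Run over the first call trace in the description, `reliable_path` would list the path from `sys_writev` through `tun_chr_aio_write`, the bridge netfilter hooks and `nf_conntrack_in` (printed innermost first), which matches the observation that the hang went away once conntrack was out of the packet path.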