Red Hat Bugzilla – Bug 670907
[RHEL6.1][Kernel] BUG: unable to handle kernel NULL pointer dereference, IP: [<ffffffff814115f0>] get_rps_cpu+0x290/0x340
Last modified: 2011-08-05 17:08:10 EDT
Description of problem:
While running tests on a scratch kernel with a "Receive Packet Steering patch" the system Oops

Version-Release number of selected component (if applicable):
2.6.32-100.el6Scratch (test kernel)

How reproducible:
Always

Actual results:
igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
br0: port 1(eth0) entering learning state
br0: port 1(eth0) entering forwarding state
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff814115f0>] get_rps_cpu+0x290/0x340
PGD 46bc3b067 PUD 46c7b8067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
CPU 1
Modules linked in: sit tunnel4 ip6table_filter ip6_tables ebtable_nat ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 xt_physdev iptable_filter ip_tables dm_mirror dm_region_hash dm_log kvm_intel kvm i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core sg igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mod [last unloaded: microcode]

Modules linked in: sit tunnel4 ip6table_filter ip6_tables ebtable_nat ebtables xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 xt_physdev iptable_filter ip_tables dm_mirror dm_region_hash dm_log kvm_intel kvm i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma i7core_edac edac_core sg igb dca ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif ahci megaraid_sas dm_mod [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.32-100.el6scratch.x86_64 #1 QSSC-S4R
RIP: 0010:[<ffffffff814115f0>] [<ffffffff814115f0>] get_rps_cpu+0x290/0x340
RSP: 0018:ffff88048e403b80 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffff88086bc80000
RDX: ffffffff81f02368 RSI: 0000000000000206 RDI: 0000000000000000
RBP: ffff88048e403bc0 R08: ffffffff81411a60 R09: ffff88048e403ba8
R10: 0000000000000000 R11: ffff88086a8c55f8 R12: ffff88086a8c55c0
R13: 0000000000000006 R14: ffff88086b7c2022 R15: ffff88086a8c55f8
FS: 0000000000000000(0000) GS:ffff88048e400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 000000046bd86000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880c6dde8000, task ffff88086de9e040)
Stack:
 ffffffff81411a60 ffff88086bc80000 000000078e416f00 ffff88086a8c55c0
<0> ffff88086a8c55c0 ffff88086bc806c0 ffff88086b7c2022 ffff88086a8c55f8
<0> ffff88048e403be0 ffffffff81411a7d ffff88086b7c2022 ffff88086a8c55c0
Call Trace:
 <IRQ>
 [<ffffffff81411a60>] ? netif_receive_skb+0x0/0x50
 [<ffffffff81411a7d>] netif_receive_skb+0x1d/0x50
 [<ffffffffa02fe3c8>] br_handle_frame_finish+0x198/0x260 [bridge]
 [<ffffffffa02fe63a>] br_handle_frame+0x1aa/0x250 [bridge]
 [<ffffffff8140caa3>] __netif_receive_skb+0x1c3/0x670
 [<ffffffff81411aa0>] netif_receive_skb+0x40/0x50
 [<ffffffff81411ba8>] napi_skb_finish+0x58/0x70
 [<ffffffff814120a9>] napi_gro_receive+0x39/0x50
 [<ffffffffa01128c6>] igb_poll+0x8d6/0xf30 [igb]
 [<ffffffff8104f0ba>] ? enqueue_entity+0x13a/0x340
 [<ffffffff810561ec>] ? try_to_wake_up+0xcc/0x400
 [<ffffffff81412273>] net_rx_action+0x103/0x2a0
 [<ffffffff8108d373>] ? hrtimer_get_next_event+0xc3/0x100
 [<ffffffff8106bcb7>] __do_softirq+0xb7/0x1e0
 [<ffffffff810d0dc0>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff8100c2cc>] call_softirq+0x1c/0x30
 [<ffffffff8100df05>] do_softirq+0x65/0xa0
 [<ffffffff8106bab5>] irq_exit+0x85/0x90
 [<ffffffff814ce455>] do_IRQ+0x75/0xf0
 [<ffffffff8100bad3>] ret_from_intr+0x0/0x11
 <EOI>
 [<ffffffff812ad4ba>] ? intel_idle+0xda/0x160
 [<ffffffff812ad49d>] ? intel_idle+0xbd/0x160
 [<ffffffff813dc2a7>] cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e96>] cpu_idle+0xb6/0x110
 [<ffffffff814c01fc>] start_secondary+0x1fc/0x23f
Code: 85 c0 48 8b 7d c8 0f 84 11 fe ff ff 83 bf f0 03 00 00 01 48 c7 c1 24 f5 7e 81 8b 9f a0 06 00 00 48 0f 44 cf 48 8b bf 30 04 00 00 <48> 8b 57 10 48 89 4d c0 48 89 55 c8 e8 9f 5c f1 ff 48 8b 4d c0
RIP [<ffffffff814115f0>] get_rps_cpu+0x290/0x340
 RSP <ffff88048e403b80>
CR2: 0000000000000010

Expected results:
System should not Oops

Additional info:
This needs br0 interface to reproduce
So this is a weird one: for some reason, when I add a bridge into the mix, we oops in get_rps_cpu. The bridge sends us through a path in get_rps_cpu that we otherwise wouldn't take, which causes a printk to be triggered, and that is where we seem to oops. But the pointers that the printk interrogates are all valid, so I'm not at all certain what's going on here yet.
OK, it appears the netdev_warn macro is trying to dereference something in the netdev structure of the bridge interface that isn't set properly. If I remove the call, everything works fine. I'll figure out what it is and get a patch together shortly.
Note to self: it's net_device->dev.parent that's NULL, and netdev_warn presumes that it's always non-NULL. I'll need to patch that.
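For anyone following along, here is a rough user-space sketch of the failure mode: a warning helper that reads through a device's parent pointer, with a guard for devices that have no parent (as seems to be the case for a virtual device like br0). All names here (device_sketch, net_device_sketch, parent_driver_name, format_warn_prefix) are hypothetical stand-ins invented for illustration; this is not the kernel's netdev_warn code and not the attached patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct device_sketch {
    struct device_sketch *parent;  /* NULL when the device has no parent */
    const char *driver_name;
};

struct net_device_sketch {
    const char *name;
    struct device_sketch dev;
};

/*
 * Return the parent's driver name, or an empty string when there is no
 * parent. Without the NULL check, nd->dev.parent->driver_name would read
 * from a small offset above address zero, which is the same shape of
 * fault as the oops reported here.
 */
const char *parent_driver_name(const struct net_device_sketch *nd)
{
    assert(nd != NULL);
    if (nd->dev.parent == NULL)
        return "";
    return nd->dev.parent->driver_name;
}

/* Build a "<driver> <ifname>: " prefix for a warning message. */
int format_warn_prefix(char *buf, size_t len,
                       const struct net_device_sketch *nd)
{
    return snprintf(buf, len, "%s %s: ", parent_driver_name(nd), nd->name);
}
```

Calling format_warn_prefix on a device whose dev.parent points at a device with a driver name yields that name in the prefix; calling it on a parentless device yields an empty driver field instead of a fault.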
Looks like upstream commit 08c801f8d45387a1b46066aad1789a9bb9c4b645 inadvertently fixed this when RPS was originally introduced. I'll have a patch shortly.
Created attachment 475004 [details] fix for rps oops
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available on kernel-2.6.32-112.el6
This patch went into 104 and was released in 112. A follow-up patch was submitted and got into 115 (625487).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html