Description of problem:
After running the 2.6.32-71.18.2 kernel for 42+ hours on a squid server, we observed a kernel oops related to networking.

Version-Release number of selected component (if applicable):
2.6.32-71.18.2 on x86_64. We rebuilt this kernel from the srpm and run it on a RHEL5 installation.

How reproducible:
Not sure how to reproduce. We just ran a squid workload on the server, and after 42+ hours we observed this oops in the log. We did not find a similar oops on other machines.

Steps to Reproduce:
1. Run a squid workload
2. Wait around 42+ hours
3. Observed only once

Actual results:
So far the system seems to keep working.

Expected results:
The kernel oops should not occur.

Additional info:
Attaching the oops info; please note the kernel is running on RHEL5.

[152286.466138] ------------[ cut here ]------------
[152286.472442] WARNING: at net/ipv4/tcp_input.c:2919 tcp_ack+0xc9b/0x168d() (Not tainted)
[152286.482689] Hardware name: CS24-TY
[152286.488837] Modules linked in: ext2 bonding ipv6 dm_mirror dm_multipath video output sbs sbshc power_meter hwmon acpi_pad parport sg igb serio_raw i7core_edac iTCO_wdt dcdbas iTCO_vendor_support ahci i2c_i801 edac_core ioatdma dca dm_region_hash dm_log dm_mod megaraid_sas shpchp mptsas mptscsih mptbase scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
[152286.531076] Pid: 0, comm: swapper Not tainted 2.6.32-71.18.2.el5.x86_64 #1
[152286.541169] Call Trace:
[152286.544388] <IRQ> [<ffffffff81405e32>] ? tcp_ack+0xc9b/0x168d
[152286.552189] [<ffffffff8105b94b>] warn_slowpath_common+0x8d/0xa6
[152286.560651] [<ffffffff8105b97e>] warn_slowpath_null+0x1a/0x1c
[152286.568640] [<ffffffff81405e32>] tcp_ack+0xc9b/0x168d
[152286.575413] [<ffffffff81407aa0>] tcp_rcv_established+0xcd/0x566
[152286.583163] [<ffffffff8140df6b>] tcp_v4_do_rcv+0x196/0x352
[152286.591026] [<ffffffff8106260a>] ? local_bh_enable+0x12/0x14
[152286.599189] [<ffffffff8140f628>] tcp_v4_rcv+0x423/0x631
[152286.606234] [<ffffffff813f3831>] ? ip_local_deliver_finish+0x152/0x1fa
[152286.614716] [<ffffffff813f3831>] ip_local_deliver_finish+0x152/0x1fa
[152286.623599] [<ffffffff813f3c2c>] ip_local_deliver+0x72/0x7d
[152286.630793] [<ffffffff813f365d>] ip_rcv_finish+0x371/0x38b
[152286.638677] [<ffffffff8140ce7f>] ? tcp4_gro_receive+0x9b/0xa4
[152286.646744] [<ffffffff813f3b7b>] ip_rcv+0x2a2/0x2e1
[152286.653612] [<ffffffff813cb9c8>] netif_receive_skb+0x448/0x47f
[152286.661247] [<ffffffff813cba9b>] napi_skb_finish+0x2b/0x43
[152286.668123] [<ffffffff813cbf14>] napi_gro_receive+0x2f/0x34
[152286.675784] [<ffffffffa00ea52d>] igb_poll+0x83f/0xba3 [igb]
[152286.681994] [<ffffffff810181f0>] ? read_tsc+0xd/0x25
[152286.687392] [<ffffffff81082c9e>] ? timekeeping_get_ns+0x1b/0x3d
[152286.693952] [<ffffffff8104b406>] ? __enqueue_entity+0x79/0x7b
[152286.700285] [<ffffffff810624cd>] ? local_bh_enable_ip+0xe/0x10
[152286.706609] [<ffffffff813ced7c>] net_rx_action+0xc6/0x1c3
[152286.712561] [<ffffffff81062a2d>] __do_softirq+0xd2/0x194
[152286.718323] [<ffffffff810b646c>] ? handle_IRQ_event+0x66/0x120
[152286.724614] [<ffffffff81012e8c>] call_softirq+0x1c/0x30
[152286.730296] [<ffffffff81014757>] do_softirq+0x46/0x87
[152286.735789] [<ffffffff810628b5>] irq_exit+0x3b/0x7a
[152286.741110] [<ffffffff814668e1>] do_IRQ+0x99/0xb0
[152286.746259] [<ffffffff81012693>] ret_from_intr+0x0/0x11
[152286.751927] <EOI> [<ffffffff812ba90c>] ? acpi_idle_enter_bm+0x232/0x267
[152286.759123] [<ffffffff812ba905>] ? acpi_idle_enter_bm+0x22b/0x267
[152286.765697] [<ffffffff813a1b75>] ? menu_select+0x15a/0x228
[152286.771655] [<ffffffff813a0d55>] cpuidle_idle_call+0x87/0xe2
[152286.777799] [<ffffffff81010c68>] cpu_idle+0xa5/0xd4
[152286.783111] [<ffffffff8145c26f>] ? start_secondary+0x1ea/0x237
[152286.789402] [<ffffffff8145c27d>] start_secondary+0x1f8/0x237
[152286.795536] ---[ end trace 23becf1ca0e310bc ]---

Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Is this happening on the latest kernel? This seems like a problem we've seen and corrected already. I'll try to find the exact bug we fixed.
We built the 6.1 kernel from the srpm and have run it on several machines. So far we have had no similar report for more than a month.
Copy that, I'll close this then, and post the patch that fixed it here when I find it.