Description of problem:
The kernel panicked with error output that looked like it should be reported.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.0.3.EL

How reproducible:
Haven't tried; hopefully won't have to.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
no panic

Additional info:
Log output will be attached shortly.
Created attachment 111553 [details] panic log
Please do let us know if this is actually reproducible, should it happen again. Do you have any idea what was going on that might have caused this?
The machine in question, since its deployment, has been running with nothing but an apache2 in worker mode serving lots of 301 redirects and RDF files. This is the front end of our application update service for Firefox. At the point when it panicked it was pushing about 16 Mbit of traffic, but has successfully handled 30 Mbit at one point since rebooting, and has held up for longer now than the period between initial deployment and the first panic. We did switch it to using the deadline scheduler when we rebooted it, because of our previous experience with the only other RHEL4 box we have in production, which is experiencing either bug 131251 or bug 149635.
Created attachment 113509 [details]
Fix for bogus SKB tcp_pcount when TSO disabled.

This patch from John Heffner, backported from 2.6.12-ish, should fix the problem. Once we disable TSO on a connection, we should never set a pcount other than one for an SKB.
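The invariant the patch enforces can be illustrated with a small userspace model. This is only a sketch, not the attached patch: the function names and the `tso_enabled` flag are hypothetical, and it assumes the unpatched code derived the packet count from the SKB length and the MSS alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are not kernel symbols. */

/* Number of MSS-sized packets needed to carry len bytes. */
unsigned int segs_for_len(unsigned int len, unsigned int mss)
{
    assert(mss > 0);
    return (len + mss - 1) / mss;   /* round up */
}

/* Assumed pre-fix behaviour: the count depends only on length and MSS,
 * so an SKB larger than one MSS gets a count above one even if TSO
 * has been turned off for the connection. */
unsigned int pcount_unpatched(unsigned int len, unsigned int mss)
{
    return len <= mss ? 1 : segs_for_len(len, mss);
}

/* Behaviour described in the comment above: with TSO disabled the
 * count is always exactly one. */
unsigned int pcount_patched(unsigned int len, unsigned int mss,
                            bool tso_enabled)
{
    if (!tso_enabled || len <= mss)
        return 1;
    return segs_for_len(len, mss);
}
```

With an example MSS of 1448 and a 2000-byte SKB, the unpatched model yields a count of 2 regardless of TSO state, while the patched model yields 1 once TSO is off.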
I was able to reproduce this problem on one of our servers several times today, as the two dumps below show. However, I do not know under exactly which circumstances it occurs. The system had been running smoothly (with the exception of Bug #152326) for about 60 days, then panicked at 02:39 this morning and several more times after that.

Applying the patch from Comment 4 to <http://people.redhat.com/davej/kernels/RHEL4/SRPMS.kernel/kernel-2.6.9-11.EL.src.rpm>, which fixes #152326, seems to result in a stable box again. This fix should therefore be included in RHEL4 U1.

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at tcp_output:924
invalid operand: 0000 [1]
CPU 0
Modules linked in: netconsole netdump iptable_filter ip_tables md5 ipv6 i2c_dev i2c_core nfs lockd sunrpc button battery ac ohci_hcd tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod sata_sil libata sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-5.0.5.EL
RIP: 0010:[<ffffffff80312420>] <ffffffff80312420>{tcp_retransmit_skb+645}
RSP: 0018:ffffffff8048b4f8 EFLAGS: 00010202
RAX: 000001000d19a6c0 RBX: 000001003c33ce00 RCX: 000001000d19a6c0
RDX: 000001000d0bc700 RSI: 00000000000003b4 RDI: 0000000000000010
RBP: 0000010039799d40 R08: ffffffff8048b598 R09: 0000000000000246
R10: 0000000000000246 R11: ffffffff80521f18 R12: 00000100113a8c50
R13: 00000000000005a8 R14: 00000100113a8c50 R15: 00000100113a8840
FS: 0000002a9590de40(0000) GS:ffffffff8051cd80(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000552ad780f8 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80520000, task ffffffff8040a980)
Stack: 0000000000002706 000001f400108812 00000100113a8c50 0000000000000001
 00000100113a8840 0000000000000008 00000100113a8c50 00000100113a8c50
 0000000000000000 ffffffff803149a6
Call Trace:<IRQ> <ffffffff803149a6>{tcp_write_timer+1089}
 <ffffffff8014179d>{run_timer_softirq+591}
 <ffffffff8013d91c>{__do_softirq+76}
 <ffffffff8013d9a3>{do_softirq+49}
 <ffffffff8011378b>{do_IRQ+664}
 <ffffffff80110d4b>{ret_from_intr+0}
 <EOI> <ffffffff8033fd7f>{unix_poll+0}
 <ffffffff8010e647>{default_idle+0}
 <ffffffff8010e667>{default_idle+32}
 <ffffffff8010e6d7>{cpu_idle+26}
 <ffffffff805236f3>{start_kernel+632}
 <ffffffff805231ab>{_sinittext+427}
Code: 0f 0b 59 08 38 80 ff ff ff ff 9c 03 48 8b 43 10 4c 63 f6 ff
RIP <ffffffff80312420>{tcp_retransmit_skb+645} RSP <ffffffff8048b4f8>

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at tcp_output:924
invalid operand: 0000 [1]
CPU 0
Modules linked in: netconsole netdump i2c_dev i2c_core nfs lockd sunrpc iptable_filter ip_tables button battery ac ohci_hcd tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod sata_sil libata sd_mod scsi_mod
Pid: 31800, comm: cc1 Not tainted 2.6.9-5.0.5.EL
RIP: 0010:[<ffffffff80312420>] <ffffffff80312420>{tcp_retransmit_skb+645}
RSP: 0000:ffffffff8048b4f8 EFLAGS: 00010202
RAX: 000001003e28c700 RBX: 000001003dc19400 RCX: 000001003e28c700
RDX: 0000010027555700 RSI: 00000000000003b4 RDI: 0000000000000010
RBP: 0000010033aa1cc0 R08: ffffffff8048b598 R09: 0000000000000002
R10: 0000000000000002 R11: 000001000ae33f58 R12: 0000010038704bd0
R13: 00000000000005a8 R14: 0000010038704bd0 R15: 00000100387047c0
FS: 0000002a9589eb00(0000) GS:ffffffff8051cd80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000002a96335000 CR3: 0000000000101000 CR4: 00000000000006e0
Process cc1 (pid: 31800, threadinfo 000001000ae32000, task 00000100089b3310)
Stack: 000001003ede4524 000001f40010e012 0000010038704bd0 0000000000000001
 00000100387047c0 0000000000000008 0000010038704bd0 0000010038704bd0
 0000000000000321 ffffffff803149a6
Call Trace:<IRQ> <ffffffff803149a6>{tcp_write_timer+1089}
 <ffffffff8014179d>{run_timer_softirq+591}
 <ffffffff8013d91c>{__do_softirq+76}
 <ffffffff8013d9a3>{do_softirq+49}
 <ffffffff8011378b>{do_IRQ+664}
 <ffffffff80110d4b>{ret_from_intr+0}
 <EOI>
Code: 0f 0b 59 08 38 80 ff ff ff ff 9c 03 48 8b 43 10 4c 63 f6 ff
RIP <ffffffff80312420>{tcp_retransmit_skb+645} RSP <ffffffff8048b4f8>
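As a cross-check on the two dumps, the leading bytes of the "Code:" line can be decoded by hand. The sketch below assumes the x86_64 BUG() encoding of this kernel generation: a two-byte ud2 opcode (0f 0b), followed by an 8-byte little-endian pointer to the file-name string and a 2-byte little-endian line number. The struct and function names are hypothetical helpers, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the fields assumed to follow a ud2 opcode. */
struct bug_info {
    uint64_t file_ptr;  /* address of the file-name string */
    uint16_t line;      /* source line number of the BUG() */
};

/* Decode the first 12 bytes of an oops "Code:" line.
 * Returns 0 on success, -1 if the buffer is too short or does not
 * start with ud2 (0f 0b). The layout is an assumption, see above. */
int decode_bug_bytes(const uint8_t *code, size_t len, struct bug_info *out)
{
    size_t i;

    assert(code != NULL && out != NULL);
    if (len < 12 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;

    /* Bytes 2..9: little-endian 64-bit pointer. */
    out->file_ptr = 0;
    for (i = 0; i < 8; i++)
        out->file_ptr |= (uint64_t)code[2 + i] << (8 * i);

    /* Bytes 10..11: little-endian 16-bit line number. */
    out->line = (uint16_t)(code[10] | (code[11] << 8));
    return 0;
}
```

Under that assumption, the bytes `0f 0b 59 08 38 80 ff ff ff ff 9c 03` decode to line 0x039c, which is 924 and matches the "Kernel BUG at tcp_output:924" header in both dumps.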
Comment 5 confirmed. The box has been stable for the last 30 hours.