Bug 150044 - Kernel panic: Kernel BUG at tcp_output:924
Summary: Kernel panic: Kernel BUG at tcp_output:924
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: David Miller
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-03-01 21:56 UTC by Dave Miller
Modified: 2007-11-30 22:07 UTC
CC List: 5 users

Fixed In Version: RHEL4U2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-04 15:27:13 UTC
Target Upstream Version:
Embargoed:


Attachments
panic log (4.83 KB, text/plain)
2005-03-01 21:57 UTC, Dave Miller
Fix for bogus SKB tcp_pcount when TSO disabled. (4.03 KB, patch)
2005-04-22 03:45 UTC, David Miller

Description Dave Miller 2005-03-01 21:56:11 UTC
Description of problem:
The kernel panicked with error output that sounded like it should be
reported.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.0.3.EL

How reproducible:
Haven't tried, hopefully won't have to.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:
no panic

Additional info:
log output will be attached shortly

Comment 1 Dave Miller 2005-03-01 21:57:10 UTC
Created attachment 111553
panic log

Comment 2 Suzanne Hillman 2005-03-02 18:52:37 UTC
Please do let us know if this is actually reproducible, should it happen again.

Do you have any idea what was going on that might have caused this?

Comment 3 Dave Miller 2005-03-02 20:00:19 UTC
The machine in question, since its deployment, has been running with nothing but
an apache2 in worker mode serving lots of 301 redirects and RDF files.

This is the front end of our application update service for Firefox.  At the
point when it panicked it was pushing about 16 Mbit of traffic, but has
successfully handled 30 Mbit at one point since rebooting, and has held up for
longer now than the period between initial deployment and the first panic.  We
did switch it to using the deadline scheduler when we rebooted it, because of
our previous experience with the only other RHEL4 box we have in production,
which is experiencing either bug 131251 or bug 149635.

Comment 4 David Miller 2005-04-22 03:45:52 UTC
Created attachment 113509
Fix for bogus SKB tcp_pcount when TSO disabled.

This patch, written by John Heffner and backported from 2.6.12-ish,
should fix the problem.

Once we disable TSO on a connection, we should never set
a non-one pcount for an SKB.
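
The invariant above can be sketched in user space. This is only a minimal
model, not the attached patch: the function name model_pcount and its
parameters are hypothetical, and the real kernel code works on SKB and socket
state rather than plain integers. It shows the rule that the packet count may
exceed one only while TSO is still enabled for the connection.

```c
#include <stdio.h>

/*
 * Hypothetical user-space model of the packet-count rule.
 * len         : payload bytes carried by the buffer
 * mss         : current maximum segment size
 * tso_enabled : nonzero while TSO is still enabled on the connection
 */
static unsigned int model_pcount(unsigned int len, unsigned int mss,
                                 int tso_enabled)
{
    /* TSO disabled, or payload fits in one segment: count is always 1. */
    if (!tso_enabled || mss == 0 || len <= mss)
        return 1;

    /* TSO enabled: number of MSS-sized segments, rounded up. */
    return (len + mss - 1) / mss;
}
```

With these assumptions, a 4344-byte buffer and an MSS of 1448 gives a count of
3 while TSO is enabled, and a count of 1 once TSO has been disabled.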

Comment 5 Andreas Thienemann 2005-06-07 21:04:50 UTC
I was able to reproduce this problem on one of our servers several times today,
as can be seen in the two dumps below.
However, I do not know under exactly which circumstances it occurs. The system
had been running smoothly (with the exception of Bug #152326) for about 60 days,
only to panic at 02:39am this morning and several more times after that.

Applying the patch from Comment 4 to
<http://people.redhat.com/davej/kernels/RHEL4/SRPMS.kernel/kernel-2.6.9-11.EL.src.rpm>
which fixes #152326 seems to result in a stable box again.

Thus this fix should be included in RHEL4 U1.

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at tcp_output:924
invalid operand: 0000 [1]
CPU 0
Modules linked in: netconsole netdump iptable_filter ip_tables md5 ipv6 i2c_dev
i2c_core nfs lockd sunrpc button battery ac ohci_hcd tg3 floppy dm_snapshot
dm_zero dm_mirror ext3 jbd raid1 dm_mod sata_sil libata sd_mod scsi_mod
Pid: 0, comm: swapper Not tainted 2.6.9-5.0.5.EL
RIP: 0010:[<ffffffff80312420>] <ffffffff80312420>{tcp_retransmit_skb+645}
RSP: 0018:ffffffff8048b4f8  EFLAGS: 00010202
RAX: 000001000d19a6c0 RBX: 000001003c33ce00 RCX: 000001000d19a6c0
RDX: 000001000d0bc700 RSI: 00000000000003b4 RDI: 0000000000000010
RBP: 0000010039799d40 R08: ffffffff8048b598 R09: 0000000000000246
R10: 0000000000000246 R11: ffffffff80521f18 R12: 00000100113a8c50
R13: 00000000000005a8 R14: 00000100113a8c50 R15: 00000100113a8840
FS:  0000002a9590de40(0000) GS:ffffffff8051cd80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000552ad780f8 CR3: 0000000000101000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80520000, task ffffffff8040a980)
Stack: 0000000000002706 000001f400108812 00000100113a8c50 0000000000000001
       00000100113a8840 0000000000000008 00000100113a8c50 00000100113a8c50
       0000000000000000 ffffffff803149a6
Call Trace:<IRQ> <ffffffff803149a6>{tcp_write_timer+1089}
<ffffffff8014179d>{run_timer_softirq+591}
       <ffffffff8013d91c>{__do_softirq+76} <ffffffff8013d9a3>{do_softirq+49}
       <ffffffff8011378b>{do_IRQ+664} <ffffffff80110d4b>{ret_from_intr+0}
        <EOI> <ffffffff8033fd7f>{unix_poll+0} <ffffffff8010e647>{default_idle+0}
       <ffffffff8010e667>{default_idle+32} <ffffffff8010e6d7>{cpu_idle+26}
       <ffffffff805236f3>{start_kernel+632} <ffffffff805231ab>{_sinittext+427}


Code: 0f 0b 59 08 38 80 ff ff ff ff 9c 03 48 8b 43 10 4c 63 f6 ff
RIP <ffffffff80312420>{tcp_retransmit_skb+645} RSP <ffffffff8048b4f8>
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at tcp_output:924
invalid operand: 0000 [1]
CPU 0
Modules linked in: netconsole netdump i2c_dev i2c_core nfs lockd sunrpc
iptable_filter ip_tables button battery ac ohci_hcd tg3 floppy dm_snapshot
dm_zero dm_mirror ext3 jbd raid1 dm_mod sata_sil libata sd_mod scsi_mod
Pid: 31800, comm: cc1 Not tainted 2.6.9-5.0.5.EL
RIP: 0010:[<ffffffff80312420>] <ffffffff80312420>{tcp_retransmit_skb+645}
RSP: 0000:ffffffff8048b4f8  EFLAGS: 00010202
RAX: 000001003e28c700 RBX: 000001003dc19400 RCX: 000001003e28c700
RDX: 0000010027555700 RSI: 00000000000003b4 RDI: 0000000000000010
RBP: 0000010033aa1cc0 R08: ffffffff8048b598 R09: 0000000000000002
R10: 0000000000000002 R11: 000001000ae33f58 R12: 0000010038704bd0
R13: 00000000000005a8 R14: 0000010038704bd0 R15: 00000100387047c0
FS:  0000002a9589eb00(0000) GS:ffffffff8051cd80(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000002a96335000 CR3: 0000000000101000 CR4: 00000000000006e0
Process cc1 (pid: 31800, threadinfo 000001000ae32000, task 00000100089b3310)
Stack: 000001003ede4524 000001f40010e012 0000010038704bd0 0000000000000001
       00000100387047c0 0000000000000008 0000010038704bd0 0000010038704bd0
       0000000000000321 ffffffff803149a6
Call Trace:<IRQ> <ffffffff803149a6>{tcp_write_timer+1089}
<ffffffff8014179d>{run_timer_softirq+591}
       <ffffffff8013d91c>{__do_softirq+76} <ffffffff8013d9a3>{do_softirq+49}
       <ffffffff8011378b>{do_IRQ+664} <ffffffff80110d4b>{ret_from_intr+0}
        <EOI>

Code: 0f 0b 59 08 38 80 ff ff ff ff 9c 03 48 8b 43 10 4c 63 f6 ff
RIP <ffffffff80312420>{tcp_retransmit_skb+645} RSP <ffffffff8048b4f8>
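
As a cross-check on the dumps above, the "Code:" bytes can be decoded by hand.
The sketch below assumes the packed trap record that BUG() emits on x86_64
kernels of this era: a two-byte ud2 opcode (0f 0b), an 8-byte little-endian
pointer to the file name, then a 2-byte little-endian line number. The helper
and struct names are hypothetical. Under that assumption, the bytes 9c 03
decode to 0x039c, which is line 924 and matches the "tcp_output:924" message.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded fields of a BUG() trap record (hypothetical helper type). */
struct bug_record {
    uint64_t filename_ptr; /* kernel address of the file-name string */
    uint16_t line;         /* source line number */
};

/*
 * Decode the bytes that start at the faulting instruction.
 * Returns 0 on success, -1 if fewer than 12 bytes are given or the
 * bytes do not begin with the ud2 opcode (0f 0b).
 */
static int decode_bug_record(const uint8_t *code, size_t n,
                             struct bug_record *out)
{
    if (n < 12 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;

    /* Bytes 2..9: little-endian 64-bit pointer to the file name. */
    out->filename_ptr = 0;
    for (int i = 7; i >= 0; i--)
        out->filename_ptr = (out->filename_ptr << 8) | code[2 + i];

    /* Bytes 10..11: little-endian 16-bit line number. */
    out->line = (uint16_t)(code[10] | (code[11] << 8));
    return 0;
}
```

Feeding it the first twelve bytes of either dump (0f 0b 59 08 38 80 ff ff ff ff
9c 03) yields line 924 under the layout assumed here.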




Comment 8 Andreas Thienemann 2005-06-09 00:47:36 UTC
Comment 5 confirmed.

The box has been stable for the last 30 hours.

