Red Hat Bugzilla – Bug 173146
Panic in serial.c
Last modified: 2007-11-30 17:07:08 EST
Description of problem:
A customer is experiencing intermittent panics in serial.c on heavily loaded
ppp_async smbfs loop nfs lockd sunrpc dgrp
ppp_generic slhc st aic79xx netconsole bcm5700 audit floppy sg microcode
nls_iso8859-1 jfs keybdev mousedev hid inp
EIP: 0060:[<021b209f>] Not tainted
EIP is at tty_wakeup [kernel] 0xf (2.4.21-23.EL.3.ttyhugemem/i686)
eax: 00000000 ebx: 00000000 ecx: c3088980 edx: 021c7d90
esi: a34d7dfc edi: 00000180 ebp: 00000001 esp: a34d7de8
ds: 0068 es: 0068 ss: 0068
Process lsof (pid: 31373, stackpage=a34d7000)
000002a2 a34d7dfc a34d7dfc 021304aa 00000000 c3088a00 c3088a00 00000001
00000000 021c7d8d 023a928c 021303c4 0247b4bc 02130262 00000003 0244e400
00000009 00000003 0000000a 0212fff5 0244e400 00000246 a34d7e40 627c6400
Call Trace: [<021304aa>] __run_task_queue [kernel] 0x6a (0xa34d7df4)
[<021c7d8d>] do_serial_bh [kernel] 0x1d (0xa34d7e0c)
[<021303c4>] bh_action [kernel] 0x54 (0xa34d7e14)
[<02130262>] tasklet_hi_action [kernel] 0x62 (0xa34d7e1c)
[<0212fff5>] do_softirq [kernel] 0x105 (0xa34d7e34)
[<02269417>] .text.lock.tcp_ipv4 [kernel] 0x1dd (0xa34d7e54)
[<02195c46>] proc_file_read [kernel] 0x1a6 (0xa34d7f54)
[<02164eb3>] sys_read [kernel] 0xa3 (0xa34d7f94)
Code: Bad EIP value.
CPU#0 is frozen.
CPU#1 is frozen.
CPU#2 is frozen.
CPU#3 is executing netdump.
< netdump activated - performing handshake with the client. >
The trace has a "do_serial_bh" call in it, which only the
built-in comport driver (serial.c) calls. From a quick browse
of serial.c in "drivers/char/serial.c":
* This routine is used to handle the "bottom half" processing for the
* serial driver, known also the "software interrupt" processing.
* This processing is done at the kernel interrupt level, after the
* rs_interrupt() has returned, BUT WITH INTERRUPTS TURNED ON. This
* is where time-consuming activities which can not be done in the
* interrupt driver proper are done; the interrupt driver schedules
* them using rs_sched_event(), and they get done here.
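The deferral pattern that comment describes can be modeled in userspace roughly as follows. This is only a sketch: `sched_event`, `run_task_queue`, `count_wakeup` and the fixed-size queue are hypothetical stand-ins, not the kernel's rs_sched_event(), __run_task_queue() or tq_serial. The point is that the handler only records work, and the callback runs later with whatever pointer was stored at queue time.

```c
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MAX_TASKS 8

typedef void (*task_fn)(void *data);

struct task {
    task_fn fn;     /* callback to run later */
    void *data;     /* argument captured when the work was queued */
};

static struct task queue[MAX_TASKS];
static int queued;

/* "Top half": called from the simulated interrupt handler; only records work. */
static int sched_event(task_fn fn, void *data)
{
    if (queued == MAX_TASKS)
        return -1;              /* queue full, drop the event */
    queue[queued].fn = fn;
    queue[queued].data = data;
    queued++;
    return 0;
}

/* "Bottom half": runs later and drains everything that was queued. */
static int run_task_queue(void)
{
    int ran = 0;

    for (int i = 0; i < queued; i++) {
        queue[i].fn(queue[i].data);
        ran++;
    }
    queued = 0;
    return ran;
}

/* Example callback: counts how many times it was invoked. */
static void count_wakeup(void *data)
{
    (*(int *)data)++;
}
```

Because the callback runs some time after it was queued, anything that changes in between (for example the port being closed) is seen by the callback, not by the code that queued it.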
From the fact that "tty_wakeup" is the culprit in the stack trace,
it's very likely that the serial.c driver queued up a tty_wakeup()
task in tq_serial by calling rs_sched_event(). It's that tty_wakeup
call that's dereferencing a null tty pointer, which results in the
Attached is a patch that implements a check to verify that the tty
struct is valid at the beginning of the tty_wakeup function.
Created attachment 121027 [details]
Patch to prevent tty NULL in tty_wakeup
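The attachment itself is not reproduced in this report, so the following is not the actual patch. It is a minimal userspace sketch of the kind of guard described above, assuming a hypothetical `struct tty_model` and a hypothetical `tty_wakeup_checked()` in place of the kernel's tty_struct and tty_wakeup().

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's tty_struct; only what the sketch needs. */
struct tty_model {
    int wakeups;                               /* number of wakeups delivered */
    void (*write_wakeup)(struct tty_model *);  /* optional line-discipline hook */
};

/* Returns 0 if a wakeup was delivered, -1 if the tty was already gone. */
static int tty_wakeup_checked(struct tty_model *tty)
{
    if (tty == NULL)
        return -1;      /* deferred call outlived the tty: skip instead of oopsing */
    if (tty->write_wakeup != NULL)
        tty->write_wakeup(tty);
    tty->wakeups++;
    return 0;
}
```

A guard like this only avoids the NULL dereference. It does not close any open/close race that allowed a wakeup to be queued against a tty that had gone away.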
This problem was fixed in RHEL3 U5. Please upgrade to U6 (2.4.21-37.EL).
*** This bug has been marked as a duplicate of 131674 ***
Please look again at U6. This is a new patch to fix an issue related to 131674,
but this is not a dupe.
This patch was generated off the U6 kernel tree. :)
Hi, Tom. The reason that I thought that this might be a dup is that the
tty changes in U5 should prevent this problem from occurring. Before
investing any time on this, I think we should have confirmation that
this problem exists on U5 (or U6). Please verify this (and provide
the oops output on a more recent kernel).
Thanks in advance.
Hi, Tom. This is getting to be a difficult issue. Basically, the customer
is running an unsupported kernel. We can't verify for certain that the tty
fixes committed to U5 are exactly what they're running. (There were multiple
versions of the very large and complex tty patch.) Further, at least one
other tty change went into U5 that could be related (dealing with races in
forking and controlling tty assignment). Lastly, I don't feel that the
check in tty_wakeup() in comment #1 is appropriate, since if there's an
open/close race in drivers/char/serial.c, the problem should be fixed in
Thus, to make progress on resolving this issue, I think we need to have the
problem reproduced on stock U6.
Reassigning to Don and reverting to NEEDINFO (requesting a U6-based oops or
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
For more information on the RHEL errata support policy, please visit:
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.