Red Hat Bugzilla – Bug 145023
serial driver hangs system
Last modified: 2012-06-20 11:59:47 EDT
Description of problem:
If the serial port is used with "low_latency" set, the system can
lock up with a second function trying to get the same spinlock that's
already held. The system will lock up or not depending on what other
flags are set... if the driver is in "real_raw" mode, it is OK.
This problem occurs using the 8250 serial port driver and the n_tty
line discipline. The calls stack up like this when an 8250 interrupt
occurs because characters are coming into the uart:
serial8250_interrupt(); (in 8250.c)
serial8250_handle_port(); (in 8250.c)
receive_chars(); (in 8250.c)
tty_flip_buffer_push(); (in tty_io.c)
flush_to_ldisc(); (in tty_io.c)
disc->receive_buf(); (which is n_tty.c's n_tty_receive_buf
tty->driver->flush_chars(); (which is serial_core.c's
With "low_latency" set, the IRQ handler tries to wait for the flip
buffer to get flipped and the data pushed up to the line discipline,
rather than letting it get scheduled for later (see 8250.c's
receive_chars() and tty_io.c's tty_flip_buffer_push()).
However, part of what n_tty does when it's receive_buf() function is
called is to call flush_chars() to force out any characters that
might have gotten echoed when the incoming characters were
processed. Note that this doesn't happen if n_tty thinks we're
in "real_raw" mode (no chance of characters getting echoed or
anything in "real_raw" mode).
In the 2.6.10 kernel, this can still happen, even though there were
some minor changes (there's a difference in 8250.c's receive_chars()
In RHEL3, this problem *could* occur, but wasn't seen, because
the "flush_chars()" function would only try to get the port spinlock
if there were actually characters to flush, while in RHEL4 flush_chars
() *always* tries to get the port spinlock.
We have to use low_latency with our RAC card's emulated UART, because
the data comes in too fast and is lost due to the way the flip
buffers work if we aren't in low_latency mode.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. set up a serial port using low_latency, with echo enabled (or some
other flag to keep n_tty from using "real_raw" mode, such as strip)
2. use the serial port
3. watch the system lock up (smp kernel) or spit out an error message
about a spinlock already being held (up kernel)
system locks up (with smp kernel)
serial port should work normally
Created attachment 110075 [details]
patch to prevent lockup while using low_latency
This patch fixes the lockup when the serial port is in low_latency. It changes
the n_tty.c's receive_buf function so that it can be run from interrupt
context, but sceduling the flush_chars() to run later instead of just calling
it right then and there.
It appears to me that the flip buffers are in need of a spinlock to keep them
from getting flipped at the same time that the driver IRQ handler is filling
them up... but I guess that would be a separate patch.
I meant to type "...BY scheduling the flush_chars()...", not "...BUT
scheduling the flush_chars()...".
Russell King was supposed to be looking at this. Calling receive_buf
from an IRQ is IMHO not the right answer - too much other code
including paths like hangup knows it is not IRQ safe.
You are right, of course.
I didn't much care for that fix, anyway, but I thought it was safe,
and it was the smallest change I could think of. Looking at it
again, though, I see that it isn't safe.
I've got some other ideas... I'll try again when I get a little time
and if Russell hasn't already got something.
Fixed upstream as is the whole flip buffer design so should be fine with RHEL5.
Backporting some of the changes is not pretty though.
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life.
Please See https://access.redhat.com/support/policy/updates/errata/
If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.