Bug 145023 - serial driver hangs system
serial driver hangs system
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel (Show other bugs)
4.0
All Linux
medium Severity medium
: ---
: ---
Assigned To: Aristeu Rozanski
Brian Brock
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2005-01-13 15:27 EST by Stuart Hayes
Modified: 2012-06-20 11:59 EDT (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 11:59:47 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
patch to prevent lockup while using low_latency (4.48 KB, patch)
2005-01-21 16:09 EST, Stuart Hayes
no flags Details | Diff

  None (edit)
Description Stuart Hayes 2005-01-13 15:27:18 EST
Description of problem:
-----------------------

If the serial port is used with "low_latency" set, the system can 
lock up with a second function trying to get the same spinlock that's 
already held.  The system will lock up or not depending on what other 
flags are set... if the driver is in "real_raw" mode, it is OK.

This problem occurs using the 8250 serial port driver and the n_tty 
line discipline.  The calls stack up like this when an 8250 interrupt 
occurs because characters are coming into the uart:

serial8250_interrupt(); (in 8250.c)
  spin_lock(&up->port.lock);
  serial8250_handle_port(); (in 8250.c)
    receive_chars(); (in 8250.c)
      tty_flip_buffer_push(); (in tty_io.c)
        flush_to_ldisc(); (in tty_io.c)
          disc->receive_buf(); (which is n_tty.c's n_tty_receive_buf
();)
            tty->driver->flush_chars(); (which is serial_core.c's 
uart_flush_chars();)
              uart_start();
                spin_lock_irqsave(&port->lock,flags)  *hangs*


With "low_latency" set, the IRQ handler tries to wait for the flip 
buffer to get flipped and the data pushed up to the line discipline, 
rather than letting it get scheduled for later (see 8250.c's 
receive_chars() and tty_io.c's tty_flip_buffer_push()).

However, part of what n_tty does when it's receive_buf() function is 
called is to call flush_chars() to force out any characters that 
might have gotten echoed when the incoming characters were 
processed.  Note that this doesn't happen if n_tty thinks we're 
in "real_raw" mode (no chance of characters getting echoed or 
anything in "real_raw" mode).

In the 2.6.10 kernel, this can still happen, even though there were 
some minor changes (there's a difference in 8250.c's receive_chars() 
function).

In RHEL3, this problem *could* occur, but wasn't seen, because 
the "flush_chars()" function would only try to get the port spinlock 
if there were actually characters to flush, while in RHEL4 flush_chars
() *always* tries to get the port spinlock.

We have to use low_latency with our RAC card's emulated UART, because 
the data comes in too fast and is lost due to the way the flip 
buffers work if we aren't in low_latency mode.



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
kernel-2.6.9-1.906_EL (pre-rc2)

How reproducible:
-----------------
always

Steps to Reproduce:
-------------------
1. set up a serial port using low_latency, with echo enabled (or some 
other flag to keep n_tty from using "real_raw" mode, such as strip)
2. use the serial port
3. watch the system lock up (smp kernel) or spit out an error message 
about a spinlock already being held (up kernel)
      
Actual results:
---------------
system locks up (with smp kernel)


Expected results:
-----------------
serial port should work normally


Additional info:
----------------
Comment 2 Stuart Hayes 2005-01-21 16:09:04 EST
Created attachment 110075 [details]
patch to prevent lockup while using low_latency

This patch fixes the lockup when the serial port is in low_latency.  It changes
the n_tty.c's receive_buf function so that it can be run from interrupt
context, but sceduling the flush_chars() to run later instead of just calling
it right then and there.

It appears to me that the flip buffers are in need of a spinlock to keep them
from getting flipped at the same time that the driver IRQ handler is filling
them up... but I guess that would be a separate patch.
Comment 3 Stuart Hayes 2005-01-21 16:10:36 EST
I meant to type "...BY scheduling the flush_chars()...", not "...BUT 
scheduling the flush_chars()...".
Comment 4 Alan Cox 2005-01-22 12:31:48 EST
Russell King was supposed to be looking at this. Calling receive_buf
from an IRQ is IMHO not the right answer - too much other code
including paths like hangup knows it is not IRQ safe.
Comment 5 Stuart Hayes 2005-01-24 11:26:29 EST
You are right, of course.

I didn't much care for that fix, anyway, but I thought it was safe, 
and it was the smallest change I could think of.  Looking at it 
again, though, I see that it isn't safe.

I've got some other ideas... I'll try again when I get a little time 
and if Russell hasn't already got something.

Thanks!
Comment 6 Alan Cox 2006-09-19 11:34:46 EDT
Fixed upstream as is the whole flip buffer design so should be fine with RHEL5.
Backporting some of the changes is not pretty though.
Comment 7 Jiri Pallich 2012-06-20 11:59:47 EDT
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please See https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.

Note You need to log in before you can comment on or make changes to this bug.