Bug 145023 - serial driver hangs system
Summary: serial driver hangs system
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Aristeu Rozanski
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2005-01-13 20:27 UTC by Stuart Hayes
Modified: 2012-06-20 15:59 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-20 15:59:47 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
patch to prevent lockup while using low_latency (4.48 KB, patch)
2005-01-21 21:09 UTC, Stuart Hayes
no flags Details | Diff

Description Stuart Hayes 2005-01-13 20:27:18 UTC
Description of problem:
-----------------------

If the serial port is used with "low_latency" set, the system can 
lock up with a second function trying to get the same spinlock that's 
already held.  The system will lock up or not depending on what other 
flags are set... if the driver is in "real_raw" mode, it is OK.

This problem occurs using the 8250 serial port driver and the n_tty 
line discipline.  The calls stack up like this when an 8250 interrupt 
occurs because characters are coming into the uart:

serial8250_interrupt(); (in 8250.c)
  spin_lock(&up->port.lock);
  serial8250_handle_port(); (in 8250.c)
    receive_chars(); (in 8250.c)
      tty_flip_buffer_push(); (in tty_io.c)
        flush_to_ldisc(); (in tty_io.c)
          disc->receive_buf(); (which is n_tty.c's n_tty_receive_buf
();)
            tty->driver->flush_chars(); (which is serial_core.c's 
uart_flush_chars();)
              uart_start();
                spin_lock_irqsave(&port->lock,flags)  *hangs*


With "low_latency" set, the IRQ handler tries to wait for the flip 
buffer to get flipped and the data pushed up to the line discipline, 
rather than letting it get scheduled for later (see 8250.c's 
receive_chars() and tty_io.c's tty_flip_buffer_push()).

However, part of what n_tty does when it's receive_buf() function is 
called is to call flush_chars() to force out any characters that 
might have gotten echoed when the incoming characters were 
processed.  Note that this doesn't happen if n_tty thinks we're 
in "real_raw" mode (no chance of characters getting echoed or 
anything in "real_raw" mode).

In the 2.6.10 kernel, this can still happen, even though there were 
some minor changes (there's a difference in 8250.c's receive_chars() 
function).

In RHEL3, this problem *could* occur, but wasn't seen, because 
the "flush_chars()" function would only try to get the port spinlock 
if there were actually characters to flush, while in RHEL4 flush_chars
() *always* tries to get the port spinlock.

We have to use low_latency with our RAC card's emulated UART, because 
the data comes in too fast and is lost due to the way the flip 
buffers work if we aren't in low_latency mode.



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
kernel-2.6.9-1.906_EL (pre-rc2)

How reproducible:
-----------------
always

Steps to Reproduce:
-------------------
1. set up a serial port using low_latency, with echo enabled (or some 
other flag to keep n_tty from using "real_raw" mode, such as strip)
2. use the serial port
3. watch the system lock up (smp kernel) or spit out an error message 
about a spinlock already being held (up kernel)
      
Actual results:
---------------
system locks up (with smp kernel)


Expected results:
-----------------
serial port should work normally


Additional info:
----------------

Comment 2 Stuart Hayes 2005-01-21 21:09:04 UTC
Created attachment 110075 [details]
patch to prevent lockup while using low_latency

This patch fixes the lockup when the serial port is in low_latency.  It changes
the n_tty.c's receive_buf function so that it can be run from interrupt
context, but sceduling the flush_chars() to run later instead of just calling
it right then and there.

It appears to me that the flip buffers are in need of a spinlock to keep them
from getting flipped at the same time that the driver IRQ handler is filling
them up... but I guess that would be a separate patch.

Comment 3 Stuart Hayes 2005-01-21 21:10:36 UTC
I meant to type "...BY scheduling the flush_chars()...", not "...BUT 
scheduling the flush_chars()...".


Comment 4 Alan Cox 2005-01-22 17:31:48 UTC
Russell King was supposed to be looking at this. Calling receive_buf
from an IRQ is IMHO not the right answer - too much other code
including paths like hangup knows it is not IRQ safe.


Comment 5 Stuart Hayes 2005-01-24 16:26:29 UTC
You are right, of course.

I didn't much care for that fix, anyway, but I thought it was safe, 
and it was the smallest change I could think of.  Looking at it 
again, though, I see that it isn't safe.

I've got some other ideas... I'll try again when I get a little time 
and if Russell hasn't already got something.

Thanks!


Comment 6 Alan Cox 2006-09-19 15:34:46 UTC
Fixed upstream as is the whole flip buffer design so should be fine with RHEL5.
Backporting some of the changes is not pretty though.


Comment 7 Jiri Pallich 2012-06-20 15:59:47 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please See https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.


Note You need to log in before you can comment on or make changes to this bug.