Description of problem: ----------------------- If the serial port is used with "low_latency" set, the system can lock up with a second function trying to get the same spinlock that's already held. The system will lock up or not depending on what other flags are set... if the driver is in "real_raw" mode, it is OK. This problem occurs using the 8250 serial port driver and the n_tty line discipline. The calls stack up like this when an 8250 interrupt occurs because characters are coming into the uart: serial8250_interrupt(); (in 8250.c) spin_lock(&up->port.lock); serial8250_handle_port(); (in 8250.c) receive_chars(); (in 8250.c) tty_flip_buffer_push(); (in tty_io.c) flush_to_ldisc(); (in tty_io.c) disc->receive_buf(); (which is n_tty.c's n_tty_receive_buf ();) tty->driver->flush_chars(); (which is serial_core.c's uart_flush_chars();) uart_start(); spin_lock_irqsave(&port->lock,flags) *hangs* With "low_latency" set, the IRQ handler tries to wait for the flip buffer to get flipped and the data pushed up to the line discipline, rather than letting it get scheduled for later (see 8250.c's receive_chars() and tty_io.c's tty_flip_buffer_push()). However, part of what n_tty does when it's receive_buf() function is called is to call flush_chars() to force out any characters that might have gotten echoed when the incoming characters were processed. Note that this doesn't happen if n_tty thinks we're in "real_raw" mode (no chance of characters getting echoed or anything in "real_raw" mode). In the 2.6.10 kernel, this can still happen, even though there were some minor changes (there's a difference in 8250.c's receive_chars() function). In RHEL3, this problem *could* occur, but wasn't seen, because the "flush_chars()" function would only try to get the port spinlock if there were actually characters to flush, while in RHEL4 flush_chars () *always* tries to get the port spinlock. We have to use low_latency with our RAC card's emulated UART, because the data comes in too fast and is lost due to the way the flip buffers work if we aren't in low_latency mode. Version-Release number of selected component (if applicable): ------------------------------------------------------------- kernel-2.6.9-1.906_EL (pre-rc2) How reproducible: ----------------- always Steps to Reproduce: ------------------- 1. set up a serial port using low_latency, with echo enabled (or some other flag to keep n_tty from using "real_raw" mode, such as strip) 2. use the serial port 3. watch the system lock up (smp kernel) or spit out an error message about a spinlock already being held (up kernel) Actual results: --------------- system locks up (with smp kernel) Expected results: ----------------- serial port should work normally Additional info: ----------------
Created attachment 110075 [details] patch to prevent lockup while using low_latency This patch fixes the lockup when the serial port is in low_latency. It changes the n_tty.c's receive_buf function so that it can be run from interrupt context, but sceduling the flush_chars() to run later instead of just calling it right then and there. It appears to me that the flip buffers are in need of a spinlock to keep them from getting flipped at the same time that the driver IRQ handler is filling them up... but I guess that would be a separate patch.
I meant to type "...BY scheduling the flush_chars()...", not "...BUT scheduling the flush_chars()...".
Russell King was supposed to be looking at this. Calling receive_buf from an IRQ is IMHO not the right answer - too much other code including paths like hangup knows it is not IRQ safe.
You are right, of course. I didn't much care for that fix, anyway, but I thought it was safe, and it was the smallest change I could think of. Looking at it again, though, I see that it isn't safe. I've got some other ideas... I'll try again when I get a little time and if Russell hasn't already got something. Thanks!
Fixed upstream as is the whole flip buffer design so should be fine with RHEL5. Backporting some of the changes is not pretty though.
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. Please See https://access.redhat.com/support/policy/updates/errata/ If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.