Digi has discovered a potential race condition in the RHEL 2.1/3 kernel tty layer. They are working with a customer with an extremely large multiport serial configuration, and they think the TTY layer is hitting this race. Digi describes the race as follows: "For example, lets say the tty isn't "real_raw", but is "raw", and the data coming in is say 10 bytes of regular data... We will go into the "else" clause of that real_raw statement, which unlocks the read_lock, then it walks the 10 bytes passing each byte to receive_char. In receive_char, if the tty is "raw", it calls put_tty_queue(). Finally, in put_tty_queue() it locks read_lock, dumps the byte, and unlocks... But there is no check in put_tty_queue (or in the dump of the byte), that the read_buf didn't get set to NULL by the close() routine during the time the read_lock was not held. (ie, the byte for byte walking of those 10 bytes)." We need to deal with this race. :)
Tom, please open a separate bug for RHEL 3. This one will apply only to RHEL 2.1. Thanks. -ernie
Opened BZ 131674 for RHEL 3.
*** Bug 122729 has been marked as a duplicate of this bug. ***
Created attachment 103431 [details] clean up of digi's patch to fix tty race condition revision on digi's patch to clean up missing race points outside of added locks
This patch still has nothing to do with the real bug. You have to fix this at the ldisc level not within n_tty.
(joining in from Bug 122729) So you _do_ think it is a bug in the tty/line discipline layer? This is an area of the kernel where I severely lack experience. Can you point to any resources where this is discussed e.g. in LKML? Is it fixed in EL3.0, or kernel 2.6.x?
There were multiple bugs in the driver code I looked at which would cause crashes and were driver side bugs. There is also a slightly odd corner case in the case of a line discipline change occurring at the exact momemt data is delivered by interrupt to the ldisc handler. In that situation the following can occur ldisc->close is called While running or just after ldisc->receive.. is called The bug appears to be in both 2.4 and 2.6 The concept for a fix I'd propose is that we create a "null" ldisc that way we can fix the race by setting the tty ldisc to the null_ldisc before we called ->close on the previous ldisc. [We can't open the new one first without fairly major changes]
I am reposting my analyis from Bug 122729 here again because it seems to fit very well what you said. I am still wondering why there seems to be no locking code whatsoever in n_tty_close(); but that may be due to it assuming that locking happens in the ldisc layer. Btw our customer also uses heavy serial multiplexing. Here is my old Oops analysis: The crash occurs at line 730 of drivers/char/n_tty.c in the function n_tty_receive_buf(), because the "read_buf" element of the controlling tty structure is NULL (see detailed analysis below). The read_buf element is checked in the same function in line 723, 7 lines earlier. Thus, it must have been NULLed in the meantime. This is possible because there is no locking at least until the spin_lock_irqsave(&tty->read_lock) in l. 727. However, the only function (AFAICS) that sets read_buf to NULL, n_tty_close(), does not seem to do any locking, so even if the spin_lock_irqsave() in n_tty_receive_buf() happened earlier, it would apparently not help.
There are two cases you get this crash. One of them is a driver level bug when a driver calls ->receive in parallel with or after having notified the serial layer of a hangup. That seems to be the one people see in the real world with buggy drivers. The second is the ldisc switch case which will probably only bite people doing a lot of PPP sessions at a time on SMP machines. n_tty can't do any locking because the race is bigger than n_tty itself and requires the locking is higher up. Which would be all well and good if ldisc did that locking. Similar race effects exist in the switch to n_tty from ppp and so on also. At the point you make "->receive is never called after ->close" an invariant for ldisc then the n_tty code and the other ldiscs I've looked at work. This is already meant to be an invariant but the one case got missed.
Arjan probably cares about this for 2.6...
No ppp is involved in our case. It is some specific proprietary protocol talked through the serial line with some food balances. But the protocol is impemented in user space and uses the normal ttySX interface.
Are you setting line discplines - if not then I don't see how this can be the problem you are seeing
I'll inquire, thank you.
Alan, I don't quite understand why you think we wouldn't hit this bug without switching the line discipline. The "driver level" problem you describe could just a swell happen with the default line discipline, couldn't it? Regards, Martin
If drivers make random wrong calls to the tty layer they will crash. This has always been the case with operating systems. The ldisc switch race is a very very specific situation and a very narrow window when changing between tty and ppp for example. If you don't change ldisc you can't hit the race as the racy code isnt ever called.
> If drivers make random wrong calls to the tty layer they will crash. Does that mean, then, that the standard serial driver is buggy? AFAICS, in our case, n_tty_receive_buf() was called by the serial interupt handler.
In your situation is that occurring after a hangup triggered by the serial driver itself. It's possible I guess. Instrumenting for ldisc close before ldisc receive (ie n_tty_receive_buf) would be interesting in that situation.
Ok seem to be at least three bugs kernel side 1. Some drivers call ldisc->receive_buf() directly. That's a nono but is done in the kernel with offenders including pty and rocketport. Could be out of tree offenders 2. Ldisc switching locks. I have a test fix I'm playing with in 2.6 for cases not in #1 3. Possibly raw/nonraw switching locks although I think that may purely be a symptom of #1 but this needs verification. The n_tty layer believes that if it sets TTY_NO_FLIP then it won't get input. Drivers bypassing the flip queues break this assumption. 4. TIOCSTI ioctls break the assumption too The bypassing seems to be appropriate for some drivers so we probably need a function for "try_to_queue_to_ldisc()" or some similar helper for these. Pty is trickier as it wants to queue and sleep if need be. Workarounds: 1. Don't call ldisc.receive_buf directly, if you must call it then only call it if TTY_DONT_FLIP is clear. Preferably switch to using the tty flip buffer logic as that will avoid this one 2. No workaround (RHEL fix may eventually require workaround 1 anyway) 3. Maybe workaround 1.
Alan, this seems to conflict with what Ted Ts'o said back in 2000? http://marc.theaimsgroup.com/?l=linux-kernel&m=94346476622424&w=2 Also, I checked with Digi, and their driver can be operated in two modes: If the DGRP driver global variable called "rawreadok" is on, (which is on by default) it will call ldisc->receive_buf() directly. If that variable is off, the driver will fill up a flip buffer, and schedule it to be delivered later with a call to tty_schedule_flip().
*** Bug 133109 has been marked as a duplicate of this bug. ***
Any news on this topic?
For reference the fix that got committed upstream into 2.4.29: http://linux.bkbits.net:8080/linux-2.4/cset@41c3801dxJnuBSRCpUiMRkwItPHjWA
The fix has been deemed too invasive for a RHEL2.1 update at this time. Closing as WONTFIX.