Description of problem:
When pty_close() is called to close a pseudo tty device, the packet mode of the linked tty is cleared:

http://rhkernel.org/RHEL5+2.6.18-128.1.10.el5/drivers/char/pty.c#L53

This can cause problems with applications (such as telnetd) that are expecting data read from the linked tty to be in packet-mode format.

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.1.10.el5

How reproducible:
100% of the time.

Additional info:
The code to clear packet mode was added in 1996 by Theodore Ts'o:

http://www.linuxhq.com/kernel/v1.99/13/drivers/char/pty.c
http://www.linuxhq.com/kernel/v1.99/13/drivers/char/ChangeLog

and it still exists in the latest upstream kernels:

http://lxr.linux.no/linux+v2.6.29/drivers/char/pty.c#L39

It seems hard for me to believe that something that has been in the kernel for this long could be a bug, but clearing the packet mode of the linked pty just doesn't seem correct to me. Why should closing one pty change the packet mode of the other pty?
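To make the concern concrete, here is a minimal sketch of what a reader of the master side expects in packet mode. In packet mode (enabled with the TIOCPKT ioctl), each read from the master starts with one control byte, and a value of 0 (TIOCPKT_DATA) means the remaining bytes are ordinary data. The helper name split_packet_read is hypothetical and only models the framing; it is not code from telnetd or from the kernel.

```python
# Toy model of packet-mode framing on a pty master (not kernel or telnetd code).
from typing import Tuple

TIOCPKT_DATA = 0x00  # control byte value meaning "the rest is ordinary data"


def split_packet_read(buf: bytes) -> Tuple[int, bytes]:
    """Split one read() result from a packet-mode master into
    (control byte, payload). Hypothetical helper for illustration."""
    if not buf:
        raise ValueError("empty read")
    return buf[0], buf[1:]


# Read while packet mode is still set: the leading control byte is present.
ctrl, data = split_packet_read(b"\x00logout\r\n")
print(ctrl == TIOCPKT_DATA, data)   # True b'logout\r\n'

# Read after packet mode was cleared: there is no control byte, so a reader
# that still assumes packet mode consumes 'l' as if it were the control byte.
ctrl, data = split_packet_read(b"logout\r\n")
print(hex(ctrl), data)              # 0x6c b'ogout\r\n'
```

The second call shows why a silent switch out of packet mode matters to the reader: the first data byte is misread as control flags and is lost from the payload.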
Created attachment 347630 [details] SystemTap script
Created attachment 347631 [details]
SystemTap output when problem does not occur

This is the output from the above SystemTap script when "logout" is printed on the telnet client. Note the following bit near the end of the log:

tty_write: bash (16233) writing to pts2 (packet=0) 7 bytes: logout\n
write_chan: bash (16233) writing to pts2 (packet=0) 7 bytes: logout\n
read_chan: in.telnetd (16231) reading from ptm2 (packet=1) 9 bytes: logout\r\n
tty_read: in.telnetd (16231) reading from ptm2 (packet=1) 9 bytes: logout\r\n
vfs_read: in.telnetd (16231) reading from ptm2 (packet=1) 9 bytes: logout\r\n
pty_close: bash (16233) closing pts2 (packet=0) pts2 is linked to ptm2 (packet=1)
read_chan: in.telnetd (16231) reading from ptm2 (packet=0) Error -5
tty_read: in.telnetd (16231) reading from ptm2 (packet=0) Error -5
vfs_read: in.telnetd (16231) reading from ptm2 (packet=0) Error -5
pty_close: in.telnetd (16231) closing ptm2 (packet=0) ptm2 is linked to pts2 (packet=0)

Bash writes "logout\n" (7 bytes) to pts2, which gets read by in.telnetd from ptm2 as "\0logout\r\n" (9 bytes). The SystemTap script automatically strips the leading '\0' from the string when the tty_struct has packet == 1. After in.telnetd reads the data, bash closes pts2.
Created attachment 347632 [details]
SystemTap output when problem occurs.

This is the output from the above SystemTap script when "logout" is printed on the telnet client. Note the following bit near the end of the log:

tty_write: bash (16521) writing to pts2 (packet=0) 7 bytes: logout\n
write_chan: bash (16521) writing to pts2 (packet=0) 7 bytes: logout\n
pty_close: bash (16521) closing pts2 (packet=0) pts2 is linked to ptm2 (packet=1)
read_chan: in.telnetd (16388) reading from ptm2 (packet=0) 8 bytes: logout\r\n
tty_read: in.telnetd (16388) reading from ptm2 (packet=0) 8 bytes: logout\r\n
vfs_read: in.telnetd (16388) reading from ptm2 (packet=0) 8 bytes: logout\r\n
read_chan: in.telnetd (16388) reading from ptm2 (packet=0) Error -5
tty_read: in.telnetd (16388) reading from ptm2 (packet=0) Error -5
vfs_read: in.telnetd (16388) reading from ptm2 (packet=0) Error -5
pty_close: in.telnetd (16388) closing ptm2 (packet=0) ptm2 is linked to pts2 (packet=0)

This time, bash writes "logout\n" (7 bytes) and then closes pts2, which clears packet mode on ptm2. When in.telnetd reads ptm2 (note that tty->packet == 0), it reads only 8 bytes -- the leading '\0' is missing.
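The difference between the two logs comes down to whether the close of pts2 happens before or after in.telnetd reads ptm2. The toy model below is a sketch under stated assumptions, not kernel code: the class ToyPty and the functions write, read, close and run are all hypothetical names, and the "\n" to "\r\n" translation is modelled as a simple replace because that is what the byte counts in the logs suggest.

```python
# Toy model of the ordering seen in the two SystemTap logs (not kernel code).

class ToyPty:
    """One end of a pty pair; 'packet' mirrors tty->packet in the logs."""

    def __init__(self, name, packet=False):
        self.name = name
        self.packet = packet
        self.link = None      # the other end of the pair
        self.queue = b""      # data waiting to be read from this end


def make_pair():
    master, slave = ToyPty("ptm2", packet=True), ToyPty("pts2")
    master.link, slave.link = slave, master
    return master, slave


def write(end, data):
    # Assumed output processing: "\n" becomes "\r\n" on the way to the peer.
    end.link.queue += data.replace(b"\n", b"\r\n")


def read(end):
    data, end.queue = end.queue, b""
    # In packet mode the data is prefixed with a control byte (0 = data).
    return (b"\x00" + data) if end.packet else data


def close(end):
    # The behaviour questioned in this report: closing one end clears
    # packet mode on the linked end.
    end.link.packet = False


def run(close_before_read):
    master, slave = make_pair()
    write(slave, b"logout\n")
    if close_before_read:
        close(slave)
        return read(master)
    got = read(master)
    close(slave)
    return got


print(len(run(close_before_read=False)))  # 9
print(len(run(close_before_read=True)))   # 8
```

With the read first, the model returns 9 bytes including the leading '\0'; with the close first, it returns 8 bytes and the control byte is gone, matching the two attachments.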
REPRODUCTION STEPS

Here's what I do to reproduce this:

1) On the server, run the following as root:

    while /bin/true; do /usr/sbin/in.telnetd -debug 2323; done

2) On the server, compile and execute the SystemTap script as a normal user who belongs to the stapusr and stapdev groups:

    stap -v -g pty.stp.1 2 no

   The parameter "2" tells the script to only report on ptys with "2" in the name (otherwise, you get lots of information on ptys that you don't care about). This parameter may vary, depending on which pty in.telnetd is using. The parameter "no" tells the script not to print backtrace information for each call ("bt" will cause it to print the bt info).

3) On the client, run the following as a normal user:

    telnet <server> 2323

   Log in using some user account.

3a) Edit ~/.bash_logout to remove or comment out the "/usr/bin/clear" command (this makes it possible to see the logout (or "ougout") message emitted from bash).

3b) Type "exit" and then <ENTER> to log out.

What seems to tickle the problem is the timing between typing "exit" and pressing <ENTER>. There has to be just a slight delay between the "t" and <ENTER>. When I'm "in the groove" I can get it to happen 80-90% of the time.
Created attachment 347654 [details]
Expect script to reproduce problem in telnet

Here's an expect script that should reproduce the problem fairly reliably. The values for HOST, PORT, USER, and PASS will need to be set as appropriate. The send_slow value pair reproduced the problem 100% of the time on my test system, although I expect that it may need to be adjusted for other systems.
Created attachment 348349 [details]
Possible patch

For what it's worth, applying the following patch to kernel-2.6.18-128.el5xen results in 0/10 failures with my test script. Without the patch, the test script generates 9/10 failures.
Touching very old code (in place since 2.0) seems risky, as it could cause random regressions in other applications, especially since, with the proposed patch applied, the link would stay in packet mode after both pty devices are closed. A proper fix might be to add an open count and reset packet mode only after all devices on the pty link have been closed.

As this bug is also present upstream, I sent an email to LKML asking for opinions:

http://lkml.org/lkml/2009/11/11/223

It is best to wait for upstream comments before proceeding with a fix for this bug.
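To illustrate the open-count idea in the abstract, here is a rough sketch. It is only a toy model under my own assumptions: the class ToyPtyPair and its methods are hypothetical, and it says nothing about how such a count would actually be kept or locked in the tty layer.

```python
# Toy model of "clear packet mode only when the last end closes".
# Hypothetical illustration only; not kernel code and not the attached patch.

class ToyPtyPair:
    def __init__(self):
        # Packet flag per end; the master starts in packet mode here.
        self.packet = {"ptm": True, "pts": False}
        self.open_count = 0

    def open(self, end):
        self.open_count += 1

    def close(self, end):
        self.open_count -= 1
        # Reset packet mode only once no end of the link is open any more,
        # so a reader on the other end never sees the mode change under it.
        if self.open_count == 0:
            for name in self.packet:
                self.packet[name] = False


pair = ToyPtyPair()
pair.open("ptm")
pair.open("pts")

pair.close("pts")
print(pair.packet["ptm"])   # True: master is still open, mode is kept

pair.close("ptm")
print(pair.packet["ptm"])   # False: last close resets the link
```

In this model the link does not stay in packet mode after both ends are closed, which is the concern raised about the proposed patch, while the remaining reader keeps a consistent framing until it closes its own end.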
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6 and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.
I think we should close this BZ as "will not fix". It seems too risky to change this behavior on RHEL5 for something that has been there since kernel 2.0. I tried to get some feedback upstream, but it seems that people there are either also afraid of such a change or do not care enough. So, unless it one day causes a serious issue, I suspect it is best not to touch this.
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.