Bug 504703 - pty_close() clears the packet mode of the linked pty
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All Linux
Priority: medium Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Mauro Carvalho Chehab
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 533192 566295
Reported: 2009-06-08 17:29 EDT by Bryan Mason
Modified: 2013-07-04 18:53 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 566295
Environment:
Last Closed: 2011-08-14 07:45:29 EDT


Attachments
SystemTap script (11.45 KB, text/plain)
2009-06-12 13:51 EDT, Bryan Mason
SystemTap output when problem does not occur (19.24 KB, text/plain)
2009-06-12 14:01 EDT, Bryan Mason
SystemTap output when problem occurs. (18.59 KB, text/plain)
2009-06-12 14:06 EDT, Bryan Mason
Expect script to reproduce problem in telnet (1.18 KB, text/plain)
2009-06-12 15:44 EDT, Bryan Mason
Possible patch (427 bytes, patch)
2009-06-17 17:09 EDT, Bryan Mason

Description Bryan Mason 2009-06-08 17:29:37 EDT
Description of problem:

    When pty_close() is called to close a pseudo tty device, the
    packet mode of the linked tty is cleared:

        http://rhkernel.org/RHEL5+2.6.18-128.1.10.el5/drivers/char/pty.c#L53

    This can cause problems with applications (such as telnetd) that
    are expecting data read from the linked tty to be in packet-mode
    format.

Version-Release number of selected component (if applicable):

    kernel-2.6.18-128.1.10.el5

How reproducible:

    100% of the time.

Additional info:

    The code to clear packet mode was added in 1996 by Theodore Ts'o:

        http://www.linuxhq.com/kernel/v1.99/13/drivers/char/pty.c
        http://www.linuxhq.com/kernel/v1.99/13/drivers/char/ChangeLog

    and it still exists in the latest upstream kernels:

        http://lxr.linux.no/linux+v2.6.29/drivers/char/pty.c#L39

    It is hard for me to believe that something that has been in
    the kernel for this long could be a bug, but clearing the packet
    mode of the linked pty just doesn't seem correct to me.  Why
    should closing one pty change the packet mode of the other pty?
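
The behavior described above can be modelled in user space. The following is a minimal sketch, not the kernel code: `struct model_tty`, `model_pty_link()` and `model_pty_close()` are hypothetical names that mimic only a `packet` flag and a `link` pointer, which is all this report depends on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the two tty fields that matter
 * in this report. */
struct model_tty {
    int packet;               /* 1 if packet mode is enabled on this end */
    struct model_tty *link;   /* the other end of the pty pair */
};

/* Link two ends together, as a pty master/slave pair would be. */
void model_pty_link(struct model_tty *master, struct model_tty *slave)
{
    assert(master != NULL && slave != NULL);
    master->link = slave;
    slave->link = master;
}

/* Mimics the reported behavior: closing one end clears packet mode on
 * that end and also on the linked end. */
void model_pty_close(struct model_tty *tty)
{
    assert(tty != NULL);
    tty->packet = 0;
    if (tty->link != NULL)
        tty->link->packet = 0;   /* the side effect this report questions */
}
```

In this model, a master with `packet == 1` linked to a slave loses its packet flag as soon as `model_pty_close()` is called on the slave, even though the master itself is still open.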
Comment 5 Bryan Mason 2009-06-12 13:51:15 EDT
Created attachment 347630
SystemTap script
Comment 6 Bryan Mason 2009-06-12 14:01:54 EDT
Created attachment 347631
SystemTap output when problem does not occur

This is the output from the above SystemTap script when "logout" is printed on the telnet client.  Note the following bit near the end of the log:


      tty_write:         bash (16233) writing to pts2 (packet=0)
                         7 bytes: logout\n
        write_chan:      bash (16233) writing to pts2 (packet=0)
                         7 bytes: logout\n
        read_chan:       in.telnetd (16231) reading from ptm2 (packet=1)
                         9 bytes: logout\r\n
      tty_read:          in.telnetd (16231) reading from ptm2 (packet=1)
                         9 bytes: logout\r\n
    vfs_read:            in.telnetd (16231) reading from ptm2 (packet=1)
                         9 bytes: logout\r\n
    pty_close:           bash (16233) closing pts2 (packet=0)
                         pts2 is linked to ptm2 (packet=1)
        read_chan:       in.telnetd (16231) reading from ptm2 (packet=0)
                         Error -5
      tty_read:          in.telnetd (16231) reading from ptm2 (packet=0)
                         Error -5
    vfs_read:            in.telnetd (16231) reading from ptm2 (packet=0)
                         Error -5
    pty_close:           in.telnetd (16231) closing ptm2 (packet=0)
                         ptm2 is linked to pts2 (packet=0)

Bash writes "logout\n" (7 bytes) to pts2, which in.telnetd reads from ptm2 as "\0logout\r\n" (9 bytes): the newline is expanded to "\r\n" and, because ptm2 is in packet mode, the data is preceded by a '\0' control byte.  The SystemTap script automatically strips the leading '\0' from the string when the tty_struct has packet == 1.  After in.telnetd reads the data, bash closes pts2.
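
The byte counts in this trace can be checked with a small user-space model. This is only a sketch under stated assumptions: `model_master_read()` and `MODEL_PKT_DATA` are hypothetical names, the newline expansion assumes the "\n" to "\r\n" mapping visible in the trace, and the control byte value 0 matches `TIOCPKT_DATA`.

```c
#include <assert.h>
#include <stddef.h>

/* Control byte that precedes ordinary data in packet mode. TIOCPKT_DATA
 * is 0; it is redefined here so the sketch needs no system headers. */
#define MODEL_PKT_DATA 0

/*
 * Build what the master side would read for data written to the slave:
 * each '\n' becomes "\r\n", and a leading control byte is added when
 * packet mode is on. Returns the number of bytes produced, or 0 if
 * the output buffer is too small.
 */
size_t model_master_read(const char *in, size_t in_len, int packet,
                         char *out, size_t out_cap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (packet) {
        if (n >= out_cap)
            return 0;
        out[n++] = MODEL_PKT_DATA;
    }
    for (size_t i = 0; i < in_len; i++) {
        if (in[i] == '\n') {
            if (n >= out_cap)
                return 0;
            out[n++] = '\r';
        }
        if (n >= out_cap)
            return 0;
        out[n++] = in[i];
    }
    return n;
}
```

With the 7-byte input "logout\n", the model yields 9 bytes when `packet` is 1 and 8 bytes when it is 0.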
Comment 7 Bryan Mason 2009-06-12 14:06:37 EDT
Created attachment 347632
SystemTap output when problem occurs.

This is the output from the above SystemTap script when "logout" is printed on
the telnet client.  Note the following bit near the end of the log:

      tty_write:         bash (16521) writing to pts2 (packet=0)
                         7 bytes: logout\n
        write_chan:      bash (16521) writing to pts2 (packet=0)
                         7 bytes: logout\n
    pty_close:           bash (16521) closing pts2 (packet=0)
                         pts2 is linked to ptm2 (packet=1)
        read_chan:       in.telnetd (16388) reading from ptm2 (packet=0)
                         8 bytes: logout\r\n
      tty_read:          in.telnetd (16388) reading from ptm2 (packet=0)
                         8 bytes: logout\r\n
    vfs_read:            in.telnetd (16388) reading from ptm2 (packet=0)
                         8 bytes: logout\r\n
        read_chan:       in.telnetd (16388) reading from ptm2 (packet=0)
                         Error -5
      tty_read:          in.telnetd (16388) reading from ptm2 (packet=0)
                         Error -5
    vfs_read:            in.telnetd (16388) reading from ptm2 (packet=0)
                         Error -5
    pty_close:           in.telnetd (16388) closing ptm2 (packet=0)
                         ptm2 is linked to pts2 (packet=0)

This time, bash writes "logout\n" (7 bytes) and then closes pts2, which clears packet mode on ptm2.  When in.telnetd reads ptm2 (note that tty->packet == 0), it reads only 8 bytes -- the leading '\0' is missing.
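
To illustrate why the missing '\0' matters, here is a sketch of a reader that always assumes packet-mode framing. It is a hypothetical model, not in.telnetd's actual code: `struct model_packet` and `model_parse_packet()` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Result of splitting one read into a control byte and a payload. */
struct model_packet {
    unsigned char control;   /* first byte of the read */
    const char *payload;     /* bytes after the control byte */
    size_t payload_len;
};

/* Hypothetical reader that always treats the first byte as the
 * packet-mode control byte and the rest as data. */
struct model_packet model_parse_packet(const char *buf, size_t len)
{
    struct model_packet p = {0, NULL, 0};

    assert(buf != NULL);
    if (len == 0)
        return p;
    p.control = (unsigned char)buf[0];
    p.payload = buf + 1;
    p.payload_len = len - 1;
    return p;
}
```

Given the 9-byte read from the working case, such a reader sees control byte 0 and the payload "logout\r\n". Given the 8-byte read from the failing case, it takes 'l' as the control byte and is left with "ogout\r\n" as data.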
Comment 8 Bryan Mason 2009-06-12 14:17:54 EDT
REPRODUCTION STEPS

Here's what I do to reproduce this:

1) On the server, run the following as root:

       while /bin/true; do /usr/sbin/in.telnetd -debug 2323; done

2) On the server, compile and execute the SystemTap script as a normal
   user who belongs to the stapusr and stapdev groups:

       stap -v -g pty.stp.1 2 no

   The parameter "2" tells the script to only report on ptys with "2"
   in the name (otherwise, you get lots of information on ptys that
   you don't care about).  This parameter may vary, depending on which
   pty in.telnetd is using.  The parameter "no" tells the script not
   to print backtrace information for each call ("bt" will cause it to
   print the bt info).

3) On the client, run the following as a normal user.

       telnet <server> 2323

   Log in using some user account.  

   3a) Edit ~/.bash_logout to remove or comment out the
       "/usr/bin/clear" command (this makes it possible to see the
       logout (or "ougout") message emitted from bash).

   3b) Type "exit" and then <ENTER> to log out.  What seems to tickle
       the problem is the timing between typing "exit" and pressing
       <ENTER>.  There has to be just a slight delay between the "t"
       and <ENTER>.  When I'm "in the groove" I can get it to happen
       80-90% of the time.
Comment 9 Bryan Mason 2009-06-12 15:44:06 EDT
Created attachment 347654
Expect script to reproduce problem in telnet

Here's an expect script that should reproduce the problem fairly reliably.  The values for HOST, PORT, USER, and PASS will need to be set as appropriate.  The send_slow value pair reproduced the problem 100% of the time on my test system, although I expect that it may need to be adjusted for other systems.
Comment 10 Bryan Mason 2009-06-17 17:09:10 EDT
Created attachment 348349
Possible patch

For what it's worth, applying the attached patch to kernel-2.6.18-128.el5xen results in 0/10 failures with my test script.  Without the patch, the test script generates 9/10 failures.
Comment 11 Mauro Carvalho Chehab 2009-11-11 12:06:16 EST
Touching very old code (in place since 2.0) seems risky, as it can result in random
regressions in other applications, especially since, with the proposed patch applied, the link will stay in packet mode after both pty devices are closed.

Maybe a proper fix would be to add an open count and reset packet mode only after all devices on the pty link have been closed.

As this bug is also upstream, I sent an email to LKML asking for opinions:

http://lkml.org/lkml/2009/11/11/223

It is better to wait for upstream comments before proceeding with a fix for this bug.
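
The open-count idea mentioned in this comment could look roughly like the following user-space model. This is only a sketch of the suggestion, not a kernel patch: `struct model_pair` and `model_pair_close()` are hypothetical names, and the real tty layer has locking and lifetime concerns that are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a pty pair that shares one open count. */
struct model_pair {
    int open_count;      /* number of ends currently open */
    int master_packet;   /* packet mode flag of the master end */
    int slave_packet;    /* packet mode flag of the slave end */
};

/* Close one end of the pair. Packet mode is reset only when the last
 * open end goes away, so an early close of one end leaves the other
 * end's framing untouched. */
void model_pair_close(struct model_pair *pair)
{
    assert(pair != NULL && pair->open_count > 0);
    pair->open_count--;
    if (pair->open_count == 0) {
        pair->master_packet = 0;
        pair->slave_packet = 0;
    }
}
```

In this model the first close leaves `master_packet` set, and only the second close clears it, which also avoids leaving the link in packet mode after both ends are closed.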
Comment 16 RHEL Product and Program Management 2010-12-07 05:00:13 EST
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.
Comment 17 RHEL Product and Program Management 2011-06-20 17:21:09 EDT
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.
Comment 18 Mauro Carvalho Chehab 2011-08-12 13:57:35 EDT
I think we should close this BZ as WONTFIX. It seems too risky to change this behavior on RHEL5, for something that has been there since kernel 2.0. I tried to get some feedback upstream, but it seems that people there are also afraid of such a change, or they do not care enough. So, unless it one day causes a serious issue, I suspect that it is best not to touch it.
Comment 19 RHEL Product and Program Management 2011-08-14 07:45:29 EDT
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.
