Bug 504703
| Summary: | pty_close() clears the packet mode of the linked pty | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Bryan Mason <bmason> |
| Component: | kernel | Assignee: | Mauro Carvalho Chehab <mchehab> |
| Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.3 | CC: | arozansk, lwang, moshiro, prarit, tao |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 566295 | Environment: | |
| Last Closed: | 2011-08-14 11:45:29 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 533192, 566295 | | |
Created attachment 347630
SystemTap script
Created attachment 347631
SystemTap output when problem does not occur
This is the output from the above SystemTap script when "logout" is printed on the telnet client. Note the following bit near the end of the log:
tty_write: bash (16233) writing to pts2 (packet=0)
7 bytes: logout\n
write_chan: bash (16233) writing to pts2 (packet=0)
7 bytes: logout\n
read_chan: in.telnetd (16231) reading from ptm2 (packet=1)
9 bytes: logout\r\n
tty_read: in.telnetd (16231) reading from ptm2 (packet=1)
9 bytes: logout\r\n
vfs_read: in.telnetd (16231) reading from ptm2 (packet=1)
9 bytes: logout\r\n
pty_close: bash (16233) closing pts2 (packet=0)
pts2 is linked to ptm2 (packet=1)
read_chan: in.telnetd (16231) reading from ptm2 (packet=0)
Error -5
tty_read: in.telnetd (16231) reading from ptm2 (packet=0)
Error -5
vfs_read: in.telnetd (16231) reading from ptm2 (packet=0)
Error -5
pty_close: in.telnetd (16231) closing ptm2 (packet=0)
ptm2 is linked to pts2 (packet=0)
Bash writes "logout\n" (7 bytes) to pts2, and in.telnetd reads it from ptm2 as "\0logout\r\n" (9 bytes): the leading '\0' is the packet-mode status byte, and the slave's output processing has turned "\n" into "\r\n". The SystemTap script automatically strips the leading '\0' from the printed string when the tty_struct has packet == 1. Only after in.telnetd has read the data does bash close pts2.
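The packet-mode framing in this log can be reproduced from user space. The sketch below is a minimal, Linux-only illustration, assuming a system with /dev/ptmx; the helper name read_logout_packet() is hypothetical, and the 100 ms pause is an arbitrary choice to give the tty layer time to deliver the data. It enables packet mode on the master with the TIOCPKT ioctl, writes "logout\n" to the slave, and reads the master while the slave is still open, which corresponds to the case where the problem does not occur.

```c
#define _GNU_SOURCE            /* posix_openpt(), grantpt(), ptsname() */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>         /* TIOCPKT (and TIOCPKT_DATA on Linux) */
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#ifndef TIOCPKT_DATA
#define TIOCPKT_DATA 0         /* status byte meaning "data follows" */
#endif

/*
 * Hypothetical helper: open a pty pair, put the master into packet
 * mode, write "logout\n" to the slave and read the master while the
 * slave is still open. Returns the number of bytes read into buf,
 * or -1 if the pty pair cannot be set up.
 */
static ssize_t read_logout_packet(unsigned char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;

    char *name = NULL;
    if (grantpt(master) == 0 && unlockpt(master) == 0)
        name = ptsname(master);

    int slave = name ? open(name, O_RDWR | O_NOCTTY) : -1;
    const char msg[] = "logout\n";
    size_t len = strlen(msg);
    int one = 1;
    ssize_t n = -1;

    if (slave >= 0 && ioctl(master, TIOCPKT, &one) == 0 &&
        write(slave, msg, len) == (ssize_t)len) {
        /* Let the tty layer move the data over to the master side. */
        struct timespec pause = { 0, 100L * 1000 * 1000 };
        nanosleep(&pause, NULL);
        n = read(master, buf, size);   /* the slave is still open here */
    }

    if (slave >= 0)
        close(slave);
    close(master);
    return n;
}
```

If the pty pair can be created, buf[0] is expected to be TIOCPKT_DATA (0), typically followed by "logout\r\n" under the slave's default output settings, i.e. the same 9-byte read that in.telnetd gets in the log above.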
Created attachment 347632
SystemTap output when problem occurs.
This is the output from the above SystemTap script when "logout" is printed on the telnet client. Note the following bit near the end of the log:
tty_write: bash (16521) writing to pts2 (packet=0)
7 bytes: logout\n
write_chan: bash (16521) writing to pts2 (packet=0)
7 bytes: logout\n
pty_close: bash (16521) closing pts2 (packet=0)
pts2 is linked to ptm2 (packet=1)
read_chan: in.telnetd (16388) reading from ptm2 (packet=0)
8 bytes: logout\r\n
tty_read: in.telnetd (16388) reading from ptm2 (packet=0)
8 bytes: logout\r\n
vfs_read: in.telnetd (16388) reading from ptm2 (packet=0)
8 bytes: logout\r\n
read_chan: in.telnetd (16388) reading from ptm2 (packet=0)
Error -5
tty_read: in.telnetd (16388) reading from ptm2 (packet=0)
Error -5
vfs_read: in.telnetd (16388) reading from ptm2 (packet=0)
Error -5
pty_close: in.telnetd (16388) closing ptm2 (packet=0)
ptm2 is linked to pts2 (packet=0)
This time, bash writes "logout\n" (7 bytes) and then closes pts2 before in.telnetd has read the data, so pty_close() clears packet mode on ptm2. When in.telnetd then reads ptm2 (note that tty->packet == 0), it gets only 8 bytes: the leading '\0' status byte is missing.
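Without the status byte, nothing marks where the data starts, so a reader that still trusts the packet-mode framing takes the first data byte as the status byte. The model below is a minimal sketch of such a reader, assuming (hypothetically) that it simply splits off the first byte of each read; split_packet() and struct packet are illustrative names, not telnetd's code.

```c
#include <assert.h>
#include <stddef.h>

/* Status byte value meaning "ordinary data follows" (TIOCPKT_DATA on Linux). */
#define PKT_DATA 0

/* One read() from a packet-mode master, split into status and payload. */
struct packet {
    unsigned char status;        /* first byte of the read */
    const unsigned char *data;   /* everything after the status byte */
    size_t len;                  /* number of payload bytes */
};

/*
 * Hypothetical reader-side helper: trust the packet-mode framing and
 * treat the first byte of every read as the status byte.
 * Returns 0 on success, -1 if the read was empty.
 */
static int split_packet(const unsigned char *buf, size_t n, struct packet *out)
{
    assert(buf != NULL && out != NULL);
    if (n == 0)
        return -1;               /* nothing read, so no status byte either */
    out->status = buf[0];
    out->data = buf + 1;
    out->len = n - 1;
    return 0;
}
```

Fed the 9-byte read from the first log ("\0logout\r\n"), this yields status 0 and the 8-byte payload "logout\r\n". Fed the 8-byte read from this log ("logout\r\n"), it yields status 'l' (0x6c, not PKT_DATA) and the 7-byte payload "ogout\r\n", so the first character of the data is lost.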
REPRODUCTION STEPS
Here's what I do to reproduce this:
1) On the server, run the following as root:
while /bin/true; do /usr/sbin/in.telnetd -debug 2323; done
2) On the server, compile and execute the SystemTap script as a normal
user who belongs to the stapusr and stapdev groups:
stap -v -g pty.stp.1 2 no
The parameter "2" tells the script to only report on ptys with "2"
in the name (otherwise, you get lots of information on ptys that
you don't care about). This parameter may vary, depending on which
pty in.telnetd is using. The parameter "no" tells the script not
to print backtrace information for each call ("bt" will cause it to
print the bt info).
3) On the client, run the following as a normal user.
telnet <server> 2323
Log in using some user account.
3a) Edit ~/.bash_logout to remove or comment out the
"/usr/bin/clear" command (this makes it possible to see the
logout (or "ougout") message emitted from bash).
3b) Type "exit" and then <ENTER> to log out. What seems to tickle
the problem is the timing between typing "exit" and pressing
<ENTER>. There has to be just a slight delay between the "t"
and <ENTER>. When I'm "in the groove" I can get it to happen
80-90% of the time.
Created attachment 347654
Expect script to reproduce problem in telnet
Here's an expect script that should reproduce the problem fairly reliably. The values for HOST, PORT, USER, and PASS will need to be set as appropriate. The send_slow value pair reproduced the problem 100% of the time on my test system, although I expect that it may need to be adjusted for other systems.
Created attachment 348349
Possible patch
For what it's worth, applying this patch to kernel-2.6.18-128.el5xen results in 0/10 failures with my test script. Without the patch, the test script generates 9/10 failures.
Touching such old code (it has been there since 2.0) seems risky, as it could cause random regressions in other applications, especially since, with the proposed patch, the link stays in packet mode after both pty devices are closed. A proper fix might be to add an open count and reset packet mode only after all devices on the pty link have been closed. As this bug also exists upstream, I sent an email to LKML asking for opinions (http://lkml.org/lkml/2009/11/11/223). It is better to wait for upstream comments before proceeding with a fix for this bug.

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6 and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.

I think we should close this BZ as WONTFIX. It seems too risky to change this behavior in RHEL5 for something that has been there since kernel 2.0. I tried to get some feedback upstream, but it seems that people there are also afraid of such a change, or don't care enough about it. So, unless it someday causes a serious issue, I suspect it is best not to touch it.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
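The open-count idea from the discussion can be illustrated with a small user-space model. This is a hypothetical sketch, not a kernel patch: struct pty_pair, close_end_now() and close_end_counted() are made-up names, and the model ignores locking, hangup handling and the real tty reference counts in pty_close().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one pty pair; not kernel code. */
struct pty_pair {
    bool master_packet;   /* packet mode requested on the master side */
    int open_ends;        /* how many ends of the pair are still open */
};

/* Behaviour described in this report: closing one end clears packet
 * mode on the linked end immediately. */
static void close_end_now(struct pty_pair *p)
{
    assert(p->open_ends > 0);
    p->open_ends--;
    p->master_packet = false;
}

/* Open-count variant: packet mode is reset only once every end of the
 * link has been closed. */
static void close_end_counted(struct pty_pair *p)
{
    assert(p->open_ends > 0);
    if (--p->open_ends == 0)
        p->master_packet = false;
}
```

With both ends open, close_end_now() drops packet mode as soon as the slave (bash) closes, while close_end_counted() keeps it until the master (in.telnetd) closes too. The counted variant still resets packet mode in the end, which speaks to the concern that the proposed patch leaves the link in packet mode after both ends are closed.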
Description of problem:
When pty_close() is called to close a pseudo tty device, the packet mode of the linked tty is cleared: http://rhkernel.org/RHEL5+2.6.18-128.1.10.el5/drivers/char/pty.c#L53

This can cause problems with applications (such as telnetd) that expect data read from the linked tty to be in packet-mode format.

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.1.10.el5

How reproducible:
100% of the time.

Additional info:
The code to clear packet mode was added in 1996 by Theodore Ts'o:
http://www.linuxhq.com/kernel/v1.99/13/drivers/char/pty.c
http://www.linuxhq.com/kernel/v1.99/13/drivers/char/ChangeLog
and it still exists in the latest upstream kernels:
http://lxr.linux.no/linux+v2.6.29/drivers/char/pty.c#L39

I find it hard to believe that something that has been in the kernel this long could be a bug, but clearing the packet mode of the linked pty just doesn't seem correct to me. Why should closing one pty change the packet mode of the other?