Bug 504703
Summary: pty_close() clears the packet mode of the linked pty
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.3
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Hardware: All
OS: Linux
Reporter: Bryan Mason <bmason>
Assignee: Mauro Carvalho Chehab <mchehab>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: arozansk, lwang, moshiro, prarit, tao
Target Milestone: rc
Doc Type: Bug Fix
Last Closed: 2011-08-14 11:45:29 UTC
Bug Blocks: 533192, 566295

Description
Bryan Mason
2009-06-08 21:29:37 UTC
Created attachment 347630 [details]
SystemTap script
Created attachment 347631 [details]
SystemTap output when problem does not occur
This is the output from the above SystemTap script when "logout" is printed on the telnet client. Note the following bit near the end of the log:
tty_write: bash (16233) writing to pts2 (packet=0)
7 bytes: logout\n
write_chan: bash (16233) writing to pts2 (packet=0)
7 bytes: logout\n
read_chan: in.telnetd (16231) reading from ptm2 (packet=1)
9 bytes: logout\r\n
tty_read: in.telnetd (16231) reading from ptm2 (packet=1)
9 bytes: logout\r\n
vfs_read: in.telnetd (16231) reading from ptm2 (packet=1)
9 bytes: logout\r\n
pty_close: bash (16233) closing pts2 (packet=0)
pts2 is linked to ptm2 (packet=1)
read_chan: in.telnetd (16231) reading from ptm2 (packet=0)
Error -5
tty_read: in.telnetd (16231) reading from ptm2 (packet=0)
Error -5
vfs_read: in.telnetd (16231) reading from ptm2 (packet=0)
Error -5
pty_close: in.telnetd (16231) closing ptm2 (packet=0)
ptm2 is linked to pts2 (packet=0)
Bash writes "logout\n" (7 bytes) to pts2, and in.telnetd reads it from ptm2 as "\0logout\r\n" (9 bytes). The SystemTap script automatically strips the leading '\0' from the printed string when the tty_struct has packet == 1, which is why the log shows only "logout\r\n". After in.telnetd reads the data, bash closes pts2.
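The framing described above can be illustrated with a short sketch. It assumes the usual Linux packet-mode convention that every read from the master starts with one control byte, and that a control byte of 0 (TIOCPKT_DATA) marks ordinary data. The helper name split_packet_read is hypothetical; it is not taken from in.telnetd or from the attached SystemTap script.

```python
from typing import Tuple

# Value of TIOCPKT_DATA in the Linux headers: a control byte of 0 means
# "the rest of this read is ordinary data".
TIOCPKT_DATA = 0


def split_packet_read(buf: bytes) -> Tuple[int, bytes]:
    """Split one packet-mode read into (control byte, payload).

    Hypothetical helper for illustration only.
    """
    if not buf:
        raise ValueError("empty read")
    return buf[0], buf[1:]


# The 9 bytes in.telnetd receives in the trace where the problem does not occur.
raw = b"\x00logout\r\n"
control, payload = split_packet_read(raw)
print(len(raw), control == TIOCPKT_DATA, payload)
```

The 7 bytes written by bash become 8 after the "\n" to "\r\n" mapping seen in the trace, and the control byte accounts for the ninth.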
Created attachment 347632 [details]
SystemTap output when problem occurs.
This is the output from the above SystemTap script when "logout" is printed on
the telnet client. Note the following bit near the end of the log:
tty_write: bash (16521) writing to pts2 (packet=0)
7 bytes: logout\n
write_chan: bash (16521) writing to pts2 (packet=0)
7 bytes: logout\n
pty_close: bash (16521) closing pts2 (packet=0)
pts2 is linked to ptm2 (packet=1)
read_chan: in.telnetd (16388) reading from ptm2 (packet=0)
8 bytes: logout\r\n
tty_read: in.telnetd (16388) reading from ptm2 (packet=0)
8 bytes: logout\r\n
vfs_read: in.telnetd (16388) reading from ptm2 (packet=0)
8 bytes: logout\r\n
read_chan: in.telnetd (16388) reading from ptm2 (packet=0)
Error -5
tty_read: in.telnetd (16388) reading from ptm2 (packet=0)
Error -5
vfs_read: in.telnetd (16388) reading from ptm2 (packet=0)
Error -5
pty_close: in.telnetd (16388) closing ptm2 (packet=0)
ptm2 is linked to pts2 (packet=0)
This time, bash writes "logout\n" (7 bytes) and then closes pts2, which clears packet mode on ptm2. When in.telnetd reads ptm2 (note that tty->packet == 0), it reads only 8 bytes -- the leading '\0' is missing.
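The difference between the two traces comes down to ordering, which a toy model can make concrete. This is a simplified sketch and not kernel code: the class ToyPtyPair and its methods are hypothetical, and the only behavior taken from this report is that closing the slave clears packet mode on the linked master.

```python
class ToyPtyPair:
    """Toy model of a pty master/slave pair (illustration only)."""

    def __init__(self) -> None:
        self.master_packet = True  # in.telnetd has packet mode on for ptm2
        self.pending = b""         # bytes written to the slave, not yet read

    def slave_write(self, data: bytes) -> None:
        # Model the "\n" -> "\r\n" output mapping visible in the trace.
        self.pending += data.replace(b"\n", b"\r\n")

    def slave_close(self) -> None:
        # Behavior reported in this bug: closing one side clears
        # packet mode on the linked side.
        self.master_packet = False

    def master_read(self) -> bytes:
        data, self.pending = self.pending, b""
        # In packet mode the read is prefixed with a 0 control byte.
        return (b"\x00" + data) if self.master_packet else data


def run(read_before_close: bool) -> bytes:
    """Return what the master reads for one of the two orderings."""
    pair = ToyPtyPair()
    pair.slave_write(b"logout\n")
    if read_before_close:
        out = pair.master_read()
        pair.slave_close()
    else:
        pair.slave_close()
        out = pair.master_read()
    return out


good = run(read_before_close=True)   # ordering in the first trace
bad = run(read_before_close=False)   # ordering in the second trace
print(len(good), good)
print(len(bad), bad)
```

In the model, as in the traces, reading before the close yields 9 bytes and reading after the close yields 8, with the control byte missing.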
REPRODUCTION STEPS

Here's what I do to reproduce this:

1) On the server, run the following as root:

    while /bin/true; do /usr/sbin/in.telnetd -debug 2323; done

2) On the server, compile and execute the SystemTap script as a normal user who belongs to the stapusr and stapdev groups:

    stap -v -g pty.stp.1 2 no

   The parameter "2" tells the script to only report on ptys with "2" in the name (otherwise, you get lots of information on ptys that you don't care about). This parameter may vary, depending on which pty in.telnetd is using. The parameter "no" tells the script not to print backtrace information for each call ("bt" will cause it to print the bt info).

3) On the client, run the following as a normal user:

    telnet <server> 2323

   Log in using some user account.

3a) Edit ~/.bash_logout to remove or comment out the "/usr/bin/clear" command (this makes it possible to see the logout (or "ougout") message emitted from bash).

3b) Type "exit" and then <ENTER> to log out.

What seems to tickle the problem is the timing between typing "exit" and pressing <ENTER>. There has to be just a slight delay between the "t" and <ENTER>. When I'm "in the groove" I can get it to happen 80-90% of the time.

Created attachment 347654 [details]
Expect script to reproduce problem in telnet
Here's an expect script that should reproduce the problem fairly reliably. The values for HOST, PORT, USER, and PASS will need to be set as appropriate. The send_slow value pair reproduced the problem 100% of the time on my test system, although I expect that it may need to be adjusted for other systems.
Created attachment 348349 [details]
Possible patch
For what it's worth, applying the following patch to kernel-2.6.18-128.el5xen results in 0/10 failures with my test script. Without the patch, the test script generates 9/10 failures.
Touching very old code (in place since 2.0) seems risky, as it can result in random regressions in other applications, especially since, with the proposed patch applied, the link will stay in packet mode after both pty devices are closed. Maybe a proper fix would be to add an open count and reset packet mode only after all devices have closed the pty link.

As this bug is also present upstream, I sent an email to LKML asking for opinions:

http://lkml.org/lkml/2009/11/11/223

It is better to wait for upstream comments before proceeding with a fix for this bug.

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6 and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update. Contact your manager or support representative in case you need to escalate this bug.

I think we should close this BZ as will not fix. It seems too risky to change this behavior on RHEL5 for something that has been there since kernel 2.0. I tried to get some feedback upstream, but it seems that people there are also afraid of such a change, or they do not care enough. So, unless it one day causes a serious issue, I suspect that it is best not to touch it.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
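The open-count idea raised in the discussion above can be sketched in the same toy style. This is only an illustration of the suggestion, not the attached patch and not kernel code: CountedPtyPair, open_count, and close_one_end are hypothetical names, and the model ignores the EIO the master eventually sees once the slave is gone.

```python
class CountedPtyPair:
    """Toy pty pair that resets packet mode only on the last close."""

    def __init__(self) -> None:
        self.open_count = 2        # hypothetical counter: master + slave open
        self.master_packet = True  # packet mode enabled on the master
        self.pending = b""         # bytes written to the slave, not yet read

    def slave_write(self, data: bytes) -> None:
        # Model the "\n" -> "\r\n" output mapping visible in the traces.
        self.pending += data.replace(b"\n", b"\r\n")

    def close_one_end(self) -> None:
        # Drop one reference; clear packet mode only when nobody
        # has the link open any more.
        self.open_count -= 1
        if self.open_count == 0:
            self.master_packet = False

    def master_read(self) -> bytes:
        data, self.pending = self.pending, b""
        return (b"\x00" + data) if self.master_packet else data


pair = CountedPtyPair()
pair.slave_write(b"logout\n")
pair.close_one_end()                 # bash closes the slave first
late_read = pair.master_read()       # master still sees the control byte
packet_after_slave_close = pair.master_packet
pair.close_one_end()                 # in.telnetd closes the master
packet_after_both_closed = pair.master_packet
print(late_read, packet_after_slave_close, packet_after_both_closed)
```

Under these assumptions the late read keeps its leading control byte, and packet mode is still cleared once both ends are closed, which addresses the concern that the link would otherwise stay in packet mode.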