Bug 1615467
| Summary: | Tee hangs with EAGAIN error when logging a telnet session | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Facundo Jaspe <fjaspe> |
| Component: | telnet | Assignee: | Michal Ruprich <mruprich> |
| Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-daemons |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 7.5 | CC: | fjaspe, fkrska, kdudka, mruprich, rmetrich, thozza |
| Target Milestone: | rc | Keywords: | Reproducer |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-02-11 08:44:39 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1643104, 1709724, 1757052, 1780662 | ||
| Attachments: | |||
What is connected to stdout of tee? Some terminal? Did you try a different kind of terminal? Does it hang if you redirect the output of tee to a file? > What is connected to stdout of tee? Some terminal? Any terminal. It can be anything from Terminator, to PuTTY, to VMware ESX console viewer. It doesn't seem to be dependent on that > Does it hang if you redirect the output of tee to a file? You mean, other than by running tee with "-a some_log"? For instance: # telnet 192.168.122.12 | tee -a some_log_5000 > redirect_log_3 I hadn't thought of that, but I tried now and it seems it prints out all the output correctly vs. not redirecting it: < "rhelbase" is the telnet client, "rhel6base" is the telnet server > Checking out "redirect_log_3" on telnet client: [root@rhelbase ~]# tail -n5 redirect_log_3 Jul 23 15:35:11 rhel6base subscription-manager: Removed subscription for product 'Red Hat Developer Tools (for RHEL Server for ARM)' Jul 23 15:35:11 rhel6base subscription-manager: Removed subscription for product 'Red Hat S-JIS Support (for RHEL Server)' Jul 23 15:35:11 rhel6base rsyslogd-2177: imuxsock begins to drop messages from pid 1853 due to rate-limiting [root@rhel6base ~]# logout [sysadmin@rhel6base ~]$ logout Checking out the last few lines of "messages-20180723" in the telnet server, it matches with what was redirected to redirect_log_3: [root@rhel6base ~]# tail -n3 /var/log/messages-20180723 Jul 23 15:35:11 rhel6base subscription-manager: Removed subscription for product 'Red Hat Developer Tools (for RHEL Server for ARM)' Jul 23 15:35:11 rhel6base subscription-manager: Removed subscription for product 'Red Hat S-JIS Support (for RHEL Server)' Jul 23 15:35:11 rhel6base rsyslogd-2177: imuxsock begins to drop messages from pid 1853 due to rate-limiting Versus what tee prints to output when its output is not redirected: [root@rhelbase ~]# telnet 192.168.122.12 | tee -a some_log_6000 [...] [root@rhel6base ~]# cat /var/log/messages-20180723 [...] May 22 15:52:30 rhel6base kernel: regulator: core version 0.5 May 22 15:52:30 rhel6base kernel: NET: Registered protocol family 16 May 22 15:52:30 rhel6base kernel: ACPI: bus type pci registered May 22 15:52:30 rhel6base kernel: PCI: Using configuration type 1 for base access May 2 <stdout dies here, even though some_log_6000 shows the entire output> tee: write error <After running ctrl^D> Checking "some_log_6000" (the file to which tee appehends text with "-a"): [root@rhelbase ~]# tail -n5 some_log_6000 Aug 14 19:18:13 rhel6base xinetd[1638]: START: telnet pid=2336 from=::ffff:192.168.122.11 Aug 14 19:18:41 rhel6base xinetd[1638]: EXIT: telnet status=0 pid=2336 duration=28(sec) Aug 14 19:19:22 rhel6base xinetd[1638]: START: telnet pid=2391 from=::ffff:192.168.122.11 [root@rhel6base ~]# logout [sysadmin@rhel6base ~]$ logout Thanks! I was able to reproduce it locally. At first glance, it does not seem that 'tee' is doing anything wrong. At some point, it is just not able to write to stdout any more. I think it is telnet not working very well if its output is redirected. See for example bug #1616229. Could you please try to use the 'script' utility instead? $ script -c 'telnet ...' log_file The issue is not reproduced with script or other logging tools. Which is the reason (my reasoning might be being a bit basic, I'll admit) that there could have been something wrong with "tee". We did test this with the customer. However, "tee" and "telnet" seem to be integrated into their networking tools/scripts (I reproduced this issue with a telnet server on a RHEL box, but they use this method to access some switches and other network equipments) (In reply to Facundo Jaspe from comment #5) > The issue is not reproduced with script or other logging tools. The key difference is that 'script' emulates a terminal whereas 'tee' does not: # script -c 'tty <&1' log_file Script started, file is log_file /dev/pts/1 Script done, file is log_file # tty <&1 | tee -a log_file not a tty I believe that you will encounter similar issues when you pipe telnet's output to any other tool that does not emulate a terminal (like the 'cat' example in bug #1616229). So this could hardly be fixed by patching the source code of 'tee'. On the other hand, I have not yet been able to find the exact cause of the failed writes on tee's output. I believe that I finally understand what happens: - 'tee' expects the output descriptors to operate in blocking mode, so it does not handle EAGAIN at all. - 'telnet' calls ioctl(stdin, FIONBIO, [1]) while its input is connected to the same terminal as tee's output. This *seems* to cause the O_NONBLOCK flag to be silently set on tee's output. I will attach a patch that makes 'tee' recover from EAGAIN but I am not convinced that 'tee' is the correct place to fix this bug. Clearly, the same problem may happen with other tools that expect fwrite() to block instead of returning EAGAIN. If I comment out the ioctl(stdin, FIONBIO, [1]) call in telnet, the problem goes away, too. Maybe this could be a more generic fix for this? Created attachment 1476411 [details]
[PATCH] tee: recover from EAGAIN returned from fwrite()
tee expects the output descriptors to operate in blocking mode but this
assumption might be invalidated by other programs connected to the same
terminal, as in the following example:
$ telnet ... | tee log_file
telnet calls ioctl(stdin, FIONBIO, [1]), which causes the O_NONBLOCK
flag to be set on tee's output, which is connected to the same terminal.
This patch has zero impact unless EAGAIN returns from fwrite(). In that
case we try to reset the O_NONBLOCK flag on the output file descriptor
and retry the call to fwrite().
Created attachment 1476430 [details]
[PATCH v2] tee: recover from EAGAIN returned from fwrite()
Slightly simplified version of the original patch. Should work equally.
Awesome, thanks Kamil. Can we build a test rpm from that patch? Thanks Patch posted upstream: http://lists.gnu.org/archive/html/coreutils/2018-09/msg00000.html Upstream has been silent on this for a while. I sent them a reminder today: http://lists.gnu.org/archive/html/coreutils/2018-11/msg00000.html GNU coreutils upstream is not going to merge the proposed patch: http://lists.gnu.org/archive/html/coreutils/2018-11/msg00005.html They had suggested the following options to work around the issue: 1. If one wants simple line oriented connection use netcat or equivalent. 2. Use script(1) which is more suited to recording terminal sessions. 3. Adjust telnet to not use FIONBIO when in line mode. 4. Avoid the tee apparent hang by using the --output-error=exit option. Created attachment 1501868 [details]
[PATCH v3] tee: handle EAGAIN returned from fwrite/fclose()
'tee' expected the output file descriptors to operate in blocking mode
but this assumption might be invalidated by other programs connected to
the same terminal, as in the following example:
$ telnet ... | tee log_file
telnet calls ioctl(stdin, FIONBIO, [1]), which causes the O_NONBLOCK
flag to be set on tee's output, which is connected to the same terminal.
This patch has zero impact unless EAGAIN returns from fwrite/fclose().
In that case 'tee' waits for the underlying file descriptor to become
writable again and then it writes the remaining data.
Unfortunate, but understandable. Thanks a lot for your assistance Kamil. One question though, what is being meant with "Adjust telnet to not use FIONBIO when in line mode"? I find no mention of this in telnet's manpage in order to provide the customer with a specific command line example of this usage of telnet Is option 4 only meant to change the default behaviour from "apparent hang" to "exit"? I'm afraid that wouldn't count as a workaround to the customer's intent of visualizing the entirety of any given command output. (In reply to Facundo Jaspe from comment #19) > One question though, what is being meant with "Adjust telnet to not use > FIONBIO when in line mode"? I find no mention of this in telnet's manpage in > order to provide the customer with a specific command line example of this > usage of telnet There is no such option yet. It would require code change in the telnet package as I understand it. Please feel free to reassign this bug to telnet if this is the preferred option for the customer. Per comment #17, there is no way to get this fixed in GNU coreutils upstream. The only way to move this forward would be to apply the patch downstream, which I would like to avoid if possible. I am switching this bug to telnet to evaluate the possibility of getting this fixed in the telnet package instead. The solution would be to (optionally?) skip enabling of the non-blocking file descriptor mode when telnet operates in line mode. Development Management has reviewed and declined this request. You may appeal this decision by using your Red Hat support channels, who will make certain the issue receives the proper prioritization with product and development management. https://www.redhat.com/support/process/production/#howto A patch to handle non-blocking file descriptors in tee(1) landed in GNU coreutils upstream: https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.1-193-g16000805eb1 |
Description of problem: When a telnet session is opened to a telnet server (it can be RHEL, or any telnet-enabled device), and tee is used through a pipe to log the session on a text file, printing a long output causes tee to hang its printing to stdout, causing the appeareance that the session has hanged. However, the telnet session is still active and tee is still logging further interactions, despite these not being visible through stdout. Version-Release number of selected component (if applicable): Red Hat Enterprise Linux, any version Tested in RHEL 7.5 with: telnet-0.17-64.el7.x86_64 coreutils-8.22-21.el7.x86_64 How reproducible: Always Steps to Reproduce: 1. Set up a Telnet server of any kind 2. From a RHEL system, run "telnet lab.example.com | tee -a some_log" 3. Running any command that returns a long output, printing to stdout will hang Actual results: Printing to stoud will hang, whereas the telnet session is still active and tee still printing to some_log Expected results: Stdout should continue to print information on screen Additional info: I am unsure as to the exact length of output that will cause tee to hang. In my lab, I tried running "cat /var/log/messages" and it did not hang, but running "cat /var/log/messages-20180723" causes it to hang: # ll /var/log/messages -rw-------. 1 root root 100364 Aug 13 13:20 /var/log/messages # ll /var/log/messages-20180723 -rw-------. 1 root root 327098 Jul 23 15:35 /var/log/messages-20180723 It seems running strace on this affects in some way the reproductibility of this issue, but when it hangs nonetheless, EAGAIN errors are seen: 14211 15:15:47.120582 write(1, " PBBL \r\nNTPT 0% 0/ 800a2 NTPT task \r\nNQAF 0% 0/ 0 NQAFTPR \r\nRBSQ 0% 0/ 0 RBSQ "..., 1024) = -1 EAGAIN (Resource temporarily unavailable) <0.000022> stdbuf was attempted to be used to modify the stdout buffers, but it seems "tee" will override whatever is being indicated by stdbuf: from man (1) stdbuf: NOTE: If COMMAND adjusts the buffering of its standard streams ('tee' does for e.g.) then that will override corresponding settings changed by 'stdbuf'.