Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1615467

Summary: Tee hangs with EAGAIN error when logging a telnet session
Product: Red Hat Enterprise Linux 7 Reporter: Facundo Jaspe <fjaspe>
Component: telnetAssignee: Michal Ruprich <mruprich>
Status: CLOSED WONTFIX QA Contact: qe-baseos-daemons
Severity: medium Docs Contact:
Priority: medium    
Version: 7.5CC: fjaspe, fkrska, kdudka, mruprich, rmetrich, thozza
Target Milestone: rcKeywords: Reproducer
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-11 08:44:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1643104, 1709724, 1757052, 1780662    
Attachments:
Description Flags
[PATCH] tee: recover from EAGAIN returned from fwrite()
none
[PATCH v2] tee: recover from EAGAIN returned from fwrite()
none
[PATCH v3] tee: handle EAGAIN returned from fwrite/fclose() none

Description Facundo Jaspe 2018-08-13 16:32:14 UTC
Description of problem:

When a telnet session is opened to a telnet server (it can be RHEL, or any telnet-enabled device), and tee is used through a pipe to log the session on a text file, printing a long output causes tee to hang its printing to stdout, causing the appeareance that the session has hanged. However, the telnet session is still active and tee is still logging further interactions, despite these not being visible through stdout. 

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux, any version

Tested in RHEL 7.5 with:
telnet-0.17-64.el7.x86_64
coreutils-8.22-21.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a Telnet server of any kind
2. From a RHEL system, run "telnet lab.example.com | tee -a some_log"
3. Running any command that returns a long output, printing to stdout will hang

Actual results:
Printing to stoud will hang, whereas the telnet session is still active and tee still printing to some_log

Expected results:
Stdout should continue to print information on screen

Additional info:
I am unsure as to the exact length of output that will cause tee to hang. In my lab, I tried running "cat /var/log/messages" and it did not hang, but running "cat /var/log/messages-20180723" causes it to hang:

# ll /var/log/messages
-rw-------. 1 root root 100364 Aug 13 13:20 /var/log/messages

# ll /var/log/messages-20180723 
-rw-------. 1 root root 327098 Jul 23 15:35 /var/log/messages-20180723

It seems running strace on this affects in some way the reproductibility of this issue, but when it hangs nonetheless, EAGAIN errors are seen:

14211 15:15:47.120582 write(1, "  PBBL                         \r\nNTPT         0%              0/   800a2           NTPT task                    \r\nNQAF         0%              0/       0           NQAFTPR                      \r\nRBSQ         0%              0/       0           RBSQ       "..., 1024) = -1 EAGAIN (Resource temporarily unavailable) <0.000022>

stdbuf was attempted to be used to modify the stdout buffers, but it seems "tee" will override whatever is being indicated by stdbuf:

from man (1) stdbuf:

NOTE: If COMMAND adjusts the buffering of its standard streams ('tee' does for e.g.) then that will override corresponding settings changed by 'stdbuf'.

Comment 2 Kamil Dudka 2018-08-14 08:47:27 UTC
What is connected to stdout of tee?  Some terminal?

Did you try a different kind of terminal?

Does it hang if you redirect the output of tee to a file?

Comment 3 Facundo Jaspe 2018-08-14 22:22:17 UTC
> What is connected to stdout of tee?  Some terminal?

Any terminal. It can be anything from Terminator, to PuTTY, to VMware ESX console viewer. It doesn't seem to be dependent on that

> Does it hang if you redirect the output of tee to a file?

You mean, other than by running tee with "-a some_log"? For instance:

# telnet 192.168.122.12 | tee -a some_log_5000 > redirect_log_3

I hadn't thought of that, but I tried now and it seems it prints out all the output correctly vs. not redirecting it:

< "rhelbase" is the telnet client, "rhel6base" is the telnet server >

Checking out "redirect_log_3" on telnet client:

[root@rhelbase ~]# tail -n5 redirect_log_3 
Jul 23 15:35:11 rhel6base subscription-manager: Removed subscription for product 'Red Hat Developer Tools (for RHEL Server for ARM)'
Jul 23 15:35:11 rhel6base subscription-manager: Removed subscription for product 'Red Hat S-JIS Support (for RHEL Server)'
Jul 23 15:35:11 rhel6base rsyslogd-2177: imuxsock begins to drop messages from pid 1853 due to rate-limiting
[root@rhel6base ~]# logout
[sysadmin@rhel6base ~]$ logout

Checking out the last few lines of "messages-20180723" in the telnet server, it matches with what was redirected to redirect_log_3:

[root@rhel6base ~]# tail -n3 /var/log/messages-20180723 
Jul 23 15:35:11 rhel6base subscription-manager: Removed subscription for product 'Red Hat Developer Tools (for RHEL Server for ARM)'
Jul 23 15:35:11 rhel6base subscription-manager: Removed subscription for product 'Red Hat S-JIS Support (for RHEL Server)'
Jul 23 15:35:11 rhel6base rsyslogd-2177: imuxsock begins to drop messages from pid 1853 due to rate-limiting

Versus what tee prints to output when its output is not redirected:

[root@rhelbase ~]# telnet 192.168.122.12 | tee -a some_log_6000
[...]
[root@rhel6base ~]# cat /var/log/messages-20180723
[...]
May 22 15:52:30 rhel6base kernel: regulator: core version 0.5
May 22 15:52:30 rhel6base kernel: NET: Registered protocol family 16
May 22 15:52:30 rhel6base kernel: ACPI: bus type pci registered
May 22 15:52:30 rhel6base kernel: PCI: Using configuration type 1 for base access
May 2   <stdout dies here, even though some_log_6000 shows the entire output>
tee: write error   <After running ctrl^D>

Checking "some_log_6000" (the file to which tee appehends text with "-a"):

[root@rhelbase ~]# tail -n5 some_log_6000 
Aug 14 19:18:13 rhel6base xinetd[1638]: START: telnet pid=2336 from=::ffff:192.168.122.11
Aug 14 19:18:41 rhel6base xinetd[1638]: EXIT: telnet status=0 pid=2336 duration=28(sec)
Aug 14 19:19:22 rhel6base xinetd[1638]: START: telnet pid=2391 from=::ffff:192.168.122.11
[root@rhel6base ~]# logout
[sysadmin@rhel6base ~]$ logout

Comment 4 Kamil Dudka 2018-08-15 11:17:10 UTC
Thanks!  I was able to reproduce it locally.  At first glance, it does not seem that 'tee' is doing anything wrong.  At some point, it is just not able to write to stdout any more.

I think it is telnet not working very well if its output is redirected.  See for example bug #1616229.  Could you please try to use the 'script' utility instead?

$ script -c 'telnet ...' log_file

Comment 5 Facundo Jaspe 2018-08-15 18:03:00 UTC
The issue is not reproduced with script or other logging tools. Which is the reason (my reasoning might be being a bit basic, I'll admit) that there could have been something wrong with "tee".

We did test this with the customer. However, "tee" and "telnet" seem to be integrated into their networking tools/scripts (I reproduced this issue with a telnet server on a RHEL box, but they use this method to access some switches and other network equipments)

Comment 6 Kamil Dudka 2018-08-16 06:44:15 UTC
(In reply to Facundo Jaspe from comment #5)
> The issue is not reproduced with script or other logging tools.

The key difference is that 'script' emulates a terminal whereas 'tee' does not:

# script -c 'tty <&1' log_file
Script started, file is log_file
/dev/pts/1
Script done, file is log_file

# tty <&1 | tee -a log_file 
not a tty


I believe that you will encounter similar issues when you pipe telnet's output to any other tool that does not emulate a terminal (like the 'cat' example in bug #1616229).  So this could hardly be fixed by patching the source code of 'tee'.

On the other hand, I have not yet been able to find the exact cause of the failed writes on tee's output.

Comment 7 Kamil Dudka 2018-08-16 11:55:28 UTC
I believe that I finally understand what happens:

- 'tee' expects the output descriptors to operate in blocking mode, so it does not handle EAGAIN at all.

- 'telnet' calls ioctl(stdin, FIONBIO, [1]) while its input is connected to the same terminal as tee's output.  This *seems* to cause the O_NONBLOCK flag to be silently set on tee's output.

I will attach a patch that makes 'tee' recover from EAGAIN but I am not convinced that 'tee' is the correct place to fix this bug.  Clearly, the same problem may happen with other tools that expect fwrite() to block instead of returning EAGAIN.

If I comment out the ioctl(stdin, FIONBIO, [1]) call in telnet, the problem goes away, too.  Maybe this could be a more generic fix for this?

Comment 8 Kamil Dudka 2018-08-16 11:56:18 UTC
Created attachment 1476411 [details]
[PATCH] tee: recover from EAGAIN returned from fwrite()

tee expects the output descriptors to operate in blocking mode but this
assumption might be invalidated by other programs connected to the same
terminal, as in the following example:

$ telnet ... | tee log_file

telnet calls ioctl(stdin, FIONBIO, [1]), which causes the O_NONBLOCK
flag to be set on tee's output, which is connected to the same terminal.

This patch has zero impact unless EAGAIN returns from fwrite().  In that
case we try to reset the O_NONBLOCK flag on the output file descriptor
and retry the call to fwrite().

Comment 9 Kamil Dudka 2018-08-16 13:53:37 UTC
Created attachment 1476430 [details]
[PATCH v2] tee: recover from EAGAIN returned from fwrite()

Slightly simplified version of the original patch.  Should work equally.

Comment 10 Facundo Jaspe 2018-08-16 17:40:31 UTC
Awesome, thanks Kamil.

Can we build a test rpm from that patch?

Thanks

Comment 15 Kamil Dudka 2018-09-06 14:35:44 UTC
Patch posted upstream:

http://lists.gnu.org/archive/html/coreutils/2018-09/msg00000.html

Comment 16 Kamil Dudka 2018-11-01 14:50:55 UTC
Upstream has been silent on this for a while.  I sent them a reminder today:

http://lists.gnu.org/archive/html/coreutils/2018-11/msg00000.html

Comment 17 Kamil Dudka 2018-11-05 13:04:44 UTC
GNU coreutils upstream is not going to merge the proposed patch:

http://lists.gnu.org/archive/html/coreutils/2018-11/msg00005.html

They had suggested the following options to work around the issue:

1. If one wants simple line oriented connection use netcat or equivalent.

2. Use script(1) which is more suited to recording terminal sessions.

3. Adjust telnet to not use FIONBIO when in line mode.

4. Avoid the tee apparent hang by using the --output-error=exit option.

Comment 18 Kamil Dudka 2018-11-05 13:08:59 UTC
Created attachment 1501868 [details]
[PATCH v3] tee: handle EAGAIN returned from fwrite/fclose()

'tee' expected the output file descriptors to operate in blocking mode
but this assumption might be invalidated by other programs connected to
the same terminal, as in the following example:

$ telnet ... | tee log_file

telnet calls ioctl(stdin, FIONBIO, [1]), which causes the O_NONBLOCK
flag to be set on tee's output, which is connected to the same terminal.

This patch has zero impact unless EAGAIN returns from fwrite/fclose().
In that case 'tee' waits for the underlying file descriptor to become
writable again and then it writes the remaining data.

Comment 19 Facundo Jaspe 2018-11-05 15:41:58 UTC
Unfortunate, but understandable. Thanks a lot for your assistance Kamil.

One question though, what is being meant with "Adjust telnet to not use FIONBIO when in line mode"? I find no mention of this in telnet's manpage in order to provide the customer with a specific command line example of this usage of telnet

Is option 4 only meant to change the default behaviour from "apparent hang" to "exit"? I'm afraid that wouldn't count as a workaround to the customer's intent of visualizing the entirety of any given command output.

Comment 20 Kamil Dudka 2018-11-05 15:54:11 UTC
(In reply to Facundo Jaspe from comment #19)
> One question though, what is being meant with "Adjust telnet to not use
> FIONBIO when in line mode"? I find no mention of this in telnet's manpage in
> order to provide the customer with a specific command line example of this
> usage of telnet

There is no such option yet.  It would require code change in the telnet package as I understand it.  Please feel free to reassign this bug to telnet if this is the preferred option for the customer.

Comment 22 Kamil Dudka 2018-11-19 13:01:53 UTC
Per comment #17, there is no way to get this fixed in GNU coreutils upstream.  The only way to move this forward would be to apply the patch downstream, which I would like to avoid if possible.

I am switching this bug to telnet to evaluate the possibility of getting this fixed in the telnet package instead.  The solution would be to (optionally?) skip enabling of the non-blocking file descriptor mode when telnet operates in line mode.

Comment 29 RHEL Program Management 2020-02-11 08:44:39 UTC
Development Management has reviewed and declined this request. You may appeal this decision by using your Red Hat support channels, who will make certain  the issue receives the proper prioritization with product and development management.

https://www.redhat.com/support/process/production/#howto

Comment 30 Kamil Dudka 2023-03-09 07:26:39 UTC
A patch to handle non-blocking file descriptors in tee(1) landed in GNU coreutils upstream:
https://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v9.1-193-g16000805eb1