Bug 1218424

Summary:	infinite loop, at 100% cpu in ssh if ^Z is pressed at password prompt
Product:	Red Hat Enterprise Linux 6	Reporter:	Paulo Andrade <pandrade>
Component:	openssh	Assignee:	Jakub Jelen <jjelen>
Status:	CLOSED ERRATA	QA Contact:	Stefan Dordevic <sdordevi>
Severity:	low	Docs Contact:
Priority:	unspecified
Version:	6.6	CC:	dtucker, jjelen, nmavrogi, pandrade, sdordevi, svashisht, szidek
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	All
Whiteboard:
Fixed In Version:	openssh-5.3p1-120.el6	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1402424		Environment:
Last Closed:	2017-03-21 10:01:26 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1172231, 1269194, 1402424

Description Paulo Andrade 2015-05-04 20:53:23 UTC

Steps to reproduce:

1. Either change an existing login or create a test login
   with /bin/sh as its login shell.
2. Run "ssh user@localhost" and log in.
3. Run "sftp user@localhost" and press ^Z at the password
   prompt.

Sometimes it works as expected. It depends a bit on which
code is being executed in the readpassphrase function
(openbsd-compat/readpassphrase.c in the openssh sources).

It has been verified that re-exec'ing /bin/sh with
--posix before running sftp, or exporting the environment
variable POSIXLY_CORRECT before the "ssh user@localhost"
step, prevents the problem.

So, while the problem may be more or less expected, it
is being reported in case the behavior is not intended.

Comment 2 Paulo Andrade 2015-05-05 20:37:09 UTC

Some of the comments after the steps to reproduce are
actually incorrect.

The condition for reproducing the bug is actually that
the bash global variable posixly_correct is set to a
nonzero value.

posixly_correct is only set, by default, if
argv[0] is either "sh" or "-sh".

The comment about setting POSIXLY_CORRECT in the ssh
environment is incorrect. That was likely just one of the
runs where it happened to work: even in the affected
environment it sometimes works, likely due to the order of
signals, which can keep it from entering the infinite loop.
The loop is triggered by attempting to write to the tty,
receiving the signal, and raising it again.

Comment 3 Paulo Andrade 2015-05-15 21:28:13 UTC

This problem also happens on RHEL 7, but is hard to
reproduce there (roughly 1 in 5+ tries). On RHEL 6.6
it happens every time.

While testing the posixly_correct code path, I reduced
it so that, on RHEL 6.6, only this is needed to prevent
the problem:

$ . /etc/profile.d/lang.sh

It is also worth noting that when ssh does not enter the
infinite loop, ^C simply cancels/kills sftp. When the
problem happens, ^C does not cancel/kill the sftp command,
regardless of whether ^Z was pressed followed by "fg", or
^C was pressed at the first prompt.

So the problem may actually be in readline, and maybe
(not yet verified) the "solution" would be to not
use/initialize readline when in posix mode.

Comment 4 Paulo Andrade 2015-05-18 19:38:04 UTC

This corrects the problem, but it is meant more as
a hint of where to look for a proper fix:
---8<---
diff -up bash-4.1/bashline.c.orig bash-4.1/bashline.c
--- bash-4.1/bashline.c.orig	2015-05-18 15:22:24.420999898 -0300
+++ bash-4.1/bashline.c	2015-05-18 15:32:29.271000816 -0300
@@ -370,6 +370,8 @@ initialize_readline ()
     return;
 
   rl_terminal_name = get_string_value ("TERM");
+  if (!rl_terminal_name)
+    rl_terminal_name = "vt100";
   rl_instream = stdin;
   rl_outstream = stderr;
 
---8<---
The readline fallback terminal type is "dumb", but that
apparently does not set up enough defaults for terminal
handling. With the vt100 default, ^C kills the sftp
password prompt, and ssh does not go to 100% CPU if ^Z
is pressed at the password prompt.

Comment 9 Siteshwar Vashisht 2016-07-21 10:19:49 UTC

I am able to reproduce this issue. This is what the backtrace of the ssh process looks like when the issue happens:

(gdb) bt
#0  0x00007fd3fa228048 in tcsetattr (fd=4, optional_actions=<value optimized out>, termios_p=0x7ffc61354560) at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
#1  0x00007fd3fc66ffed in readpassphrase (prompt=0x7ffc61354ea0 "temp.36.221's password: ", buf=0x7ffc61354a50 "", bufsiz=<value optimized out>, flags=2) at readpassphrase.c:143
#2  0x00007fd3fc657f4c in read_passphrase (prompt=0x7ffc61354ea0 "temp.36.221's password: ", flags=0) at readpass.c:153
#3  0x00007fd3fc63b360 in userauth_passwd (authctxt=0x7ffc61355010) at sshconnect2.c:967
#4  0x00007fd3fc63c62d in userauth (authctxt=0x7ffc61355010, authlist=0x7fd3fd79ac00 "publickey,gssapi-keyex,gssapi-with-mic,password") at sshconnect2.c:468
#5  0x00007fd3fc65fc73 in dispatch_run (mode=0, done=0x7ffc61355038, ctxt=0x7ffc61355010) at dispatch.c:98
#6  0x00007fd3fc63d6fd in ssh_userauth2 (local_user=0x7fd3fd791c60 "root", server_user=0x7fd3fd77c752 "temp", host=0x7fd3fd796510 "172.16.36.221", sensitive=0x7fd3fc8923a0) at sshconnect2.c:432
#7  0x00007fd3fc6386fc in ssh_login (sensitive=0x7fd3fc8923a0, orighost=<value optimized out>, hostaddr=0x7fd3fc8923c0, pw=<value optimized out>, timeout_ms=-1000) at sshconnect.c:1138
#8  0x00007fd3fc62e7de in main (ac=<value optimized out>, av=<value optimized out>) at ssh.c:904


The ssh process is trying to set terminal attributes on fd=4, which refers to "/dev/tty":
(gdb) frame 1
#1  0x00007fd3fc66ffed in readpassphrase (prompt=0x7ffc61354ea0 "temp.36.221's password: ", buf=0x7ffc61354a50 "", bufsiz=<value optimized out>, flags=2) at readpassphrase.c:143
143                     while (tcsetattr(input, _T_FLUSH, &oterm) == -1 &&
(gdb) p input
$1 = 4

However, since the ssh process is in the background, it keeps receiving SIGTTOU (background processes cannot set terminal attributes), and tcsetattr() returns -1 with errno = EINTR. It is stuck in the loop below:

143                     while (tcsetattr(input, _T_FLUSH, &oterm) == -1 &&
144                         errno == EINTR)
145                             continue;


The ssh process returns to normal CPU usage when it is brought to the foreground. This issue is not specific to bash; it happens with ksh too. I would like somebody from the openssh team to look at whether it could be considered a bug in openssh.

Comment 10 Jakub Jelen 2016-07-21 15:13:27 UTC

Thank you for the detailed analysis. I see the same code in upstream openssh, so it should apply to RHEL 7 and Fedora as well. I am not sure how to resolve such a problem correctly, especially since it is such a corner case. Also, the impact on customers does not look very critical.

The behavior on the bash side is most likely intended. The OpenSSH part is in the OpenBSD compat code, and I am not sure how likely it is to change. I will check tomorrow what we can do.

Comment 12 Jakub Jelen 2016-09-27 11:46:11 UTC

Sorry for the late reply. I posted the bug upstream. The idea is probably to check whether the SIGTTOU signal was caught and, if so, stop looping. Something like this should make it work:

--- openssh-7.3p1/openbsd-compat/readpassphrase.c.patch	2016-09-27 11:36:46.801980295 +0200
+++ openssh-7.3p1/openbsd-compat/readpassphrase.c	2016-09-27 11:38:11.161970239 +0200
@@ -157,7 +157,7 @@ restart:
 	/* Restore old terminal settings and signals. */
 	if (memcmp(&term, &oterm, sizeof(term)) != 0) {
 		while (tcsetattr(input, _T_FLUSH, &oterm) == -1 &&
-		    errno == EINTR)
+		    errno == EINTR && signo[SIGTTOU] != 1)
 			continue;
 	}
 	(void)sigaction(SIGALRM, &savealrm, NULL);

As soon as we have an upstream opinion on this bug, we can consider backporting the fix to RHEL 6.

Comment 13 Darren Tucker 2016-10-12 17:19:55 UTC

This should be fixed by the upstream change in https://bugzilla.mindrot.org/show_bug.cgi?id=2619 (https://anongit.mindrot.org/openssh.git/commit/?id=12069e56221de207ed666c2449dedb431a2a7ca2)

Comment 14 Jakub Jelen 2016-10-13 06:17:58 UTC

Thank you Darren for looking into that.
I was certainly looking for differences against the OpenBSD sources, but I probably didn't find a recent OpenBSD repository. Do you have a link to a CVS or HTTP version of it?

Comment 18 Darren Tucker 2016-10-13 13:25:24 UTC

All of the files that we (try to) keep in sync have a marker like this denoting the upstream file:

/* OPENBSD ORIGINAL: lib/libc/gen/readpassphrase.c */

In this case the upstream file is here:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/gen/readpassphrase.c

Any one of the anonymous CVS servers from https://www.openbsd.org/anoncvs.html will also have it.

Comment 20 Jakub Jelen 2016-10-13 14:45:11 UTC

Thanks Darren once more :)

Good catch Paulo.

It seems we are now hitting another race condition. In some cases, after returning to the foreground, we do not get the prompt back from sftp, because the process is blocked in the kill() call:

#0  0x00007f7b1ce1d8c7 in kill () from /lib64/libc.so.6
#1  0x00007f7b1f3130c5 in readpassphrase (prompt=0x7ffce5c7c1a0 "test@localhost's password: ", 
    buf=0x7ffce5c7bd50 "", bufsiz=<value optimized out>, flags=2) at readpassphrase.c:182
#2  0x00007f7b1f2faf0c in read_passphrase (prompt=0x7ffce5c7c1a0 "test@localhost's password: ", 

This is related to this part of the (Linux) man 3p kill page:

> If the value of pid causes sig to be generated for the sending process,
> and if sig is not blocked for the calling thread and if no other thread has
> sig unblocked or is waiting  in  a sigwait() function for sig,
> either sig or at least one pending unblocked signal shall be delivered to
> the sending thread **before kill() returns**.

This happens on RHEL 6, and it will probably happen on all Linux and POSIX systems, but not on OpenBSD [1], if I read it correctly. This might need some tweaking in the portable version. I will try to investigate further tomorrow what we can do about it.

[1] http://man.openbsd.org/kill.2

Comment 21 Darren Tucker 2016-10-13 14:55:36 UTC

I'll see if I can reproduce that, but it won't be for a day or two. Can you get the signal kill is trying to send out for frame 0?

Comment 22 Jakub Jelen 2016-10-13 15:32:52 UTC

(In reply to Darren Tucker from comment #21)
> I'll see if I can reproduce that, but it won't be for a day or two. Can you
> get the signal kill is trying to send out for frame 0?

In the previous build it was optimized out. In a non-optimized build it points to signal 22:

(gdb) f 1
#1  0x00007fdd560accac in readpassphrase (prompt=0x7fff5a867ef0 "test@localhost's password: ", 
    buf=0x7fff5a867aa0 "", bufsiz=1024, flags=2) at readpassphrase.c:182
182				kill(getpid(), i);

(gdb) p i
$1 = 22

Therefore it is the SIGTTOU we discussed:

/usr/include/bits/signum.h
#define	SIGTTOU		22	/* Background write to tty (POSIX).  */

Comment 23 Darren Tucker 2016-10-14 16:27:12 UTC

FYI: I just added a patch to the upstream bug that I think will resolve this.

Comment 30 errata-xmlrpc 2017-03-21 10:01:26 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0641.html