Bug 1218424
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | infinite loop, at 100% cpu in ssh if ^Z is pressed at password prompt | | |
| Product | Red Hat Enterprise Linux 6 | Reporter | Paulo Andrade <pandrade> |
| Component | openssh | Assignee | Jakub Jelen <jjelen> |
| Status | CLOSED ERRATA | QA Contact | Stefan Dordevic <sdordevi> |
| Severity | low | Docs Contact | |
| Priority | unspecified | | |
| Version | 6.6 | CC | dtucker, jjelen, nmavrogi, pandrade, sdordevi, svashisht, szidek |
| Target Milestone | rc | | |
| Target Release | --- | | |
| Hardware | All | | |
| OS | All | | |
| Whiteboard | | | |
| Fixed In Version | openssh-5.3p1-120.el6 | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | | |
| | 1402424 | Environment | |
| Last Closed | 2017-03-21 10:01:26 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 1172231, 1269194, 1402424 | | |

Description
Paulo Andrade
2015-05-04 20:53:23 UTC
Some of the comments after the steps to reproduce are actually incorrect. The condition needed to reproduce the bug is that the global variable posixly_correct is set to a nonzero value. By default, posixly_correct is only set if argv[0] is either "sh" or "-sh".

The comment about setting POSIXLY_CORRECT in the ssh environment is incorrect. That was likely one of the runs where it happened to work: even in an environment where the problem occurs, it sometimes works, likely due to the order of signals, which may keep it from starting the infinite loop. The loop is triggered by attempting to write to the tty and, after receiving the signal, raising it again.

This problem also happens on RHEL 7, but is hard to reproduce there (roughly 1 in 5+ tries). On RHEL 6.6 it happens every time.

While testing the posixly_correct code path, I reduced it to the point where only this is needed to prevent the problem on RHEL 6.6:

$ . /etc/profile.d/lang.sh

One thing that should be useful to note: when ssh does not enter the infinite loop, ^C simply cancels/kills sftp. When the problem happens, ^C does not cancel/kill the sftp command, regardless of whether ^Z was pressed and then "fg" was run, or ^C was pressed at the first prompt.

So the problem may actually be in readline, and maybe (still not verified) the "solution" would be to not use/initialize readline when in posix mode.

This corrects the problem, but should be taken more as a hint of where to look for the real fix:

---8<---
diff -up bash-4.1/bashline.c.orig bash-4.1/bashline.c
--- bash-4.1/bashline.c.orig 2015-05-18 15:22:24.420999898 -0300
+++ bash-4.1/bashline.c 2015-05-18 15:32:29.271000816 -0300
@@ -370,6 +370,8 @@ initialize_readline ()
     return;
   rl_terminal_name = get_string_value ("TERM");
+  if (!rl_terminal_name)
+    rl_terminal_name = "vt100";
   rl_instream = stdin;
   rl_outstream = stderr;
---8<---

The readline fallback is "dumb", but that apparently does not create enough defaults for terminal handling.
With the vt100 default, ^C kills the sftp password prompt, and ssh does not go to 100% cpu if ^Z is pressed at the password prompt.

I am able to reproduce this issue. This is what the backtrace of the ssh process looks like when the issue happens:

(gdb) bt
#0 0x00007fd3fa228048 in tcsetattr (fd=4, optional_actions=<value optimized out>, termios_p=0x7ffc61354560) at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
#1 0x00007fd3fc66ffed in readpassphrase (prompt=0x7ffc61354ea0 "temp.36.221's password: ", buf=0x7ffc61354a50 "", bufsiz=<value optimized out>, flags=2) at readpassphrase.c:143
#2 0x00007fd3fc657f4c in read_passphrase (prompt=0x7ffc61354ea0 "temp.36.221's password: ", flags=0) at readpass.c:153
#3 0x00007fd3fc63b360 in userauth_passwd (authctxt=0x7ffc61355010) at sshconnect2.c:967
#4 0x00007fd3fc63c62d in userauth (authctxt=0x7ffc61355010, authlist=0x7fd3fd79ac00 "publickey,gssapi-keyex,gssapi-with-mic,password") at sshconnect2.c:468
#5 0x00007fd3fc65fc73 in dispatch_run (mode=0, done=0x7ffc61355038, ctxt=0x7ffc61355010) at dispatch.c:98
#6 0x00007fd3fc63d6fd in ssh_userauth2 (local_user=0x7fd3fd791c60 "root", server_user=0x7fd3fd77c752 "temp", host=0x7fd3fd796510 "172.16.36.221", sensitive=0x7fd3fc8923a0) at sshconnect2.c:432
#7 0x00007fd3fc6386fc in ssh_login (sensitive=0x7fd3fc8923a0, orighost=<value optimized out>, hostaddr=0x7fd3fc8923c0, pw=<value optimized out>, timeout_ms=-1000) at sshconnect.c:1138
#8 0x00007fd3fc62e7de in main (ac=<value optimized out>, av=<value optimized out>) at ssh.c:904

The ssh process is trying to set terminal attributes for fd=4, which refers to "/dev/tty":

(gdb) frame 1
#1 0x00007fd3fc66ffed in readpassphrase (prompt=0x7ffc61354ea0 "temp.36.221's password: ", buf=0x7ffc61354a50 "", bufsiz=<value optimized out>, flags=2) at readpassphrase.c:143
143 while (tcsetattr(input, _T_FLUSH, &oterm) == -1 &&
(gdb) p input
$1 = 4

However, since the ssh process is in the background, it keeps receiving SIGTTOU (background processes cannot set terminal attributes) and tcsetattr() returns with errno = EINTR. It is stuck in the loop below:

143 while (tcsetattr(input, _T_FLUSH, &oterm) == -1 &&
144 errno == EINTR)
145 continue;

The ssh process is back to normal cpu usage when it is brought to the foreground.

This issue is not specific to bash and happens with ksh too. I would like somebody from the openssh team to look at whether this could be considered a bug in openssh.

Thank you for the detailed analysis. I see the same code in openssh upstream, so it should be applicable to RHEL 7 and Fedora. I am not sure how to resolve such a problem correctly, though, especially when it is such a corner case. Also, the impact for the customer does not look very critical.

The behavior on the bash side is most likely intended. The OpenSSH part is in the OpenBSD (compat) code, and I am not sure how likely it is to change. I will check tomorrow what we can do.

Sorry for the late reply. I posted the bug upstream. The idea is probably to check whether the SIGTTOU signal was caught and, if so, stop cycling. Something like this should make it work:

--- openssh-7.3p1/openbsd-compat/readpassphrase.c.patch 2016-09-27 11:36:46.801980295 +0200
+++ openssh-7.3p1/openbsd-compat/readpassphrase.c 2016-09-27 11:38:11.161970239 +0200
@@ -157,7 +157,7 @@ restart:
     /* Restore old terminal settings and signals. */
     if (memcmp(&term, &oterm, sizeof(term)) != 0) {
         while (tcsetattr(input, _T_FLUSH, &oterm) == -1 &&
-            errno == EINTR)
+            errno == EINTR && signo[SIGTTOU] != 1)
             continue;
     }
     (void)sigaction(SIGALRM, &savealrm, NULL);

As soon as we have an upstream opinion on this bug, we can consider backporting to RHEL 6.

This should be fixed by the upstream change in https://bugzilla.mindrot.org/show_bug.cgi?id=2619 (https://anongit.mindrot.org/openssh.git/commit/?id=12069e56221de207ed666c2449dedb431a2a7ca2)

Thank you Darren for looking into that. I was certainly searching for differences against the OpenBSD sources, but I probably didn't find a recent OpenBSD repository.
Do you have a link to a CVS or HTTP version of it?

All of the files that we (try to) keep in sync have a marker like this denoting the upstream file:

/* OPENBSD ORIGINAL: lib/libc/gen/readpassphrase.c */

In this case the upstream file is here: http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/gen/readpassphrase.c

Any one of the anonymous CVS servers from https://www.openbsd.org/anoncvs.html will also have it.

Thanks Darren once more :)

Good catch Paulo. It seems we are now hitting another race condition. In some cases, after returning to the foreground, we do not get the prompt back from sftp, because the process is blocked in the kill() call:

#0 0x00007f7b1ce1d8c7 in kill () from /lib64/libc.so.6
#1 0x00007f7b1f3130c5 in readpassphrase (prompt=0x7ffce5c7c1a0 "test@localhost's password: ", buf=0x7ffce5c7bd50 "", bufsiz=<value optimized out>, flags=2) at readpassphrase.c:182
#2 0x00007f7b1f2faf0c in read_passphrase (prompt=0x7ffce5c7c1a0 "test@localhost's password: ",

This is related to this part of the (Linux) man 3p kill page:

> If the value of pid causes sig to be generated for the sending process,
> and if sig is not blocked for the calling thread and if no other thread has
> sig unblocked or is waiting in a sigwait() function for sig,
> either sig or at least one pending unblocked signal shall be delivered to
> the sending thread **before kill() returns**.

This happens on RHEL 6, and probably on all Linux and POSIX systems, but not on OpenBSD [1], if I read it right. This might need some tweaking for the portable version. I will try to investigate tomorrow what we can do about it.

[1] http://man.openbsd.org/kill.2

I'll see if I can reproduce that, but it won't be for a day or two. Can you get the signal kill is trying to send out for frame 0?

(In reply to Darren Tucker from comment #21)
> I'll see if I can reproduce that, but it won't be for a day or two. Can you
> get the signal kill is trying to send out for frame 0?

In the previous build it was optimized out. In a non-optimized build it points to signal 22:

(gdb) f 1
#1 0x00007fdd560accac in readpassphrase (prompt=0x7fff5a867ef0 "test@localhost's password: ", buf=0x7fff5a867aa0 "", bufsiz=1024, flags=2) at readpassphrase.c:182
182 kill(getpid(), i);
(gdb) p i
$1 = 22

So it is the SIGTTOU we discussed:

/usr/include/bits/signum.h
#define SIGTTOU 22 /* Background write to tty (POSIX). */

FYI: I just added a patch to the upstream bug that I think will resolve this.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0641.html
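
For reference, the guarded-retry idea from the readpassphrase.c change proposed earlier in this bug can be sketched in isolation. This is only a minimal sketch under stated assumptions, not the upstream fix: got_sigttou, on_sigttou, fake_tcsetattr, retry_unless_ttou and demo_calls are hypothetical names, and fake_tcsetattr merely simulates a background tcsetattr() by raising SIGTTOU and failing with EINTR, so no real tty is involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag; the real readpassphrase.c records signals in signo[]. */
static volatile sig_atomic_t got_sigttou;

/* Signal handler: only remember that SIGTTOU arrived. */
static void on_sigttou(int sig)
{
    (void)sig;
    got_sigttou = 1;
}

/* Simulation (assumption): stands in for tcsetattr() called from a
 * background process. It raises SIGTTOU and fails with EINTR every time. */
static int fake_tcsetattr(void *calls)
{
    ++*(int *)calls;
    raise(SIGTTOU);
    errno = EINTR;
    return -1;
}

/* Retry op on EINTR, but stop retrying once SIGTTOU has been caught. */
static int retry_unless_ttou(int (*op)(void *), void *arg)
{
    int r;

    assert(op != NULL);
    while ((r = op(arg)) == -1 && errno == EINTR && !got_sigttou)
        continue;
    return r;
}

/* Returns how many times the simulated call ran before the loop ended. */
static int demo_calls(void)
{
    struct sigaction sa, old;
    int calls = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigttou;
    sigemptyset(&sa.sa_mask);
    got_sigttou = 0;
    sigaction(SIGTTOU, &sa, &old);
    (void)retry_unless_ttou(fake_tcsetattr, &calls);
    sigaction(SIGTTOU, &old, NULL);   /* restore the previous disposition */
    return calls;
}
```

With the `&& !got_sigttou` condition removed, the same simulated call would never leave the loop, which corresponds to the 100% cpu symptom reported here.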