Bug 1295563 - race condition in ksh spawnveg
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ksh
Version: 7.2
Hardware: All Linux
Priority: high Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Siteshwar Vashisht
QA Contact: BaseOS QE - Apps
Keywords: Reopened
Depends On:
Blocks: 1420851 1527400
Reported: 2016-01-04 16:12 EST by Paulo Andrade
Modified: 2018-05-07 13:06 EDT

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1527400
Environment:
Last Closed: 2018-03-13 05:04:45 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Paulo Andrade 2016-01-04 16:12:35 EST
This can be reproduced easily in rhel7 with the sequence:

$ taskset 1 ksh
$ top

and most times, top will exit with the message:

"""

top: failed tty set: Interrupted system call
"""

ksh can be built to use posix_spawn with a minor patch (see
src/lib/libast/comp/spawnveg.c), but that just breaks ksh
completely. It relies on nested calls to sigcritical().

Note that on 4 CPUs or more, this issue is "close to impossible"
to reproduce, and it could not be reproduced in RHEL 6, so the
race condition appears to be somewhat associated with the RHEL 7
kernel.

As a first step, I suggest building ksh with
-D_AST_no_spawnveg=1 in CFLAGS to correct the problem.
Comment 3 Siteshwar Vashisht 2016-06-14 05:29:43 EDT
tcsetattr() fails with EINTR if the calling process does not belong to the foreground process group of the terminal referred to by the file descriptor.


The "top: failed tty set: Interrupted system call" error is generated by the following code in top.c (procps-ng package):

   if (-1 == tcsetattr(STDIN_FILENO, TCSAFLUSH, &tmptty))
      error_exit(fmtmk(N_fmt(FAIL_tty_set_fmt), strerror(errno)));


I have verified that when tcsetattr() fails with EINTR, the return value of tcgetpgrp(STDIN_FILENO) differs from the current process group (the return value of getpgrp()). If we explicitly set the foreground process group for the STDIN_FILENO file descriptor by adding the following code, top starts without any issues:

tcsetpgrp(STDIN_FILENO, getpgrp());

So it looks like a race condition in making the new process's group the foreground process group for the terminal.
Comment 12 Siteshwar Vashisht 2016-08-02 04:41:09 EDT
Earlier upstream discussion: http://www.mail-archive.com/ast-developers@research.att.com/msg00718.html
Comment 19 Paulo Andrade 2017-06-26 09:12:10 EDT
Reopening because another user is hitting a similar issue with the race
condition when running "sudo su -" with ksh as the user shell and bash
as the root shell, most times ending up with a stopped bash shell.
The user is now using a test package built with -D_AST_no_spawnveg=1, which
corrects the problem.
Comment 20 rajasekar 2017-08-04 15:46:53 EDT
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
$ top

top: failed tty set: Interrupted system call
===========
I am facing this issue on RHEL 7.3. Please let me know the fix.
Comment 21 Siteshwar Vashisht 2017-08-04 15:58:20 EDT
Currently the fix requires recompiling ksh with -D_AST_no_spawnveg=1 macro.
Comment 22 rajasekar 2017-08-04 16:30:43 EDT
(In reply to Siteshwar Vashisht from comment #21)
> Currently the fix requires recompiling ksh with -D_AST_no_spawnveg=1 macro.

=============
Please provide detailed steps to fix this issue.
Comment 23 Paulo Andrade 2017-08-07 08:26:30 EDT
  You need to rebuild the ksh rpm.

  After unpacking the .src.rpm, edit the ksh.spec file to change the line:

export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT"

to

export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT -D_AST_no_spawnveg=1"

and run rpmbuild.
Comment 24 rajasekar 2017-08-07 13:41:55 EDT
(In reply to Paulo Andrade from comment #23)
>   You need to rebuild the ksh rpm.
> 
>   After unpacking the .src.rpm, edit the ksh.spec file to change the line:
> 
> export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT"
> 
> to
> 
> export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT
> -D_AST_no_spawnveg=1"
> 
> and run rpmbuild.

=============================
Please send me the link to download the source rpm. The rpm currently installed on the server is:

$ rpm -qa | grep -i ksh
ksh-20120801-34.el7.x86_64
Comment 25 Paulo Andrade 2017-08-07 14:04:16 EDT
Please try this command:

$ yumdownloader --source ksh-20120801-34.el7.x86_64
Comment 26 rajasekar 2017-08-07 16:22:00 EDT
I downloaded the source rpm, unpacked it, updated ksh.spec under /root/rpmbuild/SPECS, and ran rpmbuild. The issue still persists when running the top command as a normal user.
[root@SPECS]# pwd
/root/rpmbuild/SPECS
[root@SPECS]# rpmbuild
[root@SPECS]# ls
ksh.spec
[root@SPECS]# grep export ksh.spec
export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT -D_AST_no_spawnveg=1"
export CC=gcc
export SHELL=$(ls $(pwd)/arch/*/bin/ksh)
- exporting fixed with variable corrupted its data (#1192026)
[root@SPECS]#
===================
$ top

top: failed tty set: Interrupted system call
Comment 27 Paulo Andrade 2017-08-07 17:00:55 EDT
  Please provide some information on how you rebuilt
the package and how you installed it.

  I suggest adding a tag to the rpm "Release" field,
which automatically makes the package newer and can be bumped
on each rebuild. Otherwise, it is easy to get confused.
Comment 28 rajasekar 2017-08-07 17:13:03 EDT
I downloaded the source rpm from the Red Hat site and ran this command to find where it unpacks:

#rpm -vv -Uvh ksh-20120801-34.el7.src.rpm
I found the spec file at /root/rpmbuild/SPECS/ksh.spec and edited it as per your advice.
Then I ran the rpmbuild command:
#rpmbuild 

==========
Please correct me if I am missing anything here.
Comment 29 Paulo Andrade 2017-08-16 08:40:40 EDT
Hi Rajasekar,

Please comment in the related customer portal case.
Comment 30 Torbjörn Björklund 2017-09-29 08:28:00 EDT
This just turned out to be a problem for us as well. We usually don't have systems with only one cpu but recently installed a bunch of training systems for a WCS/WMS system and didn't need more cpu resources.

Most of us use bash so we hadn't noticed, but the supplier of the software uses ksh and ran into this problem. I've asked if using bash is possible for him, otherwise I guess just raising the cpu count will have to work.
Comment 38 Siteshwar Vashisht 2018-05-07 13:06:06 EDT
For the record, this issue is being discussed on GitHub[1], and I have already switched to fork()/exec() upstream[2]. If anyone on this bug would like to try out the latest development builds, they are available in copr[3].

[1] https://github.com/att/ast/issues/468
[2] https://github.com/att/ast/pull/470
[3] https://copr.fedorainfracloud.org/coprs/g/ksh/latest/
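
For readers unfamiliar with why fork()/exec() helps here, the following is a minimal sketch of the classic job-control launch pattern, in which both the parent and the child set the process group and the terminal's foreground group before the new program starts. It is not the upstream change from [2] and not ksh code: run_in_foreground() is a hypothetical name, and the assumption is that the race comes from the child starting to run before its group has been made the foreground group, as suggested in comment 3.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] as a foreground job on the terminal
 * tty_fd and return its exit status, or -1 on error or abnormal exit.
 * If tty_fd is not a terminal, the terminal handoff is skipped.
 */
int run_in_foreground(char *const argv[], int tty_fd)
{
    int have_tty = tty_fd >= 0 && isatty(tty_fd);
    int status = 0, wait_failed = 0;
    void (*old_ttou)(int);
    pid_t pid;

    /* tcsetpgrp() from a background group raises SIGTTOU unless the
     * signal is ignored or blocked, so ignore it during the handoff. */
    old_ttou = signal(SIGTTOU, SIG_IGN);

    pid = fork();
    if (pid < 0) {
        signal(SIGTTOU, old_ttou);
        return -1;
    }

    if (pid == 0) {
        /* Child: create its own group and claim the terminal BEFORE
         * exec, so the new program never starts in the background. */
        setpgid(0, 0);
        if (have_tty)
            tcsetpgrp(tty_fd, getpgrp());
        signal(SIGTTOU, old_ttou);  /* ignored signals survive exec */
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }

    /* Parent: repeat both calls. Whichever side runs first wins and the
     * other call becomes redundant. setpgid() may fail with EACCES once
     * the child has exec'd; by then the child has already done it. */
    setpgid(pid, pid);
    if (have_tty)
        tcsetpgrp(tty_fd, pid);

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            wait_failed = 1;
            break;
        }
    }

    /* Take the terminal back for the caller's own process group. */
    if (have_tty)
        tcsetpgrp(tty_fd, getpgrp());
    signal(SIGTTOU, old_ttou);

    if (wait_failed || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The point of the design is that the child performs the terminal handoff itself before exec, which a spawn-style interface that goes straight to exec does not allow. Error handling for setpgid() and tcsetpgrp() is deliberately omitted to keep the sketch short.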
