Bug 1295563

Summary: race condition in ksh spawnveg
Product: Red Hat Enterprise Linux 7
Reporter: Paulo Andrade <pandrade>
Component: ksh
Assignee: Siteshwar Vashisht <svashisht>
Status: CLOSED WONTFIX
QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: high
Docs Contact:
Priority: high
Version: 7.2
CC: bs168, dzhukous, fkrska, gpion, jhunt, jkejda, pandrade, rajasekar.m, rmetrich, rmullett, sbeal, svashisht, torbjorn.bjorklund
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1527400
Environment:
Last Closed: 2020-02-05 13:31:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1420851, 1527400

Description Paulo Andrade 2016-01-04 21:12:35 UTC
This can be reproduced easily in rhel7 with the sequence:

$ taskset 1 ksh
$ top

and most times, top will exit with the message:

"""

top: failed tty set: Interrupted system call
"""

With a minor patch, ksh can be built to use posix_spawn (see
src/lib/libast/comp/spawnveg.c), but that just breaks ksh
completely. It relies on nested calls to sigcritical().

Note that with 4 or more cpus this issue is "close to impossible"
to reproduce, and it could not be reproduced in rhel6, so the
race condition appears to be somewhat associated with the rhel7
kernel.

As a first step, I suggest building ksh with
-D_AST_no_spawnveg=1 in CFLAGS to correct the problem.

Comment 3 Siteshwar Vashisht 2016-06-14 09:29:43 UTC
tcsetattr() fails with EINTR if the calling process does not belong to the foreground process group of the terminal that the file descriptor refers to.

The "top: failed tty set: Interrupted system call" error is generated by the code below in top.c (procps-ng package):

   if (-1 == tcsetattr(STDIN_FILENO, TCSAFLUSH, &tmptty))
      error_exit(fmtmk(N_fmt(FAIL_tty_set_fmt), strerror(errno)));


I have verified that when tcsetattr() fails with EINTR, the return value of tcgetpgrp(STDIN_FILENO) differs from the current process group (the return value of getpgrp()). If we explicitly set the foreground process group for the STDIN_FILENO file descriptor by adding the following code, the top command starts without any issues:

tcsetpgrp(STDIN_FILENO, getpgrp())

So it looks like a race condition in making the new process's group the foreground process group for the terminal.
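
The same comparison can also be made from a shell, without patching top. This is only a minimal sketch, assuming a procps-style ps that supports the pgid and tpgid output fields; the helper names are hypothetical and are not part of ksh or procps:

```shell
# pgid_and_tpgid PID: print the process group ID of PID followed by the
# foreground process group ID of its controlling terminal.
# ps reports tpgid as -1 when the process has no controlling terminal.
pgid_and_tpgid() {
    ps -o pgid= -o tpgid= -p "$1"
}

# same_group PGID TPGID: succeed only when both values are present and equal.
same_group() {
    [ "$#" -eq 2 ] && [ "$1" = "$2" ]
}

# is_foreground PID: succeed if PID belongs to the foreground process
# group of its terminal. The command substitution is left unquoted on
# purpose so that the two numbers become two separate arguments.
is_foreground() {
    same_group $(pgid_and_tpgid "$1")
}

# Usage (not run here):
#   is_foreground "$pid" || echo "process $pid is not in the foreground group"
```

A process for which is_foreground fails is in the state described above: its own process group and the terminal's foreground process group differ.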

Comment 12 Siteshwar Vashisht 2016-08-02 08:41:09 UTC
Earlier upstream discussion: http://www.mail-archive.com/ast-developers@research.att.com/msg00718.html

Comment 19 Paulo Andrade 2017-06-26 13:12:10 UTC
Reopening because another user is hitting a similar issue with the
race condition when running "sudo su -" with ksh as the user shell
and bash as the root shell. Most times the result is a stopped bash shell.
The user is using a test package built with -D_AST_no_spawnveg=1, which
corrects the problem.

Comment 20 rajasekar 2017-08-04 19:46:53 UTC
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
$ top

top: failed tty set: Interrupted system call
===========
I am facing this issue on RHEL 7.3. Please let me know the fix.

Comment 21 Siteshwar Vashisht 2017-08-04 19:58:20 UTC
Currently the fix requires recompiling ksh with -D_AST_no_spawnveg=1 macro.

Comment 22 rajasekar 2017-08-04 20:30:43 UTC
(In reply to Siteshwar Vashisht from comment #21)
> Currently the fix requires recompiling ksh with -D_AST_no_spawnveg=1 macro.

=============
Please provide detailed steps to fix this issue.

Comment 23 Paulo Andrade 2017-08-07 12:26:30 UTC
  You need to rebuild the ksh rpm.

  After unpacking the .src.rpm, edit the ksh.spec file to change the line:

export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT"

to

export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT -D_AST_no_spawnveg=1"

and run rpmbuild.
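
The spec edit above can also be scripted. The following is a sketch only, assuming the CCFLAGS line in ksh.spec matches the one quoted above exactly; the function name add_no_spawnveg and the paths in the usage comment are illustrative:

```shell
# add_no_spawnveg SPEC: append -D_AST_no_spawnveg=1 to the CCFLAGS export
# line of SPEC, unless the flag is already present. The file is edited in
# place and the previous contents are kept in SPEC.orig.
add_no_spawnveg() {
    spec=$1
    # Nothing to do if the flag is already there.
    if grep -q -e '-D_AST_no_spawnveg=1' "$spec"; then
        return 0
    fi
    cp "$spec" "$spec.orig" &&
    sed 's/^\(export CCFLAGS=".*-DSHOPT_AUDIT\)"$/\1 -D_AST_no_spawnveg=1"/' \
        "$spec.orig" > "$spec"
}

# Usage (not run here), assuming the default ~/rpmbuild layout:
#   add_no_spawnveg ~/rpmbuild/SPECS/ksh.spec
#   rpmbuild -ba ~/rpmbuild/SPECS/ksh.spec
```

Note that rpmbuild does not build anything unless it is given a build option such as -ba and the path to the spec file. By default the binary package ends up under ~/rpmbuild/RPMS and still has to be installed before the ksh in use changes.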

Comment 24 rajasekar 2017-08-07 17:41:55 UTC
(In reply to Paulo Andrade from comment #23)
>   You need to rebuild the ksh rpm.
> 
>   After unpacking the .src.rpm, edit the ksh.spec file to change the line:
> 
> export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT"
> 
> to
> 
> export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT
> -D_AST_no_spawnveg=1"
> 
> and run rpmbuild.

=============================
Please send me the link to download the source rpm. The rpm currently installed on the server is:

$ rpm -qa | grep -i ksh
ksh-20120801-34.el7.x86_64

Comment 25 Paulo Andrade 2017-08-07 18:04:16 UTC
Please try this command:

$ yumdownloader --source ksh-20120801-34.el7.x86_64

Comment 26 rajasekar 2017-08-07 20:22:00 UTC
I downloaded the source rpm, unpacked it, updated ksh.spec under /root/rpmbuild/SPECS, and ran rpmbuild. The issue still persists when running the top command as a normal user.
[root@SPECS]# pwd
/root/rpmbuild/SPECS
[root@SPECS]# rpmbuild
[root@SPECS]# ls
ksh.spec
[root@SPECS]# grep export ksh.spec
export CCFLAGS="$RPM_OPT_FLAGS -fno-strict-aliasing $XTRAFLAGS -DSHOPT_AUDIT -D_AST_no_spawnveg=1"
export CC=gcc
export SHELL=$(ls $(pwd)/arch/*/bin/ksh)
- exporting fixed with variable corrupted its data (#1192026)
[root@SPECS]#
===================
$ top

top: failed tty set: Interrupted system call

Comment 27 Paulo Andrade 2017-08-07 21:00:55 UTC
  Please provide some information on how you rebuilt
the package and how you installed it.

  I suggest adding a tag to the rpm "Release", which
automatically makes the package newer and can be bumped
on each rebuild. Otherwise, it is easy to get confused.
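
One way to follow the "Release" suggestion is to append a suffix to the Release line of the spec before each rebuild. This is a sketch, assuming the spec has a single line starting with "Release:"; the function name tag_release and the tag value are hypothetical:

```shell
# tag_release SPEC TAG: append ".TAG" to the Release line of SPEC so that
# the rebuilt package can be told apart from the stock one and should
# compare as newer. The previous contents are kept in SPEC.bak.
# TAG must not contain "/" or "&", which are special to sed.
tag_release() {
    spec=$1
    tag=$2
    cp "$spec" "$spec.bak" &&
    sed "s/^Release:.*/&.$tag/" "$spec.bak" > "$spec"
}

# Usage (not run here):
#   tag_release ~/rpmbuild/SPECS/ksh.spec nospawnveg1
#   rpm -q ksh    # after installing, the tag is visible in the release
```

With a distinct tag per rebuild, rpm -q ksh shows at a glance whether the rebuilt package or the stock one is installed.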

Comment 28 rajasekar 2017-08-07 21:13:03 UTC
I just downloaded the rpm from the Red Hat site and ran this command to find the path:

#rpm -vv -Uvh ksh-20120801-34.el7.src.rpm
I found the file under /root/rpmbuild/SPECS/ksh.spec and edited it as per your advice.
Then I ran the rpmbuild command:
#rpmbuild 

==========
Please correct me if I am missing anything here.

Comment 29 Paulo Andrade 2017-08-16 12:40:40 UTC
Hi Rajasekar,

Please comment in the related customer portal case.

Comment 30 Torbjörn Björklund 2017-09-29 12:28:00 UTC
This just turned out to be a problem for us as well. We usually don't have systems with only one cpu but recently installed a bunch of training systems for a WCS/WMS system and didn't need more cpu resources.

Most of us use bash so we hadn't noticed, but the supplier of the software uses ksh and ran into this problem. I've asked if using bash is possible for him, otherwise I guess just raising the cpu count will have to work.

Comment 38 Siteshwar Vashisht 2018-05-07 17:06:06 UTC
For the record, this issue is being discussed on GitHub[1] and I have already switched to fork()/exec() upstream[2]. If anyone on this bug would like to try out the latest development builds, they are available in copr[3].

[1] https://github.com/att/ast/issues/468
[2] https://github.com/att/ast/pull/470
[3] https://copr.fedorainfracloud.org/coprs/g/ksh/latest/