1737552 – Async unsafe call in batch mode can cause top to crash

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 1737552 - Async unsafe call in batch mode can cause top to crash

Summary: Async unsafe call in batch mode can cause top to crash

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	procps-ng
Sub Component:
Version:	7.5
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jan Rybar
QA Contact:	Karel Volný
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1739370 1739371 1739372
TreeView+	depends on / blocked

Reported:	2019-08-05 15:55 UTC by Paulo Andrade
Modified:	2023-09-07 20:21 UTC (History)
CC List:	4 users (show)
Fixed In Version:	procps-ng-3.3.10-27.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	1739370 1739371 1739372 (view as bug list)
Environment:
Last Closed:	2020-03-31 19:38:15 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	4343051	0	None	None	None	2019-09-15 11:36:23 UTC
Red Hat Product Errata	RHBA-2020:1018	0	None	None	None	2020-03-31 19:38:22 UTC

Description Paulo Andrade 2019-08-05 15:55:24 UTC

Backtrace looks like this:

#0  0x00007f71cdd6d8f1 in __GI__IO_putc (c=10, fp=0x7f71ce0bd400 <_IO_2_1_stdout_>) at putc.c:31
#1  0x00007f71ce2dd30a in tputs () from /lib64/libtinfo.so.5
#2  0x00000000004055c2 in bye_bye (str=str@entry=0x0) at top.c:563
#3  0x0000000000406b3f in sig_endpgm (dont_care_sig=<optimized out>) at top.c:625
#4  <signal handler called>
#5  0x00007f71cddef087 in munmap () at ../sysdeps/unix/syscall-template.S:81
#6  0x00007f71cdd72b62 in __GI__IO_setb (f=f@entry=0x7f71ce0bd400 <_IO_2_1_stdout_>, b=b@entry=0x0, eb=eb@entry=0x0, a=a@entry=0) at genops.c:402
#7  0x00007f71cdd70f40 in _IO_new_file_close_it (fp=fp@entry=0x7f71ce0bd400 <_IO_2_1_stdout_>) at fileops.c:194
#8  0x00007f71cdd64018 in _IO_new_fclose (fp=0x7f71ce0bd400 <_IO_2_1_stdout_>) at iofclose.c:59
#9  0x0000000000411131 in close_stream (stream=0x7f71ce0bd400 <_IO_2_1_stdout_>) at ../lib/fileutils.c:25
#10 0x00000000004111ad in close_stdout () at ../lib/fileutils.c:37
#11 0x00007f71cdd2fb69 in __run_exit_handlers (status=status@entry=0, listp=0x7f71ce0bc6c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#12 0x00007f71cdd2fbb7 in __GI_exit (status=status@entry=0) at exit.c:99
#13 0x00000000004055b8 in bye_bye (str=str@entry=0x0) at top.c:564
#14 0x0000000000402f2c in do_key (ch=<optimized out>) at top.c:4987
#15 main (dont_care_argc=<optimized out>, argv=<optimized out>) at top.c:5721

  The problem is that top apparently is being killed if it takes too much time to
execute, on a loaded system.

  Code is spwaning top from a loop, in batch mode, with a small timeout (-d option),
and one iteration (-n option), from third party code, to fetch some information
from running processes.

  The most likely solution with as few as possible chances of unexpected side
effects probably would be the pseudo-patch:

-   if (Batch) putp("\n");
+   if (Batch) {
+      if (Frames_signal == BREAK_sig)
+         write(STDOUT_FILENO, "\n", 1);
+      else
+         putp("\n");
+   }
    exit(EXIT_SUCCESS);
 } // end: bye_bye

in top/top.c:bye_bye()

Comment 4 Paulo Andrade 2019-08-06 12:30:00 UTC

  Procedures to reproduce are very close to a similar bug report at
https://bugzilla.redhat.com/show_bug.cgi?id=1403971

  The root cause of the problem should have been, for some reason, a slow procfs
read, and the top command being killed with a signal handled by sig_endpgm:

"""
         case SIGALRM: case SIGHUP:  case SIGINT:
         case SIGPIPE: case SIGQUIT: case SIGTERM:
         case SIGUSR1: case SIGUSR2:
            sa.sa_handler = sig_endpgm;
"""

  Need two terminals and run under gdb. For example, call #1 one terminal
and #2 another terminal:

<<< Start under gdb, with debuginfo packages installed, and stop on the
    close_stdout function >>>

#1 gdb -q --args top -b -n 1
[...]
(gdb) b close_stdout
Breakpoint 1 at 0x411260: file ../lib/fileutils.c, line 37.
(gdb) r
Starting program: /usr/bin/top -b -n 1
[...]

Breakpoint 1, close_stdout () at ../lib/fileutils.c:37
37		if (close_stream(stdout) != 0 && !(errno == EPIPE)) {

<<< Now stop in the next munmap >>>

(gdb) b munmap
Breakpoint 2 at 0x7ffff70593f0: file ../sysdeps/unix/syscall-template.S, line 81.
(gdb) c
Continuing.

Breakpoint 2, munmap () at ../sysdeps/unix/syscall-template.S:81
81	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)

<<< On munmap, step instructions a bit until returning from the syscall >>>

(gdb) si
0x00007ffff70593f5	81	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) si
0x00007ffff70593f7	81	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) 
0x00007ffff70593fd	81	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) disassemble 
Dump of assembler code for function munmap:
   0x00007ffff70593f0 <+0>:	mov    $0xb,%eax
   0x00007ffff70593f5 <+5>:	syscall 
   0x00007ffff70593f7 <+7>:	cmp    $0xfffffffffffff001,%rax
=> 0x00007ffff70593fd <+13>:	jae    0x7ffff7059400 <munmap+16>
[...]

<<< get the top pid >>>

(gdb) call getpid()
$1 = 7520

<<< tell gdb just to be sure, to pass SIGTERM, and not stop on it >>

(gdb) handle SIGTERM nostop pass
Signal        Stop	Print	Pass to program	Description
SIGTERM       No	Yes	Yes		Terminated


<<< Now the important part to reproduce, in the second terminal, send the signal >>>

#2 kill -TERM 7520


<<< And on gdb, let it continue executing >>>

(gdb) c
Continuing.

Program received signal SIGTERM, Terminated.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6fdcd19 in _IO_new_file_overflow (
    f=0x7ffff7328400 <_IO_2_1_stdout_>, ch=10) at fileops.c:851
851	  *f->_IO_write_ptr++ = ch;
(gdb) bt
#0  0x00007ffff6fdcd19 in _IO_new_file_overflow (
    f=0x7ffff7328400 <_IO_2_1_stdout_>, ch=10) at fileops.c:851
#1  0x00007ffff6fd8999 in __GI__IO_putc (c=<optimized out>, 
    fp=0x7ffff7328400 <_IO_2_1_stdout_>) at putc.c:29
#2  0x00007ffff754830a in tputs (string=string@entry=0x411433 "\n", 
    affcnt=affcnt@entry=1, outc=outc@entry=0x7ffff7548020 <_nc_putchar>)
    at ../../ncurses/tinfo/lib_tputs.c:337
#3  0x00007ffff75485b1 in putp (string=string@entry=0x411433 "\n")
    at ../../ncurses/tinfo/lib_tputs.c:208
#4  0x0000000000405532 in bye_bye (str=str@entry=0x0) at top.c:563
#5  0x0000000000406aaf in sig_endpgm (dont_care_sig=<optimized out>)
    at top.c:625
#6  <signal handler called>
[...]

  Because the munmap, it is unlikely it will work by accident.

  Note that because the 'bye_bye' function is called a second time, it might
not be required to write the '\n'. The suggested patch is just to have the
same behaviour, and not crash, but most likely it will write it twice, once
when normally exiting, and again when receiving the signal, after returning
from main.

Comment 7 Jan Rybar 2019-08-07 11:58:38 UTC

(In reply to Paulo Andrade from comment #4)

>   Note that because the 'bye_bye' function is called a second time, it might
> not be required to write the '\n'. The suggested patch is just to have the
> same behaviour, and not crash, but most likely it will write it twice, once
> when normally exiting, and again when receiving the signal, after returning
> from main.

Maybe just using 'static volatile sig_atomic_t in_exit' and asking for it before calling exit() might prevent calling exit() twice and outputting '\n' twice also.

Comment 8 Paulo Andrade 2019-08-07 20:28:11 UTC

(In reply to Jan Rybar from comment #7)
> (In reply to Paulo Andrade from comment #4)
> 
> >   Note that because the 'bye_bye' function is called a second time, it might
> > not be required to write the '\n'. The suggested patch is just to have the
> > same behaviour, and not crash, but most likely it will write it twice, once
> > when normally exiting, and again when receiving the signal, after returning
> > from main.
> 
> Maybe just using 'static volatile sig_atomic_t in_exit' and asking for it
> before calling exit() might prevent calling exit() twice and outputting '\n'
> twice also.

  Yes. That probably would better match the expected behaviour. The suggested
patch was just to prevent the crash, as depending on when the signal is received,
while exiting, it might not crash, and still output twice '\n'.

Comment 19 errata-xmlrpc 2020-03-31 19:38:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1018

Note You need to log in before you can comment on or make changes to this bug.