Bug 1737552
Summary: | Async unsafe call in batch mode can cause top to crash | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Paulo Andrade <pandrade> | |
Component: | procps-ng | Assignee: | Jan Rybar <jrybar> | |
Status: | CLOSED ERRATA | QA Contact: | Karel Volný <kvolny> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 7.5 | CC: | fkrska, jrybar, kvolny, nbansal | |
Target Milestone: | rc | Keywords: | ZStream | |
Target Release: | --- | |||
Hardware: | All | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | procps-ng-3.3.10-27.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1739370 1739371 1739372 (view as bug list) | Environment: | ||
Last Closed: | 2020-03-31 19:38:15 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1739370, 1739371, 1739372 |
Description
Paulo Andrade
2019-08-05 15:55:24 UTC
Procedures to reproduce are very close to a similar bug report at https://bugzilla.redhat.com/show_bug.cgi?id=1403971 The root cause of the problem should have been, for some reason, a slow procfs read, and the top command being killed with a signal handled by sig_endpgm: """ case SIGALRM: case SIGHUP: case SIGINT: case SIGPIPE: case SIGQUIT: case SIGTERM: case SIGUSR1: case SIGUSR2: sa.sa_handler = sig_endpgm; """ Need two terminals and run under gdb. For example, call #1 one terminal and #2 another terminal: <<< Start under gdb, with debuginfo packages installed, and stop on the close_stdout function >>> #1 gdb -q --args top -b -n 1 [...] (gdb) b close_stdout Breakpoint 1 at 0x411260: file ../lib/fileutils.c, line 37. (gdb) r Starting program: /usr/bin/top -b -n 1 [...] Breakpoint 1, close_stdout () at ../lib/fileutils.c:37 37 if (close_stream(stdout) != 0 && !(errno == EPIPE)) { <<< Now stop in the next munmap >>> (gdb) b munmap Breakpoint 2 at 0x7ffff70593f0: file ../sysdeps/unix/syscall-template.S, line 81. (gdb) c Continuing. Breakpoint 2, munmap () at ../sysdeps/unix/syscall-template.S:81 81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) <<< On munmap, step instructions a bit until returning from the syscall >>> (gdb) si 0x00007ffff70593f5 81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) si 0x00007ffff70593f7 81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) 0x00007ffff70593fd 81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS) (gdb) disassemble Dump of assembler code for function munmap: 0x00007ffff70593f0 <+0>: mov $0xb,%eax 0x00007ffff70593f5 <+5>: syscall 0x00007ffff70593f7 <+7>: cmp $0xfffffffffffff001,%rax => 0x00007ffff70593fd <+13>: jae 0x7ffff7059400 <munmap+16> [...] <<< get the top pid >>> (gdb) call getpid() $1 = 7520 <<< tell gdb just to be sure, to pass SIGTERM, and not stop on it >> (gdb) handle SIGTERM nostop pass Signal Stop Print Pass to program Description SIGTERM No Yes Yes Terminated <<< Now the important part to reproduce, in the second terminal, send the signal >>> #2 kill -TERM 7520 <<< And on gdb, let it continue executing >>> (gdb) c Continuing. Program received signal SIGTERM, Terminated. Program received signal SIGSEGV, Segmentation fault. 0x00007ffff6fdcd19 in _IO_new_file_overflow ( f=0x7ffff7328400 <_IO_2_1_stdout_>, ch=10) at fileops.c:851 851 *f->_IO_write_ptr++ = ch; (gdb) bt #0 0x00007ffff6fdcd19 in _IO_new_file_overflow ( f=0x7ffff7328400 <_IO_2_1_stdout_>, ch=10) at fileops.c:851 #1 0x00007ffff6fd8999 in __GI__IO_putc (c=<optimized out>, fp=0x7ffff7328400 <_IO_2_1_stdout_>) at putc.c:29 #2 0x00007ffff754830a in tputs (string=string@entry=0x411433 "\n", affcnt=affcnt@entry=1, outc=outc@entry=0x7ffff7548020 <_nc_putchar>) at ../../ncurses/tinfo/lib_tputs.c:337 #3 0x00007ffff75485b1 in putp (string=string@entry=0x411433 "\n") at ../../ncurses/tinfo/lib_tputs.c:208 #4 0x0000000000405532 in bye_bye (str=str@entry=0x0) at top.c:563 #5 0x0000000000406aaf in sig_endpgm (dont_care_sig=<optimized out>) at top.c:625 #6 <signal handler called> [...] Because the munmap, it is unlikely it will work by accident. Note that because the 'bye_bye' function is called a second time, it might not be required to write the '\n'. The suggested patch is just to have the same behaviour, and not crash, but most likely it will write it twice, once when normally exiting, and again when receiving the signal, after returning from main. (In reply to Paulo Andrade from comment #4) > Note that because the 'bye_bye' function is called a second time, it might > not be required to write the '\n'. The suggested patch is just to have the > same behaviour, and not crash, but most likely it will write it twice, once > when normally exiting, and again when receiving the signal, after returning > from main. Maybe just using 'static volatile sig_atomic_t in_exit' and asking for it before calling exit() might prevent calling exit() twice and outputting '\n' twice also. (In reply to Jan Rybar from comment #7) > (In reply to Paulo Andrade from comment #4) > > > Note that because the 'bye_bye' function is called a second time, it might > > not be required to write the '\n'. The suggested patch is just to have the > > same behaviour, and not crash, but most likely it will write it twice, once > > when normally exiting, and again when receiving the signal, after returning > > from main. > > Maybe just using 'static volatile sig_atomic_t in_exit' and asking for it > before calling exit() might prevent calling exit() twice and outputting '\n' > twice also. Yes. That probably would better match the expected behaviour. The suggested patch was just to prevent the crash, as depending on when the signal is received, while exiting, it might not crash, and still output twice '\n'. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1018 |