Created attachment 1511468 [details]
Crash session that crashed

Description of problem:

Crash crashes when examining the following retrace task on optimus:

crash> sys
      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-696.13.2.el6.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/110186978/crash/vmcore [PARTIAL DUMP]
        CPUS: 8 [OFFLINE: 7]
        DATE: Mon Feb 5 04:13:11 2018
      UPTIME: 80 days, 07:36:30
LOAD AVERAGE: 145.48, 121.62, 81.48
       TASKS: 1093
    NODENAME: XXXXXXXXXX
     RELEASE: 2.6.32-696.13.2.el6.x86_64
     VERSION: #1 SMP Fri Sep 22 12:32:14 EDT 2017
     MACHINE: x86_64 (2396 Mhz)
      MEMORY: 32 GB
       PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"

Version-Release number of selected component (if applicable):

$ crash --version

crash 7.2.4
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".

How reproducible:

Happened several times, full session log attached to the BZ

Steps to Reproduce:
1. Use crash to analyse the core

Actual results:

Crash crashes

Expected results:

Crash doesn't crash

Additional info:
I've never used the readline library's tab-completion feature in the crash utility (I didn't even consider it being enabled). I'm certainly not familiar with the library's internals, so don't hold your breath awaiting a fix.
The failure can occur in multiple different paths, where the damage has been done before the corruption is recognized. Here's a couple more relevant backtraces than the one in the attached file, where the failure occurs while executing the readline() call:

crash> whatis mu*** Error in `./crash': free(): invalid pointer: 0x00007f095e62a000 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81489)[0x7f095d559489]
/lib64/libc.so.6(_IO_free_backup_area+0x1a)[0x7f095d55473a]
/lib64/libc.so.6(_IO_file_overflow+0x1d5)[0x7f095d553ea5]
/lib64/libc.so.6(_IO_file_xsputn+0xb0)[0x7f095d552810]
/lib64/libc.so.6(fputs+0xbb)[0x7f095d546e4b]
./crash[0x761e24]
./crash[0x762180]
./crash(fprintf_filtered+0x8c)[0x76320c]
./crash(throw_exception+0x63)[0x6a5cb3]
./crash[0x6a5f49]
./crash[0x6a6166]
./crash[0x760bc4]
./crash(c_parse_internal+0x32d6)[0x618a56]
./crash(c_parse+0x159)[0x618db9]
./crash[0x6d183a]
./crash(parse_expression_for_completion+0x71)[0x6d1b31]
./crash(expression_completer+0x76)[0x6b0726]
./crash[0x6afbf9]
./crash(readline_line_completion_function+0x59)[0x6b0669]
./crash(rl_completion_matches+0x61)[0x793331]
./crash(rl_complete_internal+0xf8)[0x793528]
./crash(_rl_dispatch_subseq+0x173)[0x78be63]
./crash(readline_internal_char+0x9f)[0x78c16f]
./crash(readline+0x45)[0x78c775]
./crash(process_command_line+0x1c3)[0x54f4a3]
./crash(main_loop+0x1e5)[0x467ed5]
./crash[0x6a7733]
./crash(catch_errors+0x7a)[0x6a645a]
./crash[0x6a86c6]
./crash(catch_errors+0x7a)[0x6a645a]
./crash(gdb_main_entry+0x47)[0x6a8a27]
./crash(main+0x775)[0x466265]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f095d4fa3d5]
./crash[0x46750e]

crash> whatis mu*** Error in `./crash': malloc(): memory corruption: 0x00007feeec3f7010 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x82c96)[0x7feeeb325c96]
/lib64/libc.so.6(+0x8382c)[0x7feeeb32682c]
/lib64/libc.so.6(realloc+0x1d2)[0x7feeeb328832]
./crash(xrealloc+0x1d)[0x7873ad]
./crash(vec_o_reserve+0x5f)[0x73bdff]
./crash[0x673fa6]
./crash(default_make_symbol_completion_list_break_on+0x3ac)[0x678a9c]
./crash(location_completer+0x322)[0x6b0222]
./crash(expression_completer+0x11e)[0x6b07ce]
./crash[0x6afbf9]
./crash(readline_line_completion_function+0x59)[0x6b0669]
./crash(rl_completion_matches+0x61)[0x793331]
./crash(rl_complete_internal+0xf8)[0x793528]
./crash(_rl_dispatch_subseq+0x173)[0x78be63]
./crash(readline_internal_char+0x9f)[0x78c16f]
./crash(readline+0x45)[0x78c775]
./crash(process_command_line+0x1c3)[0x54f4a3]
./crash(main_loop+0x1e5)[0x467ed5]
./crash[0x6a7733]
./crash(catch_errors+0x7a)[0x6a645a]
./crash[0x6a86c6]
./crash(catch_errors+0x7a)[0x6a645a]
./crash(gdb_main_entry+0x47)[0x6a8a27]
./crash(main+0x775)[0x466265]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7feeeb2c53d5]
./crash[0x46750e]

The rl_completion_matches() function is where the transition is made from the readline library code to base gdb code.
I can't really figure out how to effectively debug this, given that the damage has been done by the time the malloc/free/corruption is detected. Staring at the code doesn't show anything obvious. My best guess is that it has more to do with the embedded gdb completion code than the readline library itself. Or perhaps it's an issue related to the crash/gdb marriage, where it is the only place where gdb code is invoked directly without the top-level crash utility invoking gdb through its well-defined interface. That alone is a little bit disconcerting. Anyway, I think I'll take a look at writing a readline completer plugin, which would take gdb totally out of the picture. It should be faster than using the gdb completer, and would also remove the useless clutter of showing filenames as a completion option, which makes no sense.
> ...
> Anyway, I think I'll take a look at writing a readline completer plugin,
> which would take gdb totally out of the picture. It should be faster than
> using the gdb completer, and would also remove the useless clutter of
> showing filenames as a completion option, which makes no sense.

A patch has been applied upstream:

https://github.com/crash-utility/crash/commit/0f65ae0c36bf04e22219f28c32c3ae0cdee5acfe

  Implemented a new plugin function for the readline library's tab
  completion feature. Without the patch, the use of the default plugin
  from the embedded gdb module has been seen to cause segmentation
  violations or other fatal malloc/free/corruption assertions. The new
  plugin takes gdb out of the picture entirely, and also restricts the
  matching options to just symbol names, so as not to clutter the
  results with irrelevant filenames.
  (anderson@redhat.com)

Also, because the top-level crash code already has a symbol list, the new plugin avoids having to do the malloc/realloc/frees that the gdb code does in generating the list of matching options -- which is where I *believe* the reported problem lies.
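For readers unfamiliar with how a readline completer plugin is shaped, here is a minimal sketch of the general idea. It is not the code from the upstream commit. The symbol table (symbol_names), the generator (symbol_generator) and the helper (count_matches) are hypothetical names made up for illustration, and the table stands in for whatever symbol list the top-level tool already holds. The generator follows the calling convention that readline's rl_completion_matches() expects: it is called repeatedly with the same text, with state equal to 0 on the first call, and it returns a malloc'd copy of each match until it returns NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a symbol list the tool already has in memory. */
static const char *const symbol_names[] = {
    "mutex_lock", "mutex_unlock", "mutex_trylock", "schedule", "sys_read", NULL
};

/*
 * Generator in the style readline expects: state == 0 means "start a new
 * search"; each call returns a malloc'd copy of the next symbol that begins
 * with text, or NULL when there are no more matches. The caller owns and
 * frees each returned string.
 */
char *symbol_generator(const char *text, int state)
{
    static size_t index;   /* position in symbol_names across calls */
    static size_t len;     /* length of the prefix being matched */
    const char *name;

    assert(text != NULL);

    if (state == 0) {
        index = 0;
        len = strlen(text);
    }

    while ((name = symbol_names[index]) != NULL) {
        index++;
        if (strncmp(name, text, len) == 0) {
            char *copy = malloc(strlen(name) + 1);
            if (copy == NULL)
                return NULL;
            strcpy(copy, name);
            return copy;
        }
    }
    return NULL;
}

/* Drives the generator the way a completion loop would; returns match count. */
size_t count_matches(const char *text)
{
    size_t n = 0;
    int state = 0;
    char *match;

    while ((match = symbol_generator(text, state)) != NULL) {
        state = 1;
        n++;
        free(match);
    }
    return n;
}
```

With GNU readline, a generator like this would typically be wired in through an rl_attempted_completion_function callback that returns rl_completion_matches(text, symbol_generator) and sets rl_attempted_completion_over to a non-zero value, so that readline does not fall back to filename completion. Whether the upstream patch is structured this way is not stated in this report.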
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2071