|Summary:||crash-7.0.1-0.1.x86_64 built on RHEL6.4 system hung for 4 hours consuming CPU on ~2GB x86_64 vmcore|
|Product:||Red Hat Enterprise Linux 6||Reporter:||Dave Wysochanski <dwysocha>|
|Component:||crash||Assignee:||Dave Anderson <anderson>|
|Status:||CLOSED ERRATA||QA Contact:||Qiao Zhao <qzhao>|
|Version:||6.4||CC:||dwysocha, hamiller, harshula, jherrman, kwalker, rvokal, stalexan, streeter, tlavigne|
|Fixed In Version:||crash-7.1.0-1.el6||Doc Type:||Bug Fix|
On certain system configurations, the "kmem -f", "kmem -F", and "kmem [address]" command options previously took a very long time to complete. This update increases the internal hash queue size used to store the address of each free page, and streamlines the free page search to only check the NUMA node that contains a specified address. As a result, the mentioned "kmem" options no longer have a negative impact on performance.
|Last Closed:||2015-07-22 06:27:03 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
Description Dave Wysochanski 2014-03-07 15:36:22 UTC
Description of problem:
Yesterday we had a vmcore come in from a customer system which had 256GB of RAM, but the stripped core was only ~2GB. The Red Hat engineer submitted the vmcore to our retrace-server system to strip and detect the proper kernel-debuginfo for the core. About 4 hours later, I got a ping from the engineer stating he thought his retrace-server task was hung. I logged into our retrace-server machine and saw a crash process which had been initiated by retrace-server still running at 100% CPU. It had been running for 4 hours. I gave instructions to the admin of the machine to core the crash process so we can examine it, as well as the vmcore which seemed to trigger the problem. I will put in a separate comment with the location of all the data.

Unfortunately this may be a hard-to-reproduce problem. I took the same core and set it up manually (extracted debuginfo, etc.) and successfully ran crash on it without any hang.

Version-Release number of selected component (if applicable):
crash-7.0.1-0.1.x86_64
NOTE: The system running crash is a RHEL6.4 system, but we've built the latest crash-7.x from the srpm. I'm not sure if this matters, but I wanted to make it clear.

How reproducible:
Unsure.

Steps to Reproduce:
Unsure. May be able to trigger this with the specific vmcore and the retrace-server version on optimus.

Actual results:
crash session goes to 100% CPU and does not come back.

Additional info:
I may have gotten the release wrong for this bug. I was not sure since we're running a RHEL6.4 box with a crash-7.x version. Feel free to change it if you feel it is wrong.

I have seen one other instance on this same system where crash processes go to 100% CPU and don't want to come back, though I'm not sure it is the same problem. I use 'screen' often along with my crash sessions. While in my crash session, I may do a "CTRL-Z" to get to a shell prompt, do some command, and then 'fg' to get back. Occasionally I have done the "CTRL-Z" step and then left my screen session without getting back into crash. I have seen that this sometimes leads to a crash session going to 100% CPU, because someone asked me about a runaway crash session of mine. This may be a separate but similar bug. Unfortunately I do not have a core of this sort of hang right now. If it happens again, I'll get a core and either attach it to this bug or open a new bug.
Comment 4 Dave Anderson 2014-03-07 16:20:14 UTC
Yeah, I'm not sure how to debug this. It's simply doing a readline() library call for a command entry, which should just block in the kernel in a read system call:

Program terminated with signal 11, Segmentation fault.
#0  0x00007f7543cc3630 in __read_nocancel () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x00007f7543cc3630 in __read_nocancel () from /lib64/libc.so.6
#1  0x0000000000762d57 in rl_getc (stream=0x7f7543f756a0) at input.c:471
#2  0x00000000007631f6 in rl_read_key () at input.c:448
#3  0x0000000000750725 in readline_internal_char () at readline.c:517
#4  0x0000000000750b46 in readline_internal_charloop (prompt=<value optimized out>) at readline.c:579
#5  readline_internal (prompt=<value optimized out>) at readline.c:593
#6  readline (prompt=<value optimized out>) at readline.c:342
#7  0x000000000051fb85 in process_command_line () at cmdline.c:116
#8  0x0000000000461295 in main_loop () at main.c:707
#9  0x000000000066ef23 in captured_command_loop (data=0x0) at main.c:258
#10 0x000000000066d7f4 in catch_errors (func=0x66ef10 <captured_command_loop>, func_args=0x0, errstring=0x8a2388 "", mask=6) at exceptions.c:557
#11 0x000000000066fb67 in captured_main (data=<value optimized out>) at main.c:1064
#12 0x000000000066d7f4 in catch_errors (func=0x66f000 <captured_main>, func_args=0x7fff769039d0, errstring=0x8a2388 "", mask=6) at exceptions.c:557
#13 0x000000000066ed24 in gdb_main (args=0x7fff76903647) at main.c:1079
#14 0x000000000066ed66 in gdb_main_entry (argc=<value optimized out>, argv=0x7fff76903647) at main.c:1099
#15 0x0000000000465045 in main (argc=5, argv=0x7fff76904cb8) at main.c:613
(gdb)

And if a CTRL-z is entered, the process should block (call schedule) in do_signal_stop(). But I have no clue about whatever retrace-server or screen might be doing when they are the standard input. Maybe they're causing the read() to return constantly with zero data or something like that?
Comment 5 Dave Anderson 2014-03-07 16:50:30 UTC
> The Red Hat engineer submitted the vmcore to our retrace-server system
> to strip and detect the proper kernel-debuginfo for the core.

What does the "strip and detect" procedure do?
Comment 6 Dave Wysochanski 2014-03-07 23:09:52 UTC
(In reply to Dave Anderson from comment #5)
> > The Red Hat engineer submitted the vmcore to our retrace-server system
> > to strip and detect the proper kernel-debuginfo for the core.
>
> What does the "strip and detect" procedure do?

The 'strip' procedure just runs makedumpfile on the vmcore. The 'detect' part runs 'crash --osrelease' to try to get the kernel version. If that fails, I think it tries 'strings' or something on the vmcore to try to pull a kernel version out. I don't think either one of these was hung.

And as it turns out, I think it is reproducible with this vmcore. I submitted it to our development retrace-server system, and it looks hung in the same manner. It is still running, in fact, on that machine.
Comment 9 Dave Anderson 2014-03-08 15:57:06 UTC
[anderson@optimus ~]$ crash -s /retrace/tasks/180670073/crash/vmcore /retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-358.6.2.el6.x86_64/vmlinux
crash: /retrace/tasks/180670073/crash/vmcore: No such file or directory

Usage:
  crash [OPTION]... NAMELIST MEMORY-IMAGE  (dumpfile form)
  crash [OPTION]... [NAMELIST]             (live system form)

Enter "crash -h" for details.
[anderson@optimus ~]$

FWIW, yesterday I did try the original vmlinux/vmcore pair reported in comment #1 in a standalone interactive manner, and as expected there was no problem. I didn't try the vmcore on flopbox, but I highly doubt that it would fail while being invoked interactively on the command line.

I don't know why a particular vmlinux/vmcore pair would cause this to happen, but I also don't know how crash is being invoked by the retrace-server script (or whatever it is). Is it piping input into the crash process? Or is it using the "-i <input-file>" command line option? Can you show the invocation?
Comment 10 Dave Anderson 2014-03-08 16:10:04 UTC
> The core should also still be on flopbox as this filename: 01050648.vmcore.gz

$ crash -s --osrelease 01050648.vmcore
2.6.32-358.6.2.el6.x86_64
$ crash -s /tmp/vmlinux 01050648.vmcore
crash> q
$ crash /tmp/vmlinux 01050648.vmcore

crash 7.0.5
Copyright (C) 2002-2014  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /tmp/vmlinux
    DUMPFILE: 01050648.vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Thu Mar  6 04:28:15 2014
      UPTIME: 18 days, 20:09:38
LOAD AVERAGE: 0.04, 0.05, 0.11
       TASKS: 848
    NODENAME: fl08lxccs01
     RELEASE: 2.6.32-358.6.2.el6.x86_64
     VERSION: #1 SMP Tue May 14 15:48:21 EDT 2013
     MACHINE: x86_64  (2660 Mhz)
      MEMORY: 256 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 31382
     COMMAND: "bash"
        TASK: ffff8810652f2040  [THREAD_INFO: ffff88050ab1a000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)

crash>
Comment 12 Michal Toman 2014-03-10 10:05:03 UTC
I've tried to run the commands that Retrace Server uses and all but 'kmem -f' return quickly. 'kmem -f' seems to trigger the problem (I've killed it after ~15 minutes, it was using 100% CPU during that time).
Comment 13 Dave Anderson 2014-03-10 15:07:47 UTC
(In reply to Michal Toman from comment #12)
> I've tried to run the commands that Retrace Server uses and all but 'kmem
> -f' return quickly. 'kmem -f' seems to trigger the problem (I've killed it
> after ~15 minutes, it was using 100% CPU during that time).

The "kmem -f" command is somewhat labor-intensive, especially on a system that has 256GB of memory like in 01050648.vmcore, most of which is sitting on the various free list chains. The page lists (often containing multiple millions of pages each) are walked through, verified for correctness, and hashed internally to check for duplicates, to catch wrap-arounds/corruptions and to prevent the command from entering an infinite loop.

I started a "kmem -f" on 01050648.vmcore at around 9:00 EST, and it's still dutifully churning through the free lists at 11:00 EST. I'm predicting that it will eventually finish successfully. You guys might want to re-think the need for running that particular command by default in your retrace-server. There's not much I can do about that.

That being said, the core file you gave me captured the crash utility in a readline library call, where it was gathering a crash command. In the case of an onerous "kmem -f" operation, if you had taken a core dump, you wouldn't see it operating there, i.e., because it would have read the "kmem -f" string from whatever the input source was, and then spent whatever time it takes to complete the command. So I don't even know if we're talking about the same issue here.
Comment 14 Dave Anderson 2014-03-10 18:27:23 UTC
> The "kmem -f" command is somewhat labor-intensive, especially on a system
> that has 256GB of memory like in 01050648.vmcore, most of which is sitting
> on the various free list chains. The page lists (often containing multiple
> millions of pages each) are walked through, verified for correctness, and
> hashed internally to check for duplicates to catch wrap-arounds/corruptions
> and to prevent the command from entering an infinite loop.

It's the internal hashing that's consuming most of the time that "kmem -f" is running. Every page found is hashed, and when walking through each subsequent free page, its associated hash chain is searched for duplicates. Currently there are only 128 hash chain heads being used. (I wrote the hashing code back when 256 *MB* was a normal system size, never mind 256 *GB*.)

I can look into seeing how much of an increase in hash chain heads would alleviate the problem on a system of this size, but it's still going to take a lot of time, given the sheer number of free pages that are going to be hashed. And I don't want to go crazy allocating huge amounts of static buffer space. I'll tinker around with this and see what might be useful.

In the meantime, do this in your script:

  ...
  set hash off
  kmem -f
  set hash on
  ...

Doing that, "kmem -f" takes about 5 minutes on 01050648.vmcore.
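The chained-hash duplicate check described above can be sketched in C. This is a minimal illustration under stated assumptions, not crash's actual code: the structure names, the hash function, and hash_and_check() are all invented for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_HASH_QUEUES 128   /* the historical default discussed above */

/* Invented entry type: one node per hashed page address. */
struct hash_entry {
    unsigned long addr;
    struct hash_entry *next;
};

static struct hash_entry *hash_queue[NR_HASH_QUEUES];

/* Returns 0 if addr was already seen (a duplicate, i.e. a wrapped or
 * corrupt list), 1 if it was newly added to its chain. */
static int hash_and_check(unsigned long addr)
{
    unsigned int q = (addr >> 12) % NR_HASH_QUEUES;  /* page-granular hash */
    struct hash_entry *e;

    /* The costly part: every insertion scans its whole chain. */
    for (e = hash_queue[q]; e; e = e->next)
        if (e->addr == addr)
            return 0;

    e = malloc(sizeof(*e));
    if (!e)
        abort();
    e->addr = addr;
    e->next = hash_queue[q];
    hash_queue[q] = e;
    return 1;
}
```

With tens of millions of free-list entries spread across only 128 chains, each duplicate search scans hundreds of thousands of entries, so the overall walk is effectively quadratic; that is the cost the benchmarks in the following comment quantify.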
Comment 15 Dave Anderson 2014-03-11 18:51:30 UTC
> I can look into seeing how much of an increase in hash chain heads
> would alleviate the problem on a system of this size, but it's
> still going to take a lot of time, given the sheer number of free
> pages that are going to be hashed.

Well, I could not be more wrong... Increasing the hash chain head count reduces the time by several orders of magnitude.

This particular system (01050648.vmcore) has 50331705 free pages (~192GB out of 256GB). Now, as a "best-case" benchmark, with hashing turned off, "kmem -f" completes in:

  real 4m42.759s
  user 4m26.065s
  sys  0m16.109s

The crash utility's hash queue head count has always been 128, consuming 1K of memory for just the hash queue heads. And because of its small size, the "kmem -f" command on a huge system of this size takes forever, because the number of objects linked on each of the 128 queues is ridiculously large.

Increasing the hash queue head count from 128 to the following sizes, I get the following results. For each size, here is the time spent, the number of hash queues used, the range of the number of objects per queue, and the memory size consumed by just the hash queue heads:

[4096]
  real 34m04.579s
  user 33m37.106s
  sys  0m22.180s
  queues in use: 4096 of 4096
  queue length range: 3032 to 3478
  size of hash queue heads: 32K

[8192]
  real 19m58.683s
  user 19m32.972s
  sys  0m21.496s
  queues in use: 8192 of 8192
  queue length range: 1423 to 1810
  size of hash queue heads: 64K

[16384]
  real 12m33.526s
  user 12m10.278s
  sys  0m20.318s
  queues in use: 16384 of 16384
  queue length range: 690 to 950
  size of hash queue heads: 128K

[32768]
  real 8m44.229s
  user 8m23.037s
  sys  0m19.295s
  queues in use: 32768 of 32768
  queue length range: 292 to 506
  size of hash queue heads: 256K

[65536]
  real 6m41.822s
  user 6m22.935s
  sys  0m17.784s
  queues in use: 65536 of 65536
  queue length range: 131 to 280
  size of hash queue heads: 512K

[131072]
  real 5m42.404s
  user 5m24.724s
  sys  0m16.801s
  queues in use: 131072 of 131072
  queue length range: 46 to 158
  size of hash queue heads: 1MB

I'm thinking that a reasonable compromise between the benefit of a larger hash queue vs. the increase in the amount of memory required would be a count of 16384, increasing the memory required from 1KB up to 128KB. On the other hand, going to 131072 brings the time remarkably close to not hashing at all, at the expense of using 1MB of memory for just the hash queue heads. So maybe I'm being unnecessarily conservative?

And in certain circumstances, it may be helpful/useful to be able to override the default count, e.g., to handle abnormally large systems like this example. So it may also be beneficial to add a new command line option "--hash <size>" that could be used to override the default.

It should be noted that "kmem -f" is a fairly special case with respect to hash queue usage, at least for the built-in commands. Anyway, given that you guys are my prime customers, I'm interested in your thoughts on the matter -- any comments?
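The benchmark trend above can be checked with simple arithmetic: assuming a uniform hash, the average chain length is just entries divided by queues, and since each insertion scans its chain, halving the per-queue length roughly halves the runtime. The entry count used below (~13 million) is NOT stated in the comment; it is inferred from "4096 of 4096" queues in use holding 3032-3478 objects each, and is a rough assumption (it suggests free page blocks are hashed, not all 50331705 individual pages).

```c
#include <assert.h>

/* Average objects per hash chain, assuming a uniform hash function. */
static unsigned long avg_chain_len(unsigned long entries, unsigned long queues)
{
    return entries / queues;
}

/* Does an assumed total entry count put the computed average inside one
 * of the per-queue ranges measured above? */
static int consistent_with_range(unsigned long entries, unsigned long queues,
                                 unsigned long lo, unsigned long hi)
{
    unsigned long avg = avg_chain_len(entries, queues);
    return lo <= avg && avg <= hi;
}
```

For example, 13,300,000 entries over 4096 queues averages 3247 per chain, inside the measured 3032-3478 range; over the old default of 128 queues the same load averages more than 100,000 entries per chain, which is why the unpatched command ran for hours.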
Comment 16 Guy Streeter 2014-03-11 19:05:28 UTC
I'm not a heavy user of kmem -f. I have occasionally encountered a vmcore on which kmem <address> took a very long time. Is that a related problem?

I don't think using a MB of memory for kmem -f is bad, especially if it can be allocated when needed.

We've already put in a request to have our retrace server stop doing kmem -f by default, so this is not critical for us.
Comment 17 Dave Anderson 2014-03-12 14:30:34 UTC
(In reply to Guy Streeter from comment #16)
> I'm not a heavy user of kmem -f. I have encountered occasionally a vmcore on
> which kmem <address> took a very long time. Is that a related problem?

Yes. The "kmem <address>" option does a series of subsystem checks, and one of them is a check of the free list. I'll take a look to see if it can be stream-lined somewhat.

> I don't think using a MB of memory for kmem -f is bad. Especially if it can
> be allocated when needed.

Unfortunately it's not a when-needed scenario. The default size (or as specified by a new "--hash <size>" option) will be allocated and touched during initialization, because the pid-gathering process uses the hash queue.

> We've already put in a request to have our retrace server stop doing kmem -f
> by default, so this is not critical for us.

OK.
Comment 18 Dave Anderson 2014-03-13 17:15:50 UTC
For now I've decided on a default of 32768:

  Increase the internal hash queue head count from 128 to 32768. The hash
  queue is used for gathering and verifying lists, and the original count
  of 128 may be overwhelmed if a list is extremely large. For example, on
  a 256GB system with 192GB of free pages, the "kmem -f" command takes
  hours to complete; with this patch, the time is reduced to a few minutes.
  In addition, a new command line option "--hash <count>" has been added
  to allow a user to override the default hash queue head count of 32768.
  (email@example.com)

$ git clone git://github.com/crash-utility/crash.git
$ cd crash
$ make lzo
Comment 19 Dave Anderson 2014-03-18 15:15:28 UTC
(In reply to Guy Streeter from comment #16)
> > I'm not a heavy user of kmem -f. I have encountered occasionally a
> > vmcore on which kmem <address> took a very long time. Is that a
> > related problem?
>
> Yes. The "kmem <address>" does a series of subsystem checks, and one
> of them is a check of the free list. I'll take a look to see if it can
> be stream-lined somewhat.

And yes, "kmem [-f] <address>" can be stream-lined fairly significantly:

commit ed28aca0c0281d44713f2e9a1a39169b908b18fd
Author: Dave Anderson <firstname.lastname@example.org>
Date:   Fri Mar 14 10:30:35 2014 -0400

    Fix for the "kmem -F" display and the "kmem -f <address>" or
    "kmem <address>" options. Without the patch, the first page in a
    list of free page blocks is not displayed on its own line by
    "kmem -F", but rather at the end of the line showing the area
    number, block size, and free_area struct address that the page is
    linked to. Due to this error, both "kmem <address>" and
    "kmem -f <address>" would not find a page if it happened to be the
    first page, or any of the contiguous pages in a multi-page block,
    in the list of free pages or page blocks.
    (email@example.com)

commit d3e5664ec6924c2e4d93b47c41fbc46dbf5c816c
Author: Dave Anderson <firstname.lastname@example.org>
Date:   Mon Mar 17 16:42:13 2014 -0400

    Created a new feature for the internal do_list() function if it is
    necessary to immediately perform a function for each entry in a
    list while the list is being traversed. A callback function, and an
    optional callback data pointer, can be registered in the list_data
    structure. The address of each entry in the list along with the
    optional callback data pointer will be passed to the callback
    function. If desired, the callback function may also dictate that
    do_list() should stop the list traversal and return immediately to
    its caller.
    (email@example.com)

commit 32aa6bbbf7becb90ba84d258f7baf19898099e1b
Author: Dave Anderson <firstname.lastname@example.org>
Date:   Mon Mar 17 16:57:48 2014 -0400

    Made the "kmem -f <address>" and "kmem <address>" options more
    efficient by using the new do_list() callback function feature, as
    well as restricting the search to only the NUMA node that contains
    the address.
    (email@example.com)

(In reply to Dave Anderson from comment #18)
> $ git clone git://github.com/crash-utility/crash.git
> $ cd crash
> $ make lzo

A git-pull/rebuild is required for the 3 commits above.
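The do_list() callback idea described in commit d3e5664e can be illustrated with a small sketch. All names here (do_list_walk, list_cb, match_addr) are invented for the example; crash's real list_data structure and do_list() interface differ.

```c
#include <assert.h>
#include <stddef.h>

/* Invented singly-linked list entry for the sketch. */
struct node {
    unsigned long addr;
    struct node *next;
};

/* Callback contract: return nonzero to stop the walk early. */
typedef int (*list_cb)(struct node *entry, void *data);

/* Walks the list, calling cb on each entry as it is traversed; returns
 * the entry the callback stopped on, or NULL if the walk completed. */
static struct node *do_list_walk(struct node *head, list_cb cb, void *data)
{
    for (struct node *n = head; n; n = n->next)
        if (cb(n, data))
            return n;   /* early exit: no need to finish the list */
    return NULL;
}

/* Example callback: stop when an entry matches a searched-for address. */
static int match_addr(struct node *n, void *data)
{
    return n->addr == *(unsigned long *)data;
}
```

Combined with restricting the search to the NUMA node whose memory range contains the address, a "kmem <address>" lookup can stop at the first match instead of walking every node's free lists to completion.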
Comment 20 Dave Wysochanski 2014-04-09 14:41:46 UTC
Thanks Dave A! I know I have run into situations where kmem takes a while, as do others. This set of patches looks like it will improve more than one scenario, and I will see if we can get this tested soon.

Other than the vmcore mentioned in the original filing, where 'kmem -f' took a long time, I wonder if we have a few vmcores which we know are slow with kmem. I will see what I can find, but if others have vmcores they know are slow with 'kmem <addr>' checks, for instance, those might be interesting test cases.
Comment 21 Dave Anderson 2015-02-10 20:28:19 UTC
This issue is fixed in crash-7.1.0-1.el6, which was built today for the rebase approved for the rhel6.7 crash utility errata.
Comment 22 Qiao Zhao 2015-05-12 05:51:31 UTC
(In reply to Dave Anderson from comment #21)
> This issue is fixed in crash-7.1.0-1.el6, which was built today for the
> rebase approved for the rhel6.7 crash utility errata.

OK, set qa_ack+.
Comment 23 Qiao Zhao 2015-06-09 07:30:07 UTC
Hi developer,

RHEL-6.7 has been in snap#4; I think we can move this bug to rhel-6.8. If you want the fix in rhel-6.7, please add the exception and blocker flags.

--
Thanks,
Qiao
Comment 24 Dave Anderson 2015-06-09 12:53:27 UTC
(In reply to Qiao Zhao from comment #23)
> Hi developer,
>
> RHEL-6.7 has been in sanp#4, i think we can move this bug to rhel-6.8.
> If you want fix in rhel-6.7, please add exception and block flags.
>
> --
> Thanks,
> Qiao

See comment #21. It's already fixed in the rhel-6.7 errata version.
Comment 25 Qiao Zhao 2015-06-10 02:37:27 UTC
(In reply to Dave Anderson from comment #24)
> (In reply to Qiao Zhao from comment #23)
> > Hi developer,
> >
> > RHEL-6.7 has been in sanp#4, i think we can move this bug to rhel-6.8.
> > If you want fix in rhel-6.7, please add exception and block flags.
> >
> > --
> > Thanks,
> > Qiao
>
> See comment #21. It's already fixed in the rhel-6.7 errata version.

Sorry, I missed this comment, but this bug has not been added to the rhel-6.7 crash-utility errata: https://errata.devel.redhat.com/advisory/19827.
Comment 26 Dave Anderson 2015-06-10 13:02:54 UTC
(In reply to Qiao Zhao from comment #25)
> (In reply to Dave Anderson from comment #24)
> > See comment #21. It's already fixed in the rhel-6.7 errata version.
>
> Sorry, i miss this comment. but this bug not been added to rhel-6.7
> crash-utility errata https://errata.devel.redhat.com/advisory/19827.

Correct -- it is not possible to add a BZ to an errata unless it has rhel-6.7+.
Comment 27 Dave Anderson 2015-06-10 13:06:28 UTC
And BTW, if the blocker/exception flag is a prerequisite for rhel-6.7+, I don't have permissions to set blocker? or exception?. (I don't even see those two flags when I click "more flags")
Comment 34 errata-xmlrpc 2015-07-22 06:27:03 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-1309.html