Description of problem:
I recently noticed a behaviour of 'retrace-server-interact <taskid> crash' that can be problematic. The vmcore I was working on was 35GB, and the command just hung without crash coming up. In another window, I ran this simple loop:

$ while true; do ps -efl | grep 476900695; sleep 1; done

and noticed that the loading of crash was blocked behind the following:

crash --osrelease /cores/retrace/tasks/476900695/crash/vmcore

So it seems that every time you invoke 'retrace-server-interact <taskid> crash', it attempts to determine the vmcore's kernel version and load the correct symbols. But I wonder why we need to do this. Hasn't this already been done, and can't the kernel version be saved in a file in the <taskid> directory?

Version-Release number of selected component (if applicable):
retrace-server-1.10-1.el6.noarch

How reproducible:
I believe every time, though it is more noticeable for large vmcores (mine was 35GB).

Steps to Reproduce:
1. Load any vmcore via 'retrace-server-interact <taskid> crash'. If you want, time it with 'time'.
2. In another window, observe via the following loop: while true; do ps -efl | grep <taskid>; sleep 1; done
3. Note that getting to a crash prompt may take minutes because the command is stuck in the vmcore kernel version detection logic, such as the 'crash --osrelease' command.

Actual results:
Getting to the crash prompt may take an excessive amount of time when using the recommended command, 'retrace-server-interact <taskid> crash'.

Expected results:
The command should not need to detect the kernel version every time it is invoked.

Additional info:
I'm not sure why the command works this way. It may have been assumed that detecting the kernel version would not be so expensive. Unfortunately, for large vmcores this is not the case. I'm surprised no one has noticed or complained about this. It may be that people have their own workarounds, or that largish vmcores are not very frequent.
A possible workaround is to take the kernel version from the 'retrace_backtrace' file and construct a manual crash command line to load it, avoiding the detection logic of the 'retrace-server-interact <taskid> crash' command.

I timed both approaches. Going through retrace-server-interact took over 19 minutes just to load and quit crash. While this isn't super scientific, it does give an indication of how much overhead we may be encountering.

$ time retrace-server-interact 476900695 crash
...
crash> cd /cores/retrace/tasks/476900695/misc
Working directory /cores/retrace/tasks/476900695/misc.
crash> quit

real 19m19.671s
user 18m59.388s
sys 0m6.018s

Manually loading via the crash command and full paths took less than 5 minutes.

[dwysocha@optimus expect]$ time crash -i /cores/retrace/tasks/476900695/crashrc /cores/retrace/tasks/476900695/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-358.23.2.el6.x86_64/vmlinux

crash 7.0.1
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

quit
...
crash> cd /cores/retrace/tasks/476900695/misc
Working directory /cores/retrace/tasks/476900695/misc.
crash> quit

real 4m40.887s
user 4m36.856s
sys 0m3.521s
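For anyone who wants to script the workaround, here is a rough sketch that assembles the same manual crash command line from a task ID and a kernel version. The function name build_crash_cmd and the two directory constants are made up for illustration; the layout is copied from the paths in this report and may differ on other installations. The kernel version still has to be looked up by hand (for example from the 'retrace_backtrace' file).

```python
import os

# Directory layout copied from the paths shown in this report (assumption:
# other retrace-server installations may use different locations).
TASKS_DIR = "/cores/retrace/tasks"
REPOS_DIR = "/cores/retrace/repos/kernel"


def build_crash_cmd(taskid, kernelver, arch="x86_64"):
    """Return the argv list for starting crash directly on a task's vmcore.

    Hypothetical helper, not part of retrace-server. It skips kernel version
    detection entirely by taking the version as an argument.
    """
    task_dir = os.path.join(TASKS_DIR, str(taskid))
    vmlinux = os.path.join(
        REPOS_DIR, arch, "usr/lib/debug/lib/modules", kernelver, "vmlinux"
    )
    return [
        "crash",
        "-i", os.path.join(task_dir, "crashrc"),  # per-task crash init file
        os.path.join(task_dir, "crash", "vmcore"),
        vmlinux,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before being run.
    print(" ".join(build_crash_cmd(476900695, "2.6.32-358.23.2.el6.x86_64")))
```

The returned list can be passed to subprocess.call() as-is, or joined with spaces and pasted into a shell.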
Fixed in upstream

commit 328f12e24d6d1f324b4a4e43d55ba068cff6f3e4
Author: Michal Toman <mtoman>
Date:   Wed Feb 26 13:14:58 2014 +0100

    vmcore: cache kernel version into task directory

    Signed-off-by: Michal Toman <mtoman>
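For readers who have not looked at the commit, the general idea of caching the kernel version in the task directory can be sketched roughly as below. This is not the upstream code: the function names and the cache file name 'kernelver' are hypothetical, and only the 'crash --osrelease' invocation is taken from this report.

```python
import os
import subprocess


def detect_kernel_version(vmcore_path):
    """Slow path: ask crash for the kernel release stored in the vmcore."""
    out = subprocess.check_output(["crash", "--osrelease", vmcore_path])
    return out.decode().strip()


def cached_kernel_version(task_dir, detect=detect_kernel_version,
                          cache_name="kernelver"):
    """Return the task's kernel version, running detection only on a cache miss.

    'cache_name' is a hypothetical file name chosen for this sketch.
    """
    cache_path = os.path.join(task_dir, cache_name)
    if os.path.isfile(cache_path):
        with open(cache_path) as f:
            cached = f.read().strip()
        if cached:  # treat an empty cache file as a miss
            return cached
    # Cache miss: pay the detection cost once and remember the result.
    version = detect(os.path.join(task_dir, "crash", "vmcore"))
    with open(cache_path, "w") as f:
        f.write(version + "\n")
    return version
```

With something like this, only the first call for a given task pays the detection cost; later calls just read a small text file.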
retrace-server-1.11-1.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/retrace-server-1.11-1.el6
Package retrace-server-1.11-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.11-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2014-0687/retrace-server-1.11-1.el6
then log in and leave karma (feedback).
I think this has been fixed. I verified with one vmcore on a non-patched retrace-server system, and submitted the same vmcore to a patched retrace-server system. The unpatched system took 5 mins to load. The patched one took just a couple of seconds.
retrace-server-1.12-2.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/retrace-server-1.12-2.el6
retrace-server-1.12-2.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.