1053186 – "retrace-server-interact <taskid> crash" should not try to detect the kernel version but should obtain it from a saved location

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora EPEL
Classification:	Fedora
Component:	retrace-server
Sub Component:
Version:	el6
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Michal Toman
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-01-14 19:54 UTC by Dave Wysochanski
Modified:	2015-03-23 00:42 UTC
CC List:	6 users
Fixed In Version:	retrace-server-1.12-2.el6
Clone Of:
Environment:
Last Closed:	2014-08-15 18:58:11 UTC
Type:	Bug
Embargoed:

Description Dave Wysochanski 2014-01-14 19:54:45 UTC

Description of problem:
I recently noticed a problematic behaviour of "retrace-server-interact <taskid> crash".
The vmcore I was working on was 35GB, and the command appeared to hang without crash ever coming up. In another window, I ran this simple loop:
$ while true; do ps -efl | grep 476900695; sleep 1; done

and noticed the loading of crash was blocked behind the following:
crash --osrelease /cores/retrace/tasks/476900695/crash/vmcore

So it seems that every time you invoke 'retrace-server-interact <taskid> crash', it attempts to determine the vmcore's kernel version in order to load the correct symbols. But I wonder why this is needed. Hasn't this already been done once, and couldn't the kernel version be saved in a file in the <taskid> directory?

Version-Release number of selected component (if applicable):
retrace-server-1.10-1.el6.noarch

How reproducible:
Every time, I believe, though it is more noticeable for large vmcores (mine was 35GB).

Steps to Reproduce:
1. Load any vmcore via 'retrace-server-interact <taskid> crash'. Optionally, time it with 'time'.
2. In another window, observe via the following script: while true; do ps -efl | grep <taskid>; sleep 1; done
3. Note that getting to a crash prompt may take minutes because the command is stuck in the vmcore kernel version detection logic, for example in the 'crash --osrelease' command.

Actual results:
Getting to the crash prompt may take an excessive amount of time when using the recommended command, 'retrace-server-interact <taskid> crash'.

Expected results:
Command should not need to detect the kernel version every time it is invoked.

Additional info:
I'm not sure why the command works this way. It may have been assumed that detecting the kernel version would be cheap. Unfortunately, for large vmcores this is not the case.

I'm surprised no one has noticed or complained about this. People may have their own workarounds, or large vmcores may not come up very often.

A possible workaround is to take the kernel version from the 'retrace_backtrace' file and construct the crash command line manually, avoiding the detection logic of the 'retrace-server-interact <taskid> crash' command.
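
As a rough sketch of that workaround (not part of retrace-server), the shell function below assembles the manual crash command line from a task ID and a kernel version. The function name build_crash_cmd and the RETRACE_ROOT variable are hypothetical, and the directory layout is assumed from the paths seen on this one system, so it may differ elsewhere. The kernel version has to be supplied by hand, since the format of the 'retrace_backtrace' file is not shown in this report.

```shell
# Sketch only: print (do not run) a manual crash command line for a task.
# build_crash_cmd and RETRACE_ROOT are hypothetical names; the layout
# under /cores/retrace is an assumption based on one system.
build_crash_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_crash_cmd <taskid> <kernel-version> [arch]" >&2
        return 1
    fi
    bcc_taskid=$1
    bcc_kver=$2
    bcc_arch=${3:-x86_64}
    bcc_root=${RETRACE_ROOT:-/cores/retrace}

    bcc_taskdir=$bcc_root/tasks/$bcc_taskid
    bcc_vmlinux=$bcc_root/repos/kernel/$bcc_arch/usr/lib/debug/lib/modules/$bcc_kver/vmlinux

    # crash -i <crashrc> <vmcore> <vmlinux>: the kernel is named explicitly,
    # so no version detection pass over the vmcore is needed.
    printf 'crash -i %s %s %s\n' \
        "$bcc_taskdir/crashrc" "$bcc_taskdir/crash/vmcore" "$bcc_vmlinux"
}
```

For example, "build_crash_cmd 476900695 2.6.32-358.23.2.el6.x86_64" prints the command line; check that the vmlinux path exists before running it.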

I even timed it and found that it took over 19 minutes just to load and quit crash. While this isn't super scientific, it does give an indication as to how much overhead we may be encountering.

$ time retrace-server-interact 476900695 crash
...
crash> cd /cores/retrace/tasks/476900695/misc
Working directory /cores/retrace/tasks/476900695/misc.
crash> quit

real 19m19.671s
user 18m59.388s
sys 0m6.018s

Manually loading via the crash command and full path took less than 5 minutes.

[dwysocha@optimus expect]$ time crash -i /cores/retrace/tasks/476900695/crashrc /cores/retrace/tasks/476900695/crash/vmcore /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/2.6.32-358.23.2.el6.x86_64/vmlinux

crash 7.0.1
Copyright (C) 2002-2013 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

quit
...
crash> cd /cores/retrace/tasks/476900695/misc
Working directory /cores/retrace/tasks/476900695/misc.
crash> quit

real 4m40.887s
user 4m36.856s
sys 0m3.521s

Comment 1 Michal Toman 2014-02-26 12:16:20 UTC

Fixed in upstream

commit 328f12e24d6d1f324b4a4e43d55ba068cff6f3e4
Author: Michal Toman <mtoman>
Date:   Wed Feb 26 13:14:58 2014 +0100

    vmcore: cache kernel version into task directory
    
    Signed-off-by: Michal Toman <mtoman>
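
The commit itself is not quoted in this report. As an illustration only, the general idea of caching the detected version in the task directory could be sketched in shell as follows. The function name get_kernel_version, the cache file name "kernelver" and the DETECT_CMD override are assumptions made for this sketch and are not taken from retrace-server; only 'crash --osrelease' and the <taskid>/crash/vmcore path come from this report.

```shell
# Sketch only: print the kernel version for a task directory, running the
# slow detection at most once and reusing a cached copy afterwards.
# The cache file name "kernelver" is hypothetical, not retrace-server's.
get_kernel_version() {
    gkv_taskdir=$1
    gkv_cache=$gkv_taskdir/kernelver

    # Fast path: an earlier run already stored the version.
    if [ -s "$gkv_cache" ]; then
        cat "$gkv_cache"
        return 0
    fi

    # Slow path: scan the vmcore. DETECT_CMD is a hypothetical override
    # (handy for testing); the default is the command seen in this report.
    gkv_detect=${DETECT_CMD:-crash --osrelease}
    gkv_kver=$($gkv_detect "$gkv_taskdir/crash/vmcore") || return 1
    [ -n "$gkv_kver" ] || return 1

    # Remember the result so later invocations skip the scan.
    printf '%s\n' "$gkv_kver" > "$gkv_cache"
    printf '%s\n' "$gkv_kver"
}
```

With something like this, only the first call for a given task pays for the scan of the vmcore; later calls just read a one-line file.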

Comment 2 Fedora Update System 2014-02-27 13:46:59 UTC

retrace-server-1.11-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.11-1.el6

Comment 3 Fedora Update System 2014-03-01 07:11:59 UTC

Package retrace-server-1.11-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing retrace-server-1.11-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2014-0687/retrace-server-1.11-1.el6
then log in and leave karma (feedback).

Comment 6 Dave Wysochanski 2014-03-07 18:05:50 UTC

I believe this has been fixed. I tested one vmcore on an unpatched retrace-server system and submitted the same vmcore to a patched retrace-server system. The unpatched system took 5 minutes to load; the patched one took just a couple of seconds.

Comment 7 Fedora Update System 2014-07-31 11:52:51 UTC

retrace-server-1.12-2.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/retrace-server-1.12-2.el6

Comment 8 Fedora Update System 2014-08-15 18:58:11 UTC

retrace-server-1.12-2.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.
