Bug 1266769 - RFE: refactor existing crash commands like kmem and 'foreach bt' in src/lib/retrace_worker.py into a post_retrace hook
Status: NEW
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: epel7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Assigned To: Michal Toman
QA Contact: Fedora Extras Quality Assurance
 
Reported: 2015-09-27 11:15 EDT by Dave Wysochanski
Modified: 2016-11-30 19:54 EST

Doc Type: Bug Fix
Type: Bug
Description Dave Wysochanski 2015-09-27 11:15:15 EDT
Description of problem:
Now that the hooks bug has been addressed (https://bugzilla.redhat.com/show_bug.cgi?id=1082376) and we have various hooks defined, it would be good to refactor our existing post-debuginfo crash commands (the ones that create the files in the 'misc' subdir) into a dedicated post_retrace hook.

Ideally the core of retrace-server does the following:
1. download vmcore / extract / makedumpfile (if necessary)
2. identify the kernel version, and set up the kernel-debuginfo
3. run crash and validate that crash works

Once the above is done, any further crash commands should probably be inside a configurable post_retrace or similar hook.

This would allow much greater flexibility in crash analysis.  Moving the existing commands to the new hook infrastructure should be functionally equivalent to what we have today (i.e. all of the same files should still be created in the 'misc' subdirectory), and it would let us add commands to the hook file rather than rebuild retrace-server whenever some other crash command or post-analysis is desired.
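
A minimal sketch of what such a hook could look like, under stated assumptions: the function name, the `run_crash` callable, the output file names, and the `-i` flag on kmem are all hypothetical and are not taken from retrace_worker.py. Only "kmem" and "foreach bt" are named in this report. The point is that the command list becomes data that can be edited without rebuilding retrace-server.

```python
import os

# Hypothetical mapping of output file name -> crash command. The file names
# and the "-i" flag are placeholders, not what retrace_worker.py uses today.
HOOK_COMMANDS = {
    "kmem-i": "kmem -i",
    "bt-all": "foreach bt",
}


def run_post_retrace_hook(run_crash, misc_dir, commands=HOOK_COMMANDS):
    """Run each crash command and save its output as a file in misc_dir.

    run_crash is any callable that takes a crash command string and returns
    the command's output as text; how it reaches crash is up to the caller.
    """
    os.makedirs(misc_dir, exist_ok=True)
    written = []
    for filename, command in commands.items():
        output = run_crash(command)
        path = os.path.join(misc_dir, filename)
        with open(path, "w") as fh:
            fh.write(output)
        written.append(path)
    return written
```

Adding another analysis step would then mean adding one entry to the mapping (or to whatever config file feeds it) rather than changing the worker.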


Version-Release number of selected component (if applicable):
commit 76ef38e9132f1a1284b71e4095013388b8742e47
Merge: 1742972 4eeea83
Author: Michal Toman <michal.toman@gmail.com>
Date:   Tue Sep 15 10:40:33 2015 +0200

    Merge pull request #88 from abrt/add_hooks
    
    Script hooks


How reproducible:
N/A

Actual results:


Expected results:


Additional info:
NOTE: It will probably be useful to have retrace-server fork inside the post_retrace hook, and provide a separate notification when the hook has completed.  The reason is that some of the crash commands and/or automated analysis that get run will likely take longer to run.  However, once the core retrace functionality has completed (download, extract, identify kernel version, set up kernel-debuginfo, make sure crash works), we should notify the user that the vmcore is set up and they can run crash on it immediately.
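
The fork-and-notify idea above can be sketched roughly as follows. This is an illustration only: `notify` and `run_hook` are hypothetical callables supplied by the caller, the message strings are made up, and nothing here reflects how retrace-server actually sends notifications.

```python
import os


def notify_then_run_hook(notify, run_hook):
    """Send the 'ready' notification, then run the hook in a forked child.

    Returns the child's pid so the parent can reap it later with os.waitpid().
    """
    # Core retrace work is assumed to be finished at this point, so the user
    # can be told right away that the vmcore is usable.
    notify("task ready")

    pid = os.fork()
    if pid == 0:
        # Child process: run the slow analysis, then send a second notice.
        status = 0
        try:
            run_hook()
            notify("post_retrace hook finished")
        except Exception:
            status = 1
        finally:
            # Leave without running the parent's cleanup handlers.
            os._exit(status)
    return pid
```

The parent returns immediately after the fork, so the time to the first notification no longer depends on how long the hook's crash commands take.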

Maybe the "post_retrace" hook has a different intent than what I'm asking for here, but when I read through the hooks that were added, it seemed the closest to what we need for automated crash analysis.
Comment 1 Dave Wysochanski 2016-02-23 12:26:40 EST
After some time, we now have a retrace-server build for the hooks bz, so this should probably be the first attempt at using it.  Hopefully we can refactor and create a "built-in" post_retrace hook which we can include in retrace-server as an example.
Comment 2 Dave Wysochanski 2016-02-29 11:01:29 EST
If we factor these out, I think we still need a step inside retrace-server to run crash.  This would address https://bugzilla.redhat.com/show_bug.cgi?id=1232019

If crash fails (exits with an error code) and the vmcore cannot be loaded, we should consider marking the task 'failed', or giving it some other status which indicates the vmcore file is likely not useful.  Such vmcores come up often enough to be worth addressing, since we lose time when people think a vmcore is usable when it's not.
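
As a rough sketch of that check, assuming the caller supplies the full crash command line (no crash options are assumed here, and the 'ready' status name is a placeholder), the exit code can be turned into a task status like this:

```python
import subprocess


def crash_loads(crash_argv, timeout=None):
    """Return True if the given crash invocation exits with status 0.

    crash_argv is the complete argument list (binary, vmcore, vmlinux and
    so on), built by the caller. "quit" is sent on stdin so that an
    interactive session ends as soon as crash has loaded.
    """
    try:
        result = subprocess.run(
            crash_argv,
            input=b"quit\n",
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hang is treated like a failed load.
        return False
    return result.returncode == 0


def task_status(crash_argv):
    """Map the crash result to a task status; 'ready' is a placeholder name."""
    return "ready" if crash_loads(crash_argv) else "failed"
```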
Comment 3 Dave Wysochanski 2016-04-19 17:27:37 EDT
Now that I look at this, factoring out the various crash commands looks non-trivial due to the optional use of mock.  Right now we either run the crash commands through mock or through crash directly, and save the output to a variable.  Then after all the crash commands are run, each variable is saved into a file inside 'misc'.

If we make a post_retrace hook script to do this, then we'd need to either ditch mock there or import some default post_retrace hook script into the mock environment.
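
One possible way around the mock question, sketched under assumptions: keep the choice between mock and direct crash in a single place, and hand the hook a runner that hides it. The name `make_crash_runner` and the `base_argv` parameter are hypothetical, and the actual mock or crash command line is deliberately left to the caller rather than guessed here.

```python
import subprocess


def make_crash_runner(base_argv):
    """Build a function that runs one crash command and returns its output.

    base_argv is whatever starts a crash session on the vmcore: either a
    direct crash invocation or the same invocation wrapped in mock. The
    hook only ever sees the returned callable.
    """
    def run_crash(command):
        # Feed the command on stdin, followed by "quit" to end the session.
        result = subprocess.run(
            base_argv,
            input=(command + "\nquit\n").encode(),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        return result.stdout.decode(errors="replace")

    return run_crash
```

With this shape, the per-command output can still be collected and written under 'misc' exactly as today, while the hook script itself does not need to know whether mock is in use.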

I still like the idea of removing these post-setup crash commands since it shortens the time to complete a retrace-server task.  Ideally retrace-server should send a notification that the task is ready as soon as it has the kernel identified, the kernel-debuginfo symbols set up, and crash running without error.  We shouldn't have to wait for many other crash commands to complete before getting a notification of success on the task.

Need to think about how to refactor this without introducing a regression.
