Red Hat Bugzilla – Bug 1266769
RFE: refactor existing crash commands like kmem and 'foreach bt' in src/lib/retrace_worker.py into a post_retrace hook
Last modified: 2018-02-05 13:57:22 EST
Description of problem:
Now that the hooks bug has been addressed (https://bugzilla.redhat.com/show_bug.cgi?id=1082376) and we have various hooks defined, it would be good to refactor our existing post debuginfo crash commands (the ones which create the files in the 'misc' subdir) into a specific post_retrace hook.
Ideally the core of retrace-server does the following:
1. download vmcore / extract / makedumpfile (if necessary)
2. identify kernel version, and set up the kernel-debuginfo
3. run crash and validate that crash works
Once the above is done, any further crash commands should probably be inside a configurable post_retrace or similar hook.
This would allow much greater flexibility in crash analysis. Moving the existing commands to the new hook infrastructure should end up functionally equivalent to what we have today (i.e. all of the same files should be created in the 'misc' subdirectory). It would also let us add commands to the hook file rather than having to rebuild retrace-server whenever some other crash command or post-analysis is desired.
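To illustrate the idea, here is a minimal sketch of what a table-driven post_retrace hook could look like. It is not existing retrace-server code: the function name, the command table, and the output file names are all hypothetical, and the way crash is actually invoked is left to a caller-supplied callable.

```python
import os

# Hypothetical table: output file name under 'misc' -> crash command.
# The file names are illustrative, not retrace-server's actual list.
POST_RETRACE_COMMANDS = {
    "kmem-f": "kmem -f",
    "bt-a": "foreach bt",
}


def run_post_retrace(misc_dir, run_crash_command, commands=POST_RETRACE_COMMANDS):
    """Run each crash command and save its output as a file in misc_dir.

    run_crash_command is a callable that takes a crash command string and
    returns its output as text; how it talks to crash is left open here.
    Returns the list of paths written.
    """
    os.makedirs(misc_dir, exist_ok=True)
    written = []
    for name, command in commands.items():
        output = run_crash_command(command)
        path = os.path.join(misc_dir, name)
        with open(path, "w") as handle:
            handle.write(output)
        written.append(path)
    return written
```

With something like this, adding a new crash command would mean adding one entry to the table instead of rebuilding retrace-server.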
Version-Release number of selected component (if applicable):
Merge: 1742972 4eeea83
Author: Michal Toman <firstname.lastname@example.org>
Date: Tue Sep 15 10:40:33 2015 +0200
Merge pull request #88 from abrt/add_hooks
NOTE: It will probably be useful to have retrace-server fork inside the post_retrace hook and send a separate notification when the hook completes. The reason is that some of the crash commands and automated analysis that may get run will likely take some time. However, once the core retrace functionality has completed (download, extract, identify kernel version, set up kernel-debuginfo, make sure crash works), we should notify the user that the vmcore is set up and that they can run crash on it immediately.
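A rough sketch of the fork-and-notify idea follows, assuming a POSIX system. The function name and the `notify` and `run_hooks` callables are hypothetical placeholders, not part of the current hooks implementation.

```python
import os


def finish_core_then_fork(notify, run_hooks):
    """Send the 'ready' notification, then run slow hooks in a forked child.

    notify takes a message string; run_hooks takes no arguments.
    Returns the child's pid in the parent. The child never returns.
    """
    notify("vmcore ready: crash can be run now")
    pid = os.fork()
    if pid == 0:
        # Child: run the long analysis, then send the second notification.
        status = 1
        try:
            run_hooks()
            notify("post_retrace hooks complete")
            status = 0
        finally:
            # Exit without running the parent's cleanup handlers; any
            # exception from the hooks only shows up as exit status 1.
            os._exit(status)
    return pid
```

The parent can return to the caller right away, and whoever owns the task would still need to reap the child (for example with `os.waitpid`) to learn whether the hooks succeeded.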
Maybe the "post_retrace" hook has a different intent than what I'm asking for here, but when I read through the hooks that were added, it seemed the closest to what we need for automated crash analysis.
After some time, we now have a retrace-server build for the hooks bz. So this should probably be the first attempt at using it. We hopefully can refactor and create a "built-in" post_retrace hook which we can include in retrace-server as an example.
If we factor these out, I think we still need a step inside retrace-server to run crash. This would address https://bugzilla.redhat.com/show_bug.cgi?id=1232019
If crash fails (exits with an error code) and the vmcore cannot be loaded, we should consider marking the task 'failed', or some other state which indicates the vmcore file is likely not useful. Such vmcores come up often enough to be worth addressing, since we lose time when people think a vmcore is usable when it is not.
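The check could be as small as the sketch below. The function name, the 'ok'/'failed' strings, and the timeout are assumptions for illustration, and the caller supplies the full crash command line, so nothing here depends on how retrace-server builds it.

```python
import subprocess


def validate_crash(argv, timeout=600):
    """Return 'ok' if the command exits with status 0, otherwise 'failed'.

    argv is the full crash invocation. Something like
    ["crash", "-s", vmlinux, vmcore] is assumed, not taken from
    retrace-server. "quit" is sent on stdin so crash exits after loading.
    """
    try:
        result = subprocess.run(
            argv,
            input="quit\n",
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a hang both count as an unusable vmcore here.
        return "failed"
    return "ok" if result.returncode == 0 else "failed"
```

Treating a timeout as a failure is a design choice in this sketch; a real implementation might want a separate state for it.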
Now that I look at this, factoring out the various crash commands looks non-trivial due to the optional use of mock. Right now we either run the crash commands through mock or through crash directly, and save the output to a variable. Then after all the crash commands are run, each variable is saved into a file inside 'misc'.
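One way to keep the mock/no-mock choice out of the hook itself would be to hide it behind a single helper that builds the command line. The sketch below is an assumption about how that could look: the function name is hypothetical, and the mock invocation (`--configdir` plus `--shell`) is my guess at how the chroot would be entered, not a copy of what retrace_worker.py does.

```python
import shlex


def build_crash_argv(vmcore, vmlinux, mock_configdir=None):
    """Return the argv that starts crash, either directly or inside mock."""
    crash_argv = ["crash", "-s", vmcore, vmlinux]
    if mock_configdir is None:
        return crash_argv
    # Assumed mock invocation: run the quoted crash command in the chroot.
    return [
        "mock",
        "--configdir", mock_configdir,
        "--shell", shlex.join(crash_argv),
    ]
```

Note that `shlex.join` requires Python 3.8 or later.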
If we make a post_retrace hook script to do this, then we'd need to either ditch mock there or import some default post_retrace hook script into the mock environment.
I still like the idea of removing these post-setup crash commands since it shortens the time to complete a retrace-server task. Ideally retrace-server should send a notification that the task is ready as soon as the kernel is identified, the kernel-debuginfo symbols are set up, and crash runs without error. We shouldn't have to wait for many other crash commands to complete before getting a notification of success on the task.
Need to think about how to refactor this without creating a regression.
We need to start using hooks but there are a few problems.
I think the first step is to complete this bug. This may mean other patches such as allowing the hook script(s) to fork while allowing retrace to finish with a notification that the vmcore can be loaded and the backtrace is available.
Here are my current thoughts about this bug. I think to support both the use case of users that just want immediate access to a vmcore / backtrace, as well as those that want to wait for all "automated analysis" via post-retrace hooks, we need to break up the existing behavior into two phases:
Phase 1. Once kernelver detection is done, the kernel-debuginfo file is set up, and the backtrace is available, we should send a notification that the vmcore can be loaded.
Phase 2. After the notification, we run the existing crash commands as a "built-in" post-retrace hook script. Then once all post-retrace hooks are complete, we can send a second notification.
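The two phases could be sketched roughly as follows. All names are hypothetical, and the choice to let a failing hook leave phase 1 untouched is my assumption rather than something decided in this bug.

```python
def run_task(core_steps, post_retrace_hooks, notify):
    """Run the core steps, notify, run the hooks, then notify again.

    core_steps and post_retrace_hooks are lists of zero-argument
    callables; notify takes a message string. Returns the list of
    exceptions raised by hooks.
    """
    # Phase 1: kernelver detection, kernel-debuginfo setup, backtrace.
    for step in core_steps:
        step()
    notify("phase 1 complete: vmcore can be loaded")

    # Phase 2: built-in and user hooks. A failing hook is recorded but
    # does not undo phase 1 or stop the remaining hooks.
    failures = []
    for hook in post_retrace_hooks:
        try:
            hook()
        except Exception as exc:
            failures.append(exc)
    notify("phase 2 complete: %d hook(s) failed" % len(failures))
    return failures
```

Users who only want the vmcore and backtrace can act on the first notification, while those who want the full automated analysis wait for the second.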