Bug 1440383 - RFE: Improve email notifications for failed and successful vmcores by giving suggested commands and other organizational info
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: el6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Dave Wysochanski
QA Contact: Fedora Extras Quality Assurance
Dave Wysochanski
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-04-08 11:02 UTC by Dave Wysochanski
Modified: 2018-02-20 16:23 UTC
CC List: 6 users

Fixed In Version: retrace-server-1.18.0-1.fc27 retrace-server-1.18.0-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-13 17:51:56 UTC
Type: Bug
Embargoed:



Description Dave Wysochanski 2017-04-08 11:02:26 UTC
Description of problem:
Today the email notifications contain a lot of useful information, but they could be better. In particular, I can think of at least three ways they can be improved:
1. Failed vmcores could suggest a "retrace-server-worker --restart" command with an explicit kernel version. Most people don't know that they can restart the task with an explicit kernel version, and this often solves the kernel version detection problem.
2. Successful vmcores should show a sample 'retrace-server-interact' command to access the vmcore. This is important because people often cut and paste the email text into the case summary as a way to sort multiple vmcores on the same case.
3. Successful vmcores could show the output of 'sys', or at least a few key pieces of information from 'sys'. This is important for the same reason as #2, but it is a bit tricky since it relies on the output of 'sys', which may be moved to a retrace-server hook. Also, we may not want to send things like 'nodename' in email since it is customer-unique information. At least the PANIC line might be useful to identify identical vmcores.
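Item 3 could be handled by pulling only a few whitelisted fields out of crash's 'sys' output instead of mailing all of it. The sketch below is a hypothetical helper ('summarize_sys_output', 'SAFE_FIELDS' and 'PRIVATE_FIELDS' are illustrative names, not retrace-server code). It assumes 'sys' prints one 'KEY: value' pair per line and always drops NODENAME, the customer-identifying field mentioned above.

```python
# Hypothetical helper, not part of retrace-server: extract a few
# "KEY: value" fields from crash's 'sys' output for an email summary.
SAFE_FIELDS = ("RELEASE", "MACHINE", "PANIC")
PRIVATE_FIELDS = frozenset({"NODENAME"})  # customer-identifying, never emailed


def summarize_sys_output(sys_output, wanted=SAFE_FIELDS):
    """Return {field: value} for the wanted fields found in 'sys' output."""
    fields = {}
    for line in sys_output.splitlines():
        # Split on the first colon only; values such as DATE or PANIC
        # can contain further colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key in PRIVATE_FIELDS:
            continue
        if key in wanted and key not in fields:
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    # Made-up sample in the assumed 'sys' layout.
    sample = "\n".join([
        "    NODENAME: host.example.com",
        "     RELEASE: 2.6.32-696.el6.x86_64",
        '       PANIC: "Oops: 0000 [#1] SMP "',
    ])
    for key, value in summarize_sys_output(sample).items():
        print("%s: %s" % (key, value))
```

Keeping an explicit whitelist, rather than a blacklist alone, means new 'sys' fields never leak into email by default.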


Version-Release number of selected component (if applicable):
retrace-server-1.17.0-1.el6.noarch

How reproducible:
Every time

Steps to Reproduce:
Submit a vmcore and provide an email address for notification.

Actual results:
The email notification is missing suggested next steps and other information that identifies the vmcore.

Expected results:
The email notification contains better information: it suggests next steps and includes any information that helps organize multiple vmcores.

Additional info:
The 'suggested next commands' type of info is most likely trivial to add.  The other stuff probably will take a bit more thought but should be easy as well.

Comment 1 Dave Wysochanski 2017-04-11 19:33:50 UTC
We should also include the md5sum of the file now, if it exists.
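A minimal sketch of producing the 'Md5sum: <digest> <file>' line in the format proposed in comment 2, assuming the digest is computed from the downloaded file itself (retrace-server may instead reuse a checksum it already has). 'md5sum_line' is a hypothetical name.

```python
import hashlib
import os


def md5sum_line(path, chunk_size=1 << 20):
    """Return 'Md5sum: <hex digest> <file name>', or None if path is not a file.

    Reads in fixed-size chunks so multi-gigabyte vmcore archives are never
    loaded into memory at once.
    """
    if not os.path.isfile(path):
        return None
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "Md5sum: %s %s" % (digest.hexdigest(), os.path.basename(path))
```

Returning None for a missing file lets the caller simply omit the line, matching the "if it exists" condition above.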

Comment 2 Dave Wysochanski 2017-04-11 19:54:18 UTC
For now I think I will avoid putting the 'sys' output in the email notification.

Proposed new text for success task:

The task #167075214 started on optimus.gsslab.rdu2.redhat.com succeeded

URL: https://optimus.gsslab.rdu2.redhat.com/manager/167075214
Task directory: /cores/retrace/tasks/167075214
Started: 2017-04-11 15:35:53
Finished: 2017-04-11 15:36:52
Remote file(s): foo-vmcore.tar.gz
Md5sum: 214c4daf0cfb594a1376f5f8a07b9c71 foo-vmcore.tar.gz
Kernelver: 2.6.32-696.el6.x86_64
Log: https://optimus.gsslab.rdu2.redhat.com/manager/167075214/misc/retrace-log
Crash: retrace-server-interact 167075214 crash
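The proposed success text maps directly onto a line-per-field template. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and the manager URL and retrace-log path are assumed to follow the pattern in the example above (https://<host>/manager/<task id>/misc/retrace-log).

```python
def format_success_email(task_id, host, task_dir, started, finished,
                         remote_files, kernelver, md5sum_line=None):
    """Render the proposed success notification (hypothetical helper)."""
    # Assumed URL layout, copied from the example notification.
    base_url = "https://%s/manager/%s" % (host, task_id)
    lines = [
        "The task #%s started on %s succeeded" % (task_id, host),
        "",
        "URL: %s" % base_url,
        "Task directory: %s" % task_dir,
        "Started: %s" % started,
        "Finished: %s" % finished,
        "Remote file(s): %s" % ", ".join(remote_files),
    ]
    if md5sum_line:  # only present when a checksum is available
        lines.append(md5sum_line)
    lines += [
        "Kernelver: %s" % kernelver,
        "Log: %s/misc/retrace-log" % base_url,
        # Ready-to-paste command, so the email can go straight into a case summary.
        "Crash: retrace-server-interact %s crash" % task_id,
    ]
    return "\n".join(lines)


print(format_success_email(
    167075214, "optimus.gsslab.rdu2.redhat.com", "/cores/retrace/tasks/167075214",
    "2017-04-11 15:35:53", "2017-04-11 15:36:52", ["foo-vmcore.tar.gz"],
    "2.6.32-696.el6.x86_64"))
```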


Proposed new text for failed task:

The task #167075214 started on optimus.gsslab.rdu2.redhat.com failed

NOTE: If kernel version detection failed, and you know the kernel version, you may try restarting the task with the following command. Please check the retrace-log for more information on why the task failed. The following example assumes the vmcore's kernel version is 2.6.32-358.el6 on the x86_64 arch:
$ retrace-server-worker --restart --kernelver 2.6.32-358.el6.x86_64 --arch x86_64 167075214

URL: https://optimus.gsslab.rdu2.redhat.com/manager/167075214
Task directory: /cores/retrace/tasks/167075214
Started: 2017-04-11 15:35:53
Finished: 2017-04-11 15:36:52
Remote file(s): foo-vmcore.tar.gz
Md5sum: 214c4daf0cfb594a1376f5f8a07b9c71 foo-vmcore.tar.gz
Kernelver: unknown
Log: https://optimus.gsslab.rdu2.redhat.com/manager/167075214/misc/retrace-log
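The restart command in the NOTE needs only the task ID plus, optionally, the kernel version and arch. The sketch below (hypothetical 'format_restart_hint' and 'guess_arch' helpers) builds the same command line as the example. It uses only the '--restart', '--kernelver' and '--arch' options that appear in this bug, and it assumes the arch can be read off the end of a full kernel version string, which may not hold for every kernel naming scheme.

```python
def guess_arch(kernelver):
    """Assumption: a full kernelver ends in '.<arch>', e.g. '...el6.x86_64'."""
    return kernelver.rsplit(".", 1)[-1]


def format_restart_hint(task_id, kernelver=None, arch=None):
    """Build the suggested 'retrace-server-worker --restart' command line."""
    cmd = ["retrace-server-worker", "--restart"]
    if kernelver:
        cmd += ["--kernelver", kernelver]
        # Fall back to the arch suffix of the kernel version if none is given.
        cmd += ["--arch", arch or guess_arch(kernelver)]
    cmd.append(str(task_id))
    return "$ " + " ".join(cmd)


print(format_restart_hint(167075214, "2.6.32-358.el6.x86_64"))
```

Called with only a task ID, it yields the plain '$ retrace-server-worker --restart 123456789' form used in comment 4.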

Comment 3 loberman 2017-04-11 20:13:51 UTC
If this is a custom kernel and the user needs to put the custom debuginfo in place, should we warn about that too and provide text such as:

This is a custom kernel and requires the custom debuginfo to be placed in path xyz on the Optimus path /cores/xxxxx

Please do so and resubmit your task using:
retrace-server-worker --restart --kernelver 2.6.32-642.15.1.el6.qtine2.x86_64 --arch x86_64 123456789

Comment 4 Dave Wysochanski 2017-04-11 20:31:15 UTC
(In reply to loberman from comment #3)
> If this is a custom kernel and the user needs to put the custom debuginfo
> in place, should we warn about that too and provide text such as:
> 
> This is a custom kernel and requires the custom debuginfo to be placed in
> path xyz on the Optimus path /cores/xxxxx
> 
> Please do so and resubmit your task using:
> retrace-server-worker --restart --kernelver
> 2.6.32-642.15.1.el6.qtine2.x86_64 --arch x86_64 123456789

Good idea, though identifying whether it is a custom kernel is probably non-trivial. I may just add a sentence about where to place a kernel-debuginfo file, e.g.:

If this is a test or custom kernel version, or for some reason the kernel-debuginfo repository is unavailable, you can place the kernel-debuginfo RPM at /cores/retrace/repos/download/ and restart the task with:
$ retrace-server-worker --restart 123456789
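Detecting a custom kernel is hard, as noted, but checking whether the matching debuginfo RPM has already been dropped into the download directory is easy. This sketch assumes the RPM follows the usual 'kernel-debuginfo-<kernelver>.rpm' naming and that /cores/retrace/repos/download/ is the directory named above; the helper names are hypothetical.

```python
import os

# Directory from the suggested email text; assumed, not read from config.
DOWNLOAD_DIR = "/cores/retrace/repos/download"


def debuginfo_rpm_name(kernelver):
    """Assumed naming: kernel-debuginfo-<version>-<release>.<arch>.rpm."""
    return "kernel-debuginfo-%s.rpm" % kernelver


def custom_kernel_hint(task_id, kernelver, download_dir=DOWNLOAD_DIR):
    """Return a hint for the email, or None if the RPM is already in place."""
    rpm = debuginfo_rpm_name(kernelver)
    if os.path.isfile(os.path.join(download_dir, rpm)):
        return None
    return ("If this is a test or custom kernel version, place %s in %s/ "
            "and restart the task with:\n$ retrace-server-worker --restart %s"
            % (rpm, download_dir, task_id))


print(custom_kernel_hint(123456789, "2.6.32-642.15.1.el6.qtine2.x86_64"))
```

Kernel variants with a different package name (for example a realtime kernel) would need their own naming rule.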

Comment 5 Matej Marušák 2017-04-12 06:14:02 UTC
I like where you are going with this. One thing not to forget is that retrace-server also retraces usercores, not only vmcores. So when creating the email, we should check whether the task was a vmcore or a usercore and add different fields accordingly.
For example, a vmcore email can have the field:
Kernelver: 2.6.32-696.el6.x86_64
but a usercore email should then have the fields:
Package: coreutils-8.25-15.fc25.x86_64
Executable: /usr/bin/sleep
Os_release: Fedora 25 (Twenty Five)
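One way to handle the vmcore/usercore split is a table keyed by task type, so each email lists only the fields that apply. The task type strings and info dictionary keys here are assumptions for illustration, not retrace-server's internal names.

```python
# Hypothetical field tables: (label shown in email, key in the task info dict).
VMCORE_FIELDS = (("Kernelver", "kernelver"),)
USERCORE_FIELDS = (
    ("Package", "package"),
    ("Executable", "executable"),
    ("Os_release", "os_release"),
)
FIELDS_BY_TYPE = {"vmcore": VMCORE_FIELDS, "usercore": USERCORE_FIELDS}


def type_specific_fields(task_type, info):
    """Return the email lines that depend on the task type."""
    try:
        fields = FIELDS_BY_TYPE[task_type]
    except KeyError:
        raise ValueError("unknown task type: %r" % task_type)
    # Missing values are shown as 'unknown', as in the failed-task example.
    return ["%s: %s" % (label, info.get(key, "unknown")) for label, key in fields]


print("\n".join(type_specific_fields("usercore", {
    "package": "coreutils-8.25-15.fc25.x86_64",
    "executable": "/usr/bin/sleep",
})))
```

Adding a new task type then means adding one table entry rather than another branch in the email code.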

Also, about the NOTE: how do you plan to generate this message, since there can be different reasons?

Comment 6 Dave Wysochanski 2017-04-12 15:45:20 UTC
(In reply to Matej Marušák from comment #5)
> I like where you are going with this. One thing not to forget is that
> retrace-server also retraces usercores, not only vmcores. So when creating
> the email, we should check whether the task was a vmcore or a usercore and
> add different fields accordingly.
> For example, a vmcore email can have the field:
> Kernelver: 2.6.32-696.el6.x86_64
> but a usercore email should then have the fields:
> Package: coreutils-8.25-15.fc25.x86_64
> Executable: /usr/bin/sleep
> Os_release: Fedora 25 (Twenty Five)
> 
> Also, about the NOTE: how do you plan to generate this message, since there
> can be different reasons?

Agreed, and I'd like to use userspace cores in production for RHEL, but this has never worked (see https://bugzilla.redhat.com/show_bug.cgi?id=1292556) despite the fact that I've tried multiple times. Maybe we should talk offline about what needs to be done to deploy userspace cores to production.

I think the 'success' tasks are simple, and checking for the task type is simple. For failed tasks, we have a couple of options. First, we could just specify a list of general things to try. Second, we could make this bug depend on another bug that would make the failure status more granular. I have been wanting to do the latter for some time for other reasons (for example, does it make sense to say a task "succeeded" if crash fails to run?), so I may try this.
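If failure statuses become more granular, the NOTE text could be chosen per reason instead of being one generic paragraph. A hypothetical sketch: the 'FailureReason' values are invented examples, not statuses retrace-server defines, and every email falls back to pointing at the retrace-log.

```python
from enum import Enum


class FailureReason(Enum):
    """Hypothetical granular failure statuses (not retrace-server's own)."""
    KERNELVER_DETECTION = "kernel version detection failed"
    DEBUGINFO_MISSING = "kernel-debuginfo not found"
    CRASH_FAILED = "crash failed to run"
    OTHER = "unknown failure"


GENERIC_HINT = ("Please check the retrace-log for more information "
                "on why the task failed.")

# Reason-specific advice; reasons without an entry get only the generic hint.
HINTS = {
    FailureReason.KERNELVER_DETECTION:
        "If you know the kernel version, restart the task with --kernelver and --arch.",
    FailureReason.DEBUGINFO_MISSING:
        "Place the kernel-debuginfo RPM in /cores/retrace/repos/download/ "
        "and restart the task.",
    FailureReason.CRASH_FAILED:
        "crash did not run successfully against this vmcore.",
}


def failure_hints(reason):
    """Return the NOTE lines for a failed-task email."""
    specific = HINTS.get(reason)
    return [specific, GENERIC_HINT] if specific else [GENERIC_HINT]


print("\n".join(failure_hints(FailureReason.DEBUGINFO_MISSING)))
```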

Comment 7 Dave Wysochanski 2017-12-20 21:39:24 UTC
https://github.com/abrt/retrace-server/pull/166

Comment 9 Dave Wysochanski 2018-01-17 14:22:23 UTC
https://github.com/abrt/retrace-server/pull/167

Comment 10 Dave Wysochanski 2018-01-17 14:28:11 UTC
Wrong commit / pull request.

Comment 11 Dave Wysochanski 2018-01-17 14:34:27 UTC
https://github.com/abrt/retrace-server/pull/168

Comment 12 Fedora Update System 2018-02-01 15:51:47 UTC
retrace-server-1.18.0-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-e5741ca105

Comment 13 Fedora Update System 2018-02-01 15:52:25 UTC
retrace-server-1.18.0-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-bc35ca9028

Comment 14 Fedora Update System 2018-02-02 18:24:28 UTC
retrace-server-1.18.0-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-bc35ca9028

Comment 15 Fedora Update System 2018-02-02 18:47:47 UTC
retrace-server-1.18.0-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-e5741ca105

Comment 16 Fedora Update System 2018-02-13 17:51:56 UTC
retrace-server-1.18.0-1.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 17 Fedora Update System 2018-02-20 16:23:52 UTC
retrace-server-1.18.0-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.

