Red Hat Bugzilla – Bug 572836
[RFE] Collect crash dump and other information useful for analysis when test panics/stalls
Last modified: 2012-04-26 03:16:35 EDT
If the test machine stalls or panics, Beaker should be able to
collect information about the system for post-mortem analysis.
Information such as:
- crash dump
- crash dump summary
- SysRq logs
With legacy RHTS, I had a local patch to the watchdog script, which
issues SysRq commands to gather some information and then triggers a crash dump.
Also, there are separate tests for setting up kdump and checking vmcore.
So, this feature might be broken down into the following sub-features:
- metadata showing how to access the remote dump server
- utility test program to set up crash dump
- utility test program to check the collected dump
- utility test program to collect logs
- watchdog feature to run host/distro-specific program on lab controller
- interface for the watchdog script to obtain console/BMC information
The latest version of the beah harness has the following implemented https://bugzilla.redhat.com/show_bug.cgi?id=633258, although perhaps it's not quite what you are after.
We don't have it on our roadmap to implement this feature in the immediate future. Are you able to apply/implement your old patch onto the new watchdog?
(In reply to comment #1)
> The latest version of the beah harness has the following implemented
> https://bugzilla.redhat.com/show_bug.cgi?id=633258, although perhaps it's not
> quite what you are after.
I can't tell from the comments in BZ#633258.
But if the feature is limited to harness errors, as it says there
("in case of harness errors"), it's not what I want.
I found "bkr workflow-simple" has an option "--dump".
$ bkr workflow-simple --help
--dump Turn on ndnc/kdump. (which one depends on the family)
Isn't this something intended for the feature I described?
> We don't have it on our roadmap to implement this feature in the immediate
> future. Are you able to apply/implement your old patch onto the new watchdog?
The patch needs to be applied where the watchdog calls lab controller
to finish testing.
Where shall I apply the patch?
I'm embarrassed to say I didn't realise that option existed.
That should do something like what you want; however, you'll need to have specific tasks in your Beaker library for it to work. I'll have to have a look at them because I don't think they will work in an environment external to Red Hat as they are.
will review patch provided by Jun'ichi.
Created attachment 516239 [details]
Add script-callout feature on external watchdog timeout
When external watchdog expires, it might mean the system is stalled
and collecting additional information is often useful.
This patch adds a feature to run a script for such a case.
Typically, the script would trigger crash dump on the system.
Since a crash dump can take a long time, the script can ask for the
watchdog to be extended (by returning the value 2).
Patch is made for beaker 0.6.14-7.el5.
I left the script path and the extension length ('1800') hardcoded,
but a configurable parameter might be better.
Example of the watchdog script is attached.
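To illustrate the callout contract described above, here is a minimal Python sketch. It is not the attached patch: the function name `extension_for` and the constant names are made up for this example, and treating '1800' as a number of seconds is an assumption. Only the return value 2 and the value 1800 come from the description.

```python
import subprocess
import sys

EXTEND_WATCHDOG = 2        # return value meaning "extend watchdog" (from the description)
EXTENSION_LENGTH = 1800    # hardcoded extension length; assumed here to be seconds


def extension_for(script_argv):
    """Run the callout script and return how long to extend the watchdog.

    Returns EXTENSION_LENGTH if the script exits with status 2,
    otherwise 0 (the watchdog handling proceeds as usual).
    """
    try:
        result = subprocess.run(script_argv, check=False)
    except OSError:
        # Script missing or not executable: behave as if no script exists.
        return 0
    if result.returncode == EXTEND_WATCHDOG:
        return EXTENSION_LENGTH
    return 0


if __name__ == "__main__":
    # Stand-in for a real watchdog script: a child process that exits with 2.
    print(extension_for([sys.executable, "-c", "raise SystemExit(2)"]))
```

Any other exit status, including a failure to start the script, leaves the watchdog unchanged in this sketch, so a broken script cannot hold a system indefinitely.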
Created attachment 516241 [details]
Example of watchdog_script
This script does the following:
- Try to find a serial console connection to the host and
  send SysRq commands to dump information to console.log
- Try to trigger a crash dump by one of the following methods
  * send NMI to the host
    o use cobbler feature BZ#727394 if available
  * or send SysRq-c
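The decision flow in the list above can be sketched roughly as follows in Python. This is not the attached script: `collect_and_dump`, `send_sysrq` and `send_nmi` are hypothetical names, the callables stand in for whatever console and cobbler access the lab provides, and the choice of SysRq keys 't' (task list) and 'm' (memory info) for the information dump is an assumption, since the comment does not say which commands are sent.

```python
def collect_and_dump(send_sysrq, send_nmi, info_keys=("t", "m")):
    """Dump diagnostic info, then trigger a crash dump.

    send_sysrq: callable taking one SysRq key, or None if no serial
                console connection to the host was found.
    send_nmi:   callable taking no arguments, or None if NMI is not
                available (for example, no IPMI support).
    Returns "nmi", "sysrq-c", or None if no dump could be triggered.
    """
    if send_sysrq is not None:
        # Step 1: dump information to the console log.
        for key in info_keys:
            send_sysrq(key)

    # Step 2: prefer NMI, fall back to SysRq-c over the console.
    if send_nmi is not None:
        send_nmi()
        return "nmi"
    if send_sysrq is not None:
        send_sysrq("c")
        return "sysrq-c"
    return None


if __name__ == "__main__":
    # A host with a console but no NMI support falls back to SysRq-c.
    sent = []
    print(collect_and_dump(sent.append, None), sent)
```

Passing the transport in as callables keeps the ordering logic separate from the host-specific details.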
Jobs 953 and 958 in our lab are sample results.
(953 is on a machine without IPMI support, where dump is triggered by SysRq-c.
958 is on a machine with IPMI support, where dump is triggered by NMI via cobbler.)
"system-crash" is the test emulating system stall.
(And it reports 'PASS' if the system is successfully rebooted after the stall.)
This looks pretty good. I'll work on getting this into 0.8.1 and I'll see about making a back port for 0.6.14 as well.
Couple of things I plan to change:
1 - watchdog script will be optional and full path to script will be specified in config file.
2 - watchdog script will return the number of seconds the watchdog should be extended by.
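A rough Python sketch of how these two changes could fit together, under stated assumptions: the function name `watchdog_extension` is hypothetical, and the script is assumed to report the number of seconds on standard output (the comment does not say whether stdout or the exit status is used). It is not the code that was pushed for review.

```python
import subprocess
import sys


def watchdog_extension(script_argv):
    """Return the number of seconds to extend the watchdog by.

    script_argv: command line of the watchdog script as a list, or None
                 when no script is configured (the script is optional).
    Assumption: the script prints the number of seconds on stdout.
    """
    if not script_argv:
        return 0  # no script configured: nothing to run, no extension
    try:
        proc = subprocess.run(script_argv, capture_output=True,
                              text=True, check=False)
    except OSError:
        return 0  # script missing or not executable
    if proc.returncode != 0:
        return 0  # script failed: do not extend
    try:
        seconds = int(proc.stdout.strip())
    except ValueError:
        return 0  # output was not a plain integer
    return max(seconds, 0)


if __name__ == "__main__":
    # Stand-in for a configured script that asks for 30 more minutes.
    print(watchdog_extension([sys.executable, "-c", "print(1800)"]))
```

Letting the script choose the extension means a slow dump target can ask for more time than a fast one, without changing any hardcoded value.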
0.8.1 is scheduled to be released during the week of Dec 19th. I can make the updated version of 0.6.14 at that time as well.
keeping 0.8.1 for stability changes
pushed to gerrit for review.