Bug 572836
| Field | Value |
|---|---|
| Summary | [RFE] Collect crash dump and other information useful for analysis when test panics/stalls |
| Product | [Retired] Beaker |
| Reporter | Jun'ichi NOMURA <junichi.nomura> |
| Component | lab controller |
| Assignee | Bill Peck <bpeck> |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | low |
| Version | 0.5 |
| CC | azelinka, bpeck, dcallagh, kbaker, mcsontos, rmancy |
| Hardware | All |
| OS | Linux |
| Doc Type | Bug Fix |
| Last Closed | 2012-04-26 07:16:35 UTC |
| Bug Blocks | 604328 |
Description
Jun'ichi NOMURA
2010-03-12 07:04:11 UTC
Hi Jun'ichi,

The latest version of the beah harness has the following implemented: https://bugzilla.redhat.com/show_bug.cgi?id=633258, although perhaps it's not quite what you are after. We don't have it on our roadmap to implement this feature in the immediate future. Are you able to apply/implement your old patch onto the new watchdog?

Thanks

(In reply to comment #1)
> The latest version of the beah harness has the following implemented
> https://bugzilla.redhat.com/show_bug.cgi?id=633258, although perhaps it's not
> quite what you are after.

I can't tell from the comments in BZ#633258. But if the feature is limited to harness errors, as stated there ("in case of harness errors"), it's not what I want.

I found that "bkr workflow-simple" has an option "--dump":

  $ bkr workflow-simple --help
  ...
  --dump    Turn on ndnc/kdump. (which one depends on the family)

Isn't this intended for the feature I described?

> We don't have it on our roadmap to implement this feature in the immediate
> future. Are you able to apply/implement your old patch onto the new watchdog?

The patch needs to be applied where the watchdog calls the lab controller to finish testing. Where shall I apply the patch?

I'm embarrassed to say I didn't realise that option existed. It should do something like what you want; however, you'll need specific tasks in your Beaker library for it to work. I'll have to have a look at those tasks, because I don't think they will work as they are in an environment outside Red Hat.

Will review the patch provided by Jun'ichi.

Created attachment 516239 [details]
Add script-callout feature on external watchdog timeout
When the external watchdog expires, it might mean the system is stalled,
and collecting additional information is often useful.
This patch adds a feature to run a script in that case.
Typically, the script would trigger a crash dump on the system.
Since a crash dump can take a long time, the script can ask for the
watchdog to be extended by returning exit status 2.
The patch is made against beaker 0.6.14-7.el5.
I left the script path and the extension length ('1800') hardcoded,
but a configurable parameter might be better.
An example of the watchdog script is attached.
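A minimal Python sketch of the callout protocol the patch describes: when the external watchdog expires, run the script; exit status 2 asks for a fixed 1800-second extension, and anything else lets the normal abort proceed. The script path, the function name `handle_expired_watchdog`, and passing the hostname and recipe ID as arguments are illustrative assumptions, not details taken from the patch.

```python
import subprocess

# Hardcoded values, as in the patch. The path is a hypothetical placeholder;
# the patch's actual path is not given in this report.
WATCHDOG_SCRIPT = "/usr/local/bin/watchdog_script"
EXTEND_STATUS = 2      # exit status meaning "extend watchdog"
EXTEND_SECONDS = 1800  # fixed extension length


def handle_expired_watchdog(hostname, recipe_id, script_cmd=(WATCHDOG_SCRIPT,)):
    """Run the callout for an expired watchdog.

    Returns the number of seconds to extend the watchdog by, or 0 to let
    the usual abort proceed. Passing hostname and recipe_id as arguments
    is an assumption made for this sketch.
    """
    try:
        result = subprocess.run(list(script_cmd) + [hostname, str(recipe_id)])
    except OSError:
        # Script missing or not executable: behave as if there were no callout.
        return 0
    if result.returncode == EXTEND_STATUS:
        # The script asked for more time, e.g. while a crash dump is written.
        return EXTEND_SECONDS
    return 0
```

A caller would push the watchdog's expiry forward by the returned value when it is nonzero, and abort the recipe otherwise.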
Created attachment 516241 [details]
Example of watchdog_script

This script does the following:
- Try to find a serial console connection to the host and send SysRq commands to dump information to console.log
- Try to trigger a crash dump by one of the following methods:
  * send an NMI to the host
    o use the cobbler feature from BZ#727394 if available
  * or send SysRq-c

Jobs 953 and 958 in our lab are sample results. (953 is on a machine without IPMI support, where the dump is triggered by SysRq-c. 958 is on a machine with IPMI support, where the dump is triggered by an NMI via cobbler.) "system-crash" is the test emulating a system stall. (It reports 'PASS' if the system is successfully rebooted after the stall.)

This looks pretty good. I'll work on getting this into 0.8.1, and I'll see about making a backport for 0.6.14 as well. A couple of things I plan to change:
1 - The watchdog script will be optional, and the full path to the script will be specified in the config file.
2 - The watchdog script will return the number of seconds the watchdog should be extended by.

0.8.1 is scheduled to be released during the week of Dec 19th. I can make the updated version of 0.6.14 at that time as well.

Keeping 0.8.1 for stability changes.

Pushed to gerrit for review.
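The two planned changes (an optional, configured script path and a script-supplied extension length) could look roughly like the Python sketch below. It assumes, hypothetically, that the path lives under a `WATCHDOG_SCRIPT` key in a dict-like config and that the script reports its extension by printing a number of seconds on stdout, since an exit status alone cannot carry values above 255. Neither detail is confirmed in this report, and the 600-second timeout is an arbitrary safeguard chosen for the sketch.

```python
import shlex
import subprocess


def watchdog_extension(conf, hostname, recipe_id, timeout=600):
    """Return how many seconds to extend an expired watchdog (0 = abort).

    Assumptions for this sketch: the optional script is configured under the
    hypothetical key "WATCHDOG_SCRIPT", it prints the number of seconds on
    stdout, and `timeout` is an arbitrary limit on how long it may run.
    """
    script = conf.get("WATCHDOG_SCRIPT")
    if not script:
        # Change 1: the script is optional; with none configured, abort as usual.
        return 0
    # A plain full path splits to a one-element argv.
    cmd = shlex.split(script) + [hostname, str(recipe_id)]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return 0
    # Change 2: the script's output is the number of seconds to extend by.
    try:
        seconds = int(result.stdout.strip() or 0)
    except ValueError:
        return 0  # unparseable output: do not extend
    return max(seconds, 0)
```

Treating a missing, failing, or unparseable script as "no extension" keeps the watchdog's original abort behavior as the fallback.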