Bug 1328227 - Improve handling of damaged / partial vmcores: add --zero_excluded and --minimal to 'crash_cmd' if certain crash failures occur
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: retrace-server
Version: el6
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Dave Wysochanski
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-04-18 19:00 UTC by Dave Wysochanski
Modified: 2018-12-21 15:43 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-21 15:43:30 UTC
Type: Bug


Description Dave Wysochanski 2016-04-18 19:00:09 UTC
Description of problem:
Some vmcores are damaged but can still be opened if crash is given additional options.  For example, we recently received a vmcore for which crash printed the following on load:

WARNING: /cores/retrace/tasks/776013158/crash/vmcore:
         This dumpfile is incomplete.  This may cause the crash session
         to fail entirely, may cause commands to fail, or may result in
         unpredictable runtime behavior.
   NOTE: This dumpfile may be analyzed with the --zero_excluded command
         line option, in which case any read requests from missing pages
         will return zero-filled memory.
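
The NOTE in this output could be detected automatically. A minimal sketch, assuming the crash output has already been captured as a string; the function name and the marker constant are hypothetical and not part of retrace-server:

```python
# Text taken from the crash warning quoted above.
INCOMPLETE_MARKER = "This dumpfile is incomplete"


def suggest_options(crash_output):
    """Return extra crash options hinted at by the load output.

    Only --zero_excluded is handled, because that is the only option
    the quoted warning names.
    """
    if INCOMPLETE_MARKER in crash_output and "--zero_excluded" in crash_output:
        return ["--zero_excluded"]
    return []
```

Matching on both the warning text and the option name keeps the check tied to what crash itself suggests, rather than guessing at other failure modes.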


I manually added the option via the following:
$ echo -n "crash --zero_excluded" > /cores/retrace/tasks/776013158/crash_cmd 

With that option in place, crash loads and we can run commands such as 'bt' against the vmcore.
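
The manual step above could be scripted roughly as follows. This is only a sketch: the helper name is hypothetical, and it assumes the task directory layout shown in the echo command (a crash_cmd file directly under the task directory).

```python
from pathlib import Path


def write_crash_cmd(task_dir, extra_options):
    """Write the crash command line for a task and return it.

    Mirrors `echo -n "crash --zero_excluded" > <task_dir>/crash_cmd`:
    the file gets the command with no trailing newline.
    """
    cmd = " ".join(["crash", *extra_options])
    (Path(task_dir) / "crash_cmd").write_text(cmd)
    return cmd
```

For the task in this report the call would be write_crash_cmd("/cores/retrace/tasks/776013158", ["--zero_excluded"]).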


Version-Release number of selected component (if applicable):
retrace-server-1.15-1.el6.noarch

How reproducible:
Once so far, but it depends on how many damaged / partial vmcores we get.


Steps to Reproduce:
TBD

Actual results:
"retrace-server-interact <taskid> crash" fails with crash exiting with an error, but the retrace-server task status == 'success'

Expected results:
"retrace-server-interact <taskid> crash" does not fail if some other option such as '--zero_excluded' would work.


Additional info:

This is not a high priority but it would help in some instances.  Approximately 20% of the vmcores we receive still fail in some way.

There are other options which may be useful as well, such as --no_kmem_cache and, if all else fails, --minimal.  The code that currently handles the 32-bit vmcore case and the VMware --phys_base parameter should probably be refactored so that we can add these other options.
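
One possible shape for that refactoring is an ordered list of fallbacks tried in turn. This is a sketch, not the retrace-server code: the names are hypothetical, the ordering is an assumption based on the paragraph above, and try_cmd stands in for whatever actually runs crash and reports whether it loaded.

```python
# Extra options to try, from least to most drastic (assumed ordering).
FALLBACK_OPTIONS = [
    [],
    ["--zero_excluded"],
    ["--no_kmem_cache"],
    ["--minimal"],
]


def pick_crash_cmd(base_cmd, try_cmd):
    """Return the first command line that try_cmd() accepts, or None.

    base_cmd is the existing command as a list and may already carry
    options such as --phys_base; try_cmd(cmd) must return True when
    crash loads successfully with that command.
    """
    for extra in FALLBACK_OPTIONS:
        cmd = [*base_cmd, *extra]
        if try_cmd(cmd):
            return cmd
    return None
```

Keeping the option list as data means a new fallback is a one-line change, and passing try_cmd in makes the selection logic testable without a real vmcore.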

We probably also need some patches so that the result of running crash affects the task 'status' in some way, or possibly a secondary status.  I will need to think about it and work on some patches to see what can be done.
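
As a starting point for discussion, a secondary status could be kept next to the existing one. Everything in this sketch is hypothetical (the class names, the status values, and the idea of keying off the crash exit code); it only illustrates leaving the task status untouched while recording how crash fared.

```python
from dataclasses import dataclass
from enum import Enum


class CrashStatus(Enum):
    NOT_RUN = "not_run"
    OK = "ok"
    OK_WITH_OPTIONS = "ok_with_options"  # loaded only with extra options
    FAILED = "failed"


@dataclass
class TaskResult:
    status: str  # existing task status, e.g. 'success'
    crash_status: CrashStatus = CrashStatus.NOT_RUN


def record_crash_run(result, exit_code, extra_options):
    """Update only the secondary status from one crash invocation."""
    if exit_code != 0:
        result.crash_status = CrashStatus.FAILED
    elif extra_options:
        result.crash_status = CrashStatus.OK_WITH_OPTIONS
    else:
        result.crash_status = CrashStatus.OK
    return result
```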

The one example we had occurred on a vmcore with kernel 3.10.0-327.13.1.el7.x86_64.debug

Comment 2 Dave Wysochanski 2018-02-05 18:59:58 UTC
This probably should be tackled alongside https://bugzilla.redhat.com/show_bug.cgi?id=1232019

Comment 3 Dave Wysochanski 2018-03-02 21:52:01 UTC
I am not sure how important this is but keeping it open for now.

Comment 4 Dave Wysochanski 2018-04-18 11:01:28 UTC
For now, just set --minimal if we recognize the kernel and have a decent-sized kernel log.  https://github.com/abrt/retrace-server/pull/187
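
The rule described in this comment might look like the sketch below. It is not the code from the pull request: the function name and the size threshold are assumptions, chosen only to make "recognize the kernel" and "decent sized" concrete.

```python
# Hypothetical cutoff for a "decent sized" kernel log; the real value
# used by retrace-server is not stated in this bug.
MIN_KERNEL_LOG_BYTES = 4096


def should_use_minimal(kernel_version, kernel_log):
    """Decide whether to fall back to `crash --minimal`.

    kernel_version is the detected version string, or None/empty if the
    kernel was not recognized; kernel_log is the extracted kernel log.
    """
    return bool(kernel_version) and len(kernel_log) >= MIN_KERNEL_LOG_BYTES
```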

Comment 5 Dave Wysochanski 2018-12-21 15:43:30 UTC
$ git tag --contains e27be24
1.19.0

