Bug 605411

Summary: check raw partition for core dump when starting kdump service
Product: Red Hat Enterprise Linux 6 Reporter: Dave Maley <dmaley>
Component: kexec-tools Assignee: Cong Wang <amwang>
Status: CLOSED ERRATA QA Contact: Chao Ye <cye>
Severity: medium Docs Contact:
Priority: medium    
Version: 6.1 CC: akarlsso, amwang, cye, jwest, martin.wilck, nhorman, qcai, rkhan, syeghiay
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kexec-tools-2_0_0-155_el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 14:15:15 UTC Type: ---
Bug Depends On:    
Bug Blocks: 561978    
Attachments: partner provided patch (flags: none)

Description Dave Maley 2010-06-17 21:08:55 UTC
Created attachment 424944
partner provided patch

Description of problem:
When a "raw" partition is entered in /etc/kdump.conf, kdump does write a dump to the partition but the dump is not automatically recovered at the next reboot.

Judging from the comment in /etc/init.d/kdump:

function start()
{
       #TODO check raw partition for core dump image

this seems to be a missing feature actually.

However, it makes dumping to a raw partition (the most robust setting IMO) impractical for anybody except gurus who would be able to recover the dump manually (guessing the size of the dump would be the problem). I am not such a guru and don't feel like figuring out how the size is encoded in the makedumpfile header.
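
A minimal sketch of the check the TODO comment asks for, not the attached patch. It assumes that a dump written with makedumpfile -F begins with the ASCII signature "makedumpfile"; that assumption should be verified against the makedumpfile version in use. The function names raw_target and has_flat_dump are hypothetical.

```shell
# Sketch only: find the "raw" target in kdump.conf and test whether the
# device starts with what looks like a makedumpfile flat-format header.
KDUMP_CONFIG_FILE=${KDUMP_CONFIG_FILE:-/etc/kdump.conf}

# Print the device named by the first uncommented "raw <device>" line, if any.
raw_target() {
    awk '$1 == "raw" { print $2; exit }' "${1:-$KDUMP_CONFIG_FILE}"
}

# Return 0 if the first 12 bytes of $1 are the ASCII string "makedumpfile".
# ASSUMPTION: this is the flat-format signature; check your makedumpfile.
has_flat_dump() {
    local sig
    sig=$(dd if="$1" bs=12 count=1 2>/dev/null | tr -d '\000')
    [ "$sig" = "makedumpfile" ]
}

# Example use at service start (needs read access to the device):
#   dev=$(raw_target)
#   [ -n "$dev" ] && has_flat_dump "$dev" && echo "dump found on $dev"
```

Checking only the first few bytes keeps the start-up cost negligible when no dump is present.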


Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-72.el6


How reproducible:
always


Steps to Reproduce:
1. configure a "raw" device in /etc/kdump.conf
2. crash system
3. vmcore not retrieved from raw partition upon reboot

  
Actual results:
dump is written but not recovered


Expected results:
dump is recovered, as it was with diskdump in the old days


Additional info:

Comment 1 Dave Maley 2010-06-17 21:12:44 UTC
It looks like RHEL5 needs this as well; however, I'll hold off on cloning until we have some feedback on the patch from FJ.

Comment 2 RHEL Program Management 2010-06-17 21:14:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Cong Wang 2010-06-22 09:35:22 UTC
This looks strange. If you really want to collect the core from a raw partition, why not just dump to a vmcore file instead of a raw partition? Neil, is this what 'raw XXX' is used for?

Comment 4 Neil Horman 2010-06-23 11:43:54 UTC
 

raw XXX causes the kdump initramfs to dump /proc/vmcore to a raw disk w/o the aid of any file system.

The patch above appears to recover such a dump, although I'm hesitant to take it in this form, as there is no guarantee that:
1) there will be space in /var/crash for such a dump
2) we will know that makedumpfile -R needs to be used on the dump.

That's part of the reason we require by-hand dump recovery

it might be better to just indicate to the user that a dump is waiting for recovery on startup
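
The two alternatives raised in this comment can be sketched as follows: a free-space test on the target directory, and a start-up notice in place of automatic recovery. This is an illustration only; avail_kb, has_room and announce_pending_dump are hypothetical names, and the size to test against is left to the caller because the size of a dump on a raw partition is not known in advance.

```shell
# Available space, in KiB, on the file system holding directory $1.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return 0 if directory $1 has at least $2 KiB free.
has_room() {
    local avail
    avail=$(avail_kb "$1")
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Announce a possible pending dump instead of recovering it automatically.
announce_pending_dump() {
    echo "kdump: a core dump may be waiting on $1;" \
         "recover it with: makedumpfile -R <output-file> < $1"
}

# Example:
#   has_room /var/crash 1048576 || announce_pending_dump /dev/sda3
```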

Comment 5 RHEL Program Management 2010-07-15 15:22:05 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 6 Neil Horman 2010-07-19 11:02:24 UTC
Dave, still waiting on feedback from FJ, as per comment 4.  Thanks!

Comment 7 Martin Wilck 2010-07-20 09:19:46 UTC
(In reply to comment #4)

> raw XXX causes the kdump initramfs to dump /proc/vmcore to a raw disk w/o the
> aid of any file system.
> 
> The patch above appears to recover such a dump, although I'm hesitant to take
> it in this form, as there is no guarantee that:
> 1) there will be space in /var/crash for such a dump

You never know that; the same situation exists if you dump straight into a partition in the first place. But AFAICT many customers use DUMPLEVEL 31 anyway, so the space requirements are moderate.

mkdumprd issues a warning if dump is configured to write to a file system partition that it considers too small to capture the dump (well, IMO mkdumprd's required size estimate is pretty rough). The same could be done for raw partitions.
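
A rough sketch of what such a warning for raw partitions could look like, assuming the same coarse estimate of comparing the partition size with total memory. This is not how mkdumprd computes its estimate; mem_total_kb and warn_if_small are hypothetical names, and with compression or a high dump level a smaller partition may still be enough.

```shell
# Total RAM in KiB as reported by /proc/meminfo.
mem_total_kb() {
    awk '$1 == "MemTotal:" { print $2 }' /proc/meminfo
}

# Warn (and return 1) when a partition of $1 bytes is smaller than
# $2 KiB of memory; return 0 otherwise.
warn_if_small() {
    local part_kb=$(( $1 / 1024 ))
    if [ "$part_kb" -lt "$2" ]; then
        echo "Warning: raw partition (${part_kb} KiB) is smaller than" \
             "memory ($2 KiB); a dump may not fit" >&2
        return 1
    fi
    return 0
}

# Example (needs root and a real block device):
#   warn_if_small "$(blockdev --getsize64 /dev/sda3)" "$(mem_total_kb)"
```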

> 2) we will know that makedumpfile -R needs to be used on the dump.

I don't understand what you mean here. To my understanding makedumpfile -R will be required because mkdumprd either uses dd or makedumpfile -F, which both produce "flat" format.
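
For illustration, recovery of a flat-format dump could be sketched like this. makedumpfile -R reads flattened data on standard input and writes a regular dump file; everything else here (the function name recover_raw_dump, the REARRANGE_CMD override, the dated directory layout) is an assumption made for the example and not the content of the attached patch.

```shell
# Sketch: rearrange a flat-format dump from $1 into <base>/<timestamp>/vmcore.
# $2 is the base directory (default /var/crash). REARRANGE_CMD may override
# the rearranging command, which is useful for trying this out without a
# real dump.
recover_raw_dump() {
    local dev=$1 base=${2:-/var/crash}
    local rearrange=${REARRANGE_CMD:-makedumpfile -R}
    local coredir
    coredir="$base/$(date +%Y-%m-%d-%H:%M)"

    [ -r "$dev" ] || { echo "cannot read $dev" >&2; return 1; }
    mkdir -p "$coredir" || return 1

    # $rearrange is deliberately unquoted so "makedumpfile -R" splits
    # into a command and its option.
    if $rearrange "$coredir/vmcore" < "$dev" >/dev/null 2>&1; then
        echo "Dump saved to $coredir/vmcore"
    else
        # No valid dump found: do not leave an empty directory behind.
        rm -rf "$coredir"
        return 1
    fi
}

# Example (as root): recover_raw_dump /dev/sda3
```

A real implementation would also have to invalidate the header on the raw partition after a successful recovery, so that the same dump is not recovered again at the next start.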

> That's part of the reason we require by-hand dump recovery
> 
> it might be better to just indicate to the user that a dump is waiting for
> recovery on startup    

That should be very clearly documented and indicated, otherwise the risk of losing a dump in a critical situation is very high.

Raw dumping is desirable in many environments because the chance for failure is lower than in all other cases. Besides, it behaves like diskdump in RHEL3/RHEL4 and many users like this behavior.

Comment 8 Neil Horman 2010-07-20 12:02:50 UTC
fine.

Comment 9 Neil Horman 2010-08-02 12:44:36 UTC
lowering priority/severity, since this has not been accepted for 6.0, and isn't causing any real failures currently.

Comment 11 Martin Wilck 2010-10-25 13:46:51 UTC
Changing into a 6.1 feature request.

Comment 12 Cong Wang 2010-11-09 05:10:49 UTC
So, besides this patch, we also need to:

1) warn if the partition is too small
2) document that a dump is waiting for recovery on startup in this case.

Am I missing anything?

Comment 13 Martin Wilck 2010-11-09 09:29:54 UTC
With dump compression, how do you know if the partition is too small? With the terabyte memory era approaching, requiring the dump partition to be the same size as physical memory is getting unrealistic.

Comment 14 Neil Horman 2010-11-09 13:50:18 UTC
Amerigo, I think all we need to do here honestly is just check to see if a core exists and try to recover it as the attached patch does.  Let's not try to get too fancy with it.

Comment 16 Chao Ye 2011-03-18 09:56:16 UTC
Tested with latest build:
=====================================================
[root@ibm-js22-07 ~]# rpm -q kernel kexec-tools
kernel-2.6.32-122.el6.ppc64
kexec-tools-2.0.0-171.el6.ppc64
[root@ibm-js22-07 ~]# tail /etc/kdump.conf 
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell

raw /dev/sda3
core_collector makedumpfile -c --message-level 1 -d 31
default shell
[root@ibm-js22-07 ~]# touch /etc/kdump.conf 
[root@ibm-js22-07 ~]# service kdump restart
Stopping kdump:[  OK  ]
Detected change(s) the following file(s):

  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-122.el6.ppc64kdump.img
Starting kdump:[  OK  ]
[root@ibm-js22-07 ~]# echo c > /proc/sysrq-trigger
--------------------------------------------------------------------------------------
mdadm: No arrays found in config file or automatically
Free memory/Total memory (free %): 150208 / 231296 ( 64.9419 )
Scanning logical volumes
  Reading all physical volumes.  This may take a while...
  Found volume group "vg_ibmjs2207" using metadata type lvm2
Activating logical volumes
  2 logical volume(s) in volume group "vg_ibmjs2207" now active
Free memory/Total memory (free %): 149312 / 231296 ( 64.5545 )
Saving to partition /dev/sda3
Free memory/Total memory (free %): 149312 / 231296 ( 64.5545 )
Copying data                       : [100 %] 
Saving core complete
Restarting system.
......

Starting RPC idmapd: [  OK  ]
Dump saved to /var/crash/2011-03-18-05:46/vmcore
Starting kdump:[  OK  ]
......
[root@ibm-js22-07 ~]# ls -lsh /var/crash/2011-03-18-05\:46/
total 28M
28M -rw-------. 1 root root 28M Mar 18 05:46 vmcore

Change status to VERIFIED.

Comment 17 errata-xmlrpc 2011-05-19 14:15:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0736.html