605411 – check raw partition for core dump when starting kdump service

Summary: check raw partition for core dump when starting kdump service

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kexec-tools
Sub Component:
Version:	6.1
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Cong Wang
QA Contact:	Chao Ye
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	561978

Reported:	2010-06-17 21:08 UTC by Dave Maley
Modified:	2018-11-14 19:01 UTC
CC List:	9 users
Fixed In Version:	kexec-tools-2_0_0-155_el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-05-19 14:15:15 UTC
Target Upstream Version:
Embargoed:

Attachments
partner provided patch (869 bytes, patch) 2010-06-17 21:08 UTC, Dave Maley	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0736	0	normal	SHIPPED_LIVE	kexec-tools bug fix update	2011-05-18 18:09:18 UTC

Description Dave Maley 2010-06-17 21:08:55 UTC

Created attachment 424944
partner provided patch

Description of problem:
When a "raw" partition is entered in /etc/kdump.conf, kdump does write a dump to the partition but the dump is not automatically recovered at the next reboot.

Judging from the comment in /etc/init.d/kdump:

function start()
{
       #TODO check raw partition for core dump image

this actually seems to be a missing feature.

However, it makes dumping to a raw partition (the most robust setting IMO) impractical for anybody except gurus who can recover the dump manually (guessing the size of the dump would be the problem). I am not such a guru and don't feel like figuring out how the size is encoded in the makedumpfile header.
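Detecting that a dump exists does not require knowing its size. The sketch below makes two assumptions. First, the raw target is named by a `raw <device>` line, as in the kdump.conf shown in comment 16. Second, the partition was written by `makedumpfile -F`, whose flattened output begins with a header carrying the ASCII signature "makedumpfile". Both function names are hypothetical. They are not taken from the attached patch, whose contents are not reproduced here.

```shell
#!/bin/bash
# Hypothetical helpers sketching the "check raw partition for core dump" TODO.

# Print the device named by the first uncommented "raw" line in a
# kdump.conf-style file (default: /etc/kdump.conf).
get_raw_part() {
    local conf="${1:-/etc/kdump.conf}"
    # "#raw /dev/sdb1" has $1 == "#raw", so commented-out lines never match.
    awk '$1 == "raw" { print $2; exit }' "$conf"
}

# Succeed if DEV begins with the makedumpfile flattened-format signature.
# Assumption: the flat header starts with the 12 ASCII bytes "makedumpfile".
has_flat_dump() {
    local dev="$1"
    [ -r "$dev" ] || return 1
    # Only the first 12 bytes are read; the dump's total size is never needed.
    # NUL bytes are stripped, so a zeroed partition compares as "".
    [ "$(head -c 12 "$dev" 2>/dev/null | tr -d '\0')" = "makedumpfile" ]
}

# Example use at service start (not run here):
#   raw_part=$(get_raw_part) && [ -n "$raw_part" ] && has_flat_dump "$raw_part" \
#       && echo "kdump: dump found on $raw_part"
```

Checking only the signature keeps the test cheap: a zeroed or never-used partition fails the comparison immediately. It does not validate the rest of the stream.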


Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-72.el6


How reproducible:
always


Steps to Reproduce:
1. configure a "raw" device in /etc/kdump.conf
2. crash system
3. vmcore not retrieved from raw partition upon reboot

  
Actual results:
dump is written but not recovered


Expected results:
dump is recovered, as it was with diskdump in the old days


Additional info:

Comment 1 Dave Maley 2010-06-17 21:12:44 UTC

It looks like RHEL5 needs this as well; however, I'll hold off on cloning until we have some feedback on the patch from FJ.

Comment 2 RHEL Program Management 2010-06-17 21:14:10 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux major release.  This request is not yet committed for
inclusion.

Comment 3 Cong Wang 2010-06-22 09:35:22 UTC

This looks strange. If you really want to collect the core from a raw partition, why not just dump to a vmcore file instead of a raw partition? Neil, is this what 'raw XXX' is used for?

Comment 4 Neil Horman 2010-06-23 11:43:54 UTC

 

raw XXX causes the kdump initramfs to dump /proc/vmcore to a raw disk w/o the aid of any file system.

The patch above appears to recover such a dump, although I'm hesitant to take it in this form, as there is no guarantee that:
1) there will be space in /var/crash for such a dump
2) we know whether we need to use makedumpfile -R on the dump.

That's part of the reason we require manual dump recovery.

It might be better to just indicate to the user on startup that a dump is waiting for recovery.
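Neil's lighter-weight alternative only tells the user that a dump is waiting. A sketch of it might look like the following. It assumes detection has already happened elsewhere, and both the function name and the default destination path are hypothetical. The suggested command is the `makedumpfile -R` invocation discussed in this bug, which rebuilds a dump file from flattened data read on standard input.

```shell
#!/bin/bash
# Hypothetical notice-only variant: report a waiting dump instead of
# recovering it automatically. Detection is assumed to have happened already.
# Usage: print_raw_dump_notice RAW_PART [DEST]   (DEST defaults to /var/crash/vmcore)
print_raw_dump_notice() {
    local raw_part="$1" dest="${2:-/var/crash/vmcore}"
    echo "kdump: a crash dump appears to be stored on raw partition $raw_part"
    echo "kdump: to recover it, run: makedumpfile -R $dest < $raw_part"
}
```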

Comment 5 RHEL Program Management 2010-07-15 15:22:05 UTC

This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release. It has
been denied for the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 6 Neil Horman 2010-07-19 11:02:24 UTC

Dave, still waiting on feedback from FJ, as per comment 4.  Thanks!

Comment 7 Martin Wilck 2010-07-20 09:19:46 UTC

(In reply to comment #4)

> raw XXX causes the kdump initramfs to dump /proc/vmcore to a raw disk w/o the
> aid of any file system.
> 
> The patch above appears to recover such a dump, although I'm hesitant to take
> it in this form, as there is no guarantee that:
> 1) there will be space in /var/crash for such a dump

You never know that; the same situation exists if you dump straight into a partition in the first place. But AFAICT many customers use DUMPLEVEL 31 anyway, so the space requirements are moderate.

mkdumprd issues a warning if the dump is configured to go to a file system partition that it considers too small to hold the dump (though IMO mkdumprd's size estimate is pretty rough). The same could be done for raw partitions.
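One way to extend such a warning to raw targets is to compare the partition size with physical memory. The sketch below is hypothetical and is not mkdumprd's actual estimate. With compression and a dump level like 31, the real dump is usually much smaller than RAM, so the check only warns and never blocks.

```shell
#!/bin/bash
# Hypothetical size check for a raw dump target. Sizes are in bytes.
# Total RAM is a worst-case bound (uncompressed, unfiltered dump); compressed
# or filtered dumps are usually much smaller, so this only prints a warning.
check_raw_size() {
    local part_bytes="$1" mem_bytes="$2"
    if [ "$part_bytes" -lt "$mem_bytes" ]; then
        echo "Warning: raw partition ($part_bytes bytes) is smaller than RAM ($mem_bytes bytes); the dump may not fit" >&2
        return 1
    fi
    return 0
}

# On a live system the inputs could come from (not run here):
#   part_bytes=$(blockdev --getsize64 /dev/sda3)
#   mem_bytes=$(awk '/^MemTotal:/ { print $2 * 1024 }' /proc/meminfo)
```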

> 2) there is no guarantee that we know to need to use makedumpfile -R on the
> dump.

I don't understand what you mean here. To my understanding, makedumpfile -R will be required because mkdumprd uses either dd or makedumpfile -F, which both produce "flat" format.
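The automatic recovery discussed here can be sketched as follows. It assumes the partition holds flattened data, as argued above, and relies on `makedumpfile -R`, which reads flattened data on standard input and writes a regular dump file. The date-stamped directory mirrors the "Dump saved to /var/crash/2011-03-18-05:46/vmcore" line in comment 16. The function name and the header-wiping step are assumptions, not taken from the attached patch.

```shell
#!/bin/bash
# Hypothetical recovery of a flattened dump from a raw partition.
# Usage: recover_raw_dump RAW_PART [CRASH_DIR]   (CRASH_DIR defaults to /var/crash)
recover_raw_dump() {
    local raw_part="$1" crash_dir="${2:-/var/crash}"
    local coredir
    coredir="$crash_dir/$(date +%Y-%m-%d-%H:%M)"
    mkdir -p "$coredir" || return 1

    # makedumpfile -R rebuilds a regular dump file from flattened data on stdin.
    # Assumption: the flattened stream records each chunk's offset and size,
    # so the caller never has to know the dump's total size.
    if ! makedumpfile -R "$coredir/vmcore" < "$raw_part"; then
        rm -rf "$coredir"
        return 1
    fi
    echo "Dump saved to $coredir/vmcore"

    # Assumption: zero the first sector so the same dump is not recovered again
    # on the next boot. conv=notrunc keeps dd from truncating a regular file.
    dd if=/dev/zero of="$raw_part" bs=512 count=1 conv=notrunc 2>/dev/null
}
```

Wiping the header is a design choice. Without it, every subsequent `service kdump start` would find the same signature and copy the same dump again.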

> Thats part of the reason we require by hand dump recovery
> 
> it might be better to just indicate to the user that a dump is waiting for
> recovery on startup    

That should be very clearly documented and indicated; otherwise the risk of losing a dump in a critical situation is very high.

Raw dumping is desirable in many environments because the chance of failure is lower than with any other dump target. Besides, it behaves like diskdump in RHEL3/RHEL4, and many users like this behavior.

Comment 8 Neil Horman 2010-07-20 12:02:50 UTC

fine.

Comment 9 Neil Horman 2010-08-02 12:44:36 UTC

Lowering priority/severity, since this has not been accepted for 6.0 and isn't currently causing any real failures.

Comment 11 Martin Wilck 2010-10-25 13:46:51 UTC

Changing into a 6.1 feature request.

Comment 12 Cong Wang 2010-11-09 05:10:49 UTC

So, besides this patch, we also need to:

1) warn if the partition is too small
2) document that a dump is waiting for recovery on startup in this case.

Am I missing anything?

Comment 13 Martin Wilck 2010-11-09 09:29:54 UTC

With dump compression, how do you know if the partition is too small? With the terabyte memory era approaching, requiring the dump partition to be the same size as physical memory is getting unrealistic.

Comment 14 Neil Horman 2010-11-09 13:50:18 UTC

Amerigo, honestly, I think all we need to do here is check whether a core exists and try to recover it, as the attached patch does.  Let's not try to get too fancy with it.

Comment 16 Chao Ye 2011-03-18 09:56:16 UTC

Tested with latest build:
=====================================================
[root@ibm-js22-07 ~]# rpm -q kernel kexec-tools
kernel-2.6.32-122.el6.ppc64
kexec-tools-2.0.0-171.el6.ppc64
[root@ibm-js22-07 ~]# tail /etc/kdump.conf 
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell

raw /dev/sda3
core_collector makedumpfile -c --message-level 1 -d 31
default shell
[root@ibm-js22-07 ~]# touch /etc/kdump.conf 
[root@ibm-js22-07 ~]# service kdump restart
Stopping kdump:[  OK  ]
Detected change(s) the following file(s):

  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.32-122.el6.ppc64kdump.img
Starting kdump:[  OK  ]
[root@ibm-js22-07 ~]# echo c > /proc/sysrq-trigger
--------------------------------------------------------------------------------------
mdadm: No arrays found in config file or automatically
Free memory/Total memory (free %): 150208 / 231296 ( 64.9419 )
Scanning logical volumes
  Reading all physical volumes.  This may take a while...
  Found volume group "vg_ibmjs2207" using metadata type lvm2
Activating logical volumes
  2 logical volume(s) in volume group "vg_ibmjs2207" now active
Free memory/Total memory (free %): 149312 / 231296 ( 64.5545 )
Saving to partition /dev/sda3
Free memory/Total memory (free %): 149312 / 231296 ( 64.5545 )
Copying data                       : [100 %] 
Saving core complete
Restarting system.
......

Starting RPC idmapd: [  OK  ]
Dump saved to /var/crash/2011-03-18-05:46/vmcore
Starting kdump:[  OK  ]
......
[root@ibm-js22-07 ~]# ls -lsh /var/crash/2011-03-18-05\:46/
total 28M
28M -rw-------. 1 root root 28M Mar 18 05:46 vmcore

Change status to VERIFIED.

Comment 17 errata-xmlrpc 2011-05-19 14:15:15 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0736.html
