788678 – mkdumprd doesn't detect /var mounted on separate partition and kdump kernel fails

Summary: mkdumprd doesn't detect /var mounted on separate partition and kdump kernel fails

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kexec-tools
Sub Component:
Version:	5.5
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Cong Wang
QA Contact:	Xu Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2012-02-08 18:46 UTC by Matthew Whitehead
Modified:	2018-12-02 16:11 UTC
CC List:	8 users
Fixed In Version:	kexec-tools-1.102pre-157.el5
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-01-08 04:09:08 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
Patch for bug 809983 (1.00 KB, patch) 2012-06-22 04:02 UTC, Cong Wang
Proposed Patch (4.96 KB, patch) 2012-06-22 10:27 UTC, Cong Wang
Proposed Patch v2 (15.67 KB, patch) 2012-06-25 06:36 UTC, Cong Wang
Proposed Patch v3 (15.53 KB, patch) 2012-06-25 09:44 UTC, Cong Wang

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2013:0012	0	normal	SHIPPED_LIVE	kexec-tools bug fix and enhancement update	2013-01-08 08:38:45 UTC

Description Matthew Whitehead 2012-02-08 18:46:37 UTC

Description of problem: 

On our system /var is a separate partition. Under RHEL6, the mkdumprd utility correctly detects this on the live system (without needing entries in /etc/kdump.conf) and builds a kdump initrd that mounts the /var partition and dumps to /crash on that partition (that is, /var/crash).

RHEL5 does not do this. The kdump initrd fails to save the core file because it is incorrectly looking for /var/crash on the root filesystem, and it isn't there. It eventually panics the kdump kernel.

Version-Release number of selected component (if applicable): RHEL5.5


How reproducible: 100%


Steps to Reproduce:
1. mkdumprd
2. sysrq-c
3. observe the kdump fail
  
Actual results: crash dump is not obtained.


Expected results: crash dump should be obtained.


Additional info:

Comment 1 Cong Wang 2012-06-21 06:39:06 UTC

Hi, Matthew,

This could be a dup of Bug 809983. Which filesystem is your /var using? Please show me the output of 'mount'.

Thanks!

Comment 2 Eric Hagberg 2012-06-21 11:32:29 UTC

Can't read that BZ, but the fs is ext3.

np30c3n1.one-nyp.ms.com /var/user/hagberg 2$ mount|grep "/var "
/dev/mapper/v1-varvol on /var type ext3 (rw)
np30c3n1.one-nyp.ms.com /var/user/hagberg 3$

Comment 3 Cong Wang 2012-06-22 04:00:23 UTC

Hi, Eric,

Bug 809983 is basically a bug where we missed the ext* modules when a non-root partition is used as a dump target, so it could be a dup of this one.

So, your /var is ext3, but rootfs is a different ext* filesystem?

Thanks.

Comment 4 Cong Wang 2012-06-22 04:02:15 UTC

Created attachment 593644 [details]
Patch for bug 809983

This is the patch to fix bug 809983. Could you please test whether it fixes this problem as well?

Comment 5 Eric Hagberg 2012-06-22 04:05:47 UTC

/var and / are both ext3. The problem is that mkdumprd doesn't bother to figure out that /var could be on a separate partition, so the init script just blindly mounts /, and tries to copy the core into /var/crash/... and since /var/crash doesn't exist, it fails.

The patch won't fix this.
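
For illustration only, the missing detection step described above could be sketched as a longest-prefix match over the mount table. This is not the mkdumprd code and not any of the attached patches; the function name find_mount_for_path and its optional mounts-file argument are hypothetical, and the sketch assumes the usual /proc/mounts field order (device, mount point, filesystem type).

```shell
#!/bin/bash
# Hypothetical helper (not part of mkdumprd): print "device mountpoint fstype"
# for the filesystem that holds the given path. The mount table is read from
# the file given as the second argument, defaulting to /proc/mounts.
find_mount_for_path() {
    local target=$1 mounts=${2:-/proc/mounts}
    local dev mnt fstype rest best_dev= best_mnt= best_type=
    while read -r dev mnt fstype rest; do
        # Only consider entries backed by a device node (skips rootfs, proc, ...)
        case $dev in
            /dev/*) ;;
            *) continue ;;
        esac
        # A mount point matches if it is "/", equals the path, or is a parent
        # directory of the path
        if [ "$mnt" = "/" ] || [ "$target" = "$mnt" ] || [ "${target#"$mnt"/}" != "$target" ]; then
            # Keep the longest matching mount point seen so far
            if [ ${#mnt} -gt ${#best_mnt} ]; then
                best_dev=$dev best_mnt=$mnt best_type=$fstype
            fi
        fi
    done < "$mounts"
    [ -n "$best_mnt" ] && echo "$best_dev $best_mnt $best_type"
}
```

On a system laid out like the one in this report, calling find_mount_for_path /var/crash would be expected to report the /var volume rather than the root device, which is the information the generated init script needs in order to mount the right filesystem.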

Comment 6 Cong Wang 2012-06-22 08:38:46 UTC

Yeah, I see. Then the problem is harder and more serious than Bug 809983.

Comment 7 Cong Wang 2012-06-22 10:04:20 UTC

Hmm, after a second thought, did you put the block device mounted on /var into your /etc/kdump.conf? Something like:

ext3 /dev/sdbX  #the device mounted on /var
path crash  #relative path inside /var

? Please share your kdump.conf if possible.

Thanks!

Comment 8 Cong Wang 2012-06-22 10:06:27 UTC

Errr, in your case it should be:

ext3 /dev/mapper/v1-varvol
path crash

in /etc/kdump.conf.

Comment 9 Eric Hagberg 2012-06-22 10:08:15 UTC

The point is to _not_ touch the default kdump.conf, and mkdumprd should just work, like it does in RHEL6.

If I do put the ext3 and path directives into kdump.conf, then of course things work fine, but it shouldn't be needed for the stock case where you just want to dump to /var/crash on your local filesystem.
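
As a rough sketch of the manual workaround, the two kdump.conf directives quoted in comments 7 and 8 can be derived from the filesystem type, device, and mount point of the volume that holds the dump directory. The helper name kdump_directives is hypothetical; it follows the "path relative to the mount point" form used in those comments and only prints the lines, it does not edit /etc/kdump.conf.

```shell
#!/bin/bash
# Hypothetical helper: print the two kdump.conf directives for a dump
# directory that lives on a separately mounted filesystem.
# Arguments: fstype device mountpoint dumpdir
kdump_directives() {
    local fstype=$1 dev=$2 mnt=$3 dumpdir=$4 rel
    # Make the dump directory relative to the mount point of its filesystem
    rel=${dumpdir#"$mnt"}
    rel=${rel#/}
    printf '%s %s\n' "$fstype" "$dev"
    printf 'path %s\n' "$rel"
}

# Example with the values from this report
kdump_directives ext3 /dev/mapper/v1-varvol /var /var/crash
```

The point of the bug is that this step should not be needed for the stock configuration; the sketch only makes explicit what the workaround encodes.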

Comment 10 Cong Wang 2012-06-22 10:18:13 UTC

Yeah... I saw how RHEL6 handles this, will try to backport it to RHEL5.
Thanks!

Comment 11 Cong Wang 2012-06-22 10:27:58 UTC

Created attachment 593683 [details]
Proposed Patch

This is an untested patch. Could you please help test it? I can build an rpm for you if you need one.

Thanks!

Comment 12 Eric Hagberg 2012-06-22 12:21:07 UTC

Still fails the same way. I don't see anything in the kdump initrd's generated init script that makes any attempt to mount /var (which is on an lvm partition, by the way).

Comment 13 Cong Wang 2012-06-25 05:53:07 UTC

I see, I still missed one part... :( Will update the patch.

Comment 14 Cong Wang 2012-06-25 06:36:36 UTC

Created attachment 594107 [details]
Proposed Patch v2

Updated patch

Comment 15 Eric Hagberg 2012-06-25 09:12:05 UTC

Still failing:

Creating block device ram9
Saving to the local filesystem UUID=5068996b-9a6b-4160-966b-7f94011d05dd
findfs: Unable to resolve 'UUID=5068996b-9a6b-4160-966b-7f94011d05dd'
BusyBox v1.2.0 (2009.07.02-14:09+0000) multi-call binary

No help available.

mount: Can't find /mnt in /etc/fstab
Attempting to enter user-space to capture vmcore
Creating root device.
Checking root filesystem.
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/: recovering journal
/: clean, 102501/750720 files, 693901/1501440 blocks
Mounting root filesystem.
Trying mount -t ext4 /dev/cciss/c0d0p2 /sysroot
kjournald starting.  Commit interval 5 secondst

EXT3 FS on cciss/c0d0p2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Using ext3 on root filesystem
Switching to new root and running init.
SELinux:  Disabled at runtime.
type=1404 audit(1340615410.722:2): selinux=0 auid=4294967295 ses=4294967295
INIT: version 2.86 booting
                Welcome to Red Hat Enterprise Linux Server
                Press 'I' to enter interactive startup.
Setting clock  (utc): Mon Jun 25 05:10:13 EDT 2012 [  OK  ]
Starting udev: Kernel panic - not syncing: Out of memory and no killable processes...

Comment 16 Cong Wang 2012-06-25 09:44:49 UTC

Created attachment 594144 [details]
Proposed Patch v3

Ok, let's just remove the UUID conversion code.
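
The contents of the patch are not shown in this report, so the following is only a sketch of the general idea, not the actual change: pass a plain device path through untouched, and only call findfs for UUID= or LABEL= specs, failing loudly when the lookup does not resolve (as in the comment 15 log). The function name resolve_dump_target is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn a dump target spec into a device path.
# Plain device paths are returned unchanged; UUID=/LABEL= specs are
# looked up with findfs, which may fail inside a minimal initrd.
resolve_dump_target() {
    local spec=$1 dev
    case $spec in
        UUID=*|LABEL=*)
            # findfs prints the device path on success and exits non-zero otherwise
            if dev=$(findfs "$spec" 2>/dev/null) && [ -n "$dev" ]; then
                echo "$dev"
            else
                echo "cannot resolve $spec" >&2
                return 1
            fi
            ;;
        *)
            echo "$spec"
            ;;
    esac
}
```

Keeping the device path (for example /dev/mapper/v1-varvol) avoids depending on UUID lookup being available in the kdump environment at all.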

Comment 17 Eric Hagberg 2012-06-25 14:07:49 UTC

Yep - it works now!

Comment 18 Eric Hagberg 2012-06-25 14:32:48 UTC

... almost. I'm pretty sure that the RHEL6 default mkdumprd uses makedumpfile by default so it isn't just using "cp" to create the vmcore file.

The currently-patched version appears to just use "cp" instead.

Comment 19 Cong Wang 2012-06-26 02:25:07 UTC

(In reply to comment #18)
> ... almost. I'm pretty sure that the RHEL6 default mkdumprd uses
> makedumpfile by default so it isn't just using "cp" to create the vmcore
> file.
> 
> The currently-patched version appears to just use "cp" instead.

Yeah, this is expected, because we don't have a chance to change the default core_collector to makedumpfile on RHEL5, so "cp" is still the default one. :)

Thanks for testing!

Comment 20 RHEL Program Management 2012-06-26 11:28:58 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 27 errata-xmlrpc 2013-01-08 04:09:08 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0012.html
