456154 – mdadm usage bug in /sbin/mkdumprd may cause dumps to be lost

Summary: mdadm usage bug in /sbin/mkdumprd may cause dumps to be lost

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kexec-tools
Sub Component:
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Neil Horman
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2008-07-21 19:33 UTC by Charlotte Richardson
Modified:	2009-09-09 05:14 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	600604
Environment:
Last Closed:	2009-01-20 21:00:32 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
fix for /sbin/mkdumprd to correct minor node numbers of /dev/mdnn (439 bytes, patch) 2008-07-21 19:33 UTC, Charlotte Richardson

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2009:0105	0	normal	SHIPPED_LIVE	kexec-tools bug fix and enhancement update	2009-01-20 16:04:36 UTC

Description Charlotte Richardson 2008-07-21 19:33:47 UTC

Description of problem:
If /etc/mdadm.conf defines disk mirrors whose names end in more than one digit,
the mknod commands generated in the init script of the kdump initrd are
incorrect. This resulted in /var/crash not being accessible and in the loss of
the crash dump. The mdadm.conf file looked like:

ARRAY /dev/md0 ...
ARRAY /dev/md2 ...
ARRAY /dev/md1 ...
ARRAY /dev/md10 ...
ARRAY /dev/md11 ...
ARRAY /dev/md12 ...
ARRAY /dev/md13 ...
... up to ARRAY /dev/md37 ...
ARRAY /dev/md3 ...

where /var/crash is /dev/md3.

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-21.el5

How reproducible:
100%


Steps to Reproduce:
1. Create disk partitions as above in that order.
2. echo 'c' > /proc/sysrq-trigger
3. Observe what happens...
  
Actual results:
Either the dump is written to only one device of the /var/crash mirror
(corrupting it) or it is lost completely. The customer had the first scenario; I
got both while testing this.


Expected results:
Should work.


Additional info:
The problem is in the init script for the kdump kernel that is created by
/sbin/mkdumprd. The sed expression that extracts the minor numbers for the
mknod commands run before the mdadm -A -s command incorrectly consumes the
leading digits into piece 1 instead of piece 2 when the device name has more
than one trailing digit. In this particular case, /dev/md13 was created with
minor number 3 (as were /dev/md23 and /dev/md33, as well as the real /dev/md3,
which was /var/crash). The attached patch fixes the sed expression so that it
works for these default mdadm names by consuming only non-digits into piece 1
and all the trailing digits into piece 2. This solves the problem in the case
of default device names.
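
To illustrate the failure mode described above, here is a minimal sketch. The
two sed patterns are assumptions written for illustration only; they are not
copied from /sbin/mkdumprd or from the attached patch, and the function names
greedy_minor and fixed_minor are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: illustrative patterns only, not the actual mkdumprd expressions.

# Piece 1 is a greedy ".*", so it swallows every digit except the last one;
# piece 2 is left with a single digit.
greedy_minor() {
    printf '%s\n' "$1" | sed -e 's/^\(.*\)\([0-9][0-9]*\)$/\2/'
}

# Piece 1 may only contain non-digits, so piece 2 receives all trailing digits.
fixed_minor() {
    printf '%s\n' "$1" | sed -e 's/^\([^0-9]*\)\([0-9][0-9]*\)$/\2/'
}

# Compare both extractions for the device names mentioned in this report.
for dev in /dev/md3 /dev/md13 /dev/md23 /dev/md33; do
    echo "$dev greedy=$(greedy_minor "$dev") fixed=$(fixed_minor "$dev")"
done
```

With the greedy form, all four names collapse onto the same minor number,
which matches the collision on /dev/md3 seen here.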

The whole mechanism really ought to be rethought, however, since mdadm does not
restrict you to the default names, and there is at any rate no real need to
start up any devices other than the ones the dump is being written out to.

Because of where we are in our testing cycle here at Stratus, instead of
replacing kexec-tools-1.102pre-21.el5 with a fixed version, we are planning on
working around this problem by shipping a kdump_pre script that deletes the
erroneous /dev/mdnn devices and stops mdadm, then recreates the devices with the
correct minor numbers and restarts mdadm. (I've tested both fixes.) This also
only works for device names of the default format, though.
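
The sequence of steps in such a workaround can be sketched as a dry run that
only prints the commands. This is not the script we ship: the function name
emit_md_fixup is hypothetical, the use of mdadm --stop --scan to stop the
arrays is an assumption, and the sketch assumes default /dev/mdN names with the
md block major number 9.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# ASSUMPTIONS: default /dev/mdN names, md block major 9, and
# "mdadm --stop --scan" as the way the arrays are stopped.
emit_md_fixup() {
    conf=$1
    # Second field of every ARRAY line is the device name.
    devs=$(awk '/^ARRAY/ {print $2}' "$conf")

    # Delete the device nodes that were created with wrong minor numbers.
    for dev in $devs; do
        echo "rm -f $dev"
    done
    echo "mdadm --stop --scan"

    # Recreate each node, taking all trailing digits as the minor number.
    for dev in $devs; do
        minor=$(printf '%s\n' "$dev" | sed -e 's/^[^0-9]*\([0-9][0-9]*\)$/\1/')
        echo "mknod $dev b 9 $minor"
    done
    echo "mdadm -A -s"
}

# Example: emit_md_fixup /etc/mdadm.conf
```

Printing the commands first makes it possible to review the minor numbers
before anything is run inside the kdump environment.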

Comment 1 Charlotte Richardson 2008-07-21 19:33:47 UTC

Created attachment 312293 [details]
fix for /sbin/mkdumprd to correct minor node numbers of /dev/mdnn

Comment 2 Neil Horman 2008-07-21 20:16:21 UTC

yep, looks good, I'll apply it shortly, thanks!

Comment 3 RHEL Program Management 2008-07-21 20:40:07 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Neil Horman 2008-07-22 15:56:29 UTC

fixed in -28.el5.  Thanks!

Comment 8 errata-xmlrpc 2009-01-20 21:00:32 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0105.html
