Red Hat Bugzilla – Bug 456154
mdadm usage bug in /sbin/mkdumprd may cause dumps to be lost
Last modified: 2009-09-09 01:14:48 EDT
Description of problem:
If /ext/mdadm.conf creates disk mirrors whose names have more than one final
number, the mknod command created in the init script in the kdump initrd are
incorrect. This resulted in /var/crash not being accessible anf resulting in the
loss of the crash dump. The mdadm.conf file looked like:
ARRAY /dev/md0 ...
ARRAY /dev/md2 ...
ARRAY /dev/md1 ...
ARRAY /dev/md10 ...
ARRAY /dev/md11 ...
ARRAY /dev/md12 ...
ARRAY /dev/md13 ...
... up to ARRAY/md37 ...
ARRAY /dev/md3 ...
where /var/crash is /dev/md3.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create disk partitions as above in that order.
2. echo 'c' > /proc/sysrq-trigger
3. Observe what happens...
Either the dump is written to only one device in the /var/crash (corrupting it)
or is lost completely. Customer had the first scenario; I got both while testing
The problem is in the init script for the kdump kernel that is created by
/sbin/mkdumprd. The sed macro that is used to extract the minor numbers for the
mknod commands before the mdadm -A -s command incorrectly eats the first
trailing number into piece 1 instead of piece 2 if the device nane has more than
one trailing digit. In this particular case, /dev/md13 was created with minor
number 3 (as were /dev/md23 and /dev/md33 as well as the real /dev/md3 which was
/var/crash). The attached patch fixes the sed macro so that it will work for
these default mdadm names by consuming only non-digits into piece 1 and all the
trailing digits into piece 2. This solves the problem in the case of default
The whole mechanism really ought to be rethought, however, since you are not
restricted to using only default names by mdadm, and there is at any rate no
real need to start up any devices other than the ones the dump is being written
Because of where we are in our testing cycle here at Stratus, instead of
replacing kexec-tools-1.102pre-21.el5 with a fixed version, we are planning on
working around this problem by shipping a kdump_pre script that deletes the
erroneous /dev/mdnn devices and stops mdadm, then recreates the devices with the
correct minor numbers and restarts mdadm. (I've tested both fixes.) This also
only works for device names of the default format, though.
Created attachment 312293 [details]
fix for /sbin/mkdumprd to correct minor node numbers of /dev/mdnn
yep, looks good, I'll apply it shortly, thanks!
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
fixed in -28.el5. Thanks!
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.