Bug 456154 - mdadm usage bug in /sbin/mkdumprd may cause dumps to be lost
mdadm usage bug in /sbin/mkdumprd may cause dumps to be lost
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kexec-tools (Show other bugs)
5.2
All Linux
low Severity high
: rc
: ---
Assigned To: Neil Horman
Red Hat Kernel QE team
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-07-21 15:33 EDT by Charlotte Richardson
Modified: 2009-09-09 01:14 EDT (History)
3 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 600604 (view as bug list)
Environment:
Last Closed: 2009-01-20 16:00:32 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
fix for /sbin/mkdumprd to correct minor node numbers of /dev/mdnn (439 bytes, patch)
2008-07-21 15:33 EDT, Charlotte Richardson
no flags Details | Diff

  None (edit)
Description Charlotte Richardson 2008-07-21 15:33:47 EDT
Description of problem:
If /ext/mdadm.conf creates disk mirrors whose names have more than one final
number, the mknod command created in the init script in the kdump initrd are
incorrect. This resulted in /var/crash not being accessible anf resulting in the
loss of the crash dump. The mdadm.conf file looked like:

ARRAY /dev/md0 ...
ARRAY /dev/md2 ...
ARRAY /dev/md1 ...
ARRAY /dev/md10 ...
ARRAY /dev/md11 ...
ARRAY /dev/md12 ...
ARRAY /dev/md13 ...
... up to ARRAY/md37 ...
ARRAY /dev/md3 ...

where /var/crash is /dev/md3.

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-21.el5

How reproducible:
100%


Steps to Reproduce:
1. Create disk partitions as above in that order.
2. echo 'c' > /proc/sysrq-trigger
3. Observe what happens...
  
Actual results:
Either the dump is written to only one device in the /var/crash (corrupting it)
or is lost completely. Customer had the first scenario; I got both while testing
this.


Expected results:
Should work.


Additional info:
The problem is in the init script for the kdump kernel that is created by
/sbin/mkdumprd. The sed macro that is used to extract the minor numbers for the
mknod commands before the mdadm -A -s command incorrectly eats the first
trailing number into piece 1 instead of piece 2 if the device nane has more than
one trailing digit. In this particular case, /dev/md13 was created with minor
number 3 (as were /dev/md23 and /dev/md33 as well as the real /dev/md3 which was
/var/crash). The attached patch fixes the sed macro so that it will work for
these default mdadm names by consuming only non-digits into piece 1 and all the
trailing digits into piece 2. This solves the problem in the case of default
device names.

The whole mechanism really ought to be rethought, however, since you are not
restricted to using only default names by mdadm, and there is at any rate no
real need to start up any devices other than the ones the dump is being written
out to.

Because of where we are in our testing cycle here at Stratus, instead of
replacing kexec-tools-1.102pre-21.el5 with a fixed version, we are planning on
working around this problem by shipping a kdump_pre script that deletes the
erroneous /dev/mdnn devices and stops mdadm, then recreates the devices with the
correct minor numbers and restarts mdadm. (I've tested both fixes.) This also
only works for device names of the default format, though.
Comment 1 Charlotte Richardson 2008-07-21 15:33:47 EDT
Created attachment 312293 [details]
fix for /sbin/mkdumprd to correct minor node numbers of /dev/mdnn
Comment 2 Neil Horman 2008-07-21 16:16:21 EDT
yep, looks good, I'll apply it shortly, thanks!
Comment 3 RHEL Product and Program Management 2008-07-21 16:40:07 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 4 Neil Horman 2008-07-22 11:56:29 EDT
fixed in -28.el5.  Thanks!
Comment 8 errata-xmlrpc 2009-01-20 16:00:32 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0105.html

Note You need to log in before you can comment on or make changes to this bug.