671013 – Kdump fails after updating drive firmware.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kexec-tools
Sub Component:
Version:	6.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Cong Wang
QA Contact:	Boris Ranto
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-01-19 22:09 UTC by Stephen Cameron
Modified:	2013-09-30 02:22 UTC
CC List:	5 users
Fixed In Version:	kexec-tools-2_0_0-163_el6
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-05-19 14:15:57 UTC
Target Upstream Version:
Embargoed:

Attachments
Proposed patch (958 bytes, patch) 2011-01-26 09:42 UTC, Cong Wang
Patch to /sbin/mkdumprd script from RHEL6 to make it ignore drive firmware revisions (1.07 KB, patch) 2011-01-26 15:45 UTC, Stephen Cameron

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2011:0736	0	normal	SHIPPED_LIVE	kexec-tools bug fix update	2011-05-18 18:09:18 UTC

Description Stephen Cameron 2011-01-19 22:09:50 UTC

Description of problem:

Updating the firmware of a disk drive that kdump depends on, and then rebooting, causes subsequent kdump attempts to fail unless the kdump initrd is rebuilt manually. This is because the firmware revisions of the required drives are stored in the kdump initrd, obtained from /sys/block/sd*/device/rev.

Version-Release number of selected component (if applicable):

kexec-tools-2.0.0-145.el6.x86_64

How reproducible:

Verify kdump works. Upgrade or downgrade firmware on the disk to which the dump is captured. Reboot. Re-try the kdump. It won't recognize the disks with different firmware.

Steps to Reproduce:

1. Set up kdump on a system with SCSI, SAS, SATA, or SmartArray storage, and verify that kdump works by:

a. echo 1 > /proc/sys/kernel/sysrq
b. echo s > /proc/sysrq-trigger ; echo c > /proc/sysrq-trigger
c. watch console to see that dump occurs.

2. Upgrade or downgrade the firmware of the drive to which the dump is captured. Or, on an HP Smart Array controller, upgrade or downgrade the controller firmware (the controller firmware revision is reported as the logical drive firmware revision on Smart Arrays).

3. Reboot.

4. Attempt kdump. You will notice that kdump does not recognize the disks anymore.

Actual results:

Kdump fails to recognize disks with different firmware.

Expected results:

Kdump should not care about the firmware revision.

Additional info:

It looks like /sbin/mkdumprd stores the critical disk information in /etc/critical_disks inside the kdump initrd image. The values stored there, and compared later to recognize the disks, are the vendor, model, revision, and type, read from the files "vendor", "model", "rev" and "type" in the /sys/block/sd*/device directory. "rev" should probably be ignored.

See this section of code from /sbin/mkdumprd:

for i in \`cat /etc/critical_disks | awk '{print \$1}'\`
do
    IDSTRING=\`grep \$i /etc/critical_disks | awk '{print \$2}'\`
    COUNT=\`grep \$i /etc/critical_disks | awk '{print \$3}'\`
    found=0

    echo -n "Waiting for \$COUNT \$i-like device(s)..."
    while true
    do
        for j in \`ls /sys/block\`
        do
            DSKSTRING=""
            TMPNAME=""
            if [ ! -d /sys/block/\$j/device ]
            then
                continue
            fi
            for a in "vendor" "model" "rev" "type"
            do
                TMPNAME=\`cat /sys/block/\$j/device/\$a\`
                DSKSTRING="\$DSKSTRING \$TMPNAME"
            done
            DSKSTRING=\`echo \$DSKSTRING | sed -e's/ //g'\`
            if [ "\$DSKSTRING" == "\$IDSTRING" ]
            then
                found=\$((\$found + 1))
            fi
            if [ \$found -ge \$COUNT ]
            then
                break 2
            fi
        done
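
To illustrate why dropping "rev" helps, here is a minimal sketch of the same string-building step, written as a standalone function rather than as the code mkdumprd actually emits. The function name disk_id_string and its arguments are hypothetical; the sysfs root is a parameter only so the function can be tried against a scratch directory instead of /sys/block.

```shell
# Build an identification string for one block device from sysfs attributes.
# Hypothetical helper, not part of mkdumprd.
# Usage: disk_id_string SYSFS_ROOT DEVICE ATTR...
disk_id_string() {
    local root="$1" dev="$2" attr value id=""
    shift 2
    for attr in "$@"; do
        # A missing or unreadable attribute contributes nothing.
        value=$(cat "$root/$dev/device/$attr" 2>/dev/null)
        id="$id$value"
    done
    # Remove all whitespace, similar to the sed step in the quoted loop.
    printf '%s' "$id" | tr -d '[:space:]'
    echo
}
```

Called as "disk_id_string /sys/block sda vendor model type", the result does not change when the drive firmware is flashed, whereas the four-attribute form that includes "rev" does.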

However, to really identify the disks, the tuple "vendor, model, rev, type" seems a little weak, since, for example, all of the logical drives on an HP Smart Array will have identical vendor/model/rev/type values, so this code will not be able to distinguish one drive from another. Luckily, hpsa and cciss drivers (and most SCSI or SAS HBAs -- but not most fibre SANs) will present disks in a predetermined order most of the time (barring messing around with /proc/scsi/scsi to re-order drives with linux hotplug functionality). It is also likely that servers from any vendor will be shipped with disks which have identical vendor/model/rev/type. (For disks, the type will always be 0 anyway.)
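
A quick way to see this weakness on a given machine is to list the tuples that more than one device shares. The following is only a sketch: ambiguous_disk_ids is a hypothetical name, and the sysfs root is passed in so the function can be exercised on a test directory as well as on /sys/block.

```shell
# Print every vendor/model/rev/type string shared by more than one device.
# Hypothetical helper for illustration only.
# Usage: ambiguous_disk_ids SYSFS_ROOT
ambiguous_disk_ids() {
    local root="$1" dir attr id
    for dir in "$root"/*/device; do
        # Skip entries without a device directory (or an unmatched glob).
        [ -d "$dir" ] || continue
        id=""
        for attr in vendor model rev type; do
            id="$id$(cat "$dir/$attr" 2>/dev/null)"
        done
        # One whitespace-free ID string per device, one per line.
        printf '%s' "$id" | tr -d '[:space:]'
        echo
    done | sort | uniq -d
}
```

Any line printed by "ambiguous_disk_ids /sys/block" is an ID string that the matching loop cannot tie to a single drive.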

There probably needs to be a better way to identify the drives. Perhaps using the device identifier from SCSI Inquiry page 0x83, which should be obtainable via SG_IO (e.g. see the sg_inq program from the sg3_utils package). Ideally (I think), some unique ID should be exported via /sys (e.g. ascii representation of the device identifier from SCSI Inquiry page 0x83, a la the unique_id attribute which the hpsa driver exports for each logical drive), although it would probably be best if a similar attribute were exported by the scsi mid layer rather than by the LLDs. But these are implementation details. The gist of the complaint is that there needs to be a better way to identify drives than by vendor/model/rev/type tuple.
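
As a rough sketch of what matching on such an ID could look like, assuming each disk exposes a unique_id file under its sysfs device directory (as described above for hpsa logical drives; other drivers may not provide it). The function name find_disk_by_unique_id is hypothetical, and nothing here is taken from mkdumprd.

```shell
# Print the name of the block device whose unique_id matches WANTED.
# Assumes a per-device "unique_id" sysfs attribute exists; devices
# without one are skipped. Hypothetical helper for illustration only.
# Usage: find_disk_by_unique_id SYSFS_ROOT WANTED
find_disk_by_unique_id() {
    local root="$1" wanted="$2" file id dev
    for file in "$root"/*/device/unique_id; do
        [ -r "$file" ] || continue
        id=$(cat "$file")
        if [ "$id" = "$wanted" ]; then
            dev=${file#"$root"/}        # strip the sysfs root prefix
            printf '%s\n' "${dev%%/*}"  # keep only the device name
            return 0
        fi
    done
    return 1
}
```

With this kind of lookup the kdump initrd would store one ID per critical disk and wait for exactly that disk, regardless of firmware revision or of how many identical drives are present.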

-- steve

Comment 2 Cong Wang 2011-01-26 09:42:31 UTC

Created attachment 475355
Proposed patch

Neil suggested removing "rev" from the tuple, so could you try this patch?
Thanks!

Comment 3 Stephen Cameron 2011-01-26 14:29:56 UTC

Ok, I will give it a try.

-- steve

Comment 4 Stephen Cameron 2011-01-26 14:58:23 UTC

Have not yet tried the patch, but wanted to report what I've found so far.  When I attempted to apply the patch, it gave me some offsets (-3 and -14 lines), and there remained one instance of the "vendor" "model" "rev" "type" tuple in the mkdumprd script.  I was expecting the patch to go in clean if my mkdumprd script was the same as Neil's before applying the patch, so I began to suspect maybe I was on a beta release of RHEL6, but I double checked on another RHEL6 system which I just installed yesterday and found the same thing.

So, I'm thinking maybe Neil's patch is vs. a newer variant of the mkdumprd script than what RHEL6 shipped with?

Should I still try it?  I suspect that third instance of "rev" needs to be removed too.

-- steve

Comment 5 Stephen Cameron 2011-01-26 15:45:42 UTC

Created attachment 475422
Patch to /sbin/mkdumprd script from RHEL6 to make it ignore drive firmware revisions

Comment 6 Stephen Cameron 2011-01-26 15:46:53 UTC

So I took the liberty of making my own patch against the mkdumprd script which, to the best of my knowledge, is the one which actually ships with RHEL6. I tested it by flashing the P410i to 3.50 firmware, rebuilding the kdump initrd with the patched mkdumprd script, flashing the firmware to 3.66, rebooting, and then trying kdump without rebuilding the kdump initrd, and it seemed to work.

attachment 475422

-- steve

Comment 7 Cong Wang 2011-01-27 10:43:13 UTC

Sorry, my patch was not generated correctly; your patch is exactly what I want.
Thanks, Stephen!

Comment 14 errata-xmlrpc 2011-05-19 14:15:57 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0736.html
