Note: This bug is displayed in read-only format because
the product is no longer active in Red Hat Bugzilla.
Description: Stephen Cameron
2011-01-19 22:09:50 UTC
Description of problem:
Updating the firmware of a disk drive that kdump needs, and then rebooting, causes subsequent kdump attempts to fail unless the kdump initrd is rebuilt manually. This is because the firmware revision of each required drive, read from /sys/block/sd*/device/rev, is stored in the kdump initrd.
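A minimal sketch of why a firmware flash breaks the match, assuming the identification string is built as in the mkdumprd excerpt quoted under "Additional info" (the four sysfs attributes concatenated, with all whitespace removed). The function name disk_id_with_rev is hypothetical, and it takes the device directory (normally /sys/block/sdX/device) as an argument so it can be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: build the identification string the way the
# quoted mkdumprd loop does, i.e. vendor + model + rev + type with all
# whitespace stripped. DEVDIR stands in for /sys/block/sdX/device.
disk_id_with_rev() {
    devdir=$1
    id=""
    for attr in vendor model rev type; do
        # A missing attribute contributes nothing instead of aborting.
        val=$(cat "$devdir/$attr" 2>/dev/null)
        id="$id$val"
    done
    # Equivalent to the excerpt's 'echo $DSKSTRING | sed -e s/ //g'.
    printf '%s' "$id" | tr -d '[:space:]'
}
```

Changing only the rev file (for example from 3.50 to 3.66) changes the resulting string, so it no longer equals the value recorded when the kdump initrd was built.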
Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-145.el6.x86_64
How reproducible:
Verify that kdump works. Upgrade or downgrade the firmware on the disk to which the dump is captured. Reboot. Retry kdump. It does not recognize the disks whose firmware changed.
Steps to Reproduce:
1. Set up kdump on a system with SCSI, SAS, SATA, or Smart Array storage, and verify that kdump works:
a. echo 1 > /proc/sys/kernel/sysrq
b. echo s > /proc/sysrq-trigger ; echo c > /proc/sysrq-trigger
c. Watch the console to confirm that the dump occurs.
2. Upgrade or downgrade the firmware of the drive to which the dump is captured. Or, on an HP Smart Array controller, upgrade or downgrade the controller firmware (Smart Arrays report the controller firmware revision as the logical drive firmware revision).
3. Reboot.
4. Attempt kdump. Note that kdump no longer recognizes the disks.
Actual results:
Kdump fails to recognize disks with different firmware.
Expected results:
Kdump should not care about the firmware revision.
Additional info:
It looks like /sbin/mkdumprd stores the critical disk information in /etc/critical_disks in the kdump initrd image. The fields that are later compared to recognize the disks are vendor, model, revision, and type, read from the files "vendor", "model", "rev", and "type" in the /sys/block/sd*/device directory. "rev" should probably be ignored.
See this section of code from /sbin/mkdumprd:
    for i in \`cat /etc/critical_disks | awk '{print \$1}'\`
    do
        IDSTRING=\`grep \$i /etc/critical_disks | awk '{print \$2}'\`
        COUNT=\`grep \$i /etc/critical_disks | awk '{print \$3}'\`
        found=0
        echo -n "Waiting for \$COUNT \$i-like device(s)..."
        while true
        do
            for j in \`ls /sys/block\`
            do
                DSKSTRING=""
                TMPNAME=""
                if [ ! -d /sys/block/\$j/device ]
                then
                    continue
                fi
                for a in "vendor" "model" "rev" "type"
                do
                    TMPNAME=\`cat /sys/block/\$j/device/\$a\`
                    DSKSTRING="\$DSKSTRING \$TMPNAME"
                done
                DSKSTRING=\`echo \$DSKSTRING | sed -e's/ //g'\`
                if [ "\$DSKSTRING" == "\$IDSTRING" ]
                then
                    found=\$((\$found + 1))
                fi
                if [ \$found -ge \$COUNT ]
                then
                    break 2
                fi
            done
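The matching step of the loop above can be sketched with "rev" left out, as this report suggests. This is a hypothetical, simplified rewrite, not the actual mkdumprd fix: it makes a single pass instead of polling until COUNT devices appear, takes the /sys/block directory as a parameter so it can run against a scratch tree, and uses $(...) with quoting instead of the escaped backticks in the excerpt. Both function names are illustrative.

```shell
#!/bin/sh
# Hypothetical helper: whitespace-free vendor + model + type string for
# one device directory (normally /sys/block/sdX/device). "rev" is
# deliberately omitted so a firmware flash does not change the result.
disk_id_no_rev() {
    printf '%s%s%s' \
        "$(cat "$1/vendor" 2>/dev/null)" \
        "$(cat "$1/model" 2>/dev/null)" \
        "$(cat "$1/type" 2>/dev/null)" | tr -d '[:space:]'
}

# Print how many block devices under SYSBLOCK match IDSTRING.
count_matching_disks() {
    sysblock=$1
    idstring=$2
    found=0
    for dev in "$sysblock"/*; do
        # Skip entries with no device directory, as the original loop does.
        [ -d "$dev/device" ] || continue
        if [ "$(disk_id_no_rev "$dev/device")" = "$idstring" ]; then
            found=$((found + 1))
        fi
    done
    echo "$found"
}
```

For the comparison to survive a firmware change, the string recorded in /etc/critical_disks when the initrd is built would have to be computed the same way, without rev. Dropping rev makes the match stable; it does not make it any more specific.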
However, to really identify the disks, the "vendor, model, rev, type" tuple seems a little weak. For example, all of the logical drives on an HP Smart Array have identical vendor/model/rev/type values, so this code cannot distinguish one drive from another. Luckily, the hpsa and cciss drivers (and most SCSI or SAS HBAs, but not most Fibre Channel SANs) present disks in a predetermined order most of the time (barring re-ordering of drives through /proc/scsi/scsi with Linux hotplug functionality). It is also likely that servers from any vendor will ship with multiple disks that have identical vendor/model/rev/type values. (For disks, the type will always be 0 anyway.)
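The weakness described above is easy to demonstrate. The following hypothetical helper groups block devices by their vendor/model/type string (rev omitted, as proposed earlier in this report) and prints every string shared by more than one device. On a Smart Array with several logical drives, all of them would fall into one group. The sysfs root is a parameter so the sketch can run against a scratch tree; the function names are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: whitespace-free vendor + model + type string.
disk_id_no_rev() {
    printf '%s%s%s' \
        "$(cat "$1/vendor" 2>/dev/null)" \
        "$(cat "$1/model" 2>/dev/null)" \
        "$(cat "$1/type" 2>/dev/null)" | tr -d '[:space:]'
}

# Print "IDSTRING: dev1 dev2 ..." for every ID string that more than one
# block device under SYSBLOCK (normally /sys/block) shares.
ambiguous_ids() {
    sysblock=$1
    for dev in "$sysblock"/*; do
        [ -d "$dev/device" ] || continue
        # One "IDSTRING devname" line per device.
        printf '%s %s\n' "$(disk_id_no_rev "$dev/device")" "${dev##*/}"
    done | sort | awk '
        { names[$1] = names[$1] " " $2; count[$1]++ }
        END { for (id in count) if (count[id] > 1) print id ":" names[id] }'
}
```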
There probably needs to be a better way to identify the drives, perhaps using the device identifier from SCSI Inquiry VPD page 0x83, which should be obtainable via SG_IO (e.g., see the sg_inq program from the sg3_utils package). Ideally (I think), some unique ID should be exported via /sys (e.g., an ASCII representation of the device identifier from SCSI Inquiry page 0x83, à la the unique_id attribute that the hpsa driver exports for each logical drive), although it would probably be best if the SCSI mid layer, rather than the individual low-level drivers (LLDs), exported such an attribute. But these are implementation details. The gist of the complaint is that there needs to be a better way to identify drives than the vendor/model/rev/type tuple.
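One way to express the preferred lookup order is sketched below: use a per-device unique identifier when the driver exports one, and fall back to the vendor/model/type tuple otherwise. The function name is hypothetical, and the attribute names are assumptions: unique_id is the hpsa attribute mentioned above, and wwid is an attribute that later kernels than the one discussed here export for many SCSI devices, derived from Inquiry page 0x83. Neither is guaranteed to exist on a given kernel or driver.

```shell
#!/bin/sh
# Hypothetical helper: print a "source:value" identity for one device
# directory (normally /sys/block/sdX/device).
# ASSUMPTION: "unique_id" (hpsa logical drives) and "wwid" (later
# kernels, from Inquiry page 0x83) may or may not be present.
disk_identity() {
    devdir=$1
    for attr in unique_id wwid; do
        if [ -r "$devdir/$attr" ]; then
            # Normalise by stripping whitespace before comparing.
            val=$(tr -d '[:space:]' < "$devdir/$attr")
            if [ -n "$val" ]; then
                printf '%s:%s\n' "$attr" "$val"
                return 0
            fi
        fi
    done
    # Fallback: the weak tuple, with "rev" left out.
    printf 'tuple:%s\n' \
        "$(cat "$devdir/vendor" "$devdir/model" "$devdir/type" 2>/dev/null | tr -d '[:space:]')"
}
```

Prefixing the result with its source (unique_id:, wwid:, tuple:) keeps an identifier taken from one source from accidentally matching a string taken from another.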
-- steve
I have not yet tried the patch, but I wanted to report what I've found so far. When I attempted to apply the patch, it reported some offsets (-3 and -14 lines), and one instance of the "vendor" "model" "rev" "type" tuple remained in the mkdumprd script. I had expected the patch to apply cleanly if my mkdumprd script matched Neil's before patching, so I began to suspect that I was on a beta release of RHEL6. However, I double-checked on another RHEL6 system that I installed just yesterday and found the same thing.
So, I'm thinking maybe Neil's patch is against a newer version of the mkdumprd script than the one RHEL6 shipped with?
Should I still try it? I suspect that third instance of "rev" needs to be removed too.
-- steve
So I took the liberty of making my own patch against the mkdumprd script that, to the best of my knowledge, actually ships with RHEL6, and tested it: I flashed the P410i to 3.50 firmware, rebuilt the kdump initrd with the patched mkdumprd script, flashed the firmware to 3.66, rebooted, and tried kdump without rebuilding the kdump initrd. It seems to work.
Attachment 475422
-- steve
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2011-0736.html