Bug 671013
| Summary: | Kdump fails after updating drive firmware. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Stephen Cameron <steve.cameron> | ||||||
| Component: | kexec-tools | Assignee: | Cong Wang <amwang> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Boris Ranto <branto> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | low | ||||||||
| Version: | 6.0 | CC: | branto, coughlan, nhorman, phan, rkhan | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | kexec-tools-2_0_0-163_el6 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2011-05-19 14:15:57 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
Created attachment 475355 [details]
Proposed patch
Neil suggested removing "rev" from the tuple, so could you try this patch?
Thanks!
Ok, I will give it a try.
-- steve

Have not yet tried the patch, but wanted to report what I've found so far. When I attempted to apply the patch, it gave me some offsets (-3 and -14 lines), and there remained one instance of the "vendor" "model" "rev" "type" tuple in the mkdumprd script. I was expecting the patch to go in clean, if my mkdumprd script was the same as Neil's before applying the patch, so I began to suspect maybe I was on a beta release of RHEL6, but I double checked on another RHEL6 system which I just installed yesterday and found the same thing.

So, I'm thinking maybe Neil's patch is vs. a newer variant of the mkdumprd script than what RHEL6 shipped with? Should I still try it? I suspect that third instance of "rev" needs to be removed too.
-- steve

Created attachment 475422 [details]
Patch to /sbin/mkdumprd script from RHEL6 to make it ignore drive firmware revisions
So I took the liberty of making my own patch against the mkdumprd script which, to the best of my knowledge, is the one that actually ships with RHEL6. I tested it as follows: flashed the P410i to 3.50 firmware, rebuilt the kdump initrd with the patched mkdumprd script, flashed the firmware to 3.66, rebooted, and tried kdump without rebuilding the kdump initrd. It seems to work.
attachment 475422 [details]
-- steve
Sorry that my patch was not generated correctly; your patch is exactly what I want. Thanks, Stephen!

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0736.html
Description of problem:
Updating the firmware of a disk drive which is required for kdump to work, and then rebooting, will cause subsequent kdump attempts to fail unless the kdump initrd is rebuilt manually. This is because the firmware revisions of the necessary drives are stored in the kdump initrd, obtained from /sys/block/sd*/device/rev.

Version-Release number of selected component (if applicable):
kexec-tools-2.0.0-145.el6.x86_64

How reproducible:
Verify kdump works. Upgrade or downgrade firmware on the disk to which the dump is captured. Reboot. Re-try the kdump. It won't recognize the disks with different firmware.

Steps to Reproduce:
1. Set up kdump on a system with SCSI, SAS, SATA, or SmartArray, and verify that kdump works by:
   a. echo 1 > /proc/sys/kernel/sysrq
   b. echo s > /proc/sysrq-trigger ; echo c > /proc/sysrq-trigger
   c. watch the console to see that the dump occurs.
2. Upgrade or downgrade the firmware of the drive to which the dump is captured. Or, on an HP Smart Array controller, upgrade or downgrade the controller firmware (the controller firmware revision is reported as the logical drive firmware revision on Smart Arrays).
3. Reboot.
4. Attempt kdump. You will notice that kdump does not recognize the disks anymore.

Actual results:
Kdump fails to recognize disks with different firmware.

Expected results:
Kdump should not care about the firmware revision.

Additional info:
It looks like /sbin/mkdumprd stores the critical disk information in /etc/critical_disks in the kdump initrd image. What is stored there, and compared later to recognize the disks, is the Vendor, Model, Revision, and Type, obtained from the files "vendor", "model", "rev" and "type" in the /sys/block/sd*/device directory. "rev" should probably be ignored.
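To illustrate the suggestion that "rev" be ignored, here is a minimal sketch of building an identity string from the sysfs attributes while leaving the firmware revision out. It is not the attached patch and not code from mkdumprd; the names `SYSFS_BLOCK` and `disk_idstring` are hypothetical, chosen only for this example, and the sysfs root is a variable so the behavior can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch only: SYSFS_BLOCK and disk_idstring are illustrative names,
# not part of mkdumprd or kexec-tools.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

disk_idstring() {
    # $1 = block device name, e.g. sda
    dev_dir="$SYSFS_BLOCK/$1/device"
    [ -d "$dev_dir" ] || return 1
    id=""
    # "rev" is deliberately left out so a firmware change does not alter the string
    for attr in vendor model type; do
        [ -r "$dev_dir/$attr" ] || continue
        id="$id$(cat "$dev_dir/$attr")"
    done
    # Remove all whitespace, similar in spirit to the sed call in the original script
    printf '%s' "$id" | tr -d '[:space:]'
    echo
}

# Print "<device> <identity string>" for every disk that has a device/ directory
for path in "$SYSFS_BLOCK"/*; do
    name=${path##*/}
    if ids=$(disk_idstring "$name"); then
        echo "$name $ids"
    fi
done
```

With a string built this way, flashing new firmware changes only the "rev" file, so the stored and observed strings still match after a reboot. It does nothing to make otherwise identical drives distinguishable.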
See this section of code from /sbin/mkdumprd:

    for i in \`cat /etc/critical_disks | awk '{print \$1}'\`
    do
        IDSTRING=\`grep \$i /etc/critical_disks | awk '{print \$2}'\`
        COUNT=\`grep \$i /etc/critical_disks | awk '{print \$3}'\`
        found=0
        echo -n "Waiting for \$COUNT \$i-like device(s)..."
        while true
        do
            for j in \`ls /sys/block\`
            do
                DSKSTRING=""
                TMPNAME=""
                if [ ! -d /sys/block/\$j/device ]
                then
                    continue
                fi
                for a in "vendor" "model" "rev" "type"
                do
                    TMPNAME=\`cat /sys/block/\$j/device/\$a\`
                    DSKSTRING="\$DSKSTRING \$TMPNAME"
                done
                DSKSTRING=\`echo \$DSKSTRING | sed -e's/ //g'\`
                if [ "\$DSKSTRING" == "\$IDSTRING" ]
                then
                    found=\$((\$found + 1))
                fi
                if [ \$found -ge \$COUNT ]
                then
                    break 2
                fi
            done

However, to really identify the disks, the tuple "vendor, model, rev, type" seems a little weak. For example, all of the logical drives on an HP Smart Array will have identical vendor/model/rev/type values, so this code will not be able to distinguish one drive from another. Luckily, hpsa and cciss drivers (and most SCSI or SAS HBAs -- but not most fibre SANs) will present disks in a predetermined order most of the time (barring messing around with /proc/scsi/scsi to re-order drives with linux hotplug functionality). It is also likely that servers from any vendor will be shipped with disks which have identical vendor/model/rev/type. (For disks, the type will always be 0 anyway.)

There probably needs to be a better way to identify the drives, perhaps using the device identifier from SCSI Inquiry page 0x83, which should be obtainable via SG_IO (e.g. see the sg_inq program from the sg3_utils package). Ideally (I think), some unique ID should be exported via /sys (e.g. an ASCII representation of the device identifier from SCSI Inquiry page 0x83, a la the unique_id attribute which the hpsa driver exports for each logical drive), although it would probably be best if a similar attribute were exported by the SCSI mid layer rather than by the LLDs. But these are implementation details.
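The ambiguity described above can be checked on a given machine with a short script. The following is a rough sketch, not part of mkdumprd: it builds the same vendor/model/rev/type string for every block device and reports any string shared by more than one device. The names `SYSFS_BLOCK`, `list_tuples` and `ambiguous_tuples` are made up for this example.

```shell
#!/bin/sh
# Sketch only: SYSFS_BLOCK, list_tuples and ambiguous_tuples are
# illustrative names, not part of mkdumprd or kexec-tools.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

list_tuples() {
    # Print "<tuple> <device>" for every block device with a device/ directory
    for path in "$SYSFS_BLOCK"/*; do
        [ -d "$path/device" ] || continue
        tuple=""
        for attr in vendor model rev type; do
            [ -r "$path/device/$attr" ] || continue
            tuple="$tuple$(cat "$path/device/$attr")"
        done
        # Strip whitespace so the tuple is a single awk field
        tuple=$(printf '%s' "$tuple" | tr -d '[:space:]')
        [ -n "$tuple" ] || continue
        printf '%s %s\n' "$tuple" "${path##*/}"
    done
}

ambiguous_tuples() {
    # Print each tuple that occurs more than once, followed by its count
    list_tuples | awk '{ count[$1]++ }
        END { for (t in count) if (count[t] > 1) print t, count[t] }'
}

ambiguous_tuples
```

Any line of output means that at least two devices cannot be told apart by this tuple alone, which is the case expected for multiple logical drives on one Smart Array controller.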
The gist of the complaint is that there needs to be a better way to identify drives than by vendor/model/rev/type tuple. -- steve