Bug 500741

Summary: [Stratus 5.5 bug] "critical_disks" makes kdump unreliable
Product: Red Hat Enterprise Linux 5
Reporter: Robert N. Evans <robert.evans>
Component: kexec-tools
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: high
Version: 5.3
CC: andriusb, balkov, bzeranski, charlotte.richardson, chas.horvath, cward, jparadis, phan, qcai, richard.johnson
Target Milestone: rc
Keywords: OtherQA
Target Release: 5.5
Hardware: All
OS: Linux
Doc Type: Bug Fix
Doc Text:
kdump waits for all devices in its critical_disks list to be available before it performs a dump. Previously, there was no limit to the time that kdump would wait for a device to respond. Therefore, the dump might never be performed. kexec-tools now has a disk_timeout parameter that limits how long kdump will wait for storage to respond. This ensures that the dump will take place.
Clone Of:
: 600583 (view as bug list) Environment:
Last Closed: 2010-03-30 07:47:10 UTC
Bug Blocks: 533941    
Attachments:
  patch to add disk_timeout config option (flags: none)
  Revised patch (flags: none)

Description Robert N. Evans 2009-05-13 21:16:51 UTC
Description of problem:
If one of the disk partitions that compose a RAID1 device is not present, kdump hangs forever and does not capture a dump.

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-56.el5

How reproducible:
Happens every time if preconditions are met.

Steps to Reproduce:
1. Install RHEL5.3 using software RAID.  A typical partition scheme is:
   250MB RAID1 /dev/md0 composed of /dev/sda1, /dev/sdb1 mounted on /boot
     4GB RAID1 /dev/md1 composed of /dev/sda2, /dev/sdb2 used for swap
    32GB RAID1 /dev/md2 composed of /dev/sda3, /dev/sdb3 mounted on /
    28GB RAID1 /dev/md3 composed of /dev/sda5, /dev/sdb5 mounted on /var/crash

2. Configure kdump.conf with the following.
   ext3 /dev/md3
   path /
   kdump_pre /opt/ft/sbin/ft_kdump_pre
   kdump_post /opt/ft/sbin/ft_dump_mover
   core_collector makedumpfile -d 31
   extra_bins /bin/bash /usr/bin/ipmitool /opt/ft/sbin/ft_set_kdump_wdt /opt/ft/sbin/ft_mdadm_fixer /opt/ft/lib/ipmi_si.ko
   extra_modules ipmi_devintf

3. Take one of the disks offline; for example, remove /dev/sdb.

4. Panic the kernel.
  
Actual results:
The system hangs forever in the init process of the capture kernel at "Waiting for required block device discovery".

Expected results:
As long as one of the devices that compose /dev/md3 is online, the dump should get written.  The kernel can boot and run applications with only one underlying disk present so RAID increases reliability.  But for kdump, if both disks must be present, the reliability is less than that for a single disk.
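
The reliability argument above can be made concrete with a little arithmetic. This is only a sketch: it assumes the two RAID1 members fail independently, and the per-disk availability of 0.99 is a made-up figure for illustration, not a measured one. The function name dump_odds is hypothetical.

```shell
#!/bin/sh
# Sketch only: compares the chance that a dump can be written under three policies.
# Assumes independent disk failures; the 0.99 availability is an illustrative value.
dump_odds() {
    awk -v p="$1" 'BEGIN {
        # kdump waits for every member: both disks must be present.
        printf "all members required: %.4f\n", p * p
        # Baseline: a plain single-disk dump target.
        printf "single disk:          %.4f\n", p
        # Desired behaviour: one surviving member is enough.
        printf "any member suffices:  %.4f\n", 1 - (1 - p) * (1 - p)
    }'
}

dump_odds 0.99
```

With p = 0.99 this prints 0.9801, 0.9900 and 0.9999 respectively, so requiring both members is worse than a single disk, while accepting either member is better.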

Additional info:
It is possible to avoid this problem by modifying /sbin/mkdumprd to ignore sd devices in the critical_disks list, with this change:
 # as soon as the driver loads
-egrep -v '(^cciss|^md)' $TMPDISKLIST > $MNTIMAGE/etc/critical_disks
+egrep -v '(^cciss|^md|^sd)' $TMPDISKLIST > $MNTIMAGE/etc/critical_disks
 cp $MNTIMAGE/etc/critical_disks $TMPDISKLIST

But it seems preferable to provide a configuration option to prevent the problem.
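
To see what the modified filter in the diff above does, here is a small sketch that applies both patterns to a sample device list. The device names are hypothetical and the exact contents of $TMPDISKLIST are an assumption; grep -E is used as the equivalent of egrep. The function name filter_critical_disks is made up for this example.

```shell
#!/bin/sh
# Sketch only: reads device names on stdin and prints the ones that would
# remain in critical_disks under the original or the modified pattern.
filter_critical_disks() {
    case $1 in
        original) pattern='(^cciss|^md)' ;;
        modified) pattern='(^cciss|^md|^sd)' ;;
        *) echo "usage: filter_critical_disks original|modified" >&2; return 2 ;;
    esac
    # -v drops every line that matches the pattern.
    grep -Ev "$pattern"
}

# Hypothetical sample list: two SCSI disks, one md array, one IDE disk.
printf '%s\n' sda sdb md3 hda | filter_critical_disks original
printf '%s\n' sda sdb md3 hda | filter_critical_disks modified
```

The original pattern keeps sda, sdb and hda in the list; the modified one keeps only hda, so a missing sd device can no longer block the dump, at the cost of never waiting for sd devices at all.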

An option to allow setting a maximum time to wait for critical disks would probably be a good approach.  Depending upon the type of disk, it might not make sense to wait more than a few seconds.
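
A minimal sketch of the kind of bounded wait proposed here. It is not the code from the patches attached to this bug. The function name wait_for_disks, its arguments, and the use of /sys/block to detect devices are all assumptions made for illustration, and it is written for a POSIX sh rather than checked against the shell used in the kdump initrd.

```shell
#!/bin/sh
# Sketch only: wait until every device named in a list file shows up, but give
# up once max_wait seconds have elapsed.
#   $1  file with one device name per line (e.g. sda)
#   $2  maximum seconds to wait; an empty string means wait without limit
#   $3  directory to look in (assumed to be /sys/block on a real system)
wait_for_disks() {
    list_file=$1
    max_wait=$2
    sys_block=${3:-/sys/block}
    waited=0
    while :; do
        missing=0
        while read -r dev; do
            [ -z "$dev" ] && continue
            [ -e "$sys_block/$dev" ] || missing=$((missing + 1))
        done < "$list_file"
        # All devices present: proceed with the dump.
        [ "$missing" -eq 0 ] && return 0
        # Limit reached: stop waiting and let the caller carry on.
        if [ -n "$max_wait" ] && [ "$waited" -ge "$max_wait" ]; then
            echo "Timed out with $missing device(s) missing; continuing" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

A caller might run `wait_for_disks /etc/critical_disks 5` and then attempt the dump regardless of the return value, since a degraded RAID1 can still be written to with one member present.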

Comment 1 Neil Horman 2009-05-14 01:02:21 UTC
I understand your argument in principle, but I just can't do it.  We need to be able to wait for all the disks to be present so that we can guarantee that the dump is captured at all.  We can't ignore sd drives, since hard disk drives are the entire reason that the critical disk list was created.  In the converse situation, if we only have SCSI disks, the above change renders the critical disk list useless, and if a drive is broken, all sorts of unexpected errors can occur.

The option to create a timeout sounds a bit better.  I'll attach a patch for you to try shortly.

Comment 2 Neil Horman 2009-05-14 01:04:05 UTC
Created attachment 343898 [details]
patch to add disk_timeout config option

Here it is.  This lets you set a disk_timeout option to limit how long we wait for critical disks.  I've not tested it yet, but let me know how it works for you.

Comment 4 Robert N. Evans 2009-05-15 16:07:52 UTC
Created attachment 344194 [details]
Revised patch

I have revised the patch to be compatible with msh.  I also added handling for "disk_timeout" not being configured; in this case there is no limit to the wait for critical disks.

I verified this worked as expected with these test cases:
- missing critical disks and disk_timeout configured to 0
- missing critical disks and disk_timeout configured to 7
- missing critical disks and disk_timeout not configured
- critical disks all present and disk_timeout not configured
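
The three disk_timeout configurations exercised above can be told apart with a sketch like the following. It is not the code from the revised patch. The helper names get_disk_timeout and describe_wait are hypothetical, and the sketch assumes that disk_timeout is written in the same "option value" form as the other kdump.conf lines in the description.

```shell
#!/bin/sh
# Sketch only: print the value of the last "disk_timeout" line in a
# kdump.conf-style file, or nothing when the option is absent.
get_disk_timeout() {
    awk '$1 == "disk_timeout" { value = $2 } END { if (value != "") print value }' "$1"
}

# Map the configured value onto the behaviour described in this comment.
describe_wait() {
    timeout=$(get_disk_timeout "$1")
    if [ -z "$timeout" ]; then
        echo "no limit"                      # option not configured
    elif [ "$timeout" -eq 0 ]; then
        echo "do not wait"                   # disk_timeout 0
    else
        echo "wait up to $timeout seconds"   # e.g. disk_timeout 7
    fi
}

# Example with a temporary, hypothetical configuration file.
conf=$(mktemp)
printf 'ext3 /dev/md3\npath /\ndisk_timeout 7\n' > "$conf"
describe_wait "$conf"
rm -f "$conf"
```

Treating an absent option as "no limit" keeps the earlier behaviour for systems that do not set disk_timeout.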

Please consider taking this change for kexec-tools.

Comment 5 Neil Horman 2009-05-15 16:49:56 UTC
yeah, that looks good.  I'll commit this to whichever release it gets approved for.  Thanks!

Comment 8 Andrius Benokraitis 2009-05-19 14:00:40 UTC
Stratus: Would this cause any heartburn if this was proposed for RHEL 5.5?

Comment 9 Robert N. Evans 2009-05-19 17:37:35 UTC
We can work around this problem.  Although it would be nice to have a fix earlier, it is great to get this fix in RHEL 5.5.

Comment 10 Andrius Benokraitis 2009-05-19 17:53:47 UTC
OK - will do. Proposing officially for RHEL 5.5. Thanks for the feedback.

Comment 11 Neil Horman 2009-09-28 13:06:03 UTC
fixed in -79.el5

Comment 12 Robert N. Evans 2009-09-29 23:12:41 UTC
Neil -
 Can you make the new kexec-tools RPM available for me to test?  I'd like to make sure the avoidance Stratus is using is compatible with the new version of kexec-tools.

Comment 13 Neil Horman 2009-09-30 01:59:49 UTC
It's in brew.

Comment 14 Andrius Benokraitis 2009-09-30 02:24:09 UTC
Neil - I don't believe Stratus can get packages in Brew (unless Jim can grab them) since they are external. Would it be possible to have them on a people page in the meantime? If not, I'll see if Jim can bring them down for Stratus.

Comment 15 Robert N. Evans 2009-10-23 02:27:16 UTC
Fix verified on Stratus hardware with both kexec-tools-1.102pre-79.el5.x86_64 and kexec-tools-1.102pre-83.el5.x86_64.
 Using kdump.conf disk_timeout=0, successfully collected dumps when an incomplete RAID1 was present, with no delays waiting for disks.  Also verified the mkdumprd script by comparison with the version from comment 4 that was thoroughly tested at Stratus.
 Verified that the Stratus work-around for this problem properly accommodates the situation where this fix is present.  So a new version of the Stratus lsb-ft-cstools RPM is not needed to use the fix from Red Hat.

Comment 20 Ruediger Landmann 2010-03-19 04:23:18 UTC
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
kdump waits for all devices in its critical_disks list to be available
before it performs a dump. Previously, there was no limit to the time that
kdump would wait for a device to respond. Therefore, the dump might never
be performed. kexec-tools now has a disk_timeout parameter that limits how 
long kdump will wait for storage to respond. This ensures that the dump will
take place.

Comment 21 errata-xmlrpc 2010-03-30 07:47:10 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0179.html