716340 – /dev/sdX name is different in second kernel, which causes kdump to hang waiting for incorrect dev_name

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kexec-tools
Sub Component:
Version:	5.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Dave Young
QA Contact:	Xu Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-06-24 03:24 UTC by Gris Ge
Modified:	2013-01-08 04:08 UTC
CC List:	5 users
Fixed In Version:	kexec-tools-1.102pre-158.el5
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2013-01-08 04:08:40 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
doc fix (1.16 KB, patch) 2012-07-12 09:16 UTC, Dave Young

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2013:0012	0	normal	SHIPPED_LIVE	kexec-tools bug fix and enhancement update	2013-01-08 08:38:45 UTC

Description Gris Ge 2011-06-24 03:24:58 UTC

Description of problem:

If we have two SCSI controller drivers that both provide disks, like this:

sda 0:0:0:1  from driver bfa port 0
sdb 1:0:0:1  from driver bfa port 1

sdc 2:0:0:1  from driver qla2xxx port 0
sdd 3:0:0:1  from driver qla2xxx port 1

If in /etc/kdump.conf, we specify this dump target:

ext3 /dev/sdc1

As the bfa driver will not be included in the second kernel, we only get 2 disks there (sda and sdb) and will never get sdc.

In RHEL6, we have the folder /dev/disk/by-id/ to work around this issue.
But in RHEL5, it is a problem.
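
The renaming described above can be illustrated with a small Python model. This is only a sketch under simplifying assumptions: disks are named strictly in the listed order, the SCSI address is used purely as a label for the physical disk, and the helper names `sd_name` and `assign_names` are hypothetical, not part of kexec-tools.

```python
import string

def sd_name(index):
    """Return the sd name for a 0-based disk index: 0 -> sda, 25 -> sdz, 26 -> sdaa."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = string.ascii_lowercase[rem] + letters
    return "sd" + letters

def assign_names(disks, loaded_drivers):
    """Name disks in listed order, skipping disks whose driver is not loaded."""
    visible = [d for d in disks if d[0] in loaded_drivers]
    return {sd_name(i): d for i, d in enumerate(visible)}

# (driver, SCSI address in the first kernel) pairs taken from the report.
DISKS = [
    ("bfa", "0:0:0:1"),
    ("bfa", "1:0:0:1"),
    ("qla2xxx", "2:0:0:1"),
    ("qla2xxx", "3:0:0:1"),
]

first_kernel = assign_names(DISKS, {"bfa", "qla2xxx"})
second_kernel = assign_names(DISKS, {"qla2xxx"})  # bfa is missing here

print(first_kernel["sdc"])       # the qla2xxx disk the dump target points at
print("sdc" in second_kernel)    # the name kdump waits for never appears
print(second_kernel["sda"])      # the same physical disk under a new name
```

In this model the disk that was sdc in the first kernel shows up as sda in the second one, so a dump target of /dev/sdc1 never becomes available.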

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-126.el5_6.6
kernel -269.

How reproducible:
100%

Steps to Reproduce:
1. Find a server with 2 SCSI controllers.
2. Set the dump target to the disk that holds the last dev_name.
3. echo c > /proc/sysrq-trigger
  
Actual results:
Kdump hangs.

Expected results:
Kdump dumps to the correct disk.

Additional info:
In my test, I have 250 x 8 disks from lpfc and 1 x 4 disks from bfa. Let me know if you need a system to reproduce this.

Comment 1 RHEL Program Management 2011-06-24 03:37:05 UTC

This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 2 Cong Wang 2011-06-24 07:53:59 UTC

On RHEL6 we found the same problem, see Bug 600575.
The other way to solve this is to use mdev from busybox, but I am not sure whether RHEL5 has it.

BTW, on which machine did you see this?

Comment 3 Qian Cai 2011-06-27 01:12:27 UTC

If you use a label or UUID, does it resolve the problem?
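
To make the question concrete, here is a rough Python sketch that classifies the device field of a dump target line. It assumes that kdump.conf accepts `LABEL=` and `UUID=` in place of a device node, as the question implies; the function name `target_kind` and the example values are hypothetical.

```python
def target_kind(line):
    """Classify the device field of a '<fstype> <device>' kdump.conf line."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected '<fstype> <device>': %r" % line)
    device = fields[1]
    if device.startswith("LABEL="):
        return "label"
    if device.startswith("UUID="):
        return "uuid"
    if device.startswith("/dev/sd"):
        # Kernel-assigned name: may differ in the second kernel.
        return "kernel-name"
    return "other"

print(target_kind("ext3 /dev/sdc1"))
print(target_kind("ext3 LABEL=/boot"))
```

Only the "kernel-name" case depends on the order in which disks are discovered.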

Comment 4 Gris Ge 2011-06-27 03:49:13 UTC

(In reply to comment #2)
> On RHEL6 we found the same problem, see Bug 600575.
> The other to solve this is to use mdev of busybox, but I am not sure if it has
> this on RHEL5.
> 
> BTW, on which machine did you see this?

storageqe-04.rhts.eng.bos.redhat.com
It is busy with storage-related testing and I might not be able to loan it to you before the RHEL 5.7 release. You can simply use 2 controllers in your VM (1 SCSI, 1 ATA); as long as you have 2 drivers providing disks, you can hit the problem.

To solve this in a smart way without depending on other tools, the kdump service can check and note down the WWID of the disk. After switching to the second kernel, we can find the disk by WWID instead of by dev_name.

For checking wwid:

RHEL5:
/lib/udev/scsi_id -g -p 0x83 -s /block/sda

RHEL6:
/lib/udev/scsi_id --page=0x83 --whitelisted --device=/dev/sda

On RHEL6, even though UUID and label are available, some users will still use /dev/sdX. Instead of forcing users to use a UUID or label, I think the solution above could be better.
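
A minimal Python sketch of this proposal follows. It is not how kexec-tools works: `scsi_id_argv` only builds the two command lines quoted above, `find_disk_by_wwid` is a hypothetical helper, and the WWID strings are made-up placeholders.

```python
def scsi_id_argv(release, disk):
    """Build the scsi_id command line quoted in this comment for a disk such as 'sda'."""
    if release == 5:
        return ["/lib/udev/scsi_id", "-g", "-p", "0x83", "-s", "/block/" + disk]
    if release == 6:
        return ["/lib/udev/scsi_id", "--page=0x83", "--whitelisted",
                "--device=/dev/" + disk]
    raise ValueError("unsupported release: %r" % release)

def find_disk_by_wwid(saved_wwid, wwid_by_disk):
    """Return the disk whose WWID matches the one saved in the first kernel, or None."""
    for disk, wwid in sorted(wwid_by_disk.items()):
        if wwid == saved_wwid:
            return disk
    return None

# Placeholder WWIDs as they might be collected in each kernel.
first_kernel = {"sda": "wwid-bfa-0", "sdb": "wwid-bfa-1",
                "sdc": "wwid-qla-0", "sdd": "wwid-qla-1"}
second_kernel = {"sda": "wwid-qla-0", "sdb": "wwid-qla-1"}

saved = first_kernel["sdc"]                     # noted down by the kdump service
print(find_disk_by_wwid(saved, second_kernel))  # disk to use in the second kernel
```

The saved WWID, not the name sdc, is what identifies the dump target after the switch.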

Hope this could help you.
Let me know if you need any info.

Comment 5 Dave Young 2011-10-24 09:57:40 UTC

(In reply to comment #4)
> (In reply to comment #2)
> > On RHEL6 we found the same problem, see Bug 600575.
> > The other to solve this is to use mdev of busybox, but I am not sure if it has
> > this on RHEL5.
> > 
> > BTW, on which machine did you see this?
> 
> storageqe-04.rhts.eng.bos.redhat.com
> It's busy with storage related testing and I might not load to you before RHEL
> 5.7 release. You can simply try to use 2 controlers in your VM (1 scsi, 1 ata),
> as long as you have 2 drivers providing disks, it can hit the problem.
> 
> For solving this in a smart way other than depend on other tools, service kdump
> can check and noted down the wwid of a disk. After switch to second kernel, we
> can find a disk by wwid instead of by dev_name.
> 
> For checking wwid:
> 
> RHEL5:
> /lib/udev/scsi_id -g -p 0x83 -s /block/sda
> 
> RHEL6:
> /lib/udev/scsi_id --page=0x83 --whitelisted --device=/dev/sda

For real SCSI devices, yes, but for virtual devices I'm afraid it will not work.

For example:
ibmvscsi devices report nothing with scsi_id; I tested on ibm-js22-vios-03-lp2.rhts.brq.redhat.com
Also, a KVM guest reports something like QM0001, and I doubt that it is a unique ID.
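
The two failure modes above (an empty ID and an ID that may not be unique) can be checked before relying on a WWID. The Python sketch below is illustrative only; `usable_wwids` is a hypothetical helper and the ID values are placeholders, not output captured from the machines mentioned here.

```python
from collections import Counter

def usable_wwids(wwid_by_disk):
    """Return the disks whose WWID is non-empty and not shared with another disk."""
    counts = Counter(w for w in wwid_by_disk.values() if w)
    return {disk for disk, w in wwid_by_disk.items() if w and counts[w] == 1}

reported = {
    "sda": "",                  # device that reports nothing
    "sdb": "QM0001",            # two disks reporting the same ID
    "sdc": "QM0001",
    "sdd": "placeholder-wwid",  # a disk with its own ID
}
print(sorted(usable_wwids(reported)))
```

Disks that fail this check would still have to be located some other way.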

> 
> For RHEL6, even they have UUID or label, but some user will still use /dev/sdX.
> Instead of forcing user to use UUID or label, I think the solution above could
> be better.
> 
> Hope this could help you.
> Let me know if you need any info.

Comment 6 Gris Ge 2011-12-23 02:30:00 UTC

(In reply to comment #5)

> 
> For real scsi device, yes, but for virtual devices I'm afraid it will not
> works.
> 
In RHEL 5, cciss devices don't follow SCSI VPD either; they use cciss_id (not checked on RHEL 6).

For virtual devices, I think it opens a door for SCSI VPD patches.

Kdump can simply keep the current method as a fallback for these special devices.

======
Let me make it clear:

For block devices, the WWID (VPD page 0x83) is the unique identifier.

For filesystems, the UUID is the unique identifier. dracut has a very good example of how to handle a UUID or label.

For LVM, they use the PV UUID to make sure an LV comes from the correct PV.

For md, they use metadata, but I am not sure about their md-X naming rule; this needs input from md QE.

For dm-multipath, they follow the bindings file (/etc/multipath/bindings in RHEL6).

For dm-crypt, they use some ID as the dm table name (check with cryptsetup info).

Comment 9 RHEL Program Management 2012-04-02 10:51:02 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 12 Dave Young 2012-07-12 09:16:09 UTC

Created attachment 597746 [details]
doc fix

Comment 18 errata-xmlrpc 2013-01-08 04:08:40 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0012.html
