735600 – Regression in 2.6.32-131.12.1 regarding LUN discovery

Summary: Regression in 2.6.32-131.12.1 regarding LUN discovery

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.1
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Red Hat Kernel Manager
QA Contact:	Red Hat Kernel QE team
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2011-09-03 22:03 UTC by Troels Arvin
Modified:	2017-12-06 10:15 UTC
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-12-06 10:15:48 UTC
Target Upstream Version:
Embargoed:

Attachments
Output of "multipath -ll" when there is no problem (5.68 KB, text/plain), 2011-09-03 22:08 UTC, Troels Arvin
Output of "lsscsi" when there is no problem (3.75 KB, text/plain), 2011-09-03 22:09 UTC, Troels Arvin
Output of "multipath -ll" when there IS a problem (5.17 KB, text/plain), 2011-09-03 22:10 UTC, Troels Arvin
Output of "lsscsi" when there IS a problem (3.75 KB, text/plain), 2011-09-03 22:11 UTC, Troels Arvin
Screenshot of the server's console during boot (77.60 KB, image/png), 2011-09-03 22:33 UTC, Troels Arvin

Description Troels Arvin 2011-09-03 22:03:49 UTC

Description of problem:
After upgrading the kernel from kernel-2.6.32-131.6.1.el6.x86_64 to kernel-2.6.32-131.12.1.el6.x86_64, the following problem occurs in roughly 50% of system boots:
An FC SAN LUN is not discovered at boot time. As a result, a volume group is not available.

Version-Release number of selected component (if applicable):
2.6.32-131.12.1.el6.x86_64

How reproducible:
The problem occurs intermittently; based on around 10 reboots, I estimate that it occurs in about 50% of boots.

Steps to Reproduce:
1. Reboot.
2. multipath -ll | fgrep XIV
3. If the LUN has not been discovered, grep prints nothing; if it has been discovered, grep prints one line (as expected in this setup):
2001738000565024e dm-2 IBM,2810XIV
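To replace the "around 50%" estimate with an actual count, the check above could be logged on every boot. This is a minimal sketch, assuming it is called late in boot (for example from rc.local) and that, as in step 3, a discovered LUN shows up as a line containing "2810XIV" in the "multipath -ll" output. The function names and the log path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for measuring how often the XIV LUN is missing at boot.

# Append one line per boot to the log file given as $1:
# "<epoch seconds> present" or "<epoch seconds> missing".
# Assumes the XIV map appears as a line containing "2810XIV" in `multipath -ll`.
record_boot() {
    log=$1
    if multipath -ll 2>/dev/null | grep -q '2810XIV'; then
        echo "$(date +%s) present" >> "$log"
    else
        echo "$(date +%s) missing" >> "$log"
    fi
}

# Print "<missing boots>/<total boots>" for a log written by record_boot.
summarize_boots() {
    awk '$2 == "missing" { m++ } { n++ } END { printf "%d/%d\n", m, n }' "$1"
}

# Example: in rc.local run  record_boot /var/log/xiv-boots.log
# and after a series of reboots run  summarize_boots /var/log/xiv-boots.log
```

Running the same log on both kernels would give directly comparable failure counts.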

Actual results:
In about half of the boots, the LUN is not known to multipath.

Expected results:
The LUN is always present.

Additional info:
I reverted to kernel-2.6.32-131.6.1.el6.x86_64 and rebooted 8 times; with that kernel, the LUN showed up every time.
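As a rough check on why 8 clean boots on the older kernel are meaningful: assuming boots are independent and the older kernel failed as often as the newer one (the estimated 50%, itself based on only about 10 boots), the probability of 8 consecutive successful boots would be

```latex
P(\text{8 successes} \mid p_{\text{fail}} = 0.5) = (1 - 0.5)^{8} = \tfrac{1}{256} \approx 0.0039
```

so the clean run on the older kernel is unlikely to be pure luck, which is consistent with a regression in 2.6.32-131.12.1.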

While running 2.6.32-131.12.1.el6.x86_64, I tried adjusting rc.sysinit, based on the following page, which I found by searching the web: http://www.firetooth.net/confluence/display/public/Linux+-+Multipath
My adjustment:
Instead of
modprobe dm-multipath > /dev/null 2>&1
/sbin/multipath -v 0
I put in:
echo "About to modprobe dm-multipath; sleeping a bit"
sleep 30
modprobe dm-multipath
echo "modprobe done; sleeping again"
sleep 10
/sbin/multipath
echo "multipath was run; sleeping again"
sleep 10

The adjustment to rc.sysinit does not make the problem go away, but the problem seems to occur slightly less often.
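An alternative to fixed sleeps is to poll until the LUN appears, with an upper time limit. The following is a sketch only, not a fix for the regression: it can help only if the LUN is eventually discovered, and it simply times out if it never is. The function name and the check command in the example comment are assumptions.

```shell
#!/bin/sh
# Hypothetical replacement for fixed sleeps: run a check command once per
# second until it succeeds or until $1 seconds have passed.
# Returns 0 if the check succeeded, 1 on timeout.
wait_until() {
    limit=$1
    shift
    waited=0
    until "$@"; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example use in rc.sysinit (the check command is an assumption):
#   modprobe dm-multipath > /dev/null 2>&1
#   wait_until 60 sh -c '/sbin/multipath -v 0; /sbin/multipath -ll | grep -q 2810XIV'
```

Unlike the fixed sleeps, boots where the LUN is found quickly are not delayed, and a timeout makes the failure visible instead of silently continuing.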

Comment 1 Troels Arvin 2011-09-03 22:05:51 UTC

FYI, the versions of some possibly related packages:
 - device-mapper-multipath-0.4.9-41.el6
 - kpartx-0.4.9-41.el6.x86_64

Comment 2 Troels Arvin 2011-09-03 22:08:46 UTC

Created attachment 521344
Output of "multipath -ll" when there is no problem

Comment 3 Troels Arvin 2011-09-03 22:09:33 UTC

Created attachment 521345
Output of "lsscsi" when there is no problem

Comment 4 Troels Arvin 2011-09-03 22:10:43 UTC

Created attachment 521346
Output of "multipath -ll" when there IS a problem

Comment 5 Troels Arvin 2011-09-03 22:11:18 UTC

Created attachment 521347
Output of "lsscsi" when there IS a problem

Comment 7 Troels Arvin 2011-09-03 22:32:18 UTC

The server, a Dell R710, has local storage (served by an onboard PERC H700 RAID controller) that holds the operating system and swap.

It connects to the SAN through two 4 Gbit/s QLogic HBAs, which are connected to two Brocade FC switches.

Behind the switches are three different storage systems:
 - An IBM DS4800 system
 - A Hitachi AMS2100 system
 - An IBM XIV (generation 2) system

The LUN that is not discovered half the time (since the kernel upgrade) is on the IBM XIV system. However, I suspect that the problem is really related to the handling of the DS4800 LUNs. This suspicion is based on some ugly messages shown on the console during boot. The messages appear before rc.sysinit loads the dm-multipath kernel module, and they appear regardless of which kernel is booted:

sd 2:0:0:1: [sds] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:1: [sds] Sense Key : Illegal Request [current] 
sd 2:0:0:1: [sds] <<vendor>> ASC=0x94 ASCQ=0x1ASC=0x94 ASCQ=0x1
sd 2:0:0:1: [sds] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sds, sector 0
Buffer I/O error on device sds, logical block 0

(sds is a path to a LUN on the DS4800 storage system.)
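To check which paths produce these errors on a given boot, the device names can be extracted from the kernel log and then matched against the "lsscsi" output to see which storage system each one belongs to. This is a minimal sketch, assuming the message format shown above (device name in square brackets on the "Sense Key : Illegal Request" line); the function name is hypothetical.

```shell
#!/bin/sh
# List the distinct sd device names (e.g. "sds") that logged an
# "Illegal Request" sense key, reading kernel messages on stdin.
illegal_request_devices() {
    awk '/Sense Key : Illegal Request/ {
        for (i = 1; i <= NF; i++) {
            # Device names are printed in brackets, e.g. "[sds]".
            if ($i ~ /^\[sd[a-z]+\]$/) {
                print substr($i, 2, length($i) - 2)
            }
        }
    }' | sort -u
}

# Example: dmesg | illegal_request_devices
```

If every name it prints maps to a DS4800 LUN in the "lsscsi" output, that would support the suspicion above, though it would not by itself show that these errors cause the missing XIV LUN.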

A screenshot illustrating the messages will be uploaded as "console-shot1.png".


By the way, this might be related to RH support case 484711, which concerns a swap partition on a local RAID volume (/dev/sdb1) that is not discovered at boot time unless the following is added to rc.local:
partprobe /dev/sdb
swapon /dev/sdb1
That problem, however, occurs with both kernels.
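The rc.local workaround always re-reads the partition table. A guarded variant would do so only when the partition's device node is missing, and would wait for udev before enabling swap. This is an untested sketch; the function name is hypothetical, and whether it helps with case 484711 is not known.

```shell
#!/bin/sh
# Hypothetical guarded version of the rc.local workaround: re-read the
# partition table of disk $1 only if partition device $2 does not exist yet,
# then enable swap on $2.
ensure_swap() {
    disk=$1
    part=$2
    if [ ! -b "$part" ]; then
        partprobe "$disk"
        # Wait for udev to create the partition node; fall back to a short
        # sleep if udevadm is missing or returns an error.
        udevadm settle --timeout=10 2>/dev/null || sleep 5
    fi
    swapon "$part"
}

# Example (rc.local): ensure_swap /dev/sdb /dev/sdb1
```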

Comment 8 Troels Arvin 2011-09-03 22:33:07 UTC

Created attachment 521348
Screenshot of the server's console during boot

Comment 9 RHEL Program Management 2011-10-07 15:47:13 UTC

Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 10 Jan Kurik 2017-12-06 10:15:48 UTC

Red Hat Enterprise Linux 6 is in the Production 3 Phase. During the Production 3 Phase, Critical impact Security Advisories (RHSAs) and selected Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available.

The official life cycle policy can be reviewed here:

http://redhat.com/rhel/lifecycle

This issue does not meet the inclusion criteria for the Production 3 Phase and will be marked as CLOSED/WONTFIX. If this remains a critical requirement, please contact Red Hat Customer Support to request a re-evaluation of the issue, citing a clear business justification. Note that a strong business justification will be required for re-evaluation. Red Hat Customer Support can be contacted via the Red Hat Customer Portal at the following URL:

https://access.redhat.com/