Bug 694602

Summary: [6.2 FEAT] Change in multipath.conf file for RSSM
Product: Red Hat Enterprise Linux 6
Reporter: IBM Bug Proxy <bugproxy>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA
QA Contact: Gris Ge <fge>
Severity: high
Priority: high
Version: 6.2
CC: agk, bmarzins, dwysocha, fge, heinzm, jjarvis, mbroz, nobody+PNT0273897, prajnoha, prockai, sbest, ssaha, zkabelac
Target Milestone: beta
Keywords: FutureFeature, OtherQA
Target Release: 6.2
Hardware: All
OS: All
Fixed In Version: device-mapper-multipath-0.4.9-42.el6
Doc Type: Enhancement
Last Closed: 2011-12-06 18:07:13 UTC
Bug Blocks: 638197, 659725, 697866    

Description IBM Bug Proxy 2011-04-07 18:13:12 UTC
1. Feature Overview:
Feature Id: [71340]
a. Name of Feature: [6.2 FEAT] Change in multipath.conf file for RSSM
b. Feature Description
Storage currently has the IBM RAIDed SAS Switch Module (RSSM) as an option for the BladeCenter S chassis.  Up to
now we have been providing our customers with a modified multipath.conf file with settings to use when
running with the RSSM.  We would like to see if we could get these settings rolled into the distro
as defaults: 

 device {
        vendor                  "IBM"
        product                 "1820N00"
        getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        hardware_handler        "0"
        path_selector           "round-robin 0"
        path_grouping_policy    group_by_prio
        failback                immediate
        rr_weight               uniform
        rr_min_io               100
        path_checker            tur
        prio                    alua
        polling_interval        30
        no_path_retry           queue
    }


2. Feature Details:
Sponsor: LTC
Architectures: ppc64, x86, x86_64
Arch Specificity: purely common code
Affects Kernel Modules: No
Delivery Mechanism: LDP Deliverable
Category: other
Request Type: Configuration/Build Change
d. Upstream Acceptance: No Code Required
Sponsor Priority P3
f. Severity: high
IBM Confidential: No
Code Contribution: no
g. Component Version Target: ---

3. Business Case
This would make the environment much easier on our customers.

4. Primary contact at Red Hat:
John Jarvis, jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Stephanie A. Glass, sglass.com

Technical contact(s):
Brent Yardley, yardleyb.com

Comment 2 RHEL Program Management 2011-04-07 18:24:40 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected, as it was not proposed as an
exception or a blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 Ales Kozumplik 2011-04-08 06:43:04 UTC
(In reply to comment #0)
> running with the RSSM.  Would like to see is if we could get the settings
> rolled up into the disro
> as defaults: 
> 
>  device {
>         vendor                  "IBM"
>         product                 "1820N00"
>         getuid_callout          "/lib/udev/scsi_id --whitelisted
> --device=/dev/%n"
>         hardware_handler        "0"
>         path_selector           "round-robin 0"
>         path_grouping_policy    group_by_prio
>         failback                immediate
>         rr_weight               uniform
>         rr_min_io               100
>         path_checker            tur
>         prio                    alua
>         polling_interval        30
>         no_path_retry           queue
>     }

a) Do you want to always have this in every multipath.conf anaconda generates or only when the RSSM is detected? 

b) How do we detect RSSM?

Comment 5 IBM Bug Proxy 2011-04-08 15:11:49 UTC
------- Comment From yardleyb.com 2011-04-08 11:00 EDT-------
(In reply to comment #5)
> a) Do you want to always have this in every multipath.conf anaconda generates or only when the RSSM is detected?

If it makes sense, from an ease-of-implementation perspective, to always include it, then let's do that.

>
> b) How do we detect RSSM?

RSSM is a SAS-based device; it has a SCSI product ID of 1820N00 and a product vendor of IBM.  Its LUNs would be discovered when a SAS HBA driver is loaded and the RSSM is attached.
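
For illustration, one way to confirm that a given disk is an RSSM LUN is to read the SCSI vendor and model attributes from sysfs (a sketch only; /dev/sdb stands in for whatever node the LUN gets, and in practice the values are padded with trailing spaces):

# cat /sys/block/sdb/device/vendor
IBM
# cat /sys/block/sdb/device/model
1820N00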

Comment 6 Ben Marzinski 2011-04-08 16:11:33 UTC
Yeah, this is just adding a new device to the list of devices that multipath autoconfigures.  We do this sort of thing all the time, to make life easier for everyone involved.

In the future, if you have any devices that you want device-mapper-multipath to autoconfigure, please file the bug against device-mapper-multipath instead of anaconda.

Comment 7 John Jarvis 2011-04-20 03:02:11 UTC
IBM is signed up to test and provide feedback, setting OtherQA.

Comment 10 Ben Marzinski 2011-06-30 05:41:45 UTC
Configuration added.

Comment 11 John Jarvis 2011-06-30 15:37:36 UTC
This enhancement request was evaluated by the full Red Hat Enterprise Linux
team for inclusion in a Red Hat Enterprise Linux minor release.   As a result
of this evaluation, Red Hat has tentatively approved inclusion of this feature
in the next Red Hat Enterprise Linux Update minor release.   While it is a goal
to include this enhancement in the next minor release of Red Hat Enterprise
Linux, the enhancement is not yet committed for inclusion in the next minor
release pending the next phase of actual code integration and successful Red
Hat and partner testing.

Comment 13 Gris Ge 2011-09-21 07:08:58 UTC
Ben,

IBM requested "polling_interval 30", but this entry is missing from the multipathd -k'show config' output.

The default setting for polling_interval is 5. (device-mapper-multipath-0.4.9-43.el6.x86_64)

Is this the expected result?
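
For reference, this is roughly how I checked (a sketch; the -A count is just enough to cover the device stanza):

# multipathd -k'show config' | grep -A 13 '1820N00'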

Comment 14 Gris Ge 2011-09-21 07:16:19 UTC
Another concern:
As Bug #419581 mentioned, "no_path_retry queue" will prevent the OS from shutting down when all paths are down.

The default for "queue_without_daemon" is "yes", which means queueing is not disabled when the daemon shuts down.

Should we change this configuration to "no_path_retry N"?
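
For example, a user who needs shutdown to proceed could override the shipped default in /etc/multipath.conf with something like this (a sketch; 12 retries at the default 5-second polling_interval gives roughly a minute of queueing before I/O is failed):

 devices {
     device {
         vendor                  "IBM"
         product                 "1820N00"
         no_path_retry           12
     }
 }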

Comment 15 Ben Marzinski 2011-09-21 19:29:34 UTC
About the "polling_interval 30": Good catch.  Somehow I missed that line in my patch.

About the "no_path_retry queue": No. Other configurations do this as well. Some people really don't want to ever fail the IO back, unless the sysadmin manually
does it.  For the default configs, we simply honor what the vendors give us, which is the configuration that they tested.  It makes more sense to fix the problem by changing the "queue_without_daemon" default, which should be possible.  The only issue is that I need to add a multipathd command to override the queue_without_daemon option, that can be used when restarting multipathd.  That way, if someone does a

# service multipathd restart

With a queuing multipath device with no valid paths, they won't suddenly start seeing IOs getting failed, when they wouldn't before.
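
(For what it's worth, queueing can also be turned off on a live map by hand today, before stopping the daemon. For example, with mpathb as a placeholder map name:

# dmsetup message mpathb 0 "fail_if_no_path"

and it can be turned back on the same way with "queue_if_no_path".)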

I'll include a fix for the polling interval when I respin the package to fix the other crash you found in bug 697386.

Comment 16 Gris Ge 2011-09-22 03:10:12 UTC
A wild idea about the queue issue: can we store/move the queued I/O into kernel memory owned by the dm-multipath module? In that case, restarting the daemon would not fail the I/O, and it could be read back into the queue. When the OS shuts down, the queue would get cleaned up without needing any extra option.

There was previously a patch for saving the dev_loss_tmo value after a link goes down; this idea comes from that patch.

Forgive me if this is a silly idea, as I don't know the SCSI mid-layer at all.
I keep studying to become a good storage-qe.
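
(For what it's worth, my understanding is that the queued I/O is already held in the kernel by the dm-multipath target while the queue_if_no_path feature is set; whether a map is currently queueing can be checked with, e.g.:

# dmsetup table mpathb | grep queue_if_no_path

where mpathb is a placeholder map name.)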

Comment 17 Ben Marzinski 2011-09-22 03:52:34 UTC
I wasn't thinking before.  You can't set the polling_interval on a per-device basis.  You can only set it for all of multipathd. That's why I didn't add it.
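
If someone wants the 30-second interval, it can still be set globally in the defaults section of /etc/multipath.conf, e.g.:

 defaults {
         polling_interval        30
 }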

Comment 18 Gris Ge 2011-09-23 05:50:33 UTC
I see. IBM should be informed of this change.

Then, you have my VERIFY.

Comment 19 errata-xmlrpc 2011-12-06 18:07:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1527.html