Bug 460899

Summary: LTC:201049: mkinitrd support for DM-MP SCSI Hardware Handlers
Product: Red Hat Enterprise Linux 5 Reporter: IBM Bug Proxy <bugproxy>
Component: mkinitrd Assignee: Peter Jones <pjones>
Status: CLOSED ERRATA QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 5.3 CC: atodorov, bmr, borgan, bpeters, cmarcant, cward, ddumas, ejratl, hdegoede, jbrier, jjarvis, johncall, jplans, jruemker, mchristi, notting, pjones, ricardo.arguello, sekharan, syeghiay, tao, thorhs
Target Milestone: rc Keywords: FutureFeature, Reopened
Target Release: 5.5   
Hardware: ppc64   
OS: All   
Whiteboard:
Fixed In Version: mkinitrd-5.1.19.6-61 Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-03-30 08:59:53 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 438761, 499522, 557292    
Attachments:
Description Flags
Load every scsi_dh module if we find any multipath devices none
adjustment to Peter's patch to use the sg module instead of scsi_wait_scan so that it is compatible with RHEL5 none
mkinitrd from snap3 system none

Description IBM Bug Proxy 2008-09-02 14:28:15 UTC
=Comment: #0=================================================
Chandra S. Seetharaman <chandra.seetharaman.com> - 2008-08-18 21:07 EDT
Problem description:

IBM Bugzilla bug #43337(RedHat 438761) provides support for SCSI Hardware Handler.

Changes are needed in mkinitrd to include these modules in the initrd image.

The modules are:
   scsi_dh
   scsi_dh_rdac
   scsi_dh_emc

These modules need to be included whenever the SCSI module is included.

Note: Mike Christie requested this bugzilla and asked to have himself and Peter Jones in CC.

Comment 1 Emily Ratliff 2008-09-05 16:46:21 UTC
Adding Mike Christie to cc as requested by Chandra.

Comment 2 RHEL Program Management 2008-09-19 11:50:39 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Peter Jones 2008-09-25 19:27:24 UTC
There's really not enough information here.  Under what conditions should the modules be included?

Comment 5 IBM Bug Proxy 2008-09-25 20:11:00 UTC
(In reply to comment #17)
> ------- Comment From pjones 2008-09-25 15:27:24 EDT-------
> There's really not enough information here.  Under what conditions should the
> modules be included?

scsi_dh_rdac module is needed to support many lsi engenio based (IBM and
non-IBM) storage devices. Hence it is needed in default initrd image that is
shipped in the install CDs/DVDs.

w.r.t mkinitrd able to include this in the new initrd image being built: These
modules need to be included if the module is currently being used (is it
possible ?).

Note that it is _not_ root/boot/swap device _only_ issue.

Comment 6 Bryn M. Reeves 2008-09-25 20:41:02 UTC
Peter, it's just the same problem as the "old" device-mapper hardware handlers. If the root multipath device requires use of a SCSI hardware error handler (this can be parsed from multipath command output or determined by consulting the multipath defaults for the type of storage, although there may be better ways) then it must be included in the initramfs and loaded at the appropriate time.

See the existing comments relating to hardware handlers in the current multipath code in mkinitrd:

    case "$TYPE" in
        multipath|emc)
            # ugggh.  We could try to fish the module name out, but it
            # requires real parsing...
            # XXX also covered by #132001
            for mod in $TABLE ; do
                DMMODS="$DMMODS $([[ "$mod" =~ "[[:alpha:]]" ]] && echo "$mod")"
            done
            DMDEVS="$DMDEVS $NAME"
            ;;
        *)
            for raid in $RAIDS ; do
                if [ "$raid" == "$NAME" ]; then
                    dmname=$(resolve_dm_name $NAME)
                    DMDEVS="$DMDEVS $dmname"
                    RAIDS=$(sed 's/ $NAME //' <<< "$RAIDS")
                    break
                fi
            done
            ;;
    esac
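
For illustration only, the idea of parsing the handler name out of multipath command output could be sketched roughly as below. This is not what mkinitrd does. The helper name hwhandlers_from_topology is hypothetical, the "[hwhandler=1 rdac]" field layout is an assumption about how the topology output looks on this release, and the mapping of a handler name to scsi_dh_<name> is likewise assumed.

```shell
# Hypothetical helper: read multipath-style topology text on stdin and print
# the scsi_dh module names implied by its "hwhandler=" fields, one per line.
# Assumes fields such as "[hwhandler=1 rdac]"; "[hwhandler=0]" is skipped
# because no handler name follows the count.
hwhandlers_from_topology() {
    sed -n 's/.*hwhandler=[0-9][0-9]* \([[:alnum:]_]*\).*/scsi_dh_\1/p' | sort -u
}
```

On a live system this would be fed from the multipath command output, for example: multipath -ll | hwhandlers_from_topology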

Comment 7 IBM Bug Proxy 2008-09-25 22:30:33 UTC
No, this is _not_ just the _multipathed root device_ issue.

These modules are needed even for non root devices.

With the different scans (partitioning, LVM, etc.) that happen during boot, on a
system with, say, 100 (rdac) disks, boot takes a long time (an hour or so),
and the delay is cumulative if there are multiple passive paths to the storage
device. (This also produces a lot of buffer layer error messages.)

With the scsi_dh_rdac module in the initrd, I/O is not sent down the passive paths,
so that time is not spent and there are far fewer error messages.

Comment 8 Bryn M. Reeves 2008-09-25 22:57:20 UTC
Are you sure that's the case? Given that the initramfs strategy is currently to activate just the root multipath device (by WWID) and volume group explicitly there should only be scanning of minimal devices from the initramfs.

By the time we've handed over to SysV init and scans are being initiated for other devices the rootfs is mounted and the scsi_dh_* modules can be loaded from /lib/modules as normal.

If the main concern is the slow boot with many rdac devices then I think the point is moot - if we're loading the module for the root device then it's going to be available for all the other devices present too.

In either case, once the infrastructure exists in mkinitrd to determine the handlers needed for the root device it should be straightforward to extend that to include any other devices if it's required. Even then however, it's only going to be effective for devices that are known to the system at the time the initramfs is built - if additional storage requiring other handlers is later added to the system then the image will need to be rebuilt (I don't think including all available hardware handlers in case they might be needed will prove a very popular approach).

Comment 9 IBM Bug Proxy 2008-09-25 23:30:38 UTC
(In reply to comment #21)
> ------- Comment From bmr 2008-09-25 18:57:20 EDT-------
> Are you sure that's the case? Given that the initramfs strategy is currently
> to activate just the root multipath device (by WWID) and volume group explicitly
> there should only be scanning of minimal devices from the initramfs.
>

It is not only the lvm scanning. Partition scanning in the kernel starts
scanning the disks as soon as the SCSI layer sees the devices, which sends i/o
on the passive paths.

> By the time we've handed over to SysV init and scans are being initiated for
> other devices the rootfs is mounted and the scsi_dh_* modules can be loaded from
> /lib/modules as normal.
>
> If the main concern is the slow boot with many rdac devices then I think the
> point is moot - if we're loading the module for the root device then it's going
> to be available for all the other devices present too.
>

Yes, the main concern is the slow boot due to i/o (originating from partition
scanning etc., ) being sent on the passive paths.

Yes, if the root device happen to be on a rdac storage, then we will be covered
by your suggestion. But we cannot rely on it though as that is not a requirement
for one to use the rdac storage.

> In either case, once the infrastructure exists in mkinitrd to determine the
> handlers needed for the root device it should be straightforward to extend that
> to include any other devices if it's required. Even then however, it's only
> going to be effective for devices that are known to the system at the time the
> initramfs is built - if additional storage requiring other handlers is later
> added to the system then the image will need to be rebuilt (I don't think
> including all available hardware handlers in case they might be needed will
> prove a very popular approach).

Not all the hardware handlers may be needed in initrd. They need to be added
only if the storage devices gets into any issues at boot time (like the rdac does).

One of the key reasons for moving the hardware handlers from the dm layer to the
SCSI layer is to make the hardware handler available earlier in the probe cycle
and so avoid the boot time delay.

If the rdac module is not added to the initrd, then SCSI hardware handlers add
little value over the dm hardware handler.

Comment 10 Bryn M. Reeves 2008-09-26 06:18:10 UTC
Yes, you're right about partition scanning (although imho that itself is something of a bug since the units are not ready at that point - why are we sending I/O to it? This causes problems in other areas, e.g. EMC SRDF R2 devices).

Not sure I agree with your point that not all hardware handlers will be needed in the initrd - the two mentioned in this bug report (emc and rdac) will be needed in very similar situations. Passive paths on an EMC Clariion will throw errors in just the same way as the LSI/Engenio RDAC stuff.

Comment 11 IBM Bug Proxy 2008-09-26 17:50:29 UTC
I didn't comment based on the two modules that are currently in hand. I
commented on a future date where there are tens of hardware handlers :)

Comment 12 Peter Jones 2008-10-03 16:54:35 UTC
(In reply to comment #5)
> (In reply to comment #17)
> > ------- Comment From pjones 2008-09-25 15:27:24 EDT-------
> > There's really not enough information here.  Under what conditions should the
> > modules be included?
> 
> scsi_dh_rdac module is needed to support many lsi engenio based (IBM and
> non-IBM) storage devices. Hence it is needed in default initrd image that is
> shipped in the install CDs/DVDs.
> 
> w.r.t mkinitrd able to include this in the new initrd image being built: These
> modules need to be included if the module is currently being used (is it
> possible ?).
> 
> Note that it is _not_ root/boot/swap device _only_ issue.

Ok, that doesn't answer what I'm trying to ask though: how do I detect that I
should pull the module in?  What can I test on the running system to tell that
the module is needed?

Comment 13 IBM Bug Proxy 2008-10-03 18:16:35 UTC
(In reply to comment #25)
> ------- Comment From pjones 2008-10-03 12:54:35 EDT-------
> (In reply to comment #5)
> > (In reply to comment #17)
> > > ------- Comment From pjones 2008-09-25 15:27:24 EDT-------
> > > There's really not enough information here.  Under what conditions should the
> > > modules be included?
> >
> > scsi_dh_rdac module is needed to support many lsi engenio based (IBM and
> > non-IBM) storage devices. Hence it is needed in default initrd image that is
> > shipped in the install CDs/DVDs.
> >
> > w.r.t mkinitrd able to include this in the new initrd image being built: These
> > modules need to be included if the module is currently being used (is it
> > possible ?).
> >
> > Note that it is _not_ root/boot/swap device _only_ issue.
>
> Ok, that doesn't answer what I'm trying to ask though: how do I detect that I
> should pull the module in?  What can I test on the running system to tell that
> the module is needed?
>

As explained above, the default initrd image need to have the scsi_dh_rdac modules as they are built-in in your CD/DVD. (or is the initrd created during install ?)

w.r.t mkinitrd including these when creating the new initrd image... if the module is currently being used (can be found using "lsmod | grep scsi_dh_rdac" in a live system), then include them in the new initrd image. Hope it is clear now.

Comment 14 Peter Jones 2008-10-03 19:23:58 UTC
> > Ok, that doesn't answer what I'm trying to ask though: how do I detect that I
> > should pull the module in?  What can I test on the running system to tell that
> > the module is needed?
> >
> 
> As explained above, the default initrd image need to have the scsi_dh_rdac
> modules as they are built-in in your CD/DVD. (or is the initrd created during
> install ?)
> 
> w.r.t mkinitrd including these when creating the new initrd image... if the
> module is currently being used (can be found using "lsmod | grep scsi_dh_rdac"
> in a live system), then include them in the new initrd image. Hope it is clear
> now.

In combination with your first paragraph, this means we'll always be loading them on all systems.  If this is correct, then this should be part of scsi_mod or similar, and we shouldn't need to do anything in either anaconda or mkinitrd.

I suspect it's not correct, and we should only be loading this when certain hardware is present, or some similar condition.

Comment 15 IBM Bug Proxy 2008-10-03 20:46:47 UTC
We need the scsi_dh_rdac module only when the special hardware(rdac storage, DS4K) is present.

But (according to my understanding) the problem with the default initrd (shipped with the install CD/DVD) is that it is built at Red Hat and does not know about the (install) target system (correct me if I am wrong). In that case, don't we have to include the scsi_dh_rdac module in the default initrd image?!

Comment 16 Peter Jones 2008-10-06 15:35:15 UTC
Including it in the image isn't the issue.  We can include it in the image, that's fine, but we still need to detect, at runtime, if it's appropriate to load the module.  As there's no modalias info in the modules, and there's no reason they'd be loaded based on dependencies from other modules, there needs to be some mechanism to actually determine that the module should be used.

Comment 17 IBM Bug Proxy 2008-10-09 00:21:16 UTC
Peter,

I understand your concern. But, to my knowledge it is not possible to detect the
existence of a hard disk.

One of the solutions that came up is to insmod scsi_dh_rdac.ko immediately following the scsi module, and at the end of init (in initrd) rmmod it if nobody is using it.

What do you think ?

Comment 18 Denise Dumas 2008-10-13 15:49:06 UTC
IBM, please realize that if this code is not in by Friday Oct 17, it will very likely not be part of 5.3. And unless the technical questions are answered, we cannot provide a code change.

Comment 19 IBM Bug Proxy 2008-10-13 22:32:22 UTC
Peter,

One other option:
(1) - Have the scsi_dh_rdac module in the install initrd image.
(2) - Change mkinitrd to include the scsi_dh_rdac module if the module is currently being used (that is, if "lsmod | grep ^scsi_dh_rdac| awk '{print $3}'" returns more than 0).

Having the scsi_dh_rdac module in the install initrd image ((1) above) would enable user to provide "module=scsi_dh_rdac" as part of the (anaconda) install option, which would insert the module.

And later, when the installer runs the mkinitrd command, it would add the scsi_dh_rdac module to the initrd image (due to (2)) properly.

This method will add the module to initrd image only when mkinitrd sees the module being currently used.

Please respond with your comments.
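
As a rough sketch of the check in (2), assuming the usual three-column lsmod layout (module name, size, use count): the helper names module_use_count and module_in_use are hypothetical and are not part of mkinitrd. Matching the first column exactly, rather than with a grep prefix, avoids picking up other modules whose names merely start with scsi_dh_rdac.

```shell
# Hypothetical helper: read lsmod-formatted text on stdin and print the
# "Used by" count (third column) for the named module; print 0 when the
# module is not listed at all.
module_use_count() {
    awk -v mod="$1" '$1 == mod { print $3; found = 1 } END { if (!found) print 0 }'
}

# Hypothetical helper: succeed only when the module is loaded and its
# use count is greater than zero.
module_in_use() {
    [ "$(module_use_count "$1")" -gt 0 ]
}
```

On a live system this could be used as: lsmod | module_in_use scsi_dh_rdac && echo "include scsi_dh_rdac"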

Comment 20 Peter Jones 2008-10-15 22:11:59 UTC
Why all of this extra mechanism that's specific to this module instead of making it use the existing modalias mechanism like devices on other buses do?

Comment 21 IBM Bug Proxy 2008-10-17 12:16:36 UTC
Any comments ? Please respond.

Comment 22 IBM Bug Proxy 2008-10-17 19:41:10 UTC
scsi does not have the modalias support to add this to it.

Even if it does, there is no 2 stage scan (as in PCI, PCMCIA etc,) where we can use the vendor-product info to get in the module.

We need to have this module in early on (at the probe time of the scsi disk).

If we were to support the device properly at install time, this module must be present in the install initrd image and the user must be able to select it to be inserted.

Comment 23 Bill Nottingham 2008-10-17 20:38:14 UTC
... if it's at disk probe time, then build it in.

Anything that relies on the user to select it manually is a failed design.

Comment 24 IBM Bug Proxy 2008-10-17 20:50:36 UTC
(In reply to comment #36)
> ------- Comment From notting 2008-10-17 16:38:14 EDT-------
> ... if it's at disk probe time, then build it in.
>

We do not have to build it in as it is needed only if SCSI is present.

> Anything that relies on the user to select it manually is a failed design.
>

Yes, that is the purpose of this bugzilla. To make the things happen automatically (without user needing to do anything).

Without this feature being fixed, we have to ask the user to follow some steps, which leads to user dissatisfaction and also to failures.

Comment 25 Bill Nottingham 2008-10-17 20:56:10 UTC
(In reply to comment #24)
> We do not have to build it in as it is needed only if SCSI is present.

Well, sd should (theoretically) be built in, but that's a different issue.

> > Anything that relies on the user to select it manually is a failed design.
> 
> Yes, that is the purpose of this bugzilla. To make the things happen
> automatically (without user needing to do anything).
> 
> Without this feature being fixed, we have to ask the user to follow some steps,
> which leads to user dissatisfaction and also to failures.

I'm not seeing that - the proposal *still* requires the user to follow steps, such as passing module= on the commandline.

Comment 26 IBM Bug Proxy 2008-10-17 21:10:47 UTC
(In reply to comment #38)
> ------- Comment From notting 2008-10-17 16:56:10 EDT-------
> (In reply to comment #24)
> > We do not have to build it in as it is needed only if SCSI is present.
>
> Well, sd should (theoretically) be built in, but that's a different issue.

I am not going to discuss that :)
>
> > > Anything that relies on the user to select it manually is a failed design.
> >
> > Yes, that is the purpose of this bugzilla. To make the things happen
> > automatically (without user needing to do anything).
> >
> > Without this feature being fixed, we have to ask the user to follow some steps,
> > which leads to user dissatisfaction and also to failures.
>
> I'm not seeing that - the proposal *still* requires the user to follow steps,
> such as passing module= on the commandline.

we do expect user to do such things, for example, when they want to use multipathed device (or an iscsi device) for root, we do expect them to provide a keyword mpath(or iscsi).

There was another idea I gave (comment previous to module=), where the module is inserted by default and removed silently if it is not used. It won't require any user action.
>

Comment 27 Bill Nottingham 2008-10-20 19:43:06 UTC
That's still a bad mechanism... having any possible install path load <some set of modules>, and then try and remove them?

There has to be some sort of vendor-specific ID that these modules can key off of.

Comment 28 IBM Bug Proxy 2008-10-20 20:12:14 UTC
(In reply to comment #40)
> ------- Comment From notting 2008-10-20 15:43:06 EDT-------
> That's still a bad mechanism... having any possible install path load <some set
> of modules>, and then try and remove them?
>
> There has to be some sort of vendor-specific ID that these modules can key off
> of.
>
This is the issue I explained above:
----------
scsi does not have the modalias support to add this to it.

Even if it does, there is no 2 stage scan (as in PCI, PCMCIA etc,) where we can
use the vendor-product info to get in the module.

We need to have this module in early on (at the probe time of the scsi disk).
----------

Yes, we do have unique vendor-product id, but we cannot use it with modalias (reason above).

Comment 29 Denise Dumas 2008-10-29 15:02:39 UTC
We have a proposed kernel fix and we plan to address this issue in RHEL 5.4. But standard Red Hat procedure means that the patch must first be accepted by kernel upstream. Change is too large and not enough time for upstream approval to happen before 5.3 ships. Moving to 5.4.

Comment 30 IBM Bug Proxy 2008-11-06 18:31:12 UTC
Can you share the details of how you are planning to fix it?

As mentioned above (in comment #35), adding mod-alias to scsi won't solve the problem as we do not have 2 pass scan in that layer.

Comment 31 IBM Bug Proxy 2008-12-11 20:22:18 UTC
Reopening for RHEL 5.4.

Comment 33 RHEL Program Management 2009-01-16 21:38:09 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

Comment 38 IBM Bug Proxy 2009-08-12 02:32:15 UTC
------- Comment From sekharan.com 2009-08-11 13:39 EDT-------
Hi Peter(RedHat),

Since we were not able to get James convinced of this fix for mainline, what do you think our options are ?

Can we fallback to getting this problem fixed in initrd itself (or fix it during install ) ?

Comment 39 John Jarvis 2009-08-12 05:16:05 UTC
This can be re-requested for 5.5; we are out of time for 5.4.

Comment 41 IBM Bug Proxy 2009-10-07 18:00:54 UTC
------- Comment From sekharan.com 2009-10-07 13:54 EDT-------
At the Linux Plumbers conference, Peter Jones and I talked to James Bottomley
(and Kay) regarding the upstream state of this issue.

James is not happy about the multi-binding (i.e. insmoding multiple modules with
a single uevent) we are doing. He wants a single uevent to be used for a
single module insertion. He asked the udev folks for the feature. Kay
mentioned that the item is on Greg's plate. We tried to talk to Greg, but were
unable to.

Since the feature will make it to mainline, Peter was fine with adding the
capability to initrd (anaconda/dracut) as of now.

Peter, please correct my statements, if any are incorrect.

Comment 43 Denise Dumas 2009-11-05 20:01:49 UTC
For 5.5, Peter will create a patch for the simplest fix possible,
which is "if you're using multipath, drag in all scsi_dh_* modules no matter
what". 

This is going to be difficult to test though, given the wide variety of hw configurations. 

But if it's wrong, the probable failure mode is "stuff that didn't work still doesn't".

Comment 44 IBM Bug Proxy 2009-11-05 21:00:26 UTC
------- Comment From sekharan.com 2009-11-05 15:50 EDT-------
Awesome.... Thanks Peter.

Let us know as soon as the install CD/DVD is available. We will test it with rdac storage.

If that works all other hardware handlers also should work :)

Comment 46 Peter Jones 2009-12-21 21:03:11 UTC
Created attachment 379712 [details]
Load every scsi_dh module if we find any multipath devices

Here's the patch.

Comment 48 IBM Bug Proxy 2010-01-20 17:31:16 UTC
------- Comment From sekharan.com 2010-01-20 12:27 EDT-------
Awesome, Peter.

Please comment on this bug when this gets into a RHEL6 release so that I can test it out.

Comment 49 Chris Ward 2010-02-11 10:18:17 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 50 IBM Bug Proxy 2010-02-12 20:21:43 UTC
------- Comment From sekharan.com 2010-02-12 15:13 EDT-------
Dear RedHat,

I installed the RHEL 5.5 beta and there are two problems.

1. At install time, I do not see the scsi_dh_rdac module insmodded, even though I see the multipath
devices built on the DS4K devices.
- I looked at /modules/modules.cgz and it doesn't have any of the SCSI hardware handler modules;
it just has scsi_dh.ko.

2. I proceeded with the install on a local device and completed it.
I ran mkinitrd (after verifying that I have a multipath device which is using the scsi_dh_rdac module),
but the new mkinitrd did not have the scsi_dh_rdac modules.

In effect, it doesn't work as expected on either front (1. using the scsi_dh_* modules at install, and
2. mkinitrd adding the hardware handler modules, if they are already being used).

Comment 51 John Jarvis 2010-02-16 21:28:30 UTC
Flipping back to ASSIGNED based on IBM's test feedback

Comment 52 IBM Bug Proxy 2010-02-17 16:22:02 UTC
------- Comment From tpnoonan.com 2010-02-17 11:13 EDT-------
please consider under red hat exception process for rhel5.5. as this was
covered in accepted defect:
47343  -  RHBZ 460899L LTC:5.3:201049: mkinitrd support for DM-MP SCSI Hardware
Handlers
but the fix failed to verify in the beta. thnx

Comment 53 IBM Bug Proxy 2010-02-17 18:34:45 UTC
------- Comment From sekharan.com 2010-02-17 13:26 EDT-------
RedHat,

Currently, install initrd image does have the different modules that are not used in all the systems (for example, lsmod during my install shows dm_emc).

Can't we just include the hardware handlers in the install initrd image ? It would resolve the problems.

Comment 54 IBM Bug Proxy 2010-02-17 18:51:08 UTC
------- Comment From arobert.com 2010-02-17 13:40 EDT-------
Red Hat - here is the customer impact on why this is important to fix in RH 5.5.

Using the DS4K storage as a bootable device is a major pain point for
our customers.

Without this fix, users will not be able to use the "mpath" feature that
is available in RHEL for installation onto DS4K systems. They will have
to install the OS on a single path and later follow a process to make it
multipath.

They also have to disable all ports except one port in the HBA and make
sure that the storage controller that is active is connected to the host
through that enabled port of the HBA. This leads to lot of annoyance for
our customers.

Having this function work properly would really benefit our Red Hat customers on Power.

Comment 55 Chris Marcantonio 2010-02-17 21:52:43 UTC
It looks like the problem here is that the proposed patch uses the module scsi_wait_scan to detect the proper path to the scsi device handler modules.  Unfortunately, I don't believe we currently ship this module in RHEL5 (it's likely only upstream/in Fedora at the moment) so the rest of the function never executes.

This is likely most easily fixed by substituting "sg" in place of "scsi_wait_scan" in both places that it occurs.  For instance, the current proposed patch looks like this:

diff -urpN mkinitrd-5.1.19.6/mkinitrd.scsi-dh mkinitrd-5.1.19.6/mkinitrd
--- mkinitrd-5.1.19.6/mkinitrd.scsi-dh	2009-12-21 15:48:22.000000000 -0500
+++ mkinitrd-5.1.19.6/mkinitrd	2009-12-21 16:00:57.000000000 -0500
@@ -224,6 +224,9 @@ find_mpath_deps() {
     local arg2="$2"
     local majmin=$(cat $1/dev)
     local ret=1
+
+    find_scsi_dh_modules
+
     if [ "${arg2}" == "yes" ]; then
         if is_emc ${1} ${devpath} ; then
             ret=0
@@ -358,6 +361,15 @@ findmodule() {
     fi
 }
 
+find_scsi_dh_modules() {
+    local scsipath=$(modprobe --set-version $kernel --show-depends scsi_wait_scan 2>/dev/null | awk '/^insmod / { print $2; }' | tail -1)
+    scsipath="${scsipath%%scsi_wait_scan.ko}device_handler/"
+    [ -d "$scsipath" ] || return
+    for x in $scsipath/*.ko ; do
+        findmodule -${x%%.ko}
+    done
+}
+
 inst() {
     if [ "$#" != "2" ];then
         echo "usage: inst <file> <destination>"


A simple change to something like this:


diff -urpN mkinitrd-5.1.19.6/mkinitrd.scsi-dh mkinitrd-5.1.19.6/mkinitrd
--- mkinitrd-5.1.19.6/mkinitrd.scsi-dh	2009-12-21 15:48:22.000000000 -0500
+++ mkinitrd-5.1.19.6/mkinitrd	2009-12-21 16:00:57.000000000 -0500
@@ -224,6 +224,9 @@ find_mpath_deps() {
     local arg2="$2"
     local majmin=$(cat $1/dev)
     local ret=1
+
+    find_scsi_dh_modules
+
     if [ "${arg2}" == "yes" ]; then
         if is_emc ${1} ${devpath} ; then
             ret=0
@@ -358,6 +361,15 @@ findmodule() {
     fi
 }
 
+find_scsi_dh_modules() {
+    local scsipath=$(modprobe --set-version $kernel --show-depends sg 2>/dev/null | awk '/^insmod / { print $2; }' | tail -n 1)
+    scsipath="${scsipath%%sg.ko}device_handler/"
+    [ -d "$scsipath" ] || return
+    for x in $scsipath/*.ko ; do
+        findmodule -${x%%.ko}
+    done
+}
+
 inst() {
     if [ "$#" != "2" ];then
         echo "usage: inst <file> <destination>"


Will likely solve why this doesn't appear to be running.  Note I also changed the syntax on the tail command to be POSIX compliant, but that's just kind of picking nits, as it worked fine the other way too.
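
To make the path derivation in the patch easier to follow, here is a small stand-alone sketch of the same idea. It is not the mkinitrd code: the helper name scsi_dh_dir_from_depends is hypothetical, and it reads "modprobe --show-depends" style text from stdin so it can be tried without touching the module tree. It also shows why a missing anchor module (scsi_wait_scan on RHEL5) yields nothing: with no "insmod" line there is no path to derive the directory from.

```shell
# Hypothetical helper: given "modprobe --show-depends" style output on stdin
# and the name of an anchor module (e.g. sg), print the device_handler
# directory that sits next to that module. Fails when the last insmod line
# is not the anchor module, for instance when modprobe printed nothing.
scsi_dh_dir_from_depends() {
    local anchor="$1" path
    path=$(awk '/^insmod / { print $2 }' | tail -n 1)
    case "$path" in
        */"$anchor".ko) printf '%s\n' "${path%"$anchor".ko}device_handler/" ;;
        *) return 1 ;;
    esac
}
```

On a live system the input would come from something like: modprobe --set-version "$kernel" --show-depends sg 2>/dev/null | scsi_dh_dir_from_depends sg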

Comment 56 Chris Marcantonio 2010-02-17 21:59:06 UTC
Created attachment 394838 [details]
adjustment to Peter's patch to use the sg module instead of scsi_wait_scan so that it is compatible with RHEL5

Comment 57 IBM Bug Proxy 2010-02-18 00:30:49 UTC
------- Comment From sekharan.com 2010-02-17 19:27 EDT-------
I do not totally understand how these changes in mkinitrd leads to the modules being available at install time :)

With my limited understanding, I found two issues:

1. In find_scsi_dh_modules(), the snippet
    for x in $scsipath/*.ko ; do
        findmodule -${x%%.ko}
    done
is incorrect because "x" is the whole path; we need to take the basename first and then call findmodule:

    for x in $scsipath/*.ko ; do
        y=$(basename $x)
        findmodule -${y%%.ko}
    done

2. I was following how dm_emc is getting included and it looks like find_mpath_deps() is not the right place for find_scsi_dh_modules() to be called from.
I think it should be called from
(around line 1478)
------------------------
# If we use dm-multipath devices, include the needed modules
if [ "$use_multipath" == "1" ]; then
findmodule -dm-multipath
findmodule -dm-round-robin
------>   find_scsi_dh_modules
if [ "$use_emc" == "1" ]; then
findmodule -dm-emc
------------------------

Please look at these issues and see if it holds any water :)

Once you are convinced that we have a working fix, if you could give me a ppc64.img file, I can do a quick test and report back (instead of waiting for the next snapshot).

Comment 59 IBM Bug Proxy 2010-02-19 15:41:44 UTC
------- Comment From tpnoonan.com 2010-02-19 10:39 EDT-------
fyi, red hat's proposed fix will be in an upcoming snapshot for verification

Comment 60 IBM Bug Proxy 2010-02-19 18:11:08 UTC
------- Comment From sekharan.com 2010-02-19 13:05 EDT-------
(In reply to comment #70)
> fyi, red hat's proposed fix will be in an upcoming snapshot for verification

I presume it is snap2 you are referring to here.

RedHat,

As I mentioned above if I can get a ppc64.img I could test it out before snap2 rolls out :)

Comment 61 IBM Bug Proxy 2010-02-22 19:11:38 UTC
------- Comment From tpnoonan.com 2010-02-22 14:08 EDT-------
which snapshot is the fix planned for?

Comment 62 Peter Jones 2010-02-23 19:53:30 UTC
(In reply to comment #57)
> ------- Comment From sekharan.com 2010-02-17 19:27 EDT-------
> I do not totally understand how these changes in mkinitrd leads to the modules
> being available at install time :)
> 
> With my limited understanding, I found two issues:
> 
> 1. In find_scsi_dh_modules(), the snippet
> for x in $scsipath/*.ko ; do
> findmodule -${x%%.ko}
> done
> is incorrect as "x" is the whole path, we need to first get the basename and
> then do a find module
> 
> for x in $scsipath/*.ko ; do
> y=$(basename $x)
> findmodule -${y%%.ko}
> done

I'm going with "local y=${x##*/}" rather than basename, but yes, this is correct.
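
A minimal sketch of that approach, for illustration: the function names module_name_from_path and list_modules_in_dir are hypothetical and this is not the committed code. It only shows that "${x##*/}" strips the directory part without calling basename, after which the .ko suffix can be removed.

```shell
# Hypothetical helper: turn a module file path into the bare module name
# using only parameter expansion (no external basename call).
module_name_from_path() {
    local y="${1##*/}"          # drop everything up to the last slash
    printf '%s\n' "${y%.ko}"    # drop the .ko suffix
}

# Hypothetical helper: print the bare name of every .ko file in a directory.
list_modules_in_dir() {
    local x
    for x in "$1"/*.ko; do
        [ -e "$x" ] || continue   # the glob matched nothing
        module_name_from_path "$x"
    done
}
```

In mkinitrd each printed name would then be handed to findmodule with the leading dash, as in the patch.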

> 2. I was following how dm_emc is getting included and it looks like
> find_mpath_deps() is not the right place for find_scsi_dh_modules() to be
> called from.
> I think it should be called from
> (around line 1478)
> ------------------------
> # If we use dm-multipath devices, include the needed modules
> if [ "$use_multipath" == "1" ]; then
> findmodule -dm-multipath
> findmodule -dm-round-robin
> ------>   find_scsi_dh_modules
> if [ "$use_emc" == "1" ]; then
> findmodule -dm-emc

Fair enough, I guess - it was implemented originally as "find the device handler for this device", and so at that time finding it in the dep tree was better, but now that we're just loading whatever is available, a more centralized location is more reasonable.

> Once you are convinced that we have a working fix, if you could give me a
> ppc64.img file, I can do a quick test and report back (instead of waiting for
> the next snapshot).    

I'm not really equipped to do this.

Comment 63 Peter Jones 2010-02-23 19:59:05 UTC
Fixed in 5.1.19.6-59 .

Comment 65 John Jarvis 2010-02-23 20:41:31 UTC
This is approved for snapshot 3.

Comment 67 IBM Bug Proxy 2010-03-05 01:30:51 UTC
------- Comment From sekharan.com 2010-03-04 20:23 EDT-------
Installed Snapshot 3 (uname -r returns 2.6.18-190.el5), it is not working as expected.

I see two fixes in, but one fix is still not in.

The change cmarcant proposed to change "scsi_wait_scan" to "sg" to locate the modules directory in function find_scsi_dh_modules() is not available. /sbin/mkinitrd still shows "scsi_wait_scan".
Without this change the function find_scsi_dh_modules() returns nothing.

------------------

Here is his proposal:
From:
----------------
+find_scsi_dh_modules() {
+    local scsipath=$(modprobe --set-version $kernel --show-depends scsi_wait_scan 2>/dev/null | awk '/^insmod / { print $2; }' | tail -1)
+    scsipath="${scsipath%%scsi_wait_scan.ko}device_handler/"
----------------

To:
----------------
+find_scsi_dh_modules() {
+    local scsipath=$(modprobe --set-version $kernel --show-depends sg 2>/dev/null | awk '/^insmod / { print $2; }' | tail -n 1)
+    scsipath="${scsipath%%sg.ko}device_handler/"

----------------
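
To illustrate why the choice of module matters, here is a rough sketch of the path derivation in isolation. The helper name dh_dir_from_insmod and the sample insmod line are hypothetical; the real path depends on the installed kernel. The assumption is that when modprobe cannot resolve the module (as reported for scsi_wait_scan on RHEL5) it prints nothing, so there is no directory to scan.

```shell
#!/bin/bash
# Derive the device_handler directory from "insmod <path>" output, as printed
# by "modprobe --show-depends". $1: the modprobe output, $2: the module name.
dh_dir_from_insmod() {
    local path
    path=$(printf '%s\n' "$1" | awk '/^insmod / { print $2; }' | tail -n 1)
    # No insmod line means the module was not found: nothing to derive.
    [ -n "$path" ] || return 1
    # Replace the trailing "<module>.ko" with "device_handler/".
    printf '%s\n' "${path%%$2.ko}device_handler/"
}

# Hypothetical sample line standing in for real modprobe output.
sample="insmod /lib/modules/2.6.18-190.el5/kernel/drivers/scsi/sg.ko"
dh_dir_from_insmod "$sample" sg

# Empty output, as assumed for a module the kernel does not ship.
dh_dir_from_insmod "" scsi_wait_scan || echo "no path: nothing to scan"
```

With an empty result the directory test in find_scsi_dh_modules() fails and the function returns without adding any handler modules, which matches the behaviour described above.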

Comment 71 Peter Jones 2010-03-08 22:08:03 UTC
mkinitrd-5.1.19.6-61 has the change from comment 56 in it.

Comment 72 IBM Bug Proxy 2010-03-09 02:30:57 UTC
Created attachment 398666 [details]
mkinitrd from snap3 system


------- Comment on attachment From sekharan.com 2010-03-08 21:27 EDT-------


Hi Peter,

Attached is the mkinitrd from the snap3 system. It does have the later fixes, but not the replacement of scsi_wait_scan with sg.

Its version says "VERSION=5.1.19.6".

Just to confirm it is snap3, here is the output of uname -r:
-----------
# uname -r
2.6.18-190.el5
-----------

Comment 73 IBM Bug Proxy 2010-03-09 17:41:41 UTC
------- Comment From lxie.com 2010-03-09 12:34 EDT-------
> Chandra,
> Is mkinitrd-5.1.19.6-61 the only thing missing in snap3? is there anything else
> missing? please confirm.

As I mentioned above in comment #59, I am not sure how the changes in mkinitrd
lead to the module being available in the install image. But my understanding
(assumption) is that they do.

If the change proposed above exists in mkinitrd (when RHEL build their install
image) then it should work. That is the only thing needed.

> Thanks,
> Linda

Comment 75 IBM Bug Proxy 2010-03-11 19:11:24 UTC
------- Comment From sekharan.com 2010-03-11 14:03 EDT-------
Installed snap 4 (uname -r prints 2.6.18-191.el5). Problem still exists.

/sbin/mkinitrd still has scsi_wait_scan instead of sg.

--------------------
find_scsi_dh_modules() {
    local scsipath=$(modprobe --set-version $kernel --show-depends scsi_wait_scan 2>/dev/null | awk '/^insmod / { print $2; }' | tail -1)
    scsipath="${scsipath%%scsi_wait_scan.ko}device_handler/"
    [ -d "$scsipath" ] || return
    for x in $scsipath/*.ko ; do
        local h=${x##*/}
        findmodule -${h%%.ko}
    done
}
--------------------

Comment 76 John Jarvis 2010-03-11 22:17:32 UTC
The final version of the fix did not make snapshot 4; it will be in the Release Candidate.

Comment 77 IBM Bug Proxy 2010-03-19 04:51:17 UTC
------- Comment From sekharan.com 2010-03-19 00:43 EDT-------
I tried RC1.

I do not see the scsi_dh_.* modules during the install, even when I provide mpath on the install
command line.

When I install on a local drive with some DS4K storage (that needs scsi_dh_rdac module), I do not see the module added to the new initrd.

OTOH, if I install onto DS4K storage, I _do_ see the scsi_dh_rdac module added to the new initrd.

IOW, as I suspected earlier, having the changes in initrd did not make these modules show up in the install kernel.

The request in this BZ is to have the scsi_dh modules added to the installed initrd whenever there is a device that needs the module, irrespective of whether the OS is being installed on that storage.
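
One way to check a given initrd for the handler modules, sketched under assumptions: the helper name list_scsi_dh is hypothetical, the sample listing is made up, and the commented command assumes the image is a gzip-compressed cpio archive at the usual /boot location.

```shell
#!/bin/bash
# Print the scsi_dh module names found in a file listing read from stdin
# (one path per line, e.g. the output of "cpio -t").
list_scsi_dh() {
    grep -E '(^|/)scsi_dh[^/]*\.ko$' | sed -e 's|.*/||' -e 's|\.ko$||'
}

# Hypothetical listing standing in for the contents of a real initrd.
printf '%s\n' lib/dm-multipath.ko lib/scsi_dh.ko lib/scsi_dh_rdac.ko | list_scsi_dh

# Against a real image (assumed format and path, adjust as needed):
#   zcat /boot/initrd-$(uname -r).img | cpio -t 2>/dev/null | list_scsi_dh
```

An empty result for an initrd built on a system with DS4K storage attached would correspond to the local-drive case described above.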

Comment 79 errata-xmlrpc 2010-03-30 08:59:53 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0295.html

Comment 80 IBM Bug Proxy 2010-03-30 16:03:12 UTC
------- Comment From dvhltc.com 2010-03-30 11:54 EDT-------
(In reply to comment #86)
> An advisory has been issued which should help the problem

As I understand it the problem is only half addressed at best. Unless I've really misunderstood the current state of things, we still cannot use SAN partitions when doing the primary install to a local drive. The recent changes have made SAN-only installs work, but not local+SAN installs. Shouldn't we keep this open until we've seen this all the way through?

Comment 81 John Jarvis 2010-03-30 16:47:00 UTC
No, please open a new bug report specifically on the issues outstanding and reference this Bugzilla.

Comment 82 John Call 2010-03-30 18:53:14 UTC
As a customer of both Red Hat and IBM, may I please voice my concern over this half-resolution?  I am very grateful for your efforts thus far.  However, the true feature/functionality set will be complete when the patches are in place to include scsi_dh_rdac whenever LSI / IBM / etc... storage is detected.  This subtle defect is quite irksome and I have become very interested in its complete resolution.  Thank you kindly for your attention.

Comment 83 IBM Bug Proxy 2010-03-31 20:30:50 UTC
------- Comment From dvhltc.com 2010-03-31 16:26 EDT-------
I tried a SAN install on elm9m84 (LS21) of RHEL5.5 RC2 (which I'm told is equivalent to GA) with both paths enabled and both controllers enabled and boot selectable. I was able to select the language, keyboard layout, and the network device to install from, then the text install screen went blank (except for the key tips on the bottom) and didn't return after several minutes. My understanding from the above is that this should work. Chandra, how did you install?

Comment 84 IBM Bug Proxy 2010-03-31 22:11:54 UTC
------- Comment From dvhltc.com 2010-03-31 18:09 EDT-------
Previous comment was user error (NFS Server IP needed to be changed) and I missed mpath on the kernel boot command line. With that corrected I am able to proceed through the install to an mpath device.