Bug 500998 - [EMC 5.6 bug] DM takes 3 to 5 hours to build device maps for 1024 iSCSI LUNs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 5.6
Assignee: Ben Marzinski
QA Contact: Cluster QE
URL:
Whiteboard:
Duplicates: 474855
Depends On:
Blocks: 557597 5.6-Known_Issues
 
Reported: 2009-05-15 12:37 UTC by Wayne Berthiaume
Modified: 2011-01-05 04:17 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
By default, the multipathd service starts up before the iscsi service. This provides multipathing support early in the bootup process and is necessary for multipathed iSCSI SAN boot setups. However, once started, the multipathd service adds paths as udev informs it about them. As soon as the multipathd service detects a path that belongs to a multipath device, it creates the device. If the first path that multipathd notices is a passive path, it attempts to make that path active. If it later adds a more optimal path, multipathd activates the more optimal path. In some cases, this can cause significant overhead during startup.

If you are experiencing such performance problems, configure the multipathd service to start after the iscsi service. Do not do this on systems where the root device is a multipathed iSCSI device, since the system would become unbootable. To move the service start time, run the following commands:

# mv /etc/rc5.d/S06multipathd /etc/rc5.d/S14multipathd
# mv /etc/rc3.d/S06multipathd /etc/rc3.d/S14multipathd

To restore the original start time, run the following command:

# chkconfig multipathd resetpriorities
Clone Of:
Environment:
Last Closed: 2010-11-23 22:30:39 UTC
Target Upstream Version:
Embargoed:


Attachments
Change to /etc/init.d multipath to wait for udev (275 bytes, patch)
2010-05-27 04:38 UTC, Ben Marzinski

Description Wayne Berthiaume 2009-05-15 12:37:30 UTC
Description of problem:
A reboot with the iSCSI software initiator and DM-MPIO takes too long to complete. The server is up, the SCSI devices are created and seen in /proc/scsi/scsi within a few minutes; however, DM takes several hours to complete the construction of the device maps. A comparison of the same configuration using a fibre channel HBA only takes minutes to complete. Further testing was performed with EMC PowerPath and iSCSI, and it, too, only took minutes to complete. We took it one step further with SLES 10 SP2 and the same configuration of iSCSI and DM-MPIO, and it only took a few minutes to come up and create the device maps. 

Version-Release number of selected component (if applicable):
RHEL 5.3 release

How reproducible:
Always

Steps to Reproduce:
1. Attach a server to 1024 LUNs provided through an iSCSI connection using the software iSCSI stack configured with DM-MPIO.
2. Reboot the server or use "iscsiadm -m session -R".
3. Use "multipath -l | grep mpath | wc -l" to watch the number of newly created devices increase.
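
The count in step 3 can be wrapped in a small helper so it is easy to poll. This is a minimal sketch, not part of the original report: the function name count_mpath_maps is made up for illustration, and it assumes the default user-friendly naming, where each map's header line in "multipath -l" output starts with "mpath" followed by a number.

```shell
#!/bin/sh
# Count multipath maps in `multipath -l` output read from stdin.
# Assumption: user-friendly names are in use, so every map header
# line begins with "mpath<N>"; path and size lines do not.
count_mpath_maps() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the helper usable under "set -e".
    grep -c '^mpath[0-9]' || true
}

# Example polling loop (run as root on the host under test):
#   while sleep 10; do date; multipath -l | count_mpath_maps; done
```

Reading from stdin keeps the helper testable against saved output as well as against a live "multipath -l".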
  
Actual results:
Hours for device maps to be created.

Expected results:
A few minutes should be all it takes to create the device maps so devices are available for mount() to access them.

Additional info:
We started to investigate the difference between the way SLES 10 SP2 creates the device maps and the way RHEL 5.3 does. One thing we do see is that RHEL 5.3 spawns a multipath process for each and every device, whereas this is not the case for SLES 10 SP2. We are wondering whether this may be at the root of the issue.
The other possibility may be the way DM is probing the iSCSI LUNs. The target is in PNR mode so it is presented with inaccessible devices. We will test ALUA as well.

Comment 1 Bryn M. Reeves 2009-05-15 13:36:25 UTC
Could you try disabling one of the udev rules in /etc/udev/rules.d/40-multipath.rules?

If you comment out the line that looks like:

KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"

This should prevent the multipath processes being spawned for every device (new paths will still be picked up and handled by the daemon via the uevent mechanism).
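
For anyone scripting this change, a minimal sketch follows. It is an illustration only: the function name comment_out_multipath_run_rule is hypothetical, and the pattern assumes the rule ends with exactly the RUN+= clause quoted above. It works as a filter and does not touch the rules file itself, so the result can be reviewed before it is copied into place.

```shell
#!/bin/sh
# Read a udev rules file on stdin and write it to stdout with the
# per-device multipath rule commented out.
# Assumption: the rule contains RUN+="/sbin/multipath -v0 %M:%m"
# exactly as quoted in this comment; all other lines pass through.
comment_out_multipath_run_rule() {
    # Only lines that do not already start with '#' are changed.
    sed '/^[^#].*RUN+="\/sbin\/multipath -v0 %M:%m"/ s/^/#/'
}

# Example (paths other than the rules file are arbitrary):
#   comment_out_multipath_run_rule \
#       < /etc/udev/rules.d/40-multipath.rules > /tmp/40-multipath.rules.new
#   diff /etc/udev/rules.d/40-multipath.rules /tmp/40-multipath.rules.new
```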

Comment 2 Don 2009-05-15 19:11:07 UTC
I have tried the above and it did indeed stop all of the processes from being created, and I was able to see all 1024 LUNs in /proc/scsi/scsi, /dev/sd*, /dev/sg, and /dev/mapper/mpath right after the reboot.
However, in the messages file I get these messages for about 45 minutes.
But what really bothers me is that I have done this twice, and right after the reboot of the host my SP log shows unit shutdown for trespass and about 300 of my LUNs trespass.


May 15 14:45:52 fry multipathd: mpath609: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 2 1 round-robin 0 1 1 66:1840 1000 round-robin 0 2 1 66:816 
May 15 14:45:52 fry multipathd: sddes: add path (uevent) 
May 15 14:45:52 fry kernel: device-mapper: multipath emc: long trespass command will be send
May 15 14:45:52 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
May 15 14:45:52 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
May 15 14:45:52 fry multipathd: mpath610: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 2 1 round-robin 0 1 1 66:1856 1000 round-robin 0 2 1 66:832 
May 15 14:45:52 fry multipathd: sddet: add path (uevent) 
May 15 14:45:52 fry kernel: device-mapper: multipath emc: long trespass command will be send
May 15 14:45:52 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
May 15 14:45:52 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
May 15 14:45:52 fry multipathd: mpath608: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 2 1 round-robin 0 1 1 66:1872 1000 round-robin 0 2 1 66:848 
May 15 14:45:52 fry multipathd: sddeu: add path (uevent) 
May 15 14:45:52 fry kernel: device-mapper: multipath emc: long trespass command will be send
May 15 14:45:52 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
May 15 14:45:52 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.

Comment 3 Don 2009-05-19 15:23:18 UTC
I have gone back and uncommented this line, and I had no trespassed LUNs on a reboot.

Comment 4 Andrius Benokraitis 2009-05-19 15:31:17 UTC
Donald/Bryn - sounds like this is expected behavior then? Are there any requests for actions against RHEL?

Comment 5 Bryn M. Reeves 2009-05-19 15:41:03 UTC
The udev rule in our device-mapper-multipath is a bit odd; upstream doesn't have it and RHEL5's multipathd already has all the smarts built in for map discovery etc. It certainly adds a lot of weight during bootup with the amount of forking and execing being done from udev.

It's also been implicated in some other misbehaviours (due to other bugs) e.g.:

https://bugzilla.redhat.com/show_bug.cgi?id=452897

I don't recall the exact reason that we use it in RHEL5 so I'm not sure how feasible it is for us to get rid of it entirely - Ben's probably the best one to answer here.

Comment 7 Wayne Berthiaume 2009-05-20 00:24:50 UTC
Hi Andrius.

     This is an issue if we are to support Red Hat's ability to support 8192 LUNs in RHEL 5.4. As stated above, under similar testing we don't see this issue in SLES 10 SP2. This would put Red Hat at a disadvantage in this area.

Regards,
Wayne.

Comment 8 Andrius Benokraitis 2009-05-20 03:59:51 UTC
Wayne, although not convenient, I would assume the workaround can be documented for the customers that want to go this high. Just out of curiosity, would this  documentation affect and/or satisfy EMC support statements in the short-term?

Comment 9 Don 2009-05-20 11:32:41 UTC
I don't believe that commenting out this line is a viable workaround, seeing that it causes my LUNs to trespass.

Comment 10 Wayne Berthiaume 2009-05-20 13:13:36 UTC
Hi Andrius.

    I don't see the work-around as an issue; however, it brings a new issue that needs to be resolved - the LUNs trespassing. If there is an implementation that can be made through a work-around to get us to RHEL 5.5 (or is the tag supposed to be 5.4?) without causing the LUNs to trespass, that would be fine; however, I thought the reason to flag issues during alpha/beta cycles was to fix them for that release, not a subsequent release. =;^) 
    We tested 5.3 because we don't have 5.4 yet. We would like to see a fix in RHEL 5.4, where EMC and Red Hat are targeting this large LUN support. Remember, our goal is to post support of 8192 devices and this test scenario only brought us to 4096. 

Regards,
Wayne.

Comment 11 Ben Marzinski 2009-05-20 16:39:55 UTC
Those messages from comment #2 look completely normal to me.  Shouldn't dm-multipath always generate those when creating a multipath device? That's not to say that your LUNs aren't actually trespassing.

Looking at the lines which say:
May 15 14:45:52 fry multipathd: sddes: add path (uevent)

It appears that your LUNs are getting discovered by the OS after multipathd is started. Correct?  If this is the case, it may be that what is happening is this:

Since multipathd adds the paths one at a time, and it tries to make a multipath device after adding the first path, it may be making a multipath device with only the passive path, and then need to do a trespass to make it active.

To check this theory:
With the udev rule line commented out, disable multipathd with
# chkconfig --del multipathd

Then reboot, log in, and run
# multipath -l
You probably won't see any devices.  If for some reason you do, please let me know, and also run
# multipath -F
To remove them, so we can verify that multipathd can set them up without trespassing.

Once all the LUNs are visible to the OS, manually start up multipathd with
# service multipathd start

This will allow multipath to see all of the paths at once, and build the multipath devices with all of the paths.

Let me know if this causes the trespasses.

Comment 12 Don 2009-05-22 18:28:30 UTC
As a preliminary assessment of this procedure it seems to work like I would expect things to work. The OS saw all the luns down each path right away after boot and very shortly after multipath saw all the luns down all the paths and there were no trespasses.

Comment 13 Andrius Benokraitis 2009-07-10 15:24:24 UTC
Is the permanent commenting out of that udev line in the proposed fix viable to begin with?

Comment 14 Don 2009-07-10 15:37:15 UTC
There are also additional steps in comment 11 to prevent any trespassing.

Comment 15 Ben Marzinski 2009-07-24 21:41:01 UTC
The udev line will be commented out permanently in RHEL 5.4.  This was changed as part of the fix for bz #506715.  To avoid the trespasses, either the devices need to come up earlier, or multipathd needs to start later.  multipathd is not run as part of the initrd.  Do you know if all the drivers necessary to access the devices are loaded as part of the initrd?  If the devices can all get loaded in the initrd, then by the time multipathd starts later in the boot process, all the LUNs should be visible, and multipathd won't force any trespassing.

Comment 16 Don 2009-07-29 11:28:08 UTC
>Do you know if all the drivers necessary to access
>the devices are loaded as part of the initrd?

No I am not sure what ALL drivers are necessary.

Comment 17 Don 2009-08-06 18:58:08 UTC
With S4, that line in /etc/udev/rules.d/40-multipath.rules has already been commented out. So it's not spawning all those processes, but it is taking a long time to map the LUNs: not as long as before, but probably 1/2 - 1 hour, and I am still getting trespasses.

Comment 18 Wayne Berthiaume 2009-08-06 19:33:27 UTC
Regarding comment #15....

This is a software iSCSI configuration, so the iSCSI driver comes up after the network during init 3. Could this be the issue? Is there another way to mitigate this?

Comment 19 Don 2009-08-07 18:24:55 UTC
This is what is in the messages file. When I monitor the /dev/mapper/ directory, it adds a device about every 3 seconds, and at 1024 devices that's around 45 - 50 minutes.

Aug  7 14:09:55 fry multipathd: sdcxo: add path (uevent) 
Aug  7 14:09:55 fry kernel: device-mapper: multipath emc: long trespass command will be send
Aug  7 14:09:55 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
Aug  7 14:09:55 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
Aug  7 14:09:55 fry multipathd: mpath414: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 2 1 round-robin 0 2 1 135:2192 1000 70:2720 1000 round-robin
Aug  7 14:09:55 fry multipathd: sdcxp: add path (uevent) 
Aug  7 14:09:55 fry kernel: device-mapper: multipath emc: long trespass command will be send
Aug  7 14:09:55 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
Aug  7 14:09:55 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
Aug  7 14:09:55 fry multipathd: mpath591: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 1 1 round-robin 0 2 1 8:2368 1000 70:2736 1000] 
Aug  7 14:09:55 fry multipathd: sdcxq: add path (uevent) 
Aug  7 14:09:56 fry kernel: device-mapper: multipath emc: long trespass command will be send
Aug  7 14:09:56 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
Aug  7 14:09:56 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
Aug  7 14:09:56 fry multipathd: mpath541: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 2 1 round-robin 0 1 1 70:2752 1000 round-robin 0 2 1 68:2160
Aug  7 14:09:56 fry multipathd: sdcxs: add path (uevent) 
Aug  7 14:09:56 fry kernel: device-mapper: multipath emc: long trespass command will be send
Aug  7 14:09:56 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
Aug  7 14:09:56 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
Aug  7 14:09:56 fry multipathd: mpath415: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 2 1 round-robin 0 2 1 135:2240 1000 70:2784 1000 round-robin
Aug  7 14:09:56 fry multipathd: sdcxr: add path (uevent) 
Aug  7 14:09:56 fry kernel: device-mapper: multipath emc: long trespass command will be send
Aug  7 14:09:56 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
Aug  7 14:09:56 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
Aug  7 14:09:56 fry multipathd: mpath677: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 1 1 round-robin 0 1 1 70:2768 1000] 
Aug  7 14:09:56 fry multipathd: mpath677: event checker started 
Aug  7 14:09:56 fry multipathd: dm-31: add map (uevent) 
Aug  7 14:09:56 fry multipathd: dm-31: devmap already registered 
Aug  7 14:09:57 fry kernel: device-mapper: multipath emc: emc_pg_init: sending switch-over command
Aug  7 14:10:01 fry multipathd: sdcxu: add path (uevent) 
Aug  7 14:10:01 fry kernel: device-mapper: multipath emc: long trespass command will be send
Aug  7 14:10:01 fry kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
Aug  7 14:10:01 fry kernel: device-mapper: multipath: Using dm hw handler module emc for failover/failback and device management.
Aug  7 14:10:01 fry multipathd: mpath542: load table [0 2097152 multipath 1 queue_if_no_path 1 emc 2 1 round-robin 0 1 1 71:2560 1000 round-robin 0 2 1 68:2224
Aug  7 14:10:01 fry multipathd: sdcxt: add path (uevent)
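
The estimate in this comment (about one new device every 3 seconds, 1024 devices) can be checked with shell arithmetic. This is a rough sketch: the function name estimate_map_build_time is made up for illustration, and the 3-second rate is the figure observed above, not a measured constant.

```shell
#!/bin/sh
# Estimate total map creation time from a device count and an
# observed per-device interval in whole seconds.
# Both inputs are assumptions taken from the observation in this comment.
estimate_map_build_time() {
    devices=$1
    secs_per_device=$2
    total=$((devices * secs_per_device))
    # Integer division rounds the minute figure down.
    printf '%d devices x %d s = %d s (about %d min)\n' \
        "$devices" "$secs_per_device" "$total" $((total / 60))
}

estimate_map_build_time 1024 3
# prints: 1024 devices x 3 s = 3072 s (about 51 min)
```

If the per-device rate stayed constant, which is not established here, the 8192 devices named as the goal in comment 10 would take 24576 seconds, a little under seven hours.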

Comment 20 Ben Marzinski 2009-08-12 18:00:10 UTC
Have you tried moving multipathd to later in the /etc/rc3.d directory?  Try doing

# cd /etc/rc3.d
# rm S06multipathd
# ln -s ../init.d/multipathd S98multipathd

To make multipathd get started later during bootup and see if that helps.

Comment 21 Don 2009-08-19 14:15:09 UTC
I made this change and it had no effect on the outcome.

Comment 23 Andrius Benokraitis 2010-01-27 17:15:09 UTC
Has RHEL 5.5 Alpha been tested?

Comment 24 Don 2010-01-27 18:14:35 UTC
I cannot get Alpha to load; I am waiting for Beta.

Comment 25 Andrius Benokraitis 2010-01-27 18:44:10 UTC
Alpha and Beta I believe are going to be very similar, correct Ben?

Comment 26 Don 2010-01-27 18:52:09 UTC
I have only tried Alpha 6.0; I just noticed that this was a 5.5 test. Has anything actually been worked on to correct this problem?

Comment 28 Andrius Benokraitis 2010-02-17 20:41:22 UTC
Doesn't look like anything was committed to RHEL 5.5, and we are at the end of 5.5 development, and deferring to 5.6.

Comment 29 Ben Marzinski 2010-05-12 17:29:14 UTC
*** Bug 474855 has been marked as a duplicate of this bug. ***

Comment 30 Ben Marzinski 2010-05-27 04:38:46 UTC
Created attachment 417114 [details]
Change to /etc/init.d multipath to wait for udev

I'm not sure that this will fix the problem, but if udev has started processing the block devices, this should make it wait until all of them are finished.  If this doesn't work, then I'll need to add a configuration option to allow multipathd to wait on startup, until no more uevents come in for a certain period of time.
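
The fallback idea above (wait on startup until things stop changing for a certain period) can be illustrated in shell. This is not the attached patch, whose contents are not shown here, and it is not how multipathd behaves. It is a sketch under stated assumptions: the names wait_until_stable and count_sd_nodes are hypothetical, and counting sd* entries under /sys/block is only a stand-in for "no more uevents are arriving".

```shell
#!/bin/sh
# Poll a command until its output is unchanged for a number of
# consecutive polls.
#   $1: seconds to sleep between polls
#   $2: number of consecutive unchanged polls required
#   remaining arguments: the command to run
wait_until_stable() {
    interval=$1
    needed=$2
    shift 2
    prev=$("$@")
    stable=0
    while [ "$stable" -lt "$needed" ]; do
        sleep "$interval"
        cur=$("$@")
        if [ "$cur" = "$prev" ]; then
            stable=$((stable + 1))
        else
            # Output changed: restart the quiet period.
            stable=0
            prev=$cur
        fi
    done
}

# Hypothetical probe: number of SCSI disk nodes the kernel has created.
count_sd_nodes() {
    ls /sys/block 2>/dev/null | grep -c '^sd' || true
}

# Example (run as root; the interval and poll count are arbitrary):
#   wait_until_stable 5 3 count_sd_nodes && service multipathd start
```

Resetting the counter whenever the output changes is what makes this a quiet-period wait rather than a fixed delay.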

Comment 31 Ben Marzinski 2010-08-24 20:08:01 UTC
Would it be possible for someone at EMC to test the attached patch, so I know if I need to add the startup code to multipathd?

Comment 32 Wayne Berthiaume 2010-08-25 03:16:35 UTC
Hi Ben.

The resource that was testing this is not available at this time. I'll see what we can do to get this tested.

Regards,
Wayne.

Comment 33 Ben Marzinski 2010-09-24 18:45:28 UTC
I've managed to reproduce this, and changing the startup order, like I mentioned in Comment 20, solves the issue for me.  Perhaps you were running at run level 5 instead of 3, and you needed to change rc5.d. Either that or iscsi used to start up the devices in the background, and it now waits on them.

Comment 37 Ben Marzinski 2010-11-02 15:48:26 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
By default, the multipathd service starts up before the iscsi service. This
provides multipathing support early in the bootup process, and is necessary for
multipathed ISCSI SAN boot setups.  However, once multipathd has started, it
adds paths as it is informed about them by udev. As soon as multipathd sees a
path that belongs to a multipath device, it will create that device. If the
first path that multipathd notices is a passive path, it will attempt to make
that path active.  If it later adds a more optimal path, multipathd will switch
pathgroups, to make the more optimal path active.  In some case, this can cause
a significant overhead during startup, since multipathd may have to repeatedly
perform time consuming actions to switch the active path back and and forth.

In cases where this is causing problems, the multipathd service can be moved to
start after the iscsi service to avoid these long delays.  This must not be 
done in cases where the root device is a multipathed ISCSI device, since it
will cause the system to become unbootable. To move the service start time run:

# mv /etc/rc5.d/S06multipathd /etc/rc5.d/S14multipathd
# mv /etc/rc3.d/S06multipathd /etc/rc3.d/S14multipathd

To restore multipathd to it's original start time, run

# chkconfig multipathd resetpriorities
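
Before and after moving the links, the resulting start order can be checked with a short script. This is a minimal sketch, assuming SysV start links named S<NN><service> in the run-level directory and an iscsi start link named S<NN>iscsi; the function names are made up for illustration.

```shell
#!/bin/sh
# Print the start priority (the NN in S<NN><service>) of a service
# in an rc directory; return non-zero if there is no start link.
start_priority() {
    rc_dir=$1
    service=$2
    for link in "$rc_dir"/S[0-9][0-9]"$service"; do
        # An unmatched glob stays literal, so confirm the entry exists.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}
        name=${name#S}
        printf '%s\n' "${name%"$service"}"
        return 0
    done
    return 1
}

# Succeed only if multipathd has a higher start number than iscsi.
# Equal numbers are reported as "not after", a conservative choice.
# Returns 2 if either start link is missing.
multipathd_starts_after_iscsi() {
    rc_dir=$1
    mp=$(start_priority "$rc_dir" multipathd) || return 2
    is=$(start_priority "$rc_dir" iscsi) || return 2
    [ "$mp" -gt "$is" ]
}

# Example:
#   for d in /etc/rc3.d /etc/rc5.d; do
#       multipathd_starts_after_iscsi "$d" && echo "$d: ok" || echo "$d: check order"
#   done
```

Checking both rc3.d and rc5.d matters because the host may boot into either run level.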

Comment 39 Andrius Benokraitis 2010-11-16 14:53:20 UTC
Wayne - would a technical note be sufficient for EMC's needs? A permanent fix would need a full beta cycle and would change the way multipathd's init script works in the future and would need really thorough testing with SAN boot setups.

So your options are:

1) Technical note only for 5.6 (see above)

2) Technical note for 5.6 and a possible fix for 5.7

Comment 40 Andrius Benokraitis 2010-11-23 16:05:12 UTC
Received the OK from EMC for option 1 in Comment #39. (Technical note only for 5.6, and then close bug).

Comment 41 Tom Coughlan 2010-11-23 22:30:39 UTC
Okay, so in theory, we should close this BZ now. The Tech Note will make its way into the 5.6 documentation separately.

Comment 42 Eva Kopalova 2010-12-10 14:20:33 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,22 +1,18 @@
 By default, the multipathd service starts up before the iscsi service. This
-provides multipathing support early in the bootup process, and is necessary for
-multipathed ISCSI SAN boot setups.  However, once multipathd has started, it
-adds paths as it is informed about them by udev. As soon as multipathd sees a
-path that belongs to a multipath device, it will create that device. If the
-first path that multipathd notices is a passive path, it will attempt to make
-that path active.  If it later adds a more optimal path, multipathd will switch
-pathgroups, to make the more optimal path active.  In some case, this can cause
-a significant overhead during startup, since multipathd may have to repeatedly
-perform time consuming actions to switch the active path back and and forth.
+provides multipathing support early in the bootup process and is necessary for
+multipathed ISCSI SAN boot setups. However, once started, the multipathd service
+adds paths as informed about them by udev. As soon as the multipathd service detects a
+path that belongs to a multipath device, it creates the device. If the
+first path that multipathd notices is a passive path, it attempts to make
+that path active. If it later adds a more optimal path, multipathd activates the more optimal path. In some case, this can cause
+a significant overhead during a startup.
 
-In cases where this is causing problems, the multipathd service can be moved to
-start after the iscsi service to avoid these long delays.  This must not be 
-done in cases where the root device is a multipathed ISCSI device, since it
-will cause the system to become unbootable. To move the service start time run:
+If you are experiencing such performance problems, define the multipathd service to
+start after the iscsi service. This does not apply to systems where the root device is a multipathed ISCSI device, since it the system would become unbootable. To move the service start time run the following commands:
 
 # mv /etc/rc5.d/S06multipathd /etc/rc5.d/S14multipathd
 # mv /etc/rc3.d/S06multipathd /etc/rc3.d/S14multipathd
 
-To restore multipathd to it's original start time, run
+To restore the original start time, run the following command:
 
 # chkconfig multipathd resetpriorities

Comment 44 Ryan Lerch 2011-01-05 04:17:37 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -4,7 +4,7 @@
 adds paths as informed about them by udev. As soon as the multipathd service detects a
 path that belongs to a multipath device, it creates the device. If the
 first path that multipathd notices is a passive path, it attempts to make
-that path active. If it later adds a more optimal path, multipathd activates the more optimal path. In some case, this can cause
+that path active. If it later adds a more optimal path, multipathd activates the more optimal path. In some cases, this can cause
 a significant overhead during a startup.
 
 If you are experiencing such performance problems, define the multipathd service to

