Bug 487502 - [NetApp 4.9 bug] Delayed LUN discovery on RHEL4
Summary: [NetApp 4.9 bug] Delayed LUN discovery on RHEL4
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: device-mapper-multipath
Version: 4.8
Hardware: All
OS: Linux
low
high
Target Milestone: rc
: 4.9
Assignee: Ben Marzinski
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 460301
Blocks: 459969 626414
TreeView+ depends on / blocked
 
Reported: 2009-02-26 11:37 UTC by Ritesh Raj Sarraf
Modified: 2010-11-02 18:18 UTC (History)
26 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 460301
Environment:
Last Closed: 2010-11-02 18:18:41 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Ritesh Raj Sarraf 2009-02-26 11:37:35 UTC
+++ This bug was initially created as a clone of Bug #460301 +++

Description of problem:

It takes long time to create device entries when a large number of LUNs are mapped to the host.
During testing, when 256 LUNs with 4 paths were mapped to the host, it took around 30 minutes to add all the sd* entries in /dev/ directory. What is bad is that during the discovery time period, the OS becomes sluggish till the time all entries are created. The system is not usable at all.
Is it an expected behavior for the OS to take so long (30 minutes) to discover 256 LUNs (256*4=1024 devices)?
The system being barely usable during discovery does not stand as an acceptable quality.

Version-Release number of selected component (if applicable):
uedv-095-14.16.el5
device-mapper-1.02.24-1.el5
device-mapper-multipath-0.4.7-17

How reproducible:
Always

Steps to Reproduce:
1.Map 256 LUNs with 4 paths each
2.Discover the devices on the host
  
Actual results:
It takes long time to add all the device entries. System becomes very sluggish till the time all the entries are created.

Expected results:
It should not take *30 minutes* to create the entries and system should not go sluggish.

Additional info:
On RHEL5 the multipath package ships /sbin/mpath_wait script which is not available upstream. FWIU this generates a delay of around 3 seconds.

--- Additional comment from harald on 2008-08-28 06:40:52 EDT ---

$ rpm -qf /sbin/mpath_wait
device-mapper-multipath-0.4.7-16.fc9.x86_64

--- Additional comment from harald on 2008-08-28 06:53:09 EDT ---

what happens, if you rename /sbin/mpath_wait to /sbin/mpath_wait.old or remove device-mapper-multipath?

--- Additional comment from tanvi on 2008-08-28 10:49:19 EDT ---

mpath_wait does not make much of a difference. 

Aug 27 18:16:18 localhost udevd-event[7515]: run_program: '/sbin/mpath_wait 253 3'
Aug 27 18:16:19 localhost udevd-event[7515]: run_program: Waiting 1 seconds for output of '/sbin/mpath_wait 253 3(7516)'
Aug 27 18:16:20 localhost udevd-event[7515]: run_program: Waiting 2 seconds for output of '/sbin/mpath_wait 253 3(7516)'
Aug 27 18:16:21 localhost udevd-event[7515]: run_program: Waiting 3 seconds for output of '/sbin/mpath_wait 253 3(7516)'
Aug 27 18:16:21 localhost udevd-event[7515]: run_program: '/sbin/mpath_wait' returned with status 1
Aug 27 18:16:21 localhost udevd-event[7639]: run_program: '/sbin/mpath_wait 253 3'
Aug 27 18:16:22 localhost udevd-event[7639]: run_program: Waiting 1 seconds for output of '/sbin/mpath_wait 253 3(7640)'
Aug 27 18:16:23 localhost udevd-event[7639]: run_program: Waiting 2 seconds for output of '/sbin/mpath_wait 253 3(7640)'
Aug 27 18:16:24 localhost udevd-event[7639]: run_program: Waiting 3 seconds for output of '/sbin/mpath_wait 253 3(7640)'
Aug 27 18:16:24 localhost udevd-event[7639]: run_program: '/sbin/mpath_wait' returned with status 1


So whatever the mpath_wait logic is, it always iterates thrice and exits with status 1.

The majority of the time eaten is by /sbin/multipath -v0

Aug 27 19:15:22 localhost udevd-event[20609]: run_program: Waiting 11 seconds for output of '/sbin/multipath -v0 135:752(21005)'
Aug 27 19:15:22 localhost udevd-event[17309]: run_program: Waiting 26 seconds for output of '/sbin/multipath -v0 135:704(17618)'
Aug 27 19:15:22 localhost udevd-event[21996]: run_program: Waiting 6 seconds for output of '/sbin/multipath -v0 8:784(22349)'
Aug 27 19:15:22 localhost udevd-event[17074]: run_program: Waiting 22 seconds for output of '/sbin/multipath -v0 135:592(18512)'
Aug 27 19:15:22 localhost udevd-event[2650]: run_program: Waiting 80 seconds for output of '/sbin/multipath -v0 135:576(2696)'
Aug 27 19:15:22 localhost udevd-event[17351]: run_program: Waiting 26 seconds for output of '/sbin/multipath -v0 135:720(17627)'
Aug 27 19:15:22 localhost udevd-event[17243]: run_program: Waiting 27 seconds for output of '/sbin/multipath -v0 135:672(17439)'
Aug 27 19:15:23 localhost udevd-event[20596]: run_program: Waiting 13 seconds for output of '/sbin/multipath -v0 135:736(20649)'
Aug 27 19:15:23 localhost udevd-event[17179]: run_program: Waiting 28 seconds for output of '/sbin/multipath -v0 135:640(17394)'


In some instances upto 80-90 seconds. Is this an expected behavior?

PS: If you need complete logs, please let me know. (They are huge in size)

--- Additional comment from bmarzins on 2008-09-09 14:45:35 EDT ---

There may be a simple workaround for this. The way the multipath udev rules are set up, you will build multipath devices whenever an appropriate block device appears, even if multipathd is not running. Unfortunately, to do that, there is a lot of redundant work that happens.

However, it isn't that much of a pain to change this so that only multipathd can create multipath devices, except in early boot.  To do that, you edit

/etc/udev/rules.d/40-multipath.rules

and remove the line

KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"

This will keep udev from firing off multipath, every time a block device becomes available.  As long as multipathd is running, it will get a NETLINK event, and it will take care of adding the multipath device.

There should be very few problems associated with this for most setups.  In the initrd, multipath will still run if it is necessary to load the root or boot device. multipath will also run once in rc.sysinit before udev starts up, to make sure any filesystems that are on multipath devices can be started correctly. When multipathd is finally started up, in will scan all the existing block devices, and create multipath devices for any that got missed.

So, really, the only problem would happen if you needed to add multipath devices during normal system operations, without having multipathd running, which is a rare case.  In RHEL 6, this udev rule will go away completely, but I would like to leave it in for RHEL 5, so that it will not surprise anyone relying on it.  

Please try this workaround, and let me know if it works for you. If so, we can probably just release note this, or possibly not use the rule if you are installing the rpm, but keep it there if you are upgrading the rpm, so that it doesn't bite existing customers.

--- Additional comment from rsarraf on 2008-09-11 09:42:43 EDT ---

Ben,
Thanks. The work around works perfect. The device addition time drastically reduced from 20+ minutes to less than 3 minutes for 256 LUNs * 4 paths.

It should be good to have it documented in the Release Notes and a kB article.

--- Additional comment from bmarzins on 2008-09-15 14:40:10 EDT ---

Here's my stab at the release notes.

--- Additional comment from bmarzins on 2008-09-15 14:40:10 EDT ---


Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
When a large number of LUNs are added to a node, multipath can significantly increase the time it takes for udev to create device nodes for them. If you experience this problem, you can correct it by deleting the following line in /etc/udev/rules.d/40-multipath.rules:

KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"

This line causes udev to run multipath every time a block device is added to the node.  Even with this line remove, multipathd will still automatically create multipath devices, and multipath will still be called during the boot process, for nodes with multipathed root filesystems.  The only change is that multipath devices will not be automatically created when multipathd is not running, which should not be a problem for the vast majority of multipath users.

--- Additional comment from bmarzins on 2008-09-15 14:57:03 EDT ---

Reassigning to Docs.

--- Additional comment from mhideo on 2008-10-30 23:50:57 EDT ---

This is in the release notes for 5.3

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.3/html-single/Release_Notes/

Can we mark this bug as MODIFIED?

Cheers,
Mike

--- Additional comment from pm-rhel on 2008-12-12 10:23:05 EDT ---

This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 1 Ritesh Raj Sarraf 2009-02-26 11:42:22 UTC
We have a similar situation here in RHEL4.
RHEL4 additionally had an OOM issue during large num. of LUN discovery, which now seems to be fixed.
But RHEL4 also suffers from the same delayed LUN discovery problem.

Comment 2 Andrius Benokraitis 2009-03-02 05:52:46 UTC
Putting on the RHEL 4.8 list, but it's already very late to propose anything at this point. Deferring to devel on the feasibility of this since only a rel note was included for RHEL 5.3 AFAIK.

Comment 4 Tom Coughlan 2009-03-27 22:39:10 UTC
Ben, the right thing to do in 4.8 is to add a release note. Does the note in 5.3  Bug #460301 apply here as-is?

Comment 5 Ben Marzinski 2009-03-30 15:19:11 UTC
No. Unfortunately, multipathd isn't able to create new devices by itself in RHEL4.  Multipath has a cache that tries to speed it up on repeat calls, but it might not be working correctly.  Getting this working will actually take some code changes.

Comment 11 Ben Marzinski 2010-05-04 21:45:55 UTC
In RHEL4, multipath must be called for every path that gets added. It has a cache to help speed things up, but I haven't been able to prove that this is a cache problem yet.  It might simply be this slow in some circumstances.  There's a chance that some speedup work can be done, but I haven't found an easy target yet.

Comment 12 RHEL Program Management 2010-05-04 21:53:18 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

Comment 13 Ben Marzinski 2010-05-04 22:35:13 UTC
Sorry. This bug got closed accidentally.

Comment 15 Andrius Benokraitis 2010-10-04 18:16:49 UTC
(In reply to comment #13)
> Sorry. This bug got closed accidentally.

Ben, any update on this bug?

Comment 18 Andrius Benokraitis 2010-10-25 13:22:59 UTC
NetApp: Please note this is lowest priority item of your 4.9 items, and this will get addressed as time allows.

Comment 19 Ben Marzinski 2010-10-27 18:11:10 UTC
I'm not sure that there will be any way to singificantly fix this problem without a significant change in how either multipath devices are created, or how multipath does it's caching.  In RHEL5 and later multipathd was able to create multipath devices itself.  In RHEL4 multipath must do it.  In order to correctly create the device, multipath needs to search all the multipathable devices to find the ones that it will use.  This scanning needs to happen to all the devices for each device that gets added.

In order to cut this down, multipath pulls the path information from multipathd, for all the paths that it knows about.  However, this won't acutally cause much of a speed increase when a large number of devices are discovered at once. Right now, the code doesn't really provide any speed increase, because multipath fetches all the information about the cached paths again.  I tried adding some code to make multipath only get the information it actually needs. However, when a large number of devices are being added at the same time, multipathd isn't able to provide the slowest information to get back to multipath, and so multipath needs to fetch it by itself.

Even worse, when a lot of multipath devices are being created all at once, there's a long wait between when the scsi device shows up in sysfs and when multipathd knows about it at all.  All of the multipath devices that start during this time won't get any information from multipathd on these devices.

One idea that might solve this problem is to add an option that forces multipath to grab a lockfile before running.  This will mean that you can guarantee that only one process is running at once.  In this case, since you are only interested in the device you are trying to add, you can safely ignore all the other multipathable devices, unless they are already in a multipath device. If there is a multipath device with the same WWID as the path you are adding, you need to grab the path information from just those path devices, and then you can build a new multipath device.  If you don't find a multipath device with the same WWID, you just make your own.  Since the multipath processes are serialized, you don't need to worry that another multipath process is working on another path with the same WWID at the same time.

However, while this approach would do significanly less work, it would involve a substantial amout of new code, and in the end, it may not be significantly faster, since it would force everything to run serially.  The only other solution I can see would be to have multipathd do the creation, like in RHEL5, but that would be a much larger change.

Comment 20 Andrius Benokraitis 2010-11-02 18:18:41 UTC
Closing due to the amount of work required in RHEL 4.


Note You need to log in before you can comment on or make changes to this bug.