Bug 460301 - Long time is taken to discover large number of LUNs
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: redhat-release-notes
Version: 5.2
Hardware: All Linux
Priority: high, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ryan Lerch
QA Contact: Content Services Development
Keywords: Documentation
Depends On:
Blocks: 373081 RHEL5u3_relnotes 487502
Reported: 2008-08-27 09:44 EDT by Tanvi
Modified: 2009-08-20 00:23 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When a large number of LUNs are added to a node, multipath can significantly increase the time it takes for udev to create device nodes for them. If you experience this problem, you can correct it by deleting the following line in /etc/udev/rules.d/40-multipath.rules: KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m" This line causes udev to run multipath every time a block device is added to the node. Even with this line removed, multipathd will still automatically create multipath devices, and multipath will still be called during the boot process for nodes with multipathed root filesystems. The only change is that multipath devices will not be automatically created when multipathd is not running, which should not be a problem for the vast majority of multipath users.
Clone Of:
: 487502 (view as bug list)
Environment:
Last Closed: 2009-03-02 00:51:10 EST
Description Tanvi 2008-08-27 09:44:16 EDT
Description of problem:

It takes a long time to create device entries when a large number of LUNs are mapped to the host.
During testing, when 256 LUNs with 4 paths each were mapped to the host, it took around 30 minutes to add all the sd* entries to the /dev/ directory. Worse, the OS is sluggish throughout discovery, until all entries are created, and the system is effectively unusable.
Is it expected behavior for the OS to take this long (30 minutes) to discover 256 LUNs (256*4=1024 devices)?
A system that is barely usable during discovery is not acceptable.
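
For scale, the figures above can be worked through in a few lines. This is only a rough sketch in Python using the numbers reported here (256 LUNs, 4 paths each, roughly 30 minutes); the variable names are illustrative, and the result is an average, not a measurement of any single device.

```python
# Figures taken from the report above; the names are illustrative only.
luns = 256
paths_per_lun = 4
discovery_minutes = 30  # approximate, as reported

# Each path to a LUN shows up as its own sd* entry in /dev.
path_devices = luns * paths_per_lun

# Average wall-clock time spent per path device during discovery.
seconds_per_device = discovery_minutes * 60 / path_devices

print(f"{path_devices} path devices")
print(f"about {seconds_per_device:.2f} s per device on average")
```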

Version-Release number of selected component (if applicable):
udev-095-14.16.el5
device-mapper-1.02.24-1.el5
device-mapper-multipath-0.4.7-17

How reproducible:
Always

Steps to Reproduce:
1. Map 256 LUNs with 4 paths each
2. Discover the devices on the host
  
Actual results:
It takes a long time to add all the device entries. The system is very sluggish until all the entries are created.

Expected results:
It should not take *30 minutes* to create the entries, and the system should not become sluggish.

Additional info:
On RHEL5 the multipath package ships the /sbin/mpath_wait script, which is not available upstream. From what I understand, this introduces a delay of around 3 seconds.
Comment 2 Harald Hoyer 2008-08-28 06:40:52 EDT
$ rpm -qf /sbin/mpath_wait
device-mapper-multipath-0.4.7-16.fc9.x86_64
Comment 3 Harald Hoyer 2008-08-28 06:53:09 EDT
What happens if you rename /sbin/mpath_wait to /sbin/mpath_wait.old, or remove device-mapper-multipath?
Comment 4 Tanvi 2008-08-28 10:49:19 EDT
mpath_wait does not make much of a difference. 

Aug 27 18:16:18 localhost udevd-event[7515]: run_program: '/sbin/mpath_wait 253 3'
Aug 27 18:16:19 localhost udevd-event[7515]: run_program: Waiting 1 seconds for output of '/sbin/mpath_wait 253 3(7516)'
Aug 27 18:16:20 localhost udevd-event[7515]: run_program: Waiting 2 seconds for output of '/sbin/mpath_wait 253 3(7516)'
Aug 27 18:16:21 localhost udevd-event[7515]: run_program: Waiting 3 seconds for output of '/sbin/mpath_wait 253 3(7516)'
Aug 27 18:16:21 localhost udevd-event[7515]: run_program: '/sbin/mpath_wait' returned with status 1
Aug 27 18:16:21 localhost udevd-event[7639]: run_program: '/sbin/mpath_wait 253 3'
Aug 27 18:16:22 localhost udevd-event[7639]: run_program: Waiting 1 seconds for output of '/sbin/mpath_wait 253 3(7640)'
Aug 27 18:16:23 localhost udevd-event[7639]: run_program: Waiting 2 seconds for output of '/sbin/mpath_wait 253 3(7640)'
Aug 27 18:16:24 localhost udevd-event[7639]: run_program: Waiting 3 seconds for output of '/sbin/mpath_wait 253 3(7640)'
Aug 27 18:16:24 localhost udevd-event[7639]: run_program: '/sbin/mpath_wait' returned with status 1


So whatever the mpath_wait logic is, it always iterates three times and exits with status 1.

Most of the time is consumed by /sbin/multipath -v0:

Aug 27 19:15:22 localhost udevd-event[20609]: run_program: Waiting 11 seconds for output of '/sbin/multipath -v0 135:752(21005)'
Aug 27 19:15:22 localhost udevd-event[17309]: run_program: Waiting 26 seconds for output of '/sbin/multipath -v0 135:704(17618)'
Aug 27 19:15:22 localhost udevd-event[21996]: run_program: Waiting 6 seconds for output of '/sbin/multipath -v0 8:784(22349)'
Aug 27 19:15:22 localhost udevd-event[17074]: run_program: Waiting 22 seconds for output of '/sbin/multipath -v0 135:592(18512)'
Aug 27 19:15:22 localhost udevd-event[2650]: run_program: Waiting 80 seconds for output of '/sbin/multipath -v0 135:576(2696)'
Aug 27 19:15:22 localhost udevd-event[17351]: run_program: Waiting 26 seconds for output of '/sbin/multipath -v0 135:720(17627)'
Aug 27 19:15:22 localhost udevd-event[17243]: run_program: Waiting 27 seconds for output of '/sbin/multipath -v0 135:672(17439)'
Aug 27 19:15:23 localhost udevd-event[20596]: run_program: Waiting 13 seconds for output of '/sbin/multipath -v0 135:736(20649)'
Aug 27 19:15:23 localhost udevd-event[17179]: run_program: Waiting 28 seconds for output of '/sbin/multipath -v0 135:640(17394)'


In some instances the wait is up to 80-90 seconds. Is this expected behavior?
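
With logs this large, the longest wait per program is easier to get from a script than by eye. The following is a minimal sketch, assuming the log lines keep the exact "Waiting N seconds for output of" format quoted above; the function name and regular expression are illustrative and not part of udev. The two sample lines are copied from the excerpts in this comment.

```python
import re
from collections import defaultdict

# Matches the udevd-event "Waiting N seconds" lines quoted in this comment.
# The command is captured up to the "(pid)" suffix that udev appends.
WAIT_RE = re.compile(
    r"udevd-event\[(?P<pid>\d+)\]: run_program: "
    r"Waiting (?P<secs>\d+) seconds for output of '(?P<cmd>[^(']+)"
)


def longest_waits(lines):
    """Return {program: longest reported wait in seconds} over all lines."""
    worst = defaultdict(int)
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            # Group by the program path only, ignoring its arguments.
            program = match.group("cmd").split()[0]
            worst[program] = max(worst[program], int(match.group("secs")))
    return dict(worst)


# Two lines copied from the log excerpts above.
sample = [
    "Aug 27 18:16:21 localhost udevd-event[7515]: run_program: "
    "Waiting 3 seconds for output of '/sbin/mpath_wait 253 3(7516)'",
    "Aug 27 19:15:22 localhost udevd-event[2650]: run_program: "
    "Waiting 80 seconds for output of '/sbin/multipath -v0 135:576(2696)'",
]
print(longest_waits(sample))
```

To run it over a full log, pass an open file object instead of the sample list; the function only iterates over lines.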

PS: If you need the complete logs, please let me know. (They are very large.)
Comment 6 Ben Marzinski 2008-09-09 14:45:35 EDT
There may be a simple workaround for this. The way the multipath udev rules are set up, you will build multipath devices whenever an appropriate block device appears, even if multipathd is not running. Unfortunately, to do that, there is a lot of redundant work that happens.

However, it isn't that much of a pain to change this so that only multipathd can create multipath devices, except in early boot.  To do that, you edit

/etc/udev/rules.d/40-multipath.rules

and remove the line

KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"

This will keep udev from firing off multipath every time a block device becomes available.  As long as multipathd is running, it will get a NETLINK event, and it will take care of adding the multipath device.
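
The edit described above can also be scripted. The following is a minimal sketch in Python, not a supported tool: the function name is hypothetical, and it assumes the rule can be identified by its RUN action rather than by the full line. It works on the file contents as a string, so reading /etc/udev/rules.d/40-multipath.rules, keeping a backup copy, and writing the result back are left to the caller.

```python
# The rule is matched by its distinctive RUN action instead of the whole
# line, so small whitespace differences do not matter (an assumption).
RULE_MARKER = 'RUN+="/sbin/multipath -v0 %M:%m"'


def drop_multipath_rule(rules_text):
    """Return rules_text without the rule that makes udev run multipath.

    Lines that are already commented out are left untouched.
    """
    kept = [
        line
        for line in rules_text.splitlines(keepends=True)
        if RULE_MARKER not in line or line.lstrip().startswith("#")
    ]
    return "".join(kept)


# Demonstration on an in-memory string. The first line is a placeholder
# comment, not taken from the real file; the second is the rule quoted above.
RULE = '''KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"\n'''
example = "# placeholder comment line\n" + RULE
print(drop_multipath_rule(example), end="")
```

Filtering a string rather than editing the file in place keeps the sketch easy to check before anything on disk changes.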

There should be very few problems associated with this for most setups.  In the initrd, multipath will still run if it is necessary to load the root or boot device. multipath will also run once in rc.sysinit before udev starts up, to make sure any filesystems that are on multipath devices can be started correctly. When multipathd is finally started up, it will scan all the existing block devices and create multipath devices for any that got missed.

So, really, the only problem would happen if you needed to add multipath devices during normal system operations, without having multipathd running, which is a rare case.  In RHEL 6, this udev rule will go away completely, but I would like to leave it in for RHEL 5, so that it will not surprise anyone relying on it.  

Please try this workaround, and let me know if it works for you. If so, we can probably just release note this, or possibly not use the rule if you are installing the rpm, but keep it there if you are upgrading the rpm, so that it doesn't bite existing customers.
Comment 7 Ritesh Raj Sarraf 2008-09-11 09:42:43 EDT
Ben,
Thanks. The workaround works perfectly. The device addition time dropped drastically, from 20+ minutes to less than 3 minutes for 256 LUNs * 4 paths.

It would be good to have it documented in the Release Notes and a KB article.
Comment 8 Ben Marzinski 2008-09-15 14:40:10 EDT
Here's my stab at the release notes.
Comment 9 Ben Marzinski 2008-09-15 14:40:10 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
When a large number of LUNs are added to a node, multipath can significantly increase the time it takes for udev to create device nodes for them. If you experience this problem, you can correct it by deleting the following line in /etc/udev/rules.d/40-multipath.rules:

KERNEL!="dm-[0-9]*", ACTION=="add", PROGRAM=="/bin/bash -c '/sbin/lsmod | /bin/grep ^dm_multipath'", RUN+="/sbin/multipath -v0 %M:%m"

This line causes udev to run multipath every time a block device is added to the node.  Even with this line removed, multipathd will still automatically create multipath devices, and multipath will still be called during the boot process for nodes with multipathed root filesystems.  The only change is that multipath devices will not be automatically created when multipathd is not running, which should not be a problem for the vast majority of multipath users.
Comment 10 Ben Marzinski 2008-09-15 14:57:03 EDT
Reassigning to Docs.
Comment 11 Michael Hideo 2008-10-30 23:50:57 EDT
This is in the release notes for 5.3

http://documentation-stage.bne.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/5.3/html-single/Release_Notes/

Can we mark this bug as MODIFIED?

Cheers,
Mike
Comment 16 RHEL Product and Program Management 2008-12-12 10:23:05 EST
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.
Comment 17 Andrius Benokraitis 2009-03-02 00:51:10 EST
I'm assuming this made 5.3 release notes - closing.
