Bug 487502
Summary: | [NetApp 4.9 bug] Delayed LUN discovery on RHEL4 | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Ritesh Raj Sarraf <rsarraf> |
Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
Status: | CLOSED WONTFIX | QA Contact: | Cluster QE <mspqa-list> |
Severity: | high | Docs Contact: | |
Priority: | low | ||
Version: | 4.8 | CC: | agk, andriusb, bmarzins, bmr, bstevens, christophe.varoqui, coughlan, dwysocha, egoggin, heinzm, iannis, jbrassow, junichi.nomura, kueda, lmb, marting, mbroz, mhideo, prockai, rajashekhar.a, rlerch, rsarraf, syeghiay, tanvi, tranlan, xdl-redhat-bugzilla |
Target Milestone: | rc | Keywords: | Reopened |
Target Release: | 4.9 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | 460301 | Environment: | |
Last Closed: | 2010-11-02 18:18:41 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 460301 | ||
Bug Blocks: | 459969, 626414 |
Description
Ritesh Raj Sarraf
2009-02-26 11:37:35 UTC
We have a similar situation here in RHEL4. RHEL4 additionally had an OOM issue during discovery of a large number of LUNs, which now seems to be fixed. But RHEL4 also suffers from the same delayed LUN discovery problem.

Putting on the RHEL 4.8 list, but it's already very late to propose anything at this point. Deferring to devel on the feasibility of this since only a rel note was included for RHEL 5.3 AFAIK.

Ben, the right thing to do in 4.8 is to add a release note. Does the note in 5.3 Bug #460301 apply here as-is?

No. Unfortunately, multipathd isn't able to create new devices by itself in RHEL4. Multipath has a cache that tries to speed it up on repeat calls, but it might not be working correctly. Getting this working will actually take some code changes.

In RHEL4, multipath must be called for every path that gets added. It has a cache to help speed things up, but I haven't been able to prove that this is a cache problem yet. It might simply be this slow in some circumstances. There's a chance that some speedup work can be done, but I haven't found an easy target yet.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.

Sorry. This bug got closed accidentally.

(In reply to comment #13)
> Sorry. This bug got closed accidentally.

Ben, any update on this bug?

NetApp: Please note this is the lowest priority item of your 4.9 items, and this will get addressed as time allows.

I'm not sure that there will be any way to significantly fix this problem without a significant change in either how multipath devices are created or how multipath does its caching. In RHEL5 and later, multipathd was able to create multipath devices itself. In RHEL4, multipath must do it. In order to correctly create the device, multipath needs to search all the multipathable devices to find the ones that it will use. This scanning needs to happen to all the devices for each device that gets added.
In order to cut this down, multipath pulls the path information from multipathd, for all the paths that it knows about. However, this won't actually cause much of a speed increase when a large number of devices are discovered at once. Right now, the code doesn't really provide any speed increase, because multipath fetches all the information about the cached paths again. I tried adding some code to make multipath only get the information it actually needs. However, when a large number of devices are being added at the same time, multipathd isn't able to get the slowest-to-gather information back to multipath, and so multipath needs to fetch it by itself. Even worse, when a lot of multipath devices are being created all at once, there's a long wait between when the scsi device shows up in sysfs and when multipathd knows about it at all. All of the multipath devices that start during this time won't get any information from multipathd on these devices.

One idea that might solve this problem is to add an option that forces multipath to grab a lockfile before running. This will mean that you can guarantee that only one process is running at once. In this case, since you are only interested in the device you are trying to add, you can safely ignore all the other multipathable devices, unless they are already in a multipath device. If there is a multipath device with the same WWID as the path you are adding, you need to grab the path information from just those path devices, and then you can build a new multipath device. If you don't find a multipath device with the same WWID, you just make your own. Since the multipath processes are serialized, you don't need to worry that another multipath process is working on another path with the same WWID at the same time.
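
For illustration only, the lockfile idea could be sketched roughly as follows. This is not code from multipath (which is written in C) and was never implemented for this bug. The lock path, the `exclusive_lock` and `add_path` helpers, and the in-memory `maps` dictionary standing in for the existing multipath devices are all hypothetical names made up for the sketch. The only real API used is the standard `fcntl.flock` call.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location; the bug report does not name one.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "multipath-sketch.lock")


@contextmanager
def exclusive_lock(path=LOCK_PATH):
    """Block until this process holds an exclusive lock on `path`."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # waits while another process holds it
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def add_path(maps, new_path, wwid, lock_path=LOCK_PATH):
    """Add one path while holding the lock.

    `maps` is a stand-in for the existing multipath devices:
    a dict of {wwid: [path names]}. Devices with other WWIDs are ignored.
    """
    with exclusive_lock(lock_path):
        if wwid in maps:
            # A device with this WWID exists: only its paths are consulted.
            paths = list(maps[wwid])
            if new_path not in paths:
                paths.append(new_path)
        else:
            # No device with this WWID yet: make a new one from this path.
            paths = [new_path]
        maps[wwid] = paths
        return paths
```

Because every caller waits on the same lock, two invocations for paths with the same WWID cannot interleave, which is the property the comment relies on. It also means the additions run one after another, so the total time still grows with the number of paths.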
However, while this approach would do significantly less work, it would involve a substantial amount of new code, and in the end, it may not be significantly faster, since it would force everything to run serially. The only other solution I can see would be to have multipathd do the creation, like in RHEL5, but that would be a much larger change.

Closing due to the amount of work required in RHEL 4.