Bug 494681

Summary: [Brocade 5.6 bug] Very slow boot over SAN with RHEL 5.3 and DM 4.2
Product: Red Hat Enterprise Linux 5
Reporter: Ramkumar Vadivelu <rvadivel>
Component: lvm2
Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED CURRENTRELEASE
QA Contact: Cluster QE <mspqa-list>
Severity: high
Priority: high
Version: 5.3
CC: agk, andriusb, coughlan, dwysocha, heinzm, iannis, jbrassow, mbroz, prockai, revers
Target Milestone: rc
Target Release: 5.6
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-10-05 14:15:43 UTC
Bug Blocks: 557597

Description Ramkumar Vadivelu 2009-04-07 18:50:23 UTC
Description of problem:
Boot over SAN is very slow with RHEL 5.3/DM 4.2 when a large number of LUNs is present.

Version-Release number of selected component (if applicable):
RHEL 5.3 with DM 4.2

How reproducible:
Always

Steps to Reproduce:

RHEL 5.3
Device Mapper 4.2 

2 Brocade HBA ports (at 8G), each seeing 14 target ports, with a total of 161 LUNs behind them (1 boot LUN + 160 normal LUNs)

With this setup, boot over SAN is very slow (12-24 hours). The boot stalls at “Starting Logical Volume Manager…”.

After boot up, the system works perfectly fine. 

The FC trace and driver trace show only that I/Os are issued with long delays between them (6-7 minutes). No I/O errors or aborts are seen.

This is reproducible every time.

The same setup works fine with local boot. The same test with RHEL 5.2 (boot over SAN) also works fine.

Also, if we reduce the configuration to a smaller number of LUNs (4 target ports with 31 LUNs), the issue is not seen.
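
When comparing the 161-LUN and 31-LUN configurations, it may help to record how many SCSI disk nodes the host actually sees, since each path to a LUN normally shows up as its own sd device. The following is a minimal Python sketch, assuming a Linux host with sysfs mounted at /sys; the function names are illustrative and are not part of any tool mentioned in this report.

```python
import os
import re

# Whole-disk SCSI device names such as sda, sdb, sdaa.
# Partitions (sda1) and other block devices (dm-0, sr0) do not match.
SD_NAME = re.compile(r"^sd[a-z]+$")

def count_sd_devices(names):
    """Return how many entries in `names` look like whole SCSI disks."""
    return sum(1 for name in names if SD_NAME.match(name))

def count_visible_sd_devices(sys_block="/sys/block"):
    """Count SCSI disks listed under sysfs (assumes sysfs is mounted at /sys)."""
    return count_sd_devices(os.listdir(sys_block))
```

Calling count_visible_sd_devices() once in each configuration gives a rough figure for how many devices a full LVM scan could have to visit.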

Actual results:
Very slow boot over SAN

Expected results:
No major delays with Boot over SAN.


Additional info:
This problem is seen only with RHEL 5.3 (not with RHEL 5.2), and only when booting over SAN (not with local boot).

Comment 2 Tom Coughlan 2009-04-08 02:06:19 UTC
You might start with this:

"Why are LVM2 commands, such as vgscan, taking a very long time to complete?"
http://kbase.redhat.com/faq/docs/DOC-5542

although I am not sure why RHEL 5.3 would be any different than RHEL 5.2.

Comment 3 Tom Coughlan 2009-04-08 13:45:37 UTC
Also see the dm-multipath release note discussed here:

https://bugzilla.redhat.com/show_bug.cgi?id=460301#c9

(A modification to /etc/udev/rules.d/40-multipath.rules.)

Comment 4 Ramkumar Vadivelu 2009-04-09 07:26:29 UTC
We have already tried the workaround suggested in the 5.3 release notes; it didn't help. Sorry, I forgot to mention this in the initial defect report.

Comment 5 Ramkumar Vadivelu 2009-09-21 17:47:42 UTC
The original problem was seen with 2GB of memory. The problem did not happen when we increased the memory to 10GB. By testing intermediate sizes, we then found that the problem does not happen with 6GB or more of memory.
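
Since the behaviour depends on installed memory, it may be useful to log the kernel-reported memory total alongside each boot-time measurement. The following is a minimal Python sketch, assuming the standard Linux /proc/meminfo layout with a "MemTotal:" line; the helper names are hypothetical.

```python
def mem_total_kb(meminfo_text):
    """Extract the MemTotal value (in kB, as reported by the kernel)
    from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:        2097152 kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")

def read_mem_total_kb(path="/proc/meminfo"):
    """Read MemTotal from the running system (assumes procfs is mounted)."""
    f = open(path)
    try:
        return mem_total_kb(f.read())
    finally:
        f.close()
```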

Comment 6 Andrius Benokraitis 2009-12-15 17:44:11 UTC
Please test with RHEL 5.4 GA and the forthcoming 5.5 Beta.

Comment 8 Alasdair Kergon 2010-07-09 20:34:56 UTC
Is there still a problem when using 5.5?

Comment 9 Heinz Mauelshagen 2010-10-05 14:15:43 UTC
Closing because of no activity. If there still is a problem, reopen the bug and report evidence.