Bug 205781
Summary: | multipath/SCSI hotplug issues on RHEL4 x86_64 2.6.9-34.ELsmp | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Nick Strugnell <nstrug> |
Component: | kernel | Assignee: | Dave Wysochanski <dwysocha> |
Status: | CLOSED NOTABUG | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Priority: | medium |
Version: | 4.3 | CC: | agk, coughlan, jwest, tao, xdl-redhat-bugzilla |
Hardware: | ia32e | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2010-06-07 05:24:01 UTC |
Description
Nick Strugnell
2006-09-08 14:27:19 UTC
Nick, do you get the same result with U4? Are you using LVM? If so, have you adjusted pvcreate --metadatacopies as described in the man page?

Ryan, I think you were seeing something like this. Was there a solution?

I ran into a similar issue with a large number of paths. I was using i386 and the lpfc driver. I wasn't able to reliably reproduce the missing paths, though, as every reboot could give a completely different outcome. I believe we may be experiencing the same issue here, although I recall that the sd devices were created on my test system and the paths were simply not discovered by multipath.

Tom, I think the slow boot is a result of the workaround code Nick added to rc.local.

Something I noticed recently when I went back to read about the system with 16,000 LUNs connected: this may have to do with dropped hotplug events. That system was dropping events, and they were able to ensure all events were handled by increasing the udev buffer from 1M to 16M. The change was reportedly also made upstream, though no version is mentioned. It's possible that our udev package needs this patch to support this many disks.

Adding NetApp engineers, since I recall someone finding a similar bug in RHEL4 U3. I couldn't find any bugzilla on it, though.

Tom, unfortunately the client will not run a different kernel unless a full root-cause analysis points that way: they are in UAT and the configuration is supposed to be frozen. The original FAT tests were done with half as much storage, and the problem didn't show up then. We are not using LVM; these are raw devices for ASM/Oracle. Nick

We at NetApp have also found similar symptoms in one of our tests. When a large number of LUNs are made visible to a host, the iSCSI layer creates device nodes in the /dev namespace for all the visible LUNs, but the multipathing layer fails to create entries for some of the SCSI devices.
The test used a single path for each LUN, so we expected to see the same number of /dev/sd* and /dev/dm* entries. Instead we see fewer /dev/dm* entries: the multipath layer fails to create devices for 3 or 4 of them when 120 to 180 iSCSI devices are present.

Steps to recreate:

1. Map 150 iSCSI LUNs to a host from the filer.
2. Start the iscsi service; all 150 LUNs are visible.
3. Start the multipath service; multipath creates entries for fewer than 150 iSCSI devices, generally missing 3 to 4 LUNs.
4. Restart the multipath service; it again misses a few LUNs, but this time a different set of devices.

The changing set of missed devices points to a timing issue. This has been seen on both the RHEL4 U3 and RHEL4 U4 x86 versions.

(In reply to comment #5)
> The test used a single path for each LUN, so we expected to see the same number of /dev/sd* and /dev/dm* entries. Instead we see fewer /dev/dm* entries: the multipath layer fails to create devices for 3 or 4 of them when 120 to 180 iSCSI devices are present.

This is the same behavior I observed with FC; however, I was unable to reproduce it consistently.

Nick, I think this might be your problem: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185569 Can you disable the network service (chkconfig network off), reboot your machine (do you have a serial console or physical console?), and let me know the results?

Ryan, I think your problem may be a different one, but identical to what NetApp was seeing. Do you have a /var/log/messages file? Also, are you using iSCSI or FC?

I am hitting something similar to this problem now on one of my setups (RHEL4 U4).
At the moment I am running a test overnight, but I should be able to do the experiment in #8 tomorrow. If this is the problem, I'll be sure to update bz 185569.

OK, initially I thought I was seeing missing paths (the subject of this bug), but apparently that's not the case. I'm just seeing the multipath device maps get created without all their paths (the other problem). I rebooted my system with the network disabled and nothing changed: the multipath device maps are created with only a single path each, where every map should have 2 paths in my setup. My setup is an MSA1000 active/passive (A/P) array with 14 LUNs, direct-connected to a QLA2342.

This is still on my radar, but unfortunately I have not had many cycles to investigate the original problem (/dev/sd* nodes not appearing when there are a lot of disks in the system). I am not sure I ever investigated Ryan's comment #2 (patching udev to increase the hotplug event buffer), so maybe that is the next step.
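Both reports above come down to a count comparison: in NetApp's one-path-per-LUN test the number of whole-disk /dev/sd* nodes should equal the number of /dev/dm* nodes, and in the MSA1000 setup every multipath map should carry 2 paths. The following is a minimal Python sketch of that check, not a tool from this bug. It assumes multipath maps show up as /dev/dm-N nodes, as the NetApp comment describes, and the helpers count_nodes, unclaimed_disks and short_maps are hypothetical. The map-to-paths dictionary is assumed to be filled in by hand or by a separate parser of `multipath -ll` output, which is not shown here.

```python
import os
import re

# Whole-disk SCSI nodes such as sda or sdab; partitions like sda1 are excluded.
SD_DISK = re.compile(r"^sd[a-z]+$")
# Device-mapper nodes such as dm-0 or dm-12 (assumed naming, per the NetApp comment).
DM_NODE = re.compile(r"^dm-\d+$")


def count_nodes(names):
    """Count whole-disk sd nodes and dm nodes among /dev basenames."""
    sd = sum(1 for n in names if SD_DISK.match(n))
    dm = sum(1 for n in names if DM_NODE.match(n))
    return sd, dm


def unclaimed_disks(names, paths_per_map):
    """Return sd disks that no multipath map lists as one of its paths."""
    claimed = {p for paths in paths_per_map.values() for p in paths}
    return sorted(n for n in names if SD_DISK.match(n) and n not in claimed)


def short_maps(paths_per_map, expected_paths):
    """Return the maps that came up with fewer paths than expected."""
    return sorted(m for m, paths in paths_per_map.items()
                  if len(paths) < expected_paths)


if __name__ == "__main__":
    names = os.listdir("/dev")
    sd, dm = count_nodes(names)
    # With one path per LUN (the NetApp test), these two numbers should match.
    print("whole-disk sd nodes: %d, dm nodes: %d, difference: %d"
          % (sd, dm, sd - dm))
```

Running unclaimed_disks after each multipath restart and comparing the results would show whether a different set of disks is missed each time, which is the pattern NetApp took as pointing to a timing issue. For the MSA1000 case, short_maps(maps, 2) would list the maps that were built with a single path.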