Bug 608723
Summary: | [EMC 6.1 bug] [multipath] Problem configuring iscsi. | |
---|---|---|---
Product: | Red Hat Enterprise Linux 6 | Reporter: | Don <blood_donald>
Component: | kernel | Assignee: | Ben Marzinski <bmarzins>
Status: | CLOSED NOTABUG | QA Contact: | Storage QE <storage-qe>
Severity: | medium | Docs Contact: |
Priority: | high | |
Version: | 6.0 | CC: | andriusb, coughlan, dcbw, hdegoede, mchristi, rpacheco
Target Milestone: | rc | Keywords: | OtherQA, Reopened
Target Release: | 6.1 | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2011-02-12 20:27:39 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 580566, 645454 | |
Attachments: | | |

Description
Don
2010-06-28 14:43:11 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion.

Did you lock the configuration for your iscsi interface to the MAC address of the specific device it should be used for?

Can you attach your /etc/sysconfig/network-scripts/ifcfg-* files as well as /var/log/messages? (Feel free to 'X' out sensitive information, or to mark the attachments as "Private".)

Created attachment 427660 [details]
ifcfg and messages
Did you lock the configuration for your iscsi interface to the MAC address of the specific device it should be used for?

No

So the real issue here is that your default route is not the one you'd expect, correct? If that's the case, that's because the last device brought up wins, which is actually how RHEL5 worked too (with ifup/ifdown). What you want to do is specify that the iscsi device (ifcfg-eth2) should never receive the default route. You can do that by adding:

    DEFROUTE=no

to ifcfg-eth2. Does that help your situation?

I added that to both of my iscsi interface files, ifcfg-eth2 and ifcfg-eth3, and I can ping everything. I will add another comment once I go through the discovery process with iscsiadm.

Thanks
Don

I was able to log in to my array ports using iscsiadm; however, as soon as I attach a LUN to my server and reboot, the kernel panics during post. Are there any files I can send to look at?

Adding Mike and Tom.

Are you doing iscsi root?

And so the box only panics if you have logged into the array before you reboot? Can you get the panic? Is it some driver, or something about a disk not being found?

And it does not panic during the shutdown, right (something about a sync command not being sent or something)? It is during the next boot up?

Here is the scenario:
I log in as root.
Then I use iscsiadm discovery and login to connect to the array.
Once I am connected I put the host in a storage group on the array.
At this point everything works fine; I can reboot the server with no problems.
Once I am at this point I add a LUN to the storage group and reboot my server to discover the LUN.
This is when it panics every time.
I remove the LUN from the storage group and reboot, and it works fine again.

Where would I find the panic?

(In reply to comment #11)
> Here is the scenario:
> I log in as root.
> Then I use iscsiadm discovery and login to connect to the array.
> Once I am connected I put the host in a storage group on the array.
> At this point everything works fine; I can reboot the server with no problems.
> Once I am at this point I add a LUN to the storage group and reboot my server
> to discover the LUN.
> This is when it panics every time.
> I remove the LUN from the storage group and reboot, and it works fine again.
>
> Where would I find the panic?

/var/log/messages, or it might get spit out to the console.

*** This bug has been marked as a duplicate of bug 607921 ***

EMC: Please add yourselves to bug 607921, which is the dupe.

Don / EMC,

Can you please be a bit more specific in describing this problem? For example:

1) What do you mean with: "put the host in a storage group on the array"
2) "add a lun to the storage group and reboot my server"
   a) So the storage group was empty before?
   b) With the server you mean the RHEL-6 machine, IOW the host when we are talking in scsi terms?
3) "This is when it panics every time"
   a) Where exactly in the boot sequence does it panic?
   b) What messages are shown?
   c) Can you attach a serial console and make a log file of the boot up until the panic and / or take some screenshots with a digital camera and attach those here?

Thanks & Regards,

Hans

Going to tentatively re-open until it can be shown it is a dupe (if that's OK). Waiting on Don for his comments.

I am having array issues; this could be a while.

Created attachment 431833 [details]
crash
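An aside on the routing fix suggested earlier in this thread (DEFROUTE=no in ifcfg-eth2 and ifcfg-eth3): the following is a minimal sketch, not part of the original report, for confirming that a given ifcfg file really opts out of the default route. The function name check_defroute and its output strings are made up for illustration; only the DEFROUTE=no setting itself comes from the thread.

```shell
#!/bin/sh
# Hypothetical helper: report whether an ifcfg file opts out of the default route.
# Prints "no-defroute" if the file contains DEFROUTE=no (optionally quoted,
# case-insensitive), and "may-take-defroute" otherwise.
check_defroute() {
    # $1: path to an ifcfg-* file
    if grep -Eiq '^DEFROUTE="?no"?[[:space:]]*$' "$1"; then
        echo "no-defroute"
    else
        echo "may-take-defroute"
    fi
}

# Example use on the two iSCSI interfaces named in this report
# (left commented out so the sketch has no side effects when sourced):
# for f in /etc/sysconfig/network-scripts/ifcfg-eth2 \
#          /etc/sysconfig/network-scripts/ifcfg-eth3; do
#     printf '%s: %s\n' "$f" "$(check_defroute "$f")"
# done
```

This only inspects the file contents; it does not tell you which interface currently holds the default route on a running system.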
1) What do you mean with: "put the host in a storage group on the array"
>This is how the EMC Clariion attaches a specific host to a specific LUN.

2) "add a lun to the storage group and reboot my server"
a) So the storage group was empty before?
>That is correct, and it works fine while it is empty.
b) With the server you mean the RHEL-6 machine,
>Yes
IOW the host when we are talking in scsi terms?
>?????

3) "This is when it panics every time"
a) Where exactly in the boot sequence does it panic?
>After the white and red line goes across the screen
b) What messages are shown?
>I have attached a file called crash and
>the messages file; it happened around 11:35
c) Can you attach a serial console and make a log file of the boot up until the panic and / or take some screenshots with a digital camera and attach those here?
>The screen goes really dim; I would not be able
>to read them

Created attachment 431835 [details]
messages file
The warning in https://bugzilla.redhat.com/attachment.cgi?id=431833 does not seem to have anything to do with iscsi. The /var/log/messages file seems to show we get past the iscsi setup too.

Could you stop iscsi from starting at boot? Do

    chkconfig --del iscsi
    chkconfig --del iscsid

Then reboot. If the box boots ok, then start iscsi by hand once the box is booted.

    service iscsi start

(In reply to comment #21)
> Then reboot. If the box boots ok, then start iscsi by hand once the box is
> booted.
>
> service iscsi start

Oh yeah, if that works then turn iscsi and iscsid back on at boot, but turn off the loading of the iscsi offload modules.

    chkconfig --add iscsi
    chkconfig --add iscsid

Then could you edit /etc/init.d/iscsid so that these lines

    modprobe -q cxgb3i
    modprobe -q bnx2i
    modprobe -q be2iscsi

are commented out like this:

    # modprobe -q cxgb3i
    # modprobe -q bnx2i
    # modprobe -q be2iscsi

Then reboot the box again.

Could you stop iscsi from starting at boot? Do
chkconfig --del iscsi
chkconfig --del iscsid
Then reboot. If the box boots ok, then start iscsi by hand once the box is booted.
>It booted
service iscsi start
>It said starting iscsi [ok]
>But then the server hung; I had to do a hard reboot

Do you want me to continue with the next steps?

(In reply to comment #23)
> >It said starting iscsi [ok]
>
> >But then the server hung; I had to do a hard reboot
>
> Do you want me to continue with the next steps?

No.

When it hung, could you even ping the server? What got printed out to the console? What was in /var/log/messages?

(In reply to comment #11)
> Here is the scenario:
> I log in as root.
> Then I use iscsiadm discovery and login to connect to the array.
> Once I am connected I put the host in a storage group on the array.

Some other questions. Do the devices use scsi_dh_alua or scsi_dh_emc, and are you using multipath?

If using multipath, could you disable that, and could you rmmod the scsi_dh_* module you are using and make sure dm-multipath is not used, then run service iscsi start?

When it hung, could you even ping the server?
>no
What got printed out to the console?
>nothing, it just hung (no mouse, no keyboard, could not do anything)
What was in /var/log/messages?
>I attached it
Do the devices use scsi_dh_alua or scsi_dh_emc
>neither of these shows up with an lsmod
are you using multipath?
>Yes
If using multipath, could you disable that, and could you rmmod the scsi_dh_* module you are using and make sure dm-multipath is not used, then run service iscsi start?
>I disabled multipath
>dm-multipath was running so I removed it
>I started iscsi and it seemed fine (no hang)
>so I tried to start multipath and it hung.

Created attachment 431880 [details]
messages1
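For the question above about which SCSI device handler is in use, a small filter over lsmod output is one way to check. This is only a sketch under assumptions: the function name list_scsi_dh_modules is hypothetical, it reads lsmod-style text on stdin, and it will show nothing if a handler is built into the kernel rather than loaded as a module.

```shell
#!/bin/sh
# Hypothetical helper: print the names of loaded SCSI device-handler modules
# (anything named scsi_dh_*), reading lsmod-style output on stdin.
# Prints nothing if no such module is listed.
list_scsi_dh_modules() {
    # Only the first column (the module name) is examined, so a handler that
    # merely appears in another module's "Used by" column is not reported.
    awk '$1 ~ /^scsi_dh_/ { print $1 }'
}

# Typical use on a live system:
#   lsmod | list_scsi_dh_modules
```

An empty result before multipath starts does not rule a handler out, since multipath can load one when it creates the dm devices, as discussed later in this thread.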
(In reply to comment #26)
> Do the devices use scsi_dh_alua or scsi_dh_emc
> >neither of these shows up with an lsmod
>
> are you using multipath?
> >Yes
>
> >so I tried to start multipath and it hung.

For your device did you want to be using scsi_dh_alua or scsi_dh_emc (just trying to figure out if maybe dm-multipath/scsi is trying to load them and dying there)?

When the issue from comment #1 occurs, can you attach any 'ifcfg-*' files you have in /etc/sysconfig/network-scripts/ please? Thanks!

For your device did you want to be using scsi_dh_alua or scsi_dh_emc (just trying to figure out if maybe dm-multipath/scsi is trying to load them and dying there)?
>Neither one of these modules is loaded, but it hangs pretty quick. What I did do was change my configuration from iscsi to fibre channel, and it panicked on a reboot.

When the issue from comment #1 occurs, can you attach any 'ifcfg-*' files you have in /etc/sysconfig/network-scripts/ please? Thanks!
>They are already attached; just the line from comment 6, DEFROUTE=no, was added.

(In reply to comment #30)
> For your device did you want to be using scsi_dh_alua or scsi_dh_emc (just
> trying to figure out if maybe dm-multipath/scsi is trying to load them and
> dying there)?
>
> >Neither one of these modules is loaded, but it hangs pretty quick.

I was not asking if they are loaded when you start your test. I was just asking if your target needs one (when multipath creates dm devices it can load them, so I am just trying to rule out that IO caused by loading them is causing the panic). Is this a Clariion, and is it in ALUA or trespass mode?

> What I did do was change my configuration from iscsi to fibre channel, and it
> panicked on a reboot.

On boot did it panic with iscsi+multipath or just with iscsi? From comment #26 it sounded like when running manually it just panicked when multipath started. I am guessing for boot it would be the same, but was not sure if you are saying above that it is different or not.

Also, when you manually started multipath and the box panicked, was there anything on the console? Or could you hook up a serial line, because /var/log/messages is not getting the panic, and the other crash from comment #18 just showed a warning.

Also what iscsi module are you using? Is it iscsi_tcp or a hardware offload driver?

I was not asking if they are loaded when you start your test. I was just asking if your target needs one (when multipath creates dm devices it can load them, so I am just trying to rule out that IO caused by loading them is causing the panic). Is this a Clariion, and is it in ALUA or trespass mode?
>This is a Clariion in ALUA mode, not sure what device, I will assume scsi_dh_alua.

On boot did it panic with iscsi+multipath or just with iscsi?
>When I have iscsi and multipath starting at boot it panics before I can log in. When I have just iscsi I can log in, and then it hangs when I start multipath.

Also, when you manually started multipath and the box panicked, was there anything on the console?
>I never said it panics when I manually start multipath; it hangs, see comment 26.

Also what iscsi module are you using? Is it iscsi_tcp or a hardware offload driver?
>iscsi_tcp

So this doesn't look like a NetworkManager bug anymore given comments 6 and 7. It appears to be a multipath/iscsi and/or kernel problem now.

Changing the component to kernel based on comment 32.

Are you able to view the console output when you start multipathing? If so, does it say anything? Can you capture your console output if it does, for instance by setting up your console to use the serial port?

Instead of simply starting the multipath service, can you instead run:

    # modprobe dm-multipath
    # multipath -v3 > multipath_messages 2>&1

and attach the file to this bugzilla. Could you also attach a copy of /etc/multipath.conf?

Created attachment 438258 [details]
alua_multipath.conf
Created attachment 438259 [details]
alua_multipath_messages
Created attachment 438260 [details]
pnr_multipath.conf
Created attachment 438261 [details]
pnr_multipath_messages
There is a default configuration for this device. Have you tried using it? If you haven't, can you simply comment out the devices section from /etc/multipath.conf, and try running with the default configuration.

In case you are wondering, here is the default configuration. You don't need to enter this into /etc/multipath.conf. If you don't include a devices section entry for this device, you will automatically get this one.

    devices {
        device {
            vendor "DGC"
            product ".*"
            product_blacklist "LUNZ"
            path_grouping_policy group_by_prio
            getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
            path_selector "round-robin 0"
            path_checker emc_clariion
            features "1 queue_if_no_path"
            hardware_handler "1 emc"
            prio emc
            failback immediate
            rr_weight uniform
            no_path_retry 60
            rr_min_io 1000
        }
    }

Of course, looking at your messages, multipath isn't seeing a Clariion at all. The only devices it is finding are sda and sdb, which are both SEAGATE devices. Are these messages from running multipath before iscsi has been started? Is the machine locking up after you run this command? Looking at the output, multipath isn't even trying to run on top of sdb, which is strange. If the machine is locking up, that would explain it. multipath fails to create a multipath device on top of sda. I assume that this device is already in use.

If this output is from running the command before iscsi was started, I need to see what happens when multipath is run after the iscsi devices have been set up.

In regard to comment 40:
Should this work with PNR as well as ALUA?
I had a hardware issue that I am trying to get stable before I can really follow up on both of these.

When I comment out my device section in /etc/multipath.conf, multipath -ll still shows my sda (Seagate), and it does not show my 2 LUNs as being ALUA even though they are. Also the display is totally different than RHEL 5*

    [root@fry Desktop]# multipath -ll
    mpathd (3600601600530240002bbd89d3fa6df11) dm-9 DGC,VRAID
    size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
    |-+- policy='round-robin 0' prio=1 status=active
    | `- 11:0:0:0 sdc 8:32 active ready running
    `-+- policy='round-robin 0' prio=0 status=enabled
      `- 12:0:0:0 sde 8:64 active ready running
    mpathc (36006016005302400bcb9a0a93fa6df11) dm-8 DGC,VRAID
    size=2.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
    |-+- policy='round-robin 0' prio=1 status=active
    | `- 12:0:0:1 sdf 8:80 active ready running
    `-+- policy='round-robin 0' prio=0 status=enabled
      `- 11:0:0:1 sdd 8:48 active ready running
    mpathb (35000c50006f33323) dm-2 SEAGATE,ST973402SS
    size=68G features='0' hwhandler='0' wp=rw
    `-+- policy='round-robin 0' prio=1 status=active
      `- 8:0:1:0 sdb 8:16 active ready running

Commenting out the devices section of multipath.conf just makes it use the default configurations. It won't change which devices get multipathed. If you want it to ignore your Seagate device, you can add

    wwid "35000c50006f33323"

to the blacklist section of /etc/multipath.conf.

Also, I'm confused about how the display is different than in RHEL 5. Do you simply mean the formatting, with the lack of brackets around things and the "|-+-" instead of "\_"? Yes, that has changed. But the default configuration for this device is exactly the same, so aside from formatting, it should say exactly the same thing.

Also, what about the hang when you start multipathing? Does that still happen?

Thanks for your help. Things are looking good, except there is no indication that this is ALUA with a multipath -ll.

Unfortunately, multipath isn't able to tell what mode a device is in and assign it a different configuration. It simply looks at the vendor, product, and revision information and sees if it matches a device configuration. The default configuration for these devices uses the EMC hardware handler instead of the ALUA one. If you need it to use the ALUA hardware handler, you need to create your own devices section. It should look something like

    devices {
        device {
            vendor "DGC"
            product ".*"
            product_blacklist "LUNZ"
            path_grouping_policy group_by_prio
            getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
            path_selector "round-robin 0"
            path_checker emc_clariion
            # path_checker tur
            features "1 queue_if_no_path"
            hardware_handler "1 alua"
            prio alua
            failback immediate
            rr_weight uniform
            no_path_retry 60
            rr_min_io 1000
        }
    }

I'm not sure if you should change the path_checker as well.

However, changing the config to ALUA shouldn't be necessary. The Clariion should still respond to the SCSI commands that you send for PNR mode when it is set to ALUA mode.

I'm not sure if the configuration above will work, but it couldn't hurt to try it. You may want to try switching the path_checker from emc_clariion to tur.

Is this resolved by comment 47? (Wayne, I do not see this BZ on our bi-weekly agenda. ? )

Hi Don,

Any updates on this?

Thanks and Regards,
Ron

Don, I am going to work on the presumption that this is resolved as "notabug". If your testing proves otherwise, then re-open the bug with the test results.
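On the point that multipath -ll gives no indication of ALUA: the hwhandler field in the output pasted earlier in this thread does show which hardware handler each map was configured with ('1 emc' there). The sketch below, which is not from the original thread, pulls that field out per map so a configuration change can be checked quickly. It assumes the RHEL 6 output layout shown in this report (a header line of the form "name (wwid) dm-N vendor,product" followed by a line containing hwhandler='...'); the function name show_hwhandlers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize the hardware handler of each multipath map,
# reading "multipath -ll" output (RHEL 6 layout, as pasted in this report)
# on stdin. Prints one line per map, for example:  mpathd: hwhandler='1 emc'
show_hwhandlers() {
    # q holds a single-quote character so the awk program itself stays quote-free.
    awk -v q="'" '
        # Map header line: name (wwid) dm-N vendor,product
        $2 ~ /^\(.*\)$/ && $3 ~ /^dm-[0-9]+$/ { name = $1; next }
        # First following line that carries the hwhandler attribute
        name != "" && match($0, "hwhandler=" q "[^" q "]*" q) {
            print name ": " substr($0, RSTART, RLENGTH)
            name = ""
        }
    '
}

# Typical use on a live system:
#   multipath -ll | show_hwhandlers
```

With the custom devices section suggested above in place, the field would presumably read '1 alua' instead of '1 emc'; that is an expectation to verify on the system, not something this thread confirms.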