608723 – [EMC 6.1 bug] [multipath] Problem configuring iscsi.

Summary: [EMC 6.1 bug] [multipath] Problem configuring iscsi.

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.0
Hardware:	All
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	rc
Target Release:	6.1
Assignee:	Ben Marzinski
QA Contact:	Storage QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	580566 645454

Reported:	2010-06-28 14:43 UTC by Don
Modified:	2011-02-12 20:27 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-02-12 20:27:39 UTC
Target Upstream Version:
Embargoed:

Attachments
ifcfg and messages (335.79 KB, text/plain) 2010-06-29 12:15 UTC, Don
crash (632 bytes, text/plain) 2010-07-14 16:06 UTC, Don
messages file (407.11 KB, text/plain) 2010-07-14 16:09 UTC, Don
messages1 (865.08 KB, text/plain) 2010-07-14 19:00 UTC, Don
alua_multipath.conf (3.45 KB, application/octet-stream) 2010-08-11 18:41 UTC, Don
alua_multipath_messages (5.14 KB, text/plain) 2010-08-11 18:42 UTC, Don
pnr_multipath.conf (3.45 KB, application/octet-stream) 2010-08-11 18:43 UTC, Don
pnr_multipath_messages (5.14 KB, text/plain) 2010-08-11 18:43 UTC, Don

Description Don 2010-06-28 14:43:11 UTC

Description of problem: Problem getting iscsi to work with regular network


Version-Release number of selected component (if applicable):
RHEL 6.0 SS6

How reproducible:
always

Steps to Reproduce:
1. Connect iscsi to an Intel dual-port NIC and the house network to an onboard port of the Dell 2970.
2. Use the drop-down from System > Preferences > Network Connections.
3. Set IP addresses for IPv4 for the enet and iscsi.
  
Actual results:
When I set up my addresses I get unexpected results.
I have my enet eth0 set up with 10.14.15.87 255.255.255.0 10.14.15.1
I have my iscsi eth2 set up with 10.14.108.105 255.255.255.0
And the other one, eth3, set up with 10.14.109.105 255.255.255.0
When I do this as I did with RHEL5, eth2 and eth3 take on 10.14.15.1 as a gateway.
At this point I cannot ping my enet or any enet outside my subnet, because the traffic goes out 10.14.108.105, which has no physical way to get to my enet; it is an independent iscsi network. I can ping my iscsi array.
If I make up a gateway for my iscsi, like 10.14.108.1, and put that in the configuration of my iscsi ports, I can ping my enet and my iscsi, but when I try a network outside of my subnet, which I always use, it tries to go out the iscsi port and fails.


Expected results:
To be able to reach all my networks as I did with RHEL 5.*

Additional info:

Comment 2 RHEL Program Management 2010-06-28 15:03:28 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Dan Williams 2010-06-28 19:08:21 UTC

Did you lock the configuration for your iscsi interface to the MAC address of the specific device it should be used for?

Can you attach your /etc/sysconfig/network-scripts/ifcfg-* files as well as /var/log/messages? (feel free to 'X' out sensitive information, or to mark the attachments as "Private").

Comment 4 Don 2010-06-29 12:15:42 UTC

Created attachment 427660 [details]
ifcfg and messages

Comment 5 Don 2010-06-29 12:16:57 UTC

Did you lock the configuration for your iscsi interface to the MAC address of
the specific device it should be used for?
No

Comment 6 Dan Williams 2010-06-29 17:49:36 UTC

So the real issue here is that your default route is not the one you'd expect, correct?

If that's the case, that's because the last device brought up wins, which is actually how RHEL5 worked too (with ifup/ifdown).  What you want to do is specify that the iscsi device (ifcfg-eth2) should never receive the default route.  You can do that by adding:

DEFROUTE=no

to ifcfg-eth2.  Does that help your situation?
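
For reference, here is a minimal shell sketch of that change. It assumes the ifcfg files live in /etc/sysconfig/network-scripts as usual; the helper name set_defroute_no is made up for illustration and is not part of any RHEL tool. It only edits the file it is given and does not restart networking.

```shell
#!/bin/sh
# Hypothetical helper: make sure an ifcfg file contains exactly one
# DEFROUTE line and that it says "no".
set_defroute_no() {
    file="$1"
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    if grep -q '^DEFROUTE=' "$file"; then
        # Replace an existing setting in place (GNU sed -i).
        sed -i 's/^DEFROUTE=.*/DEFROUTE=no/' "$file"
    else
        # Add a trailing newline first if the file lacks one, then append.
        [ -z "$(tail -c 1 "$file")" ] || echo >> "$file"
        echo 'DEFROUTE=no' >> "$file"
    fi
}

# Example (as root), for the two iscsi ports in this report:
#   set_defroute_no /etc/sysconfig/network-scripts/ifcfg-eth2
#   set_defroute_no /etc/sysconfig/network-scripts/ifcfg-eth3
# Once the interfaces have been brought back up, "ip route show" should list
# only one default route, via 10.14.15.1 on eth0.
```

The function is idempotent, so running it twice on the same file leaves a single DEFROUTE=no line.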

Comment 7 Don 2010-06-30 12:48:06 UTC

I added that to both of my iscsi files, ifcfg-eth2 and ifcfg-eth3, and I can ping everything. I will add another comment once I go through the discovery process with iscsiadm.

Thanks Don

Comment 8 Don 2010-07-08 14:52:50 UTC

I was able to log in to my array ports using iscsiadm; however, as soon as I attach a LUN to my server and reboot, the kernel panics during POST. Are there any files I can send for you to look at?

Comment 9 Andrius Benokraitis 2010-07-09 16:08:45 UTC

Adding Mike and Tom.

Comment 10 Mike Christie 2010-07-09 18:38:39 UTC

Are you doing iscsi root?

And so the box only panics if you have logged into the array before you reboot? Can you get the panic? Is it some driver, or something about a disk not being found?

And it does not panic during the shutdown, right (something about a sync command not being sent or something)? It is during the next boot up?

Comment 11 Don 2010-07-12 12:34:01 UTC

Here is the scenario:
I log in as root.
Then I use iscsiadm discovery and login to connect to the array.
Once I am connected I put the host in a storage group on the array.
At this point everything works fine; I can reboot the server with no problems.
Once I am at this point I add a LUN to the storage group and reboot my server to discover the LUN.
This is when it panics every time.
I remove the LUN from the storage group and reboot, and it works fine again.

Where would I find the panic?

Comment 12 Mike Christie 2010-07-12 17:16:59 UTC

(In reply to comment #11)
> Here is the scenario:
> I log in as root.
> Then I use iscsiadm discovery and login to connect to the array.
> Once I am connected I put the host in a storage group on the array.
> At this point everything works fine; I can reboot the server with no problems.
> Once I am at this point I add a LUN to the storage group and reboot my server
> to discover the LUN.
> This is when it panics every time.
> I remove the LUN from the storage group and reboot, and it works fine again.
> 
> Where would I find the panic?    

/var/log/messages

or it might get spit out to the console

Comment 13 Dan Williams 2010-07-12 23:41:44 UTC


*** This bug has been marked as a duplicate of bug 607921 ***

Comment 14 Andrius Benokraitis 2010-07-13 00:19:20 UTC

EMC: Please add yourselves to bug 607921, which is the dupe.

Comment 15 Hans de Goede 2010-07-13 13:49:20 UTC

Don / EMC,

Can you please be a bit more specific in describing this problem?

For example:
1) What do you mean with:
   "put the host in a storage group on the array"
2) "add a lun to the storage group and reboot my server"
   a) So the storage group was empty before ?
   b) With the server you mean the RHEL-6 machine, IOW the host when we
      are talking in scsi terms ?
3) "This is when it panics every time"
   a) Where exactly in the boot sequence does it panic?
   b) What messages are shown?
   c) Can you attach a serial console and make a log file of the boot
      up until the panic and / or take some screenshots with a digital camera
      and attach those here?

Thanks & Regards,

Hans

Comment 16 Andrius Benokraitis 2010-07-13 15:42:46 UTC

Going to tentatively re-open until it can be shown it is a dupe (if that's OK). 

Waiting on Don for his comments.

Comment 17 Don 2010-07-13 16:28:27 UTC

I am having array issues; this could be a while.

Comment 18 Don 2010-07-14 16:06:42 UTC

Created attachment 431833 [details]
crash

Comment 19 Don 2010-07-14 16:09:01 UTC

1) What do you mean with:
   "put the host in a storage group on the array"
        >This is how the EMC Clariion attaches a specific host to a specific LUN.
2) "add a lun to the storage group and reboot my server"
   a) So the storage group was empty before ?
        >That is correct, and it works fine while it is empty.
   b) With the server you mean the RHEL-6 machine,
        >Yes
      IOW the host when we are talking in scsi terms ?
        >?????
3) "This is when it panics every time"
   a) Where exactly in the boot sequence does it panic?
        >After the white and red line goes across the screen
   b) What messages are shown?
        >I have attached a file called crash and the messages file; it happened around 11:35
   c) Can you attach a serial console and make a log file of the boot
      up until the panic and / or take some screenshots with a digital camera
      and attach those here?
        >The screen goes really dim; I would not be able to read them

Comment 20 Don 2010-07-14 16:09:51 UTC

Created attachment 431835 [details]
messages file

Comment 21 Mike Christie 2010-07-14 17:12:41 UTC

The warning in https://bugzilla.redhat.com/attachment.cgi?id=431833 does not seem to have anything to do with iscsi. The /var/log/messages file also seems to show that we get past the iscsi setup.

Could you stop iscsi from starting at boot?

Do

chkconfig --del iscsi
chkconfig --del iscsid

Then reboot. If the box boots ok, then start iscsi by hand once the box is booted.

service iscsi start

Comment 22 Mike Christie 2010-07-14 17:27:18 UTC

(In reply to comment #21)
> Then reboot. If the box boots ok, then start iscsi by hand once the box is
> booted.
> 
> service iscsi start    

Oh yeah, if that works then turn iscsi and iscsid back on at boot, but turn off the loading of the iscsi offload modules.

chkconfig --add iscsi
chkconfig --add iscsid

Then could you edit /etc/init.d/iscsid so that these lines

    modprobe -q cxgb3i
    modprobe -q bnx2i
    modprobe -q be2iscsi

are commented out like this:

# modprobe -q cxgb3i
# modprobe -q bnx2i
# modprobe -q be2iscsi

Then reboot box again.
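
If editing the script by hand is a nuisance, the same change can be sketched with sed. This assumes the three lines appear in /etc/init.d/iscsid exactly as quoted above (optional leading whitespace, nothing after the module name); the function name disable_offload_modprobes is invented for this example. Keep a copy of the original script somewhere outside /etc/init.d before trying it.

```shell
#!/bin/sh
# Hypothetical helper: comment out the three offload-driver modprobe lines
# in the given init script, leaving every other line untouched.
disable_offload_modprobes() {
    file="$1"
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    # -E enables the (a|b|c) alternation; \1 keeps the original indentation.
    sed -i -E \
        's/^([[:space:]]*)(modprobe -q (cxgb3i|bnx2i|be2iscsi)[[:space:]]*)$/\1# \2/' \
        "$file"
}

# Example (as root):
#   disable_offload_modprobes /etc/init.d/iscsid
#   grep -n 'modprobe' /etc/init.d/iscsid   # check the result before rebooting
```

Lines that are already commented out do not match the pattern, so running it a second time changes nothing.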

Comment 23 Don 2010-07-14 17:49:13 UTC

Could you stop iscsi from starting at boot?

Do

chkconfig --del iscsi
chkconfig --del iscsid

Then reboot. If the box boots ok, then start iscsi by hand once the box is
booted.

         >It booted

service iscsi start    

     >It said starting iscsi                                   [ok]

      >But then the server hung I had to do a hard reboot

Do you want me to continue with the next steps?

Comment 24 Mike Christie 2010-07-14 18:04:30 UTC

(In reply to comment #23)
>      >It said starting iscsi                                   [ok]
> 
>       >But then the server hung I had to do a hard reboot
> 
> Do you want me to continue with the next steps?    

No.

When it hung, could you even ping the server?

What got printed out to the console? What was in /var/log/messages?

Comment 25 Mike Christie 2010-07-14 18:08:51 UTC

(In reply to comment #11)
> Here is the senerio 
> I login as root
> Then I use iscsiadm discovery and login to connect to the array.
> Once I am connected I put the host in a storage group on the array.


Some other questions.

Do the devices use scsi_dh_alua or scsi_dh_emc, and are you using multipath?

If using multipath, could you disable that, rmmod the scsi_dh_* module you are using, and make sure dm-multipath is not used, then run service iscsi start?

Comment 26 Don 2010-07-14 19:00:03 UTC

When it hung, could you even ping the server?
     >no

What got printed out to the console?
     >nothing, it just hung (no mouse, no keyboard, could not do anything)

What was in /var/log/messages?
     >I attached it

Do the devices use scsi_dh_alua or scsi_dh_emc
     >neither of these show up with an lsmod

are you using multipath?
     >Yes


If using multipath could you disable that, and could you rmmod the scsi_dh_*
module you are using and make sure dm-multipath is not used, then run service
iscsi start?
      > I disabled multipath
      >dm-multipath was running so I removed it
       >I started iscsi and it seemed fine (no hang)
       >so I tried to start multipath and it hung.

Comment 27 Don 2010-07-14 19:00:45 UTC

Created attachment 431880 [details]
messages1

Comment 28 Mike Christie 2010-07-15 22:03:06 UTC

(In reply to comment #26)
> Do the devices use scsi_dh_alua or scsi_dh_emc
>      >neither of these show up with an lsmod
> 
> are you using multipath?
>      >Yes
> 


>        >so I tried to start multipath and it hung.    


For your device did you want to be using scsi_dh_alua or scsi_dh_emc (just trying to figure out if maybe dm-multipath/scsi is trying to load them and dying there)?

Comment 29 Dan Williams 2010-07-16 19:28:47 UTC

When the issue from comment #1 occurs, can you attach any 'ifcfg-*' files you have in /etc/sysconfig/network-scripts/ please?  Thanks!

Comment 30 Don 2010-07-19 17:48:45 UTC

For your device did you want to be using scsi_dh_alua or scsi_dh_emc (just
trying to figure out if maybe dm-multipath/scsi is trying to load them and
dying there)?

      >Neither one of these modules is loaded, but it hangs pretty quickly. What I did do was change my configuration from iscsi to Fibre Channel, and it panicked on a reboot.


When the issue from comment #1 occurs, can you attach any 'ifcfg-*' files you
have in /etc/sysconfig/network-scripts/ please?  Thanks!    

     >They are already attached; only the DEFROUTE=no line from comment 6
      was added.

Comment 31 Mike Christie 2010-07-20 16:49:23 UTC

(In reply to comment #30)
> For your device did you want to be using scsi_dh_alua or scsi_dh_emc (just
> trying to figure out if maybe dm-multipath/scsi is trying to load them and
> dying there)?
> 
>       >Neither one of these modules is loaded, but it hangs pretty quickly. What

I was not asking if they are loaded when you start your test. I was just asking if your target needs one (when multipath creates dm devices it can load them, so I am just trying to rule out that IO caused by loading them is causing the panic). Is this a Clariion, and is it in ALUA or trespass mode?


> I did do was change my configuration from iscsi to Fibre Channel, and it
> panicked on a reboot.
> 

On boot did it panic with iscsi+multipath or just with iscsi? From comment #26 it sounded like when running manually it just panicked when multipath started. I am guessing for boot it would be the same, but was not sure whether you are saying above that it is different.

Also, when you manually started multipath and the box panicked, was there anything on the console?

Or could you hook up a serial line? The /var/log/messages is not getting the panic, and the other crash from comment #18 just showed a warning.

Also what iscsi module are you using? Is it iscsi_tcp or a hardware offload driver?

Comment 32 Don 2010-07-20 19:27:44 UTC

I was not asking if they are loaded when you start your test. I was just asking
if your target needs one (when multipath creates dm devices it can load them so
I am just trying to rule out that IO caused by loading them is causing
the panic)? Is this a Clariion and is it in ALUA or trespass mode?

     >This is a Clariion in ALUA mode. Not sure which device handler; I will assume
       scsi_dh_alua.


On boot did it panic with iscsi+multipath or just with iscsi
     >When I have iscsi and multipath starting at boot, it panics before I can log in. When I have just iscsi, I can log in, and then it hangs when I start multipath.

Also when you manually started multipath and the box panicked was there anything
on the console?

   >I never said it panics when I manually start multipath; it hangs, see comment 26.


Also what iscsi module are you using? Is it iscsi_tcp or a hardware offload
driver?    
       >iscsi_tcp

Comment 33 Dan Williams 2010-08-02 17:58:45 UTC

So this doesn't look like a NetworkManager bug anymore given comments 6 and 7.  It appears to be a multipath/iscsi and/or kernel problem now.  Changing the component to kernel based on comment 32.

Comment 35 Ben Marzinski 2010-08-06 19:57:31 UTC

Are you able to view the console output when you start multipathing?  If so, does it say anything? Can you capture your console output if it does, for instance by setting up your console to use the serial port?

Instead of simply starting the multipath service, can you instead run:

# modprobe dm-multipath
# multipath -v3 > multipath_messages 2>&1

and attach the file to this bugzilla.  Could you also attach a copy of
/etc/multipath.conf

Comment 36 Don 2010-08-11 18:41:42 UTC

Created attachment 438258 [details]
alua_multipath.conf

Comment 37 Don 2010-08-11 18:42:39 UTC

Created attachment 438259 [details]
alua_multipath_messages

Comment 38 Don 2010-08-11 18:43:06 UTC

Created attachment 438260 [details]
pnr_multipath.conf

Comment 39 Don 2010-08-11 18:43:34 UTC

Created attachment 438261 [details]
pnr_multipath_messages

Comment 40 Ben Marzinski 2010-08-11 22:13:42 UTC

There is a default configuration for this device. Have you tried using it?

If you haven't, can you simply comment out the devices section from /etc/multipath.conf and try running with the default configuration?

In case you are wondering, here is the default configuration.  You don't need to enter this into /etc/multipath.conf.  If you don't include a devices section entry for this device, you will automatically get this one.

devices {
        device {
                vendor "DGC"
                product ".*"
                product_blacklist "LUNZ"
                path_grouping_policy group_by_prio
                getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                path_selector "round-robin 0"
                path_checker emc_clariion
                features "1 queue_if_no_path"
                hardware_handler "1 emc"
                prio emc
                failback immediate
                rr_weight uniform
                no_path_retry 60
                rr_min_io 1000
        }
}

Comment 41 Ben Marzinski 2010-08-11 22:27:40 UTC

Of course, looking at your messages, multipath isn't seeing a Clariion at all. The only devices it is finding are sda and sdb, which are both SEAGATE devices. Are these messages from running multipath before iscsi has been started?

Is the machine locking up after you run this command?  Looking at the output, multipath isn't even trying to run on top of sdb, which is strange.  If the machine is locking up, that would explain it.

multipath fails to create a multipath device on top of sda. I assume that this device is already in use.

If this output is from running the command before iscsi was started, I need to see what happens when multipath is run after the iscsi devices have been set up.

Comment 42 Don 2010-08-12 13:45:21 UTC

In regard to comment 40:
    Should this work with PNR as well as ALUA?
I had a hardware issue that I am trying to get stable before I can really follow up on both of these.

Comment 43 Don 2010-08-13 12:01:46 UTC

When I comment out my device section in /etc/multipath.conf, multipath -ll still shows my sda (Seagate), and it does not show my 2 LUNs as being ALUA even though they are. Also, the display is totally different than in RHEL 5.*


[root@fry Desktop]# multipath -ll

mpathd (3600601600530240002bbd89d3fa6df11) dm-9 DGC,VRAID
size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 11:0:0:0 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 12:0:0:0 sde 8:64 active ready running
mpathc (36006016005302400bcb9a0a93fa6df11) dm-8 DGC,VRAID
size=2.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 12:0:0:1 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=0 status=enabled
  `- 11:0:0:1 sdd 8:48 active ready running
mpathb (35000c50006f33323) dm-2 SEAGATE,ST973402SS
size=68G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
  `- 8:0:1:0  sdb 8:16 active ready running

Comment 44 Ben Marzinski 2010-08-13 19:55:41 UTC

Commenting out the devices section of multipath.conf just makes it use the default configurations.  It won't change which devices get multipathed. If you want it to ignore your Seagate device, you can add

wwid "35000c50006f33323"

to the blacklist section of /etc/multipath.conf.
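
If there are several local disks to exclude, the WWIDs can be pulled out of the multipath -ll output rather than typed by hand. A rough sketch follows, assuming the RHEL 6 map header layout shown in comment 43 (map name, WWID in parentheses, dm-N, VENDOR,PRODUCT). The function name blacklist_for_vendor is invented for this example, and its output is meant to be reviewed and merged into /etc/multipath.conf manually.

```shell
#!/bin/sh
# Hypothetical helper: read "multipath -ll" output on stdin and print a
# blacklist stanza listing the WWID of every map whose vendor matches $1.
# Returns non-zero (and prints nothing) if no map matches.
blacklist_for_vendor() {
    awk -v vendor="$1" '
        # Map header, e.g.: mpathb (35000c50006f33323) dm-2 SEAGATE,ST973402SS
        $2 ~ /^\([0-9a-fA-F]+\)$/ && $3 ~ /^dm-[0-9]+$/ {
            split($4, vp, ",")
            if (vp[1] == vendor) {
                # Strip the surrounding parentheses from the WWID field.
                wwids[++n] = substr($2, 2, length($2) - 2)
            }
        }
        END {
            if (n == 0) exit 1
            print "blacklist {"
            for (i = 1; i <= n; i++) printf "        wwid \"%s\"\n", wwids[i]
            print "}"
        }'
}

# Example:  multipath -ll | blacklist_for_vendor SEAGATE
```

Matching on the vendor string is only a convenience for building the list; the resulting entries are still plain wwid lines like the one above.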

Also, I'm confused about how the display is different than in RHEL 5.  Do you simply mean the formatting, with the lack of brackets around things and
the "|-+-" instead of "\_"?  Yes, that has changed.  But the default configuration for this device is exactly the same, so aside from formatting, it should say exactly the same thing.

Also, what about the hang when you start multipathing? Does that still happen?

Comment 45 Don 2010-08-17 12:11:41 UTC

Thanks for your help. Things are looking good, except that multipath -ll gives no indication that this is ALUA.

Comment 47 Ben Marzinski 2010-09-15 18:22:24 UTC

Unfortunately, multipath isn't able to tell what mode a device is in and assign it a different configuration.  It simply looks at the vendor, product, and revision information and sees if it matches a device configuration.

The default configuration for these devices uses the EMC hardware handler instead of the ALUA one.  If you need it to use the ALUA hardware handler, you need to create your own devices section. It should look something like

devices {
        device {
                vendor "DGC"
                product ".*"
                product_blacklist "LUNZ"
                path_grouping_policy group_by_prio
                getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                path_selector "round-robin 0"
                path_checker emc_clariion
#               path_checker tur
                features "1 queue_if_no_path"
                hardware_handler "1 alua"
                prio alua
                failback immediate
                rr_weight uniform
                no_path_retry 60
                rr_min_io 1000
        }
}

I'm not sure if you should change the path_checker as well.  However changing the config to ALUA shouldn't be necessary. The Clariion should still respond to the SCSI commands that you send for PNR mode when it is set to ALUA mode.

I'm not sure if the configuration above will work, but it couldn't hurt to try it. You may want to try switching the path_checker from emc_clariion to tur.
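
For what it's worth, the hwhandler field in the multipath -ll output from comment 43 is where the handler actually in use shows up ('1 emc' there), so after switching the config one would expect it to read '1 alua' instead. A small sketch that pulls that field out per map, assuming the same RHEL 6 output layout; map_hwhandlers is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read "multipath -ll" output on stdin and print
# "<map name>: <hwhandler value>" for each multipath map.
map_hwhandlers() {
    awk -v q="'" '
        # Map header line: remember the map name.
        $2 ~ /^\([0-9a-fA-F]+\)$/ { name = $1 }
        # Attribute line: print the text between the quotes after hwhandler=.
        # The prefix hwhandler= plus the opening quote is 11 characters long.
        match($0, "hwhandler=" q "[^" q "]*" q) {
            print name ": " substr($0, RSTART + 11, RLENGTH - 12)
        }'
}

# Example:  multipath -ll | map_hwhandlers
```

Run against the output in comment 43, this reports "1 emc" for the two DGC maps and "0" for the Seagate disk.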

Comment 49 Tom Coughlan 2010-11-24 23:35:04 UTC

Is this resolved by comment 47? 

(Wayne, I do not see this BZ on our bi-weekly agenda. ? )

Comment 50 Ronald Pacheco 2011-01-13 18:04:36 UTC

Hi Don,

Any updates on this?

Thanks and Regards,

Ron

Comment 51 Ronald Pacheco 2011-02-12 20:27:39 UTC

Don,

I am going to work on the presumption that this is resolved as "notabug".  If your testing proves otherwise, then re-open the bug with the test results.
