Bug 182684 - [EMC/Oracle RHEL 4.4] ISCSI MODULE SHOWS MULTIPLE DEVICES FOR A SINGLE LUN IN RHEL 4.0 U2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Mike Christie
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 181409 184382
 
Reported: 2006-02-24 00:38 UTC by Greg Marsden
Modified: 2007-11-30 22:07 UTC
CC: 5 users

Fixed In Version: RHSA-2006-0575
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-08-10 22:26:31 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to limit login and connect retries for initial login (3.12 KB, patch)
2006-04-06 23:03 UTC, Mike Christie


Links
Red Hat Product Errata RHSA-2006:0575 (normal, SHIPPED_LIVE): Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 4. Last updated: 2006-08-10 04:00:00 UTC.

Description Greg Marsden 2006-02-24 00:38:31 UTC
We have an x86 machine with RHEL 4.0 U2 installed on it. We have 2 LUNs published
from a NetApp iSCSI target to this host. When we start the iSCSI driver we see
two disks mapped to one LUN.

The iSCSI server is dual-homed and has two IPs. The iSCSI driver detects
LUNs from both IPs and duplicates the disks. This is the case even if
we specify only one IP in /etc/iscsi.conf.

From Wysochanski, David @ NetApp:
This is a known issue with the iSCSI initiator in RHEL4 U2, it's not
our issue at all.  I had a fairly long discussion with the developers
at the time this was being developed and voiced my concerns, but as I
recall I was a little late and this decision had already been made.

For RHEL4, the behavior was changed from earlier versions of the
initiator.  Earlier versions of the initiator had a parameter,
"Multipath", that you could configure.  By default, it was set to
"off".  But in RHEL4, the initiator changed the default value to "on",


System Info:
===========
[root@stair01 ~]# uname -a
Linux stair01 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686
i386 GNU/Linux


Relevant /etc/iscsi.conf settings:

Continuous=no
HeaderDigest=never
DataDigest=never
ImmediateData=yes
DiscoveryAddress=140.87.131.237

Here is the output of /sbin/iscsi-ls:

[root@stair01 ~]# /sbin/iscsi-ls
*******************************************************************************
SFNet iSCSI Driver Version ...4:0.1.11(12-Jan-2005)
*******************************************************************************
TARGET NAME             : iqn.1992-08.com.netapp:sn.101165788
TARGET ALIAS            :
HOST ID                 : 1
BUS ID                  : 0
TARGET ID               : 0
TARGET ADDRESS          : 140.87.131.237:3260,6
SESSION STATUS          : ESTABLISHED AT Wed Oct 26 21:51:44 PDT 2005
SESSION ID              : ISID 00023d000001 TSIH 1687
*******************************************************************************
TARGET NAME             : iqn.1992-08.com.netapp:sn.101165788
TARGET ALIAS            :
HOST ID                 : 2
BUS ID                  : 0
TARGET ID               : 0
TARGET ADDRESS          : 140.87.142.7:3260,2
SESSION STATUS          : ESTABLISHED AT Wed Oct 26 21:51:44 PDT 2005
SESSION ID              : ISID 00023d000002 TSIH 547
*******************************************************************************

Comment 1 Mike Christie 2006-02-24 20:00:37 UTC
The Fibre Channel and iSCSI drivers in RHEL4 do not perform failover or
multipath in the driver. They only discover all the disks and the paths to those
disks. Multipath software like dm should be used on top of FC or iSCSI devices.

Do you want the in-driver failover back, or do you want the driver to
only expose one path, or something else?

Comment 2 Greg Marsden 2006-02-24 20:02:39 UTC
The problem is that even if only one discovery address is specified, the iSCSI
client finds the other IP for the filer and rediscovers the disk. This results
in confusion for the user when the same disk appears multiple times as multiple
devices.

This request is not to get failover working, it's to make single devices appear
only once.

Comment 3 Mike Christie 2006-02-24 20:24:45 UTC
When we perform discovery to the one discovery address, the target tells us about
all available portals and we log into them all. If you do not want to use one of
the portals why not just turn one off? What do you do for FC when there are
multiple paths? Just unplug a cable?

We could probably add something in sfnet to not log into one of the paths when
asked, but that is not really benefiting the customer. Yes, it makes it easier to
admin if you need to work on the scsi disk, but there is no way they want to use
a single path iscsi setup. Well, dm-multipath actually assembles the dm devices
for the user so it may not be that much easier to admin - maybe if you need to
do something like SG_IO, I guess.

Comment 4 Marizol Martinez 2006-03-10 18:13:43 UTC
Greg -- Discussed this with Mike Christie. Update below:

Network bonding can be used to prevent seeing multiple iscsi
sessions and hence multiple path (/dev/sd) entries. This needs
to be set up on the target.

You can either:

1. The target has multiple portals the initiator can use. We can use them
at the iscsi level or the network level. The way you have the target
set up right now is to take advantage of the multiple iscsi portals by
establishing a session to each portal. The initiator then scans all the
sessions, and this is why we see the same LU through each session and
/dev/sda is the same LU as /dev/sdb.

2. If you want to take advantage of all the ports on the target but not
do dm-multipath, you can do network bonding. This is basically multipath
at the network level. With this setup the iscsi initiator thinks the
target only has one portal, so we establish one session and you only get
one /dev/sd entry. The fact that the target has multiple portals is
hidden from the initiator. A rough sketch of a bonded setup follows below.
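
As a point of reference only: comment #4 says the bonding has to be set up on the
target (here a NetApp filer, which uses its own interface grouping rather than the
Linux bonding driver). Purely as an illustration of what network-level bonding
looks like on a RHEL4-era Linux host, with device names, addresses, and mode being
assumptions for the sketch:

# /etc/modprobe.conf (illustrative)
alias bond0 bonding
options bond0 mode=active-backup miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative address)
DEVICE=bond0
IPADDR=<single advertised portal address>
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none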

The iscsi driver and tools' only contact with the NICs on the initiator
is through the network layer, and we interact with the network layer
through the socket abstraction. This means the driver has no idea how many
NICs there are or where IO goes, as far as the network layer on the
initiator is concerned.

Why do we and FC drivers like qla2xxx and lpfc do this now, since it
creates such complications? The Red Hat answer is because the LLD only
exposes the paths. Multipath software like dm-multipath or md-multipath
will magically assemble them into dm or md devices for the user.

As it is now, your target is returning that it has multiple portals that the
initiator can connect to, so the initiator wants to establish a session
to each one. With the current setup, this is an iscsi issue. You need
to set up the target to not return multiple portals, i.e., do the
bonding on the target.

This is a multipath problem in the sense that the target is exposing a
configuration to be used with dm-multipath level multipathing. With the
current setup you can just use dm-multipath and not worry about which sd is
a path to what.

If you expose multiple portals on the target then you should run
dm-multipath. If you are going to hide all those details and do
multipath at the network layer through bonding then that is fine too,
but in that case they must reconfigure the target. It has nothing to do
with the initiator.
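
For the dm-multipath route, a minimal sketch of what this looks like on a RHEL4
initiator; the commands are standard device-mapper-multipath usage, while the WWID
and the output layout shown in the comments are illustrative only:

# install device-mapper-multipath, then enable and start the daemon:
chkconfig multipathd on
service multipathd start
# (the stock /etc/multipath.conf may blacklist all devices by default and need editing)

# list the assembled maps; the two sd entries for the same LUN should
# collapse into one dm device, e.g. (illustrative output shape only):
multipath -ll
#   360a98000xxxxxxxx dm-0 NETAPP,LUN
#   \_ round-robin 0 [active]
#     \_ 1:0:0:0 sda 8:0  [active][ready]
#     \_ 2:0:0:0 sdb 8:16 [active][ready]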

Comment 5 Greg Marsden 2006-03-15 19:04:30 UTC
The problem described here is a regression relative to the behavior of RHEL3. The
device only shows up multiple times in RHEL4.

We need to have two interfaces to the filer, because they are dedicated to
different traffic (local traffic and traffic from outside the subnet), so shutting
down or bonding one of the interfaces is not a viable solution.

Comment 6 Mike Christie 2006-03-15 19:19:27 UTC
What are you doing for the FC interface on that box though? You have the same
problem, so you need to deal with this for both transports.

And only using a single session/connection to one portal is not a good option
for your customers. And as you said in comment #5, shutting down a portal is not
a viable solution. And as you said in comment #2, you do not want failover back
in the driver. That leaves one thing: use the dm device instead of trying to
work with the scsi devices.

Could you just describe why you want to access the scsi devices instead of the
dm ones? Does oracle not support some operation through dm devices?

Comment 7 Greg Marsden 2006-03-15 21:16:42 UTC
I don't know where the focus on multipath and dm is coming from. As I mentioned,
this works with RHEL3; it's only with RHEL4 that the devices show up multiple
times. The problem seems to be that the discovery for iscsi on RHEL4 is too
aggressive.

Comment 8 Mike Christie 2006-03-15 21:36:34 UTC
multipath comes in because, as I said, all our storage drivers now discover all
the paths and export them. So you should be using dm-multipath over transports,
like FC and iSCSI, that are multipath capable. For most cases dm-multipath will
set everything up for the user and they never have to know which sd is which
path. Do you have a case where they need to access the sd directly, and why would
you want to run a single path setup?

In RHEL3 it works and is safe because we did multipath in the driver, so we only
export one scsi device.

Again, the reason we discover all paths is because you normally do not want to
use iscsi with a single path. Why do you want to do this? To make it easier on
the user? Again, as I said, you would make it easier for the user because they
have what they did in RHEL3, but when that single path fails you are screwing
them. Could you please just answer why you cannot use the dm device? Is there
some special oracle setup that cannot use dm devices?

Comment 9 Mike Christie 2006-03-15 21:40:20 UTC
(In reply to comment #8)
> the user? Again, as I said, you would make it easier for the user because they
> have what they did in RHEL3,

Actually, most people will think they have what they did in RHEL3 because they
only see one scsi device, but as I said many times in this bz, they really
would have a single path setup and will not have the protection they had in
RHEL3, because RHEL3's driver was doing multipath for them and hiding all the paths.

Comment 12 Wayne Berthiaume 2006-03-29 16:51:38 UTC
     We see a similar behavior with EMC CLARiiON arrays. If you have an array 
with multiple ports on it and only some of those ports are connected to the VLAN, 
login attempts will fail on the disconnected ports. This behavior eats up 
valuable CPU cycles and fills the syslog with error messages. 
     The problem configuration consists of a target that has ports on separate 
subnets that you don't have access to, so you don't have a failover capability - 
NIC1 is on subnet A to SPA0 and SPB0, and NIC2 is on subnet B to SPA1 and SPB1. 
Unfortunately, there are SPA2 and 3, and SPB2 and 3, which are not connected to 
anything, and the array responds with those ports as well in the SEND_TARGETS. 
Therefore, iscsi_sfnet continues to attempt to log into SPA2 and 3 and SPB2 and 
3 and will never succeed. The better behavior would be to not log into these 
disconnected ports via an exclusion entry in the /etc/iscsi.conf file. Any 
other method would cause a target to be dropped if this was a path failure.
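
No such exclusion keyword exists in the shipped RHEL4 iscsi_sfnet /etc/iscsi.conf.
Purely as a hypothetical illustration of the exclusion entry being requested here
(every name and value below is made up):

# HYPOTHETICAL /etc/iscsi.conf entry -- not implemented in iscsi_sfnet
DiscoveryAddress=<subnet A portal on SPA0/SPB0>
# skip unreachable portals returned in SendTargets (SPA2/3 and SPB2/3):
ExcludeAddress=<SPA2 address>
ExcludeAddress=<SPB2 address>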

Comment 14 Andrius Benokraitis 2006-03-29 19:27:30 UTC
Due to capacity and cutoffs for proposed bugs/features for RHEL 4.4, adding this
to the RHEL 4.5 proposed list.

Comment 15 Greg Marsden 2006-03-31 00:46:29 UTC
I agree with comment #12, a blacklist of hosts for the iscsi discovery would be
the most workable solution.

Comment 16 Mike Christie 2006-03-31 04:33:56 UTC
(In reply to comment #15)
> I agree with comment #12, a blacklist of hosts for the iscsi discovery would be
> the most workable solution.

For EMC's situation yes it would. For the original request it would not.

Comment 17 Mike Christie 2006-03-31 04:46:32 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > I agree with comment #12, a blacklist of hosts for the iscsi discovery would be
> > the most workable solution.
> 
> For EMC's situation yes it would. For the original request it would not.

Or could you please just reply to one of the questions asking why you want to do
this for your specific setup? For the Netapp target you are using the filer will
not return ports that are disabled which is not the case for EMC. What are you
going to tell users that ask why if the network gets bogged down or just goes
caput they are losing data in RHEL4 but in RHEL3 it just worked?

Comment 18 Greg Marsden 2006-04-03 19:34:46 UTC
Hi Mike, 
I guess I wasn't clear in my original request.

> For the Netapp target you are using the filer will
> not return ports that are disabled which is not the case for EMC.

We have a single netapp host with two interfaces with different ip addresses (e0
and e1). We want half the hosts to use e0 and half to use e1. Today, if we
specify just the ip of e0 to a given host, it will go and discover that there is
another interface e1 and discover the volumes exported over the other interface.
If we could blacklist the ip of e1 for the e0-served-hosts, that would prevent
the confusion of having multiple devices created. This is a simplified example,
so please don't reply that we should change the architecture listed here.
Obviously we cannot disable the ports on the netapp as they are actively in use,
and would like to disable them on the client side. Hence the blacklist.

> What are you
> going to tell users that ask why if the network gets bogged down or just goes
> caput they are losing data in RHEL4 but in RHEL3 it just worked?

This is not relevant. 

Comment 19 Mike Christie 2006-04-03 20:17:29 UTC
(In reply to comment #18)
> Hi Mike, 
> I guess I wasn't clear in my original request.
> 
> > For the Netapp target you are using the filer will
> > not return ports that are disabled which is not the case for EMC.
> 
> We have a single netapp host with two interfaces with different ip addresses (e0
> and e1). We want half the hosts to use e0 and half to use e1. Today, if we
> specify just the ip of e0 to a given host, it will go and discover that there is
> another interface e1 and discover the volumes exported over the other interface.
> If we could blacklist the ip of e1 for the e0-served-hosts, that would prevent
> the confusion of having multiple devices created. 


But you are asking for the black list becuase you want only a single sd to show
up not becuase of a portal configuration problem like EMC, right? Or is it now both?



> 
> > What are you
> > going to tell users that ask why if the network gets bogged down or just goes
> > caput they are losing data in RHEL4 but in RHEL3 it just worked?
> 
> This is not relevant. 

It is relevant if you are trying to use the blacklist to replicate what it
looked like people had in RHEL3. This is because it only looks like what they
had in that they get a single sd. Under the hood there was a lot more going on.

I am sure you will use the blacklist correctly in the future. However, for the
people that complain about multiple sd's showing up in RHEL4 but not in RHEL3,
what is your solution? The blacklist? That is not the correct way to use it, and
my comment will be relevant when they discover you recommended removing their
data loss protection just to make it easier to configure. Please answer the
question so we can come to the best solution for the users.

Comment 20 Mike Christie 2006-04-03 20:28:40 UTC
(In reply to comment #19)
> (In reply to comment #18)
> > Hi Mike, 
> > I guess I wasn't clear in my original request.
> > 
> > > For the Netapp target you are using the filer will
> > > not return ports that are disabled which is not the case for EMC.
> > 
> > We have a single netapp host with two interfaces with different ip addresses (e0
> > and e1). We want half the hosts to use e0 and half to use e1. Today, if we
> > specify just the ip of e0 to a given host, it will go and discover that there is
> > another interface e1 and discover the volumes exported over the other interface.
> > If we could blacklist the ip of e1 for the e0-served-hosts, that would prevent
> > the confusion of having multiple devices created. 
> 
> 

This type of comment is where the confusion may be. It is on my part. What EMC
is asking for was never possible in RHEL3 and so is not a regression. It is a
completely new way to configure the driver, but it can also be used to make it
look like what we did in RHEL3. If you use the blacklist to make it look like what
we did in RHEL3 then it will be a nasty regression for the user and will lead to
all sorts of problems.

If both portals work, like with Netapp or for the simple setup you describe, then
you could discover all the paths and have dm-multipath/multipath sort out which
paths to use just by using the info in sysfs (portal group, ip, and port
number). And this is something that people want to add to the multipath tools.


Comment 21 Greg Marsden 2006-04-03 20:29:44 UTC
Yes, I now realize that a lot of work was going on under the hood with RHEL3. If
I'd known at the time, we would probably have opened a bug to ensure that did
not happen. 

For people who want the same behavior as RHEL3, md is a good option. It's not a
good option for the problem I'm presenting, though.

For us, we want to isolate the two discovery addresses on the filer, so a
blacklist is the correct solution. Using md to mask the unused devices would be
more complicated than necessary (for example, if the disk configuration on the
filer were to change).

Comment 22 Mike Christie 2006-04-03 20:50:45 UTC
(In reply to comment #21)
> For us, we want to isolate the two discovery addresses on the filer, so a
> blacklist is the correct solution. Using md to mask the unused devices would be 
> more complicated than necessary (for example, if the disk configuration on the
> filer were to change)

For _DM_ we do not set up the device based on the device node. We can mask based
on any value like discovery address, portal properties, etc. The nice thing is
that a lot of the infrastructure is there already for just this reason :) so
except for this EMC setup we would not have had to duplicate it in iscsi. Also,
I do not think you can isolate based on discovery address. You need to do it
based on the portal we use for the normal session, due to what we get back from
sendtargets.

But I guess both will be implemented one day, so you can take your pick :)

Comment 23 Mike Christie 2006-04-03 20:54:15 UTC
(In reply to comment #22)
> But I guess both will be implemented one day, so you can take your pick :)

Both as in DM and iscsi, not discovery address and the normal session connection's
portal address.

Comment 24 Mike Christie 2006-04-03 21:34:17 UTC
Actually, EMC's problem can be solved by just backporting the upstream code or
RHEL3 code that will fail the initial session creation attempt after we see what
type of error it is and after retrying a couple of times.

And the setup in comment #18 is a good example of why we want to set this up in
dm. I cannot see most people doing the single point of failure setup described
there intentionally. And in most cases, we will want different devices to load
balance across target portals like described in #18, but we will also want
them to be able to failover if necessary, and we will want that setup to be
created by magic for the user.

So unless I have tons of time during U4 and U5, I am going to initially just add
the code to fail the initial session creation like we do upstream and we did in
RHEL3. And for the more complex setups, I will do the masking in DM multipath
initially, since we have more people requesting that and it will work for this
BZ, and if time allows, do it in iscsi.

Comment 25 Mike Christie 2006-04-06 23:03:34 UTC
Created attachment 127434 [details]
patch to limit login and connect retries for initial login

This patch should fix the initial problem where we cannot log into a port
because it is unreachable, due to maybe using subnets for zoning or the target
portal being disabled. I have not gotten a chance to test that it fixes the
reported problem - need to get access to a target I can configure. Just tested
for regressions.

I am going to make a new BZ for the dm setup modifications that would be needed
for more complex setups and for some doc updates.

The patch was made over the current RHEL4 sources. Feel free to test if you can
rebuild your kernels.

Comment 28 Dave Wysochanski 2006-04-09 00:16:19 UTC
This problem is unique to iSCSI unfortunately because of how sendtargets works.
 With FC, you have zoning, so you can zone out particular ports.  But with iscsi
you have to have the zoning / blacklist in either the target or the initiator if
you use sendtargets.  The problem is not all targets implement blacklisting on a
per initiator + per target basis.  I'm not even sure if all targets implement
port level blacklisting.  Netapp does port level blacklisting via the "iswt
port" or "iscsi port" command but it's for all initiators.

Comment 29 Dave Wysochanski 2006-04-09 00:28:32 UTC
Greg, are you dividing the traffic because of a load balancing problem?  Is it
QoS related?  Or is it security?  I see your comments in #5 and #18.  Sounds like
you are using both interfaces for iscsi but just want an explicit segregation for
some reason?  I'm not saying you're wrong, just wondering what the use case is
for the explicit segregation, because I don't know that people understand it.

Thanks.

Comment 30 Mike Christie 2006-04-09 08:43:03 UTC
(In reply to comment #28)
> This problem is unique to iSCSI unfortunately because of how sendtargets works.
>  With FC, you have zoning, so you can zone out particular ports.  But with iscsi
> you have to have the zoning / blacklist in either the target or the initiator if
> you use sendtargets.  The problem is not all targets implement blacklisting on a
> per initiator + per target basis.  I'm not even sure if all targets implement
> port level blacklisting.  Netapp does port level blacklisting via the "iswt
> port" or "iscsi port" command but it's for all initiators.

People have been putting portals on different subnets to work around the lack of
FC-type zoning. How we handle unreachable portals in RHEL4 was broken, though.
Some code got lost when going from RHEL3 to RHEL4.

I am not saying subnets are a perfect replacement for FC zoning. I am just saying
people are trying to use them as a replacement.

I am also not saying we do not need a blacklist. I am just saying it can be done
a layer higher so it benefits more people, plus handles the problem of matching
an sd entry to a portal and handles the persistent name problem.

Comment 31 Mike Christie 2006-04-09 08:51:32 UTC
(In reply to comment #30)
> I am also not saying we do not need a blacklist. I am just saying it can be done
> a layer higher so it benefits more people, plus handles the problem of matching
> an sd entry to a portal and handles the persistent name problem.

Oh yeah, but just to be clear, it would not help something like PowerPath, so
eventually we would have to add something for them. I am only adding back the
code to handle invalid portals in this patch. This will automagically solve some
of the problems for people that used network level solutions. I will
handle the other issues in other BZs due to how we handle errata and patch
merging (some of the other fixes are to userspace code only) and other Red Hat
process fun.

Comment 32 Mike Christie 2006-04-09 09:01:49 UTC
(In reply to comment #28)
> This problem is unique to iSCSI unfortunately because of how sendtargets works.

But just to be completely clear, this is not an iscsi-only problem. For FC, if you
want an HA solution, where you have multiple paths into the initiator, you will end
up with multiple sd entries for the same LU. And for the OS configuration, you
will need to deal with figuring out which scsi_devices (/dev/sd entries)
match which LU.

If we are setting up a nice single point of failure solution, then yes, you can
use FC zoning so you end up with one sd that can be easily used. And in that
case I would agree this is an iscsi-only problem. But for the majority of the
setups out there I cannot imagine (I hope) this is what users are doing.

Comment 33 Mike Christie 2006-04-09 09:02:54 UTC
(In reply to comment #32)
> (In reply to comment #28)
> > This problem is unique to iSCSI unfortunately because of how sendtargets works.
> 
> But just to be completely clear, this is not an iscsi-only problem. For FC, if you
> want an HA solution, where you have multiple paths into the initiator, you will end
> up with multiple sd entries for the same LU. And for the OS configuration, you
> will need to deal with figuring out which scsi_devices (/dev/sd entries)
> match which LU.
> 
> If we are setting up a nice single point of failure solution, then yes, you can
> use FC zoning so you end up with one sd that can be easily used. And in that
> case I would agree this is an iscsi-only problem. But for the majority of the
> setups out there I cannot imagine (I hope) this is what users are doing.

I meant I hope they are not :)

Comment 34 Mike Christie 2006-04-09 09:07:38 UTC
(In reply to comment #32)
> (In reply to comment #28)
> > This problem is unique to iSCSI unfortunately because of how sendtargets works.
> 
> But just to be completely clear, this is not an iscsi-only problem.

And just to be more clear, because this bz is all over the place :)

I agree the problem of having to implement something to mask a path at the os
level is unique to iscsi, but for setups where we do have more than one path
(which I think is the more common case) having to determine which path goes
with which LU is common across all transports. And that is what I was referring
to in bringing up FC in the beginning of the BZ.




Comment 35 Dave Wysochanski 2006-04-11 17:22:58 UTC
Another solution to this problem is to add static configuration to linux-iscsi.
This is a little more of a pain for users, since the user must put in the target
name in addition to the IP, but it does solve the problems here. Also, over the
last few years I've received a few comments from people who have noted that it is
strange that linux-iscsi does not provide the static configuration option described
in RFC 3721.  Almost all other initiators do it, but linux-iscsi has always been
the one that does not.

I'm not saying this is the right solution necessarily, but thought I'd mention
it in case it has not been considered.

http://www.ietf.org/rfc/rfc3721.txt

   iSCSI supports the following discovery mechanisms:

   a. Static Configuration: This mechanism assumes that the IP address,
      TCP port and the iSCSI target name information are already
      available to the initiator.  The initiators need to perform no
      discovery in this approach.  The initiator uses the IP address and
      the TCP port information to establish a TCP connection, and it
      uses the iSCSI target name information to establish an iSCSI
      session.  This discovery option is convenient for small iSCSI
      setups.



Comment 36 Mike Christie 2006-04-11 18:51:30 UTC
Have you looked at the CONFIG_TYPE_DISCOVERY_FILE type in iscsid's
iscsi-config.c? Does it only work for one target, or do you have to do a file per
target? I do not remember, but will look later when I get a chance.

Comment 37 Mike Christie 2006-04-11 18:55:54 UTC
It looks like you do a DiscoveryFile= per target you want to define. There is
also a DefaultAddress=.

Just so you know, this is an undocumented interface. One group at a large computer
company was using it. You can see the linux-iscsi list for a discussion on it. If
it is something people want, we can add it. Maybe it would be better to make it
more open-iscsi like? Or do people prefer defining it in a file and then running
iscsid to start and restart things?
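
A rough sketch, offered only as an assumption, of how the undocumented settings
named above might read in /etc/iscsi.conf; the real syntax and the format of the
per-target file are undocumented and may differ:

# ASSUMED usage of the undocumented options from comment #37
DefaultAddress=140.87.131.237
# one DiscoveryFile= per statically defined target; the file's format is undocumented
DiscoveryFile=/etc/iscsi-targets/netapp-sn.101165788.conf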

Comment 42 Andrius Benokraitis 2006-04-21 14:55:17 UTC
From Tom: "We are planning a patch that will help with the most common case.
Will also add documentation to cover some of the others."

Comment 43 Jason Baron 2006-04-28 17:25:22 UTC
committed in stream U4 build 34.26. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 44 Dave Wysochanski 2006-04-28 18:35:47 UTC
For other people's reference here, Mike added the following modparam to the
driver for this problem (I think this is the only thing for this bugzilla,
right Mike?):

int iscsi_max_initial_login_retries = 3;
module_param_named(max_initial_login_retries, iscsi_max_initial_login_retries,
		   int, S_IRUGO);
MODULE_PARM_DESC(max_initial_login_retries, "Max number of times to retry "
		 "logging into a target for the first time before giving up. "
		 "The default is 3. Set to -1 for no limit");





Comment 45 Mike Christie 2006-04-28 19:01:11 UTC
Yeah, this is only to handle the one bug/regression, where, if the target gave us
an invalid address, the RHEL3 driver would stop trying to connect. This seems to
occur a lot for the subnet setup and when targets tell us about all portals, even
ones that are not enabled.

I will update the driver/userspace docs to reflect this modparam and the other
ones we have while I am at it.

To handle the problem of making the iscsid/userspace/dm easy to configure, I
still have to make a new BZ. I am still trying to figure out what I have time
for and what will help the most people. But we also have the possible solution
in comment #37 where you can define the specific target and portal you want to
log into. It is currently unsupported, undocumented, and probably not so easy to
use, but I think we could fix that up to make it a little easier to use. Well,
easy to use in the linux-iscsi sense :)

Comment 46 Dave Wysochanski 2006-04-28 21:19:43 UTC
Well, unfortunately my x86_64 box (onboard Broadcom ethernet) is panicking on
bootup with the new kernel:


root (hd1,2)
 Filesystem type is ext2fs, partition type 0x83
kernel /boot/vmlinuz-2.6.9-34.26.ELsmp ro root=LABEL=/1 rhgb quiet console=tty0
 console=ttyS0,57600n8 nmi_watchdog=1
   [Linux-bzImage, setup=0x1400, size=0x1949e2]
initrd /boot/initrd-2.6.9-34.26.ELsmp.img
   [Linux-initrd @ 0x37f1e000, 0xd1bf4 bytes]

Node 0 using interleaving mode 1/0
Red Hat nash version 4.2.1.6 starting
INIT: version 2.85 booting
                Welcome to Red Hat Enterprise Linux AS
                Press 'I' to enter interactive startup.
Starting udev:  [  OK  ]
Initializing hardware...  storage network audio done[  OK  ]
Configuring kernel parameters:  [  OK  ]
Setting clock  (localtime): Fri Apr 28 17:17:30 EDT 2006 [  OK  ]
Setting hostname troan:  [  OK  ]
Checking root filesystem
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/hdb3
/1: clean, 293921/2660160 files, 2454293/5311490 blocks
[  OK  ]
Remounting root filesystem in read-write mode:  [  OK  ]
No devices found
Setting up Logical Volume Management: [  OK  ]
Checking filesystems
Checking all file systems.
[  OK  ]
Mounting local filesystems:  mount: special device /dev/sda1 does not exist
[FAILED]
Enabling local filesystem quotas:  [  OK  ]
Enabling swap space:  [  OK  ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Starting sysstat:  [  OK  ]
Checking for new hardware [  OK  ]
Starting pcmcia:  [  OK  ]
Setting network parameters:  [  OK  ]
Bringing up loopback interface:  [  OK  ]
Bringing up interface eth0:  Unable to handle kernel paging request at
00000000dead4eb1 RIP:
<ffffffff8030a0be>{__lock_text_start+1}
PML4 f62a5067 PGD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ds yenta_socket pcmcia_core vfat fat dm_multipath dm_mod
button battery ac ohci_hcd hw_random shpchp tg3 floppy ext3 jbd
Pid: 0, comm: swapper Tainted: G   M  2.6.9-34.26.ELsmp
RIP: 0010:[<ffffffff8030a0be>] <ffffffff8030a0be>{__lock_text_start+1}
RSP: 0018:00000100016fbf08  EFLAGS: 00010202
RAX: 00000100f6d1c178 RBX: 00000100f6d1c000 RCX: 0000000000000019
RDX: ffffffff805194e0 RSI: 0000000000000000 RDI: 00000000dead4ead
RBP: ffffffff804e3730 R08: 00000100016e4000 R09: 000000000000004d
R10: 000000000000004d R11: 0000000000000001 R12: 0000010001048440
R13: 00000000fffbb8e2 R14: 00000100016e5e98 R15: 0000000000000000
FS:  0000002a9556a3e0(0000) GS:ffffffff804e2900(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000dead4eb1 CR3: 00000000016e8000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo 00000100016e4000, task 00000100f881d030)
Stack: 00000100f6d1c000 ffffffff802afc93 0000012c00000001 0000000000000001
       ffffffff804e3730 000000000000000a 0000000000000001 ffffffff8013c5dc
       0000000002000000 ffffffff804e05a4
Call Trace:<IRQ> <ffffffff802afc93>{net_rx_action+137}
<ffffffff8013c5dc>{__do_softirq+88}
       <ffffffff8013c685>{do_softirq+49} <ffffffff8011320b>{do_IRQ+328}
       <ffffffff80110813>{ret_from_intr+0}  <EOI> <ffffffff8010e749>{default_idle+0}
       <ffffffff8010e769>{default_idle+32} <ffffffff8010e7dc>{cpu_idle+26}


Code: 81 7f 04 ad 4e ad de 48 89 fb 74 1f 48 8b 74 24 08 48 c7 c7
RIP <ffffffff8030a0be>{__lock_text_start+1} RSP <00000100016fbf08>
CR2: 00000000dead4eb1
 <0>Kernel panic - not syncing: Oops
  


Comment 47 Jason Baron 2006-04-28 21:31:17 UTC
yeah, ok, we're fixing that for the next kernel...apparently the up kernel works
fine if you want to try that...otherwise please use > 34.26 thanks.

Comment 48 Dave Wysochanski 2006-04-28 21:50:19 UTC
Thanks.

Comment 52 Andrius Benokraitis 2006-05-19 15:47:48 UTC
I think we (Mike/EMC/me) need an offline discussion for this being in RHEL3...
Mike, do you have questions for EMC?

Comment 53 Mike Christie 2006-05-19 20:13:13 UTC
Ok for:

RHEL3: I do not think we are going to do much. If there is another update we can
fix the bug where we hammer on a port that is not accessible, and document the
problem, but that is it.

RHEL4: I made this BZ and put everyone on it. For this BZ we can add a blacklist
or make the tools a little easier to use, I hope.

Comment 54 Mike Christie 2006-05-19 20:14:12 UTC
This BZ
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=192462

Comment 57 Red Hat Bugzilla 2006-08-10 22:26:31 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0575.html


