Description of problem:
The iscsi software initiator in rhel3, u4, does not properly automount
LUNs. The reason for this is related to the bringup order of the
iscsi service in relation to other things, and the fact that the
mounting is not retried for a long enough period of time. For
example, the iscsi readme file located in
/usr/share/doc/iscsi-initiator-utils-3.6.2/README says to use
the /etc/fstab file with the _netdev option. However, if the network
takes a little while to come up (as is the case with certain switch
ports becoming active), it's very likely that the iSCSI LUNs will not
be available at the time the netfs service is run. The README also
suggests devlabel. However, devlabel is run from rc.sysinit, before
the iscsi driver is loaded, so any labels that used to exist get
removed. Thus, you cannot use them in the /etc/fstab file.
Version-Release number of selected component (if applicable):
- iscsi kernel module v3.6.1
I could not get automounting to work, no matter what I tried - every
method I tried failed.
Steps to Reproduce:
Follow the instructions in the README file to set up automounting of
iSCSI LUNs. The recommended methods for achieving device naming
persistency do not work with automounting of LUNs. These methods need
to be reviewed and perhaps adjustments need to be made in terms of
service loading, etc, for automounting of iSCSI LUNs.
Errors in the /var/log/messages file indicate that the iSCSI devices do
not exist at the time the mount is attempted.
Either background the mount, or do something so that iSCSI LUNs are
eventually mounted automatically upon reboot, once these LUNs become
As a result of this bug, upon reboot manual intervention seems to
always be needed in order to get iSCSI LUNs mounted. This isn't the
end of the world, but is definitely something that should be fixed for
better iSCSI usability.
Taking ownership of this one - planning on adding a method to the iscsi init
script that checks for session establishment with a user-configurable timeout.
Note that with the equipment I have tested on, sessions are established before
the init scripts that depend on them run, so this issue is very dependent on
hardware.
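A rough sketch of what such a check could look like, assuming the init script can tell a session is up by the expected device nodes appearing. The function name wait_for_paths is hypothetical and is not taken from the actual initscript; it simply polls once a second until every listed path exists or the timeout runs out:

```bash
#!/bin/bash
# Hypothetical helper, not part of the shipped iscsi initscript.
# Usage: wait_for_paths TIMEOUT_SECONDS PATH...
# Returns 0 once every PATH exists, 1 if the timeout expires first.
wait_for_paths() {
    local timeout=$1 elapsed=0 missing path
    shift
    while :; do
        missing=0
        for path in "$@"; do
            if [ ! -e "$path" ]; then
                missing=1
            fi
        done
        if [ "$missing" -eq 0 ]; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

The start path could then call something like `wait_for_paths 15 /dev/sdc1` and only go on to the mount step when it returns 0. Polling returns as soon as the devices show up, so fast hardware would not pay the full timeout on every boot.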
I was using the linux-iscsi package from SourceForge before Red Hat
started including iscsi in the kernel, and they had a way of
automounting that I found quite acceptable. The iscsi volumes would
be entered in /etc/fstab.iscsi and would be automounted when the iscsi
daemon started. Would this be so difficult for RH to implement? Is
there a reason it and the iscsi-mount and iscsi-mountall tools were
not put in the iscsi-initiator-utils package?
The reason it was not implemented was because it is a non-standard
interface. It was yet another place users needed to add mount
information and none of the RHEL tools knew anything about it. We
also ran into issues with the iscsi-(u)mountall scripts during
Also, using /etc/fstab.iscsi does not solve the devlabel issue - it
only solves the netfs one - so you still don't have persistent disk
names without a volume manager.
I plan on fixing both the automount and devlabel issue by modifying
the iscsi initscript to see if the session has been established, and
once it has, (or a user-definable timeout has been reached,) reload
That being said, you certainly can grab the linux-iscsi tarball from
sourceforge and use the old initscript and iscsi-(u)mountall scripts -
it just will not be supported by Red Hat.
Will your fix contain something to auto-umount iSCSI mounted
filesystems when the driver is stopped?
This is another case that may warrant another bugzilla altogether.
Basically, when the driver is stopped, the script tries to unload
the module. If there are existing iSCSI filesystems mounted, it'll
fail with something like this:
[root@dell2650-rtp8 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb1 35001508 4634016 28589500 14% /
/dev/sda1 101089 25826 70044 27% /boot
none 1027728 0 1027728 0% /dev/shm
/dev/sdc1 10321192 32828 9764080 1%
[root@dell2650-rtp8 /]# /etc/init.d/iscsi stop
Stopping iSCSI: iscsidiscsi_sfnet: Device or resource busy
Unable to remove iscsi kernel driver - devices may still be in use
It's probably unacceptable to just remove the module removal code,
since the user might reload the driver.
Let me know if you want me to file another bugzilla for this case,
as it might be somewhat involved.
Thanks for the info AJ - sounds like you have this issue well in hand.
I'll wait for your next rpm update, and until then make do by
mounting the iSCSI targets manually after reboot.
In response to comment #5:
I wasn't planning on it, especially since the initscript doesn't mount
the devices. I see it like any other scsi device driver - the fact
that we have a daemon running to monitor for multipath failure cases
shouldn't affect the driver - so if someone stops the daemon, they
shouldn't have to unmount the filesystem. (and if they start the
daemon again, it should reconnect to the driver and go on like nothing
happened) The error/warning that is printed out is there to make sure
the user knows there's still an iscsi device in use. It's up to them
to unmount it. I'm especially opposed to using fuser to kill
processes running on the scsi device - which is what iscsi-umountall does.
The system shutdown path will umount iscsi devices before the iscsi
service is shut down, so the only time someone will see this is if
they are manually running 'service iscsi stop'. In my opinion, it is
the responsibility of the person issuing the command to unmount iscsi
devices at that point.
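For anyone who wants to check for mounted iSCSI filesystems before running 'service iscsi stop' by hand, a minimal sketch follows. It assumes you already know which block devices are iSCSI LUNs and pass them in yourself; the function name mounted_points is hypothetical. It only reads a /proc/mounts-style table and prints mount points; it does not kill any processes:

```bash
#!/bin/bash
# Hypothetical helper, not part of iscsi-initiator-utils.
# Usage: mounted_points MOUNT_TABLE DEVICE...
# Prints the mount point of each DEVICE found in MOUNT_TABLE
# (a file in /proc/mounts format: device, mount point, type, ...).
mounted_points() {
    local table=$1 dev mnt rest wanted
    shift
    while read -r dev mnt rest; do
        for wanted in "$@"; do
            if [ "$dev" = "$wanted" ]; then
                printf '%s\n' "$mnt"
            fi
        done
    done < "$table"
    return 0
}
```

For example, `mounted_points /proc/mounts /dev/sdc1` would list where /dev/sdc1 is mounted, and each reported mount point could then be passed to umount before stopping the service. Empty output means none of the listed devices is mounted.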
As far as removing the module removal code, since the service will
load the driver if necessary, I don't see why we should remove it.
But if the module is already loaded when the service script is run, no
error will be issued.
Ok, that approach seems reasonable.
However, did you know there's a panic in the shutdown path if you
take this approach and someone tries to reboot the machine with
an iSCSI LUN mounted?
INIT: Stopping Red Hat Network Daemon: [ OK ]
Stopping atd: [ OK ]
Stopping cups: [ OK ]
Shutting down xfs: [ OK ]
Shutting down console mouse services: [ OK ]
Stopping sshd:[ OK ]
Stopping xinetd: [ OK ]
Stopping crond: [ OK ]
Stopping automount:umount2: Device or resource busy
umount: /u: device is busy
Shutting down NIS services: [ OK ]
Shutting down ntpd: [ OK ]
Unmounting NFS filesystems: [ OK ]
Saving random seed: [ OK ]
Killing mdmonitor: [ OK ]
Stopping NFS statd: [ OK ]
Stopping portmapper: [ OK ]
Shutting down kernel logger: [ OK ]
Shutting down system logger: [ OK ]
Stopping iscsi: Stopping iSCSI: iscsidiscsi_sfnet: Device or resource
Unable to remove iscsi kernel driver - devices may still be in use
Shutting down interface eth0: [ OK ]
Shutting down interface eth1: [ OK ]
Shutting down loopback interface: [ OK ]
Shutting down audit subsystem[ OK ]
Starting killall: [ OK ]
Sending all processes the TERM signal...
RPC: sendmsg returned error 101
portmap: RPC call returned error 101
Sending all processes the KILL siSCSI: session f7000000 has ended
quickly 1 times, login delay 1 seconds
iSCSI: tx thread 15601 received SIGKILL, killing rx thread 15602
NMI Watchdog detected LOCKUP on CPU3, eip c0138fdf, registers:
iscsi_sfnet nfs lockd sunrpc usbserial lp parport autofs4 audit e1000
tg3 floppy sg microcode keybdev mousedev hid input usb-ohci up
EIP: 0060:[<c0138fdf>] Not tainted
EIP is at __group_send_sig_info [kernel] 0x3ef (2.4.21-27.ELsmp/i686)
eax: c73af100 ebx: 00000286 ecx: 00000000 edx: f6ede000
esi: 00000012 edi: c7304000 ebp: f6edff40 esp: f6edfeec
ds: 0068 es: 0068 ss: 0068
Process killall5 (pid: 16369, stackpage=f6edf000)
Stack: 00000012 f6edff40 c7304000 c7304000 f6edff40 00000012 f6ede000
00000012 f6edff40 c7304000 00000010 00000000 f6ede000 00003ff1
bfffcba8 c0137a87 00000012 f6edff40 ffffffff 00000012 00000000
Call Trace: [<c01368bc>] kill_something_info [kernel] 0xcc (0xf6edff08)
[<c0137a87>] sys_kill [kernel] 0x57 (0xf6edff30)
[<c017822e>] vfs_readdir [kernel] 0xae (0xf6edff54)
[<c017d740>] dput [kernel] 0x30 (0xf6edff64)
[<c016531b>] __fput [kernel] 0xbb (0xf6edff78)
[<c01634be>] filp_close [kernel] 0x8e (0xf6edff94)
[<c0163566>] sys_close [kernel] 0x66 (0xf6edffb0)
Code: 80 b8 04 05 00 00 00 f3 90 7e f5 e9 23 d6 ff ff e8 bc 1f fd
console shuts up ...
N<MI4> N rMI: W8,a 1de02te4 ctKBed)
OC NKMUI PWatchdog detected LOCKUP L
heh - nope, didn't know that - you mind bugging that separately with
me as the owner?
Ok, filed bugzilla 144781.
Created attachment 110068
iscsi shutdown errors
it stops for about 5 minutes before it finally reboots.
I had to add a 30 second delay to the beginning of the iscsi init
script under the "start" section to give the e1000 NIC time to start
passing packets with my Cisco 3750G switch.
I'd recommend adding a user-configurable startup delay (along with
the other TIMEOUT variables you already hard-code at the top of the
script) to an /etc/sysconfig/iscsi config file, so the changes
aren't overwritten with each kernel update.
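As a sketch of the idea, the init script could source an optional config file and fall back to a built-in default when the file or the variable is missing. The function name load_iscsi_config and the variable STARTDELAY are hypothetical, made up for this example:

```bash
#!/bin/bash
# Hypothetical sketch: read overrides from a sysconfig-style file,
# then apply built-in defaults for anything left unset.
load_iscsi_config() {
    local config=${1:-/etc/sysconfig/iscsi}
    if [ -r "$config" ]; then
        . "$config"
    fi
    # STARTDELAY is an invented name: seconds to sleep before starting.
    STARTDELAY=${STARTDELAY:-0}
    return 0
}
```

The "start" section would then call `load_iscsi_config` followed by `sleep "$STARTDELAY"`, so a site that needs the 30 second delay sets it once in the config file and keeps it across package updates.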
With the 30 second sleep, iscsi starts up properly and the system
mounts my e2label'd partitions specified in /etc/fstab like so:
LABEL=/oracle/home /oracle/home ext3 defaults 1 2
However, I still have issues rebooting. It fails to stop the iscsi
services due to a device still in use. No processes were running
that would hold it locked, but for some reason the umount-all stuff
doesn't seem to be unmounting one or all of the iscsi partitions.
Then the system seems to hang for about 5 minutes before rebooting
while it's sending KILL signals.
See the JPG attachment. It was the only way I could capture this
There is an i386 rpm at
includes a workaround for this bug - there is now an /etc/sysconfig/iscsi
configuration file that allows you to increase the timeouts at startup - you
should set ESTABLISHTIMEOUT to the maximum time it takes to establish the iscsi
I still want to find a way to check on the session status, but I haven't found a
good way to code it yet.
Please note the rpm in comment #13 is unsupported. This is a test rpm to verify
the workaround works.
AJ, I tested this in RHEL4 U5 beta but I had to set non-default values to get it
to automount. Is that your intent? Could we set reasonable default values so
that it works for *most* cases?
I set ESTABLISHTIMEOUT=60 for my tests - your mileage may vary.
The timeouts that are default in the beta version are easily enough for my
setup. How common do you think your router/timeout issue is going to be? I
really don't want the iscsi initscripts holding up boot by 60+ seconds if it's
Not sure what you mean by "router/timeout issue". Are you saying the defaults
mounted the luns in your setup? How thoroughly did you test this to arrive at
the defaults - was it only one machine, one target, etc? I'm just asking
because I've only tested on one machine so far (ibm x325) with one target and it
failed to mount every time until I set the ESTABLISHTIMEOUT. I think I tried 15
and it didn't work, then tried 60.
I'm not suggesting the correct default should be 60. But I think the fact that
the first machine I tried failed every time probably indicates the current
defaults are not adequate. I can try to test a few more machines and recommend
a default if you want, or maybe you guys want to do that?
The defaults work for me on a single machine connected to a single netapp
exporting 27 LUs to my initiator ID. I have no other targets to test against,
so if you can do some checking and find a reasonable default, please do so.
What's the architecture of the host? Did you put it in a reboot loop or just
try it a few times? Can you tell if it's *close* to failing or there's a lot of
It looks like ESTABLISHTIMEOUT=30 works on my ibm x325.
I will try another architecture machine.
I'm not sure what the right default is for *most* customers. Do you think most
iSCSI customers will care that much about bootup times? Tradeoff seems to be
whether bootup time is more important than automounting working for iSCSI customers.
Dual 2.4 GHz Xeon system - so i686. I think automounting working is more
important than bootup time, but we can't guarantee it'll work for everyone out
of the box with this method. If we want to do that, we'll have to set it to
3600, and even then it might not catch all cases.
30 seconds sounds reasonable to me though, so unless you see a problem with that
on another system/setup you test, I'll plan on rolling another package with the
timeout cranked up to that by default.
ESTABLISHTIMEOUT defaults didn't get changed in time for the RHEL3-U5 cutoff, so
the default is still 15 seconds. The errata advisory mentions that this may
need to be modified in /etc/sysconfig/iscsi. On the upside, the current default
worked fine for our testing.
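For setups where 15 seconds is not enough, the override in /etc/sysconfig/iscsi could look like the fragment below. The value 30 is only the figure that worked on the ibm x325 earlier in this bug, not a general recommendation, and the comment lines are mine:

```bash
# /etc/sysconfig/iscsi
# Seconds to wait at startup for iSCSI sessions to be established.
# 30 is an example value from the testing discussed in this bug.
ESTABLISHTIMEOUT=30
```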
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.