Red Hat Bugzilla – Bug 504338
iscsi losing paths when heavily utilized
Last modified: 2017-04-04 16:42:58 EDT
Description of problem: iscsid will lose one or more paths if iscsi fabric is flooded. Discovered during benchmarks using Bonnie++.
Version-Release number of selected component (if applicable): iscsi-initiator-utils-184.108.40.2068-0.18.el5
How reproducible: Reproducible every time the benchmark is run. More likely to happen when the benchmark dataset is large (>32GB).
Steps to Reproduce:
1. Create an iSCSI mount to the SAN device (Dell Equallogic PS5000)
2. Initiate Bonnie++ or run other IO intensive tasks
Actual results: Multiple paths begin disconnecting and reconnecting. We are currently using 3 paths, and eventually all 3 disconnect at the same time (anywhere from 5 minutes to 1 hour after the benchmark starts).
Expected results: Benchmark should complete without errors and report statistics.
Additional info: Many of the following errors flood syslog:
Jun 4 18:06:56 stl-dt-sls-006 iscsid: received iferror -38
Jun 4 18:53:49 stl-dt-sls-006 kernel: connection3:0: iscsi: detected conn error (1011)
Jun 4 18:53:49 stl-dt-sls-006 iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
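To see how often each path drops, the kernel lines above can be tallied per connection. The following is a minimal sketch, assuming the syslog lines keep the exact "connectionN:M: iscsi: detected conn error (CODE)" shape shown in the excerpt; the function name count_conn_errors and the default path /var/log/messages are illustrative choices, not part of any iSCSI tool.

```python
import re
import sys
from collections import Counter

# Matches the kernel line format shown in the excerpt, e.g.
# "... kernel: connection3:0: iscsi: detected conn error (1011)"
CONN_ERROR = re.compile(
    r"kernel:\s+connection(\d+):(\d+):.*detected conn error \((\d+)\)"
)


def count_conn_errors(lines):
    """Return a Counter keyed by ("session:conn", error_code)."""
    counts = Counter()
    for line in lines:
        match = CONN_ERROR.search(line)
        if match:
            session, conn, code = match.groups()
            counts[(f"{session}:{conn}", int(code))] += 1
    return counts


if __name__ == "__main__":
    # Assumption: syslog is written to /var/log/messages unless a path is given.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as handle:
        totals = count_conn_errors(handle)
    for (conn, code), n in sorted(totals.items()):
        print(f"connection {conn}: error {code} seen {n} time(s)")
```

Running it against the full log shows whether all three paths fail equally often or whether one connection accounts for most of the errors.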
Could you send a little more of the log? I am looking for messages about an iSCSI ping or nop timing out, or about a logout being requested by the target. Also, on the target, could you check for anything about it requesting a login, dropping a session/connection, or doing load balancing (load balancing initiated from the target will also cause the errors above, because it forces us to log out of one portal and into another)?
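A rough way to pull the requested lines out of a large syslog is a keyword filter. The sketch below is only an assumption about what to search for: the exact wording of the ping/nop timeout and logout messages varies between kernel and iscsid versions, so it matches loose keywords rather than specific message strings. The function names classify and find_hints are hypothetical.

```python
import re
import sys

# Loose keyword patterns (assumptions, not exact iscsid/kernel message text).
NOP_OR_PING = re.compile(r"\b(nop|ping)")
TIMEOUT = re.compile(r"tim(e|ed|ing)\s?out")


def classify(line):
    """Return a list of hint tags that apply to one log line."""
    text = line.lower()
    tags = []
    if NOP_OR_PING.search(text) and TIMEOUT.search(text):
        tags.append("nop/ping timeout")
    if "logout" in text:
        tags.append("logout")
    return tags


def find_hints(lines):
    """Yield (tags, line) for every line that matches at least one hint."""
    for line in lines:
        tags = classify(line)
        if tags:
            yield tags, line.rstrip("\n")


if __name__ == "__main__":
    # Assumption: syslog is written to /var/log/messages unless a path is given.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as handle:
        for tags, line in find_hints(handle):
            print(f"[{', '.join(tags)}] {line}")
```

Because the patterns are deliberately broad, the output may include unrelated lines; it is meant to narrow the log down for manual review, not to diagnose the cause.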
(In reply to comment #1)
> Could you send a little more of the log? I am looking for something about a
> iscsi ping or nop timing out, or something about a logout from the target was
> requested. Also on the target could you check for something about it requesting
> a login or dropping a session/connection or doing load balancing (load
I mean a logout not login there..
> balancing initiated from the target will cause the errors above too because it
> forces us to logout of one portal and into another)?
Created attachment 346708 [details]
log excerpt from server experiencing iscsi issue
This is an excerpt from syslog on one of the hosts experiencing this iSCSI disconnect issue under load.
Created attachment 346709 [details]
log excerpt from Dell Equallogic SAN device (PS5000XV)
It looks like the target might be trying to load balance the sessions. When it does this it asks us to logout and relogin and then when we try to log back in it will redirect us to what it believes is the optimal portal. It looks like there are some other connections possibly failing also, but let's try to cut some stuff out.
Could you try to turn the target's load balancing off first. Here are the instructions I got from Equallogic:
You can turn off load balancing in the command line interface. Telnet
to the array's group address, log in as grpadmin, and do:
> > grpparams conn-balancing disable
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release. Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products. This request is not yet committed for inclusion in
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
I know this case is a bit old, but we are experiencing the same kind of issue with RHEL 5.8 and an Equallogic PS6100XV.
We get the same connection error messages, which result in the filesystem being unmounted and remounted in read-only mode. This is very disruptive because the filesystem hosts the archive logs for an Oracle database.
We are able to bring the server back up with a simple reboot, but the issue occurred once last week and already twice this week.
Red Hat Enterprise Linux 5 shipped its last minor release, 5.11, on September 14th, 2014. On March 31st, 2017 RHEL 5 exits Production Phase 3 and enters Extended Life Phase. For RHEL releases in the Extended Life Phase, Red Hat will provide limited ongoing technical support. No bug fixes, security fixes, hardware enablement or root-cause analysis will be available during this phase, and support will be provided on existing installations only. If the customer purchases the Extended Life-cycle Support (ELS), certain critical-impact security fixes and selected urgent priority bug fixes for the last minor release will be provided. The specific support and services provided during each phase are described in detail at http://redhat.com/rhel/lifecycle
This BZ does not appear to meet ELS criteria so is being closed WONTFIX. If this BZ is critical for your environment and you have an Extended Life-cycle Support Add-on entitlement, please open a case in the Red Hat Customer Portal, https://access.redhat.com, provide a thorough business justification and ask that the BZ be re-opened for consideration of an errata. Please note, only certain critical-impact security fixes and selected urgent priority bug fixes for the last minor release can be considered.