Bug 1300094 - [Nimble Storage] Long boot time with iSCSI mapped volumes
[Nimble Storage] Long boot time with iSCSI mapped volumes
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: iscsi-initiator-utils (Show other bugs)
6.7
x86_64 Linux
unspecified Severity urgent
: rc
: ---
Assigned To: Chris Leech
Martin Hoyer
:
Depends On:
Blocks: 1309488
  Show dependency treegraph
 
Reported: 2016-01-19 19:01 EST by shivamerla1
Modified: 2016-05-10 17:24 EDT (History)
5 users (show)

See Also:
Fixed In Version: iscsi-initiator-utils-6.2.0.873-18.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1309488 (view as bug list)
Environment:
Last Closed: 2016-05-10 17:24:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description shivamerla1 2016-01-19 19:01:45 EST
Description of problem:
Seeing a lot of "iscsiadm: could not read session targetname: 5" and "iscsiadm: could not find session info for session14" messages during host bootup. When host has 800 iscis LUNs, the boot up time takes more than 40+ minutes to finish.


Version-Release number of selected component (if applicable):

[root@hiqa-sys-rhel2 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.7 (Santiago)


How reproducible:
Most of the time.

Steps to Reproduce:
1. Map around 800 iSCSI based volumes from Nimble Storage array.
2. Reboot host.
3. Following messages flood screen and Boot time is very slow.

Actual results:
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session46
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session47
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session48
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session49
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: Connection46:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60028-v2adb271243103de8.00000153.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session47
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session48
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session49
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session50
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: Connection47:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60084-v2adb271243103de8.0000018b.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session48
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session49
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session50
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: could not find session info for session51
Jan 19 14:39:29 hiqa-sys-rhel2 iscsid: Connection48:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60079-v2adb271243103de8.00000186.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
.
.
.
.
Jan 19 15:41:44 hiqa-sys-rhel2 iscsid: could not read session targetname: 5
Jan 19 15:41:44 hiqa-sys-rhel2 iscsid: could not find session info for session770
Jan 19 15:41:45 hiqa-sys-rhel2 iscsid: Connection770:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60396-v2adb271243103de8.000002c3.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now



Expected results:
Better boot time.

Additional info:
Comment 2 shivamerla1 2016-01-19 19:40:07 EST
Following errors were also seen and device discovery is pretty slow as there are repeated messages of sessions are operational again.

Jan 19 14:44:28 hiqa-sys-rhel2 kernel: ixgbe 0000:06:00.1: eth3: NIC Link is Down
Jan 19 14:44:38 hiqa-sys-rhel2 kernel: connection46:0: ping timeout of 10 secs expired, recv timeout 5, last rx 4295026703, last ping 4295031703, now 4295041703
Jan 19 14:44:38 hiqa-sys-rhel2 kernel: connection46:0: detected conn error (1011)
Jan 19 14:44:38 hiqa-sys-rhel2 kernel: connection152:0: ping timeout of 10 secs expired, recv timeout 5, last rx 4295026705, last ping 4295031705, now 4295041705
Jan 19 14:44:38 hiqa-sys-rhel2 kernel: connection152:0: detected conn error (1011)
Jan 19 14:44:38 hiqa-sys-rhel2 kernel: connection153:0: ping timeout of 10 secs expired, recv timeout 5, last rx 4295026706, last ping 4295031706, now 4295041706
Jan 19 14:44:38 hiqa-sys-rhel2 kernel: connection153:0: detected conn error (1011)
Jan 19 14:44:51 hiqa-sys-rhel2 kernel: ixgbe 0000:06:00.1: eth3: NIC Link is Down
Jan 19 14:44:56 hiqa-sys-rhel2 kernel: ixgbe 0000:06:00.1: eth3: NIC Link is Up 10 Gbps, Flow Control: RX
Jan 19 14:44:58 hiqa-sys-rhel2 kernel: ixgbe 0000:06:00.1: eth3: NIC Link is Down
Jan 19 14:45:02 hiqa-sys-rhel2 kernel: connection106:0: ping timeout of 10 secs expired, recv timeout 5, last rx 4295050722, last ping 4295055722, now 4295065722
Jan 19 14:45:02 hiqa-sys-rhel2 kernel: connection106:0: detected conn error (1011)
Jan 19 14:45:02 hiqa-sys-rhel2 kernel: connection202:0: ping timeout of 10 secs expired, recv timeout 5, last rx 4295050723, last ping 4295055723, now 4295065723
Jan 19 14:45:02 hiqa-sys-rhel2 kernel: connection202:0: detected conn error (1011)
Jan 19 14:45:02 hiqa-sys-rhel2 kernel: connection203:0: ping timeout of 10 secs expired, recv timeout 5, last rx 4295050748, last ping 4295055748, now 4295065748
Jan 19 14:45:02 hiqa-sys-rhel2 kernel: connection142:0: ping timeout of 10 secs expired, recv timeout 5, last rx 4295050748, last ping 4295055748, now 4295065748
Jan 19 14:45:03 hiqa-sys-rhel2 kernel: connection51:0: detected conn error (1011)
Jan 19 14:45:03 hiqa-sys-rhel2 iscsid: Kernel reported iSCSI connection 219:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jan 19 14:45:03 hiqa-sys-rhel2 iscsid: Kernel reported iSCSI connection 220:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jan 19 14:45:03 hiqa-sys-rhel2 iscsid: Kernel reported iSCSI connection 111:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jan 19 14:45:03 hiqa-sys-rhel2 iscsid: Kernel reported iSCSI connection 58:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jan 19 14:45:03 hiqa-sys-rhel2 iscsid: Kernel reported iSCSI connection 221:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jan 19 14:45:03 hiqa-sys-rhel2 iscsid: Kernel reported iSCSI connection 35:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jan 19 14:45:03 hiqa-sys-rhel2 iscsid: Kernel reported iSCSI connection 222:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)
Jan 19 14:45:03 hiqa-sys-rhel2 iscsid: Kernel reported iSCSI connection 19:0 error (1011 - ISCSI_ERR_CONN_FAILED: iSCSI connection failed) state (3)

Jan 19 14:39:34 hiqa-sys-rhel2 kernel: sd 79:0:0:0: [sdbt] Attached SCSI disk
Jan 19 14:39:34 hiqa-sys-rhel2 kernel: sd 80:0:0:0: [sdbu] Attached SCSI disk
Jan 19 14:39:35 hiqa-sys-rhel2 kernel: sd 81:0:0:0: [sdbv] Attached SCSI disk
Jan 19 14:39:35 hiqa-sys-rhel2 kernel: sd 82:0:0:0: [sdbw] Attached SCSI disk
Jan 19 14:39:35 hiqa-sys-rhel2 iscsid: Connection71:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60138-v2adb271243103de8.000001c1.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
Jan 19 14:39:35 hiqa-sys-rhel2 iscsid: Connection72:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60016-v2adb271243103de8.00000147.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
Jan 19 14:39:35 hiqa-sys-rhel2 iscsid: Connection73:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60059-v2adb271243103de8.00000172.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
Jan 19 14:39:35 hiqa-sys-rhel2 iscsid: Connection74:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60081-v2adb271243103de8.00000188.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
.
.
.
.
Jan 19 16:18:08 hiqa-sys-rhel2 kernel: sd 1030:0:0:0: [sdami] Attached SCSI disk
Jan 19 16:18:08 hiqa-sys-rhel2 iscsid: Connection1022:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60452-v2adb271243103de8.000002fb.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
Jan 19 16:18:21 hiqa-sys-rhel2 kernel: sd 1031:0:0:0: [sdamj] Attached SCSI disk
Jan 19 16:18:22 hiqa-sys-rhel2 iscsid: Connection1023:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60430-v2adb271243103de8.000002e5.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
Jan 19 16:18:47 hiqa-sys-rhel2 kernel: sd 1032:0:0:0: [sdamk] Attached SCSI disk
Jan 19 16:18:48 hiqa-sys-rhel2 iscsid: Connection1024:0 to [target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60452-v2adb271243103de8.000002fb.69cb1afa, portal: 172.18.80.96,3260] through [iface: default] is operational now
Comment 3 Martin Hoyer 2016-01-26 04:10:23 EST
See: BZ1196382
Comment 4 shivamerla1 2016-01-26 12:06:51 EST
Hi, Looks like i don't have access to BZ mentioned. Can you provide me access.

Thanks.
Comment 5 shivamerla1 2016-01-27 01:03:34 EST
We have recreated the issue again. From iscsid debug logs, we see that same ISID is used to connect to same target for both sessions. Hence target is disconnecting the old connection with same ISID before accepting new connection.

I see that ISID for both connections 480:0 and 224:0 is same(23de000 instead of 23d1e00 ) from host logs and are different sessions to same volume. Does it cause the problem?.

size=4.0G uuid=2414083771cb6a2846c9ce900fa1acb69 mount-device: /dev/nimblestorage/arr6-iscsi-vol-60062-2414083771cb6a2846c9ce900fa1acb69
target: iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60062-v2adb271243103de8.00000175.69cb1afa
 ` + array_id=1 device=/dev/dm-113
   |-  488:0:0:0  sddy 128:0   session=480 sess_st=LOGGED_IN  dev_st=running    iface=default    tgtportal=172.18.80.96
   |-  552:0:0:0  sdbj  67:208 session=544 sess_st=LOGGED_IN  dev_st=running    iface=default    tgtportal=172.18.80.96
   |-  600:0:0:0  sdhf 133:80  session=592 sess_st=LOGGED_IN  dev_st=running    iface=default    tgtportal=172.18.160.88
   |-  232:0:0:0  sdha 133:0   session=224 sess_st=FAILED     dev_st=running    iface=default    tgtportal=172.18.160.88



Volume Info.

[root@hiqa-sys-rhel2 ~]# sg_inq -p 0xd4 /dev/sdha
 Only hex output supported. sg_vpd decodes more pages.
VPD INQUIRY, page code=0xd4:
 00     00 d4 01 02 01 00 61 72  72 36 2d 69 73 63 73 69    ......arr6-iscsi
 10     2d 76 6f 6c 2d 36 30 30  36 32 00 00 00 00 00 00    -vol-60062......
        
[root@hiqa-sys-rhel2 ~]# sg_inq -p 0xd4 /dev/sddy
 Only hex output supported. sg_vpd decodes more pages.
VPD INQUIRY, page code=0xd4:
 00     00 d4 01 02 01 00 61 72  72 36 2d 69 73 63 73 69    ......arr6-iscsi
 10     2d 76 6f 6c 2d 36 30 30  36 32 00 00 00 00 00 00    -vol-60062......


iscsid logs indicating same ISID used for both sessions.

iscsid: sending login PDU with current stage 1, next stage 3, transit 0x80, isid 0x00023de00000 exp_statsn 7
iscsid: >    InitiatorName=iqn.1994-05.com.redhat:8973f121285
iscsid: >    InitiatorAlias=localhost
iscsid: >    TargetName=iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60062-v2adb271243103de8.00000175.69cb1afa
iscsid: >    SessionType=Normal
iscsid: >    HeaderDigest=None
iscsid: >    DataDigest=None
iscsid: >    DefaultTime2Wait=2
iscsid: >    DefaultTime2Retain=0
iscsid: >    IFMarker=No
iscsid: >    OFMarker=No
iscsid: >    ErrorRecoveryLevel=0
iscsid: >    InitialR2T=No
iscsid: >    ImmediateData=Yes
iscsid: >    MaxBurstLength=16776192
iscsid: >    FirstBurstLength=262144
iscsid: >    MaxOutstandingR2T=1
iscsid: >    MaxConnections=1
iscsid: >    DataPDUInOrder=Yes
iscsid: >    DataSequenceInOrder=Yes
iscsid: >    MaxRecvDataSegmentLength=262144
iscsid: in ksend_pdu_begin
iscsid: send PDU began for hdr 48 bytes and data 504 bytes
iscsid: in kwritev
iscsid: wrote 48 bytes of PDU header
iscsid: in kwritev
iscsid: wrote 504 bytes of PDU data
iscsid: in ksend_pdu_end
iscsid: in __kipc_call
iscsid: in kwritev
iscsid: in nlpayload_read
iscsid: in nlpayload_read
iscsid: send PDU finished for conn 224:0


iscsid: sending login PDU with current stage 1, next stage 3, transit 0x80, isid 0x00023de00000 exp_statsn 2
iscsid: >    InitiatorName=iqn.1994-05.com.redhat:8973f121285
iscsid: >    InitiatorAlias=localhost
iscsid: >    TargetName=iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60062-v2adb271243103de8.00000175.69cb1afa
iscsid: >    SessionType=Normal
iscsid: >    HeaderDigest=None
iscsid: >    DataDigest=None
iscsid: >    DefaultTime2Wait=2
iscsid: >    DefaultTime2Retain=0
iscsid: >    IFMarker=No
iscsid: >    OFMarker=No
iscsid: >    ErrorRecoveryLevel=0
iscsid: >    InitialR2T=No
iscsid: >    ImmediateData=Yes
iscsid: >    MaxBurstLength=16776192
iscsid: >    FirstBurstLength=262144
iscsid: >    MaxOutstandingR2T=1
iscsid: >    MaxConnections=1
iscsid: >    DataPDUInOrder=Yes
iscsid: >    DataSequenceInOrder=Yes
iscsid: >    MaxRecvDataSegmentLength=262144
iscsid: in ksend_pdu_begin
iscsid: send PDU began for hdr 48 bytes and data 504 bytes
iscsid: in kwritev
iscsid: wrote 48 bytes of PDU header
iscsid: in kwritev
iscsid: wrote 504 bytes of PDU data
iscsid: in ksend_pdu_end
iscsid: in __kipc_call
iscsid: in kwritev
iscsid: in nlpayload_read
iscsid: in nlpayload_read
iscsid: send PDU finished for conn 480:0


[root@hiqa-sys-rhel2 ~]# tail -f /var/log/messages

Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_recv_skb in 24 bytes
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_recv_skb skb ffff8810313ebeb8 ptr=ffff88102d2756a4 avail=24
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_segment_done copied 0 0 size 24 recv
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_segment_recv copying 24
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_segment_done copied 0 24 size 24 recv
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_segment_done total copied 24 total size 24
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_recv_skb segment done
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_hdr_recv_prep (digest disabled)
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_recv_skb in 0 bytes
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_tcp_recv_skb no more data avail. Consumed 0
Jan 26 15:18:36 hiqa-sys-rhel2 kernel: connection480:0: iscsi_sw_tcp_recv read 0 bytes status 1
Jan 26 15:18:39 hiqa-sys-rhel2 kernel: connection480:0: iscsi_sw_sk_state_check TCP_CLOSE|TCP_CLOSE_WAIT
Jan 26 16:04:49 hiqa-sys-rhel2 kernel: connection480:0: detected conn error (1020) ISCSI_ERR_TCP_CONN_CLOSE

Data IP intermittent failures.

[root@hiqa-sys-rhel2 ~]# cat /sys/class/iscsi_connection/connection480\:0/address 
cat: /sys/class/iscsi_connection/connection480:0/address: Transport endpoint is not connected
[root@hiqa-sys-rhel2 ~]# cat /sys/class/iscsi_connection/connection480\:0/address 
172.18.80.95
[root@hiqa-sys-rhel2 ~]# cat /sys/class/iscsi_connection/connection480\:0/address
172.18.80.95
[root@hiqa-sys-rhel2 ~]# cat /sys/class/iscsi_connection/connection480\:0/address
172.18.80.95
[root@hiqa-sys-rhel2 ~]# cat /sys/class/iscsi_connection/connection480\:0/address
cat: /sys/class/iscsi_connection/connection480:0/address: Transport endpoint is not connected


Discovery IP is fine.

[root@hiqa-sys-rhel2 ~]# cat /sys/class/iscsi_connection/connection480\:0/persistent_address 
172.18.80.96
[root@hiqa-sys-rhel2 ~]# cat /sys/class/iscsi_connection/connection480\:0/persistent_address 
172.18.80.96
[root@hiqa-sys-rhel2 ~]# cat /sys/class/iscsi_connection/connection480\:0/persistent_address 
172.18.80.96
Comment 6 shivamerla1 2016-01-27 16:50:00 EST
I see that session ISID is determinted as below.

/* OUI and uniqifying number */
        session->isid[0] = DRIVER_ISID_0;
        session->isid[1] = DRIVER_ISID_1;
        session->isid[2] = DRIVER_ISID_2;
        session->isid[3] = 0;
        session->isid[4] = 0;
        session->isid[5] = 0;

/*
 * TODO: use the iface number or some other value
 * so this will be persistent
 */
       session->isid[3] = session->id;


Only least significant byte of session SID is used here. Hence for mulitple sessions the ISID can be same.

    SID: session->id    ISID
    480:  e0      --> 0x00023de00000
    224:  e0      --> 0x00023de00000

    544:  10      --> 0x00023d100000
    528:  10      --> 0x00023d100000

    545:  21      --> 0x00023d210000
    289:  21      --> 0x00023d210000


If these sessions belong to same volume then target is reinstating(closing) existing sessions before connecting new ones. This is causing TCP_CONN_CLOSE errors and unnecessary recovery actions.

Is this a design issue?. Can this be avoided?.
Comment 7 shivamerla1 2016-01-28 19:13:58 EST
Hi, Any update on this?. This bug is critical for us to support large volume iSCSI configurations.

Thanks.
Comment 8 shivamerla1 2016-01-29 12:19:31 EST
Update from open-iscsi forum.


On Thursday, January 28, 2016 at 6:02:44 PM UTC-8, Mike Christie wrote:

    On 01/28/2016 02:19 PM, shiva krishna merla wrote:
    > Hi All,
    >
    > We are seeing an issue with Redhat 6.7 where ISID's are repeated for
    > different sessions to same target (volume level target). Our array
    > target has to close the existing sessions with same ISID before
    > accepting new connections. This causes repeated login retry attempts by
    > iscsid and overall system discovery process is too slow. From the code i
    > see that ISID is determined as below. With this many sessions above 256
    > can have repeated ISID. Is this a design flaw?. Can this be avoided?.
    >
    > /* OUI and uniqifying number */
    >         session->isid[0] = DRIVER_ISID_0;
    >         session->isid[1] = DRIVER_ISID_1;
    >         session->isid[2] = DRIVER_ISID_2;
    >         session->isid[3] = 0;
    >         session->isid[4] = 0;
    >         session->isid[5] = 0;
    >

    Very nice debugging. Thanks.

    We have not seen setups with that many sessions, so we just did not
    support it.

    It sounds like the Red Hat iSCSI engineer, Chris Leech, is working on a
    patch, so it should get fixed quickly.

    Just curious. Are you doing a LUN per target and have lots of LUNs? Or
    do you have lots of targets or target portals for some other reason? How
    many?
Comment 9 Chris Leech 2016-01-29 13:08:09 EST
(In reply to shivamerla1 from comment #7)
> Hi, Any update on this?. This bug is critical for us to support large volume
> iSCSI configurations.

Thank you for your detailed report. I'm taking a look at this, and think we should be able to use more of the ISID qualifier space.  I'm looking at the relevant RFCs, as the current ISID construction looks suspect to me in more than just only providing 8-bits of uniqueness, but I think we can safely extend that to 16-bits at least.
Comment 10 Chris Leech 2016-01-29 13:19:25 EST
(In reply to Chris Leech from comment #9)
> I'm looking at the relevant RFCs, as the current ISID construction looks 
> suspect to me in more than just only providing 8-bits of uniqueness, but I
> think we can safely extend that to 16-bits at least.

Actually, I confused myself with an outdated draft. There's nothing wrong with the construction (other than possibly questioning the choice of OUI), and we should be able to use all 24-bits of qualifier space.
Comment 12 shivamerla1 2016-01-29 13:24:43 EST
Awesome, Thanks.
Comment 13 Chris Leech 2016-01-29 19:52:11 EST
I've put RHEL 6 test RPMS that include the patch I sent to the Open-iSCSI mailing list up at http://people.redhat.com/cleech/bz1300094/
Comment 14 shivamerla1 2016-01-29 20:02:15 EST
Thanks for fixing this quickly Chris. I will load the packages and give it a try.
Comment 15 shivamerla1 2016-02-02 20:39:06 EST
Chris, with the fix we no longer see TCP_CONN_CLOSE errors, but somehow the login process is still slow for ~256 LUN's with 4 sessions each ( i.e total 1024 sessions ). Initially iscsiadm login process is fast but gradually it slows down taking ~15 seconds between each login and completes within 45min to 1hr. We have collected iscsid logs but those doesn't have timestamps. what is the better way to debug iscsid logs?. Any other known issues for scaled systems?.
Comment 16 Chris Leech 2016-02-03 18:51:18 EST
(In reply to shivamerla1 from comment #15)
> Chris, with the fix we no longer see TCP_CONN_CLOSE errors, but somehow the
> login process is still slow for ~256 LUN's with 4 sessions each ( i.e total
> 1024 sessions ). Initially iscsiadm login process is fast but gradually it
> slows down taking ~15 seconds between each login and completes within 45min
> to 1hr. We have collected iscsid logs but those doesn't have timestamps.
> what is the better way to debug iscsid logs?. Any other known issues for
> scaled systems?.

I'm not seeing performance quite that bad in my setup, but it's certainly slower than I'd like.  I've been trying to get some profile data today, and both iscsid and iscsiadm are using considerable CPU time calling strcmp.

On each login request there are checks to see if the requested session already exists, which reads in all of the sysfs info for every active session.  That doesn't scale well.  It also be happening in both iscsiadm and iscsid (checking for slightly different things, but using the same code to read in all of the sysfs data and filter with strcmp)

I've hacked out a few sysfs checks, and brought login time on my system for this same setup (4 sessions x 256 targets) down from about 9 minutes to 33 seconds.  Now I need to figure out how to keep the needed checks, but without taking so long.
Comment 17 shivamerla1 2016-02-03 19:00:59 EST
Yes, we too see repeated logs as below trying to match the session with already discovered ones. As the sessions grow, it might be taking long time for this.
It would be great if you can provide the private packages with your fix, we can verify as well. Thanks.

iscsid: mgmt_ipc_write_rsp: rsp to fd 998
iscsid: set queue depth for 1006:0:0:0 to 64
iscsid: reaped pid 28697, reap_count now 1
iscsid: reaped pid 28703, reap_count now 0
iscsid: poll result 1
iscsid: match session [iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60060-v2adb271243103de8.00000173.69cb1afa,172.18.160.88,3260][default tcp,,]:0
iscsid: to [iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60232-v2adb271243103de8.0000021f.69cb1afa,172.18.160.88,3260][default tcp,,]:0
iscsid: match session [iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60060-v2adb271243103de8.00000173.69cb1afa,172.18.160.88,3260][default tcp,,]:0
iscsid: to [iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60232-v2adb271243103de8.0000021f.69cb1afa,172.18.160.88,3260][default tcp,,]:0
iscsid: match session [iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60060-v2adb271243103de8.00000173.69cb1afa,172.18.160.88,3260][default tcp,,]:0
iscsid: to [iqn.2007-11.com.nimblestorage:arr6-iscsi-vol-60219-v2adb271243103de8.00000212.69cb1afa,172.18.160.88,3260][default tcp,,]:0
Comment 18 shivamerla1 2016-02-16 12:18:12 EST
Chris, Can you confirm the fixed version of iscsi-initiator-utils. Also, can same fixes be made for RHEL 7.1 version as well?. Do you want me to raise a seperate bug for that?.
Comment 19 Chris Leech 2016-02-17 18:55:06 EST
(In reply to shivamerla1 from comment #18)
> Chris, Can you confirm the fixed version of iscsi-initiator-utils. Also, can
> same fixes be made for RHEL 7.1 version as well?. Do you want me to raise a
> seperate bug for that?.

I'll clone this bug for RHEL 7, although there's probably something suitable already open.
Comment 21 Martin Hoyer 2016-02-19 07:03:46 EST
Tested with default RHEL-7.2 scsi-initiator-utils-6.2.0.873-14.el6: time to login to many LUNs - 3:30
Same setup updated to scsi-initiator-utils-6.2.0.873-18.el6: time to login - 1:10
Comment 22 shivamerla1 2016-02-22 18:16:25 EST
Can you let us know when will the errata update be released officially ( both 6.x and 7.x)?.
Comment 25 shivamerla1 2016-03-18 17:31:02 EDT
Hi, Any updates on the errata update with this fix?.
Comment 27 errata-xmlrpc 2016-05-10 17:24:52 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0833.html

Note You need to log in before you can comment on or make changes to this bug.