Bug 664651 - iSCSI login via iSER over InfiniBand causes kernel panic
Summary: iSCSI login via iSER over InfiniBand causes kernel panic
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: iscsi-initiator-utils
Version: 5.5
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mike Christie
QA Contact: Storage QE
URL:
Whiteboard:
Depends On:
Blocks: 592322
Reported: 2010-12-21 06:06 UTC by Gris Ge
Modified: 2011-08-23 22:02 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-08-23 22:01:34 UTC
Target Upstream Version:
Embargoed:


Attachments
log file extract from kdump vmcore. (44.00 KB, text/plain)
2011-01-04 04:41 UTC, Gris Ge

Description Gris Ge 2010-12-21 06:06:47 UTC
Description of problem:
These two commands cause a kernel panic:
==================================
iscsiadm -m discovery -t st -p 192.0.1.1 -I iser
iscsiadm -m node -T  iqn.2010-10.com.example:storage-1000 -l
==================================
Unable to handle kernel paging request at 00002b7edc527310 RIP:
 [<ffffffff88260077>] :ib_ipath:ipath_sg_dma_address+0x5/0x66
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /block/sdb/removable
CPU 2
Modules linked in: ib_iser ib_srp rds ib_sdp ib_ipoib rdma_ucm rdma_cm ib_ucm ib_uverbs ib_umad ib_cm iw_cm ib_addr ib_sa ib_mad iw_cxgb3 ib_ipath ib_core be2iscsi iscsi_tcp bnx2i cnic uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi ipoib_helper autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc ipv6 xfrm_nalgo crypto_api loop dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ide_cd bnx2 i5000_edac edac_mc sg cdrom serio_raw pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5124, comm: iscsi_q_3 Not tainted 2.6.18-194.el5 #1
RIP: 0010:[<ffffffff88260077>]  [<ffffffff88260077>] :ib_ipath:ipath_sg_dma_address+0x5/0x66
RSP: 0018:ffff81006064fc88  EFLAGS: 00010246
RAX: ffffffff882aa5e0 RBX: ffff81005d3b5000 RCX: ffff81005d3b4e00
RDX: 00002b7edc527310 RSI: ffff81007a50a000 RDI: 0000000000000000
RBP: ffff81005d3b4e00 R08: ffff810001000058 R09: 0000000000000020
R10: ffff81006f0928d0 R11: 0000000000000050 R12: ffff81006f092a70
R13: 000000000000001f R14: ffff81007a5b0000 R15: ffff81007a50a000
FS:  0000000000000000(0000) GS:ffff81007ff4ce40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b7edc527310 CR3: 0000000000201000 CR4: 00000000000006e0
Process iscsi_q_3 (pid: 5124, threadinfo ffff81006064e000, task ffff810077193080)
Stack:  ffffffff8862d0fc 0000000000000003 ffff81006f092880 0000000000000000
 ffff81005c7e2290 ffff81005d6f3f10 ffff81005bccce40 ffff81006f0929d0
 0000000278525e00 ffff81006064fe30 000000020000004c ffff810002388740
Call Trace:
 [<ffffffff8862d0fc>] :ib_iser:iser_reg_rdma_mem+0x105/0x7b4
 [<ffffffff8868e262>] :libiscsi2:iscsi_xmitworker+0x0/0x2a8
 [<ffffffff8862c927>] :ib_iser:iser_send_command+0x157/0x397
 [<ffffffff8868e262>] :libiscsi2:iscsi_xmitworker+0x0/0x2a8
 [<ffffffff8862dafd>] :ib_iser:iscsi_iser_task_xmit+0xd6/0x1ac
 [<ffffffff8868e167>] :libiscsi2:iscsi_prep_scsi_cmd_pdu+0x416/0x511
 [<ffffffff80063ff8>] thread_return+0x62/0xfe
 [<ffffffff8868d756>] :libiscsi2:iscsi_xmit_task+0x36/0x69
 [<ffffffff8868e3e5>] :libiscsi2:iscsi_xmitworker+0x183/0x2a8
 [<ffffffff8004dc37>] run_workqueue+0x94/0xe4
 [<ffffffff8004a472>] worker_thread+0x0/0x122
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8004a562>] worker_thread+0xf0/0x122
 [<ffffffff8008e16d>] default_wake_function+0x0/0xe
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032bdc>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff800a198c>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032ade>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11


Code: 48 8b 0a 48 c1 e9 33 48 89 c8 48 c1 e8 09 48 8b 04 c5 80 43
RIP  [<ffffffff88260077>] :ib_ipath:ipath_sg_dma_address+0x5/0x66
 RSP <ffff81006064fc88>
CR2: 00002b7edc527310
 <0>Kernel panic - not syncing: Fatal exception
==================================

Version-Release number of selected component (if applicable):
RHEL 5.6 Beta 1
kernel-2.6.18-229.el5
scsi-target-utils-1.0.8-0.el5
iscsi-initiator-utils-6.2.0.872-4.el5
openib-1.4.1-5.el5

How reproducible:
100%

Steps to Reproduce:
1. Set up IPoIB.
2. Create an iSCSI target using scsi-target-utils:
================================
/etc/init.d/tgtd start
dd if=/dev/zero of=/tmp/lun1 count=1 bs=1MB seek=2048
dd if=/dev/zero of=/tmp/lun2 count=1 bs=1MB seek=2048
tgtadm --lld iscsi --mode target --op new --tid 1 --targetname iqn.2010-10.com.example:storage-1000
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 --backing-store /tmp/lun1
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 2 --backing-store /tmp/lun2
================================
3. Log in via iSER:
iscsiadm -m discovery -t st -p 192.0.1.1 -I iser
# 192.0.1.1 is the IPoIB address of the target
iscsiadm -m node -T iqn.2010-10.com.example:storage-1000 -l
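As a sanity check on the LUN size in step 2: GNU dd reads `bs=1MB` as 1,000,000 bytes (unlike `1M`, which is 1,048,576), and `seek=2048 count=1` skips 2048 blocks before writing one, leaving a sparse file of (2048 + 1) x 1,000,000 bytes. That is consistent with the 4001953-sector (2049 MB) disks that appear in the log in Comment 1. The arithmetic, as a small shell sketch:

```shell
# Size of each backing file created in step 2 by:
#   dd if=/dev/zero of=/tmp/lunN count=1 bs=1MB seek=2048
# GNU dd treats "MB" as 1000*1000 bytes; seek skips 2048 blocks, then 1 block is written.
block=1000000
size=$(( (2048 + 1) * $block ))
# Integer division: the partial trailing 512-byte sector is dropped.
sectors=$(( $size / 512 ))
echo "backing file: $size bytes"
echo "sectors:      $sectors (512-byte)"
echo "size:         $(( $size / 1000000 )) MB"
```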

Actual results:
Kernel panic; the system hangs.

Expected results:
iSCSI login succeeds.

Additional info:
Please feel free to reach me via email or IRC if you need more information.

Comment 1 Gris Ge 2010-12-21 06:15:54 UTC
Additional info:

The iSCSI login itself succeeded. These messages were logged just before the kernel panic:

==============================
iser: iser_connect:connecting to: ffff81005b1ee04cI4, port 0xbc0c
iser: iser_cma_handler:event 0 conn ffff81005d7a7910 id ffff81005d3b5c00
iser: iser_cma_handler:event 2 conn ffff81005d7a7910 id ffff81005d3b5c00
iser: iser_create_ib_conn_res:setting conn ffff81005d7a7910 cma_id ffff81005d3b5c00: fmr_pool ffff81005bccccc0 qp ffff81007bf48800
iser: iser_cma_handler:event 8 conn ffff81005d7a7910 id ffff81005d3b5c00
iser: iser_cma_handler:event: 8, error: 10
iser: iscsi_iser_ep_poll:ib conn ffff81005d7a7910 rc = -1
iser: iscsi_iser_ep_disconnect:ib conn ffff81005d7a7910 state 4
iser: iser_free_ib_conn_res:freeing conn ffff81005d7a7910 cma_id ffff81005d3b5c00 fmr pool ffff81005bccccc0 qp ffff81007bf48800
iser: iser_device_try_release:device ffff81005bcccec0 refcount 0
iser: iser_connect:connecting to: ffff81005d7a784cI4, port 0xbc0c
iser: iser_cma_handler:event 0 conn ffff81005d6f3f10 id ffff81005d3b5c00
iser: iser_cma_handler:event 2 conn ffff81005d6f3f10 id ffff81005d3b5c00
iser: iser_create_ib_conn_res:setting conn ffff81005d6f3f10 cma_id ffff81005d3b5c00: fmr_pool ffff81005bcccec0 qp ffff81007a698800
iser: iser_cma_handler:event 9 conn ffff81005d6f3f10 id ffff81005d3b5c00
iser: iscsi_iser_ep_poll:ib conn ffff81005d6f3f10 rc = 1
iser: iscsi_iser_conn_bind:binding iscsi conn ffff81005c7e2290 to iser_conn ffff81005d6f3f10
  Vendor: IET       Model: Controller        Rev: 0001
  Type:   RAID                               ANSI SCSI revision: 05
scsi 3:0:0:0: Attached scsi generic sg2 type 12
  Vendor: IET       Model: VIRTUAL-DISK      Rev: 0001
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 4001953 512-byte hdwr sectors (2049 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
SCSI device sdb: 4001953 512-byte hdwr sectors (2049 MB)
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
sd 3:0:0:1: Attached scsi disk sdb
sd 3:0:0:1: Attached scsi generic sg3 type 0
  Vendor: IET       Model: VIRTUAL-DISK      Rev: 0001
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdc: 4001953 512-byte hdwr sectors (2049 MB)
sdc: Write Protect is off
SCSI device sdc: drive cache: write back
SCSI device sdc: 4001953 512-byte hdwr sectors (2049 MB)
sdc: Write Protect is off
SCSI device sdc: drive cache: write back
sd 3:0:0:2: Attached scsi disk sdc
sd 3:0:0:2: Attached scsi generic sg4 type 0
Unable to handle kernel paging request at 00002b7edc527310 RIP:
 [<ffffffff88260077>] :ib_ipath:ipath_sg_dma_address+0x5/0x66
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /block/sdb/removabl
===========================

Comment 2 Gris Ge 2010-12-22 09:14:14 UTC
Got the same kernel panic on RHEL 5.5 (kernel-2.6.18-194.el5).

Comment 3 Or Gerlitz 2010-12-28 20:29:25 UTC
You're hitting a problem in ipath_sg_dma_address, which means the underlying hardware is a QLogic card and the driver in use is ipath. I don't have such a testbed, as the cards I'm using are all Mellanox ones. It would be great if you could run with both iser and libiscsi2 debug prints enabled and attach the output leading up to the crash: for ib_iser set debug_level=2, and for libiscsi2 set debug_libiscsi=1.
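One way to apply the debug settings requested above, sketched under assumptions: a RHEL 5 host where module options are read from /etc/modprobe.conf, and ib_iser and libiscsi2 are reloaded (or the host rebooted) before the login is retried. The module and parameter names are the ones given in the comment; the file location and reload step are the assumptions.

```
# /etc/modprobe.conf (assumed RHEL 5 location): enable iSER and libiscsi2 debug prints
options ib_iser debug_level=2
options libiscsi2 debug_libiscsi=1
```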

Comment 4 Or Gerlitz 2010-12-28 20:58:22 UTC
One more thing: are you running a 32-bit or a 64-bit kernel? If 32-bit, is PAE or something similar active?

Comment 5 Gris Ge 2010-12-29 01:57:26 UTC
Or Gerlitz,
It's 64-bit. I don't own that server; I will ask Gurhan for help.

Gurhan Ozen,
Can you provide Or Gerlitz with the information above?
If you don't have time to do so, please lend the servers to me.

Thank you.

Comment 6 Jeff Burke 2011-01-03 19:05:20 UTC
Gris,
I spoke with Gurhan this afternoon. He said you had exchanged emails and that he had given you the names of the systems you could use (ib-test#.....). They were all configured and ready for you. Were you able to get the information you needed?

Thank you,
Jeff

Comment 8 Gris Ge 2011-01-04 03:37:33 UTC
RHEL 5.5 GA kernel 2.6.18-194.el5 x86_64
iscsi-initiator-utils-6.2.0.872-4.el5

System crashed after executing:
[root@dell-pe1950-03 ~]# iscsiadm -m node -T  iqn.2010-10.com.example:storage-1000 -l
Logging in to [iface: iser, target: iqn.2010-10.com.example:storage-1000, portal: 192.0.1.1,3260]
Login to [iface: iser, target: iqn.2010-10.com.example:storage-1000, portal: 192.0.1.1,3260] successful.


These are the log messages (/var/log/messages) before the crash:
==========================================
Jan  3 21:25:24 dell-pe1950-03 kernel: 802.1Q VLAN Support v1.8 Ben Greear <greearb>
Jan  3 21:25:24 dell-pe1950-03 kernel: All bugs added by David S. Miller <davem>
Jan  3 21:25:24 dell-pe1950-03 kernel: cxgb3i: tag itt 0x1fff, 13 bits, age 0xf, 4 bits.
Jan  3 21:25:24 dell-pe1950-03 kernel: iscsi: registered transport (cxgb3i)
Jan  3 21:25:24 dell-pe1950-03 kernel: Broadcom NetXtreme II CNIC Driver cnic v2.1.0 (Oct 10, 2009)
Jan  3 21:25:24 dell-pe1950-03 kernel: cnic: Added CNIC device: eth0
Jan  3 21:25:24 dell-pe1950-03 kernel: cnic: Added CNIC device: eth1
Jan  3 21:25:24 dell-pe1950-03 kernel: Broadcom NetXtreme II iSCSI Driver bnx2i v2.1.0 (Dec 06, 2009)
Jan  3 21:25:24 dell-pe1950-03 kernel: iscsi: registered transport (bnx2i)
Jan  3 21:25:24 dell-pe1950-03 kernel: scsi1 : Broadcom Offload iSCSI Initiator
Jan  3 21:25:24 dell-pe1950-03 kernel: scsi2 : Broadcom Offload iSCSI Initiator
Jan  3 21:25:26 dell-pe1950-03 kernel: iscsi: registered transport (tcp)
Jan  3 21:25:26 dell-pe1950-03 kernel: iscsi: registered transport (be2iscsi)
Jan  3 21:25:26 dell-pe1950-03 iscsid: iSCSI logger with pid=6145 started!
Jan  3 21:25:27 dell-pe1950-03 iscsid: transport class version 2.0-871. iscsid version 2.0-872
Jan  3 21:25:27 dell-pe1950-03 iscsid: iSCSI daemon with pid=6146 started!
Jan  3 21:25:54 dell-pe1950-03 kernel: iser: iser_connect:connecting to: ffff810058f7be4cI4, port 0xbc0c
Jan  3 21:25:54 dell-pe1950-03 kernel: iser: iser_cma_handler:event 0 conn ffff810058f7b910 id ffff810057acf800
Jan  3 21:25:54 dell-pe1950-03 kernel: iser: iser_cma_handler:event 2 conn ffff810058f7b910 id ffff810057acf800
Jan  3 21:25:54 dell-pe1950-03 kernel: iser: iser_create_ib_conn_res:setting conn ffff810058f7b910 cma_id ffff810057acf800: fmr_pool ffff81005ae5f740 qp ffff810057acfc00
Jan  3 21:25:54 dell-pe1950-03 kernel: iser: iser_cma_handler:event 9 conn ffff810058f7b910 id ffff810057acf800
Jan  3 21:25:54 dell-pe1950-03 kernel: iser: iscsi_iser_ep_poll:ib conn ffff810058f7b910 rc = 1
Jan  3 21:25:54 dell-pe1950-03 kernel: scsi3 : iSCSI Initiator over iSER, v.0.1
Jan  3 21:25:54 dell-pe1950-03 kernel: iser: iscsi_iser_conn_bind:binding iscsi conn ffff81005a0a0a90 to iser_conn ffff810058f7b910
Jan  3 21:25:54 dell-pe1950-03 iscsid: Could not set session1 priority. READ/WRITE throughout and latency could be affected.
Jan  3 21:25:54 dell-pe1950-03 setroubleshoot: SELinux is preventing iscsid (iscsid_t) "sys_ptrace" to <Unknown> (iscsid_t). For complete SELinux messages. run sealert -l a20587a6-8048-4dc8-961b-cdc048431b21
=====================================================

I have also enabled kdump.
The vmcore (20MB+) exceeds the maximum attachment size.
I don't have a Red Hat people page to share it with you.
If you really need the vmcore, I can split it into pieces.
Let me know if you need any info.
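Splitting an oversized vmcore into pieces and rejoining them can be done with coreutils `split` and `cat`. A minimal sketch, using a zero-filled 50 MB stand-in in a temporary directory instead of the real vmcore, and assuming 19 MiB pieces stay under the attachment limit (the exact limit is an assumption here):

```shell
# Stand-in for the real vmcore (assumption: zero-filled 50 MB file in a temp dir).
tmp=$(mktemp -d)
head -c 50000000 /dev/zero > "$tmp/vmcore"

# Sender side: produce vmcore.part.aa, vmcore.part.ab, ... of at most 19 MiB each.
split -b 19M "$tmp/vmcore" "$tmp/vmcore.part."

# Receiver side: the glob expands in sorted order, so cat rejoins the pieces in sequence.
cat "$tmp"/vmcore.part.* > "$tmp/vmcore.joined"

# Confirm the reassembled file is byte-identical to the original.
if cmp -s "$tmp/vmcore" "$tmp/vmcore.joined"; then
    echo "reassembled OK"
fi
```

The temporary directory is left in place so the pieces can be inspected; remove it with `rm -rf "$tmp"` when done.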

Comment 9 Gris Ge 2011-01-04 04:41:08 UTC
Created attachment 471592
log file extract from kdump vmcore.

Or,
This log file might be what you want.
I just extracted it from the vmcore.

Comment 10 Or Gerlitz 2011-01-05 16:25:10 UTC
(In reply to comment #9)
> Or, This log file might be what you want.

Yep, looking into that. Any chance you can also test over a Mellanox HCA on this server? I don't have a QLogic/ipath HCA here.

Comment 11 Gris Ge 2011-01-06 03:03:48 UTC
Or,
I only have one mlx4 server currently.

This server acts as both the iSCSI target and the iSCSI initiator.

[root@ib-test1 ~]# iscsiadm -m session
iser: [1] 192.0.0.1:3260,1 iqn.2010-10.com.example:storage-1000

It works well on both RHEL 5.5 GA and RHEL 5.6 RC1.

I will test it again if I get another mlx4 server.

Comment 12 Mike Christie 2011-02-14 19:16:07 UTC
I spoke to the ipath maintainer, and it seems iSER is not supported on that driver. It has never been run or tested with it.

Should we just close this bugzilla? I am not too interested in supporting this setup if QLogic does not and there is no customer demand. Sorry Gris, you do not count as customer demand :)

However, if this is a bug that can show up in other drivers, we can use this bz to bring a fix into RHEL.

Comment 13 Gris Ge 2011-02-15 05:32:50 UTC
I am OK with closing this bug.
If GSS needs a tech note, we can reopen it.

Comment 14 Or Gerlitz 2011-02-15 07:44:22 UTC
(In reply to comment #12)
> I spoke to the ipath maintainer, and I guess iser is not supported on the
> driver. It has never been run/tested before. Should we just close this 
> bugzilla? 

Mike, I'm not sure where the problem lies. It could be a bug in iser exposed only (or most easily) over ipath, a bug in ipath exposed by iser, or some combination.
Basically, I suspect something is broken w.r.t. the DMA mapping emulation done by the ipath/qib drivers (these drivers assume that each page handed to the DMA emulation is mapped into the kernel virtual address space, and they aren't supported on 32-bit), though the bug may still be in iser. I wouldn't throw this away; maybe open a kernel.org ticket and cc me and the ipath maintainer?

Comment 15 Mike Christie 2011-02-15 19:07:06 UTC
(In reply to comment #14)
> throw this away, maybe open a kernel.org ticket, and cc myself and the ipath
> maintainer?

Sounds good to me. Will do. Thanks.

Comment 16 RHEL Program Management 2011-06-20 22:24:46 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.7 and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

Comment 19 Mike Christie 2011-08-23 22:01:17 UTC
I am going to close this for now.

If we find the fix, get an actual customer who wants to use ipath+iser, or QLogic decides to support iSER on their hardware, we can reopen it.

