Bug 243222

Summary: Kernel panic when logging in to OpenSolaris iSCSI target using iscsi-initiator-utils
Product: Red Hat Enterprise Linux 5
Reporter: Michael Acosta <mike.acosta>
Component: iscsi-initiator-utils
Assignee: Mike Christie <mchristi>
Status: CLOSED NOTABUG
Severity: medium
Priority: low
Version: 5.0   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-06-27 09:22:06 UTC
Attachments: iscsi update (Flags: none)

Description Michael Acosta 2007-06-07 23:59:05 UTC
Description of problem:
Kernel panic when logging in to an iSCSI target provided by OpenSolaris

Version-Release number of selected component (if applicable):
* iscsi-initiator-utils-6.2.0.742-0.5.el5 
* kernel-2.6.18-8.1.4.el5
On AMD x86_64 hardware with 2G of physical mem

How reproducible:
Every time

Steps to Reproduce:
1. Create target on Sun host:
 sun# iscsitadm create target -b /dev/zvol/dsk/pool0/iscsi/target0 target0
 sun# iscsitadm list target
 Target: target0
     iSCSI Name: iqn.1986-03.com.sun:02:607b69d6-fd3d-60b3-95d6-e629f3365280.target0
     Connections: 0

2. Discover LUN on Linux client:
 rhel5# iscsiadm -m discovery -p 10.1.0.140:3260

3. Attempt to log in to target:
  rhel5# iscsiadm -m node -T iqn.1986-03.com.sun:02:607b69d6-fd3d-60b3-95d6-e629f3365280.target0 -p 10.1.0.140:3260 --login

Actual results:
Panic / reboot. This persists until I boot to single user mode, start iscsid,
and '-o delete' the node and discovery records.

Expected results:
Successful log in to target, or error message if a problem is found

Additional info:
I have only seen this behavior with the OpenSolaris iSCSI target. I used
OpenFiler and ietd both without issue previously.

Comment 1 Michael Acosta 2007-06-08 04:00:47 UTC
FYI - I got a crash dump out of the box this afternoon, if anyone is interested.
I packaged the vmcore and a copy of the uncompressed debug kernel bzip'd up -
since I have 2G of mem, it's a bit on the large side, but compressed it's under
200M. If you're interested, let me know privately, and I'll put it up for download.

Comment 2 Mike Christie 2007-06-08 04:08:44 UTC
Do you have the panic output? If your crash is occurring right when you try to
log in, it may be this bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=235640

Comment 3 Michael Acosta 2007-06-08 04:40:37 UTC
It very well may be. A panic string never got written to messages that I've
found,  so I can't compare the exact panic. Here's what I get from the crash
backtrace:

      KERNEL: vmlinux                           
    DUMPFILE: vmcore
        CPUS: 1
        DATE: Thu Jun  7 17:51:18 2007
      UPTIME: 00:32:28
LOAD AVERAGE: 0.04, 0.12, 0.10
       TASKS: 145
    NODENAME: <uname -n>
     RELEASE: 2.6.18-8.1.4.el5
     VERSION: #1 SMP Thu May 17 03:16:52 EDT 2007
     MACHINE: x86_64  (997 Mhz)
      MEMORY: 2 GB
       PANIC: ""
         PID: 473
     COMMAND: "udevd"
        TASK: ffff81007f8f37e0  [THREAD_INFO: ffff810037cdc000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 473    TASK: ffff81007f8f37e0  CPU: 0   COMMAND: "udevd"
 #0 [ffffffff80402830] crash_kexec at ffffffff800a95f2
 #1 [ffffffff804028b8] iscsi_tcp_data_recv at ffffffff88beac3a
 #2 [ffffffff804028f0] __die at ffffffff80062e9d
 #3 [ffffffff80402930] die at ffffffff80069459
 #4 [ffffffff80402960] do_invalid_op at ffffffff80069a0f
 #5 [ffffffff80402978] iscsi_tcp_data_recv at ffffffff88beac3a
 #6 [ffffffff80402988] __wake_up at ffffffff8002dd9b
 #7 [ffffffff804029c8] sock_def_readable at ffffffff800127fd
 #8 [ffffffff804029d8] netlink_sendskb at ffffffff8021bcc3
 #9 [ffffffff80402a08] __iscsi_complete_pdu at ffffffff88bd99bf
#10 [ffffffff80402a20] error_exit at ffffffff8005be1d
    [exception RIP: iscsi_tcp_data_recv+4808]
    RIP: ffffffff88beac3a  RSP: ffffffff80402ad0  RFLAGS: 00010202
    RAX: 0000000000000164  RBX: ffff810072f31a90  RCX: ffff81006fef0080
    RDX: ffff81006fef0080  RSI: 0000000000000286  RDI: ffff81006fef0080
    RBP: ffff81006fef0080   R8: ffff81007ae5e0b8   R9: ffff810071ecf500
    R10: ffff81007a9dd880  R11: ffff810071ecf500  R12: 0000000000000023
    R13: 0000000000000162  R14: 00000000d49a7d66  R15: ffff81005eb6c1d8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffffffff80402b28] __qdisc_run at ffffffff80216d97
#12 [ffffffff80402b78] ip_output at ffffffff800316cc
#13 [ffffffff80402ba8] ip_queue_xmit at ffffffff80033d84
#14 [ffffffff80402c08] tcp_read_sock at ffffffff80035506
#15 [ffffffff80402c58] iscsi_tcp_data_ready at ffffffff88beb06a
#16 [ffffffff80402c98] tcp_rcv_established at ffffffff8001b344
#17 [ffffffff80402ce8] tcp_v4_do_rcv at ffffffff8003ab13
#18 [ffffffff80402d08] ip_confirm at ffffffff88ce0135
#19 [ffffffff80402d48] tcp_v4_rcv at ffffffff80026d39
#20 [ffffffff80402d68] nf_hook_slow at ffffffff80054866
#21 [ffffffff80402dd8] ip_local_deliver at ffffffff80033f76
#22 [ffffffff80402e08] ip_rcv at ffffffff8003504e
#23 [ffffffff80402e38] netif_receive_skb at ffffffff8001fdad
#24 [ffffffff80402e78] tg3_poll at ffffffff88210948
#25 [ffffffff80402ef8] net_rx_action at ffffffff8000c39b
#26 [ffffffff80402f00] tg3_interrupt at ffffffff882092f1
#27 [ffffffff80402f38] __do_softirq at ffffffff80011c19
#28 [ffffffff80402f40] end_level_ioapic_vector at ffffffff80075658
#29 [ffffffff80402f68] call_softirq at ffffffff8005c330
#30 [ffffffff80402f80] do_softirq at ffffffff8006a312
#31 [ffffffff80402f90] do_IRQ at ffffffff8006a19a

Please let me know if you need more information, or if this is a dup.

I'd be very interested in the patch as well. I would like to move forward with
my work, even if it's not officially supported.

Thanks for your time.

Comment 4 Michael Acosta 2007-06-12 06:54:16 UTC
Would it be helpful to get access to the machine(s) in question? I can arrange for an interactive session, just shoot me a message. I have no idea if this is already 
known, although if it is, I'd really like to see the previously-mentioned patch.

Thank you.

Comment 5 Mike Christie 2007-06-12 16:41:30 UTC
Created attachment 156808 [details]
iscsi update

Sorry about the delay with the patch. I got some review comments on it and had
to respin it, so I am still testing it. It is stable enough for you to test and
see if it fixes your problem, but not stable enough for production use.

Also one other question. Are you using data digests?
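
For reference, iSCSI header and data digests (RFC 3720) are CRC32C checksums computed with the Castagnoli polynomial. Below is a minimal, table-free C sketch of that checksum, assuming the standard reflected form with an initial value and final XOR of 0xFFFFFFFF. The function name crc32c is hypothetical; this is not the code the Linux iscsi_tcp driver uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected (bit-reversed) form of the Castagnoli polynomial 0x1EDC6F41. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical helper: bitwise CRC32C over len bytes of buf. */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    uint32_t crc = 0xFFFFFFFFu;              /* initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; if the bit shifted out was 1, fold in the polynomial. */
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;                /* final XOR */
}
```

For example, crc32c((const uint8_t *)"123456789", 9) yields 0xE3069283, the standard CRC32C check value. A real implementation would normally use a lookup table or a hardware CRC32C instruction for speed.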

Comment 6 Michael Acosta 2007-06-13 02:09:36 UTC
I am not explicitly using data digests (i.e. the defaults are in place in
/etc/iscsi/iscsid.conf).

I don't know how to gather this information in OpenSolaris - it's not exposed on
the target-side (iscsitadm(8)) as an option to specify.

I'll plug the attachment in tomorrow and start playing. I understand it is not
production-ready code, but will help very much in my testing.

Comment 7 Michael Acosta 2007-06-13 06:26:48 UTC
Hmm - no dice for me:

--
[root@macosta-crash linux]# make modules >/tmp/build.err 2>&1
[root@macosta-crash linux]# cat /tmp/build.err 
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CC [M]  drivers/scsi/iscsi_tcp.o
drivers/scsi/iscsi_tcp.c: In function '…':
drivers/scsi/iscsi_tcp.c:1965: error: 'kernel_getpeername' undeclared (first use in this function)
drivers/scsi/iscsi_tcp.c:1965: error: (Each undeclared identifier is reported only once
drivers/scsi/iscsi_tcp.c:1965: error: for each function it appears in.)
drivers/scsi/iscsi_tcp.c:1970: error: 'kernel_getsockname' undeclared (first use in this function)
make[2]: *** [drivers/scsi/iscsi_tcp.o] Error 1
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2
[root@macosta-crash linux]# sed -n '1965p;1970p' drivers/scsi/iscsi_tcp.c
                                 &conn->portal_port, kernel_getpeername);
                                &conn->local_port, kernel_getsockname);
[root@macosta-crash linux]#
--

I do have some "extra" patches in, such as the NVidia layer and FUSE. Other than
that, I believe my kernel source is the stock kernel-2.6.18-8.1.4.el5 SRPM
install, linked to /usr/src/linux.


Comment 8 Mike Christie 2007-06-13 16:12:39 UTC
Sorry about that. You need to build against our experimental kernel. Here
http://people.redhat.com/dzickus/el5/
is our kernel maintainer's snapshot of the current development RHEL5 kernel. Of
course this is going to be very unstable, but it gives you a look at what we are
working on.

You need his newest kernel snapshot here
http://people.redhat.com/dzickus/el5/23.el5/
The source is here
http://people.redhat.com/dzickus/el5/23.el5/src/kernel-2.6.18-23.el5.src.rpm

Comment 9 Michael Acosta 2007-07-09 18:51:10 UTC
I've just upgraded to the 2.6.18-32.el5 kernel that includes the
"linux-2.6-scsi-update-iscsi_tcp-driver.patch" patch that appears to match this
issue. Since then, logging in to an OpenSolaris iSCSI target no longer panics
the system, but instead of being able to access the disk, I see this in
/var/log/messages:
--
Jul  9 10:38:58 macosta-crash kernel: scsi1 : iSCSI Initiator over TCP/IP
Jul  9 10:38:59 macosta-crash iscsid: connection1:0 is operational now
Jul  9 10:39:00 macosta-crash kernel: iscsi: Got CHECK_CONDITION but invalid data buffer size of 0
--

Is what Solaris presents forbidden by the spec, or should this work?

Comment 10 Mike Christie 2007-07-09 20:23:09 UTC
(In reply to comment #9)
> I've just upgraded to the 2.6.18-32.el5 kernel that includes the
> "linux-2.6-scsi-update-iscsi_tcp-driver.patch" patch that appears to match this
> issue. Since then, logging in to an OpenSolaris iSCSI target no longer panics
> the system, but instead of being able to access the disk, I see this in
> /var/log/messages:
> --
> Jul  9 10:38:58 macosta-crash kernel: scsi1 : iSCSI Initiator over TCP/IP
> Jul  9 10:38:59 macosta-crash iscsid: connection1:0 is operational now
> Jul  9 10:39:00 macosta-crash kernel: iscsi: Got CHECK_CONDITION but invalid data buffer size of 0
> --
> 
> Is what Solaris presents forbidden by the spec, or should this work?

Returning a check condition without returning sense data is not allowed by the
SCSI or iSCSI specs.

Is this the current OpenSolaris target or an older version?
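
The rule above can be illustrated with a minimal C sketch of the check an initiator might apply to a SCSI Response PDU. It assumes the RFC 3720 layout, in which a CHECK CONDITION response carries a data segment starting with a 2-byte big-endian SenseLength followed by the sense bytes. The names check_sense_segment and rsp_result are hypothetical, and this is not the Linux iscsi_tcp/libiscsi code; it only shows why a zero-length data segment, as in the "invalid data buffer size of 0" message from comment 9, has to be rejected rather than parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* SCSI (SAM) status byte value for CHECK CONDITION. */
#define SAM_STAT_CHECK_CONDITION 0x02

/* Hypothetical result codes used only by this sketch. */
enum rsp_result {
    RSP_OK = 0,          /* no sense required, or sense data is well formed */
    RSP_BAD_TARGET = -1  /* CHECK CONDITION without usable sense data */
};

/*
 * Hypothetical helper (not the kernel's implementation): validate the data
 * segment of a SCSI Response PDU. For CHECK CONDITION the segment must start
 * with a 2-byte big-endian SenseLength followed by that many sense bytes.
 * On success, sense and sense_len describe the sense bytes (NULL and 0 when
 * the status carries no sense).
 */
static enum rsp_result check_sense_segment(uint8_t status,
                                           const uint8_t *data, size_t datalen,
                                           const uint8_t **sense,
                                           uint16_t *sense_len)
{
    assert(data != NULL || datalen == 0);
    *sense = NULL;
    *sense_len = 0;

    if (status != SAM_STAT_CHECK_CONDITION)
        return RSP_OK;

    /* The segment must at least hold the SenseLength field. */
    if (datalen < 2) {
        fprintf(stderr, "CHECK_CONDITION with data segment of %zu bytes\n", datalen);
        return RSP_BAD_TARGET;
    }

    /* Decode the big-endian SenseLength, then make sure those bytes exist. */
    uint16_t len = (uint16_t)((data[0] << 8) | data[1]);
    if (len == 0 || len > datalen - 2) {
        fprintf(stderr, "CHECK_CONDITION with bad sense length %u (segment %zu)\n",
                (unsigned)len, datalen);
        return RSP_BAD_TARGET;
    }

    *sense = data + 2;
    *sense_len = len;
    return RSP_OK;
}
```

The sketch also rejects a SenseLength of 0, since a CHECK CONDITION that carries no sense bytes violates the same rule; an initiator would then fail the command instead of reading past the end of the segment.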

Comment 11 Michael Acosta 2007-07-09 21:24:17 UTC
(In reply to comment #10)
> Is this the current open solaris target or a older version?

It's *relatively* new:

rmike ~ # cat /etc/release 
                            Solaris Nevada snv_64a X86
           Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
                        Use is subject to license terms.
                              Assembled 18 May 2007
rmike ~ # uname -a
SunOS macosta-crash-iscsi 5.11 snv_64a i86pc i386 i86pc



Comment 12 Michael Acosta 2007-08-07 05:55:03 UTC
Is there any new progress here? I'm willing to set up a custom kernel, and
provide remote access to get this working. Right now I have an idle OpenSolaris
box waiting for when RHEL5 can mount it (the MS iSCSI initiator doesn't appear
to have any issues in this space.)

Comment 13 Mike Christie 2007-08-07 17:36:25 UTC
Send me mail (mchristi) with login details. I think this problem of getting a
check condition with no sense is fixed on the OpenSolaris iSCSI target side in
some version, though.

Comment 14 Mike Christie 2012-06-27 09:22:06 UTC
This should be fixed in recent OpenSolaris targets now. Closing. If it still occurs, reopen.