Bug 243222
Summary: | Kernel panic when logging in to OpenSolaris iSCSI target using iscsi-initiator-utils | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Michael Acosta <mike.acosta> | ||||
Component: | iscsi-initiator-utils | Assignee: | Mike Christie <mchristi> | ||||
Status: | CLOSED NOTABUG | QA Contact: | |||||
Severity: | medium | Docs Contact: | |||||
Priority: | low | ||||||
Version: | 5.0 | ||||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2012-06-27 09:22:06 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Michael Acosta
2007-06-07 23:59:05 UTC
FYI - I got a crash dump out of the box this afternoon, if anyone is interested. I packaged the vmcore and a copy of the uncompressed debug kernel bzip'd up - since I have 2G of mem, it's a bit on the large side, but compressed it's under 200M. If you're interested, let me know privately, and I'll put it up for download.

Do you have the panic? If your crash is occurring right when you try to log in, it may be this bug https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=235640

It very well may be. A panic string never got written to messages that I've found, so I can't compare the exact panic. Here's what I get from the crash backtrace:

      KERNEL: vmlinux
    DUMPFILE: vmcore
        CPUS: 1
        DATE: Thu Jun 7 17:51:18 2007
      UPTIME: 00:32:28
LOAD AVERAGE: 0.04, 0.12, 0.10
       TASKS: 145
    NODENAME: <uname -n>
     RELEASE: 2.6.18-8.1.4.el5
     VERSION: #1 SMP Thu May 17 03:16:52 EDT 2007
     MACHINE: x86_64 (997 Mhz)
      MEMORY: 2 GB
       PANIC: ""
         PID: 473
     COMMAND: "udevd"
        TASK: ffff81007f8f37e0 [THREAD_INFO: ffff810037cdc000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 473 TASK: ffff81007f8f37e0 CPU: 0 COMMAND: "udevd"
 #0 [ffffffff80402830] crash_kexec at ffffffff800a95f2
 #1 [ffffffff804028b8] iscsi_tcp_data_recv at ffffffff88beac3a
 #2 [ffffffff804028f0] __die at ffffffff80062e9d
 #3 [ffffffff80402930] die at ffffffff80069459
 #4 [ffffffff80402960] do_invalid_op at ffffffff80069a0f
 #5 [ffffffff80402978] iscsi_tcp_data_recv at ffffffff88beac3a
 #6 [ffffffff80402988] __wake_up at ffffffff8002dd9b
 #7 [ffffffff804029c8] sock_def_readable at ffffffff800127fd
 #8 [ffffffff804029d8] netlink_sendskb at ffffffff8021bcc3
 #9 [ffffffff80402a08] __iscsi_complete_pdu at ffffffff88bd99bf
#10 [ffffffff80402a20] error_exit at ffffffff8005be1d
    [exception RIP: iscsi_tcp_data_recv+4808]
    RIP: ffffffff88beac3a RSP: ffffffff80402ad0 RFLAGS: 00010202
    RAX: 0000000000000164 RBX: ffff810072f31a90 RCX: ffff81006fef0080
    RDX: ffff81006fef0080 RSI: 0000000000000286 RDI: ffff81006fef0080
    RBP: ffff81006fef0080 R8: ffff81007ae5e0b8 R9: ffff810071ecf500
    R10: ffff81007a9dd880 R11: ffff810071ecf500 R12: 0000000000000023
    R13: 0000000000000162 R14: 00000000d49a7d66 R15: ffff81005eb6c1d8
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#11 [ffffffff80402b28] __qdisc_run at ffffffff80216d97
#12 [ffffffff80402b78] ip_output at ffffffff800316cc
#13 [ffffffff80402ba8] ip_queue_xmit at ffffffff80033d84
#14 [ffffffff80402c08] tcp_read_sock at ffffffff80035506
#15 [ffffffff80402c58] iscsi_tcp_data_ready at ffffffff88beb06a
#16 [ffffffff80402c98] tcp_rcv_established at ffffffff8001b344
#17 [ffffffff80402ce8] tcp_v4_do_rcv at ffffffff8003ab13
#18 [ffffffff80402d08] ip_confirm at ffffffff88ce0135
#19 [ffffffff80402d48] tcp_v4_rcv at ffffffff80026d39
#20 [ffffffff80402d68] nf_hook_slow at ffffffff80054866
#21 [ffffffff80402dd8] ip_local_deliver at ffffffff80033f76
#22 [ffffffff80402e08] ip_rcv at ffffffff8003504e
#23 [ffffffff80402e38] netif_receive_skb at ffffffff8001fdad
#24 [ffffffff80402e78] tg3_poll at ffffffff88210948
#25 [ffffffff80402ef8] net_rx_action at ffffffff8000c39b
#26 [ffffffff80402f00] tg3_interrupt at ffffffff882092f1
#27 [ffffffff80402f38] __do_softirq at ffffffff80011c19
#28 [ffffffff80402f40] end_level_ioapic_vector at ffffffff80075658
#29 [ffffffff80402f68] call_softirq at ffffffff8005c330
#30 [ffffffff80402f80] do_softirq at ffffffff8006a312
#31 [ffffffff80402f90] do_IRQ at ffffffff8006a19a

Please let me know if you need more information, or if this is a dup. I'd be very interested in the patch as well. I would like to move forward with my work, even if it's not officially supported. Thanks for your time.

Would it be helpful to get access to the machine(s) in question? I can arrange for an interactive session, just shoot me a message. I have no idea if this is already known, although if it is, I'd really like to see the previously-mentioned patch. Thank you.

Created attachment 156808 [details]
iscsi update
Sorry about the delay with the patch. I got some review comments on it and had
to respin it, so I am still testing it. It is stable enough for you to
test and see if it fixes your problem, but not stable enough for production
use.
Also one other question. Are you using data digests?
I am not explicitly using data digests (i.e. the defaults are in place in /etc/iscsi/iscsid.conf). I don't know how to gather this information in OpenSolaris - it's not exposed on the target side (iscsitadm(8)) as an option to specify.

I'll plug the attachment in tomorrow and start playing. I understand it is not production-ready code, but it will help very much in my testing.

Hmm - no dice for me:
--
[root@macosta-crash linux]# make modules >/tmp/build.err 2>&1
[root@macosta-crash linux]# cat /tmp/build.err
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CC [M] drivers/scsi/iscsi_tcp.o
drivers/scsi/iscsi_tcp.c: In function â:
drivers/scsi/iscsi_tcp.c:1965: error: â undeclared (first use in this function)
drivers/scsi/iscsi_tcp.c:1965: error: (Each undeclared identifier is reported only once
drivers/scsi/iscsi_tcp.c:1965: error: for each function it appears in.)
drivers/scsi/iscsi_tcp.c:1970: error: â undeclared (first use in this function)
make[2]: *** [drivers/scsi/iscsi_tcp.o] Error 1
make[1]: *** [drivers/scsi] Error 2
make: *** [drivers] Error 2
[root@macosta-crash linux]# sed -n '1965p;1970p' drivers/scsi/iscsi_tcp.c
&conn->portal_port, kernel_getpeername);
&conn->local_port, kernel_getsockname);
[root@macosta-crash linux]#
--
I do have some "extra" patches in, such as the NVidia layer and FUSE. Other than that, I believe my kernel source is the stock kernel-2.6.18-8.1.4.el5 SRPM install, linked to /usr/src/linux.

Sorry about that. You need to build with our experimental kernel. Here http://people.redhat.com/dzickus/el5/ is our kernel maintainer's snapshot of the current development RHEL5 kernel. Of course this is going to be very unstable, but it gives you a look at what we are working on.
You need his newest kernel snapshot here http://people.redhat.com/dzickus/el5/23.el5/
The source is here http://people.redhat.com/dzickus/el5/23.el5/src/kernel-2.6.18-23.el5.src.rpm

I've just upgraded to the 2.6.18-32.el5 kernel that includes the "linux-2.6-scsi-update-iscsi_tcp-driver.patch" patch that appears to match this issue. Since then, logging in to an OpenSolaris iSCSI target no longer panics the system, but instead of being able to access the disk, I see this in /var/log/messages:
--
Jul 9 10:38:58 macosta-crash kernel: scsi1 : iSCSI Initiator over TCP/IP
Jul 9 10:38:59 macosta-crash iscsid: connection1:0 is operational now
Jul 9 10:39:00 macosta-crash kernel: iscsi: Got CHECK_CONDITION but invalid data buffer size of 0
--
Is what Solaris presents forbidden by the spec, or should this work?

(In reply to comment #9)
> I've just upgraded to the 2.6.18-32.el5 kernel that includes the
> "linux-2.6-scsi-update-iscsi_tcp-driver.patch" patch that appears to match this
> issue. Since then, logging in to an OpenSolaris iSCSI target no longer panics
> the system, but instead of being able to access the disk, I see this in
> /var/log/messages:
> --
> Jul 9 10:38:58 macosta-crash kernel: scsi1 : iSCSI Initiator over TCP/IP
> Jul 9 10:38:59 macosta-crash iscsid: connection1:0 is operational now
> Jul 9 10:39:00 macosta-crash kernel: iscsi: Got CHECK_CONDITION but invalid
> data buffer size of 0
> --
>
> Is what Solaris presents forbidden by the spec, or should this work?

Returning a check condition without returning sense data is not allowed by the SCSI or iSCSI specs.

Is this the current OpenSolaris target or an older version?

(In reply to comment #10)
> Is this the current OpenSolaris target or an older version?

It's *relatively* new:

rmike ~ # cat /etc/release
Solaris Nevada snv_64a X86
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 18 May 2007
rmike ~ # uname -a
SunOS macosta-crash-iscsi 5.11 snv_64a i86pc i386 i86pc

Is there any new progress here? I'm willing to set up a custom kernel and provide remote access to get this working. Right now I have an idle OpenSolaris box waiting for when RHEL5 can mount it (the MS iSCSI initiator doesn't appear to have any issues in this space).

Send me mail (mchristi) with login details. I think this problem of getting a check condition with no sense data is fixed on the OpenSolaris iSCSI target side in some version, though.

This should be fixed in recent OpenSolaris targets now. Closing. If it still occurs, reopen.
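
For anyone landing here from the "Got CHECK_CONDITION but invalid data buffer size of 0" message: a rough sketch of the kind of check involved, in standalone C. This is not the iscsi_tcp/libiscsi code. The function name check_sense and the enum values are made up for illustration. The only facts assumed are that CHECK CONDITION is SCSI status 0x02 and that, per RFC 3720, the data segment of a SCSI Response PDU begins with a 2-byte big-endian SenseLength followed by the sense bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* SCSI status code for CHECK CONDITION. */
#define STATUS_CHECK_CONDITION 0x02

/* Hypothetical result codes for this sketch only. */
enum sense_check {
    SENSE_NOT_EXPECTED = 0,  /* status is not CHECK CONDITION */
    SENSE_OK = 1,            /* sense data present and within bounds */
    SENSE_MISSING = -1,      /* CHECK CONDITION but no sense bytes */
    SENSE_TRUNCATED = -2     /* SenseLength is larger than the segment */
};

/*
 * Hypothetical helper: validate the data segment that accompanies a
 * SCSI Response. The segment is expected to start with a 2-byte
 * big-endian SenseLength, followed by that many bytes of sense data.
 */
enum sense_check check_sense(uint8_t status, const uint8_t *data, size_t datalen)
{
    size_t senselen;

    if (status != STATUS_CHECK_CONDITION)
        return SENSE_NOT_EXPECTED;

    /* A data segment of size 0 is the case reported in this bug. */
    if (data == NULL || datalen < 2)
        return SENSE_MISSING;

    senselen = ((size_t)data[0] << 8) | data[1];
    if (senselen == 0)
        return SENSE_MISSING;
    if (senselen > datalen - 2)
        return SENSE_TRUNCATED;

    return SENSE_OK;
}
```

With a target that sends CHECK CONDITION and an empty data segment, a check along these lines returns SENSE_MISSING, which corresponds to the situation the initiator logged above.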