Bug 159709

Summary: Executing "shut/no shut" on iscsi port on MDS crashed the linux host
Product: Red Hat Enterprise Linux 4 Reporter: Nitin Chandna <cnitin>
Component: iscsi-initiator-utils Assignee: Mike Christie <mchristi>
Status: CLOSED ERRATA QA Contact: Brock Organ <borgan>
Severity: high Docs Contact:
Priority: medium    
Version: 4.0 CC: 157070.alewis, berthiaume_wayne, coughlan, davidw, jlaska, kaufman_susan, majianpeng, mchristi, rkenna
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0810 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-11-15 16:12:33 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Nitin Chandna 2005-06-07 11:24:44 UTC
Description of problem:
Configured 1 JBOD on MDS 9216. The device was visible on the host. Ran
file system I/O on the host against this device and did a shutdown/no shutdown
on the iSCSI interface of the MDS. The Linux system panicked.

Version-Release number of selected component (if applicable):
- kernel-smp-2.6.9-11.EL
- kernel-smp-devel-2.6.9-11.EL

How reproducible:
Not always reproducible (happened only once)

Steps to Reproduce:
1. Configure a JBOD on MDS 9216i
2. Mount an ext2 FS on the iSCSI device
3. Start I/O
4. Execute a shutdown/no shutdown on the iSCSI interface of the MDS
5. The Linux host panics

  
Actual results:
shutdown/no shutdown of the iSCSI interface on the MDS panicked the Linux host

Expected results:
System should not panic.

Additional info:
Jun  2 17:18:10 cnitin-linux-6 kernel: iscsi-sfnet: Control device major number 254
Jun  2 17:18:10 cnitin-linux-6 iscsid[17712]: version 4:0.1.11 variant
(12-Jan-2005) 
Jun  2 17:18:10 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:10 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:10 cnitin-linux-6 iscsid[17717]: Connected to Discovery Address
10.1.1.80 
Jun  2 17:18:11 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:11 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:12 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:12 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:13 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:13 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:14 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:14 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:15 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:15 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:16 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:16 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:17 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:17 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:18 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:18 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:19 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:19 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:18:21 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:18:21 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:08 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:08 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:10 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:10 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:12 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:12 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:14 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:14 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:16 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:16 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:18 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:18 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:20 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:20 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:22 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:22 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:24 cnitin-linux-6 iscsid[17716]: cannot make connection to
10.1.1.140:3260: Connection refused 
Jun  2 17:19:24 cnitin-linux-6 iscsid[17716]: Connection to Discovery Address
10.1.1.140 failed 
Jun  2 17:19:29 cnitin-linux-6 iscsid[17716]: Connected to Discovery Address
10.1.1.140 
Jun  2 17:19:30 cnitin-linux-6 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000038
Jun  2 17:19:30 cnitin-linux-6 kernel:  printing eip:
Jun  2 17:19:30 cnitin-linux-6 kernel: e888715c
Jun  2 17:19:30 cnitin-linux-6 kernel: *pde = 18f02001
Jun  2 17:19:30 cnitin-linux-6 kernel: Oops: 0000 [#1]
Jun  2 17:19:30 cnitin-linux-6 kernel: SMP 
Jun  2 17:19:30 cnitin-linux-6 kernel: Modules linked in: iscsi_sfnet(U) vfat
fat st crc32c libcrc32c md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core
sunrpc scsi_transport_iscsi(U) button battery ac e100 mii e1000 floppy
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod cpqarray sd_mod scsi_mod
Jun  2 17:19:30 cnitin-linux-6 kernel: CPU:    0
Jun  2 17:19:30 cnitin-linux-6 kernel: EIP:    0060:[<e888715c>]    Not tainted VLI
Jun  2 17:19:30 cnitin-linux-6 kernel: EFLAGS: 00010286   (2.6.9-11.ELsmp) 
Jun  2 17:19:30 cnitin-linux-6 kernel: EIP is at
show_session_first_burst_len+0x18/0x3c [scsi_transport_iscsi]
Jun  2 17:19:30 cnitin-linux-6 kernel: eax: 00000000   ebx: e7dee400   ecx:
e8887144   edx: dca34000
Jun  2 17:19:30 cnitin-linux-6 kernel: esi: dca34000   edi: c0334abc   ebp:
e8888e88   esp: ce7d2f44
Jun  2 17:19:30 cnitin-linux-6 kernel: ds: 007b   es: 007b   ss: 0068
Jun  2 17:19:30 cnitin-linux-6 kernel: Process iscsid (pid: 17713,
threadinfo=ce7d2000 task=d69cabd0)
Jun  2 17:19:30 cnitin-linux-6 kernel: Stack: dca34000 00000000 c02149cb
e0de0ee0 e7dee4d4 c0188425 00000000 e0de0ef4 
Jun  2 17:19:30 cnitin-linux-6 kernel:        e0de0ee0 e590a560 00001000
c0188557 09cfa3d8 c03220c0 e590a560 00001000 
Jun  2 17:19:30 cnitin-linux-6 kernel:        ce7d2fac c0156011 ce7d2fac
09cfa3d8 e590a560 fffffff7 00000007 ce7d2000 
Jun  2 17:19:30 cnitin-linux-6 kernel: Call Trace:
Jun  2 17:19:30 cnitin-linux-6 kernel:  [<c02149cb>]
class_device_attr_show+0x14/0x1b
Jun  2 17:19:30 cnitin-linux-6 kernel:  [<c0188425>] fill_read_buffer+0x45/0x6f
Jun  2 17:19:30 cnitin-linux-6 kernel:  [<c0188557>] sysfs_read_file+0x47/0x70
Jun  2 17:19:30 cnitin-linux-6 kernel:  [<c0156011>] vfs_read+0xb6/0xe2
Jun  2 17:19:30 cnitin-linux-6 kernel:  [<c0156224>] sys_read+0x3c/0x62
Jun  2 17:19:30 cnitin-linux-6 kernel:  [<c02c7377>] syscall_call+0x7/0xb
Jun  2 17:19:30 cnitin-linux-6 kernel: Code: 68 e7 78 88 e8 6a 14 56 e8 86 09 93
d7 83 c4 10 5b 5e c3 56 89 d6 53 8d 98 34 ff ff ff 8b 43 24 2d d4 00 00 00 8b 40
78 8b 40 40 <8b> 50 38 85 d2 74 04 89 d8 ff d2 ff b3 68 01 00 00 68 e7 78 88 
Jun  2 17:19:30 cnitin-linux-6 kernel:  <0>Fatal exception: panic in 5 seconds

Comment 1 Mike Christie 2005-06-09 19:18:57 UTC
Is this the entire log? From the log it looks like a session was never
established. Somehow, though, a sysfs file is referenced. At first I thought you
may have started the driver and iscsid and there was some refcounting mishap.
Did you just insmod the module, run iscsid, mount a fs, run IO, then restart the
target? Then kablooey. I am not completely sure what "shutdown/no shutdown" is.

Comment 2 Nitin Chandna 2005-06-15 07:48:39 UTC
1. About the entire log: my terminal probably had a small scrollback buffer, so
I would have lost some log lines before this one.
2. I did not install the initiator as a module. I did a make/make install to
install this driver (rather than insmod).
3. MDS iSCSI target system interfaces can run the FCIP or iSCSI protocols. In
this case, I enabled iSCSI on the interface with "no shut" and disabled iSCSI on
the same interface with "shut".
4. I am currently unable to reproduce this issue, though I will give it another
try.

Comment 6 Mike Christie 2005-10-27 16:38:34 UTC
Moving to kernel component. The iscsi component is not used anymore. Please
select kernel for iscsi_sfnet problems and iscsi-initiator-utils for userspace
tool problems. Thanks.

Comment 7 Mike Christie 2005-11-10 22:57:54 UTC
I think

Comment 8 Mike Christie 2005-11-10 23:01:47 UTC
I moved this to iscsi-initiator-utils. I think vikasx.aggarwal has
found the source of the problem.

Comment 9 Mike Christie 2006-01-12 17:58:20 UTC
*** Bug 177645 has been marked as a duplicate of this bug. ***

Comment 11 Dave Wysochanski 2006-03-30 15:08:08 UTC
Mike do you have a reproducible case or do you not care about fixing this
bug?  Another tester just hit this here.

Comment 12 Mike Christie 2006-03-30 19:09:36 UTC
(In reply to comment #11)
> Mike do you have a reproducible case or do you not care about fixing this
> bug? 

Nice way of putting it :) I have this on my U4 TODO list. I will try to send out
a list of all BZs I am working on for U4 to give everyone an idea of what is
going on. The problem is that I have only reproduced it once.

Have you guys found a way to consistently hit it?


Comment 16 Mike Christie 2006-08-07 22:55:37 UTC
*** Bug 201215 has been marked as a duplicate of this bug. ***

Comment 20 Tom Coughlan 2007-02-08 21:12:19 UTC
Did not make 4.5. Moving to 4.6.

Comment 21 RHEL Program Management 2007-03-10 01:15:30 UTC
This bugzilla had previously been approved for engineering
consideration but Red Hat Product Management is currently reevaluating
this issue for inclusion in RHEL4.6.

Comment 25 Suzanne Logcher 2007-09-17 19:59:59 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 26 Jay Turner 2007-09-18 11:03:28 UTC
Moving to Assigned . . . FAILS_QA is only used by the RHN team these days.

Comment 28 errata-xmlrpc 2007-11-15 16:12:33 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0810.html


Comment 29 majianpeng 2010-06-24 02:34:16 UTC
I hit this bug using iscsi-initiator-utils-4.0.3.0-8.rpm (from CentOS 4.8).
The md5sum of the CentOS 4.0.3.0-8 rpm does not match the Red Hat 4.0.3.0-8 rpm, but the src.rpm files match, so I think the rpms are equivalent.
I have found a way to hit it.
env:
   iscsi-initiator-utils-4.0.3.0-8.rpm
   centos 4.v;
   kernel:2.6.9-78.EL
This is my test file:
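#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* Hold the sysfs attribute open while the iscsi service is stopped. */
        int fd = open("/sys/class/iscsi_transport/target2:0:0/first_burst_len", O_RDONLY);
        if (fd < 0) {
                printf("open error:%s\n", strerror(errno));
                return 1;
        }
        char buff[4096] = {0};
        sleep(40);      /* run "service iscsi stop" during this window */
        if (read(fd, buff, sizeof(buff) - 1) == -1) {
                printf("read error:%s\n", strerror(errno));
        }
        printf("first_burst_len=%s\n", buff);
        close(fd);

        return 0;
}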
target2:0:0 is the iSCSI target.
While the test program is sleeping, run: service iscsi stop.
This always hits the bug.

I also read the driver source for the cause.
While I have the sysfs file open, I remove the iscsi driver:
static void
__exit iscsi_cleanup(void)
{
	unregister_reboot_notifier(&iscsi_reboot_notifier);
	iscsi_unregister_interface();
	iscsi_destroy_all_hosts();
	iscsi_release_transport(iscsi_transportt);
	kmem_cache_destroy(iscsi_task_cache);
}
iscsi_release_transport(iscsi_transportt) may clean up the iscsi_transport,
but the struct scsi_host stays alive because of the open operation.

I do not know how to fix this.

Comment 30 majianpeng 2010-06-24 02:38:27 UTC

Comment 31 Mike Christie 2010-06-24 21:46:40 UTC
 majianpeng,

Thanks for the info. Sounds like a refcount bug. The open() is probably not getting a ref to all the necessary structs that the read() ends up accessing. Will look into that code.