Bug 237552 - [LSI-E 4.6 bug] RH4u4 gives a panic from sg driver during in-band Controller FW download
Summary: [LSI-E 4.6 bug] RH4u4 gives a panic from sg driver during in-band Controller ...
Keywords:
Status: CLOSED DUPLICATE of bug 239447
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Tom Coughlan
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 237554 237555
Depends On:
Blocks: 217099
Reported: 2007-04-23 19:07 UTC by Shailendra Hebsur
Modified: 2007-11-17 01:14 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-16 20:23:59 UTC
Target Upstream Version:
Embargoed:


Description Shailendra Hebsur 2007-04-23 19:07:47 UTC
Description of problem:

What is an in-band management connection?

-- It is a method of managing a storage array in which a storage management station sends commands to the storage array through the host I/O connection to the controller.

For an in-band connection, an Access volume must be mapped to the host. The storage array creates the Access volume to establish communication between the host and the storage array. An Access volume is required only for in-band management.
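On the host side, in-band commands are ordinary SCSI commands addressed to the Access volume. The panic trace in this report shows the management application (a java process) entering the sg driver through sg_ioctl and sg_new_write, which is the SG_IO ioctl path. This report does not describe the commands Simplicity actually sends, so the sketch below uses a standard INQUIRY (opcode 0x12) purely as a stand-in. The helper name build_inquiry_hdr is hypothetical. The code only fills in the sg v3 request header and does not open a device.

```c
#include <assert.h>
#include <string.h>
#include <scsi/sg.h>   /* sg_io_hdr_t, SG_DXFER_FROM_DEV (glibc header) */

/* Hypothetical helper: fill in an sg v3 request header carrying a
 * standard 6-byte INQUIRY, used here only as a stand-in for an in-band
 * management command. No device is opened. */
void build_inquiry_hdr(sg_io_hdr_t *hdr, unsigned char cdb[6],
                       unsigned char *buf, unsigned char buf_len,
                       unsigned char *sense, unsigned char sense_len)
{
    assert(hdr && cdb && buf && sense);   /* caller supplies all buffers */

    memset(cdb, 0, 6);
    cdb[0] = 0x12;                        /* INQUIRY opcode */
    cdb[4] = buf_len;                     /* allocation length */

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';              /* marks the sg v3 interface */
    hdr->cmd_len = 6;
    hdr->cmdp = cdb;
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;  /* data flows device -> host */
    hdr->dxferp = buf;
    hdr->dxfer_len = buf_len;
    hdr->sbp = sense;                     /* sense data is written here on error */
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 20000;                 /* milliseconds */
}
```

To send the command, a caller would open the /dev/sgN node for the Access LUN (for example, /dev/sg3 in the listing below) and pass &hdr to ioctl(fd, SG_IO, &hdr). On return, hdr.status and the sense buffer report the outcome.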

What is Controller FW download?

-- The storage array has two controllers. For an in-band FW download, the user selects the FW file in Simplicity (the GUI management tool for the storage array) and starts the download to both controllers. After the download completes on both controllers, one controller (ControllerA) reboots; once it is back online, the alternate controller (ControllerB) reboots.

Procedure for reproducing:

1. It's a 1x1 setup with one host connected to one storage array.
2. The storage array has only the Access LUN.
3. Map the Access LUN to the host.
4. Reboot the host.
5. Once the host is back online, it sees only the Access LUN, on two paths:

<n/a> (/dev/sg3) [Storage Array , Virtual Disk Access, LUN 31, Virtual Disk ID <6001372000ffe36f0000000000000000>]

<n/a> (/dev/sg5) [Storage Array , Virtual Disk Access, LUN 31, Virtual Disk ID <6001372000ffe36f0000000000000000>]

As shown above, the host sees only two sg devices, because only the Access LUN is mapped to it.

6. With the above configuration, the user starts a controller FW download from the GUI (Simplicity), and the host panics. The panic stack below was captured through serial console redirection:

[root@chap ~]# Unable to handle kernel paging request at 0000000014004013 RIP: 
<ffffffffa002f31d>{:sg:sg_common_write+2179}
PML4 1218fc067 PGD 11d927067 PMD 0 
Oops: 0000 [1] SMP 
CPU 0 
Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core 
smbfs sunrpc crc32c libcrc32c iscsi_sfnet scsi_transport_iscsi ds yenta_socket 
pcmcia_core joydev dm_multipath button battery ac uhci_hcd ehci_hcd hw_random 
e1000 bnx2 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mppVhba(U) 
megaraid_sas mppUpper(U) sg sd_mod scsi_mod
Pid: 6732, comm: java Not tainted 2.6.9-42.ELsmp
RIP: 0010:[<ffffffffa002f31d>] <ffffffffa002f31d>{:sg:sg_common_write+2179}
RSP: 0018:000001011cf33b58  EFLAGS: 00010202
RAX: 0000000000000002 RBX: 000001011c110000 RCX: 0000000014004010
RDX: 000001011df76088 RSI: 0000000008099600 RDI: 000001011c118000
RBP: 0000000000008200 R08: 0600ed1100002d93 R09: 2528000200000000
R10: 1800a62704000434 R11: 0000cf9102001927 R12: 000001011c115020
R13: 000000001f006014 R14: 0000000000000000 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffffffff804e5080(005b) knlGS:00000000eb0fdbb0
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000014004013 CR3: 0000000000101000 CR4: 00000000000006e0
Process java (pid: 6732, threadinfo 000001011cf32000, task 000001012cec4030)
Stack: 0000010133da76c0 0000000100000000 000001011df76088 000001011df760a8 
       000001012cec4030 00000100010447e0 000001011df76038 000001011df76088 
       0000820038ad0940 000001011df76000 
Call Trace:<ffffffffa002f77f>{:sg:sg_new_write+580} <ffffffffa002f9fa>{:sg:sg_ioctl+595}
       <ffffffff802a72c8>{sock_recvmsg+284} <ffffffff8030a14d>{thread_return+88}
       <ffffffff8010ed22>{__switch_to+306} <ffffffff8030a0f5>{thread_return+0}
       <ffffffff80135752>{autoremove_wake_function+0} <ffffffff802a6ecb>{sockfd_lookup+16}
       <ffffffff80135752>{autoremove_wake_function+0} <ffffffff802a8734>{sys_recvfrom+243}
       <ffffffff80135752>{autoremove_wake_function+0} <ffffffff8017a358>{fget+75}
       <ffffffff8018ae05>{sys_ioctl+853} <ffffffff8012a122>{sg_ioctl_trans+832}
       <ffffffff8019e8ac>{compat_sys_ioctl+235} <ffffffff80125bbb>{sysenter_do_call+27}

Code: 48 0f b6 41 03 48 8b 14 c5 c0 e0 48 80 48 b8 b7 6d db b6 6d 
RIP <ffffffffa002f31d>{:sg:sg_common_write+2179} RSP <000001011cf33b58>
CR2: 0000000014004013
 <0>Kernel panic - not syncing: Oops

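7. As the stack output above shows, the fault is in sg_common_write, which was called from sg_new_write. This is an x86_64 host, so the instruction pointer appears as RIP rather than EIP. Reading the call trace from the bottom, the request came from the java process (pid 6732) through the 32-bit compatibility ioctl path: sysenter_do_call → compat_sys_ioctl → sg_ioctl_trans → sys_ioctl → sg_ioctl → sg_new_write.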
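A little arithmetic on the oops supports this reading. On this kernel, the "Code:" line dumps 20 bytes starting at RIP, so its first five bytes (48 0f b6 41 03) are the faulting instruction. That instruction is movzbq 0x3(%rcx),%rax, a one-byte load from RCX + 3. With RCX = 0000000014004010, the load address is 0000000014004013, which is exactly the CR2 value (the address that faulted). The match also confirms that the Code bytes start at RIP. The helper below is a hypothetical sketch that decodes only this one encoding. It is not a general x86 decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the memory address read by a
 * "REX.W 0F B6 /r" instruction (movzx r64, r/m8, shown by objdump as
 * movzbq) whose ModRM byte selects [rcx + disp8]. Returns 0 for any
 * other encoding; it deliberately handles only the form seen in the oops. */
uint64_t movzx_rcx_disp8_addr(const uint8_t code[5], uint64_t rcx)
{
    assert(code != 0);
    if (code[0] != 0x48 || code[1] != 0x0f || code[2] != 0xb6)
        return 0;                        /* not REX.W MOVZX r64, r/m8 */

    uint8_t modrm = code[3];
    uint8_t mod = modrm >> 6;            /* 01 = base register + disp8 */
    uint8_t rm  = modrm & 0x7;           /* 001 = rcx (no REX.B bit set) */
    if (mod != 1 || rm != 1)
        return 0;

    int8_t disp = (int8_t)code[4];       /* disp8 is sign-extended */
    return rcx + (uint64_t)(int64_t)disp;
}
```

Applied to the oops values, the helper reproduces CR2: it returns 0x14004013 for the bytes {0x48, 0x0f, 0xb6, 0x41, 0x03} with RCX = 0x14004010. The value in RCX does not look like the kernel pointers held in the other registers (for example, RBX = 000001011c110000). This suggests that sg_common_write dereferenced a bad pointer. The oops alone does not show where that value came from.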

Details of the host that panicked:

[root@chap ~]# uname -a
Linux chap.boldvt.co.lsil.com 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
[root@chap ~]# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 4)
[root@chap ~]#
[root@chap ~]# sginfo -v
 Sginfo version 2.02 [20031215]
[root@chap ~]#
[root@chap ~]# lsmod | grep -i iscsi
iscsi_sfnet            95197  3
scsi_transport_iscsi    13377  1 iscsi_sfnet
scsi_mod              141457  6 iscsi_sfnet,mppVhba,megaraid_sas,mppUpper,sg,sd_mod
[root@chap ~]# modinfo iscsi_sfnet
filename:       /lib/modules/2.6.9-42.ELsmp/kernel/drivers/scsi/iscsi_sfnet/iscsi_sfnet.ko
parm:           max_initial_login_retries:Max number of times to retry logging into a target for the first time before giving up. The default is 3. Set to -1 for no limit
version:        4:0.1.11-3 BA273FAEA64EA20472A07EC
license:        GPL
description:    iSCSI initiator
author:         Mike Christie and Cisco Systems, Inc.
depends:        scsi_transport_iscsi,scsi_mod
vermagic:       2.6.9-42.ELsmp SMP gcc-3.4
[root@chap ~]# modinfo scsi_transport_iscsi
filename:       /lib/modules/2.6.9-42.ELsmp/kernel/drivers/scsi/scsi_transport_iscsi.ko
license:        GPL
description:    iSCSI Transport Attributes
author:         Mike Christie
depends:
vermagic:       2.6.9-42.ELsmp SMP gcc-3.4
[root@chap ~]#

Version-Release number of selected component (if applicable):


How reproducible: Always


Steps to Reproduce: The steps are the same as above and are repeated here in brief:

1. It's a 1x1 setup with one host connected to one array
2. Map the Access LUN to the host and reboot the host
3. Make sure that the host sees the Access LUN.
4. Make sure the host can connect to the array with Simplicity (the storage array management GUI) through the Access LUN.
5. Start the controller FW download using Simplicity. The host panics with the stack shown above.
  
Actual results: Panic


Expected results: The host should not panic, and the controller FW download should complete successfully.

Additional info:

Comment 2 Andrius Benokraitis 2007-04-23 19:15:36 UTC
*** Bug 237554 has been marked as a duplicate of this bug. ***

Comment 3 Andrius Benokraitis 2007-04-23 19:16:15 UTC
*** Bug 237555 has been marked as a duplicate of this bug. ***

Comment 4 RHEL Program Management 2007-05-09 04:39:42 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 Andrius Benokraitis 2007-05-16 20:23:59 UTC
Closing per K.H. Tan in an email.

Comment 7 Andrius Benokraitis 2007-06-26 23:51:35 UTC

*** This bug has been marked as a duplicate of 239447 ***

