Bug 149620 - megaraid driver always fails to reset adapter
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Tom Coughlan
QA Contact: Brian Brock
Reported: 2005-02-24 11:37 EST by Jun'ichi Nomura (Red Hat)
Modified: 2013-04-02 19:51 EDT
Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-01 18:49:50 EDT

Description Jun'ichi Nomura (Red Hat) 2005-02-24 11:37:50 EST
Description of problem:
The megaraid driver always fails to reset the adapter if the adapter
is configured with cluster mode enabled.

Version-Release number of selected component (if applicable):
2.6.9-5.EL
megaraid driver 2.20.4

How reproducible:
Always

Steps to Reproduce:
1. Enter the MegaRAID WebBIOS (by pressing Ctrl+H at boot)
   and set "Cluster mode" to enabled.

2. Boot the kernel and check dmesg for the following line:
     megaraid: cluster firmware, initiator ID: xx

   If the line above does not appear, the BIOS setting is
   not correct. Try again from step 1.

3. Run the following command to issue an adapter reset:
     sg_reset -h <device on megaraid>

Actual results:
  After sg_reset, the following kernel log messages appear:
    megaraid: resetting the host...
    megaraid mbox: reset sequence completed successfully
    megaraid: fast sync command timed out
    megaraid: reservation reset failed

Expected results:
  After sg_reset, the following kernel log messages should appear:
    megaraid: resetting the host...
    megaraid mbox: reset sequence completed successfully
    megaraid: reservation reset


Additional info:

If the cluster mode is disabled, the problem doesn't occur.

In the megaraid driver, megaraid_reset_handler() is called to reset
the adapter. If cluster mode is enabled (i.e. adapter->ha is true),
it tries to issue the following mbox command, and that command fails:
        // clear reservations if any
        raw_mbox[0] = CLUSTER_CMD;
        raw_mbox[2] = RESET_RESERVATIONS;
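
For illustration, the reservation-reset step described above can be sketched in user-space C. This is only a sketch, not the driver's code: struct fake_adapter, prepare_reservation_reset(), RAW_MBOX_LEN and the two opcode values are hypothetical stand-ins (the real definitions live in the driver headers and are not given in this report). Only the ha flag and the two raw_mbox assignments come from the description.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder opcode values for illustration only; the real values
 * are defined in the driver headers and are not quoted in this report. */
#define CLUSTER_CMD        0x6E
#define RESET_RESERVATIONS 0x03

/* Assumed mailbox length, chosen only so the example is self-contained. */
#define RAW_MBOX_LEN 16

/* Hypothetical stand-in for the driver's adapter structure. */
struct fake_adapter {
    int ha; /* non-zero when cluster mode is enabled */
};

/*
 * Build the "clear reservations" mailbox command.
 * Returns 1 if a command was prepared (cluster mode on),
 * 0 if nothing needs to be issued (cluster mode off).
 */
static int prepare_reservation_reset(const struct fake_adapter *adapter,
                                     uint8_t raw_mbox[RAW_MBOX_LEN])
{
    memset(raw_mbox, 0, RAW_MBOX_LEN);

    if (!adapter->ha)
        return 0;

    /* clear reservations if any */
    raw_mbox[0] = CLUSTER_CMD;
    raw_mbox[2] = RESET_RESERVATIONS;
    return 1;
}
```

With ha set to 0 the function prepares nothing, which matches the observation that the problem does not occur when cluster mode is disabled: the failing command is never issued.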
Comment 1 Jun'ichi Nomura (Red Hat) 2005-02-25 17:06:03 EST
Created attachment 111446 [details]
sysreport

This is the sysreport taken on the test machine.
Comment 2 Jun'ichi Nomura (Red Hat) 2005-02-25 17:15:58 EST
Additional note.

When the reset fails, the adapter hangs.
The driver then takes the device offline with the following messages.

 megaraid: aborting-5932 cmd=28 <c=2 t=0 l=0>
 megaraid abort: 5932:2[255:0], fw owner
 megaraid: resetting the host...
 megaraid: 1 outstanding commands. Max wait 180 sec
 megaraid mbox: Wait for 1 commands to complete:180
 megaraid mbox: Wait for 1 commands to complete:175
 ...
 megaraid mbox: Wait for 1 commands to complete:0
 megaraid mbox: critical hardware error!
 megaraid: resetting the host...
 megaraid: hw error, cannot reset
 megaraid: resetting the host...
 megaraid: hw error, cannot reset
 scsi: Device offlined - not ready after error recovery: host 2 channel 2 id 0 lun 0
 SCSI error : <2 2 0 0> return code = 0x6000000
 end_request: I/O error, dev sdb, sector 0
 Buffer I/O error on device sdb, logical block 0
 scsi2 (0:0): rejecting I/O to offline device
Comment 3 Jun'ichi Nomura (Red Hat) 2005-03-01 18:53:32 EST
The problem can also be solved by increasing the loop count
in mbox_post_sync_cmd_fast() from 0xFFFFF to a larger value
such as 0xFFFFFF.

Comments from linux-scsi:

>-----Original Message-----
>Sent: Tuesday, March 01, 2005 3:50 PM
>Subject: Re: megaraid driver always fails to reset adapter
>
>Hi,
>thanks for the info.
>
>Adding one more 'F' to the loop counter works.
>i.e. 0xFFFFFF.
>
>Just adding rmb() didn't solve the problem though it
>may decrease the necessary counter value.
>
>I don't know whether this value is OK for environments other than mine.
>
>Bagalkote, Sreenivas wrote:
>> Please try:
>> 
>> In mbox_post_sync_cmd_fast(...) replace
>> 
>> for (i = 0; i < 0xFFFFF; i++) {
>> 	if (mbox->numstatus != 0xFF) break;
>> }
>> 
>> with
>> 
>> for (i = 0; i < 0xFFFFF; i++) {
>> 	if (mbox->numstatus != 0xFF) break;
>> 	rmb();
>> } 
>> 
>> Additionally, increase the loop counter to a bigger value.
>> 
>> Thanks,
>> Sreenivas
>> LSI LOGIC Corporation 
>> 
>> 
>>>-----Original Message-----
>>>Sent: Tuesday, March 01, 2005 1:36 PM
>>>Subject: megaraid driver always fails to reset adapter
>>>
>>>Hello,
>>>
>>>I found that the megaraid driver always fails to reset the
>>>adapter with the following message:
>>>   megaraid: resetting the host...
>>>   megaraid mbox: reset sequence completed successfully
>>>   megaraid: fast sync command timed out
>>>   megaraid: reservation reset failed
>>>when the "Cluster mode" of the adapter BIOS is enabled.
>>>So, whenever the reset occurs, the adapter goes
>>>offline and becomes unavailable.
>>>
>>>Is this a known problem?
>>>
>>>I tried 2.6.9 and 2.6.11-rc5 and the results were the same.
>>>I used sg_reset to invoke reset artificially to test this.
>>>
>>>The problem doesn't occur if I disabled the "Cluster mode"
>>>parameter in the adapter BIOS.
>>>
>>>I'm not sure how well the current megaraid driver supports
>>>the "Cluster mode".
>>>I would appreciate any ideas.
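
The effect of the loop bound discussed in this comment can be sketched with a small user-space simulation. This is an illustration under stated assumptions, not the driver's code: struct sim_mbox, sim_firmware_tick() and poll_fast() are hypothetical names, and the simulated firmware simply clears the 0xFF "busy" status after a fixed number of polls. In the kernel the loop body would contain rmb() rather than a simulated tick.

```c
#include <stdint.h>

/* Simulated mailbox: numstatus stays at 0xFF ("busy") until the
 * simulated firmware has been polled ready_after times. */
struct sim_mbox {
    volatile uint8_t numstatus;
    uint32_t polls;
    uint32_t ready_after;
};

/* Stand-in for hardware progress between two reads of the status byte. */
static void sim_firmware_tick(struct sim_mbox *mbox)
{
    if (++mbox->polls >= mbox->ready_after)
        mbox->numstatus = 0;
}

/*
 * Iteration-bounded busy-wait, shaped like the loop quoted above.
 * Returns 0 if the status changed within max_iter polls, -1 on timeout.
 */
static int poll_fast(struct sim_mbox *mbox, uint32_t max_iter)
{
    uint32_t i;

    for (i = 0; i < max_iter; i++) {
        if (mbox->numstatus != 0xFF)
            return 0;
        sim_firmware_tick(mbox); /* the driver would call rmb() here */
    }
    return -1; /* corresponds to "fast sync command timed out" */
}
```

If the simulated firmware needs 0x200000 polls, poll_fast() times out with a bound of 0xFFFFF and succeeds with 0xFFFFFF. Because the bound counts iterations rather than elapsed time, the value that is large enough depends on how fast the CPU spins, which is consistent with the concern above that 0xFFFFFF may not suit every environment.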
Comment 5 Tom Coughlan 2005-10-12 12:51:21 EDT
> I believe that if the upstream 
> team merged it, the patch would be merged into our kernel. Correct?

Yes, we are planning to update the megaraid driver in U3, and if this fix were
upstream we would inherit it. Unfortunately, I do not see this change in
2.6.14-rc4. Please check with LSI Logic on the status of this fix. 
Comment 10 Tom Coughlan 2006-07-13 16:32:05 EDT
Thanks Dave. I'll change the status to POST. This has an ACK on rhkernel-list,
so it should go into 4.5. We will inherit it from upstream for 5.0.
Comment 12 RHEL Product and Program Management 2006-09-07 15:41:43 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 17 Jason Baron 2006-09-15 15:14:55 EDT
Committed in stream U5 build 42.11. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 20 Mike Gahagan 2007-03-26 12:49:05 EDT
Patch is in the -51 kernel.
Comment 22 Red Hat Bugzilla 2007-05-01 18:49:50 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
