Bug 149620
| Summary: | megaraid driver always fails to reset adapter | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Jun'ichi Nomura (Red Hat) <jnomura> |
| Component: | kernel | Assignee: | Tom Coughlan <coughlan> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.0 | CC: | coughlan, davej, halligan, hfuchi, riel, tao |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i686 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHBA-2007-0304 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2007-05-01 22:49:50 UTC | Type: | --- |
Created attachment 111446
sysreport
This is the sysreport taken on the test machine.
Additional note.

When the reset fails, the adapter hangs. The driver takes the device offline with the following messages:

  megaraid: aborting-5932 cmd=28 <c=2 t=0 l=0>
  megaraid abort: 5932:2[255:0], fw owner
  megaraid: resetting the host...
  megaraid: 1 outstanding commands. Max wait 180 sec
  megaraid mbox: Wait for 1 commands to complete:180
  megaraid mbox: Wait for 1 commands to complete:175
  ...
  megaraid mbox: Wait for 1 commands to complete:0
  megaraid mbox: critical hardware error!
  megaraid: resetting the host...
  megaraid: hw error, cannot reset
  megaraid: resetting the host...
  megaraid: hw error, cannot reset
  scsi: Device offlined - not ready after error recovery: host 2 channel 2 id 0 lun 0
  SCSI error : <2 2 0 0> return code = 0x6000000
  end_request: I/O error, dev sdb, sector 0
  Buffer I/O error on device sdb, logical block 0
  scsi2 (0:0): rejecting I/O to offline device

The problem can also be solved by increasing the loop count in mbox_post_sync_cmd_fast() from 0xFFFFF to a much larger value such as 0xFFFFFF.

Comments from linux-scsi:

>-----Original Message-----
>Sent: Tuesday, March 01, 2005 3:50 PM
>Subject: Re: megaraid driver always fails to reset adapter
>
>Hi,
>thanks for the info.
>
>Adding one more 'F' to the loop counter works.
>i.e. 0xFFFFFF.
>
>Just adding rmb() didn't solve the problem though it
>may decrease the necessary counter value.
>
>I don't know whether this value is ok for environments other than mine.
>
>Bagalkote, Sreenivas wrote:
>> Please try:
>>
>> In mbox_post_sync_cmd_fast(...) replace
>>
>> for (i = 0; i < 0xFFFFF; i++) {
>>     if (mbox->numstatus != 0xFF) break;
>> }
>>
>> with
>>
>> for (i = 0; i < 0xFFFFF; i++) {
>>     if (mbox->numstatus != 0xFF) break;
>>     rmb();
>> }
>>
>> Additionally, increase the loop counter to a bigger value.
>>
>> Thanks,
>> Sreenivas
>> LSI LOGIC Corporation
>>
>>
>>>-----Original Message-----
>>>Sent: Tuesday, March 01, 2005 1:36 PM
>>>Subject: megaraid driver always fails to reset adapter
>>>
>>>Hello,
>>>
>>>I found that the megaraid driver always fails to reset the
>>>adapter with the following message:
>>>  megaraid: resetting the host...
>>>  megaraid mbox: reset sequence completed successfully
>>>  megaraid: fast sync command timed out
>>>  megaraid: reservation reset failed
>>>when the "Cluster mode" of the adapter BIOS is enabled.
>>>So, whenever the reset occurs, the adapter goes
>>>offline and just becomes unavailable.
>>>
>>>Is this a known problem?
>>>
>>>I tried 2.6.9 and 2.6.11-rc5 and the results were the same.
>>>I used sg_reset to invoke reset artificially to test this.
>>>
>>>The problem doesn't occur if I disable the "Cluster mode"
>>>parameter in the adapter BIOS.
>>>
>>>I'm not sure how well the current megaraid driver supports
>>>the "Cluster mode".
>>>I would appreciate any ideas you have.

> I believe that if the upstream team merged it, the patch would
> be merged into our kernel. Correct?
Yes, we are planning to update the megaraid driver in U3, and if this fix were
upstream we would inherit it. Unfortunately, I do not see this change in
2.6.14-rc4. Please check with LSI Logic on the status of this fix.
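
For readers following the linux-scsi thread above, the two suggestions (a read barrier inside the polling loop, plus a larger iteration budget) can be sketched in user-space C as below. This is only an illustration under stated assumptions, not the driver code and not necessarily the change that was eventually merged: `mailbox_sketch` and `poll_mailbox` are hypothetical names, and a C11 acquire load is used as a rough user-space analogue of the kernel's `rmb()` followed by a plain read.

```c
#include <stdatomic.h>
#include <stdint.h>

#define STATUS_PENDING 0xFF       /* value the loop in the thread compares against */
#define POLL_LIMIT     0xFFFFFFu  /* enlarged count reported to work in the thread */

/* Hypothetical stand-in for the mailbox status byte written by firmware. */
typedef struct {
    _Atomic uint8_t numstatus;
} mailbox_sketch;

/*
 * Poll until the status byte changes or the iteration budget runs out.
 * Returns 0 when the status changed, -1 when the budget was exhausted.
 */
static int poll_mailbox(mailbox_sketch *mbox, uint32_t limit)
{
    for (uint32_t i = 0; i < limit; i++) {
        /* Assumption: acquire load approximates rmb() + read in user space. */
        if (atomic_load_explicit(&mbox->numstatus, memory_order_acquire)
                != STATUS_PENDING)
            return 0;
    }
    return -1;
}
```

A caller would pass `POLL_LIMIT` (or a smaller budget in tests) and treat -1 as the "fast sync command timed out" case. Because the bound is an iteration count, the wall-clock timeout it gives depends on CPU speed, which matches the concern raised in the thread about whether 0xFFFFFF is adequate on other machines.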
Thanks Dave. I'll change the status to POST.

This has an ACK on rhkernel-list, so it should go into 4.5. We will inherit it from upstream for 5.0.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Committed in stream U5 build 42.11. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

Patch is in the -51 kernel.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
Description of problem:
The megaraid driver always fails to reset the adapter if the adapter is configured with cluster mode enabled.

Version-Release number of selected component (if applicable):
2.6.9-5.EL
megaraid driver 2.20.4

How reproducible:
Always

Steps to Reproduce:
1. Enter the MegaRAID WebBIOS (by pressing Ctrl+H at boot) and set "Cluster mode" to enabled.
2. Boot the kernel and check dmesg for the following line:
     megaraid: cluster firmware, initiator ID: xx
   If the line above does not appear, the BIOS setting is not correct. Try again from step 1.
3. Execute the following command to issue an adapter reset:
     sg_reset -h <device on megaraid>

Actual results:
After sg_reset, you will see the following kernel log messages:
  megaraid: resetting the host...
  megaraid mbox: reset sequence completed successfully
  megaraid: fast sync command timed out
  megaraid: reservation reset failed

Expected results:
After sg_reset, you should see the following kernel log messages:
  megaraid: resetting the host...
  megaraid mbox: reset sequence completed successfully
  megaraid: reservation reset

Additional info:
If cluster mode is disabled, the problem doesn't occur.

In the megaraid driver, megaraid_reset_handler() is called to reset the adapter. If cluster mode is enabled (i.e. adapter->ha is true), it tries to issue the following mbox command, and that command fails:

  // clear reservations if any
  raw_mbox[0] = CLUSTER_CMD;
  raw_mbox[2] = RESET_RESERVATIONS;
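
To make the control flow in the additional info easier to follow, here is a rough user-space sketch of the reservation-clearing step. It is not taken from the driver: `adapter_sketch`, `clear_reservations`, the `post_sync` hook, the stub functions, the mailbox length, and the numeric values given to `CLUSTER_CMD` and `RESET_RESERVATIONS` are all placeholders chosen for illustration. Only the `ha` check and the two `raw_mbox` assignments come from the report.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder values; the real constants are defined in the driver headers. */
#define CLUSTER_CMD        0x01
#define RESET_RESERVATIONS 0x02
#define RAW_MBOX_LEN       16   /* hypothetical mailbox size */

typedef struct {
    int ha;  /* nonzero when cluster mode is enabled (adapter->ha in the report) */
    /* Hypothetical hook for the synchronous mailbox post: 0 = ok, else timeout. */
    int (*post_sync)(const uint8_t *raw_mbox);
} adapter_sketch;

typedef enum {
    RESET_NO_CLUSTER,       /* cluster mode off: nothing to clear */
    RESET_RESERVATION_OK,   /* corresponds to "megaraid: reservation reset" */
    RESET_RESERVATION_FAIL  /* corresponds to "megaraid: reservation reset failed" */
} reset_result;

static reset_result clear_reservations(const adapter_sketch *adapter)
{
    uint8_t raw_mbox[RAW_MBOX_LEN];

    if (!adapter->ha)
        return RESET_NO_CLUSTER;

    /* clear reservations if any */
    memset(raw_mbox, 0, sizeof(raw_mbox));
    raw_mbox[0] = CLUSTER_CMD;
    raw_mbox[2] = RESET_RESERVATIONS;

    return adapter->post_sync(raw_mbox) == 0 ? RESET_RESERVATION_OK
                                             : RESET_RESERVATION_FAIL;
}

/* Two stub firmware behaviours for exercising the sketch. */
static int post_ok(const uint8_t *raw_mbox)      { (void)raw_mbox; return 0; }
static int post_timeout(const uint8_t *raw_mbox) { (void)raw_mbox; return -1; }
```

With `ha` set and `post_timeout` as the hook, the sketch returns `RESET_RESERVATION_FAIL`, mirroring the "Actual results" log; with `post_ok` it returns `RESET_RESERVATION_OK`, mirroring the "Expected results" log. With `ha` cleared the mailbox is never posted, which is consistent with the problem not appearing when cluster mode is disabled.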