Bug 674298 - [NetApp 5.6 Bug] QLogic 8G FC firmware dumps seen during IO
Summary: [NetApp 5.6 Bug] QLogic 8G FC firmware dumps seen during IO
Keywords:
Status: CLOSED DUPLICATE of bug 660386
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: All
OS: All
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: 5.7
Assignee: Chad Dupuis (Cavium)
QA Contact: Storage QE
URL:
Whiteboard:
Depends On:
Blocks: 618260
Reported: 2011-02-01 11:09 UTC by Martin George
Modified: 2011-06-02 13:31 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Prior to this update, if a CT/ELS pass-through command timed out, the QLogic 8Gb Fibre Channel adapter created a firmware dump. With this update, firmware dumps are no longer created when CT/ELS pass-through requests time out as a firmware dump is not necessary in this case.
Clone Of:
Environment:
Last Closed: 2011-04-19 18:25:14 UTC
Target Upstream Version:
Embargoed:


Attachments
Firmware dump (348.33 KB, application/x-gzip)
2011-02-01 11:11 UTC, Martin George
/var/log/messages (661.02 KB, application/octet-stream)
2011-02-01 11:14 UTC, Martin George
0001-qla2xxx-Do-not-perform-reset-fw-dump-if-CT-ELS-passt.patch (1006 bytes, patch)
2011-03-10 20:38 UTC, Chad Dupuis (Cavium)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:1065 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update 2011-07-21 09:21:37 UTC

Description Martin George 2011-02-01 11:09:21 UTC
Description of problem:
On a RHEL 5.6 8G FC host, we occasionally see adapter firmware dumps during IO as seen by the following entry in the messages file:

kernel: qla2xxx 0000:1a:00.1: Firmware dump saved to temp buffer (5/ffffc200101e4000
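
Checking a messages file for this entry can be scripted. The following is a minimal sketch in C, assuming the message text matches the line quoted above. The helper names is_qla_fw_dump_line and count_qla_fw_dumps are hypothetical and are not part of the driver or of any RHEL tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Substring taken from the log entry quoted in this report. */
#define QLA_FW_DUMP_MARKER "Firmware dump saved to temp buffer"

/*
 * Hypothetical helper: returns true if a syslog line looks like a
 * qla2xxx firmware-dump notice. Matching is by plain substring only.
 */
bool is_qla_fw_dump_line(const char *line)
{
    assert(line != NULL);
    return strstr(line, "qla2xxx") != NULL &&
           strstr(line, QLA_FW_DUMP_MARKER) != NULL;
}

/*
 * Hypothetical helper: counts matching lines in an already opened
 * stream (for example /var/log/messages). Lines longer than the
 * buffer are read in pieces, so a marker split across two pieces
 * would be missed.
 */
unsigned long count_qla_fw_dumps(FILE *fp)
{
    char buf[1024];
    unsigned long count = 0;

    assert(fp != NULL);
    while (fgets(buf, sizeof(buf), fp) != NULL) {
        if (is_qla_fw_dump_line(buf))
            count++;
    }
    return count;
}
```

A caller would open the log with fopen(), pass the stream to count_qla_fw_dumps(), and treat a non-zero result as a sign that the adapter took at least one dump in the period the log covers.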

Version-Release number of selected component (if applicable):
RHEL 5.6 (2.6.18-238.el5) host using a QLE2562 8G FC adapter with inbox qla2xxx driver v8.03.01.05.05.06-k & fw v5.03.02

How reproducible:
Occasionally

Comment 1 Martin George 2011-02-01 11:11:25 UTC
Created attachment 476360 [details]
Firmware dump

Comment 2 Martin George 2011-02-01 11:14:02 UTC
Created attachment 476362 [details]
/var/log/messages

Comment 3 Chad Dupuis (Cavium) 2011-02-17 18:04:50 UTC
We analyzed the firmware dump and found that it was caused by a CT passthrough command (origin unknown) that timed out.  The only other reason the driver would automatically take a firmware dump is if the ASIC was paused while servicing an interrupt; that condition is not present in the firmware dump provided.  The dump does show a CT passthrough command waiting for resources, which would have caused the timeout.

We really shouldn't be doing a firmware dump here so the resolution will be to simply remove it.  This will be part of the patchset we provide for RHEL 5.7 in Bug 660386.

Comment 4 Martin George 2011-02-17 18:19:09 UTC
(In reply to comment #3)
> 
> We really shouldn't be doing a firmware dump here so the resolution will be to
> simply remove it.  This will be part of the patchset we provide for RHEL 5.7 in
> Bug 660386.

So could we please have this patch included in 5.6.z as well? That would be really helpful to us.

Comment 5 Tom Coughlan 2011-03-01 22:12:51 UTC
Martin indicates that this fw dump message is accompanied by error recovery steps such as aborts and device resets (as seen in /var/log/messages), which show up as IO disruptions and delays on the host. As we know, these error recovery paths often lead to even more serious problems. This, plus the fact that the fix is very simple, is why the fix is proposed for the z-stream.

Comment 6 RHEL Program Management 2011-03-01 22:19:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Chad Dupuis (Cavium) 2011-03-01 22:24:06 UTC
This will be fixed when we (QLogic) post our RHEL 5.7 patch set shortly.  Specifically, this is the fix that will need to be backported:

diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
index 91c8d25..f91fde5 100644
--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -622,13 +622,6 @@ qla2x00_wait_for_passthru_completion(struct scsi_qla_host *ha)
            timeout)) {
                DEBUG2(qla_printk(KERN_WARNING, ha,
                    "Passthru request timed out.\n"));
-               if (IS_QLA82XX(ha)) {
-                       set_bit(FCOE_CTX_RESET_NEEDED, &ha->dpc_flags);
-               } else {
-                       ha->isp_ops->fw_dump(ha, 0);
-                       set_bit(ISP_ABORT_NEEDED, &ha->dpc_flags);
-               }
-               qla2xxx_wake_dpc(ha);
                ha->pass_thru_cmd_result = 0;
                ha->pass_thru_cmd_in_process = 0;
        }

Once the 5.6.z bugzilla is opened, I'll be able to provide a back ported point fix.

Comment 10 Chad Dupuis (Cavium) 2011-03-10 20:38:08 UTC
Created attachment 483557 [details]
0001-qla2xxx-Do-not-perform-reset-fw-dump-if-CT-ELS-passt.patch

RHEL 5.6.z backport of "qla2xxx: Do not perform reset/fw-dump if CT/ELS passthru requests timeout.".

Comment 11 Chad Dupuis (Cavium) 2011-03-11 16:02:58 UTC
The RHEL 5.7 version of this patch was posted as part of the patch set from Bug 660386.

The RHEL 5.6.z backport was posted on 3/11/2011 as the following patch:

qla2xxx: Do not perform reset/fw-dump if CT/ELS passthru requests timeout.

Comment 12 Martin George 2011-03-21 14:19:08 UTC
What's the update on this? Is this being queued for 5.6.z?

Comment 14 Rob Evers 2011-03-22 13:20:50 UTC
(In reply to comment #12)
> What's the update on this? Is this being queued for 5.6.z?

Yes, but it looks like it will be in build 4.

Comment 15 Jarod Wilson 2011-04-14 21:09:21 UTC
This change is already in 5.7 (c. 2.6.18-248.el5) by way of bug 660386, so I am simply going to move this bug to MODIFIED.

Comment 18 Jiri Pirko 2011-04-19 18:25:14 UTC

*** This bug has been marked as a duplicate of bug 660386 ***

Comment 19 Martin Prpič 2011-06-02 13:31:13 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, if a CT/ELS pass-through command timed out, the QLogic 8Gb Fibre Channel adapter created a firmware dump. With this update, firmware dumps are no longer created when CT/ELS pass-through requests time out as a firmware dump is not necessary in this case.

