Description of problem:
On a RHEL 5.6 8G FC host, we occasionally see adapter firmware dumps during IO, as seen by the following entry in the messages file:

kernel: qla2xxx 0000:1a:00.1: Firmware dump saved to temp buffer (5/ffffc200101e4000

Version-Release number of selected component (if applicable):
RHEL 5.6 (2.6.18-238.el5) host using a QLE2562 8G FC adapter with inbox qla2xxx driver v8.03.01.05.05.06-k & fw v5.03.02

How reproducible:
Occasionally
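For anyone checking whether their own host hits the same condition, the log entry quoted above can be matched with a simple substring test. This is only a minimal sketch: the function name is hypothetical, and it assumes the message text is exactly as it appears in this report (other driver versions may word it differently).

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if a syslog line looks like the
 * qla2xxx firmware-dump notice quoted in this report. It only checks
 * for the driver name and the message text; it does not parse the
 * PCI address or the buffer pointer.
 */
bool is_qla2xxx_fw_dump_line(const char *line)
{
    return line != NULL &&
           strstr(line, "qla2xxx") != NULL &&
           strstr(line, "Firmware dump saved to temp buffer") != NULL;
}
```

Feeding each line of /var/log/messages through this check and counting the hits gives a rough idea of how often the dump is triggered.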
Created attachment 476360 [details] Firmware dump
Created attachment 476362 [details] /var/log/messages
We analyzed the firmware dump and found that it was caused by a CT passthrough command (origin unknown) that timed out. The only other reason the driver would automatically take a firmware dump is if the ASIC was paused while servicing an interrupt. That condition is not present in the firmware dump provided, but there was a CT passthrough command waiting for resources, which would have caused the timeout. We really shouldn't be doing a firmware dump here, so the resolution will be to simply remove it. This will be part of the patchset we provide for RHEL 5.7 in Bug 660386.
(In reply to comment #3)
> We really shouldn't be doing a firmware dump here so the resolution will be to
> simply remove it. This will be part of the patchset we provide for RHEL 5.7 in
> Bug 660386.

So could we please have this patch included in 5.6.z as well? That would be really helpful to us.
Martin indicates that this fw dump message is accompanied by error recovery steps like aborts & device resets (as seen in the /var/log/messages), which show up as IO disruptions/delays on the host. As we know, these error recovery paths often lead to even more serious problems. This, plus the fact that the fix is very simple, is why the fix is proposed for the z-stream.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This will be fixed when we (QLogic) post our RHEL 5.7 patchset shortly. Specifically, this is the fix that will need to be backported:

diff --git a/drivers/scsi/qla2xxx/qla_attr.c b/drivers/scsi/qla2xxx/qla_attr.c
index 91c8d25..f91fde5 100644
--- a/drivers/scsi/qla2xxx/qla_attr.c
+++ b/drivers/scsi/qla2xxx/qla_attr.c
@@ -622,13 +622,6 @@ qla2x00_wait_for_passthru_completion(struct scsi_qla_host *ha)
 	    timeout)) {
 		DEBUG2(qla_printk(KERN_WARNING, ha,
 		    "Passthru request timed out.\n"));
-		if (IS_QLA82XX(ha)) {
-			set_bit(FCOE_CTX_RESET_NEEDED, &ha->dpc_flags);
-		} else {
-			ha->isp_ops->fw_dump(ha, 0);
-			set_bit(ISP_ABORT_NEEDED, &ha->dpc_flags);
-		}
-		qla2xxx_wake_dpc(ha);
 		ha->pass_thru_cmd_result = 0;
 		ha->pass_thru_cmd_in_process = 0;
 	}

Once the 5.6.z bugzilla is opened, I'll be able to provide a backported point fix.
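To make the behavioral change in the diff above easier to see, here is a small user-space model of the timeout path before and after the patch. It is a sketch only, not driver code: `struct model_host`, the `MODEL_*` flag values, and both function names are hypothetical stand-ins, and the real driver uses set_bit() on dpc_flags and calls into isp_ops rather than bumping counters.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's dpc_flags bits (values are illustrative). */
enum {
    MODEL_ISP_ABORT_NEEDED      = 1 << 0,
    MODEL_FCOE_CTX_RESET_NEEDED = 1 << 1
};

/* Hypothetical, reduced model of the host state touched by the diff. */
struct model_host {
    bool is_qla82xx;               /* models IS_QLA82XX(ha) */
    unsigned int dpc_flags;        /* models ha->dpc_flags */
    int fw_dumps_taken;            /* counts modeled fw_dump() calls */
    bool dpc_woken;                /* models qla2xxx_wake_dpc(ha) */
    int pass_thru_cmd_result;
    int pass_thru_cmd_in_process;
};

/* Timeout path before the patch: dump and/or reset, then clear state. */
void passthru_timeout_before(struct model_host *ha)
{
    if (ha->is_qla82xx) {
        ha->dpc_flags |= MODEL_FCOE_CTX_RESET_NEEDED;
    } else {
        ha->fw_dumps_taken++;
        ha->dpc_flags |= MODEL_ISP_ABORT_NEEDED;
    }
    ha->dpc_woken = true;
    ha->pass_thru_cmd_result = 0;
    ha->pass_thru_cmd_in_process = 0;
}

/* Timeout path after the patch: only the passthru state is cleared. */
void passthru_timeout_after(struct model_host *ha)
{
    ha->pass_thru_cmd_result = 0;
    ha->pass_thru_cmd_in_process = 0;
}
```

In this model, a timed-out passthru request on a non-82xx adapter leaves fw_dumps_taken and dpc_flags untouched after the patch, which corresponds to the dump and the ISP abort no longer being triggered.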
Created attachment 483557 [details]
0001-qla2xxx-Do-not-perform-reset-fw-dump-if-CT-ELS-passt.patch

RHEL 5.6.z backport of "qla2xxx: Do not perform reset/fw-dump if CT/ELS passthru requests timeout.".
The RHEL 5.7 version of this patch was posted as part of the patch set from Bug 660386. The RHEL 5.6.z backport was posted on 3/11/2011 as the following patch: qla2xxx: Do not perform reset/fw-dump if CT/ELS passthru requests timeout.
What's the update on this? Is this being queued for 5.6.z?
(In reply to comment #12)
> What's the update on this? Is this being queued for 5.6.z?

Yes, but it looks like it will be in build 4.
This change is already in 5.7 (c. 2.6.18-248.el5) by way of bug 660386, so I am simply going to move this bug to MODIFIED.
*** This bug has been marked as a duplicate of bug 660386 ***
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Prior to this update, if a CT/ELS pass-through command timed out, the QLogic 8Gb Fibre Channel adapter created a firmware dump. With this update, firmware dumps are no longer created when CT/ELS pass-through requests time out, as a firmware dump is not necessary in this case.