Bug 604134
Summary: | [NetApp 5.5 bug] Kernel panic hit on RHEL 5.5 FC host with QLogic external driver | |
---|---|---|---
Product: | Red Hat Enterprise Linux 5 | Reporter: | Martin George <marting>
Component: | kernel | Assignee: | Chad Dupuis (Cavium) <cdupuis>
Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 5.5.z | CC: | andrew.vasquez, andriusb, coughlan, lalit.chandivade, xdl-redhat-bugzilla
Target Milestone: | rc | Keywords: | OtherQA
Target Release: | 5.6 | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2010-06-15 18:31:31 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 557597 | |
Description
Martin George
2010-06-15 13:37:43 UTC
Red Hat has no means to test on external drivers. If another bugzilla already reported this with the inbox driver, this can be closed.

Let's have QLogic look into the inbox driver issue first.
*** This bug has been marked as a duplicate of bug 598946 ***

Martin,
the test driver lalit sent to you has the following single
line change:
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 15f1f79..08de61d 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -510,7 +510,7 @@ qla24xx_queuecommand(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *))
>      }
>
>      /* Close window on fcport/rport state-transitioning. */
> -    if (fcport->drport) {
> +    if (!fcport || fcport->drport) {
>          cmd->result = DID_IMM_RETRY << 16;
>          goto qc24_fail_command;
>      }
but...that really just works around a larger problem, as the
fcport is derived from the scsi-device's hostdata scratchpad:
static int
qla2x00_queuecommand(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *))
{
    scsi_qla_host_t *ha = to_qla_host(cmd->device->host);
    fc_port_t *fcport = (struct fc_port *) cmd->device->hostdata;
    struct fc_rport *rport = starget_to_rport(scsi_target(cmd->device));
    srb_t *sp;
    int rval;
hostdata is cleared only when slave_destroy() is called by the
midlayer:
static void
qla2xxx_slave_destroy(struct scsi_device *sdev)
{
    sdev->hostdata = NULL;
}
I wouldn't expect the midlayer to send down (via queuecommand())
requests for a reaped scsi-device. We can add the workaround
code, but we'd need to understand why the midlayer is sending
these scsi-commands down in the first place.
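To make the failure mode described above concrete, here is a minimal user-space sketch. It is not driver code: the `model_*` types and functions are hypothetical stand-ins for `struct scsi_device`, `fc_port_t`, `qla2xxx_slave_destroy()` and the top of queuecommand, and they only model the one fact stated in this comment, namely that the fcport pointer is read straight from hostdata, which is cleared at slave_destroy time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct model_fcport { int drport; };
struct model_sdev   { void *hostdata; };

enum model_result { MODEL_QUEUED, MODEL_RPORT_BUSY, MODEL_NO_FCPORT };

/* Models qla2xxx_slave_destroy(): the only place hostdata is cleared. */
static void model_slave_destroy(struct model_sdev *sdev)
{
    sdev->hostdata = NULL;
}

/* Models the start of queuecommand: fcport is taken directly from hostdata. */
static enum model_result model_queuecommand(struct model_sdev *sdev)
{
    struct model_fcport *fcport;

    assert(sdev != NULL);
    fcport = sdev->hostdata;

    /* Without this guard, the fcport->drport test below would
     * dereference NULL for a device that was already destroyed. */
    if (fcport == NULL)
        return MODEL_NO_FCPORT;

    if (fcport->drport)
        return MODEL_RPORT_BUSY;

    return MODEL_QUEUED;
}
```

Calling `model_queuecommand()` after `model_slave_destroy()` takes the `MODEL_NO_FCPORT` path, which corresponds to the case where the unguarded driver would panic. The sketch does not explain why the midlayer would issue such a command; that question remains open in this comment.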
This bz should probably be reopened as it's actually not a duplicate of 598946.

(In reply to comment #4)
> This bz should probably be reopened as it's actually not a duplicate of 598946.
We don't usually troubleshoot out-of-box drivers, so although this is CLOSED as a dupe, it should really be CLOSED WONTFIX. The only reason this was closed as a dupe was because we were under the impression a firmware update would clear up both inbox and out-of-box drivers.

> We don't usually troubleshoot out-of-box drivers, so although this is CLOSED as
> a dupe, it should really be CLOSED WONTFIX. The only reason this was closed as
> a dupe was because we were under the impression a firmware update would clear
> up both inbox and out-of-box drivers.
Our concern here is that this could also affect the RHEL 5.6 inbox driver. Would it be more appropriate to open another bz for RHEL 5.6?
> Our concern here is that could also affect RHEL 5.6 inbox. Would it be more
> appropriate to open another bz for RHEL 5.6?
I'm still confused how an out-of-box driver would affect an inbox driver.
> I'm still confused how an out-of-box driver would affect an inbox driver.
Even though the other driver is out of box, they both share the same queuecommand behavior. The one-line patch listed above would also apply to the RHEL 5 inbox driver:
    /* Close window on fcport/rport state-transitioning. */
    if (fcport->drport) {
        cmd->result = DID_IMM_RETRY << 16;
        goto qc_fail_command;
    }
Also, the FC transport behavior would be the same in both instances.
(In reply to comment #3)
> Martin,
>
> the test driver lalit sent to you has the following single
> line change:
>
Andrew,
We've not hit the kernel panic with the external test driver (DVR:v8.03.01.07.05.06-k-test FW:v5.03.02) so far.

(In reply to comment #9)
> (In reply to comment #3)
> > Martin,
> >
> > the test driver lalit sent to you has the following single
> > line change:
> >
> Andrew,
> We've not hit the kernel panic with the external test driver
> (DVR:v8.03.01.07.05.06-k-test FW:v5.03.02) so far.

(In reply to comment #3)
> Martin,
> the test driver lalit sent to you has the following single
> line change:
> > diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> > index 15f1f79..08de61d 100644
> > --- a/drivers/scsi/qla2xxx/qla_os.c
> > +++ b/drivers/scsi/qla2xxx/qla_os.c
> > @@ -510,7 +510,7 @@ qla24xx_queuecommand(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *))
> >      }
> >
> >      /* Close window on fcport/rport state-transitioning. */
> > -    if (fcport->drport) {
> > +    if (!fcport || fcport->drport) {
> >          cmd->result = DID_IMM_RETRY << 16;
> >          goto qc24_fail_command;
> >      }
> but...that really just works around a larger problem, as the
Actually, the fix I provided earlier could lead to a system hang, as we do an immediate retry if fcport is NULL. The correct workaround would be:

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 15f1f79..60f16b6 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -510,6 +510,11 @@ qla24xx_queuecommand(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *))
     }

     /* Close window on fcport/rport state-transitioning. */
+    if (!fcport) {
+        cmd->result = DID_NO_CONNECT << 16;
+        goto qc24_fail_command;
+    }
+
     if (fcport->drport) {
         cmd->result = DID_IMM_RETRY << 16;
         goto qc24_fail_command;
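The difference between the two workarounds can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `queue_v1`, `queue_v2`, `model_dispatch` and the `MODEL_*` status values are hypothetical names, and the dispatch loop is a simplified stand-in for the midlayer that assumes an immediate-retry result is re-submitted right away with nothing else changing in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for DID_IMM_RETRY / DID_NO_CONNECT. */
enum model_status { MODEL_OK, MODEL_IMM_RETRY, MODEL_NO_CONNECT };

struct model_fcport { int drport; };

/* First workaround: a NULL fcport is treated like a transitioning rport. */
static enum model_status queue_v1(const struct model_fcport *fcport)
{
    if (!fcport || fcport->drport)
        return MODEL_IMM_RETRY;
    return MODEL_OK;
}

/* Revised workaround: a NULL fcport fails the command outright. */
static enum model_status queue_v2(const struct model_fcport *fcport)
{
    if (!fcport)
        return MODEL_NO_CONNECT;
    if (fcport->drport)
        return MODEL_IMM_RETRY;
    return MODEL_OK;
}

typedef enum model_status (*model_queue_fn)(const struct model_fcport *);

/*
 * Simplified dispatcher: re-submits while the result is an immediate retry.
 * Returns the attempt on which the command completed (successfully or with
 * an error), or -1 if it was still being retried after max_attempts.
 */
static int model_dispatch(model_queue_fn queue,
                          const struct model_fcport *fcport,
                          int max_attempts)
{
    int attempt;

    assert(max_attempts > 0);
    for (attempt = 1; attempt <= max_attempts; attempt++) {
        if (queue(fcport) != MODEL_IMM_RETRY)
            return attempt;
    }
    return -1;
}
```

In this model a NULL fcport never stops retrying under `queue_v1`, because nothing can make the pointer valid again, while `queue_v2` completes the command with an error on the first attempt. Both variants behave the same for a valid fcport.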