Bug 132713

Summary: HBA lockups with CX600, powerpath and RH qla2300 driver
Product: Red Hat Enterprise Linux 2.1
Reporter: James Bourne <jbourne>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 2.1
CC: jbaron, riel
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-02-21 19:05:41 UTC

Attachments:

Description	Flags
ps, /proc/scsi/emcp, /proc/scsi/scsi output	none

Description James Bourne 2004-09-16 05:45:04 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20040803

Description of problem:
During tape backups (spectralogic gator 12k), and while RMAN on
multiple remote servers is dumping via NFS to a disk volume from the
CX600 mounted on the local machine, the HBAs lock up and cause
processes to enter an uninterruptible sleep state.  Multipath support
is provided by EMC powerpath (3.0.6-10), and the qlogic driver is the
stock 6.04 contained in the .49 enterprise kernel.
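
Processes in uninterruptible sleep show up with state "D" in the third
field of /proc/<pid>/stat. The following is a minimal sketch, not part
of the original report, of how the blocked processes could be listed
without relying on ps. It assumes a modern Python 3 interpreter (which
RHEL 2.1 itself does not ship), and the function names parse_stat and
uninterruptible_processes are hypothetical.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split around the first "(" and the last ")".
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx])
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or entry is unreadable
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Running this periodically during a backup window would give a
timestamped record of which processes were stuck when the lockup began.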

At this point we cannot get any information from
/proc/scsi/qla2300/4 or /proc/scsi/qla2300/5 (cat is blocked), data
access on the mountpoint for the CX600 disk blocks, and the backup
software (legato networker 7.1.2) is also blocked.  See the attached tar
file with two ps outputs and the contents of /proc/scsi/emcp and
/proc/scsi/scsi.
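
Because a plain cat of the driver's /proc entries hangs the shell that
runs it, a diagnostic script can do the read in a child process and
give up after a timeout. This is a hedged sketch, not something taken
from the original report: the helper name probe_read and the 5-second
default are assumptions, and it requires Python 3. Note that a child
stuck in uninterruptible sleep will not die on SIGKILL until its I/O
completes, so the parent deliberately does not wait for it.

```python
import subprocess


def probe_read(path, timeout=5.0):
    """Read `path` via cat in a child process.

    Returns the text, or None if the read timed out or cat failed.
    """
    child = subprocess.Popen(
        ["cat", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    try:
        out, _ = child.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait(): a reader stuck in state D
        # would make the parent hang as well.
        child.kill()
        return None
    if child.returncode != 0:
        return None
    return out.decode(errors="replace")


if __name__ == "__main__":
    for path in ("/proc/scsi/qla2300/4", "/proc/scsi/qla2300/5"):
        result = probe_read(path)
        print(path, "blocked or unreadable" if result is None else "ok")
```

A None result does not distinguish a hung read from a missing file, so
check separately that the path exists before treating it as a lockup.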

Currently rebooted into 2.4.9-e.48 with the qlogic 2300 6.04.00 driver
built on the system.  This configuration was stable for some months.

Version-Release number of selected component (if applicable):
kernel-enterprise-2.4.9-e.49

How reproducible:
Sometimes

Steps to Reproduce:
1. High throughput to SAN device using powerpath
2. Performing tape backups


Actual Results:  On most occasions the system works as expected.  Six
times now since upgrading to 2.4.9-e.49enterprise on Aug. 31st, the
system has locked up during unattended backups.

Expected Results:  System does not lock up.

Additional info:

Comment 1 James Bourne 2004-09-16 05:46:52 UTC

Created attachment 103894 [details]
ps, /proc/scsi/emcp, /proc/scsi/scsi output

Comment 2 Arjan van de Ven 2004-09-16 07:18:41 UTC

Please try to reproduce this without binary-only modules in play;
otherwise, either report this bug to EMC or go via RH's support
organisation, which can escalate things to EMC while we in engineering
cannot.

*** This bug has been marked as a duplicate of 78616 ***

Comment 3 James Bourne 2004-10-02 11:25:24 UTC

Arjan,
In the future it would be extremely helpful to also point people to
something you know about, such as bug ID 103300, which is very similar
to the problem we are seeing.

It is very easy to say it's someone else's problem (which it very well
may be, I'm the first to admit that), but when there has already been
an active discussion about this issue, please forward people to that
discussion as well; it would have given me an additional lead to follow.

I'm not saying don't close the call; I'm only asking that in the future
you provide information to people reporting bugs when that information
may be directly relevant.

Thanks and regards
James

Comment 4 Arjan van de Ven 2004-10-02 11:58:19 UTC

"I'm not saying don't close the call"

I think you misunderstand what bugzilla is. Bugzilla is *NOT* support.
Let me repeat that: Bugzilla is *NOT* support.

Bugzilla is a backdoor into engineering to report defects, hopefully
in such a way that they contain enough information that engineering
can do something with them. You are using a binary-only kernel module,
which means we need to get the vendor of that module involved in the
diagnosis; for that you really need to contact Red Hat Support and not
engineering. As I said, they are the group that can work with and
escalate to EMC.

Comment 5 James Bourne 2004-10-02 12:09:28 UTC

Sorry, that was a mistype on my part.  I did mean "I'm not saying
don't close the bug"....  The changes that make RHEL 3 better might be
back-portable from RHEL 3 to RHEL 2.1 (might be, not can be, as you'd
know better than I would).

And yes, that's what we ended up doing (powerpath issue, so we opened a
call with Dell/EMC).  If I had known about the other bug ID sooner, I
might have been able to either get escalation sooner or schedule an
upgrade to RHEL 3, as it sounds like that issue is fixed or at least
greatly reduced in RHEL 3.

Comment 6 Red Hat Bugzilla 2006-02-21 19:05:41 UTC

Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.