Bug 132713 - HBA lockups with CX600, powerpath and RH qla2300 driver
Status: CLOSED DUPLICATE of bug 78616
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Arjan van de Ven
QA Contact: Brian Brock
Reported: 2004-09-16 01:45 EDT by James Bourne
Modified: 2007-11-30 17:06 EST
Doc Type: Bug Fix
Last Closed: 2006-02-21 14:05:41 EST


Attachments
ps, /proc/scsi/emcp, /proc/scsi/scsi output (20.00 KB, text/plain)
2004-09-16 01:46 EDT, James Bourne

Description James Bourne 2004-09-16 01:45:04 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)
Gecko/20040803

Description of problem:
During tape backups (spectralogic gator 12k), and while RMAN on
multiple remote servers is dumping via NFS to a disk volume from the
CX600 mounted on the local machine, the HBAs lock up and cause
processes to enter an uninterruptible sleep state.  Multipath support
is provided by EMC PowerPath (3.0.6-10), and the QLogic driver is the
stock 6.04 contained in the .49 enterprise kernel.

At this point we cannot get any information from
/proc/scsi/qla2300/4 or /proc/scsi/qla2300/5 (cat is blocked), data
access on the mountpoint for the CX600 disk blocks, and the backup
software (Legato NetWorker 7.1.2) is also blocked.  See the attached tar
file with two ps outputs and the contents of /proc/scsi/emcp and
/proc/scsi/scsi.
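
A minimal sketch of how processes stuck in uninterruptible sleep (state
"D") could be listed on a current Linux system with Python 3, assuming the
usual /proc/<pid>/stat layout.  The function names parse_stat_line and
blocked_processes are hypothetical, and the attached ps output was not
produced this way.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for processes whose state field is 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'scsi' or 'self'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

The proc_root parameter exists only so the scan can be pointed at a test
directory; on a live system the default /proc is what you would use.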

Currently rebooted into 2.4.9-e.48 with the QLogic 2300 6.04.00 driver built on
the system.  This configuration was stable for some months.

Version-Release number of selected component (if applicable):
kernel-enterprise-2.4.9-e.49

How reproducible:
Sometimes

Steps to Reproduce:
1. High throughput to SAN device using powerpath
2. Performing tape backups


Actual Results:  On most occasions the system works as expected, but
six times since upgrading to 2.4.9-e.49enterprise on Aug. 31st the
system has locked up during unattended backups.

Expected Results:  System does not lock up.

Additional info:
Comment 1 James Bourne 2004-09-16 01:46:52 EDT
Created attachment 103894
ps, /proc/scsi/emcp, /proc/scsi/scsi output
Comment 2 Arjan van de Ven 2004-09-16 03:18:41 EDT
Please try to reproduce this without binary-only modules in play, or
report this bug either to EMC or via RH's support organisation;
they can escalate things to EMC while we in engineering cannot.

*** This bug has been marked as a duplicate of 78616 ***
Comment 3 James Bourne 2004-10-02 07:25:24 EDT
Arjan,
In the future it would be extremely helpful to also point people to
something you know about, such as bug ID 103300, which is very similar
to the problem we are seeing.

It is very easy to say it's someone else's problem (which it very well
may be, I'm the first to admit that) but when there has already been
an active discussion about this issue, please forward people to that
discussion as well as it would have given me an additional lead to follow.

I'm not saying don't close the call, I'm only asking in the future to
provide information to people reporting bugs when the information may
be directly relevant.

Thanks and regards
James
Comment 4 Arjan van de Ven 2004-10-02 07:58:19 EDT
"I'm not saying don't close the call"

I think you misunderstand what bugzilla is. Bugzilla is *NOT* support.
Let me repeat that: Bugzilla is *NOT* support.

Bugzilla is a backdoor into engineering to report defects; hopefully
in such a way that they contain enough information that engineering
can do something with it. You are using a binary only kernel module
which means we need to get the vendor of that module involved in the
diagnosis; for that you really need to contact Red Hat Support and not
engineering, as I said they are the group that can work with and
escalate to EMC.
Comment 5 James Bourne 2004-10-02 08:09:28 EDT
Sorry, that was a mistype on my part.  I did mean "I'm not saying
don't close the bug"....  The changes that make RHEL 3 better may be
able to be backported from RHEL 3 to RHEL 2.1 (may be able to be, not
can be, as you'd know better than I would).

And yes, that's what we ended up doing (powerpath issue so we opened a
call with Dell/EMC).  If I had known about the other bug ID sooner I
may have been able to get either escalation sooner or schedule an
upgrade to RHEL 3 as it sounds like that issue is fixed or at least
greatly reduced in RHEL 3.

Comment 6 Red Hat Bugzilla 2006-02-21 14:05:41 EST
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.
