Red Hat Bugzilla – Bug 132713
HBA lockups with CX600, powerpath and RH qla2300 driver
Last modified: 2007-11-30 17:06:54 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3)
Description of problem:
During tape backups (Spectra Logic Gator 12K), and while RMAN on
multiple remote servers is dumping via NFS to a disk volume from the
CX600 mounted on the local machine, the HBAs lock up and cause
processes to enter an uninterruptible sleep state. Multipath support
is provided by EMC PowerPath (3.0.6-10), and the QLogic driver is the
stock 6.04 contained in the .49 enterprise kernel.
At this point we cannot gain any information from
/proc/scsi/qla2300/4 or /proc/scsi/qla2300/5 (cat is blocked), data
access on the mountpoint for the CX600 disk blocks, and the backup
software (Legato NetWorker 7.1.2) is also blocked. See attached tar
file with 2 ps output and contents of /proc/scsi/emcp and
We have currently rebooted to 2.4.9-e.48 with the QLogic 2300 6.04.00
driver built on the system. This configuration was stable for some months.
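For anyone trying to capture the same state during a hang, a minimal
sketch of listing the processes stuck in uninterruptible sleep (state
"D") is below. It assumes a Python interpreter is available on the
affected host and that /proc/<pid>/stat is still readable while the
HBAs are locked up; neither was verified on this system. The function
names are hypothetical and only for illustration.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Run periodically during the backup window, this would show which
processes block first, which may help narrow down whether the backup
software, the NFS daemons, or something else hits the stalled path
before the rest.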
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. High throughput to SAN device using powerpath
2. Performing tape backups
Actual Results: On most occasions the system works as expected. Since
upgrading to 2.4.9-e.49enterprise on Aug. 31st, the system has locked
up six times during unattended backups.
Expected Results: System does not lock up.
Created attachment 103894 [details]
ps, /proc/scsi/emcp, /proc/scsi/scsi output
Please try to reproduce this without binary-only modules in play, or
report this bug to EMC, or go via RH's support organisation;
they can escalate things to EMC while we in engineering cannot.
*** This bug has been marked as a duplicate of 78616 ***
In the future it would be extremely helpful to also point people to
something you know about, such as bug ID 103300, which is very similar
to the problem we are seeing.
It is very easy to say it's someone else's problem (which it very well
may be, I'm the first to admit that), but when there has already been
an active discussion about this issue, please forward people to that
discussion as well, as it would have given me an additional lead to follow.
I'm not saying don't close the call; I'm only asking that in the future
you provide information to people reporting bugs when the information
may be directly relevant.
Thanks and regards
"I'm not saying don't close the call"
I think you misunderstand what Bugzilla is. Bugzilla is *NOT* support.
Let me repeat that: Bugzilla is *NOT* support.
Bugzilla is a backdoor into engineering to report defects; hopefully
in such a way that they contain enough information that engineering
can do something with it. You are using a binary only kernel module
which means we need to get the vendor of that module involved in the
diagnosis; for that you really need to contact Red Hat Support and not
engineering, as I said they are the group that can work with and
escalate to EMC.
Sorry, that was a mistype on my part. I did mean "I'm not saying
don't close the bug".... The changes that make RHEL 3 better may be
able to be backported from RHEL 3 to RHEL 2.1 (may be able to be, not
can be, as you'd know better than I would).
And yes, that's what we ended up doing (PowerPath issue, so we opened a
call with Dell/EMC). If I had known about the other bug ID sooner, I
might have been able to either get escalation sooner or schedule an
upgrade to RHEL 3, as it sounds like that issue is fixed or at least
greatly reduced in RHEL 3.
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.