From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.3) Gecko/20040803

Description of problem:
During tape backups (spectralogic gator 12k), and while RMAN on multiple remote servers is dumping via NFS to a disk volume from the CX600 mounted on the local machine, the HBAs lock up and cause processes to enter an uninterruptible sleep state. Multipath support is provided by EMC PowerPath (3.0.6-10), and the qlogic driver is the stock 6.04 contained in the .49 enterprise kernel.

At this point we cannot get any information from /proc/scsi/qla2300/4 or /proc/scsi/qla2300/5 (cat is blocked), data access on the mountpoint for the CX600 disk blocks, and the backup software (Legato NetWorker 7.1.2) is also blocked. See the attached tar file with 2 ps outputs and the contents of /proc/scsi/emcp and /proc/scsi/scsi.

We have currently rebooted to 2.4.9-e.48 with the qlogic 2300 6.04.00 driver built on the system. This configuration was stable for some months.

Version-Release number of selected component (if applicable):
kernel-enterprise-2.4.9-e.49

How reproducible:
Sometimes

Steps to Reproduce:
1. High throughput to SAN device using powerpath
2. Performing tape backups

Actual Results:
On most occasions the system works as expected. Six times now since upgrading to 2.4.9-e.49enterprise on Aug. 31st, the system has locked up during unattended backups.

Expected Results:
System does not lock up.

Additional info:
Created attachment 103894: ps, /proc/scsi/emcp, /proc/scsi/scsi output
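For anyone triaging a similar hang, here is a minimal sketch of how the blocked processes could be listed without relying on ps. It assumes Python is available on the affected host and that /proc/<pid>/stat has the usual "pid (comm) state ..." layout, where state D means uninterruptible sleep. The function names `parse_stat` and `blocked_processes` are hypothetical, chosen only for this illustration; this is not the method used to produce the attachment.

```python
import os


def parse_stat(stat_text):
    """Parse the text of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren].strip())
    comm = stat_text[open_paren + 1:close_paren]
    state = stat_text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """Return (pid, comm) for every process found in state 'D'."""
    blocked = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return blocked  # no procfs at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Run during a hang, this would be expected to list the processes stuck in state D, such as the blocked cat on /proc/scsi/qla2300/4 described above.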
Please try to reproduce this without binary only modules in play, report this bug to either EMC or go via RH's support organisation; they can escalate things to EMC while we in engineering cannot.

*** This bug has been marked as a duplicate of 78616 ***
Arjan,

In the future it would be extremely helpful to also point people to something you know about, such as bug ID 103300, which is very similar to the problem we are seeing. It is very easy to say it's someone else's problem (which it very well may be, I'm the first to admit that), but when there has already been an active discussion about this issue, please forward people to that discussion as well, as it would have given me an additional lead to follow.

I'm not saying don't close the call; I'm only asking that in the future you provide information to people reporting bugs when that information may be directly relevant.

Thanks and regards
James
"I'm not saying don't close the call" I think you misunderstand what bugzilla is. Bugzilla is *NOT* support. Let me repeat that: Bugzilla is *NOT* support. Bugzilla is a backdoor into engineering to report defects; hopefully in such a way that they contain enough information that engineering can do something with it. You are using a binary only kernel module which means we need to get the vendor of that module involved in the diagnosis; for that you really need to contact Red Hat Support and not engineering, as I said they are the group that can work with and escalate to EMC.
Sorry, that was a mistype on my part. I did mean "I'm not saying don't close the bug"....

The changes that make RHEL 3 better may be able to be backported from RHEL 3 to RHEL 2.1 (may be able to be, not can be, as you'd know better than I would).

And yes, that's what we ended up doing (it is a powerpath issue, so we opened a call with Dell/EMC). If I had known about the other bug ID sooner, I might have been able to either get escalation sooner or schedule an upgrade to RHEL 3, as it sounds like that issue is fixed or at least greatly reduced in RHEL 3.
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.