Description of problem:
One of the cluster nodes out of 16 rebooted.

The /var/log/messages file recorded this:

Jul 29 13:49:01 abhdb015 sshd[11223]: Closing connection to 144.25.70.51
Jul 29 14:31:48 abhdb015 kernel: qla2xxx 0000:0e:00.0: RISC paused -- HCCR=0, Dumping firmware!
Jul 29 14:31:48 abhdb015 kernel: qla2xxx 0000:0e:00.0: Firmware dump saved to temp buffer (0/ffffc20010081000).
Jul 29 14:34:47 abhdb015 syslogd 1.4.1: restart.
Jul 29 14:34:47 abhdb015 kernel: klogd 1.4.1, log source = /proc/kmsg started.

Version-Release number of selected component (if applicable):
2.6.18-128.7.1.el5

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Fix from upstream is here:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=aed10881129c52f0e5dc1c96ac706b5ce7708a13
Please include the above patch in the next errata.
Reference: Oracle bug 8746702.
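For anyone checking whether other nodes hit the same event, here is a minimal sketch that scans syslog text for the "RISC paused" message quoted in the description. The function name `find_risc_pauses` and the regular expression are illustrative assumptions based only on the log lines shown here, with each message on a single line; other driver versions may word the message differently.

```python
import re

# Pattern derived from the log excerpt in this report (an assumption, not a
# format guaranteed by the qla2xxx driver): timestamp, host, PCI address.
RISC_PAUSED = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: qla2xxx (\S+): RISC paused"
)

def find_risc_pauses(log_text):
    """Return (timestamp, host, pci_address) for each 'RISC paused' line."""
    hits = []
    for line in log_text.splitlines():
        match = RISC_PAUSED.match(line)
        if match:
            hits.append(match.groups())
    return hits

# Sample taken from the description above, with wrapped lines rejoined.
SAMPLE = """\
Jul 29 13:49:01 abhdb015 sshd[11223]: Closing connection to 144.25.70.51
Jul 29 14:31:48 abhdb015 kernel: qla2xxx 0000:0e:00.0: RISC paused -- HCCR=0, Dumping firmware!
Jul 29 14:31:48 abhdb015 kernel: qla2xxx 0000:0e:00.0: Firmware dump saved to temp buffer (0/ffffc20010081000).
Jul 29 14:34:47 abhdb015 syslogd 1.4.1: restart.
"""

if __name__ == "__main__":
    for timestamp, host, pci in find_risc_pauses(SAMPLE):
        print(timestamp, host, pci)
```

To run it against a real node, read `/var/log/messages` into a string and pass it to `find_risc_pauses` in place of `SAMPLE`.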
This issue is related to a debug feature, not normal operation. As a result it received lower priority than other work in 5.6. This BZ is now queued for resolution in 5.7.
This looks like it is related to a debug feature of normal operation. We see this once every few months in production on RHEL5.3, 5.4, and 5.5 and can sort of live with it because we have dual HBAs with multipathing to try to avoid outages. Still would like to see it fixed though. Oracle's CentOS/Enterprise Linux 5u5 does have the fix applied.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 483550
0001-qla2xxx-Query-supported-RISC-registers-bits-in-deter.patch

Back ported the appropriate bits of http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=aed10881129c52f0e5dc1c96ac706b5ce7708a13.
Posted the following patches on 3/11/2011: qla2xxx: Query supported RISC registers bits in determining a paused-state.
Patch(es) available in kernel-2.6.18-248.el5.
Detailed testing feedback is always welcomed.
I use CentOS 5.5 with kernel 2.6.18-194.el5xen and had the same problem. As far as I know, the latest CentOS kernel is only 2.6.18-238.9.1.el5. Does it fix this bug?
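A rough way to compare a kernel release against 2.6.18-248.el5, the build named in this report as carrying the patch, is sketched below. This is a simplified numeric comparison, not RPM's own version comparison algorithm, and the names `FIXED_IN`, `release_key`, and `at_or_after_fix` are hypothetical. It only tells you whether a release string sorts at or after 2.6.18-248.el5; it cannot tell whether an earlier update kernel had the fix backported separately.

```python
import re

# Build named in this report as containing the patch.
FIXED_IN = "2.6.18-248.el5"

def release_key(kernel):
    """Split e.g. '2.6.18-238.9.1.el5' into ((2, 6, 18), (238, 9, 1)).

    Only the numeric fields before the '.elN' dist tag are used; suffixes
    such as 'xen' are ignored. This is a simplification of RPM ordering.
    """
    match = re.search(r"(\d+(?:\.\d+)*)-(\d+(?:\.\d+)*)\.el\d", kernel)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % kernel)
    version, release = match.groups()
    return (
        tuple(int(part) for part in version.split(".")),
        tuple(int(part) for part in release.split(".")),
    )

def at_or_after_fix(kernel, fixed_in=FIXED_IN):
    """True if 'kernel' sorts at or after the build carrying the patch."""
    return release_key(kernel) >= release_key(fixed_in)

if __name__ == "__main__":
    for name in ("2.6.18-194.el5xen", "2.6.18-238.9.1.el5", "2.6.18-248.el5"):
        print(name, at_or_after_fix(name))
```

On a running system, the string to check would come from `uname -r`.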
Chad, Any reproducer from customer?
(In reply to comment #16)
> Chad,
>
> Any reproducer from customer?

None that I am aware of.
Confirm patch in git tree.
No reproducer.

Sanity test passed:
1. Boot from SAN.
2. sysfs:
   add LUN
   resize (grow)
   dev_loss_tmo
   fast_io_fail_tmo
   delete LUN
3. kdump.
4. 4 hours I/O stress with multipath failover induced.

VERIFY.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html
*** Bug 605726 has been marked as a duplicate of this bug. ***