Bug 537277 - KERNEL: QLA2XXX 0000:0E:00.0: RISC PAUSED -- HCCR=0, DUMPING FIRMWARE!
Summary: KERNEL: QLA2XXX 0000:0E:00.0: RISC PAUSED -- HCCR=0, DUMPING FIRMWARE!
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Chad Dupuis (Cavium)
QA Contact: Gris Ge
URL:
Whiteboard:
Duplicates: 605726
Depends On:
Blocks: 605694
Reported: 2009-11-13 01:16 UTC by Guru Anbalagane
Modified: 2018-11-14 17:01 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-21 10:17:35 UTC
Target Upstream Version:
Embargoed:


Attachments
0001-qla2xxx-Query-supported-RISC-registers-bits-in-deter.patch (948 bytes, patch)
2011-03-10 19:49 UTC, Chad Dupuis (Cavium)


Links
System: Red Hat Product Errata
ID: RHSA-2011:1065
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update
Last Updated: 2011-07-21 09:21:37 UTC

Description Guru Anbalagane 2009-11-13 01:16:42 UTC
Description of problem:
One of the 16 cluster nodes rebooted.
@ .
@ The /var/log/messages file recorded this:
@ .
@ Jul 29 13:49:01 abhdb015 sshd[11223]: Closing connection to 144.25.70.51
@ Jul 29 14:31:48 abhdb015 kernel: qla2xxx 0000:0e:00.0: RISC paused -- HCCR=0, Dumping firmware!
@ Jul 29 14:31:48 abhdb015 kernel: qla2xxx 0000:0e:00.0: Firmware dump saved to temp buffer (0/ffffc20010081000).
@ Jul 29 14:34:47 abhdb015 syslogd 1.4.1: restart.
@ Jul 29 14:34:47 abhdb015 kernel: klogd 1.4.1, log source = /proc/kmsg started.
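
To check other nodes for the same event, the log excerpt above can be searched mechanically. The following is a minimal sketch, not part of the original report: the function name `find_risc_paused` is hypothetical, the pattern is inferred only from the single message shown here, and the HCCR value is assumed to be printed in hexadecimal.

```python
import re

# Pattern inferred from the one log line in this report; other driver
# versions may word the message differently (assumption).
RISC_PAUSED = re.compile(
    r"qla2xxx (?P<pci>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"RISC paused -- HCCR=(?P<hccr>[0-9a-fA-F]+)"
)


def find_risc_paused(lines):
    """Return a (pci_address, hccr_value) pair for each matching log line."""
    hits = []
    for line in lines:
        match = RISC_PAUSED.search(line)
        if match:
            # HCCR is assumed to be hexadecimal in the log message.
            hits.append((match.group("pci"), int(match.group("hccr"), 16)))
    return hits


sample = [
    "Jul 29 13:49:01 abhdb015 sshd[11223]: Closing connection to 144.25.70.51",
    "Jul 29 14:31:48 abhdb015 kernel: qla2xxx 0000:0e:00.0: "
    "RISC paused -- HCCR=0, Dumping firmware!",
]
print(find_risc_paused(sample))
```

In practice the lines would come from /var/log/messages, for example `find_risc_paused(open("/var/log/messages"))`. If syslog wraps the message across two lines, the match still works because the pattern stops at the HCCR value.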

Version-Release number of selected component (if applicable):
2.6.18-128.7.1.el5

How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Fix from upstream is here:
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=aed10881129c52f0e5dc1c96ac706b5ce7708a13


Please include the above patch in the next errata. 

Reference: Oracle bug 8746702.

Comment 3 Tom Coughlan 2010-12-10 14:24:33 UTC
This issue is related to a debug feature, not normal operation. As a result it received lower priority than other work in 5.6. This BZ is now queued for resolution in 5.7.

Comment 4 Gerard de Vos 2010-12-28 16:49:27 UTC
This looks like it is related to a debug feature of normal operation. We see this once every few months in production on RHEL5.3, 5.4, and 5.5 and can sort of live with it because we have dual HBAs with multipathing to try to avoid outages. Still would like to see it fixed though. Oracle's CentOS/Enterprise Linux 5u5 does have the fix applied.

Comment 5 RHEL Program Management 2011-02-01 17:02:51 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Chad Dupuis (Cavium) 2011-03-10 19:49:22 UTC
Created attachment 483550
0001-qla2xxx-Query-supported-RISC-registers-bits-in-deter.patch

Back ported the appropriate bits of http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=aed10881129c52f0e5dc1c96ac706b5ce7708a13.

Comment 11 Chad Dupuis (Cavium) 2011-03-11 15:58:05 UTC
Posted the following patches on 3/11/2011:

qla2xxx: Query supported RISC registers bits in determining a paused-state.

Comment 13 Jarod Wilson 2011-03-16 18:00:46 UTC
Patch(es) available in kernel-2.6.18-248.el5
Detailed testing feedback is always welcomed.
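
A quick way to tell whether a given kernel release is at or past the build named above is a numeric comparison of the release string. This is a simplified sketch under stated assumptions: `build_tuple` and `has_fix` are hypothetical helper names, only the leading numeric fields are compared, and the result says nothing about fixes backported into an earlier z-stream build. A full comparison would use RPM's own version-comparison rules.

```python
import re

# kernel-2.6.18-248.el5 is the first build reported to carry the patch.
FIXED_BUILD = (2, 6, 18, 248)


def build_tuple(release):
    """Parse e.g. '2.6.18-238.9.1.el5' into (2, 6, 18, 238, 9, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-([\d.]+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    base = [int(match.group(i)) for i in (1, 2, 3)]
    # The build field may end with a dot before 'el5'; skip empty parts.
    build = [int(part) for part in match.group(4).split(".") if part]
    return tuple(base + build)


def has_fix(release):
    """True if the release number is at or above the fixed build (numeric only)."""
    return build_tuple(release) >= FIXED_BUILD


for rel in ("2.6.18-128.7.1.el5", "2.6.18-194.el5xen", "2.6.18-248.el5"):
    print(rel, has_fix(rel))
```

The release of the running kernel can be obtained with `uname -r` or `platform.release()` and passed to `has_fix`.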

Comment 15 Vũ Minh Giang 2011-04-27 06:51:38 UTC
I use CentOS 5.5 with kernel 2.6.18-194.el5xen and had the same problem.
As far as I know, the latest CentOS kernel is only 2.6.18-238.9.1.el5.
Does that kernel fix this bug?

Comment 16 Gris Ge 2011-06-02 07:16:05 UTC
Chad,

Any reproducer from customer?

Comment 17 Chad Dupuis (Cavium) 2011-06-06 15:00:09 UTC
(In reply to comment #16)
> Chad,
> 
> Any reproducer from customer?

None that I am aware of.

Comment 18 Chao Ye 2011-06-15 05:09:57 UTC
Confirmed: the patch is in the git tree.

Comment 19 Gris Ge 2011-06-22 08:07:26 UTC
No reproducer.

Sanity test passed:
1. Boot from SAN.
2. sysfs:
   add LUN
   resize (grow)
   dev_loss_tmo
   fast_io_fail_tmo
   delete LUN
3. kdump.
4. 4 hours I/O stress with multipath failover induced.
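
The sysfs timeout checks in step 2 can be inspected with a short script. This is a minimal read-only sketch, not the procedure QA actually ran: the function name `read_rport_timeouts` is hypothetical, and the default path assumes the FC transport class exposes remote ports under /sys/class/fc_remote_ports. Values are kept as strings because `fast_io_fail_tmo` may read `off`.

```python
import os


def read_rport_timeouts(root="/sys/class/fc_remote_ports"):
    """Map each remote port name to its dev_loss_tmo and fast_io_fail_tmo."""
    result = {}
    if not os.path.isdir(root):
        return result  # no FC transport present, or a different layout
    for rport in sorted(os.listdir(root)):
        values = {}
        for attr in ("dev_loss_tmo", "fast_io_fail_tmo"):
            path = os.path.join(root, rport, attr)
            try:
                with open(path) as handle:
                    values[attr] = handle.read().strip()
            except OSError:
                values[attr] = None  # attribute missing or unreadable
        result[rport] = values
    return result


for name, timeouts in read_rport_timeouts().items():
    print(name, timeouts)
```

On a host without Fibre Channel remote ports the function returns an empty dictionary and prints nothing.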

VERIFY.

Comment 20 errata-xmlrpc 2011-07-21 10:17:35 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

Comment 21 Chad Dupuis (Cavium) 2013-03-05 14:20:51 UTC
*** Bug 605726 has been marked as a duplicate of this bug. ***

