Bug 643933 - kernel: qla2xxx Abort command issued
Summary: kernel: qla2xxx Abort command issued
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-10-18 15:09 UTC by Michael Schon
Modified: 2018-11-29 19:43 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-18 18:24:52 UTC
Target Upstream Version:



Description Michael Schon 2010-10-18 15:09:31 UTC
Description of problem:


Version-Release number of selected component (if applicable):
$ modinfo qla2xxx
filename:       /lib/modules/2.6.18-194.11.1.el5/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
version:        8.03.01.04.05.05-k
license:        GPL
description:    QLogic Fibre Channel HBA Driver
author:         QLogic Corporation
srcversion:     DFDA7EA15EB76CC4AFA42A9
alias:          pci:v00001077d00008001sv*sd*bc*sc*i*
alias:          pci:v00001077d00002532sv*sd*bc*sc*i*
alias:          pci:v00001077d00005432sv*sd*bc*sc*i*
alias:          pci:v00001077d00005422sv*sd*bc*sc*i*
alias:          pci:v00001077d00008432sv*sd*bc*sc*i*
alias:          pci:v00001077d00002432sv*sd*bc*sc*i*
alias:          pci:v00001077d00002422sv*sd*bc*sc*i*
alias:          pci:v00001077d00006322sv*sd*bc*sc*i*
alias:          pci:v00001077d00006312sv*sd*bc*sc*i*
alias:          pci:v00001077d00002322sv*sd*bc*sc*i*
alias:          pci:v00001077d00002312sv*sd*bc*sc*i*
alias:          pci:v00001077d00002300sv*sd*bc*sc*i*
alias:          pci:v00001077d00002200sv*sd*bc*sc*i*
alias:          pci:v00001077d00002100sv*sd*bc*sc*i*
depends:        scsi_mod,scsi_transport_fc
vermagic:       2.6.18-194.11.1.el5 SMP mod_unload gcc-4.1
parm:           ql2xlogintimeout:Login timeout value in seconds. (int)
parm:           qlport_down_retry:Maximum number of command retries to a port that returns a PORT-DOWN status. (int)
parm:           ql2xplogiabsentdevice:Option to enable PLOGI to devices that are not present after a Fabric scan.  This is needed for several broken switches. Default is 0 - no PLOGI. 1 - perfom PLOGI. (int)
parm:           ql2xloginretrycount:Specify an alternate value for the NVRAM login retry count. (int)
parm:           ql2xallocfwdump:Option to enable allocation of memory for a firmware dump during HBA initialization.  Memory allocation requirements vary by ISP type.  Default is 1 - allocate memory. (int)
parm:           ql2xextended_error_logging:Option to enable extended error logging, Default is 0 - no logging. 1 - log errors. (int)
parm:           ql2xdevdiscgoldfw:Option to enable device discovery with golden firmware Applicable to ISP81XX based CNA only. Default is 0 - no discovery. 1 - discover device. (int)
parm:           ql2xfdmienable:Enables FDMI registratons Default is 0 - no FDMI. 1 - perfom FDMI. (int)
parm:           ql2xmaxqdepth:Maximum queue depth to report for target devices. (int)
parm:           ql2xqfulltracking:Controls whether the driver tracks queue full status returns and dynamically adjusts a scsi device's queue depth.  Default is 1, perform tracking.  Set to 0 to disable dynamic tracking and adjustment of queue depth. (int)
parm:           ql2xqfullrampup:Number of seconds to wait to begin to ramp-up the queue depth for a device after a queue-full condition has been detected.  Default is 120 seconds. (int)
parm:           ql2xenablemsix:Set to enable MSI or MSI-X interrupt mechanism. Default is 1, enable MSI-X interrupt mechanism. 0 = enable traditional pin-based mechanism. 1 = enable MSI-X interrupt mechanism. 2 = enable MSI interrupt mechanism. (int)
module_sig:	883f3504c4eae32e4fb9d541b21d77f112cd750a0c0c9794a6b8ce0ff1d5adf9e851b3672ffe82c209f66b61b15f0a877e0d47e1a44ea28714b3f95994

$ rpm -qa |grep ^kernel
kernel-headers-2.6.18-194.11.1.el5
kernel-devel-2.6.18-194.11.1.el5
kernel-2.6.18-194.11.1.el5

How reproducible:
[mschon@hodb001abt ~]$ grep qla2xxx /var/log/messages |head -20
May 13 12:44:33 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 e474e5 2002.
May 13 17:26:32 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 e54640 2002.
May 14 19:30:31 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 eb4f66 2002.
May 14 23:45:08 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed5446 2002.
May 14 23:45:09 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed5447 2002.
May 14 23:45:10 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed5448 2002.
May 14 23:45:11 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed5449 2002.
May 14 23:45:12 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed544a 2002.
May 14 23:45:13 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed544b 2002.
May 14 23:45:14 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed544c 2002.
May 14 23:45:15 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed544d 2002.
May 14 23:45:15 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed544e 2002.
May 15 20:21:28 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 f1a06a 2002.
May 15 20:21:29 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 f1a06b 2002.
May 15 22:06:35 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 f2a1fc 2002.
May 15 22:28:35 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 f2b300 2002.
May 15 22:28:36 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 f2b301 2002.
May 15 22:28:37 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 f2b302 2002.
May 16 23:04:12 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 f87e99 2002.
May 16 23:04:13 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 f87e9a 2002.
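
To see which HBA and which scsi(...) address the aborts cluster on, the grep output above can be tallied. The following is only a minimal sketch: the function name `count_aborts` and the sample list are illustrative, and the regular expression assumes the message layout shown in the log excerpt above.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 e474e5 2002.
ABORT_RE = re.compile(
    r"qla2xxx (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"scsi\((?P<addr>\d+:\d+:\d+)\): Abort command issued"
)


def count_aborts(lines):
    """Count abort messages per (PCI address, scsi(...) tuple as printed)."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[(match.group("pci"), match.group("addr"))] += 1
    return counts


# Three lines copied from the excerpt above, used as sample input.
sample = [
    "May 13 12:44:33 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:0:0): Abort command issued -- 1 e474e5 2002.",
    "May 14 23:45:08 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed5446 2002.",
    "May 14 23:45:09 hodb001abt kernel: qla2xxx 0000:10:00.0: scsi(3:1:0): Abort command issued -- 1 ed5447 2002.",
]

for (pci, addr), n in sorted(count_aborts(sample).items()):
    print(f"{pci} scsi({addr}): {n} abort(s)")
```

For a real run, the lines of /var/log/messages could be passed in place of `sample`, for example by iterating over an open file object.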

Steps to Reproduce:
N/A
  
Actual results:
N/A

Expected results:
No error found

Additional info:

Comment 1 Zoltan Forray 2010-11-08 17:06:56 UTC
I am having the same problems.  Accessing an EMC Clariion results in constant "Abort" messages and extremely slow access:

# modinfo qla2xxx
filename:       /lib/modules/2.6.18-194.17.4.el5/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
version:        8.03.01.04.05.05-k
license:        GPL
description:    QLogic Fibre Channel HBA Driver
author:         QLogic Corporation
srcversion:     DFDA7EA15EB76CC4AFA42A9
alias:          pci:v00001077d00008001sv*sd*bc*sc*i*
alias:          pci:v00001077d00002532sv*sd*bc*sc*i*
alias:          pci:v00001077d00005432sv*sd*bc*sc*i*
alias:          pci:v00001077d00005422sv*sd*bc*sc*i*
alias:          pci:v00001077d00008432sv*sd*bc*sc*i*
alias:          pci:v00001077d00002432sv*sd*bc*sc*i*
alias:          pci:v00001077d00002422sv*sd*bc*sc*i*
alias:          pci:v00001077d00006322sv*sd*bc*sc*i*
alias:          pci:v00001077d00006312sv*sd*bc*sc*i*
alias:          pci:v00001077d00002322sv*sd*bc*sc*i*
alias:          pci:v00001077d00002312sv*sd*bc*sc*i*
alias:          pci:v00001077d00002300sv*sd*bc*sc*i*
alias:          pci:v00001077d00002200sv*sd*bc*sc*i*
alias:          pci:v00001077d00002100sv*sd*bc*sc*i*
depends:        scsi_mod,scsi_transport_fc
vermagic:       2.6.18-194.17.4.el5 SMP mod_unload gcc-4.1
parm:           ql2xlogintimeout:Login timeout value in seconds. (int)
parm:           qlport_down_retry:Maximum number of command retries to a port that returns a PORT-DOWN status. (int)
parm:           ql2xplogiabsentdevice:Option to enable PLOGI to devices that are not present after a Fabric scan.  This is needed for several broken switches. Default is 0 - no PLOGI. 1 - perfom PLOGI. (int)
parm:           ql2xloginretrycount:Specify an alternate value for the NVRAM login retry count. (int)
parm:           ql2xallocfwdump:Option to enable allocation of memory for a firmware dump during HBA initialization.  Memory allocation requirements vary by ISP type.  Default is 1 - allocate memory. (int)
parm:           ql2xextended_error_logging:Option to enable extended error logging, Default is 0 - no logging. 1 - log errors. (int)
parm:           ql2xdevdiscgoldfw:Option to enable device discovery with golden firmware Applicable to ISP81XX based CNA only. Default is 0 - no discovery. 1 - discover device. (int)
parm:           ql2xfdmienable:Enables FDMI registratons Default is 0 - no FDMI. 1 - perfom FDMI. (int)
parm:           ql2xmaxqdepth:Maximum queue depth to report for target devices. (int)
parm:           ql2xqfulltracking:Controls whether the driver tracks queue full status returns and dynamically adjusts a scsi device's queue depth.  Default is 1, perform tracking.  Set to 0 to disable dynamic tracking and adjustment of queue depth. (int)
parm:           ql2xqfullrampup:Number of seconds to wait to begin to ramp-up the queue depth for a device after a queue-full condition has been detected.  Default is 120 seconds. (int)
parm:           ql2xenablemsix:Set to enable MSI or MSI-X interrupt mechanism. Default is 1, enable MSI-X interrupt mechanism. 0 = enable traditional pin-based mechanism. 1 = enable MSI-X interrupt mechanism. 2 = enable MSI interrupt mechanism. (int)
module_sig:     883f3504cbf226322d213c6fbb63431121bb40a0a3e6122e9b4f9c60fa4ebd6fc92136312a4f526e0a099b62299727ff5a79a97ffce8f797a7f8830843


# rpm -qa |grep ^kernel
kernel-2.6.18-194.11.3.el5
kernel-2.6.18-194.el5
kernel-devel-2.6.18-194.17.4.el5
kernel-devel-2.6.18-194.11.3.el5
kernel-headers-2.6.18-194.17.4.el5
kernel-2.6.18-194.17.4.el5
kernel-devel-2.6.18-194.el5

Comment 2 dtaniguchi 2010-11-09 06:57:21 UTC
I have the same problem on Red Hat EL 5.3.
The "Abort" message appears in /var/log/messages several times.

qdiskd says:
 qdiskd[4043]: <warning> qdisk cycle took more than 1 second to complete (61.000000)

And RHCS evicted the active node.

$ sudo /sbin/modinfo qla2xxx
filename:       /lib/modules/2.6.18-128.el5/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
version:        8.02.00.06.05.03-k
license:        GPL
description:    QLogic Fibre Channel HBA Driver
author:         QLogic Corporation
srcversion:     7792BC80ED4AFED369ECEE8
alias:          pci:v00001077d00002532sv*sd*bc*sc*i*
alias:          pci:v00001077d00005432sv*sd*bc*sc*i*
alias:          pci:v00001077d00005422sv*sd*bc*sc*i*
alias:          pci:v00001077d00008432sv*sd*bc*sc*i*
alias:          pci:v00001077d00002432sv*sd*bc*sc*i*
alias:          pci:v00001077d00002422sv*sd*bc*sc*i*
alias:          pci:v00001077d00006322sv*sd*bc*sc*i*
alias:          pci:v00001077d00006312sv*sd*bc*sc*i*
alias:          pci:v00001077d00002322sv*sd*bc*sc*i*
alias:          pci:v00001077d00002312sv*sd*bc*sc*i*
alias:          pci:v00001077d00002300sv*sd*bc*sc*i*
alias:          pci:v00001077d00002200sv*sd*bc*sc*i*
alias:          pci:v00001077d00002100sv*sd*bc*sc*i*
depends:        scsi_mod,scsi_transport_fc
vermagic:       2.6.18-128.el5 SMP mod_unload gcc-4.1
parm:           ql2xlogintimeout:Login timeout value in seconds. (int)
parm:           qlport_down_retry:Maximum number of command retries to a port that returns a PORT-DOWN status. (int)
parm:           ql2xplogiabsentdevice:Option to enable PLOGI to devices that are not present after a Fabric scan.  This is needed for several broken switches. Default is 0 - no PLOGI. 1 - perfom PLOGI. (int)
parm:           ql2xloginretrycount:Specify an alternate value for the NVRAM login retry count. (int)
parm:           ql2xallocfwdump:Option to enable allocation of memory for a firmware dump during HBA initialization.  Memory allocation requirements vary by ISP type.  Default is 1 - allocate memory. (int)
parm:           ql2xextended_error_logging:Option to enable extended error logging, Default is 0 - no logging. 1 - log errors. (int)
parm:           ql2xfdmienable:Enables FDMI registratons Default is 0 - no FDMI. 1 - perfom FDMI. (int)
parm:           ql2xmaxqdepth:Maximum queue depth to report for target devices. (int)
parm:           ql2xqfullrampup:Number of seconds to wait to begin to ramp-up the queue depth for a device after a queue-full condition has been detected.  Default is 120 seconds. (int)
parm:           ql2xenablemsix:Set to enable MSI-X interrupt mechanism. (int)
module_sig:     883f35049492f655cdc734e64d24fa1122c3c0a09ba24b95a398829236a3d8ab311d83ade9b1309e2771eae7792e79a58d1442ad0c533e56ea74f67

$ rpm -qa |grep ^kernel
kernel-headers-2.6.18-128.el5
kernel-2.6.18-128.el5
kernel-devel-2.6.18-128.el5

Comment 3 James Hofmeister 2010-11-23 23:29:12 UTC
Interested to see what is logged just before "qla2xxx scsi Abort command issued" after you enable the qlogic extended error logging...

# cat /sys/module/qla2xxx/ql2xextended_error_logging
0
# echo 1 > /sys/module/qla2xxx/ql2xextended_error_logging

# cat /sys/module/qla2xxx/ql2xextended_error_logging
1

Regards, James Hofmeister      Hewlett Packard Linux Solutions Engineer
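
The read, write, read-back sequence above can also be scripted. This is a rough sketch only: the helper names `read_param` and `set_param` are made up for illustration, and the default path is the one quoted in the comment above (on some kernels module parameters are exposed under /sys/module/<module>/parameters/ instead, so the path may need adjusting). Writing to the real sysfs file requires root, so the demonstration below runs against a scratch file.

```python
import tempfile
from pathlib import Path

# Path as quoted in the comment above; adjust if the running kernel
# exposes module parameters under /sys/module/qla2xxx/parameters/.
SYSFS_PARAM = "/sys/module/qla2xxx/ql2xextended_error_logging"


def read_param(path=SYSFS_PARAM):
    """Return the current parameter value as a stripped string."""
    return Path(path).read_text().strip()


def set_param(value, path=SYSFS_PARAM):
    """Write a new value and return (value_before, value_after)."""
    before = read_param(path)
    Path(path).write_text(f"{value}\n")
    return before, read_param(path)


# Demonstration on a scratch file, so nothing under /sys is touched.
with tempfile.TemporaryDirectory() as tmp:
    fake_param = Path(tmp) / "ql2xextended_error_logging"
    fake_param.write_text("0\n")
    demo_result = set_param(1, fake_param)
    print(demo_result)
```

On the affected host, `set_param(1)` with the default path would correspond to the `echo 1 > ...` command shown above.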

Comment 4 Zoltan Forray 2010-11-24 03:23:53 UTC
(In reply to comment #3)
> Interested to see what is logged just before "qla2xxx scsi Abort command
> issued" after you enable the qlogic extended error logging...
> 
> # cat /sys/module/qla2xxx/ql2xextended_error_logging
> 0
> # echo 1 > /sys/module/qla2xxx/ql2xextended_error_logging
> 
> # cat /sys/module/qla2xxx/ql2xextended_error_logging
> 1
> 
> Regards, James Hofmeister      Hewlett Packard Linux Solutions Engineer

First off, thank you for offering some help.  We have had no help from Dell, and QLogic won't talk to us since the cards came with a system from Dell (they told us to talk to Dell, which is the only reason we are talking with them). We didn't even know about the "extended logging" (and obviously neither did the folks at Dell we are talking to, since they didn't suggest we turn this on).

Here is the extended logging after starting a format/write of a 10GB file.  I notice all of the dropped frames and underrun errors.  What do they mean?

Nov 23 22:10:50 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:10:51 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:10:52 moon kernel: scsi(4): Discard RND Frame -- 1406 02c1 0000.
Nov 23 22:11:38 moon kernel: scsi(4): Discard RND Frame -- 1008 02a1 0000.
Nov 23 22:12:38 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a240 from RISC. pid=23250.
Nov 23 22:12:38 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:12:38 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 5ad2 2002.
Nov 23 22:12:38 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a800 from RISC. pid=24452.
Nov 23 22:12:38 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:12:39 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 5f84 2002.
Nov 23 22:12:39 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:13:43 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c854522c0 from RISC. pid=25014.
Nov 23 22:13:43 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:13:44 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 61b6 2002.
Nov 23 22:13:44 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c854521c0 from RISC. pid=25346.
Nov 23 22:13:44 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:13:45 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 6302 2002.
Nov 23 22:14:04 moon kernel: scsi(4): Discard RND Frame -- 1003 00c1 0000.
Nov 23 22:14:16 moon kernel: scsi(4): Discard RND Frame -- 1003 02c1 0000.
Nov 23 22:14:35 moon kernel: scsi(4): Discard RND Frame -- 1003 00c1 0000.
Nov 23 22:14:38 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:14:44 moon kernel: scsi(4): Discard RND Frame -- ffff ec00 0000.
Nov 23 22:15:41 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c850d92c0 from RISC. pid=29867.
Nov 23 22:15:41 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:42 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 74ab 2002.
Nov 23 22:15:42 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a2c0 from RISC. pid=32033.
Nov 23 22:15:42 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:43 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 7d21 2002.
Nov 23 22:15:43 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c850d93c0 from RISC. pid=34985.
Nov 23 22:15:43 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:44 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 88a9 2002.
Nov 23 22:15:44 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c85db42c0 from RISC. pid=35677.
Nov 23 22:15:44 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:45 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b5d 2002.
Nov 23 22:15:45 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a780 from RISC. pid=35678.
Nov 23 22:15:45 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:46 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b5e 2002.
Nov 23 22:15:46 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a880 from RISC. pid=35679.
Nov 23 22:15:46 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:47 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b5f 2002.
Nov 23 22:15:47 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c85db4280 from RISC. pid=35681.
Nov 23 22:15:47 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:47 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b61 2002.
Nov 23 22:15:47 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a680 from RISC. pid=35683.
Nov 23 22:15:47 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:47 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b63 2002.
Nov 23 22:15:47 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a8c0 from RISC. pid=35684.
Nov 23 22:15:47 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:48 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b64 2002.
Nov 23 22:15:48 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c850d94c0 from RISC. pid=35685.
Nov 23 22:15:48 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:49 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b65 2002.
Nov 23 22:15:49 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a340 from RISC. pid=35686.
Nov 23 22:15:49 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:49 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b66 2002.
Nov 23 22:15:49 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a3c0 from RISC. pid=35687.
Nov 23 22:15:49 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:50 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b67 2002.
Nov 23 22:15:50 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c85db4380 from RISC. pid=35688.
Nov 23 22:15:50 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:51 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b68 2002.
Nov 23 22:15:51 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c85db41c0 from RISC. pid=35689.
Nov 23 22:15:51 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:51 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b69 2002.
Nov 23 22:15:51 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c850d9380 from RISC. pid=35690.
Nov 23 22:15:51 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:15:51 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 8b6a 2002.
Nov 23 22:15:51 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:15:52 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:15:54 moon last message repeated 2 times
Nov 23 22:15:54 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:15:55 moon last message repeated 4 times
Nov 23 22:15:58 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:15:59 moon last message repeated 3 times
Nov 23 22:16:00 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:01 moon kernel: scsi(4): Discard RND Frame -- 1006 03c1 0000.
Nov 23 22:16:01 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:02 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:07 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:07 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:08 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:08 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:09 moon last message repeated 2 times
Nov 23 22:16:12 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:13 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:15 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:16 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:16 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:18 moon last message repeated 2 times
Nov 23 22:16:19 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:19 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:20 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:20 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:21 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:21 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:25 moon last message repeated 5 times
Nov 23 22:16:26 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:27 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:28 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:29 moon last message repeated 2 times
Nov 23 22:16:30 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:30 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:31 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:16:33 moon last message repeated 2 times
Nov 23 22:16:33 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:17:01 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810a084ec800 from RISC. pid=43953.
Nov 23 22:17:01 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:17:02 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 abb1 2002.
Nov 23 22:17:12 moon kernel: scsi(4): Discard RND Frame -- 1006 03c1 0000.
Nov 23 22:17:15 moon kernel: scsi(4): Discard RND Frame -- 1003 02c1 0000.
Nov 23 22:17:15 moon kernel: scsi(4): Discard RND Frame -- 1003 03c1 0000.
Nov 23 22:17:20 moon kernel: scsi(4): Discard RND Frame -- 1003 00c1 0000.
Nov 23 22:17:48 moon kernel: scsi(4): Discard RND Frame -- 1003 02c1 0000.
Nov 23 22:17:53 moon kernel: scsi(4): Discard RND Frame -- ffff ec00 0000.
Nov 23 22:18:50 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810a084ec940 from RISC. pid=65916.
Nov 23 22:18:50 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:50 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 1017c 2002.
Nov 23 22:18:50 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a940 from RISC. pid=66396.
Nov 23 22:18:50 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:51 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 1035c 2002.
Nov 23 22:18:51 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c850d9140 from RISC. pid=66520.
Nov 23 22:18:51 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:52 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 103d8 2002.
Nov 23 22:18:52 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c850d9340 from RISC. pid=66961.
Nov 23 22:18:52 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:52 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 10591 2002.
Nov 23 22:18:52 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567ac00 from RISC. pid=67615.
Nov 23 22:18:52 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:52 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 1081f 2002.
Nov 23 22:18:52 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810a084ec780 from RISC. pid=71503.
Nov 23 22:18:52 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:53 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 1174f 2002.
Nov 23 22:18:53 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c85452140 from RISC. pid=71678.
Nov 23 22:18:53 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:53 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 117fe 2002.
Nov 23 22:18:53 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810a084ec8c0 from RISC. pid=71679.
Nov 23 22:18:53 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:54 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 117ff 2002.
Nov 23 22:18:54 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a380 from RISC. pid=71680.
Nov 23 22:18:54 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:54 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11800 2002.
Nov 23 22:18:54 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567ac80 from RISC. pid=71681.
Nov 23 22:18:54 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:55 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11801 2002.
Nov 23 22:18:55 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c8567a200 from RISC. pid=71683.
Nov 23 22:18:55 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:56 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11803 2002.
Nov 23 22:18:56 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810a084ec980 from RISC. pid=71684.
Nov 23 22:18:56 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:56 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11804 2002.
Nov 23 22:18:56 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810a084ecc80 from RISC. pid=71685.
Nov 23 22:18:56 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:56 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11805 2002.
Nov 23 22:18:56 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810a084ecb00 from RISC. pid=71686.
Nov 23 22:18:56 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:57 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11806 2002.
Nov 23 22:18:57 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810a084ec9c0 from RISC. pid=71687.
Nov 23 22:18:57 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:57 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11807 2002.
Nov 23 22:18:57 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c854527c0 from RISC. pid=71688.
Nov 23 22:18:57 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:58 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11808 2002.
Nov 23 22:18:58 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c85452700 from RISC. pid=71689.
Nov 23 22:18:58 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:58 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 11809 2002.
Nov 23 22:18:58 moon kernel: qla2xxx_eh_abort(4): aborting sp ffff810c85452180 from RISC. pid=71690.
Nov 23 22:18:58 moon kernel: scsi(4): ABORT status detected 0x5-0x0.
Nov 23 22:18:58 moon kernel: qla2xxx 0000:05:00.1: scsi(4:0:0): Abort command issued -- 1 1180a 2002.
Nov 23 22:18:59 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:36 moon kernel: scsi(4): Discard RND Frame -- ffff fc00 0000.
Nov 23 22:19:55 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:56 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:56 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:56 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:57 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:57 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:58 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:58 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:59 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:19:59 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:01 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:01 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:02 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:03 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:03 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:04 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:04 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:05 moon last message repeated 3 times
Nov 23 22:20:05 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:06 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:07 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:07 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:08 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.
Nov 23 22:20:08 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.
[root@moon ~]#

Comment 5 James Hofmeister 2010-11-25 07:34:15 UTC
Hello Zoltan, a key point here is that "qla2xxx Abort command issued" is not an error.  It is a status message indicating that the driver aborted a command at the kernel's request (from some upper-level service) ~or~ in response to a status received from the fabric or the end storage device.

Each person who has said "me too" to this bugzilla likely has a different root cause.

Yours: Dropped frames, firmware reported underrun.
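
To see how often the underrun message occurs and with which byte counts, a small script can tally the "Dropped frame(s) detected" lines from a saved copy of the log. This is only a sketch: the function name and the sample data are illustrative, the regular expression is derived from the message text shown in the log above, and syslog "last message repeated N times" lines are not counted.

```python
import re
from collections import Counter

# Pattern derived from the message text in the pasted log, e.g.
# "scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)"
DROPPED = re.compile(
    r"scsi\((?P<addr>[0-9:]+)\) Dropped frame\(s\) detected "
    r"\((?P<count>0x[0-9a-f]+) of (?P<total>0x[0-9a-f]+) bytes\)"
)


def tally_dropped_frames(lines):
    """Count underrun messages per (SCSI address, first value, second value)."""
    counts = Counter()
    for line in lines:
        m = DROPPED.search(line)
        if m:
            key = (m.group("addr"), int(m.group("count"), 16), int(m.group("total"), 16))
            counts[key] += 1
    return counts


# Three lines copied from the log above (hypothetical selection).
sample = [
    "Nov 23 22:19:56 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.",
    "Nov 23 22:19:56 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x800 of 0x1000 bytes)...firmware reported underrun...retrying command.",
    "Nov 23 22:19:57 moon kernel: scsi(4:0:0:0) Dropped frame(s) detected (0x1000 of 0x1000 bytes)...firmware reported underrun...retrying command.",
]

for (addr, count, total), n in sorted(tally_dropped_frames(sample).items()):
    print(f"scsi({addr}): {n} message(s) reporting {count} of {total} bytes")
```

Running the same tally before and after each hardware swap gives a simple way to tell whether a change made any difference.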

After verifying that the QLogic driver and firmware versions match those required by your storage vendor (and updating if needed), I would start troubleshooting the hardware.

- FC-CABLE (make sure it is not looped too tight), swap with another.
- GBIC.
- FC-SWITCH port, check for errors, switch to alternate port.
- FC-HBA, replace.

If this is a blade, check enclosure interface to FC.
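
Alongside the switch-side checks listed above, the host-side link counters can be compared before and after an I/O test. The sketch below assumes the kernel exposes per-HBA counters as files under /sys/class/fc_host/host*/statistics/ and that the values are printed as hex or decimal integers. It reads whatever files are present rather than relying on specific counter names; both function names are illustrative.

```python
import glob
import os


def read_fc_host_stats(root="/sys/class/fc_host"):
    """Return {host: {counter: value}} for every readable numeric statistics file."""
    stats = {}
    for stat_dir in sorted(glob.glob(os.path.join(root, "host*", "statistics"))):
        host = os.path.basename(os.path.dirname(stat_dir))
        counters = {}
        for name in sorted(os.listdir(stat_dir)):
            try:
                with open(os.path.join(stat_dir, name)) as fh:
                    counters[name] = int(fh.read().strip(), 0)  # accepts "0x1f" and "31"
            except (OSError, ValueError):
                continue  # skip write-only or non-numeric entries
        stats[host] = counters
    return stats


def counters_that_grew(before, after):
    """Return {host: {counter: delta}} for counters that increased between snapshots."""
    grew = {}
    for host, counters in after.items():
        old = before.get(host, {})
        deltas = {n: v - old.get(n, 0) for n, v in counters.items() if v > old.get(n, 0)}
        if deltas:
            grew[host] = deltas
    return grew


# Prints an empty dict on a machine without FC HBAs.
print(read_fc_host_stats())
```

Taking one snapshot, running the failing workload, and taking a second snapshot shows which counters moved. Traffic counters will grow in any case, so the interesting entries are the error-type counters that grow on the problem host but not on a healthy one.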

Regards, James Hofmeister      Hewlett Packard Linux Solutions Engineer

Comment 6 Zoltan Forray 2010-11-25 16:28:56 UTC
(In reply to comment #5)
> Hello Zoltan, a key point here is that "qla2xxx Abort command issued" is not
> an error.  It is a status message indicating that the driver aborted a command
> at the kernel's request (from some upper-level service) ~or~ in response to a
> status received from the fabric or the end storage device.
> 
> Each person who has said "me too" to this bugzilla likely has a different root
> cause.
> 
> Yours: Dropped frames, firmware reported underrun.
> 
> After verifying that the QLogic driver and firmware versions match those
> required by your storage vendor (and updating if needed), I would start
> troubleshooting the hardware.
> 
> - FC-CABLE (make sure it is not looped too tight), swap with another.
> - GBIC.
> - FC-SWITCH port, check for errors, switch to alternate port.
> - FC-HBA, replace.
> 
> If this is a blade, check enclosure interface to FC.
> 
> Regards, James Hofmeister      Hewlett Packard Linux Solutions Engineer

James,

Again, thanks so much for your efforts to help.

I apologize for not giving details on what troubleshooting we have already tried. We have a very skilled staff - by the time we put out any request for help, we have gone pretty far testing things ourselves.

HBA - This box came with three QLA2462 HBAs (the plan was two for tape and one for disk, but we currently do not have enough switch ports to use all three).  So, we moved the disk connections to the unused HBA - no difference.  The errors and slow SAN disk access stayed the same.  Before that, we had also tried swapping the HBA connections/cables - same thing.

The switches are Cisco.  We ran all reports (show techsupport) with no sign of any errors, either from the switch port or cable.

As for the QLogic drivers/firmware, the QLogic SANsurfer CLI (scli) shows all firmware levels are at their latest on both servers (more on this shortly).  We also checked the Linux kernel level, filesystem format (ext4), etc.

My SAN guy also tried forcing which paths/switches are used.  Also no difference.  With no signs of any errors from the switches, I don't think he has tried swapping GBICs.  Same for the cables.  All lights are good and we don't see any error counts.  I will suggest we try replacing both, especially since nothing else we have tried has made any difference.

This server is one of two identically configured boxes purchased at the same time.  Their serial numbers differ by basically one digit.  The other box is running flawlessly as far as SAN storage is concerned.  It also has a 5.3TB SAN filesystem (separately zoned - no sharing).  Firmware/kernel/drivers are the same on both.

One other point of interest is the SANsurfer software.  I installed it on both machines to do comparisons.  The GUI version will not run on the machine with the problems.  When launched, it gives some kind of error related to networking/connections (sorry, I don't have the details/exact message).  Multiple uninstalls/reinstalls did not fix it, and again we can't find any info on how to fix it.

Not sure if this means anything but I thought I would mention it anyway.  I did IBM z/OS software maintenance and troubleshooting for 25+ years and have seen how seemingly unrelated issues/errors/messages are often linked.

Comment 7 Zoltan Forray 2010-12-02 14:28:10 UTC
An update.

Just for the heck of it, we swapped out both fibre cables. Then I ran a 10GB format. No errors in the logs and it ran pretty quickly.

Not believing we had two bad cables, we switched back to the original cables and ran another format.  Still no errors.

While a fleck of dust on one of the cable connectors was a possibility, I did not think the swap had fixed it (remember, we had already forced the switch/SAN onto the secondary interface/connection and switched HBAs with no change).

So, I ran a 300GB format of a volume in the SAN space.  It ran for 4 hours with no errors logged.  That run time is not acceptable performance.

I went to the other, identical system (both purchased and installed at the same time and set up the same way, with both having 5.3TB of SAN space) and formatted a 300GB volume.  This format took 1 hour.

As far as I am concerned, whatever we did with the cable switching/rebooting has stopped the reporting of errors/problems but the problem still exists.

We are continuing to dig through configs/logs to see if we can figure out what is going on.

Comment 8 Harrison Han 2011-11-10 06:38:43 UTC
Hi Sirs,

We have also encountered these messages.

OS: Red Hat Enterprise Linux 5
Applications: Oracle RAC

On RAC node 1, I/O (especially writes) became very slow. It took 3 to 18 minutes to write the DB redo logs to the archive log files, which normally takes only several seconds. The average I/O wait was 30 times the normal value.

After an OS reboot, the problem disappeared.


The following messages were logged:

Oct 30 16:49:01 mxrac01 auditd[7475]: Audit daemon rotating log files
Oct 31 18:58:00 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:1:0): Abort command issued -- 1 46b19574 2002.
Oct 31 18:58:01 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:0:1): Abort command issued -- 1 46b19608 2002.
Oct 31 18:59:07 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:0:1): Abort command issued -- 1 46b1a38a 2002.
Oct 31 19:00:11 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:1:0): Abort command issued -- 1 46b1abe9 2002.
Oct 31 19:00:11 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:1:0): Abort command issued -- 1 46b1acbe 2002.
Oct 31 19:01:15 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:1:0): Abort command issued -- 1 46b1add4 2002.
[... similar messages repeated until node 1 was restarted ...]
Oct 31 20:39:07 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:0:2): Abort command issued -- 1 46b6787e 2002.
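
To see which targets and LUNs are affected, the abort messages can be grouped by SCSI address. The sketch below is illustrative only. It assumes the three trailing fields are a wait flag, the command serial number (hex), and the error-handler return code (hex); 0x2002 and 0x2003 are the kernel's SUCCESS and FAILED values from scsi.h, so a trailing 2002 would mean the abort itself completed. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Pattern derived from the message text in the log excerpt above.
ABORT = re.compile(
    r"scsi\((?P<host>\d+):(?P<target>\d+):(?P<lun>\d+)\): "
    r"Abort command issued -- (?P<wait>\d+) (?P<serial>[0-9a-f]+) (?P<ret>[0-9a-f]+)\."
)

# Return codes defined in the kernel's include/scsi/scsi.h.
EH_RESULT = {0x2002: "SUCCESS", 0x2003: "FAILED"}


def aborts_by_device(lines):
    """Group abort messages by (host, target, lun); values are (serial, result) pairs."""
    grouped = defaultdict(list)
    for line in lines:
        m = ABORT.search(line)
        if not m:
            continue
        key = (int(m.group("host")), int(m.group("target")), int(m.group("lun")))
        ret = int(m.group("ret"), 16)
        # Assumption: the last field is the error-handler return code.
        grouped[key].append((int(m.group("serial"), 16), EH_RESULT.get(ret, hex(ret))))
    return dict(grouped)


sample = [
    "Oct 31 18:58:00 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:1:0): Abort command issued -- 1 46b19574 2002.",
    "Oct 31 18:58:01 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:0:1): Abort command issued -- 1 46b19608 2002.",
    "Oct 31 18:59:07 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:0:1): Abort command issued -- 1 46b1a38a 2002.",
    "Oct 31 19:00:11 mxrac01 kernel: qla2xxx 0000:12:00.0: scsi(4:1:0): Abort command issued -- 1 46b1abe9 2002.",
]

for device, events in sorted(aborts_by_device(sample).items()):
    print(device, len(events), "abort(s)", sorted({result for _, result in events}))
```

Aborts spread across several targets and LUNs, as in this excerpt, point away from a single bad LUN and toward something shared such as the path, the HBA, or the array.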


The vendor checked the server and storage but found no failure.

So is this an OS bug?

Comment 10 Sandro 2011-12-13 11:41:30 UTC
We are seeing the same symptom on four boxes connected to XIV G2 storage systems.


Dec 12 12:29:34 spch1222 kernel: qla2xxx 0000:0b:00.0: scsi(3:3:7): Abort command issued -- 1 388bf 2002.
Dec 12 12:29:34 spch1222 kernel: sd 3:0:3:7: timing out command, waited 20s
Dec 12 12:29:55 spch1222 kernel: qla2xxx 0000:0b:00.0: scsi(3:3:7): Abort command issued -- 1 3890f 2002.
Dec 12 12:29:55 spch1222 kernel: sd 3:0:3:7: timing out command, waited 20s
Dec 12 12:30:15 spch1222 kernel: qla2xxx 0000:0b:00.0: scsi(3:3:8): Abort command issued -- 1 389be 2002.
Dec 12 12:30:15 spch1222 kernel: sd 3:0:3:8: timing out command, waited 20s


After enabling extended error logging, it looks like this:


Dec 12 11:18:26 spch1222 kernel: qla2xxx_eh_abort(2): aborting sp ffff810135a04240 from RISC. pid=197766.
Dec 12 11:18:26 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:6) FCP command status: 0x5-0x0 (0x80000) portid=344200 oxid=0x112 ser=0x30486 cdb=9e1000 len=0x10 rsp_info=0x0 resid=0x0 fw_resid=0x0
Dec 12 11:18:27 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:6): Abort command issued -- 1 30486 2002.
Dec 12 11:18:27 spch1222 kernel: sd 2:0:3:6: timing out command, waited 20s
Dec 12 11:18:47 spch1222 kernel: qla2xxx_eh_abort(2): aborting sp ffff8101303f74c0 from RISC. pid=197862.
Dec 12 11:18:47 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:7) FCP command status: 0x5-0x0 (0x80000) portid=344200 oxid=0x173 ser=0x304e6 cdb=9e1000 len=0x10 rsp_info=0x0 resid=0x0 fw_resid=0x0
Dec 12 11:18:47 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:7): Abort command issued -- 1 304e6 2002.
Dec 12 11:19:07 spch1222 kernel: qla2xxx_eh_abort(2): aborting sp ffff8101f1ed9b40 from RISC. pid=197973.
Dec 12 11:19:07 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:7) FCP command status: 0x5-0x0 (0x80000) portid=344200 oxid=0x1e3 ser=0x30555 cdb=9e1000 len=0x10 rsp_info=0x0 resid=0x0 fw_resid=0x0
Dec 12 11:19:07 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:7): Abort command issued -- 1 30555 2002.
Dec 12 11:19:07 spch1222 kernel: sd 2:0:3:7: timing out command, waited 20s
Dec 12 11:19:27 spch1222 kernel: qla2xxx_eh_abort(2): aborting sp ffff8101f1ed9b40 from RISC. pid=198133.
Dec 12 11:19:27 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:8) FCP command status: 0x5-0x0 (0x80000) portid=344200 oxid=0x284 ser=0x305f5 cdb=9e1000 len=0x10 rsp_info=0x0 resid=0x0 fw_resid=0x0
Dec 12 11:19:27 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:8): Abort command issued -- 1 305f5 2002.
Dec 12 11:19:27 spch1222 kernel: sd 2:0:3:8: timing out command, waited 20s
Dec 12 11:19:47 spch1222 kernel: qla2xxx_eh_abort(2): aborting sp ffff8101f1ed9b40 from RISC. pid=198250.
Dec 12 11:19:47 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:8) FCP command status: 0x5-0x0 (0x80000) portid=344200 oxid=0x2fa ser=0x3066a cdb=9e1000 len=0x10 rsp_info=0x0 resid=0x0 fw_resid=0x0
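
The extended log lines carry enough detail to identify which command is being aborted. The sketch below pulls the fields out of an "FCP command status" line and decodes the cdb= value, assuming that field prints the leading CDB bytes in hex. Under that assumption, 9e1000 is opcode 0x9E (SERVICE ACTION IN(16)) with service action 0x10 (READ CAPACITY(16)), so every abort in this excerpt would be for a capacity query rather than for data I/O. The function names and the tiny opcode table are illustrative, and the pattern tolerates whitespace inside "portid" in case the line was wrapped when pasted.

```python
import re

# Pattern derived from the extended-logging lines above.
FCP_STATUS = re.compile(
    r"scsi\((?P<addr>[\d:]+)\) FCP command status: "
    r"(?P<comp>0x[0-9a-f]+)-(?P<scsi>0x[0-9a-f]+) .*?"
    r"por\s*tid=(?P<portid>[0-9a-f]+) oxid=(?P<oxid>0x[0-9a-f]+) "
    r"ser=(?P<ser>0x[0-9a-f]+) cdb=(?P<cdb>[0-9a-f]+)"
)

# Only the opcode seen in this log is listed; the names come from the SCSI command sets.
OPCODES = {0x9E: "SERVICE ACTION IN(16)"}
SERVICE_ACTIONS_9E = {0x10: "READ CAPACITY(16)"}


def describe_cdb(cdb_hex):
    """Name the command from the leading CDB bytes (assumed to be hex-encoded)."""
    data = bytes.fromhex(cdb_hex)
    opcode = data[0]
    name = OPCODES.get(opcode, f"opcode 0x{opcode:02x}")
    if opcode == 0x9E and len(data) > 1:
        action = data[1] & 0x1F  # service action is the low 5 bits of byte 1
        name += " / " + SERVICE_ACTIONS_9E.get(action, f"service action 0x{action:02x}")
    return name


def parse_fcp_status(line):
    """Return the parsed fields of one status line, or None if it does not match."""
    m = FCP_STATUS.search(line)
    if not m:
        return None
    record = m.groupdict()
    record["command"] = describe_cdb(record["cdb"])
    return record


line = (
    "Dec 12 11:18:26 spch1222 kernel: qla2xxx 0000:0e:00.0: scsi(2:3:6) "
    "FCP command status: 0x5-0x0 (0x80000) portid=344200 oxid=0x112 "
    "ser=0x30486 cdb=9e1000 len=0x10 rsp_info=0x0 resid=0x0 fw_resid=0x0"
)
print(parse_fcp_status(line))
```

The ser= value matches the serial number in the following "Abort command issued" line (0x30486 and 30486 here), which ties each abort back to the command it cancelled.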


# uname -r
2.6.18-274.el5


# modinfo qla2xxx
filename:       /lib/modules/2.6.18-274.el5/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
version:        8.03.07.03.05.07-k
license:        GPL
description:    QLogic Fibre Channel HBA Driver
author:         QLogic Corporation
srcversion:     4BEA1D57FF410848860873A
alias:          pci:v00001077d00008021sv*sd*bc*sc*i*
alias:          pci:v00001077d00008001sv*sd*bc*sc*i*
alias:          pci:v00001077d00002532sv*sd*bc*sc*i*
alias:          pci:v00001077d00005432sv*sd*bc*sc*i*
alias:          pci:v00001077d00005422sv*sd*bc*sc*i*
alias:          pci:v00001077d00008432sv*sd*bc*sc*i*
alias:          pci:v00001077d00002432sv*sd*bc*sc*i*
alias:          pci:v00001077d00002422sv*sd*bc*sc*i*
alias:          pci:v00001077d00006322sv*sd*bc*sc*i*
alias:          pci:v00001077d00006312sv*sd*bc*sc*i*
alias:          pci:v00001077d00002322sv*sd*bc*sc*i*
alias:          pci:v00001077d00002312sv*sd*bc*sc*i*
alias:          pci:v00001077d00002300sv*sd*bc*sc*i*
alias:          pci:v00001077d00002200sv*sd*bc*sc*i*
alias:          pci:v00001077d00002100sv*sd*bc*sc*i*
depends:        scsi_mod,scsi_transport_fc
vermagic:       2.6.18-274.el5 SMP mod_unload gcc-4.1
parm:           ql2xlogintimeout:Login timeout value in seconds. (int)
parm:           qlport_down_retry:Maximum number of command retries to a port that returns a PORT-DOWN status. (int)
parm:           ql2xplogiabsentdevice:Option to enable PLOGI to devices that are not present after a Fabric scan.  This is needed for several broken switches. Default is 0 - no PLOGI. 1 - perfom PLOGI. (int)
parm:           ql2xloginretrycount:Specify an alternate value for the NVRAM login retry count. (int)
parm:           ql2xallocfwdump:Option to enable allocation of memory for a firmware dump during HBA initialization.  Memory allocation requirements vary by ISP type.  Default is 1 - allocate memory. (int)
parm:           ql2xextended_error_logging:Option to enable extended error logging, Default is 0 - no logging. 1 - log errors. (int)
parm:           ql2xdevdiscgoldfw:Option to enable device discovery with golden firmware Applicable to ISP81XX based CNA only. Default is 0 - no discovery. 1 - discover device. (int)
parm:           ql2xfdmienable:Enables FDMI registratons Default is 0 - no FDMI. 1 - perfom FDMI. (int)
parm:           ql2xmaxqdepth:Maximum queue depth to report for target devices. (int)
parm:           ql2xqfulltracking:Controls whether the driver tracks queue full status returns and dynamically adjusts a scsi device's queue depth.  Default is 1, perform tracking.  Set to 0 to disable dynamic tracking and adjustment of queue depth. (int)
parm:           ql2xqfullrampup:Number of seconds to wait to begin to ramp-up the queue depth for a device after a queue-full condition has been detected.  Default is 120 seconds. (int)
parm:           ql2xenablemsix:Set to enable MSI or MSI-X interrupt mechanism. Default is 1, enable MSI-X interrupt mechanism. 0 = enable traditional pin-based mechanism. 1 = enable MSI-X interrupt mechanism. 2 = enable MSI interrupt mechanism. (int)
parm:           ql2xshiftctondsd:Set to control shifting of command type processing based on total number of DSD. (int)
parm:           ql2xfwloadbin:Option to specify location from which to load ISP firmware: 1 -- load firmware from flash. 0 -- use default semantics. (int)
parm:           ql2xdbwr:Option to specify scheme for request queue posting 0 -- Regular doorbell. 1 -- (Default) CAMRAM doorbell (faster). (int)
parm:           ql2xdontresethba:1: Do not reset on failure, 0(Default): Reset on failure. (Debug) (int)
parm:           ql2xsetdevstate:1: Reset device state to COLD. (Debug) (int)
parm:           ql2xetsenable:Enables firmware ETS burst.Default is 0 - skip ETS enablement. (int)
parm:           ql2xtargetreset:Enable target reset.Default is 1 - use hw defaults. (int)
module_sig:     883f3504e177a155e46ee793d437541129baf09c95b89e7d21eff91f367d4ce3d0cc5d20ecc52109e256c13e71f38eb123bcb1f54285178b9362d2c


 # cat /sys/class/fc_host/host*/symbolic_name
HPAK344A FW:v5.03.16 DVR:v8.03.07.03.05.07-k
HPAK344A FW:v5.03.16 DVR:v8.03.07.03.05.07-k
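
When comparing several boxes, the firmware and driver versions can be extracted from these symbolic_name strings and checked for mismatches. This is a minimal sketch that assumes every adapter reports the same "MODEL FW:v... DVR:v..." layout shown above; the function names are illustrative.

```python
import re

# Layout taken from the symbolic_name output above.
SYMBOLIC_NAME = re.compile(r"^(?P<model>\S+)\s+FW:v(?P<fw>\S+)\s+DVR:v(?P<driver>\S+)$")


def parse_symbolic_name(text):
    """Split one symbolic_name string into model, firmware, and driver versions."""
    m = SYMBOLIC_NAME.match(text.strip())
    return m.groupdict() if m else None


def distinct_versions(names):
    """Return the set of (firmware, driver) pairs seen across all adapters."""
    parsed = (parse_symbolic_name(n) for n in names)
    return {(p["fw"], p["driver"]) for p in parsed if p}


names = [
    "HPAK344A FW:v5.03.16 DVR:v8.03.07.03.05.07-k",
    "HPAK344A FW:v5.03.16 DVR:v8.03.07.03.05.07-k",
]
versions = distinct_versions(names)
print(versions)
print("consistent" if len(versions) == 1 else "mismatch")
```

More than one pair in the result means the adapters are not all at the same firmware or driver level.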

These are HP branded QLogic adapters.
We are not having any performance issues, nor do we think it's hardware related (HBA, cable, etc.), since these messages appear on all boxes, which are located in different data centers with different SAN hardware, storage systems, zones, and so on.

Let me know if you need more info!

Cheers

Comment 11 Chris Williams 2012-01-18 18:24:52 UTC
Anecdotal evidence indicates that an HBA firmware update resolved this for one of our customers.
Closing this as NOTABUG. However, if you are still experiencing this issue, please open a case with Red Hat Support via the Customer Portal.

