Bug 366181

Summary: NCQ issues with new laptop
Product: Red Hat Enterprise Linux 5
Reporter: Florian La Roche <laroche>
Component: kernel
Assignee: Jeff Garzik <jgarzik>
Status: CLOSED WORKSFORME
QA Contact: Martin Jenner <mjenner>
Severity: low
Priority: low
Version: 5.1
CC: dzickus, hashdump, peterm
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-01-30 09:44:20 UTC
Attachments:
Description Flags
Tested patch with correct string to identify the disk. none

Description Florian La Roche 2007-11-04 21:53:29 UTC
Description of problem:

After installing RHEL 5.1 on a new Dell D630 laptop, I get the following lines
in /var/log/messages:

Oct 29 10:42:34 dudweiler kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 29 10:42:34 dudweiler kernel: ata1.00: ATA-8: FUJITSU MHW2160BJ G2, 0085001A, max UDMA/100
Oct 29 10:42:34 dudweiler kernel: ata1.00: 312581808 sectors, multi 8: LBA48 NCQ (depth 31/32)
Oct 29 10:42:34 dudweiler kernel: ata1.00: configured for UDMA/100

Oct 29 10:42:34 dudweiler kernel: ata3: SATA link down (SStatus 0 SControl 300)
Oct 29 10:42:34 dudweiler kernel:   Vendor: ATA       Model: FUJITSU MHW2160B  Rev: 0085
Oct 29 10:42:34 dudweiler kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Oct 29 10:42:34 dudweiler kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
Oct 29 10:42:34 dudweiler kernel: sda: Write Protect is off
Oct 29 10:42:34 dudweiler kernel: SCSI device sda: drive cache: write back
Oct 29 10:42:34 dudweiler kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
Oct 29 10:42:34 dudweiler kernel: sda: Write Protect is off
Oct 29 10:42:34 dudweiler kernel: SCSI device sda: drive cache: write back
Oct 29 10:42:34 dudweiler kernel:  sda: sda1 sda2 sda3 sda4
Oct 29 10:42:34 dudweiler kernel: sd 0:0:0:0: Attached scsi disk sda


Oct 29 11:22:37 dudweiler kernel: ata1.00: exception Emask 0x2 SAct 0x3fe003 SErr 0x0 action 0x2 frozen
Oct 29 11:22:37 dudweiler kernel: ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x3fe003 FIS=004040a1:00001000)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:00:2c:ec:a3/00:00:01:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:08:ec:ec:a3/00:00:01:00:00/40 tag 1 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:68:2c:eb:a3/00:00:01:00:00/40 tag 13 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:70:74:eb:a3/00:00:01:00:00/40 tag 14 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:78:ec:eb:a3/00:00:01:00:00/40 tag 15 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:80:4c:ec:a3/00:00:01:00:00/40 tag 16 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:88:5c:ec:a3/00:00:01:00:00/40 tag 17 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/10:90:8c:ec:a3/00:00:01:00:00/40 tag 18 cdb 0x0 data 8192 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:98:8c:ed:a3/00:00:01:00:00/40 tag 19 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:a0:ac:eb:b3/00:00:01:00:00/40 tag 20 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:a8:d4:eb:bb/00:00:01:00:00/40 tag 21 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel:          res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1: soft resetting port
Oct 29 11:22:37 dudweiler kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 29 11:22:37 dudweiler kernel: ata1.00: configured for UDMA/100
Oct 29 11:22:37 dudweiler kernel: ata1: EH complete
Oct 29 11:22:37 dudweiler kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
Oct 29 11:22:37 dudweiler kernel: sda: Write Protect is off
Oct 29 11:22:37 dudweiler kernel: SCSI device sda: drive cache: write back
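
As a reading aid for the log above: SAct is a bitmask with one bit per outstanding NCQ tag, so 0x3fe003 can be expanded into the tag numbers that appear in the cmd lines. The sketch below also splits a cmd taskfile string into its fields. It assumes the layout that libata appears to print (command/feature:nsect:lbal:lbam:lbah/hob fields/device) and the NCQ convention that the sector count is carried in the feature bytes and the tag in the upper five bits of nsect. The function names are made up for illustration and are not part of any tool.

```python
def sact_to_tags(sact):
    """Return the NCQ tag numbers whose bits are set in an SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


def decode_ncq_cmd(taskfile):
    """Decode a libata 'cmd' taskfile string for an NCQ command.

    Assumed field order (not confirmed by the bug report):
    cmd/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    cmd_s, low_s, hob_s, dev_s = taskfile.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low_s.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in hob_s.split(":")
    )
    return {
        "command": int(cmd_s, 16),            # 0x61 is WRITE FPDMA QUEUED
        "sectors": (hob_feature << 8) | feature,
        "tag": nsect >> 3,                    # tag sits in bits 7:3 of nsect
        "lba": (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
               | (lbah << 16) | (lbam << 8) | lbal,
        "device": int(dev_s, 16),
    }


if __name__ == "__main__":
    print(sact_to_tags(0x3FE003))
    print(decode_ncq_cmd("61/08:68:2c:eb:a3/00:00:01:00:00/40"))
```

For the log above, sact_to_tags(0x3fe003) gives tags 0, 1 and 13 through 21, which matches the eleven cmd lines, and a decoded sector count of 8 agrees with the "data 4096" figure at 512 bytes per sector.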





Comment 1 Florian La Roche 2007-11-04 21:56:41 UTC
I'll test the following patch over the next few days; it disables NCQ for this
drive. A kernel parameter to disable NCQ would also be great:

--- linux-2.6.18.i386/drivers/ata/libata-core.c.lr      2007-11-04 22:53:46.000000000 +0100
+++ linux-2.6.18.i386/drivers/ata/libata-core.c 2007-11-04 22:55:03.000000000 +0100
@@ -3786,6 +3786,8 @@
         { "WDC WD740ADFD-00",   NULL,          ATA_HORKAGE_NONCQ },
        /* http://thread.gmane.org/gmane.linux.ide/14907 */
        { "FUJITSU MHT2060BH",  NULL,           ATA_HORKAGE_NONCQ },
+       /* RH bz#???? */
+       { "FUJITSU MHW2160BJ",  NULL,           ATA_HORKAGE_NONCQ },
        /* NCQ is broken */
        { "Maxtor 6L250S0",     "BANC1G10",     ATA_HORKAGE_NONCQ },
        { "Maxtor 6B200M0",     "BANC1B10",     ATA_HORKAGE_NONCQ },
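
To make the idea of the patch concrete, here is a minimal sketch of a blacklist lookup of this shape: each entry has a model string, an optional firmware revision (NULL meaning "any revision"), and a flag. It is written in Python rather than kernel C, the flag value is a placeholder, and exact string comparison is an assumption about how this kernel version matches entries, not something the report confirms.

```python
ATA_HORKAGE_NONCQ = 1 << 2  # placeholder value, for illustration only

# Entries copied from the diff context above; None stands in for NULL.
BLACKLIST = [
    ("WDC WD740ADFD-00", None, ATA_HORKAGE_NONCQ),
    ("FUJITSU MHT2060BH", None, ATA_HORKAGE_NONCQ),
    ("FUJITSU MHW2160BJ", None, ATA_HORKAGE_NONCQ),
    ("Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ),
    ("Maxtor 6B200M0", "BANC1B10", ATA_HORKAGE_NONCQ),
]


def horkage_for(model, revision, table=BLACKLIST):
    """Return the flags of the first matching entry, or 0 if none matches.

    Assumes exact comparison of the model string and, when the entry
    names one, of the firmware revision.
    """
    for entry_model, entry_rev, flags in table:
        if entry_model != model:
            continue
        if entry_rev is None or entry_rev == revision:
            return flags
    return 0


if __name__ == "__main__":
    # Model and revision strings as reported by the kernel in the description.
    print(horkage_for("FUJITSU MHW2160BJ G2", "0085001A"))
```

Under the exact-match assumption, the entry "FUJITSU MHW2160BJ" would not match the model the kernel reported, "FUJITSU MHW2160BJ G2", so the string in the table has to agree with what the drive identifies itself as.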


Comment 2 Florian La Roche 2007-11-06 12:04:36 UTC
Created attachment 248991 [details]
Tested patch with correct string to identify the disk.

Here is a new patch, attached, with the corrected string to identify the disk.
It seems to work fine here. Even better (optimal?) would be a change where a
parameter can specify the queue depth, with 0 disabling NCQ.
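
As a related sketch, SCSI disks expose a queue_depth attribute in sysfs at /sys/block/<dev>/device/queue_depth. On kernels where libata allows that attribute to be changed, a depth of 1 leaves only one command outstanding, which effectively turns queuing off. Whether the RHEL 5.1 kernel permits this for SATA disks is not confirmed by this report. The helper names, and the mapping of a requested 0 to a written 1 (to mimic the "0 disables NCQ" idea), are hypothetical.

```python
import os


def queue_depth_path(dev, sysfs_root="/sys"):
    """Build the sysfs path of the queue_depth attribute for a block device."""
    return os.path.join(sysfs_root, "block", dev, "device", "queue_depth")


def read_queue_depth(dev, sysfs_root="/sys"):
    """Read the current queue depth as an integer."""
    with open(queue_depth_path(dev, sysfs_root)) as f:
        return int(f.read().strip())


def set_queue_depth(dev, requested, sysfs_root="/sys"):
    """Write a queue depth; a requested value of 0 is written as 1.

    The 0 -> 1 mapping is a hypothetical convention that mirrors the
    proposed "0 disables NCQ" semantics. Writing normally needs root.
    """
    if requested < 0:
        raise ValueError("queue depth must not be negative")
    depth = max(1, requested)
    with open(queue_depth_path(dev, sysfs_root), "w") as f:
        f.write(f"{depth}\n")
    return depth


if __name__ == "__main__":
    print(read_queue_depth("sda"))
```

The sysfs_root argument exists only so the helpers can be exercised against a scratch directory instead of the live system.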

Let me know if this should also be submitted upstream so that it gets into
the upstream kernel.

regards and thanks for the beautiful kernel,

Florian La Roche

Comment 4 Florian La Roche 2008-01-30 09:44:20 UTC
Closing this, probably fixed by other patches already.

regards,

Florian La Roche


Comment 5 Jim Huang 2009-10-27 22:48:29 UTC
It is not clear to me exactly what the root cause of the issue is here.
What are the symptoms, besides these error messages in dmesg or /var/log/messages?

In my case, I am running this version of Red Hat:

[root@testbox ~]# uname -a
Linux testbox 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

[root@testbox ~]# uptime
 22:44:55 up 167 days, 14:25,  2 users,  load average: 1.34, 2.62, 3.89

I am only seeing this issue reported after more than 160 days of operation.

Oct 26 00:47:44 testbox kernel: ata1.00: exception Emask 0x2 SAct 0x1 SErr 0x0 action 0x2 frozen
Oct 26 00:47:44 testbox kernel: ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x1 FIS=004040a1:00000100)
Oct 26 00:47:47 testbox kernel: ata1.00: cmd 61/08:00:4c:72:0f/00:00:30:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 26 00:47:47 testbox kernel:          res 40/00:04:4c:72:0f/00:00:30:00:00/40 Emask 0x2 (HSM violation)
Oct 26 00:47:48 testbox kernel: ata1: soft resetting port
Oct 26 00:47:48 testbox kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 26 00:47:48 testbox kernel: ata1.00: configured for UDMA/133
Oct 26 00:47:48 testbox kernel: ata1: EH complete
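
To see how often the event recurs over a long uptime, one option is to scan the log for the "exception" lines and collect their timestamps and SAct values. This is a minimal sketch that assumes the syslog line format shown above, with each message on a single line. The function name and the returned dictionary keys are made up for illustration.

```python
import re

# Matches lines such as:
# "... kernel: ata1.00: exception Emask 0x2 SAct 0x1 SErr 0x0 action 0x2 frozen"
EXCEPTION_RE = re.compile(
    r"kernel: (?P<dev>ata\d+\.\d+): exception "
    r"Emask (?P<emask>0x[0-9a-fA-F]+) "
    r"SAct (?P<sact>0x[0-9a-fA-F]+) "
    r"SErr (?P<serr>0x[0-9a-fA-F]+) "
    r"action (?P<action>0x[0-9a-fA-F]+)"
)


def find_ata_exceptions(lines):
    """Return one dict per libata exception line found in the given lines."""
    events = []
    for line in lines:
        m = EXCEPTION_RE.search(line)
        if m is None:
            continue
        events.append({
            "timestamp": line[:15],           # e.g. "Oct 26 00:47:44"
            "device": m.group("dev"),
            "emask": int(m.group("emask"), 16),
            "sact": int(m.group("sact"), 16),
            "serr": int(m.group("serr"), 16),
        })
    return events


if __name__ == "__main__":
    with open("/var/log/messages") as log:
        for event in find_ata_exceptions(log):
            print(event)
```

A single hit in 167 days and a hit every few minutes point in different directions, so the count is worth having before deciding whether to suspect hardware.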

Is this a hardware issue?

This is the drive that I am using:

[root@testbox ~]# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
	Model Number:       WDC WD1000FYPS-01ZKB0                   
	Serial Number:      WD-WCASJ0675404
	Firmware Revision:  02.01B01
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
	Supported: 8 7 6 5 
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors: 1953525168
	device size with M = 1024*1024:      953869 MBytes
	device size with M = 1000*1000:     1000204 MBytes (1000 GB)
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, with device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 0
	Recommended acoustic management value: 128, current value: 254
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	    	SET_MAX security extension
	   *	Automatic Acoustic Management feature set
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	WRITE_{DMA|MULTIPLE}_FUA_EXT
	   *	64-bit World wide name
	   *	WRITE_UNCORRECTABLE command
	   *	{READ,WRITE}_DMA_EXT_GPL commands
	   *	Segmented DOWNLOAD_MICROCODE
	   *	SATA-I signaling speed (1.5Gb/s)
	   *	SATA-II signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Phy event counters
	    	DMA Setup Auto-Activate optimization
	   *	Software settings preservation
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
	not	frozen
	not	expired: security count
		supported: enhanced erase
	260min for SECURITY ERASE UNIT. 260min for ENHANCED SECURITY ERASE UNIT.
Checksum: correct
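
For comparing drives, the NCQ-related facts can be pulled out of the hdparm -I text above with a few lines of parsing. This is a sketch that assumes the output layout shown here ("Queue depth: N" under Capabilities, and a leading "*" marking enabled features). Other hdparm versions may format this differently, and the function name and keys are illustrative only.

```python
import re


def parse_ncq_info(hdparm_output):
    """Extract queue depth and NCQ feature status from `hdparm -I` text."""
    info = {"queue_depth": None, "ncq_listed": False, "ncq_marked": False}

    m = re.search(r"Queue depth:\s*(\d+)", hdparm_output)
    if m:
        info["queue_depth"] = int(m.group(1))

    for line in hdparm_output.splitlines():
        if "Native Command Queueing" in line:
            info["ncq_listed"] = True
            # A leading "*" in the feature list marks the feature as enabled.
            info["ncq_marked"] = line.strip().startswith("*")
    return info


if __name__ == "__main__":
    import subprocess
    # Needs root and an installed hdparm; /dev/sda is the disk from the report.
    out = subprocess.run(
        ["hdparm", "-I", "/dev/sda"], capture_output=True, text=True
    ).stdout
    print(parse_ncq_info(out))
```

For the output above this yields a queue depth of 32 with NCQ listed and marked, so the drive advertises NCQ; it says nothing about whether the drive's NCQ implementation behaves correctly.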