Description of problem:
After installing RHEL 5.1 on a new Dell D630 laptop, I get the following lines in /var/log/messages:

Oct 29 10:42:34 dudweiler kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 29 10:42:34 dudweiler kernel: ata1.00: ATA-8: FUJITSU MHW2160BJ G2, 0085001A, max UDMA/100
Oct 29 10:42:34 dudweiler kernel: ata1.00: 312581808 sectors, multi 8: LBA48 NCQ (depth 31/32)
Oct 29 10:42:34 dudweiler kernel: ata1.00: configured for UDMA/100
Oct 29 10:42:34 dudweiler kernel: ata3: SATA link down (SStatus 0 SControl 300)
Oct 29 10:42:34 dudweiler kernel: Vendor: ATA Model: FUJITSU MHW2160B Rev: 0085
Oct 29 10:42:34 dudweiler kernel: Type: Direct-Access ANSI SCSI revision: 05
Oct 29 10:42:34 dudweiler kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
Oct 29 10:42:34 dudweiler kernel: sda: Write Protect is off
Oct 29 10:42:34 dudweiler kernel: SCSI device sda: drive cache: write back
Oct 29 10:42:34 dudweiler kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
Oct 29 10:42:34 dudweiler kernel: sda: Write Protect is off
Oct 29 10:42:34 dudweiler kernel: SCSI device sda: drive cache: write back
Oct 29 10:42:34 dudweiler kernel: sda: sda1 sda2 sda3 sda4
Oct 29 10:42:34 dudweiler kernel: sd 0:0:0:0: Attached scsi disk sda
Oct 29 11:22:37 dudweiler kernel: ata1.00: exception Emask 0x2 SAct 0x3fe003 SErr 0x0 action 0x2 frozen
Oct 29 11:22:37 dudweiler kernel: ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x3fe003 FIS=004040a1:00001000)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:00:2c:ec:a3/00:00:01:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:08:ec:ec:a3/00:00:01:00:00/40 tag 1 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:68:2c:eb:a3/00:00:01:00:00/40 tag 13 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:70:74:eb:a3/00:00:01:00:00/40 tag 14 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:78:ec:eb:a3/00:00:01:00:00/40 tag 15 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:80:4c:ec:a3/00:00:01:00:00/40 tag 16 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:88:5c:ec:a3/00:00:01:00:00/40 tag 17 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/10:90:8c:ec:a3/00:00:01:00:00/40 tag 18 cdb 0x0 data 8192 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:98:8c:ed:a3/00:00:01:00:00/40 tag 19 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:a0:ac:eb:b3/00:00:01:00:00/40 tag 20 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1.00: cmd 61/08:a8:d4:eb:bb/00:00:01:00:00/40 tag 21 cdb 0x0 data 4096 out
Oct 29 11:22:37 dudweiler kernel: res 40/00:a8:d4:eb:bb/00:00:01:00:00/40 Emask 0x2 (HSM violation)
Oct 29 11:22:37 dudweiler kernel: ata1: soft resetting port
Oct 29 11:22:37 dudweiler kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 29 11:22:37 dudweiler kernel: ata1.00: configured for UDMA/100
Oct 29 11:22:37 dudweiler kernel: ata1: EH complete
Oct 29 11:22:37 dudweiler kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
Oct 29 11:22:37 dudweiler kernel: sda: Write Protect is off
Oct 29 11:22:37 dudweiler kernel: SCSI device sda: drive cache: write back

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
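For readers decoding the log above: SAct is a bitmask with one bit per outstanding NCQ tag, so the value 0x3fe003 can be checked against the "tag N" entries in the cmd lines. The following is a minimal sketch of that decoding; the function name `active_tags` is made up for illustration and is not part of any kernel or tool.

```python
from typing import List


def active_tags(sact: int) -> List[int]:
    """Return the NCQ tag numbers whose bits are set in a 32-bit SAct value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


if __name__ == "__main__":
    # SAct value taken from the exception line in the report.
    print(active_tags(0x3FE003))
    # prints [0, 1, 13, 14, 15, 16, 17, 18, 19, 20, 21]
```

The decoded list matches the tags shown in the cmd lines (0, 1 and 13 through 21), so every command that was in flight failed together when the port was frozen.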
I'll test the following patch over the next few days to disable NCQ for this drive. Adding a kernel parameter to disable NCQ would also be great:

--- linux-2.6.18.i386/drivers/ata/libata-core.c.lr 2007-11-04 22:53:46.000000000 +0100
+++ linux-2.6.18.i386/drivers/ata/libata-core.c 2007-11-04 22:55:03.000000000 +0100
@@ -3786,6 +3786,8 @@
 	{ "WDC WD740ADFD-00", NULL, ATA_HORKAGE_NONCQ },
 	/* http://thread.gmane.org/gmane.linux.ide/14907 */
 	{ "FUJITSU MHT2060BH", NULL, ATA_HORKAGE_NONCQ },
+	/* RH bz#???? */
+	{ "FUJITSU MHW2160BJ", NULL, ATA_HORKAGE_NONCQ },
 	/* NCQ is broken */
 	{ "Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ },
 	{ "Maxtor 6B200M0", "BANC1B10", ATA_HORKAGE_NONCQ },
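To illustrate how a table like the one in this patch is consulted, here is a simplified Python model of the lookup. It is a sketch only: it assumes the kernel compares the model string exactly and treats a NULL revision as "any firmware", which is an assumption about this kernel version rather than something confirmed in this report. The names `Entry`, `BLACKLIST` and `lookup_horkage` and the flag value are illustrative.

```python
from typing import List, NamedTuple, Optional

ATA_HORKAGE_NONCQ = 1 << 2  # illustrative flag value, not taken from the report


class Entry(NamedTuple):
    model: str
    revision: Optional[str]  # None stands in for NULL: match any firmware revision
    horkage: int


# Entries copied from the patch context above.
BLACKLIST: List[Entry] = [
    Entry("WDC WD740ADFD-00", None, ATA_HORKAGE_NONCQ),
    Entry("FUJITSU MHT2060BH", None, ATA_HORKAGE_NONCQ),
    Entry("FUJITSU MHW2160BJ", None, ATA_HORKAGE_NONCQ),
    Entry("Maxtor 6L250S0", "BANC1G10", ATA_HORKAGE_NONCQ),
    Entry("Maxtor 6B200M0", "BANC1B10", ATA_HORKAGE_NONCQ),
]


def lookup_horkage(model: str, revision: str, table: List[Entry] = BLACKLIST) -> int:
    """Return the quirk flags for a drive, or 0 if it is not listed.

    Assumes exact string comparison on the model name (hypothetical).
    """
    for entry in table:
        if entry.model != model:
            continue
        if entry.revision is None or entry.revision == revision:
            return entry.horkage
    return 0


if __name__ == "__main__":
    # Model and firmware strings as reported by the kernel in the log.
    flags = lookup_horkage("FUJITSU MHW2160BJ G2", "0085001A")
    print("NCQ disabled" if flags & ATA_HORKAGE_NONCQ else "NCQ left enabled")
```

Under the exact-match assumption, the drive reported in the log as "FUJITSU MHW2160BJ G2" would not match the entry "FUJITSU MHW2160BJ", so the string in the table has to agree with what the drive actually reports.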
Created attachment 248991
Tested patch with correct string to identify the disk.

Here is a new attachment with the real patch. It seems to work fine here.
Even better (optimal?) would be a change where a parameter can specify the queue depth, with 0 disabling NCQ.
Let me know if this should also be submitted upstream to get it into the upstream kernel.

regards and thanks for the beautiful kernel,

Florian La Roche
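The queue-depth parameter suggested above could behave roughly as in the sketch below. This models only the proposed semantics (0 disables NCQ, any other value is clamped to what the host and drive support); `resolve_ncq_depth` and its arguments are hypothetical and do not correspond to an existing kernel option. The defaults of 31 and 32 come from the "NCQ (depth 31/32)" line in the boot log.

```python
from typing import Optional, Tuple


def resolve_ncq_depth(
    requested: Optional[int],
    host_max: int = 31,
    device_max: int = 32,
) -> Tuple[bool, int]:
    """Map a hypothetical queue-depth parameter to (ncq_enabled, depth).

    requested=None means the parameter was not given: use the largest
    depth both sides support. requested=0 disables NCQ, which leaves a
    single outstanding command.
    """
    limit = min(host_max, device_max)
    if requested is None:
        return True, limit
    if requested < 0:
        raise ValueError("queue depth must not be negative")
    if requested == 0:
        return False, 1
    return True, min(requested, limit)


if __name__ == "__main__":
    for value in (None, 0, 8, 64):
        print(value, resolve_ncq_depth(value))
```

Some kernels also expose a runtime knob at /sys/block/sda/device/queue_depth; whether the RHEL 5.1 kernel supports it is not confirmed in this report.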
Closing this, probably fixed by other patches already. regards, Florian La Roche
I am not clear on what exactly the root cause of this issue is. What are the symptoms besides these error messages in dmesg or /var/log/messages?

In my case, I am running this version of Red Hat:

[root@testbox ~]# uname -a
Linux testbox 2.6.18-53.1.13.el5 #1 SMP Mon Feb 11 13:27:27 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@testbox ~]# uptime
22:44:55 up 167 days, 14:25, 2 users, load average: 1.34, 2.62, 3.89

I only see this issue being reported after more than 160 days of operation.

Oct 26 00:47:44 testbox kernel: ata1.00: exception Emask 0x2 SAct 0x1 SErr 0x0 action 0x2 frozen
Oct 26 00:47:44 testbox kernel: ata1.00: (spurious completions during NCQ issue=0x0 SAct=0x1 FIS=004040a1:00000100)
Oct 26 00:47:47 testbox kernel: ata1.00: cmd 61/08:00:4c:72:0f/00:00:30:00:00/40 tag 0 cdb 0x0 data 4096 out
Oct 26 00:47:47 testbox kernel: res 40/00:04:4c:72:0f/00:00:30:00:00/40 Emask 0x2 (HSM violation)
Oct 26 00:47:48 testbox kernel: ata1: soft resetting port
Oct 26 00:47:48 testbox kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 26 00:47:48 testbox kernel: ata1.00: configured for UDMA/133
Oct 26 00:47:48 testbox kernel: ata1: EH complete

Is this a hardware issue?
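One way to judge whether this is a one-off or a recurring problem is to count how often the message appears in the system log. The sketch below scans syslog-style lines for the "spurious completions during NCQ" message shown above; the function name and regular expression are illustrative and assume the log format quoted in this report.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
# Oct 26 00:47:44 testbox kernel: ata1.00: (spurious completions during NCQ ...
SPURIOUS_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: "
    r"(ata\d+\.\d+): \(spurious completions during NCQ"
)


def find_spurious_completions(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, host, device) for each spurious-completion message."""
    events = []
    for line in lines:
        match = SPURIOUS_RE.match(line)
        if match:
            events.append(match.groups())
    return events


if __name__ == "__main__":
    # Reading /var/log/messages usually requires root privileges.
    with open("/var/log/messages", errors="replace") as log:
        for timestamp, host, device in find_spurious_completions(log):
            print(timestamp, host, device)
```

A single event in 167 days of uptime, as reported here, would show up as exactly one entry.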
This is the drive that I am using:

[root@testbox ~]# hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
	Model Number:       WDC WD1000FYPS-01ZKB0
	Serial Number:      WD-WCASJ0675404
	Firmware Revision:  02.01B01
	Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5
Standards:
	Supported: 8 7 6 5
	Likely used: 8
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:  268435455
	LBA48  user addressable sectors: 1953525168
	device size with M = 1024*1024:      953869 MBytes
	device size with M = 1000*1000:     1000204 MBytes (1000 GB)
Capabilities:
	LBA, IORDY(can be disabled)
	Queue depth: 32
	Standby timer values: spec'd by Standard, with device specific minimum
	R/W multiple sector transfer: Max = 16	Current = 0
	Recommended acoustic management value: 128, current value: 254
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	SMART feature set
	    	Security Mode feature set
	   *	Power Management feature set
	   *	Write cache
	   *	Look-ahead
	   *	Host Protected Area feature set
	   *	WRITE_BUFFER command
	   *	READ_BUFFER command
	   *	NOP cmd
	   *	DOWNLOAD_MICROCODE
	    	Power-Up In Standby feature set
	   *	SET_FEATURES required to spinup after power up
	    	SET_MAX security extension
	   *	Automatic Acoustic Management feature set
	   *	48-bit Address feature set
	   *	Device Configuration Overlay feature set
	   *	Mandatory FLUSH_CACHE
	   *	FLUSH_CACHE_EXT
	   *	SMART error logging
	   *	SMART self-test
	   *	General Purpose Logging feature set
	   *	WRITE_{DMA|MULTIPLE}_FUA_EXT
	   *	64-bit World wide name
	   *	WRITE_UNCORRECTABLE command
	   *	{READ,WRITE}_DMA_EXT_GPL commands
	   *	Segmented DOWNLOAD_MICROCODE
	   *	SATA-I signaling speed (1.5Gb/s)
	   *	SATA-II signaling speed (3.0Gb/s)
	   *	Native Command Queueing (NCQ)
	   *	Phy event counters
	    	DMA Setup Auto-Activate optimization
	   *	Software settings preservation
Security:
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
	not	frozen
	not	expired: security count
		supported: enhanced erase
	260min for SECURITY ERASE UNIT. 260min for ENHANCED SECURITY ERASE UNIT.
Checksum: correct