Created attachment 379978 [details] Kernel log via dmesg Periodically, I get the following exception from one of the three SATA drives I have attached to an onboard Marvell 88SE6145 SATA II PCI-E controller. The kernel is configured to use AHCI for the Marvell controller via the ahci.marvell_enable=1 kernel parameter. The error does not occur for drives on the primary ICH7 Intel controller (also using AHCI). The error also does not occur for the same drive swapped onto the ICH7 controller. Dec 23 00:19:37 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 23 00:19:37 zeus kernel: ata6.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 23 00:19:37 zeus kernel: res 40/00:00:00:22:e2/00:00:00:00:00/e0 Emask 0x4 (timeout) Dec 23 00:19:37 zeus kernel: ata6.00: status: { DRDY } Dec 23 00:19:37 zeus kernel: ata6: hard resetting link Dec 23 00:19:37 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 23 00:19:37 zeus smartd[1989]: Device: /dev/sdd, Read SMART Self Test Log Failed Dec 23 00:19:37 zeus kernel: ata6.00: configured for UDMA/133 Dec 23 00:19:37 zeus kernel: ata6: EH complete Note that the exception often occurs when doing smart commands. Normally smart commands work fine but every so often there is a pause or failure in the smart command executed, with a corresponding entry in /var/log/messages similar to the above. For example, here is the result of doing smartctl -a /dev/sdd several times until the problem was replicated: Dec 23 01:20:36 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 23 01:20:36 zeus kernel: ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Dec 23 01:20:36 zeus kernel: res 40/00:00:30:f8:64/00:00:17:00:00/e0 Emask 0x4 (timeout) Dec 23 01:20:36 zeus kernel: ata6.00: status: { DRDY } Dec 23 01:20:36 zeus kernel: ata6: hard resetting link Dec 23 01:20:36 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 23 01:20:36 zeus kernel: ata6.00: configured for UDMA/133 Dec 23 01:20:36 zeus kernel: ata6: EH complete Version-Release number of selected component (if applicable): $ rpm -q kernel kernel-2.6.31.6-166.fc12@x86_64 How reproducible: Reproducible quite easily by running smartctl -a <drive> repeatedly until the problem occurs. Steps to Reproduce: 1. Attach drive to Marvell secondary controller. 2. Run smartctl -a /dev/sdx repeatedly until there is a pause. 3. Check message.log for the error. Additional info: I have tried with and without acpi=off. Problem occurs either way. I will attach the output of lspci -vv and a full kernel log via dmesg.
Created attachment 379980 [details] lspci -vv output
Oh, and BTW NCQ for the drives on the secondary Marvell controller is disabled -- not by choice -- it seems NCQ is not supported by this controller in this configuration: # cat /sys/block/sd[def]/device/queue_depth 1 1 1 # echo 31 > /sys/block/sdd/device/queue_depth -bash: echo: write error: Invalid argument The drives on the Intel ICH7 controller (sda, sdb, and sdc) all have NCQ enabled and are working fine: # cat /sys/block/sd[abc]/device/queue_depth 31 31 31
Bug #46225 seems to be related, with lots of people seeing the same type of error. However, that bug was fixed in 2.6.27.15. The only difference is that I am using a Marvell controller via the ahci driver instead of sata_mv. I'm just shooting in the dark, but perhaps the same race condition(s) fixed by Mark Lord in the sata_mv driver should also be applied to the ahci driver?
(In reply to comment #3) > Bug #46225 seems to be related Sorry for the mailstorm, the related bug is #462425.
While up to now the error seemed to be benign i.e. it did not cause any problems with my RAID-5 array or with filesystem read/write operations, since the disks were mostly idle, I decided to do an overnight disk test with "fio surface-scan" to make sure all was well before putting real data on the drives. About two-fifths of the way through the test fio crapped out with a i/o error: # fio /usr/share/doc/fio-1.34/examples/surface-scan write-phase: (g=0): rw=write, bs=64K-64K/64K-64K, ioengine=sync, iodepth=1 verify-phase: (g=1): rw=read, bs=64K-64K/64K-64K, ioengine=sync, iodepth=1 Starting 2 threads write-phase: Laying out IO file(s) (1 file(s) / 17592186044415MB) fio: pid=25313, err=30/file:engines/sync.c:56, func=xfer, error=Read-only file system fio: pid=25313, err=5/file:engines/sync.c:56, func=xfer, error=Input/output error Here are the corresponding kernel logs -- you can see that all the drives on the marvell sata controller (sdd, sde, and sdf) appear to have entered a state where smartd cannot even access them: ============================================================== Dec 25 03:24:00 zeus kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 25 03:24:00 zeus kernel: ata8.00: cmd 35/00:80:81:81:cc/00:00:1d:00:00/e0 tag 0 dma 65536 out Dec 25 03:24:00 zeus kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 25 03:24:00 zeus kernel: ata8.00: status: { DRDY } Dec 25 03:24:00 zeus kernel: ata8: hard resetting link Dec 25 03:24:00 zeus kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 03:24:05 zeus kernel: ata8.00: qc timeout (cmd 0xec) Dec 25 03:24:05 zeus kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:24:05 zeus kernel: ata8.00: revalidation failed (errno=-5) Dec 25 03:24:05 zeus kernel: ata8: hard resetting link Dec 25 03:24:06 zeus kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 03:24:16 zeus kernel: ata8.00: qc timeout (cmd 0xec) Dec 25 03:24:16 zeus kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:24:16 zeus kernel: ata8.00: revalidation failed (errno=-5) Dec 25 03:24:16 zeus kernel: ata8: limiting SATA link speed to 1.5 Gbps Dec 25 03:24:16 zeus kernel: ata8: hard resetting link Dec 25 03:24:17 zeus kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Dec 25 03:24:45 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 25 03:24:45 zeus kernel: ata6.00: cmd e5/00:00:00:00:00/00:00:00:00:00/00 tag 0 Dec 25 03:24:45 zeus kernel: res 40/00:00:80:81:cc/00:00:1d:00:00/e0 Emask 0x4 (timeout) Dec 25 03:24:45 zeus kernel: ata6.00: status: { DRDY } Dec 25 03:24:45 zeus kernel: ata6: hard resetting link Dec 25 03:24:46 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 03:24:47 zeus kernel: ata8.00: qc timeout (cmd 0xec) Dec 25 03:24:47 zeus kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:24:47 zeus kernel: ata8.00: revalidation failed (errno=-5) Dec 25 03:24:47 zeus kernel: ata8.00: disabled Dec 25 03:24:47 zeus kernel: ata8.00: device reported invalid CHS sector 0 Dec 25 03:24:47 zeus kernel: ata8: hard resetting link Dec 25 03:24:48 zeus kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Dec 25 03:24:48 zeus kernel: ata8: EH complete Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Unhandled error code Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Dec 25 03:24:48 zeus kernel: end_request: I/O error, dev sdf, sector 499941761 Dec 25 03:24:48 zeus kernel: raid5: Disk failure on sdf1, disabling device. Dec 25 03:24:48 zeus kernel: raid5: Operation continuing on 3 devices. Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Unhandled error code Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Dec 25 03:24:48 zeus kernel: end_request: I/O error, dev sdf, sector 488055273 Dec 25 03:24:48 zeus kernel: raid5:md1: read error not correctable (sector 488055272 on sdf1). Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Unhandled error code Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Dec 25 03:24:48 zeus kernel: end_request: I/O error, dev sdf, sector 488055281 Dec 25 03:24:48 zeus kernel: raid5:md1: read error not correctable (sector 488055280 on sdf1). Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Unhandled error code Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Dec 25 03:24:48 zeus kernel: end_request: I/O error, dev sdf, sector 488055169 Dec 25 03:24:51 zeus kernel: ata6.00: qc timeout (cmd 0xec) Dec 25 03:24:51 zeus kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:24:51 zeus kernel: ata6.00: revalidation failed (errno=-5) Dec 25 03:24:51 zeus kernel: ata6: hard resetting link Dec 25 03:24:52 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 03:25:02 zeus kernel: ata6.00: qc timeout (cmd 0xec) Dec 25 03:25:03 zeus kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:25:03 zeus kernel: ata6.00: revalidation failed (errno=-5) Dec 25 03:25:03 zeus kernel: ata6: limiting SATA link speed to 1.5 Gbps Dec 25 03:25:03 zeus kernel: ata6: hard resetting link Dec 25 03:25:03 zeus kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Dec 25 03:25:19 zeus kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 25 03:25:19 zeus kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 Dec 25 03:25:19 zeus kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 25 03:25:19 zeus kernel: ata7.00: status: { DRDY } Dec 25 03:25:19 zeus kernel: ata7: hard resetting link Dec 25 03:25:20 zeus kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 03:25:25 zeus kernel: ata7.00: qc timeout (cmd 0xec) Dec 25 03:25:25 zeus kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:25:25 zeus kernel: ata7.00: revalidation failed (errno=-5) Dec 25 03:25:25 zeus kernel: ata7: hard resetting link Dec 25 03:25:26 zeus kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 03:25:33 zeus kernel: ata6.00: qc timeout (cmd 0xec) Dec 25 03:25:34 zeus kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:25:34 zeus kernel: ata6.00: revalidation failed (errno=-5) Dec 25 03:25:34 zeus kernel: ata6.00: disabled Dec 25 03:25:34 zeus kernel: ata6: hard resetting link Dec 25 03:25:35 zeus kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Dec 25 03:25:35 zeus kernel: ata6: EH complete Dec 25 03:25:35 zeus kernel: sd 5:0:0:0: [sdd] Unhandled error code Dec 25 03:25:35 zeus kernel: sd 5:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Dec 25 03:25:35 zeus kernel: end_request: I/O error, dev sdd, sector 976767873 Dec 25 03:25:35 zeus kernel: end_request: I/O error, dev sdd, sector 976767873 Dec 25 03:25:35 zeus kernel: md: super_written gets error=-5, uptodate=0 Dec 25 03:25:35 zeus kernel: raid5: Disk failure on sdd1, disabling device. Dec 25 03:25:35 zeus kernel: raid5: Operation continuing on 2 devices. Dec 25 03:25:36 zeus kernel: ata7.00: qc timeout (cmd 0xec) Dec 25 03:25:37 zeus kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:25:37 zeus kernel: ata7.00: revalidation failed (errno=-5) Dec 25 03:25:37 zeus kernel: ata7: limiting SATA link speed to 1.5 Gbps Dec 25 03:25:37 zeus kernel: ata7: hard resetting link Dec 25 03:25:37 zeus kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Dec 25 03:26:07 zeus kernel: ata7.00: qc timeout (cmd 0xec) Dec 25 03:26:08 zeus kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4) Dec 25 03:26:08 zeus kernel: ata7.00: revalidation failed (errno=-5) Dec 25 03:26:08 zeus kernel: ata7.00: disabled Dec 25 03:26:08 zeus kernel: ata7.00: device reported invalid CHS sector 0 Dec 25 03:26:08 zeus kernel: ata7: hard resetting link Dec 25 03:26:09 zeus kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Dec 25 03:26:09 zeus kernel: ata7: EH complete Dec 25 03:26:09 zeus kernel: sd 6:0:0:0: [sde] Unhandled error code Dec 25 03:26:09 zeus kernel: sd 6:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK Dec 25 03:26:09 zeus kernel: end_request: I/O error, dev sde, sector 976767873 Dec 25 03:26:09 zeus kernel: end_request: I/O error, dev sde, sector 976767873 Dec 25 03:26:09 zeus kernel: md: super_written gets error=-5, uptodate=0 Dec 25 03:26:09 zeus kernel: raid5: Disk failure on sde1, disabling device. Dec 25 03:26:09 zeus kernel: raid5: Operation continuing on 1 devices. Dec 25 03:26:09 zeus kernel: RAID5 conf printout: Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1 Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1 Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1 Dec 25 03:26:09 zeus kernel: disk 2, o:0, dev:sde1 Dec 25 03:26:09 zeus kernel: disk 3, o:0, dev:sdf1 Dec 25 03:26:09 zeus kernel: RAID5 conf printout: Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1 Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1 Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1 Dec 25 03:26:09 zeus kernel: disk 2, o:0, dev:sde1 Dec 25 03:26:09 zeus kernel: RAID5 conf printout: Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1 Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1 Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1 Dec 25 03:26:09 zeus kernel: disk 2, o:0, dev:sde1 Dec 25 03:26:09 zeus kernel: RAID5 conf printout: Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1 Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1 Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1 Dec 25 03:26:09 zeus kernel: RAID5 conf printout: Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1 Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1 Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1 Dec 25 03:26:09 zeus kernel: RAID5 conf printout: Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1 Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1 Dec 25 03:26:09 zeus kernel: Buffer I/O error on device dm-2, logical block 187170824 Dec 25 03:26:09 zeus kernel: lost page write due to I/O error on dm-2 Dec 25 03:26:09 zeus kernel: Aborting journal on device dm-2:8. Dec 25 03:26:09 zeus kernel: EXT4-fs error (device dm-2): ext4_journal_start_sb: Detected aborted journal Dec 25 03:26:09 zeus kernel: EXT4-fs (dm-2): Remounting filesystem read-only Dec 25 03:26:09 zeus kernel: EXT4-fs error (device dm-2) in ext4_reserve_inode_write: Journal has aborted Dec 25 03:26:09 zeus kernel: EXT4-fs error (device dm-2) in ext4_orphan_del: Journal has aborted Dec 25 03:26:09 zeus kernel: EXT4-fs error (device dm-2) in ext4_reserve_inode_write: Journal has aborted Dec 25 03:26:09 zeus kernel: Buffer I/O error on device dm-2, logical block 183009280 Dec 25 03:26:09 zeus kernel: lost page write due to I/O error on dm-2 Dec 25 03:26:09 zeus kernel: JBD2: I/O error detected when updating journal superblock for dm-2:8. Dec 25 03:37:18 zeus smartd[2048]: Device: /dev/sdd, not capable of SMART self-check Dec 25 03:37:18 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:19 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:19 zeus smartd[2048]: Device: /dev/sdd, failed to read SMART Attribute Data Dec 25 03:37:19 zeus smartd[2048]: Device: /dev/sdd, Read SMART Self Test Log Failed Dec 25 03:37:19 zeus smartd[2048]: Device: /dev/sdd, Read SMART Error Log Failed Dec 25 03:37:19 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:20 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:20 zeus smartd[2048]: Device: /dev/sde, not capable of SMART self-check Dec 25 03:37:20 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:22 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:22 zeus smartd[2048]: Device: /dev/sde, failed to read SMART Attribute Data Dec 25 03:37:22 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:23 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:23 zeus smartd[2048]: Device: /dev/sde, Read SMART Self Test Log Failed Dec 25 03:37:23 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:24 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:24 zeus smartd[2048]: Device: /dev/sde, Read SMART Error Log Failed Dec 25 03:37:24 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:25 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:25 zeus smartd[2048]: Device: /dev/sdf, not capable of SMART self-check Dec 25 03:37:25 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:26 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:26 zeus smartd[2048]: Device: /dev/sdf, failed to read SMART Attribute Data Dec 25 03:37:26 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:28 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:28 zeus smartd[2048]: Device: /dev/sdf, Read SMART Self Test Log Failed Dec 25 03:37:28 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:29 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:29 zeus smartd[2048]: Device: /dev/sdf, Read SMART Error Log Failed Dec 25 03:37:29 zeus smartd[2048]: Sending warning via mail to root ... Dec 25 03:37:30 zeus smartd[2048]: Warning via mail to root: successful Dec 25 03:37:30 zeus ata_id[2974]: HDIO_GET_IDENTITY failed for '/dev/sdf' ============================================================== BTW, merry Christmas!
(In reply to comment #5) > While up to now the error seemed to be benign i.e. it did not cause any > problems with my RAID-5 array or with filesystem read/write operations, since > the disks were mostly idle, I decided to do an overnight disk test with "fio > surface-scan" to make sure all was well before putting real data on the > drives. About two-fifths of the way through the test fio crapped out with a > i/o error: A few more notes: 1) The system has smartd and hddtemp enabled, so I believe a periodic smart/hddtemp command caused the problem even during the fio surface-scan test, rather than it being a function of load. 2) I can replicate the primary problem as originally reported (exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen) with no filesystem access to the disks at all (the RAID array is stopped) just by issuing smartctl -a /dev/sdd commands repeatedly. The problem usually occurs within about 10-15 executions. 3) I verified the problem does not appear to be based on load. With smartd and hddtemp disabled for sdd, sde, and sdf, I re-executed the fio test with no problems at all. So this *appears* to be exclusively an issue with issuing smart/hddtemp commands via the Marvell controller. 4) At the suggestion of Robert Hancock on the linux-ide list [1], I verified the power supply voltages while the disks were under load. Both the 5V and 12V outputs seemed to be correct and steady. Jives with #3 -- pure loading with no smart/hddtemp commands has up till now not triggered the exception. [1] http://thread.gmane.org/gmane.linux.ide/39913/focus=39947
I have also seen something similar three times in about three weeks (on 64bit Fedora 12). In my case it is fatal as the drives become unusable until the machine is powered down and back up (not just reset). This is with an Intel 82801JB ICH10 2-port SATA controller and Western Digital WD10EADS drives. It is on a Supermicro X8SAX motherboard which has 6 SATA ports. 4 on a 4-port controller and 2 on this 2-port controller. All three errors have been on the 2-port controller and have prevented all access to the two disks attached once the error has happened. Unfortunately I have taken kernel updates since I noted the error so I can't give a full report. Relevant output from /var/log/messages however was: Dec 29 00:28:04 jr-f12 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 29 00:28:04 jr-f12 kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 Dec 29 00:28:04 jr-f12 kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 29 00:28:04 jr-f12 kernel: ata2.00: status: { DRDY } Dec 29 00:28:04 jr-f12 kernel: ata2.00: hard resetting link Dec 29 00:28:04 jr-f12 kernel: ata2.01: hard resetting link Dec 29 00:28:10 jr-f12 kernel: ata2.00: link is slow to respond, please be patient (ready=0) Dec 29 00:28:14 jr-f12 kernel: ata2.00: SRST failed (errno=-16) Dec 29 00:28:14 jr-f12 kernel: ata2.00: hard resetting link Dec 29 00:28:14 jr-f12 kernel: ata2.01: hard resetting link Dec 29 00:28:20 jr-f12 kernel: ata2.00: link is slow to respond, please be patient (ready=0) Dec 29 00:28:24 jr-f12 kernel: ata2.00: SRST failed (errno=-16) Dec 29 00:28:24 jr-f12 kernel: ata2.00: hard resetting link Dec 29 00:28:24 jr-f12 kernel: ata2.01: hard resetting link Dec 29 00:28:30 jr-f12 kernel: ata2.00: link is slow to respond, please be patient (ready=0) Dec 29 00:28:59 jr-f12 kernel: ata2.00: SRST failed (errno=-16) Dec 29 00:28:59 jr-f12 kernel: ata2.00: limiting SATA link speed to 1.5 Gbps Dec 29 00:28:59 jr-f12 kernel: ata2.01: limiting SATA link speed to 1.5 Gbps Dec 29 00:28:59 jr-f12 kernel: ata2.00: hard resetting link Dec 29 00:28:59 jr-f12 kernel: ata2.01: hard resetting link Dec 29 00:29:04 jr-f12 kernel: ata2.00: SRST failed (errno=-16) Dec 29 00:29:04 jr-f12 kernel: ata2.00: reset failed, giving up Dec 29 00:29:04 jr-f12 kernel: ata2.00: disabled Dec 29 00:29:04 jr-f12 kernel: ata2.01: disabled Dec 29 00:29:04 jr-f12 kernel: ata2.00: device reported invalid CHS sector 0 Dec 29 00:29:04 jr-f12 kernel: end_request: I/O error, dev sdb, sector 1953524992 Dec 29 00:29:04 jr-f12 kernel: md: super_written gets error=-5, uptodate=0 Dec 29 00:29:04 jr-f12 kernel: raid5: Disk failure on sdb, disabling device. Dec 29 00:29:04 jr-f12 kernel: raid5: Operation continuing on 3 devices. Dec 29 00:29:04 jr-f12 kernel: ata2: EH complete Dec 29 00:29:04 jr-f12 kernel: end_request: I/O error, dev sdc, sector 1953524992 Dec 29 00:29:04 jr-f12 kernel: md: super_written gets error=-5, uptodate=0 Dec 29 00:29:04 jr-f12 kernel: raid5: Disk failure on sdc, disabling device. Dec 29 00:29:04 jr-f12 kernel: raid5: Operation continuing on 2 devices.
This bug is possibly related (mentioned by Tim Small on the linux-ide list): http://bugzilla.kernel.org/show_bug.cgi?id=14831 As I reported on the linux-ide list [1], when the script attached to that bug is run on my system against any of the drives attached to my Marvell controller, the script causes drive failure within one minute (with the cascading failure ending in failure to IDENTIFY reported previously in comment #5). I have only seen the cascading failure occur if the disks are being accessed at the same time as the exception occurs (as in Tim Small's test script). I also note that when the cascading failure does occur, command EA - ATA_CMD_FLUSH_EXT - is one of the commands executing during the failure process (though not the first one, which tends on my system to be B0 - ATA_CMD_SMART, or EC - ATA_CMD_ID_ATA). EA is the same command shown in the exception reported in comment #7. [1] http://thread.gmane.org/gmane.linux.ide/39913/focus=44247
Problem still occurring with kernel 2.6.31.12-174.2.3.fc12.x86_64. I have reduced the incidence of this problem by turning off smartd and hddtemp for the drives attached to the Marvel controller but occasionally still get an error. For example, this morning: Jan 28 20:00:10 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Jan 28 20:00:10 zeus kernel: ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Jan 28 20:00:10 zeus kernel: res 40/00:00:0f:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout) Jan 28 20:00:10 zeus kernel: ata6.00: status: { DRDY } Jan 28 20:00:10 zeus kernel: ata6: hard resetting link Jan 28 20:00:10 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Jan 28 20:00:10 zeus kernel: ata6.00: configured for UDMA/133 Jan 28 20:00:10 zeus kernel: ata6: EH complete
I have the same problem with 2.6.32.11-99.fc12.x86_64 Apr 10 20:48:43 Gaetan64 kernel: ata3.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x6 frozen Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:00:08:42:8d/00:00:0f:00:00/40 tag 0 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:08:40:41:8d/00:00:0f:00:00/40 tag 1 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:10:50:41:8d/00:00:0f:00:00/40 tag 2 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:18:60:41:8d/00:00:0f:00:00/40 tag 3 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:20:78:41:8d/00:00:0f:00:00/40 tag 4 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:28:90:41:8d/00:00:0f:00:00/40 tag 5 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:30:a0:41:8d/00:00:0f:00:00/40 tag 6 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:38:c0:41:8d/00:00:0f:00:00/40 tag 7 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:40:18:42:8d/00:00:0f:00:00/40 tag 8 ncq 4096 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/18:48:28:42:8d/00:00:0f:00:00/40 tag 9 ncq 12288 in Apr 10 20:48:43 Gaetan64 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY } Apr 10 20:48:43 Gaetan64 kernel: ata3: hard resetting link Apr 10 20:48:43 Gaetan64 kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 10 20:48:43 Gaetan64 kernel: ata3.00: configured for UDMA/133 Apr 10 20:48:43 Gaetan64 kernel: ata3.00: device reported invalid CHS sector 0 The smart report of all hdd are ok
Problem still happening on 2.6.32.11-99.fc12.x86_64: Apr 23 20:00:11 zeus kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Apr 23 20:00:11 zeus kernel: ata8.00: failed command: IDENTIFY DEVICE Apr 23 20:00:11 zeus kernel: ata8.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Apr 23 20:00:11 zeus kernel: res 40/00:00:42:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout) Apr 23 20:00:11 zeus kernel: ata8.00: status: { DRDY } Apr 23 20:00:11 zeus kernel: ata8: hard resetting link Apr 23 20:00:11 zeus kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 23 20:00:11 zeus kernel: ata8.00: configured for UDMA/133 Apr 23 20:00:11 zeus kernel: ata8: EH complete
i have found an solution : add the option line to grub to disable ncq : libata.force=noncq for me, with this, i have no froze
On my system, none of the drives involved have NCQ enabled, since the controller does not support it. See Comment #2.
Wow, glad that I found this bug. Exactly the same happens here on Fedora 13 (not 12), 32 Bit, with all kernel versions/updates since F13 was released (currently 2.6.33.8-149.fc13.i686.PAE). But I don't have a Marvell controller but something from Nvidia (Asus M2NPV-VM, Nvidia nForce 430, GeForce 6150, sata_nv driver). There's nothing to configure in the BIOS because on these Nvidia chipsets, AHCI is done in the driver. Kernel options like "libata.noacpi=1", "libata.force=noncq" or "sata_nv.adma=0" didn't help. I can trigger the error with "smartctl", too. Unfortunately, this bug destroys my RAID-1 if there's disk access when it happens. And rebuilding 1 TB of data takes some hours. :-( One time it happened during boot procedure and basically killed the system (had to recover from CD). My SATA drives are two harddisks of type "SAMSUNG HD103SJ". The errors also show up in the SMART Error Log. They all seem to be related to commands like "SMART READ DATA", "IDENTIFY DEVICE" and "SMART READ ATTRIBUTE THRESHOLDS". Never had this problem with PATA disks (same machine).
This bug blocks my install of Fedora 13 64-bit on a Mac Mini (mid 2010) with a Corsair SSD. None of the "work-arounds" such as libata.force=noncq seem to fix the issue. I'd be happy to provide dmesg output, etc, if it would be of some use. However, it is identical to other comments in this bug.
I think I can resolved this probelm. I am using centos5.5-x64:kernel version is 2.6.18-194-x64. I met this problem. I modifed the code: driver/ata/ahci: ahci_port_start: [root@nfs-server linux-2.6.18-194.x86_64]# diff -Naur drivers/ata/ahci.c ahci.c --- drivers/ata/ahci.c 2010-10-15 17:18:55.000000000 +0000 +++ ahci.c 2010-10-27 01:27:55.000000000 +0000 @@ -429,7 +429,7 @@ /* board_ahci_mv */ { AHCI_HFLAGS (AHCI_HFLAG_NO_NCQ | AHCI_HFLAG_NO_MSI | - AHCI_HFLAG_MV_PATA), + AHCI_HFLAG_MV_PATA | AHCI_HFLAG_NO_PMP), .flags = ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY | ATA_FLAG_MMIO | ATA_FLAG_PIO_DMA, .pio_mask = 0x1f, /* pio0-4 */ @@ -2330,6 +2330,7 @@ static int ahci_port_start(struct ata_port *ap) { struct device *dev = ap->host->dev; + struct pci_dev *pdev = to_pci_dev(dev); struct ahci_port_priv *pp; void *mem; dma_addr_t mem_dma; @@ -2379,7 +2380,11 @@ * Save off initial list of interrupts to be enabled. * This could be changed later */ - pp->intr_mask = DEF_PORT_IRQ; + if(pdev->vendor==0x11ab && pdev->device == 0x6145)//marvell 88se6145 + pp->intr_mask = (DEF_PORT_IRQ) & (~(PORT_IRQ_PIOS_FIS)); + else + pp->intr_mask = DEF_PORT_IRQ; I did not know why? At present,I readed the SATA spec, wanting to know the reason. Someone can exam this path?If success,I can deliver this path,as my first path. Thanks!
I tested a week. The result is fine. PIO read PIO write and normal DMA operation hotplug also fine. So I think this bug can closed.
(In reply to comment #17) > I tested a week. The result is fine. PIO read PIO write and normal DMA > operation hotplug also fine. > So I think this bug can closed. Have you submitted this patch upstream? If so, could you provide a link to the upstream patch submission? I don't think the bug here gets closed until the patch is confirmed as working and is available in the Fedora repositories.
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
I upgraded to Fedora 14 yesterday and see that this problem is still happening. From my logs: Dec 16 14:04:14 zeus kernel: [32870.720065] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 16 14:04:14 zeus kernel: [32870.720072] ata6.00: failed command: SMART Dec 16 14:04:14 zeus kernel: [32870.720080] ata6.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 16 14:04:14 zeus kernel: [32870.720082] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 16 14:04:14 zeus kernel: [32870.720086] ata6.00: status: { DRDY } Dec 16 14:04:14 zeus kernel: [32870.720093] ata6: hard resetting link Dec 16 14:04:14 zeus kernel: [32871.025048] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 16 14:04:14 zeus kernel: [32871.027395] ata6.00: configured for UDMA/133 Dec 16 14:04:14 zeus kernel: [32871.027414] ata6: EH complete # uname -a Linux zeus 2.6.35.9-64.fc14.x86_64 #1 SMP Fri Dec 3 12:19:41 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
Yes, I can second that. The SATA errors still occur with Fedora 14 (about every 1-2 days) # uname -a Linux lakai 2.6.35.10-72.fc14.i686 #1 SMP Mon Dec 20 22:05:49 UTC 2010 i686 i686 i386 GNU/Linux (excerpt from /var/log/messages) Dec 25 05:42:58 lakai kernel: [56212.373428] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 Dec 25 05:42:58 lakai kernel: [56212.373436] ata1.00: failed command: SMART Dec 25 05:42:58 lakai kernel: [56212.373443] ata1.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 25 05:42:58 lakai kernel: [56212.373445] res 51/84:00:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error) Dec 25 05:42:58 lakai kernel: [56212.373449] ata1.00: status: { DRDY ERR } Dec 25 05:42:58 lakai kernel: [56212.373451] ata1.00: error: { ICRC ABRT } Dec 25 05:42:58 lakai kernel: [56212.373458] ata1: hard resetting link Dec 25 05:42:58 lakai kernel: [56212.373461] ata1: nv: skipping hardreset on occupied port Dec 25 05:42:58 lakai kernel: [56212.827054] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 05:42:58 lakai kernel: [56212.846436] ata1.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80) Dec 25 05:42:58 lakai kernel: [56212.846441] ata1.00: revalidation failed (errno=-5) Dec 25 05:43:03 lakai kernel: [56217.827038] ata1: hard resetting link Dec 25 05:43:03 lakai kernel: [56217.827042] ata1: nv: skipping hardreset on occupied port Dec 25 05:43:04 lakai kernel: [56218.281040] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 05:43:04 lakai kernel: [56218.319324] ata1.00: configured for UDMA/133 Dec 25 05:43:04 lakai kernel: [56218.319358] ata1: EH complete As said in Bug comment #14, my hardware setup is different, so the problem seems to be more general.
This seems to be happening more frequently now, even with smartd and hddtemp turned off for these drives. The failed commands are either "SMART" or "IDENTIFY DEVICE". See below. ============================================================== # grep -A 8 exception /var/log/messages Dec 19 19:04:16 zeus kernel: [310072.720043] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 19 19:04:16 zeus kernel: [310072.720050] ata6.00: failed command: SMART Dec 19 19:04:16 zeus kernel: [310072.720058] ata6.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 19 19:04:16 zeus kernel: [310072.720059] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) Dec 19 19:04:16 zeus kernel: [310072.720073] ata6.00: status: { DRDY } Dec 19 19:04:16 zeus kernel: [310072.720078] ata6: hard resetting link Dec 19 19:04:16 zeus kernel: [310073.025061] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 19 19:04:16 zeus kernel: [310073.027358] ata6.00: configured for UDMA/133 Dec 19 19:04:16 zeus kernel: [310073.027382] ata6: EH complete -- Dec 20 16:04:16 zeus kernel: [385672.720212] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 20 16:04:16 zeus kernel: [385672.720219] ata6.00: failed command: SMART Dec 20 16:04:16 zeus kernel: [385672.720227] ata6.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 20 16:04:16 zeus kernel: [385672.720228] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 20 16:04:16 zeus kernel: [385672.720232] ata6.00: status: { DRDY } Dec 20 16:04:16 zeus kernel: [385672.720240] ata6: hard resetting link Dec 20 16:04:16 zeus kernel: [385673.025074] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 20 16:04:16 zeus kernel: [385673.027455] ata6.00: configured for UDMA/133 Dec 20 16:04:16 zeus kernel: [385673.027471] ata6: EH complete -- Dec 20 23:04:16 zeus kernel: [410872.705189] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 20 23:04:16 zeus kernel: [410872.705194] ata6.00: failed command: SMART Dec 20 23:04:16 zeus kernel: [410872.705198] ata6.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 20 23:04:16 zeus kernel: [410872.705199] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) Dec 20 23:04:16 zeus kernel: [410872.705202] ata6.00: status: { DRDY } Dec 20 23:04:16 zeus kernel: [410872.705207] ata6: hard resetting link Dec 20 23:04:16 zeus kernel: [410873.010415] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 20 23:04:16 zeus kernel: [410873.012707] ata6.00: configured for UDMA/133 Dec 20 23:04:16 zeus kernel: [410873.012748] ata6: EH complete -- Dec 22 23:04:16 zeus kernel: [583672.704056] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 22 23:04:16 zeus kernel: [583672.704064] ata7.00: failed command: SMART Dec 22 23:04:16 zeus kernel: [583672.704072] ata7.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 22 23:04:16 zeus kernel: [583672.704073] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 22 23:04:16 zeus kernel: [583672.704078] ata7.00: status: { DRDY } Dec 22 23:04:16 zeus kernel: [583672.704086] ata7: hard resetting link Dec 22 23:04:16 zeus kernel: [583673.010041] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 22 23:04:16 zeus kernel: [583673.012419] ata7.00: configured for UDMA/133 Dec 22 23:04:16 zeus kernel: [583673.012437] ata7: EH complete Dec 22 23:34:16 zeus kernel: [585472.704175] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 22 23:34:16 zeus kernel: [585472.704181] ata6.00: failed command: SMART Dec 22 23:34:16 zeus kernel: [585472.704189] ata6.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 22 23:34:16 zeus kernel: [585472.704191] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 22 23:34:16 zeus kernel: [585472.704195] ata6.00: status: { DRDY } Dec 22 23:34:16 zeus kernel: [585472.704203] ata6: hard resetting link Dec 22 23:34:16 zeus kernel: [585473.009174] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 22 23:34:16 zeus kernel: [585473.011526] ata6.00: configured for UDMA/133 Dec 22 23:34:16 zeus kernel: [585473.011559] ata6: EH complete -- Dec 24 05:04:16 zeus kernel: [691672.720041] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 24 05:04:16 zeus kernel: [691672.720047] ata6.00: failed command: IDENTIFY DEVICE Dec 24 05:04:16 zeus kernel: [691672.720055] ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Dec 24 05:04:16 zeus kernel: [691672.720057] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout) Dec 24 05:04:16 zeus kernel: [691672.720061] ata6.00: status: { DRDY } Dec 24 05:04:16 zeus kernel: [691672.720069] ata6: hard resetting link Dec 24 05:04:16 zeus kernel: [691673.025073] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 24 05:04:16 zeus kernel: [691673.027400] ata6.00: configured for UDMA/133 Dec 24 05:04:16 zeus kernel: [691673.027418] ata6: EH complete -- Dec 24 05:34:16 zeus kernel: [693472.704175] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 24 05:34:16 zeus kernel: [693472.704181] ata6.00: failed command: IDENTIFY DEVICE Dec 24 05:34:16 zeus kernel: [693472.704189] ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Dec 24 05:34:16 zeus kernel: [693472.704191] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout) Dec 24 05:34:16 zeus kernel: [693472.704195] ata6.00: status: { DRDY } Dec 24 05:34:16 zeus kernel: [693472.704203] ata6: hard resetting link Dec 24 05:34:16 zeus kernel: [693473.009165] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 24 05:34:16 zeus kernel: [693473.011526] ata6.00: configured for UDMA/133 Dec 24 05:34:16 zeus kernel: [693473.011552] ata6: EH complete -- Dec 24 15:34:16 zeus kernel: [729472.704061] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 24 15:34:16 zeus kernel: [729472.704068] ata6.00: failed command: IDENTIFY DEVICE Dec 24 15:34:16 zeus kernel: [729472.704076] ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Dec 24 15:34:16 zeus kernel: [729472.704078] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout) Dec 24 15:34:16 zeus kernel: [729472.704082] ata6.00: status: { DRDY } Dec 24 15:34:16 zeus kernel: [729472.704090] ata6: hard resetting link Dec 24 15:34:16 zeus kernel: [729473.010041] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 24 15:34:16 zeus kernel: [729473.012327] ata6.00: configured for UDMA/133 Dec 24 15:34:16 zeus kernel: [729473.012348] ata6: EH complete -- Dec 25 08:04:16 zeus kernel: [788872.704075] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 25 08:04:16 zeus kernel: [788872.704082] ata7.00: failed command: SMART Dec 25 08:04:16 zeus kernel: [788872.704090] ata7.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 25 08:04:16 zeus kernel: [788872.704092] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 25 08:04:16 zeus kernel: [788872.704096] ata7.00: status: { DRDY } Dec 25 08:04:16 zeus kernel: [788872.704104] ata7: hard resetting link Dec 25 08:04:16 zeus kernel: [788873.009301] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 08:04:16 zeus kernel: [788873.011659] ata7.00: configured for UDMA/133 Dec 25 08:04:16 zeus kernel: [788873.011676] ata7: EH complete -- Dec 25 18:04:16 zeus kernel: [824872.704118] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 25 18:04:16 zeus kernel: [824872.704124] ata7.00: failed command: IDENTIFY DEVICE Dec 25 18:04:16 zeus kernel: [824872.704132] ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Dec 25 18:04:16 zeus kernel: [824872.704134] res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout) Dec 25 18:04:16 zeus kernel: [824872.704138] ata7.00: status: { DRDY } Dec 25 18:04:16 zeus kernel: [824872.704146] ata7: hard resetting link Dec 25 18:04:16 zeus kernel: [824873.011302] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 25 18:04:16 zeus kernel: [824873.013643] ata7.00: configured for UDMA/133 Dec 25 18:04:16 zeus kernel: [824873.013661] ata7: EH complete ==============================================================
Still happening on 2.6.35.10-74, though frequency seems to be less. # uname -a Linux zeus 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux Dec 29 13:04:10 zeus kernel: [128122.720420] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 29 13:04:10 zeus kernel: [128122.720427] ata7.00: failed command: SMART Dec 29 13:04:10 zeus kernel: [128122.720434] ata7.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Dec 29 13:04:10 zeus kernel: [128122.720436] res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) Dec 29 13:04:10 zeus kernel: [128122.720440] ata7.00: status: { DRDY } Dec 29 13:04:10 zeus kernel: [128122.720448] ata7: hard resetting link Dec 29 13:04:11 zeus kernel: [128123.025427] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 29 13:04:11 zeus kernel: [128123.027849] ata7.00: configured for UDMA/133 Dec 29 13:04:11 zeus kernel: [128123.027880] ata7: EH complete
I was getting "failed command: IDENTIFY DEVICE" and "FLUSH CACHE" in Ubuntu 10.04, but after having installed Fedore Core 14, the problem doesn't happen anymore. Here is my Fedora dmesg: http://pastie.org/1534327 More details at my StackExchange question: http://askubuntu.com/questions/16608/how-do-you-fix-failed-command-identify-device-showing-up-in-dmesg
Neil -- interesting. You seem to be using an older Fedora 14 kernel -- can you try with the latest one? Because I'm still having this happen at least once/day with 2.6.35.10-74.fc14.x86_64, even with smart turned off.
Are you currently using any special kernel parameters for F14? can you upload, for a normal boot (with no special parameters / booting the live-cd may be enough to see the error, at least for me) # lspci -vvv At any point did you try the kernel parameter pcie_aspm=off (if true, did it work?, and in that case also upload a lspci -vvv)
So far in five days with kernel 2.6.35.11-83.fc14.x86_64, I haven't had this exception, though I am still running without smartd and hddtemp. Will continue to monitor and if the exception does not occur again for a couple of weeks, I'll take the next step and re-enable smartd and hddtemp.
@Raman Gupta: Unfortunately I use Ubuntu, and have installed over Fedora Core 14 with Ubuntu 10.10 Maverick, but I don't get any "failed command:" strings in my dmesg anymore. The Kernel Ubuntu 10.10 uses is: 2.6.35-25-generic @Reartes Guillermo: I'll try booting off the livecd and get the output of `lspci -vvv` for you soon.
Created attachment 481078 [details] lspci -vvv as of 02/25/2011, kernel 2.6.35.11-83.fc14.x86_64 Update on kernel 2.6.35.11-83.fc14.x86_64. The problem happened again. In my case I have no special kernel parameters except the one necessary to use the AHCI driver with my Marvell controller card: ahci.marvell_enable=1. The drives do not work at all without this setting. I have attached the results of lspci -vvv. Note there are two SATA controllers in this machine -- the one that gives this problem is: 03:00.0 SATA controller: Marvell Technology Group Ltd. 88SE6145 SATA II PCI-E controller (rev a1) (prog-if 8f)
Same problem here on: Linux frontend-0 2.6.35.11-83.fc14.x86_64 #1 SMP Mon Feb 7 07:06:44 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux [57188.721348] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [57188.721497] ata6.00: failed command: READ DMA EXT [57188.721639] ata6.00: cmd 25/00:28:cf:44:15/00:01:09:00:00/e0 tag 0 dma 151552 in [57188.721641] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [57188.722189] ata6.00: status: { DRDY } [57188.722349] ata6: hard resetting link [57189.178055] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [57189.184355] ata6.00: configured for UDMA/100 [57189.184497] ata6.00: device reported invalid CHS sector 0 [57189.184643] ata6: EH complete [57432.817442] ata6.00: limiting speed to UDMA/33:PIO4 [57432.817585] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [57432.817724] ata6.00: failed command: READ DMA EXT [57432.817863] ata6.00: cmd 25/00:88:b7:a4:7c/00:01:0b:00:00/e0 tag 0 dma 200704 in [57432.817865] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [57432.818416] ata6.00: status: { DRDY } [57432.818579] ata6: hard resetting link [57433.275055] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [57433.281354] ata6.00: configured for UDMA/33 [57433.281498] ata6.00: device reported invalid CHS sector 0 [57433.281643] ata6: EH complete [57714.729296] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [57714.729442] ata6.00: failed command: READ DMA EXT [57714.729584] ata6.00: cmd 25/00:b0:3f:70:35/00:03:0e:00:00/e0 tag 0 dma 483328 in [57714.729586] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [57714.730127] ata6.00: status: { DRDY } [57714.730279] ata6: hard resetting link [57715.186058] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [57715.192356] ata6.00: configured for UDMA/33 [57715.192499] ata6.00: device reported invalid CHS sector 0 [57715.192648] ata6: EH complete [57778.721359] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [57778.721508] ata6.00: failed command: READ DMA EXT [57778.721652] ata6.00: cmd 25/00:00:df:0f:9a/00:04:0e:00:00/e0 tag 0 dma 524288 in [57778.721655] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [57778.722199] ata6.00: status: { DRDY } [57778.722357] ata6: hard resetting link [57779.178057] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [57779.184356] ata6.00: configured for UDMA/33 [57779.184501] ata6.00: device reported invalid CHS sector 0 [57779.184654] ata6: EH complete [57820.721362] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [57820.721510] ata6.00: failed command: READ DMA EXT [57820.721653] ata6.00: cmd 25/00:f8:47:f4:af/00:02:0e:00:00/e0 tag 0 dma 389120 in [57820.721655] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [57820.722201] ata6.00: status: { DRDY } [57820.722365] ata6: hard resetting link
I have the same problem with rhel5 and vanilla kernel 2.6.37.3 (I need the new kernel to switch marvell to AHCI). It is asus hummingbird motherboard with two controllers: 00:1f.2 SATA controller: Intel Corporation N10/ICH7 Family SATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0]) no problem with this one. 04:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b2) (prog-if 8f) this one causes the problems.
ata4.00: exception Emask 0x10 SAct 0x1 SErr 0x90200 action 0xe frozen [ 169.470937] ata4.00: irq_stat 0x00400000, PHY RDY changed [ 169.473085] ata4: SError: { Persist PHYRdyChg 10B8B } [ 169.475236] ata4.00: failed command: READ FPDMA QUEUED [ 169.477392] ata4.00: cmd 60/08:00:00:b0:b3/00:00:2b:00:00/40 tag 0 ncq 4096 in [ 169.477394] res 40/00:00:00:b0:b3/00:00:2b:00:00/40 Emask 0x10 (ATA bus error) [ 169.481670] ata4.00: status: { DRDY } [ 169.483793] ata4: hard resetting link [ 173.423038] ata4: softreset failed (device not ready) x86-64 Fedora 14, 15beta(livecd); 00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA (prog-if 01 [AHCI 1.0]) Subsystem: Micro-Star International Co., Ltd. Device 7329 Flags: bus master, 66MHz, medium devsel, latency 96, IRQ 22 I/O ports at b000 [size=8] I/O ports at a000 [size=4] I/O ports at 9000 [size=8] I/O ports at 8000 [size=4] I/O ports at 7000 [size=16] Memory at fe8ff800 (32-bit, non-prefetchable) [size=1K] Capabilities: <access denied> Kernel driver in use: ahci Is it a) controller out of order b) Mainboard issue c) Power issue
Did anybody ever tried booting with the kernel parameter: pcie_aspm=off ? For me it worked (nvidia), despite that i dont know really why. (by accident?) I noticed that it changes pcie/pci config stuff not related to pcie_aspm in my case at least (of course, it ALSO disables pcie_aspm functionality) I noticed it comparing different distros/kernels with and without pcie_aspm. Lspci -vvv showed that in my case, F13 changed link speed and others. But the parameter solved my problems, at least empirically. I now run F15 with litle or no issues, i still have F13 installed to perform test if desired.
Addendum: For the SB600 controller, the right thing to do is to restrict all drives to 1.5Gbps by jumpers or with a boot option. http://forums.opensuse.org/english/get-technical-help-here/hardware/444228-delay-libata-when-connecting-esata.html @Guillermo, PCIE_ASPM is also set by bios. Possibly it is mis-configured. I will check my system
I am still having this problem on kernel 2.6.35.13-91.fc14.x86_64. > Did anybody ever tried booting with the kernel parameter: pcie_aspm=off ? I tried this and it did not fix the problem. > Addendum: For the SB600 controller, the right thing to do is to restrict all drives to 1.5Gbps by jumpers or with a boot option. I also tried this -- but with this setting all drives attached to my Marvell controller could not even be started by the kernel -- permanent "failed to IDENTIFY" errors.
I have the same problem. Yesterday, i update my fedora from 14 to 15 and got the same messages in dmesg: [ 1400.032052] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 1400.032064] ata2.00: failed command: FLUSH CACHE EXT [ 1400.032080] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 [ 1400.032083] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [ 1400.032090] ata2.00: status: { DRDY } [ 1400.032102] ata2: hard resetting link [ 1400.337036] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 1400.346998] ata2.00: configured for UDMA/133 [ 1400.347032] ata2.00: retrying FLUSH 0xea Emask 0x4 [ 1400.347094] ata2.00: device reported invalid CHS sector 0 [ 1400.347110] ata2: EH complete I tried all the recipes of the above and nothing works.
Still happening on Fedora 14 kernel 2.6.35.14-95.fc14.x86_64: Aug 29 18:55:37 zeus kernel: [27239.712064] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Aug 29 18:55:37 zeus kernel: [27239.712072] ata8.00: failed command: SMART Aug 29 18:55:37 zeus kernel: [27239.712080] ata8.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in Aug 29 18:55:37 zeus kernel: [27239.712082] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Aug 29 18:55:37 zeus kernel: [27239.712086] ata8.00: status: { DRDY } Aug 29 18:55:37 zeus kernel: [27239.712094] ata8: hard resetting link Aug 29 18:55:37 zeus kernel: [27240.017054] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Aug 29 18:55:37 zeus kernel: [27240.019373] ata8.00: configured for UDMA/133 Aug 29 18:55:37 zeus kernel: [27240.019395] ata8: EH complete
Seeing this on Fedora 15 at 2.6.40.6-0.fc15.x86_64 on MSI mb with AMD SB850 controller and ST31000528AS Rev: CC38 There are 4 drives in this machine, but this only appears to happen on this particular one. [82172.704050] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [82172.704109] ata2.00: failed command: WRITE DMA EXT [82172.704166] ata2.00: cmd 35/00:00:00:3c:36/00:04:0f:00:00/e0 tag 0 dma 524288 out [82172.704167] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [82172.704383] ata2.00: status: { DRDY } [82172.704444] ata2: hard resetting link [82182.704036] ata2: softreset failed (1st FIS failed) [82182.704100] ata2: hard resetting link [82186.428033] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [82186.430445] ata2.00: configured for UDMA/33 [82186.430509] ata2.00: device reported invalid CHS sector 0 [82186.430573] ata2: EH complete [82433.712040] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [82433.712099] ata2.00: failed command: WRITE DMA EXT [82433.713594] ata2.00: cmd 35/00:00:00:84:3c/00:04:0f:00:00/e0 tag 0 dma 524288 out [82433.713595] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [82433.713811] ata2.00: status: { DRDY } [82433.713867] ata2: hard resetting link [82440.118034] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310) [82440.331449] ata2.00: configured for UDMA/33 [82440.331509] ata2.00: device reported invalid CHS sector 0 [82440.331568] ata2: EH complete
Correction: Last item was using SB710 controller, but similar symptoms using add-on SiI 3512 adapter
Replaced Seagate ST31000528AS with WDC WD20EARS, same symptom. Also replaced cable. This is definitely not a hardware problem, but it only appears to happen with recent vintage drives 1TB or greater. Smaller drives on this same system are not affected.
Still happening on 2.6.41.10-3.fc15.x86_64 --- ASUS P8H67-M PRO/CSM LGA 1155 Corporate Stable Model Intel H67 DDR3 1333 Western Digital WD20EARS --- Feb 11 13:05:46 <hostname> kernel: [167513.519922] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 frozen Feb 11 13:05:46 <hostname> kernel: [167513.519925] ata1.01: failed command: IDENTIFY DEVICE Feb 11 13:05:46 <hostname> kernel: [167513.519929] ata1.01: cmd ec/00:01:00:00:00/00:00:00:00:00/10 tag 0 pio 512 in Feb 11 13:05:46 <hostname> kernel: [167513.519930] res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout) Feb 11 13:05:46 <hostname> kernel: [167513.519932] ata1.01: status: { DRDY } Feb 11 13:05:46 <hostname> kernel: [167513.519938] ata1.00: hard resetting link Feb 11 13:05:46 <hostname> kernel: [167513.824625] ata1.01: hard resetting link Feb 11 13:05:52 <hostname> kernel: [167519.325108] ata1.00: link is slow to respond, please be patient (ready=0) Feb 11 13:05:56 <hostname> kernel: [167523.554630] ata1.00: SRST failed (errno=-16) Feb 11 13:05:56 <hostname> kernel: [167523.554636] ata1.00: hard resetting link Feb 11 13:05:56 <hostname> kernel: [167523.859387] ata1.01: hard resetting link Feb 11 13:06:02 <hostname> kernel: [167529.359866] ata1.00: link is slow to respond, please be patient (ready=0) Feb 11 13:06:06 <hostname> kernel: [167533.589398] ata1.00: SRST failed (errno=-16) Feb 11 13:06:06 <hostname> kernel: [167533.589405] ata1.00: hard resetting link Feb 11 13:06:06 <hostname> kernel: [167533.894151] ata1.01: hard resetting link Feb 11 13:06:12 <hostname> kernel: [167539.394636] ata1.00: link is slow to respond, please be patient (ready=0) Feb 11 13:06:41 <hostname> kernel: [167568.593678] ata1.00: SRST failed (errno=-16) Feb 11 13:06:41 <hostname> kernel: [167568.593684] ata1.00: limiting SATA link speed to 1.5 Gbps Feb 11 13:06:41 <hostname> kernel: [167568.593688] ata1.01: limiting SATA link speed to 1.5 Gbps Feb 11 13:06:41 <hostname> kernel: [167568.593693] ata1.00: hard resetting link Feb 11 13:06:41 <hostname> kernel: [167568.898431] ata1.01: hard resetting link Feb 11 13:06:46 <hostname> kernel: [167573.634563] ata1.00: SRST failed (errno=-16) Feb 11 13:06:46 <hostname> kernel: [167573.645857] ata1.00: reset failed, giving up Feb 11 13:06:46 <hostname> kernel: [167573.645859] ata1.00: disabled Feb 11 13:06:46 <hostname> kernel: [167573.645860] ata1.01: disabled Feb 11 13:06:46 <hostname> kernel: [167573.645870] ata1: EH complete Feb 11 13:06:46 <hostname> ata_id[23723]: HDIO_GET_IDENTITY failed for '/dev/sr0' --- Feb 11 18:31:58 <hostname> smartd[839]: Device: /dev/sda [SAT], SLEEP mode ignored due to reached limit of skipped checks (10 checks skipped) Feb 11 18:31:58 <hostname> smartd[839]: Device: /dev/sda [SAT], not capable of SMART self-check Feb 11 18:31:58 <hostname> smartd[839]: Sending warning via /usr/libexec/smartmontools/smartdnotify to root ... Feb 11 18:31:58 <hostname> smartd[839]: Warning via /usr/libexec/smartmontools/smartdnotify to root: successful WARNING: Your hard drive is failing Device: /dev/sda [SAT], not capable of SMART self-check Feb 11 18:31:58 <hostname> smartd[839]: Device: /dev/sda [SAT], failed to read SMART Attribute Data Feb 11 18:31:58 <hostname> smartd[839]: Sending warning via /usr/libexec/smartmontools/smartdnotify to root ... Feb 11 18:31:58 <hostname> smartd[839]: Warning via /usr/libexec/smartmontools/smartdnotify to root: successful WARNING: Your hard drive is failing Device: /dev/sda [SAT], failed to read SMART Attribute Data --- Message from syslogd@<hostname> at Feb 12 02:35:04 ... kernel:[216031.865936] journal commit I/O error Message from syslogd@<hostname> at Feb 12 02:35:04 ... kernel:[216031.965940] journal commit I/O error Message from syslogd@<hostname> at Feb 12 02:35:04 ... kernel:[216032.065959] journal commit I/O error Message from syslogd@<hostname> at Feb 12 02:35:04 ... kernel:[216032.165924] journal commit I/O error Message from syslogd@<hostname> at Feb 12 02:35:05 ... kernel:[216032.265933] journal commit I/O error --- 2012-02-12 02:35:05.081 TFW, Error: safe_write(): File I/O errcnt: 1 eno: Read-only file system (30) errno: 30 2012-02-12 02:35:05.111 TFW: safe_write(): funky usleep . . . 2012-02-12 07:18:33.386 TFW: safe_write(): funky usleep 2012-02-12 07:18:33.418 TFW, Error: safe_write(): File I/O errcnt: 3 eno: Read-only file system (30) errno: 30
Replaced failed Seagate ST4200320AS with WDC WD5000AAKX-0, and problem appeared on the new drive. This is still happening as of kernel 3.2.7-1.fc16.x86_64. My original report of this was over four months ago, and there has not been any update despite others reporting identical issues up through Fedora 15. It seems to be related to recent drives, as older models do not have the problem. I am successfully running the large drives previously reported on USB adapters since they do not work directly connected on SATA, however this new drive is part of a RAID1 pair, so this is not practical. Can we get any idea whether or not this is being looked at? This is all recent hardware, and nothing exotic. Any assistance would be appreciated.
Created attachment 571505 [details] dmesg after successful boot with errors Kubuntu 11.10, ia64.
Created attachment 571506 [details] lspci -vvv after successful boot with errors Kubuntu 11.10, ia64.
Bug's occurring for me in Kubuntu 11.10, kernel 3.0.0-16-generic. I didn't have any problems for some months and have checked the drive and replaced the cable and all appears well. Sometimes it has no problem at all, but usually I have to power-cycle (not reset) my PC multiple times for it to boot. I'm attaching lspci -vvv and dmesg.
[Raman, please bump version to 16 per comment #43] I'm seeing a similar exception here reading from an optical drive (k3b verify phase, so heavy i/o): [140064.704061] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [140064.704075] sr 2:0:0:0: [sr0] CDB: Read(10): 28 00 00 1e 08 00 00 00 01 00 [140064.704104] ata3.00: cmd a0/01:00:00:00:08/00:00:00:00:00/a0 tag 0 dma 2048 in [140064.704107] res 40/00:03:00:00:fc/00:00:00:00:00/a0 Emask 0x4 (timeout) [140064.704115] ata3.00: status: { DRDY } [140064.704125] ata3: hard resetting link [140065.164062] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [140065.165171] ata3.00: configured for UDMA/100 [140065.170683] ata3: EH complete on: 3.3.0-4.fc16.x86_64 00:11.0 SATA controller: Advanced Micro Devices [AMD] Hudson SATA Controller [AHCI mode] (rev 40) I'm really not clear on what software tools we have available to isolate these bugs among linux, chipset, bios, and drive firmware bugs. Suggestions appreciated.
Seagate replaced original 500GB drive with ST3500320AS. Original drive had no problem until it failed. New drive fails out of the box on Linux using SATA interface, but works perfectly plugged into USB adapter. Common denominator appears to be that these are SATA-III (6GB/s) drives running on SATA-II (3GB/s) controllers. Slowing SATA-II drives to 1.5GB/s via jumpers corrected this on older drives, but SATA-III drives can only be slowed to 3GB/s via jumper and experience this error consistently. As I said, I've tested this using both on-board and add-on SATA controllers, and with both Seagate and WD drives, and the error is consistent. Latest versions lock up so bad it takes a power-off to make the drive visible again, but tests show the drive is fine. System is F16 at latest kernel. Willing to re-test any combination of hardware I have available if it will help.
Is this related to the closed bug (not for me) Bug 577936. I have a similar configuration like in 577936, and I get more or less the same errors - READ FPDMA QUEUED and exception emask. Hardware: Asus p6t Asus U3S6 sata3/usb3 card (uses the marvel 88se9123 sata3 controller) Crucial SSD C300 (128GB)
We have seen this issue as well since upgrading to RedHat6 from Suse10. ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x6 frozen ata3: SError: { Handshk } ata3.00: failed command: WRITE DMA EXT ata3.00: cmd 35/00:02:53:50:2b/00:00:1d:00:00/e0 tag 0 dma 1024 out res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) ata3.00: status: { DRDY } ata3: hard resetting link ata3: nv: skipping hardreset on occupied port We had tried lots things. But what fix the issue was setting acpi=off at the kernel line. Some observations from 130 node cluster... 1. The issue would occur after the nodes sat idle. 2. 1TB seagate drive didn't have the issue and only 750GB seagate drives did. 3. acpi=off always fixed the issue. So it would seem that this is related to acpi. It turned out that the 1TB drives didn't support APM_level setting and the 750GB did. With acpi=off the kernel sets the 750GB drives APM_level to 254, which is the highest. With acpi=on the kernel set APM_level to 60. I think whats happening is the sata_nv driver is not dealing with a drive in a power save mode and marks the drive as frozen. Or its not able to wake up the drive.... You can disable APM_level by hdparm -B 255 /dev/sda I wrote a udev rule to do this KERNEL=="sd*", BUS=="scsi", SYSFS{vendor}=="ATA", PROGRAM=="/sbin/hdparm -B 255 /dev/%k"
(In reply to comment #50) > But what fix the issue was setting acpi=off at the kernel line. Some > observations from 130 node cluster... You may be seeing a different issue. As per "Additional info:" section of the bug description, setting acpi=off does NOT fix the issue reported.
I've now seen this 2 times in the last month. Not often, but worrisome. My setup is 2 Seagate 1.5T drives (ST31500341AS) a sata cdrom and a Seagate 500G (ST3500830AS). The 1.5T drives are using mdraid in RAID1 and the 500G is stand-alone. When this happens it drops the bus that has one of the 1.5T drives and the 500G drive and I have to reboot to get them to respond again. None of these drives support the hdparm -B option so I can't tell if it is a powerdown problem or not. It doesn't happen often enough for me to have any luck trying different kernel options. The motherboard is a Gigabyte Z68P-DS3 and the SATA chipset is: Intel Corporation 6 Series/C200 Series Chipset Family 4 port SATA IDE Controller (rev 05) (prog-if 8f [Master SecP SecO PriP PriO]) Here's the relevant part of /var/log/messages: Jul 10 21:18:16 lister kernel: [136925.562492] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Jul 10 21:18:16 lister kernel: [136925.562497] ata2.00: failed command: FLUSH CACHE EXT Jul 10 21:18:16 lister kernel: [136925.562511] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 Jul 10 21:18:16 lister kernel: [136925.562512] res 40/00:01:00:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout) Jul 10 21:18:16 lister kernel: [136925.562514] ata2.00: status: { DRDY } Jul 10 21:18:16 lister kernel: [136925.562521] ata2.00: hard resetting link Jul 10 21:18:16 lister kernel: [136925.867157] ata2.01: hard resetting link Jul 10 21:18:22 lister kernel: [136931.366661] ata2.00: link is slow to respond, please be patient (ready=0) Jul 10 21:18:26 lister kernel: [136935.595344] ata2.00: SRST failed (errno=-16) Jul 10 21:18:26 lister kernel: [136935.595351] ata2.00: hard resetting link Jul 10 21:18:26 lister kernel: [136935.900017] ata2.01: hard resetting link Jul 10 21:18:32 lister kernel: [136941.399485] ata2.00: link is slow to respond, please be patient (ready=0) Jul 10 21:18:36 lister kernel: [136945.628177] ata2.00: SRST failed (errno=-16) Jul 10 21:18:36 lister kernel: [136945.628186] ata2.00: hard resetting link Jul 10 21:18:36 lister kernel: [136945.932937] ata2.01: hard resetting link Jul 10 21:18:42 lister kernel: [136951.432347] ata2.00: link is slow to respond, please be patient (ready=0) Jul 10 21:19:11 lister kernel: [136980.625846] ata2.00: SRST failed (errno=-16) Jul 10 21:19:11 lister kernel: [136980.625850] ata2.00: limiting SATA link speed to 1.5 Gbps Jul 10 21:19:11 lister kernel: [136980.625853] ata2.01: limiting SATA link speed to 1.5 Gbps Jul 10 21:19:11 lister kernel: [136980.625857] ata2.00: hard resetting link Jul 10 21:19:11 lister kernel: [136980.930475] ata2.01: hard resetting link Jul 10 21:19:16 lister kernel: [136985.665730] ata2.00: SRST failed (errno=-16) Jul 10 21:19:16 lister kernel: [136985.676674] ata2.00: reset failed, giving up Jul 10 21:19:16 lister kernel: [136985.676677] ata2.00: disabled Jul 10 21:19:16 lister kernel: [136985.676680] ata2.01: disabled Jul 10 21:19:16 lister kernel: [136985.676683] ata2.00: device reported invalid CHS sector 0 Jul 10 21:19:16 lister kernel: [136985.676689] ata2: EH complete
Forgot to add: kernel is 3.4.4-4.fc16.x86_64
OP here... I replaced the machine that was producing this error. My new machines have significantly newer and different hardware, and do not demonstrate this problem. Therefore, I will not be able to confirm resolution of this (doesn't seem like it ever will be closed, or even worked on, anyway).
Started getting these after attaching (unformatted) external WDC WD3200AUDX-7 drive via eSATA. Initial connection: ata6: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen ata6: irq_stat 0x00400040, connection status changed ata6: SError: { PHYRdyChg CommWake DevExch } ata6: hard resetting link ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata6.00: ATA-8: WDC WD3200AUDX-73MK3Y0, 01.01A01, max UDMA/133 ata6.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32), AA ata6.00: configured for UDMA/133 ata6: EH complete scsi 5:0:0:0: Direct-Access ATA WDC WD3200AUDX-7 01.0 PQ: 0 ANSI: 5 sd 5:0:0:0: [sdd] 625142448 512-byte logical blocks: (320 GB/298 GiB) sd 5:0:0:0: [sdd] 4096-byte physical blocks sd 5:0:0:0: [sdd] Write Protect is off sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00 sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 5:0:0:0: Attached scsi generic sg4 type 0 sdd: unknown partition table sd 5:0:0:0: [sdd] Attached SCSI disk Thereafter these repeat, looks like every 30 mins (which would coincide w/ smartd config): ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen ata6.00: failed command: IDENTIFY DEVICE ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) ata6.00: status: { DRDY } ata6: hard resetting link ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) ata6.00: configured for UDMA/133 ata6: EH complete smartd is running and configured to skip check in sleep and standby.
There are 17 people (not counting me nor the original reporter). Brief description: @Raman Gupta * ORIGINAL REPORTER * F12, then F14 Intel N10/ICH7 + Marvell Technology Group Ltd. 88SE6145 + SEAGATE * exeptions on drives attached to marvell controller (usually smart commands) hard reset, timeouts, revalidation failed, i/o errors, invalid sector 0 * +2 Years have passed. * NO LONGER HAS THE SYSTEM TO REPRODUCE THE ISSUE. (comment #54) * If the issue is fixed, since the original system is gone, it cannot be verified. And he is right, the bug-report (as it is now) appears stalled. @djones256 * F12 Intel 82801JB ICH10 + WDC * exeptions on 2-port controller SRST failed -16, link is slow to respond, hard reset link, port disabled, i/o error, invalid sector 0 @Gaetan Cambier * F12 unkown controller(s) * exeptions failed command, hard reset, invalid sector 0 FIXED = libata.force=noncq @Andreas M. Kirchwitz * F13, then F14 Nvidia nForce 430, GeForce 6150, sata_nv driver Samsung hardisk no logs, but claims a raid1 is forced to resync or worse. @Michael Hughes * F13 F13 cannot be installed on Mac Mini 2010 with corsair SSD. no logs, but claims his logs are the same to this bug-report, but does not define any comment# number, so it is unkown. @Neil Hooey * Has failed to identify and flush cache errors on ubuntu 10.04. F14 does not have them. Later ubuntu fixed them. @majianpeng * Con comment #13, contains a patch. I don't know what happened to that patch. @Marc Eberhard * F14 exceptions, hard resets, link speed, invalid sector 0 controller unknown @Tomas Kouba * RHEL5 Intel Corporation N10/ICH7 + Marvell Technology Group Ltd. 88SE6121 No logs, but the issues are with the marvell controller. @A. Mani * F15 beta * controller: ATI Technologies Inc SB600 exeptions, Persist PHYRdyChg 10B8B , ata bus error, hard reset, soft reset failed, device not ready. He posted in comment #35: Addendum: For the SB600 controller, the right thing to do is to restrict all drives to 1.5Gbps by jumpers or with a boot option. (so, in that case, one might want to force to 1.5Gbps ONLY the SB600 sata ports) @Nikita Staroverov * F14, F15 exeptions, failed command, hard reset, link speed, retrying flush, invalid sector 0, unknown controller @Ken Hall * F15 controller SB750 + seagate exeptions, failed command, hard reset, soft reset failed (1st fis failed), link speed, invalid sector 0 Similar simptoms on SiI 3512 adapter changed cable, hdd. @Mike Eddy * F15 ASUS P8H67-M PRO/CSM LGA 1155 Corporate Stable Model Intel H67 DDR3 1333 Western Digital WD20EARS exceptions, failed command, timeouts, hard resets, link is slow to respond, srst failed, cdrom fails to identify, problems with smart. @Bill McGonigle * F16 controller amd hudson exeptions, had reset reading optical disk @M8R-7fin56 Asus p6t Asus U3S6 sata3/usb3 card (uses the marvel 88se9123 sata3 controller) Crucial SSD C300 (128GB) exeptions, points to another bugreport comment #49. @mahmoud hanafi * RHEL6 controller: sata_nv??? exeptions, failed command, timout, hard reset, skipping hardreset on occupied port He uses: acpi=off @Brian C. Lane * F16 Controller: Intel Corporation 6 Series/C200 Series Chipset Family 4 port SATA IDE Controller exeptions, failed command, hard reset, link slow to respond, srst failed -16, limiting sata link speed, reset failed, giving up, ata port disabled. @Aaron Hamid unknown controller exeptions, HYRdyChg CommWake DevExch, hard reset, timeout, maybe others. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After reading (briefly) all comments, i do feel some of them are (or were) exeperiencing a different issue.
I recommend to close this bug-report. (too many different hw and slighly different error messages. There are several people in this bug-report that have slighly different issues (or at least different messages). Even different hardware (controllers) are involved. It has become very difficult to read and track which one did what unless one reads all again. I propose this template, for opening new bug-reports (see next comment), but only if the following criteria is met: * The issue you have does happen or is reproductible in F16 or F17 (or F18-rawhide) If the issue is still there, see next comment. Once the bug-report is opened, please let the maintainer mark them as duplicate.
Proposed template for new bug-reports: Issue description: Describe the issue you have. On which controller On which disks On which controller NEVER happened. When does it happen, what whas the system doing? Idly, heavy io, unknown. Put a snipped (partial errro message) on the first comment. For example: Dec 23 01:20:36 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Dec 23 01:20:36 zeus kernel: ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in Dec 23 01:20:36 zeus kernel: res 40/00:00:30:f8:64/00:00:17:00:00/e0 Emask 0x4 (timeout) Dec 23 01:20:36 zeus kernel: ata6.00: status: { DRDY } Dec 23 01:20:36 zeus kernel: ata6: hard resetting link Dec 23 01:20:36 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 23 01:20:36 zeus kernel: ata6.00: configured for UDMA/133 Dec 23 01:20:36 zeus kernel: ata6: EH complete Information of the affected system: * Distribution: for example: F16 32-BIT * Fist, create a directory to store the output of the next commands: # mkdir ~/buginfo # cd ~/buginfo Al commands will be assumed to be executed from that directory. * PCIe / PCI HW Information: # dmesg > dmesg-`uname -r`.out # lspci -nnvv > lspci-nnvv-`uname -r`.out # lspci -k > lspci-k-`uname -r`.out # lspci -tvn > lspci-tvn-`uname -r`.out * BIOS / MANUFACTURER Information: # cat /sys/devices/virtual/dmi/id/product_name > dmi-product_name-`uname -r`.out # cat /sys/devices/virtual/dmi/id/bios_version > dmi-bios_version-`uname -r`.out # cat /sys/devices/virtual/dmi/id/board_name > dmi-board_name-`uname -r`.out * A copy of the messages file AFTER reproducing the error: # cp /var/log/messages messages-`uname -r`--`date +%Y%m%d`.out Upload those commands and fill correctly the description, let it be clear what is being attached. Also, AFTER reproducing the issue with any current fedora version, please try also with systemrescuecd (with both default and alternate kernel). Also execute above commands and upload them after uploading fedora ones. This is to check a different kernel. Also reference this bug-report number.
And then we get a bunch of very similar bug reports that all wind up getting marked duplicates of each other? I think what we need here is a way to instrument the problems to find the root cause. Without that, what good are multiple reports going to be? With that, we could find actual duplicates and differences.
@Bill McGonigle It is not a problem that get are marked as duplicated, that is how it is supposed to be. As long as that is managed by the maintainer or assignee. Also, some controllers do use different drivers (ahci, sata_nv, sata_sil), etc. IMHO, each bug-report should contain all duplicated bug-reports with the same hardware, until proven otherwise (by the maintainer or assignee). There could be chipset issues, bus issues, or something completely non-obvious. By using one bug-report per sata controller, the tracking of the issue will be simplified. For example if a kernel parameter fixes 'marvell', it will documented and will be confirmed by all dupplicates in the bugreport. Also, if the issue is very subtle, puting all different hardware together will probabbly hide it. Also, in each bugreport there will be uploaded info (such as what i recommended in #58) all corresponding to the same controller on different systems and with errors and logs of all. I thing that you will get better results if you (and all others) open a new bug-report, setting version to the affected supported version, and upload recommended info #58 plus whatever additional iformation avaiable. The bug-report has +2 years, i do not believe that it can be resolved as it is. Reading the whole bug is difficult, subtle issues might get overlooked, too many factors, too many different hardware & software (drivers). That is why i sugest to start from scratch, if the issue is still there in any current version. Cheers.
This problem can't be too subtle if it's being reported by 17 different people across multiple controllers and multiple disk devices (brands and models). The only common denominator I can find to this is that it appears to happen most when using SATA-III devices on SATA-II controllers. I haven't seen any suggestion of attempt by developer to collect doc on this, just numerous users reporting essentially the same issue over and over. I have no problem opening a new issue for this, but it would be nice to have some idea that it will be addressed. This has been in "assigned" status for 2+ years. For me the problem is completely reproducable, and I have multiple sets of hardware I can test on. I have purchased a SATA-III add-on controller and will attempt to test SATA-III disks on that. I am perfectly willing to provide whatever doc is requested, but since I originally posted my problem report NO ONE has asked for any further information. Incidentally, it's still happening on Fedora 16 at the current kernel level (kernel-3.4.7-1.fc16.x86_64), I have to run my Bacula data stores on USB to keep them working and I had to replace the boot disk with a PATA drive.
I'm reluctant to post and openly display my ignorance, but I have a small piece of information to add to this thread. I'm working on a mini-ITX (ASROCK H67M-ITX) appliance that uses a single 60GB SSD drive running Linux based on 3.4.9-2.fc16.i686 (Fedora 16). We have migrated 6 (of 60) clients to this new platform. Some of our users are experiencing "crashes" when they insert a USB device. Upon examination of the incident, one or more filesystems were remounted read-only. The issue was not immediately reproducible - intermittent in nature. I suspected static electricity discharge - and I was right. Using a simple $1.29 "thumb saver" lighter (electric-spark inducing trigger), I was able to reproduce the "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen" error, and the filesystem(s) are remounted read-only. Is the high voltage spark disrupting general power, knocking the SDD offline, or resetting the storage controller? I don't know. As far as I can tell, the chassis/case/motherboard/PSU are all grounded - there is continuity to ground, and my ground tests OK. This same situation happens with a H77M-ITX motherboard - Intel Series 6 (H67) vs. Series 7 (H77). What would cure this for me? Again, forgive my ignorance... I would like an easy way to set the I/O queueing to retry indefinitely so as to avoid remounting in read-only when the drive becomes available again. Is this possible? Tomorrow I will disable ACPI, PIC, setting a single-path multipath device (this option has no_path_retry=queue, for never stop queueing), or even the ASROCK built-in hardware RAID-1. I have to fix this static discharge issue because my product is a mission-critical consumer electronics product. Thoughts?
Two SDDs have been used with identical results. Corsair Force Series 3 CSSD-F60GB3A-BK 2.5" 60GB SATA III Mushkin Enhanced Chronos MKNSSDCR60GB 2.5" 60GB SATA III Asynchronous MLC Case in use: http://www.mini-box.com/M300-LCD-Enclosure
I too am having problems like this. - Corsair Force Serices CSSD-F115GB2-A/RF2 (refurbished) - attached to an Acer Aspire ASE380--ED424A. This desktop is a few years old. it has an Athlon 62 X2 CPU, with an nVidia chipset. - the SATA controller is (according to lspci) nVidia Corporation MCP61 SATA Controller (rev a2) - I'm using x86_64 Fedora 16 + current updates (as of about 2012 Oct 1). Kernel is 3.4.2-1.fc16.x86_64. - the problem shows up in the Debian Wheezy Beta2 Net Install AMD64 too -- it's dmesg has complaints and the installer decides that there is no SSD disk. - none of my problems involve heavy loads. Symptoms: - booting (from another disk) is slow. I think that Fedora looks at the SSD drive and gets hung until timeouts occur. - dmesg shows several errors accessing the SSD (just snooping: fstab doesn't specify that any partitions on the disk be mounted) - smarctl -a is a dead loss: smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.4.2-1.fc16.x86_64] (local build) Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net Probable ATA device behind a SAT layer Try an additional '-d ata' or '-d sat' argument. I tried booting with pcie_aspm=off (as per comment #34). It made no difference. I've used another SSD on this same SATA port with no problem. Juicy messages from dmesg: [ 2.194035] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 2.197200] ata5.00: ATA-8: Corsair CSSD-F115GB2-A, 2.1a, max UDMA/133 [ 2.197203] ata5.00: 224674128 sectors, multi 16: LBA48 NCQ (depth 0/32) [ 5.875487] ata6: SATA link down (SStatus 0 SControl 300) [ 36.736213] ata5: drained 512 bytes to clear DRQ [ 36.736230] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 36.736241] ata5.00: failed command: IDENTIFY DEVICE [ 36.736255] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in [ 36.736258] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 36.736264] ata5.00: status: { DRDY } [ 36.736276] ata5: hard resetting link [ 36.736281] ata5: nv: skipping hardreset on occupied port [ 37.190064] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 37.231148] ata5.00: configured for UDMA/133 [ 37.231185] ata5: EH complete [ 82.784403] ata5: drained 512 bytes to clear DRQ [ 82.784421] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [ 82.784431] ata5.00: failed command: IDENTIFY DEVICE [ 82.784445] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in [ 82.784448] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) [ 82.784455] ata5.00: status: { DRDY } [ 82.784466] ata5: hard resetting link [ 82.784471] ata5: nv: skipping hardreset on occupied port [ 83.238053] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 83.263154] ata5.00: configured for UDMA/133 [ 83.263194] ata5: EH complete etc. These log entries are during booting. The timeouts associated with these bursts explains why booting seems slow.
Here's another report that seems to be related. Corsair Force 3 SSD, laptop with SiS968 Southbridge (including SATA2 controller), Debian Testing and Ubuntu 11.10. Problem with "IDENTIFY DEVICE" command. <http://forum.corsair.com/forums/showthread.php?t=100231>
# Mass update to all open bugs. Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel, and let us know if your problem has been fixed. In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).
Hi all, I'm actually running Debian Linux but stumbled across this thread while looking for information on the exact same issue. Until today I was experiencing these symptoms with Kernel 3.2.0. I've now upgraded to 3.6.3 and still see the same errors during boot and heavy load (such as RAID rebuilds). fleegle:~# uname -r 3.6.3 fleegle:~# lspci 00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333] 00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP] 00:05.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10) 00:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx] 00:09.0 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50) 00:09.1 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50) 00:09.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 51) 00:0d.0 SATA controller: Silicon Image, Inc. SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01) 00:0e.0 SATA controller: Silicon Image, Inc. SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01) 00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10) 00:10.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10) 00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge 00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06) 00:11.2 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) 00:11.3 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23) 01:00.0 VGA compatible controller: NVIDIA Corporation NV11 [GeForce2 MX/MX 400] (rev b2) lshw -class storage *-storage:0 description: SATA controller product: SiI 3512 [SATALink/SATARaid] Serial ATA Controller vendor: Silicon Image, Inc. physical id: d bus info: pci@0000:00:0d.0 version: 01 width: 32 bits clock: 66MHz capabilities: storage pm vendor_specific bus_master cap_list rom configuration: driver=sata_sil latency=32 resources: irq:16 ioport:b800(size=8) ioport:b400(size=4) ioport:b000(size=8) ioport:a800(size=4) ioport:a400(size=16) memory:ec000000-ec0001ff memory:60000000-6007ffff *-storage:1 description: SATA controller product: SiI 3512 [SATALink/SATARaid] Serial ATA Controller vendor: Silicon Image, Inc. physical id: e bus info: pci@0000:00:0e.0 version: 01 width: 32 bits clock: 66MHz capabilities: storage pm vendor_specific bus_master cap_list rom configuration: driver=sata_sil latency=32 resources: irq:17 ioport:a000(size=8) ioport:9800(size=4) ioport:9400(size=8) ioport:9000(size=4) ioport:8800(size=16) memory:eb800000-eb8001ff memory:60080000-600fffff lshw -class disk *-disk description: ATA Disk product: Hitachi HDT72503 vendor: Hitachi physical id: 0.0.0 bus info: scsi@0:0.0.0 logical name: /dev/sda version: V54O serial: VFD200R22BMRUC size: 298GiB (320GB) capabilities: partitioned partitioned:dos configuration: ansiversion=5 sectorsize=512 signature=e1e1e1e1 *-disk description: ATA Disk product: Hitachi HDS72101 vendor: Hitachi physical id: 0.0.0 bus info: scsi@1:0.0.0 logical name: /dev/sdb version: JP4O serial: JP2940HQ3VBSWH size: 931GiB (1TB) capabilities: partitioned partitioned:dos configuration: ansiversion=5 sectorsize=512 signature=c57e7aef *-disk description: ATA Disk product: Hitachi HDS72101 vendor: Hitachi physical id: 0.0.0 bus info: scsi@3:0.0.0 logical name: /dev/sdc version: JP4O serial: JP2940HQ3UY93H size: 931GiB (1TB) capabilities: partitioned partitioned:dos configuration: ansiversion=5 sectorsize=512 signature=df63f9c5 fleegle:~# dmesg | less (edited for sanity) ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 ata4.00: BMDMA2 stat 0x6d0009 ata4.00: failed command: READ DMA ata4.00: cmd c8/00:a8:48:42:a2/00:00:00:00:00/e2 tag 0 dma 86016 in res 51/04:79:77:42:a2/00:00:00:00:00/02 Emask 0x1 (device error) ata4.00: status: { DRDY ERR } ata4.00: error: { ABRT } ata4.00: configured for UDMA/100 sd 3:0:0:0: [sdc] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE sd 3:0:0:0: [sdc] Sense Key : Aborted Command [current] [descriptor] Descriptor sense data with sense descriptors (in hex): 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 02 a2 42 77 sd 3:0:0:0: [sdc] Add. Sense: No additional sense information sd 3:0:0:0: [sdc] CDB: Read(10): 28 00 02 a2 42 48 00 00 a8 00 end_request: I/O error, dev sdc, sector 44188232 ata4: EH complete Please note that although the error is still occurring, the impact in 3.6.3 seems greatly reduced. Previously the system would hang for approx 10 seconds at each error. Now the impact is undetectable and only the log messages reveal it's still happening.
(In reply to comment #64) > I too am having problems like this. > > - Corsair Force Serices CSSD-F115GB2-A/RF2 (refurbished) > > - attached to an Acer Aspire ASE380--ED424A. This desktop is a few years > old. it has an Athlon 62 X2 CPU, with an nVidia chipset. > > - the SATA controller is (according to lspci) nVidia Corporation MCP61 SATA > Controller (rev a2) > > - I'm using x86_64 Fedora 16 + current updates (as of about 2012 Oct 1). > Kernel is 3.4.2-1.fc16.x86_64. I've just installed the same actual drive in a ThinkPad x61t and am having no trouble installing Fedora 17 on it. It sure looks as if the driver+SATA controller on the Acer does not get along with the Corsair F115GB2. I don't know which is at fault. Additional datapoint: I tried a live PC-BSD 9.0 DVD on the Acer hardware and it could not see the F115GB2 drive. I assume that the FreeBSD driver is different from the Linux driver and so this point away from a driver bug.
Hello, I was able to solve this issue on my configuration so I hope this will help somebody. Recently I have migrated my CentOS 6.3 server from Intel based motherboard Gigabyte GA-P55-UD3L (rev. 1.0) to AMD based motherboard MSI 790FX-GD70. The only change in the system was the motherboard, cpu and memory, the rest of the configuration stayed the same. Immediately after migration I have started experiencing this exact issue During boot I was getting errors Nov 17 23:41:12 storage kernel: ata6.00: qc timeout (cmd 0xec) Nov 17 23:41:12 storage kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x5) Nov 17 23:41:12 storage kernel: ata6.00: revalidation failed (errno=-5) Nov 17 23:41:12 storage kernel: ata6: hard resetting link Nov 17 23:41:13 storage kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Nov 17 23:41:13 storage kernel: ata6.00: configured for UDMA/133 Nov 17 23:41:13 storage kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Nov 17 23:41:13 storage kernel: sd 5:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor] Nov 17 23:41:13 storage kernel: Descriptor sense data with sense descriptors (in hex): Nov 17 23:41:13 storage kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 Nov 17 23:41:13 storage kernel: 91 40 16 4e Nov 17 23:41:13 storage kernel: sd 5:0:0:0: [sdb] Add. Sense: No additional sense information Nov 17 23:41:13 storage kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 91 40 1b 4f 00 00 c0 00 Nov 17 23:41:13 storage kernel: ata6: EH complete N After boot this error transformed to Nov 17 21:05:26 storage kernel: ata6.00: qc timeout (cmd 0xec) Nov 17 21:05:26 storage kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4) Nov 17 21:05:26 storage kernel: ata6.00: revalidation failed (errno=-5) Nov 17 21:05:26 storage kernel: ata6: hard resetting link Nov 17 21:05:27 storage kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Nov 17 21:05:27 storage kernel: ata6.00: configured for UDMA/33 Nov 17 21:05:27 storage kernel: ata6: EH complete Nov 17 21:06:26 storage kernel: ata6.00: exception Emask 0x50 SAct 0x1 SErr 0x280900 action 0x6 frozen Nov 17 21:06:26 storage kernel: ata6.00: irq_stat 0x08000000, interface fatal error Nov 17 21:06:26 storage kernel: ata6: SError: { UnrecovData HostInt 10B8B BadCRC } Nov 17 21:06:26 storage kernel: ata6.00: failed command: READ FPDMA QUEUED Nov 17 21:06:26 storage kernel: ata6.00: cmd 60/30:00:3f:9b:cc/00:00:8e:00:00/40 tag 0 ncq 24576 in Nov 17 21:06:26 storage kernel: res 40/00:00:3f:9b:cc/00:00:8e:00:00/40 Emask 0x50 (ATA bus error) Nov 17 21:06:26 storage kernel: ata6.00: status: { DRDY } Nov 17 21:06:26 storage kernel: ata6: hard resetting link Nov 17 21:06:31 storage kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310) In case it matters, these are the disks used in the system Nov 17 23:53:58 storage kernel: ata6.00: ATA-8: ST31500341AS, SD1B, max UDMA/133 Nov 17 23:53:58 storage kernel: ata5.00: ATA-8: WDC WD20EADS-00S2B0, 01.00A01, max UDMA/133 Nov 17 23:53:58 storage kernel: ata7.00: ATA-6: ST380011A, 3.16, max UDMA/100 Nov 17 23:53:58 storage kernel: ata7.01: ATA-6: ST380011A, 3.16, max UDMA/100 I was able to completely eliminate this issue by 1. disabling NCQ 2. limiting the speed of SATA to 1.5GB I have done so by modifying my /boot/grub/grub.conf and adding libata.force=noncq,1.5G at the end of my kernel line. This bug also seems to be related or is duplicate of Bug 684599.
I was able to resolve this issue in my case entirely by accident. I noticed that during testing of drives, the problem did not happen, but once they were installed, the problem returned. Removing the drives from the case and running them externally eliminated the issue. It turned out that the drives were too close to a large cooling fan in the front of the case. Moving them away for the fan (more than 3 inches) eliminated the problem completely. Not implying this is the entire solution to this, but it might be worth checking for electrical interference with the disks, as it seems the new ones are more sensitive to environmental factors.
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
As the OP, I'm taking the liberty of closing this. As per comment #54 I can't confirm a fix for this as the hardware on which I reported this is long gone. If you are still experiencing this or a similar problem, I'd recommend creating a new bug report. And good luck 'cause you're gonna need it -- there seems to be many many reasons for this error msg, and not a lot of info from the kernel is available to distinguish them. Or buy new hardware instead and be happy :)