Bug 549981 - Periodic exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen from sata drive
Summary: Periodic exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen from sata drive
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-12-23 06:34 UTC by Raman Gupta
Modified: 2013-01-17 13:30 UTC (History)
32 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-01-17 02:05:20 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Kernel log via dmesg (43.27 KB, text/x-log)
2009-12-23 06:34 UTC, Raman Gupta
no flags Details
lspci -vv output (22.93 KB, text/x-log)
2009-12-23 06:35 UTC, Raman Gupta
no flags Details
lspci -vvv as of 02/25/2011, kernel 2.6.35.11-83.fc14.x86_64 (28.38 KB, text/plain)
2011-02-25 20:40 UTC, Raman Gupta
no flags Details
dmesg after successful boot with errors (81.27 KB, text/plain)
2012-03-20 19:04 UTC, kevinfishburne
no flags Details
lspci -vvv after successful boot with errors (20.11 KB, text/plain)
2012-03-20 19:05 UTC, kevinfishburne
no flags Details

Description Raman Gupta 2009-12-23 06:34:25 UTC
Created attachment 379978 [details]
Kernel log via dmesg

Periodically, I get the following exception from one of the three SATA drives I have attached to an onboard Marvell 88SE6145 SATA II PCI-E controller. The kernel is configured to use AHCI for the Marvell controller via the ahci.marvell_enable=1 kernel parameter. The error does not occur for drives on the primary ICH7 Intel controller (also using AHCI). The error also does not occur for the same drive swapped onto the ICH7 controller.

Dec 23 00:19:37 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 23 00:19:37 zeus kernel: ata6.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 23 00:19:37 zeus kernel:         res 40/00:00:00:22:e2/00:00:00:00:00/e0 Emask 0x4 (timeout)
Dec 23 00:19:37 zeus kernel: ata6.00: status: { DRDY }
Dec 23 00:19:37 zeus kernel: ata6: hard resetting link
Dec 23 00:19:37 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 23 00:19:37 zeus smartd[1989]: Device: /dev/sdd, Read SMART Self Test Log Failed
Dec 23 00:19:37 zeus kernel: ata6.00: configured for UDMA/133
Dec 23 00:19:37 zeus kernel: ata6: EH complete

Note that the exception often occurs when doing smart commands. Normally smart commands work fine but every so often there is a pause or failure in the smart command executed, with a corresponding entry in /var/log/messages similar to the above. For example, here is the result of doing smartctl -a /dev/sdd several times until the problem was replicated:

Dec 23 01:20:36 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 23 01:20:36 zeus kernel: ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Dec 23 01:20:36 zeus kernel:         res 40/00:00:30:f8:64/00:00:17:00:00/e0 Emask 0x4 (timeout)
Dec 23 01:20:36 zeus kernel: ata6.00: status: { DRDY }
Dec 23 01:20:36 zeus kernel: ata6: hard resetting link
Dec 23 01:20:36 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 23 01:20:36 zeus kernel: ata6.00: configured for UDMA/133
Dec 23 01:20:36 zeus kernel: ata6: EH complete


Version-Release number of selected component (if applicable):

$ rpm -q kernel
kernel-2.6.31.6-166.fc12@x86_64


How reproducible:

Reproducible quite easily by running smartctl -a <drive> repeatedly until the problem occurs.


Steps to Reproduce:

1. Attach drive to Marvell secondary controller.
2. Run smartctl -a /dev/sdx repeatedly until there is a pause. 
3. Check message.log for the error.
  

Additional info:

I have tried with and without acpi=off. Problem occurs either way.

I will attach the output of lspci -vv and a full kernel log via dmesg.

Comment 1 Raman Gupta 2009-12-23 06:35:41 UTC
Created attachment 379980 [details]
lspci -vv output

Comment 2 Raman Gupta 2009-12-23 06:49:04 UTC
Oh, and BTW NCQ for the drives on the secondary Marvell controller is disabled -- not by choice -- it seems NCQ is not supported by this controller in this configuration:

# cat /sys/block/sd[def]/device/queue_depth
1
1
1

# echo 31 > /sys/block/sdd/device/queue_depth
-bash: echo: write error: Invalid argument

The drives on the Intel ICH7 controller (sda, sdb, and sdc) all have NCQ enabled and are working fine:

# cat /sys/block/sd[abc]/device/queue_depth
31
31
31

Comment 3 Raman Gupta 2009-12-23 22:43:26 UTC
Bug #46225 seems to be related, with lots of people seeing the same type of error. However, that bug was fixed in 2.6.27.15. The only difference is that I am using a Marvell controller via the ahci driver instead of sata_mv.

I'm just shooting in the dark, but perhaps the same race condition(s) fixed by Mark Lord in the sata_mv driver should also be applied to the ahci driver?

Comment 4 Raman Gupta 2009-12-23 22:50:20 UTC
(In reply to comment #3)
> Bug #46225 seems to be related

Sorry for the mailstorm, the related bug is #462425.

Comment 5 Raman Gupta 2009-12-25 17:39:07 UTC
While up to now the error seemed to be benign i.e. it did not cause any problems with my RAID-5 array or with filesystem read/write operations, since the disks were mostly idle, I decided to do an overnight disk test with "fio surface-scan" to make sure all was well before putting real data on the drives. About two-fifths of the way through the test fio crapped out with a i/o error:

# fio /usr/share/doc/fio-1.34/examples/surface-scan
write-phase: (g=0): rw=write, bs=64K-64K/64K-64K, ioengine=sync, iodepth=1
verify-phase: (g=1): rw=read, bs=64K-64K/64K-64K, ioengine=sync, iodepth=1
Starting 2 threads
write-phase: Laying out IO file(s) (1 file(s) / 17592186044415MB)
fio: pid=25313, err=30/file:engines/sync.c:56, func=xfer, error=Read-only file system
fio: pid=25313, err=5/file:engines/sync.c:56, func=xfer, error=Input/output error

Here are the corresponding kernel logs -- you can see that all the drives on the marvell sata controller (sdd, sde, and sdf) appear to have entered a state where smartd cannot even access them:

==============================================================
Dec 25 03:24:00 zeus kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 25 03:24:00 zeus kernel: ata8.00: cmd 35/00:80:81:81:cc/00:00:1d:00:00/e0 tag 0 dma 65536 out
Dec 25 03:24:00 zeus kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 25 03:24:00 zeus kernel: ata8.00: status: { DRDY }
Dec 25 03:24:00 zeus kernel: ata8: hard resetting link
Dec 25 03:24:00 zeus kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 03:24:05 zeus kernel: ata8.00: qc timeout (cmd 0xec)
Dec 25 03:24:05 zeus kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:24:05 zeus kernel: ata8.00: revalidation failed (errno=-5)
Dec 25 03:24:05 zeus kernel: ata8: hard resetting link
Dec 25 03:24:06 zeus kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 03:24:16 zeus kernel: ata8.00: qc timeout (cmd 0xec)
Dec 25 03:24:16 zeus kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:24:16 zeus kernel: ata8.00: revalidation failed (errno=-5)
Dec 25 03:24:16 zeus kernel: ata8: limiting SATA link speed to 1.5 Gbps
Dec 25 03:24:16 zeus kernel: ata8: hard resetting link
Dec 25 03:24:17 zeus kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 25 03:24:45 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 25 03:24:45 zeus kernel: ata6.00: cmd e5/00:00:00:00:00/00:00:00:00:00/00 tag 0
Dec 25 03:24:45 zeus kernel:         res 40/00:00:80:81:cc/00:00:1d:00:00/e0 Emask 0x4 (timeout)
Dec 25 03:24:45 zeus kernel: ata6.00: status: { DRDY }
Dec 25 03:24:45 zeus kernel: ata6: hard resetting link
Dec 25 03:24:46 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 03:24:47 zeus kernel: ata8.00: qc timeout (cmd 0xec)
Dec 25 03:24:47 zeus kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:24:47 zeus kernel: ata8.00: revalidation failed (errno=-5)
Dec 25 03:24:47 zeus kernel: ata8.00: disabled
Dec 25 03:24:47 zeus kernel: ata8.00: device reported invalid CHS sector 0
Dec 25 03:24:47 zeus kernel: ata8: hard resetting link
Dec 25 03:24:48 zeus kernel: ata8: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 25 03:24:48 zeus kernel: ata8: EH complete
Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Unhandled error code
Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 25 03:24:48 zeus kernel: end_request: I/O error, dev sdf, sector 499941761
Dec 25 03:24:48 zeus kernel: raid5: Disk failure on sdf1, disabling device.
Dec 25 03:24:48 zeus kernel: raid5: Operation continuing on 3 devices.
Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Unhandled error code
Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 25 03:24:48 zeus kernel: end_request: I/O error, dev sdf, sector 488055273
Dec 25 03:24:48 zeus kernel: raid5:md1: read error not correctable (sector 488055272 on sdf1).
Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Unhandled error code
Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 25 03:24:48 zeus kernel: end_request: I/O error, dev sdf, sector 488055281
Dec 25 03:24:48 zeus kernel: raid5:md1: read error not correctable (sector 488055280 on sdf1).
Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Unhandled error code
Dec 25 03:24:48 zeus kernel: sd 7:0:0:0: [sdf] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 25 03:24:48 zeus kernel: end_request: I/O error, dev sdf, sector 488055169
Dec 25 03:24:51 zeus kernel: ata6.00: qc timeout (cmd 0xec)
Dec 25 03:24:51 zeus kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:24:51 zeus kernel: ata6.00: revalidation failed (errno=-5)
Dec 25 03:24:51 zeus kernel: ata6: hard resetting link
Dec 25 03:24:52 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 03:25:02 zeus kernel: ata6.00: qc timeout (cmd 0xec)
Dec 25 03:25:03 zeus kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:25:03 zeus kernel: ata6.00: revalidation failed (errno=-5)
Dec 25 03:25:03 zeus kernel: ata6: limiting SATA link speed to 1.5 Gbps
Dec 25 03:25:03 zeus kernel: ata6: hard resetting link
Dec 25 03:25:03 zeus kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 25 03:25:19 zeus kernel: ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 25 03:25:19 zeus kernel: ata7.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 25 03:25:19 zeus kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 25 03:25:19 zeus kernel: ata7.00: status: { DRDY }
Dec 25 03:25:19 zeus kernel: ata7: hard resetting link
Dec 25 03:25:20 zeus kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 03:25:25 zeus kernel: ata7.00: qc timeout (cmd 0xec)
Dec 25 03:25:25 zeus kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:25:25 zeus kernel: ata7.00: revalidation failed (errno=-5)
Dec 25 03:25:25 zeus kernel: ata7: hard resetting link
Dec 25 03:25:26 zeus kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 03:25:33 zeus kernel: ata6.00: qc timeout (cmd 0xec)
Dec 25 03:25:34 zeus kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:25:34 zeus kernel: ata6.00: revalidation failed (errno=-5)
Dec 25 03:25:34 zeus kernel: ata6.00: disabled
Dec 25 03:25:34 zeus kernel: ata6: hard resetting link
Dec 25 03:25:35 zeus kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 25 03:25:35 zeus kernel: ata6: EH complete
Dec 25 03:25:35 zeus kernel: sd 5:0:0:0: [sdd] Unhandled error code
Dec 25 03:25:35 zeus kernel: sd 5:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 25 03:25:35 zeus kernel: end_request: I/O error, dev sdd, sector 976767873
Dec 25 03:25:35 zeus kernel: end_request: I/O error, dev sdd, sector 976767873
Dec 25 03:25:35 zeus kernel: md: super_written gets error=-5, uptodate=0
Dec 25 03:25:35 zeus kernel: raid5: Disk failure on sdd1, disabling device.
Dec 25 03:25:35 zeus kernel: raid5: Operation continuing on 2 devices.
Dec 25 03:25:36 zeus kernel: ata7.00: qc timeout (cmd 0xec)
Dec 25 03:25:37 zeus kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:25:37 zeus kernel: ata7.00: revalidation failed (errno=-5)
Dec 25 03:25:37 zeus kernel: ata7: limiting SATA link speed to 1.5 Gbps
Dec 25 03:25:37 zeus kernel: ata7: hard resetting link
Dec 25 03:25:37 zeus kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 25 03:26:07 zeus kernel: ata7.00: qc timeout (cmd 0xec)
Dec 25 03:26:08 zeus kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 25 03:26:08 zeus kernel: ata7.00: revalidation failed (errno=-5)
Dec 25 03:26:08 zeus kernel: ata7.00: disabled
Dec 25 03:26:08 zeus kernel: ata7.00: device reported invalid CHS sector 0
Dec 25 03:26:08 zeus kernel: ata7: hard resetting link
Dec 25 03:26:09 zeus kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 25 03:26:09 zeus kernel: ata7: EH complete
Dec 25 03:26:09 zeus kernel: sd 6:0:0:0: [sde] Unhandled error code
Dec 25 03:26:09 zeus kernel: sd 6:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 25 03:26:09 zeus kernel: end_request: I/O error, dev sde, sector 976767873
Dec 25 03:26:09 zeus kernel: end_request: I/O error, dev sde, sector 976767873
Dec 25 03:26:09 zeus kernel: md: super_written gets error=-5, uptodate=0
Dec 25 03:26:09 zeus kernel: raid5: Disk failure on sde1, disabling device.
Dec 25 03:26:09 zeus kernel: raid5: Operation continuing on 1 devices.
Dec 25 03:26:09 zeus kernel: RAID5 conf printout:
Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1
Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1
Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1
Dec 25 03:26:09 zeus kernel: disk 2, o:0, dev:sde1
Dec 25 03:26:09 zeus kernel: disk 3, o:0, dev:sdf1
Dec 25 03:26:09 zeus kernel: RAID5 conf printout:
Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1
Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1
Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1
Dec 25 03:26:09 zeus kernel: disk 2, o:0, dev:sde1
Dec 25 03:26:09 zeus kernel: RAID5 conf printout:
Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1
Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1
Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1
Dec 25 03:26:09 zeus kernel: disk 2, o:0, dev:sde1
Dec 25 03:26:09 zeus kernel: RAID5 conf printout:
Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1
Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1
Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1
Dec 25 03:26:09 zeus kernel: RAID5 conf printout:
Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1
Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1
Dec 25 03:26:09 zeus kernel: disk 1, o:0, dev:sdd1
Dec 25 03:26:09 zeus kernel: RAID5 conf printout:
Dec 25 03:26:09 zeus kernel: --- rd:4 wd:1
Dec 25 03:26:09 zeus kernel: disk 0, o:1, dev:sdc1
Dec 25 03:26:09 zeus kernel: Buffer I/O error on device dm-2, logical block 187170824
Dec 25 03:26:09 zeus kernel: lost page write due to I/O error on dm-2
Dec 25 03:26:09 zeus kernel: Aborting journal on device dm-2:8.
Dec 25 03:26:09 zeus kernel: EXT4-fs error (device dm-2): ext4_journal_start_sb: Detected aborted journal
Dec 25 03:26:09 zeus kernel: EXT4-fs (dm-2): Remounting filesystem read-only
Dec 25 03:26:09 zeus kernel: EXT4-fs error (device dm-2) in ext4_reserve_inode_write: Journal has aborted
Dec 25 03:26:09 zeus kernel: EXT4-fs error (device dm-2) in ext4_orphan_del: Journal has aborted
Dec 25 03:26:09 zeus kernel: EXT4-fs error (device dm-2) in ext4_reserve_inode_write: Journal has aborted
Dec 25 03:26:09 zeus kernel: Buffer I/O error on device dm-2, logical block 183009280
Dec 25 03:26:09 zeus kernel: lost page write due to I/O error on dm-2
Dec 25 03:26:09 zeus kernel: JBD2: I/O error detected when updating journal superblock for dm-2:8.
Dec 25 03:37:18 zeus smartd[2048]: Device: /dev/sdd, not capable of SMART self-check
Dec 25 03:37:18 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:19 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:19 zeus smartd[2048]: Device: /dev/sdd, failed to read SMART Attribute Data
Dec 25 03:37:19 zeus smartd[2048]: Device: /dev/sdd, Read SMART Self Test Log Failed
Dec 25 03:37:19 zeus smartd[2048]: Device: /dev/sdd, Read SMART Error Log Failed
Dec 25 03:37:19 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:20 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:20 zeus smartd[2048]: Device: /dev/sde, not capable of SMART self-check
Dec 25 03:37:20 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:22 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:22 zeus smartd[2048]: Device: /dev/sde, failed to read SMART Attribute Data
Dec 25 03:37:22 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:23 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:23 zeus smartd[2048]: Device: /dev/sde, Read SMART Self Test Log Failed
Dec 25 03:37:23 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:24 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:24 zeus smartd[2048]: Device: /dev/sde, Read SMART Error Log Failed
Dec 25 03:37:24 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:25 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:25 zeus smartd[2048]: Device: /dev/sdf, not capable of SMART self-check
Dec 25 03:37:25 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:26 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:26 zeus smartd[2048]: Device: /dev/sdf, failed to read SMART Attribute Data
Dec 25 03:37:26 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:28 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:28 zeus smartd[2048]: Device: /dev/sdf, Read SMART Self Test Log Failed
Dec 25 03:37:28 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:29 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:29 zeus smartd[2048]: Device: /dev/sdf, Read SMART Error Log Failed
Dec 25 03:37:29 zeus smartd[2048]: Sending warning via mail to root ...
Dec 25 03:37:30 zeus smartd[2048]: Warning via mail to root: successful
Dec 25 03:37:30 zeus ata_id[2974]: HDIO_GET_IDENTITY failed for '/dev/sdf'
==============================================================

BTW, merry Christmas!

Comment 6 Raman Gupta 2009-12-28 05:09:03 UTC
(In reply to comment #5)
> While up to now the error seemed to be benign i.e. it did not cause any
> problems with my RAID-5 array or with filesystem read/write operations, since
> the disks were mostly idle, I decided to do an overnight disk test with "fio
> surface-scan" to make sure all was well before putting real data on the
> drives. About two-fifths of the way through the test fio crapped out with a
> i/o error:

A few more notes:

1) The system has smartd and hddtemp enabled, so I believe a periodic smart/hddtemp command caused the problem even during the fio surface-scan test, rather than it being a function of load.

2) I can replicate the primary problem as originally reported  (exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen) with no filesystem access to the disks at all (the RAID array is stopped) just by issuing smartctl -a /dev/sdd commands repeatedly. The problem usually occurs within about 10-15 executions.

3) I verified the problem does not appear to be based on load. With smartd and hddtemp disabled for sdd, sde, and sdf, I re-executed the fio test with no problems at all. So this *appears* to be exclusively an issue with issuing smart/hddtemp commands via the Marvell controller.

4) At the suggestion of Robert Hancock on the linux-ide list [1], I verified the power supply voltages while the disks were under load. Both the 5V and 12V outputs seemed to be correct and steady. Jives with #3 -- pure loading with no smart/hddtemp commands has up till now not triggered the exception.

[1] http://thread.gmane.org/gmane.linux.ide/39913/focus=39947

Comment 7 djones256 2010-01-02 19:01:19 UTC
I have also seen something similar three times in about three weeks (on 64bit Fedora 12). In my case it is fatal as the drives become unusable until the machine is powered down and back up (not just reset). This is with an Intel 82801JB ICH10 2-port SATA controller and Western Digital WD10EADS drives. It is on a Supermicro X8SAX motherboard which has 6 SATA ports. 4 on a 4-port controller and 2 on this 2-port controller. All three errors have been on the 2-port controller and have prevented all access to the two disks attached once the error has happened. Unfortunately I have taken kernel updates since I noted the error so I can't give a full report. Relevant output from /var/log/messages however was:

Dec 29 00:28:04 jr-f12 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 29 00:28:04 jr-f12 kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 29 00:28:04 jr-f12 kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 29 00:28:04 jr-f12 kernel: ata2.00: status: { DRDY }
Dec 29 00:28:04 jr-f12 kernel: ata2.00: hard resetting link
Dec 29 00:28:04 jr-f12 kernel: ata2.01: hard resetting link
Dec 29 00:28:10 jr-f12 kernel: ata2.00: link is slow to respond, please be patient (ready=0)
Dec 29 00:28:14 jr-f12 kernel: ata2.00: SRST failed (errno=-16)
Dec 29 00:28:14 jr-f12 kernel: ata2.00: hard resetting link
Dec 29 00:28:14 jr-f12 kernel: ata2.01: hard resetting link
Dec 29 00:28:20 jr-f12 kernel: ata2.00: link is slow to respond, please be patient (ready=0)
Dec 29 00:28:24 jr-f12 kernel: ata2.00: SRST failed (errno=-16)
Dec 29 00:28:24 jr-f12 kernel: ata2.00: hard resetting link
Dec 29 00:28:24 jr-f12 kernel: ata2.01: hard resetting link
Dec 29 00:28:30 jr-f12 kernel: ata2.00: link is slow to respond, please be patient (ready=0)
Dec 29 00:28:59 jr-f12 kernel: ata2.00: SRST failed (errno=-16)
Dec 29 00:28:59 jr-f12 kernel: ata2.00: limiting SATA link speed to 1.5 Gbps
Dec 29 00:28:59 jr-f12 kernel: ata2.01: limiting SATA link speed to 1.5 Gbps
Dec 29 00:28:59 jr-f12 kernel: ata2.00: hard resetting link
Dec 29 00:28:59 jr-f12 kernel: ata2.01: hard resetting link
Dec 29 00:29:04 jr-f12 kernel: ata2.00: SRST failed (errno=-16)
Dec 29 00:29:04 jr-f12 kernel: ata2.00: reset failed, giving up
Dec 29 00:29:04 jr-f12 kernel: ata2.00: disabled
Dec 29 00:29:04 jr-f12 kernel: ata2.01: disabled
Dec 29 00:29:04 jr-f12 kernel: ata2.00: device reported invalid CHS sector 0
Dec 29 00:29:04 jr-f12 kernel: end_request: I/O error, dev sdb, sector 1953524992
Dec 29 00:29:04 jr-f12 kernel: md: super_written gets error=-5, uptodate=0
Dec 29 00:29:04 jr-f12 kernel: raid5: Disk failure on sdb, disabling device.
Dec 29 00:29:04 jr-f12 kernel: raid5: Operation continuing on 3 devices.
Dec 29 00:29:04 jr-f12 kernel: ata2: EH complete
Dec 29 00:29:04 jr-f12 kernel: end_request: I/O error, dev sdc, sector 1953524992
Dec 29 00:29:04 jr-f12 kernel: md: super_written gets error=-5, uptodate=0
Dec 29 00:29:04 jr-f12 kernel: raid5: Disk failure on sdc, disabling device.
Dec 29 00:29:04 jr-f12 kernel: raid5: Operation continuing on 2 devices.

Comment 8 Raman Gupta 2010-01-12 01:11:13 UTC
This bug is possibly related (mentioned by Tim Small on the linux-ide list):

http://bugzilla.kernel.org/show_bug.cgi?id=14831

As I reported on the linux-ide list [1], when the script attached to that bug is run on my system against any of the drives attached to my Marvell controller, the script causes drive failure within one minute (with the cascading failure ending in failure to IDENTIFY reported previously in comment #5). I have only seen the cascading failure occur if the disks are being accessed at the same time as the exception occurs (as in Tim Small's test script).

I also note that when the cascading failure does occur, command EA - ATA_CMD_FLUSH_EXT - is one of the commands executing during the failure process (though not the first one, which tends on my system to be B0 - ATA_CMD_SMART, or EC - ATA_CMD_ID_ATA). EA is the same command shown in the exception reported in comment #7.

[1] http://thread.gmane.org/gmane.linux.ide/39913/focus=44247

Comment 9 Raman Gupta 2010-01-29 15:32:47 UTC
Problem still occurring with kernel 2.6.31.12-174.2.3.fc12.x86_64.

I have reduced the incidence of this problem by turning off smartd and hddtemp for the drives attached to the Marvel controller but occasionally still get an error. For example, this morning:

Jan 28 20:00:10 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 28 20:00:10 zeus kernel: ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Jan 28 20:00:10 zeus kernel:         res 40/00:00:0f:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
Jan 28 20:00:10 zeus kernel: ata6.00: status: { DRDY }
Jan 28 20:00:10 zeus kernel: ata6: hard resetting link
Jan 28 20:00:10 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 28 20:00:10 zeus kernel: ata6.00: configured for UDMA/133
Jan 28 20:00:10 zeus kernel: ata6: EH complete

Comment 10 Gaetan Cambier 2010-04-10 20:15:43 UTC
I have the same problem with 2.6.32.11-99.fc12.x86_64

Apr 10 20:48:43 Gaetan64 kernel: ata3.00: exception Emask 0x0 SAct 0x3ff SErr 0x0 action 0x6 frozen
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:00:08:42:8d/00:00:0f:00:00/40 tag 0 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:08:40:41:8d/00:00:0f:00:00/40 tag 1 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:10:50:41:8d/00:00:0f:00:00/40 tag 2 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:18:60:41:8d/00:00:0f:00:00/40 tag 3 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:20:78:41:8d/00:00:0f:00:00/40 tag 4 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:28:90:41:8d/00:00:0f:00:00/40 tag 5 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:30:a0:41:8d/00:00:0f:00:00/40 tag 6 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:38:c0:41:8d/00:00:0f:00:00/40 tag 7 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/08:40:18:42:8d/00:00:0f:00:00/40 tag 8 ncq 4096 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: failed command: READ FPDMA QUEUED
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: cmd 60/18:48:28:42:8d/00:00:0f:00:00/40 tag 9 ncq 12288 in
Apr 10 20:48:43 Gaetan64 kernel:         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: status: { DRDY }
Apr 10 20:48:43 Gaetan64 kernel: ata3: hard resetting link
Apr 10 20:48:43 Gaetan64 kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: configured for UDMA/133
Apr 10 20:48:43 Gaetan64 kernel: ata3.00: device reported invalid CHS sector 0

The smart report of all hdd are ok

Comment 11 Raman Gupta 2010-05-07 23:39:02 UTC
Problem still happening on 2.6.32.11-99.fc12.x86_64:

Apr 23 20:00:11 zeus kernel: ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 23 20:00:11 zeus kernel: ata8.00: failed command: IDENTIFY DEVICE
Apr 23 20:00:11 zeus kernel: ata8.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Apr 23 20:00:11 zeus kernel:         res 40/00:00:42:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
Apr 23 20:00:11 zeus kernel: ata8.00: status: { DRDY }
Apr 23 20:00:11 zeus kernel: ata8: hard resetting link
Apr 23 20:00:11 zeus kernel: ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 23 20:00:11 zeus kernel: ata8.00: configured for UDMA/133
Apr 23 20:00:11 zeus kernel: ata8: EH complete

Comment 12 Gaetan Cambier 2010-05-14 20:08:26 UTC
i have found an solution : 
add the option line to grub to disable ncq : libata.force=noncq

for me, with this, i have no froze

Comment 13 Raman Gupta 2010-05-14 20:35:13 UTC
On my system, none of the drives involved have NCQ enabled, since the controller does not support it. See Comment #2.

Comment 14 Andreas M. Kirchwitz 2010-08-26 02:54:35 UTC
Wow, glad that I found this bug.

Exactly the same happens here on Fedora 13 (not 12), 32 Bit, with all kernel versions/updates since F13 was released (currently 2.6.33.8-149.fc13.i686.PAE). 

But I don't have a Marvell controller but something from Nvidia (Asus M2NPV-VM, Nvidia nForce 430, GeForce 6150, sata_nv driver). There's nothing to configure in the BIOS because on these Nvidia chipsets, AHCI is done in the driver.

Kernel options like "libata.noacpi=1", "libata.force=noncq" or "sata_nv.adma=0" didn't help. I can trigger the error with "smartctl", too.

Unfortunately, this bug destroys my RAID-1 if there's disk access when it happens. And rebuilding 1 TB of data takes some hours. :-( One time it happened during boot procedure and basically killed the system (had to recover from CD).

My SATA drives are two harddisks of type "SAMSUNG HD103SJ".

The errors also show up in the SMART Error Log. They all seem to be related to commands like "SMART READ DATA", "IDENTIFY DEVICE" and "SMART READ ATTRIBUTE THRESHOLDS".

Never had this problem with PATA disks (same machine).

Comment 15 Michael Hughes 2010-08-27 19:37:27 UTC
This bug blocks my install of Fedora 13 64-bit on a Mac Mini (mid 2010) with a Corsair SSD. None of the "work-arounds" such as libata.force=noncq seem to fix the issue. I'd be happy to provide dmesg output, etc, if it would be of some use. However, it is identical to other comments in this bug.

Comment 16 majianpeng 2010-10-26 07:39:02 UTC
I think I can resolved this probelm.
I am using centos5.5-x64:kernel version is 2.6.18-194-x64.
I met this problem.
I modifed the code: driver/ata/ahci:
ahci_port_start:
[root@nfs-server linux-2.6.18-194.x86_64]# diff -Naur drivers/ata/ahci.c  ahci.c      
--- drivers/ata/ahci.c	2010-10-15 17:18:55.000000000 +0000
+++ ahci.c	2010-10-27 01:27:55.000000000 +0000
@@ -429,7 +429,7 @@
 	/* board_ahci_mv */
 	{
 		AHCI_HFLAGS	(AHCI_HFLAG_NO_NCQ | AHCI_HFLAG_NO_MSI |
-				 AHCI_HFLAG_MV_PATA),
+				 AHCI_HFLAG_MV_PATA | AHCI_HFLAG_NO_PMP),
 		.flags		= ATA_FLAG_SATA | ATA_FLAG_NO_LEGACY |
 				  ATA_FLAG_MMIO | ATA_FLAG_PIO_DMA,
 		.pio_mask	= 0x1f, /* pio0-4 */
@@ -2330,6 +2330,7 @@
 static int ahci_port_start(struct ata_port *ap)
 {
 	struct device *dev = ap->host->dev;
+	struct pci_dev *pdev = to_pci_dev(dev);
 	struct ahci_port_priv *pp;
 	void *mem;
 	dma_addr_t mem_dma;
@@ -2379,7 +2380,11 @@
 	 * Save off initial list of interrupts to be enabled.
 	 * This could be changed later
 	 */
-	pp->intr_mask = DEF_PORT_IRQ;
+	if(pdev->vendor==0x11ab  && pdev->device == 0x6145)//marvell 88se6145
+		pp->intr_mask = (DEF_PORT_IRQ) &  (~(PORT_IRQ_PIOS_FIS));
+	else
+		pp->intr_mask = DEF_PORT_IRQ;
 
I did not know why? At present,I readed the SATA spec, wanting to know the reason.
Someone can exam this path?If success,I can deliver this path,as my first path.
Thanks!

Comment 17 majianpeng 2010-10-30 00:56:05 UTC
I tested a week. The result is fine. PIO read PIO write and normal DMA operation hotplug also fine.
So I think this bug can closed.

Comment 18 Raman Gupta 2010-11-01 04:15:13 UTC
(In reply to comment #17)
> I tested a week. The result is fine. PIO read PIO write and normal DMA
> operation hotplug also fine.
> So I think this bug can closed.

Have you submitted this patch upstream? If so, could you provide a link to the upstream patch submission?

I don't think the bug here gets closed until the patch is confirmed as working and is available in the Fedora repositories.

Comment 19 Bug Zapper 2010-11-04 02:43:05 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 20 Bug Zapper 2010-12-04 01:19:26 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 21 Raman Gupta 2010-12-17 00:56:33 UTC
I upgraded to Fedora 14 yesterday and see that this problem is still happening. From my logs:

Dec 16 14:04:14 zeus kernel: [32870.720065] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 16 14:04:14 zeus kernel: [32870.720072] ata6.00: failed command: SMART
Dec 16 14:04:14 zeus kernel: [32870.720080] ata6.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 16 14:04:14 zeus kernel: [32870.720082]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 16 14:04:14 zeus kernel: [32870.720086] ata6.00: status: { DRDY }
Dec 16 14:04:14 zeus kernel: [32870.720093] ata6: hard resetting link
Dec 16 14:04:14 zeus kernel: [32871.025048] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 16 14:04:14 zeus kernel: [32871.027395] ata6.00: configured for UDMA/133
Dec 16 14:04:14 zeus kernel: [32871.027414] ata6: EH complete

# uname -a
Linux zeus 2.6.35.9-64.fc14.x86_64 #1 SMP Fri Dec 3 12:19:41 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 22 Andreas M. Kirchwitz 2010-12-25 19:14:16 UTC
Yes, I can second that. The SATA errors still occur with Fedora 14
(about every 1-2 days)

# uname -a
Linux lakai 2.6.35.10-72.fc14.i686 #1 SMP Mon Dec 20 22:05:49 UTC 2010 i686 i686 i386 GNU/Linux

(excerpt from /var/log/messages)
Dec 25 05:42:58 lakai kernel: [56212.373428] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Dec 25 05:42:58 lakai kernel: [56212.373436] ata1.00: failed command: SMART
Dec 25 05:42:58 lakai kernel: [56212.373443] ata1.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 25 05:42:58 lakai kernel: [56212.373445]          res 51/84:00:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
Dec 25 05:42:58 lakai kernel: [56212.373449] ata1.00: status: { DRDY ERR }
Dec 25 05:42:58 lakai kernel: [56212.373451] ata1.00: error: { ICRC ABRT }
Dec 25 05:42:58 lakai kernel: [56212.373458] ata1: hard resetting link
Dec 25 05:42:58 lakai kernel: [56212.373461] ata1: nv: skipping hardreset on occupied port
Dec 25 05:42:58 lakai kernel: [56212.827054] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 05:42:58 lakai kernel: [56212.846436] ata1.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80)
Dec 25 05:42:58 lakai kernel: [56212.846441] ata1.00: revalidation failed (errno=-5)
Dec 25 05:43:03 lakai kernel: [56217.827038] ata1: hard resetting link
Dec 25 05:43:03 lakai kernel: [56217.827042] ata1: nv: skipping hardreset on occupied port
Dec 25 05:43:04 lakai kernel: [56218.281040] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 05:43:04 lakai kernel: [56218.319324] ata1.00: configured for UDMA/133
Dec 25 05:43:04 lakai kernel: [56218.319358] ata1: EH complete

As said in Bug comment #14, my hardware setup is different, so the problem seems to be more general.

Comment 23 Raman Gupta 2010-12-26 00:24:43 UTC
This seems to be happening more frequently now, even with smartd and hddtemp turned off for these drives. The failed commands are either "SMART" or "IDENTIFY DEVICE". See below.

==============================================================
# grep -A 8 exception /var/log/messages
Dec 19 19:04:16 zeus kernel: [310072.720043] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 19 19:04:16 zeus kernel: [310072.720050] ata6.00: failed command: SMART
Dec 19 19:04:16 zeus kernel: [310072.720058] ata6.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 19 19:04:16 zeus kernel: [310072.720059]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 19 19:04:16 zeus kernel: [310072.720073] ata6.00: status: { DRDY }
Dec 19 19:04:16 zeus kernel: [310072.720078] ata6: hard resetting link
Dec 19 19:04:16 zeus kernel: [310073.025061] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 19 19:04:16 zeus kernel: [310073.027358] ata6.00: configured for UDMA/133
Dec 19 19:04:16 zeus kernel: [310073.027382] ata6: EH complete
--
Dec 20 16:04:16 zeus kernel: [385672.720212] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 20 16:04:16 zeus kernel: [385672.720219] ata6.00: failed command: SMART
Dec 20 16:04:16 zeus kernel: [385672.720227] ata6.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 20 16:04:16 zeus kernel: [385672.720228]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 20 16:04:16 zeus kernel: [385672.720232] ata6.00: status: { DRDY }
Dec 20 16:04:16 zeus kernel: [385672.720240] ata6: hard resetting link
Dec 20 16:04:16 zeus kernel: [385673.025074] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 20 16:04:16 zeus kernel: [385673.027455] ata6.00: configured for UDMA/133
Dec 20 16:04:16 zeus kernel: [385673.027471] ata6: EH complete
--
Dec 20 23:04:16 zeus kernel: [410872.705189] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 20 23:04:16 zeus kernel: [410872.705194] ata6.00: failed command: SMART
Dec 20 23:04:16 zeus kernel: [410872.705198] ata6.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 20 23:04:16 zeus kernel: [410872.705199]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 20 23:04:16 zeus kernel: [410872.705202] ata6.00: status: { DRDY }
Dec 20 23:04:16 zeus kernel: [410872.705207] ata6: hard resetting link
Dec 20 23:04:16 zeus kernel: [410873.010415] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 20 23:04:16 zeus kernel: [410873.012707] ata6.00: configured for UDMA/133
Dec 20 23:04:16 zeus kernel: [410873.012748] ata6: EH complete
--
Dec 22 23:04:16 zeus kernel: [583672.704056] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 22 23:04:16 zeus kernel: [583672.704064] ata7.00: failed command: SMART
Dec 22 23:04:16 zeus kernel: [583672.704072] ata7.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 22 23:04:16 zeus kernel: [583672.704073]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 22 23:04:16 zeus kernel: [583672.704078] ata7.00: status: { DRDY }
Dec 22 23:04:16 zeus kernel: [583672.704086] ata7: hard resetting link
Dec 22 23:04:16 zeus kernel: [583673.010041] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 22 23:04:16 zeus kernel: [583673.012419] ata7.00: configured for UDMA/133
Dec 22 23:04:16 zeus kernel: [583673.012437] ata7: EH complete
Dec 22 23:34:16 zeus kernel: [585472.704175] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 22 23:34:16 zeus kernel: [585472.704181] ata6.00: failed command: SMART
Dec 22 23:34:16 zeus kernel: [585472.704189] ata6.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 22 23:34:16 zeus kernel: [585472.704191]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 22 23:34:16 zeus kernel: [585472.704195] ata6.00: status: { DRDY }
Dec 22 23:34:16 zeus kernel: [585472.704203] ata6: hard resetting link
Dec 22 23:34:16 zeus kernel: [585473.009174] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 22 23:34:16 zeus kernel: [585473.011526] ata6.00: configured for UDMA/133
Dec 22 23:34:16 zeus kernel: [585473.011559] ata6: EH complete
--
Dec 24 05:04:16 zeus kernel: [691672.720041] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 24 05:04:16 zeus kernel: [691672.720047] ata6.00: failed command: IDENTIFY DEVICE
Dec 24 05:04:16 zeus kernel: [691672.720055] ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Dec 24 05:04:16 zeus kernel: [691672.720057]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 24 05:04:16 zeus kernel: [691672.720061] ata6.00: status: { DRDY }
Dec 24 05:04:16 zeus kernel: [691672.720069] ata6: hard resetting link
Dec 24 05:04:16 zeus kernel: [691673.025073] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 24 05:04:16 zeus kernel: [691673.027400] ata6.00: configured for UDMA/133
Dec 24 05:04:16 zeus kernel: [691673.027418] ata6: EH complete
--
Dec 24 05:34:16 zeus kernel: [693472.704175] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 24 05:34:16 zeus kernel: [693472.704181] ata6.00: failed command: IDENTIFY DEVICE
Dec 24 05:34:16 zeus kernel: [693472.704189] ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Dec 24 05:34:16 zeus kernel: [693472.704191]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 24 05:34:16 zeus kernel: [693472.704195] ata6.00: status: { DRDY }
Dec 24 05:34:16 zeus kernel: [693472.704203] ata6: hard resetting link
Dec 24 05:34:16 zeus kernel: [693473.009165] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 24 05:34:16 zeus kernel: [693473.011526] ata6.00: configured for UDMA/133
Dec 24 05:34:16 zeus kernel: [693473.011552] ata6: EH complete
--
Dec 24 15:34:16 zeus kernel: [729472.704061] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 24 15:34:16 zeus kernel: [729472.704068] ata6.00: failed command: IDENTIFY DEVICE
Dec 24 15:34:16 zeus kernel: [729472.704076] ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Dec 24 15:34:16 zeus kernel: [729472.704078]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 24 15:34:16 zeus kernel: [729472.704082] ata6.00: status: { DRDY }
Dec 24 15:34:16 zeus kernel: [729472.704090] ata6: hard resetting link
Dec 24 15:34:16 zeus kernel: [729473.010041] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 24 15:34:16 zeus kernel: [729473.012327] ata6.00: configured for UDMA/133
Dec 24 15:34:16 zeus kernel: [729473.012348] ata6: EH complete
--
Dec 25 08:04:16 zeus kernel: [788872.704075] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 25 08:04:16 zeus kernel: [788872.704082] ata7.00: failed command: SMART
Dec 25 08:04:16 zeus kernel: [788872.704090] ata7.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 25 08:04:16 zeus kernel: [788872.704092]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 25 08:04:16 zeus kernel: [788872.704096] ata7.00: status: { DRDY }
Dec 25 08:04:16 zeus kernel: [788872.704104] ata7: hard resetting link
Dec 25 08:04:16 zeus kernel: [788873.009301] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 08:04:16 zeus kernel: [788873.011659] ata7.00: configured for UDMA/133
Dec 25 08:04:16 zeus kernel: [788873.011676] ata7: EH complete
--
Dec 25 18:04:16 zeus kernel: [824872.704118] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 25 18:04:16 zeus kernel: [824872.704124] ata7.00: failed command: IDENTIFY DEVICE
Dec 25 18:04:16 zeus kernel: [824872.704132] ata7.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Dec 25 18:04:16 zeus kernel: [824872.704134]          res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 25 18:04:16 zeus kernel: [824872.704138] ata7.00: status: { DRDY }
Dec 25 18:04:16 zeus kernel: [824872.704146] ata7: hard resetting link
Dec 25 18:04:16 zeus kernel: [824873.011302] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 25 18:04:16 zeus kernel: [824873.013643] ata7.00: configured for UDMA/133
Dec 25 18:04:16 zeus kernel: [824873.013661] ata7: EH complete
==============================================================

Comment 24 Raman Gupta 2010-12-29 23:09:01 UTC
Still happening on 2.6.35.10-74, though frequency seems to be less.

# uname -a
Linux zeus 2.6.35.10-74.fc14.x86_64 #1 SMP Thu Dec 23 16:04:50 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Dec 29 13:04:10 zeus kernel: [128122.720420] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 29 13:04:10 zeus kernel: [128122.720427] ata7.00: failed command: SMART
Dec 29 13:04:10 zeus kernel: [128122.720434] ata7.00: cmd b0/d1:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Dec 29 13:04:10 zeus kernel: [128122.720436]          res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 29 13:04:10 zeus kernel: [128122.720440] ata7.00: status: { DRDY }
Dec 29 13:04:10 zeus kernel: [128122.720448] ata7: hard resetting link
Dec 29 13:04:11 zeus kernel: [128123.025427] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 29 13:04:11 zeus kernel: [128123.027849] ata7.00: configured for UDMA/133
Dec 29 13:04:11 zeus kernel: [128123.027880] ata7: EH complete

Comment 25 Neil Hooey 2011-02-06 17:34:58 UTC
I was getting "failed command: IDENTIFY DEVICE" and "FLUSH CACHE" in Ubuntu 10.04, but after having installed Fedore Core 14, the problem doesn't happen anymore.

Here is my Fedora dmesg:
http://pastie.org/1534327

More details at my StackExchange question:
http://askubuntu.com/questions/16608/how-do-you-fix-failed-command-identify-device-showing-up-in-dmesg

Comment 26 Raman Gupta 2011-02-06 17:57:55 UTC
Neil -- interesting. You seem to be using an older Fedora 14 kernel -- can you try with the latest one? Because I'm still having this happen at least once/day with 2.6.35.10-74.fc14.x86_64, even with smart turned off.

Comment 27 Reartes Guillermo 2011-02-12 20:00:43 UTC
Are you currently using any special kernel parameters for F14?

can you upload, for a normal boot (with no special parameters / booting the live-cd may be enough to see the error, at least for me)
 
# lspci -vvv

At any point did you try the kernel parameter pcie_aspm=off (if true, did it 
work?, and in that case also upload a lspci -vvv)

Comment 28 Raman Gupta 2011-02-19 17:54:03 UTC
So far in five days with kernel 2.6.35.11-83.fc14.x86_64, I haven't had this exception, though I am still running without smartd and hddtemp. Will continue to monitor and if the exception does not occur again for a couple of weeks, I'll take the next step and re-enable smartd and hddtemp.

Comment 29 Neil Hooey 2011-02-19 18:39:04 UTC
@Raman Gupta:
Unfortunately I use Ubuntu, and have installed over Fedora Core 14 with Ubuntu 10.10 Maverick, but I don't get any "failed command:" strings in my dmesg anymore. The Kernel Ubuntu 10.10 uses is: 2.6.35-25-generic

@Reartes Guillermo:
I'll try booting off the livecd and get the output of `lspci -vvv` for you soon.

Comment 30 Raman Gupta 2011-02-25 20:40:56 UTC
Created attachment 481078 [details]
lspci -vvv as of 02/25/2011, kernel 2.6.35.11-83.fc14.x86_64

Update on kernel 2.6.35.11-83.fc14.x86_64. The problem happened again.

In my case I have no special kernel parameters except the one necessary to use the AHCI driver with my Marvell controller card: ahci.marvell_enable=1. The drives do not work at all without this setting.

I have attached the results of lspci -vvv. Note there are two SATA controllers in this machine -- the one that gives this problem is:

03:00.0 SATA controller: Marvell Technology Group Ltd. 88SE6145 SATA II PCI-E controller (rev a1) (prog-if 8f)

Comment 31 Marc Eberhard 2011-03-02 13:02:51 UTC
Same problem here on:

Linux frontend-0 2.6.35.11-83.fc14.x86_64 #1 SMP Mon Feb 7 07:06:44 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

[57188.721348] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[57188.721497] ata6.00: failed command: READ DMA EXT
[57188.721639] ata6.00: cmd 25/00:28:cf:44:15/00:01:09:00:00/e0 tag 0 dma 151552 in
[57188.721641]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[57188.722189] ata6.00: status: { DRDY }
[57188.722349] ata6: hard resetting link
[57189.178055] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[57189.184355] ata6.00: configured for UDMA/100
[57189.184497] ata6.00: device reported invalid CHS sector 0
[57189.184643] ata6: EH complete
[57432.817442] ata6.00: limiting speed to UDMA/33:PIO4
[57432.817585] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[57432.817724] ata6.00: failed command: READ DMA EXT
[57432.817863] ata6.00: cmd 25/00:88:b7:a4:7c/00:01:0b:00:00/e0 tag 0 dma 200704 in
[57432.817865]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[57432.818416] ata6.00: status: { DRDY }
[57432.818579] ata6: hard resetting link
[57433.275055] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[57433.281354] ata6.00: configured for UDMA/33
[57433.281498] ata6.00: device reported invalid CHS sector 0
[57433.281643] ata6: EH complete
[57714.729296] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[57714.729442] ata6.00: failed command: READ DMA EXT
[57714.729584] ata6.00: cmd 25/00:b0:3f:70:35/00:03:0e:00:00/e0 tag 0 dma 483328 in
[57714.729586]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[57714.730127] ata6.00: status: { DRDY }
[57714.730279] ata6: hard resetting link
[57715.186058] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[57715.192356] ata6.00: configured for UDMA/33
[57715.192499] ata6.00: device reported invalid CHS sector 0
[57715.192648] ata6: EH complete
[57778.721359] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[57778.721508] ata6.00: failed command: READ DMA EXT
[57778.721652] ata6.00: cmd 25/00:00:df:0f:9a/00:04:0e:00:00/e0 tag 0 dma 524288 in
[57778.721655]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[57778.722199] ata6.00: status: { DRDY }
[57778.722357] ata6: hard resetting link
[57779.178057] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[57779.184356] ata6.00: configured for UDMA/33
[57779.184501] ata6.00: device reported invalid CHS sector 0
[57779.184654] ata6: EH complete
[57820.721362] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[57820.721510] ata6.00: failed command: READ DMA EXT
[57820.721653] ata6.00: cmd 25/00:f8:47:f4:af/00:02:0e:00:00/e0 tag 0 dma 389120 in
[57820.721655]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[57820.722201] ata6.00: status: { DRDY }
[57820.722365] ata6: hard resetting link

Comment 32 Tomas Kouba 2011-03-23 14:34:44 UTC
I have the same problem with rhel5 and vanilla kernel 2.6.37.3 (I need the new kernel to switch marvell to AHCI).
It is asus hummingbird motherboard with two controllers:

00:1f.2 SATA controller: Intel Corporation N10/ICH7 Family SATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0])
no problem with this one.

04:00.0 RAID bus controller: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b2) (prog-if 8f)
this one causes the problems.

Comment 33 A. Mani 2011-04-10 15:32:01 UTC
ata4.00: exception Emask 0x10 SAct 0x1 SErr 0x90200 action 0xe frozen
[  169.470937] ata4.00: irq_stat 0x00400000, PHY RDY changed
[  169.473085] ata4: SError: { Persist PHYRdyChg 10B8B }
[  169.475236] ata4.00: failed command: READ FPDMA QUEUED
[  169.477392] ata4.00: cmd 60/08:00:00:b0:b3/00:00:2b:00:00/40 tag 0 ncq 4096 in
[  169.477394]          res 40/00:00:00:b0:b3/00:00:2b:00:00/40 Emask 0x10 (ATA bus error)
[  169.481670] ata4.00: status: { DRDY }
[  169.483793] ata4: hard resetting link
[  173.423038] ata4: softreset failed (device not ready)


x86-64 Fedora 14, 15beta(livecd); 


00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA (prog-if 01 [AHCI 1.0])
        Subsystem: Micro-Star International Co., Ltd. Device 7329
        Flags: bus master, 66MHz, medium devsel, latency 96, IRQ 22
        I/O ports at b000 [size=8]
        I/O ports at a000 [size=4]
        I/O ports at 9000 [size=8]
        I/O ports at 8000 [size=4]
        I/O ports at 7000 [size=16]
        Memory at fe8ff800 (32-bit, non-prefetchable) [size=1K]
        Capabilities: <access denied>
        Kernel driver in use: ahci


Is it 
a) controller out of order
b) Mainboard issue
c) Power issue

Comment 34 Reartes Guillermo 2011-04-10 15:55:12 UTC
Did anybody ever tried booting with the kernel parameter: pcie_aspm=off ?

For me it worked (nvidia), despite that i dont know really why. (by accident?)
 
I noticed that it changes pcie/pci config stuff not related to pcie_aspm in my case at least (of course, it ALSO disables pcie_aspm functionality)
I noticed it comparing different distros/kernels with and without pcie_aspm. Lspci -vvv showed that in my case, F13 changed link speed and others. But the parameter solved my problems, at least empirically.

I now run F15 with litle or no issues, i still have F13 installed to perform test if desired.

Comment 35 A. Mani 2011-04-10 16:26:19 UTC
Addendum: For the SB600 controller, the right thing to do is to restrict all drives to 1.5Gbps by jumpers or with a boot option.

http://forums.opensuse.org/english/get-technical-help-here/hardware/444228-delay-libata-when-connecting-esata.html

@Guillermo, PCIE_ASPM is also set by bios. Possibly it is mis-configured. I will check my system

Comment 36 Raman Gupta 2011-05-10 17:07:00 UTC
I am still having this problem on kernel 2.6.35.13-91.fc14.x86_64.

> Did anybody ever tried booting with the kernel parameter: pcie_aspm=off ?

I tried this and it did not fix the problem.

> Addendum: For the SB600 controller, the right thing to do is to restrict all
drives to 1.5Gbps by jumpers or with a boot option.

I also tried this -- but with this setting all drives attached to my Marvell controller could not even be started by the kernel -- permanent "failed to IDENTIFY" errors.

Comment 37 Nikita Staroverov 2011-07-03 14:43:34 UTC
I have the same problem. Yesterday, i update my fedora from 14 to 15 and got the same messages in dmesg:

[ 1400.032052] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1400.032064] ata2.00: failed command: FLUSH CACHE EXT
[ 1400.032080] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 1400.032083]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1400.032090] ata2.00: status: { DRDY }
[ 1400.032102] ata2: hard resetting link
[ 1400.337036] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1400.346998] ata2.00: configured for UDMA/133
[ 1400.347032] ata2.00: retrying FLUSH 0xea Emask 0x4
[ 1400.347094] ata2.00: device reported invalid CHS sector 0
[ 1400.347110] ata2: EH complete

I tried all the recipes of the above and nothing works.

Comment 38 Raman Gupta 2011-08-30 02:35:24 UTC
Still happening on Fedora 14 kernel 2.6.35.14-95.fc14.x86_64:

Aug 29 18:55:37 zeus kernel: [27239.712064] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 29 18:55:37 zeus kernel: [27239.712072] ata8.00: failed command: SMART
Aug 29 18:55:37 zeus kernel: [27239.712080] ata8.00: cmd b0/d0:01:00:4f:c2/00:00:00:00:00/00 tag 0 pio 512 in
Aug 29 18:55:37 zeus kernel: [27239.712082]          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 29 18:55:37 zeus kernel: [27239.712086] ata8.00: status: { DRDY }
Aug 29 18:55:37 zeus kernel: [27239.712094] ata8: hard resetting link
Aug 29 18:55:37 zeus kernel: [27240.017054] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 29 18:55:37 zeus kernel: [27240.019373] ata8.00: configured for UDMA/133
Aug 29 18:55:37 zeus kernel: [27240.019395] ata8: EH complete

Comment 39 Ken Hall 2011-10-19 13:33:51 UTC
Seeing this on Fedora 15 at 2.6.40.6-0.fc15.x86_64 on MSI mb with AMD SB850 controller and ST31000528AS Rev: CC38

There are 4 drives in this machine, but this only appears to happen on this particular one.

[82172.704050] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[82172.704109] ata2.00: failed command: WRITE DMA EXT
[82172.704166] ata2.00: cmd 35/00:00:00:3c:36/00:04:0f:00:00/e0 tag 0 dma 524288 out
[82172.704167]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[82172.704383] ata2.00: status: { DRDY }
[82172.704444] ata2: hard resetting link
[82182.704036] ata2: softreset failed (1st FIS failed)
[82182.704100] ata2: hard resetting link
[82186.428033] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[82186.430445] ata2.00: configured for UDMA/33
[82186.430509] ata2.00: device reported invalid CHS sector 0
[82186.430573] ata2: EH complete
[82433.712040] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[82433.712099] ata2.00: failed command: WRITE DMA EXT
[82433.713594] ata2.00: cmd 35/00:00:00:84:3c/00:04:0f:00:00/e0 tag 0 dma 524288 out
[82433.713595]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[82433.713811] ata2.00: status: { DRDY }
[82433.713867] ata2: hard resetting link
[82440.118034] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[82440.331449] ata2.00: configured for UDMA/33
[82440.331509] ata2.00: device reported invalid CHS sector 0
[82440.331568] ata2: EH complete

Comment 40 Ken Hall 2011-10-20 16:11:34 UTC
Correction:  Last item was using SB710 controller, but similar symptoms using  add-on SiI 3512 adapter

Comment 41 Ken Hall 2011-10-22 22:42:08 UTC
Replaced Seagate ST31000528AS with WDC WD20EARS, same symptom.  Also replaced cable.  This is definitely not a hardware problem, but it only appears to happen with recent vintage drives 1TB or greater.  Smaller drives on this same system are not affected.

Comment 42 Mike Eddy 2012-02-12 21:42:12 UTC
Still happening on 2.6.41.10-3.fc15.x86_64 

---

ASUS P8H67-M PRO/CSM  LGA 1155 Corporate Stable Model Intel H67 DDR3 1333
Western Digital WD20EARS

---

Feb 11 13:05:46 <hostname> kernel: [167513.519922] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 frozen
Feb 11 13:05:46 <hostname> kernel: [167513.519925] ata1.01: failed command: IDENTIFY DEVICE
Feb 11 13:05:46 <hostname> kernel: [167513.519929] ata1.01: cmd ec/00:01:00:00:00/00:00:00:00:00/10 tag 0 pio 512 in
Feb 11 13:05:46 <hostname> kernel: [167513.519930]          res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
Feb 11 13:05:46 <hostname> kernel: [167513.519932] ata1.01: status: { DRDY }
Feb 11 13:05:46 <hostname> kernel: [167513.519938] ata1.00: hard resetting link
Feb 11 13:05:46 <hostname> kernel: [167513.824625] ata1.01: hard resetting link
Feb 11 13:05:52 <hostname> kernel: [167519.325108] ata1.00: link is slow to respond, please be patient (ready=0)
Feb 11 13:05:56 <hostname> kernel: [167523.554630] ata1.00: SRST failed (errno=-16)
Feb 11 13:05:56 <hostname> kernel: [167523.554636] ata1.00: hard resetting link
Feb 11 13:05:56 <hostname> kernel: [167523.859387] ata1.01: hard resetting link
Feb 11 13:06:02 <hostname> kernel: [167529.359866] ata1.00: link is slow to respond, please be patient (ready=0)
Feb 11 13:06:06 <hostname> kernel: [167533.589398] ata1.00: SRST failed (errno=-16)
Feb 11 13:06:06 <hostname> kernel: [167533.589405] ata1.00: hard resetting link
Feb 11 13:06:06 <hostname> kernel: [167533.894151] ata1.01: hard resetting link
Feb 11 13:06:12 <hostname> kernel: [167539.394636] ata1.00: link is slow to respond, please be patient (ready=0)
Feb 11 13:06:41 <hostname> kernel: [167568.593678] ata1.00: SRST failed (errno=-16)
Feb 11 13:06:41 <hostname> kernel: [167568.593684] ata1.00: limiting SATA link speed to 1.5 Gbps
Feb 11 13:06:41 <hostname> kernel: [167568.593688] ata1.01: limiting SATA link speed to 1.5 Gbps
Feb 11 13:06:41 <hostname> kernel: [167568.593693] ata1.00: hard resetting link
Feb 11 13:06:41 <hostname> kernel: [167568.898431] ata1.01: hard resetting link
Feb 11 13:06:46 <hostname> kernel: [167573.634563] ata1.00: SRST failed (errno=-16)
Feb 11 13:06:46 <hostname> kernel: [167573.645857] ata1.00: reset failed, giving up
Feb 11 13:06:46 <hostname> kernel: [167573.645859] ata1.00: disabled
Feb 11 13:06:46 <hostname> kernel: [167573.645860] ata1.01: disabled
Feb 11 13:06:46 <hostname> kernel: [167573.645870] ata1: EH complete
Feb 11 13:06:46 <hostname> ata_id[23723]: HDIO_GET_IDENTITY failed for '/dev/sr0'

---

Feb 11 18:31:58 <hostname> smartd[839]: Device: /dev/sda [SAT], SLEEP mode ignored due to reached limit of skipped checks (10 checks skipped)

Feb 11 18:31:58 <hostname> smartd[839]: Device: /dev/sda [SAT], not capable of SMART self-check
Feb 11 18:31:58 <hostname> smartd[839]: Sending warning via /usr/libexec/smartmontools/smartdnotify to root ...
Feb 11 18:31:58 <hostname> smartd[839]: Warning via /usr/libexec/smartmontools/smartdnotify to root: successful
WARNING: Your hard drive is failing
Device: /dev/sda [SAT], not capable of SMART self-check

Feb 11 18:31:58 <hostname> smartd[839]: Device: /dev/sda [SAT], failed to read SMART Attribute Data
Feb 11 18:31:58 <hostname> smartd[839]: Sending warning via /usr/libexec/smartmontools/smartdnotify to root ...
Feb 11 18:31:58 <hostname> smartd[839]: Warning via /usr/libexec/smartmontools/smartdnotify to root: successful
WARNING: Your hard drive is failing
Device: /dev/sda [SAT], failed to read SMART Attribute Data

---

Message from syslogd@<hostname> at Feb 12 02:35:04 ...
 kernel:[216031.865936] journal commit I/O error

Message from syslogd@<hostname> at Feb 12 02:35:04 ...
 kernel:[216031.965940] journal commit I/O error

Message from syslogd@<hostname> at Feb 12 02:35:04 ...
 kernel:[216032.065959] journal commit I/O error

Message from syslogd@<hostname> at Feb 12 02:35:04 ...
 kernel:[216032.165924] journal commit I/O error

Message from syslogd@<hostname> at Feb 12 02:35:05 ...
 kernel:[216032.265933] journal commit I/O error

---

2012-02-12 02:35:05.081 TFW, Error: safe_write(): File I/O  errcnt: 1
                        eno: Read-only file system (30) errno: 30
2012-02-12 02:35:05.111 TFW: safe_write(): funky usleep
.
.
.
2012-02-12 07:18:33.386 TFW: safe_write(): funky usleep
2012-02-12 07:18:33.418 TFW, Error: safe_write(): File I/O  errcnt: 3
                        eno: Read-only file system (30) errno: 30

Comment 43 Ken Hall 2012-02-27 01:41:50 UTC
Replaced failed Seagate ST4200320AS with WDC WD5000AAKX-0, and problem appeared on the new drive.  This is still happening as of kernel 3.2.7-1.fc16.x86_64.  My original report of this was over four months ago, and there has not been any update despite others reporting identical issues up through Fedora 15.  

It seems to be related to recent drives, as older models do not have the problem.  I am successfully running the large drives previously reported on USB adapters since they do not work directly connected on SATA, however this new drive is part of a RAID1 pair, so this is not practical.  Can we get any idea whether or not this is being looked at?  This is all recent hardware, and nothing exotic.  Any assistance would be appreciated.

Comment 44 kevinfishburne 2012-03-20 19:04:30 UTC
Created attachment 571505 [details]
dmesg after successful boot with errors

Kubuntu 11.10, ia64.

Comment 45 kevinfishburne 2012-03-20 19:05:29 UTC
Created attachment 571506 [details]
lspci -vvv after successful boot with errors

Kubuntu 11.10, ia64.

Comment 46 kevinfishburne 2012-03-20 19:07:17 UTC
Bug's occurring for me in Kubuntu 11.10, kernel 3.0.0-16-generic. I didn't have any problems for some months and have checked the drive and replaced the cable and all appears well. Sometimes it has no problem at all, but usually I have to power-cycle (not reset) my PC multiple times for it to boot. I'm attaching lspci -vvv and dmesg.

Comment 47 Bill McGonigle 2012-04-23 20:45:50 UTC
[Raman, please bump version to 16 per comment #43]

I'm seeing a similar exception here reading from an optical drive (k3b verify phase, so heavy i/o):

[140064.704061] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[140064.704075] sr 2:0:0:0: [sr0] CDB: Read(10): 28 00 00 1e 08 00 00 00 01 00
[140064.704104] ata3.00: cmd a0/01:00:00:00:08/00:00:00:00:00/a0 tag 0 dma 2048 in
[140064.704107]          res 40/00:03:00:00:fc/00:00:00:00:00/a0 Emask 0x4 (timeout)
[140064.704115] ata3.00: status: { DRDY }
[140064.704125] ata3: hard resetting link
[140065.164062] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[140065.165171] ata3.00: configured for UDMA/100
[140065.170683] ata3: EH complete

on:
  3.3.0-4.fc16.x86_64
  00:11.0 SATA controller: Advanced Micro Devices [AMD] Hudson SATA Controller [AHCI mode] (rev 40)

I'm really not clear on what software tools we have available to isolate these bugs among linux, chipset, bios, and drive firmware bugs.  Suggestions appreciated.

Comment 48 Ken Hall 2012-04-26 18:08:55 UTC
Seagate replaced original 500GB drive with ST3500320AS.  Original drive had no problem until it failed.  New drive fails out of the box on Linux using SATA interface, but works perfectly plugged into USB adapter.  Common denominator appears to be that these are SATA-III (6GB/s) drives running on SATA-II (3GB/s) controllers.  Slowing SATA-II drives to 1.5GB/s via jumpers corrected this on older drives, but SATA-III drives can only be slowed to 3GB/s via jumper and experience this error consistently.  As I said, I've tested this using both on-board and add-on SATA controllers, and with both Seagate and WD drives, and the error is consistent.  Latest versions lock up so bad it takes a power-off to make the drive visible again, but tests show the drive is fine.  System is F16 at latest kernel.  Willing to re-test any combination of hardware I have available if it will help.

Comment 49 Anonymous account 2012-05-12 18:00:01 UTC
Is this related to the closed bug (not for me) Bug 577936.
I have a similar configuration like in 577936, and I get more or less the same errors - READ FPDMA QUEUED and exception emask.
Hardware:
Asus p6t
Asus U3S6 sata3/usb3 card (uses the marvel 88se9123 sata3 controller)
Crucial SSD C300 (128GB)

Comment 50 mahmoud hanafi 2012-05-16 23:12:19 UTC
We have seen this issue as well since upgrading to RedHat6 from Suse10.

ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x400000 action 0x6 frozen
ata3: SError: { Handshk }
ata3.00: failed command: WRITE DMA EXT
ata3.00: cmd 35/00:02:53:50:2b/00:00:1d:00:00/e0 tag 0 dma 1024 out
         res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3: hard resetting link
ata3: nv: skipping hardreset on occupied port




We had tried lots things.

But what fix the issue was setting acpi=off at the kernel line. Some observations from 130 node cluster...
1. The issue would occur after the nodes sat idle.
2. 1TB seagate drive didn't have the issue and only 750GB seagate drives did.
3. acpi=off always fixed the issue. 

So it would seem that this is related to acpi. It turned out that the 1TB drives didn't support APM_level setting and the 750GB did. With acpi=off the kernel sets the 750GB drives APM_level to 254, which is the highest. With acpi=on the kernel set APM_level to 60. 

I think whats happening is the sata_nv driver is not dealing with a drive in a power save mode and marks the drive as frozen. Or its not able to wake up the drive....

You can disable APM_level by
hdparm -B 255 /dev/sda
 

I wrote a udev rule to do this
KERNEL=="sd*", BUS=="scsi", SYSFS{vendor}=="ATA", PROGRAM=="/sbin/hdparm -B 255 /dev/%k"

Comment 51 Raman Gupta 2012-05-16 23:41:29 UTC
(In reply to comment #50)
> But what fix the issue was setting acpi=off at the kernel line. Some
> observations from 130 node cluster...

You may be seeing a different issue. As per "Additional info:" section of the bug description, setting acpi=off does NOT fix the issue reported.

Comment 52 Brian Lane 2012-07-11 17:11:05 UTC
I've now seen this 2 times in the last month. Not often, but worrisome.

My setup is 2 Seagate 1.5T drives (ST31500341AS) a sata cdrom and a Seagate 500G (ST3500830AS). The 1.5T drives are using mdraid in RAID1 and the 500G is stand-alone. When this happens it drops the bus that has one of the 1.5T drives and the 500G drive and I have to reboot to get them to respond again.

None of these drives support the hdparm -B option so I can't tell if it is a powerdown problem or not. It doesn't happen often enough for me to have any luck trying different kernel options.

The motherboard is a Gigabyte Z68P-DS3 and the SATA chipset is: Intel Corporation 6 Series/C200 Series Chipset Family 4 port SATA IDE Controller (rev 05) (prog-if 8f [Master SecP SecO PriP PriO])


Here's the relevant part of /var/log/messages:

Jul 10 21:18:16 lister kernel: [136925.562492] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 10 21:18:16 lister kernel: [136925.562497] ata2.00: failed command: FLUSH CACHE EXT
Jul 10 21:18:16 lister kernel: [136925.562511] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jul 10 21:18:16 lister kernel: [136925.562512]          res 40/00:01:00:4f:c2/00:00:00:00:00/10 Emask 0x4 (timeout)
Jul 10 21:18:16 lister kernel: [136925.562514] ata2.00: status: { DRDY }
Jul 10 21:18:16 lister kernel: [136925.562521] ata2.00: hard resetting link
Jul 10 21:18:16 lister kernel: [136925.867157] ata2.01: hard resetting link
Jul 10 21:18:22 lister kernel: [136931.366661] ata2.00: link is slow to respond, please be patient (ready=0)
Jul 10 21:18:26 lister kernel: [136935.595344] ata2.00: SRST failed (errno=-16)
Jul 10 21:18:26 lister kernel: [136935.595351] ata2.00: hard resetting link
Jul 10 21:18:26 lister kernel: [136935.900017] ata2.01: hard resetting link
Jul 10 21:18:32 lister kernel: [136941.399485] ata2.00: link is slow to respond, please be patient (ready=0)
Jul 10 21:18:36 lister kernel: [136945.628177] ata2.00: SRST failed (errno=-16)
Jul 10 21:18:36 lister kernel: [136945.628186] ata2.00: hard resetting link
Jul 10 21:18:36 lister kernel: [136945.932937] ata2.01: hard resetting link
Jul 10 21:18:42 lister kernel: [136951.432347] ata2.00: link is slow to respond, please be patient (ready=0)
Jul 10 21:19:11 lister kernel: [136980.625846] ata2.00: SRST failed (errno=-16)
Jul 10 21:19:11 lister kernel: [136980.625850] ata2.00: limiting SATA link speed to 1.5 Gbps
Jul 10 21:19:11 lister kernel: [136980.625853] ata2.01: limiting SATA link speed to 1.5 Gbps
Jul 10 21:19:11 lister kernel: [136980.625857] ata2.00: hard resetting link
Jul 10 21:19:11 lister kernel: [136980.930475] ata2.01: hard resetting link
Jul 10 21:19:16 lister kernel: [136985.665730] ata2.00: SRST failed (errno=-16)
Jul 10 21:19:16 lister kernel: [136985.676674] ata2.00: reset failed, giving up
Jul 10 21:19:16 lister kernel: [136985.676677] ata2.00: disabled
Jul 10 21:19:16 lister kernel: [136985.676680] ata2.01: disabled
Jul 10 21:19:16 lister kernel: [136985.676683] ata2.00: device reported invalid CHS sector 0
Jul 10 21:19:16 lister kernel: [136985.676689] ata2: EH complete

Comment 53 Brian Lane 2012-07-11 17:17:08 UTC
Forgot to add: kernel is 3.4.4-4.fc16.x86_64

Comment 54 Raman Gupta 2012-07-11 17:24:02 UTC
OP here... I replaced the machine that was producing this error. My new machines have significantly newer and different hardware, and do not demonstrate this problem. Therefore, I will not be able to confirm resolution of this (doesn't seem like it ever will be closed, or even worked on, anyway).

Comment 55 Aaron Hamid 2012-07-23 03:08:38 UTC
Started getting these after attaching (unformatted) external WDC WD3200AUDX-7 drive via eSATA.  Initial connection:

ata6: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
ata6: irq_stat 0x00400040, connection status changed
ata6: SError: { PHYRdyChg CommWake DevExch }
ata6: hard resetting link
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.00: ATA-8: WDC WD3200AUDX-73MK3Y0, 01.01A01, max UDMA/133
ata6.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
ata6.00: configured for UDMA/133
ata6: EH complete
scsi 5:0:0:0: Direct-Access     ATA      WDC WD3200AUDX-7 01.0 PQ: 0 ANSI: 5
sd 5:0:0:0: [sdd] 625142448 512-byte logical blocks: (320 GB/298 GiB)
sd 5:0:0:0: [sdd] 4096-byte physical blocks
sd 5:0:0:0: [sdd] Write Protect is off
sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:0:0:0: Attached scsi generic sg4 type 0
 sdd: unknown partition table
sd 5:0:0:0: [sdd] Attached SCSI disk

Thereafter these repeat, looks like every 30 mins (which would coincide w/ smartd config):

ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata6.00: failed command: IDENTIFY DEVICE
ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata6.00: status: { DRDY }
ata6: hard resetting link
ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata6.00: configured for UDMA/133
ata6: EH complete

smartd is running and configured to skip check in sleep and standby.

Comment 56 Reartes Guillermo 2012-08-13 19:40:48 UTC
There are 17 people (not counting me nor the original reporter).

Brief description:

@Raman Gupta

* ORIGINAL REPORTER
* F12, then F14
Intel N10/ICH7 +  Marvell Technology Group Ltd. 88SE6145 + SEAGATE
* exeptions on drives attached to marvell controller (usually smart commands)
  hard reset, timeouts, revalidation failed, i/o errors, invalid sector 0
* +2 Years have passed.
* NO LONGER HAS THE SYSTEM TO REPRODUCE THE ISSUE. (comment #54)
* If the issue is fixed, since the original system is gone, it cannot be verified.

And he is right, the bug-report (as it is now) appears stalled.

@djones256

* F12
Intel 82801JB ICH10 + WDC
* exeptions on 2-port controller
  SRST failed -16, link is slow to respond, hard reset link, port disabled, i/o error,
  invalid sector 0

@Gaetan Cambier

* F12
unkown controller(s)
* exeptions
  failed command, hard reset, invalid sector 0

FIXED = libata.force=noncq

@Andreas M. Kirchwitz

* F13, then F14
Nvidia nForce 430, GeForce 6150, sata_nv driver
Samsung hardisk
no logs, but claims a raid1 is forced to resync or worse.

@Michael Hughes

* F13
F13 cannot be installed on Mac Mini 2010 with corsair SSD.
no logs, but claims his logs are the same to this bug-report, but does not 
define any comment# number, so it is unkown.

@Neil Hooey

* Has failed to identify and flush cache errors on ubuntu 10.04.
  F14 does not have them. Later ubuntu fixed them.


@majianpeng 

* Con comment #13, contains a patch.
I don't know what happened to that patch.


@Marc Eberhard

* F14
exceptions, hard resets, link speed, invalid sector 0
controller unknown

@Tomas Kouba

* RHEL5
Intel Corporation N10/ICH7 + Marvell Technology Group Ltd. 88SE6121
No logs, but the issues are with the marvell controller. 


@A. Mani

* F15 beta
* controller:  ATI Technologies Inc SB600 
exeptions,  Persist PHYRdyChg 10B8B , ata bus error, hard reset, soft reset failed,
device not ready.

He posted in comment #35:
Addendum: For the SB600 controller, the right thing to do is to restrict all drives 
to 1.5Gbps by jumpers or with a boot option.

(so, in that case, one might want to force to 1.5Gbps ONLY the SB600 sata ports)

@Nikita Staroverov

* F14, F15
exeptions, failed command, hard reset, link speed, retrying flush, invalid sector 0,
unknown controller

@Ken Hall

* F15
controller SB750 + seagate
exeptions, failed command, hard reset, soft reset failed (1st fis failed), link speed,
invalid sector 0
Similar simptoms on  SiI 3512 adapter
changed cable, hdd.

@Mike Eddy

* F15
ASUS P8H67-M PRO/CSM  LGA 1155 Corporate Stable Model Intel H67 DDR3 1333
Western Digital WD20EARS
exceptions, failed command, timeouts, hard resets, link is slow to respond, srst failed,
cdrom fails to identify, problems with smart.


@Bill McGonigle

* F16
controller amd hudson
exeptions, had reset reading optical disk


@M8R-7fin56 

Asus p6t
Asus U3S6 sata3/usb3 card (uses the marvel 88se9123 sata3 controller)
Crucial SSD C300 (128GB)
exeptions, points to another bugreport comment #49.

@mahmoud hanafi 

* RHEL6
controller: sata_nv???
exeptions, failed command, timout, hard reset, skipping hardreset on occupied port
He uses: acpi=off


@Brian C. Lane
* F16
Controller:  Intel Corporation 6 Series/C200 Series Chipset Family 4 port SATA IDE Controller
exeptions, failed command, hard reset, link slow to respond, srst failed -16, limiting sata link speed, 
reset failed, giving up, ata port disabled.


@Aaron Hamid 
unknown controller
exeptions, HYRdyChg CommWake DevExch, hard reset, timeout, maybe others.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After reading (briefly) all comments, i do feel some of them are (or were) 
exeperiencing a different issue.

Comment 57 Reartes Guillermo 2012-08-13 19:41:36 UTC
I recommend to close this bug-report. (too many different hw and slighly different 
error messages.

There are several people in this bug-report that have slighly different issues (or at
least different messages). Even different hardware (controllers) are involved.

It has become very difficult to read and track which one did what unless one reads all 
again.

I propose this template, for opening new bug-reports (see next comment), but only if the
following criteria is met:

* The issue you have does happen or is reproductible in F16 or F17 (or F18-rawhide)

If the issue is still there, see next comment.

Once the bug-report is opened, please let the maintainer mark them as duplicate.

Comment 58 Reartes Guillermo 2012-08-13 19:42:42 UTC
Proposed template for new bug-reports:

Issue description:

Describe the issue you have.

On which controller
On which disks
On which controller NEVER happened.

When does it happen, what whas the system doing?
Idly, heavy io, unknown.

Put a snipped (partial errro message) on the first comment.

For example:

Dec 23 01:20:36 zeus kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 23 01:20:36 zeus kernel: ata6.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Dec 23 01:20:36 zeus kernel:         res 40/00:00:30:f8:64/00:00:17:00:00/e0 Emask 0x4 (timeout)
Dec 23 01:20:36 zeus kernel: ata6.00: status: { DRDY }
Dec 23 01:20:36 zeus kernel: ata6: hard resetting link
Dec 23 01:20:36 zeus kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 23 01:20:36 zeus kernel: ata6.00: configured for UDMA/133
Dec 23 01:20:36 zeus kernel: ata6: EH complete

Information of the affected system:

* Distribution: for example: F16 32-BIT

* Fist, create a directory to store the output of the next commands:

# mkdir ~/buginfo
# cd ~/buginfo

Al commands will be assumed to be executed from that directory.

* PCIe / PCI HW Information:

# dmesg > dmesg-`uname -r`.out
# lspci -nnvv > lspci-nnvv-`uname -r`.out
# lspci -k > lspci-k-`uname -r`.out
# lspci -tvn > lspci-tvn-`uname -r`.out

* BIOS / MANUFACTURER Information:

# cat /sys/devices/virtual/dmi/id/product_name > dmi-product_name-`uname -r`.out
# cat /sys/devices/virtual/dmi/id/bios_version > dmi-bios_version-`uname -r`.out
# cat /sys/devices/virtual/dmi/id/board_name > dmi-board_name-`uname -r`.out

* A copy of the messages file AFTER reproducing the error:

# cp /var/log/messages messages-`uname -r`--`date +%Y%m%d`.out

Upload those commands and fill correctly the description, let it be clear what is
being attached.

Also, AFTER reproducing the issue with any current fedora version, please try also
with systemrescuecd (with both default and alternate kernel). Also execute above
commands and upload them after uploading fedora ones. This is to check a  different kernel. 

Also reference this bug-report number.

Comment 59 Bill McGonigle 2012-08-14 18:14:54 UTC
And then we get a bunch of very similar bug reports that all wind up getting marked duplicates of each other?

I think what we need here is a way to instrument the problems to find the root cause.  Without that, what good are multiple reports going to be?  With that, we could find actual duplicates and differences.

Comment 60 Reartes Guillermo 2012-08-14 19:11:27 UTC
@Bill McGonigle

It is not a problem that get are marked as duplicated, that is how
it is supposed to be. As long as that is managed by the maintainer 
or assignee.

Also, some controllers do use different drivers (ahci, sata_nv, 
sata_sil), etc.

IMHO, each bug-report should contain all duplicated bug-reports 
with the same hardware, until proven otherwise (by the maintainer
or assignee). There could be chipset issues, bus issues, or 
something completely non-obvious. 
 
By using one bug-report per sata controller, the tracking of the 
issue will be simplified.

For example if a kernel parameter fixes 'marvell', it will 
documented and will be confirmed by all dupplicates in the bugreport. 

Also, if the issue is very subtle, puting all different hardware 
together will probabbly hide it. 

Also, in each bugreport there will be uploaded info (such as what i
recommended in #58) all corresponding to the same controller on 
different systems and with errors and logs of all.

I thing that you will get better results if you (and all others) 
open a new bug-report, setting version to the affected supported 
version, and upload recommended info #58 plus whatever additional
iformation avaiable. 

The bug-report has +2 years, i do not believe that it can be 
resolved as it is. Reading the whole bug is difficult, subtle 
issues might get overlooked, too many factors, too many different
hardware & software (drivers). 

That is why i sugest to start from scratch, if the issue is still
there in any current version. 

Cheers.

Comment 61 Ken Hall 2012-08-19 17:21:31 UTC
This problem can't be too subtle if it's being reported by 17 different people across multiple controllers and multiple disk devices (brands and models).  The only common denominator I can find to this is that it appears to happen most when using SATA-III devices on SATA-II controllers.  I haven't seen any suggestion of attempt by developer to collect doc on this, just numerous users reporting essentially the same issue over and over.

I have no problem opening a new issue for this, but it would be nice to have some idea that it will be addressed.  This has been in "assigned" status for 2+ years.  For me the problem is completely reproducable, and I have multiple sets of hardware I can test on.  I have purchased a SATA-III add-on controller and will attempt to test SATA-III disks on that.  I am perfectly willing to provide whatever doc is requested, but since I originally posted my problem report NO ONE has asked for any further information.

Incidentally, it's still happening on Fedora 16 at the current kernel level (kernel-3.4.7-1.fc16.x86_64), I have to run my Bacula data stores on USB to keep them working and I had to replace the boot disk with a PATA drive.

Comment 62 Caylan 2012-10-04 02:44:31 UTC
I'm reluctant to post and openly display my ignorance, but I have a small piece of information to add to this thread.  I'm working on a mini-ITX (ASROCK H67M-ITX) appliance that uses a single 60GB SSD drive running Linux based on 3.4.9-2.fc16.i686 (Fedora 16). We have migrated 6 (of 60) clients to this new platform. Some of our users are experiencing "crashes" when they insert a USB device. Upon examination of the incident, one or more filesystems were remounted read-only. The issue was not immediately reproducible - intermittent in nature. I suspected static electricity discharge - and I was right. Using a simple $1.29 "thumb saver" lighter (electric-spark inducing trigger), I was able to reproduce the "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen" error, and the filesystem(s) are remounted read-only.

Is the high voltage spark disrupting general power, knocking the SDD offline, or resetting the storage controller?  I don't know.  As far as I can tell, the chassis/case/motherboard/PSU are all grounded - there is continuity to ground, and my ground tests OK. This same situation happens with a H77M-ITX motherboard - Intel Series 6 (H67) vs. Series 7 (H77).

What would cure this for me? Again, forgive my ignorance... I would like an easy way to set the I/O queueing to retry indefinitely so as to avoid remounting in read-only when the drive becomes available again. Is this possible?  Tomorrow I will disable ACPI, PIC, setting a single-path multipath device (this option has no_path_retry=queue, for never stop queueing), or even the ASROCK built-in hardware RAID-1.

I have to fix this static discharge issue because my product is a mission-critical consumer electronics product.

Thoughts?

Comment 63 Caylan 2012-10-04 02:51:00 UTC
Two SDDs have been used with identical results.

Corsair Force Series 3 CSSD-F60GB3A-BK 2.5" 60GB SATA III
Mushkin Enhanced Chronos MKNSSDCR60GB 2.5" 60GB SATA III Asynchronous MLC

Case in use:  http://www.mini-box.com/M300-LCD-Enclosure

Comment 64 D. Hugh Redelmeier 2012-10-06 06:08:15 UTC
I too am having problems like this.

- Corsair Force Serices CSSD-F115GB2-A/RF2 (refurbished)

- attached to an Acer Aspire ASE380--ED424A.  This desktop is a few years old.  it has an Athlon 62 X2 CPU, with an nVidia chipset.

- the SATA controller is (according to lspci) nVidia Corporation MCP61 SATA Controller (rev a2)

- I'm using x86_64 Fedora 16 + current updates (as of about 2012 Oct 1).
  Kernel is 3.4.2-1.fc16.x86_64.

- the problem shows up in the Debian Wheezy Beta2 Net Install AMD64 too -- it's dmesg has complaints and the installer decides that there is no SSD disk.

- none of my problems involve heavy loads.


Symptoms:

- booting (from another disk) is slow.  I think that Fedora looks at the SSD drive and gets hung until timeouts occur.

- dmesg shows several errors accessing the SSD (just snooping: fstab doesn't specify that any partitions on the disk be mounted)

- smarctl -a is a dead loss:
  smartctl 5.43 2012-06-30 r3573 [x86_64-linux-3.4.2-1.fc16.x86_64] (local build)
  Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net


  Probable ATA device behind a SAT layer
  Try an additional '-d ata' or '-d sat' argument.

I tried booting with pcie_aspm=off (as per comment #34).  It made no difference.

I've used another SSD on this same SATA port with no problem.

Juicy messages from dmesg:

[    2.194035] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.197200] ata5.00: ATA-8: Corsair CSSD-F115GB2-A, 2.1a, max UDMA/133
[    2.197203] ata5.00: 224674128 sectors, multi 16: LBA48 NCQ (depth 0/32)

[    5.875487] ata6: SATA link down (SStatus 0 SControl 300)
[   36.736213] ata5: drained 512 bytes to clear DRQ
[   36.736230] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[   36.736241] ata5.00: failed command: IDENTIFY DEVICE
[   36.736255] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
[   36.736258]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   36.736264] ata5.00: status: { DRDY }
[   36.736276] ata5: hard resetting link
[   36.736281] ata5: nv: skipping hardreset on occupied port
[   37.190064] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   37.231148] ata5.00: configured for UDMA/133
[   37.231185] ata5: EH complete

[   82.784403] ata5: drained 512 bytes to clear DRQ
[   82.784421] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[   82.784431] ata5.00: failed command: IDENTIFY DEVICE
[   82.784445] ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
[   82.784448]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   82.784455] ata5.00: status: { DRDY }
[   82.784466] ata5: hard resetting link
[   82.784471] ata5: nv: skipping hardreset on occupied port
[   83.238053] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   83.263154] ata5.00: configured for UDMA/133
[   83.263194] ata5: EH complete

etc.

These log entries are during booting.  The timeouts associated with these bursts explains why booting seems slow.

Comment 65 D. Hugh Redelmeier 2012-10-06 06:24:33 UTC
Here's another report that seems to be related.  Corsair Force 3 SSD, laptop with SiS968 Southbridge (including SATA2 controller), Debian Testing and Ubuntu 11.10.  Problem with "IDENTIFY DEVICE" command.

<http://forum.corsair.com/forums/showthread.php?t=100231>

Comment 66 Dave Jones 2012-10-23 15:40:10 UTC
# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).

Comment 67 Steve Crook 2012-10-24 17:32:30 UTC
Hi all,

I'm actually running Debian Linux but stumbled across this thread while looking for information on the exact same issue.  Until today I was experiencing these symptoms with Kernel 3.2.0.  I've now upgraded to 3.6.3 and still see the same errors during boot and heavy load (such as RAID rebuilds).

fleegle:~# uname -r
3.6.3

fleegle:~# lspci
00:00.0 Host bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333]
00:01.0 PCI bridge: VIA Technologies, Inc. VT8366/A/7 [Apollo KT266/A/333 AGP]
00:05.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
00:07.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx]
00:09.0 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
00:09.1 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 50)
00:09.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 51)
00:0d.0 SATA controller: Silicon Image, Inc. SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01)
00:0e.0 SATA controller: Silicon Image, Inc. SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01)
00:0f.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
00:10.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
00:11.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:11.2 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
00:11.3 USB controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 23)
01:00.0 VGA compatible controller: NVIDIA Corporation NV11 [GeForce2 MX/MX 400] (rev b2)

lshw -class storage
*-storage:0
       description: SATA controller
       product: SiI 3512 [SATALink/SATARaid] Serial ATA Controller
       vendor: Silicon Image, Inc.
       physical id: d
       bus info: pci@0000:00:0d.0
       version: 01
       width: 32 bits
       clock: 66MHz
       capabilities: storage pm vendor_specific bus_master cap_list rom
       configuration: driver=sata_sil latency=32
       resources: irq:16 ioport:b800(size=8) ioport:b400(size=4) ioport:b000(size=8) ioport:a800(size=4) ioport:a400(size=16) memory:ec000000-ec0001ff memory:60000000-6007ffff
  *-storage:1
       description: SATA controller
       product: SiI 3512 [SATALink/SATARaid] Serial ATA Controller
       vendor: Silicon Image, Inc.
       physical id: e
       bus info: pci@0000:00:0e.0
       version: 01
       width: 32 bits
       clock: 66MHz
       capabilities: storage pm vendor_specific bus_master cap_list rom
       configuration: driver=sata_sil latency=32
       resources: irq:17 ioport:a000(size=8) ioport:9800(size=4) ioport:9400(size=8) ioport:9000(size=4) ioport:8800(size=16) memory:eb800000-eb8001ff memory:60080000-600fffff

lshw -class disk
*-disk
       description: ATA Disk
       product: Hitachi HDT72503
       vendor: Hitachi
       physical id: 0.0.0
       bus info: scsi@0:0.0.0
       logical name: /dev/sda
       version: V54O
       serial: VFD200R22BMRUC
       size: 298GiB (320GB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 sectorsize=512 signature=e1e1e1e1
  *-disk
       description: ATA Disk
       product: Hitachi HDS72101
       vendor: Hitachi
       physical id: 0.0.0
       bus info: scsi@1:0.0.0
       logical name: /dev/sdb
       version: JP4O
       serial: JP2940HQ3VBSWH
       size: 931GiB (1TB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 sectorsize=512 signature=c57e7aef
  *-disk
       description: ATA Disk
       product: Hitachi HDS72101
       vendor: Hitachi
       physical id: 0.0.0
       bus info: scsi@3:0.0.0
       logical name: /dev/sdc
       version: JP4O
       serial: JP2940HQ3UY93H
       size: 931GiB (1TB)
       capabilities: partitioned partitioned:dos
       configuration: ansiversion=5 sectorsize=512 signature=df63f9c5

fleegle:~# dmesg | less (edited for sanity)
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata4.00: BMDMA2 stat 0x6d0009
ata4.00: failed command: READ DMA
ata4.00: cmd c8/00:a8:48:42:a2/00:00:00:00:00/e2 tag 0 dma 86016 in
         res 51/04:79:77:42:a2/00:00:00:00:00/02 Emask 0x1 (device error)
ata4.00: status: { DRDY ERR }
ata4.00: error: { ABRT }
ata4.00: configured for UDMA/100
sd 3:0:0:0: [sdc]
Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 3:0:0:0: [sdc]
Sense Key : Aborted Command [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        02 a2 42 77
sd 3:0:0:0: [sdc]
Add. Sense: No additional sense information
sd 3:0:0:0: [sdc] CDB:
Read(10): 28 00 02 a2 42 48 00 00 a8 00
end_request: I/O error, dev sdc, sector 44188232
ata4: EH complete

Please note that although the error is still occurring, the impact in 3.6.3 seems greatly reduced.  Previously the system would hang for approx 10 seconds at each  error.  Now the impact is undetectable and only the log messages reveal it's still happening.

Comment 68 D. Hugh Redelmeier 2012-10-24 21:18:46 UTC
(In reply to comment #64)
> I too am having problems like this.
> 
> - Corsair Force Serices CSSD-F115GB2-A/RF2 (refurbished)
> 
> - attached to an Acer Aspire ASE380--ED424A.  This desktop is a few years
> old.  it has an Athlon 62 X2 CPU, with an nVidia chipset.
> 
> - the SATA controller is (according to lspci) nVidia Corporation MCP61 SATA
> Controller (rev a2)
> 
> - I'm using x86_64 Fedora 16 + current updates (as of about 2012 Oct 1).
>   Kernel is 3.4.2-1.fc16.x86_64.

I've just installed the same actual drive in a ThinkPad x61t and am having no trouble installing Fedora 17 on it.

It sure looks as if the driver+SATA controller on the Acer does not get along with the Corsair F115GB2.  I don't know which is at fault.


Additional datapoint: I tried a live PC-BSD 9.0 DVD on the Acer hardware and it could not see the F115GB2 drive.  I assume that the FreeBSD driver is different from the Linux driver and so this point away from a driver bug.

Comment 69 Miro Halas 2012-11-18 05:17:24 UTC
Hello,

I was able to solve this issue on my configuration so I hope this will help somebody. Recently I have migrated my CentOS 6.3 server from Intel based  motherboard Gigabyte GA-P55-UD3L (rev. 1.0) to AMD based motherboard MSI 790FX-GD70. The only change in the system was the motherboard, cpu and memory, the rest of the configuration stayed the same. Immediately after migration I have started experiencing this exact issue

During boot I was getting errors

Nov 17 23:41:12 storage kernel: ata6.00: qc timeout (cmd 0xec)
Nov 17 23:41:12 storage kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x5)
Nov 17 23:41:12 storage kernel: ata6.00: revalidation failed (errno=-5)
Nov 17 23:41:12 storage kernel: ata6: hard resetting link
Nov 17 23:41:13 storage kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 17 23:41:13 storage kernel: ata6.00: configured for UDMA/133
Nov 17 23:41:13 storage kernel: sd 5:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Nov 17 23:41:13 storage kernel: sd 5:0:0:0: [sdb] Sense Key : Aborted Command [current] [descriptor]
Nov 17 23:41:13 storage kernel: Descriptor sense data with sense descriptors (in hex):
Nov 17 23:41:13 storage kernel:        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
Nov 17 23:41:13 storage kernel:        91 40 16 4e 
Nov 17 23:41:13 storage kernel: sd 5:0:0:0: [sdb] Add. Sense: No additional sense information
Nov 17 23:41:13 storage kernel: sd 5:0:0:0: [sdb] CDB: Read(10): 28 00 91 40 1b 4f 00 00 c0 00
Nov 17 23:41:13 storage kernel: ata6: EH complete
N

After boot this error transformed to

Nov 17 21:05:26 storage kernel: ata6.00: qc timeout (cmd 0xec)
Nov 17 21:05:26 storage kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Nov 17 21:05:26 storage kernel: ata6.00: revalidation failed (errno=-5)
Nov 17 21:05:26 storage kernel: ata6: hard resetting link
Nov 17 21:05:27 storage kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Nov 17 21:05:27 storage kernel: ata6.00: configured for UDMA/33
Nov 17 21:05:27 storage kernel: ata6: EH complete
Nov 17 21:06:26 storage kernel: ata6.00: exception Emask 0x50 SAct 0x1 SErr 0x280900 action 0x6 frozen
Nov 17 21:06:26 storage kernel: ata6.00: irq_stat 0x08000000, interface fatal error
Nov 17 21:06:26 storage kernel: ata6: SError: { UnrecovData HostInt 10B8B BadCRC }
Nov 17 21:06:26 storage kernel: ata6.00: failed command: READ FPDMA QUEUED
Nov 17 21:06:26 storage kernel: ata6.00: cmd 60/30:00:3f:9b:cc/00:00:8e:00:00/40 tag 0 ncq 24576 in
Nov 17 21:06:26 storage kernel:         res 40/00:00:3f:9b:cc/00:00:8e:00:00/40 Emask 0x50 (ATA bus error)
Nov 17 21:06:26 storage kernel: ata6.00: status: { DRDY }
Nov 17 21:06:26 storage kernel: ata6: hard resetting link
Nov 17 21:06:31 storage kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

In case it matters, these are the disks used in the system
Nov 17 23:53:58 storage kernel: ata6.00: ATA-8: ST31500341AS, SD1B, max UDMA/133
Nov 17 23:53:58 storage kernel: ata5.00: ATA-8: WDC WD20EADS-00S2B0, 01.00A01, max UDMA/133
Nov 17 23:53:58 storage kernel: ata7.00: ATA-6: ST380011A, 3.16, max UDMA/100
Nov 17 23:53:58 storage kernel: ata7.01: ATA-6: ST380011A, 3.16, max UDMA/100


I was able to completely eliminate this issue by
1. disabling NCQ
2. limiting the speed of SATA to 1.5GB

I have done so by modifying my /boot/grub/grub.conf and adding

 libata.force=noncq,1.5G

at the end of my kernel line. This bug also seems to be related or is duplicate of Bug 684599.

Comment 70 Ken Hall 2012-11-30 19:51:11 UTC
I was able to resolve this issue in my case entirely by accident.  I noticed that during testing of drives, the problem did not happen, but once they were installed, the problem returned.  Removing the drives from the case and running them externally eliminated the issue.

It turned out that the drives were too close to a large cooling fan in the front of the case.  Moving them away for the fan (more than 3 inches) eliminated the problem completely.

Not implying this is the entire solution to this, but it might be worth checking for electrical interference with the disks, as it seems the new ones are more sensitive to environmental factors.

Comment 71 Fedora End Of Life 2013-01-17 01:14:01 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 72 Raman Gupta 2013-01-17 02:05:20 UTC
As the OP, I'm taking the liberty of closing this. As per comment #54 I can't confirm a fix for this as the hardware on which I reported this is long gone. If you are still experiencing this or a similar problem, I'd recommend creating a new bug report. And good luck 'cause you're gonna need it -- there seems to be many many reasons for this error msg, and not a lot of info from the kernel is available to distinguish them. Or buy new hardware instead and be happy :)


Note You need to log in before you can comment on or make changes to this bug.