Created attachment 705164 [details]
dmesg output

Version-Release number of selected component (if applicable):
kernel-3.7.9-205.fc18.x86_64

How reproducible:
Intermittent

Description of problem:
Under F18 the system pauses occasionally; when it resumes, dmesg contains the following (full dmesg is attached):

[16994.423564] ata1: EH in SWNCQ mode,QC:qc_active 0x3 sactive 0x3
[16994.423578] ata1: SWNCQ:qc_active 0x1 defer_bits 0x2 last_issue_tag 0x1 dhfis 0x1 dmafis 0x1 sdbfis 0x2
[16994.423589] ata1: ATA_REG 0x40 ERR_REG 0x0
[16994.423594] ata1: tag : dhfis dmafis sdbfis sactive
[16994.423600] ata1: tag 0x0: 1 1 0 1
[16994.423623] ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[16994.423631] ata1.00: failed command: WRITE FPDMA QUEUED
[16994.423645] ata1.00: cmd 61/08:00:88:6b:74/00:00:0f:00:00/40 tag 0 ncq 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[16994.423653] ata1.00: status: { DRDY }
[16994.423659] ata1.00: failed command: WRITE FPDMA QUEUED
[16994.423670] ata1.00: cmd 61/10:08:f8:37:ba/00:00:11:00:00/40 tag 1 ncq 8192 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[16994.423677] ata1.00: status: { DRDY }
[16994.423687] ata1: hard resetting link
[16994.423692] ata1: nv: skipping hardreset on occupied port
[16994.930626] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[16994.955247] ata1.00: configured for UDMA/133
[16994.955264] ata1.00: device reported invalid CHS sector 0
[16994.955272] ata1.00: device reported invalid CHS sector 0
[16994.955295] ata1: EH complete

The drive in question is a WDC WD6400BPVT-55HXZT3. SMART looks clean, except for a number of 'Power-Off_Retract_Count' events which seem to have occurred due to a USB hub leaking power onto the system (I no longer power that hub). An extended offline test completed without error.
The behaviour has changed somewhat between F16 and my current F18 install. Under F16 these resets occurred, but left ext in a state where the filesystem was remounted read-only and the system had to be restarted and fscked (I have dmesgs from a couple of these; 'EH complete' doesn't appear, instead it descends into some 'Sense Key : Aborted Command' messages). In recent F18 kernels the above happens and the subsequent ext4 errors do not occur.

Some googling reveals this could be a hardware problem, and I have only seen these errors with this disc (new in January). However:

1. The system is dual boot and similar problems do not seem to occur under Windows 8. Admittedly they could be less visible, but I've done a couple of embarrassingly long gaming sessions on it and not noticed any problems.
2. I've tried replacing the cable and the housing (this is a 2.5" drive in a 3.5" enclosure; it previously had a plastic housing, and I am now using a metal one in case of earth problems).
3. The SATA port the disc is plugged into was used for the drive it replaced with no problems (the older disc is currently connected to another SATA port on the motherboard).
4. Power-wise, this is a desktop system which previously had two 3.5" discs and now has a 2.5" and a 3.5". That doesn't necessarily rule out a problem with different power lines, but a 2.5" drive should be less demanding.

Of these I think #1 is the most persuasive argument against a hardware stability issue. The possibility remains of a hardware bug not triggered by Windows.

Under F16 I did try disabling NCQ with no success; I have yet to try it under F18.
P.S., this was suggested to me on the users list and returns nothing: smartctl -a /dev/sda | grep "ATA Error Count"
Just had this while NCQ was disabled via /sys/block/sda/device/queue_depth:

[ 6131.734187] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 6131.734202] ata3.00: failed command: WRITE DMA EXT
[ 6131.734219] ata3.00: cmd 35/00:30:88:0f:26/00:01:1b:00:00/e0 tag 0 dma 155648 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 6131.734227] ata3.00: status: { DRDY }
[ 6131.734237] ata3: hard resetting link
[ 6131.734242] ata3: nv: skipping hardreset on occupied port
[ 6132.187393] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 6132.227021] ata3.00: configured for UDMA/133
[ 6132.227037] ata3.00: device reported invalid CHS sector 0
[ 6132.227063] ata3: EH complete

$ cat /sys/block/sda/device/queue_depth
1

kernel-3.8.1-201.fc18.x86_64
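For anyone repeating this test, the queue depth check above can be scripted. The following is a minimal sketch, assuming the sysfs layout shown in this comment; the function names and the `sysfs_root` parameter are hypothetical and exist only so the check can be exercised against a temporary directory.

```python
from pathlib import Path


def ncq_queue_depth(device: str, sysfs_root: str = "/sys") -> int:
    """Return the queue depth currently set for a block device such as 'sda'."""
    path = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    return int(path.read_text().strip())


def ncq_effectively_disabled(device: str, sysfs_root: str = "/sys") -> bool:
    """A depth of 1 means commands are issued one at a time, so nothing is queued."""
    return ncq_queue_depth(device, sysfs_root) == 1
```

Calling `ncq_effectively_disabled("sda")` on the reporter's system would be expected to return True, given the `cat` output quoted above.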
What is your SATA controller? Try lspci -kk
I had the same problem on Fedora 18 (kernel 3.8.x). I thought this was a kernel bug so I downgraded the server to CentOS 6 (2.6.32-358.11.1.el6.x86_64). But the problem still exists and the kernel error message on CentOS is exactly the same as it was on Fedora. Here's what I'm getting:

--------------------------------------------
Jun 23 16:57:45 liberty kernel: EXT4-fs (sdd1): recovery complete
Jun 23 16:57:45 liberty kernel: EXT4-fs (sdd1): mounted filesystem with ordered data mode. Opts:
Jun 23 17:00:59 liberty kernel: ata5.00: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
Jun 23 17:00:59 liberty kernel: ata5.00: irq_stat 0x00400040, connection status changed
Jun 23 17:00:59 liberty kernel: ata5: SError: { PHYRdyChg 10B8B DevExch }
Jun 23 17:00:59 liberty kernel: ata5.00: failed command: WRITE DMA EXT
Jun 23 17:00:59 liberty kernel: ata5.00: cmd 35/00:00:38:11:6f/00:04:00:00:00/e0 tag 0 dma 524288 out
Jun 23 17:00:59 liberty kernel:          res 50/00:00:37:11:6f/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Jun 23 17:00:59 liberty kernel: ata5.00: status: { DRDY }
Jun 23 17:00:59 liberty kernel: ata5: hard resetting link
Jun 23 17:01:04 liberty kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 23 17:01:04 liberty kernel: ata5.00: configured for UDMA/133
Jun 23 17:01:04 liberty kernel: ata5: EH complete
----------------------------------------------

HDD Drive: WD5003ABYX

Kernel options (tried them to troubleshoot the problem but no luck):

[root@liberty ~]# cat /proc/cmdline
ro root=UUID=e91f7c77-65fd-41be-bfd2-b1e475af9a3e rd_NO_LUKS rd_NO_LVM LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=129M@0M KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet reboot=force noapic nolapic noacpi libata.force=noncq,1.5G

SCSI/SATA information:

[root@liberty ~]# lsscsi -kk
[0:0:0:0]    disk    ATA      WDC WD10EURX-73F 01.0  /dev/sda
[1:0:0:0]    disk    ATA      WDC WD30EFRX-68A 80.0  /dev/sdb
[4:0:0:0]    disk    ATA      WDC WD5003ABYX-0 01.0  /dev/sdc
[5:0:0:0]    cd/dvd  Optiarc  DVD RW AD-5280S  1.01  /dev/sr0
[6:0:0:0]    disk    hp       USB Flash Drive  3276  /dev/sdd

Controller:

[root@liberty ~]# lspci | grep SATA
00:1f.2 SATA controller: Intel Corporation 6 Series/C200 Series Chipset Family SATA AHCI Controller (rev 05)

S.M.A.R.T. reports read errors (which is obvious):

[root@liberty ~]# smartctl -A /dev/sdc
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-358.11.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   080   001   051    Pre-fail  Always   In_the_past 28654
  3 Spin_Up_Time            0x0027   173   139   021    Pre-fail  Always       -       2341
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       191
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       268
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       190
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       181
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0022   101   098   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.
This bug is still present with the 3.11.6 kernel on F20. When connecting a disk (which works fine on another SATA2 port provided by the Intel H55 controller), I get this logged to dmesg:

[   95.497010] ata7: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
[   95.497015] ata7: irq_stat 0x80000040, connection status changed
[   95.497018] ata7: SError: { PHYRdyChg CommWake DevExch }
[   95.497024] ata7: hard resetting link
[  105.501662] ata7: softreset failed (1st FIS failed)
[  105.501674] ata7: hard resetting link
[  106.737079] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
[  106.740182] ata7.00: ATA-8: ST2000DL003-9VT166, CC45, max UDMA/133
[  106.740192] ata7.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[  106.740989] ata7.00: configured for UDMA/133
[  106.740999] ata7: EH complete
[  106.741159] scsi 6:0:0:0: Direct-Access ATA ST2000DL003-9VT1 CC45 PQ: 0 ANSI: 5
[  106.741522] sd 6:0:0:0: Attached scsi generic sg2 type 0
[  106.742679] sd 6:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[  106.742684] sd 6:0:0:0: [sdc] 4096-byte physical blocks
[  106.742806] sd 6:0:0:0: [sdc] Write Protect is off
[  106.742811] sd 6:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[  106.742859] sd 6:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  106.774003] ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x100000 action 0x6 frozen
[  106.774009] ata7.00: irq_stat 0x08000000, interface fatal error
[  106.774011] ata7: SError: { Dispar }
[  106.774014] ata7.00: failed command: READ FPDMA QUEUED
[  106.774018] ata7.00: cmd 60/08:00:08:00:00/00:00:00:00:00/40 tag 0 ncq 4096 in
         res 40/00:00:08:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  106.774020] ata7.00: status: { DRDY }
[  106.774024] ata7: hard resetting link
[  107.234279] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
[  107.236454] ata7.00: configured for UDMA/133
[  107.236474] ata7: EH complete
[  107.258631] sdc: unknown partition table
[  107.258845] sd 6:0:0:0: [sdc] Attached SCSI disk
[  175.080329] ata7.00: exception Emask 0x10 SAct 0x8 SErr 0x380000 action 0x6 frozen
[  175.080334] ata7.00: irq_stat 0x08000000, interface fatal error
[  175.080336] ata7: SError: { 10B8B Dispar BadCRC }
[  175.080340] ata7.00: failed command: READ FPDMA QUEUED
[  175.080343] ata7.00: cmd 60/80:18:80:10:00/01:00:00:00:00/40 tag 3 ncq 196608 in
         res 40/00:18:80:10:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  175.080344] ata7.00: status: { DRDY }
[  175.080348] ata7: hard resetting link
[  175.539969] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
[  175.541486] ata7.00: configured for UDMA/133
[  175.541501] ata7: EH complete
[  175.553070] ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x100000 action 0x6 frozen
[  175.553076] ata7.00: irq_stat 0x08000000, interface fatal error
[  175.553080] ata7: SError: { Dispar }
[  175.553085] ata7.00: failed command: READ FPDMA QUEUED
[  175.553093] ata7.00: cmd 60/80:00:80:10:00/01:00:00:00:00/40 tag 0 ncq 196608 in
         res 40/00:00:80:10:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  175.553097] ata7.00: status: { DRDY }
[  175.553103] ata7: hard resetting link
[  176.013047] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 370)
[  176.014633] ata7.00: configured for UDMA/133
[  176.014640] ata7: EH complete
[  176.017032] ata7: limiting SATA link speed to 3.0 Gbps
[  176.017035] ata7.00: exception Emask 0x10 SAct 0x1 SErr 0x100000 action 0x6 frozen
[  176.017037] ata7.00: irq_stat 0x08000000, interface fatal error
[  176.017038] ata7: SError: { Dispar }
[  176.017040] ata7.00: failed command: READ FPDMA QUEUED
[  176.017043] ata7.00: cmd 60/80:00:80:10:00/01:00:00:00:00/40 tag 0 ncq 196608 in
         res 40/00:00:80:10:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[  176.017045] ata7.00: status: { DRDY }
[  176.017047] ata7: hard resetting link
[  176.477260] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[  176.478879] ata7.00: configured for UDMA/133
[  176.478888] ata7: EH complete
[  176.667676] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts: (null)

Is there any other information I could provide?
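The SErr hex values and the SError flag names quoted in this thread line up bit for bit, which makes it possible to decode an SErr value when only the hex form is at hand. The sketch below is illustrative only: the table contains just the six bits that can be read off the logs in this thread (it is not a complete list of SError bits), and `decode_serror` is a hypothetical helper name.

```python
from typing import List

# Bit positions inferred from the SErr/SError pairs quoted in this thread.
# Deliberately incomplete: only flags that actually appear in these logs.
SERROR_BITS = {
    16: "PHYRdyChg",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    26: "DevExch",
}


def decode_serror(value: int) -> List[str]:
    """Return the flag names set in an SErr value, lowest bit first."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Bits not seen in this thread are reported by position only.
            names.append(SERROR_BITS.get(bit, f"bit{bit}"))
    return names
```

For example, `decode_serror(0x380000)` yields the same three flags as the `SError: { 10B8B Dispar BadCRC }` line above.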
I see this bug in RHEL 6.5:

ata5.00: exception Emask 0x0 SAct 0x7 SErr 0x1c0000 action 0x6 frozen
ata5: SError: { CommWake 10B8B Dispar }
ata5.00: failed command: READ FPDMA QUEUED
ata5.00: cmd 60/08:00:58:69:0d/00:00:19:00:00/40 tag 0 ncq 4096 in
         res 40/00:f1:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5.00: failed command: READ FPDMA QUEUED
ata5.00: cmd 60/08:08:50:7a:ca/00:00:2d:00:00/40 tag 1 ncq 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5.00: failed command: READ FPDMA QUEUED
ata5.00: cmd 60/08:10:80:68:09/00:00:2e:00:00/40 tag 2 ncq 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5: hard resetting link
ata5: SATA link down (SStatus 1 SControl 300)
ata5: hard resetting link
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata5.00: configured for UDMA/133
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5.00: device reported invalid CHS sector 0
ata5: EH complete

This is an ASUS motherboard with two controllers:

00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7-M Family) SATA Controller [IDE mode] (rev 02)
02:00.0 SATA controller: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 02)
02:00.1 IDE interface: JMicron Technology Corp. JMB363 SATA/IDE Controller (rev 02)

I have one 500 GB disk on each controller, and everything seems to be working fine:

[root@ox ~]# hdparm -i /dev/sda

/dev/sda:

 Model=WDC WD5000AVVS-63M8B0, FwRev=01.00A01, SerialNo=WD-WCAV90000444
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7

 * signifies the current active mode

[root@ox ~]# hdparm -i /dev/sdb

/dev/sdb:

 Model=SAMSUNG HD502HI, FwRev=1AG01118, SerialNo=S1VZJ9AS502747
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=16384kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=976773168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: unknown: ATA/ATAPI-3,4,5,6,7

 * signifies the current active mode
I guess the disk is dying; it's the "famous" WD GreenPower. But I have already disabled the 7-second spinoff for it...

[root@ox ~]# smartctl -A /dev/sda
smartctl 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-431.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6
  3 Spin_Up_Time            0x0027   146   145   021    Pre-fail  Always       -       3675
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1770
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   074   074   000    Old_age   Always       -       19553
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       130
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1710
194 Temperature_Celsius     0x0022   120   094   000    Old_age   Always       -       23
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1028
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
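When comparing SMART tables like the ones posted in this thread, it can help to pull out the raw values of a few attributes instead of scanning the whole table. The sketch below parses `smartctl -A` text in the layout shown above. The choice of watched attributes is an assumption made for illustration, not a rule from smartmontools, and the function names are hypothetical. A nonzero UDMA_CRC_Error_Count is commonly read as a hint toward the link (cable, connector, controller) rather than the platters, but that remains a heuristic.

```python
from typing import Dict

# Attributes picked for illustration only; adjust to taste.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map attribute name to raw value for each row of a `smartctl -A` table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry the raw value
        # in the tenth column; header and banner lines are skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value is not a plain integer
    return attrs


def nonzero_watched(attrs: Dict[str, int]) -> Dict[str, int]:
    """Return only the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}
```

Fed the table above, `nonzero_watched` would report Current_Pending_Sector and UDMA_CRC_Error_Count, the two watched attributes with nonzero raw values.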
Had this disc running for about 11 months now, still encountering the same errors, but the disc doesn't seem to have degraded. Unsure if it's a bug or a problem with the controller.

Just noticed I never replied to Christian Stadelmann's question:

00:0e.0 IDE interface: NVIDIA Corporation MCP51 Serial ATA Controller (rev a1)
	Subsystem: ASUSTeK Computer Inc. Device 81c0
	Kernel driver in use: sata_nv
00:0f.0 IDE interface: NVIDIA Corporation MCP51 Serial ATA Controller (rev a1)
	Subsystem: ASUSTeK Computer Inc. Device 81c0
	Kernel driver in use: sata_nv
OK, I thought maybe you were unlucky enough to have the same SATA controller…

I have seen at least 3 different causes for this problem:

1. loose connections at HDDs (try another cable, try another power cable, try another SATA jack on your mainboard)
2. defective HDDs (try another HDD on the same controller)
3. driver problems

You should usually be able to find out which part causes the problem by replacing the parts one at a time. In my case the controller (Marvell 88SE9123) is the reason. Hope that helps.
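One way to start telling these cases apart from logs alone is to extract the fields of each libata "exception" line and check whether the link reported any SError flags. In this thread the original reporter's sata_nv logs show SErr 0x0 with timeouts, while the other reports carry nonzero SErr values. The sketch below is a rough first pass under that assumption: treating a nonzero SErr as a link-level hint is a heuristic, not a diagnosis, and the function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Matches the exception line format seen in the dmesg excerpts in this thread.
EXCEPTION_RE = re.compile(
    r"exception Emask (0x[0-9a-f]+) SAct (0x[0-9a-f]+) "
    r"SErr (0x[0-9a-f]+) action (0x[0-9a-f]+)"
)


def parse_exception(line: str) -> Optional[Dict[str, int]]:
    """Extract Emask, SAct, SErr and action from an exception line, if present."""
    match = EXCEPTION_RE.search(line)
    if match is None:
        return None
    keys = ("emask", "sact", "serr", "action")
    return {key: int(value, 16) for key, value in zip(keys, match.groups())}


def link_errors_reported(fields: Dict[str, int]) -> bool:
    """True when the SErr field is nonzero, i.e. SError flags were set."""
    return fields["serr"] != 0
```

Running this over a saved dmesg, line by line, gives a quick count of how many resets came with SError flags and how many were plain timeouts, which may help decide what to swap first.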
This message is a reminder that Fedora 18 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 18. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 18 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.