Description of problem:

- Hibernate laptop.
- Power on again.
- Laptop appears to resume OK.
- While doing something on a terminal, e.g. running "yum update", the machine becomes unresponsive and the disk LED is almost solid on.
- In those occasions I see this in the log:

Nov 08 16:57:51 gaspode kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40000 action 0x6 frozen
Nov 08 16:57:51 gaspode kernel: ata1: SError: { CommWake }
Nov 08 16:57:51 gaspode kernel: ata1.00: failed command: FLUSH CACHE EXT
Nov 08 16:57:51 gaspode kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 20 res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 08 16:57:51 gaspode kernel: ata1.00: status: { DRDY }
Nov 08 16:57:51 gaspode kernel: ata1: hard resetting link
Nov 08 16:57:51 gaspode kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 08 16:57:51 gaspode kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Nov 08 16:57:51 gaspode kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Nov 08 16:57:51 gaspode kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Nov 08 16:57:51 gaspode kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Nov 08 16:57:51 gaspode kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Nov 08 16:57:51 gaspode kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Nov 08 16:57:51 gaspode kernel: ata1.00: configured for UDMA/133
Nov 08 16:57:51 gaspode kernel: ata1.00: retrying FLUSH 0xea Emask 0x4
Nov 08 16:57:51 gaspode kernel: ata1.00: device reported invalid CHS sector 0
Nov 08 16:57:51 gaspode kernel: ata1: EH complete

I have to reboot the machine in order to stop it from periodically going unresponsive after that.

Version-Release number of selected component (if applicable):

Saw this only after upgrading to 3.16.7-200.fc20. Just upgraded to 3.17.2-200.fc20, I'll update here if this happens again.

How reproducible:

Occasional. I hibernate/resume several times a day, only had this happen twice in the last 3-4 days.

Steps to Reproduce:

1. Hibernate then resume.

Additional info:

00:1f.2 SATA controller: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] (rev 03) (prog-if 01 [AHCI 1.0])
    Subsystem: Lenovo Device 20f8
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin B routed to IRQ 27
    Region 0: I/O ports at 1c48 [size=8]
    Region 1: I/O ports at 183c [size=4]
    Region 2: I/O ports at 1c40 [size=8]
    Region 3: I/O ports at 1838 [size=4]
    Region 4: I/O ports at 1c20 [size=32]
    Region 5: Memory at f2826000 (32-bit, non-prefetchable) [size=2K]
    Capabilities: <access denied>
    Kernel driver in use: ahci

smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.2-200.fc20.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Intel 520 Series SSDs
Device Model: INTEL SSDSC2CW240A3
Serial Number: (...)
LU WWN Device Id: 5 5cd2e4 000039ecb
Firmware Version: 400i
User Capacity: 240,057,409,536 bytes [240 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.0, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Nov 8 18:44:44 2014 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Better lspci output:

00:1f.2 SATA controller: Intel Corporation 82801IBM/IEM (ICH9M/ICH9M-E) 4 port SATA Controller [AHCI mode] (rev 03) (prog-if 01 [AHCI 1.0])
    Subsystem: Lenovo Device 20f8
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin B routed to IRQ 27
    Region 0: I/O ports at 1c48 [size=8]
    Region 1: I/O ports at 183c [size=4]
    Region 2: I/O ports at 1c40 [size=8]
    Region 3: I/O ports at 1838 [size=4]
    Region 4: I/O ports at 1c20 [size=32]
    Region 5: Memory at f2826000 (32-bit, non-prefetchable) [size=2K]
    Capabilities: [80] MSI: Enable+ Count=1/16 Maskable- 64bit-
        Address: fee0200c Data: 4172
    Capabilities: [70] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
    Capabilities: [b0] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel driver in use: ahci
Just got this again with 3.17.2-200.fc20.x86_64:

Nov 09 12:16:22 gaspode kernel: ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x40000 action 0x6 frozen
Nov 09 12:16:22 gaspode kernel: ata1: SError: { CommWake }
Nov 09 12:16:22 gaspode kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 09 12:16:22 gaspode kernel: ata1.00: cmd 61/00:00:18:1a:61/04:00:07:00:00/40 tag 0 ncq 524288 out res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
Nov 09 12:16:22 gaspode kernel: ata1.00: status: { DRDY }
Nov 09 12:16:22 gaspode kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 09 12:16:22 gaspode kernel: ata1.00: cmd 61/08:08:18:1e:61/00:00:07:00:00/40 tag 1 ncq 4096 out res 40/00:1e:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Nov 09 12:16:22 gaspode kernel: ata1.00: status: { DRDY }
Nov 09 12:16:22 gaspode kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Nov 09 12:16:22 gaspode kernel: ata1.00: cmd 61/08:10:08:22:aa/00:00:1a:00:00/40 tag 2 ncq 4096 out res 40/00:01:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
[... more similar entries, then: ]
Nov 09 12:16:22 gaspode kernel: ata1: hard resetting link
Nov 09 12:16:22 gaspode kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 09 12:16:22 gaspode kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Nov 09 12:16:22 gaspode kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Nov 09 12:16:22 gaspode kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Nov 09 12:16:22 gaspode kernel: ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
Nov 09 12:16:22 gaspode kernel: ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Nov 09 12:16:22 gaspode kernel: ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
Nov 09 12:16:22 gaspode kernel: ata1.00: configured for UDMA/133
Nov 09 12:16:22 gaspode kernel: ata1.00: device reported invalid CHS sector 0
Nov 09 12:16:22 gaspode kernel: ata1.00: device reported invalid CHS sector 0
Nov 09 12:16:22 gaspode kernel: ata1.00: device reported invalid CHS sector 0
Nov 09 12:16:22 gaspode kernel: ata1.00: device reported invalid CHS sector 0
[...]
Nov 09 12:16:22 gaspode kernel: ata1: EH complete
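For anyone comparing incidents like the two above, here is a rough sketch (not part of the original report) of tallying the failed ATA commands and link resets from journal text. It assumes the line format shown in the excerpts, i.e. "Mon DD HH:MM:SS host kernel: ataN[.NN]: message". The names `LINE_RE`, `SAMPLE` and `summarize_ata_errors` are made up for illustration, and the sample lines are copied from the logs in this report.

```python
import re
from collections import Counter

# Matches journal lines in the format seen in this report, e.g.
# "Nov 09 12:16:22 gaspode kernel: ata1.00: failed command: WRITE FPDMA QUEUED"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<port>ata\d+(?:\.\d+)?): (?P<msg>.*)$"
)

FAILED_PREFIX = "failed command: "

# A few lines copied from the excerpts in this report.
SAMPLE = [
    "Nov 08 16:57:51 gaspode kernel: ata1.00: failed command: FLUSH CACHE EXT",
    "Nov 08 16:57:51 gaspode kernel: ata1: hard resetting link",
    "Nov 09 12:16:22 gaspode kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    "Nov 09 12:16:22 gaspode kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    "Nov 09 12:16:22 gaspode kernel: ata1: hard resetting link",
    "Nov 09 12:16:22 gaspode kernel: ata1: EH complete",
]


def summarize_ata_errors(lines):
    """Return (Counter of failed command names, number of hard link resets)."""
    failed = Counter()
    resets = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not an ataN kernel message in the expected format
        msg = match.group("msg")
        if msg.startswith(FAILED_PREFIX):
            failed[msg[len(FAILED_PREFIX):]] += 1
        elif msg == "hard resetting link":
            resets += 1
    return failed, resets


if __name__ == "__main__":
    failed, resets = summarize_ata_errors(SAMPLE)
    for command, count in failed.most_common():
        print(f"{count} x {command}")
    print(f"hard link resets: {resets}")
```

To run it against a real log, the same function could be fed the lines of saved kernel journal output instead of `SAMPLE`. Lines that do not match the assumed format are simply skipped.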
Just to confirm the pattern after more incidents of this: it looks unlikely to be a hardware problem, because it consistently happens after resume. Before resuming, the disk behaves flawlessly, both during hours of heavy development use and during the solid read-in of the hibernation image *while resuming*. Once it starts happening after a resume, though, it gets pretty bad (the machine effectively locks up every few minutes) and requires a reboot to clear up. Some kind of power management race condition? Should I play with ASPM boot options, or something else?
I just saw this happen on a fresh boot. So, maybe a drive problem after all. I'm troubleshooting it with Intel, please leave this open for the time being; I should have some update in the next few days.
This looked like a hardware failure; the drive has been replaced with an RMA unit, which seems to have fixed this. Closing.