Created attachment 883515 [details]
/run/initramfs/rdsosreport.log from SystemD emergency shell

Description of problem:
System fails to boot due to disk read errors.

Version-Release number of selected component (if applicable):
kernel-3.13.9-200.fc20.x86_64

How reproducible:
Always.

Steps to Reproduce:
1. Try booting any F20 kernel

Actual results:
Repeated errors like below:
[ 31.747119] sakura kernel: ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[ 31.747163] sakura kernel: ata1.00: failed command: READ FPDMA QUEUED
[ 31.747193] sakura kernel: ata1.00: cmd 60/08:00:38:08:00/00:00:00:00:00/40 tag 0 ncq 4096 in
         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 31.747257] sakura kernel: ata1.00: status: { DRDY }
[ 31.747277] sakura kernel: ata1.00: failed command: READ FPDMA QUEUED
[ 31.747305] sakura kernel: ata1.00: cmd 60/08:08:38:48:06/00:00:00:00:00/40 tag 1 ncq 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 31.747369] sakura kernel: ata1.00: status: { DRDY }
[ 31.747390] sakura kernel: ata1: hard resetting link
[ 32.054328] sakura kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 32.059747] sakura kernel: ata1.00: configured for UDMA/133
[ 32.059753] sakura kernel: ata1.00: device reported invalid CHS sector 0
[ 32.059755] sakura kernel: ata1.00: device reported invalid CHS sector 0
[ 32.059763] sakura kernel: ata1: EH complete

SystemD eventually drops out to emergency shell.

Expected results:
Normal boot.

Additional info:
A widely reported workaround of adding libata.force=noncq to kernel command line allows system to boot and function normally. The machine is a Sony Vaio Pro 13 with Samsung XP941 SSD.

lspci -vn
[...]
03:00.0 0106: 144d:a800 (rev 01) (prog-if 01 [AHCI 1.0])
        Subsystem: 144d:a811
        Flags: bus master, fast devsel, latency 0, IRQ 56
        Memory at f6010000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at f6000000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable+ Count=1/2 Maskable+ 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [d0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [150] Power Budgeting <?>
        Capabilities: [160] Latency Tolerance Reporting
        Kernel driver in use: ahci

$ sudo hdparm -I /dev/sda

/dev/sda:

ATA device, with non-removable media
        Model Number:       SAMSUNG MZHPU128HCGM-00000
        Serial Number:      xxxxxxxxxxx60
        Firmware Revision:  UXM6401Q
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
        Used: unknown (minor revision code 0x0039)
        Supported: 9 8 7 6 5
        Likely used: 9
Configuration:
        Logical         max     current
        cylinders       16383   16383
        heads           16      16
        sectors/track   63      63
        --
        CHS current addressable sectors:   16514064
        LBA    user addressable sectors:  250069680
        LBA48  user addressable sectors:  250069680
        Logical  Sector size:                   512 bytes
        Physical Sector size:                   512 bytes
        Logical Sector-0 offset:                  0 bytes
        device size with M = 1024*1024:      122104 MBytes
        device size with M = 1000*1000:      128035 MBytes (128 GB)
        cache/buffer size  = unknown
        Nominal Media Rotation Rate: Solid State Device
Capabilities:
        LBA, IORDY(can be disabled)
        Queue depth: 32
        Standby timer values: spec'd by Standard, no device specific minimum
        R/W multiple sector transfer: Max = 16  Current = 16
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
        Enabled Supported:
           *    SMART feature set
                Security Mode feature set
           *    Power Management feature set
           *    Write cache
           *    Look-ahead
           *    Host Protected Area feature set
           *    WRITE_BUFFER command
           *    READ_BUFFER command
           *    NOP cmd
           *    DOWNLOAD_MICROCODE
                SET_MAX security extension
           *    48-bit Address feature set
           *    Device Configuration Overlay feature set
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    SMART error logging
           *    SMART self-test
           *    General Purpose Logging feature set
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
           *    64-bit World wide name
                Write-Read-Verify feature set
           *    WRITE_UNCORRECTABLE_EXT command
           *    {READ,WRITE}_DMA_EXT_GPL commands
           *    Segmented DOWNLOAD_MICROCODE
           *    Gen1 signaling speed (1.5Gb/s)
           *    Gen2 signaling speed (3.0Gb/s)
           *    Gen3 signaling speed (6.0Gb/s)
           *    Native Command Queueing (NCQ)
           *    Phy event counters
           *    unknown 76[15]
                DMA Setup Auto-Activate optimization
           *    Software settings preservation
           *    SMART Command Transport (SCT) feature set
           *    SCT Write Same (AC2)
           *    SCT Error Recovery Control (AC3)
           *    SCT Features Control (AC4)
           *    SCT Data Tables (AC5)
           *    SET MAX SETPASSWORD/UNLOCK DMA commands
           *    WRITE BUFFER DMA command
           *    READ BUFFER DMA command
           *    Data Set Management TRIM supported (limit 8 blocks)
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
                frozen
        not     expired: security count
                supported: enhanced erase
        6min for SECURITY ERASE UNIT. 32min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5002538xxxxxxxxx
        NAA             : 5
        IEEE OUI        : 002538
        Unique ID       : xxxxxxxxx
Integrity word not set (found 0x27ef, expected 0x100a5)
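
As a reading aid for the error lines in the report: SAct is the SATA SActive register, a bitmask with one bit per outstanding NCQ tag, and SStatus packs the link state into three 4-bit fields (DET, SPD, IPM), which libata prints in hexadecimal. The Python sketch below decodes both. The helper names `active_ncq_tags` and `decode_sstatus` are made up for illustration and are not part of any kernel or hdparm tooling.

```python
def active_ncq_tags(sact: int) -> list[int]:
    """Return the NCQ tag numbers whose bits are set in an SActive value."""
    return [tag for tag in range(32) if sact & (1 << tag)]


def decode_sstatus(sstatus: int) -> dict[str, int]:
    """Split a SATA SStatus value into its DET, SPD and IPM fields."""
    return {
        "det": sstatus & 0xF,         # 3 = device present, PHY communication established
        "spd": (sstatus >> 4) & 0xF,  # 1 = Gen1 (1.5 Gbps), 2 = Gen2 (3.0 Gbps), 3 = Gen3 (6.0 Gbps)
        "ipm": (sstatus >> 8) & 0xF,  # 1 = interface in active power state
    }


# Values taken from the log in the report: "SAct 0x3" and "SStatus 133".
print(active_ncq_tags(0x3))   # [0, 1]
print(decode_sstatus(0x133))  # {'det': 3, 'spd': 3, 'ipm': 1}
```

Tags 0 and 1 match the two failed READ FPDMA QUEUED commands in the log, and SPD 3 agrees with the "SATA link up 6.0 Gbps" message after the hard reset.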
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.14.4-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
Yes, the issue still persists with kernel-3.14.2-200.fc20.x86_64.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 20 kernel bugs.

Fedora 20 has now been rebased to 3.17.2-200.fc20. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 21, and are still experiencing this issue, please change the version to Fedora 21.

If you experience different issues, please open a new bug report for those.
Confirming this is still happening with kernel-3.17.4-200.fc20.x86_64, but this kernel recovers gracefully by disabling NCQ automatically:

Dec 01 16:54:26 sakura.greysector.net kernel: ata1.00: NCQ disabled due to excessive errors
Dec 01 16:54:26 sakura.greysector.net kernel: ata1.00: exception Emask 0x0 SAct 0x300 SErr 0x0 action 0x6 frozen
Dec 01 16:54:26 sakura.greysector.net kernel: ata1.00: failed command: READ FPDMA QUEUED
Dec 01 16:54:26 sakura.greysector.net kernel: ata1.00: cmd 60/18:40:20:00:16/00:00:00:00:00/40 tag 8 ncq 12288 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 01 16:54:26 sakura.greysector.net kernel: ata1.00: status: { DRDY }
Dec 01 16:54:26 sakura.greysector.net kernel: ata1.00: failed command: READ FPDMA QUEUED
Dec 01 16:54:26 sakura.greysector.net kernel: ata1.00: cmd 60/08:48:10:00:16/00:00:00:00:00/40 tag 9 ncq 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 01 16:54:26 sakura.greysector.net kernel: ata1.00: status: { DRDY }
Dec 01 16:54:26 sakura.greysector.net kernel: ata1: hard resetting link
Dec 01 16:54:27 sakura.greysector.net kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 01 16:54:27 sakura.greysector.net kernel: ata1.00: configured for UDMA/133
Dec 01 16:54:27 sakura.greysector.net kernel: ata1.00: device reported invalid CHS sector 0
Dec 01 16:54:27 sakura.greysector.net kernel: ata1.00: device reported invalid CHS sector 0
Dec 01 16:54:27 sakura.greysector.net kernel: ata1: EH complete
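
For anyone checking their own logs for the same pattern, here is a rough Python sketch that pulls the NCQ-related facts out of kernel log text like the excerpt above. The regular expressions are fitted to the lines quoted in this report and are an assumption, not a stable kernel log format; `summarize` is a hypothetical helper name.

```python
import re

# Patterns fitted to the log lines quoted in this report (an assumption,
# not a guaranteed kernel log format).
SACT_RE = re.compile(r"SAct (0x[0-9a-fA-F]+)")
CMD_RE = re.compile(r"cmd [0-9a-f]{2}/\S+ tag (\d+) ncq (\d+) (in|out)")
NCQ_OFF = "NCQ disabled due to excessive errors"


def summarize(log_text):
    """Collect the NCQ-related facts from a chunk of kernel log text."""
    sact = SACT_RE.search(log_text)
    return {
        "ncq_disabled": NCQ_OFF in log_text,
        "sact": int(sact.group(1), 16) if sact else None,
        # One (tag, byte count, direction) tuple per failed queued command.
        "failed": [(int(t), int(n), d) for t, n, d in CMD_RE.findall(log_text)],
    }


# Lines from the excerpt above, with the timestamp and host prefix trimmed.
SAMPLE = """\
ata1.00: NCQ disabled due to excessive errors
ata1.00: exception Emask 0x0 SAct 0x300 SErr 0x0 action 0x6 frozen
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/18:40:20:00:16/00:00:00:00:00/40 tag 8 ncq 12288 in
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/08:48:10:00:16/00:00:00:00:00/40 tag 9 ncq 4096 in
"""

info = summarize(SAMPLE)
print(info["ncq_disabled"])  # True
print(info["failed"])        # [(8, 12288, 'in'), (9, 4096, 'in')]
# Each failed tag should have its bit set in the SAct bitmask.
print(all((info["sact"] >> tag) & 1 for tag, _, _ in info["failed"]))  # True
```

On this excerpt the two timed-out reads carry tags 8 and 9, which are exactly the bits set in SAct 0x300.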
Looks like the patch from https://bugzilla.kernel.org/show_bug.cgi?id=89171#c8 fixes this issue. Please include it in the Fedora package while it's making its way to the main tree.
FYI this is now part of 3.17-stable queue: https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/commit/?id=211d32be66b621940515b45ddf60865dcda246b8
(In reply to Dominik 'Rathann' Mierzejewski from comment #6)
> FYI this is now part of 3.17-stable queue:
>
> https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/commit/?id=211d32be66b621940515b45ddf60865dcda246b8

Thanks again for the pointer. I've added it to Fedora git today and it will be in the next build of each.
kernel-3.17.7-300.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/kernel-3.17.7-300.fc21
kernel-3.17.7-200.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.17.7-200.fc20
Package kernel-3.17.7-200.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.17.7-200.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-17283/kernel-3.17.7-200.fc20
then log in and leave karma (feedback).
kernel-3.17.7-200.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
kernel-3.17.7-300.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.