Description of problem:
I am seeing SATA disconnect kernel messages like these intermittently (on average a few times per hour, sometimes just a few minutes apart):

Aug 17 10:22:54.157169 cinnamon.djc.id.au kernel: ata2: exception Emask 0x50 SAct 0x0 SErr 0x4090800 action 0xe frozen
Aug 17 10:22:54.918160 cinnamon.djc.id.au kernel: ata2: irq_stat 0x00400040, connection status changed
Aug 17 10:22:54.918180 cinnamon.djc.id.au kernel: ata2: SError: { HostInt PHYRdyChg 10B8B DevExch }
Aug 17 10:22:54.918204 cinnamon.djc.id.au kernel: ata2: hard resetting link
Aug 17 10:22:54.918225 cinnamon.djc.id.au kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 17 10:22:54.918244 cinnamon.djc.id.au kernel: ata2.00: configured for UDMA/100
Aug 17 10:22:54.918261 cinnamon.djc.id.au kernel: ata2: EH complete

There are no other symptoms when the disconnect happens. Even though there are mounted filesystems on the disk, the kernel seems to recover fine.

I did some reading, and there is a lot of info online suggesting that this is not a kernel problem but rather the disk itself dropping the link intermittently due to a loose connection or insufficient power supply. However, in my case I think this is actually a kernel bug, because:

* it happens reliably on 3.16 kernels, but when I boot 3.15.8 I do not see any disconnects (I have run 3.15.8 for several days now)
* I have triple-checked the SATA link and power connections for the disk
* the hardware is all near-new
* it's a Thinkstation E32 in (almost) stock configuration, so there's nothing dodgy involved like old disks, cheap cables, under-specced PSU, or anything like that

Actually there is one non-standard piece of hardware in the system, a Samsung XP941 PCI-express SSD which appears as a SATA disk. However, that's probably not related to this problem (the SSD does not exhibit any SATA disconnects).

Version-Release number of selected component (if applicable):
observed on:
kernel-3.16.1-300.fc21.x86_64
kernel-3.16.0-1.fc21.x86_64
kernel-3.16.0-0.rc7.git4.1.fc21.x86_64
cannot reproduce on:
kernel-3.15.8-200.fc20.x86_64

How reproducible:
On my system, reliably reproducible given a few hours.

Steps to Reproduce:
1. Boot my system and let it run for a while.

Actual results:
SATA disconnect messages appear.

Expected results:
No SATA disconnects.

Additional info:
The motherboard chipset is Intel C226.
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
The disk is a Seagate Barracuda 2TB SATA3, model ST2000DM001.
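For anyone reading the log excerpt in this report, the SErr value and the SStatus value can be decoded by hand. The following is a minimal sketch, not part of the original report: the bit positions are the SError and SStatus register layouts as I understand them from the SATA specification, the short names are the ones libata prints in its "SError: { ... }" line, and the function names are made up for illustration.

```python
# Sketch only: decode the SErr and SStatus values that libata logs.
# Bit positions are taken from the SATA SError/SStatus register layouts;
# the function names here are hypothetical helpers, not a kernel API.

SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


def decode_sstatus(value):
    """Split SStatus into its DET, SPD and IPM fields (4 bits each)."""
    return {
        "det": value & 0xF,         # 3 = device present, PHY communication established
        "spd": (value >> 4) & 0xF,  # 3 = Gen3 (6.0 Gbps)
        "ipm": (value >> 8) & 0xF,  # 1 = interface in active state
    }


if __name__ == "__main__":
    # Values from the log above; the kernel prints both in hexadecimal.
    print(decode_serror(0x4090800))
    print(decode_sstatus(0x133))
```

Decoding SErr 0x4090800 this way gives the same four flags the kernel printed (HostInt, PHYRdyChg, 10B8B, DevExch), which suggests the link itself dropped and was re-established rather than a command failing on an otherwise healthy link.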
Created attachment 927423
kernel messages with 3.16.1

Attaching complete kernel messages from 3.16.1. I only left it running for around an hour and there were 5 SATA disconnects.
Created attachment 927424
kernel messages with 3.15.8

For comparison, also attaching complete kernel messages from 3.15.8, which does not exhibit the SATA disconnects.
Actually I have seen a few disconnects on 3.15.8 now as well, although they seem to be a bit less frequent.
I have exactly the same problem on this hardware:
00:1f.2 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 04)

But it looks like this has no side effects. I have no damaged filesystems.
I discovered something interesting about this problem over the holidays...

I have seen the SATA disconnects with all kernel versions I've tried (3.15-3.17). In an effort to rule out hardware problems I swapped out the disk itself, the SATA cable, I swapped to a different SATA port on the motherboard, and I even ran the disk connected to a separate independent power supply to rule out power supply problems. In all cases the disconnects were still occurring.

So if it was a hardware problem it must be the SATA controller on the motherboard itself. In preparation for making a warranty claim I removed the Samsung XP941 PCI-express SSD which I had installed in the system after-market. But with the SSD removed, the disconnects are no longer occurring! The system has now been running for several weeks in that configuration.

So it seems that the presence of the PCI-express SSD, which appears as a regular SATA device, is somehow causing the *other* SATA controller on the motherboard to intermittently disconnect?

The disk is on ata2 (the onboard SATA controller), the SSD is ata6.
Only ata2 experiences the disconnects, not ata6:

ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata2.00: ATA-9: ST2000DM001-1CH164, CC77, max UDMA/100
ata2.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ata2.00: configured for UDMA/100
scsi 1:0:0:0: Direct-Access ATA ST2000DM001-1CH1 CC77 PQ: 0 ANSI: 5
sd 1:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
sd 1:0:0:0: [sda] 4096-byte physical blocks
sd 1:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: [sda] Write Protect is off
sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata6.00: ATA-9: SAMSUNG MZHPU256HCGL-00004, UXM6501Q, max UDMA/133
ata6.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
ata6.00: configured for UDMA/133
usb 1-1: new high-speed USB device number 2 using ehci-pci
sda: sda1 sda2 sda3
sd 1:0:0:0: [sda] Attached SCSI disk
usb 1-1: New USB device found, idVendor=8087, idProduct=8008
usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
hub 1-1:1.0: USB hub found
hub 1-1:1.0: 6 ports detected
usb 2-1: new high-speed USB device number 2 using ehci-pci
ata4: SATA link down (SStatus 0 SControl 300)
scsi 5:0:0:0: Direct-Access ATA SAMSUNG MZHPU256 501Q PQ: 0 ANSI: 5
sd 5:0:0:0: [sdb] 500118192 512-byte logical blocks: (256 GB/238 GiB)
sd 5:0:0:0: [sdb] Write Protect is off
sd 5:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 5:0:0:0: Attached scsi generic sg1 type 0
sdb: sdb2 sdb3 sdb4
sd 5:0:0:0: [sdb] Attached SCSI disk
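To check which ports are affected across a long log, and to compare disconnect rates between kernels or hardware configurations, the resets can be tallied per port. This is a rough sketch rather than anything from the report itself: it assumes each disconnect produces one "hard resetting link" line like the ones quoted in the description, and the function name and script name are hypothetical.

```python
# Sketch only: count libata "hard resetting link" events per ATA port.
# Assumes one such line per disconnect, as in the excerpt quoted in this bug;
# resets with other causes (e.g. suspend/resume) would be counted as well.
import re
import sys
from collections import Counter

RESET_RE = re.compile(r"\b(ata\d+): hard resetting link\b")


def count_link_resets(lines):
    """Return a Counter mapping port name (e.g. 'ata2') to number of resets."""
    counts = Counter()
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Reads kernel messages from standard input.
    for port, n in sorted(count_link_resets(sys.stdin).items()):
        print(f"{port}: {n}")
```

Saved under a name such as count_resets.py (hypothetical), it can be fed with `journalctl -k | python3 count_resets.py` or with one of the attached log files on standard input.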
(In reply to Daniel Rindt from comment #4)
> I have exactly the same on this hardware:
> 00:1f.2 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller
> [RAID mode] (rev 04)

Daniel, do you also have a Samsung XP941, or some other PCI-express SSD in the system? Do you have multiple SATA controllers? If not, then I guess your problem might be unrelated.
(In reply to Dan Callaghan from comment #6)
> Daniel, do you also have a Samsung XP941, or some other PCI-express SSD in
> the system? Do you have multiple SATA controllers? If not, then I guess your
> problem might be unrelated.

I am not sure if it's unrelated; I just found this bug with the same symptoms. There are 2x SanDisk SD6SP1M128G1102 SSDs connected to the above-mentioned controller.
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There are a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 21 kernel bugs.

Fedora 21 has now been rebased to 3.18.3-201.fc21. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you experience different issues, please open a new bug report for those.
Still occurs with kernel-3.18.7-200.fc21.x86_64.
Any debug options I could try which might give some hints what is going wrong here? Or a debug kernel maybe?
*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 21 kernel bugs.

Fedora 21 has now been rebased to 3.19.5-200.fc21. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 22, and are still experiencing this issue, please change the version to Fedora 22.

If you experience different issues, please open a new bug report for those.
This message is a reminder that Fedora 21 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 21. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '21'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 21 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.