Description of problem:
After a fresh install of FC5 (not an upgrade from FC4), the machine boots up nicely, but a few minutes after logging in and playing with Pirut, everything slows down immensely, and a few seconds later the machine locks up (I can still type, move the mouse, and click things, but nothing responds, either within X11 or in the terminal).

Booting from Knoppix, I checked my logs and found many instances of the following error, each a few minutes apart:

ata1: command 0x35 timeout, stat 0x50 host_stat

A Google search turned up a thread on the LKML about LibPATA code issues in 2.6.15.4, which the poster resolved by finding out that his disk was dying. Fortunately, this does not seem to be the case for me, as this disk (Western Digital Raptor, model WD740GD-41FLC2) works just fine in FC4 after a reinstallation and full update.

This is on an ABIT VT7 motherboard (VIA PT880-8237 chipset with a VT6420 SATA RAID Controller; not using RAID).

Version-Release number of selected component (if applicable):
kernel-2.6.15-1.2054_FC5

How reproducible:
Every time.

Steps to Reproduce:
1. Install FC5.
2. Reboot.
3. Try to log in and do things.

Actual results:
The ata1 errors in my kernel logs as mentioned, and a general system lockup, as outlined above.

Expected results:
Expected FC5 to work well on it, as FC4 does.

Additional info:
I re-installed FC4 and am using that for the time being. The following is the relevant part of my kernel log from FC4:

SCSI subsystem initialized
libata version 1.20 loaded.
sata_via 0000:00:0f.0: version 1.1
ACPI: PCI Interrupt 0000:00:0f.0[B] -> Link [LNKB] -> GSI 11 (level, low) -> IRQ 11
sata_via 0000:00:0f.0: routed to hard irq line 11
ata1: SATA max UDMA/133 cmd 0xB400 ctl 0xB802 bmdma 0xC400 irq 11
ata1: SATA link up 1.5 Gbps (SStatus 113)
ata1: dev 0 cfg 49:2f00 82:74eb 83:7f63 84:4003 85:74e9 86:3c43 87:4003 88:407f
ata1: dev 0 ATA-6, max UDMA/133, 145226112 sectors: LBA48
ata1: dev 0 configured for UDMA/133
scsi0 : sata_via
  Vendor: ATA Model: WDC WD740GD-41FL Rev: 31.0
  Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 145226112 512-byte hdwr sectors (74356 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
sd 0:0:0:0: Attached scsi disk sda
*** Bug 186841 has been marked as a duplicate of this bug. ***
Kernel 2.6.16-1.2069_FC4 has similar troubles on Fedora Core 4. I can't verify that they are exactly the same, as no "ata1: ..." message is left in the logs, but the symptom is the same: a slowdown followed by a hard lock soon after boot.
As an addendum, if I boot with 'acpi=off pci=routeirq', my troubles with SATA appear to vanish (uptime of almost 20 minutes so far with 2.6.16-1.2069_FC4).
Another addendum: if I boot without those options and without 'quiet rhgb', then I get the exact same "ata1: command 0x35 timeout... " error messages on the console.
I'm rather happy to report that booting with 'noapic acpi=off pci=routeirq' appears to be a nice workaround in FC5, both for the installed default kernel and the updated 2.6.16-1.2096_FC5 kernel.
So after a few days of playing around with it, it appears that the 'noapic' option is the proper workaround for this issue. I don't remember the local APIC being enabled by default on uniprocessor x86 builds prior to this (though my Prescott says that it supports APIC). I'm no longer sure whether this is a hardware or software bug, as booting with 'lapic' in FC4 kernel builds seemed to work just fine; or was this option silently ignored in those builds? Thanks.
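For anyone testing the workaround, it helps to confirm which of the options above actually reached the running kernel. Here is a minimal Python sketch, assuming a Linux system where /proc/cmdline is readable; the helper name missing_boot_options is made up for illustration and is not part of any Fedora tool.

```python
# Sketch only: report which of the workaround options discussed in this bug
# are absent from the kernel command line. The helper name is hypothetical.
WORKAROUND_OPTIONS = ("noapic", "acpi=off", "pci=routeirq")


def missing_boot_options(cmdline, wanted=WORKAROUND_OPTIONS):
    """Return the options from `wanted` that do not appear in `cmdline`."""
    present = set(cmdline.split())
    return [opt for opt in wanted if opt not in present]


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the parameters the running kernel was booted with.
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""
    print("missing workaround options:", missing_boot_options(current))
```

An empty list means all three options are in effect; it says nothing about whether they are needed on a given board.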
I'm having this issue as well, although the noapic workaround doesn't seem to be an option for me. If I try that, I can't even boot.

After a random amount of working time (usually half an hour or less), the disks stop responding, and then the UI some time after that. If I'm logged in from another box, I can run dmesg and see the following lines (which never make it to the log files, of course):

ata2: command 0x35 timeout, stat 0x50 host_stat 0x24
ata1: command 0x35 timeout, stat 0x50 host_stat 0x24

I ran Hitachi's disk check software against the hard drives and they both came back clean. I'm also not having any trouble with them when I boot under Windows, so I'm pretty sure the disks are good. I ran memcheck to look for memory problems, but things look good there too.

The problem has occurred with every FC5 SMP kernel through the current one, 2.6.16-1.2111_FC5. The problem does not seem to occur when I boot a non-SMP kernel, but obviously I'd like to use my other processor. I will attach an lspci -v and dmesg dump.
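Since these messages never reach the log files, one way to compare machines is to capture dmesg output from a remote login and count the timeouts per port. Here is a rough Python sketch; the regular expression is built only from the message forms quoted in this bug (other kernel versions may word it differently), and count_timeouts is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modelled on the lines quoted in this report, for example:
#   ata1: command 0x35 timeout, stat 0x50 host_stat 0x24
# The trailing host_stat value is treated as optional.
TIMEOUT_RE = re.compile(
    r"\b(ata\d+): command (0x[0-9a-fA-F]+) timeout, "
    r"stat (0x[0-9a-fA-F]+) host_stat(?: (0x[0-9a-fA-F]+))?"
)


def count_timeouts(lines):
    """Count command-timeout messages per ATA port in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample input copied from the comment above; real use would pass the
    # lines of a saved dmesg dump instead.
    sample = [
        "ata2: command 0x35 timeout, stat 0x50 host_stat 0x24",
        "ata1: command 0x35 timeout, stat 0x50 host_stat 0x24",
    ]
    for port, n in sorted(count_timeouts(sample).items()):
        print(port, n)
```

To run it on a real capture, save the dmesg output to a file and pass the open file object to count_timeouts, which iterates over it line by line.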
Created attachment 129107 dmesg from 2.6.16-1.2111_FC5 kernel
Created attachment 129108 lspci
I'm quite thrilled to report that this appears to be fixed in the updated Rawhide kernels for me. My box has successfully stayed on and responsive for a couple of days so far, without the noapic workaround active. Andrew: Do you have a Rawhide/FC6 install that you could test this on? If so, does the kernel on it work for you? Thanks.
I'll mark this as RESOLVED/RAWHIDE, as the development kernels have yet to give me any more of these errors. (Yay!) Andrew: If you still experience this bug, please feel free to reopen it with more details or clone it as a new bug, etc. as needed. Thanks.