Description of problem:
On my home built linux server I see many disk errors relating to the SATA drive attached along the lines of

ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x2400000 action 0x0
ata2.00: (BMDMA2 stat 0x750001)
ata2.00: cmd 35/00:c8:59:aa:7d/00:01:02:00:00/e0 tag 0 cdb 0x0 data 233472 out
         res 51/04:47:da:ab:7d/00:00:02:00:00/e0 Emask 0x1 (device error)
ata2.00: configured for UDMA/100
ata2: EH complete
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA

The server was FC5 and the problem was severe enough to cause major timeouts and eventually the disk was rendered corrupt. Replaced the SATA disk with a brand new one of the same make and performed a clean install of FC6 and I still see the same errors in the log.

Version-Release number of selected component (if applicable):
Linux server002 2.6.20-1.2962.fc6 #1 SMP Tue Jun 19 18:24:12 EDT 2007 i686 i686 i386 GNU/Linux
HD : WD2500KS (250Gb SATA2 drive with 16Mb cache)
Controller : SYBA SD-SATA150 RT (Based on Silicon Image, Inc. SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01))
Motherboard : PC CHips M789 with VIA C3 CPU
All running at stock speeds/BIOS settings.

How reproducible:
Very, I see the messages in the current clean FC6 install with a brand new SATA drive and worse on a previous FC5 install with a 10 month old SATA drive of the same make and type, this being the reason for the Fedora update and the new drive.

Steps to Reproduce:
1. Start Linux server
2. Copy files to/from mounted partition on the SATA drive
3. Observe exception messages in /var/log/messages

Actual results:
ATA errors in the log. With previous disk got long timeouts attempting to access the disk, especially noticeable when using SMB clients.

Expected results:
Clean access to disk.
Additional info:
I only recall seeing this after upgrading to the 2.6.20 series kernel, on both the original FC5 install and now the FC6 install. Having replaced the hard disk and the Linux version, I don't *think* it's a hardware problem, and Googling the error turns up some chatter related to SATA drives and the newer versions of libata/Linux kernel. Having said that, the timeouts I saw with FC5/original drive don't appear to happen (yet) with FC6/new drive, so these ATA messages may be harmless and I really do have a bad drive. Unfortunately WD's diagnostic tool failed to boot on the server box when I tried to check it out, so I don't know if the original drive is OK.
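For anyone reading the error lines in the description, the cmd/res fields can be decoded by hand. The sketch below is illustrative only. It assumes the libata field layout command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device (with status and error in the first two positions of the res line) and the standard ATA status/error and SATA SError bit assignments. The function and table names are made up for this example.

```python
# Illustrative decoder for the libata error lines quoted in the report.
# All names are hypothetical; only the bit assignments are taken from the
# ATA command set and the SATA SError register definition.

STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}

ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}

# Diagnostic half of SError (bits 16-26), listed low to high.
SERR_BITS = {1 << 16: "PhyRdyChg", 1 << 17: "PhyInt", 1 << 18: "CommWake",
             1 << 19: "10B8B", 1 << 20: "Dispar", 1 << 21: "BadCRC",
             1 << 22: "Handshk", 1 << 23: "LinkSeq", 1 << 24: "TrStaTrns",
             1 << 25: "UnrecFIS", 1 << 26: "DevExch"}


def flags(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def parse_taskfile(text):
    """Split a 'cc/ff:nn:l0:l1:l2/hf:hn:l3:l4:l5/dd' string into fields."""
    first, cur, hob, device = text.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in cur.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in hob.split(":"))
    # 48-bit LBA: the "hob" (high order byte) fields carry the upper 24 bits.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    return {"first": int(first, 16),      # command (cmd) or status (res)
            "feature": feature,           # feature (cmd) or error (res)
            "sectors": hob_nsect << 8 | nsect,
            "lba": lba,
            "device": int(device, 16)}


cmd = parse_taskfile("35/00:c8:59:aa:7d/00:01:02:00:00/e0")
res = parse_taskfile("51/04:47:da:ab:7d/00:00:02:00:00/e0")

print("command 0x%02x, %d sectors at LBA %d"
      % (cmd["first"], cmd["sectors"], cmd["lba"]))
print("status:", flags(res["first"], STATUS_BITS))
print("error: ", flags(res["feature"], ERROR_BITS))
print("SErr:  ", flags(0x2400000, SERR_BITS))
```

Read this way, the logged command is a WRITE DMA EXT (0x35) of 456 sectors (456 x 512 = 233472 bytes, matching "data 233472 out") that the device aborted, with the handshake and unrecognized-FIS bits set in SErr. That is only what the log line says; it does not by itself distinguish a drive, cable, controller, or driver fault.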
Created attachment 158684 [details] dmesg output
Created attachment 158723 [details] A later dmesg log file, this time with timeout/retry message on the drive

I started seeing drive reset messages that make the SATA drive unavailable for periods of time. This came after I noticed that copying backup data from an attached USB HD to the SATA drive had slowed from ~700Mb a minute down to ~70Mb/minute or thereabouts. (A very unscientific measurement: I had a while loop doing df -k . on the SATA drive with a sleep 60 to give a minute's pause.)
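The df -k / sleep 60 loop described above could be scripted so that the rate is computed rather than eyeballed. This is a minimal sketch, assuming that growth in used space on the target filesystem is a fair proxy for copy throughput (nothing else is writing to it) and that "Mb" above means megabytes. The function names and the example mount point are hypothetical.

```python
import shutil
import time


def rate_mb_per_min(used_before, used_after, seconds):
    """Growth in used space, in MB (2**20 bytes) per minute."""
    return (used_after - used_before) / 2**20 * 60.0 / seconds


def sample_rates(path, interval=60.0, samples=5):
    """Poll used space on the filesystem holding `path`, like 'df -k .'.

    Returns one MB/minute figure per interval.
    """
    rates = []
    previous = shutil.disk_usage(path).used
    previous_time = time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        current = shutil.disk_usage(path).used
        now = time.monotonic()
        # Use the measured elapsed time rather than the nominal interval.
        rates.append(rate_mb_per_min(previous, current, now - previous_time))
        previous, previous_time = current, now
    return rates


# Example (hypothetical mount point), run while the copy is in progress:
#   for rate in sample_rates("/mnt/sata", interval=60, samples=10):
#       print("%.0f MB/min" % rate)
```

A sustained drop in these figures that lines up with reset messages in dmesg would support the slowdown described above, though it is still a coarse measurement.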
Created attachment 158735 [details] lspci log file
Created attachment 158736 [details] lspci -vv log file
Created attachment 158737 [details] uname -a output
Bought a Promise Technology, Inc. PDC40775 (SATA 300 TX2plus) (rev 02) to replace the Silicon Image, Inc. SiI 3512 Syba controller (working on the principle that I had a bad controller card) but the SATA errors persisted.
Downloaded a 2.6.16.9 kernel from kernel.org, used the FC6 config-2.6.18-1.2798.fc6 configuration to build it, and installed it as my default kernel. The disk errors have vanished and I can now reliably access either SATA drive locally on the server or from Samba shares. Not sure how viable a workaround this is for most people though.
I'd also tried kernels 2.6.18, 2.6.20 and 2.6.22 and all of these showed the above disk errors.
(In reply to comment #8)
> I'd also tried kernels 2.6.18, 2.6.20 and 2.6.22 and all of these showed the
> above disk errors.

Were the latest FC6 2.6.22 kernels tried, or just the vanilla ones?
I tried an intermediate FC6 2.6.22 from updates-testing during the rebase from 2.6.20, 2.6.22.1-32.fc6, and then 2.6.22.2-42.fc6 from the standard updates area. The 2.6.16.9 is the only vanilla + FC6 config kernel I've tried so far.
Having said all that, I later got a ton of disk errors while copying some data back from a backup onto one of the two WD SATA drives under the 2.6.16.9 kernel, so it looks like I jumped to a conclusion too early :(

I'm planning on hooking these drives up to my Win XP box and using the Windows-based WD diagnostic tools to check out both drives, as it's possible that both were either duff when I got them or went duff while they were in my server. If it turns out that they are bad (what are the odds? Maybe not that high...) I'll RMA them, as they're both under warranty, and start over with new drives. Maybe I can then see whether it's a kernel driver issue between 2.6.16 and 2.6.{18,20,22} or it was bogus hardware all along.
(This is a mass-update to all current FC6 kernel bugs in NEW state)

Hello,

I'm reviewing this bug list as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug, however this version of Fedora is no longer maintained.

Please attempt to reproduce this bug with a current version of Fedora (presently Fedora 8). If the bug no longer exists, please close the bug or I'll do so in a few days if there is no further information lodged.

Thanks for using Fedora!
Per the previous comment in this bug, I am closing it as INSUFFICIENT_DATA, since no information has been lodged for over 30 days. Please re-open this bug or file a new one if you can provide the requested data, and thanks for filing the original report!