Bug 462425

Summary: Kernel 2.6.26.3-29.fc9.x86_64 drive goes offline
Product: [Fedora] Fedora
Reporter: Brian Rademacher <rad>
Component: kernel
Assignee: Jeff Garzik <jgarzik>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: urgent
Priority: medium
Version: 9
CC: brian.mosher, dave, emcnabb, erwan, fdor6, fujisan43, gijsbert.wiesenekker, herrold, jpiszcz, kernel-maint, mathguthrie, peterm, qr7atgwu, rad, rainer.traut, redhat, scott
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-07-10 00:30:51 UTC
Attachments:
/var/log/messages (flags: none)
Patch for 2.6.28: sata_mv: remove update races from hc_irq_cause register (flags: none)
sata_mv: Fix timeouts on Marvell 6081 ports 0..3. (flags: none)

Description Brian Rademacher 2008-09-16 06:25:55 UTC
I recently upgraded from kernel 2.6.25.14-108.fc9.x86_64 to 2.6.26.3-29.fc9.x86_64 and ended up with 323 MB of error logs (see the messages.zip attachment) related to at least one of my drives going offline.  Prior to the kernel update, I never had a problem.  This occurred during very heavy disk activity: I was running an rdiff backup and transferring a large file locally over gigabit Ethernet via Samba, on top of all the normal server activity (httpd, sendmail, spamassassin, etc.).

The drives are four Seagate 320 GB 7200.10 (ST3320620AS) drives in a software RAID-5 array with ext3.

The backup was going to a single Samsung Spinpoint F1 (HD103UJ) 1 TB drive with ext4.

mcelog is clean (and always has been), and the server has been stable through various Fedora releases...

The controller for all drives is:

02:03.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Unknown device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at fd000000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at e000 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=02:03.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
        Kernel driver in use: sata_mv
        Kernel modules: sata_mv



Extra things I have in rc.local:

/sbin/hdparm -B1 /dev/sde
echo 128 > /sys/block/sda/queue/max_sectors_kb
echo 128 > /sys/block/sdb/queue/max_sectors_kb
echo 128 > /sys/block/sdc/queue/max_sectors_kb
echo 128 > /sys/block/sdd/queue/max_sectors_kb
echo 16384 > /sys/block/md4/md/stripe_cache_size
blockdev --setra 4096 /dev/sda
blockdev --setra 4096 /dev/sdb
blockdev --setra 4096 /dev/sdc
blockdev --setra 4096 /dev/sdd
blockdev --setra 4096 /dev/sde
blockdev --setra 32768 /dev/md4
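For reference, here is a small sketch of what those rc.local settings amount to in bytes. It assumes 4 KiB pages (the x86_64 default), that blockdev --setra counts 512-byte sectors, and the commonly cited rule of thumb that an md RAID-5 stripe cache pins page_size * nr_disks * stripe_cache_size bytes. The 4- and 5-disk cases correspond to the array sizes mentioned in this report.

```python
# Sketch: convert the rc.local tunables into byte counts.
# Assumptions: 4 KiB pages (x86_64 default); stripe-cache memory estimated
# with the rule of thumb page_size * nr_disks * stripe_cache_size.
PAGE_SIZE = 4096  # bytes per page on x86_64
SECTOR = 512      # blockdev --setra counts 512-byte sectors
MiB = 1024 * 1024


def readahead_bytes(setra_sectors: int) -> int:
    """Bytes of readahead requested by `blockdev --setra N`."""
    return setra_sectors * SECTOR


def stripe_cache_bytes(stripe_cache_size: int, nr_disks: int,
                       page_size: int = PAGE_SIZE) -> int:
    """Approximate memory pinned by an md RAID-5 stripe cache."""
    return page_size * nr_disks * stripe_cache_size


if __name__ == "__main__":
    print(f"per-disk readahead (--setra 4096): {readahead_bytes(4096) // MiB} MiB")
    print(f"md4 readahead (--setra 32768):     {readahead_bytes(32768) // MiB} MiB")
    for disks in (4, 5):  # the array grew from 4 to 5 members in this report
        mib = stripe_cache_bytes(16384, disks) // MiB
        print(f"stripe_cache_size=16384, {disks} disks: {mib} MiB")
```

With these settings that works out to 2 MiB of readahead per disk, 16 MiB on md4, and 256 MiB (four disks) or 320 MiB (five disks) of stripe cache.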




Here is the section of the message log where things start to go bad and the first trace occurs:

Sep 15 12:22:20 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 15 12:22:20 radfiles kernel: ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Sep 15 12:22:20 radfiles kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 15 12:22:20 radfiles kernel: ata1.00: status: { DRDY }
Sep 15 12:22:20 radfiles kernel: ata1: hard resetting link
Sep 15 12:22:20 radfiles kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 15 12:22:21 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 15 12:22:21 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 15 12:22:21 radfiles kernel: ata1.00: configured for UDMA/133
Sep 15 12:22:21 radfiles kernel: ata1: EH complete
Sep 15 12:22:21 radfiles kernel: sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
Sep 15 12:22:21 radfiles kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 15 12:22:21 radfiles kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 15 12:24:36 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 15 12:24:36 radfiles kernel: ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Sep 15 12:24:36 radfiles kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 15 12:24:36 radfiles kernel: ata1.00: status: { DRDY }
Sep 15 12:24:36 radfiles kernel: ata1: hard resetting link
Sep 15 12:24:37 radfiles kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 15 12:24:37 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 15 12:24:37 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 15 12:24:37 radfiles kernel: ata1.00: configured for UDMA/133
Sep 15 12:24:37 radfiles kernel: ata1: EH complete
Sep 15 12:24:37 radfiles kernel: sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
Sep 15 12:24:37 radfiles kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 15 12:24:37 radfiles kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 15 12:25:38 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 15 12:25:38 radfiles kernel: ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Sep 15 12:25:38 radfiles kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 15 12:25:38 radfiles kernel: ata1.00: status: { DRDY }
Sep 15 12:25:38 radfiles kernel: ata1: hard resetting link
Sep 15 12:25:38 radfiles kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 15 12:25:39 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 15 12:25:39 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 15 12:25:39 radfiles kernel: ata1.00: configured for UDMA/133
Sep 15 12:25:39 radfiles kernel: ata1: EH complete
Sep 15 12:25:39 radfiles kernel: sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
Sep 15 12:25:39 radfiles kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 15 12:25:39 radfiles kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 15 12:26:39 radfiles kernel: ata1.00: NCQ disabled due to excessive errors
Sep 15 12:26:39 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 15 12:26:39 radfiles kernel: ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Sep 15 12:26:39 radfiles kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 15 12:26:39 radfiles kernel: ata1.00: status: { DRDY }
Sep 15 12:26:39 radfiles kernel: ata1: hard resetting link
Sep 15 12:26:40 radfiles kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 15 12:26:40 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 15 12:26:40 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 15 12:26:40 radfiles kernel: ata1.00: configured for UDMA/133
Sep 15 12:26:40 radfiles kernel: ata1: EH complete
Sep 15 12:26:40 radfiles kernel: sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
Sep 15 12:26:40 radfiles kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 15 12:26:40 radfiles kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 15 12:26:40 radfiles kernel: ------------[ cut here ]------------
Sep 15 12:26:40 radfiles kernel: WARNING: at drivers/ata/libata-core.c:4752 ata_qc_issue+0x41/0x2aa [libata]() (Not tainted)
Sep 15 12:26:40 radfiles kernel: Modules linked in: ext4dev jbd2 crc16 dm_mirror dm_log dm_mod sr_mod cdrom pata_acpi floppy pcspkr sg tg3 k8temp hwmon pata_ali ata_generic raid1 shpchp sata_mv libata sd_mod scsi_mod raid456 async_xor async_memcpy async_tx xor ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
Sep 15 12:26:40 radfiles kernel: Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.26.3-29.fc9.x86_64 #1
Sep 15 12:26:40 radfiles kernel:
Sep 15 12:26:40 radfiles kernel: Call Trace:
Sep 15 12:26:40 radfiles kernel: <IRQ>  [<ffffffff81036db7>] warn_on_slowpath+0x60/0xa3
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008c0e2>] ? :scsi_mod:scsi_sg_alloc+0x43/0x45
Sep 15 12:26:40 radfiles kernel: [<ffffffff81140246>] ? __sg_alloc_table+0x6d/0xf1
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008c051>] ? :scsi_mod:scsi_init_sgtable+0x96/0x9f
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008c2b6>] ? :scsi_mod:scsi_init_io+0x22/0xcc
Sep 15 12:26:40 radfiles kernel: [<ffffffffa00b73e8>] ? :libata:ata_build_rw_tf+0x19d/0x250
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008c3e9>] ? :scsi_mod:scsi_setup_fs_cmnd+0x89/0x91
Sep 15 12:26:40 radfiles kernel: [<ffffffffa00b92d0>] :libata:ata_qc_issue+0x41/0x2aa
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008668b>] ? :scsi_mod:scsi_done+0x0/0x21
Sep 15 12:26:40 radfiles kernel: [<ffffffffa00be504>] :libata:ata_scsi_translate+0x11f/0x155
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008668b>] ? :scsi_mod:scsi_done+0x0/0x21
Sep 15 12:26:40 radfiles kernel: [<ffffffffa00c05f3>] :libata:ata_scsi_queuecmd+0x17d/0x1cd
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008699b>] :scsi_mod:scsi_dispatch_cmd+0x1cd/0x259
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008bcc3>] :scsi_mod:scsi_request_fn+0x320/0x3ff
Sep 15 12:26:40 radfiles kernel: [<ffffffff811298ef>] __blk_run_queue+0x7d/0xf6
Sep 15 12:26:40 radfiles kernel: [<ffffffff81129989>] blk_run_queue+0x21/0x35
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008b367>] :scsi_mod:scsi_run_queue+0x279/0x2ac
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008bfab>] :scsi_mod:scsi_next_command+0x36/0x46
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008c161>] :scsi_mod:scsi_end_request+0x7d/0x90
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008c721>] :scsi_mod:scsi_io_completion+0x1b3/0x3b0
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008663b>] :scsi_mod:scsi_finish_command+0xce/0xd7
Sep 15 12:26:40 radfiles kernel: [<ffffffffa008cb8f>] :scsi_mod:scsi_softirq_done+0xe4/0xed
Sep 15 12:26:40 radfiles kernel: [<ffffffff81128005>] blk_done_softirq+0x77/0x87
Sep 15 12:26:40 radfiles kernel: [<ffffffff8103bfaa>] __do_softirq+0x6d/0xe1
Sep 15 12:26:40 radfiles kernel: [<ffffffff8100d4ec>] call_softirq+0x1c/0x28
Sep 15 12:26:40 radfiles kernel: <EOI>  [<ffffffff8100e770>] do_softirq+0x44/0x8b
Sep 15 12:26:40 radfiles kernel: [<ffffffff8103bbdf>] ksoftirqd+0x58/0xcf
Sep 15 12:26:40 radfiles kernel: [<ffffffff8103bb87>] ? ksoftirqd+0x0/0xcf
Sep 15 12:26:40 radfiles kernel: [<ffffffff81049baf>] kthread+0x49/0x76
Sep 15 12:26:40 radfiles kernel: [<ffffffff8100d148>] child_rip+0xa/0x12
Sep 15 12:26:40 radfiles kernel: [<ffffffff81049b66>] ? kthread+0x0/0x76
Sep 15 12:26:40 radfiles kernel: [<ffffffff8100d13e>] ? child_rip+0x0/0x12
Sep 15 12:26:40 radfiles kernel:
Sep 15 12:26:40 radfiles kernel: ---[ end trace e87afce5152dfd41 ]---
Sep 15 12:26:40 radfiles kernel: ------------[ cut here ]------------
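The "cmd" line in these error-handler dumps is the ATA taskfile of the command that timed out. The sketch below decodes it, assuming the register order used by libata's error-handler report (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the ATA convention that FPDMA QUEUED (NCQ) commands carry the sector count in the feature registers and the NCQ tag in bits 7:3 of the sector-count register. The opcode table is deliberately partial and only covers commands that appear in this report.

```python
import re

# Partial ATA opcode table: only commands that appear in this report.
OPCODES = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xB0: "SMART",
    0xEC: "IDENTIFY DEVICE",
}

# "cmd " followed by 12 hex bytes separated by '/' or ':'
TASKFILE_RE = re.compile(r"cmd ([0-9a-fA-F]{2}(?:[/:][0-9a-fA-F]{2}){11})")


def decode_cmd(line: str) -> dict:
    """Decode the 'cmd' taskfile from a libata error-handler message.

    Assumed field order:
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    m = TASKFILE_RE.search(line)
    if m is None:
        raise ValueError("no 'cmd' taskfile in line")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     _device) = (int(b, 16) for b in re.split(r"[/:]", m.group(1)))
    info = {"command": OPCODES.get(command, f"0x{command:02x}")}
    if command in (0x60, 0x61):
        # NCQ read/write: 48-bit LBA, sector count in the feature registers
        # (0 means 65536), NCQ tag in bits 7:3 of the sector-count register.
        sectors = (hob_feature << 8 | feature) or 65536
        info.update(
            lba=(hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
                 | lbah << 16 | lbam << 8 | lbal),
            sectors=sectors,
            bytes=sectors * 512,
            tag=nsect >> 3,
        )
    return info


if __name__ == "__main__":
    print(decode_cmd("ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out"))
```

Run against the Sep 15 lines, this reports a WRITE FPDMA QUEUED of 8 sectors (4096 bytes, consistent with "ncq 4096 out") at LBA 625137160, roughly 5,300 sectors before the end of the 625142448-sector drive; every retry in that excerpt repeats the identical command.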




I don't think this is a bad-drive issue, but I can provide smartctl output if it helps.  I'm not sure whether this is a libata or sata_mv issue, or something else.  I'm still running the same kernel for more testing, since it didn't seem to cause any data corruption that I'm aware of, but the system did hang for quite a while...

Comment 1 Brian Rademacher 2008-09-16 06:29:29 UTC
Created attachment 316813
/var/log/messages

Comment 2 Brian Rademacher 2008-09-16 19:10:59 UTC
It crashed again during my scheduled rdiff backup, this time without the extra disk I/O from before (and without all of the nasty trace info, since I caught it right away).  I went back to kernel 2.6.25.14-108.fc9.x86_64 and completed the same rdiff backup with no problem.  Here is the dmesg output this time:

Sep 16 12:37:43 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 16 12:37:43 radfiles kernel: ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Sep 16 12:37:43 radfiles kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 16 12:37:43 radfiles kernel: ata1.00: status: { DRDY }
Sep 16 12:37:43 radfiles kernel: ata1: hard resetting link
Sep 16 12:37:43 radfiles kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 16 12:37:43 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 16 12:37:43 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 16 12:37:43 radfiles kernel: ata1.00: configured for UDMA/133
Sep 16 12:37:43 radfiles kernel: ata1: EH complete
Sep 16 12:37:43 radfiles kernel: sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
Sep 16 12:37:43 radfiles kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 16 12:37:43 radfiles kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 16 12:39:15 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 16 12:39:15 radfiles kernel: ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Sep 16 12:39:15 radfiles kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 16 12:39:15 radfiles kernel: ata1.00: status: { DRDY }
Sep 16 12:39:15 radfiles kernel: ata1: hard resetting link
Sep 16 12:39:15 radfiles kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 16 12:39:16 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 16 12:39:16 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 16 12:39:16 radfiles kernel: ata1.00: configured for UDMA/133
Sep 16 12:39:16 radfiles kernel: ata1: EH complete
Sep 16 12:39:16 radfiles kernel: sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
Sep 16 12:39:16 radfiles kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 16 12:39:16 radfiles kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
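Each hard reset ends with "SATA link up 3.0 Gbps (SStatus 123 SControl 300)". Those values are hexadecimal dumps of the SATA SStatus and SControl registers. The sketch below splits them into the DET, SPD and IPM fields defined by the SATA specification (bits 3:0, 7:4 and 11:8); the lookup tables only include the values relevant here.

```python
import re

# Field meanings per the SATA spec (partial tables).
SSTATUS_DET = {0x0: "no device", 0x1: "device present, no PHY communication",
               0x3: "device present, PHY communication established",
               0x4: "PHY offline"}
SSTATUS_SPD = {0x0: "not negotiated", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps"}
SSTATUS_IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}
SCONTROL_IPM = {0x0: "no restriction", 0x1: "partial disabled",
                0x2: "slumber disabled", 0x3: "partial and slumber disabled"}


def fields(reg: int) -> tuple[int, int, int]:
    """Split a SATA status/control register into (DET, SPD, IPM) nibbles."""
    return reg & 0xF, (reg >> 4) & 0xF, (reg >> 8) & 0xF


def decode_link_up(line: str) -> dict:
    """Decode the hex SStatus/SControl pair printed by libata on link up."""
    m = re.search(r"SStatus ([0-9A-Fa-f]+) SControl ([0-9A-Fa-f]+)", line)
    if m is None:
        raise ValueError("no SStatus/SControl pair found")
    sstatus, scontrol = (int(g, 16) for g in m.groups())
    det, spd, ipm = fields(sstatus)
    _, spd_limit, ipm_ctl = fields(scontrol)
    return {
        "det": SSTATUS_DET.get(det, hex(det)),
        "speed": SSTATUS_SPD.get(spd, hex(spd)),
        "power": SSTATUS_IPM.get(ipm, hex(ipm)),
        "speed_limit": "none" if spd_limit == 0 else f"Gen{spd_limit}",
        "power_mgmt": SCONTROL_IPM.get(ipm_ctl, hex(ipm_ctl)),
    }


if __name__ == "__main__":
    print(decode_link_up("ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)"))
```

For that line it reports a device with PHY communication established at 3.0 Gbps in the active power state, with no speed limit and both partial and slumber power states disabled by SControl; in other words, the link itself comes back at full speed after every reset.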




Also, just for reference, kernel 2.6.25.14-108.fc9.x86_64 has NCQ enabled for sata_mv; that support is relatively new, but it works under that kernel:
ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata3.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata4.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)
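To compare NCQ state and capacity across ports from boot logs, probe lines like these can be parsed as sketched below. Reading "depth 31/32" as (depth libata will use)/(depth the drive reports) is an assumption here and is only carried through as labels.

```python
import re

# e.g. "ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)"
IDENTIFY_RE = re.compile(r"(ata\d+\.\d+): (\d+) sectors,.*NCQ \(depth (\d+)/(\d+)\)")


def parse_identify(line: str):
    """Return capacity and NCQ depths from a libata device-probe line, or None."""
    m = IDENTIFY_RE.search(line)
    if m is None:
        return None
    dev, sectors, used, reported = m.groups()
    return {
        "device": dev,
        "bytes": int(sectors) * 512,   # 512-byte logical sectors
        "ncq_depth": int(used),        # assumed: depth libata will use
        "drive_depth": int(reported),  # assumed: depth the drive reports
    }


if __name__ == "__main__":
    for line in ("ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)",
                 "ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)"):
        info = parse_identify(line)
        print(info["device"], f"{info['bytes'] / 1e9:.2f} GB",
              f"NCQ depth {info['ncq_depth']}")
```

For the two line types above this prints 320.07 GB and 1000.20 GB, each with NCQ depth 31.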

Comment 3 Brian Rademacher 2008-09-20 22:27:49 UTC
I can get the same error under Kernel 2.6.25.14-108.fc9.x86_64 now that I added a 5th drive to the RAID array, but it only shows up either during a RAID resync or about once every few hours.  With only 4 drives, it only showed under a resync and never under regular operation.

With kernel 2.6.26.3-29.fc9.x86_64 I'm lucky if I can boot... it resets the bus every few minutes on average.  With only 4 drives I was OK until heavy I/O (like the backup mentioned in the bug), but with 5 it's unusable.
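To put a number on "resets the bus every few minutes", the gaps between "hard resetting link" messages can be measured from /var/log/messages. A sketch, assuming classic syslog timestamps with English month names; since those timestamps carry no year, one is assumed (the default 2008 is only a placeholder).

```python
from datetime import datetime


def reset_intervals(lines, year=2008):
    """Seconds between successive 'hard resetting link' messages in syslog lines.

    Classic syslog timestamps ("Sep 15 12:22:20") carry no year, so one is
    assumed (placeholder default: 2008) to give strptime a complete date.
    """
    stamps = [
        datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        for line in lines
        if "hard resetting link" in line
    ]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [  # reset lines from the Sep 15 excerpt in this report
        "Sep 15 12:22:20 radfiles kernel: ata1: hard resetting link",
        "Sep 15 12:24:36 radfiles kernel: ata1: hard resetting link",
        "Sep 15 12:25:38 radfiles kernel: ata1: hard resetting link",
        "Sep 15 12:26:39 radfiles kernel: ata1: hard resetting link",
    ]
    print(reset_intervals(sample))
```

For the Sep 15 excerpt that gives gaps of 136, 62 and 61 seconds; pointed at a full log (opened as a file and passed in as the iterable of lines), it shows whether resets really arrive every few minutes.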

Comment 4 Marcos Martins da Silva 2008-09-23 00:11:49 UTC
Hi Brian, I have about 10 computers with very different hardware running Fedora 9 (i386 and x86_64), all with the latest kernel. All are OK except one: a computer on which I use ext4. With version 2.6.25.14-108 it boots cleanly, but with 2.6.26.3-29 my "/" partition is invisible. Searching the web, I have seen that 2.6.26 requires a patch to use ext4 partitions:

"2008-08-20: The 2.6.26-ext4-7 patchset has been released. People who are using ext4 wih 2.6.26 should really take this patch.

2008-07-15: Delayed allocation has been merged into Linus's ext4 git tree! We have started maintaining patches against the latest 2.6 mainline kernel for make it easier for people to try out ext4. " (http://ext4.wiki.kernel.org/index.php/Main_Page)

In your first message we can read:
"Sep 15 12:26:40 radfiles kernel: Modules linked in: ext4dev ", and you said you have four ext3 drives that work OK and a fifth, with an ext4 partition, that is bad. I think that was your initial problem.
For me, all I have to do is use 2.6.25 until the Fedora team releases another 2.6.26 kernel. For you, perhaps you can try the patch I cited above.

Comment 5 Brian Rademacher 2008-09-23 01:32:22 UTC
The only thing I use ext4 for is a terabyte backup drive, so it is only mounted during the backup process and unmounted otherwise.  The failure occurs at any time during heavy I/O (ext4 aside), which is why I was seeing it during backups.

I don't think that ext4dev should be interacting with anything when the drive isn't even mounted, but for now I have removed the module just to see if anything changes.  I'll skip tomorrow's backup and see how it goes.  If it works, I'll then try the patchset.

Thanks for the idea!  I'll try anything at this point...

Comment 6 Brian Rademacher 2008-09-23 11:48:42 UTC
It didn't work...

Comment 7 Brian Rademacher 2008-09-27 18:06:44 UTC
A few more updates:

- smartctl reported that /dev/hda is fine, through 2 long tests and 3 short.

- Disabling smartd didn't help.

- Disabling NCQ didn't help; it just changed the error from NCQ to DMA.

- Manually failing sda (and later sde) and going back to 4 drives (much less I/O) worked fine, which also suggests that sda likely isn't the problem.
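The comment above does not say how NCQ was disabled. One common way with libata is to lower a device's SCSI queue depth to 1 through sysfs (/sys/block/<disk>/device/queue_depth), which libata treats as "NCQ off". The sketch below only plans those writes and prints them as shell commands; nothing is written unless apply_writes() is called (as root). The disk list is taken from the rc.local above and is otherwise an assumption.

```python
from pathlib import Path


def queue_depth_writes(disks, depth=1, sysfs="/sys/block"):
    """Plan (path, text) writes that set each disk's SCSI queue_depth.

    With libata, a queue_depth of 1 turns NCQ off for that device.
    """
    return [(Path(sysfs) / disk / "device" / "queue_depth", f"{depth}\n")
            for disk in disks]


def apply_writes(writes):
    """Perform the planned writes (needs root on a real system)."""
    for path, text in writes:
        path.write_text(text)


if __name__ == "__main__":
    # Dry run: print what would be written for the disks in rc.local.
    for path, text in queue_depth_writes(["sda", "sdb", "sdc", "sdd", "sde"]):
        print(f"echo {text.strip()} > {path}")
```

Writing a larger value back (for example 31) is how NCQ would be re-enabled under the same assumption.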

Comment 8 Alan Cox 2008-10-02 14:50:36 UTC
SATA, so reassigning to Jeff. This looks like another case of the bug Mark Lord fixed, which I think is queued for .27.

Comment 9 Brian Rademacher 2008-10-02 16:37:49 UTC
Any idea where that bug/patch might be?  I'm getting about 6 or so of these lockups a day, so I wouldn't mind trying to push my own fix a little early...

Comment 10 Justin Piszcz 2008-10-02 23:02:53 UTC
I look forward to this patch as well; do you have a link to it? I also use the Intel e1000e driver, so I'd prefer the standalone patch over moving to 2.6.27.

[420781.333179] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[420781.333189] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[420781.333190]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[420781.333194] ata6.00: status: { DRDY }
[420781.333200] ata6: hard resetting link
[420781.638589] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[420781.662166] ata6.00: configured for UDMA/133
[420781.662166] ata6: EH complete
[420781.662989] sd 5:0:0:0: [sdf] 586072368 512-byte hardware sectors (300069 MB)
[420781.669416] sd 5:0:0:0: [sdf] Write Protect is off
[420781.669416] sd 5:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[420781.669416] sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[469680.004637] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[469680.004648] ata2.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[469680.004649]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[469680.004654] ata2.00: status: { DRDY }
[469680.004660] ata2: hard resetting link
[469680.309567] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[469680.333461] ata2.00: configured for UDMA/133
[469680.333477] ata2: EH complete
[469680.333461] sd 1:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
[469680.340461] sd 1:0:0:0: [sdb] Write Protect is off
[469680.340461] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[469680.345461] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Comment 11 Brian Rademacher 2008-10-07 02:11:25 UTC
(In reply to comment #8)
> SATA so reassigning to Jeff. Looks like another case of the bug Mark Lord fixed
> which I think is queued for .27

Alan, in the meantime, is there something I can change or disable to restore stability to my server (kernel options, libata options in modprobe.conf, etc.)?  I'd be willing to take a huge performance hit for stability...

Comment 12 Chuck Ebbert 2008-10-10 06:57:36 UTC
I can't find that fix either.

Comment 13 Brian Rademacher 2008-10-22 06:52:03 UTC
I manually failed one of my RAID-5 drives, which has brought back stability to the system.  Other than the performance hit and living on the edge of catastrophe if another HD fails, it's working.  Certainly not a fix, but for now better than the constant freezing...

Comment 14 Brian Rademacher 2008-10-29 08:24:30 UTC
I found a workaround (that works for me, at least): disabling the drive write cache on all RAID member drives with hdparm -W0 seems to work.  Maybe this is a clue for diagnosing as well.  I didn't mention it above, but I have my RAID mounted with data=writeback, in case that could be having an effect.

This may be all for naught if it's truly fixed in .27 anyway.  I'll be looking forward to the F9 .27 kernel update if/when it comes...
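The hdparm -W0 workaround has to be applied to every RAID member. A minimal sketch, assuming the array is md4 (from the rc.local above) and that members appear in /proc/mdstat as partition names such as sda1[0]: it derives the parent disks and builds one hdparm -W0 call per disk, only printing them unless dry_run=False. The sample mdstat text is hypothetical.

```python
import re
import subprocess


def md_member_disks(mdstat: str, array: str) -> list[str]:
    """Parent disks (e.g. 'sda' for member 'sda1[0]') of `array` in /proc/mdstat text."""
    for line in mdstat.splitlines():
        if line.split(" :", 1)[0] == array:
            return sorted(set(re.findall(r"\b([a-z]+)\d*\[\d+\]", line)))
    raise KeyError(array)


def write_cache_off_cmds(disks):
    """hdparm invocations that disable each drive's write cache."""
    return [["hdparm", "-W0", f"/dev/{disk}"] for disk in disks]


def run(cmds, dry_run=True):
    """Print the commands, or execute them (as root) when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical /proc/mdstat excerpt; on a real box read /proc/mdstat instead.
    sample = ("Personalities : [raid6] [raid5] [raid4]\n"
              "md4 : active raid5 sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]\n")
    run(write_cache_off_cmds(md_member_disks(sample, "md4")))
```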

Comment 15 Chuck Ebbert 2008-11-02 00:50:42 UTC
(In reply to comment #14)
> 
> This may be all for not if it's truly fixed in .27 anyway.  I'll be looking
> forward to the F9 .27 kernel update if/when it comes...

https://admin.fedoraproject.org/updates/kernel-2.6.27.4-19.fc9

Comment 16 Brian Rademacher 2008-11-02 01:44:48 UTC
Woo hoo!  I shall test when it hits the testing repo...

Comment 17 Justin Piszcz 2008-11-02 08:03:56 UTC
With 2.6.27.4 (Vanilla) the problem still occurs.

Justin.

Comment 18 Justin Piszcz 2008-11-02 08:08:44 UTC
[198231.048036] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[198231.048045] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[198231.048046]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[198231.048050] ata5.00: status: { DRDY }
[198231.048054] ata5: hard resetting link
[198231.353033] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[198231.377941] ata5.00: configured for UDMA/133
[198231.377954] ata5: EH complete
[198231.378140] sd 4:0:0:0: [sde] 586072368 512-byte hardware sectors (300069 MB)
[198231.385337] sd 4:0:0:0: [sde] Write Protect is off
[198231.385344] sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
[198231.385383] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

$ uname -a
Linux box 2.6.27.4 #1 SMP Sun Oct 26 04:46:17 EDT 2008 x86_64 GNU/Linux

Comment 19 Brian Rademacher 2008-11-03 05:54:09 UTC
Justin, did you test disabling write caching on the drives themselves to see what happens?  I have been running that way since I posted that workaround with no trouble under 2.6.26.6-79.fc9.x86_64.

I'm just wondering if we are experiencing the same problem with the same workaround.  That may help with future debugging of this issue...

Comment 20 Justin Piszcz 2008-11-03 15:19:40 UTC
I have just turned off the cache on all of the drives now and will see if this problem recurs.

Justin.

Comment 21 Justin Piszcz 2008-11-03 15:20:48 UTC
I used hdparm -W0 /dev/sda etc. to turn it off; is that the method you used (in case the difference matters)?

Comment 22 Brian Rademacher 2008-11-03 18:43:43 UTC
That's exactly what I did...

Comment 23 Justin Piszcz 2008-11-04 23:09:27 UTC
I am still trying to reproduce it with the cache off, so far, I have not had any luck.

Comment 24 Chuck Ebbert 2008-11-05 20:52:43 UTC
Can you test 2.6.27.4:

https://admin.fedoraproject.org/updates/kernel-2.6.27.4-24.fc9

Comment 25 Justin Piszcz 2008-11-05 21:42:51 UTC
Brian, I believe that was directed at you. BTW, so far you're correct: turning the cache off seems to fix the problem, but whose problem is it? The kernel's? Western Digital's? Intel's/the chipset's?

Comment 26 Brian Rademacher 2008-11-05 23:47:01 UTC
Is there an RPM for 2.6.27.4 somewhere yet (and the dependencies)?  It's much easier to test that way.  I haven't seen it hit the testing repo yet...

I think we're on to something with this write caching thing.  Mine is still stable, and I'm running 5 Seagate 7200.10 drives, so it's different from your WD setup...

As I recall, my chipset/hardware is quite a bit different as well:

00:02.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge
00:03.0 ISA bridge: ALi Corporation M1563 HyperTransport South Bridge (rev 20)
00:03.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:0e.0 IDE interface: ALi Corporation M5229 IDE (rev c5)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
02:03.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
03:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
03:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)

Comment 27 Justin Piszcz 2008-11-06 09:06:40 UTC
Happened again, this time, with cache OFF:

Nov  6 01:20:07 p34 kernel: [639232.946183] ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov  6 01:20:07 p34 kernel: [639232.946193] ata13.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Nov  6 01:20:07 p34 kernel: [639232.946195]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov  6 01:20:07 p34 kernel: [639232.946200] ata13.00: status: { DRDY }
Nov  6 01:20:07 p34 kernel: [639232.946206] ata13: hard resetting link
Nov  6 01:20:08 p34 kernel: [639233.403168] ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov  6 01:20:08 p34 kernel: [639233.440207] ata13.00: configured for UDMA/133
Nov  6 01:20:08 p34 kernel: [639233.449851] sd 12:0:0:0: [sdi] Write Protect is off
Nov  6 01:20:08 p34 kernel: [639233.449858] sd 12:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Nov  6 01:20:08 p34 kernel: [639233.476367] sd 12:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
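After every reset, sd re-reads the drive's caching settings and logs them, as in the "Write cache: disabled" line above. A small sketch for confirming from the logs that hdparm -W0 is still in effect on each disk; it expects unwrapped log lines and keeps the most recent state seen per disk.

```python
import re

WCE_RE = re.compile(r"\[(sd[a-z]+)\] Write cache: (enabled|disabled)")


def write_cache_state(lines):
    """Map disk -> True/False from the most recent sd 'Write cache:' message."""
    state = {}
    for line in lines:
        m = WCE_RE.search(line)
        if m:
            state[m.group(1)] = m.group(2) == "enabled"
    return state


if __name__ == "__main__":
    sample = [
        "sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA",
        "sd 12:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA",
    ]
    for disk, on in sorted(write_cache_state(sample).items()):
        print(disk, "write cache ON" if on else "write cache off")
```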

Comment 28 Brian Rademacher 2008-11-07 01:17:30 UTC
Well mine didn't take long!  Two freezes right on boot with 2.6.27.4-19.fc9.x86_64 #1 SMP Thu Oct 30 19:30:01 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux...


ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: configured for UDMA/133
ata1: EH complete

sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: configured for UDMA/133
ata1: EH complete


I turned off write caching, which I assume will work based on my previous experience...

Comment 29 q 2008-11-09 08:21:44 UTC
I'm running a 7-disk RAID-5 array with the following card:
SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)

I saw this discussed on the linux-kernel mailing list, and someone mentioned they were seeing the same issue with the Supermicro AOC-SAT2-MV8. That's also the card I'm using.

The file system is XFS.

On heavy transfers I'm seeing a lot of this, and have been since late August. I won't lie: I'm using Ubuntu. See my initial bug report here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263160/

If you read down, you'll see I _WAS_ using a RHEL-based distro (2.6.18, 32-bit) just fine, and then I moved to Ubuntu (2.6.27.2, 64-bit) and started getting these issues.

Since this posting, I've upgraded to 2.6.27-7 and it's now gotten so bad that it has desynced my RAID during a transfer. I'm now worried about losing the data and have completely disconnected the drives. I'm not going to risk a rebuild until these issues are fixed. I really wish we could figure this out after two months of reported problems. I'm not sure whether the Red Hat Bugzilla is the right place to report this, but if someone replies I'll provide any information that I can.
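Because the dmesg output below shows several ports (ata7, ata8, ata9) timing out in turn, a per-port tally can show whether the timeouts follow one disk or are spread across the controller. A minimal sketch that counts libata "exception ... frozen" events per port:

```python
import re
from collections import Counter

# e.g. "[11285.918535] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen"
FROZEN_RE = re.compile(r"\b(ata\d+)\.\d+: exception .*\bfrozen\b")


def frozen_ports(lines):
    """Count libata 'exception ... frozen' events per ata port."""
    return Counter(m.group(1) for line in lines if (m := FROZEN_RE.search(line)))


if __name__ == "__main__":
    sample = [
        "[11285.918535] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen",
        "[11326.988529] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen",
        "[11377.938532] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen",
        "[11711.718523] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen",
    ]
    for port, count in frozen_ports(sample).most_common():
        print(port, count)
```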

dmesg:
[11285.918535] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11285.918567] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11285.918568] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11285.918619] ata9.00: status: { DRDY }
[11285.918635] ata9: hard resetting link
[11286.420039] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11286.460065] ata9.00: max_sectors limited to 256 for NCQ
[11286.520054] ata9.00: max_sectors limited to 256 for NCQ
[11286.520059] ata9.00: configured for UDMA/133
[11286.520077] ata9: EH complete
[11286.520119] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11286.520132] sd 8:0:0:0: [sdd] Write Protect is off
[11286.520134] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11286.520154] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11326.988529] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11326.988554] ata8.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11326.988555] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11326.988606] ata8.00: status: { DRDY }
[11326.988623] ata8: hard resetting link
[11327.500037] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11327.580053] ata8.00: max_sectors limited to 256 for NCQ
[11327.657199] ata8.00: max_sectors limited to 256 for NCQ
[11327.657202] ata8.00: configured for UDMA/133
[11327.657207] ata8: EH complete
[11327.657257] sd 7:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
[11327.657272] sd 7:0:0:0: [sdc] Write Protect is off
[11327.657273] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[11327.657296] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11377.938532] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11377.938557] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11377.938558] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11377.938608] ata7.00: status: { DRDY }
[11377.938624] ata7: hard resetting link
[11378.440037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11378.520056] ata7.00: max_sectors limited to 256 for NCQ
[11378.600065] ata7.00: max_sectors limited to 256 for NCQ
[11378.600068] ata7.00: configured for UDMA/133
[11378.600073] ata7: EH complete
[11378.600120] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11378.600133] sd 6:0:0:0: [sdb] Write Protect is off
[11378.600135] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11378.600155] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11711.718523] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11711.718548] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11711.718549] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11711.718600] ata9.00: status: { DRDY }
[11711.718616] ata9: hard resetting link
[11712.220041] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11712.260058] ata9.00: max_sectors limited to 256 for NCQ
[11712.320057] ata9.00: max_sectors limited to 256 for NCQ
[11712.320066] ata9.00: configured for UDMA/133
[11712.320072] ata9: EH complete
[11712.320112] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11712.320125] sd 8:0:0:0: [sdd] Write Protect is off
[11712.320127] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11712.320148] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11849.328524] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11849.328549] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11849.328549] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11849.328600] ata7.00: status: { DRDY }
[11849.328617] ata7: hard resetting link
[11849.830037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11849.910070] ata7.00: max_sectors limited to 256 for NCQ
[11849.990053] ata7.00: max_sectors limited to 256 for NCQ
[11849.990057] ata7.00: configured for UDMA/133
[11849.990069] ata7: EH complete
[11849.990109] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11849.990123] sd 6:0:0:0: [sdb] Write Protect is off
[11849.990125] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11849.990147] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11909.629773] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11909.629797] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11909.629798] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11909.629849] ata9.00: status: { DRDY }
[11909.629865] ata9: hard resetting link
[11910.131295] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11910.180068] ata9.00: max_sectors limited to 256 for NCQ
[11910.231316] ata9.00: max_sectors limited to 256 for NCQ
[11910.231319] ata9.00: configured for UDMA/133
[11910.231327] ata9: EH complete
[11910.231381] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11910.231394] sd 8:0:0:0: [sdd] Write Protect is off
[11910.231396] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11910.231417] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11996.729773] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11996.729797] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11996.729798] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11996.729848] ata7.00: status: { DRDY }
[11996.729865] ata7: hard resetting link
[11997.231291] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11997.311308] ata7.00: max_sectors limited to 256 for NCQ
[11997.391306] ata7.00: max_sectors limited to 256 for NCQ
[11997.391316] ata7.00: configured for UDMA/133
[11997.391322] ata7: EH complete
[11997.391366] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11997.391378] sd 6:0:0:0: [sdb] Write Protect is off
[11997.391380] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11997.391400] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FU
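Each exception report above has a `cmd` field. This is libata's hex dump of the ATA command that timed out. The minimal Python sketch below decodes it. It assumes the field order `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device` used by libata's error reporting. Its opcode table (the hypothetical `ATA_OPCODES` dict) covers only the commands that appear in this report. Decoded this way, every timeout above is the same NCQ write (WRITE FPDMA QUEUED, opcode 0x61) of 3 sectors at LBA 73. That agrees with the `ncq 1536 out` annotation (3 × 512 bytes).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Only the opcodes seen in this report (ATA command set values).
ATA_OPCODES = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
}
NCQ_OPCODES = {0x60, 0x61}

# Assumed libata layout:
# cmd CC/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
_REGS = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/" + _REGS + "/" + _REGS + r"/([0-9a-f]{2})")


@dataclass
class DecodedCmd:
    opcode: int
    name: str
    lba: int
    sectors: Optional[int]  # only decoded for NCQ (FPDMA) commands here
    tag: Optional[int]


def decode_cmd(line: str) -> DecodedCmd:
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata 'cmd' field found")
    opcode = int(m.group(1), 16)
    feat, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group(2).split(":"))
    hfeat, _hnsect, hlbal, hlbam, hlbah = (int(x, 16) for x in m.group(3).split(":"))
    # 48-bit LBA: low-order registers first, then the "hob" (high order byte) registers.
    lba = lbal | lbam << 8 | lbah << 16 | hlbal << 24 | hlbam << 32 | hlbah << 40
    sectors = tag = None
    if opcode in NCQ_OPCODES:
        # For FPDMA commands the sector count sits in the feature registers
        # (0 means 65536) and the queue tag in bits 7:3 of the count register.
        count = feat | hfeat << 8
        sectors = count or 65536
        tag = nsect >> 3
    name = ATA_OPCODES.get(opcode, f"unknown opcode 0x{opcode:02x}")
    return DecodedCmd(opcode, name, lba, sectors, tag)


if __name__ == "__main__":
    w = decode_cmd("ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out")
    print(w.name, "LBA", w.lba, "sectors", w.sectors, "bytes", w.sectors * 512)
```

The later `cmd ea/...` timeouts decode as FLUSH CACHE EXT. That is a non-NCQ command, so the sketch reports no sector count or tag for them.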

/var/log/messages:
Aug 30 20:12:43 isis kernel: [11285.918635] ata9: hard resetting link
Aug 30 20:12:43 isis kernel: [11286.420039] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:12:43 isis kernel: [11286.460065] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:12:43 isis kernel: [11286.520054] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:12:43 isis kernel: [11286.520059] ata9.00: configured for UDMA/133
Aug 30 20:12:43 isis kernel: [11286.520077] ata9: EH complete
Aug 30 20:12:43 isis kernel: [11286.520119] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:12:43 isis kernel: [11286.520132] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:12:43 isis kernel: [11286.520154] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:13:24 isis kernel: [11326.988623] ata8: hard resetting link
Aug 30 20:13:24 isis kernel: [11327.500037] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:13:24 isis kernel: [11327.580053] ata8.00: max_sectors limited to 256 for NCQ
Aug 30 20:13:24 isis kernel: [11327.657199] ata8.00: max_sectors limited to 256 for NCQ
Aug 30 20:13:24 isis kernel: [11327.657202] ata8.00: configured for UDMA/133
Aug 30 20:13:24 isis kernel: [11327.657207] ata8: EH complete
Aug 30 20:13:24 isis kernel: [11327.657257] sd 7:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:13:24 isis kernel: [11327.657272] sd 7:0:0:0: [sdc] Write Protect is off
Aug 30 20:13:24 isis kernel: [11327.657296] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:14:15 isis kernel: [11377.938624] ata7: hard resetting link
Aug 30 20:14:15 isis kernel: [11378.440037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:14:15 isis kernel: [11378.520056] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:14:15 isis kernel: [11378.600065] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:14:15 isis kernel: [11378.600068] ata7.00: configured for UDMA/133
Aug 30 20:14:15 isis kernel: [11378.600073] ata7: EH complete
Aug 30 20:14:15 isis kernel: [11378.600120] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:14:15 isis kernel: [11378.600133] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:14:15 isis kernel: [11378.600155] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:19:48 isis kernel: [11711.718616] ata9: hard resetting link
Aug 30 20:19:49 isis kernel: [11712.220041] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:19:49 isis kernel: [11712.260058] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:19:49 isis kernel: [11712.320057] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:19:49 isis kernel: [11712.320066] ata9.00: configured for UDMA/133
Aug 30 20:19:49 isis kernel: [11712.320072] ata9: EH complete
Aug 30 20:19:49 isis kernel: [11712.320112] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:19:49 isis kernel: [11712.320125] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:19:49 isis kernel: [11712.320148] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:22:06 isis kernel: [11849.328617] ata7: hard resetting link
Aug 30 20:22:06 isis kernel: [11849.830037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:22:06 isis kernel: [11849.910070] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:22:07 isis kernel: [11849.990053] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:22:07 isis kernel: [11849.990057] ata7.00: configured for UDMA/133
Aug 30 20:22:07 isis kernel: [11849.990069] ata7: EH complete
Aug 30 20:22:07 isis kernel: [11849.990109] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:22:07 isis kernel: [11849.990123] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:22:07 isis kernel: [11849.990147] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:23:06 isis kernel: [11909.629865] ata9: hard resetting link
Aug 30 20:23:07 isis kernel: [11910.131295] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:23:07 isis kernel: [11910.180068] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:23:07 isis kernel: [11910.231316] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:23:07 isis kernel: [11910.231319] ata9.00: configured for UDMA/133
Aug 30 20:23:07 isis kernel: [11910.231327] ata9: EH complete
Aug 30 20:23:07 isis kernel: [11910.231381] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:23:07 isis kernel: [11910.231394] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:23:07 isis kernel: [11910.231417] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:24:33 isis kernel: [11996.729865] ata7: hard resetting link
Aug 30 20:24:34 isis kernel: [11997.231291] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:24:34 isis kernel: [11997.311308] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:24:34 isis kernel: [11997.391306] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:24:34 isis kernel: [11997.391316] ata7.00: configured for UDMA/133
Aug 30 20:24:34 isis kernel: [11997.391322] ata7: EH complete
Aug 30 20:24:34 isis kernel: [11997.391366] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:24:34 isis kernel: [11997.391378] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:24:34 isis kernel: [11997.391400] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
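The resets rotate across ata7, ata8 and ata9 instead of hitting a single port. A per-port tally shows that pattern more clearly than scrolling the log. The sketch below counts `hard resetting link` events per port and pairs each port with a disk name. `summarize_resets` is a hypothetical helper. The pairing is a heuristic: it assumes, as in these logs, that the first `[sdX]` line after a reset belongs to the port that was just reset.

```python
import re
from collections import Counter

RESET_RE = re.compile(r"\b(ata\d+): hard resetting link")
DISK_RE = re.compile(r"\[(sd[a-z]+)\]")


def summarize_resets(lines):
    """Count link resets per libata port and guess which disk sits behind each.

    Heuristic pairing: the first "[sdX]" line after a reset is attributed
    to the port that was just reset.
    """
    resets = Counter()
    port_to_disk = {}
    pending = None  # port whose reset has not been paired with a disk yet
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            pending = m.group(1)
            resets[pending] += 1
            continue
        d = DISK_RE.search(line)
        if d and pending is not None:
            port_to_disk[pending] = d.group(1)
            pending = None
    return resets, port_to_disk
```

Run over just the /var/log/messages excerpt above (for example with `summarize_resets(open(path))` on a file holding those lines), it reports three resets each for ata7 (sdb) and ata9 (sdd) and one for ata8 (sdc).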

I've replaced the card and cables and I'm still getting the issue.

This card and RAID were working on RHEL last week (2.6.18, 32-bit).
Since then I replaced the OS (Ubuntu 64-bit), CPU (Core 2 Duo), and motherboard (ASUS P5K Pro).

I'm really at a loss here and not sure what else to do. I stressed the other components of the system in Windows and they seemed fine. I'm not sure if it's the card or something in the newer kernels.

Comment 30 Brian Rademacher 2008-11-10 02:36:56 UTC
I think this problem tends to get ignored because so many things can cause it (bad drives, cables, power supplies, or any combination thereof).

Even with this bug, you can see that in my case disabling write caching solves the problem (not a great solution, mind you, but a workaround for now), yet it didn't help Justin.

BTW, disabling write caching works for me under the new kernel, just as it did with the older kernel.
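The thread does not say how write caching was turned off. One common way on Linux is `hdparm -W0 <device>`, which clears the drive's write-cache setting. The sketch below assumes that tool. It builds one such command per array member and, with `dry_run=False`, runs them. The device names are hypothetical placeholders. Nothing here makes the setting persistent, so it may need to be reapplied after a reboot or power cycle.

```python
import subprocess
from typing import Sequence


def write_cache_off_cmds(devices: Sequence[str]) -> list:
    """Build one `hdparm -W0 <dev>` command per device; nothing runs here."""
    return [["hdparm", "-W0", dev] for dev in devices]


def apply(cmds: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))  # only show what would be run
        else:
            subprocess.run(list(cmd), check=True)  # typically needs root


if __name__ == "__main__":
    # Hypothetical member disks; substitute the drives in your own array.
    apply(write_cache_off_cmds(["/dev/sdb", "/dev/sdc", "/dev/sdd"]))
```

Disabling the cache trades write throughput for the stability reported above. It is a workaround rather than a fix.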

It seems that the one thing we do have in common is a larger-than-average number of drives in RAID. I have the fewest at 5, you have 7, and Justin has 10, I believe. When I had 4, it was difficult to get this problem to show except under heavy I/O. With 5, I can simply boot...

Comment 31 Alan Cox 2008-11-15 13:30:27 UTC
The write cache workaround is really only relevant to that specific type of drive (and at this point appears to be a bug in the drive itself).

Comment 32 Brian Rademacher 2008-11-15 23:22:11 UTC
If it were a bug in the drive itself, wouldn't it show up under almost all write conditions/kernels? I never even saw this with a 4-drive RAID 5 until later kernel revisions. It was completely stable otherwise. Adding the 5th disk is what sent it over the edge with any kernel...

Comment 33 q 2008-11-16 01:06:31 UTC
Not sure if you took the time to read my post on the Ubuntu bug tracker, but I'm getting the errors on both WDC and Seagate drives.

Given a thread back in September about this on the linux-kernel mailing list, and another reference to the MV88SX6081 8-port SATA II PCI-X Controller (Super Micro AOC-SAT2-MV8), I was leaning towards that being the cause...

Comment 34 Brian Rademacher 2008-11-16 01:11:01 UTC
That is another possibility (the 88SX6081 controller), although that isn't what Justin is using. Justin's problem seems hard to reproduce, whereas yours and mine are hard to avoid (based on your "...its now gotten so bad that its desync'd my raid on a transfer..."). It could be two different issues, but I'm glad you see it with different drives...

Comment 35 Brian Rademacher 2008-11-16 02:43:41 UTC
Just tested 2.6.27.5-37.fc9.x86_64 and same thing...

Comment 36 Alan Cox 2008-11-16 15:00:16 UTC
"If it were a bug in the drive itself, wouldn't it show under most all write conditions/kernels"

From past experience of drive firmware funnies, probably not. If they were simple to trigger, the vendor would have discovered them before shipping the product.

Comment 37 Alan Cox 2008-11-16 15:01:34 UTC
Also, btw, I don't see any reason to believe the various bugs muddled together here are at all connected.

Comment 38 Brian Rademacher 2008-11-23 03:37:13 UTC
Searching on the controller and "frozen", I found an interesting comment from Mark Lord, where he said this in response to freezing issues with the Marvell controller:

"My recollection is that the worst errata are for the 60x1 chips on PCI-X."

(which happens to be my situation)

He also mentioned that he would be resuming work on sata_mv as of October 28th. Original post here:  http://webui.sourcelabs.com/kernel/issues/10321

Can someone who knows him point him in this direction while he is working on incorporating errata into the driver?  I'd hate to miss out on an opportunity to get this resolved!

Comment 39 Brian Rademacher 2008-11-23 04:02:51 UTC
I think I found his email address (at least it didn't bounce yet), so we'll see...

Comment 40 Brian Rademacher 2008-11-29 23:27:40 UTC
I did a clean install of F10 and still see the same problem.  It also has the same solution of disabling write cache.

I see this under F10/ext4 now, though:
kernel: JBD: barrier-based sync failed on md3:8 - disabling barriers

So I disabled them in fstab for now. Not to mix that in with this bug, though; that is likely something else...

Comment 41 q 2008-12-01 07:49:33 UTC
Pretty fed up with people saying this could be so many different issues. So much so that I finally decided to risk my data to prove it... read the following.

***___This has got to be the card / chipset / sata_mv driver._____***

Short and simple version of my issues:
    - This does not depend on drive types
    - Appears to be caused by MV88SX6081 chipset
    - Could be a problem in the sata_mv driver
    - I need replacement controller suggestions

Details for all non-believers (it’s not a power/hardware issue):
I moved 5 of the 7 drives to my onboard controller (I have 6 SATA ports on the mobo; the last one was used by the system drive).
Left 2 of the western digital drives on the MV88SX6081 8-port SATA II:
    - sdg	
    - sdh

On the advice of some people over email, I unplugged everything that wasn't needed. They assumed it could have been power, given the number of drives I had in the machine. What was left on a Corsair TX750W power supply:
    - mobo (c2d, 4gb ram)
    - 7 sata raid drives - spread across multiple power supply rails
    - 1 sata system drive
    - Super Micro SAT2-MV8 (MV88SX6081 8-port SATA II)
    - intel pcie 10/100/1000 network card

Then I replaced the SATA cables one more time with old cables I knew worked. I also threw in a brand-new controller card (I have a few spares lying around).
I brought everything up and upgraded to:


Then I started to rebuild the RAID. Everything went fine, no freezes.
**This was the first indication that this only happens under heavy load on multiple ports, as has been brought up before.
So then I started copying data over. At about 180 GB, the card hard reset both of the drives attached to it and knocked them both out of the RAID.
**This was also significantly different from before, when I was using all the ports: this time it seemed to work great for quite some time, and it wasn't until I was well into the process that the card finally gave up.
See the attached dmesg and /var/log/messages. This is the 2nd time I’ve had this card degrade my RAID and almost give me a heart attack.

The cards are going in the trash at this point. I'm open to suggestions for a replacement. I don’t need a hardware RAID card, just a decent controller with great *nix support and lots of ports.
::sigh:: I don’t know who to contact, but this is the end of the line for me with this controller and, hopefully, my issues.

Attempting to get my data back as we speak with 2 failed drives in a RAID 5... wonderful times.

dmesg of the event:
[ 1061.040118] md: recovery of RAID array md1
[ 1061.040120] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[ 1061.040122] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[ 1061.040126] md: using 128k window, over a total of 488383744 blocks.
[11208.852220] md: md1: recovery done.
[11209.020072] RAID5 conf printout:
[11209.020076]  --- rd:7 wd:7
[11209.020079]  disk 0, o:1, dev:sdd1
[11209.020080]  disk 1, o:1, dev:sdb1
[11209.020081]  disk 2, o:1, dev:sdh1
[11209.020082]  disk 3, o:1, dev:sdc1
[11209.020083]  disk 4, o:1, dev:sdf1
[11209.020084]  disk 5, o:1, dev:sde1
[11209.020085]  disk 6, o:1, dev:sdg1
[19844.431690] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
[19844.433148] SGI XFS Quota Management subsystem
[19844.442507] Filesystem "md1": Disabling barriers, trial barrier write failed
[19844.442658] XFS mounting filesystem md1
[19844.893398] Ending clean XFS mount for filesystem: md1
[27027.170016] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[27027.170041] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[27027.170041]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[27027.170083] ata5.00: status: { DRDY }
[27027.170099] ata5: hard resetting link
[27027.680034] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[27027.720050] ata5.00: max_sectors limited to 256 for NCQ
[27027.780047] ata5.00: max_sectors limited to 256 for NCQ
[27027.780050] ata5.00: configured for UDMA/133
[27027.780055] end_request: I/O error, dev sdg, sector 73
[27027.780073] md: super_written gets error=-5, uptodate=0
[27027.780076] raid5: Disk failure on sdg1, disabling device.
[27027.780077] raid5: Operation continuing on 6 devices.
[27027.780117] ata5: EH complete
[27027.780674] sd 4:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
[27027.780800] sd 4:0:0:0: [sdg] Write Protect is off
[27027.780803] sd 4:0:0:0: [sdg] Mode Sense: 00 3a 00 00
[27027.781038] sd 4:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[27057.930015] ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[27057.930039] ata12.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[27057.930040]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[27057.930081] ata12.00: status: { DRDY }
[27057.930098] ata12: hard resetting link
[27058.440033] ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[27058.480049] ata12.00: max_sectors limited to 256 for NCQ
[27058.540047] ata12.00: max_sectors limited to 256 for NCQ
[27058.540050] ata12.00: configured for UDMA/133
[27058.540055] end_request: I/O error, dev sdh, sector 71
[27058.540072] md: super_written gets error=-5, uptodate=0
[27058.540075] raid5: Disk failure on sdh1, disabling device.
[27058.540076] raid5: Operation continuing on 5 devices.
[27058.540113] ata12: EH complete
[27058.540754] sd 11:0:0:0: [sdh] 976773168 512-byte hardware sectors (500108 MB)
[27058.540879] sd 11:0:0:0: [sdh] Write Protect is off
[27058.540882] sd 11:0:0:0: [sdh] Mode Sense: 00 3a 00 00
[27058.541070] sd 11:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[27058.584017] RAID5 conf printout:
[27058.584020]  --- rd:7 wd:5
[27058.584022]  disk 0, o:1, dev:sdd1
[27058.584023]  disk 1, o:1, dev:sdb1
[27058.584024]  disk 2, o:0, dev:sdh1
[27058.584025]  disk 3, o:1, dev:sdc1
[27058.584027]  disk 4, o:1, dev:sdf1
[27058.584028]  disk 5, o:1, dev:sde1
[27058.584029]  disk 6, o:0, dev:sdg1
[27061.521245] BUG: soft lockup - CPU#1 stuck for 61s! [smbd:28171]
[27061.521251] Modules linked in: xfs aes_x86_64 aes_generic ecb crypto_blkcipher ecryptfs ipv6 af_packet iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport loop psmouse pcspkr serio_raw iTCO_wdt iTCO_vendor_support evdev button intel_agp snd_hda_intel snd_pcm shpchp snd_timer pci_hotplug snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sg pata_acpi pata_marvell usbhid hid ohci1394 ieee1394 sata_mv ata_generic ata_piix libata scsi_mod dock sky2 e1000e ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[27061.521251] CPU 1:
[27061.521251] Modules linked in: xfs aes_x86_64 aes_generic ecb crypto_blkcipher ecryptfs ipv6 af_packet iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport loop psmouse pcspkr serio_raw iTCO_wdt iTCO_vendor_support evdev button intel_agp snd_hda_intel snd_pcm shpchp snd_timer pci_hotplug snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sg pata_acpi pata_marvell usbhid hid ohci1394 ieee1394 sata_mv ata_generic ata_piix libata scsi_mod dock sky2 e1000e ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
[27061.521251] Pid: 28171, comm: smbd Not tainted 2.6.27-9-server #1
[27061.521251] RIP: 0010:[<ffffffff802abf0c>]  [<ffffffff802abf0c>] find_get_pages+0x6c/0x110
[27061.521251] RSP: 0018:ffff880129453358  EFLAGS: 00000246
[27061.521251] RAX: ffff880128d89330 RBX: ffff880129453398 RCX: 0000000000000002
[27061.521251] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffe200022e9e80
[27061.521251] RBP: ffff880129453308 R08: ffffe200009df6c8 R09: 0000000000000005
[27061.521251] R10: 0000000000000037 R11: 00000000001c5778 R12: ffffffff802b6b29
[27061.521251] R13: ffff880123a107d0 R14: ffffe20001c6f6c0 R15: 0000000000000286
[27061.521251] FS:  00007fb72cdf6700(0000) GS:ffff88012fc02980(0000) knlGS:0000000000000000
[27061.521251] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[27061.521251] CR2: 00007f1648629000 CR3: 000000012956d000 CR4: 00000000000006e0
[27061.521251] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[27061.521251] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[27061.521251]
[27061.521251] Call Trace:
[27061.521251]  [<ffffffff802abee3>] ? find_get_pages+0x43/0x110
[27061.521251]  [<ffffffff802b6984>] ? pagevec_lookup+0x24/0x30
[27061.521251]  [<ffffffffa04e100d>] ? xfs_cluster_write+0xad/0x180 [xfs]
[27061.521251]  [<ffffffffa04e1578>] ? xfs_page_state_convert+0x498/0x760 [xfs]
[27061.521251]  [<ffffffffa04e19a1>] ? xfs_vm_writepage+0x71/0x120 [xfs]
[27061.521251]  [<ffffffff802b9274>] ? pageout+0x124/0x270
[27061.521251]  [<ffffffff802ab06a>] ? page_waitqueue+0xa/0x90
[27061.521251]  [<ffffffff802b986d>] ? shrink_page_list+0x34d/0x530
[27061.521251]  [<ffffffff802b8e49>] ? __isolate_lru_page+0x79/0xb0
[27061.521251]  [<ffffffff802b8f0a>] ? isolate_lru_pages+0x8a/0x220
[27061.521251]  [<ffffffff802b9bf2>] ? shrink_inactive_list+0x1a2/0x4b0
[27061.521251]  [<ffffffff802b9f7b>] ? shrink_zone+0x7b/0x160
[27061.521251]  [<ffffffff802ba0ed>] ? shrink_zones+0x8d/0x150
[27061.521251]  [<ffffffff802ba236>] ? do_try_to_free_pages+0x86/0x2e0
[27061.521251]  [<ffffffff802ba587>] ? try_to_free_pages+0x67/0x70
[27061.521251]  [<ffffffff802b90a0>] ? isolate_pages_global+0x0/0x50
[27061.521251]  [<ffffffff802b28b1>] ? __alloc_pages_internal+0x241/0x510
[27061.521251]  [<ffffffff802d565d>] ? alloc_pages_current+0xad/0x110
[27061.521251]  [<ffffffff802ac477>] ? __page_cache_alloc+0x67/0x80
[27061.521251]  [<ffffffff802ad0b3>] ? __grab_cache_page+0x63/0xb0
[27061.521251]  [<ffffffff80316a59>] ? block_write_begin+0x89/0xf0
[27061.521251]  [<ffffffffa04e04ca>] ? xfs_vm_write_begin+0x2a/0x30 [xfs]
[27061.521251]  [<ffffffffa04e0040>] ? xfs_get_blocks+0x0/0x20 [xfs]
[27061.521251]  [<ffffffff802ab7ac>] ? generic_perform_write+0xbc/0x1c0
[27061.521251]  [<ffffffff802ad512>] ? generic_file_buffered_write+0x92/0x170
[27061.521251]  [<ffffffffa04e92d3>] ? xfs_write+0x6b3/0x9b0 [xfs]
[27061.521251]  [<ffffffff80385a69>] ? apparmor_socket_recvmsg+0x19/0x20
[27061.521251]  [<ffffffff803aaf70>] ? memset_c+0x20/0x30
[27061.521251]  [<ffffffffa04e4c88>] ? xfs_file_aio_write+0x58/0x60 [xfs]
[27061.521251]  [<ffffffff802e9559>] ? do_sync_write+0xf9/0x140
[27061.521251]  [<ffffffff802e9699>] ? do_sync_read+0xf9/0x140
[27061.521251]  [<ffffffff80266fb0>] ? autoremove_wake_function+0x0/0x40
[27061.521251]  [<ffffffff80386821>] ? aa_file_permission+0x21/0xf0
[27061.521251]  [<ffffffff80386948>] ? apparmor_file_permission+0x28/0x30
[27061.521251]  [<ffffffff803613e6>] ? security_file_permission+0x16/0x20
[27061.521251]  [<ffffffff802e9c1b>] ? vfs_write+0xcb/0x130
[27061.521251]  [<ffffffff802e9d1a>] ? sys_pwrite64+0x9a/0xa0
[27061.521251]  [<ffffffff8021285a>] ? system_call_fastpath+0x16/0x1b
[27061.521251]
[27095.080066] RAID5 conf printout:
[27095.080071]  --- rd:7 wd:5
[27095.080074]  disk 0, o:1, dev:sdd1
[27095.080076]  disk 1, o:1, dev:sdb1
[27095.080077]  disk 2, o:0, dev:sdh1
[27095.080079]  disk 3, o:1, dev:sdc1
[27095.080080]  disk 4, o:1, dev:sdf1
[27095.080082]  disk 5, o:1, dev:sde1
[27095.080090] RAID5 conf printout:
[27095.080091]  --- rd:7 wd:5
[27095.080092]  disk 0, o:1, dev:sdd1
[27095.080093]  disk 1, o:1, dev:sdb1
[27095.080094]  disk 2, o:0, dev:sdh1
[27095.080095]  disk 3, o:1, dev:sdc1
[27095.080097]  disk 4, o:1, dev:sdf1
[27095.080098]  disk 5, o:1, dev:sde1
[27095.140011] RAID5 conf printout:
[27095.140017]  --- rd:7 wd:5
[27095.140019]  disk 0, o:1, dev:sdd1
[27095.140022]  disk 1, o:1, dev:sdb1
[27095.140024]  disk 3, o:1, dev:sdc1
[27095.140026]  disk 4, o:1, dev:sdf1
[27095.140027]  disk 5, o:1, dev:sde1
[27095.140511] Buffer I/O error on device md1, logical block 455870845
[27095.140545] lost page write due to I/O error on md1
[27095.140550] Buffer I/O error on device md1, logical block 455870846
[27095.140567] lost page write due to I/O error on md1
[27095.140569] Buffer I/O error on device md1, logical block 455870847
[27095.140585] lost page write due to I/O error on md1
[27095.140587] Buffer I/O error on device md1, logical block 455870848
[27095.140604] lost page write due to I/O error on md1
[27095.140606] Buffer I/O error on device md1, logical block 455870849
[27095.140622] lost page write due to I/O error on md1
[27095.140624] Buffer I/O error on device md1, logical block 455870850
[27095.140641] lost page write due to I/O error on md1
[27095.140642] Buffer I/O error on device md1, logical block 455870851
[27095.140659] lost page write due to I/O error on md1
[27095.140661] Buffer I/O error on device md1, logical block 455870852
[27095.140677] lost page write due to I/O error on md1
[27095.140679] Buffer I/O error on device md1, logical block 455870853
[27095.140696] lost page write due to I/O error on md1
[27095.140697] Buffer I/O error on device md1, logical block 455870854
[27095.140714] lost page write due to I/O error on md1
[27095.141327] I/O error in filesystem ("md1") meta-data dev md1 block 0xaeaa9810       ("xlog_iodone") error 5 buf count 12288
[27095.141359] xfs_force_shutdown(md1,0x2) called from line 1056 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_log.c.  Return address = 0xffffffffa04c80d3
[27095.141380] Filesystem "md1": Log I/O Error Detected.  Shutting down filesystem: md1
[27095.141407] Please umount the filesystem, and rectify the problem(s)
[27100.140015] Filesystem "md1": xfs_log_force: error 5 returned.
[27113.440011] Filesystem "md1": xfs_log_force: error 5 returned.
[27143.440010] Filesystem "md1": xfs_log_force: error 5 returned.
[27173.440009] Filesystem "md1": xfs_log_force: error 5 returned.
[27203.440012] Filesystem "md1": xfs_log_force: error 5 returned.
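The `--- rd:7 wd:5` summary in the conf printouts above explains why data loss followed the two resets. This reading assumes `rd` is the number of member slots and `wd` the number of working members (an assumption about md's printout format). RAID 5 has single parity, so it can rebuild any one missing member, but not two. Once the controller dropped sdg and sdh, md1 started returning I/O errors, and XFS shut down after its log write failed. A small classifier for those summary lines, with `raid5_state` as a hypothetical helper name:

```python
import re

CONF_RE = re.compile(r"--- rd:(\d+) wd:(\d+)")


def raid5_state(line: str) -> str:
    """Classify an md 'RAID5 conf printout' summary line.

    Assumes rd = member slots and wd = working members.
    """
    m = CONF_RE.search(line)
    if m is None:
        raise ValueError("not an md conf printout summary line")
    raid_disks, working = int(m.group(1)), int(m.group(2))
    missing = raid_disks - working
    if missing <= 0:
        return "optimal"
    if missing == 1:
        return "degraded"  # single parity: one missing member can be reconstructed
    return "failed"  # two or more members gone: parity can no longer cover the gap
```

Applied to the printouts above, the line after the rebuild (`rd:7 wd:7`) classifies as optimal, and the line after the two link resets (`rd:7 wd:5`) classifies as failed.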


/var/log/messages:
Nov 30 18:39:24 isis kernel: [ 1061.040118] md: recovery of RAID array md1
Nov 30 18:39:24 isis kernel: [ 1061.040120] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Nov 30 18:39:24 isis kernel: [ 1061.040122] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Nov 30 18:39:24 isis kernel: [ 1061.040126] md: using 128k window, over a total of 488383744 blocks.
Nov 30 19:02:08 isis -- MARK --
Nov 30 19:22:08 isis -- MARK --
Nov 30 19:42:08 isis -- MARK --
Nov 30 20:02:08 isis -- MARK --
Nov 30 20:22:08 isis -- MARK --
Nov 30 20:42:08 isis -- MARK --
Nov 30 21:02:08 isis -- MARK --
Nov 30 21:22:08 isis -- MARK --
Nov 30 21:28:32 isis kernel: [11208.852220] md: md1: recovery done.
Nov 30 21:28:32 isis kernel: [11209.020072] RAID5 conf printout:
Nov 30 21:28:32 isis kernel: [11209.020076]  --- rd:7 wd:7
Nov 30 21:28:32 isis kernel: [11209.020079]  disk 0, o:1, dev:sdd1
Nov 30 21:28:32 isis kernel: [11209.020080]  disk 1, o:1, dev:sdb1
Nov 30 21:28:32 isis kernel: [11209.020081]  disk 2, o:1, dev:sdh1
Nov 30 21:28:32 isis kernel: [11209.020082]  disk 3, o:1, dev:sdc1
Nov 30 21:28:32 isis kernel: [11209.020083]  disk 4, o:1, dev:sdf1
Nov 30 21:28:32 isis kernel: [11209.020084]  disk 5, o:1, dev:sde1
Nov 30 21:28:32 isis kernel: [11209.020085]  disk 6, o:1, dev:sdg1
Nov 30 21:42:08 isis -- MARK --
Nov 30 22:02:08 isis -- MARK --
Nov 30 22:22:08 isis -- MARK --
Nov 30 22:42:08 isis -- MARK --
Nov 30 23:02:08 isis -- MARK --
Nov 30 23:22:08 isis -- MARK --
Nov 30 23:42:08 isis -- MARK --
Nov 30 23:52:27 isis kernel: [19844.431690] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
Nov 30 23:52:27 isis kernel: [19844.433148] SGI XFS Quota Management subsystem
Nov 30 23:52:27 isis kernel: [19844.442507] Filesystem "md1": Disabling barriers, trial barrier write failed
Nov 30 23:52:27 isis kernel: [19844.442658] XFS mounting filesystem md1
Dec  1 00:22:08 isis -- MARK --
Dec  1 00:42:08 isis -- MARK --
Dec  1 01:02:08 isis -- MARK --
Dec  1 01:22:08 isis -- MARK --
Dec  1 01:42:08 isis -- MARK --
Dec  1 01:52:10 isis kernel: [27027.170099] ata5: hard resetting link
Dec  1 01:52:10 isis kernel: [27027.680034] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec  1 01:52:11 isis kernel: [27027.720050] ata5.00: max_sectors limited to 256 for NCQ
Dec  1 01:52:11 isis kernel: [27027.780047] ata5.00: max_sectors limited to 256 for NCQ
Dec  1 01:52:11 isis kernel: [27027.780050] ata5.00: configured for UDMA/133
Dec  1 01:52:11 isis kernel: [27027.780073] md: super_written gets error=-5, uptodate=0
Dec  1 01:52:11 isis kernel: [27027.780117] ata5: EH complete
Dec  1 01:52:11 isis kernel: [27027.780674] sd 4:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
Dec  1 01:52:11 isis kernel: [27027.780800] sd 4:0:0:0: [sdg] Write Protect is off
Dec  1 01:52:11 isis kernel: [27027.781038] sd 4:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec  1 01:52:41 isis kernel: [27057.930098] ata12: hard resetting link
Dec  1 01:52:41 isis kernel: [27058.440033] ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec  1 01:52:41 isis kernel: [27058.480049] ata12.00: max_sectors limited to 256 for NCQ
Dec  1 01:52:41 isis kernel: [27058.540047] ata12.00: max_sectors limited to 256 for NCQ
Dec  1 01:52:41 isis kernel: [27058.540050] ata12.00: configured for UDMA/133
Dec  1 01:52:41 isis kernel: [27058.540072] md: super_written gets error=-5, uptodate=0
Dec  1 01:52:41 isis kernel: [27058.540113] ata12: EH complete
Dec  1 01:52:41 isis kernel: [27058.540754] sd 11:0:0:0: [sdh] 976773168 512-byte hardware sectors (500108 MB)
Dec  1 01:52:41 isis kernel: [27058.540879] sd 11:0:0:0: [sdh] Write Protect is off
Dec  1 01:52:41 isis kernel: [27058.541070] sd 11:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Dec  1 01:52:41 isis kernel: [27058.584017] RAID5 conf printout:
Dec  1 01:52:41 isis kernel: [27058.584020]  --- rd:7 wd:5
Dec  1 01:52:41 isis kernel: [27058.584022]  disk 0, o:1, dev:sdd1
Dec  1 01:52:41 isis kernel: [27058.584023]  disk 1, o:1, dev:sdb1
Dec  1 01:52:41 isis kernel: [27058.584024]  disk 2, o:0, dev:sdh1
Dec  1 01:52:41 isis kernel: [27058.584025]  disk 3, o:1, dev:sdc1
Dec  1 01:52:41 isis kernel: [27058.584027]  disk 4, o:1, dev:sdf1
Dec  1 01:52:41 isis kernel: [27058.584028]  disk 5, o:1, dev:sde1
Dec  1 01:52:41 isis kernel: [27058.584029]  disk 6, o:0, dev:sdg1
Dec  1 01:52:44 isis kernel: [27061.521251] Modules linked in: xfs aes_x86_64 aes_generic ecb crypto_blkcipher ecryptfs ipv6 af_packet iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport loop psmouse pcspkr serio_raw iTCO_wdt iTCO_vendor_support evdev button intel_agp snd_hda_intel snd_pcm shpchp snd_timer pci_hotplug snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sg pata_acpi pata_marvell usbhid hid ohci1394 ieee1394 sata_mv ata_generic ata_piix libata scsi_mod dock sky2 e1000e ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
Dec  1 01:52:44 isis kernel: [27061.521251] CPU 1:
Dec  1 01:52:44 isis kernel: [27061.521251] Modules linked in: xfs aes_x86_64 aes_generic ecb crypto_blkcipher ecryptfs ipv6 af_packet iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport loop psmouse pcspkr serio_raw iTCO_wdt iTCO_vendor_support evdev button intel_agp snd_hda_intel snd_pcm shpchp snd_timer pci_hotplug snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sg pata_acpi pata_marvell usbhid hid ohci1394 ieee1394 sata_mv ata_generic ata_piix libata scsi_mod dock sky2 e1000e ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
Dec  1 01:52:44 isis kernel: [27061.521251] Pid: 28171, comm: smbd Not tainted 2.6.27-9-server #1
Dec  1 01:52:44 isis kernel: [27061.521251] RIP: 0010:[<ffffffff802abf0c>]  [<ffffffff802abf0c>] find_get_pages+0x6c/0x110
Dec  1 01:52:44 isis kernel: [27061.521251] RSP: 0018:ffff880129453358  EFLAGS: 00000246
Dec  1 01:52:44 isis kernel: [27061.521251] RAX: ffff880128d89330 RBX: ffff880129453398 RCX: 0000000000000002
Dec  1 01:52:44 isis kernel: [27061.521251] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffe200022e9e80
Dec  1 01:52:44 isis kernel: [27061.521251] RBP: ffff880129453308 R08: ffffe200009df6c8 R09: 0000000000000005
Dec  1 01:52:44 isis kernel: [27061.521251] R10: 0000000000000037 R11: 00000000001c5778 R12: ffffffff802b6b29
Dec  1 01:52:44 isis kernel: [27061.521251] R13: ffff880123a107d0 R14: ffffe20001c6f6c0 R15: 0000000000000286
Dec  1 01:52:44 isis kernel: [27061.521251] FS:  00007fb72cdf6700(0000) GS:ffff88012fc02980(0000) knlGS:0000000000000000
Dec  1 01:52:44 isis kernel: [27061.521251] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec  1 01:52:44 isis kernel: [27061.521251] CR2: 00007f1648629000 CR3: 000000012956d000 CR4: 00000000000006e0
Dec  1 01:52:44 isis kernel: [27061.521251] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec  1 01:52:44 isis kernel: [27061.521251] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec  1 01:52:44 isis kernel: [27061.521251]
Dec  1 01:52:44 isis kernel: [27061.521251] Call Trace:
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802abee3>] ? find_get_pages+0x43/0x110
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b6984>] ? pagevec_lookup+0x24/0x30
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffffa04e100d>] ? xfs_cluster_write+0xad/0x180 [xfs]
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffffa04e1578>] ? xfs_page_state_convert+0x498/0x760 [xfs]
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffffa04e19a1>] ? xfs_vm_writepage+0x71/0x120 [xfs]
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b9274>] ? pageout+0x124/0x270
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802ab06a>] ? page_waitqueue+0xa/0x90
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b986d>] ? shrink_page_list+0x34d/0x530
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b8e49>] ? __isolate_lru_page+0x79/0xb0
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b8f0a>] ? isolate_lru_pages+0x8a/0x220
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b9bf2>] ? shrink_inactive_list+0x1a2/0x4b0
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b9f7b>] ? shrink_zone+0x7b/0x160
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802ba0ed>] ? shrink_zones+0x8d/0x150
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802ba236>] ? do_try_to_free_pages+0x86/0x2e0
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802ba587>] ? try_to_free_pages+0x67/0x70
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b90a0>] ? isolate_pages_global+0x0/0x50
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802b28b1>] ? __alloc_pages_internal+0x241/0x510
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802d565d>] ? alloc_pages_current+0xad/0x110
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802ac477>] ? __page_cache_alloc+0x67/0x80
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802ad0b3>] ? __grab_cache_page+0x63/0xb0
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff80316a59>] ? block_write_begin+0x89/0xf0
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffffa04e04ca>] ? xfs_vm_write_begin+0x2a/0x30 [xfs]
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffffa04e0040>] ? xfs_get_blocks+0x0/0x20 [xfs]
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802ab7ac>] ? generic_perform_write+0xbc/0x1c0
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802ad512>] ? generic_file_buffered_write+0x92/0x170
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffffa04e92d3>] ? xfs_write+0x6b3/0x9b0 [xfs]
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff80385a69>] ? apparmor_socket_recvmsg+0x19/0x20
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff803aaf70>] ? memset_c+0x20/0x30
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffffa04e4c88>] ? xfs_file_aio_write+0x58/0x60 [xfs]
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802e9559>] ? do_sync_write+0xf9/0x140
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802e9699>] ? do_sync_read+0xf9/0x140
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff80266fb0>] ? autoremove_wake_function+0x0/0x40
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff80386821>] ? aa_file_permission+0x21/0xf0
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff80386948>] ? apparmor_file_permission+0x28/0x30
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff803613e6>] ? security_file_permission+0x16/0x20
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802e9c1b>] ? vfs_write+0xcb/0x130
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff802e9d1a>] ? sys_pwrite64+0x9a/0xa0
Dec  1 01:52:44 isis kernel: [27061.521251]  [<ffffffff8021285a>] ? system_call_fastpath+0x16/0x1b
Dec  1 01:52:44 isis kernel: [27061.521251]
Dec  1 01:53:18 isis kernel: [27095.080066] RAID5 conf printout:
Dec  1 01:53:18 isis kernel: [27095.080071]  --- rd:7 wd:5
Dec  1 01:53:18 isis kernel: [27095.080074]  disk 0, o:1, dev:sdd1
Dec  1 01:53:18 isis kernel: [27095.080076]  disk 1, o:1, dev:sdb1
Dec  1 01:53:18 isis kernel: [27095.080077]  disk 2, o:0, dev:sdh1
Dec  1 01:53:18 isis kernel: [27095.080079]  disk 3, o:1, dev:sdc1
Dec  1 01:53:18 isis kernel: [27095.080080]  disk 4, o:1, dev:sdf1
Dec  1 01:53:18 isis kernel: [27095.080082]  disk 5, o:1, dev:sde1
Dec  1 01:53:18 isis kernel: [27095.080090] RAID5 conf printout:
Dec  1 01:53:18 isis kernel: [27095.080091]  --- rd:7 wd:5
Dec  1 01:53:18 isis kernel: [27095.080092]  disk 0, o:1, dev:sdd1
Dec  1 01:53:18 isis kernel: [27095.080093]  disk 1, o:1, dev:sdb1
Dec  1 01:53:18 isis kernel: [27095.080094]  disk 2, o:0, dev:sdh1
Dec  1 01:53:18 isis kernel: [27095.080095]  disk 3, o:1, dev:sdc1
Dec  1 01:53:18 isis kernel: [27095.080097]  disk 4, o:1, dev:sdf1
Dec  1 01:53:18 isis kernel: [27095.080098]  disk 5, o:1, dev:sde1
Dec  1 01:53:18 isis kernel: [27095.140011] RAID5 conf printout:
Dec  1 01:53:18 isis kernel: [27095.140017]  --- rd:7 wd:5
Dec  1 01:53:18 isis kernel: [27095.140019]  disk 0, o:1, dev:sdd1
Dec  1 01:53:18 isis kernel: [27095.140022]  disk 1, o:1, dev:sdb1
Dec  1 01:53:18 isis kernel: [27095.140024]  disk 3, o:1, dev:sdc1
Dec  1 01:53:18 isis kernel: [27095.140026]  disk 4, o:1, dev:sdf1
Dec  1 01:53:18 isis kernel: [27095.140027]  disk 5, o:1, dev:sde1
Dec  1 01:53:18 isis kernel: [27095.140545] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140567] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140585] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140604] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140622] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140641] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140659] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140677] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140696] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.140714] lost page write due to I/O error on md1
Dec  1 01:53:18 isis kernel: [27095.141359] xfs_force_shutdown(md1,0x2) called from line 1056 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_log.c.  Return address = 0xffffffffa04c80d3
Dec  1 01:53:23 isis kernel: [27100.140015] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:53:36 isis kernel: [27113.440011] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:54:06 isis kernel: [27143.440010] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:54:36 isis kernel: [27173.440009] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:55:06 isis kernel: [27203.440012] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:55:36 isis kernel: [27233.440011] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:56:06 isis kernel: [27263.440011] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:56:36 isis kernel: [27293.440010] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:57:06 isis kernel: [27323.440016] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:57:36 isis kernel: [27353.440015] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:58:06 isis kernel: [27383.440015] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 01:58:36 isis kernel: [27413.440016] Filesystem "md1": xfs_log_force: error 5 returned.
(this message keeps repeating every 30 seconds for a while)
Dec  1 02:12:06 isis kernel: [28223.440015] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:12:36 isis kernel: [28253.440013] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:13:06 isis kernel: [28283.440014] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:13:36 isis kernel: [28313.440013] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:06 isis kernel: [28343.440013] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:36 isis kernel: [28373.440012] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28395.820448] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28395.820456] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28395.820462] xfs_force_shutdown(md1,0x1) called from line 420 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_rw.c.  Return address = 0xffffffffa04decc3
Dec  1 02:14:59 isis kernel: [28395.820466] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28395.820468] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28395.820471] xfs_force_shutdown(md1,0x1) called from line 420 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_rw.c.  Return address = 0xffffffffa04decc3
Dec  1 02:14:59 isis kernel: [28396.669470] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28396.669487] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28396.669517] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28396.669525] Filesystem "md1": xfs_log_force: error 5 returned.
Dec  1 02:14:59 isis kernel: [28396.669635] Filesystem "md1": xfs_log_force: error 5 returned.
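A note for anyone reading the md output above. As we read md's `RAID5 conf printout`, `rd:` is the number of member slots, `wd:` is the number still working, and `o:0` marks a member flagged faulty. So `rd:7 wd:5` with `sdh1` and `sdg1` at `o:0` means two members of a RAID-5 are gone, one more than RAID-5 can tolerate. That is consistent with the `lost page write due to I/O error on md1` lines and the XFS shutdown that follow (error 5 / `error=-5` is EIO). The sketch below parses such a block and applies the one-failure rule. The helper names are hypothetical, and the field meanings are our reading of the log format, not a documented API:

```python
import re
from dataclasses import dataclass
from typing import List

# Field meanings as we read md's "RAID5 conf printout":
#   rd = number of member slots, wd = members not marked faulty,
#   o:0 on a "disk N" line = that member is flagged faulty.
HEADER_RE = re.compile(r"--- rd:(\d+) wd:(\d+)")
DISK_RE = re.compile(r"disk (\d+), o:([01]), dev:(\S+)")


@dataclass
class Raid5Conf:
    raid_disks: int
    working_disks: int
    faulty: List[str]


def parse_conf_printout(lines):
    """Parse one 'RAID5 conf printout' block (kernel log lines) into a Raid5Conf."""
    rd = wd = None
    faulty = []
    for line in lines:
        m = HEADER_RE.search(line)
        if m:
            rd, wd = int(m.group(1)), int(m.group(2))
            continue
        m = DISK_RE.search(line)
        if m and m.group(2) == "0":
            faulty.append(m.group(3))
    if rd is None:
        raise ValueError("no '--- rd:N wd:M' header found")
    return Raid5Conf(rd, wd, faulty)


def raid5_can_serve_io(conf):
    # One parity block per stripe: RAID-5 survives at most one missing member.
    return conf.raid_disks - conf.working_disks <= 1


# The first printout from the log above, timestamps stripped.
SAMPLE = """\
RAID5 conf printout:
 --- rd:7 wd:5
 disk 0, o:1, dev:sdd1
 disk 1, o:1, dev:sdb1
 disk 2, o:0, dev:sdh1
 disk 3, o:1, dev:sdc1
 disk 4, o:1, dev:sdf1
 disk 5, o:1, dev:sde1
 disk 6, o:0, dev:sdg1"""

conf = parse_conf_printout(SAMPLE.splitlines())
print(conf.faulty, raid5_can_serve_io(conf))  # ['sdh1', 'sdg1'] False
```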

Comment 42 q 2008-12-01 08:02:50 UTC
Sorry, I should have said that I upgraded to:
Linux isis 2.6.27-9-server #1 SMP Thu Nov 20 22:56:07 UTC 2008 x86_64 GNU/Linux
Yes, I'm still using Ubuntu. I realize this isn't strictly a Red Hat issue, but I hope it sheds some light on other people's problems here.

Comment 43 Justin Piszcz 2008-12-01 15:44:35 UTC
I gave up as well and bought a 3ware controller, which I will install today or tomorrow. One thing I noticed, though: since I disabled all the SMART tests, the hddtemp daemon, and anything else that queries the disks regularly (other than having SMART 'monitor' the statistics), I have not had a repeat event. I also had the same problem you did: two disks dropped out of my RAID-5 and everything went bye-bye. I had most of it backed up elsewhere, but I got sick of it too.

Justin.

Comment 44 Brian Rademacher 2008-12-06 02:01:11 UTC
I received an email from Mark Lord, who said that he would likely be implementing workarounds for more Marvell errata before Christmas. I don't know how long it would take to reach a Fedora update after that, but this is good news!

Still not sure about your problem, Justin, but I hope the new controller works for you. I'd hate to see all those good 10k drives go to waste (what do you use that thing for, anyway?)

Comment 45 Justin Piszcz 2008-12-06 09:11:18 UTC
I am back on my Raptor150s for now. I just prefer fast disks and fast access times.

Comment 46 Brian Rademacher 2008-12-10 22:43:48 UTC
Disabling write caching on the drives apparently does not entirely resolve this issue.  I got it again last night:

ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata2.00: cmd 60/18:00:b3:f8:ba/00:00:00:00:00/40 tag 0 ncq 12288 in
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: max_sectors limited to 256 for NCQ
ata2.00: max_sectors limited to 256 for NCQ
ata2.00: configured for UDMA/133
ata2: EH complete

I'll take one a month over one every few minutes, though.

We'll just have to see how Mark's errata implementation goes...

Comment 47 Justin Piszcz 2008-12-10 22:49:40 UTC
I replaced my (12) Velociraptors with (12) Raptor150s, not a single error.

I suggest trying other drives, if you can.

Comment 48 Rainer Traut 2008-12-28 21:02:27 UTC
I'm seeing the same errors on a Fujitsu Siemens Econel 50 server on EL5 U2 running kernel 2.6.18-92.1.22.el5.

It had been running EL4 for two years without a problem.

HW: Intel ICH6R in AHCI mode

Comment 49 jas 2009-01-06 16:54:44 UTC
My comment only applies indirectly ...

I'm running RHEL 4, kernel 2.6.9-67.0.15.EL and recently got:

Dec 28 06:31:02 forest kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen 
Dec 28 06:31:02 forest kernel: ata1.00: cmd ca/00:10:76:0c:43/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 out 
Dec 28 06:31:02 forest kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) 
Dec 28 06:31:09 forest kernel: ata1: port is slow to respond, please be patient (Status 0xd0) 
Dec 28 06:31:32 forest kernel: ata1: port failed to respond (30 secs, Status 0xd0) 
Dec 28 06:31:32 forest kernel: ata1: soft resetting port 
Dec 28 06:32:02 forest kernel: ata1.00: qc timeout (cmd 0xec) 
Dec 28 06:32:02 forest kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) 
Dec 28 06:32:02 forest kernel: ata1.00: revalidation failed (errno=-5) 
Dec 28 06:32:02 forest kernel: ata1: failed to recover some devices, retrying in 5 secs 
Dec 28 06:32:14 forest kernel: ata1: port is slow to respond, please be patient (Status 0xd0) 
Dec 28 06:32:37 forest kernel: ata1: port failed to respond (30 secs, Status 0xd0) 
Dec 28 06:32:37 forest kernel: ata1: soft resetting port 
Dec 28 06:33:07 forest kernel: ata1.00: qc timeout (cmd 0xec) 
Dec 28 06:33:07 forest kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) 
Dec 28 06:33:07 forest kernel: ata1.00: revalidation failed (errno=-5) 
Dec 28 06:33:07 forest kernel: ata1: failed to recover some devices, retrying in 5 secs 
Dec 28 06:33:19 forest kernel: ata1: port is slow to respond, please be patient (Status 0xd0) 
Dec 28 06:33:42 forest kernel: ata1: port failed to respond (30 secs, Status 0xd0) 
Dec 28 06:33:42 forest kernel: ata1: soft resetting port 
Dec 28 06:34:13 forest kernel: ata1.00: qc timeout (cmd 0xec) 
Dec 28 06:34:13 forest kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) 
Dec 28 06:34:13 forest kernel: ata1.00: revalidation failed (errno=-5) 
Dec 28 06:34:13 forest kernel: ata1.00: disabled 
Dec 28 06:34:13 forest kernel: ata1: EH complete 

This is just one disk, no RAID.

Since I rebooted on the 28th, everything has been fine.

I will receive a brand-new disk today (the other one was almost new). I'll run the complete Seagate diagnostics on it, replace the root disk, and then run the complete diagnostics on the old disk, but I doubt the disk is the problem here.

MB: Intel S5000PSL
ata1: SATA max UDMA/133 cmd 0x40C8 ctl 0x40E6 bmdma 0x40A0 irq 193
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ata_piix
  Vendor: ATA       Model: ST3250410AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0

This is just to say that the problem might apply to older kernels as well.
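The log in this comment shows libata's error handler cycling. The command times out and the port is frozen, a soft reset is attempted, and IDENTIFY fails. After the third failed attempt, the device is disabled. When comparing logs across ports or drives, a per-port tally of those events can help. This is a minimal sketch with hypothetical function and event names. The matched substrings are copied from the messages quoted in this report, and other kernel versions may word them differently:

```python
import re
from collections import Counter, defaultdict

# Hypothetical event labels; each pattern matches a libata message quoted
# in this report. Lines without an "ataN" prefix (e.g. "res ...") are skipped.
EVENT_PATTERNS = {
    "frozen": re.compile(r"exception .* frozen"),
    "soft_reset": re.compile(r"soft resetting port"),
    "hard_reset": re.compile(r"hard resetting link"),
    "identify_failed": re.compile(r"failed to IDENTIFY"),
    "disabled": re.compile(r"ata\d+\.\d+: disabled"),
}
PORT_RE = re.compile(r"\b(ata\d+)[.:]")


def tally_eh_events(lines):
    """Count libata error-handling events per port (ata1, ata2, ...)."""
    per_port = defaultdict(Counter)
    for line in lines:
        m = PORT_RE.search(line)
        if not m:
            continue
        for name, pattern in EVENT_PATTERNS.items():
            if pattern.search(line):
                per_port[m.group(1)][name] += 1
    return per_port


# Abridged from the RHEL 4 log above (timestamps and some lines dropped).
SAMPLE = """\
kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
kernel: ata1: soft resetting port
kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
kernel: ata1: soft resetting port
kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
kernel: ata1: soft resetting port
kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
kernel: ata1.00: disabled"""

counts = tally_eh_events(SAMPLE.splitlines())["ata1"]
print(counts["soft_reset"], counts["identify_failed"], counts["disabled"])  # 3 3 1
```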

Comment 50 Alan Cox 2009-01-06 23:31:58 UTC
Your trace is fairly clear:

The drive stops responding
We notice the timeout
It reports 0xD0 (busy)
We reset it
We ask it to identify
It's still wedged.

It is difficult to see how that can be a kernel problem when the drive won't respond to a reset. It could be the PSU (that has been an issue with some systems), but it could also be that the drive firmware went castors up.
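Decoding the status byte shows why 0xD0 reads as busy. 0xD0 sets BSY (0x80), DRDY (0x40) and bit 4 (0x10). While BSY is set, the other status bits are not valid and the host has to wait before issuing another command. BSY still being set after a reset is what "still wedged" looks like at the register level. By contrast, the `res 40/...` lines elsewhere in this report decode to DRDY alone, which matches libata's `status: { DRDY }`. Below is a small sketch with a hypothetical helper name. The bit names follow the classic ATA Status register layout, and several of them are obsolete in later ATA revisions:

```python
# Classic ATA Status register bits, high to low. libata prints some of these
# names in messages such as "status: { DRDY }".
STATUS_BITS = [
    (0x80, "BSY"),   # busy: the other bits are not valid while set
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (obsolete in later ATA revisions)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error: details are in the Error register
]


def decode_status(value):
    """Return the names of the Status bits set in an 8-bit value."""
    if not 0 <= value <= 0xFF:
        raise ValueError("ATA Status is an 8-bit register")
    return [name for bit, name in STATUS_BITS if value & bit]


print(decode_status(0xD0))  # ['BSY', 'DRDY', 'DSC']
print(decode_status(0x40))  # ['DRDY']
```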

Comment 51 Mark Lord 2009-01-12 20:44:20 UTC
The original bug at the top of this report was fixed in 2.6.26.xx --> this was the mv_qc_defer() bug that Tejun found way back then.

The other reports on this bug are for different problems, yet to be sorted out. There do seem to be a number of "timeouts" reported here and elsewhere, with the ATA opcode often being an NCQ R/W ("FPDMA") command or a "FLUSH_CACHE_EXT" command.
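The `cmd` lines in these logs show the ATA taskfile libata sent, so the opcode can be read directly off them. As we read libata's error report, the twelve bytes are laid out as `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`. For NCQ (FPDMA) commands, the sector count sits in the feature registers. That reading agrees with the byte counts libata prints on the same lines: `60/18:...` is 0x18 = 24 sectors = 12288 bytes. This is a sketch under those assumptions with hypothetical names. It handles only NCQ commands and 28-bit commands such as WRITE DMA. 48-bit non-queued commands (READ DMA EXT and similar) would be mis-decoded by the 28-bit branch:

```python
import re

# Opcode names from the ATA command set; only opcodes mentioned in this
# report are listed.
OPCODES = {
    0x60: "READ FPDMA QUEUED",
    0x61: "WRITE FPDMA QUEUED",
    0xCA: "WRITE DMA",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}
NCQ_OPCODES = {0x60, 0x61}

# "cmd CC/FF:NN:L1:L2:L3/HF:HN:H1:H2:H3/DD" -> twelve hex bytes.
CMD_RE = re.compile(r"\bcmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})", re.I)


def decode_cmd(line):
    """Decode the taskfile in a libata 'cmd ...' error line (sketch)."""
    m = CMD_RE.search(line)
    if not m:
        raise ValueError("no libata 'cmd' taskfile found")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     device) = (int(b, 16) for b in re.split(r"[/:]", m.group(1)))
    if command in NCQ_OPCODES:
        # NCQ: sector count in FEATURE (+ HOB FEATURE), tag in NSECT bits 7:3,
        # 48-bit LBA spread over the LBA and HOB LBA registers.
        sectors = ((hob_feature << 8) | feature) or 65536
        tag = nsect >> 3
        lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
               | lbah << 16 | lbam << 8 | lbal)
    else:
        # Assumes a 28-bit command: a count of 0 means 256 sectors, and
        # LBA bits 27:24 come from the low nibble of the device register.
        sectors = nsect or 256
        tag = None
        lba = (device & 0x0F) << 24 | lbah << 16 | lbam << 8 | lbal
    return {
        "opcode": OPCODES.get(command, f"0x{command:02X}"),
        "sectors": sectors,
        "bytes": sectors * 512,
        "lba": lba,
        "tag": tag,
    }


info = decode_cmd("ata2.00: cmd 61/08:00:cb:d5:42/00:00:25:00:00/40 tag 0 ncq 4096 out")
print(info["opcode"], info["bytes"], hex(info["lba"]))  # WRITE FPDMA QUEUED 4096 0x2542d5cb
```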

Apart from that, there's not a lot of useful information yet. I need to see specific kernel versions (kernel.org, not vendor kernels), the exact drive models, and the PCI bus type (e.g., is the 6081 card in a 133MHz/64-bit PCI-X slot, a 33MHz/32-bit PCI slot, or something else?). These chips have a number of quirks that are specific to particular bus types.
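The bus type can be read off `lspci -vv` output. A PCI-X bridge prints a capability line of the form `Secondary Status: 64bit+ 133MHz+ ... Freq=100MHz`. As we read pciutils, `64bit+`/`133MHz+` advertise what the bus supports, and `Freq=` is the mode the bus is actually running in. Matching the controller's bus number against each bridge's `secondary=` value then shows which bridge the card sits behind. The sketch below uses hypothetical helper names, and its sample is trimmed from the listing in Comment 52. The `02:03.0` address is the one from the original report, assuming the controller has not moved since:

```python
import re

# Hypothetical helpers for reading `lspci -vv` text.
DEV_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ", re.I)
BUS_RE = re.compile(r"Bus: primary=[0-9a-f]{2}, secondary=([0-9a-f]{2})", re.I)
# Case-sensitive on purpose: only the PCI-X capability line reads
# "Secondary Status: 64bit..."; the generic one is "Secondary status:".
PCIX_RE = re.compile(r"Secondary Status: 64bit([+-]) 133MHz([+-]).*Freq=(\S+)")


def pcix_bridges(lspci_vv):
    """Map bridge address -> PCI-X secondary bus details."""
    bridges, current = {}, None
    for line in lspci_vv.splitlines():
        m = DEV_RE.match(line)
        if m:
            current = m.group(1)
            continue
        if current is None:
            continue
        m = BUS_RE.search(line)
        if m:
            bridges.setdefault(current, {})["secondary"] = m.group(1).lower()
            continue
        m = PCIX_RE.search(line)
        if m:
            bridges.setdefault(current, {}).update(
                bit64=m.group(1) == "+", mhz133=m.group(2) == "+", freq=m.group(3))
    # Drop bridges that never printed a PCI-X capability line.
    return {addr: info for addr, info in bridges.items() if "freq" in info}


def bridge_for(bridges, device_addr):
    """Return (bridge, info) whose secondary bus matches device_addr's bus."""
    bus = device_addr.split(":")[0].lower()
    for addr, info in bridges.items():
        if info.get("secondary") == bus:
            return addr, info
    return None


SAMPLE = """\
00:02.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge (prog-if 00 [Normal decode])
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode])
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=32
        Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        Capabilities: [a0] PCI-X bridge device
                Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz"""

print(bridge_for(pcix_bridges(SAMPLE), "02:03.0"))
# ('00:0a.0', {'secondary': '02', 'bit64': True, 'mhz133': True, 'freq': '100MHz'})
```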

Scream now, and you'll be heard!

-Mark

Comment 52 Brian Rademacher 2009-01-12 23:24:38 UTC
(room goes silent - Marvell owners bow down in the presence of Mark Lord)

Mine are all the same. Here are the last 3 errors I got:

Jan 11 14:12:56 radfiles kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Jan 11 14:12:56 radfiles kernel: ata2.00: cmd 61/08:00:cb:d5:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Jan 11 14:12:56 radfiles kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 11 14:12:56 radfiles kernel: ata2.00: status: { DRDY }
Jan 11 14:12:56 radfiles kernel: ata2: hard resetting link
Jan 11 14:12:56 radfiles kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 11 14:12:56 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ
Jan 11 14:12:56 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ
Jan 11 14:12:56 radfiles kernel: ata2.00: configured for UDMA/133
Jan 11 14:12:56 radfiles kernel: ata2: EH complete
Jan 11 14:12:56 radfiles kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
Jan 11 14:12:56 radfiles kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jan 11 14:12:56 radfiles kernel: sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA


Jan 11 14:15:02 radfiles kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Jan 11 14:15:02 radfiles kernel: ata2.00: cmd 61/08:00:cb:d5:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Jan 11 14:15:02 radfiles kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 11 14:15:02 radfiles kernel: ata2.00: status: { DRDY }
Jan 11 14:15:02 radfiles kernel: ata2: hard resetting link
Jan 11 14:15:03 radfiles kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 11 14:15:03 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ
Jan 11 14:15:03 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ
Jan 11 14:15:03 radfiles kernel: ata2.00: configured for UDMA/133
Jan 11 14:15:03 radfiles kernel: ata2: EH complete
Jan 11 14:15:03 radfiles kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
Jan 11 14:15:03 radfiles kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jan 11 14:15:03 radfiles kernel: sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA


Jan 11 14:26:03 radfiles kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Jan 11 14:26:03 radfiles kernel: ata2.00: cmd 60/08:00:3b:aa:47/00:00:00:00:00/40 tag 0 ncq 4096 in
Jan 11 14:26:03 radfiles kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 11 14:26:03 radfiles kernel: ata2.00: status: { DRDY }
Jan 11 14:26:03 radfiles kernel: ata2: hard resetting link
Jan 11 14:26:03 radfiles kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jan 11 14:26:03 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ
Jan 11 14:26:03 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ
Jan 11 14:26:03 radfiles kernel: ata2.00: configured for UDMA/133
Jan 11 14:26:03 radfiles kernel: ata2: EH complete
Jan 11 14:26:03 radfiles kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
Jan 11 14:26:03 radfiles kernel: sd 1:0:0:0: [sdb] Write Protect is off
Jan 11 14:26:03 radfiles kernel: sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA


uname -a:
Linux radfiles.net 2.6.27.9-159.fc10.x86_64 #1 SMP Tue Dec 16 14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
(is there something more I can do here to get you more specific information?)

lspci -vv:

00:02.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fb000000-fcffffff
        Prefetchable memory behind bridge: e2000000-e20fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [b0] HyperTransport: Slave or Primary Interface
                Command: BaseUnitID=3 UnitCnt=1 MastHost- DefDir- DUL-
                Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
                Link Config 0: MLWI=8bit DwFcIn- MLWO=8bit DwFcOut- LWI=8bit DwFcInEn- LWO=8bit DwFcOutEn-
                Link Control 1: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b-
                Link Config 1: MLWI=8bit DwFcIn- MLWO=8bit DwFcOut- LWI=8bit DwFcInEn- LWO=8bit DwFcOutEn-
                Revision ID: 1.04
                Link Frequency 0: 200MHz
                Link Error 0: <Prot- <Ovfl- <EOC- CTLTm-
                Link Frequency Capability 0: 200MHz+ 300MHz+ 400MHz+ 500MHz- 600MHz- 800MHz- 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend-
                Feature Capability: IsocFC- LDTSTOP+ CRCTM- ECTLT- 64bA- UIDRD-
                Link Frequency 1: 200MHz
                Link Error 1: <Prot- <Ovfl- <EOC- CTLTm-
                Link Frequency Capability 1: 200MHz- 300MHz- 400MHz- 500MHz- 600MHz- 800MHz- 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend-
                Error Handling: PFlE- OFlE- PFE- OFE- EOCFE- RFE- CRCFE- SERRFE- CF- RE- PNFE- ONFE- EOCNFE- RNFE- CRCNFE- SERRNFE-
                Prefetchable memory behind bridge Upper: 00-00
                Bus Number: 00
        Capabilities: [f0] HyperTransport: Interrupt Discovery and Configuration
        Kernel modules: shpchp

00:03.0 ISA bridge: ALi Corporation M1563 HyperTransport South Bridge (rev 20)
        Subsystem: Device 19d5:2203
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0 (250ns min, 6000ns max)

00:03.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
        Subsystem: Device 19d5:2203
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Kernel modules: alim7101_wdt

00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=32
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fd000000-fd0fffff
        Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [a0] PCI-X bridge device
                Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz
                Status: Dev=00:0a.0 64bit+ 133MHz+ SCD- USC- SCO- SRD-
                Upstream: Capacity=14 CommitmentLimit=65535
                Downstream: Capacity=2 CommitmentLimit=65535
        Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration
        Capabilities: [c0] HyperTransport: Slave or Primary Interface
                !!! Possibly incomplete decoding
                Command: BaseUnitID=10 UnitCnt=2 MastHost- DefDir-
                Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0
                Link Config 0: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit
                Link Control 1: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0
                Link Config 1: MLWI=8bit MLWO=8bit LWI=8bit LWO=8bit
                Revision ID: 1.02
        Kernel modules: shpchp

00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01) (prog-if 10 [IO-APIC])
        Subsystem: Device 19d5:2203
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 0: Memory at febfe000 (64-bit, non-prefetchable) [size=4K]

00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32
        Bus: primary=00, secondary=03, subordinate=03, sec-latency=32
        Memory behind bridge: fd100000-fd1fffff
        Prefetchable memory behind bridge: 00000000e2100000-00000000e21fffff
        Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [a0] PCI-X bridge device
                Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz
                Status: Dev=00:0b.0 64bit+ 133MHz+ SCD- USC- SCO- SRD-
                Upstream: Capacity=14 CommitmentLimit=65535
                Downstream: Capacity=2 CommitmentLimit=65535
        Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration
        Kernel modules: shpchp

00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01) (prog-if 10 [IO-APIC])
        Subsystem: Device 19d5:2203
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 0: Memory at febff000 (64-bit, non-prefetchable) [size=4K]

00:0e.0 IDE interface: ALi Corporation M5229 IDE (rev c5) (prog-if fa)
        Subsystem: Device 19d5:2203
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32
        Interrupt: pin A routed to IRQ 19
        Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [disabled] [size=8]
        Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [disabled] [size=1]
        Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [disabled] [size=8]
        Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [disabled] [size=1]
        Region 4: I/O ports at f000 [size=16]
        Kernel driver in use: pata_ali
        Kernel modules: pata_ali, pata_acpi, ata_generic

00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Capabilities: [80] HyperTransport: Host or Secondary Interface
                !!! Possibly incomplete decoding
                Command: WarmRst+ DblEnd-
                Link Control: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0
                Link Config: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit
                Revision ID: 1.02
        Capabilities: [a0] HyperTransport: Host or Secondary Interface
                !!! Possibly incomplete decoding
                Command: WarmRst+ DblEnd-
                Link Control: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0
                Link Config: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit
                Revision ID: 1.02
        Capabilities: [c0] HyperTransport: Host or Secondary Interface
                !!! Possibly incomplete decoding
                Command: WarmRst+ DblEnd-
                Link Control: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0
                Link Config: MLWI=16bit MLWO=16bit LWI=N/C LWO=N/C
                Revision ID: 1.02

00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Kernel driver in use: k8temp
        Kernel modules: k8temp

00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Capabilities: [80] HyperTransport: Host or Secondary Interface
                !!! Possibly incomplete decoding
                Command: WarmRst+ DblEnd-
                Link Control: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0
                Link Config: MLWI=16bit MLWO=16bit LWI=N/C LWO=N/C
                Revision ID: 1.02
        Capabilities: [a0] HyperTransport: Host or Secondary Interface
                !!! Possibly incomplete decoding
                Command: WarmRst+ DblEnd-
                Link Control: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0
                Link Config: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit
                Revision ID: 1.02
        Capabilities: [c0] HyperTransport: Host or Secondary Interface
                !!! Possibly incomplete decoding
                Command: WarmRst+ DblEnd-
                Link Control: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0
                Link Config: MLWI=16bit MLWO=16bit LWI=N/C LWO=N/C
                Revision ID: 1.02

00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Kernel driver in use: k8temp
        Kernel modules: k8temp

01:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA controller])
        Subsystem: ATI Technologies Inc Rage XL
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (2000ns min), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 7
        Region 0: Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: I/O ports at d000 [size=256]
        Region 2: Memory at fc020000 (32-bit, non-prefetchable) [size=4K]
        [virtual] Expansion ROM at e2000000 [disabled] [size=128K]
        Capabilities: [5c] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Kernel modules: atyfb

02:03.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at fd000000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at e000 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Count=1/1 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=02:03.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
        Kernel driver in use: sata_mv
        Kernel modules: sata_mv

03:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
        Subsystem: ABIT Computer Corp. Device 2202
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (16000ns min), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 31
        Region 0: Memory at fd100000 (64-bit, non-prefetchable) [size=64K]
        Region 2: Memory at fd110000 (64-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at e2100000 [disabled] [size=64K]
        Capabilities: [40] PCI-X non-bridge device
                Command: DPERE- ERO+ RBC=512 OST=1
                Status: Dev=03:04.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Count=1/8 Enable-
                Address: 24100073000144a4  Data: 10d0
        Kernel driver in use: tg3
        Kernel modules: tg3

03:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
        Subsystem: ABIT Computer Corp. Device 2202
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 32 (16000ns min), Cache Line Size: 32 bytes
        Interrupt: pin B routed to IRQ 28
        Region 0: Memory at fd120000 (64-bit, non-prefetchable) [size=64K]
        Region 2: Memory at fd130000 (64-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at e2110000 [disabled] [size=64K]
        Capabilities: [40] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=2048 OST=1
                Status: Dev=03:04.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Count=1/8 Enable-
                Address: 2c02d024720c49a0  Data: 5103
        Kernel driver in use: tg3
        Kernel modules: tg3



(write caching forced off on all drives using hdparm)

/dev/sda:

 Model=ST3320620AS                             , FwRev=3.AAM   , SerialNo=            5QF3T3XP
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=disabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

/dev/sdb:

 Model=ST3320620AS                             , FwRev=3.AAM   , SerialNo=            5QF3V2C3
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=disabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

/dev/sdc:

 Model=ST3320620AS                             , FwRev=3.AAM   , SerialNo=            5QF3T3YM
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=disabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

/dev/sdd:

 Model=ST3320620AS                             , FwRev=3.AAM   , SerialNo=            5QF3RA0R
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=disabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

/dev/sde:

 Model=ST3320620AS                             , FwRev=3.AAM   , SerialNo=            9QFAH509
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=no WriteCache=disabled
 Drive conforms to: Unspecified:  ATA/ATAPI-1,2,3,4,5,6,7

/proc/mdstat:
md2 : active raid1 sdc2[0] sdd2[1]
      1052160 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sde1[4](S) sdd1[3] sdc1[2] sdb1[1]
      64128 blocks [4/4] [UUUU]

md1 : active raid1 sda2[0] sde2[2](S) sdb2[1]
      1052160 blocks [2/2] [UU]

md3 : active raid5 sda3[0] sde3[4] sdd3[3] sdc3[2] sdb3[1]
      1245807616 blocks level 5, 256k chunk, algorithm 2 [5/5] [UUUUU]


(part of dmesg showing sata_mv ver)
sata_mv 0000:02:03.0: version 1.24
sata_mv 0000:02:03.0: PCI INT A -> GSI 26 (level, low) -> IRQ 26
sata_mv 0000:02:03.0: Gen-II 32 slots 8 ports SCSI mode IRQ via INTx
scsi0 : sata_mv
scsi1 : sata_mv
scsi2 : sata_mv
scsi3 : sata_mv
scsi4 : sata_mv
scsi5 : sata_mv
scsi6 : sata_mv
scsi7 : sata_mv
ata1: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd022000 irq 26
ata2: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd024000 irq 26
ata3: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd026000 irq 26
ata4: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd028000 irq 26
ata5: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd032000 irq 26
ata6: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd034000 irq 26
ata7: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd036000 irq 26
ata8: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd038000 irq 26


As I said in my email to you, let me know if there is anything I can do to assist.  I can only imagine how difficult things like this are to track down...

Comment 53 Mark Lord 2009-01-13 21:09:33 UTC
That's great information, thanks.

Now, there may be multiple issues here, but I have found one possible cause of the reported behaviour.  Brian's info above indicates that we are losing an NCQ interrupt somehow, from time to time.

So I spent this afternoon nitpicking and bitpicking through the interrupt code in sata_mv.c, and I believe I found a race on the hc_irq_cause register.  The code was "helpfully" attempting to use read-modify-write to clear individual port bits there, but this is impossible to do in a race-free fashion.

So... the obvious fix is to just write the bits being cleared, without touching anything else.  This will also be faster, since no read is required or desired.  I really don't see a downside, as long as it actually works!  :)

Patch to be attached here for trial use only.  I still need to run it past Marvell as well as the linux-ide development list.

Cheers

Comment 54 Mark Lord 2009-01-13 21:11:11 UTC
Created attachment 328914 [details]
Patch for 2.6.28: sata_mv: remove update races from hc_irq_cause register

Try and report back.  This bug should be affecting all users of sata_mv, so anyone on the wire could help by testing it and posting results here.

Thanks

Comment 55 Mark Lord 2009-01-14 22:01:16 UTC
Okay, FOUND IT!

But first.. a very important question:  Has anyone ever seen the timeouts on ports 4,5,6,7 of the 6081?  My theory is that this only ever happens on ports 0,1,2,3 -- because that's where I've finally found the bug.

So, please:

(1) tell me if ports 4,5,6,7 have ever given you timeout grief (check your logs if need be; this is important).  Thanks.

(2) regardless, apply the next patch I'm about to attach, which fixes incorrect use of port numbers on the 6081 chip.

(3) run with the patch applied, and report back ASAP.  Once I hear from you folks, I'll feed the patch upstream/backstream, as this is a rather important fix.

Thanks.

Comment 56 Mark Lord 2009-01-14 22:03:04 UTC
Created attachment 329048 [details]
sata_mv: Fix timeouts on Marvell 6081 ports 0..3.

This patch should fix the remaining "timeout" issues for Marvell 6081 chipset users.  Please apply and report back ASAP.

Thanks

Comment 57 Mark Lord 2009-01-14 22:04:22 UTC
By the way, I also suspect that timeouts NEVER happen when:

(1) there are no drives on ports 0..3, OR
(2) there are no drives on ports 4..7.

So if only half of the chip is in use, either the upper or lower half, this bug is probably never seen.

Cheers

Comment 58 Brian Rademacher 2009-01-14 22:07:08 UTC
Old patch didn't work - Failed on boot:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata1.00: cmd 61/08:00:cb:d5:42/00:00:25:00:00/40 tag 0 ncq 4096 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: configured for UDMA/133
ata1: EH complete


And to answer your question (1) - NEVER, and I have ports 0-5 filled, with 0-4
comprising the same software RAID array...

Testing new patch now!

Comment 59 Brian Rademacher 2009-01-14 22:51:07 UTC
I hate to speak prematurely, but IT WORKS!!!  No errors, and I've tried copying quite a bit of data (let alone all of the other server stuff going on in the background), and NOTHING.  This is with write caching enabled, which before would cause errors very frequently.  Although early in the testing, I feel very confident that this is the fix based on how quickly I could get it to fail before...

Thank you, thank you, thank you! (and to Harri Olin on the dev mailing list that mentioned the port issue - that was apparently the key).

It's really nice to see a lengthy bug come together like this and result in something so positive...

Comment 60 Brian Rademacher 2009-01-14 22:52:59 UTC
BTW, I tested only with the new patch and not along with the "remove update races from hc_irq_cause register" patch...

Comment 61 Mark Lord 2009-01-14 23:07:06 UTC
That's fine.  The first patch does not fix the problem, but merely speeds up your system by a fraction of a percent.  :)

-ml

Comment 62 Scott Phelps 2009-01-15 16:30:14 UTC
@mlord Hi, just joining the party here...  I too was seeing this error:

[ 105.430353] sda:<3>ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 135.842355] ata1.00: cmd 60/08:00:00:00:00/00:00:00:00:00/40 tag 0 ncq 4096 in
[ 135.842355] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 135.846352] ata1.00: status: { DRDY }
[ 135.850353] ata1: hard resetting link


...on a Sun x4500 with a Marvell MV88SX6081 controller.  Your first patch, "sata_mv_fix_hc_irq_cause_race", allowed me to boot successfully.

Uptime is 1.5 days on 2.6.28 with only your patch applied. Thanks!

-sp

Comment 63 Mark Lord 2009-01-16 21:13:04 UTC
Okay, we now have lots of confirmations of success (using only the second patch from me), on the 6081 chipset as well as on the 508x 8-port controller.

I believe this bugzilla entry belongs to Jeff Garzik, so he can take it from here.

Cheers

Mark

Comment 64 Erwan Velu 2009-02-02 19:03:11 UTC
Hello,

It sounds like the 2.6.18-92 series is affected by at least the timeout issue on ports 0..3, as it runs sata_mv 1.01 (backported from 2.6.24).

Is there any plan to backport that to the 2.6.18-92 series?

Sincerely,

Comment 65 Stefan Neufeind 2009-02-03 01:06:45 UTC
And RHEL-backports maybe? :-)

Comment 66 Chuck Ebbert 2009-02-04 18:28:58 UTC
Patch is in the queue for 2.6.27.15

Comment 67 bmos 2009-02-18 20:55:03 UTC
Hey all,
I'm here looking for a solution to what may be the same or a similar issue:

 ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
 ata4.01: cmd c8/00:08:c7:d8:ba/00:00:00:00:00/f1 tag 0 dma 4096 in
       res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
 ata4.01: status: { DRDY }
 ata4: soft resetting link
 ata4.01: configured for UDMA/133
 ata4: EH complete

These are not the system volumes (different file systems).

This device was working properly until this issue appeared. The only things that have changed are an updated kernel and a new USB device I had plugged in (a Lexmark printer).

The GUI reported the free space on the devices(?). I logged out and back in, and the system froze.
The system would not shut down in this state.

Hard reset, powered down, removed the added USB printer. (I had noticed the system was booting slower than previously.)
The system came up as normal.
All devices can be mounted and used as required.

So it appears that the issue may be with USB, which is nothing new; this board and chipset are not the best (wonky, to say the least):
ASUS P5LD2

00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub (rev 02)
	Subsystem: Intel Corporation Unknown device 2580
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: <access denied>

00:01.0 PCI bridge: Intel Corporation 82945G/GZ/P/PL PCI Express Root Port (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: cff00000-cfffffff
	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp

00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
	Subsystem: ASUSTeK Computer Inc. Unknown device 8237
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at cfcf8000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: HDA Intel
	Kernel modules: snd-hda-intel

00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp

00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000c000-0000cfff
	Memory behind bridge: cfe00000-cfefffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp

00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 20
	Region 4: I/O ports at 7000 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin B routed to IRQ 17
	Region 4: I/O ports at 7400 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin C routed to IRQ 18
	Region 4: I/O ports at 7800 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin D routed to IRQ 19
	Region 4: I/O ports at 8000 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at cfcff800 (32-bit, non-prefetchable) [size=1K]
	Capabilities: <access denied>
	Kernel driver in use: ehci_hcd
	Kernel modules: ehci-hcd

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
	I/O behind bridge: 0000a000-0000bfff
	Memory behind bridge: cfd00000-cfdfffff
	Prefetchable memory behind bridge: 00000000cc000000-00000000cc0fffff
	Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>

00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: <access denied>
	Kernel modules: iTCO_wdt, intel-rng

00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 22
	Region 0: I/O ports at 01f0 [size=8]
	Region 1: I/O ports at 03f4 [size=1]
	Region 2: I/O ports at 0170 [size=8]
	Region 3: I/O ports at 0374 [size=1]
	Region 4: I/O ports at ffa0 [size=16]
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, ata_piix, pata_acpi

00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: ASUSTeK Computer Inc. Unknown device 2601
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin B routed to IRQ 23
	Region 0: I/O ports at 9800 [size=8]
	Region 1: I/O ports at 9400 [size=4]
	Region 2: I/O ports at 9000 [size=8]
	Region 3: I/O ports at 8800 [size=4]
	Region 4: I/O ports at 8400 [size=16]
	Region 5: Memory at cfcffc00 (32-bit, non-prefetchable) [size=1K]
	Capabilities: <access denied>
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, ata_piix, pata_acpi

00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin B routed to IRQ 23
	Region 4: I/O ports at 0400 [size=32]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c-i801

01:03.0 Mass storage controller: Integrated Technology Express, Inc. ITE 8211F Single Channel UDMA 133 (rev 11)
	Subsystem: ASUSTeK Computer Inc. P5GD1-VW Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64 (2000ns min, 2000ns max)
	Interrupt: pin A routed to IRQ 20
	Region 0: I/O ports at b800 [size=8]
	Region 1: I/O ports at b400 [size=4]
	Region 2: I/O ports at b000 [size=8]
	Region 3: I/O ports at a800 [size=4]
	Region 4: I/O ports at a400 [size=16]
	Expansion ROM at cc000000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: pata_it821x
	Kernel modules: pata_it821x

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
	Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at cfefc000 (64-bit, non-prefetchable) [size=16K]
	Region 2: I/O ports at c800 [size=256]
	Expansion ROM at cfec0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: sky2
	Kernel modules: sky2

04:00.0 VGA compatible controller: ATI Technologies Inc RV515 PRO [Radeon X1300/X1550 Series] (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. EAX1300PRO/TD/256M
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Interrupt: pin A routed to IRQ 5
	Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at cffe0000 (64-bit, non-prefetchable) [size=64K]
	Region 4: I/O ports at e000 [size=256]
	Expansion ROM at cffc0000 [disabled] [size=128K]
	Capabilities: <access denied>

04:00.1 Display controller: ATI Technologies Inc RV515 PRO [Radeon X1300/X1550 Series] (Secondary)
	Subsystem: ASUSTeK Computer Inc. Unknown device 0143
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Region 0: Memory at cfff0000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: <access denied>



dmesg

Initializing cgroup subsys cpuset
Linux version 2.6.27.12-78.2.8.fc9.x86_64 (mockbuild@) (gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC) ) #1 SMP Mon Jan 19 19:25:03 EST 2009
Command line: ro root=/dev/VolGroup00/LogVol00 vga=791 
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000c7f90000 (usable)
 BIOS-e820: 00000000c7f90000 - 00000000c7f9e000 (ACPI data)
 BIOS-e820: 00000000c7f9e000 - 00000000c7fe0000 (ACPI NVS)
 BIOS-e820: 00000000c7fe0000 - 00000000c8000000 (reserved)
 BIOS-e820: 00000000ffb80000 - 0000000100000000 (reserved)
DMI 2.4 present.
AMI BIOS detected: BIOS may corrupt low RAM, working it around.
last_pfn = 0xc7f90 max_arch_pfn = 0x3ffffffff
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
init_memory_mapping
 0000000000 - 00c7e00000 page 2M
 00c7e00000 - 00c7f90000 page 4k
kernel direct mapping tables up to c7f90000 @ 10000-16000
last_map_addr: c7f90000 end: c7f90000
RAMDISK: 37c31000 - 37fefa2c
ACPI: RSDP 000FACA0, 0024 (r2 ACPIAM)
ACPI: XSDT C7F90100, 004C (r1 ������ ��������  7000720 MSFT       97)
ACPI: FACP C7F90290, 00F4 (r3 A_M_I_ OEMFACP   7000720 MSFT       97)
ACPI: DSDT C7F90590, 8391 (r1  A0227 A0227000        0 INTL 20051117)
ACPI: FACS C7F9E000, 0040
ACPI: APIC C7F90390, 0080 (r1 A_M_I_ OEMAPIC   7000720 MSFT       97)
ACPI: SLIC C7F90410, 0176 (r1 ������ ��������  7000720 MSFT       97)
ACPI: OEMB C7F9E040, 0066 (r1 A_M_I_ AMI_OEM   7000720 MSFT       97)
ACPI: MCFG C7F98930, 003C (r1 A_M_I_ OEMMCFG   7000720 MSFT       97)
No NUMA configuration found
Faking a node at 0000000000000000-00000000c7f90000
Bootmem setup node 0 0000000000000000-00000000c7f90000
  NODE_DATA [0000000000014000 - 0000000000028fff]
  bootmap [0000000000029000 -  0000000000041ff7] pages 19
(6 early reservations) ==> bootmem [0000000000 - 00c7f90000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0000200000 - 0000972d2c]    TEXT DATA BSS ==> [0000200000 - 0000972d2c]
  #3 [0037c31000 - 0037fefa2c]          RAMDISK ==> [0037c31000 - 0037fefa2c]
  #4 [000009fc00 - 0000100000]    BIOS reserved ==> [000009fc00 - 0000100000]
  #5 [0000010000 - 0000014000]          PGTABLE ==> [0000010000 - 0000014000]
found SMP MP-table at [ffff8800000ff780] 000ff780
 [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff880001200000-ffff880003dfffff] on node 0
Zone PFN ranges:
  DMA      0x00000010 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00100000
Movable zone start PFN for each node
early_node_map[2] active PFN ranges
    0: 0x00000010 -> 0x0000009f
    0: 0x00000100 -> 0x000c7f90
On node 0 totalpages: 818975
  DMA zone: 1916 pages, LIFO batch:0
  DMA32 zone: 803849 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled)
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 0, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 4 CPUs, 2 hotplug CPUs
PM: Registered nosave memory: 000000000009f000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000e4000
PM: Registered nosave memory: 00000000000e4000 - 0000000000100000
Allocating PCI resources starting at cc000000 (gap: c8000000:37b80000)
PERCPU: Allocating 64928 bytes of per cpu data
NR_CPUS: 64, nr_cpu_ids: 4, nr_node_ids 1
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 805765
Policy zone: DMA32
Kernel command line: ro root=/dev/VolGroup00/LogVol00 vga=791 
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
TSC: PIT calibration confirmed by PMTIMER.
TSC: using PMTIMER calibration value
Detected 2424.936 MHz processor.
Console: colour dummy device 80x25
console [tty0] enabled
Checking aperture...
No AGP bridge found
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
Memory: 3218840k/3276352k available (2850k kernel code, 57060k reserved, 1581k data, 1268k init)
CPA: page pool initialized 1 of 1 pages preallocated
SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
Calibrating delay loop (skipped), value calculated using timer frequency.. 4849.87 BogoMIPS (lpj=2424936)
Security Framework initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys ns
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 2048K
CPU 0/0 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
using mwait in idle threads.
ACPI: Core revision 20080609
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz stepping 02
Using local APIC timer interrupts.
APIC timer calibration result 21651235
Detected 21.651 MHz APIC timer.
Booting processor 1/1 ip 6000
Initializing CPU#1
Calibrating delay using timer specific routine.. 4849.88 BogoMIPS (lpj=2424941)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 2048K
CPU 1/1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU1: Thermal monitoring enabled (TM2)
x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
CPU1: Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz stepping 02
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Brought up 2 CPUs
Total of 2 processors activated (9699.75 BogoMIPS).
sizeof(vma)=176 bytes
sizeof(page)=56 bytes
sizeof(inode)=560 bytes
sizeof(dentry)=208 bytes
sizeof(ext3inode)=760 bytes
sizeof(buffer_head)=104 bytes
sizeof(skbuff)=232 bytes
sizeof(task_struct)=5856 bytes
CPU0 attaching sched-domain:
 domain 0: span 0-1 level MC
  groups: 0 1
  domain 1: span 0-1 level NODE
   groups: 0-1
CPU1 attaching sched-domain:
 domain 0: span 0-1 level MC
  groups: 1 0
  domain 1: span 0-1 level NODE
   groups: 0-1
net_namespace: 1552 bytes
Booting paravirtualized kernel on bare hardware
Time: 14:57:21  Date: 02/18/09
NET: Registered protocol family 16
No dock devices found.
ACPI: bus type pci registered
PCI: Found Intel Corporation 945G/GZ/P/PL Express Memory Controller Hub without MMCONFIG support.
PCI: Using configuration type 1 for base access
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S1 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
pci 0000:00:01.0: PME# disabled
PCI: 0000:00:1b.0 reg 10 64bit mmio: [cfcf8000, cfcfbfff]
pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1b.0: PME# disabled
pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.0: PME# disabled
pci 0000:00:1c.3: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.3: PME# disabled
PCI: 0000:00:1d.0 reg 20 io port: [7000, 701f]
PCI: 0000:00:1d.1 reg 20 io port: [7400, 741f]
PCI: 0000:00:1d.2 reg 20 io port: [7800, 781f]
PCI: 0000:00:1d.3 reg 20 io port: [8000, 801f]
PCI: 0000:00:1d.7 reg 10 32bit mmio: [cfcff800, cfcffbff]
pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold
pci 0000:00:1d.7: PME# disabled
pci 0000:00:1f.0: Force enabled HPET at 0xfed00000
pci 0000:00:1f.0: quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.0: quirk: region 0480-04bf claimed by ICH6 GPIO
PCI: 0000:00:1f.1 reg 10 io port: [0, 7]
PCI: 0000:00:1f.1 reg 14 io port: [0, 3]
PCI: 0000:00:1f.1 reg 18 io port: [0, 7]
PCI: 0000:00:1f.1 reg 1c io port: [0, 3]
PCI: 0000:00:1f.1 reg 20 io port: [ffa0, ffaf]
PCI: 0000:00:1f.2 reg 10 io port: [9800, 9807]
PCI: 0000:00:1f.2 reg 14 io port: [9400, 9403]
PCI: 0000:00:1f.2 reg 18 io port: [9000, 9007]
PCI: 0000:00:1f.2 reg 1c io port: [8800, 8803]
PCI: 0000:00:1f.2 reg 20 io port: [8400, 840f]
PCI: 0000:00:1f.2 reg 24 32bit mmio: [cfcffc00, cfcfffff]
pci 0000:00:1f.2: PME# supported from D3hot
pci 0000:00:1f.2: PME# disabled
PCI: 0000:00:1f.3 reg 20 io port: [400, 41f]
PCI: 0000:04:00.0 reg 10 64bit mmio: [d0000000, dfffffff]
PCI: 0000:04:00.0 reg 18 64bit mmio: [cffe0000, cffeffff]
PCI: 0000:04:00.0 reg 20 io port: [e000, e0ff]
PCI: 0000:04:00.0 reg 30 32bit mmio: [cffc0000, cffdffff]
pci 0000:04:00.0: supports D1
pci 0000:04:00.0: supports D2
PCI: 0000:04:00.1 reg 10 64bit mmio: [cfff0000, cfffffff]
pci 0000:04:00.1: supports D1
pci 0000:04:00.1: supports D2
Pre-1.1 PCIe device detected, disable ASPM for 0000:00:01.0. It can be enabled forcedly with 'pcie_aspm=force'
PCI: bridge 0000:00:01.0 io port: [e000, efff]
PCI: bridge 0000:00:01.0 32bit mmio: [cff00000, cfffffff]
PCI: bridge 0000:00:01.0 64bit mmio pref: [d0000000, dfffffff]
PCI: bridge 0000:00:1c.0 io port: [d000, dfff]
PCI: 0000:02:00.0 reg 10 64bit mmio: [cfefc000, cfefffff]
PCI: 0000:02:00.0 reg 18 io port: [c800, c8ff]
PCI: 0000:02:00.0 reg 30 32bit mmio: [cfec0000, cfedffff]
pci 0000:02:00.0: supports D1
pci 0000:02:00.0: supports D2
pci 0000:02:00.0: PME# supported from D0 D1 D2 D3hot D3cold
pci 0000:02:00.0: PME# disabled
Pre-1.1 PCIe device detected, disable ASPM for 0000:00:1c.3. It can be enabled forcedly with 'pcie_aspm=force'
PCI: bridge 0000:00:1c.3 io port: [c000, cfff]
PCI: bridge 0000:00:1c.3 32bit mmio: [cfe00000, cfefffff]
PCI: 0000:01:03.0 reg 10 io port: [b800, b807]
PCI: 0000:01:03.0 reg 14 io port: [b400, b403]
PCI: 0000:01:03.0 reg 18 io port: [b000, b007]
PCI: 0000:01:03.0 reg 1c io port: [a800, a803]
PCI: 0000:01:03.0 reg 20 io port: [a400, a40f]
PCI: 0000:01:03.0 reg 30 32bit mmio: [cfde0000, cfdfffff]
pci 0000:00:1e.0: transparent bridge
PCI: bridge 0000:00:1e.0 io port: [a000, bfff]
PCI: bridge 0000:00:1e.0 32bit mmio: [cfd00000, cfdfffff]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P3._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P7._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 *4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 *7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs *3 4 5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] - 72, should be 49 [20080609]
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 14 devices
ACPI: ACPI bus type pnp unregistered
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
PCI-GART: No AMD northbridge found.
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 3 64-bit timers, 14318180 Hz
tracer: 1286 pages allocated for 65536 entries of 80 bytes
   actual entries 65586
ACPI: RTC can wake from S4
system 00:01: iomem range 0xfed13000-0xfed19fff has been reserved
system 00:07: ioport range 0x290-0x297 has been reserved
system 00:08: ioport range 0x4d0-0x4d1 has been reserved
system 00:08: ioport range 0x800-0x87f has been reserved
system 00:08: ioport range 0x480-0x4bf has been reserved
system 00:08: ioport range 0x900-0x91f has been reserved
system 00:08: iomem range 0xfed1c000-0xfed1ffff has been reserved
system 00:08: iomem range 0xfed20000-0xfed8ffff has been reserved
system 00:08: iomem range 0xffb00000-0xffbfffff could not be reserved
system 00:08: iomem range 0xfff00000-0xffffffff could not be reserved
system 00:09: iomem range 0xfec00000-0xfec00fff has been reserved
system 00:09: iomem range 0xfee00000-0xfee00fff has been reserved
system 00:0c: iomem range 0xf0000000-0xf3ffffff has been reserved
system 00:0d: iomem range 0x0-0x9ffff could not be reserved
system 00:0d: iomem range 0xc0000-0xdffff has been reserved
system 00:0d: iomem range 0xe0000-0xfffff could not be reserved
system 00:0d: iomem range 0x100000-0xc7ffffff could not be reserved
pci 0000:00:01.0: PCI bridge, secondary bus 0000:04
pci 0000:00:01.0:   IO window: 0xe000-0xefff
pci 0000:00:01.0:   MEM window: 0xcff00000-0xcfffffff
pci 0000:00:01.0:   PREFETCH window: 0x000000d0000000-0x000000dfffffff
pci 0000:00:1c.0: PCI bridge, secondary bus 0000:03
pci 0000:00:1c.0:   IO window: 0xd000-0xdfff
pci 0000:00:1c.0:   MEM window: disabled
pci 0000:00:1c.0:   PREFETCH window: disabled
pci 0000:00:1c.3: PCI bridge, secondary bus 0000:02
pci 0000:00:1c.3:   IO window: 0xc000-0xcfff
pci 0000:00:1c.3:   MEM window: 0xcfe00000-0xcfefffff
pci 0000:00:1c.3:   PREFETCH window: disabled
pci 0000:00:1e.0: PCI bridge, secondary bus 0000:01
pci 0000:00:1e.0:   IO window: 0xa000-0xbfff
pci 0000:00:1e.0:   MEM window: 0xcfd00000-0xcfdfffff
pci 0000:00:1e.0:   PREFETCH window: 0x000000cc000000-0x000000cc0fffff
pci 0000:00:01.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:01.0: setting latency timer to 64
pci 0000:00:1c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:1c.0: setting latency timer to 64
pci 0000:00:1c.3: PCI INT D -> GSI 19 (level, low) -> IRQ 19
pci 0000:00:1c.3: setting latency timer to 64
pci 0000:00:1e.0: setting latency timer to 64
bus: 00 index 0 io port: [0, ffff]
bus: 00 index 1 mmio: [0, ffffffffffffffff]
bus: 04 index 0 io port: [e000, efff]
bus: 04 index 1 mmio: [cff00000, cfffffff]
bus: 04 index 2 mmio: [d0000000, dfffffff]
bus: 04 index 3 mmio: [0, 0]
bus: 03 index 0 io port: [d000, dfff]
bus: 03 index 1 mmio: [0, 0]
bus: 03 index 2 mmio: [0, 0]
bus: 03 index 3 mmio: [0, 0]
bus: 02 index 0 io port: [c000, cfff]
bus: 02 index 1 mmio: [cfe00000, cfefffff]
bus: 02 index 2 mmio: [0, 0]
bus: 02 index 3 mmio: [0, 0]
bus: 01 index 0 io port: [a000, bfff]
bus: 01 index 1 mmio: [cfd00000, cfdfffff]
bus: 01 index 2 mmio: [cc000000, cc0fffff]
bus: 01 index 3 io port: [0, ffff]
bus: 01 index 4 mmio: [0, ffffffffffffffff]
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
NET: Registered protocol family 1
checking if image is initramfs... it is
Freeing initrd memory: 3834k freed
audit: initializing netlink socket (disabled)
type=2000 audit(1234969041.423:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
msgmni has been set to 6294
SELinux:  Registering netfilter hooks
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci 0000:04:00.0: Boot video device
pcieport-driver 0000:00:01.0: setting latency timer to 64
pcieport-driver 0000:00:01.0: found MSI capability
pci_express 0000:00:01.0:pcie00: allocate port service
pcieport-driver 0000:00:1c.0: setting latency timer to 64
pcieport-driver 0000:00:1c.0: found MSI capability
pci_express 0000:00:1c.0:pcie00: allocate port service
pci_express 0000:00:1c.0:pcie02: allocate port service
pcieport-driver 0000:00:1c.3: setting latency timer to 64
pcieport-driver 0000:00:1c.3: found MSI capability
pci_express 0000:00:1c.3:pcie00: allocate port service
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
vesafb: framebuffer at 0xd0000000, mapped to 0xffffc20001080000, using 3072k, total 16384k
vesafb: mode is 1024x768x16, linelength=2048, pages=9
vesafb: scrolling: redraw
vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
input: Power Button (FF) as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: Power Button (FF) [PWRF]
input: Power Button (CM) as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input1
ACPI: Power Button (CM) [PWRB]
ACPI: SSDT C7F9E0B0, 01C6 (r1    AMI   CPU1PM        1 INTL 20051117)
processor ACPI0007:00: registered as cooling_device0
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: SSDT C7F9E280, 013A (r1    AMI   CPU2PM        1 INTL 20051117)
processor ACPI0007:01: registered as cooling_device1
ACPI: Processor [CPU2] (supports 8 throttling states)
Non-volatile memory driver v1.2
Linux agpgart interface v0.103
Serial: 8250/16550 driver4 ports, IRQ sharing enabled
brd: module loaded
input: Macintosh mouse button emulation as /devices/virtual/input/input2
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 KBD port at 0x60,0x64 irq 1
serio: i8042 AUX port at 0x60,0x64 irq 12
mice: PS/2 mouse device common for all mice
rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one month, hpet irqs
cpuidle: using governor ladder
cpuidle: using governor menu
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
  Magic number: 13:254:991
Freeing unused kernel memory: 1268k freed
Write protecting the kernel read-only data: 4060k
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20
ehci_hcd 0000:00:1d.7: setting latency timer to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: debug port 1
ehci_hcd 0000:00:1d.7: cache line size of 32 is not supported
ehci_hcd 0000:00:1d.7: irq 20, io mem 0xcfcff800
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 ehci_hcd
usb usb1: SerialNumber: 0000:00:1d.7
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
USB Universal Host Controller Interface driver v3.0
uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
uhci_hcd 0000:00:1d.0: setting latency timer to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 20, io base 0x00007000
usb usb2: configuration #1 chosen from 1 choice
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: UHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 uhci_hcd
usb usb2: SerialNumber: 0000:00:1d.0
uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
uhci_hcd 0000:00:1d.1: setting latency timer to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 17, io base 0x00007400
usb usb3: configuration #1 chosen from 1 choice
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 uhci_hcd
usb usb3: SerialNumber: 0000:00:1d.1
uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
uhci_hcd 0000:00:1d.2: setting latency timer to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.2: irq 18, io base 0x00007800
usb usb4: configuration #1 chosen from 1 choice
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
usb 3-2: new low speed USB device using uhci_hcd and address 2
usb usb4: New USB device found, idVendor=1d6b, idProduct=0001
usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: UHCI Host Controller
usb usb4: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 uhci_hcd
usb usb4: SerialNumber: 0000:00:1d.2
uhci_hcd 0000:00:1d.3: PCI INT D -> GSI 19 (level, low) -> IRQ 19
uhci_hcd 0000:00:1d.3: setting latency timer to 64
uhci_hcd 0000:00:1d.3: UHCI Host Controller
uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.3: irq 19, io base 0x00008000
usb usb5: configuration #1 chosen from 1 choice
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
usb 3-2: configuration #1 chosen from 1 choice
input: Logitech USB Receiver as /devices/pci0000:00/0000:00:1d.1/usb3/3-2/3-2:1.0/input/input3
usb usb5: New USB device found, idVendor=1d6b, idProduct=0001
usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb5: Product: UHCI Host Controller
usb usb5: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 uhci_hcd
usb usb5: SerialNumber: 0000:00:1d.3
input,hidraw0: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:1d.1-2
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel
input: Logitech USB Receiver as /devices/pci0000:00/0000:00:1d.1/usb3/3-2/3-2:1.1/input/input4
input,hidraw1: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:1d.1-2
usb 3-2: New USB device found, idVendor=046d, idProduct=c505
usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 3-2: Product: USB Receiver
usb 3-2: Manufacturer: Logitech
SCSI subsystem initialized
Driver 'sd' needs updating - please use bus_type methods
libata version 3.00 loaded.
pata_acpi 0000:00:1f.1: PCI INT A -> GSI 22 (level, low) -> IRQ 22
pata_acpi 0000:00:1f.1: setting latency timer to 64
pata_acpi 0000:00:1f.1: PCI INT A disabled
pata_acpi 0000:00:1f.2: PCI INT B -> GSI 23 (level, low) -> IRQ 23
pata_acpi 0000:00:1f.2: setting latency timer to 64
pata_acpi 0000:00:1f.2: PCI INT B disabled
ata_piix 0000:00:1f.1: version 2.12
ata_piix 0000:00:1f.1: PCI INT A -> GSI 22 (level, low) -> IRQ 22
ata_piix 0000:00:1f.1: setting latency timer to 64
scsi0 : ata_piix
scsi1 : ata_piix
ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
ata1.00: ATA-7: WDC WD2500JB-00REA0, 20.00K20, max UDMA/100
ata1.00: 488397168 sectors, multi 16: LBA48 
ata1.01: ATAPI: HL-DT-ST DVDRAM GSA-H42N, RL01, max UDMA/66
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/66
isa bounce pool size: 16 pages
scsi 0:0:0:0: Direct-Access     ATA      WDC WD2500JB-00R 20.0 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 0:0:1:0: CD-ROM            HL-DT-ST DVDRAM GSA-H42N  RL01 PQ: 0 ANSI: 5
ata_piix 0000:00:1f.2: PCI INT B -> GSI 23 (level, low) -> IRQ 23
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ata_piix 0000:00:1f.2: setting latency timer to 64
scsi2 : ata_piix
scsi3 : ata_piix
ata3: SATA max UDMA/133 cmd 0x9800 ctl 0x9400 bmdma 0x8400 irq 23
ata4: SATA max UDMA/133 cmd 0x9000 ctl 0x8800 bmdma 0x8408 irq 23
ata4.01: ATA-7: ST3320620AS, 3.AAK, max UDMA/133
ata4.01: 625142448 sectors, multi 16: LBA48 NCQ (depth 0/32)
ata4.01: configured for UDMA/133
scsi 3:0:1:0: Direct-Access     ATA      ST3320620AS      3.AA PQ: 0 ANSI: 5
sd 3:0:1:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 3:0:1:0: [sdb] Write Protect is off
sd 3:0:1:0: [sdb] Mode Sense: 00 3a 00 00
sd 3:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 3:0:1:0: [sdb] 625142448 512-byte hardware sectors (320073 MB)
sd 3:0:1:0: [sdb] Write Protect is off
sd 3:0:1:0: [sdb] Mode Sense: 00 3a 00 00
sd 3:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2
sd 3:0:1:0: [sdb] Attached SCSI disk
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
type=1404 audit(1234969056.599:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295
SELinux: 8192 avtab hash slots, 177506 rules.
SELinux: 8192 avtab hash slots, 177506 rules.
SELinux:  8 users, 12 roles, 2428 types, 118 bools, 1 sens, 1024 cats
SELinux:  73 classes, 177506 rules
SELinux:  Completing initialization.
SELinux:  Setting up existing superblocks.
SELinux: initialized (dev dm-1, type ext3), uses xattr
SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts
SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs
SELinux: initialized (dev hugetlbfs, type hugetlbfs), uses genfs_contexts
SELinux: initialized (dev devpts, type devpts), uses transition SIDs
SELinux: initialized (dev inotifyfs, type inotifyfs), uses genfs_contexts
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
SELinux: initialized (dev anon_inodefs, type anon_inodefs), uses genfs_contexts
SELinux: initialized (dev pipefs, type pipefs), uses task SIDs
SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts
SELinux: initialized (dev sockfs, type sockfs), uses task SIDs
SELinux: initialized (dev proc, type proc), uses genfs_contexts
SELinux: initialized (dev bdev, type bdev), uses genfs_contexts
SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts
SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts
type=1403 audit(1234969056.909:3): policy loaded auid=4294967295 ses=4294967295
sky2 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
sky2 0000:02:00.0: setting latency timer to 64
sky2 0000:02:00.0: v1.22 addr 0xcfefc000 irq 19 Yukon-2 EC rev 2
sky2 eth0: addr 00:18:f3:1a:33:c9
intel_rng: FWH not detected
sd 0:0:0:0: Attached scsi generic sg0 type 0
scsi 0:0:1:0: Attached scsi generic sg1 type 5
sd 3:0:1:0: Attached scsi generic sg2 type 0
Driver 'sr' needs updating - please use bus_type methods
sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 0:0:1:0: Attached scsi CD-ROM sr0
pata_it821x 0000:01:03.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
pata_it821x: controller in pass through mode.
pata_it821x 0000:01:03.0: setting latency timer to 64
scsi4 : pata_it821x
scsi5 : pata_it821x
ata5: PATA max UDMA/133 cmd 0xb800 ctl 0xb400 bmdma 0xa400 irq 20
ata6: PATA max UDMA/133 cmd 0xb000 ctl 0xa800 bmdma 0xa408 irq 20
iTCO_vendor_support: vendor-support=0
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.03 (30-Apr-2008)
iTCO_wdt: Found a ICH7 or ICH7R TCO device (Version=2, TCOBASE=0x0860)
iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
gameport: NS558 PnP Gameport is pnp00:0a/gameport0, io 0x200, speed 826kHz
input: PC Speaker as /devices/platform/pcspkr/input/input5
i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23
ACPI: I/O resource 0000:00:1f.3 [0x400-0x41f] conflicts with ACPI region SMRG [0x400-0x40f]
ACPI: Device needs an ACPI driver
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
HDA Intel 0000:00:1b.0: setting latency timer to 64
hda_codec: Unknown model for ALC883, trying auto-probe from BIOS...
ALSA sound/pci/hda/hda_codec.c:3021: autoconfig: line_outs=4 (0x14/0x15/0x16/0x17/0x0)
ALSA sound/pci/hda/hda_codec.c:3025:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
ALSA sound/pci/hda/hda_codec.c:3029:    hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
ALSA sound/pci/hda/hda_codec.c:3030:    mono: mono_out=0x0
ALSA sound/pci/hda/hda_codec.c:3038:    inputs: mic=0x18, fmic=0x19, line=0x1a, fline=0x0, cd=0x0, aux=0x0
device-mapper: multipath: version 1.0.5 loaded
EXT3 FS on dm-1, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on dm-2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev dm-2, type ext3), uses xattr
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev sda1, type ext3), uses xattr
SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
Adding 2031608k swap on /dev/mapper/VolGroup00-LogVol01.  Priority:-1 extents:1 across:2031608k
SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts
IA-32 Microcode Update Driver: v1.14a <tigran.co.uk>
firmware: requesting intel-ucode/06-0f-02
firmware: requesting intel-ucode/06-0f-02
microcode: CPU0 updated from revision 0x51 to 0x5a, date = 09262007 
microcode: CPU1 updated from revision 0x51 to 0x5a, date = 09262007 
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
ip6_tables: (C) 2000-2006 Netfilter Core Team
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Plase use
nf_conntrack.acct=1 kernel paramater, acct=1 nf_conntrack module option or
sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
ip_tables: (C) 2000-2006 Netfilter Core Team
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
SELinux: initialized (dev rpc_pipefs, type rpc_pipefs), uses genfs_contexts
warning: `dbus-daemon' uses deprecated v2 capabilities in a way that may be insecure.
sky2 eth0: enabling interface
ADDRCONF(NETDEV_UP): eth0: link is not ready
sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
vboxdrv: Trying to deactivate the NMI watchdog permanently...
vboxdrv: Successfully done.
vboxdrv: Found 2 processor cores.
VBoxDrv: dbg - g_abExecMemory=ffffffffa038c180
vboxdrv: fAsync=0 offMin=0x2d1 offMax=0x1195
vboxdrv: TSC mode is 'synchronous', kernel timer mode is 'normal'.
vboxdrv: Successfully loaded version 2.1.2 (interface 0x000a0009).
VBoxNetFlt: dbg - g_abExecMemory=ffffffffa0526f60
eth0: no IPv6 routers present
fuse init (API version 7.9)
SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
SELinux: initialized (dev sdb1, type fuseblk), uses genfs_contexts
SELinux: initialized (dev sdb2, type fuseblk), uses genfs_contexts

This may not be very helpful, as the suspected device has been removed.

If you need it, I can recreate the issue and submit the info.

Comment 68 Jeff Garzik 2009-02-18 21:10:33 UTC
An ATA driver timeout

   ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
   ata4.01: cmd c8/00:08:c7:d8:ba/00:00:00:00:00/f1 tag 0 dma 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

is a VERY generic diagnostic message.  It could mean anything, and should not be assumed to be associated with any particular bug.

All that means is that a timeout occurred, for unknown reasons.
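The timeout line does not say *why* the command timed out, but its `cmd` field does record *which* command was outstanding. A minimal sketch for decoding that field, assuming the field order libata's error report uses (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and LBA addressing; `decode_cmd` and `Taskfile` are hypothetical helper names, not part of any kernel tool.

```python
from dataclasses import dataclass

# Opcodes seen in the logs in this thread (ATA command set); others print as "unknown".
OPCODES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0xEC: "IDENTIFY DEVICE",
}

# 48-bit (EXT) commands carry extra count/address bytes in the second group.
LBA48_OPCODES = {0x25, 0x35}


@dataclass
class Taskfile:
    command: int
    name: str
    sectors: int  # number of 512-byte sectors requested
    lba: int      # starting logical block address
    drive: int    # 0 = master, 1 = slave (DEV bit of the device register)

    @property
    def nbytes(self) -> int:
        return self.sectors * 512


def decode_cmd(field: str) -> Taskfile:
    """Decode the 'cmd' field of a libata error report (assumed layout, see above)."""
    cmd_hex, low, high, dev_hex = field.split("/")
    command = int(cmd_hex, 16)
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    device = int(dev_hex, 16)

    if command in LBA48_OPCODES:
        # For 48-bit commands a sector count of 0 means 65536.
        sectors = ((hob_nsect << 8) | nsect) or 65536
        lba = (
            (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
            | (lbah << 16) | (lbam << 8) | lbal
        )
    else:
        # 28-bit LBA: the low nibble of the device register holds LBA bits 27:24.
        # Assumes LBA mode (device bit 6 set), as in every log in this thread.
        sectors = nsect or 256
        lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal

    drive = (device >> 4) & 1
    return Taskfile(command, OPCODES.get(command, "unknown"), sectors, lba, drive)


if __name__ == "__main__":
    tf = decode_cmd("c8/00:08:c7:d8:ba/00:00:00:00:00/f1")
    print(tf.name, tf.sectors, tf.nbytes, tf.lba, tf.drive)  # READ DMA 8 4096 29022407 1
```

For the ata4.01 example above this gives a READ DMA of 8 sectors (matching "dma 4096") on drive 1 (matching ".01"). Applied to the `cmd` line in comment 76 it yields WRITE DMA EXT, 1024 sectors at LBA 1660392895, which is exactly the sector the following `end_request: I/O error` line reports; comment 79's log cross-checks the same way (LBA 14076959). So the message identifies the request that was lost, even though it says nothing about the cause.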

Comment 69 Scott Phelps 2009-03-02 03:08:09 UTC
@mlord - just some info you may find useful:

I patched a vanilla stable branch (2.6.28.3) with only the patch that you posted on 1/14/2009:
"sata_mv_fix_timeouts_on_Marvell_6081_ports_0..3"

Current uptime is 24 days!  I've hit this x4500 with very heavy disk and NFS I/O pretty consistently.

My machine has the following components:
SATA
----
0b:01.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
        Subsystem: Marvell Technology Group Ltd. Device 11ab
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 76
        Region 0: Memory at fe000000 (64-bit, non-prefetchable) [size=1M]
        Region 2: I/O ports at dc00 [size=256]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
                Address: 0000000000000000  Data: 0000
        Capabilities: [60] PCI-X non-bridge device
                Command: DPERE- ERO- RBC=512 OST=4
                Status: Dev=0b:01.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
        Kernel driver in use: sata_mv


HDDs (48)
---------
Device Model:     HITACHI HUA7210SASUN1.0T 0830GPLE8E
Serial Number:    GTE002PAKPLE8E
Firmware Version: GKAOA90A
User Capacity:    1,000,204,886,016 bytes
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1


Please let me know if you need more data.  Is this patch queued up for a merge into the stable branch yet?

Comment 70 Scott Phelps 2009-03-02 03:20:50 UTC
Never mind that last question; I see it was merged in 2.6.28.4. ;-)

Cheers,
Scott

Comment 71 Mark Lord 2009-03-02 12:57:44 UTC
It's also now in the latest 2.6.27 kernels.
Dunno if/when it will appear in a RedHat / Fedora kernel.
That part is up to Jeff G., I think.

Cheers

Comment 72 Brian Rademacher 2009-03-03 07:17:28 UTC
(In reply to comment #71)
> It's also now in the latest 2.6.27 kernels.
> Dunno if/when it will appear in a RedHat / Fedora kernel.
> That part is up to Jeff G., I think.
> Cheers

Hopefully soon!  Getting the source and compiling my own driver and redoing initrd on every kernel update is getting a bit old...

2.6.27.12-170.2.5.fc10 just came out today, and nothing yet :(

Comment 73 Brian Rademacher 2009-03-06 20:51:19 UTC
Mark, do you know if your patch could cause the filesystem to disappear?  Since applying this patch (now running kernel 2.6.27.19-170.2.35.fc10.x86_64), I've had two major system crashes where the filesystem just vanishes (I'm guessing).  The hardware itself is still active (i.e., it isn't locking up), but the system becomes totally unresponsive to logins, HTTP requests, etc., and nothing is logged, which is why I'm guessing that the filesystem is going offline...

Comment 74 Mark Lord 2009-03-06 21:00:25 UTC
No, it would not cause that to happen.

Cheers

Comment 75 Brian Rademacher 2009-03-06 22:38:07 UTC
I thought not...I'll start looking at other things...

Comment 76 gijsbert.wiesenekker 2009-04-14 05:27:36 UTC
I am having exactly the same problem with an XFS filesystem residing on a Samsung HD103UJ but with a different controller:

00:1f.2 IDE interface: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3116
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 19
	Region 0: I/O ports at f900 [size=8]
	Region 1: I/O ports at f800 [size=4]
	Region 2: I/O ports at f700 [size=8]
	Region 3: I/O ports at f600 [size=4]
	Region 4: I/O ports at f500 [size=16]
	Region 5: I/O ports at f400 [size=16]
	Capabilities: [70] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [b0] Vendor Specific Information <?>
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, pata_acpi

00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller (rev 02) (prog-if 85 [Master SecO PriO])
	Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3116
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 19
	Region 0: I/O ports at f200 [size=8]
	Region 1: I/O ports at f100 [size=4]
	Region 2: I/O ports at f000 [size=8]
	Region 3: I/O ports at ef00 [size=4]
	Region 4: I/O ports at ee00 [size=16]
	Region 5: I/O ports at ed00 [size=16]
	Capabilities: [70] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [b0] Vendor Specific Information <?>
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, pata_acpi

/var/log/messages shows the error:

Apr 13 21:47:19 xpcsp35p2p131 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 13 21:47:19 xpcsp35p2p131 kernel: ata1.00: cmd 35/00:00:bf:95:f7/00:04:62:00:00/e0 tag 0 dma 524288 out
Apr 13 21:47:19 xpcsp35p2p131 kernel:         res 40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
Apr 13 21:47:19 xpcsp35p2p131 kernel: ata1.00: status: { DRDY }
Apr 13 21:47:19 xpcsp35p2p131 kernel: ata1: hard resetting link
Apr 13 21:47:25 xpcsp35p2p131 kernel: ata1: link is slow to respond, please be patient (ready=0)
Apr 13 21:47:29 xpcsp35p2p131 kernel: ata1: SRST failed (errno=-16)
Apr 13 21:47:29 xpcsp35p2p131 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 13 21:47:34 xpcsp35p2p131 kernel: ata1.00: qc timeout (cmd 0xec)
Apr 13 21:47:34 xpcsp35p2p131 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Apr 13 21:47:34 xpcsp35p2p131 kernel: ata1.00: revalidation failed (errno=-5)
Apr 13 21:47:34 xpcsp35p2p131 kernel: ata1: hard resetting link
Apr 13 21:47:40 xpcsp35p2p131 kernel: ata1: link is slow to respond, please be patient (ready=0)
Apr 13 21:47:44 xpcsp35p2p131 kernel: ata1: SRST failed (errno=-16)
Apr 13 21:47:44 xpcsp35p2p131 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1.00: qc timeout (cmd 0xec)
Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1.00: revalidation failed (errno=-5)
Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1: limiting SATA link speed to 1.5 Gbps
Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1: hard resetting link
Apr 13 21:48:00 xpcsp35p2p131 kernel: ata1: link is slow to respond, please be patient (ready=0)
Apr 13 21:48:05 xpcsp35p2p131 kernel: ata1: SRST failed (errno=-16)
Apr 13 21:48:05 xpcsp35p2p131 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.00: qc timeout (cmd 0xec)
Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.00: revalidation failed (errno=-5)
Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.00: disabled
Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.01: failed to set xfermode (err_mask=0x40)
Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1: hard resetting link
Apr 13 21:48:36 xpcsp35p2p131 ntpd[2540]: kernel time sync status change 0001
Apr 13 21:48:40 xpcsp35p2p131 kernel: ata1: link is slow to respond, please be patient (ready=0)
Apr 13 21:48:45 xpcsp35p2p131 kernel: ata1: SRST failed (errno=-16)
Apr 13 21:48:45 xpcsp35p2p131 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr 13 21:48:45 xpcsp35p2p131 kernel: ata1.01: configured for UDMA/100
Apr 13 21:48:45 xpcsp35p2p131 kernel: ata1: EH complete
Apr 13 21:48:45 xpcsp35p2p131 kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 13 21:48:45 xpcsp35p2p131 kernel: end_request: I/O error, dev sda, sector 1660392895

Kernel is 2.6.27.21-170.2.56.fc10.x86_64

Would this controller need a similar patch?

Regards,
Gijsbert
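The `SATA link up ... (SStatus ... SControl ...)` lines in the comment 76 log record the link-speed step-down that libata performs during repeated resets. A minimal sketch for reading them, assuming the standard SATA SStatus/SControl layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that libata prints both values in hex; the function names are hypothetical.

```python
# SStatus / SControl field layout (SATA SCR registers):
# bits 3:0 = DET, bits 7:4 = SPD, bits 11:8 = IPM.
SPEED = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}
DET_STATUS = {
    0: "no device detected",
    1: "device present, PHY not established",
    3: "device present, PHY established",
    4: "PHY offline",
}
IPM_STATUS = {0: "not present", 1: "active", 2: "partial", 6: "slumber"}


def split_fields(value: int) -> tuple[int, int, int]:
    """Return the (DET, SPD, IPM) nibbles of an SStatus or SControl value."""
    return value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF


def decode_sstatus(text: str) -> dict:
    det, spd, ipm = split_fields(int(text, 16))
    return {
        "det": DET_STATUS.get(det, f"reserved ({det})"),
        "speed": SPEED.get(spd, "not negotiated"),
        "ipm": IPM_STATUS.get(ipm, f"reserved ({ipm})"),
    }


def decode_scontrol(text: str) -> dict:
    det, spd, ipm = split_fields(int(text, 16))
    return {
        # DET = 0 means no COMRESET or PHY disable is being requested.
        "det_request": {0: "none", 1: "COMRESET", 4: "disable PHY"}.get(det, f"reserved ({det})"),
        # SPD = 0 means no speed limit; otherwise the highest allowed speed.
        "speed_limit": "none" if spd == 0 else SPEED.get(spd, f"reserved ({spd})"),
        # IPM bit 0 disables PARTIAL, bit 1 disables SLUMBER power states.
        "ipm_disabled": [name for bit, name in ((1, "partial"), (2, "slumber")) if ipm & bit],
    }


if __name__ == "__main__":
    for ss, sc in (("123", "300"), ("113", "310")):
        print(ss, decode_sstatus(ss), sc, decode_scontrol(sc))
```

Read this way, the log goes from an active 3.0 Gbps link with no speed limit (123/300) to a link capped at 1.5 Gbps (113/310) after the "limiting SATA link speed to 1.5 Gbps" step; in both states SControl's IPM value of 3 means the partial and slumber power states are disabled. The same 113/310 pair appears in the Fedora 15 log in comment 85.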

Comment 77 gijsbert.wiesenekker 2009-04-26 20:18:47 UTC
FYI,

One of the workarounds I found on the internet was to insert a CD into the DVD drive (see also https://bugs.launchpad.net/ubuntu/+bug/104581), and indeed this seems to work! Any ideas why?

Regards,
Gijsbert

Comment 78 Scott Phelps 2009-05-02 14:21:59 UTC
FYI: Patch is in intrepid-proposed:
https://launchpad.net/ubuntu/intrepid/+source/linux/2.6.27-14.33 

See my esteemed colleague's notes on enabling it here:
http://ubuntuforums.org/showthread.php?t=1145513

Comment 79 gijsbert.wiesenekker 2009-05-06 19:33:05 UTC
This problem is getting quite annoying. I am now also getting it on my cluster nodes, with entirely different hardware and an XFS filesystem residing on an SSD:

lspci -vv

00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin B routed to IRQ 19
	Region 0: I/O ports at c080 [size=8]
	Region 1: I/O ports at c000 [size=4]
	Region 2: I/O ports at bc00 [size=8]
	Region 3: I/O ports at b880 [size=4]
	Region 4: I/O ports at b800 [size=16]
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, pata_acpi

/var/log/messages:

May  6 09:39:54 nodep141 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May  6 09:39:54 nodep141 kernel: ata4.00: cmd c8/00:08:1f:cc:d6/00:00:00:00:00/e0 tag 0 dma 4096 in
May  6 09:39:54 nodep141 kernel:         res 40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
May  6 09:39:54 nodep141 kernel: ata4.00: status: { DRDY }
May  6 09:39:54 nodep141 kernel: ata4: soft resetting link
May  6 09:39:59 nodep141 kernel: ata4.00: qc timeout (cmd 0xec)
May  6 09:39:59 nodep141 kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May  6 09:39:59 nodep141 kernel: ata4.00: revalidation failed (errno=-5)
May  6 09:40:04 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0)
May  6 09:40:09 nodep141 kernel: ata4: device not ready (errno=-16), forcing hardreset
May  6 09:40:09 nodep141 kernel: ata4: soft resetting link
May  6 09:40:14 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0)
May  6 09:40:19 nodep141 kernel: ata4: SRST failed (errno=-16)
May  6 09:40:19 nodep141 kernel: ata4: soft resetting link
May  6 09:40:25 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0)
May  6 09:40:29 nodep141 kernel: ata4: SRST failed (errno=-16)
May  6 09:40:29 nodep141 kernel: ata4: soft resetting link
May  6 09:40:35 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0)
May  6 09:41:04 nodep141 kernel: ata4: SRST failed (errno=-16)
May  6 09:41:04 nodep141 kernel: ata4: soft resetting link
May  6 09:41:09 nodep141 kernel: ata4: SRST failed (errno=-16)
May  6 09:41:09 nodep141 kernel: ata4: reset failed, giving up
May  6 09:41:09 nodep141 kernel: ata4.00: disabled
May  6 09:41:09 nodep141 kernel: ata4.01: disabled
May  6 09:41:14 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0)
May  6 09:41:20 nodep141 kernel: ata4: device not ready (errno=-16), forcing hardreset
May  6 09:41:20 nodep141 kernel: ata4: soft resetting link
May  6 09:41:25 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0)
May  6 09:41:30 nodep141 kernel: ata4: SRST failed (errno=-16)
May  6 09:41:30 nodep141 kernel: ata4: soft resetting link
May  6 09:41:35 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0)
May  6 09:41:40 nodep141 kernel: ata4: SRST failed (errno=-16)
May  6 09:41:40 nodep141 kernel: ata4: soft resetting link
May  6 09:41:45 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0)
May  6 09:42:15 nodep141 kernel: ata4: SRST failed (errno=-16)
May  6 09:42:15 nodep141 kernel: ata4: soft resetting link
May  6 09:42:20 nodep141 kernel: ata4: SRST failed (errno=-16)
May  6 09:42:20 nodep141 kernel: ata4: reset failed, giving up
May  6 09:42:20 nodep141 kernel: ata4: EH complete
May  6 09:42:20 nodep141 kernel: sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
May  6 09:42:20 nodep141 kernel: end_request: I/O error, dev sdb, sector 14076959

uname -a:

Linux nodep141 2.6.27.21-170.2.56.fc10.x86_64 #1 SMP Mon Mar 23 23:08:10 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Has the problem been fixed in this kernel version?

Regards,
Gijsbert

Comment 80 Fujisan 2009-05-07 13:42:49 UTC
I had the same kind of message:

May  3 07:49:47 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May  3 07:49:47 localhost kernel: ata1.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
May  3 07:49:47 localhost kernel:         cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
May  3 07:49:47 localhost kernel:         res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
May  3 07:49:47 localhost kernel: ata1.00: status: { DRDY ERR }
May  3 07:49:47 localhost kernel: ata1: soft resetting link
May  3 07:49:47 localhost kernel: ata1.00: configured for UDMA/33
May  3 07:49:47 localhost kernel: ata1: EH complete

Apparently, adding 'acpi=off noapic' to the kernel line in /etc/grub.conf seems to have solved the problem for me.

kernel /vmlinuz-2.6.27.21-170.2.56.fc10.i686 ro root=/dev/VolGroup00/LogVol00 rhgb quiet vga=792 acpi=off noapic

source:
http://forums.fedoraforum.org/showthread.php?t=213585
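The failure in comment 80 is a different kind of error from the DMA timeouts above: opcode `a0` is an ATAPI PACKET command, and its all-zero CDB is a SCSI TEST UNIT READY (opcode 0x00), most likely sent to the optical drive. A minimal sketch for reading the first two bytes of the `res` field, assuming they are the status and error registers and that, for PACKET commands, the error register's upper nibble carries the SCSI sense key; `decode_packet_res` is a hypothetical name and the sense-key table is only a subset.

```python
# Status register bits (ATA/ATAPI). Bit 4 is DSC for ATA devices, SERV for ATAPI.
STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC/SERV"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]

# Subset of SCSI sense keys; for PACKET commands the upper nibble of the
# error register holds the sense key.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND",
}


def decode_packet_res(field: str) -> dict:
    """Decode 'status/error:...' from the 'res' field of a PACKET command report."""
    status_hex, rest = field.split("/", 1)
    status = int(status_hex, 16)
    error = int(rest.split(":", 1)[0], 16)
    key = error >> 4  # upper nibble of an 8-bit register
    return {
        "status": [name for bit, name in STATUS_BITS if status & bit],
        "sense_key": SENSE_KEYS.get(key, f"0x{key:X}"),
        "abort": bool(error & 0x04),  # ABRT bit of the error register
    }


if __name__ == "__main__":
    print(decode_packet_res("51/20:03:00:00:00/00:00:00:00:00/a0"))
```

For `res 51/20:...` this reports DRDY, DSC/SERV and ERR set, sense key NOT READY, and no ABRT. The kernel's own `status: { DRDY ERR }` summary appears to print only a subset of the status bits, which is why bit 4 does not show up there.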

Comment 81 Bug Zapper 2009-06-10 02:43:52 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 82 Dave Lindsay 2009-07-01 23:01:14 UTC
Great thread, and thank you, Mark, for providing a patch! I'm running RHEL5 with kernel 2.6.18-128.1.14.el5 with two PCI-X cards containing the Marvell chipset, and am currently experiencing the exact same symptoms. I found the sata_mv.c file, edited the line in question, and rebooted. Unfortunately, doing just that didn't solve the problem, so I believe I missed a critical step. Do I need to recompile the kernel itself, or do anything else, in order to take advantage of this patch/bug fix? Yes, I'm fairly new to Linux troubleshooting, so any advice on implementing the fix is greatly appreciated, as it doesn't seem to be fixed in the latest Red Hat update.

Regards,
Dave

Comment 83 gijsbert.wiesenekker 2009-07-02 22:05:46 UTC
Following up on the comment from Bug Zapper, I now notice that this thread applies to Fedora Core 9. I was experiencing this problem on Fedora Core 10, so could this bug be assigned to Fedora Core 10?

Regards,
Gijsbert

Comment 84 Chuck Ebbert 2009-07-10 00:30:51 UTC
(In reply to comment #83)
> Following up on the comment from Bug Zapper I now notice that this thread
> applies to Fedora Core 9. I was experiencing this problem on Fedora Core 10, so
> could this bug be assigned to Fedore Core 10?

The original bug was fixed in 2.6.27.15.

Comment 85 Fdor 2011-06-15 18:58:03 UTC
The bug is still in Fedora 15.

My system has:

- Card:   Conceptronic Serial ATA & IDE Combo Card.  (pci card)
- Chip:   VIA Technologies, Inc. VT6421 IDE RAID Controller (rev 50).
- O.S.:   Fedora release 15 (Lovelock).
- Kernel: 2.6.38.7-30.fc15.i686   (32 bits)

I'm sure my SATA disk drive is OK (I've tested it with another controller and no errors appear). So the problem is in the controller hardware or in the controller driver. I bet it's the driver.

The error log is similar to the already posted ones:

---------------
  [ 1885.024110] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  [ 1885.024128] ata5.00: failed command: READ DMA EXT
  [ 1885.024145] ata5.00: cmd 25/00:00:80:e7:1c/00:02:04:00:00/e0 tag 0 dma 262144 in
  [ 1885.024148]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
  [ 1885.024155] ata5.00: status: { DRDY }
  [ 1885.024169] ata5: hard resetting link
  [ 1885.329091] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  [ 1885.448805] ata5.00: configured for UDMA/133
  [ 1885.448818] ata5.00: device reported invalid CHS sector 0
  [ 1885.448840] ata5: EH complete
  [ 3123.040076] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  [ 3123.040088] ata5.00: failed command: READ DMA
  [ 3123.040103] ata5.00: cmd c8/00:00:80:f6:72/00:00:00:00:00/e2 tag 0 dma 131072 in
  [ 3123.040107]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
  [ 3123.040113] ata5.00: status: { DRDY }
  [ 3123.040128] ata5: hard resetting link
  [ 3123.347077] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  [ 3123.475194] ata5.00: configured for UDMA/133
  [ 3123.475216] ata5.00: device reported invalid CHS sector 0
  [ 3123.475261] ata5: EH complete
---------------

As you can see, communication freezes, so a hard reset is issued and the link is re-established. No data is corrupted, but the computer is frozen until the link is reset. This is the same behavior other people have posted.

Any news on this bug? Is it going to be fixed? Is there some trick to work decently until it's fixed?

Thanks

(P.S. Please reopen this bug, update the product version to "Fedora 15", and add the 32-bit version to the bug's platforms.)

Comment 86 gijsbert.wiesenekker 2011-06-15 21:12:48 UTC
FYI,

I switched from Fedora to CentOS a couple of years ago because I needed GFS2 support on my cluster nodes, and initially got the same error frequently. However, the frequency has gone down with every kernel update over the years; it hardly ever occurs with the current CentOS kernel (2.6.18-238.9.1.el5), but it still does now and then (say, once every two months on one of the cluster nodes).
So you might give CentOS a try to see if that helps.

Regards,
Gijsbert

Comment 87 Chuck Ebbert 2011-06-25 15:20:18 UTC
(In reply to comment #85)
> (P.S. Please reopen this bug, update bug product version to "fedora 15", and
> add the 32-bit version to the bug plattforms)

Please open a new bug against F15, since your errors are not the same as the ones reported here and there are 86 comments on this bug that we would have to wade through when working on it.

Comment 88 Fdor 2011-07-03 11:15:01 UTC
(In reply to comment #87)
> Please open a new bug against F15, since your errors are not the same as the
> ones reported here and there are 86 comments on this bug that we would have to
> wade through when working on it.

Done. Bug 718475