Bug 462425
Summary: | Kernel 2.6.26.3-29.fc9.x86_64 drive goes offline
---|---
Product: | [Fedora] Fedora
Reporter: | Brian Rademacher <rad>
Component: | kernel
Assignee: | Jeff Garzik <jgarzik>
Status: | CLOSED ERRATA
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | urgent
Priority: | medium
Version: | 9
CC: | brian.mosher, dave, emcnabb, erwan, fdor6, fujisan43, gijsbert.wiesenekker, herrold, jpiszcz, kernel-maint, mathguthrie, peterm, qr7atgwu, rad, rainer.traut, redhat, scott
Hardware: | x86_64
OS: | Linux
Doc Type: | Bug Fix
Last Closed: | 2009-07-10 00:30:51 UTC
Description
Brian Rademacher
2008-09-16 06:25:55 UTC
Created attachment 316813 [details]
/var/log/messages
Crashed again (without all of the nasty trace info this time, since I caught it right away) during my scheduled rdiff backup (no additional disk IO this time, unlike before). I went back to kernel 2.6.25.14-108.fc9.x86_64 and completed the same rdiff backup with no problem. Here is the dmesg output this time:

Sep 16 12:37:43 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 16 12:37:43 radfiles kernel: ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Sep 16 12:37:43 radfiles kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 16 12:37:43 radfiles kernel: ata1.00: status: { DRDY }
Sep 16 12:37:43 radfiles kernel: ata1: hard resetting link
Sep 16 12:37:43 radfiles kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 16 12:37:43 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 16 12:37:43 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 16 12:37:43 radfiles kernel: ata1.00: configured for UDMA/133
Sep 16 12:37:43 radfiles kernel: ata1: EH complete
Sep 16 12:37:43 radfiles kernel: sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
Sep 16 12:37:43 radfiles kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 16 12:37:43 radfiles kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Sep 16 12:39:15 radfiles kernel: ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 16 12:39:15 radfiles kernel: ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
Sep 16 12:39:15 radfiles kernel: res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 16 12:39:15 radfiles kernel: ata1.00: status: { DRDY }
Sep 16 12:39:15 radfiles kernel: ata1: hard resetting link
Sep 16 12:39:15 radfiles kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 16 12:39:16 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 16 12:39:16 radfiles kernel: ata1.00: max_sectors limited to 256 for NCQ
Sep 16 12:39:16 radfiles kernel: ata1.00: configured for UDMA/133
Sep 16 12:39:16 radfiles kernel: ata1: EH complete
Sep 16 12:39:16 radfiles kernel: sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
Sep 16 12:39:16 radfiles kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 16 12:39:16 radfiles kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Also, just for reference, kernel 2.6.25.14-108.fc9.x86_64 has NCQ enabled for sata_mv, which is relatively new but functions under that kernel:

ata1.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata2.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata3.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata4.00: 625142448 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata5.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32)

I can get the same error under kernel 2.6.25.14-108.fc9.x86_64 now that I added a 5th drive to the RAID array, but it only shows up either during a RAID resync or about once every few hours. With only 4 drives, it only showed under a resync and never under regular operation. With kernel 2.6.26.3-29.fc9.x86_64 I'm lucky if I can boot... It resets the bus every few minutes on average. With only 4 drives I was OK until heavy IO (like the backup mentioned in the bug), but with 5 it's unusable.

Hi Brian, I have about 10 computers with very different hardware running Fedora 9 (i386 and x86_64), all with the latest kernel. All are OK except one: a computer on which I use ext4. With the 2.6.25.14-108 version it boots cleanly, but with 2.6.26.3-29 my "/" partition is invisible. Searching the web, I have seen that 2.6.26 requires a patch to use ext4 partitions:

"2008-08-20: The 2.6.26-ext4-7 patchset has been released. People who are using ext4 wih 2.6.26 should really take this patch. 2008-07-15: Delayed allocation has been merged into Linus's ext4 git tree! We have started maintaining patches against the latest 2.6 mainline kernel for make it easier for people to try out ext4." (http://ext4.wiki.kernel.org/index.php/Main_Page)

Since your first message shows "Sep 15 12:26:40 radfiles kernel: Modules linked in: ext4dev", and you said you have 4 ext3 drives that work OK while the 5th, with an ext4 partition, is bad, I think that was your initial problem. For me, all I have to do is stay on 2.6.25 until the Fedora team releases another 2.6.26 build. For you, perhaps you can try the patch I cited above.

The only thing I use ext4 for is a terabyte backup drive, so it is mounted only during the backup process and unmounted otherwise. The failure occurs at any time during heavy IO (ext4 aside), which is why I was seeing it during backups. I don't think that ext4dev should be interacting with anything when the drive isn't even mounted, but for now I have removed the module just to see if anything changes. I'll skip tomorrow's backup and see how it goes. If it works, I'll then try the patchset. Thanks for the idea! I'll try anything at this point...

It didn't work... A few more updates:
- smartctl reported that /dev/hda is fine, through 2 long tests and 3 short.
- Disabling smartd didn't help.
- Disabling NCQ didn't help; it just changed the error from NCQ to DMA.
- Manually failing sda and later sde and going back to 4 drives (much less IO) worked fine, also showing that sda likely isn't the problem.

SATA so reassigning to Jeff. Looks like another case of the bug Mark Lord fixed which I think is queued for .27

Any idea where that bug/patch might be? I'm getting about 6 or so of these lockups a day, so I wouldn't mind trying to push my own fix a little early...

I look forward to this patch as well, do you have a link to it? I also use the Intel e1000e driver so I'd prefer the standalone patch vs. moving to 2.6.27.
[420781.333179] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[420781.333189] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[420781.333190] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[420781.333194] ata6.00: status: { DRDY }
[420781.333200] ata6: hard resetting link
[420781.638589] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[420781.662166] ata6.00: configured for UDMA/133
[420781.662166] ata6: EH complete
[420781.662989] sd 5:0:0:0: [sdf] 586072368 512-byte hardware sectors (300069 MB)
[420781.669416] sd 5:0:0:0: [sdf] Write Protect is off
[420781.669416] sd 5:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[420781.669416] sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[469680.004637] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[469680.004648] ata2.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[469680.004649] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[469680.004654] ata2.00: status: { DRDY }
[469680.004660] ata2: hard resetting link
[469680.309567] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[469680.333461] ata2.00: configured for UDMA/133
[469680.333477] ata2: EH complete
[469680.333461] sd 1:0:0:0: [sdb] 586072368 512-byte hardware sectors (300069 MB)
[469680.340461] sd 1:0:0:0: [sdb] Write Protect is off
[469680.340461] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[469680.345461] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

(In reply to comment #8)
> SATA so reassigning to Jeff. Looks like another case of the bug Mark Lord fixed
> which I think is queued for .27

Alan, in the meantime, is there something I can change/disable to return stability to my server (kernel options, libata options in modprobe.conf, etc.)? I'd be willing to take a huge performance hit for stability...

I can't find that fix either.
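The traces posted so far all follow the same libata error-handler pattern, so tallying them per port is easy to script. The following is a minimal Python sketch, not something posted in this bug: the function name `summarize` and the `SAMPLE` text are illustrative, and the opcode table covers only the command bytes that appear in this thread's logs (plus the NCQ read counterpart), named after the ATA command set.

```python
import re
from collections import Counter

# Command bytes (the first field after "cmd") seen in this thread's logs.
# Names follow the ATA command set; 0x60 is included only as the read
# counterpart of 0x61.
ATA_OPCODES = {
    0x60: "READ FPDMA QUEUED (NCQ read)",
    0x61: "WRITE FPDMA QUEUED (NCQ write)",
    0xB0: "SMART",
    0xEC: "IDENTIFY DEVICE",
}

# Both patterns work on flattened text, so they do not care whether the
# log still has one message per line.
EXCEPTION_RE = re.compile(r"(ata\d+)\.\d+: exception Emask")
CMD_RE = re.compile(r"(ata\d+)\.\d+: cmd ([0-9a-f]{2})/")


def summarize(log_text):
    """Count exceptions per port and failed commands per (port, opcode name)."""
    exceptions = Counter(EXCEPTION_RE.findall(log_text))
    commands = Counter()
    for port, opcode_hex in CMD_RE.findall(log_text):
        name = ATA_OPCODES.get(int(opcode_hex, 16), f"unknown (0x{opcode_hex})")
        commands[(port, name)] += 1
    return exceptions, commands


# Two exception headers copied from the logs in this bug.
SAMPLE = (
    "ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen\n"
    "ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0\n"
    "ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen\n"
    "ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out\n"
)

if __name__ == "__main__":
    exceptions, commands = summarize(SAMPLE)
    for port, count in sorted(exceptions.items()):
        print(port, count)
    for (port, name), count in sorted(commands.items()):
        print(port, name, count)
```

Run against a full /var/log/messages, this makes it easier to see whether the timeouts hit one port or many, and whether the command that timed out was an NCQ write or a SMART request, which may help separate the different reports in this bug.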
I manually failed one of my RAID-5 drives, which has brought back stability to the system. Other than the performance hit and living on the edge of catastrophe if another HD fails, it's working. Certainly not a fix, but for now better than the constant freezing...

I found a workaround (that works for me at least): disabling the drive write cache on all RAID member drives with hdparm -W0 seems to work. Maybe this is a clue for diagnosing as well. I didn't mention it above, but I have my RAID mounted with data=writeback, in case that could be having an effect.

This may be all for naught if it's truly fixed in .27 anyway. I'll be looking forward to the F9 .27 kernel update if/when it comes...

(In reply to comment #14)
> This may be all for naught if it's truly fixed in .27 anyway. I'll be looking
> forward to the F9 .27 kernel update if/when it comes...

https://admin.fedoraproject.org/updates/kernel-2.6.27.4-19.fc9

Woo hoo! I shall test when it hits the testing repo...

With 2.6.27.4 (Vanilla) the problem still occurs. Justin.

[198231.048036] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[198231.048045] ata5.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
[198231.048046] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[198231.048050] ata5.00: status: { DRDY }
[198231.048054] ata5: hard resetting link
[198231.353033] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[198231.377941] ata5.00: configured for UDMA/133
[198231.377954] ata5: EH complete
[198231.378140] sd 4:0:0:0: [sde] 586072368 512-byte hardware sectors (300069 MB)
[198231.385337] sd 4:0:0:0: [sde] Write Protect is off
[198231.385344] sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
[198231.385383] sd 4:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
$ uname -a
Linux box 2.6.27.4 #1 SMP Sun Oct 26 04:46:17 EDT 2008 x86_64 GNU/Linux

Justin, did you test disabling write caching on the drives themselves to see what happens?
I have been running that way since I posted that workaround, with no trouble under 2.6.26.6-79.fc9.x86_64. I'm just wondering if we are experiencing the same problem with the same workaround. That may help with future debugging of this issue...

I have just turned off the cache on all of the drives now and will see if this problem recurs. Justin.

I used hdparm -W0 /dev/sda etc. to turn it off. Is that the method you used (in case variance matters)?

That's exactly what I did...

I am still trying to reproduce it with the cache off; so far, I have not had any luck.

Can you test 2.6.27.4: https://admin.fedoraproject.org/updates/kernel-2.6.27.4-24.fc9

Brian, I believe that was directed at you. BTW, so far you're correct: turning the cache off seems to fix the problem, but whose problem is it? The kernel's? Western Digital's? Intel/chipset?

Is there an RPM for 2.6.27.4 somewhere yet (and the dependencies)? Much easier to test that way.

I haven't seen it hit the testing repo yet... I think we're on to something with this write caching thing. Mine is still stable, and I'm running 5 Seagate 7200.10 drives, so different than your WD setup...
As I recall, my chipset/hardware is quite a bit different as well:

00:02.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge
00:03.0 ISA bridge: ALi Corporation M1563 HyperTransport South Bridge (rev 20)
00:03.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU]
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:0e.0 IDE interface: ALi Corporation M5229 IDE (rev c5)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
02:03.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
03:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)
03:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03)

Happened again, this time with cache OFF:

Nov 6 01:20:07 p34 kernel: [639232.946183] ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Nov 6 01:20:07 p34 kernel: [639232.946193] ata13.00: cmd ec/00:00:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Nov 6 01:20:07 p34 kernel: [639232.946195] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Nov 6 01:20:07 p34 kernel: [639232.946200] ata13.00: status: { DRDY }
Nov 6 01:20:07 p34 kernel: [639232.946206] ata13: hard resetting link
Nov 6 01:20:08 p34 kernel: [639233.403168] ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Nov 6 01:20:08 p34 kernel: [639233.440207] ata13.00: configured for UDMA/133
Nov 6 01:20:08 p34 kernel: [639233.449851] sd 12:0:0:0: [sdi] Write Protect is off
Nov 6 01:20:08 p34 kernel: [639233.449858] sd 12:0:0:0: [sdi] Mode Sense: 00 3a 00 00
Nov 6 01:20:08 p34 kernel: [639233.476367] sd 12:0:0:0: [sdi] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

Well, mine didn't take long! Two freezes right on boot with 2.6.27.4-19.fc9.x86_64 #1 SMP Thu Oct 30 19:30:01 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux...
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: configured for UDMA/133
ata1: EH complete
sd 0:0:0:0: [sda] 625142448 512-byte hardware sectors (320073 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata1.00: cmd 61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: configured for UDMA/133
ata1: EH complete

I turned off write caching, which I assume will work based on my previous experience...

Running a 7-disk RAID 5 array with the following card:

SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)

I saw discussion of this on the linux-kernel mailing list, and someone mentioned they were seeing my same issue with the Super Micro AOC-SAT2-MV8. That's also the card I'm using. The file system is XFS. On heavy transfers I'm seeing a lot of this. I've been getting it since late August. Not going to lie, I'm using Ubuntu. See my initial bug report here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263160/

If you read down, you'll see I _WAS_ using a RHEL-based distro (2.6.18 32bit) just fine, and then I moved to Ubuntu (2.6.27.2 64bit) and started getting these issues.
-- Since this posting, I've upgraded to 2.6.27-7 and it has now gotten so bad that it desync'd my RAID on a transfer. I'm now worried about losing the data and have completely disconnected the drives. I'm not going to risk a rebuild without these issues fixed. I really wish we could figure this out after 2 months of reported problems. I'm not sure if the Red Hat Bugzilla is the right place to report this, but if someone replies I'll provide any information that I can.

dmesg:

[11285.918535] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11285.918567] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11285.918568] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11285.918619] ata9.00: status: { DRDY }
[11285.918635] ata9: hard resetting link
[11286.420039] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11286.460065] ata9.00: max_sectors limited to 256 for NCQ
[11286.520054] ata9.00: max_sectors limited to 256 for NCQ
[11286.520059] ata9.00: configured for UDMA/133
[11286.520077] ata9: EH complete
[11286.520119] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11286.520132] sd 8:0:0:0: [sdd] Write Protect is off
[11286.520134] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11286.520154] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11326.988529] ata8.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11326.988554] ata8.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11326.988555] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11326.988606] ata8.00: status: { DRDY }
[11326.988623] ata8: hard resetting link
[11327.500037] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11327.580053] ata8.00: max_sectors limited to 256 for NCQ
[11327.657199] ata8.00: max_sectors limited to 256 for NCQ
[11327.657202] ata8.00: configured for UDMA/133
[11327.657207] ata8: EH complete
[11327.657257] sd 7:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
[11327.657272] sd 7:0:0:0: [sdc] Write Protect is off
[11327.657273] sd 7:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[11327.657296] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11377.938532] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11377.938557] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11377.938558] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11377.938608] ata7.00: status: { DRDY }
[11377.938624] ata7: hard resetting link
[11378.440037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11378.520056] ata7.00: max_sectors limited to 256 for NCQ
[11378.600065] ata7.00: max_sectors limited to 256 for NCQ
[11378.600068] ata7.00: configured for UDMA/133
[11378.600073] ata7: EH complete
[11378.600120] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11378.600133] sd 6:0:0:0: [sdb] Write Protect is off
[11378.600135] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11378.600155] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11711.718523] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11711.718548] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11711.718549] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11711.718600] ata9.00: status: { DRDY }
[11711.718616] ata9: hard resetting link
[11712.220041] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11712.260058] ata9.00: max_sectors limited to 256 for NCQ
[11712.320057] ata9.00: max_sectors limited to 256 for NCQ
[11712.320066] ata9.00: configured for UDMA/133
[11712.320072] ata9: EH complete
[11712.320112] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11712.320125] sd 8:0:0:0: [sdd] Write Protect is off
[11712.320127] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11712.320148] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11849.328524] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11849.328549] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11849.328549] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11849.328600] ata7.00: status: { DRDY }
[11849.328617] ata7: hard resetting link
[11849.830037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11849.910070] ata7.00: max_sectors limited to 256 for NCQ
[11849.990053] ata7.00: max_sectors limited to 256 for NCQ
[11849.990057] ata7.00: configured for UDMA/133
[11849.990069] ata7: EH complete
[11849.990109] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11849.990123] sd 6:0:0:0: [sdb] Write Protect is off
[11849.990125] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11849.990147] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11909.629773] ata9.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11909.629797] ata9.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11909.629798] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11909.629849] ata9.00: status: { DRDY }
[11909.629865] ata9: hard resetting link
[11910.131295] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11910.180068] ata9.00: max_sectors limited to 256 for NCQ
[11910.231316] ata9.00: max_sectors limited to 256 for NCQ
[11910.231319] ata9.00: configured for UDMA/133
[11910.231327] ata9: EH complete
[11910.231381] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
[11910.231394] sd 8:0:0:0: [sdd] Write Protect is off
[11910.231396] sd 8:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[11910.231417] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[11996.729773] ata7.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[11996.729797] ata7.00: cmd 61/03:00:49:00:00/00:00:00:00:00/40 tag 0 ncq 1536 out
[11996.729798] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[11996.729848] ata7.00: status: { DRDY }
[11996.729865] ata7: hard resetting link
[11997.231291] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[11997.311308] ata7.00: max_sectors limited to 256 for NCQ
[11997.391306] ata7.00: max_sectors limited to 256 for NCQ
[11997.391316] ata7.00: configured for UDMA/133
[11997.391322] ata7: EH complete
[11997.391366] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[11997.391378] sd 6:0:0:0: [sdb] Write Protect is off
[11997.391380] sd 6:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[11997.391400] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

/var/log/messages:

Aug 30 20:12:43 isis kernel: [11285.918635] ata9: hard resetting link
Aug 30 20:12:43 isis kernel: [11286.420039] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:12:43 isis kernel: [11286.460065] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:12:43 isis kernel: [11286.520054] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:12:43 isis kernel: [11286.520059] ata9.00: configured for UDMA/133
Aug 30 20:12:43 isis kernel: [11286.520077] ata9: EH complete
Aug 30 20:12:43 isis kernel: [11286.520119] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:12:43 isis kernel: [11286.520132] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:12:43 isis kernel: [11286.520154] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:13:24 isis kernel: [11326.988623] ata8: hard resetting link
Aug 30 20:13:24 isis kernel: [11327.500037] ata8: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:13:24 isis kernel: [11327.580053] ata8.00: max_sectors limited to 256 for NCQ
Aug 30 20:13:24 isis kernel: [11327.657199] ata8.00: max_sectors limited to 256 for NCQ
Aug 30 20:13:24 isis kernel: [11327.657202] ata8.00: configured for UDMA/133
Aug 30 20:13:24 isis kernel: [11327.657207] ata8: EH complete
Aug 30 20:13:24 isis kernel: [11327.657257] sd 7:0:0:0: [sdc] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:13:24 isis kernel: [11327.657272] sd 7:0:0:0: [sdc] Write Protect is off
Aug 30 20:13:24 isis kernel: [11327.657296] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:14:15 isis kernel: [11377.938624] ata7: hard resetting link
Aug 30 20:14:15 isis kernel: [11378.440037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:14:15 isis kernel: [11378.520056] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:14:15 isis kernel: [11378.600065] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:14:15 isis kernel: [11378.600068] ata7.00: configured for UDMA/133
Aug 30 20:14:15 isis kernel: [11378.600073] ata7: EH complete
Aug 30 20:14:15 isis kernel: [11378.600120] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:14:15 isis kernel: [11378.600133] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:14:15 isis kernel: [11378.600155] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:19:48 isis kernel: [11711.718616] ata9: hard resetting link
Aug 30 20:19:49 isis kernel: [11712.220041] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:19:49 isis kernel: [11712.260058] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:19:49 isis kernel: [11712.320057] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:19:49 isis kernel: [11712.320066] ata9.00: configured for UDMA/133
Aug 30 20:19:49 isis kernel: [11712.320072] ata9: EH complete
Aug 30 20:19:49 isis kernel: [11712.320112] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:19:49 isis kernel: [11712.320125] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:19:49 isis kernel: [11712.320148] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:22:06 isis kernel: [11849.328617] ata7: hard resetting link
Aug 30 20:22:06 isis kernel: [11849.830037] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:22:06 isis kernel: [11849.910070] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:22:07 isis kernel: [11849.990053] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:22:07 isis kernel: [11849.990057] ata7.00: configured for UDMA/133
Aug 30 20:22:07 isis kernel: [11849.990069] ata7: EH complete
Aug 30 20:22:07 isis kernel: [11849.990109] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:22:07 isis kernel: [11849.990123] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:22:07 isis kernel: [11849.990147] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:23:06 isis kernel: [11909.629865] ata9: hard resetting link
Aug 30 20:23:07 isis kernel: [11910.131295] ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:23:07 isis kernel: [11910.180068] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:23:07 isis kernel: [11910.231316] ata9.00: max_sectors limited to 256 for NCQ
Aug 30 20:23:07 isis kernel: [11910.231319] ata9.00: configured for UDMA/133
Aug 30 20:23:07 isis kernel: [11910.231327] ata9: EH complete
Aug 30 20:23:07 isis kernel: [11910.231381] sd 8:0:0:0: [sdd] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:23:07 isis kernel: [11910.231394] sd 8:0:0:0: [sdd] Write Protect is off
Aug 30 20:23:07 isis kernel: [11910.231417] sd 8:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 30 20:24:33 isis kernel: [11996.729865] ata7: hard resetting link
Aug 30 20:24:34 isis kernel: [11997.231291] ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 30 20:24:34 isis kernel: [11997.311308] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:24:34 isis kernel: [11997.391306] ata7.00: max_sectors limited to 256 for NCQ
Aug 30 20:24:34 isis kernel: [11997.391316] ata7.00: configured for UDMA/133
Aug 30 20:24:34 isis kernel: [11997.391322] ata7: EH complete
Aug 30 20:24:34 isis kernel: [11997.391366] sd 6:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
Aug 30 20:24:34 isis kernel: [11997.391378] sd 6:0:0:0: [sdb] Write Protect is off
Aug 30 20:24:34 isis kernel: [11997.391400] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

I've replaced the card and cables and I'm still getting the issue. This card and RAID were working on RHEL last week (2.6.18 32bit). Replaced: OS (Ubuntu 64bit), CPU (Core 2 Duo), mobo (Asus P5K Pro). I'm really at a loss here, not sure what else to do. I stressed the other components of the system in Windows and they seemed fine. Not sure if it's the card or something with the newer kernels.

I think this problem tends to get ignored because there are so many things that can cause it (bad drives, cables, power supplies, or any combination thereof). Even with this bug, you can see that in my case disabling write caching solves the problem (not a great solution, mind you, but a workaround for now), yet it didn't help Justin. BTW, disabling write caching under the new kernel works for me, as with the older kernel. It seems that the one thing we do have in common is a larger than average number of drives in RAID. I have the fewest at 5, you have 7, and Justin 10, I believe... When I had 4, it was difficult to get this problem to show except under heavy IO. With 5, I can simply boot...

The write cache hack around is really only relevant to that specific type of drive (and at this point appears to be a bug in the drive itself)

If it were a bug in the drive itself, wouldn't it show under most all write conditions/kernels? I never even saw this under a 4-drive RAID 5 until later kernel revisions. It was completely stable otherwise. Adding the 5th disk is what sent it over the edge with any kernel...
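For anyone wanting to try the write-cache workaround on every array member at once, here is a hedged Python sketch. It only builds the `hdparm -W0` command lines (the same command quoted earlier in this thread) from /proc/mdstat-style text; it does not execute anything. The helper names and the sample mdstat text are made up for illustration, and the member-name pattern assumes plain `sdX` or `sdXN` devices.

```python
import re

# Hypothetical /proc/mdstat content for a 5-drive RAID 5; the device names
# and block count are illustrative, not taken from anyone's machine here.
MDSTAT_SAMPLE = """Personalities : [raid6] [raid5] [raid4]
md3 : active raid5 sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      1250274304 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
unused devices: <none>
"""


def raid_members(mdstat_text):
    """Return sorted whole-disk paths for the members of all md arrays."""
    disks = set()
    for line in mdstat_text.splitlines():
        if not re.match(r"md\d+\s*:", line):
            continue  # only the "mdN : ..." lines list member devices
        # Member entries look like "sda1[0]" or "sdb[1]"; keep the disk
        # name and drop the partition number.
        for name in re.findall(r"\b(sd[a-z]+)\d*\[\d+\]", line):
            disks.add("/dev/" + name)
    return sorted(disks)


def write_cache_off_commands(mdstat_text):
    """Build one 'hdparm -W0 <disk>' argument list per member disk."""
    return [["hdparm", "-W0", disk] for disk in raid_members(mdstat_text)]


if __name__ == "__main__":
    for cmd in write_cache_off_commands(MDSTAT_SAMPLE):
        print(" ".join(cmd))
```

To apply it for real, one would read /proc/mdstat instead of the sample and pass each list to `subprocess.run(cmd, check=True)` as root. The setting may not survive a power cycle, so it would likely need to be reapplied at boot.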
Not sure if you took the time to read my post on the Ubuntu bug tracker, but I'm getting the errors on both WDC and Seagate drives. Given a thread back in September about this on the linux-kernel mailing list, and another reference to the MV88SX6081 8-port SATA II PCI-X Controller (Super Micro AOC-SAT2-MV8), I was leaning towards that being the cause...

That is another possibility (the 88SX6081 controller), although that isn't what Justin is using. Justin's problem seems hard to create, whereas mine and yours is hard to avoid (based on your "...its now gotten so bad that its desync'd my raid on a transfer..."). Could be two different issues, but glad you see it with different drives...

Just tested 2.6.27.5-37.fc9.x86_64 and same thing...

"If it were a bug in the drive itself, wouldn't it show under most all write conditions/kernels"

From past experience of drive firmware funnies, probably not. If they were simple to cause, the vendor would have discovered them before shipping product. Also, btw, I don't see any reason to believe the various bugs muddled together here are at all connected.

Searching on the controller and "frozen", I found an interesting comment from Mark Lord, where he said this in response to freezing issues with the Marvell controller: "My recollection is that the worst errata are for the 60x1 chips on PCI-X." (which happens to be my situation). He also mentioned that he was going to be resuming work on sata_mv as of October 28th. Original post here: http://webui.sourcelabs.com/kernel/issues/10321

Can someone who knows him point him in this direction while he is working on incorporating errata into the driver? I'd hate to miss out on an opportunity to get this resolved!

I think I found his email address (at least it didn't bounce yet), so we'll see...

I did a clean install of F10 and still see the same problem. It also has the same solution of disabling write cache.
I see this under F10/ext4 now though:

kernel: JBD: barrier-based sync failed on md3:8 - disabling barriers

So I disabled them in fstab for now. Not to mix that in with this bug though... I'm sure that is likely something else...

Pretty fed up with people saying this could be so many different issues. So much so that I finally decided to risk my data to prove it... read the following.

***___This has got to be the card / chipset / sata_mv driver._____***

Short and simple version of my issues:
- This does not depend on drive types
- Appears to be caused by the MV88SX6081 chipset
- Could be a problem in the sata_mv driver
- I need replacement controller suggestions

Details for all non-believers (it’s not a power / hardware issue):

I moved 5 of the 7 drives to my onboard controller (I have 6 SATA ports on the mobo; the last was used by the system drive). I left 2 of the Western Digital drives on the MV88SX6081 8-port SATA II:
- sdg
- sdh

On the advice of some people through email, I unplugged everything that wasn't needed. They assumed that it could have been power, given the number of drives I had in the machine. What was left on a Corsair TX750W power supply:
- mobo (C2D, 4 GB RAM)
- 7 SATA RAID drives, spread across multiple power supply rails
- 1 SATA system drive
- Super Micro SAT2-MV8 (MV88SX6081 8-port SATA II)
- Intel PCIe 10/100/1000 network card

Then I replaced the SATA cables one more time with old cables I knew worked. I also threw in the brand new controller card as well (I have a few spares lying around).

I brought everything up and upgraded to:

Then I started to rebuild the RAID. Everything went fine, no freezes.

**This was the first indication that this only happens under heavy load on multiple ports, as has been brought up before.

So then I started copying data over. About 180 GB in, the card hard reset both of the drives attached to it and knocked them both out of the RAID.
**This was also significantly different from before, when I was utilizing all the ports: it seemed to work great for quite some time, and it wasn't until I was well into the process that the card finally gave up.

See the attached dmesg and /var/log/messages.

This is the 2nd time I’ve had this card degrade my RAID and almost give me a heart attack. The cards are going in the trash at this point. I'm open to suggestions for a possible replacement. I don’t need a hardware RAID card, just a decent controller with great *nix support and lots of ports.

::sigh:: I don’t know who to contact, but this is the end of the line for me with this controller and hopefully my issues. Attempting to get my data back as we speak with 2 failed drives in a RAID 5... wonderful times.

dmesg of the event:
[ 1061.040118] md: recovery of RAID array md1 [ 1061.040120] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. [ 1061.040122] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery. [ 1061.040126] md: using 128k window, over a total of 488383744 blocks. [11208.852220] md: md1: recovery done.
[11209.020072] RAID5 conf printout: [11209.020076] --- rd:7 wd:7 [11209.020079] disk 0, o:1, dev:sdd1 [11209.020080] disk 1, o:1, dev:sdb1 [11209.020081] disk 2, o:1, dev:sdh1 [11209.020082] disk 3, o:1, dev:sdc1 [11209.020083] disk 4, o:1, dev:sdf1 [11209.020084] disk 5, o:1, dev:sde1 [11209.020085] disk 6, o:1, dev:sdg1 [19844.431690] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled [19844.433148] SGI XFS Quota Management subsystem [19844.442507] Filesystem "md1": Disabling barriers, trial barrier write failed [19844.442658] XFS mounting filesystem md1 [19844.893398] Ending clean XFS mount for filesystem: md1 [27027.170016] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [27027.170041] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 [27027.170041] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [27027.170083] ata5.00: status: { DRDY } [27027.170099] ata5: hard resetting link [27027.680034] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [27027.720050] ata5.00: max_sectors limited to 256 for NCQ [27027.780047] ata5.00: max_sectors limited to 256 for NCQ [27027.780050] ata5.00: configured for UDMA/133 [27027.780055] end_request: I/O error, dev sdg, sector 73 [27027.780073] md: super_written gets error=-5, uptodate=0 [27027.780076] raid5: Disk failure on sdg1, disabling device. [27027.780077] raid5: Operation continuing on 6 devices. 
[27027.780117] ata5: EH complete [27027.780674] sd 4:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB) [27027.780800] sd 4:0:0:0: [sdg] Write Protect is off [27027.780803] sd 4:0:0:0: [sdg] Mode Sense: 00 3a 00 00 [27027.781038] sd 4:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [27057.930015] ata12.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [27057.930039] ata12.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 [27057.930040] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [27057.930081] ata12.00: status: { DRDY } [27057.930098] ata12: hard resetting link [27058.440033] ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [27058.480049] ata12.00: max_sectors limited to 256 for NCQ [27058.540047] ata12.00: max_sectors limited to 256 for NCQ [27058.540050] ata12.00: configured for UDMA/133 [27058.540055] end_request: I/O error, dev sdh, sector 71 [27058.540072] md: super_written gets error=-5, uptodate=0 [27058.540075] raid5: Disk failure on sdh1, disabling device. [27058.540076] raid5: Operation continuing on 5 devices. [27058.540113] ata12: EH complete [27058.540754] sd 11:0:0:0: [sdh] 976773168 512-byte hardware sectors (500108 MB) [27058.540879] sd 11:0:0:0: [sdh] Write Protect is off [27058.540882] sd 11:0:0:0: [sdh] Mode Sense: 00 3a 00 00 [27058.541070] sd 11:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [27058.584017] RAID5 conf printout: [27058.584020] --- rd:7 wd:5 [27058.584022] disk 0, o:1, dev:sdd1 [27058.584023] disk 1, o:1, dev:sdb1 [27058.584024] disk 2, o:0, dev:sdh1 [27058.584025] disk 3, o:1, dev:sdc1 [27058.584027] disk 4, o:1, dev:sdf1 [27058.584028] disk 5, o:1, dev:sde1 [27058.584029] disk 6, o:0, dev:sdg1 [27061.521245] BUG: soft lockup - CPU#1 stuck for 61s! 
[smbd:28171] [27061.521251] Modules linked in: xfs aes_x86_64 aes_generic ecb crypto_blkcipher ecryptfs ipv6 af_packet iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport loop psmouse pcspkr serio_raw iTCO_wdt iTCO_vendor_support evdev button intel_agp snd_hda_intel snd_pcm shpchp snd_timer pci_hotplug snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sg pata_acpi pata_marvell usbhid hid ohci1394 ieee1394 sata_mv ata_generic ata_piix libata scsi_mod dock sky2 e1000e ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse [27061.521251] CPU 1: [27061.521251] Modules linked in: xfs aes_x86_64 aes_generic ecb crypto_blkcipher ecryptfs ipv6 af_packet iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport loop psmouse pcspkr serio_raw iTCO_wdt iTCO_vendor_support evdev button intel_agp snd_hda_intel snd_pcm shpchp snd_timer pci_hotplug snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sg pata_acpi pata_marvell usbhid hid ohci1394 ieee1394 sata_mv ata_generic ata_piix libata scsi_mod dock sky2 e1000e ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse [27061.521251] Pid: 28171, comm: smbd Not tainted 2.6.27-9-server #1 [27061.521251] RIP: 0010:[<ffffffff802abf0c>] [<ffffffff802abf0c>] find_get_pages+0x6c/0x110 [27061.521251] RSP: 0018:ffff880129453358 EFLAGS: 00000246 [27061.521251] RAX: ffff880128d89330 RBX: ffff880129453398 RCX: 0000000000000002 [27061.521251] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffe200022e9e80 [27061.521251] RBP: ffff880129453308 R08: ffffe200009df6c8 R09: 0000000000000005 [27061.521251] R10: 0000000000000037 R11: 00000000001c5778 R12: ffffffff802b6b29 [27061.521251] R13: 
ffff880123a107d0 R14: ffffe20001c6f6c0 R15: 0000000000000286 [27061.521251] FS: 00007fb72cdf6700(0000) GS:ffff88012fc02980(0000) knlGS:0000000000000000 [27061.521251] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [27061.521251] CR2: 00007f1648629000 CR3: 000000012956d000 CR4: 00000000000006e0 [27061.521251] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [27061.521251] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [27061.521251] [27061.521251] Call Trace: [27061.521251] [<ffffffff802abee3>] ? find_get_pages+0x43/0x110 [27061.521251] [<ffffffff802b6984>] ? pagevec_lookup+0x24/0x30 [27061.521251] [<ffffffffa04e100d>] ? xfs_cluster_write+0xad/0x180 [xfs] [27061.521251] [<ffffffffa04e1578>] ? xfs_page_state_convert+0x498/0x760 [xfs] [27061.521251] [<ffffffffa04e19a1>] ? xfs_vm_writepage+0x71/0x120 [xfs] [27061.521251] [<ffffffff802b9274>] ? pageout+0x124/0x270 [27061.521251] [<ffffffff802ab06a>] ? page_waitqueue+0xa/0x90 [27061.521251] [<ffffffff802b986d>] ? shrink_page_list+0x34d/0x530 [27061.521251] [<ffffffff802b8e49>] ? __isolate_lru_page+0x79/0xb0 [27061.521251] [<ffffffff802b8f0a>] ? isolate_lru_pages+0x8a/0x220 [27061.521251] [<ffffffff802b9bf2>] ? shrink_inactive_list+0x1a2/0x4b0 [27061.521251] [<ffffffff802b9f7b>] ? shrink_zone+0x7b/0x160 [27061.521251] [<ffffffff802ba0ed>] ? shrink_zones+0x8d/0x150 [27061.521251] [<ffffffff802ba236>] ? do_try_to_free_pages+0x86/0x2e0 [27061.521251] [<ffffffff802ba587>] ? try_to_free_pages+0x67/0x70 [27061.521251] [<ffffffff802b90a0>] ? isolate_pages_global+0x0/0x50 [27061.521251] [<ffffffff802b28b1>] ? __alloc_pages_internal+0x241/0x510 [27061.521251] [<ffffffff802d565d>] ? alloc_pages_current+0xad/0x110 [27061.521251] [<ffffffff802ac477>] ? __page_cache_alloc+0x67/0x80 [27061.521251] [<ffffffff802ad0b3>] ? __grab_cache_page+0x63/0xb0 [27061.521251] [<ffffffff80316a59>] ? block_write_begin+0x89/0xf0 [27061.521251] [<ffffffffa04e04ca>] ? 
xfs_vm_write_begin+0x2a/0x30 [xfs] [27061.521251] [<ffffffffa04e0040>] ? xfs_get_blocks+0x0/0x20 [xfs] [27061.521251] [<ffffffff802ab7ac>] ? generic_perform_write+0xbc/0x1c0 [27061.521251] [<ffffffff802ad512>] ? generic_file_buffered_write+0x92/0x170 [27061.521251] [<ffffffffa04e92d3>] ? xfs_write+0x6b3/0x9b0 [xfs] [27061.521251] [<ffffffff80385a69>] ? apparmor_socket_recvmsg+0x19/0x20 [27061.521251] [<ffffffff803aaf70>] ? memset_c+0x20/0x30 [27061.521251] [<ffffffffa04e4c88>] ? xfs_file_aio_write+0x58/0x60 [xfs] [27061.521251] [<ffffffff802e9559>] ? do_sync_write+0xf9/0x140 [27061.521251] [<ffffffff802e9699>] ? do_sync_read+0xf9/0x140 [27061.521251] [<ffffffff80266fb0>] ? autoremove_wake_function+0x0/0x40 [27061.521251] [<ffffffff80386821>] ? aa_file_permission+0x21/0xf0 [27061.521251] [<ffffffff80386948>] ? apparmor_file_permission+0x28/0x30 [27061.521251] [<ffffffff803613e6>] ? security_file_permission+0x16/0x20 [27061.521251] [<ffffffff802e9c1b>] ? vfs_write+0xcb/0x130 [27061.521251] [<ffffffff802e9d1a>] ? sys_pwrite64+0x9a/0xa0 [27061.521251] [<ffffffff8021285a>] ? 
system_call_fastpath+0x16/0x1b [27061.521251] [27095.080066] RAID5 conf printout: [27095.080071] --- rd:7 wd:5 [27095.080074] disk 0, o:1, dev:sdd1 [27095.080076] disk 1, o:1, dev:sdb1 [27095.080077] disk 2, o:0, dev:sdh1 [27095.080079] disk 3, o:1, dev:sdc1 [27095.080080] disk 4, o:1, dev:sdf1 [27095.080082] disk 5, o:1, dev:sde1 [27095.080090] RAID5 conf printout: [27095.080091] --- rd:7 wd:5 [27095.080092] disk 0, o:1, dev:sdd1 [27095.080093] disk 1, o:1, dev:sdb1 [27095.080094] disk 2, o:0, dev:sdh1 [27095.080095] disk 3, o:1, dev:sdc1 [27095.080097] disk 4, o:1, dev:sdf1 [27095.080098] disk 5, o:1, dev:sde1 [27095.140011] RAID5 conf printout: [27095.140017] --- rd:7 wd:5 [27095.140019] disk 0, o:1, dev:sdd1 [27095.140022] disk 1, o:1, dev:sdb1 [27095.140024] disk 3, o:1, dev:sdc1 [27095.140026] disk 4, o:1, dev:sdf1 [27095.140027] disk 5, o:1, dev:sde1 [27095.140511] Buffer I/O error on device md1, logical block 455870845 [27095.140545] lost page write due to I/O error on md1 [27095.140550] Buffer I/O error on device md1, logical block 455870846 [27095.140567] lost page write due to I/O error on md1 [27095.140569] Buffer I/O error on device md1, logical block 455870847 [27095.140585] lost page write due to I/O error on md1 [27095.140587] Buffer I/O error on device md1, logical block 455870848 [27095.140604] lost page write due to I/O error on md1 [27095.140606] Buffer I/O error on device md1, logical block 455870849 [27095.140622] lost page write due to I/O error on md1 [27095.140624] Buffer I/O error on device md1, logical block 455870850 [27095.140641] lost page write due to I/O error on md1 [27095.140642] Buffer I/O error on device md1, logical block 455870851 [27095.140659] lost page write due to I/O error on md1 [27095.140661] Buffer I/O error on device md1, logical block 455870852 [27095.140677] lost page write due to I/O error on md1 [27095.140679] Buffer I/O error on device md1, logical block 455870853 [27095.140696] lost page write due to I/O error on 
md1 [27095.140697] Buffer I/O error on device md1, logical block 455870854 [27095.140714] lost page write due to I/O error on md1 [27095.141327] I/O error in filesystem ("md1") meta-data dev md1 block 0xaeaa9810 ("xlog_iodone") error 5 buf count 12288 [27095.141359] xfs_force_shutdown(md1,0x2) called from line 1056 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_log.c. Return address = 0xffffffffa04c80d3 [27095.141380] Filesystem "md1": Log I/O Error Detected. Shutting down filesystem: md1 [27095.141407] Please umount the filesystem, and rectify the problem(s) [27100.140015] Filesystem "md1": xfs_log_force: error 5 returned. [27113.440011] Filesystem "md1": xfs_log_force: error 5 returned. [27143.440010] Filesystem "md1": xfs_log_force: error 5 returned. [27173.440009] Filesystem "md1": xfs_log_force: error 5 returned. [27203.440012] Filesystem "md1": xfs_log_force: error 5 returned. /var/log/messages: Nov 30 18:39:24 isis kernel: [ 1061.040118] md: recovery of RAID array md1 Nov 30 18:39:24 isis kernel: [ 1061.040120] md: minimum _guaranteed_ speed: 1000 KB/sec/disk. Nov 30 18:39:24 isis kernel: [ 1061.040122] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery. Nov 30 18:39:24 isis kernel: [ 1061.040126] md: using 128k window, over a total of 488383744 blocks. Nov 30 19:02:08 isis -- MARK -- Nov 30 19:22:08 isis -- MARK -- Nov 30 19:42:08 isis -- MARK -- Nov 30 20:02:08 isis -- MARK -- Nov 30 20:22:08 isis -- MARK -- Nov 30 20:42:08 isis -- MARK -- Nov 30 21:02:08 isis -- MARK -- Nov 30 21:22:08 isis -- MARK -- Nov 30 21:28:32 isis kernel: [11208.852220] md: md1: recovery done. 
Nov 30 21:28:32 isis kernel: [11209.020072] RAID5 conf printout: Nov 30 21:28:32 isis kernel: [11209.020076] --- rd:7 wd:7 Nov 30 21:28:32 isis kernel: [11209.020079] disk 0, o:1, dev:sdd1 Nov 30 21:28:32 isis kernel: [11209.020080] disk 1, o:1, dev:sdb1 Nov 30 21:28:32 isis kernel: [11209.020081] disk 2, o:1, dev:sdh1 Nov 30 21:28:32 isis kernel: [11209.020082] disk 3, o:1, dev:sdc1 Nov 30 21:28:32 isis kernel: [11209.020083] disk 4, o:1, dev:sdf1 Nov 30 21:28:32 isis kernel: [11209.020084] disk 5, o:1, dev:sde1 Nov 30 21:28:32 isis kernel: [11209.020085] disk 6, o:1, dev:sdg1 Nov 30 21:42:08 isis -- MARK -- Nov 30 22:02:08 isis -- MARK -- Nov 30 22:22:08 isis -- MARK -- Nov 30 22:42:08 isis -- MARK -- Nov 30 23:02:08 isis -- MARK -- Nov 30 23:22:08 isis -- MARK -- Nov 30 23:42:08 isis -- MARK -- Nov 30 23:52:27 isis kernel: [19844.431690] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled Nov 30 23:52:27 isis kernel: [19844.433148] SGI XFS Quota Management subsystem Nov 30 23:52:27 isis kernel: [19844.442507] Filesystem "md1": Disabling barriers, trial barrier write failed Nov 30 23:52:27 isis kernel: [19844.442658] XFS mounting filesystem md1 Dec 1 00:22:08 isis -- MARK -- Dec 1 00:42:08 isis -- MARK -- Dec 1 01:02:08 isis -- MARK -- Dec 1 01:22:08 isis -- MARK -- Dec 1 01:42:08 isis -- MARK -- Dec 1 01:52:10 isis kernel: [27027.170099] ata5: hard resetting link Dec 1 01:52:10 isis kernel: [27027.680034] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 1 01:52:11 isis kernel: [27027.720050] ata5.00: max_sectors limited to 256 for NCQ Dec 1 01:52:11 isis kernel: [27027.780047] ata5.00: max_sectors limited to 256 for NCQ Dec 1 01:52:11 isis kernel: [27027.780050] ata5.00: configured for UDMA/133 Dec 1 01:52:11 isis kernel: [27027.780073] md: super_written gets error=-5, uptodate=0 Dec 1 01:52:11 isis kernel: [27027.780117] ata5: EH complete Dec 1 01:52:11 isis kernel: [27027.780674] sd 4:0:0:0: [sdg] 976773168 
512-byte hardware sectors (500108 MB) Dec 1 01:52:11 isis kernel: [27027.780800] sd 4:0:0:0: [sdg] Write Protect is off Dec 1 01:52:11 isis kernel: [27027.781038] sd 4:0:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Dec 1 01:52:41 isis kernel: [27057.930098] ata12: hard resetting link Dec 1 01:52:41 isis kernel: [27058.440033] ata12: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Dec 1 01:52:41 isis kernel: [27058.480049] ata12.00: max_sectors limited to 256 for NCQ Dec 1 01:52:41 isis kernel: [27058.540047] ata12.00: max_sectors limited to 256 for NCQ Dec 1 01:52:41 isis kernel: [27058.540050] ata12.00: configured for UDMA/133 Dec 1 01:52:41 isis kernel: [27058.540072] md: super_written gets error=-5, uptodate=0 Dec 1 01:52:41 isis kernel: [27058.540113] ata12: EH complete Dec 1 01:52:41 isis kernel: [27058.540754] sd 11:0:0:0: [sdh] 976773168 512-byte hardware sectors (500108 MB) Dec 1 01:52:41 isis kernel: [27058.540879] sd 11:0:0:0: [sdh] Write Protect is off Dec 1 01:52:41 isis kernel: [27058.541070] sd 11:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA Dec 1 01:52:41 isis kernel: [27058.584017] RAID5 conf printout: Dec 1 01:52:41 isis kernel: [27058.584020] --- rd:7 wd:5 Dec 1 01:52:41 isis kernel: [27058.584022] disk 0, o:1, dev:sdd1 Dec 1 01:52:41 isis kernel: [27058.584023] disk 1, o:1, dev:sdb1 Dec 1 01:52:41 isis kernel: [27058.584024] disk 2, o:0, dev:sdh1 Dec 1 01:52:41 isis kernel: [27058.584025] disk 3, o:1, dev:sdc1 Dec 1 01:52:41 isis kernel: [27058.584027] disk 4, o:1, dev:sdf1 Dec 1 01:52:41 isis kernel: [27058.584028] disk 5, o:1, dev:sde1 Dec 1 01:52:41 isis kernel: [27058.584029] disk 6, o:0, dev:sdg1 Dec 1 01:52:44 isis kernel: [27061.521251] Modules linked in: xfs aes_x86_64 aes_generic ecb crypto_blkcipher ecryptfs ipv6 af_packet iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport loop psmouse pcspkr serio_raw iTCO_wdt iTCO_vendor_support evdev button 
intel_agp snd_hda_intel snd_pcm shpchp snd_timer pci_hotplug snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sg pata_acpi pata_marvell usbhid hid ohci1394 ieee1394 sata_mv ata_generic ata_piix libata scsi_mod dock sky2 e1000e ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse Dec 1 01:52:44 isis kernel: [27061.521251] CPU 1: Dec 1 01:52:44 isis kernel: [27061.521251] Modules linked in: xfs aes_x86_64 aes_generic ecb crypto_blkcipher ecryptfs ipv6 af_packet iptable_filter ip_tables x_tables ac sbp2 parport_pc lp parport loop psmouse pcspkr serio_raw iTCO_wdt iTCO_vendor_support evdev button intel_agp snd_hda_intel snd_pcm shpchp snd_timer pci_hotplug snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif sg pata_acpi pata_marvell usbhid hid ohci1394 ieee1394 sata_mv ata_generic ata_piix libata scsi_mod dock sky2 e1000e ehci_hcd uhci_hcd usbcore raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse Dec 1 01:52:44 isis kernel: [27061.521251] Pid: 28171, comm: smbd Not tainted 2.6.27-9-server #1 Dec 1 01:52:44 isis kernel: [27061.521251] RIP: 0010:[<ffffffff802abf0c>] [<ffffffff802abf0c>] find_get_pages+0x6c/0x110 Dec 1 01:52:44 isis kernel: [27061.521251] RSP: 0018:ffff880129453358 EFLAGS: 00000246 Dec 1 01:52:44 isis kernel: [27061.521251] RAX: ffff880128d89330 RBX: ffff880129453398 RCX: 0000000000000002 Dec 1 01:52:44 isis kernel: [27061.521251] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffffe200022e9e80 Dec 1 01:52:44 isis kernel: [27061.521251] RBP: ffff880129453308 R08: ffffe200009df6c8 R09: 0000000000000005 Dec 1 01:52:44 isis kernel: [27061.521251] R10: 0000000000000037 R11: 00000000001c5778 R12: ffffffff802b6b29 Dec 1 01:52:44 isis 
kernel: [27061.521251] R13: ffff880123a107d0 R14: ffffe20001c6f6c0 R15: 0000000000000286 Dec 1 01:52:44 isis kernel: [27061.521251] FS: 00007fb72cdf6700(0000) GS:ffff88012fc02980(0000) knlGS:0000000000000000 Dec 1 01:52:44 isis kernel: [27061.521251] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Dec 1 01:52:44 isis kernel: [27061.521251] CR2: 00007f1648629000 CR3: 000000012956d000 CR4: 00000000000006e0 Dec 1 01:52:44 isis kernel: [27061.521251] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Dec 1 01:52:44 isis kernel: [27061.521251] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Dec 1 01:52:44 isis kernel: [27061.521251] Dec 1 01:52:44 isis kernel: [27061.521251] Call Trace: Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802abee3>] ? find_get_pages+0x43/0x110 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b6984>] ? pagevec_lookup+0x24/0x30 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffffa04e100d>] ? xfs_cluster_write+0xad/0x180 [xfs] Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffffa04e1578>] ? xfs_page_state_convert+0x498/0x760 [xfs] Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffffa04e19a1>] ? xfs_vm_writepage+0x71/0x120 [xfs] Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b9274>] ? pageout+0x124/0x270 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802ab06a>] ? page_waitqueue+0xa/0x90 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b986d>] ? shrink_page_list+0x34d/0x530 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b8e49>] ? __isolate_lru_page+0x79/0xb0 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b8f0a>] ? isolate_lru_pages+0x8a/0x220 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b9bf2>] ? shrink_inactive_list+0x1a2/0x4b0 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b9f7b>] ? shrink_zone+0x7b/0x160 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802ba0ed>] ? 
shrink_zones+0x8d/0x150 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802ba236>] ? do_try_to_free_pages+0x86/0x2e0 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802ba587>] ? try_to_free_pages+0x67/0x70 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b90a0>] ? isolate_pages_global+0x0/0x50 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802b28b1>] ? __alloc_pages_internal+0x241/0x510 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802d565d>] ? alloc_pages_current+0xad/0x110 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802ac477>] ? __page_cache_alloc+0x67/0x80 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802ad0b3>] ? __grab_cache_page+0x63/0xb0 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff80316a59>] ? block_write_begin+0x89/0xf0 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffffa04e04ca>] ? xfs_vm_write_begin+0x2a/0x30 [xfs] Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffffa04e0040>] ? xfs_get_blocks+0x0/0x20 [xfs] Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802ab7ac>] ? generic_perform_write+0xbc/0x1c0 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802ad512>] ? generic_file_buffered_write+0x92/0x170 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffffa04e92d3>] ? xfs_write+0x6b3/0x9b0 [xfs] Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff80385a69>] ? apparmor_socket_recvmsg+0x19/0x20 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff803aaf70>] ? memset_c+0x20/0x30 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffffa04e4c88>] ? xfs_file_aio_write+0x58/0x60 [xfs] Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802e9559>] ? do_sync_write+0xf9/0x140 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802e9699>] ? do_sync_read+0xf9/0x140 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff80266fb0>] ? autoremove_wake_function+0x0/0x40 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff80386821>] ? 
aa_file_permission+0x21/0xf0 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff80386948>] ? apparmor_file_permission+0x28/0x30 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff803613e6>] ? security_file_permission+0x16/0x20 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802e9c1b>] ? vfs_write+0xcb/0x130 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff802e9d1a>] ? sys_pwrite64+0x9a/0xa0 Dec 1 01:52:44 isis kernel: [27061.521251] [<ffffffff8021285a>] ? system_call_fastpath+0x16/0x1b Dec 1 01:52:44 isis kernel: [27061.521251] Dec 1 01:53:18 isis kernel: [27095.080066] RAID5 conf printout: Dec 1 01:53:18 isis kernel: [27095.080071] --- rd:7 wd:5 Dec 1 01:53:18 isis kernel: [27095.080074] disk 0, o:1, dev:sdd1 Dec 1 01:53:18 isis kernel: [27095.080076] disk 1, o:1, dev:sdb1 Dec 1 01:53:18 isis kernel: [27095.080077] disk 2, o:0, dev:sdh1 Dec 1 01:53:18 isis kernel: [27095.080079] disk 3, o:1, dev:sdc1 Dec 1 01:53:18 isis kernel: [27095.080080] disk 4, o:1, dev:sdf1 Dec 1 01:53:18 isis kernel: [27095.080082] disk 5, o:1, dev:sde1 Dec 1 01:53:18 isis kernel: [27095.080090] RAID5 conf printout: Dec 1 01:53:18 isis kernel: [27095.080091] --- rd:7 wd:5 Dec 1 01:53:18 isis kernel: [27095.080092] disk 0, o:1, dev:sdd1 Dec 1 01:53:18 isis kernel: [27095.080093] disk 1, o:1, dev:sdb1 Dec 1 01:53:18 isis kernel: [27095.080094] disk 2, o:0, dev:sdh1 Dec 1 01:53:18 isis kernel: [27095.080095] disk 3, o:1, dev:sdc1 Dec 1 01:53:18 isis kernel: [27095.080097] disk 4, o:1, dev:sdf1 Dec 1 01:53:18 isis kernel: [27095.080098] disk 5, o:1, dev:sde1 Dec 1 01:53:18 isis kernel: [27095.140011] RAID5 conf printout: Dec 1 01:53:18 isis kernel: [27095.140017] --- rd:7 wd:5 Dec 1 01:53:18 isis kernel: [27095.140019] disk 0, o:1, dev:sdd1 Dec 1 01:53:18 isis kernel: [27095.140022] disk 1, o:1, dev:sdb1 Dec 1 01:53:18 isis kernel: [27095.140024] disk 3, o:1, dev:sdc1 Dec 1 01:53:18 isis kernel: [27095.140026] disk 4, o:1, dev:sdf1 Dec 1 01:53:18 isis kernel: [27095.140027] disk 
5, o:1, dev:sde1 Dec 1 01:53:18 isis kernel: [27095.140545] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140567] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140585] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140604] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140622] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140641] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140659] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140677] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140696] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.140714] lost page write due to I/O error on md1 Dec 1 01:53:18 isis kernel: [27095.141359] xfs_force_shutdown(md1,0x2) called from line 1056 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_log.c. Return address = 0xffffffffa04c80d3 Dec 1 01:53:23 isis kernel: [27100.140015] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:53:36 isis kernel: [27113.440011] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:54:06 isis kernel: [27143.440010] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:54:36 isis kernel: [27173.440009] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:55:06 isis kernel: [27203.440012] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:55:36 isis kernel: [27233.440011] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:56:06 isis kernel: [27263.440011] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:56:36 isis kernel: [27293.440010] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:57:06 isis kernel: [27323.440016] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:57:36 isis kernel: [27353.440015] Filesystem "md1": xfs_log_force: error 5 returned. 
Dec 1 01:58:06 isis kernel: [27383.440015] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 01:58:36 isis kernel: [27413.440016] Filesystem "md1": xfs_log_force: error 5 returned. ^^^^^^continues this for a while Dec 1 02:12:06 isis kernel: [28223.440015] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:12:36 isis kernel: [28253.440013] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:13:06 isis kernel: [28283.440014] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:13:36 isis kernel: [28313.440013] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:06 isis kernel: [28343.440013] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:36 isis kernel: [28373.440012] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28395.820448] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28395.820456] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28395.820462] xfs_force_shutdown(md1,0x1) called from line 420 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_rw.c. Return address = 0xffffffffa04decc3 Dec 1 02:14:59 isis kernel: [28395.820466] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28395.820468] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28395.820471] xfs_force_shutdown(md1,0x1) called from line 420 of file /build/buildd/linux-2.6.27/fs/xfs/xfs_rw.c. Return address = 0xffffffffa04decc3 Dec 1 02:14:59 isis kernel: [28396.669470] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28396.669487] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28396.669517] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28396.669525] Filesystem "md1": xfs_log_force: error 5 returned. Dec 1 02:14:59 isis kernel: [28396.669635] Filesystem "md1": xfs_log_force: error 5 returned. 
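The `RAID5 conf printout` lines in the logs above carry the key fact of this incident: the array went from `rd:7 wd:7` to `rd:7 wd:5`. A minimal sketch for pulling that out of a log follows. It assumes that `rd` is the configured number of member disks and `wd` the number still working (my reading of the md output, not something stated in the thread), and `raid5_status` is a hypothetical name. RAID 5 survives one missing member and no more, which is the only rule the code applies.

```python
import re

# Matches the summary line of md's "RAID5 conf printout", e.g. "--- rd:7 wd:5"
CONF_RE = re.compile(r"--- rd:(\d+) wd:(\d+)")


def raid5_status(log_text):
    """Classify the most recent 'rd:N wd:M' summary found in log_text.

    Assumption: rd = configured member disks, wd = working disks.
    Returns "clean", "degraded", "failed", or None if no summary is found.
    """
    matches = CONF_RE.findall(log_text)
    if not matches:
        return None
    rd, wd = (int(value) for value in matches[-1])
    missing = rd - wd
    if missing == 0:
        return "clean"
    if missing == 1:
        return "degraded"  # RAID 5 keeps running with one member gone
    return "failed"        # two or more members gone: array is not usable
```

Run against the dmesg excerpt above, the last summary is `rd:7 wd:5`, so the function reports the array as failed, which matches the XFS shutdown that follows in the log.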
Sorry, upgraded to:
Linux isis 2.6.27-9-server #1 SMP Thu Nov 20 22:56:07 UTC 2008 x86_64 GNU/Linux

Yes... still using Ubuntu. I realize this isn't strictly a Red Hat issue, but I hope this is shedding some light on other people's problems here.

I gave up as well and bought a 3ware controller. I will install it today or tomorrow. One thing I noticed, though: since I disabled all the SMART tests, the hddtemp daemons, and anything else that queries the disk regularly (besides just having SMART 'monitor' the statistics), I have not had a repeat event yet. But I also had the same problem you did: two disks dropped out of my RAID 5 and everything went bye-bye. I had most of it backed up elsewhere, but yeah, I got sick of it too.

Justin.

I received an email from Mark Lord, who said that he would likely be implementing more Marvell errata before Christmas. Don't know how long it would take to hit a Fedora update after that, but this is good news! Still not sure about your problem, Justin, but I hope the new controller works for ya... Hate to see all those good 10k drives go to waste (what do you use that thing for anyway?)

I am back on my Raptor 150s for now. I just like/prefer fast disk/access time.

Disabling write caching on the drives apparently does not entirely resolve this issue. I got it again last night:

ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata2.00: cmd 60/18:00:b3:f8:ba/00:00:00:00:00/40 tag 0 ncq 12288 in
ata2.00: status: { DRDY }
ata2: hard resetting link
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: max_sectors limited to 256 for NCQ
ata2.00: max_sectors limited to 256 for NCQ
ata2.00: configured for UDMA/133
ata2: EH complete

I'll take one a month over one every few minutes though. We'll just have to see how Mark's errata implementation goes...

I replaced my (12) Velociraptors with (12) Raptor 150s, not a single error. I suggest (if you can) trying other drives.
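To compare how often each port freezes before and after a change (write cache off, different drives, different controller), one rough approach is to count the `exception ... frozen` lines per ATA port. The sketch below assumes the libata message layout seen in the excerpts in this report; `frozen_events_per_port` is a hypothetical helper name, not part of any tool.

```python
import re
from collections import Counter

# Matches e.g. "ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen"
# and captures only the port name ("ata2"), ignoring the ".00" device suffix.
FROZEN_RE = re.compile(
    r"(ata\d+)(?:\.\d+)?: exception Emask \S+ SAct \S+ SErr \S+ action \S+ frozen"
)


def frozen_events_per_port(log_text):
    """Return a Counter mapping port name to number of frozen exceptions."""
    return Counter(FROZEN_RE.findall(log_text))
```

If all the freezes land on ports behind one controller while the onboard ports stay clean, that points at the controller or its driver rather than at the drives.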
I'm seeing the same errors on a Fujitsu Siemens Econel 50 server on EL5 U2 running kernel 2.6.18-92.1.22.el5. It ran EL4 for two years without problems. HW: Intel ICH6R in AHCI mode

My comment only applies indirectly ... I'm running RHEL 4, kernel 2.6.9-67.0.15.EL and recently got:
Dec 28 06:31:02 forest kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Dec 28 06:31:02 forest kernel: ata1.00: cmd ca/00:10:76:0c:43/00:00:00:00:00/e0 tag 0 cdb 0x0 data 8192 out
Dec 28 06:31:02 forest kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 28 06:31:09 forest kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
Dec 28 06:31:32 forest kernel: ata1: port failed to respond (30 secs, Status 0xd0)
Dec 28 06:31:32 forest kernel: ata1: soft resetting port
Dec 28 06:32:02 forest kernel: ata1.00: qc timeout (cmd 0xec)
Dec 28 06:32:02 forest kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 28 06:32:02 forest kernel: ata1.00: revalidation failed (errno=-5)
Dec 28 06:32:02 forest kernel: ata1: failed to recover some devices, retrying in 5 secs
Dec 28 06:32:14 forest kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
Dec 28 06:32:37 forest kernel: ata1: port failed to respond (30 secs, Status 0xd0)
Dec 28 06:32:37 forest kernel: ata1: soft resetting port
Dec 28 06:33:07 forest kernel: ata1.00: qc timeout (cmd 0xec)
Dec 28 06:33:07 forest kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 28 06:33:07 forest kernel: ata1.00: revalidation failed (errno=-5)
Dec 28 06:33:07 forest kernel: ata1: failed to recover some devices, retrying in 5 secs
Dec 28 06:33:19 forest kernel: ata1: port is slow to respond, please be patient (Status 0xd0)
Dec 28 06:33:42 forest kernel: ata1: port failed to respond (30 secs, Status 0xd0)
Dec 28 06:33:42 forest kernel: ata1: soft resetting port
Dec 28 06:34:13 forest kernel: ata1.00: qc timeout (cmd 0xec)
Dec 28 06:34:13 forest kernel:
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Dec 28 06:34:13 forest kernel: ata1.00: revalidation failed (errno=-5)
Dec 28 06:34:13 forest kernel: ata1.00: disabled
Dec 28 06:34:13 forest kernel: ata1: EH complete
This is just one disk, no RAID. ... since I rebooted on the 28th, everything has been fine. I will receive a brand new disk today (the other one was almost new), run complete Seagate diagnostics on it, then replace the root disk and run complete diagnostics on the old disk, but I doubt it's the disk that's the problem here.
MB: Intel S5000PSL
ata1: SATA max UDMA/133 cmd 0x40C8 ctl 0x40E6 bmdma 0x40A0 irq 193
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ata_piix
Vendor: ATA Model: ST3250410AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 488397168 512-byte hdwr sectors (250059 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
This is just to say that the problem might apply to older kernels as well.

Your trace is fairly clear:
The drive stops responding
We notice the timeout
It reports 0xD0 (busy)
We reset it
We ask it to identify
It's still wedged.
Difficult to see how that can be a kernel problem when the drive won't respond to a reset. Could be the PSU - that has been an issue with some systems - but it could also be that the drive firmware went castors up.

The original bug at the top of this report was fixed in 2.6.26.xx --> this was the mv_qc_defer() bug that Tejun found way back then. The other reports also on this bug are for different problems, yet to be sorted out. There do seem to be a number of "timeouts" reported here and elsewhere, with the ATA opcode often being an NCQ R/W ("FPDMA") command, or a "FLUSH_CACHE_EXT" command.
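The opcode is the first byte of the `cmd` field in the exception traces quoted in this report, so a trace can be classified mechanically. A minimal Python sketch, assuming the log line layout seen in this thread; the helper name `decode_cmd` and the small opcode table are illustrative and not part of any kernel tool:

```python
import re

# Small subset of ATA command opcodes that appear in this report.
ATA_OPCODES = {
    0x60: "READ FPDMA QUEUED (NCQ read)",
    0x61: "WRITE FPDMA QUEUED (NCQ write)",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}

# Matches e.g. "ata1.00: cmd 61/08:00:08:d6:42/..." and captures the
# device name and the first (opcode) byte of the taskfile dump.
CMD_RE = re.compile(r"(ata\d+(?:\.\d+)?): cmd ([0-9a-f]{2})/")


def decode_cmd(line):
    """Return (device, opcode description) for a libata 'cmd' line, else None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(2), 16)
    return m.group(1), ATA_OPCODES.get(opcode, "unknown opcode 0x%02x" % opcode)


if __name__ == "__main__":
    sample = ("Sep 16 12:37:43 radfiles kernel: ata1.00: cmd "
              "61/08:00:08:d6:42/00:00:25:00:00/40 tag 0 ncq 4096 out")
    print(decode_cmd(sample))
```

Feeding it the traces from this thread separates the NCQ timeouts (opcodes 60 and 61) from the non-NCQ ones (c8, ca), which is a rough first cut at telling the different problems apart.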
Apart from that, there's not a lot of useful information yet. I need to see specific kernel versions (kernel.org, not vendor kernels), and knowing the exact drive models and PCI bus type (eg. is the 6081 card on a 133MHz/64-bit PCI-X slot, or a 33Mhz/32-bit PCI slot, or a ...). These chips have a number of quirks that are specific to particular bus types. Scream now, and you'll be heard! -Mark (room goes silent - Marvell owners bow down in the presence of Mark Lord) Mine is all the same - Here are the last 3 errors I got: Jan 11 14:12:56 radfiles kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen Jan 11 14:12:56 radfiles kernel: ata2.00: cmd 61/08:00:cb:d5:42/00:00:25:00:00/40 tag 0 ncq 4096 out Jan 11 14:12:56 radfiles kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 11 14:12:56 radfiles kernel: ata2.00: status: { DRDY } Jan 11 14:12:56 radfiles kernel: ata2: hard resetting link Jan 11 14:12:56 radfiles kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Jan 11 14:12:56 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ Jan 11 14:12:56 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ Jan 11 14:12:56 radfiles kernel: ata2.00: configured for UDMA/133 Jan 11 14:12:56 radfiles kernel: ata2: EH complete Jan 11 14:12:56 radfiles kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) Jan 11 14:12:56 radfiles kernel: sd 1:0:0:0: [sdb] Write Protect is off Jan 11 14:12:56 radfiles kernel: sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA Jan 11 14:15:02 radfiles kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen Jan 11 14:15:02 radfiles kernel: ata2.00: cmd 61/08:00:cb:d5:42/00:00:25:00:00/40 tag 0 ncq 4096 out Jan 11 14:15:02 radfiles kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 11 14:15:02 radfiles kernel: ata2.00: status: { DRDY } Jan 11 14:15:02 radfiles kernel: ata2: hard resetting link 
Jan 11 14:15:03 radfiles kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Jan 11 14:15:03 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ Jan 11 14:15:03 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ Jan 11 14:15:03 radfiles kernel: ata2.00: configured for UDMA/133 Jan 11 14:15:03 radfiles kernel: ata2: EH complete Jan 11 14:15:03 radfiles kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) Jan 11 14:15:03 radfiles kernel: sd 1:0:0:0: [sdb] Write Protect is off Jan 11 14:15:03 radfiles kernel: sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA Jan 11 14:26:03 radfiles kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen Jan 11 14:26:03 radfiles kernel: ata2.00: cmd 60/08:00:3b:aa:47/00:00:00:00:00/40 tag 0 ncq 4096 in Jan 11 14:26:03 radfiles kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) Jan 11 14:26:03 radfiles kernel: ata2.00: status: { DRDY } Jan 11 14:26:03 radfiles kernel: ata2: hard resetting link Jan 11 14:26:03 radfiles kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Jan 11 14:26:03 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ Jan 11 14:26:03 radfiles kernel: ata2.00: max_sectors limited to 256 for NCQ Jan 11 14:26:03 radfiles kernel: ata2.00: configured for UDMA/133 Jan 11 14:26:03 radfiles kernel: ata2: EH complete Jan 11 14:26:03 radfiles kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) Jan 11 14:26:03 radfiles kernel: sd 1:0:0:0: [sdb] Write Protect is off Jan 11 14:26:03 radfiles kernel: sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA uname -a: Linux radfiles.net 2.6.27.9-159.fc10.x86_64 #1 SMP Tue Dec 16 14:47:52 EST 2008 x86_64 x86_64 x86_64 GNU/Linux (is there something more I can do here to get you more specific information?) 
lspci -vv: 00:02.0 PCI bridge: ALi Corporation M5249 HTT to PCI Bridge (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=32 I/O behind bridge: 0000d000-0000dfff Memory behind bridge: fb000000-fcffffff Prefetchable memory behind bridge: e2000000-e20fffff Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [b0] HyperTransport: Slave or Primary Interface Command: BaseUnitID=3 UnitCnt=1 MastHost- DefDir- DUL- Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b- Link Config 0: MLWI=8bit DwFcIn- MLWO=8bit DwFcOut- LWI=8bit DwFcInEn- LWO=8bit DwFcOutEn- Link Control 1: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0 IsocEn- LSEn- ExtCTL- 64b- Link Config 1: MLWI=8bit DwFcIn- MLWO=8bit DwFcOut- LWI=8bit DwFcInEn- LWO=8bit DwFcOutEn- Revision ID: 1.04 Link Frequency 0: 200MHz Link Error 0: <Prot- <Ovfl- <EOC- CTLTm- Link Frequency Capability 0: 200MHz+ 300MHz+ 400MHz+ 500MHz- 600MHz- 800MHz- 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend- Feature Capability: IsocFC- LDTSTOP+ CRCTM- ECTLT- 64bA- UIDRD- Link Frequency 1: 200MHz Link Error 1: <Prot- <Ovfl- <EOC- CTLTm- Link Frequency Capability 1: 200MHz- 300MHz- 400MHz- 500MHz- 600MHz- 800MHz- 1.0GHz- 1.2GHz- 1.4GHz- 1.6GHz- Vend- Error Handling: PFlE- OFlE- PFE- OFE- EOCFE- RFE- CRCFE- SERRFE- CF- RE- PNFE- ONFE- EOCNFE- RNFE- CRCNFE- SERRNFE- Prefetchable memory behind bridge Upper: 00-00 Bus Number: 00 Capabilities: [f0] HyperTransport: Interrupt Discovery and Configuration Kernel modules: shpchp 00:03.0 ISA bridge: ALi Corporation M1563 HyperTransport South Bridge (rev 
20) Subsystem: Device 19d5:2203 Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 (250ns min, 6000ns max) 00:03.1 Bridge: ALi Corporation M7101 Power Management Controller [PMU] Subsystem: Device 19d5:2203 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel modules: alim7101_wdt 00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 Bus: primary=00, secondary=02, subordinate=02, sec-latency=32 I/O behind bridge: 0000e000-0000efff Memory behind bridge: fd000000-fd0fffff Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [a0] PCI-X bridge device Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz Status: Dev=00:0a.0 64bit+ 133MHz+ SCD- USC- SCO- SRD- Upstream: Capacity=14 CommitmentLimit=65535 Downstream: Capacity=2 CommitmentLimit=65535 Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration Capabilities: [c0] HyperTransport: Slave or Primary Interface !!! 
Possibly incomplete decoding Command: BaseUnitID=10 UnitCnt=2 MastHost- DefDir- Link Control 0: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 Link Config 0: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit Link Control 1: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 Link Config 1: MLWI=8bit MLWO=8bit LWI=8bit LWO=8bit Revision ID: 1.02 Kernel modules: shpchp 00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01) (prog-if 10 [IO-APIC]) Subsystem: Device 19d5:2203 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Region 0: Memory at febfe000 (64-bit, non-prefetchable) [size=4K] 00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 Bus: primary=00, secondary=03, subordinate=03, sec-latency=32 Memory behind bridge: fd100000-fd1fffff Prefetchable memory behind bridge: 00000000e2100000-00000000e21fffff Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR- BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [a0] PCI-X bridge device Secondary Status: 64bit+ 133MHz+ SCD- USC- SCO- SRD- Freq=100MHz Status: Dev=00:0b.0 64bit+ 133MHz+ SCD- USC- SCO- SRD- Upstream: Capacity=14 CommitmentLimit=65535 Downstream: Capacity=2 CommitmentLimit=65535 Capabilities: [b8] HyperTransport: Interrupt Discovery and Configuration Kernel modules: shpchp 00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01) (prog-if 10 [IO-APIC]) Subsystem: Device 19d5:2203 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- 
VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Region 0: Memory at febff000 (64-bit, non-prefetchable) [size=4K] 00:0e.0 IDE interface: ALi Corporation M5229 IDE (rev c5) (prog-if fa) Subsystem: Device 19d5:2203 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 Interrupt: pin A routed to IRQ 19 Region 0: [virtual] Memory at 000001f0 (32-bit, non-prefetchable) [disabled] [size=8] Region 1: [virtual] Memory at 000003f0 (type 3, non-prefetchable) [disabled] [size=1] Region 2: [virtual] Memory at 00000170 (32-bit, non-prefetchable) [disabled] [size=8] Region 3: [virtual] Memory at 00000370 (type 3, non-prefetchable) [disabled] [size=1] Region 4: I/O ports at f000 [size=16] Kernel driver in use: pata_ali Kernel modules: pata_ali, pata_acpi, ata_generic 00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Capabilities: [80] HyperTransport: Host or Secondary Interface !!! Possibly incomplete decoding Command: WarmRst+ DblEnd- Link Control: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 Link Config: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit Revision ID: 1.02 Capabilities: [a0] HyperTransport: Host or Secondary Interface !!! Possibly incomplete decoding Command: WarmRst+ DblEnd- Link Control: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 Link Config: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit Revision ID: 1.02 Capabilities: [c0] HyperTransport: Host or Secondary Interface !!! 
Possibly incomplete decoding Command: WarmRst+ DblEnd- Link Control: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0 Link Config: MLWI=16bit MLWO=16bit LWI=N/C LWO=N/C Revision ID: 1.02 00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel driver in use: k8temp Kernel modules: k8temp 00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Capabilities: [80] HyperTransport: Host or Secondary Interface !!! Possibly incomplete decoding Command: WarmRst+ DblEnd- Link Control: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0 Link Config: MLWI=16bit MLWO=16bit LWI=N/C LWO=N/C Revision ID: 1.02 Capabilities: [a0] HyperTransport: Host or Secondary Interface !!! 
Possibly incomplete decoding Command: WarmRst+ DblEnd- Link Control: CFlE- CST- CFE- <LkFail- Init+ EOC- TXO- <CRCErr=0 Link Config: MLWI=16bit MLWO=16bit LWI=16bit LWO=16bit Revision ID: 1.02 Capabilities: [c0] HyperTransport: Host or Secondary Interface !!! Possibly incomplete decoding Command: WarmRst+ DblEnd- Link Control: CFlE- CST- CFE- <LkFail+ Init- EOC+ TXO+ <CRCErr=0 Link Config: MLWI=16bit MLWO=16bit LWI=N/C LWO=N/C Revision ID: 1.02 00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- 00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Kernel driver in use: k8temp Kernel modules: k8temp 01:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27) (prog-if 00 [VGA controller]) Subsystem: ATI Technologies Inc Rage XL Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 (2000ns min), Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 7 Region 0: Memory at fb000000 (32-bit, non-prefetchable) [size=16M] Region 1: I/O ports at d000 [size=256] Region 2: Memory at fc020000 (32-bit, non-prefetchable) [size=4K] [virtual] 
Expansion ROM at e2000000 [disabled] [size=128K] Capabilities: [5c] Power Management version 2 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Kernel modules: atyfb 02:03.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09) Subsystem: Marvell Technology Group Ltd. Device 11ab Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 26 Region 0: Memory at fd000000 (64-bit, non-prefetchable) [size=1M] Region 2: I/O ports at e000 [size=256] Capabilities: [40] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Count=1/1 Enable- Address: 0000000000000000 Data: 0000 Capabilities: [60] PCI-X non-bridge device Command: DPERE- ERO- RBC=512 OST=4 Status: Dev=02:03.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz- Kernel driver in use: sata_mv Kernel modules: sata_mv 03:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03) Subsystem: ABIT Computer Corp. 
Device 2202 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 (16000ns min), Cache Line Size: 32 bytes Interrupt: pin A routed to IRQ 31 Region 0: Memory at fd100000 (64-bit, non-prefetchable) [size=64K] Region 2: Memory at fd110000 (64-bit, non-prefetchable) [size=64K] [virtual] Expansion ROM at e2100000 [disabled] [size=64K] Capabilities: [40] PCI-X non-bridge device Command: DPERE- ERO+ RBC=512 OST=1 Status: Dev=03:04.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz- Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data <?> Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Count=1/8 Enable- Address: 24100073000144a4 Data: 10d0 Kernel driver in use: tg3 Kernel modules: tg3 03:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 03) Subsystem: ABIT Computer Corp. 
Device 2202 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 32 (16000ns min), Cache Line Size: 32 bytes Interrupt: pin B routed to IRQ 28 Region 0: Memory at fd120000 (64-bit, non-prefetchable) [size=64K] Region 2: Memory at fd130000 (64-bit, non-prefetchable) [size=64K] [virtual] Expansion ROM at e2110000 [disabled] [size=64K] Capabilities: [40] PCI-X non-bridge device Command: DPERE- ERO- RBC=2048 OST=1 Status: Dev=03:04.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz- Capabilities: [48] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable+ DSel=0 DScale=1 PME- Capabilities: [50] Vital Product Data <?> Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Count=1/8 Enable- Address: 2c02d024720c49a0 Data: 5103 Kernel driver in use: tg3 Kernel modules: tg3 (write caching forced off on all drives using hdparm) /dev/sda: Model=ST3320620AS , FwRev=3.AAM , SerialNo= 5QF3T3XP Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16? CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 AdvancedPM=no WriteCache=disabled Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7 /dev/sdb: Model=ST3320620AS , FwRev=3.AAM , SerialNo= 5QF3V2C3 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16? 
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 AdvancedPM=no WriteCache=disabled Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7 /dev/sdc: Model=ST3320620AS , FwRev=3.AAM , SerialNo= 5QF3T3YM Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16? CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 AdvancedPM=no WriteCache=disabled Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7 /dev/sdd: Model=ST3320620AS , FwRev=3.AAM , SerialNo= 5QF3RA0R Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16? CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 AdvancedPM=no WriteCache=disabled Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7 /dev/sde: Model=ST3320620AS , FwRev=3.AAM , SerialNo= 9QFAH509 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% } RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4 BuffType=unknown, BuffSize=16384kB, MaxMultSect=16, MultSect=?16? 
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=625142448 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120} PIO modes: pio0 pio1 pio2 pio3 pio4 DMA modes: mdma0 mdma1 mdma2 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 AdvancedPM=no WriteCache=disabled Drive conforms to: Unspecified: ATA/ATAPI-1,2,3,4,5,6,7 /proc/mdstat: md2 : active raid1 sdc2[0] sdd2[1] 1052160 blocks [2/2] [UU] md0 : active raid1 sda1[0] sde1[4](S) sdd1[3] sdc1[2] sdb1[1] 64128 blocks [4/4] [UUUU] md1 : active raid1 sda2[0] sde2[2](S) sdb2[1] 1052160 blocks [2/2] [UU] md3 : active raid5 sda3[0] sde3[4] sdd3[3] sdc3[2] sdb3[1] 1245807616 blocks level 5, 256k chunk, algorithm 2 [5/5] [UUUUU] (part of dmesg showing sata_mv ver) sata_mv 0000:02:03.0: version 1.24 sata_mv 0000:02:03.0: PCI INT A -> GSI 26 (level, low) -> IRQ 26 sata_mv 0000:02:03.0: Gen-II 32 slots 8 ports SCSI mode IRQ via INTx scsi0 : sata_mv scsi1 : sata_mv scsi2 : sata_mv scsi3 : sata_mv scsi4 : sata_mv scsi5 : sata_mv scsi6 : sata_mv scsi7 : sata_mv ata1: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd022000 irq 26 ata2: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd024000 irq 26 ata3: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd026000 irq 26 ata4: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd028000 irq 26 ata5: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd032000 irq 26 ata6: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd034000 irq 26 ata7: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd036000 irq 26 ata8: SATA max UDMA/133 mmio m1048576@0xfd000000 port 0xfd038000 irq 26 As I said in my email to you, let me know if there is anything I can do to assist. I can only imagine how difficult things like this are to track down... That's great information, thanks. Now, there may be multiple issues here, but I have found one possible cause of the reported behaviour. Brian's info above indicates that we are losing an NCQ interrupt somehow, from time to time. 
So I spent this afternoon nitpicking and bitpicking through the interrupt code in sata_mv.c, and I believe I found a race on the hc_irq_cause register. The code was "helpfully" attempting to use read-modify-write to clear individual port bits there, but this is impossible to do in a race-free fashion. So the obvious fix is to just write the bits being cleared, without touching anything else. This will also be faster, since no read is required or desired. I really don't see a downside, as long as it actually works! :) Patch to be attached here for trial use only. I still need to run it past Marvell as well as the linux-ide development list.
Cheers
Created attachment 328914 [details]
Patch for 2.6.28: sata_mv: remove update races from hc_irq_cause register
Try and report back. This bug should be affecting all users of sata_mv, so anyone on the wire could help by testing it and posting results here.
Thanks
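To make the race concrete, here is a toy Python model (not the driver code, and not the attached patch). It assumes a cause register where hardware sets bits and a software write clears every bit written as 0 while leaving bits written as 1 alone; the class and function names, and the one-bit-per-port layout, are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


class CauseRegister:
    """Toy interrupt-cause register (assumed write-0-to-clear semantics)."""

    def __init__(self):
        self.value = 0

    def hw_raise(self, port):
        # Hardware latches a pending interrupt for this port.
        self.value |= 1 << port

    def read(self):
        return self.value

    def write(self, mask):
        # Bits written as 0 are cleared, bits written as 1 are untouched.
        self.value &= mask


def ack_read_modify_write(reg, port, hw_event=None):
    """Racy acknowledge: read, clear one bit in the copy, write it back."""
    snapshot = reg.read()
    if hw_event is not None:
        hw_event()  # another port raises its bit between the read and the write
    reg.write(snapshot & ~(1 << port) & MASK32)


def ack_write_only(reg, port, hw_event=None):
    """Race-free acknowledge: write 0 only to the bit being cleared."""
    if hw_event is not None:
        hw_event()
    reg.write(~(1 << port) & MASK32)


def demo(ack):
    reg = CauseRegister()
    reg.hw_raise(0)                                  # port 0 is pending
    ack(reg, 0, hw_event=lambda: reg.hw_raise(1))    # port 1 fires mid-ack
    return reg.read()


if __name__ == "__main__":
    print(bin(demo(ack_read_modify_write)))  # port 1's bit is gone
    print(bin(demo(ack_write_only)))         # port 1's bit survives
```

In the read-modify-write variant the stale snapshot carries a 0 for port 1, so writing it back wipes an interrupt that arrived after the read. That is the kind of lost interrupt the comment above describes.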
Okay, FOUND IT! But first, a very important question: has anyone ever seen the timeouts on ports 4,5,6,7 of the 6081? My theory is that this only ever happens on ports 0,1,2,3, because that's where I've finally found the bug. So, please:
(1) Tell me if ports 4,5,6,7 have ever given you timeout grief (check your logs if need be, this is important). Thanks.
(2) Regardless, apply the next patch I'm about to attach, which fixes incorrect use of port numbers on the 6081 chip.
(3) Run with the patch applied, and report back ASAP.
Once I hear from you folks, I'll feed the patch upstream/backstream, as this is a rather important fix. Thanks.
Created attachment 329048 [details]
sata_mv: Fix timeouts on Marvell 6081 ports 0..3.
This patch should fix the remaining "timeout" issues for Marvell 6081 chipset users. Please apply and report back ASAP.
Thanks
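For readers unfamiliar with the port numbering involved: as far as I understand the layout, an 8-port chip like the 6081 is organised as two host controllers (HCs) of four ports each, so every port has both a global number (0..7) and a number within its HC (0..3). The Python sketch below only illustrates that mapping and where the two numbers diverge. It is not the attached patch, and the helper names are made up for this example.

```python
PORTS_PER_HC = 4  # assumption: 8 ports = 2 host controllers x 4 ports


def hc_from_port(port):
    """Which host controller a global port number belongs to."""
    return port // PORTS_PER_HC


def hardport_from_port(port):
    """Port number within its host controller (0..3)."""
    return port % PORTS_PER_HC


def ports_where_numbers_differ(n_ports=8):
    """Global ports whose per-HC number is not the same as the global number."""
    return [p for p in range(n_ports) if p != hardport_from_port(p)]


if __name__ == "__main__":
    for p in range(8):
        print("port %d -> hc %d, hardport %d"
              % (p, hc_from_port(p), hardport_from_port(p)))
    print(ports_where_numbers_differ())
```

For ports 0..3 the two numbers coincide, so code that uses the wrong one still behaves; only activity on ports 4..7 exposes a mix-up. That would be consistent with the question above about which half of the chip is populated, though the details of the real fix are in the attachment.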
By the way, I also suspect that timeouts NEVER happen when: (1) there are no drives on ports 0..3, OR (2) there are no drives on ports 4..7. So if only half of the chip is in use, either the upper or lower half, this bug is probably never seen. Cheers

Old patch didn't work - Failed on boot:
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
ata1.00: cmd 61/08:00:cb:d5:42/00:00:25:00:00/40 tag 0 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: max_sectors limited to 256 for NCQ
ata1.00: configured for UDMA/133
ata1: EH complete
And to answer your question (1) - NEVER, and I have ports 0-5 filled, with 0-4 comprising the same software RAID array... Testing new patch now!

I hate to speak prematurely, but IT WORKS!!! No errors, and I've tried copying quite a bit of data (let alone all of the other server stuff going on in the background), and NOTHING. This is with write caching enabled, which before would cause errors very frequently. Although it is early in the testing, I feel very confident that this is the fix based on how quickly I could get it to fail before... Thank you, thank you, thank you! (and to Harri Olin on the dev mailing list who mentioned the port issue - that was apparently the key). It's really nice to see a lengthy bug come together like this and result in something so positive...

BTW, I tested only with the new patch and not along with the "remove update races from hc_irq_cause register" patch...

That's fine. The first patch does not fix the problem, but merely speeds up your system by a fraction of a percent. :) -ml

@mlord Hi, just joining the party here...
I too was seeing this error:
[ 105.430353] sda:<3>ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 135.842355] ata1.00: cmd 60/08:00:00:00:00/00:00:00:00:00/40 tag 0 ncq 4096 in
[ 135.842355] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 135.846352] ata1.00: status: { DRDY }
[ 135.850353] ata1: hard resetting link
..on a Sun x4500 with a Marvell MV88SX6081 controller. Your first patch, "sata_mv_fix_hc_irq_cause_race", allowed me to boot successfully. Uptime is 1.5 days on 2.6.28 with only your patch applied. Thanks! -sp

Okay, we have lots of confirmations of success now (using only the second patch from me), on the 6081 chipset as well as for the 508x 8-port controller. I believe this bugzilla entry belongs to Jeff Garzik, so he can take it from here. Cheers Mark

Hello, it sounds like the 2.6.18-92 series is affected by, at least, the timeouts on ports 0..3, as it runs sata_mv 1.01 (backported from 2.6.24). Is there any plan to backport the fix to the 2.6.18-92 series? Sincerely,

And RHEL-backports maybe? :-)

Patch is in the queue for 2.6.27.15

Hey all, I'm here looking for a solution to the same or a similar issue:
ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.01: cmd c8/00:08:c7:d8:ba/00:00:00:00:00/f1 tag 0 dma 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.01: status: { DRDY }
ata4: soft resetting link
ata4.01: configured for UDMA/133
ata4: EH complete
These are not the system volumes (different file systems). This device was working properly until this issue appeared. The only things that have changed are an updated kernel and a new USB device I had plugged in (Lexmark printer). The GUI reported the free space on the devices. I logged out and back in, and the system froze.
The system would not shut down in this state. Hard reset, power down, removed the added USB printer. **I had noted the system was booting slower than previously.** The system came up as normal; all devices can be mounted and used as required.

So it appears that there may be an issue with the USB, which is nothing new; this board and chipset are not the best (wonky, to say the least): ASUS P5LD2

00:00.0 Host bridge: Intel Corporation 82945G/GZ/P/PL Memory Controller Hub (rev 02)
	Subsystem: Intel Corporation Unknown device 2580
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: <access denied>

00:01.0 PCI bridge: Intel Corporation 82945G/GZ/P/PL PCI Express Root Port (rev 02) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: cff00000-cfffffff
	Prefetchable memory behind bridge: 00000000d0000000-00000000dfffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA+ MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp

00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 01)
	Subsystem: ASUSTeK Computer Inc. Unknown device 8237
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at cfcf8000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: HDA Intel
	Kernel modules: snd-hda-intel

00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp

00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 01) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000c000-0000cfff
	Memory behind bridge: cfe00000-cfefffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp

00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 20
	Region 4: I/O ports at 7000 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin B routed to IRQ 17
	Region 4: I/O ports at 7400 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin C routed to IRQ 18
	Region 4: I/O ports at 7800 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin D routed to IRQ 19
	Region 4: I/O ports at 8000 [size=32]
	Kernel driver in use: uhci_hcd
	Kernel modules: uhci-hcd

00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at cfcff800 (32-bit, non-prefetchable) [size=1K]
	Capabilities: <access denied>
	Kernel driver in use: ehci_hcd
	Kernel modules: ehci-hcd

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
	I/O behind bridge: 0000a000-0000bfff
	Memory behind bridge: cfd00000-cfdfffff
	Prefetchable memory behind bridge: 00000000cc000000-00000000cc0fffff
	Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied>

00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: <access denied>
	Kernel modules: iTCO_wdt, intel-rng

00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 22
	Region 0: I/O ports at 01f0 [size=8]
	Region 1: I/O ports at 03f4 [size=1]
	Region 2: I/O ports at 0170 [size=8]
	Region 3: I/O ports at 0374 [size=1]
	Region 4: I/O ports at ffa0 [size=16]
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, ata_piix, pata_acpi

00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: ASUSTeK Computer Inc. Unknown device 2601
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin B routed to IRQ 23
	Region 0: I/O ports at 9800 [size=8]
	Region 1: I/O ports at 9400 [size=4]
	Region 2: I/O ports at 9000 [size=8]
	Region 3: I/O ports at 8800 [size=4]
	Region 4: I/O ports at 8400 [size=16]
	Region 5: Memory at cfcffc00 (32-bit, non-prefetchable) [size=1K]
	Capabilities: <access denied>
	Kernel driver in use: ata_piix
	Kernel modules: ata_generic, ata_piix, pata_acpi

00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
	Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard
	Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin B routed to IRQ 23
	Region 4: I/O ports at 0400 [size=32]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c-i801

01:03.0 Mass storage controller: Integrated Technology Express, Inc. ITE 8211F Single Channel UDMA 133 (rev 11)
	Subsystem: ASUSTeK Computer Inc. P5GD1-VW Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 64 (2000ns min, 2000ns max)
	Interrupt: pin A routed to IRQ 20
	Region 0: I/O ports at b800 [size=8]
	Region 1: I/O ports at b400 [size=4]
	Region 2: I/O ports at b000 [size=8]
	Region 3: I/O ports at a800 [size=4]
	Region 4: I/O ports at a400 [size=16]
	Expansion ROM at cc000000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: pata_it821x
	Kernel modules: pata_it821x

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
	Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Interrupt: pin A routed to IRQ 19
	Region 0: Memory at cfefc000 (64-bit, non-prefetchable) [size=16K]
	Region 2: I/O ports at c800 [size=256]
	Expansion ROM at cfec0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: sky2
	Kernel modules: sky2

04:00.0 VGA compatible controller: ATI Technologies Inc RV515 PRO [Radeon X1300/X1550 Series] (prog-if 00 [VGA controller])
	Subsystem: ASUSTeK Computer Inc. EAX1300PRO/TD/256M
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Interrupt: pin A routed to IRQ 5
	Region 0: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at cffe0000 (64-bit, non-prefetchable) [size=64K]
	Region 4: I/O ports at e000 [size=256]
	Expansion ROM at cffc0000 [disabled] [size=128K]
	Capabilities: <access denied>

04:00.1 Display controller: ATI Technologies Inc RV515 PRO [Radeon X1300/X1550 Series] (Secondary)
	Subsystem: ASUSTeK Computer Inc. Unknown device 0143
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 16 bytes
	Region 0: Memory at cfff0000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: <access denied>

dmesg

Initializing cgroup subsys cpuset
Linux version 2.6.27.12-78.2.8.fc9.x86_64 (mockbuild@) (gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC) ) #1 SMP Mon Jan 19 19:25:03 EST 2009
Command line: ro root=/dev/VolGroup00/LogVol00 vga=791
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000c7f90000 (usable)
 BIOS-e820: 00000000c7f90000 - 00000000c7f9e000 (ACPI data)
 BIOS-e820: 00000000c7f9e000 - 00000000c7fe0000 (ACPI NVS)
 BIOS-e820: 00000000c7fe0000 - 00000000c8000000 (reserved)
 BIOS-e820: 00000000ffb80000 - 0000000100000000 (reserved)
DMI 2.4 present.
AMI BIOS detected: BIOS may corrupt low RAM, working it around.
last_pfn = 0xc7f90 max_arch_pfn = 0x3ffffffff x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 init_memory_mapping 0000000000 - 00c7e00000 page 2M 00c7e00000 - 00c7f90000 page 4k kernel direct mapping tables up to c7f90000 @ 10000-16000 last_map_addr: c7f90000 end: c7f90000 RAMDISK: 37c31000 - 37fefa2c ACPI: RSDP 000FACA0, 0024 (r2 ACPIAM) ACPI: XSDT C7F90100, 004C (r1 ������ �������� 7000720 MSFT 97) ACPI: FACP C7F90290, 00F4 (r3 A_M_I_ OEMFACP 7000720 MSFT 97) ACPI: DSDT C7F90590, 8391 (r1 A0227 A0227000 0 INTL 20051117) ACPI: FACS C7F9E000, 0040 ACPI: APIC C7F90390, 0080 (r1 A_M_I_ OEMAPIC 7000720 MSFT 97) ACPI: SLIC C7F90410, 0176 (r1 ������ �������� 7000720 MSFT 97) ACPI: OEMB C7F9E040, 0066 (r1 A_M_I_ AMI_OEM 7000720 MSFT 97) ACPI: MCFG C7F98930, 003C (r1 A_M_I_ OEMMCFG 7000720 MSFT 97) No NUMA configuration found Faking a node at 0000000000000000-00000000c7f90000 Bootmem setup node 0 0000000000000000-00000000c7f90000 NODE_DATA [0000000000014000 - 0000000000028fff] bootmap [0000000000029000 - 0000000000041ff7] pages 19 (6 early reservations) ==> bootmem [0000000000 - 00c7f90000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000] #2 [0000200000 - 0000972d2c] TEXT DATA BSS ==> [0000200000 - 0000972d2c] #3 [0037c31000 - 0037fefa2c] RAMDISK ==> [0037c31000 - 0037fefa2c] #4 [000009fc00 - 0000100000] BIOS reserved ==> [000009fc00 - 0000100000] #5 [0000010000 - 0000014000] PGTABLE ==> [0000010000 - 0000014000] found SMP MP-table at [ffff8800000ff780] 000ff780 [ffffe20000000000-ffffe20002bfffff] PMD -> [ffff880001200000-ffff880003dfffff] on node 0 Zone PFN ranges: DMA 0x00000010 -> 0x00001000 DMA32 0x00001000 -> 0x00100000 Normal 0x00100000 -> 0x00100000 Movable zone start PFN for each node early_node_map[2] active PFN ranges 0: 0x00000010 -> 0x0000009f 0: 0x00000100 -> 0x000c7f90 On node 0 totalpages: 818975 DMA zone: 1916 pages, LIFO batch:0 DMA32 zone: 
803849 pages, LIFO batch:31 ACPI: PM-Timer IO Port: 0x808 ACPI: Local APIC address 0xfee00000 ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x82] disabled) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x83] disabled) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) IOAPIC[0]: apic_id 2, version 0, address 0xfec00000, GSI 0-23 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) ACPI: IRQ0 used by override. ACPI: IRQ2 used by override. ACPI: IRQ9 used by override. Setting APIC routing to flat Using ACPI (MADT) for SMP configuration information SMP: Allowing 4 CPUs, 2 hotplug CPUs PM: Registered nosave memory: 000000000009f000 - 00000000000a0000 PM: Registered nosave memory: 00000000000a0000 - 00000000000e4000 PM: Registered nosave memory: 00000000000e4000 - 0000000000100000 Allocating PCI resources starting at cc000000 (gap: c8000000:37b80000) PERCPU: Allocating 64928 bytes of per cpu data NR_CPUS: 64, nr_cpu_ids: 4, nr_node_ids 1 Built 1 zonelists in Node order, mobility grouping on. Total pages: 805765 Policy zone: DMA32 Kernel command line: ro root=/dev/VolGroup00/LogVol00 vga=791 Initializing CPU#0 PID hash table entries: 4096 (order: 12, 32768 bytes) TSC: PIT calibration confirmed by PMTIMER. TSC: using PMTIMER calibration value Detected 2424.936 MHz processor. Console: colour dummy device 80x25 console [tty0] enabled Checking aperture... No AGP bridge found Calgary: detecting Calgary via BIOS EBDA area Calgary: Unable to locate Rio Grande table in EBDA - bailing! 
Memory: 3218840k/3276352k available (2850k kernel code, 57060k reserved, 1581k data, 1268k init) CPA: page pool initialized 1 of 1 pages preallocated SLUB: Genslabs=13, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 Calibrating delay loop (skipped), value calculated using timer frequency.. 4849.87 BogoMIPS (lpj=2424936) Security Framework initialized SELinux: Initializing. SELinux: Starting in permissive mode Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) Mount-cache hash table entries: 256 Initializing cgroup subsys ns Initializing cgroup subsys cpuacct Initializing cgroup subsys devices CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 2048K CPU 0/0 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 0 CPU0: Thermal monitoring enabled (TM2) using mwait in idle threads. ACPI: Core revision 20080609 ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 CPU0: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz stepping 02 Using local APIC timer interrupts. APIC timer calibration result 21651235 Detected 21.651 MHz APIC timer. Booting processor 1/1 ip 6000 Initializing CPU#1 Calibrating delay using timer specific routine.. 4849.88 BogoMIPS (lpj=2424941) CPU: L1 I cache: 32K, L1 D cache: 32K CPU: L2 cache: 2048K CPU 1/1 -> Node 0 CPU: Physical Processor ID: 0 CPU: Processor Core ID: 1 CPU1: Thermal monitoring enabled (TM2) x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106 CPU1: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz stepping 02 checking TSC synchronization [CPU#0 -> CPU#1]: passed. Brought up 2 CPUs Total of 2 processors activated (9699.75 BogoMIPS). 
sizeof(vma)=176 bytes sizeof(page)=56 bytes sizeof(inode)=560 bytes sizeof(dentry)=208 bytes sizeof(ext3inode)=760 bytes sizeof(buffer_head)=104 bytes sizeof(skbuff)=232 bytes sizeof(task_struct)=5856 bytes CPU0 attaching sched-domain: domain 0: span 0-1 level MC groups: 0 1 domain 1: span 0-1 level NODE groups: 0-1 CPU1 attaching sched-domain: domain 0: span 0-1 level MC groups: 1 0 domain 1: span 0-1 level NODE groups: 0-1 net_namespace: 1552 bytes Booting paravirtualized kernel on bare hardware Time: 14:57:21 Date: 02/18/09 NET: Registered protocol family 16 No dock devices found. ACPI: bus type pci registered PCI: Found Intel Corporation 945G/GZ/P/PL Express Memory Controller Hub without MMCONFIG support. PCI: Using configuration type 1 for base access ACPI: EC: Look up EC in DSDT ACPI: Interpreter enabled ACPI: (supports S0 S1 S3 S4 S5) ACPI: Using IOAPIC for interrupt routing ACPI: PCI Root Bridge [PCI0] (0000:00) pci 0000:00:01.0: PME# supported from D0 D3hot D3cold pci 0000:00:01.0: PME# disabled PCI: 0000:00:1b.0 reg 10 64bit mmio: [cfcf8000, cfcfbfff] pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold pci 0000:00:1b.0: PME# disabled pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold pci 0000:00:1c.0: PME# disabled pci 0000:00:1c.3: PME# supported from D0 D3hot D3cold pci 0000:00:1c.3: PME# disabled PCI: 0000:00:1d.0 reg 20 io port: [7000, 701f] PCI: 0000:00:1d.1 reg 20 io port: [7400, 741f] PCI: 0000:00:1d.2 reg 20 io port: [7800, 781f] PCI: 0000:00:1d.3 reg 20 io port: [8000, 801f] PCI: 0000:00:1d.7 reg 10 32bit mmio: [cfcff800, cfcffbff] pci 0000:00:1d.7: PME# supported from D0 D3hot D3cold pci 0000:00:1d.7: PME# disabled pci 0000:00:1f.0: Force enabled HPET at 0xfed00000 pci 0000:00:1f.0: quirk: region 0800-087f claimed by ICH6 ACPI/GPIO/TCO pci 0000:00:1f.0: quirk: region 0480-04bf claimed by ICH6 GPIO PCI: 0000:00:1f.1 reg 10 io port: [0, 7] PCI: 0000:00:1f.1 reg 14 io port: [0, 3] PCI: 0000:00:1f.1 reg 18 io port: [0, 7] PCI: 0000:00:1f.1 reg 
1c io port: [0, 3] PCI: 0000:00:1f.1 reg 20 io port: [ffa0, ffaf] PCI: 0000:00:1f.2 reg 10 io port: [9800, 9807] PCI: 0000:00:1f.2 reg 14 io port: [9400, 9403] PCI: 0000:00:1f.2 reg 18 io port: [9000, 9007] PCI: 0000:00:1f.2 reg 1c io port: [8800, 8803] PCI: 0000:00:1f.2 reg 20 io port: [8400, 840f] PCI: 0000:00:1f.2 reg 24 32bit mmio: [cfcffc00, cfcfffff] pci 0000:00:1f.2: PME# supported from D3hot pci 0000:00:1f.2: PME# disabled PCI: 0000:00:1f.3 reg 20 io port: [400, 41f] PCI: 0000:04:00.0 reg 10 64bit mmio: [d0000000, dfffffff] PCI: 0000:04:00.0 reg 18 64bit mmio: [cffe0000, cffeffff] PCI: 0000:04:00.0 reg 20 io port: [e000, e0ff] PCI: 0000:04:00.0 reg 30 32bit mmio: [cffc0000, cffdffff] pci 0000:04:00.0: supports D1 pci 0000:04:00.0: supports D2 PCI: 0000:04:00.1 reg 10 64bit mmio: [cfff0000, cfffffff] pci 0000:04:00.1: supports D1 pci 0000:04:00.1: supports D2 Pre-1.1 PCIe device detected, disable ASPM for 0000:00:01.0. It can be enabled forcedly with 'pcie_aspm=force' PCI: bridge 0000:00:01.0 io port: [e000, efff] PCI: bridge 0000:00:01.0 32bit mmio: [cff00000, cfffffff] PCI: bridge 0000:00:01.0 64bit mmio pref: [d0000000, dfffffff] PCI: bridge 0000:00:1c.0 io port: [d000, dfff] PCI: 0000:02:00.0 reg 10 64bit mmio: [cfefc000, cfefffff] PCI: 0000:02:00.0 reg 18 io port: [c800, c8ff] PCI: 0000:02:00.0 reg 30 32bit mmio: [cfec0000, cfedffff] pci 0000:02:00.0: supports D1 pci 0000:02:00.0: supports D2 pci 0000:02:00.0: PME# supported from D0 D1 D2 D3hot D3cold pci 0000:02:00.0: PME# disabled Pre-1.1 PCIe device detected, disable ASPM for 0000:00:1c.3. 
It can be enabled forcedly with 'pcie_aspm=force' PCI: bridge 0000:00:1c.3 io port: [c000, cfff] PCI: bridge 0000:00:1c.3 32bit mmio: [cfe00000, cfefffff] PCI: 0000:01:03.0 reg 10 io port: [b800, b807] PCI: 0000:01:03.0 reg 14 io port: [b400, b403] PCI: 0000:01:03.0 reg 18 io port: [b000, b007] PCI: 0000:01:03.0 reg 1c io port: [a800, a803] PCI: 0000:01:03.0 reg 20 io port: [a400, a40f] PCI: 0000:01:03.0 reg 30 32bit mmio: [cfde0000, cfdfffff] pci 0000:00:1e.0: transparent bridge PCI: bridge 0000:00:1e.0 io port: [a000, bfff] PCI: bridge 0000:00:1e.0 32bit mmio: [cfd00000, cfdfffff] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P3._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P7._PRT] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKB] (IRQs 3 *4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 *7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 *11 12 14 15) ACPI: PCI Interrupt Link [LNKE] (IRQs *3 4 5 6 7 10 11 12 14 15) ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled. ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, disabled. 
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 *5 6 7 10 11 12 14 15) ACPI Warning (tbutils-0217): Incorrect checksum in table [OEMB] - 72, should be 49 [20080609] Linux Plug and Play Support v0.97 (c) Adam Belay pnp: PnP ACPI init ACPI: bus type pnp registered pnp: PnP ACPI: found 14 devices ACPI: ACPI bus type pnp unregistered usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default PCI-GART: No AMD northbridge found. hpet clockevent registered hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 hpet0: 3 64-bit timers, 14318180 Hz tracer: 1286 pages allocated for 65536 entries of 80 bytes actual entries 65586 ACPI: RTC can wake from S4 system 00:01: iomem range 0xfed13000-0xfed19fff has been reserved system 00:07: ioport range 0x290-0x297 has been reserved system 00:08: ioport range 0x4d0-0x4d1 has been reserved system 00:08: ioport range 0x800-0x87f has been reserved system 00:08: ioport range 0x480-0x4bf has been reserved system 00:08: ioport range 0x900-0x91f has been reserved system 00:08: iomem range 0xfed1c000-0xfed1ffff has been reserved system 00:08: iomem range 0xfed20000-0xfed8ffff has been reserved system 00:08: iomem range 0xffb00000-0xffbfffff could not be reserved system 00:08: iomem range 0xfff00000-0xffffffff could not be reserved system 00:09: iomem range 0xfec00000-0xfec00fff has been reserved system 00:09: iomem range 0xfee00000-0xfee00fff has been reserved system 00:0c: iomem range 0xf0000000-0xf3ffffff has been reserved system 00:0d: iomem range 0x0-0x9ffff could not be reserved system 00:0d: iomem range 0xc0000-0xdffff has been reserved system 00:0d: iomem range 0xe0000-0xfffff could not be reserved system 00:0d: iomem range 0x100000-0xc7ffffff could not be reserved pci 0000:00:01.0: PCI bridge, secondary bus 
0000:04 pci 0000:00:01.0: IO window: 0xe000-0xefff pci 0000:00:01.0: MEM window: 0xcff00000-0xcfffffff pci 0000:00:01.0: PREFETCH window: 0x000000d0000000-0x000000dfffffff pci 0000:00:1c.0: PCI bridge, secondary bus 0000:03 pci 0000:00:1c.0: IO window: 0xd000-0xdfff pci 0000:00:1c.0: MEM window: disabled pci 0000:00:1c.0: PREFETCH window: disabled pci 0000:00:1c.3: PCI bridge, secondary bus 0000:02 pci 0000:00:1c.3: IO window: 0xc000-0xcfff pci 0000:00:1c.3: MEM window: 0xcfe00000-0xcfefffff pci 0000:00:1c.3: PREFETCH window: disabled pci 0000:00:1e.0: PCI bridge, secondary bus 0000:01 pci 0000:00:1e.0: IO window: 0xa000-0xbfff pci 0000:00:1e.0: MEM window: 0xcfd00000-0xcfdfffff pci 0000:00:1e.0: PREFETCH window: 0x000000cc000000-0x000000cc0fffff pci 0000:00:01.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 pci 0000:00:01.0: setting latency timer to 64 pci 0000:00:1c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 pci 0000:00:1c.0: setting latency timer to 64 pci 0000:00:1c.3: PCI INT D -> GSI 19 (level, low) -> IRQ 19 pci 0000:00:1c.3: setting latency timer to 64 pci 0000:00:1e.0: setting latency timer to 64 bus: 00 index 0 io port: [0, ffff] bus: 00 index 1 mmio: [0, ffffffffffffffff] bus: 04 index 0 io port: [e000, efff] bus: 04 index 1 mmio: [cff00000, cfffffff] bus: 04 index 2 mmio: [d0000000, dfffffff] bus: 04 index 3 mmio: [0, 0] bus: 03 index 0 io port: [d000, dfff] bus: 03 index 1 mmio: [0, 0] bus: 03 index 2 mmio: [0, 0] bus: 03 index 3 mmio: [0, 0] bus: 02 index 0 io port: [c000, cfff] bus: 02 index 1 mmio: [cfe00000, cfefffff] bus: 02 index 2 mmio: [0, 0] bus: 02 index 3 mmio: [0, 0] bus: 01 index 0 io port: [a000, bfff] bus: 01 index 1 mmio: [cfd00000, cfdfffff] bus: 01 index 2 mmio: [cc000000, cc0fffff] bus: 01 index 3 io port: [0, ffff] bus: 01 index 4 mmio: [0, ffffffffffffffff] NET: Registered protocol family 2 IP route cache hash table entries: 131072 (order: 8, 1048576 bytes) TCP established hash table entries: 524288 (order: 11, 8388608 bytes) TCP 
bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 524288 bind 65536) TCP reno registered NET: Registered protocol family 1 checking if image is initramfs... it is Freeing initrd memory: 3834k freed audit: initializing netlink socket (disabled) type=2000 audit(1234969041.423:1): initialized HugeTLB registered 2 MB page size, pre-allocated 0 pages VFS: Disk quotas dquot_6.5.1 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) msgmni has been set to 6294 SELinux: Registering netfilter hooks Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252) io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) pci 0000:04:00.0: Boot video device pcieport-driver 0000:00:01.0: setting latency timer to 64 pcieport-driver 0000:00:01.0: found MSI capability pci_express 0000:00:01.0:pcie00: allocate port service pcieport-driver 0000:00:1c.0: setting latency timer to 64 pcieport-driver 0000:00:1c.0: found MSI capability pci_express 0000:00:1c.0:pcie00: allocate port service pci_express 0000:00:1c.0:pcie02: allocate port service pcieport-driver 0000:00:1c.3: setting latency timer to 64 pcieport-driver 0000:00:1c.3: found MSI capability pci_express 0000:00:1c.3:pcie00: allocate port service pci_hotplug: PCI Hot Plug PCI Core version: 0.5 acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 vesafb: framebuffer at 0xd0000000, mapped to 0xffffc20001080000, using 3072k, total 16384k vesafb: mode is 1024x768x16, linelength=2048, pages=9 vesafb: scrolling: redraw vesafb: Truecolor: size=0:5:6:5, shift=0:11:5:0 Console: switching to colour frame buffer device 128x48 fb0: VESA VGA frame buffer device input: Power Button (FF) as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 ACPI: Power Button (FF) [PWRF] input: Power Button (CM) as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input1 ACPI: Power Button (CM) [PWRB] ACPI: SSDT C7F9E0B0, 01C6 
(r1 AMI CPU1PM 1 INTL 20051117) processor ACPI0007:00: registered as cooling_device0 ACPI: Processor [CPU1] (supports 8 throttling states) ACPI: SSDT C7F9E280, 013A (r1 AMI CPU2PM 1 INTL 20051117) processor ACPI0007:01: registered as cooling_device1 ACPI: Processor [CPU2] (supports 8 throttling states) Non-volatile memory driver v1.2 Linux agpgart interface v0.103 Serial: 8250/16550 driver4 ports, IRQ sharing enabled brd: module loaded input: Macintosh mouse button emulation as /devices/virtual/input/input2 PNP: No PS/2 controller found. Probing ports directly. serio: i8042 KBD port at 0x60,0x64 irq 1 serio: i8042 AUX port at 0x60,0x64 irq 12 mice: PS/2 mouse device common for all mice rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0 rtc0: alarms up to one month, hpet irqs cpuidle: using governor ladder cpuidle: using governor menu usbcore: registered new interface driver hiddev usbcore: registered new interface driver usbhid usbhid: v2.6:USB HID core driver TCP cubic registered Initializing XFRM netlink socket NET: Registered protocol family 17 registered taskstats version 1 Magic number: 13:254:991 Freeing unused kernel memory: 1268k freed Write protecting the kernel read-only data: 4060k Switched to high resolution mode on CPU 1 Switched to high resolution mode on CPU 0 ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20 ehci_hcd 0000:00:1d.7: setting latency timer to 64 ehci_hcd 0000:00:1d.7: EHCI Host Controller ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1 ehci_hcd 0000:00:1d.7: debug port 1 ehci_hcd 0000:00:1d.7: cache line size of 32 is not supported ehci_hcd 0000:00:1d.7: irq 20, io mem 0xcfcff800 ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004 usb usb1: configuration #1 chosen from 1 choice hub 1-0:1.0: USB hub found hub 1-0:1.0: 8 ports detected usb usb1: New USB device found, idVendor=1d6b, idProduct=0002 usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb1: Product: 
EHCI Host Controller usb usb1: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 ehci_hcd usb usb1: SerialNumber: 0000:00:1d.7 ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver USB Universal Host Controller Interface driver v3.0 uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20 uhci_hcd 0000:00:1d.0: setting latency timer to 64 uhci_hcd 0000:00:1d.0: UHCI Host Controller uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2 uhci_hcd 0000:00:1d.0: irq 20, io base 0x00007000 usb usb2: configuration #1 chosen from 1 choice hub 2-0:1.0: USB hub found hub 2-0:1.0: 2 ports detected usb usb2: New USB device found, idVendor=1d6b, idProduct=0001 usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb2: Product: UHCI Host Controller usb usb2: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 uhci_hcd usb usb2: SerialNumber: 0000:00:1d.0 uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17 uhci_hcd 0000:00:1d.1: setting latency timer to 64 uhci_hcd 0000:00:1d.1: UHCI Host Controller uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3 uhci_hcd 0000:00:1d.1: irq 17, io base 0x00007400 usb usb3: configuration #1 chosen from 1 choice hub 3-0:1.0: USB hub found hub 3-0:1.0: 2 ports detected usb usb3: New USB device found, idVendor=1d6b, idProduct=0001 usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb3: Product: UHCI Host Controller usb usb3: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 uhci_hcd usb usb3: SerialNumber: 0000:00:1d.1 uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18 uhci_hcd 0000:00:1d.2: setting latency timer to 64 uhci_hcd 0000:00:1d.2: UHCI Host Controller uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4 uhci_hcd 0000:00:1d.2: irq 18, io base 0x00007800 usb usb4: configuration #1 chosen from 1 choice hub 4-0:1.0: USB hub found hub 4-0:1.0: 2 ports detected usb 3-2: new low speed USB device using uhci_hcd 
and address 2 usb usb4: New USB device found, idVendor=1d6b, idProduct=0001 usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb4: Product: UHCI Host Controller usb usb4: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 uhci_hcd usb usb4: SerialNumber: 0000:00:1d.2 uhci_hcd 0000:00:1d.3: PCI INT D -> GSI 19 (level, low) -> IRQ 19 uhci_hcd 0000:00:1d.3: setting latency timer to 64 uhci_hcd 0000:00:1d.3: UHCI Host Controller uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5 uhci_hcd 0000:00:1d.3: irq 19, io base 0x00008000 usb usb5: configuration #1 chosen from 1 choice hub 5-0:1.0: USB hub found hub 5-0:1.0: 2 ports detected usb 3-2: configuration #1 chosen from 1 choice input: Logitech USB Receiver as /devices/pci0000:00/0000:00:1d.1/usb3/3-2/3-2:1.0/input/input3 usb usb5: New USB device found, idVendor=1d6b, idProduct=0001 usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1 usb usb5: Product: UHCI Host Controller usb usb5: Manufacturer: Linux 2.6.27.12-78.2.8.fc9.x86_64 uhci_hcd usb usb5: SerialNumber: 0000:00:1d.3 input,hidraw0: USB HID v1.10 Keyboard [Logitech USB Receiver] on usb-0000:00:1d.1-2 device-mapper: uevent: version 1.0.3 device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel input: Logitech USB Receiver as /devices/pci0000:00/0000:00:1d.1/usb3/3-2/3-2:1.1/input/input4 input,hidraw1: USB HID v1.10 Mouse [Logitech USB Receiver] on usb-0000:00:1d.1-2 usb 3-2: New USB device found, idVendor=046d, idProduct=c505 usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0 usb 3-2: Product: USB Receiver usb 3-2: Manufacturer: Logitech SCSI subsystem initialized Driver 'sd' needs updating - please use bus_type methods libata version 3.00 loaded. 
pata_acpi 0000:00:1f.1: PCI INT A -> GSI 22 (level, low) -> IRQ 22 pata_acpi 0000:00:1f.1: setting latency timer to 64 pata_acpi 0000:00:1f.1: PCI INT A disabled pata_acpi 0000:00:1f.2: PCI INT B -> GSI 23 (level, low) -> IRQ 23 pata_acpi 0000:00:1f.2: setting latency timer to 64 pata_acpi 0000:00:1f.2: PCI INT B disabled ata_piix 0000:00:1f.1: version 2.12 ata_piix 0000:00:1f.1: PCI INT A -> GSI 22 (level, low) -> IRQ 22 ata_piix 0000:00:1f.1: setting latency timer to 64 scsi0 : ata_piix scsi1 : ata_piix ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14 ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15 ata1.00: ATA-7: WDC WD2500JB-00REA0, 20.00K20, max UDMA/100 ata1.00: 488397168 sectors, multi 16: LBA48 ata1.01: ATAPI: HL-DT-ST DVDRAM GSA-H42N, RL01, max UDMA/66 ata1.00: configured for UDMA/100 ata1.01: configured for UDMA/66 isa bounce pool size: 16 pages scsi 0:0:0:0: Direct-Access ATA WDC WD2500JB-00R 20.0 PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sd 0:0:0:0: [sda] Attached SCSI disk scsi 0:0:1:0: CD-ROM HL-DT-ST DVDRAM GSA-H42N RL01 PQ: 0 ANSI: 5 ata_piix 0000:00:1f.2: PCI INT B -> GSI 23 (level, low) -> IRQ 23 ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ] ata_piix 0000:00:1f.2: setting latency timer to 64 scsi2 : ata_piix scsi3 : ata_piix ata3: SATA max UDMA/133 cmd 0x9800 ctl 0x9400 bmdma 0x8400 irq 23 ata4: SATA max UDMA/133 cmd 0x9000 ctl 0x8800 bmdma 0x8408 irq 23 ata4.01: ATA-7: ST3320620AS, 3.AAK, max UDMA/133 ata4.01: 625142448 sectors, multi 16: LBA48 NCQ (depth 0/32) ata4.01: 
configured for UDMA/133 scsi 3:0:1:0: Direct-Access ATA ST3320620AS 3.AA PQ: 0 ANSI: 5 sd 3:0:1:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) sd 3:0:1:0: [sdb] Write Protect is off sd 3:0:1:0: [sdb] Mode Sense: 00 3a 00 00 sd 3:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 3:0:1:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) sd 3:0:1:0: [sdb] Write Protect is off sd 3:0:1:0: [sdb] Mode Sense: 00 3a 00 00 sd 3:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sdb: sdb1 sdb2 sd 3:0:1:0: [sdb] Attached SCSI disk EXT3-fs: INFO: recovery required on readonly filesystem. EXT3-fs: write access will be enabled during recovery. kjournald starting. Commit interval 5 seconds EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. type=1404 audit(1234969056.599:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 SELinux: 8192 avtab hash slots, 177506 rules. SELinux: 8192 avtab hash slots, 177506 rules. SELinux: 8 users, 12 roles, 2428 types, 118 bools, 1 sens, 1024 cats SELinux: 73 classes, 177506 rules SELinux: Completing initialization. SELinux: Setting up existing superblocks. 
SELinux: initialized (dev dm-1, type ext3), uses xattr SELinux: initialized (dev usbfs, type usbfs), uses genfs_contexts SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs SELinux: initialized (dev selinuxfs, type selinuxfs), uses genfs_contexts SELinux: initialized (dev mqueue, type mqueue), uses transition SIDs SELinux: initialized (dev hugetlbfs, type hugetlbfs), uses genfs_contexts SELinux: initialized (dev devpts, type devpts), uses transition SIDs SELinux: initialized (dev inotifyfs, type inotifyfs), uses genfs_contexts SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs SELinux: initialized (dev anon_inodefs, type anon_inodefs), uses genfs_contexts SELinux: initialized (dev pipefs, type pipefs), uses task SIDs SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts SELinux: initialized (dev sockfs, type sockfs), uses task SIDs SELinux: initialized (dev proc, type proc), uses genfs_contexts SELinux: initialized (dev bdev, type bdev), uses genfs_contexts SELinux: initialized (dev rootfs, type rootfs), uses genfs_contexts SELinux: initialized (dev sysfs, type sysfs), uses genfs_contexts type=1403 audit(1234969056.909:3): policy loaded auid=4294967295 ses=4294967295 sky2 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19 sky2 0000:02:00.0: setting latency timer to 64 sky2 0000:02:00.0: v1.22 addr 0xcfefc000 irq 19 Yukon-2 EC rev 2 sky2 eth0: addr 00:18:f3:1a:33:c9 intel_rng: FWH not detected sd 0:0:0:0: Attached scsi generic sg0 type 0 scsi 0:0:1:0: Attached scsi generic sg1 type 5 sd 3:0:1:0: Attached scsi generic sg2 type 0 Driver 'sr' needs updating - please use bus_type methods sr0: scsi3-mmc drive: 48x/48x writer dvd-ram cd/rw xa/form2 cdda tray Uniform CD-ROM driver Revision: 3.20 sr 0:0:1:0: Attached scsi CD-ROM sr0 pata_it821x 0000:01:03.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20 pata_it821x: controller in pass through mode. 
pata_it821x 0000:01:03.0: setting latency timer to 64 scsi4 : pata_it821x scsi5 : pata_it821x ata5: PATA max UDMA/133 cmd 0xb800 ctl 0xb400 bmdma 0xa400 irq 20 ata6: PATA max UDMA/133 cmd 0xb000 ctl 0xa800 bmdma 0xa408 irq 20 iTCO_vendor_support: vendor-support=0 iTCO_wdt: Intel TCO WatchDog Timer Driver v1.03 (30-Apr-2008) iTCO_wdt: Found a ICH7 or ICH7R TCO device (Version=2, TCOBASE=0x0860) iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0) gameport: NS558 PnP Gameport is pnp00:0a/gameport0, io 0x200, speed 826kHz input: PC Speaker as /devices/platform/pcspkr/input/input5 i801_smbus 0000:00:1f.3: PCI INT B -> GSI 23 (level, low) -> IRQ 23 ACPI: I/O resource 0000:00:1f.3 [0x400-0x41f] conflicts with ACPI region SMRG [0x400-0x40f] ACPI: Device needs an ACPI driver Floppy drive(s): fd0 is 1.44M FDC 0 is a post-1991 82077 HDA Intel 0000:00:1b.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19 HDA Intel 0000:00:1b.0: setting latency timer to 64 hda_codec: Unknown model for ALC883, trying auto-probe from BIOS... ALSA sound/pci/hda/hda_codec.c:3021: autoconfig: line_outs=4 (0x14/0x15/0x16/0x17/0x0) ALSA sound/pci/hda/hda_codec.c:3025: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0) ALSA sound/pci/hda/hda_codec.c:3029: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0) ALSA sound/pci/hda/hda_codec.c:3030: mono: mono_out=0x0 ALSA sound/pci/hda/hda_codec.c:3038: inputs: mic=0x18, fmic=0x19, line=0x1a, fline=0x0, cd=0x0, aux=0x0 device-mapper: multipath: version 1.0.5 loaded EXT3 FS on dm-1, internal journal kjournald starting. Commit interval 5 seconds EXT3 FS on dm-2, internal journal EXT3-fs: mounted filesystem with ordered data mode. SELinux: initialized (dev dm-2, type ext3), uses xattr kjournald starting. Commit interval 5 seconds EXT3 FS on sda1, internal journal EXT3-fs: mounted filesystem with ordered data mode. SELinux: initialized (dev sda1, type ext3), uses xattr SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs Adding 2031608k swap on /dev/mapper/VolGroup00-LogVol01. 
Priority:-1 extents:1 across:2031608k SELinux: initialized (dev binfmt_misc, type binfmt_misc), uses genfs_contexts IA-32 Microcode Update Driver: v1.14a <tigran.co.uk> firmware: requesting intel-ucode/06-0f-02 firmware: requesting intel-ucode/06-0f-02 microcode: CPU0 updated from revision 0x51 to 0x5a, date = 09262007 microcode: CPU1 updated from revision 0x51 to 0x5a, date = 09262007 NET: Registered protocol family 10 lo: Disabled Privacy Extensions ip6_tables: (C) 2000-2006 Netfilter Core Team nf_conntrack version 0.5.0 (16384 buckets, 65536 max) CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Plase use nf_conntrack.acct=1 kernel paramater, acct=1 nf_conntrack module option or sysctl net.netfilter.nf_conntrack_acct=1 to enable it. ip_tables: (C) 2000-2006 Netfilter Core Team RPC: Registered udp transport module. RPC: Registered tcp transport module. SELinux: initialized (dev rpc_pipefs, type rpc_pipefs), uses genfs_contexts warning: `dbus-daemon' uses deprecated v2 capabilities in a way that may be insecure. sky2 eth0: enabling interface ADDRCONF(NETDEV_UP): eth0: link is not ready sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready vboxdrv: Trying to deactivate the NMI watchdog permanently... vboxdrv: Successfully done. vboxdrv: Found 2 processor cores. VBoxDrv: dbg - g_abExecMemory=ffffffffa038c180 vboxdrv: fAsync=0 offMin=0x2d1 offMax=0x1195 vboxdrv: TSC mode is 'synchronous', kernel timer mode is 'normal'. vboxdrv: Successfully loaded version 2.1.2 (interface 0x000a0009). 
VBoxNetFlt: dbg - g_abExecMemory=ffffffffa0526f60
eth0: no IPv6 routers present
fuse init (API version 7.9)
SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
SELinux: initialized (dev sdb1, type fuseblk), uses genfs_contexts
SELinux: initialized (dev sdb2, type fuseblk), uses genfs_contexts

This may not be very helpful, as the suspected device has been removed. If you require, I can recreate the issue and submit the info.

An ATA driver timeout

ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata4.01: cmd c8/00:08:c7:d8:ba/00:00:00:00:00/f1 tag 0 dma 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

is a VERY generic diagnostic message. It could mean anything, and should not be assumed to be associated with any particular bug. All that means is that a timeout occurred, for unknown reasons.

@mlord - just some info you may find useful:

I patched a vanilla stable branch (2.6.28.3) with only the patch that you posted on 1/14/2009: "sata_mv_fix_timeouts_on_Marvell_6081_ports_0..3"

Current uptime is 24 days! I've hit this x4500 with very heavy disk and NFS I/O pretty consistently.

My machine has the following components:

SATA
----
0b:01.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)
Subsystem: Marvell Technology Group Ltd.
Device 11ab
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 76
Region 0: Memory at fe000000 (64-bit, non-prefetchable) [size=1M]
Region 2: I/O ports at dc00 [size=256]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000
Capabilities: [60] PCI-X non-bridge device
Command: DPERE- ERO- RBC=512 OST=4
Status: Dev=0b:01.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=512 DMOST=4 DMCRS=8 RSCEM- 266MHz- 533MHz-
Kernel driver in use: sata_mv

HDDs (48)
---------
Device Model: HITACHI HUA7210SASUN1.0T 0830GPLE8E
Serial Number: GTE002PAKPLE8E
Firmware Version: GKAOA90A
User Capacity: 1,000,204,886,016 bytes
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 T13 1532D revision 1

Please let me know if you need more data. Is this patch queued up for a merge into the stable branch yet?

Never mind on that last question, I see it was merged in 2.6.28.4. ;-)

Cheers,
Scott

It's also now in the latest 2.6.27 kernels.
Dunno if/when it will appear in a RedHat / Fedora kernel.
That part is up to Jeff G., I think.
Cheers

(In reply to comment #71)
> It's also now in the latest 2.6.27 kernels.
> Dunno if/when it will appear in a RedHat / Fedora kernel.
> That part is up to Jeff G., I think.
> Cheers

Hopefully soon! Getting the source and compiling my own driver and redoing initrd on every kernel update is getting a bit old... 2.6.27.12-170.2.5.fc10 just came out today, and nothing yet :(

Mark, do you know if your patch could cause the filesystem to disappear?
Since this patch (now running kernel 2.6.27.19-170.2.35.fc10.x86_64), I've had two major system crashes where the filesystem just vanishes (I'm guessing). The hardware itself is still active (ie - it isn't locking up), but the system becomes totally unresponsive to logins, http requests, etc., and nothing is logged, which is why I'm guessing that the filesystem is going offline... No, it would not cause that to happen. Cheers I thought not...I'll start looking at other things... I am having exactly the same problem with an XFS filesystem residing on a Samsung HD103UJ but with a different controller: 00:1f.2 IDE interface: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 4 port SATA IDE Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO]) Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3116 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 19 Region 0: I/O ports at f900 [size=8] Region 1: I/O ports at f800 [size=4] Region 2: I/O ports at f700 [size=8] Region 3: I/O ports at f600 [size=4] Region 4: I/O ports at f500 [size=16] Region 5: I/O ports at f400 [size=16] Capabilities: [70] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [b0] Vendor Specific Information <?> Kernel driver in use: ata_piix Kernel modules: ata_generic, pata_acpi 00:1f.5 IDE interface: Intel Corporation 82801I (ICH9 Family) 2 port SATA IDE Controller (rev 02) (prog-if 85 [Master SecO PriO]) Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3116 Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 
Interrupt: pin A routed to IRQ 19 Region 0: I/O ports at f200 [size=8] Region 1: I/O ports at f100 [size=4] Region 2: I/O ports at f000 [size=8] Region 3: I/O ports at ef00 [size=4] Region 4: I/O ports at ee00 [size=16] Region 5: I/O ports at ed00 [size=16] Capabilities: [70] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [b0] Vendor Specific Information <?> Kernel driver in use: ata_piix Kernel modules: ata_generic, pata_acpi /var/log/messages shows the error: Apr 13 21:47:19 xpcsp35p2p131 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen Apr 13 21:47:19 xpcsp35p2p131 kernel: ata1.00: cmd 35/00:00:bf:95:f7/00:04:62:00:00/e0 tag 0 dma 524288 out Apr 13 21:47:19 xpcsp35p2p131 kernel: res 40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout) Apr 13 21:47:19 xpcsp35p2p131 kernel: ata1.00: status: { DRDY } Apr 13 21:47:19 xpcsp35p2p131 kernel: ata1: hard resetting link Apr 13 21:47:25 xpcsp35p2p131 kernel: ata1: link is slow to respond, please be patient (ready=0) Apr 13 21:47:29 xpcsp35p2p131 kernel: ata1: SRST failed (errno=-16) Apr 13 21:47:29 xpcsp35p2p131 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 13 21:47:34 xpcsp35p2p131 kernel: ata1.00: qc timeout (cmd 0xec) Apr 13 21:47:34 xpcsp35p2p131 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) Apr 13 21:47:34 xpcsp35p2p131 kernel: ata1.00: revalidation failed (errno=-5) Apr 13 21:47:34 xpcsp35p2p131 kernel: ata1: hard resetting link Apr 13 21:47:40 xpcsp35p2p131 kernel: ata1: link is slow to respond, please be patient (ready=0) Apr 13 21:47:44 xpcsp35p2p131 kernel: ata1: SRST failed (errno=-16) Apr 13 21:47:44 xpcsp35p2p131 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1.00: qc timeout (cmd 0xec) Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) Apr 
13 21:47:54 xpcsp35p2p131 kernel: ata1.00: revalidation failed (errno=-5) Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1: limiting SATA link speed to 1.5 Gbps Apr 13 21:47:54 xpcsp35p2p131 kernel: ata1: hard resetting link Apr 13 21:48:00 xpcsp35p2p131 kernel: ata1: link is slow to respond, please be patient (ready=0) Apr 13 21:48:05 xpcsp35p2p131 kernel: ata1: SRST failed (errno=-16) Apr 13 21:48:05 xpcsp35p2p131 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.00: qc timeout (cmd 0xec) Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4) Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.00: revalidation failed (errno=-5) Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.00: disabled Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1.01: failed to set xfermode (err_mask=0x40) Apr 13 21:48:35 xpcsp35p2p131 kernel: ata1: hard resetting link Apr 13 21:48:36 xpcsp35p2p131 ntpd[2540]: kernel time sync status change 0001 Apr 13 21:48:40 xpcsp35p2p131 kernel: ata1: link is slow to respond, please be patient (ready=0) Apr 13 21:48:45 xpcsp35p2p131 kernel: ata1: SRST failed (errno=-16) Apr 13 21:48:45 xpcsp35p2p131 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310) Apr 13 21:48:45 xpcsp35p2p131 kernel: ata1.01: configured for UDMA/100 Apr 13 21:48:45 xpcsp35p2p131 kernel: ata1: EH complete Apr 13 21:48:45 xpcsp35p2p131 kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK Apr 13 21:48:45 xpcsp35p2p131 kernel: end_request: I/O error, dev sda, sector 1660392895 Kernel is 2.6.27.21-170.2.56.fc10.x86_64 Would this controller need a similar patch? Regards, Gijsbert FYI, One of the workarounds I found on the internet was to insert a CD into the DVD-drive (see also https://bugs.launchpad.net/ubuntu/+bug/104581) and indeed this seems to work! Any ideas why? 
Regards, Gijsbert FYI: Patch is in intrepid-proposed: https://launchpad.net/ubuntu/intrepid/+source/linux/2.6.27-14.33 See my esteemed colleague's notes for enabling here: http://ubuntuforums.org/showthread.php?t=1145513 This problem is getting quite annoying. I am also getting it now on my cluster nodes with entirely different hardware and an XFS filesystem residing on a SSD disk: lspci -vv 00:1f.2 IDE interface: Intel Corporation 82801GB/GR/GH (ICH7 Family) SATA IDE Controller (rev 01) (prog-if 8f [Master SecP SecO PriP PriO]) Subsystem: ASUSTeK Computer Inc. P5KPL-VM Motherboard Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Interrupt: pin B routed to IRQ 19 Region 0: I/O ports at c080 [size=8] Region 1: I/O ports at c000 [size=4] Region 2: I/O ports at bc00 [size=8] Region 3: I/O ports at b880 [size=4] Region 4: I/O ports at b800 [size=16] Capabilities: [70] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Kernel driver in use: ata_piix Kernel modules: ata_generic, pata_acpi /var/log/messages: May 6 09:39:54 nodep141 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen May 6 09:39:54 nodep141 kernel: ata4.00: cmd c8/00:08:1f:cc:d6/00:00:00:00:00/e0 tag 0 dma 4096 in May 6 09:39:54 nodep141 kernel: res 40/00:02:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout) May 6 09:39:54 nodep141 kernel: ata4.00: status: { DRDY } May 6 09:39:54 nodep141 kernel: ata4: soft resetting link May 6 09:39:59 nodep141 kernel: ata4.00: qc timeout (cmd 0xec) May 6 09:39:59 nodep141 kernel: ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4) May 6 09:39:59 nodep141 kernel: ata4.00: revalidation failed (errno=-5) May 6 09:40:04 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0) May 
6 09:40:09 nodep141 kernel: ata4: device not ready (errno=-16), forcing hardreset May 6 09:40:09 nodep141 kernel: ata4: soft resetting link May 6 09:40:14 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0) May 6 09:40:19 nodep141 kernel: ata4: SRST failed (errno=-16) May 6 09:40:19 nodep141 kernel: ata4: soft resetting link May 6 09:40:25 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0) May 6 09:40:29 nodep141 kernel: ata4: SRST failed (errno=-16) May 6 09:40:29 nodep141 kernel: ata4: soft resetting link May 6 09:40:35 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0) May 6 09:41:04 nodep141 kernel: ata4: SRST failed (errno=-16) May 6 09:41:04 nodep141 kernel: ata4: soft resetting link May 6 09:41:09 nodep141 kernel: ata4: SRST failed (errno=-16) May 6 09:41:09 nodep141 kernel: ata4: reset failed, giving up May 6 09:41:09 nodep141 kernel: ata4.00: disabled May 6 09:41:09 nodep141 kernel: ata4.01: disabled May 6 09:41:14 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0) May 6 09:41:20 nodep141 kernel: ata4: device not ready (errno=-16), forcing hardreset May 6 09:41:20 nodep141 kernel: ata4: soft resetting link May 6 09:41:25 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0) May 6 09:41:30 nodep141 kernel: ata4: SRST failed (errno=-16) May 6 09:41:30 nodep141 kernel: ata4: soft resetting link May 6 09:41:35 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0) May 6 09:41:40 nodep141 kernel: ata4: SRST failed (errno=-16) May 6 09:41:40 nodep141 kernel: ata4: soft resetting link May 6 09:41:45 nodep141 kernel: ata4: link is slow to respond, please be patient (ready=0) May 6 09:42:15 nodep141 kernel: ata4: SRST failed (errno=-16) May 6 09:42:15 nodep141 kernel: ata4: soft resetting link May 6 09:42:20 nodep141 kernel: ata4: SRST failed (errno=-16) May 6 09:42:20 nodep141 kernel: ata4: reset failed, giving up May 6 
09:42:20 nodep141 kernel: ata4: EH complete
May 6 09:42:20 nodep141 kernel: sd 3:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
May 6 09:42:20 nodep141 kernel: end_request: I/O error, dev sdb, sector 14076959

uname -a:
Linux nodep141 2.6.27.21-170.2.56.fc10.x86_64 #1 SMP Mon Mar 23 23:08:10 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Has the problem been fixed in this kernel version?

Regards,
Gijsbert

I had the same kind of message:

May 3 07:49:47 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 3 07:49:47 localhost kernel: ata1.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
May 3 07:49:47 localhost kernel: cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
May 3 07:49:47 localhost kernel: res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
May 3 07:49:47 localhost kernel: ata1.00: status: { DRDY ERR }
May 3 07:49:47 localhost kernel: ata1: soft resetting link
May 3 07:49:47 localhost kernel: ata1.00: configured for UDMA/33
May 3 07:49:47 localhost kernel: ata1: EH complete

and adding 'acpi=off noapic' to the kernel line in /etc/grub.conf seems to have solved the problem for me.

kernel /vmlinuz-2.6.27.21-170.2.56.fc10.i686 ro root=/dev/VolGroup00/LogVol00 rhgb quiet vga=792 acpi=off noapic

source: http://forums.fedoraforum.org/showthread.php?t=213585

This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Great thread and thank you Mark for providing a patch!

I'm running RHEL5 with kernel 2.6.18-128.1.14.el5 with 2 PCI-X cards containing the Marvell chipset and am currently experiencing the exact same symptoms. I found the sata_mv.c file, edited the line in question, and rebooted. Unfortunately, doing just that didn't solve the problem, so I believe I missed a critical step. Do I need to recompile the kernel itself or anything else in order to take advantage of this patch/bug fix?

Yes, I'm fairly new to Linux troubleshooting, so any advice with regard to implementing the fix is greatly appreciated, as it doesn't seem to be fixed in the latest Red Hat update.

Regards,
Dave

Following up on the comment from Bug Zapper I now notice that this thread applies to Fedora Core 9. I was experiencing this problem on Fedora Core 10, so could this bug be assigned to Fedora Core 10?

Regards,
Gijsbert

(In reply to comment #83)
> Following up on the comment from Bug Zapper I now notice that this thread
> applies to Fedora Core 9. I was experiencing this problem on Fedora Core 10, so
> could this bug be assigned to Fedora Core 10?

The original bug was fixed in 2.6.27.15.

The bug is still in Fedora 15.
My system has:
- Card: Conceptronic Serial ATA & IDE Combo Card (PCI card).
- Chip: VIA Technologies, Inc. VT6421 IDE RAID Controller (rev 50).
- O.S.: Fedora release 15 (Lovelock).
- Kernel: 2.6.38.7-30.fc15.i686 (32 bits)

I'm sure my SATA disk drive is OK (I've tested it with another controller and no errors appear). So the problem is in the controller hardware or in the controller driver. I bet it's in the controller driver.

The error log is similar to the ones already posted:
---------------
[ 1885.024110] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1885.024128] ata5.00: failed command: READ DMA EXT
[ 1885.024145] ata5.00: cmd 25/00:00:80:e7:1c/00:02:04:00:00/e0 tag 0 dma 262144 in
[ 1885.024148] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1885.024155] ata5.00: status: { DRDY }
[ 1885.024169] ata5: hard resetting link
[ 1885.329091] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1885.448805] ata5.00: configured for UDMA/133
[ 1885.448818] ata5.00: device reported invalid CHS sector 0
[ 1885.448840] ata5: EH complete
[ 3123.040076] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 3123.040088] ata5.00: failed command: READ DMA
[ 3123.040103] ata5.00: cmd c8/00:00:80:f6:72/00:00:00:00:00/e2 tag 0 dma 131072 in
[ 3123.040107] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 3123.040113] ata5.00: status: { DRDY }
[ 3123.040128] ata5: hard resetting link
[ 3123.347077] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 3123.475194] ata5.00: configured for UDMA/133
[ 3123.475216] ata5.00: device reported invalid CHS sector 0
[ 3123.475261] ata5: EH complete
---------------

As you see, the communication is frozen, so a hard reset is launched and the link is re-established. No data is corrupted, but the computer is frozen until the link is reset. Same as reported by others.

Any news on this bug? Is it going to be fixed? Is there a workaround that keeps the system usable until it's fixed?
Thanks

(P.S. Please reopen this bug, update bug product version to "fedora 15", and add the 32-bit version to the bug platforms)

FYI, I switched from Fedora to CentOS a couple of years ago because I needed GFS2 support on my cluster nodes, and initially got the same error frequently. However, the frequency has gone down with every kernel update over the years; the error hardly ever occurs with the current CentOS kernel (2.6.18-238.9.1.el5), but it still does now and then (say once every two months on one of the cluster nodes). So you might give CentOS a try to see if that helps.

Regards,
Gijsbert

(In reply to comment #85)
> (P.S. Please reopen this bug, update bug product version to "fedora 15", and
> add the 32-bit version to the bug platforms)

Please open a new bug against F15, since your errors are not the same as the ones reported here and there are 86 comments on this bug that we would have to wade through when working on it.

(In reply to comment #87)
> Please open a new bug against F15, since your errors are not the same as the
> ones reported here and there are 86 comments on this bug that we would have to
> wade through when working on it.

Done. Bug 718475
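
For anyone comparing their own logs against the ones posted in this thread, here is a minimal sketch of one way to tally libata exceptions per device and per reported reason (for example "timeout" versus "HSM violation"). The function name, the regular expressions, and the SAMPLE constant are illustrative assumptions and are not part of any kernel tooling; the sample lines are copied from comments above. As noted earlier in the thread, a timeout is a very generic symptom, so the tally only shows which ports are affected, not why.

```python
import re
from collections import Counter

# Matches the "ataN.MM: exception Emask ..." line that opens a libata error report.
EXC_RE = re.compile(r"(ata\d+(?:\.\d+)?): exception Emask")
# Matches the "res ... Emask 0x4 (timeout)" line that carries the reason.
RES_RE = re.compile(r"\bres\s+[0-9a-f/:]+\s+Emask\s+(0x[0-9a-f]+)\s+\(([^)]+)\)")

# Sample lines taken from logs posted in this thread.
SAMPLE = """\
[ 1885.024110] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 1885.024128] ata5.00: failed command: READ DMA EXT
[ 1885.024145] ata5.00: cmd 25/00:00:80:e7:1c/00:02:04:00:00/e0 tag 0 dma 262144 in
[ 1885.024148] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
May 3 07:49:47 localhost kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May 3 07:49:47 localhost kernel: res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x3 (HSM violation)
"""


def summarize_ata_errors(log_text):
    """Count (device, reason) pairs for libata exception reports in log_text."""
    counts = Counter()
    device = None  # device named by the most recent "exception" line
    for line in log_text.splitlines():
        exc = EXC_RE.search(line)
        if exc:
            device = exc.group(1)
            continue
        res = RES_RE.search(line)
        if res and device is not None:
            counts[(device, res.group(2))] += 1
            device = None  # one "res" line per exception report
    return counts
```

Calling `summarize_ata_errors(SAMPLE)` returns a `Counter` keyed by `(device, reason)`; the same function can be pointed at the text of /var/log/messages or dmesg output.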