Description of problem:
After upgrading to RHEL 5 update 3 (CentOS 5.3), I started seeing SATA bus resets.

Kernel version: kernel-2.6.18-128.el5

SATA card version:
03:01.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)

/var/log/messages:
Apr 1 00:11:02 raid kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 1 00:11:02 raid kernel: ata10.00: cmd ca/00:08:bf:04:96/00:00:00:00:00/e9 tag 0 dma 4096 out
Apr 1 00:11:02 raid kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 1 00:11:02 raid kernel: ata10.00: status: { DRDY }
Apr 1 00:11:02 raid kernel: ata10: hard resetting link
Apr 1 00:11:02 raid kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Apr 1 00:11:02 raid kernel: ata10.00: configured for UDMA/100
Apr 1 00:11:02 raid kernel: ata10: EH complete
Apr 1 00:11:02 raid kernel: SCSI device sdi: 160836480 512-byte hdwr sectors (82348 MB)
Apr 1 00:11:02 raid kernel: sdi: Write Protect is off

Version-Release number of selected component (if applicable):
kernel-2.6.18-128.el5

How reproducible:
System is a file server. After observing this behavior I dropped back to the previous kernel (2.6.18-92.1.22.el5) and no longer saw the problem.

Steps to Reproduce:
1. Upgrade system to RHEL5 update 3
2. Observe errors in /var/log/messages
3.

Actual results:

Expected results:

Additional info:
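For anyone checking whether a machine is affected, the "observe errors in /var/log/messages" step can be turned into a quick count. This is only a rough sketch: the function name `count_ata_exceptions` is made up for illustration, and it assumes the syslog layout shown in the excerpt above (a `kernel: ataN.NN: exception Emask` line for each event). It counts exception lines per ATA device, so it will also pick up exceptions that are not timeouts.

```shell
# Hypothetical helper: count libata exception events per ATA device
# in a syslog-style file. Defaults to /var/log/messages as in the report.
count_ata_exceptions() {
    log_file="${1:-/var/log/messages}"
    awk '
        # Match lines like "... kernel: ata10.00: exception Emask 0x0 ..."
        /kernel: ata[0-9.]+: exception Emask/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ata[0-9.]+:$/) {
                    dev = $i
                    sub(/:$/, "", dev)   # strip the trailing colon
                    count[dev]++
                    break
                }
            }
        }
        END {
            for (dev in count) print dev, count[dev]
        }
    ' "$log_file" | sort
}
```

Calling `count_ata_exceptions` with no argument reads /var/log/messages; pass a rotated log as the first argument to check older entries. Comparing the counts before and after a kernel change gives a rough idea of whether the resets have stopped.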
Hi,

I think this has been fixed upstream:

commit b0bccb18bc523d1d5060d25958f12438062829a9
Author: Mark Lord <liml>
Date: Mon Jan 19 18:04:37 2009 -0500

    sata_mv: fix 8-port timeouts on 508x/6081 chips

Would you please test the kernel-2.6.18-138.el5.bz493451.1 test kernel?

http://people.redhat.com/dmilburn/
The test kernel appears to have fixed the issue. After installing the kernel and rebooting, I ran a RAID resync on a 13-drive RAID6 volume connected to 2 of these cards and didn't see any timeouts. Will this patch make it into an update kernel soon(ish)? Thanks, -- jeremy
Jeremy, the patch is under review; hopefully it will be committed soon. Thanks for the quick feedback.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-140.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
Hi Don, I don't see a directory at http://people.redhat.com/dzickus/el5 for 140.el5. Am I just too quick or do you still need to transfer the bits over? Thanks, -- jeremy
Doh. Sorry. -140.el5 should be uploading right now.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
I'm seeing something similar with 5.4 and 2.6.18-164.10.1.el5 (bug 554872). Anyone else?
Here's a crude workaround that helped to mask the problem.

In /etc/cron.hourly/disable-write-cache:

    /sbin/hdparm -W 0 /dev/sda
    /sbin/hdparm -W 0 /dev/sdb
    /sbin/hdparm -W 0 /dev/sdc

This disables hardware write caching on the drives, which in this case are part of a software RAID5 array. Resets will still happen, although much less frequently. When a reset does occur, the write cache is re-enabled, hence the cron.hourly script.

Note that there may be a performance penalty, and the workaround may not be effective for certain (write-heavy) workloads. YMMV.

I tried applying Mark Lord's patch to a 2.6.18 kernel without success. Looking at the driver, I think it has changed significantly, as the patch was designed for 2.6.28.
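The hourly job described in this comment could also be written as a loop, so the drive list lives in one place and a failure on one drive does not stop the others. This is a sketch only, not the script actually in use: the `HDPARM` and `DRIVES` variables and the `disable_write_cache` function are illustrative names, and the device list is the one from this comment, so adjust it for your array. The only hdparm option used is `-W 0`, as above.

```shell
#!/bin/sh
# Sketch of /etc/cron.hourly/disable-write-cache.
# HDPARM and DRIVES are illustrative, overridable variables (not hdparm features);
# setting HDPARM=echo gives a dry run that only prints the arguments.
HDPARM="${HDPARM:-/sbin/hdparm}"
DRIVES="${DRIVES:-/dev/sda /dev/sdb /dev/sdc}"

disable_write_cache() {
    rc=0
    for drive in $DRIVES; do
        # -W 0 turns off the drive's write cache; it has to be reapplied
        # because a link reset re-enables the cache.
        "$HDPARM" -W 0 "$drive" || rc=1
    done
    return "$rc"
}

# Only act when installed under the cron script's name, so the file can be
# sourced elsewhere without touching any drives.
case "$0" in
    *disable-write-cache) disable_write_cache ;;
esac
```

Before installing it, the loop can be checked without touching any drive by sourcing the file, setting `HDPARM=echo`, and calling `disable_write_cache`, which prints one `-W 0 <device>` line per drive.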