Bug 492232 - (ATA, errors, mult-disks) 2.6.18-128.el5.3_x86_64 reports some ATA errors with multi-disks
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Hardware: x86_64 Linux
Priority: low Severity: high
Assigned To: David Milburn
QA Contact: Red Hat Kernel QE team
Reported: 2009-03-25 21:39 EDT by Grace
Modified: 2010-05-20 10:53 EDT

Doc Type: Bug Fix
Last Closed: 2009-03-30 13:05:09 EDT

Description Grace 2009-03-25 21:39:08 EDT
Description of problem:

I have a server with 5 SATA disks, and I installed RHEL5U3 (2.6.18-128_el5.3_x86_64) on the first disk. I see ATA errors on the other four disks, but not on the first one.

However, this issue does not occur with RHEL5U2 (2.6.18-92_el5.2_x86_64).

All the disks are running in AHCI mode.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install Red Hat 5.3 (2.6.18-128_el5.3_x86_64) on a multi-disk server
2. Reboot the server after installation
Actual results:
1. Non-boot disks work, but their red LEDs are on
2. ATA errors appear in dmesg

Expected results:
1. Non-boot disks work with their red LEDs off
2. No ATA errors appear in dmesg

Additional info:

The detailed logs are listed below for your information.
> dmesg 
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000001
ata2.00: cmd c8/00:06:92:6d:7b/00:00:00:00:00/e0 tag 0 dma 3072 in
         res 51/04:06:92:6d:7b/00:00:00:00:00/e0 Emask 0x1 (device error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { ABRT }
ata2.00: configured for UDMA/133 (device error ignored)
ata2: EH complete

> lspci
00:1f.2 SATA controller: Intel Corporation ICH10 6 port SATA AHCI Controller
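
The status and error bytes in the "res" line of the dmesg output can be decoded by hand: the first byte after "res" is the ATA status register and the second is the error register. The following is a minimal Python sketch of that decoding. The helper names (decode_bits, decode_res) are hypothetical, and the bit tables cover only the flags the kernel commonly prints, so treat it as an illustration rather than a complete taskfile parser.

```python
import re

# ATA status register bits. Bit 0x10 is deliberately left out, since the
# kernel's "status: { ... }" line in the log above does not print it.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

# ATA error register bits (subset).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

# Matches the start of a libata result line, e.g. "res 51/04:06:...".
RES_LINE = re.compile(r"\bres\s+([0-9a-f]{2})/([0-9a-f]{2}):", re.IGNORECASE)


def decode_bits(value, table):
    """Return the names of all bits in `table` that are set in `value`,
    ordered from the highest bit to the lowest."""
    return [name for mask, name in sorted(table.items(), reverse=True)
            if value & mask]


def decode_res(line):
    """Decode the status and error bytes of a libata "res" line.

    Returns None if the line does not look like a result line.
    """
    match = RES_LINE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    error = int(match.group(2), 16)
    return {
        "status": decode_bits(status, STATUS_BITS),
        "error": decode_bits(error, ERROR_BITS),
    }
```

Applied to the "res 51/04:..." line in the log, this gives DRDY and ERR for status 0x51 and ABRT for error 0x04, which agrees with the kernel's own "status" and "error" lines: the drive aborted the command.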
Comment 1 David Milburn 2009-03-26 14:08:43 EDT

The RHEL 5.3 ahci driver added support for enclosure management, which
manipulates the drive LEDs. We are currently working on BZ 488471, where drive
LEDs report incorrect status for ich9r and ich10 when configured in
ahci mode (though we haven't seen any device errors in dmesg).

Would you please attach your full dmesg log after booting and the output of
"lspci -xxvvv"?

Also, would you try disabling ahci_em_messages in your /etc/modprobe.conf and
rebuilding your initrd? (Or I can build you a test kernel.)

options ahci ahci_em_messages=0
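
Before rebuilding the initrd, it can be worth confirming that the options line is actually picked up for the ahci module and that the parameter name is spelled as intended, since a misspelled option would not disable anything. The following is a rough Python sketch of such a check. The function names are hypothetical, the parsing handles only simple "options <module> key=value" lines, and the parameter name ahci_em_messages is taken from the comment above.

```python
def module_options(conf_text, module):
    """Collect key=value options set for `module` in modprobe.conf-style text.

    Only plain "options <module> key=value ..." lines are handled; anything
    after a '#' on a line is treated as a comment and ignored.
    """
    opts = {}
    for raw in conf_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == module:
            for item in fields[2:]:
                key, sep, value = item.partition("=")
                if sep:
                    opts[key] = value
    return opts


def em_messages_disabled(conf_text, param="ahci_em_messages"):
    """Return True if the config text sets the given ahci parameter to 0."""
    return module_options(conf_text, "ahci").get(param) == "0"
```

To use it, read /etc/modprobe.conf into a string and pass it to em_messages_disabled. A False result means the option is absent, commented out, or spelled differently from the expected parameter name.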
Comment 2 Grace 2009-03-28 03:28:33 EDT
Thanks for your prompt response. Just as you said, the LEDs on my drives also report an incorrect error status.

As for the original issue I reported: after reinstalling the OS several times, it seems that I have fixed it, although I am not sure why the fix worked.

Here is how I worked around the reported issue:
I pulled the 2nd SATA drive out and rebooted the OS; the problem was still there. Then I re-inserted the 2nd SATA drive and re-installed the OS image, and the problem went away. Based on this, I no longer think the root cause is in the kernel, which was my initial thought.

Thanks again for your time and help.
Comment 3 David Milburn 2009-03-30 13:05:09 EDT
Ok, thank you for the update.
