Bug 492232 (ATA, errors, mult-disks)

Summary: 2.6.18-128.el5.3_x86_64 reports some ATA errors with multi-disks
Product: Red Hat Enterprise Linux 5
Reporter: Grace <sysolve>
Component: kernel
Assignee: David Milburn <dmilburn>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: low
Version: 5.3
CC: dzickus, jane.lv
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-03-30 17:05:09 UTC

Description Grace 2009-03-26 01:39:08 UTC
Description of problem:

I have a server with five SATA disks, and I installed RHEL5U3 (2.6.18-128_el5.3_x86_64) on the first disk. ATA errors are reported on the other four disks, but not on the first one.

This issue does not occur with RHEL5U2 (2.6.18-92_el5.2_x86_64).

All the disks are operating in AHCI mode.

Version-Release number of selected component (if applicable):
2.6.18-128_el5.3_x86_64

How reproducible:


Steps to Reproduce:
1. Install RHEL 5.3 (2.6.18-128_el5.3_x86_64) on a multi-disk server
2. Reboot the server after installation
  
Actual results:
1. Non-boot disks operate with the red LED lit
2. ATA errors appear in dmesg

Expected results:
1. Non-boot disks operate without the red LED lit
2. No ATA errors appear in dmesg

Additional info:

The detailed logs are listed below for your information.
> dmesg 
sdb: Write Protect is off
SCSI device sdb: drive cache: write back
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata2.00: irq_stat 0x40000001
ata2.00: cmd c8/00:06:92:6d:7b/00:00:00:00:00/e0 tag 0 dma 3072 in
         res 51/04:06:92:6d:7b/00:00:00:00:00/e0 Emask 0x1 (device error)
ata2.00: status: { DRDY ERR }
ata2.00: error: { ABRT }
ata2.00: configured for UDMA/133 (device error ignored)
ata2: EH complete

> lspci
00:1f.2 SATA controller: Intel Corporation ICH10 6 port SATA AHCI Controller
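
For readers unfamiliar with the libata error format, the `cmd` and `res` lines in the dmesg excerpt above can be decoded by hand. The following is a minimal sketch, not part of the original report: the function names are hypothetical, the opcode table is deliberately incomplete, and the bit tables cover only the flags that the log's own summary lines use. It assumes the usual libata layout of `cmd opcode/feature:nsect:...` and `res status/error:...`.

```python
import re

# Status register bits, limited to the flags libata summarises in the log.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
# Error register bits; the meaning of some bits can depend on the command.
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
# A few common opcodes only (assumption: enough for this log, not exhaustive).
COMMANDS = {0xC8: "READ DMA", 0xCA: "WRITE DMA",
            0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}


def flags(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


def decode(cmd_line, res_line):
    """Decode the opcode, sector count, status and error of one exception."""
    cmd = re.search(r"cmd ([0-9a-f]{2})/[0-9a-f]{2}:([0-9a-f]{2}):", cmd_line)
    res = re.search(r"res ([0-9a-f]{2})/([0-9a-f]{2}):", res_line)
    opcode, nsect = int(cmd.group(1), 16), int(cmd.group(2), 16)
    status, error = int(res.group(1), 16), int(res.group(2), 16)
    return {
        "command": COMMANDS.get(opcode, "unknown (0x%02x)" % opcode),
        "sectors": nsect,
        "status": flags(status, STATUS_BITS),
        "error": flags(error, ERROR_BITS),
    }


if __name__ == "__main__":
    print(decode(
        "ata2.00: cmd c8/00:06:92:6d:7b/00:00:00:00:00/e0 tag 0 dma 3072 in",
        "         res 51/04:06:92:6d:7b/00:00:00:00:00/e0 Emask 0x1 (device error)",
    ))
```

For the excerpt above this gives a READ DMA of 6 sectors (6 x 512 = 3072 bytes, matching `dma 3072 in`, assuming 512-byte sectors) that the drive aborted, which agrees with the `{ DRDY ERR }` and `{ ABRT }` lines in the log.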

Comment 1 David Milburn 2009-03-26 18:08:43 UTC
Hi,

The RHEL 5.3 ahci driver added support for enclosure management, which
manipulates the drive LEDs. We are currently working on BZ 488471, where drive
LEDs report incorrect status for ich9r and ich10 when configured in
ahci mode (though we haven't seen any device errors in dmesg).

Would you please attach your full dmesg log after booting and the output of
"lspci -xxvvv"?

Also, would you try disabling ahci_em_messages in your /etc/modprobe.conf and
rebuilding your initrd? (Or, I can build you a test kernel.)

options ahci ahci_em_messages=0
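
The modprobe.conf change suggested above can be sketched as a small text transformation. This is only an illustration under stated assumptions: `ensure_module_option` is a hypothetical helper, it works on the file's text rather than on /etc/modprobe.conf itself, and the parameter name follows the spelling used by the upstream ahci driver (`ahci_em_messages`). On RHEL 5 the initrd would still need to be rebuilt afterwards, for example with `mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)`.

```python
def ensure_module_option(conf_text, module, option, value):
    """Return conf_text with `options <module> <option>=<value>` set once.

    Any earlier setting of the same option for the same module is removed;
    other options on the same line and all unrelated lines are kept.
    """
    kept = []
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "options" and fields[1] == module:
            # Drop only the option being replaced, keep the rest of the line.
            rest = [f for f in fields[2:] if f.split("=", 1)[0] != option]
            if rest:
                kept.append(" ".join(["options", module] + rest))
            continue
        kept.append(line)
    kept.append("options %s %s=%s" % (module, option, value))
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical starting contents of /etc/modprobe.conf.
    before = "alias scsi_hostadapter ahci\n"
    print(ensure_module_option(before, "ahci", "ahci_em_messages", 0))
```

Working on a string keeps the sketch side-effect free; writing the result back to /etc/modprobe.conf and rebuilding the initrd are left as manual steps.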

Comment 2 Grace 2009-03-28 07:28:33 UTC
Thanks for your prompt response. As you said, the drive LEDs on my system also report incorrect status.

As for the original issue, after reinstalling the OS several times it appears to be fixed, although I am not sure what fixed it.

Here is what I did to work around the issue:
I pulled the second SATA drive out and rebooted the OS; the problem was still there. I then re-inserted the second SATA drive and reinstalled the OS image, and the problem went away. Based on this, the root cause does not seem to be in the kernel, as I had initially thought.

Thanks again for your time and help.

Comment 3 David Milburn 2009-03-30 17:05:09 UTC
Ok, thank you for the update.