Bug 141349 - sata_sx4 driver shows errors to two known-good drives
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386, OS: Linux
Priority: medium, Severity: high
Assigned To: Jeff Garzik
QA Contact: Brian Brock
Depends On:
Blocks:
Reported: 2004-11-30 11:41 EST by Sean E. Millichamp
Modified: 2013-07-02 22:23 EDT
CC List: 7 users

Doc Type: Bug Fix
Last Closed: 2005-02-18 01:29:49 EST


Attachments
Kernel messages relating to SATA drives (5.65 KB, text/plain)
2004-11-30 11:42 EST, Sean E. Millichamp

Description Sean E. Millichamp 2004-11-30 11:41:30 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5)
Gecko/20041111 Firefox/1.0

Description of problem:
Intel SE7501CW2 motherboard, one Xeon 2.4 GHz processor (HT capable),
1 GB DDR RAM

Promise FastTrak SATA150 SX4 controller with two Maxtor 7Y250M0 250 GB
SATA drives.

With the card removed from the system (or the drives removed from the
card) the system boots and works fine.  When the SATA card & drives
are installed the system is unable to access either drive and throws
many errors into the logs like these:

kernel: Buffer I/O error on device sdd, logical block 61279343
kernel: ata2: command timeout
kernel: ata2: status=0x51 { DriveReady SeekComplete Error }
kernel: SCSI error : <2 0 0 0> return code = 0x8000002
kernel: EOM ILI Current sdd: sense = 70 69
kernel: ASC=62 ASCQ=61
kernel: end_request: I/O error, dev sdd, sector 490234744
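
For reference, the numeric fields in the log excerpt above can be decoded by hand. The sketch below is a minimal Python illustration, not part of the original report. It assumes the classic ATA status-register bit layout (which matches libata's "DriveReady SeekComplete Error" text for 0x51), the Linux SCSI result word layout (driver, host, message and status bytes from high to low, so 0x8000002 is read as 0x08000002), and a 4096-byte buffer block size for the "logical block" number. All function names are hypothetical.

```python
# Hypothetical helpers (not from the bug report) for decoding the log fields.

# Classic ATA status register bits, named as libata prints them.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of the ATA status bits set in `status`."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def decode_scsi_result(result):
    """Split a Linux SCSI result word into its four bytes (high to low)."""
    return {
        "driver_byte": (result >> 24) & 0xFF,
        "host_byte": (result >> 16) & 0xFF,
        "msg_byte": (result >> 8) & 0xFF,
        "status_byte": result & 0xFF,
    }


def block_to_sector(block, block_size=4096, sector_size=512):
    """Convert a buffer-cache logical block number to a 512-byte sector number.

    The 4096-byte block size is an assumption; the log does not state it.
    """
    return block * (block_size // sector_size)
```

Under these assumptions, 0x51 decodes to DriveReady, SeekComplete and Error; the result word carries driver byte 0x08 (DRIVER_SENSE) and status byte 0x02 (CHECK CONDITION); and logical block 61279343 times 8 gives sector 490234744, the same sector that end_request reports.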

I will attach the full set of related messages.

The same behavior occurs regardless of whether I boot the UP or SMP
kernel.


Version-Release number of selected component (if applicable):
kernel-2.6.9-1.675_EL

How reproducible:
Always

Steps to Reproduce:
1. Install the Promise FastTrak SATA150 SX4 controller and the SATA drives
2. Boot system
    

Actual Results:  The system boots slowly as it tries (and eventually
fails) to read the drives it detects.


Expected Results:  I should be able to access the drives.  It was my
understanding that this SATA controller was supported as of RHEL 3 U3.
I know the drives are good and contain valid partition tables, because
they work fine when I boot RHEL 3 U3 with a UP kernel.

Additional info:

I see very similar errors and behavior with RHEL 3 U3 when booting the
SMP kernel.  I have reported this here:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=140352

The biggest difference I've seen so far is that with RHEL 4 beta2
the errors occur regardless of which kernel I boot.
Comment 1 Sean E. Millichamp 2004-11-30 11:42:33 EST
Created attachment 107633 [details]
Kernel messages relating to SATA drives
Comment 3 Jeff Garzik 2005-01-19 19:24:50 EST
Looks like a hardware problem of some sort, yes:

1) Have the cables been switched out (verified)?

2) Command timeout potentially means the platform isn't delivering
interrupts properly to this card.  Does "acpi=off" or "pci=biosirq"
solve the problem?

3) Check PCI bus, including possibly moving the card to another PCI
slot.  Lack of working PCI busmastering DMA could also present itself
as a command timeout.

Overall, this definitely sounds platform-related, if RHEL3 UP works.
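
One way to test the interrupt-delivery theory in point 2 is to check whether the controller's interrupt count moves while commands are timing out. The following is a minimal Python sketch, assuming the usual /proc/interrupts layout (a header row of CPU columns, then one "IRQ: per-CPU counts, chip, device" row per line). The helper names are hypothetical, and the controller's actual IRQ number would have to be taken from dmesg or lspci; the sample values used to exercise it are made up.

```python
# Hypothetical helpers (not from the bug report): compare two snapshots of
# /proc/interrupts to see whether a given IRQ line is still receiving interrupts.


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: (total_count, description)}."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = fields[:ncpus]
        # Skip summary rows such as "ERR:" that carry fewer than ncpus counts.
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue
        total = sum(int(c) for c in counts)
        table[label.strip()] = (total, " ".join(fields[ncpus:]))
    return table


def irq_delta(before, after, irq):
    """Number of interrupts `irq` received between two snapshots."""
    return parse_interrupts(after)[irq][0] - parse_interrupts(before)[irq][0]
```

If the controller's count stays flat across a few seconds while the log reports command timeouts, that would be consistent with the interrupt-delivery problem described in point 2; a rising count would point elsewhere, for example at the busmastering DMA question in point 3.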
Comment 4 Sean E. Millichamp 2005-01-20 16:51:02 EST
(In reply to comment #3)
> Looks like a hardware problem of some sort, yes:
> 
> 1) Have the cables been switched out (verified)?

I have swapped the 2 SATA cables in use with the only other ones I have: the
remaining 2 that shipped new in the bag with the Promise card.  They are the exact
same type.  No change.  As this is our first SATA server, I have no other SATA
parts to swap in beyond what came with the system.

> 2) Command timeout potentially means the platform isn't delivering
> interrupts properly to this card.  Does "acpi=off" or "pci=biosirq"
> solve the problem?

I have since wiped RHEL 4 beta2 from the system and focused on getting RHEL
3 to function properly (to that end I have opened service request 454280 with RH
support).  I will reinstall RHEL 4 beta2 (hopefully tomorrow) and try those
options.  FWIW, RH support had me try "noapic" with the RHEL 3 SMP kernel, with
no success.

> 3) Check PCI bus, including possibly moving the card to another PCI
> slot.  Lack of working PCI busmastering DMA could also present itself
> as a command timeout.

The card shipped in the server in one of the higher-speed (133 MHz, I think?)
bus slots, and I have also tried it in the 33 MHz PCI slots.  Same behavior in both.

> Overall, this definitely sounds platform-related, if RHEL3 UP works.

Well, since posting that, it seems I wasn't quite correct.  I can access the
drives from RHEL3 UP, but the performance is well below what I would call
"working": I seem to get about 12 MB/s to either drive.  Also, when I was
resigned to putting the server into operation with the UP kernel (before I
noticed the speed problem), it froze solid on me twice while I was trying to
rsync our data set from the old server to the SATA drives on the new server,
though only after running for a while.  The console was blank when I went back
to look at it and would not wake up (the keyboard lights wouldn't toggle
either), so it seemed to be solidly locked up.  After the first crash I saw how
slowly the md resync was going, and that is when I noticed the performance issues.
Comment 5 Sean E. Millichamp 2005-01-27 13:59:03 EST
Sorry for the false alarm on the apparently SMP-related crashes...
It turns out that the motherboard is defective in some subtle way that,
with the Promise card removed, only shows trouble in rare
circumstances.

I am still encountering the same slow (~12 MB/s) performance on a
different system with RHEL 3 but I have a RH support issue open on that.

I tried to install RHEL 4 beta 2 on the alternate system I am now
testing, in order to check the performance there, but I can't complete
either a CD or a network install on that system (for reasons apparently
entirely unrelated to the Promise SATA card).
Comment 6 Jeff Garzik 2005-02-18 01:29:49 EST
The slow performance on SX4 is expected, due to the nature of the
sata_sx4 driver.
