Bug 140352 - sata_sx4 driver does not work with SMP kernel
Summary: sata_sx4 driver does not work with SMP kernel
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jeff Garzik
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-11-22 15:46 UTC by Sean E. Millichamp
Modified: 2013-07-03 02:22 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-02-18 06:31:27 UTC
Target Upstream Version:
Embargoed:


Description Sean E. Millichamp 2004-11-22 15:46:41 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0

Description of problem:
We just received a new server to replace an older machine.

Intel SE7501CW2 motherboard, one Xeon 2.4 GHz processor HT-capable, 1
GB DDR RAM
Adaptec 29160, 2 72GB SCSI drives
Promise FastTrak SATA150 SX4 controller with two Maxtor 7Y250M0 250GB
SATA drives.

Initially, I tried to install RHEL 4 Beta 2 on this system and
encountered the exact same behavior.  As this system was going to need
to run RHEL 3 Update N for quite some time yet, I did all of my
diagnosis with the system installed fresh using a minimal installation
of RHEL 3 Update 3.

First, anaconda would not complete the install with the SATA drives
connected to the controller, so I installed the system by removing the
cables to both SATA drives and using the 2 SCSI drives.

As soon as anaconda started loading the sata_sx4 driver one of two
things would happen: the system would freeze solid or (more
frequently) I'd see errors like the following:
<3>ata2: command timeout
<4>scsi2: Error on channel 0, id 0, lun 0, CDB: 0x28 00 00 00 00 00 00 00 08 00
<4>Current sd08:30: sns=70 3
<4>ASC=11 ASCQ= 4
<4>Raw sense data: 0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00 0x00 0x11 0x04
<4>I/O Error dev 08:30, sector 0

That error would occur multiple times with the only variance
apparently in the device and sector numbers.
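For readers decoding the error above, here is a minimal sketch that unpacks the CDB and the raw sense bytes exactly as they appear in the log. It assumes the standard SCSI READ(10) layout and the fixed-format sense layout; the function names and the two small lookup tables are illustrative only and cover just the values seen in this report. Note that with a libata driver the sense data is, as far as I know, synthesized by the driver from the ATA status, so a "medium error" here does not by itself prove a media defect.

```python
# Bytes copied verbatim from the log above.
CDB = bytes.fromhex("28 00 00 00 00 00 00 00 08 00")
SENSE = bytes.fromhex("70 00 03 00 00 00 00 06 00 00 00 00 11 04")

# Illustrative lookup tables: only the values that occur in this report.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_TABLE = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def decode_read10(cdb):
    """Return (lba, block_count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return lba, count


def decode_fixed_sense(sense):
    """Return (sense_key, asc, ascq) from fixed-format sense data."""
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    if len(sense) < 14:
        raise ValueError("sense data too short to carry ASC/ASCQ")
    return sense[2] & 0x0F, sense[12], sense[13]


if __name__ == "__main__":
    lba, count = decode_read10(CDB)
    key, asc, ascq = decode_fixed_sense(SENSE)
    print(f"READ(10) lba={lba} blocks={count}")
    print(f"sense key={key:#x} ({SENSE_KEYS.get(key, 'unknown')})")
    print(f"ASC/ASCQ={asc:#04x}/{ascq:#04x} ({ASC_TABLE.get((asc, ascq), 'unknown')})")
```

The decoded request is a read of 8 blocks starting at block 0, which is consistent with the "I/O Error dev 08:30, sector 0" line and with the kernel trying to read the partition table.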

Once I got RHEL installed I experimented with a variety of setups to
try to determine what was wrong.

1) If one or both of the SATA drives were connected and I booted with
the SMP kernel then I would encounter problems and errors like above once
the sata_sx4 driver loaded.

2) If I booted with neither SATA drive attached using the SMP kernel
then it finished booting normally.

3) Booting with none, one, or both SATA drives attached and the UP
kernel always resulted in a successful boot, being able to partition
the drives, create filesystems, read/write data to them, etc.

4) Disabling HyperThread support in the motherboard's BIOS and booting
with the SMP kernel resulted in the exact same behavior as if
HyperThreading was enabled in the BIOS (the SATA errors occurred).

I presume the CD installer boots an SMP kernel, which is why I could not
finish the install with the SATA drives attached.  Is there a way to
run the installer and instruct anaconda not to load a specific driver
(i.e., don't load sata_sx4)?

With both drives attached the "command timeout" errors don't seem to
start appearing until "sdd:" appears (I believe when it is looking for
the partition table).  However, when it searches for the sdc partition
table it reports that there is no valid partition table (which is
incorrect), so it seems that the driver is miscommunicating with the
drives before the "command timeout" series of errors is displayed.
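The "dev 08:30" in the error messages can be tied back to "sdd" with a little arithmetic. The sketch below assumes that the 2.4 kernel prints the device as hexadecimal major:minor and that SCSI disks on major 8 use 16 minors per disk; the helper name is hypothetical and it only handles major 8.

```python
def sd_name_from_kdev(devstr):
    """Map a 'major:minor' string (hex, as assumed for 2.4 kernel logs)
    to an sd device name. Only block major 8 (first 16 SCSI disks)
    is handled in this sketch."""
    major_s, minor_s = devstr.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled here")
    disk, part = divmod(minor, 16)  # 16 minors per disk; 0 = whole disk
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else name + str(part)


if __name__ == "__main__":
    print(sd_name_from_kdev("08:30"))
```

Under those assumptions 08:30 is minor 48, the fourth disk with partition number 0, i.e. the whole-disk device sdd. That matches the observation that the timeouts begin when "sdd:" is printed.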


Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL / kernel-smp-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Attach one or more SATA drives to Promise FastTrak SX4 controller
2. Boot SMP kernel
3. Watch sata_sx4 driver load and SATA drive errors keep the machine
from completing a boot.


Actual Results:  Here is how the controller initialization looks under
a successful UP kernel boot with both drives attached:

kernel: PCI: Found IRQ 10 for device 03:01.0
kernel: PCI: Sharing IRQ 10 with 02:02.0
kernel: Local DIMM ECC Enabled
kernel: ata1: SATA max UDMA/133 cmd 0xF8934200 ctl 0xF8934238 bmdma 0x0 irq 10
kernel: ata2: SATA max UDMA/133 cmd 0xF8934280 ctl 0xF89342B8 bmdma 0x0 irq 10
kernel: ata3: SATA max UDMA/133 cmd 0xF8934300 ctl 0xF8934338 bmdma 0x0 irq 10
kernel: ata4: SATA max UDMA/133 cmd 0xF8934380 ctl 0xF89343B8 bmdma 0x0 irq 10
kernel: ata1: dev 0 ATA, max UDMA/133, 490234752 sectors: lba48
kernel: ata1: dev 0 configured for UDMA/133
kernel: ata2: dev 0 ATA, max UDMA/133, 490234752 sectors: lba48
kernel: ata2: dev 0 configured for UDMA/133
kernel: ATA: abnormal status 0x7F on port 0xF893431C
kernel: ATA: abnormal status 0x7F on port 0xF893439C
kernel: scsi1 : sata_sx4
kernel: scsi2 : sata_sx4
kernel: scsi3 : sata_sx4
kernel: scsi4 : sata_sx4
kernel:   Vendor: ATA       Model: Maxtor 7Y250M0    Rev: YAR5
kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
kernel: Attached scsi disk sdc at scsi1, channel 0, id 0, lun 0
kernel: SCSI device sdc: 490234752 512-byte hdwr sectors (251000 MB)
kernel:  sdc: sdc1
kernel:   Vendor: ATA       Model: Maxtor 7Y250M0    Rev: YAR5
kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
kernel: Attached scsi disk sdd at scsi2, channel 0, id 0, lun 0
kernel: SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
kernel:  sdd: sdd1
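
Two quick sanity checks on the UP boot log above, as a sketch. The port base addresses and the 0x80 spacing between them are read off the "cmd" values in the log, not taken from driver documentation, and the helper names are hypothetical. The first check maps the two "abnormal status 0x7F" addresses to ports; the second confirms the reported capacity and that the drives need 48-bit LBA.

```python
# "cmd" base addresses as printed in the log above.
CMD_BASE = {
    "ata1": 0xF8934200,
    "ata2": 0xF8934280,
    "ata3": 0xF8934300,
    "ata4": 0xF8934380,
}
PORT_WINDOW = 0x80  # spacing between consecutive bases in this log


def port_for_address(addr):
    """Return (port_name, offset) for an address inside a port window."""
    for name, base in CMD_BASE.items():
        if base <= addr < base + PORT_WINDOW:
            return name, addr - base
    return None


def capacity_mb(sectors, sector_bytes=512):
    """Capacity in decimal megabytes (10**6 bytes), rounded down."""
    return sectors * sector_bytes // 1_000_000


def needs_lba48(sectors):
    """28-bit LBA can address at most 2**28 sectors."""
    return sectors > 2 ** 28


if __name__ == "__main__":
    for addr in (0xF893431C, 0xF893439C):
        print(hex(addr), port_for_address(addr))
    print(capacity_mb(490234752), "MB, lba48 needed:", needs_lba48(490234752))
```

The two abnormal-status addresses fall in the ata3 and ata4 windows, which are the ports with no drive attached in this setup, while ata1 and ata2 report the expected 251000 MB drives.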


Additional info:

I updated the SX4's BIOS to the latest version available at the
Promise website hoping that would solve the problem.  It is now
running 2.00.0.24 but the same problems occurred with the older version
that shipped with the card (2.0.00.21).  The card has a 64 MB ECC DIMM
(as required) installed in it and it passed the memory test utility
available at Promise's site.

I also updated the BIOS on the Intel motherboard to the latest version
on Intel's website in similar hopes that it would resolve the problems
(all prior to determining it worked fine with the UP kernels).

As I said at the beginning of the bug report, it also seemed to
exhibit the exact same behavior with RHEL 4 Beta 2.  Now that I have
isolated some workarounds I am going to try installing that Beta and
running it through similar tests.  I'll open another Bugzilla report
against RHEL 4 Beta 2 containing whatever I come up with.

Comment 1 Sean E. Millichamp 2005-01-27 19:13:57 UTC
False alarm on the crashes: after further testing, it turns out to have
almost certainly been due to a defective motherboard.

In case anyone is interested, I am seeing problems relating to
performance at this time and have a support request open with Red Hat.

I connected the drives to an on-board ICH5 controller and saw around
60 MB/s measured with bonnie++, but I have only been getting around
12-13 MB/s when connected via the Promise SATA150 SX4.

I feel the crashing issue is resolved though.


