From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.5) Gecko/20041111 Firefox/1.0

Description of problem:
We just received a new server to replace an older machine:

Intel SE7501CW2 motherboard, one Xeon 2.4 GHz processor (HT-capable), 1 GB DDR RAM
Adaptec 29160, two 72GB SCSI drives
Promise FastTrak SATA150 SX4 controller with two Maxtor 7Y250M0 250GB SATA drives

Initially, I tried to install RHEL 4 Beta 2 on this system and encountered the same behavior described below. Because this system will need to run RHEL 3 Update N for quite some time yet, I did all of my diagnosis on a fresh, minimal installation of RHEL 3 Update 3.

First, anaconda would not complete the install with the SATA drives connected to the controller, so I installed the system by removing the cables to both SATA drives and using the two SCSI drives. With the SATA drives connected, as soon as anaconda started loading the sata_sx4 driver one of two things would happen: the system would freeze solid or (more frequently) I'd see errors like the following:

<3>ata2: command timeout
<4>scsi2: Error on channel 0, id 0, lun 0, CDB: 0x28 00 00 00 00 00 00 00 08 00
<4>Current sd08:30: sns=70 3
<4>ASC=11 ASCQ= 4
<4>Raw sense data: 0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00 0x00 0x11 0x04
<4>I/O Error dev 08:30, sector 0

That error would occur multiple times, with the only variance apparently in the device and sector numbers.

Once I got RHEL installed, I experimented with a variety of setups to try to determine what was wrong.

1) If one or both of the SATA drives were connected and I booted with the SMP kernel, then I would encounter problems and errors like those above once the sata_sx4 driver loaded.
2) If I booted the SMP kernel with neither SATA drive attached, then it finished booting normally.
3) Booting the UP kernel with none, one, or both SATA drives attached always resulted in a successful boot, and I was able to partition the drives, create filesystems, read/write data to them, etc.
4) Disabling HyperThreading support in the motherboard's BIOS and booting with the SMP kernel resulted in the same behavior as with HyperThreading enabled in the BIOS (the SATA errors occurred).

I presume the CD installer boots an SMP kernel, which is why I could not finish the install with the SATA drives attached. Is there a way to run the installer and instruct anaconda not to load a specific driver (i.e., don't load sata_sx4)?

With both drives attached, the "command timeout" errors don't seem to start appearing until "sdd:" appears (I believe when it is looking for the partition table). However, when it searches for the sdc partition table it reports that there is no valid partition table (which is incorrect), so it seems that the driver is miscommunicating with the drives before the "command timeout" series of errors is displayed.

Version-Release number of selected component (if applicable):
kernel-2.4.21-20.EL / kernel-smp-2.4.21-20.EL

How reproducible:
Always

Steps to Reproduce:
1. Attach one or more SATA drives to the Promise FastTrak SX4 controller
2. Boot the SMP kernel
3. Watch the sata_sx4 driver load and SATA drive errors keep the machine from completing a boot.
Actual Results:
Here is how the controller initialization looks under a successful UP kernel boot with both drives attached:

kernel: PCI: Found IRQ 10 for device 03:01.0
kernel: PCI: Sharing IRQ 10 with 02:02.0
kernel: Local DIMM ECC Enabled
kernel: ata1: SATA max UDMA/133 cmd 0xF8934200 ctl 0xF8934238 bmdma 0x0 irq 10
kernel: ata2: SATA max UDMA/133 cmd 0xF8934280 ctl 0xF89342B8 bmdma 0x0 irq 10
kernel: ata3: SATA max UDMA/133 cmd 0xF8934300 ctl 0xF8934338 bmdma 0x0 irq 10
kernel: ata4: SATA max UDMA/133 cmd 0xF8934380 ctl 0xF89343B8 bmdma 0x0 irq 10
kernel: ata1: dev 0 ATA, max UDMA/133, 490234752 sectors: lba48
kernel: ata1: dev 0 configured for UDMA/133
kernel: ata2: dev 0 ATA, max UDMA/133, 490234752 sectors: lba48
kernel: ata2: dev 0 configured for UDMA/133
kernel: ATA: abnormal status 0x7F on port 0xF893431C
kernel: ATA: abnormal status 0x7F on port 0xF893439C
kernel: scsi1 : sata_sx4
kernel: scsi2 : sata_sx4
kernel: scsi3 : sata_sx4
kernel: scsi4 : sata_sx4
kernel: Vendor: ATA Model: Maxtor 7Y250M0 Rev: YAR5
kernel: Type: Direct-Access ANSI SCSI revision: 05
kernel: Attached scsi disk sdc at scsi1, channel 0, id 0, lun 0
kernel: SCSI device sdc: 490234752 512-byte hdwr sectors (251000 MB)
kernel: sdc: sdc1
kernel: Vendor: ATA Model: Maxtor 7Y250M0 Rev: YAR5
kernel: Type: Direct-Access ANSI SCSI revision: 05
kernel: Attached scsi disk sdd at scsi2, channel 0, id 0, lun 0
kernel: SCSI device sdd: 490234752 512-byte hdwr sectors (251000 MB)
kernel: sdd: sdd1

Additional info:
I updated the SX4's BIOS to the latest version available at the Promise website, hoping that would solve the problem. It is now running 2.00.0.24, but the same problems occurred with the older version that shipped with the card (2.0.00.21). The card has a 64 MB ECC DIMM (as required) installed in it, and it passed the memory test utility available at Promise's site.
I also updated the BIOS on the Intel motherboard to the latest version on Intel's website in similar hopes that it would resolve the problems (all prior to determining it worked fine with the UP kernels). As I said at the beginning of the bug report, it also seemed to exhibit the exact same behavior with RHEL 4 Beta 2. Now that I have isolated some workarounds I am going to try installing that Beta and running it through similar tests. I'll open another Bugzilla report against RHEL 4 Beta 2 containing whatever I come up with.
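
For anyone triaging similar messages, the error block quoted in the description can be unpacked mechanically. What follows is a minimal Python sketch and not part of the original report. The function names are hypothetical. The byte offsets follow the standard READ(10) CDB layout and the fixed-format (0x70) sense data layout. The device decoding assumes that the 2.4 kernel prints major:minor in hexadecimal and that SCSI disks use major 8 with 16 minors per disk.

```python
# Values copied from the kernel messages quoted in the description.
CDB = bytes.fromhex("28 00 00 00 00 00 00 00 08 00")
SENSE = bytes([0x70, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x06,
               0x00, 0x00, 0x00, 0x00, 0x11, 0x04])
DEV = "08:30"

# Only the entries needed for this report; not a complete table.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED"}


def decode_read10(cdb):
    """Return (lba, block_count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


def decode_fixed_sense(sense):
    """Return (sense_key, asc, ascq) from fixed-format sense data."""
    if (sense[0] & 0x7F) not in (0x70, 0x71) or len(sense) < 14:
        raise ValueError("not fixed-format sense data with ASC/ASCQ")
    return sense[2] & 0x0F, sense[12], sense[13]


def decode_sd_dev(dev):
    """Map a hex 'major:minor' string to an sd device name (assumes major 8)."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    if major != 8:
        raise ValueError("only the first SCSI disk major (8) is handled")
    disk = "sd" + chr(ord("a") + minor // 16)  # 16 minors per disk
    partition = minor % 16                     # 0 means the whole disk
    return disk if partition == 0 else f"{disk}{partition}"


if __name__ == "__main__":
    lba, blocks = decode_read10(CDB)
    key, asc, ascq = decode_fixed_sense(SENSE)
    print(f"READ(10): lba={lba} blocks={blocks}")
    print(f"sense key {key:#x}: {SENSE_KEYS.get(key, 'unknown')}")
    print(f"ASC/ASCQ {asc:#04x}/{ascq:#04x}: {ASC_ASCQ.get((asc, ascq), 'unknown')}")
    print(f"device: {decode_sd_dev(DEV)}")
```

Under those assumptions, the quoted error is a READ(10) of 8 blocks at LBA 0 on sdd that came back with sense key 3 (MEDIUM ERROR) and ASC/ASCQ 0x11/0x04. That is consistent with the errors starting when the partition table on sdd is read. The decoded fields only describe what the SCSI layer reported; they do not by themselves identify the root cause.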
False alarm on the crashes: after further testing, it turns out they were almost certainly due to a defective motherboard. In case anyone is interested, I am now seeing performance problems and have a support request open with Red Hat. With the drives connected to an on-board ICH5 controller I measured around 60 MB/s with bonnie++, but I have only been getting around 12-13 MB/s when they are connected via the Promise SATA150 SX4. I consider the crashing issue resolved, though.