Bug 60053

Summary:	(440GX)SCSI timeout with Symbios sym53c1010 controller on Red Hat 7.2 install
Product:	[Retired] Red Hat Linux	Reporter:	Jason Corley <jason.corley>
Component:	kernel	Assignee:	Arjan van de Ven <arjanv>
Status:	CLOSED CURRENTRELEASE	QA Contact:	Brian Brock <bbrock>
Severity:	high	Docs Contact:
Priority:	medium
Version:	7.2	CC:	caruso, kambiz
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2003-06-08 01:31:45 UTC	Type:	---

Description Jason Corley 2002-02-19 15:19:54 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.9-21 i686)

Description of problem:
I have submitted this bug at the request of arjanv.

I have a number of Dual Processor VA Linux 1220 servers that all have the same
sym53c1010 SCSI controller, with two 18 GB disks.  These machines were running
VA's customized version of Red Hat 6.2, which used a 2.2.18pre11-va2.1smp
kernel.  I am trying to upgrade these boxes to Red Hat 7.2 and the installer
fails when probing the sym53c1010-33 controller.  I have gotten the install to
work on a single processor version of the VA 1220.  I've tried this install on
two servers so far that worked perfectly with VA's modified Red Hat 6.2, and got
the same error on both machines.  The controller is set to auto terminate, the
disks are attached to controller 0, and the timeouts appear to occur while
probing controller 1.

sym53c1010-33-0-<1,*> FAST-80 WIDE SCSI 160.0 MB/s (12.5 ns, offset 62)
scsi : aborting command due to timeout : pid 0, scsi 0, channel 0, id 1, lun 0
Test Unit Ready 00 00 00 00 00
sym53c8xx_abort : pid=0 serial_number=68 serial_number_at_timeout=68

The serial_number and serial_number_at_timeout continue to increment if I let
the install spin.
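
For anyone who wants to confirm the same pattern from a saved console log, here is a minimal Python sketch. It assumes the abort lines look exactly like the one quoted above; other driver versions may format them differently. The names abort_serials and is_incrementing are made up for illustration, and the second sample line is a hypothetical continuation, not captured output.

```python
import re

# Pattern modeled on the sym53c8xx_abort line quoted in this report.
# Assumption: other driver versions may print this line differently.
ABORT_RE = re.compile(
    r"sym53c8xx_abort\s*:\s*pid=(\d+)\s+serial_number=(\d+)\s+"
    r"serial_number_at_timeout=(\d+)"
)


def abort_serials(log_text):
    """Return the serial_number of every sym53c8xx_abort line, in log order."""
    return [int(m.group(2)) for m in ABORT_RE.finditer(log_text)]


def is_incrementing(serials):
    """True if there are at least two serials and each exceeds the previous one."""
    return len(serials) > 1 and all(b > a for a, b in zip(serials, serials[1:]))


# First line is taken from the report; the second is a hypothetical follow-up
# added only to show what "continues to increment" would look like.
sample = """\
sym53c8xx_abort : pid=0 serial_number=68 serial_number_at_timeout=68
sym53c8xx_abort : pid=0 serial_number=69 serial_number_at_timeout=69
"""

serials = abort_serials(sample)
print(serials, is_incrementing(serials))
```

A steadily increasing list means the driver keeps retrying and aborting new commands rather than hanging on a single one.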

VA's kernel used sym53c8xx-1.7.1-20000726 according to dmesg.
Red Hat 7.2 uses sym53c8xx-1.7.3c-20010512 according to the install output.

Thanks,
Jason

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1.  Get a VA Linux 1220 with two processors and two SCSI disks attached to the
sym53c1010-33 controller.
2.  Try to upgrade to Red Hat 7.2.

Actual Results:  The timeouts outlined above.

Additional info:

Comment 1 Arjan van de Ven 2002-02-19 15:38:43 UTC

Can you try typing
"linux apic"
at the syslinux prompt? (the very very first screen)

If that helps, you MUST install the smp kernel to make later boots work.
(If this works, this is the generic Intel 440GX BIOS bug which Intel has
refused to fix for 9 months now)

Comment 2 Jason Corley 2002-02-19 16:27:11 UTC

Ok, I have been sick the last couple of days, and this bug report is a good
example of me not thinking clearly.  Here are some corrections or details I left
out:

- The timeouts do not occur during the install; they occur on the first boot
after the install.  This is not confined to the 2.4.7-10smp kernel, however:
after the install I also tried upgrading to the latest errata smp kernel and
still had the same problem.
- The timeouts do not occur while probing the second controller, but while
probing the second disk (/dev/sda is SCSI ID 0 and /dev/sdb is SCSI ID 1).
- I have my disks configured in a software RAID (/, /boot, /tmp, /usr, /var, and
/var/www as RAID-1 mirrors, and a non-RAIDed swap partition on each disk).
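
The controller-versus-disk distinction in the second bullet can be read straight off the timeout line from the original description. Below is a minimal Python sketch that splits that line into its fields. It assumes the "scsi 0" field is the host (controller) number and that the line format matches the one quoted in the description; parse_timeout is a made-up name for illustration.

```python
import re

# Pattern modeled on the timeout line quoted in the description.
# Assumption: the number after "scsi" is the host (controller) number.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout\s*:\s*pid (\d+), scsi (\d+), "
    r"channel (\d+), id (\d+), lun (\d+)"
)


def parse_timeout(line):
    """Return the fields of a timeout line as a dict, or None if it does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    keys = ("pid", "host", "channel", "id", "lun")
    return dict(zip(keys, map(int, m.groups())))


line = "scsi : aborting command due to timeout : pid 0, scsi 0, channel 0, id 1, lun 0"
print(parse_timeout(line))
```

Under that assumption, host 0 with id 1 points at the second disk on the first controller, which matches the correction above.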

I think the problem may indeed be related to APIC mappings (although attempting
to boot with the "apic" flag appended to my grub configuration made no
difference).  Also, since the install (I assume) doesn't use grub, could the
problem somehow be grub related?

Thanks,
Jason

Comment 3 f.labanvoye 2002-04-23 10:12:38 UTC

Hello,

I have the same problem with a Symbios SYM8952U SCSI card and RH 7.2 with the
latest kernel RPM. My system has an on-board sym53c875 controller, which works
fine, but I can't install RH when the sym895 card is plugged into the server.
After installing without the card, I rebooted and plugged it into the server,
and I got these messages:
 > SCSI host 0 abort (pid 0) timed out - resetting
 > SCSI bus is being reset for host 0 channel 0
 > sym53c8xx_abort : pid=0 serial_number=8 serial_number_at_timeout=8
 > ....

I used this card on RH 6.2 without any problem.

f.labanvoye

Comment 4 Kambiz Aghaiepour 2002-07-11 12:56:04 UTC

We (Red Hat corporate IS) tried this out, and were not able to reproduce the
problem.  We have a VA Linux 1220 which worked fine both during the install
phase and on reboot (with 7.2), using the install kernel, the stock kernel, and
the latest errata kernel-smp-2.4.9-34.  In our initial tests, the 1220 only had
a single drive.  Rerunning the install with two drives also worked.  However,
when I walked across the street and borrowed the drive that Jason was using,
the system would hang.  Using IBM drives was OK though.  I suspect a hardware
issue at this point.  Jason, do you concur?

Comment 5 Jason Corley 2002-07-11 20:20:58 UTC

I suspected the high quality Hitachi drives might be the culprit.  Still don't
know why RH 6.2 worked and RH 7.2 didn't work (on many many many machines) but
since I'm trying to decommission these servers due to no support or replacement
hardware, I'm not as concerned about this as I once was.  Blaming it on Hitachi
is fine with me. :-)

Comment 6 Need Real Name 2002-11-06 20:29:02 UTC

I was seeing the same behavior on the same hardware (VALinux 1220) with RH7.3.  
I was able to get it to boot successfully with the SMP-kernel/APIC workaround 
(and I believe this was regardless of which disks were in the machine, though 
I'd have to check to be sure, and the machine is at a distant colo).

The good news is that I recently installed Red Hat 8.0 on this same machine and
it was no longer necessary to use the APIC workaround, nor was it even
necessary to use the SMP kernel; a straight install worked fine.  Very nice to
see that this has apparently been resolved, or that a workaround has been built
into the Red Hat install.

BTW, the motherboard on the VALinux 1220 is an ASUS CUR-DLS, not an L440GX+ (as
is used in most other VALinux boxes).

Comment 7 Alan Cox 2003-06-08 01:31:45 UTC

Red Hat 9 contains some fixes that should resolve this using info we finally got
from Intel.