Bug 64699

Summary: Crash on heavy Adaptec aic7xxx load: invalid SCB
Product: [Retired] Red Hat Linux
Reporter: Peter Bieringer <pb>
Component: kernel
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 7.3
CC: gibbs, wgbrooks
Target Milestone: ---   
Target Release: ---   
Hardware: athlon   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2003-12-17 03:16:48 UTC Type: ---

Description Peter Bieringer 2002-05-09 19:58:09 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.18-3 i686)

Description of problem:
The current kernel crashes under heavy Adaptec aic7xxx load. The last message is:
HOST_MSG_LOOP with invalid SCB 0


Version-Release number of selected component (if applicable):


How reproducible:
Sometimes

Steps to Reproduce:
Used hardware:
CPU: AMD 1700+ (no overclocking)
MB: ASUS A7V333 (BIOS: mostly default)
SCSI: Adaptec 2940 (without U)
Disk: Seagate ST 32430 N

The card was already moved to another PCI slot; that did not help.

Actual Results:  Under heavy disk load on the SCSI bus, the kernel crashes in
interrupt context.

Expected Results:  No crash

Additional info:

A Google search for the message shown above already returns results. Is the
current aic7xxx module really bug-free?

Comment 1 Uriah Welcome 2002-05-13 17:16:56 UTC
I'm seeing exactly the same thing on a two-drive software RAID 1 while trying
to run cerberus.  I just get pages and pages of SCB errors.  This is on a VA Linux
Fullon 2251 (dual PIII 1 GHz, 1 GB RAM, two 36 GB SCSI drives).

Comment 2 Peter Bieringer 2002-05-13 19:49:20 UTC
A second machine here also crashes under heavy SCSI load.
It's a PII-350, ASUS P2B-LS (with 7890) and 1x 18 GB U2W, 1x4 GB U2W and 1x4 GB
U. Used kernel: 2.4.18-3 from RHL 7.3
Load was generated by running VMware (image on a VFAT partition) and a VCD image
generation.
The system crashes (the kernel still sends back echo-reply, but nothing else
works anymore).

Since switching to "aic7xxx_old", all such problems are gone.

Comment 3 Need Real Name 2002-05-16 16:00:57 UTC
RH7.2 - Replaced older Adaptec board with 2 AVA-2906 (now have two SCSI boxes).
Both the original and new boards hang when copying large files. Kernel messages
(/var/log/messages) show attempts to reset the bus.

Comment 4 Doug Ledford 2002-05-16 21:10:21 UTC
To: pb

Since switching from the aic7xxx driver to aic7xxx_old solved your problem I
have Cc:ed Justin Gibbs (author of the newer aic7xxx driver) on this report as
he would be the one to fix your problem with the new driver.  I suspect the same
is true for precision's problem.

To: wgbrooks

Your problem sounds unrelated and is most likely a SCSI termination issue.
Please check your SCSI cabling and termination.  If there is still a problem
after confirming that the SCSI cabling and termination are correct then please
open a different bug report.  Please do not attach any more comments to this bug
report since the problem you have and the problem in this bug report are not the
same.

Comment 5 Need Real Name 2002-05-18 01:08:58 UTC
I believe the problem is related because its symptoms are similar to those
described elsewhere. The SCSI stacks have been in use without any change to
their configuration for 1.5+ years. A Tekram board worked fine but wasn't easily
recognized with the 7.2 upgrade. I installed an AHA2960? (7850-based) card, and
when transferring large files to or from a SCSI drive there were timeout problems.
Note that a cron job which ran backups overnight resulted in an interrupt error
which hung the machine. During my testing I forced a reboot (reset button
required) before letting it wander off and lock up.
I purchased two new AVA2906 cards and cables and installed them. Same problem.
Tonight I reinstalled the Tekram card, built a driver and installed it for stack
1. No problem copying large files. On the second stack, still Adaptec-based, the
same reset cycle started when I tried a large file. Lastly, before putting the
Tekram card in I compiled and tried aic7xxx_old, and it did not fix the problem.
I hope this additional information will be of help. I can provide a subset of
/var/log/messages if it would prove beneficial.

Comment 6 Justin T. Gibbs 2002-05-20 02:15:36 UTC
Please provide full console output from boot through reproduction of the
problem.  The aic7xxx driver is usually quite verbose when it encounters
problems, so you might need to use a serial console in order to capture
all of the messages.

Comment 7 Doug Ledford 2002-05-20 20:59:55 UTC
wgbrooks:

Your problem is definitely different from the original post.  First, the
original post is about RH 7.3 (which uses Justin's aic7xxx driver by default)
while your problem is with RH 7.2 (which uses my aic7xxx driver by default, and
those two drivers are vastly different).  Further, the original poster has
already confirmed that in his case switching from the aic7xxx to aic7xxx_old
driver solved his problem.  In your case you say that both aic7xxx drivers fail.
Finally, the original post specifically talks about a failure with messages
involving invalid SCBs while your report talks about bus resets locking the
system up.  These are vastly different symptoms and will require different fixes
to solve.  Having them both in the same bug report makes it almost impossible to
track when each bug gets fixed.  That's why I asked you to open a new bug
report, not to append to this one.

Comment 8 Dave Jones 2003-12-17 03:16:48 UTC
Closing due to inactivity and pending EOL.
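
Comment 7 separates two symptom classes: failures that log invalid SCB messages, and hangs that log bus reset attempts. A minimal sketch of how a captured log could be sorted into those two classes before filing is shown below. It is an illustration only, not part of either driver. The function and pattern names are made up for this example. Only the "HOST_MSG_LOOP with invalid SCB" text comes from this report; the reset pattern is an assumed, deliberately loose match, since no actual reset message is quoted here.

```python
import re

# Patterns for the two symptom classes discussed in Comment 7.
# Only the invalid-SCB text is taken from this report; the reset
# pattern is an assumption and may need adjusting to real log lines.
SYMPTOMS = {
    "invalid_scb": re.compile(r"HOST_MSG_LOOP with invalid SCB"),
    "bus_reset": re.compile(r"\breset", re.IGNORECASE),
}


def classify_log(lines):
    """Count how many lines match each symptom pattern."""
    counts = {name: 0 for name in SYMPTOMS}
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def dominant_symptom(lines):
    """Return the symptom with the most matching lines, or None if none match."""
    counts = classify_log(lines)
    name = max(counts, key=counts.get)
    return name if counts[name] > 0 else None
```

To try it on a real log, pass an open file such as /var/log/messages (or a serial console capture) to classify_log. A log dominated by "invalid_scb" would point at this report, while one dominated by "bus_reset" would suggest the separate report requested in Comment 4.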