Red Hat Bugzilla – Bug 64699
Crash on heavy Adaptec aic7xxx load: invalid SCB
Last modified: 2007-04-18 12:42:30 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.18-3 i686)
Description of problem:
Current kernel crashes on heavy Adaptec aic7xxx load; last message:
HOST_MSG_LOOP with invalid SCB 0
Version-Release number of selected component (if applicable):
Steps to Reproduce:
CPU: AMD 1700+ (no overclocking)
MB: ASUS A7V333 (BIOS: mostly default)
SCSI: Adaptec 2940 (without U)
Disk: Seagate ST 32430 N
The card was already moved to another PCI slot; that didn't help.
Actual Results: On heavy load on the SCSI bus using disk, kernel crashes in
Expected Results: No crash
A Google search for the message shown above already turns up results. Is the
current aic7xxx module really bug-free?
I'm seeing the exact same thing on a 2-drive software RAID 1 while trying to run
cerberus. I just get pages and pages of SCB errors. This is on a VA Linux
Fullon 2251 (dual PIII 1 GHz, 1 GB RAM, two 36 GB SCSI drives).
A second machine here also crashes under heavy SCSI load.
It's a PII-350, ASUS P2B-LS (with 7890) and 1x 18 GB U2W, 1x 4 GB U2W and 1x 4 GB
U. Kernel used: 2.4.18-3 from RHL 7.3
Load was generated by running VMware (image on a VFAT partition) and a VCD image
System crashes (only echo-reply was sent back by kernel, nothing else working
Since switching to "aic7xxx_old", all such problems are gone.
RH7.2 - Replaced older Adaptec board with 2 AVA-2906 (now have two SCSI boxes).
Both the original and new boards hang when copying large files. Kernel messages
(/var/log/messages) show attempts to reset the bus.
Since switching from the aic7xxx driver to aic7xxx_old solved your problem I
have Cc:ed Justin Gibbs (author of the newer aic7xxx driver) on this report as
he would be the one to fix your problem with the new driver. I suspect the same
is true for firstname.lastname@example.org's problem.
Your problem sounds unrelated and is most likely a SCSI termination issue.
Please check your SCSI cabling and termination. If there is still a problem
after confirming that the SCSI cabling and termination are correct, then please
open a different bug report. Please do not attach any more comments to this bug
report since the problem you have and the problem in this bug report are not the
same.
I believe the problem is related because its symptoms are similar to those
described elsewhere. The SCSI stacks have been in use without any change to
their configuration for 1.5+ years. A Tekram board worked fine but wasn't easily
recognized with the 7.2 u/g. I installed an AHA2960? (7850-based) card, and when
transferring large files to or from a SCSI drive there were timeout problems.
Note a cron job which ran backups overnight resulted in an interrupt error which
hung the machine. During my testing I forced a reboot (reset button required)
before letting it wander off and lock up.
I purchased two new AVA2906 cards and cables and installed them. Same problem.
Tonight I reinstalled the Tekram card, built a driver and installed it for stack
1. No problem copying large files. The second stack, still Adaptec-based, when
I tried a large file with it the same reset cycle started. Lastly, before
putting the Tekram card in I compiled and tried aic7xxx_old and it did not fix
I hope this additional information will be of help. I can provide a subset of
/var/log/messages if it would prove beneficial.
Please provide full console output from boot through reproduction of the
problem. The aic7xxx driver is usually quite verbose when it encounters
problems, so you might need to use a serial console in order to capture
all of the messages.
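Once the console output has been captured to a file, a long log can be hard to read through. As a rough aid, a small Python sketch like the one below could pull out the lines that look relevant. The pattern list and the function name `filter_scsi_lines` are assumptions made for illustration; they are not part of the aic7xxx driver or of any Red Hat tool, and the exact wording of driver messages varies between driver versions.

```python
import re

# Hypothetical patterns, based only on the symptoms described in this report
# ("invalid SCB" messages and SCSI bus resets). Adjust them to match what the
# driver actually prints on the machine in question.
PATTERNS = [
    re.compile(r"invalid SCB", re.IGNORECASE),
    re.compile(r"aic7xxx", re.IGNORECASE),
    re.compile(r"scsi.*reset", re.IGNORECASE),
]


def filter_scsi_lines(lines):
    """Return the lines matching any pattern, in their original order."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in PATTERNS)
    ]
```

It could be applied to a saved capture with something like `filter_scsi_lines(open("console.log", errors="replace"))`, where `console.log` is a placeholder file name. The full, unfiltered output should still be attached to the report, since the filter may miss messages that matter.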
Your problem is definitely different from the original post. First, the
original post is about RH 7.3 (which uses Justin's aic7xxx driver by default)
while your problem is with RH 7.2 (which uses my aic7xxx driver by default, and
those two drivers are vastly different). Further, the original poster has
already confirmed that in his case switching from the aic7xxx to aic7xxx_old
driver solved his problem. In your case you say that both aic7xxx drivers fail.
Finally, the original post specifically talks about a failure with messages
involving invalid SCBs while your report talks about bus resets locking the
system up. These are vastly different symptoms and will require different fixes
to solve. Having them both in the same bug report makes it almost impossible to
track when each bug gets fixed. That's why I asked you to open a new bug
report, not to append to this one.
Closing due to inactivity and pending EOL.