From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; Linux 2.4.18-3 i686)
Description of problem:
The current kernel crashes under heavy Adaptec aic7xxx load. The last message is: HOST_MSG_LOOP with invalid SCB 0
Version-Release number of selected component (if applicable):
How reproducible:
Sometimes
Steps to Reproduce:
Used hardware:
CPU: AMD 1700+ (no overclocking)
MB: ASUS A7V333 (BIOS: mostly default)
SCSI: Adaptec 2940 (without U)
Disk: Seagate ST 32430 N
The card has already been moved to another PCI slot; that did not help.
Actual Results:
Under heavy load on the SCSI bus from the disk, the kernel crashes in an interrupt.
Expected Results:
No crash
Additional info:
A Google search for this message already turns up results. Is the current aic7xxx module really bug-free?
I'm seeing exactly the same thing on a two-drive software RAID 1 while trying to run Cerberus. I just get pages and pages of SCB errors. This is on a VA Linux FullOn 2251 (dual PIII 1 GHz, 1 GB RAM, two 36 GB SCSI drives).
A second machine here crashes under heavy SCSI load. It's a PII-350 on an ASUS P2B-LS (with 7890) with one 18 GB U2W, one 4 GB U2W and one 4 GB U drive. Kernel used: 2.4.18-3 from RHL 7.3. The load was generated by running VMware (image on a VFAT partition) and a VCD image generation at the same time. The system hangs (the kernel still sends echo-replies to pings, but nothing else works anymore). Since switching to "aic7xxx_old", all such problems are gone.
RH 7.2: Replaced the older Adaptec board with two AVA-2906 boards (I now have two SCSI boxes). Both the original and the new boards hang when copying large files. Kernel messages (/var/log/messages) show attempts to reset the bus.
To: pb
Since switching from the aic7xxx driver to aic7xxx_old solved your problem, I have CC'ed Justin Gibbs (author of the newer aic7xxx driver) on this report, as he would be the one to fix your problem with the new driver. I suspect the same is true for precision's problem.
To: wgbrooks
Your problem sounds unrelated and is most likely a SCSI termination issue. Please check your SCSI cabling and termination. If there is still a problem after confirming that the SCSI cabling and termination are correct, then please open a different bug report. Please do not attach any more comments to this bug report, since the problem you have and the problem in this bug report are not the same.
I believe the problem is related because its symptoms are similar to those described elsewhere. The SCSI stacks have been in use without any configuration change for 1.5+ years. A Tekram board worked fine but wasn't easily recognized after the 7.2 upgrade. I installed an AHA2960? (7850-based) card, and when transferring large files to or from a SCSI drive there were timeout problems. Note that a cron job which ran backups overnight resulted in an interrupt error which hung the machine. During my testing I forced a reboot (reset button required) before letting it wander off and lock up. I purchased two new AVA2906 cards and cables and installed them. Same problem. Tonight I reinstalled the Tekram card, built a driver and installed it for stack 1. No problem copying large files. On the second stack, still Adaptec-based, the same reset cycle started when I tried copying a large file. Lastly, before putting the Tekram card in, I compiled and tried aic7xxx_old, and it did not fix the problem. I hope this additional information will be of help. I can provide a subset of /var/log/messages if it would be useful.
Please provide full console output from boot through reproduction of the problem. The aic7xxx driver is usually quite verbose when it encounters problems, so you might need to use a serial console in order to capture all of the messages.
wgbrooks: Your problem is definitely different from the original post. First, the original post is about RH 7.3 (which uses Justin's aic7xxx driver by default), while your problem is with RH 7.2 (which uses my aic7xxx driver by default, and those two drivers are vastly different). Further, the original poster has already confirmed that in his case switching from the aic7xxx to the aic7xxx_old driver solved his problem. In your case you say that both aic7xxx drivers fail. Finally, the original post specifically talks about a failure with messages involving invalid SCBs, while your report talks about bus resets locking the system up. These are vastly different symptoms and will require different fixes. Having them both in the same bug report makes it almost impossible to track when each bug gets fixed. That's why I asked you to open a new bug report, not to append to this one.
Closing due to inactivity and pending EOL.