Bug 64699
Summary: | Crash on heavy Adaptec aic7xxx load: invalid SCB | | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Peter Bieringer <pb> |
Component: | kernel | Assignee: | Doug Ledford <dledford> |
Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 7.3 | CC: | gibbs, wgbrooks |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | athlon | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2003-12-17 03:16:48 UTC | Type: | --- |
Description
Peter Bieringer
2002-05-09 19:58:09 UTC
I'm seeing the exact same thing on a 2-drive software RAID 1 while trying to run cerberus. I just get pages and pages of SCB errors. This is on a VA Linux Fullon 2251 (dual PIII 1 GHz, 1 GB RAM, two 36 GB SCSI drives).

A second machine here crashes under heavy SCSI load. It's a PII-350, ASUS P2B-LS (with 7890) and 1x 18 GB U2W, 1x 4 GB U2W and 1x 4 GB U. Kernel used: 2.4.18-3 from RHL 7.3. Load was generated by running VMware (image on a VFAT partition) and a VCD image generation. The system crashes (the kernel still sends echo-replies, but nothing else works anymore). Since switching to "aic7xxx_old", all such problems are gone.

RH7.2 - Replaced older Adaptec board with 2 AVA-2906 (now have two SCSI boxes). Both the original and new boards hang when copying large files. Kernel messages (/var/log/messages) show attempts to reset the bus.

To: pb Since switching from the aic7xxx driver to aic7xxx_old solved your problem, I have Cc:ed Justin Gibbs (author of the newer aic7xxx driver) on this report, as he would be the one to fix your problem with the new driver. I suspect the same is true for precision's problem.

To: wgbrooks Your problem sounds unrelated and is most likely a SCSI termination issue. Please check your SCSI cabling and termination. If there is still a problem after confirming that the SCSI cabling and termination are correct, then please open a different bug report. Please do not attach any more comments to this bug report, since the problem you have and the problem in this bug report are not the same.

I believe the problem is related because its symptoms are similar to those described elsewhere. The SCSI stacks have been in use without any configuration change for 1.5+ years. A Tekram board worked fine but wasn't easily recognized with the 7.2 u/g. I installed an AHA2960? (7850-based) card, and when transferring large files to or from a SCSI drive there were timeout problems. Note that a cron job which ran backups overnight resulted in an interrupt error which hung the machine. During my testing I forced a reboot (reset button required) before letting it wander off and lock up. I purchased two new AVA2906 cards and cables and installed them. Same problem. Tonight I reinstalled the Tekram card, built a driver and installed it for stack 1. No problem copying large files. The second stack is still Adaptec-based; when I tried a large file with it, the same reset cycle started. Lastly, before putting the Tekram card in I compiled and tried aic7xxx_old and it did not fix the problem. I hope this additional information will be of help. I can provide a subset of /var/log/messages if it would prove beneficial.

Please provide full console output from boot through reproduction of the problem. The aic7xxx driver is usually quite verbose when it encounters problems, so you might need to use a serial console in order to capture all of the messages.

wgbrooks: Your problem is definitely different from the one in the original post. First, the original post is about RH 7.3 (which uses Justin's aic7xxx driver by default) while your problem is with RH 7.2 (which uses my aic7xxx driver by default, and those two drivers are vastly different). Further, the original poster has already confirmed that in his case switching from the aic7xxx to the aic7xxx_old driver solved his problem. In your case you say that both aic7xxx drivers fail. Finally, the original post specifically talks about a failure with messages involving invalid SCBs, while your report talks about bus resets locking the system up. These are vastly different symptoms and will require different fixes. Having them both in the same bug report makes it almost impossible to track when each bug gets fixed. That's why I asked you to open a new bug report, not to append to this one.

Closing due to inactivity and pending EOL.
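
For anyone wanting to attach only the relevant subset of /var/log/messages, as offered in the thread, a rough sketch of a filter follows. The keyword list and the function names are assumptions made for illustration; they are not taken from the driver or from this report, and the driver's real message wording may differ, so adjust the keywords to whatever your console actually shows.

```python
# Hypothetical keyword list: substrings that a SCSI/aic7xxx-related line
# might contain. This is a guess, not the driver's documented output.
KEYWORDS = ("aic7xxx", "scsi", "scb")


def filter_driver_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword, ignoring case."""
    lowered = [k.lower() for k in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]


def filter_log_file(path="/var/log/messages", keywords=KEYWORDS):
    """Read a log file and return only the matching lines.

    The default path is the one mentioned in the thread; undecodable
    bytes are replaced rather than raising an error.
    """
    with open(path, errors="replace") as handle:
        return filter_driver_lines(handle, keywords)
```

Calling `filter_log_file()` and writing the result to a file gives a smaller attachment, though full console output from a serial console is still what was requested for diagnosing the invalid SCB case.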