Bug 50001
Summary: | (SCSI AIC7XXX) Machine dies with SCSI AIC7xxx problems | ||
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | gsr.bugs |
Component: | kernel | Assignee: | Doug Ledford <dledford> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brock Organ <borgan> |
Severity: | high | ||
Priority: | high | ||
Version: | 7.1 | CC: | chedong, djuran, gibbs |
Hardware: | i586 | ||
OS: | Linux | ||
Doc Type: | Bug Fix | ||
Last Closed: | 2004-09-30 15:39:05 UTC | ||
Description
gsr.bugs
2001-07-25 21:18:43 UTC
I tried one more time and changed to VT4 ASAP, and I also saw the following lines:

<4> (scsi0) BRKADDRINT error (0x20):
<4> Scratch Ram/SCB Array Ram Parity Error
<4> (scsi0) SEQADDR=0x69

It repeats, changing the 0x69 to other two-digit hex numbers like 0x68 or 0x6a, and after some seconds of that the resetting messages appear.

This appears to be a general problem with the 2.4 kernel and your machine that is merely making the aic7xxx driver and card look suspect. Try going into the BIOS on your motherboard and changing the settings related to PCI delayed transactions, PCI Write Posting, and any PCI caching. Hopefully, one of those settings will get your machine working with the 2.4 kernel. Outside of that, I have no way to reproduce your problem here.

The BIOS only seems to have Passive Release (default Enabled) and Delayed Transaction (default Disabled). I tried with both Enabled and both Disabled, same results, except that now the partitions need manual checking, so I delayed the fourth test until tomorrow (Disabled & Enabled). It also has PnP OS, which is set to its default value of No. There is nothing about Write Posting or Caching (except the CPU cache). Do I change the PnP OS option too? The defaults worked pretty fine with kernels 2.2.x.

When you say general, do you mean general as in "many people", or something that is inside the kernel and is hard to trace? If you point me to the instructions, I could try to get all the info since bootup via serial console with another computer, or dump it to floppy (if that does not damage it due to errors), so you can get a better idea of what the kernel is doing. Could it have been fixed in a newer kernel? Any kernel params I should use?

OK, tried the other possibility, Disabled & Enabled, and it still has the problems with the SCB and the resetting. I am getting a nice collection of files in lost+found.
:[ All seem to be Berkeley DB files, and as the errors start after detecting where the previous install is, in the Finding packages step (probably doing the backup copy of the DB, cos the RPM DB still works but I am getting lots of anaconda-rebuilddb dirs in /var/lib/), I guess this error is of the "aic7xxx has problems under load" kind, as noted in the gotchas page. Other things to test?

Try the new aic7xxx driver instead of the old (default) one. To do that, boot with the command line "linux noprobe" at the boot disk's boot: prompt. Then when it asks for drivers, select SCSI driver, then select the new aic7xxx driver (not the standard aic7xxx driver) and see if it makes a difference.

OK, booted with linux text expert noprobe, and used the module marked as new experimental, aic7xxx_mod. Created a device for the disk and tried a dd if=/dev/sda of=/dev/null, which made the LED go red without errors. Then deleted the device and kept on with the install. Now it dies with something like the following text, when trying to detect the installed packages so it can perform the update (= same place, different text):

<4> scsi0: brkaddr, Scratch or SCB Memory Parity Error at seqaddr = 0x196
<4> scsi0: Dumping Card State at seqaddr 0x196
<4> scsiseq = 0x12, sblkctl = 0x2, sstatq 0x0
<4> SCB count = 28
<4> kernel NEXTQSCB = 3
<4> Card NEXTQSCB = 22
<4> QINFIFO entries: 22 23 16 17 18 19 12 13
<4> Waiting Queue entries:
<4> Disconnected Queue entries: 14:20 15:27
<4> QOUTFIFO entries
<4> Sequencer Free SCB List: 7 3 6 5 12 8 9 11 2 0 1 10 4
<4> Pending List: 13 12 19 18 17 16 23 22 21 20 27
<4> Kernel Free SCB List: 0 6 11 2 10 14 8 15 7 1 9 5 4 26 25 24
<4> DevQ(0:0:0): 0 waiting
<4> DevQ(0:1:0): 0 waiting

Some seconds after that, I get two of those CPU register dumps done by the kernel, talking about being unable to handle a kernel NULL pointer dereference at virtual address 00000000 (I was tired of copying numbers).
I had to power off and then on, cos with a hot reboot the disk was not detected and the SCSI BIOS was not installed. On the boot to RH70, I had to do a manual fsck, and a dir was connected to lost+found.

I don't have the hardware to duplicate your problem (and I'm unable to cause it on any of my systems here). However, this obviously isn't just a problem with the old aic7xxx driver; it also affects the new aic7xxx driver. So, I've Cc:ed Justin Gibbs on this report in case he knows what the problem is or in case he has the necessary hardware to reproduce the problem.

I'm trying to install Red Hat 7.1 on a PC with an Adaptec 7896 SCSI controller and an Intel 440 motherboard. I've read through bug report #29555. Thanks to Doug Ledford's solution, I was able to install the package successfully onto that PC. (FYI, here's what I did: 1. install with the "linux apic" option, 2. select the "kernel-smp" package during installation, and 3. boot with the "default=linux-smp" option.)

Then I tried to recompile the kernel (you know, to make it smaller and faster), and got the following error during boot:

....
blk: queue c7dc7e18 I/O limit 4095Mb
(scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15
SCSI device sda: 17783240 512-byte hdwr sectors 9105MB
Partition check
/dev/scsi/host0/bus0/target0/lun0: p1 p2 <p5 p6>
...
mkrootdev: mknod failed 17
mount error 16 mounting ext2
pivot_root: pivot_root (sysroot, /sysroot/initrd) failed: 2
...
Kernel panic ...

I compared it with the boot messages of a healthy kernel (the default one downloaded by ftp), and noticed that the difference is in the "Partition check", where the healthy one gives the following message:

...
Partition check
sda: sda1 sda2 < sda5 sda6 >
....

It seems that the SCSI driver can't map the file system correctly onto the hard disk. I've tried both the aic7xxx.o and aic7xxx_old.o drivers, and both give the same result. Is it an aic bug or a file system bug? Or maybe I forgot to include some option when compiling the kernel?
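The bare numbers in the boot messages above (17, 16 and 2) are easier to read as names. A minimal sketch, assuming they are ordinary Linux errno values as printed by the initrd tools; the `boot_failures` table and the `describe` helper are hypothetical names made up for this illustration:

```python
import errno
import os

# Codes copied from the boot messages quoted in this report.
# Assumption: they are standard Linux errno values.
boot_failures = {
    "mkrootdev: mknod failed": 17,
    "mount error": 16,
    "pivot_root failed": 2,
}


def describe(code):
    """Return (symbolic name, message text) for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


for message, code in boot_failures.items():
    name, text = describe(code)
    print(f"{message}: {code} = {name} ({text})")
```

If that assumption holds, the mknod failure is EEXIST (the node is already there), the mount failure is EBUSY, and the pivot_root failure is ENOENT, which would be consistent with the device names not being what the initrd script expected.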
You enabled devfs support in your kernel by default, which we don't do, and which changes how drives are mapped.

I tried the following versions of Red Hat on my machine (PIII 866 x 2 with SCSI AHA-2940):

6.2 hang up at disk druid
7.1 hang up at probing mouse
7.2 hang up at disk druid
7.3 hang up at disk druid
8.0 hang up at loading aic7xxx driver (sometimes it gets past this)

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/
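As a footnote to the devfs explanation given earlier in this thread, the two "Partition check" outputs differ only in naming. A rough sketch of the correspondence, assuming the devfs layout /dev/scsi/hostH/busB/targetT/lunL with disc and partN entries, and assuming the disk in question is the first SCSI disk (sda); both function names are hypothetical and exist only for this illustration:

```python
def devfs_scsi_path(host, bus, target, lun, part=None):
    """Build a devfs-style SCSI disk path (assumed 2.4 devfs layout)."""
    base = f"/dev/scsi/host{host}/bus{bus}/target{target}/lun{lun}"
    leaf = "disc" if part is None else f"part{part}"
    return f"{base}/{leaf}"


def traditional_scsi_name(disk_index, part=None):
    """Build the traditional name, e.g. /dev/sda5 for disk 0, partition 5."""
    if not 0 <= disk_index < 26:
        raise ValueError("this sketch only covers sda through sdz")
    name = "/dev/sd" + chr(ord("a") + disk_index)
    return name if part is None else f"{name}{part}"


# Partitions 1, 2, 5 and 6 are the ones listed in both boot logs.
for part in (1, 2, 5, 6):
    print(traditional_scsi_name(0, part), "->", devfs_scsi_path(0, 0, 0, 0, part))
```

Under those assumptions the same four partitions are present in both cases; only the device names change, so anything that looks for /dev/sda1 by name on a devfs kernel may not find it.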