|Summary:||aic7xxx module cause system to hang|
|Product:||[Retired] Red Hat Linux||Reporter:||Helge Skrivervik <helge>|
|Component:||kernel||Assignee:||Doug Ledford <dledford>|
|Status:||CLOSED NOTABUG||QA Contact:||Brock Organ <borgan>|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2001-08-31 15:00:57 UTC||Type:||---|
Description Helge Skrivervik 2001-08-28 18:16:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.5 i586)
Description of problem:
The system hangs when loading the aic7xxx module. All 2.4.x kernels show the same problem, with minor differences: we get to fsck, which completes checking the IDE disks, and then the system hangs when attempting to load the aic driver.
Now, here is what's really interesting: the 2.4.2 kernel does not halt the system completely; the modprobe simply never returns. The other kernels cause a complete halt UNLESS THE 2.4.2 KERNEL (driver) HAS just had the opportunity to initialize the adapter. So we do have a workable system by following this procedure:
1) boot 2.4.2
2) modprobe the aic7xxx driver, send it to the background
3) sleep 5 seconds (the system is still alive but jerky - it stops for a few seconds now and then)
4) reboot into the newer 2.4.x kernel (actually 2.4.5 or 2.4.7)
5) the aic7xxx module loads fine and works fine
A real pain to go through (not to mention to find out), but it works.
The hardware is an Epox motherboard (EP-MVP3G2, Via MVP3 chipset), 500 MHz K6 CPU, Adaptec 29160 Ultra160 adapter; what is connected to the adapter seems to be irrelevant.
Version-Release number of selected component (if applicable):
How reproducible: Always
Steps to Reproduce:
1. simply boot the system with any 2.4.x kernel
2.
3.
Actual Results: System hangs
Additional info:
Comment 1 Doug Ledford 2001-08-28 19:29:05 UTC
Sounds like you need to update the BIOS on your motherboard. We had a different person with a similar problem. The new aic7xxx driver used in the 2.4.5 and 2.4.7 kernels would hard lock on the AMD CPU/Via chipset motherboard, but the old aic7xxx driver (which is what you are getting when you boot the 2.4.2 kernel) would work OK (although in your case it is getting SCSI timeouts or some such, so it's not OK, but it still isn't hard locking like the new aic7xxx driver). A BIOS update on that machine solved all the problems. If that doesn't work in your case, then re-open the bug.
Comment 2 Helge Skrivervik 2001-08-29 08:20:30 UTC
Sorry, I should have mentioned that: I have updated the BIOS on the mobo and the interface. No change in the behaviour.
Comment 3 Doug Ledford 2001-08-29 14:32:55 UTC
OK, in order to try and diagnose this (and I'm still not totally convinced it isn't hardware related, but we'll try and see what we can find), I need you to boot the 2.4.2 kernel (the old aic7xxx driver) and, as the aic7xxx driver is loaded, pass the option aic7xxx=verbose:0xff39 to the driver, then send me the output. Capturing it with a serial console works best, or you can store it in a log file. I personally redirect all of my kernel log messages to a separate log file from anything else in syslog.conf, and since your root appears to be on IDE, you should still be able to record those log messages before rebooting without having to write anything down or mess with serial consoles; then you can attach the log messages to this bug report. Since your aic7xxx module is getting loaded at boot time but after the fs check, it isn't in the initrd. That means you should be able to just put the line:
options aic7xxx aic7xxx=verbose:0xff39
into your /etc/modules.conf file and have the options take effect at the next boot automatically.
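The modules.conf change described above could be applied with a small helper that adds the options line only when it is not already there. This is a minimal sketch, not part of the original thread: the function name `ensure_options_line` and the sample file contents are assumptions made for illustration, and the helper works on text so it can be tried without touching a real /etc/modules.conf.

```python
def ensure_options_line(conf_text, options_line):
    """Return conf_text with options_line appended, unless it is already present."""
    existing = [line.strip() for line in conf_text.splitlines()]
    if options_line in existing:
        return conf_text  # nothing to do, keeps the edit idempotent
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"  # make sure the new line starts on its own line
    return conf_text + options_line + "\n"


# The options line quoted in the comment above.
OPTIONS = "options aic7xxx aic7xxx=verbose:0xff39"

# Hypothetical existing file contents, for illustration only.
before = "alias scsi_hostadapter aic7xxx\n"
after = ensure_options_line(before, OPTIONS)
print(after, end="")
```

Running the helper a second time on its own output leaves the text unchanged, so it is safe to reapply before each test boot.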
Comment 4 Helge Skrivervik 2001-08-30 08:09:00 UTC
OK - here we go, first the output from the 2.4.2 kernel:
SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 9 for device 00:0c.0
PCI: The same IRQ used for device 00:0a.0
aic7xxx: <Adaptec AHA-294X Ultra SCSI host adapter> at PCI 12/0
aic7xxx: Initial PCI_COMMAND value was 0x7
aic7xxx: Initial DEVCONFIG value was 0x580
aic7xxx: Loading serial EEPROM...done
PCI: Found IRQ 10 for device 00:08.0
aic7xxx: <Adaptec AIC-7892 Ultra 160/m SCSI host adapter> at PCI 8/0
aic7xxx: Initial PCI_COMMAND value was 0x7
aic7xxx: Initial DEVCONFIG value was 0x482
aic7xxx: Loading serial EEPROM...done
(scsi0) <Adaptec AIC-7892 Ultra 160/m SCSI host adapter> found at PCI 0/8/0
(scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi0) BIOS disabled, IO Port 0xd800, IRQ 10
(scsi0) IO Memory at 0xeb000000, MMAP Memory at 0xc4852000
(scsi0) LVD/Primary Low byte termination Enabled
(scsi0) LVD/Primary High byte termination Enabled
(scsi0) Secondary Low byte termination Enabled
(scsi0) Secondary High byte termination Enabled
(scsi0) Downloading sequencer code... 396 instructions downloaded
(scsi0) Resetting channel
(scsi1) <Adaptec AHA-294X Ultra SCSI host adapter> found at PCI 0/12/0
(scsi1) Narrow Channel, SCSI ID=7, 16/255 SCBs
(scsi1) BIOS disabled, IO Port 0xe400, IRQ 9
(scsi1) IO Memory at 0xeb001000, MMAP Memory at 0xc4850000
(scsi1) Cables present (Int-50 NO, Ext-50 NO)
(scsi1) EEPROM is present.
(scsi1) SE Low byte termination Enabled
(scsi1) Downloading sequencer code... 436 instructions downloaded
(scsi1) Resetting channel
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
<Adaptec AIC-7892 Ultra 160/m SCSI host adapter>
scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
<Adaptec AHA-294X Ultra SCSI host adapter>
... this is where it stops, cat /proc/modules says:
aic7xxx 107568 (initializing)
scsi_mod 55728 0 [aic7xxx]
... and it stays like this forever. The system is responding, but hangs for short periods intermittently.
By the way, the presence or absence of the old Adaptec controller makes no difference whatsoever.
Now, here's what the 2.4.5 kernel says, after having the interface initialized like above:
SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 10 for device 00:08.0
ahc_pci:0:8:0: Reading SEEPROM...done.
ahc_pci:0:8:0: BIOS eeprom is present
ahc_pci:0:8:0: Secondary High byte termination Enabled
ahc_pci:0:8:0: Secondary Low byte termination Enabled
ahc_pci:0:8:0: Primary Low Byte termination Enabled
ahc_pci:0:8:0: Primary High Byte termination Enabled
ahc_pci:0:8:0: Downloading Sequencer Program... 419 instructions downloaded
PCI: Found IRQ 9 for device 00:0c.0
PCI: The same IRQ used for device 00:0a.0
ahc_pci:0:12:0: Reading SEEPROM...done.
ahc_pci:0:12:0: internal 50 cable not present
ahc_pci:0:12:0: external cable not present
ahc_pci:0:12:0: BIOS eeprom not present
ahc_pci:0:12:0: Low byte termination Enabled
ahc_pci:0:12:0: Downloading Sequencer Program... 426 instructions downloaded
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13
<Adaptec 29160 Ultra160 SCSI adapter>
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs
scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13
<Adaptec 2940 Ultra SCSI adapter>
aic7880: Single Channel A, SCSI Id=7, 16/255 SCBs
(scsi0:A:0:0): Sending WDTR 1
(scsi0:A:0:0): Received WDTR 1 filtered to 1
(scsi0:A:0): 6.600MB/s transfers (16bit)
scsi0: target 0 using 16bit transfers
(scsi0:A:0:0): Sending SDTR period c, offset 7f
(scsi0:A:0:0): Received SDTR period c, offset 3f
Filtered to period c, offset 3f
(scsi0:A:0): 40.000MB/s transfers (20.000MHz, offset 63, 16bit)
scsi0: target 0 synchronous at 20.0MHz, offset = 0x3f
Vendor: IBM Model: DDYS-T09170N Rev: S96H
Type: Direct-Access ANSI SCSI revision: 03
(scsi0:A:0): 40.000MB/s transfers (20.000MHz, offset 63, 16bit)
scsi0:0:0:0: Tagged Queuing enabled. Depth 253
Vendor: SONY Model: SDT-9000 Rev: 0400
Type: Sequential-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
(scsi0:A:0:0): Sending PPR bus_width 1, period c, offset 7f, ppr_options 0
(scsi0:A:0:0): Received PPR width 1, period c, offset 3f,options 0
Filtered to width 1, period c, offset 3f, options 0
(scsi0:A:0:0): Sending PPR bus_width 1, period c, offset 3f, ppr_options 0
(scsi0:A:0:0): Received PPR width 1, period c, offset 3f,options 0
Filtered to width 1, period c, offset 3f, options 0
SCSI device sda: 17916240 512-byte hdwr sectors (9173 MB)
sda: sda1 sda2
Detected scsi tape st0 at scsi1, channel 0, id 5, lun 0
st: bufsize 32768, wrt 30720, max init. buffers 4, s/g segs 16.
parport0: PC-style at 0x378 [PCSPP(,...)]
parport0: irq 7 detected
PCI: Found IRQ 9 for device 00:0a.0
PCI: The same IRQ used for device 00:0c.0
3c59x.c:LK1.1.13 27 Jan 2001 Donald Becker and others. http://www.scyld.com/network/vortex.html
See Documentation/networking/vortex.txt
eth0: 3Com PCI 3c905 Boomerang 100baseTx at 0xe000, 00:60:08:8d:e9:f9, IRQ 9
product code 4b4b rev 00.0 date 08-27-97
8K word-wide RAM 3:5 Rx:Tx split, autoselect/MII interface.
MII transceiver found at address 24, status 786f.
Enabling bus-master transmits and whole-frame receives.
eth0: scatter/gather enabled. h/w checksums disabled
eth0: first available media type: MII
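The stuck state reported in comment 4 (a module that stays in "(initializing)" in /proc/modules) can be checked for mechanically. The following is a rough sketch only: the function name `modules_still_initializing` is made up for this example, and it assumes the whitespace-separated line layout exactly as quoted in the comment. On a live system the text would come from reading /proc/modules instead of the sample string.

```python
def modules_still_initializing(proc_modules_text):
    """Return names of modules whose line carries the "(initializing)" flag."""
    stuck = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # First field is the module name; the flag appears among the later fields.
        if fields and "(initializing)" in fields[1:]:
            stuck.append(fields[0])
    return stuck


# The two lines quoted in comment 4 of this report.
SAMPLE = """\
aic7xxx 107568 (initializing)
scsi_mod 55728 0 [aic7xxx]
"""

print(modules_still_initializing(SAMPLE))  # ['aic7xxx']
```

Polling this a few times, some seconds apart, would distinguish a slow initialization from one that never completes.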
Comment 5 Doug Ledford 2001-08-30 15:13:32 UTC
OK, next step. I need you to load the aic7xxx driver (the old one) and let it get into the state where it just hangs there. Then, I need you to get me the content of your System.map for this kernel and when you load the modules (scsi_mod.o and aic7xxx.o, which you will have to load by hand with insmod instead of modprobe or letting them automatically get loaded), load them with the -m option to insmod which will produce a module load map. Save those maps to a couple files in /tmp. Then, press Ctrl-Scroll_Lock to get me a dump of all the processes on the system (which will show where the insmod process is hanging up at with a backtrace of the call stack that I can decode via the System.map and the two module load maps). Send that directly to me, don't attach it here, and I'll see what I can figure out as far as what's happening on your machine.
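Decoding a backtrace against System.map, as described above, comes down to finding the nearest symbol at or below each address. The sketch below illustrates that lookup under stated assumptions: it relies on the usual three-column System.map layout (hex address, type letter, symbol name), the function names are hypothetical, and the sample entries are invented for the example rather than taken from the reporter's kernel. It does not handle the insmod -m module load maps.

```python
import bisect


def parse_system_map(text):
    """Parse "<hex address> <type> <symbol>" lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        address, _kind, name = parts
        symbols.append((int(address, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (symbol, offset) for the nearest symbol at or below address, or None."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[index]
    return name, address - base


# Invented entries for illustration only; not from any real kernel build.
SAMPLE_MAP = """\
c0100000 T example_a
c0105000 T example_b
c0107000 T example_c
"""

symbols = parse_system_map(SAMPLE_MAP)
print(resolve(symbols, 0xC0105010))  # ('example_b', 16)
```

Addresses that fall inside a loaded module would need the same lookup run against that module's load map, since System.map only covers the core kernel image.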
Comment 6 Brock Organ 2001-08-31 14:43:29 UTC
Created attachment 30317: Helge Skrivervik's email with additional information
Comment 7 Doug Ledford 2001-11-22 18:09:22 UTC
Quoting an email from Helge: Doug - I promised to get back to you on this issue as soon as I got the chance. The usual problem is of course to make a critical system available for experimentation. Well, to make a long story short, we decided to upgrade the motherboard, and (with the 2.4.14 kernel) the problem is gone. In other words, most likely a hardware problem, just like you suggested initially. In any case - thanks for your time!