Bug 52751 - aic7xxx module causes system to hang
Summary: aic7xxx module causes system to hang
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Doug Ledford
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-08-28 18:16 UTC by Helge Skrivervik
Modified: 2007-04-18 16:36 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-08-31 15:00:57 UTC
Embargoed:


Attachments
Helge Skrivervik's email with additional information (243.30 KB, text/plain)
2001-08-31 14:43 UTC, Brock Organ

Description Helge Skrivervik 2001-08-28 18:16:07 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.76 [en] (X11; U; Linux 2.4.5 i586)

Description of problem:
The system hangs when loading the aic7xxx module. All 2.4.x kernels show
the same problem, with minor differences: we get to fsck, which finishes
checking the IDE disks, and then the system hangs when attempting to load
the aic7xxx driver.
Now, here is what's really interesting: the 2.4.2 kernel does not halt the
system completely; the modprobe simply never returns. The other kernels
cause a complete halt UNLESS THE 2.4.2 KERNEL (driver) HAS just had the
opportunity to initialize the adapter. So we do have a workable system by
using the following procedure:
1) boot 2.4.2
2) modprobe the aic7xxx driver, send it to the background
3) sleep 5 seconds (the system is still alive but jerky - it stops for a few
seconds now and then)
4) reboot into the 2.4.x (actually 2.4.5 or 2.4.7) kernel
5) the aic7xxx module loads fine and works fine
A real pain to go through (not to mention to find out), but it works. The
hardware is an Epox motherboard (EP-MVP3G2, Via MVP3 chipset), 500 MHz K6 CPU,
Adaptec 29160 Ultra160 adapter; what is connected to the adapter seems to be
irrelevant.
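
The procedure above can be sketched as a small shell script. This is only an illustration under stated assumptions: it is meant to run as root on the 2.4.2 kernel, the `run` helper and the `DRY_RUN` switch are hypothetical additions that print the commands instead of executing them by default, and the newer kernel still has to be selected by hand at the boot loader prompt.

```shell
#!/bin/sh
# Sketch of the workaround described above (not an official fix).
# DRY_RUN and run() are illustrative helpers, not part of any tool.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

aic_workaround() {
    # Step 2: the old driver's modprobe never returns, so background it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ modprobe aic7xxx &"
    else
        modprobe aic7xxx &
    fi
    # Step 3: give the old driver a few seconds with the adapter.
    run sleep 5
    # Step 4: reboot; pick the 2.4.5 or 2.4.7 kernel at the boot prompt.
    run reboot
}
```

Calling `aic_workaround` with the default `DRY_RUN=1` only prints the three commands; setting `DRY_RUN=0` would actually run them.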

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. simply boot the system with any 2.4.x kernel

Actual Results:  System hangs

Additional info:

Comment 1 Doug Ledford 2001-08-28 19:29:05 UTC
Sounds like you need to update the BIOS on your motherboard.  We had a different
person with a similar problem.  The new aic7xxx driver used in the 2.4.5 and
2.4.7 kernel would hard lock on the AMD CPU/Via chipset motherboard, but the old
aic7xxx driver (which is what you are getting when you boot the 2.4.2 kernel)
would work OK (although in your case it is getting SCSI timeouts or some such,
so it's not OK, but it still isn't hardlocking like the new aic7xxx driver).  A
BIOS update on that machine solved all the problems.  If that doesn't work in
your case, then re-open the bug.

Comment 2 Helge Skrivervik 2001-08-29 08:20:30 UTC
Sorry, I should have mentioned that: I have updated the bios on the mobo and the interface. No changes in the behaviour.

Comment 3 Doug Ledford 2001-08-29 14:32:55 UTC
OK, in order to try and diagnose this (and I'm still not totally convinced it
isn't hardware related, but we'll try and see what we can find), I need you to
boot the 2.4.2 kernel (the old aic7xxx driver) and, as the aic7xxx driver is
loaded, pass the option aic7xxx=verbose:0xff39 to the driver, then send me the
output.  Capturing it with a serial console works best, or you can store it in
a log file.  I personally redirect all of my kernel log messages in syslog.conf
to a log file separate from everything else.  Since your root appears to be on
IDE, you should still be able to record those log messages before rebooting
without having to write anything down or mess with serial consoles, and you
can then attach the log messages to this bug report.  Since your aic7xxx
module is getting loaded at boot time but after the fs check, it isn't in the
initrd.  That means you should be able to just put the line:

options aic7xxx aic7xxx=verbose:0xff39

into your /etc/modules.conf file and have the options take effect at the next
boot automatically.
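
One possible way to add that line without duplicating it on repeated runs is sketched below. The `append_line_once` helper is a made-up name for illustration; only the options line itself comes from the comment above.

```shell
#!/bin/sh
# append_line_once FILE LINE: add LINE to FILE unless an identical
# line is already there. (Hypothetical helper for illustration.)
append_line_once() {
    file="$1"
    line="$2"
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root):
#   append_line_once /etc/modules.conf 'options aic7xxx aic7xxx=verbose:0xff39'
```

The same helper could presumably be used to add a `kern.*` rule to syslog.conf that sends kernel messages to a file of your choice; the target file name is left open here because the comment does not give one.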

Comment 4 Helge Skrivervik 2001-08-30 08:09:00 UTC
OK - here we go, first the output from the 2.4.2 kernel:
SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 9 for device 00:0c.0
PCI: The same IRQ used for device 00:0a.0
aic7xxx: <Adaptec AHA-294X Ultra SCSI host adapter> at PCI 12/0
aic7xxx: Initial PCI_COMMAND value was 0x7
aic7xxx: Initial DEVCONFIG value was 0x580
aic7xxx: Loading serial EEPROM...done
PCI: Found IRQ 10 for device 00:08.0
aic7xxx: <Adaptec AIC-7892 Ultra 160/m SCSI host adapter> at PCI 8/0
aic7xxx: Initial PCI_COMMAND value was 0x7
aic7xxx: Initial DEVCONFIG value was 0x482
aic7xxx: Loading serial EEPROM...done
(scsi0) <Adaptec AIC-7892 Ultra 160/m SCSI host adapter> found at PCI 0/8/0
(scsi0) Wide Channel, SCSI ID=7, 32/255 SCBs
(scsi0) BIOS disabled, IO Port 0xd800, IRQ 10
(scsi0) IO Memory at 0xeb000000, MMAP Memory at 0xc4852000
(scsi0) LVD/Primary Low byte termination Enabled
(scsi0) LVD/Primary High byte termination Enabled
(scsi0) Secondary Low byte termination Enabled
(scsi0) Secondary High byte termination Enabled
(scsi0) Downloading sequencer code... 396 instructions downloaded
(scsi0) Resetting channel
(scsi1) <Adaptec AHA-294X Ultra SCSI host adapter> found at PCI 0/12/0
(scsi1) Narrow Channel, SCSI ID=7, 16/255 SCBs
(scsi1) BIOS disabled, IO Port 0xe400, IRQ 9
(scsi1) IO Memory at 0xeb001000, MMAP Memory at 0xc4850000
(scsi1) Cables present (Int-50 NO, Ext-50 NO)
(scsi1) EEPROM is present.
(scsi1) SE Low byte termination Enabled
(scsi1) Downloading sequencer code... 436 instructions downloaded
(scsi1) Resetting channel
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
       <Adaptec AIC-7892 Ultra 160/m SCSI host adapter>
scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.2.4/5.2.0
       <Adaptec AHA-294X Ultra SCSI host adapter>

... this is where it stops, cat /proc/modules says:
aic7xxx               107568 (initializing)
scsi_mod               55728   0 [aic7xxx]
... and it stays like this forever. The system is responding, but hangs for short periods intermittently.
By the way, the presence or absence of the old Adaptec controller makes no difference whatsoever.
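
A quick way to spot a module stuck in this state is to filter the module list for the "(initializing)" marker. The sketch below assumes the 2.4-era /proc/modules layout shown above (name, size, then state); `stuck_modules` is a hypothetical helper name.

```shell
#!/bin/sh
# stuck_modules [FILE]: print the names of modules whose line contains
# "(initializing)". FILE defaults to /proc/modules.
stuck_modules() {
    awk '/\(initializing\)/ { print $1 }' "${1:-/proc/modules}"
}
```

Run against the two lines quoted above, it would print only `aic7xxx`.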
Now, here's what the 2.4.5 kernel says, after having the interface initialized like above:

SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 10 for device 00:08.0
ahc_pci:0:8:0: Reading SEEPROM...done.
ahc_pci:0:8:0: BIOS eeprom is present
ahc_pci:0:8:0: Secondary High byte termination Enabled
ahc_pci:0:8:0: Secondary Low byte termination Enabled
ahc_pci:0:8:0: Primary Low Byte termination Enabled
ahc_pci:0:8:0: Primary High Byte termination Enabled
ahc_pci:0:8:0: Downloading Sequencer Program... 419 instructions downloaded
PCI: Found IRQ 9 for device 00:0c.0
PCI: The same IRQ used for device 00:0a.0
ahc_pci:0:12:0: Reading SEEPROM...done.
ahc_pci:0:12:0: internal 50 cable not present
ahc_pci:0:12:0: external cable not present
ahc_pci:0:12:0: BIOS eeprom not present
ahc_pci:0:12:0: Low byte termination Enabled
ahc_pci:0:12:0: Downloading Sequencer Program... 426 instructions downloaded
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13
        <Adaptec 29160 Ultra160 SCSI adapter>
        aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.13
        <Adaptec 2940 Ultra SCSI adapter>
        aic7880: Single Channel A, SCSI Id=7, 16/255 SCBs

(scsi0:A:0:0): Sending WDTR 1
(scsi0:A:0:0): Received WDTR 1 filtered to 1
(scsi0:A:0): 6.600MB/s transfers (16bit)
scsi0: target 0 using 16bit transfers
(scsi0:A:0:0): Sending SDTR period c, offset 7f
(scsi0:A:0:0): Received SDTR period c, offset 3f
        Filtered to period c, offset 3f
(scsi0:A:0): 40.000MB/s transfers (20.000MHz, offset 63, 16bit)
scsi0: target 0 synchronous at 20.0MHz, offset = 0x3f
  Vendor: IBM       Model: DDYS-T09170N      Rev: S96H
  Type:   Direct-Access                      ANSI SCSI revision: 03
(scsi0:A:0): 40.000MB/s transfers (20.000MHz, offset 63, 16bit)
scsi0:0:0:0: Tagged Queuing enabled.  Depth 253
  Vendor: SONY      Model: SDT-9000          Rev: 0400
  Type:   Sequential-Access                  ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
(scsi0:A:0:0): Sending PPR bus_width 1, period c, offset 7f, ppr_options 0
(scsi0:A:0:0): Received PPR width 1, period c, offset 3f,options 0
        Filtered to width 1, period c, offset 3f, options 0
(scsi0:A:0:0): Sending PPR bus_width 1, period c, offset 3f, ppr_options 0
(scsi0:A:0:0): Received PPR width 1, period c, offset 3f,options 0
        Filtered to width 1, period c, offset 3f, options 0
SCSI device sda: 17916240 512-byte hdwr sectors (9173 MB)
 sda: sda1 sda2
Detected scsi tape st0 at scsi1, channel 0, id 5, lun 0
st: bufsize 32768, wrt 30720, max init. buffers 4, s/g segs 16.
parport0: PC-style at 0x378 [PCSPP(,...)]
parport0: irq 7 detected
PCI: Found IRQ 9 for device 00:0a.0
PCI: The same IRQ used for device 00:0c.0
3c59x.c:LK1.1.13 27 Jan 2001  Donald Becker and others. http://www.scyld.com/network/vortex.html
See Documentation/networking/vortex.txt
eth0: 3Com PCI 3c905 Boomerang 100baseTx at 0xe000,  00:60:08:8d:e9:f9, IRQ 9
  product code 4b4b rev 00.0 date 08-27-97
  8K word-wide RAM 3:5 Rx:Tx split, autoselect/MII interface.
  MII transceiver found at address 24, status 786f.
  Enabling bus-master transmits and whole-frame receives.
eth0: scatter/gather enabled. h/w checksums disabled
eth0: first available media type: MII
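
The negotiation lines in the 2.4.5 log can be cross-checked with a little arithmetic: the synchronous rate in MHz times the bus width in bytes gives the reported throughput, and the hexadecimal offset 3f is the decimal 63 shown in the summary line. The helper names below are made up for this sketch.

```shell
#!/bin/sh
# rate_mb_s MHZ BITS: throughput in MB/s for a synchronous SCSI transfer
# (clock rate times bus width in bytes).
rate_mb_s() {
    echo $(( $1 * $2 / 8 ))
}

# hex_to_dec HEX: convert a hex value as printed by the driver
# (without the 0x prefix) to decimal.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

rate_mb_s 20 16   # 20 MHz on a 16-bit bus, prints 40
hex_to_dec 3f     # offset from "Received SDTR period c, offset 3f", prints 63
```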


Comment 5 Doug Ledford 2001-08-30 15:13:32 UTC
OK, next step.  I need you to load the aic7xxx driver (the old one) and let it
get into the state where it just hangs there.  Then, I need you to get me the
content of your System.map for this kernel and when you load the modules
(scsi_mod.o and aic7xxx.o, which you will have to load by hand with insmod
instead of modprobe or letting them automatically get loaded), load them with
the -m option to insmod which will produce a module load map.  Save those maps
to a couple files in /tmp.  Then, press Ctrl-Scroll_Lock to get me a dump of all
the processes on the system (which will show where the insmod process is hanging
up at with a backtrace of the call stack that I can decode via the System.map
and the two module load maps).  Send that directly to me, don't attach it here,
and I'll see what I can figure out as far as what's happening on your machine.
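
The manual load with map files might look roughly like the sketch below. It assumes both .o files sit in one directory passed as the first argument (the real location depends on the kernel build), and it uses a hypothetical `DRY_RUN` switch that only prints the commands by default. The one tool behavior it relies on is that `insmod -m` writes the load map to standard output.

```shell
#!/bin/sh
# Sketch: load scsi_mod.o and aic7xxx.o by hand and keep their load maps.
# MODDIR (directory holding both .o files) is an assumption.
# DRY_RUN=1 (the default) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

load_with_maps() {
    moddir="$1"
    outdir="${2:-/tmp}"
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ insmod -m $moddir/scsi_mod.o > $outdir/scsi_mod.map"
        echo "+ insmod -m $moddir/aic7xxx.o > $outdir/aic7xxx.map &"
    else
        # insmod -m prints the module load map on stdout.
        insmod -m "$moddir/scsi_mod.o" > "$outdir/scsi_mod.map"
        # The aic7xxx load is expected to hang, so run it in the background.
        insmod -m "$moddir/aic7xxx.o" > "$outdir/aic7xxx.map" &
    fi
}
```

Once the aic7xxx load is hanging, the Ctrl-Scroll_Lock process dump requested above is taken on the console as a separate manual step.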

Comment 6 Brock Organ 2001-08-31 14:43:29 UTC
Created attachment 30317 [details]
Helge Skrivervik's email with additional information

Comment 7 Doug Ledford 2001-11-22 18:09:22 UTC
Quoting an email from Helge:

Doug - I promised to get back to you on this issue as soon as I got the chance.
The usual problem is of course to make a critical system available for
experimentation.
Well, to make a long story short, we decided to upgrade the motherboard, and
(with the 2.4.14 kernel) the problem is gone.  In other words most likely a
hardware problem just like you suggested initially.
In any case - thanks for your time!

