Bug 30980 - (440GX)Boot freezes when trying to insmod DAC960
Status: CLOSED ERRATA
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686 Linux
Priority: high
Severity: high
Assigned To: Doug Ledford
QA Contact: Brock Organ
Reported: 2001-03-07 14:29 EST by Daniel Senie
Modified: 2005-10-31 17:00 EST (History)
Doc Type: Bug Fix
Last Closed: 2003-06-09 11:11:17 EDT

Description Daniel Senie 2001-03-07 14:29:04 EST
For this test, I used a Mylex AcceleRAID 150, which has been supported for 
a very long time. Unlike the experience with the 170, the 150 WAS 
recognized by the installer, and resulted in the DAC960 driver being 
insmod'd. When that happened, I happened to be on VC4, and saw the 
following message:

<5>DAC960: ***** DAC960 RAID Driver Version 2.4.10 of 1 February 2001 
******

whereupon the machine locked up solid.

I suspect there's some issue with the driver vs. this controller, but I 
don't have a good way to test that.

The loader system doesn't recognize newer cards (see another bug I 
entered), so I was trying to load with the older card so I could add the 
sense info and get the newer card working. That didn't work either. Seems 
like things are going from bad to worse...
Comment 1 Daniel Senie 2001-03-07 16:26:58 EST
I should note this bug, and testing with the present RedHat 6.2 (including 
patches, burned to a new CD) indicate RedHat installation no longer supports 
installation to the Mylex controllers.

I do hope RedHat will be interested in fixing this situation with Wolverine.
Comment 2 Glen Foster 2001-03-07 16:55:20 EST
Locking up when a module is insmod'ed sounds more like a kernel issue.
Comment 3 Daniel Senie 2001-03-07 17:03:25 EST
OK. I've changed it over to Kernel.
Comment 4 David Lawrence 2001-03-13 15:18:40 EST
I cannot replicate this problem with the latest kernel build in our devel tree.
You can try installing 7.0 on your machine and then apply the newer 2.4 kernel
from our rawhide tree to test that it does work with that card.

ftp://ftp.redhat.com/rawhide/i386/RedHat/RPMS/kernel-2.4.2-0.1.25.i386.rpm
Comment 5 Michael K. Johnson 2001-03-13 16:39:29 EST
There have been no driver changes in the DAC960 driver between
Wolverine and our latest tree.
Comment 6 Arjan van de Ven 2001-04-07 20:15:14 EDT
Fixed in anaconda (or worked around the hardware bug in anaconda
because it can't be done in the kernel, depending on how you look
at it).
Comment 7 Daniel Senie 2001-04-17 17:58:02 EDT
Well, I hate to tell you this, but in RedHat 7.1, this is STILL broken. 
On a system with a Mylex AcceleRAID 170 and an on-board AIC7xxx (L440GX+ 
motherboard), the system locks up solid. When this happened, the screen was 
indicating it was loading the DAC960 driver.

I have an EXCELLENT test system for testing this bug. I will offer again: If 
you build an ISO image for a fixed RedHat 7.1 disc for me to test, I'll be VERY 
happy to burn it and test it.

This is an absolute show-stopper. We'd been planning to upgrade many systems to 
7.1, but all systems contain Mylex AcceleRAID cards of various models (150, 
250, and 170 models).
Comment 8 Don Munroe 2001-04-30 14:28:19 EDT
I have the same problem on a VA Linux Fullon 2250 server.  
The server has an Intel L440GX+ motherboard with an onboard Adaptec AIC-7896N (aic7xxx) controller and a Mylex DAC1164P RAID controller.
I have 3 73GB disks on the Mylex controller in a RAID 5 config.  No devices on the Adaptec controller.
(I have also tried a Mylex DAC960 controller with the same result)
In all modes (expert, text, etc) the aic7xxx and the dac960 drivers are loaded.
The aic7xxx module takes a long time to initialize.  Switching to virtual console 4 I can see that it is probing each channel, id and lun with a message 
like <4> scsi : aborting command due to timeout : pid 0, scsi 1, channel 0, id 14, lun 0 0x12 00 00 00 ff 00

Then the DAC960 module loads with a message on VC4 like
<5>DAC960: ***** DAC960 RAID Driver Version 2.4.10 of 1 February 2001 *****
<5>DAC960: Copyright ...

And the system hangs there forever!
Please HELP...
Comment 9 Arjan van de Ven 2001-04-30 15:11:29 EDT
Machines based on the L440GX+ reference design seem to have a bug / problem 
wrt interrupts. We are currently investigating this and will release a new
installer-floppy to fix this as soon as possible.
Comment 10 Don Munroe 2001-04-30 15:32:37 EDT
I found a work-around....

At the installer syslinux prompt, going into 'linux expert noprobe' mode doesn't autoload the aic7xxx or the dac960 driver.

Manually adding the DAC960 device when prompted works!  I'm installing 7.1 on a VA Linux Fullon 2x2 2250 right now...

Must be a conflict between the aic7xxx and the dac960 driver?  Could the aic7xxx probing get the Mylex DAC960 / 1164P into an unstable state?

Yay!
Comment 11 Daniel Senie 2001-05-03 11:20:33 EDT
There's another bug listing the odd behavior of the AIC7xxx driver. Most of us 
don't need that driver regardless, since we're using RAID controllers (those 
who care about this particular issue).

I agree the DAC960 driver in Anaconda is working fine, and that the aic7xxx 
driver is the real culprit, doing some sort of damage. I was able to install 
using:

     text noprobe

from the boot disk/cdrom startup. As noted, select the DAC960 driver, and do 
NOT select the AIC7xxx driver, and things work just great. I do wish Intel had 
a way to completely disable the AIC7xxx on the Lancewood motherboard (it is 
possible to do so on the newer SLT2 motherboard, BTW).

So, for the RedHat folks: Please chase down the AIC7xxx problems. You may want 
to add a work-around note to the appropriate area of the Support website for 
folks who need to do DAC960 installs.

It would be nice if a proper fix were made (e.g. put the AIC7xxx driver out of 
its misery).
Comment 12 Daniel Senie 2001-05-03 18:05:04 EDT
OK. Time to follow up my own comment with another... It didn't work...

using "text noprobe" I was able to complete an installation of RedHat onto a 
system with an AcceleRAID 170. Clearly, eliminating the AIC7xxx driver is the 
key to this phase. The problem doesn't end there, though...

Now I reboot the system, and during the boot from hard disk, the kernel 
attempts to probe the SCSI buses of the AIC7xxx chip on the Lancewood (L440GX+) 
motherboard, and falls over dead. Anyone know if there's a magic incantation to 
put on the linux boot line to tell it NOT to load the aic7xxx driver? If that's 
possible, then the next question is what to put into the lilo config or 
elsewhere (or just build a custom kernel without the AIC7xxx driver).

Seems like we've got a way to go before this is cleanly resolved.
Comment 13 Don Munroe 2001-05-03 18:56:47 EDT
Hmmm... My install went fine using text noprobe?  

My system booted fine after the install.  (L440GX+)
The strange thing is, the aic7xxx module still loads, even when it's not 
in /etc/modules.conf  (See below)

The one thing I have done is disable the Adaptec BIOS using the <Ctrl-A> 
thing.  It was under an Advanced menu.  (I did that before discovering the text 
noprobe option to the install though)

I can post my dmesg if that would help?

--Don
Here's my /etc/modules.conf:

[root@newproxy /root]# cat /etc/modules.conf
alias scsi_hostadapter DAC960
alias parport_lowlevel parport_pc
alias eth0 eepro100
alias eth1 3c59x
#alias scsi_hostadapter1 aic7xxx
#alias scsi_hostadapter2 aic7xxx
alias usb-controller usb-uhci

Comment 14 Alessandro 2001-10-04 17:44:04 EDT
Help me ! :-)

I have an Intel 440BX-based (not GX) motherboard, an embedded Symbios 53C876 
(active but not used: there are no drives connected), two PIII-500 SMP CPUs, 1 
GB RAM, and a Mylex DAC960PJ with 64MB RAM on board connected to 3 SCSI HDs in a 
RAID 5 configuration. No EIDE HD, only an EIDE CDROM and an Intel 82555-based 
network interface embedded on the motherboard.

This PC has worked with RH 6.2 for 8 months without ANY problem.
I had the BAD idea of installing RH 7.1 from scratch.

I've read many messages here (such as bug 29555) and on Usenet about the 
system freezing during installation of RH 7.1: when the PC loads DAC960.o it 
stops working. It is not frozen (ALT-F2 works), but the installation stops.

With ALT-F3 I see:
"* going to insmod DAC960.o (path is NULL)"
and nothing else

With ALT-F4 :
"<6>PCI: Assigned IRQ 11 for device 00:0c.1"
"<5>DAC960:  **** DAC960 RAID Driver etc. etc."
"<5>DAC960: Copyright 1998-2001 by etc. etc."
and nothing else
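
As a reading aid for the console lines quoted in this report, the leading `<N>` on each message is the kernel's log priority. The sketch below decodes it using the standard Linux log level names; the function name `decode` is an assumption made for this example, and it only handles a single-digit prefix at the start of a line.

```python
import re

# Standard Linux kernel log levels (0 is the most severe).
LEVELS = {
    0: "EMERG", 1: "ALERT", 2: "CRIT", 3: "ERR",
    4: "WARNING", 5: "NOTICE", 6: "INFO", 7: "DEBUG",
}

PREFIX = re.compile(r"^<([0-7])>\s*(.*)$")


def decode(line):
    """Split a raw kernel log line into (level name, message).

    Hypothetical helper: returns (None, line) when there is no <N> prefix.
    Surrounding double quotes, as used in this comment, are dropped first.
    """
    cleaned = line.strip().strip('"')
    match = PREFIX.match(cleaned)
    if match is None:
        return None, cleaned
    return LEVELS[int(match.group(1))], match.group(2)


if __name__ == "__main__":
    for sample in (
        '"<6>PCI: Assigned IRQ 11 for device 00:0c.1"',
        '"<5>DAC960: Copyright 1998-2001 by etc. etc."',
    ):
        print(decode(sample))
```

By this reading, the DAC960 banner lines are ordinary notices rather than errors, so the last message on screen does not by itself say what went wrong.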

I've tried "boot: linux text noprobe" and told it to load only DAC960, but it 
doesn't work. Neither does "boot: linux pci=biosirq".

I've tried the boot.img suggested by RedHat (in the bug 29555 report) (only for 
the GX chipset, to work around the APIC bug), but it doesn't load the DAC960 
driver (it seems not to be there). http://people.redhat.com/dledford/440gx/boot.img

I see a message during kernel boot:
"Warning only 896 MB will be used" (why ?)
...(omitted)...
"PCI: Probing PCI hardware"
"Unknown bridge resource 0 : assuming transparent"
"Unknown bridge resource 1 : assuming transparent"
"Unknown bridge resource 2 : assuming transparent"
"PCI: using IRQ router PIIX [8086/7110] at 00:12.0"
"PCI: Cannot allocate resource region 4 of device 00:12.1"

And it does not detect the second CPU.

Any idea ?

What if I add an IDE HD, install onto it, and once the system works (with a 
newer kernel, i.e. 2.4.10) copy the installation to the RAID 5 array and remove 
the IDE HD?

Or remove the second CPU for the install process?

help :-)
Comment 15 Need Real Name 2001-10-11 12:31:01 EDT
We are installing RH7.1 on a VALinux 2140 machine that uses a Mylex DAC960 RAID
controller. Trying a normal install, it just hangs while loading. I have also
tried "text noprobe", loading just the DAC960 driver by itself, with the
same result.

We also have an Adaptec SCSI controller, but we have already tried disabling it
during troubleshooting with the same result.

We noticed that the BIOS and firmware versions on the DAC960 currently are about
2-3 years old, but are unable to find new drivers. The post above describes our
issue almost to a T.  Just letting you know we are having the issue too.  We are
waiting on this to move from NT to Linux for our mail.
Comment 16 Michael K. Johnson 2001-10-11 14:05:11 EDT
>I've tried the boot.img suggested by RedHat (in the bug 29555 report) (only for 
>the GX chipset, to work around the APIC bug), but it doesn't load the DAC960 
>driver (it seems not to be there). http://people.redhat.com/dledford/440gx/boot.img

Doug, looks like that image may need updating to include the DAC960 driver?

Fundamentally, all this is an attempt to work around bugs in Intel's
BIOS.
Comment 17 Doug Ledford 2001-10-12 16:00:47 EDT
The DAC960 driver should be on the image already.  I didn't change the
modules.cgz ball, I just put a new kernel on there.  It may not be autodetected
(which is something the other reports have mentioned with some models of the
DAC960 card).  I would try the linux noprobe option with the new boot disk from
my site, then select the DAC960 from the list (it should be there).
Comment 18 Need Real Name 2001-11-16 01:39:57 EST
I've run into the same problem with AcceleRAID 170 and 7.1/7.2.
I did manage to get the module loaded with an IDE drive with 7.1 2.4.2
preinstalled; it took over 2 minutes to load and recognise the drive (3 18gig
Cheetahs in RAID 5), but then things fizzled out after an attempt at actually
writing to the disks. umount for each of /dev/rd/c0d0pN took over 120 sec.

I tried AMI Express 500 (megaraid.o) prior to trying Mylex, with even less
success. The drive would get recognised and fdisk can be run. The partition
seems to be written until you try to run mke2fs, at which point the partition
table goes away. It seems that hardware RAID and 7.1/7.2 are not compatible
unless you are IBM.
Sorry if I'm getting pissy, but I just spent 7 hours banging my head against the
problem.

BTW, the board is a Tyan S2505, VIA chipset, dual PIII, without an aic7xxx
adapter present.
Comment 19 Need Real Name 2002-01-22 02:36:01 EST
I am in the same boat as the rest of these people. DAC960, Intel L440GX, 
AIC7xxx, VA Linux 2240. I did download the boot disk from 
http://people.redhat.com/dledford/440gx/boot.img and used linux expert noprobe, 
which allowed me to select DAC960; the install still hangs, however. I can 
install RedHat 7.0 on the same box, though. What changes were made between 7.0 
and 7.1/7.2 that would affect the DAC960 or AIC7xxx driver? I also downloaded 
the latest bios/firmware/boot code/assistant from www.mylex.com for the DAC960 
PRL but that didn't help either. Looks like there are a lot of people with this 
problem, which has now spanned 2 distributions. When, if ever, will there be a 
fix?
Comment 20 Need Real Name 2002-01-22 02:40:55 EST
BTW, if you want to see a syslog from my install attempt using 7.2 look at 
http://www.billpratt.net/downloads/syslog.txt
Comment 21 Need Real Name 2002-01-23 03:36:57 EST
I have the solution. I have tested it on 5 boxes now and it works perfectly. This 
should work for bug #29555 as well. At the beginning of the install syslog on a 
system showing this problem, you can see where the kernel does not recognize 
the APIC and therefore regards it as an unknown resource (see below):

<3>Unknown bridge resource 0: assuming transparent
<3>Unknown bridge resource 1: assuming transparent
<3>Unknown bridge resource 2: assuming transparent
<3>Unknown bridge resource 0: assuming transparent
<3>Unknown bridge resource 1: assuming transparent
<3>Unknown bridge resource 2: assuming transparent
<3>Unknown bridge resource 0: assuming transparent
<3>Unknown bridge resource 1: assuming transparent
<3>Unknown bridge resource 2: assuming transparent
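
To check a saved install syslog for the same symptom, something like the following could be used. It is only a sketch: the function name `count_unknown_bridge` is invented for this example, the pattern is taken from the messages quoted in this report, and finding these lines does not by itself prove the interrupt problem discussed here.

```python
import re

# Matches both "resource 0: assuming" and "resource 0 : assuming",
# the two spellings that appear in this bug report.
PATTERN = re.compile(r"Unknown bridge resource (\d+)\s*: assuming transparent")


def count_unknown_bridge(lines):
    """Count 'Unknown bridge resource' messages, keyed by resource number."""
    counts = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            number = int(match.group(1))
            counts[number] = counts.get(number, 0) + 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python check_syslog.py syslog.txt  (file name is up to you)
    with open(sys.argv[1], errors="replace") as handle:
        print(count_unknown_bridge(handle))
```

An empty result means none of these messages were found in the file.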

The kernel just needs to be told to look for the apic and there is no need for 
noprobe, expert or text options. The full instructions are below and have been 
tested on 5 different VA Linux 2240 servers with the following setup:

MB: Intel Lancewood L440GX+
PROCS: DUAL Intel PIII 600E's
RAM: 2GB ECC SDRAM
HD: 4 x Quantum  18.2GB 10,000rpm
SCSI: AIC7xxx (no drives)
RAID: Mylex AcceleRaid DAC960PRL



HOWTO Install Linux 2.4.0 with an Intel Lancewood (L440GX+) MainBoard 
onto a MYLEX DAC960PRL (AcceleRAID 150) RAID Controller

1. Update the BIOS to the latest available for the L440GX 
	Available at http://downloadfinder.intel.com/scripts-df/Detail_Desc.asp?ProductID=309&DwnldID=2550

2. Upgrade your Mylex controller to the latest "boot code", "bios", "EzAssist", "firmware".
	Updates available at http://www.mylex.com/support/productgd/index.html

3. Using the RedHat 7.2 CD, at the "boot:" prompt type "linux apic" 
   and select the DAC960PRL without the AIC7xxx

4. RedHat should begin the install without a problem.

PLEASE REPORT SUCCESS/FAILURES TO billp@BillPratt.net !

William Pratt
Unix Systems Engineer
http://www.billpratt.net/
Comment 22 Need Real Name 2002-01-23 03:41:21 EST
***THIS FIXES A SMALL ERROR IN STEP 3. THERE IS NO NEED TO ONLY SELECT THE 
DAC960***

sorry for the typo :)

HOWTO Install Linux 2.4.0 with an Intel Lancewood (L440GX+) MainBoard 
onto a MYLEX DAC960PRL (AcceleRAID 150) RAID Controller

1. Update the BIOS to the latest available for the L440GX 
	Available at http://downloadfinder.intel.com/scripts-df/Detail_Desc.asp?ProductID=309&DwnldID=2550

2. Upgrade your Mylex controller to the latest "boot code", "bios", "EzAssist", "firmware".
	Updates available at http://www.mylex.com/support/productgd/index.html

3. Using the RedHat 7.2 CD, at the "boot:" prompt type "linux apic" 

4. RedHat should begin the install without a problem.

PLEASE REPORT SUCCESS/FAILURES TO billp@BillPratt.net !

William Pratt
Unix Systems Engineer
http://www.billpratt.net/
Comment 23 Alan Cox 2003-06-09 11:11:17 EDT
Red Hat 9 now uses info we finally got from info to avoid the need for this. If
you still have problems with the RH9 or current errata kernel please reopen the
bug and include dmidecode data.
Comment 24 Daniel Senie 2003-06-09 13:36:03 EDT
Thanks to the engineering folks for getting the dmidecode info incorporated 
into RH9. It's the first release since RH7 (and first 2.4 kernel release) 
which installs without tricks on Lancewood motherboards. Solving this issue is 
sincerely appreciated by those of us who still have much hardware based on 
that platform (and hey, the stuff's working well and doing its job, so why 
replace it!).

