Bug 30980
| Summary: | (440GX) Boot freezes when trying to insmod DAC960 | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Linux | Reporter: | Daniel Senie <dts> |
| Component: | kernel | Assignee: | Doug Ledford <dledford> |
| Status: | CLOSED ERRATA | QA Contact: | Brock Organ <borgan> |
| Severity: | high | Priority: | high |
| Version: | 7.1 | CC: | alan, billp, don_munroe, paul.van.de.griendt, rhartzog |
| Hardware: | i686 | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2003-06-09 15:11:17 UTC |
Description
Daniel Senie
2001-03-07 19:29:04 UTC
I should note this bug, and testing with the present RedHat 6.2 (including patches, burned to a new CD) indicate RedHat installation no longer supports installation to the Mylex controllers. I do hope RedHat will be interested in fixing this situation with Wolverine.

Locking up when a module is insmod'ed sounds more like a kernel issue.

OK. I've changed it over to Kernel.

I cannot replicate this problem with the latest kernel build in our devel tree. You can try installing 7.0 on your machine and then apply the newer 2.4 kernel from our rawhide tree to test that it does work with that card.
ftp://ftp.redhat.com/rawhide/i386/RedHat/RPMS/kernel-2.4.2-0.1.25.i386.rpm
There have been no driver changes in the DAC960 driver between Wolverine and our latest tree.

Fixed in anaconda (or worked around the hardware bug in anaconda because it can't be done in the kernel, depending on how you look at it).

Well, I hate to tell you this, but in RedHat 7.1, the problem is STILL broken. On a system with a Mylex AcceleRAID 170 and an on-board AIC7xxx (L440GX+ motherboard), the system locks up solid. When this happened, the screen was indicating it was loading the DAC960 driver. I have an EXCELLENT test system for testing this bug. I will offer again: If you build an ISO image for a fixed RedHat 7.1 disc for me to test, I'll be VERY happy to burn it and test it. This is an absolute show-stopper. We'd been planning to upgrade many systems to 7.1, but all systems contain Mylex AcceleRAID cards of various models (150, 250, and 170 models).

I have the same problem on a VA Linux Fullon 2250 server. The server has an Intel L440GX+ motherboard with an onboard Adaptec AIC-7896N (aic7xxx) controller and a Mylex DAC1164P RAID controller. I have 3 73GB disks on the Mylex controller in a RAID 5 config. No devices on the Adaptec controller. (I have also tried a Mylex DAC960 controller with the same result.) In all modes (expert, text, etc) the aic7xxx and the dac960 drivers are loaded.
The aic7xxx module takes a long time to initialize. Switching to virtual console 4 I can see that it is probing each channel, id and lun with a message like

<4> scsi : aborting command due to timeout : pid 0, scsi 1, channel 0, id 14, lun 0 0x12 00 00 00 ff 00

Then the DAC960 module loads with a message on VC4 like

<5>DAC960: ***** DAC960 RAID Driver Version 2.4.10 of 1 February 2001 *****
<5>DAC960: Copyright ...

And the system hangs there forever! Please HELP...

Machines based on the L440GX+ reference design seem to have a bug / problem wrt interrupts. We are currently investigating this and will release a new installer-floppy to fix this as soon as possible.

I found a work-around.... At the installer syslinux prompt, going into 'linux expert noprobe' mode doesn't autoload the aic7xxx or the dac960 driver. Manually adding the DAC960 device when prompted works! I'm installing 7.1 on a VA Linux Fullon 2x2 2250 right now... Must be a conflict between the aic7xxx and the dac960 driver? Could the aic7xxx probing get the Mylex DAC960 / 1164P into an unstable state?

Yay! There's another bug listing the odd behavior of the AIC7xxx driver. Most of us don't need that driver regardless, since we're using RAID controllers (those who care about this particular issue). I agree the DAC960 driver in Anaconda is working fine, and that the aic7xxx driver is the real culprit, doing some sort of damage. I was able to install using: text noprobe from the boot disk/cdrom startup. As noted, select the DAC960 driver, and do NOT select the AIC7xxx driver, and things work just great. I do wish Intel had a way to completely disable the AIC7xxx on the Lancewood motherboard (it is possible to do so on the newer SLT2 motherboard, BTW). So, for the RedHat folks: Please chase down the AIC7xxx problems. You may want to add a work-around note to the appropriate area of the Support website for folks who need to do DAC960 installs. It would be nice if a proper fix were made (e.g.
put the AIC7xxx driver out of its misery).

OK. Time to follow up my own comment with another... It didn't work... Using "text noprobe" I was able to complete an installation of RedHat onto a system with an AcceleRAID 170. Clearly, eliminating the AIC7xxx driver is the key to this phase. The problem doesn't end there, though... Now I reboot the system, and during the boot from hard disk, the kernel attempts to probe the SCSI buses of the AIC7xxx chip on the Lancewood (L440GX+) motherboard, and falls over dead. Anyone know if there's a magic incantation to put on the linux boot line to tell it NOT to load the aic7xxx driver? If that's possible, then the next question is what to put into the lilo config or elsewhere (or just build a custom kernel without the AIC7xxx driver). Seems like we've got a way to go before this is cleanly resolved.

Hmmm... My install went fine using text noprobe? My system booted fine after the install. (L440GX+) The strange thing is, the aic7xxx module still loads, even when it's not in /etc/modules.conf (see below). The one thing I have done is disable the Adaptec BIOS using the <Ctrl-A> thing. It was under an Advanced menu. (I did that before discovering the text noprobe option to the install, though.) I can post my dmesg if that would help?
--Don
Here's my /etc/modules.conf:

[root@newproxy /root]# cat /etc/modules.conf
alias scsi_hostadapter DAC960
alias parport_lowlevel parport_pc
alias eth0 eepro100
alias eth1 3c59x
#alias scsi_hostadapter1 aic7xxx
#alias scsi_hostadapter2 aic7xxx
alias usb-controller usb-uhci

Help me ! :-) I have an Intel 440BX based (not GX) motherboard, an embedded Symbios 53C876 (active but not used: there are no drives connected), two PIII-500 CPUs (SMP), 1 GB RAM, and a Mylex DAC960PJ with 64MB RAM on board connected to 3 SCSI hard disks in a RAID 5 configuration. No EIDE hard disk, only an EIDE CDROM and an Intel 82555 based network adapter embedded on the motherboard. This PC has worked with RH 6.2 for 8 months without ANY problem.
I had the BAD idea of installing RH 7.1 from scratch. I've read many, many messages here (like bug 29555) and on Usenet about the system freezing during installation of RH 7.1: when the PC loads DAC960.o, it stops working. It is not frozen (ALT-F2 works), but the installation stops.

With ALT-F3 I see:
"* going to insmod DAC960.o (path is NULL)"
and nothing else.

With ALT-F4:
"<6>PCI: Assigned IRQ 11 for device 00:0c.1"
"<5>DAC960: **** DAC960 RAID Driver etc. etc."
"<5>DAC960: Copyright 1998-2001 by etc. etc."
and nothing else.

I've tried "boot: linux text noprobe" and told it to load only DAC960, but it doesn't work. The same with "boot: linux pci=biosirq". I've tried the boot.img suggested from RedHat (in 29555 bug report)(only for GX chipset to workaround APIC bug) but it doesn't load DAC960 driver (it seems not to be there). http://people.redhat.com/dledford/440gx/boot.img

I see a message during kernel boot:
"Warning only 896 MB will be used" (why ?)
..(omitted)..
"PCI: Probing PCI hardware"
"Unknown bridge resource 0 : assuming transparent"
"Unknown bridge resource 1 : assuming transparent"
"Unknown bridge resource 2 : assuming transparent"
"PCI: using IRQ router PIIX [8086/7110] at 00:12.0"
"PCI: Cannot allocate resource region 4 of device 00:12.1"
And it does not detect the second CPU.

Any ideas? Should I install an IDE hard disk, install onto it, and once the system works (with a newer kernel, i.e. 2.4.10) copy the installation to the RAID 5 array and remove the IDE disk? Or remove the second CPU for the install process? Help :-)

We are installing RH7.1 on a VALinux 2140 machine that uses a Mylex DAC960 RAID controller. Trying a normal install, it just hangs while loading. I have also tried "text noprobe", loading just the DAC960 driver by itself, with the same result. We also have an Adaptec SCSI controller, but we have already tried disabling it during troubleshooting, with the same result.
We noticed that the BIOS and firmware versions on the DAC960 are currently about 2-3 years old, but we are unable to find new drivers. The post above describes our issue almost to a T. Just letting you know we are having the issue too. We are waiting on this to move from NT to Linux for our mail.

>I've tried the boot.img suggested from RedHat (in 29555 bug report)(only for
>GX chipset to workaround APIC bug) but it doesn't load DAC960 driver (it seems
>not to be there). http://people.redhat.com/dledford/440gx/boot.img
Doug, looks like that image may need updating to include the DAC960 driver?
Fundamentally, all this is an attempt to work around bugs in Intel's
BIOS.

The DAC960 driver should be on the image already. I didn't change the modules.cgz ball, I just put a new kernel on there. It may not be autodetected (which is something the other reports have mentioned with some models of the DAC960 card). I would try the linux noprobe option with the new boot disk from my site, then select the DAC960 from the list (it should be there).

I've run into the same problem with AcceleRAID 170 and 7.1/7.2. I did manage to get the module loaded with an IDE drive with 7.1 2.4.2 preinstalled. It took over 2 minutes to load and recognise the drive (3 18 GB Cheetahs in RAID 5), but then things fizzled out after an attempt at actually writing to the disks. umount for each of /dev/rd/c0d0pN took over 120 sec. I tried an AMI Express 500 (megaraid.o) prior to trying Mylex, with even less success. The drive would get recognised and fdisk can be run. The partition seems to be written until you try to run mke2fs, at which point the partition table goes away. It seems that hardware RAID and 7.1/7.2 do not seem to be compatible unless you are IBM. Sorry if I'm getting pissy, but I just spent 7 hours banging my head against the problem. BTW, the board is a Tyan S2505, VIA chipset, dual PIII, without an aic7xxx adapter present.

I am in the same boat as the rest of these people. DAC960, Intel L440GX, AIC 7xxx, VA Linux 2240. I did download the boot disk from http://people.redhat.com/dledford/440gx/boot.img and used linux expert noprobe, which allowed me to select DAC960; however, the install still hangs. I can install RedHat 7.0 on the same box, however. What changes were made between 7.0 and 7.1/7.2 that would affect the DAC960 or AIC7xxx driver? I also downloaded the latest bios/firmware/boot code/assistant from www.mylex.com for the DAC960 PRL, but that didn't help either. Looks like there are a lot of people with this problem, which has now spanned 2 distributions. When, if ever, will there be a fix?
BTW, if you want to see a syslog from my install attempt using 7.2, look at http://www.billpratt.net/downloads/syslog.txt

I have the solution. I have tested it on 5 boxes now and it works perfectly. This should work for bug #29555 as well. When you look at the beginning of an install syslog that is seeing this problem, you can tell where the kernel does not recognize the apic and therefore regards it as an unknown resource (see below):

<3>Unknown bridge resource 0: assuming transparent
<3>Unknown bridge resource 1: assuming transparent
<3>Unknown bridge resource 2: assuming transparent
<3>Unknown bridge resource 0: assuming transparent
<3>Unknown bridge resource 1: assuming transparent
<3>Unknown bridge resource 2: assuming transparent
<3>Unknown bridge resource 0: assuming transparent
<3>Unknown bridge resource 1: assuming transparent
<3>Unknown bridge resource 2: assuming transparent

The kernel just needs to be told to look for the apic, and there is no need for the noprobe, expert or text options. The full instructions are below and have been tested on 5 different VA Linux 2240 servers with the following setup:

MB: Intel Lancewood L440GX+
PROCS: DUAL Intel PIII 600E's
RAM: 2GB ECC SDRAM
HD: 4 x Quantum 18.2GB 10,000rpm
SCSI: AIC7xxx (no drives)
RAID: Mylex AcceleRaid DAC960PRL

HOWTO Install Linux 2.4.0 with an Intel Lancewood (L440GX+) MainBoard onto a MYLEX DAC960PRL (AcceleRAID 150) RAID Controller

1. Update the bios to the latest available for the L440GX. Available at http://downloadfinder.intel.com/scripts-df/Detail_Desc.asp?ProductID=309&DwnldID=2550
2. Upgrade your mylex controller to the latest "boot code", "bios", "EzAssist", "firmware". Updates available at http://www.mylex.com/support/productgd/index.html
3. Using the RedHat 7.2 CD, at the "boot:" prompt type "linux apic" and select the DAC960PRL without the AIC7xxx
4. RedHat should begin the install without a problem.

PLEASE REPORT SUCCESS/FAILURES TO billp !
William Pratt
Unix Systems Engineer
http://www.billpratt.net/

***THIS FIXES A SMALL ERROR IN STEP 3. THERE IS NO NEED TO ONLY SELECT THE DAC960***
sorry for the typo :)

HOWTO Install Linux 2.4.0 with an Intel Lancewood (L440GX+) MainBoard onto a MYLEX DAC960PRL (AcceleRAID 150) RAID Controller

1. Update the bios to the latest available for the L440GX. Available at http://downloadfinder.intel.com/scripts-df/Detail_Desc.asp?ProductID=309&DwnldID=2550
2. Upgrade your mylex controller to the latest "boot code", "bios", "EzAssist", "firmware". Updates available at http://www.mylex.com/support/productgd/index.html
3. Using the RedHat 7.2 CD, at the "boot:" prompt type "linux apic"
4. RedHat should begin the install without a problem.

PLEASE REPORT SUCCESS/FAILURES TO billp !

William Pratt
Unix Systems Engineer
http://www.billpratt.net/

Red Hat 9 now uses info we finally got from info to avoid the need for this. If you still have problems with the RH9 or current errata kernel, please reopen the bug and include dmidecode data.

Thanks to the engineering folks for getting the dmidecode info incorporated into RH9. It's the first release since RH7 (and first 2.4 kernel release) which installs without tricks on Lancewood motherboards. Solving this issue is sincerely appreciated by those of us who still have much hardware based on that platform (and hey, the stuff's working well and doing its job, so why replace it!).
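
For anyone triaging a saved installer syslog against the symptom reported in this thread (the repeated "Unknown bridge resource N: assuming transparent" lines), the check can be scripted. The following is only a rough sketch: the function name and the sample lines are made up for illustration, and reading these messages as a sign of the L440GX+ problem is the commenter's observation above, not a verified diagnostic.

```python
import re

# Matches kernel log lines such as
#   <3>Unknown bridge resource 0: assuming transparent
# The "<N>" log-level prefix is optional, and a space before the colon is
# tolerated, since both spellings appear in the reports in this thread.
BRIDGE_RE = re.compile(
    r"^(?:<\d>)?\s*Unknown bridge resource\s*(\d+)\s*:\s*assuming transparent"
)


def count_unknown_bridge_resources(lines):
    """Return a dict mapping bridge resource number -> number of occurrences.

    Hypothetical helper, not part of any Red Hat tool.
    """
    counts = {}
    for line in lines:
        match = BRIDGE_RE.match(line.strip())
        if match:
            resource = int(match.group(1))
            counts[resource] = counts.get(resource, 0) + 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the messages quoted in this bug report.
    sample = [
        "<3>Unknown bridge resource 0: assuming transparent",
        "<3>Unknown bridge resource 1: assuming transparent",
        "<3>Unknown bridge resource 2: assuming transparent",
        "<5>DAC960: Copyright ...",
    ]
    print(count_unknown_bridge_resources(sample))
```

To run it against a real log, pass the lines of the file instead of the sample, for example `count_unknown_bridge_resources(open("syslog.txt"))`, where `syslog.txt` is whatever local copy of the installer log you saved.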