From Bugzilla Helper:
User-Agent: Mozilla/4.78 [en] (X11; U; OSF1 V5.1 alpha)

Description of problem:
System:
C440GX+ Production Release 7 BIOS Build 104
Dual (PIII) Xeon 550
AIC7896 v2.20S1B1

SCSI devices:
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST39236LC  Rev: 0004
  Type: Direct-Access  ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: SEAGATE  Model: ST39236LC  Rev: 0004
  Type: Direct-Access  ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: ESG-SHV  Model: SCA HSBP M6  Rev: 0.63
  Type: Processor  ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: TOSHIBA  Model: CD-ROM XM-6401TA  Rev: 1001
  Type: CD-ROM  ANSI SCSI revision: 02

Description:
=========
I encountered the hang while loading the aic7xxx driver during the initial install. I read the "7.1 gotchas", downloaded the alternate boot diskette image for the installer, booted from that using "linux apic", and was able to proceed through the install.

The RH 7.1 installer automatically elected to install the 2.4.2-2smp kernel, though it did not show up as "checked" in the graphical installer. It also installed the 2.4.2-2 kernel, which I expected. The installer made the 2.4.2-2smp kernel the default.

Booting from that default (2.4.2-2smp) kernel works fine. If I try to boot from the `linux-up' entry (the 2.4.2-2 kernel) *or* the boot diskette that was made at the end of the RH 7.1 install, however, I get SCSI timeouts for everything that's probed, and the boot fails. The timeouts look like:

scsi: aborting command due to timeout: pid 0, scsi0, channel 0, id N, lun 0 Inquiry 00 00 00 ff 00

If there is a device at whatever id is being probed (in this case scsi0 ids 0, 1, 6 and scsi1 id 0), then there's a single timeout for that id and then the device inquiry info shows. If there is no device at the id being probed, the timeout shows up twice for that id.
After IDs 0-6 & 8-15 are probed for scsi0 and scsi1, additional SCSI messages appear:

Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
scsi: abort pid 0, scsi 0, channel 0, id 0, lun 0 Test Unit Ready 00 00 00 00 00
(scsi0:0:0:0) SCSISIGI 0xb4, SEQADDR 0xbe, SSTAT0 0x0, SSTAT1 0x2
(scsi0:0:0:0) SGCACHEPTR 0x2, SSTAT2 0x0, STCNT 0x0

and then repeating messages about resetting, followed eventually by "trying harder". I can input those as well, if they're important.

If I build a boot diskette using the 2.4.2-2smp kernel, the boot diskette works, so it appears that the reason that both the `linux-up' entry and the boot diskette fail is that they're both using the stock `2.4.2-2' uniprocessor kernel.

The main reason I'm reporting all this is that I think the RedHat 7.1 "gotchas" page that talks about how to work around the installer problem should be updated to indicate that people should be especially sure to test their boot diskette after the RH 7.1 install, to make sure it works! Otherwise, they might be in for a nasty surprise if they need to boot from their installer-created boot diskette, and it doesn't work.

How reproducible:
Always

Steps to Reproduce:
Install RH 7.1 on a dual processor box with the AIC 7896 SCSI controller, using the workaround described in the RH 7.1 gotchas page. The install and subsequent reboots *from the default smp kernel* work fine. Reboots from the `linux-up' kernel or the kernel on the boot diskette (same kernel) fail, with SCSI aborts and timeouts.

Actual Results:
Couldn't boot off the installer-generated boot diskette or the linux-up boot choice.

Expected Results:
Should be able to boot off of any of the kernel images.

Additional info:
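The timeout pattern described in this report (one timeout for an id that has a device, two for an id that does not) can be tallied from a saved copy of the boot messages. The following is only a rough sketch: the helper name `count_scsi_timeouts` is hypothetical, the message layout is assumed to match the single line quoted in the report, and the "device present" / "no device" verdict simply restates the reporter's observation rather than any documented driver behavior.

```shell
# Hypothetical helper: count "aborting command due to timeout" messages
# per SCSI host and id in one or more saved log files (or on stdin).
# Assumes the message layout quoted in this report:
#   scsi: aborting command due to timeout: pid 0, scsi0, channel 0, id N, lun 0 ...
count_scsi_timeouts() {
    awk '
        /aborting command due to timeout/ {
            host = ""; id = ""
            for (i = 1; i <= NF; i++) {
                # Host field looks like "scsi0," (the leading "scsi:" does not match).
                if ($i ~ /^scsi[0-9]+,?$/) { host = $i; sub(/,$/, "", host) }
                # The id value is the field right after the literal word "id".
                if ($i == "id") { id = $(i + 1); sub(/,$/, "", id) }
            }
            if (host != "" && id != "") count[host " id " id]++
        }
        END {
            for (key in count) {
                # Verdict follows the pattern observed in this report only.
                verdict = (count[key] >= 2) ? "no device" : "device present"
                print key ": " count[key] " timeout(s), " verdict
            }
        }
    ' "$@" | sort
}
```

For example, `count_scsi_timeouts boot-messages.txt` would print one summary line per probed id, where `boot-messages.txt` is a placeholder for wherever the console output was captured.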
Thanks for the report. First of all, the real cause is a bug in the bios for which your hardware vendor should provide an update. (Several bios writers for the 440GX promised us an updated bios when we proved the bug to them). Second, I would recommend updating to the 2.4.3-12 kernel which has the "noapic" by default and use that for creating a bootfloppy. And indeed, it would be good if this were documented on the gotchas page, and I'll figure out a way to get this info there.
arjanv- Thanks for your comments and information. I had the i686 kernel updates from updates.redhat.com, so I loaded the 2.4.3-12, 2.4.3-12smp, and 2.4.3-12enterprise kernels this afternoon on that system, and added them as boot options to lilo (I made initrd images for each of them). The 2.4.3-12 uniprocessor kernel exhibited the same behavior as the 2.4.2-2 up kernel -- SCSI timeouts. I then went to Intel's web site, wandered around until I found http://developer.intel.com/support/motherboards/server/c440gx/index.htm and then selected the `Software & Drivers' page and downloaded the `BIOS8' update for that C440GX+ motherboard. I updated the BIOS, using the procedure documented with the updater, and now have a motherboard that reports that it's "Production Release 8" and "BIOS Build 106". Even with this BIOS, booting the 2.4.2 or 2.4.3 uniprocessor kernels still results in the SCSI timeouts. Using the smp or enterprise kernels from either 2.4.2 or 2.4.3 works as expected -- no timeouts. Anything else I should check/do/try?
We have a list of bioses in the kernel that need the "apic" parameter (as all these problems are caused by a bios bug), and apparently your bios is not yet on that list. I need a few bits of information in order to add your bios to the list; could you please download http://people.redhat.com/arjanv/dmidecode.c and, as root, run:

gcc dmidecode.c -o dmidecode
./dmidecode | mail -s "needs apic" hardwarebugs-list

(Note that the latter sends an email with your bios information; I recommend running the dmidecode program without the | mail part first to check the information you will send.) I only need the first 20 lines or so; the rest is not relevant. Thanks.
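The advice to check the output before mailing it can be followed with a small wrapper that shows only the beginning of a command's output. This is a sketch, not part of the original instructions: `preview_head` and the `PREVIEW_LINES` variable are hypothetical names, and the default of 20 lines is taken from the "first 20 lines or so" mentioned in this comment.

```shell
# Hypothetical helper: run a command and show only the first lines of its
# output, so it can be reviewed before anything is mailed.
# PREVIEW_LINES is an assumed override; the default of 20 follows the
# "first 20 lines or so" requested in this comment.
preview_head() {
    lines=${PREVIEW_LINES:-20}
    "$@" | head -n "$lines"
}

# Example (as root, after building dmidecode with the gcc command given here):
#   preview_head ./dmidecode
```

Nothing is sent anywhere by this helper; the `mail` step stays a separate, deliberate command.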
To add to this - I have just failed to install RH 7.1 onto an Intel L440GX+ system. I too have read the gotchas page, and the new boot image has allowed me to install 7.1 with the SMP kernel; however, the system is unbootable, as above. Lots of "aborting command due to timeout" messages. I have also tried the Enterprise kernel, with the same result.

Bios revision: Production Release 14.3
AIC 7896 v2.57S2B3
scsi id 1: Fujitsu MAJ3182MC
scsi id 2: Fujitsu MAJ3182MC

Obviously the system is nicely unusable now so we will be reverting back to 7.0, but I thought you'd like the info, because it's obviously a long way from being fixed.
This is not something we can FIX. It's an intel bios bug, and Intel will release a fixed bios soon, if they haven't already.
oh right, sorry....got the impression people were actually trying to do something about it.... No, Intel's latest bios update (April 2001) claims to have all sorts of updates to the adaptec scsi interface, but it doesn't solve the problem...indeed, Intel steer well clear of claiming Linux will work on the motherboard and haven't even tested it!
They told us different. And we do try to work around this brokenness, with the "apic" option etc etc, but there is a limit to what can be done.
I am also still struggling to get 7.1 installed on an Intel 440 running 6.2, for which I have just downloaded a new BIOS from intel.com. It now shows:

L440GX+ Production Release 14.3
BIOS Build 133
Adaptec AIC-7896 SCSI BIOS v257S2B3

Apparently no newer BIOS is available. I've downloaded the http://people.redhat.com/dledford/440gx/bootnet.img disk and booted with 'linux apic'. This brings me past the aic7xxx loading, but the installer later terminated with an anaconda error/dump when trying to perform an 'upgrade'. It lets me select NFS, configure the network, etc., and then produces an anaconda dump (included as an attachment to this comment). There is not yet a shell on Alt-F2, Alt-F3 displays '* no IDE floppy devices found' and Alt-F4 displays '<6>cdrom: open failed.' three times.
Created attachment 32118
anaconda dump
I've also produced a 'dmidecode' dump, if anyone is interested I'll upload it as attachment.
Red Hat now uses some alternative fixes we finally managed to get out of Intel.