Bug 29555

Summary:

[aic7xxx] Installer hangs loading the aic7xxx module

Product:

[Retired] Red Hat Linux

Reporter:

Ray Parish <rparish>

Component:

kernel

Assignee:

Doug Ledford <dledford>

Status:

CLOSED DEFERRED

Severity:

high

Priority:

high

Version:

7.1

CC:

a.camisasca, benb, bfuller, bob, brown9, bscott, bugzilla, canfield, carloseduard, cattelan, cbaudry, cdwom, ch, dale, dpm, dts, ecorreale, egbert, elsner, ewt, gerry.gilmore, goran.pocina, itay, jdenman, jdouglas, jfu, j.miltenberger, john.messina, johnsonm, jrfuller, jwright, kslack, lewis, marc.schmitt, matthias_haase, mdeale, mduncan, michaeltodd, michiel, mskinner, murray, paul, peter, pingjiewang, pvegulla, ralf, raybry, rhyde, robert.barton, roccor, rogate, rstaaf, scream, shishz, stimits, syncomm, systemadmin, tallship, westerj, wisethk

Hardware:

i386

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2001-05-17 03:27:37 UTC

Attachments:

Description	Flags
Tried special install image--but still hangs. Work around by not using ZIP drive	none
still don't know how to deal with Adaptex 7896 bug	none
Workaround for installing on L440GX+	none

Description Ray Parish 2001-02-26 15:10:40 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)


virtual terminal alt-f4 shows
SCSI host 0 channel 0 reset (pid0) timed out -trying harder
SCSI bus is being reset for host 0 channel 0.

Reproducible: Always
Steps to Reproduce:
1. Boot off Disc1 of Wolverine

Actual Results:  Hangs on install while loading the aic7xxx module

Expected Results:  continued on with installation

This doesn't happen when I install Fisher. Seems like maybe a newer 
version of the aic7xxx module is causing the problems.

Comment 1 Michael Fulbright 2001-02-26 16:31:50 UTC

Appears to be a kernel issue.

Comment 2 Doug Ledford 2001-02-26 18:53:56 UTC

I need to know what hardware you have in your system to be able to help out at
all.  I also need you to switch over to VT4 (by pressing ALT-F4) when the system
says it is loading the aic7xxx module and tell me what you see there.

Comment 3 Glen Foster 2001-02-26 23:17:24 UTC

Doug, is this a dup of bug 29266?

Comment 4 Doug Ledford 2001-02-26 23:56:48 UTC

No, different issues.

Comment 5 Ray Parish 2001-02-27 12:37:33 UTC

terminal v4 shows:

<4>scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 1, lun 
0 0x12 00 00 00 ff 00
<4>scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 1, lun 
0 0x12 00 00 00 ff 00
<4>scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 
0 0x12 00 00 00 ff 00
<4>scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 2, lun 
0 0x12 00 00 00 ff 00
<4>scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 3, lun 
0 0x12 00 00 00 ff 00

and so on until id 15, then it switches to the second SCSI chain (scsi1, id 0) and so on
until id 15, then:
<4>attached scsi disk sda at scsi0, channel 0, id 0, lun 0
<4>scsi : aborting command due to timeout : pid 0, scsi0, channel 0, id 0, lun0 
0x00 00 00 00 00 00
<6>(scsi0:0:0:0) SCSISIGI 0xa4, SEQADDR 0x162, SSTAT0 0x0, SSTAT1 0x2
<6>(scsi0:0:0:0) SG_CACHEPTR 0x2, SSTAT2 0x0, STCNT 0x0
<4>SCSI host 0 abort (pid 0) timed out - resetting
<4>SCSI bus is being reset for host 0 channel 0.
<4>SCSI host 0 abort (pid 0) timed out - resetting
<4>SCSI bus is being reset for host 0 channel 0.
<4>(scsi0:0:0:0) Device reset, Message buffer in use
<4>SCSI host 0 channel 0 reset (pid 0) timed out - trying harder...

and so on.. keeps looping.

Machine hardware is Dual Pentium III 600E with 256megs of Ram
Onboard Adaptec AIC-7896 Scsi Bios v2.20S1B1
Scsi Drive - Western Digital WDE18300 Ultra2-LVD (ch a, scsi id 0)

Let me know if you need more information.
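
For anyone comparing their own VT4 output against the paste above, here is a minimal sketch (Python, not part of the original report) that lists which host/channel/id/lun combinations appear in "aborting command due to timeout" messages. The function name and the regular expression are illustrative assumptions; the pattern tolerates the line wrap after "lun" seen in the paste.

```python
import re

# Matches e.g. "aborting command due to timeout : pid 0, scsi0, channel 0, id 1, lun 0".
# The \s* after "lun" tolerates both the wrapped lines and the "lun0" form in the paste.
TIMEOUT_RE = re.compile(
    r"aborting command due to timeout : pid \d+, "
    r"scsi(\d+), channel (\d+), id (\d+), lun\s*(\d+)"
)

def timed_out_targets(log_text):
    """Return a sorted list of unique (host, channel, id, lun) tuples."""
    found = {
        tuple(int(group) for group in match.groups())
        for match in TIMEOUT_RE.finditer(log_text)
    }
    return sorted(found)
```

If every id on a bus shows up in the result, including ids with no device attached, the timeouts are more likely a controller or interrupt problem than a single bad drive.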

Comment 6 Michael K. Johnson 2001-02-28 00:28:26 UTC

How about Bug #29304 -- is it similar to that?

Comment 7 Ray Parish 2001-02-28 18:43:55 UTC

Is it the kernel? I installed the 2.4.1 kernel RPM and don't have the problem.
I also installed the 2.4.2 kernel source and had no problem. It seems to be specific to the
kernel on the install CD.

Comment 8 Richard Hyde 2001-03-07 17:15:39 UTC

I have the same issue with a VA Linux Fullon 2230 with 2x AIC7xxx u2w
controllers and 2x u2w-LVD disks.

Comment 9 Christopher Penney 2001-03-16 14:25:10 UTC

I have the same problem with a cluster of SGI 1200L boxes.

Chris Penney
penney.edu

Comment 10 Lance A. Brown 2001-03-30 17:51:45 UTC

I am experiencing the same problem with a dual P2 400Mhz system on an Intel
440BX motherboard with an AHA-2944 SCSI card with 2 DEC DLT mini-libraries
attached.  This system runs Red Hat 6.2 Enterprise Edition fine.

Comment 11 Doug Ledford 2001-04-03 08:50:54 UTC

To the original poster, the problem sounds like IRQ routing issues, which aren't
an aic7xxx problem actually.  It makes me wonder if the SMP kernel from the
install would work when the install kernel wouldn't.

To Michael's question about bug #29304, they aren't the same.

To rhyde, the VA Linux box problem is known, and I've asked VA to give me an
accurate problem report so I could fix it, but they are planning on
using/supporting the new aic7xxx driver and don't want to invest the extra time
to help me fix mine, so it likely won't get resolved.  Use the new aic7xxx
driver instead (boot up the installer with the noprobe option, then select the
"New Adaptec ..." from the SCSI driver list).

To Chris Penney, I have no clue what hardware is in the SGI boxes, and without a
hardware list and an accurate report of how it fails I won't be able to do
anything about it.

To brown9, your system uses a 2944 card which has a totally different chipset
than the 7896 and because it's a card and not a motherboard device, it also uses
totally different IRQ routing, and since the original poster's problem sounds
like an IRQ problem, yours is likely not related.  Please open a different bug
report about your problem if it still exists in the latest ISOs and also include
a report of how the failure is manifest by getting the kernel spewage off of
vt4.

Comment 12 Need Real Name 2001-04-18 21:57:53 UTC

yep, me too

VA fullon 22xx, AIC7896 on board.

same timeouts as rparish.

This is with seawolf (and I'm assuming seawolf is the release version of 7.1).

Comment 13 Need Real Name 2001-04-19 13:20:48 UTC

I've the same problem.
Intel Lancewood mainboard (L440GX+) in an Intel 2150 case (code name Byrd).
One Seagate drive, ST39103LC, in slot one.

Fails to load the driver with Seawolf but works fine with 6.2 and 7.0. (We have about
50 servers based on this mainboard and have never had a problem before.)
Symptoms are just like the above: it just sits there resetting the SCSI bus...
I can supply BIOS versions etc. if you'd like.

Comment 14 Doug Ledford 2001-04-19 23:53:48 UTC

Let me know if the new aic7xxx driver will work in place of the default aic7xxx
driver by booting the system as linux noprobe and when asked for drivers, select
the "New Adaptec aha2940 ..." driver from the list (make sure to go all the way
down to the N's and select it from there, the other entry by the rest of the
Adaptec drivers is the old/default one that we already know doesn't work).

Comment 15 Doug Ledford 2001-04-19 23:57:31 UTC

*** Bug 29572 has been marked as a duplicate of this bug. ***

Comment 16 Doug Ledford 2001-04-20 00:03:25 UTC

*** Bug 30978 has been marked as a duplicate of this bug. ***

Comment 17 Daniel Senie 2001-04-20 02:10:12 UTC

I was one of the folks who reported one of the other bugs that got marked as a 
dupe. Also using a Lancewood (L440GX+) motherboard. Very common server 
motherboard. It's in the ISP 2150 server and the one that predated that. If you 
have a LOT of patience, it is actually possible to continue beyond the AIC7xxx 
probe issue, however I then get clobbered by a lock-up in the DAC960 driver 
with Seawolf... can't win.

For whatever reason, the AIC7xxx driver in the 2.4.x kernels has had this 
problem. Not sure what was changed from the 2.2.x kernels, but it wasn't an 
improvement. I sure wish there were a way to disable the AIC7xxx chip on the 
motherboard. Since I use a RAID card, I don't use the onboard SCSI at all...

Comment 18 Doug Ledford 2001-04-20 02:45:52 UTC

In your specific case you can start the installer with the option noprobe and
then manually load the DAC960 driver and simply skip loading any aic7xxx
driver at all.  You might also have to modify the file /etc/modules.conf to
remove the aic7xxx reference, but that will allow you to have a system that
totally ignores the aic7xxx driver.
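
The modules.conf edit described above could be sketched roughly as follows (Python, purely illustrative). It comments out any line that mentions aic7xxx rather than deleting it, so the change is easy to undo. The function name is hypothetical, and the sample line format in the comment (`alias scsi_hostadapter aic7xxx`) is an assumption about what a typical file contains, not something taken from this report.

```python
def comment_out_aic7xxx(modules_conf_text):
    """Return modules.conf text with every active aic7xxx line commented out.

    Assumed typical input line: "alias scsi_hostadapter aic7xxx".
    Lines that are already comments are left untouched.
    """
    result = []
    for line in modules_conf_text.splitlines():
        if "aic7xxx" in line and not line.lstrip().startswith("#"):
            result.append("# " + line)
        else:
            result.append(line)
    return "\n".join(result) + "\n"
```

This only rewrites the text; if other scsi_hostadapter aliases follow the removed one, they may also need renumbering by hand.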

Comment 19 Dana Canfield 2001-04-20 03:10:42 UTC

I have a L440GX+ motherboard and DAC960 as well (in a VA2240), and I have
already tried exactly what Doug suggested (skipping the aic-7xxx install), with
the result that the machine then locks up (with no error message) on the DAC960
install.  I don't know if it's relevant, but the messages just before the dac960
driver info message indicate that the DAC960 and the two on-board SCSI
controllers are sharing the same IRQ.  I've tried disabling the SCSI with the
System Setup Utility, but Linux finds the controllers anyway.

I even went so far as to rmmod all of the other modules that were loaded before
telling the installer to load the DAC960, and got the same result.

Another interesting tidbit is that I can install RedHat 7.0 on the machine, and
then upgrade to the 7.1 kernel RPM and that works just fine, so it doesn't seem
like it's really a kernel problem so much as something about the installer, as
someone else suggested above.

Comment 20 Need Real Name 2001-04-20 03:12:57 UTC

dledford:

I tried with the New adaptec driver (I've got the 440GX+ board), with 'linux
noprobe' and it gets lots of errors, still doesn't work.  I'll try to get the
errors pasted in.  they are different than before, oh yes.

Comment 21 Doug Ledford 2001-04-20 03:28:37 UTC

It seems that the 440GX+ boards are simply busted in regards to the Adaptec
SCSI.  I can't solve the problem because I can't reproduce it (I don't have any
of these products) and I've not been able to get anyone at VA or anywhere else
to actually fill me in on the details of the solution that was supposedly
found.  All I can guess is that IRQ routing or some such is busted on this board
with the install kernel, but works with the SMP kernel (which gets the IRQ
routing differently than the kernel on the install image).  You might try
passing the option pci=biosirq to the kernel at boot and see if it helps (aka,
type "linux pci=biosirq" at the install floppy's boot: prompt)

Comment 22 Need Real Name 2001-04-24 03:52:06 UTC

DaveL, I have one of these boards, and it IS weird with IRQs. There is a
two-floppy set that is used to muck with system settings. The Intel BIOS
is pretty useless at IRQ setting and all that. It ties a lot of things to
IRQs, and won't let you switch them around. Mine is ripped apart right now,
and there is a place near work where I can get these boards for $200.
Please contact me for further testing. Mine is empty save a Voodoo3 card and
3 U2 disks :^) that need to be thrashed by the penguin once again!

FWIW, I did flash to the latest Adaptec BIOS (search Intel for L440GX+),
but the machine worked with RHL 7.0. Bones on 7.1. I also have a DAC960 for
another, but that's another story!

Comment 23 Need Real Name 2001-04-24 04:08:18 UTC

DaveL, Just got it to boot Cleanly!  (though not from OS disks :^/ )
Kernel 2.4.3, JGibbs 6.1.11 patch, and Alan Cox ac13. Boots fast and clean. 
(do have all PCI cards out, but will test with that next). TTFN.

Comment 24 Jean-Luc Fontaine 2001-04-24 12:13:34 UTC

It also fails on my ISP 5120 Intel server, with both the main aic7xxx driver and
the other one aic7xxx_mod from the drivers disk.

This is becoming a real problem as I need to ship that machine real soon...

Many thanks for your efforts.

jfontain

Comment 25 HSA_NOC_Engineer 2001-04-24 14:03:30 UTC

It amazes me that there are so many of us with this issue yet no fix available. 
I guess we are the ones responsible for finding a fix huh ? 

I am running the C-440GX+ w/ The Cabrillo-C Intel Chassis. Dual Xeon 450/1024k 
CPU's. 256MB RAM and a 9.1 gig 10krpm IBM fast SCSI HDD.

Having the issue with the RH71 and LM80 installs. I have installed both LM72
and RH70 with no errors, so I too must allude to the fact that it is probably a
kernel-related issue. I will try upgrading the 2.2 kernel to a 2.4 one via RPM
tonight to see if that will work.

Comment 26 Doug Ledford 2001-04-24 15:45:44 UTC

We've isolated where the problem comes from (it is IRQ routing issues, it is
only on kernels that are not SMP and do not have UP-IOAPIC enabled, which
basically means that the PCI BIOS IRQ routing table is hosed while the MP IRQ
table mapping is OK, which is likely a bug/deficiency in the motherboard BIOS). 
Fixing it will likely require an entirely new boot disk with UP-IOAPIC support
enabled (which also means all the modules have to be rebuilt and will result in
new driver disks as well :-(
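
To check whether a given kernel build falls into the failing category described above, one could inspect its configuration file. The sketch below (Python, illustrative only) reports whether either SMP or uniprocessor IO-APIC support is enabled. The option names `CONFIG_SMP` and `CONFIG_X86_UP_IOAPIC` are my assumption of how these settings are spelled in a 2.4-era x86 `.config`; the function names are hypothetical.

```python
def enabled_options(config_text):
    """Collect the names of options set to 'y' in kernel .config-style text."""
    enabled = set()
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks and comments such as "# CONFIG_SMP is not set".
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value.strip() == "y":
            enabled.add(name.strip())
    return enabled

def uses_mp_irq_table(config_text):
    """True if the kernel is SMP or has UP IO-APIC support (assumed option names)."""
    opts = enabled_options(config_text)
    return "CONFIG_SMP" in opts or "CONFIG_X86_UP_IOAPIC" in opts
```

A result of False would correspond to the install-kernel case that relies on the PCI BIOS IRQ routing table.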

Comment 27 Doug Ledford 2001-04-24 16:01:39 UTC

*** Bug 31936 has been marked as a duplicate of this bug. ***

Comment 28 HSA_NOC_Engineer 2001-04-24 16:12:27 UTC

So does this mean that this is going to be an issue that we won't see fixed for 
a while ?  Could back-revving my bios to a later version have an effect on the 
IRQ routing algo ?

I have been thru 3 diff. bios ver's with no help. Obviously there is something 
in the bios that is common thru all versions. ie. the way IRQ's are routed. I 
may try to remove a CPU tonight and install on a single CPU system. 

Would this have any effect since the problems seem to be abundant from SMP 
systems ?

Comment 29 Need Real Name 2001-04-25 05:00:30 UTC

Hey All,

Is this an issue of IRQ's being shared? I was able to get a clean boot
by compiling a 2.4.3 with JGibbs 6.1.11 plus ac13 last night. It found all
the devices, only barfing (appropriately) when looking for /dev/sda6 (per
the machine it was compiled on).  Was anyone able to get it booted with 
the Adaptec_old driver?  I have flashed to latest Intel Bios, is this a good
thing? The Boot floppies from Intel do permit some IRQ adjustment, but it seems
to clump IRQ, even though it should have IRQ 16-21?  We'll have to thank Intel
for making the latest BIOS work with Win2K... Did that break something in RH
land?  Happy to back flash if that would be useful.

Is it possible to compile a module on another machine and hack up the bootnet.img
file to swap aic7xxx.o with a new one that might work, or does it get further
into the blood and guts of the kernel?!?  If it's just a matter of getting the
install started, I would imagine there is a workaround to get things rolling.
(Can I install with an Adaptec 2940U/UW and then swap the cable over once the
new kernel is in place?)

Thanks all!  JDW

Comment 30 Jean-Luc Fontaine 2001-04-25 16:13:19 UTC

I am running out of bad luck :-(

I copied my installation from the SCSI disk to a temporary IDE disk, booted with
the noprobe option, asked for an upgrade on /dev/hda1 and then I hit the
following bug:

Traceback (innermost last):
  File "/usr/bin/anaconda", line 520, in ?
    intf.run(todo, test = test)
  File "/var/tmp/anaconda-7.1//usr/lib/anaconda/text.py", line 1126, in run
    rc = apply (step[1](), args)
  File "/var/tmp/anaconda-7.1//usr/lib/anaconda/textw/upgrade_text.py", line 194
, in __call__
    todo.upgradeMountFilesystems (root)
  File "/var/tmp/anaconda-7.1//usr/lib/anaconda/todo.py", line 1187, in upgradeM
ountFilesystems
    allowDirty = 0)
  File "/var/tmp/anaconda-7.1//usr/lib/anaconda/upgrade.py", line 97, in mountRo
otPartition
    if not allowDirty and theFstab.hasDirtyFilesystems():
  File "/var/tmp/anaconda-7.1//usr/lib/anaconda/fstab.py", line 810, in hasDirty
Filesystems
    if self.rootOnLoop():
  File "/var/tmp/anaconda-7.1//usr/lib/anaconda/fstab.py", line 264, in rootOnLo
op
    raise ValueError, "no root device has been set"
ValueError: no root device has been set

What do you want me to do with the full trace?
BTW, I was hoping to upgrade on the IDE disk then copy back to the SCSI.
Too bad :-(

Comment 31 Jean-Luc Fontaine 2001-04-25 16:37:47 UTC

Jean-Luc wrote
> I am running out of bad luck :-(

>I copied my installation from the SCSI disk to a temporary IDE disk, booted with
>the noprobe option, asked for an upgrade on /dev/hda1 and then I hit the
>following bug:

False alarm.
I had forgotten to make my fstab point to the root partition on the IDE disk...
Sorry.
I am upgrading now!

Comment 32 HSA_NOC_Engineer 2001-04-26 15:44:30 UTC

It really amazes me that the irq routing in the bios is "busted" as ledford put 
it yet all previous versions of x86 Linux seem to work fine on this box. Even 
the newest kernel packages seem to install fine on old 2.2.x versions when 
updating to 2.4.x versions. 

What is so different that prevents us from working around this problem. ?

Comment 33 John P Rogate 2001-04-26 17:18:19 UTC

i want to add my name and comments here as well...they are also on bug 36358
which reports the exact same problem.  i am experiencing the same difficulty,
and i have to get 15 machines ready for a class on 5/14.  too many of the items
above are a bit confusing, and seem to end up with additional problems.  i have 
been installing 6.1, 6.2, and 7.0 on these machines with minimum difficulty.  
basically, the installations work fine and mostly clean.  for one set of 
machines a fix was needed in order to boot the machine (a few lines had to be 
commented out of rc.sysinit).

though i would like to get a machine up and running in any way possible, i 
think for a bunch of college students, a "kluge" install process would not be a 
good idea to start off with.

Comment 34 Need Real Name 2001-04-26 18:38:21 UTC

It seems that the older kernels (pre 2.4.2 or 2.4.3) that used Mr. Ledford's
SCSI kernel code work fine. The shift of responsibility for the aic7xxx driver to
Adaptec didn't go cleanly. I am not adept enough to comment on the feasibility
of the IRQ issue, except to say that it had worked (7.0), doesn't work now from
RHL 7.1 floppies, but DOES boot properly when you boot a custom kernel 2.4.3-ac12 (*)
(that's what I used to get past the 'loading aic7xxx' issues we all
seem to be facing). All this is independent of the IRQ issues seen elsewhere.

(FWIW, my co. put a compaq on my desk, everything was at IRQ 11.... Go figyah!)

So, what it would seem we need is an updated Boot Image floppy to 
get the install process going, (which might require a new kernel and module?).
Can we agree that this COULD be done, to get us going? I will volunteer to 
be a guinea pig, if this would help. Just dont know enough about RHL to hack
bootnet.img  (yet). Heck, I will boot 2.4.0 if needed!

    BIOS/IRQ issues don't seem like the issue here, though the cloudiness in
dealing with all the aic7xxx issues on RHL/Bugzilla might lead to that
conclusion.  The right boot floppy would seem to be the magic bullet, unless
there is something else afoot!

(*) IIRC Alan Cox's ac-6 fixed the aic7xxx by incorporating the Adaptec/Gibbs
6.1.11 patch. That may be the magic bullet here! Hope so! Who else
wants to be a guinea pig?

Comment 35 Need Real Name 2001-04-26 21:14:37 UTC

FWIW, I'm seeing the exact same problem, except with a Mylex Flashpoint 
controller using the BusLogic BT-952 driver.  Happens whether or not I use the 
noprobe option at boot.

As others have experienced, RH 7.0 installs on this box just fine.

Comment 36 HSA_NOC_Engineer 2001-04-27 15:58:14 UTC

I would love to be pointed in a direction for a workaround as mentioned above 
with the Alan Cox fix. :O)



Thanks

Comment 37 Michiel Toneman 2001-04-28 16:52:54 UTC

I've worked around this problem by:

1. first installing a basic server installation of RedHat 7.0
2. upgrading the kernel to the RH7.1 2.4.2smp one (need a few RPMS from 7.1)
3. use mkinitrd and lilo to get it to boot
4. do a 'ls *.rpm | grep -v kernel | xargs rpm -Fhv' to upgrade to 7.1 

I'm sure there's a better way, but this seems to work.

Hope this issue is resolved soon!
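
For readers unsure what the filter in step 4 selects, here is a small sketch (Python, illustrative only) of the same selection logic as `ls *.rpm | grep -v kernel`: keep every `.rpm` file whose name does not contain the string "kernel". The function name and the sample file names are hypothetical. Note that the substring match also skips packages such as kernel headers or kernel source, not only the kernel itself.

```python
def non_kernel_rpms(filenames):
    """Return the sorted .rpm names that do not contain 'kernel'.

    Mirrors the shell pipeline `ls *.rpm | grep -v kernel`.
    """
    return sorted(
        name for name in filenames
        if name.endswith(".rpm") and "kernel" not in name
    )

# Hypothetical directory listing, for illustration only.
example = [
    "kernel-smp-2.4.2-2.i686.rpm",
    "kernel-headers-2.4.2-2.i386.rpm",
    "glibc-2.2.2-10.i386.rpm",
    "README",
]
print(non_kernel_rpms(example))
```

The resulting list is what would be passed on to `rpm -Fhv`, which freshens only packages that are already installed.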

Comment 38 John P Rogate 2001-04-30 18:55:54 UTC

It is great that there are many attempts to get around the issue, however, soon
i will be teaching a Linux Admin course, and to give a bunch of "traditional"
students a complex series of workarounds is not a good start.  The lab is
composed of 15 server class machines (gateways) that will all have the same
problem with a clean install.  unfortunately, we are a small school and just
cannot go out and buy new servers to have a clean install.  my feeling is that
the install was clean with 6.1, 6.2, and 7.0 ... what has happened to make it
otherwise?  unfortunately i do not know all of the intricacies of what is
involved in making a new boot image, but it cannot be that tough.

would like to hear from red hat as to the status on resolving the issue.

Comment 39 michaeltodd 2001-04-30 18:56:45 UTC

This will probably just show my ignorance, but could a solution be as 
simple as creating a boot floppy from a 7.0 system, booting off the floppy, 
mounting and cd'ing to the cd-rom, and then launching the installer?

Comment 40 John P Rogate 2001-04-30 19:00:01 UTC

that was the second thing i tried ... it seemed to work for a while, and then 
when trying to boot .... looped on trying to get access to the disks.  that 
would have been sweet (for a while) if it had worked...  also on my machine 
when i tried that, the gui install was unreadable...

Comment 41 Need Real Name 2001-05-01 01:26:20 UTC

Is there more than swapping kernels or modules to tweak a bootnet.img
into doing the right thing? Seems the recent kernel with the Gibbs patch
might boot the system past the module load. That is what we need. Just a 
direct workaround, if there is such a thing!  Lets have it, or a comment on 
a way for us to create one for ourselves.  The RedHatters have been strangely 
quiet on this. Have found a few links on hacking bootnet.img.
 Here is something close maybe from an older version:

 http://www.van-dijk.net/mailarchive/cpqlin0010/0006.html

We don't want to force Mr/Ms Rogate to install RHL 6.2!

Dave L, what can we do? It's been a week, and the consensus seems to be that
it's not an IRQ issue so much as a matter of the Gibbs/ac-6+ patch. A little feedback might
get us working feverishly.

Best, JDW

Comment 42 Russell Cattelan 2001-05-01 07:37:57 UTC

I've been biting my tongue on this one for a while now, since
this stuff wasn't released until today.

The XFS team at SGI has modified anaconda to allow for XFS installs.

Part of that modification is building BOOT kernels that are XFS capable 
as such the kernels on the floppy images and the CD for the XFS installer 
have APIC and IO-APIC enabled which fixes the hang on the 1200's

Since I was generating installer images often enough this really wasn't any
extra hassle. 

The only difference in using the XFS installer vs the standard 7.1 installer is 
the kernels installed will be XFS capable. The rest of rpms are taken from 
the standard 7.1 CDs.

The file system type will default to XFS, but that can be deselected and
the file systems created will revert to ext2.

The standard 7.1 kernels may even be force upgraded once the system is up
and running.

This should be a lot easier than trying to upgrade from a 6.2 or 7.0 install.

The images will be available 9am CST Tuesday May 1.

ftp://oss.sgi.com/projects/xfs/download/Release-1.0/iso


-Russell Cattelan

Comment 43 michaeltodd 2001-05-01 19:04:07 UTC

The silence from Red Hat on this, not even to just give us a time frame to 
expect a response, is pretty rude. I bought the boxed version, as I have for 
every version , and find this kind of typical corporate blow-off somewhat 
disenchanting. My CDR is dead, which is another reason I bought the 7.1 
in a box! So the SGI XFS ISO won't help me.

That said, I need to move and get this installed. I have blank HDs I want to 
install 7.0 and then upgrade to 7.1 from. What's the most accurate way to 
do this? Are there any lists of new RPMs that I might miss if I just 
freshened all currently installed RPMs from the 7.0 install?


TIA!

Comment 44 HSA_NOC_Engineer 2001-05-01 19:40:27 UTC

I too am getting rather disenchanted by the lack of help Redhat has offered to 
this point. Although they quickly identified the possible causes to the 
problem, they have yet to provide a workaround. 

It would seem that it should be as simple as reverting some of the packages 
back to their 7.0 format, although it is never this simple, it appears; on the 
surface; that little attention is being paid to this problem.


I still love ya RH, you're just burning my a$$ ri8 now.



Thanks.

Comment 45 Erik Troan 2001-05-01 20:17:14 UTC

Sorry for the silence; we are *actively* looking at this bug. It's looking like
a problem with the pci initialization code, but we're not sure quite yet...

Comment 46 Need Real Name 2001-05-02 03:16:01 UTC

OK!
Downloaded:

wget -rm ftp://ftp.thebarn.com/SGI/RH7.1-SGI-XFS-1.0
... get coffee ... 
Booted from bootnet.img, used 'linux dd' 
added driver.img when prodded
Formatted XFS partitions
-- proceded as usual ---
Results:
[root@radial /root]# more /etc/mtab  | grep xfs
/dev/scsi/host0/bus0/target0/lun0/part3 / xfs rw 0 0
/dev/scsi/host0/bus0/target0/lun0/part1 /boot xfs rw 0 0
/dev/scsi/host0/bus0/target0/lun0/part7 /cd-image xfs rw 0 0
/dev/scsi/host0/bus0/target0/lun0/part5 /usr2 xfs rw 0 0
/dev/scsi/host0/bus0/target0/lun0/part6 /usr3 xfs rw 0 0
[root@radial /root]# lspci 
00:00.0 Host bridge: Intel Corporation 440GX - 82443GX Host bridge
00:01.0 PCI bridge: Intel Corporation 440GX - 82443GX AGP bridge
00:0c.0 SCSI storage controller: Adaptec 7896
00:0c.1 SCSI storage controller: Adaptec 7896
00:0e.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
00:12.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:12.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:12.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:12.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:14.0 VGA compatible controller: Cirrus Logic GD 5480 (rev 23)
01:0f.0 PCI bridge: Digital Equipment Corporation DECchip 21150 (rev 06)
02:07.0 VGA compatible controller: 3Dfx Interactive, Inc. Voodoo 3 (rev 01)
[root@radial /root]# cat /etc/issue
Red Hat Linux release 7.1 (Seawolf)
Kernel 2.4.2-SGI_XFS_1.0smp on a 2-processor i686
[root@radial /root]#

OK... Worth the wait! XFS and drives at 80 Mbps.... This _is_ going to be fun!
Buy 1000 Shares RHAT and boatload of SGI as well.

Next ... will it work with My DAC960 w/ 53C875 controller!??!?!?  Zoom!

Thanks Guys!

Comment 47 Doug Ledford 2001-05-02 18:53:42 UTC

This has indeed turned out to be a PCI issue, and the whole aic7xxx problems
(and standard kernels) are red herrings.  The problem is that any 2.4 kernel,
without having either SMP or UP-IOAPIC compiled into the kernel will fail.  The
problem is not in the drivers that get affected, the problem is in the core PCI
code.  Fixing the core PCI code is the needed solution (somehow, and we haven't
isolated it yet, when the core PCI code is confirming/reassigning resources for
hot plug PCI reasons, it is changing the IRQ on the card but failing to change
it in the card's PCI_IRQ register and also failing to change it in the card's
pdev structures, which results in the driver trying to attach to the wrong IRQ
and causing the problems people have seen here).  When a new boot disk is ready
to solve this problem, we will post its location here.
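
The failure mode described above (the interrupt is moved, but the recorded IRQ is not updated, so the driver attaches to the wrong line) can be illustrated with a toy comparison. This is a sketch in Python under stated assumptions, not kernel code: the function name is hypothetical, the device addresses are the two 7896 functions from the lspci output earlier in this report, and the IRQ numbers are invented for illustration.

```python
def irq_mismatches(recorded, routed):
    """Return {device: (recorded_irq, routed_irq)} where the two disagree.

    recorded: IRQ the driver reads for each device (what it will attach to).
    routed:   IRQ the device's interrupts are actually delivered on.
    Devices missing from either mapping are ignored.
    """
    return {
        dev: (recorded[dev], routed[dev])
        for dev in recorded
        if dev in routed and recorded[dev] != routed[dev]
    }

# Invented numbers: the driver still sees IRQ 11, the hardware now signals IRQ 10.
recorded = {"00:0c.0": 11, "00:0c.1": 11}
routed = {"00:0c.0": 10, "00:0c.1": 11}
print(irq_mismatches(recorded, routed))
```

Any device that appears in the result would never see its interrupts, which matches the command timeouts and endless bus resets reported in this bug.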

Comment 48 Doug Ledford 2001-05-02 18:55:35 UTC

*** Bug 38401 has been marked as a duplicate of this bug. ***

Comment 49 Need Real Name 2001-05-02 19:04:47 UTC

DLedford and all: well, the kernel that came with SGI's XFS add-on did boot my
machine properly. I don't know about the PCI/IRQ issues, but my box is now
running RHL 7.1 with SGI/XFS just fine (well, one day in, anyway).  It might be
worth taking a look at that, and perhaps contacting the folks who made the
boot image for it.  SGI deserves at least a tip of the cap for getting
my Red Hat 7.1 installed, no?  JDW

Comment 50 Doug Ledford 2001-05-02 19:55:05 UTC

In the comments from cattelan on 2001-05-01 03:37:57 he says:

Part of that modification is building BOOT kernels that are XFS capable as such
the kernels on the floppy images and the CD for the XFS installer have APIC and
IO-APIC enabled which fixes the hang on the 1200's

So, obviously, yes he is also acknowledging that the problem is not the aic7xxx,
but the IRQ routing.

As to why we don't have discs out to do the same thing yet.  Simple, we have to
make our new boot disk work with existing CDs, which is *much* harder to do
because the kernel wants to change its version symbols with the addition of
IO-APIC support, but all the modules that are on the CD and that the installer
wants to load into the kernel have the symbol versions for the IO-APICless
version of the kernel.  If we just put an IO-APIC kernel on a disk, none of the
modules would load and the disk would be useless.  We have to go in and munge
all of the symbols for the kernel in order to make it work, and that means we
have to do extra testing.
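
The module-loading problem described above can be modeled in a few lines. This is a toy sketch in Python, assuming that a versioned symbol is simply the symbol name plus a checksum suffix and that a module loads only if every symbol it imports is exported by the kernel under exactly the same versioned name. The function name, the suffix format, and all checksum values are invented for illustration.

```python
def unresolved_symbols(kernel_exports, module_imports):
    """Return the sorted versioned symbols a module needs that the kernel lacks."""
    exported = set(kernel_exports)
    return sorted(sym for sym in module_imports if sym not in exported)

# Invented checksums: the same functions get different versioned names
# once the kernel is rebuilt with IO-APIC support.
install_cd_kernel = {"printk_R11111111", "request_irq_R22222222"}
ioapic_kernel = {"printk_Raaaaaaaa", "request_irq_Rbbbbbbbb"}
module_on_cd = ["printk_R11111111", "request_irq_R22222222"]

print(unresolved_symbols(install_cd_kernel, module_on_cd))  # nothing missing
print(unresolved_symbols(ioapic_kernel, module_on_cd))      # everything missing
```

In this model, making the new boot disk work with existing CDs means getting the rebuilt kernel to export the old versioned names, which is the symbol munging mentioned above.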

Comment 51 Need Real Name 2001-05-02 21:36:47 UTC

Well, I just tried using SGI's anaconda, and it still hung with the SCSI bus 
resets on the BusLogic driver.....  :(

Comment 52 HSA_NOC_Engineer 2001-05-03 14:06:51 UTC

Awesome. It is comforting to see that we are at least making some progress. 
Love the work ethic over there guys. Keep it up. 


BD

Comment 53 John P Rogate 2001-05-03 14:35:19 UTC

it is great that many are trying to get around the issue, which i am finding is
going to be a little more extensive (customer wise) as more folks migrate to
7.1.  The attempts by everyone to resolve the issue are outstanding....however,
it is not the way software should be installed.  My goal as a professor at a
small college in vermont is to assist students in their learning process to
create a stable and supportable IT environment.  Extraordinary efforts to get
around "what should be" are not what needs to be taught to students, especially if
it is my goal to assist in the proliferation of linux/unix in the business
environment.  this type of effort in order to get a working system is not
acceptable.  once again, it needs to work when shipped from the vendor.  in all
my years of working as an engineer with Digital Equipment Corporation, and then
in my own "reseller" / consulting business, this issue seems to be a little out
of the ordinary.  red hat needs to resolve this issue (and prevent future ones
like this) if its stronghold in the market is to continue.  my
students are also key to their success as well

thanks...but i am getting a little frustrated, and i am a real big supporter of 
the red hat product.

Comment 54 Need Real Name 2001-05-03 15:30:14 UTC

All: Sure it's frustrating. Red Hat makes its contribution by making Linux
usable by the greater majority. In the lab, frequent releases are the norm. In
the commercial environment, oversights like this are not the way it should be.
(However, release-early, release-often SHOULD be part of the kernel
development process.) Red Hat does add (significant) value by protecting me from
the incompatibilities brought by the dynamics of the open source world. That is
what I pay for, and why I believe RHAT is durable. Yes, Red Hat dropped the
RedBall. Part of the learning I have continued to experience is by active
participation in this sort of forum. Though our assertions are not always on
target, it's good to see and hear the development process in action, to be some
part of it.  Debugging is an integral part of software development, and though
I don't write code, I am eager to put in observations.  If it requires banging two
bricks together, I can contribute too! But... RHat should have caught this.
This one's deep! JDW

Comment 55 Daniel Senie 2001-05-03 22:20:01 UTC

Based on a variety of comments in another bug, covering the DAC960, several 
folks have been able to install on L440GX+ motherboards with DAC960s, but 
that's not the end of the story.

After bypassing the aic7xxx probe during the install (which allows the DAC960 
to work fine) the kernel that gets installed again probes for the aic7xxx 
devices, and again falls over dead. I don't know if the kernel that got 
installed on the RAID array was the SMP one or not. It should have been, but 
who knows.

I also saw a number of error messages stream by about unknown bridge devices 
and such before the machine locked up, so it does indeed look like a PCI issue.

I've got a test system here which easily reproduces the issue. It's not a 
machine I can ship out (it serves as the spare parts for a number of other 
servers in my colo), but I am happy to run tests on it. Note that a new boot 
image is really not enough, though, since somehow I'd also need a new kernel on 
the freshly-installed system, one that has the fixes. Otherwise I can (as 
above) do the install, but not boot the result!

Comment 56 Need Real Name 2001-05-04 01:18:47 UTC

So, I'm ready to go ahead and work around this, but I've never been able to
build my own install disks.  Where do the instructions for building my own
installation disks live?

Comment 57 John Messina 2001-05-04 14:40:51 UTC

OK.  Based on Mr. Ledford's latest comments, is this a RedHat-specific problem, or a problem with the 2.4.x kernel in general?

Comment 58 HSA_NOC_Engineer 2001-05-04 18:39:47 UTC

I am having the exact same problem with Mandrake 8.0 with the new 2.4.x kernel. 

The two appear to be directly related. Hopefully whatever fix RH finds 
will be applicable to Mandrake 8. Maybe we should bark up their tree and see 
what they have in the works.

Comment 59 John P Rogate 2001-05-05 12:35:51 UTC

I think at this point in time, the fix will have to come from Red Hat, as the 
workarounds are "not working".

Hopefully a fix is arriving soon?  Status?

Above, it was mentioned by an RH engineer that the image would have to work 
with the CDs in distribution.  Are there also any RH thoughts about updating 
the distribution so it all works fine in the future?

Finally (directed to RH), what is the priority assigned to this issue?  I 
really do need to get some clear idea on resolution... a week?  A month?  If 
this is assigned, then there must be some update on the status...

Comment 60 Daniel Senie 2001-05-05 13:07:28 UTC

What disappoints me most is that this issue was seen in the Wolverine beta 
test, and fixes which were made were not adequately tested. The beta testers 
should have been asked to test with new CD and boot images to be sure the 
problems were really solved. I, for one, offered to run such tests.

Most systems we deploy use the L440GX+ motherboard. A seemingly large 
percentage of the rack server platforms deployed use this motherboard, which 
has proven stable and reliable. Testing with this motherboard is essential. 
RedHat 6.1, 6.2 and 7.0 work quite reliably on this platform, using the 2.2.x 
kernels. It has become clear that in rewriting code for the 2.4 kernels, some 
knowledge and capability has been lost or damaged. This is an issue for the 
Linux community at large, not for RedHat per se. However, as a packager and 
provider of support, RedHat bears the responsibility for ensuring software it 
releases is stable. If the 2.4.x kernels are not ready, then 7.1 should not 
have been shipped. It makes one wonder if release timing is dictated by 
engineering readiness or by a desire/need to increase revenues.

My company has decided to cease all use and recommendation of RedHat 7.1. We 
continue to use 7.0 with patches, and expect to do so for at least the next 
several months while watching to see if 7.1 stabilizes.

Comment 61 John P Rogate 2001-05-05 13:12:02 UTC

The issue is that 7.1 is loading fine for the desktop folks.  I loaded it at 
home and it works like a charm.

But the key to the success of widespread implementation of Red Hat in the 
commercial "space" is that it functions properly on servers, particularly with 
a popular motherboard.  It has to be resolved soon, or I think there will be a 
setback in the commercial arena.

Comment 62 Need Real Name 2001-05-05 14:56:14 UTC

Similar problem, except RH 7.1 hangs for me when copying boot image to disk, 
just before packages begin to install.  My config is Supermicro P6-DLS with 
BIOS 1.33, (2) PII-333, 512 MB RAM (Kingston PC100 ValueRam CL2), (1) 4.3 GB UW 
SCSI at ID0, and (2) 18.2 GB Western Digital Enterprise Ultra2 drives at ID1 
and ID2.  CD-Rom is IDE at 0:0.  Video card is ATI Rage 128 32 MB AGP card.  
SCSI drives are attached to on board AIC-7880 controller.  No sound card, and 
I've even gone through and disabled parallel and com ports.

I have 2 of these, same config, both have run every "out-of-the-box" 
distribution of RedHat since 6.2.  I've even successfully installed Wolverine 
on both, so it doesn't seem to be a generic problem with the 2.4.x kernel (I 
believe Wolverine was 2.4.1).  I've tried upgrade from 7.0, clean install, you 
name it... it locks in the same place, and VT4 shows a screen full of the same 
messages I see above:

<4>SCSI host 0 abort (pid 0) timed out - resetting
<4>SCSI bus is being reset for host 0 channel 0.

After a few, though, every other one says "Trying harder".

This is a real kick in the face.

Comment 63 Need Real Name 2001-05-05 15:25:27 UTC

I have a Premio GX (440BX/ZX chipset, AMI BIOS v1.9) motherboard with dual PIII 450MHz, 256MB RAM, Adaptec U2W LVD/SE AIC-7890AB SCSI (BIOS v2.00.0) and a Seagate ST3917LW. We (we're system integrators: SCO Unix, NT and RH Linux) had used this platform for NT and SCO very successfully in the past and could boot OK in RH 6.0, 6.1 and 6.2 in single-processor mode but not SMP. 'Got the identical errors as reported by rparish on 2/27/01 when going to SMP.  I attributed this to the SMP problems of the earlier RH releases and ignored it at the time.  However, when RH 7.1 wouldn't work with dual processors I thought it was an Adaptec problem, but not so!  It's a motherboard chipset BIOS problem. After I upgraded from BIOS v1.3 to 1.9 the system worked like a champ in SMP.  The last fix added support for M$ W2K, Dawi Control 2976UW boot from SCSI function and support for Diamond Viper 770 card, and others I presume.  You can't see this in RH, but if you set up a W98 system on an IDE drive and look at the interrupts before and after you upgrade the BIOS, you can see that it shifted the AGP slot to a different interrupt rather than sharing one with the Adaptec card. In addition, after you get up, when you cat /proc/interrupts they all say IO-APIC, level or edge, except 2, the cascade, which is XT-PIC. Again, this is not a RedHat or Adaptec problem.
'Hope this helps.
Bob Jones
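
For anyone who wants to check the same thing on their own box, here is a minimal Python sketch of reading /proc/interrupts for the controller type and for IRQ sharing. It assumes the usual 2.4-style column layout (IRQ number, one count per CPU, controller, device names), and the sample text below is made up for illustration; it is not output from the machine described above. The function names are my own.

```python
# Illustrative sample only; not captured from any machine in this report.
SAMPLE = """\
           CPU0       CPU1
  0:     123456     123789    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
 11:       4521       4490   IO-APIC-level  aic7xxx, eth0
NMI:          0          0
"""


def parse_interrupts(text):
    """Map IRQ number -> (controller, [device names]) from /proc/interrupts text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row has one column per CPU
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI:, ERR: and similar summary rows
        fields = rest.split()
        if len(fields) <= ncpu:
            continue  # no controller/device columns on this row
        controller = fields[ncpu]
        devices = [d.strip() for d in " ".join(fields[ncpu + 1:]).split(",") if d.strip()]
        table[int(label)] = (controller, devices)
    return table


def shared_irqs(table):
    """Return only the IRQs that have more than one device attached."""
    return {irq: devs for irq, (_, devs) in table.items() if len(devs) > 1}


table = parse_interrupts(SAMPLE)
for irq, (controller, devices) in sorted(table.items()):
    print(irq, controller, ", ".join(devices))
print("shared:", shared_irqs(table))
```

On a live system you would pass in the contents of /proc/interrupts instead of SAMPLE.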

Comment 64 Need Real Name 2001-05-05 15:31:04 UTC

'Sorry about the last message without word wrap. 'Was using Opera 
and didn't realize it.
Bob Jones

Comment 65 Need Real Name 2001-05-05 15:55:59 UTC

Hello All... So it seems we have encountered an issue with the BIOS deciding 
how to map IRQ resources.  The previous message alludes to an issue beyond the
scope of RedHat.  Who in the open source world is responsible or authoritative
for addressing this issue? Can we get support from AMI, or Award, Phoenix etc.?
The benefits of being able to select IRQ resource will need to be either
fixed or worked around. Would it be easier to fix the IRQ setting on "N"
platform motherboards, or make some software layer work around the issue?
I have seen systems with all devices set to IRQ 11. Hmm, fix or work around?!?!

Comment 66 Need Real Name 2001-05-06 04:36:19 UTC

I too have an installation hang. It's a Promise Ultra66 SCSI that's causing it 
(not that it seems like that matters). I had been waiting and waiting for a 
distro to be released with a 2.4 kernel so I could finally run Linux even with 
my SCSI. But now it hangs! ARGH! Are there any workarounds yet?

Comment 67 Maurice LeBrun 2001-05-06 07:27:30 UTC

Same problem for me.  Two VALinux 2x2 FullOn 2200 rackmount servers with
Adaptec AIC7896 SCSI controllers.  Hangs after the loading aic7xxx driver
message.. can't even switch consoles at that point.  Network boot.

Comment 68 Need Real Name 2001-05-06 18:40:06 UTC

Time for another person to chime in!  I have the same problem with my Mylex 
Flashpoint LW, which uses the BusLogic driver.  I'm gonna be patient and let RH 
come up with a resolution soon.  I expect one either this week or next.

Comment 69 Doug Ledford 2001-05-07 15:08:01 UTC

For clarification's sake, it's not that the BIOS is mapping IRQs wrong, and it's
not that we have lost any functionality since the 2.2 kernel PCI code.  Quite
the opposite, it's that PCI functionality has been added since the 2.2 kernel. 
Specifically, the 2.4 kernel supports the notion of hot-plug PCI devices.  That
requires that the kernel PCI layer know about/assign all the address space and
IRQ resources to the various slots, so that if sometime after booting up someone
adds a new PCI card that has a PCI bridge chip, the PCI layer is able to assign
I/O and IRQ resources to the bridge chip itself, and those I/O and IRQ
resources have to come from the pool of resources already allocated to the bus
that the card was plugged into.  So, in order to ensure that we will always be
able to accept a hot plug card, the PCI layer in the 2.4 kernel now re-organizes
those PCI resources at boot time to make sure that it is possible to plug a card
with a PCI bridge chip into every PCI slot if the user so wishes.

In this process, the PCI code in the 2.4 kernel is evidently messing up
the IRQ assignment on the Adaptec chipset.  This *only* happens on the L440GX
motherboard from Intel as far as we can tell, and the root cause of the mistake
hasn't been isolated any further than what I've told you here.  The reason that
enabling SMP or UP-IOAPIC support solves the problem is that when IOAPIC support
is enabled (which happens automatically with SMP support, or explicitly with UP
kernels if you turn on UP-IOAPIC support), it runs *after* the PCI resource
allocation code, and it re-routes the interrupts according to the computer's MP
table.  By doing that, it undoes whatever mistake the PCI code is making, and
things work.  *THIS IS NOT A FIX*  The fact that it undoes the mistake the PCI
code makes does not make the PCI code any better!  The PCI code still needs to
be fixed.

The whole reason we don't ship our boot kernel with IO-APIC support
enabled is that there are a number of boards out there that won't boot with
IO-APIC support enabled.  They happen to mostly be older UP boards that try to
use IO-APIC interrupts on their UP system and have bad/unusual MP tables that
confuse the IO-APIC code in the kernel, so if you are only interested in a
subset of the machines that Red Hat supports (such as SGI is), then you can
ignore them and ship with IO-APIC support enabled.  We can't do that.  We need
to find a real fix.

I don't expect that in the near term though, so our
proposed workaround (which we are testing) is an IO-APIC enabled kernel that by
default doesn't use the IO-APIC, and you need to boot the kernel with the option
"apic" to make it use the IO-APIC and hence work on the L440GX motherboards. 
Once we've confirmed that this kernel *doesn't* blow up on machines that have
known bad IO-APIC MP tables, we'll build a new kernel with munged kernel symbol
versions so that it will load the modules on the CD and then release that as the
workaround.
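
As a small aid for testers, the following Python sketch shows one way to confirm that a boot option such as "apic" actually reached the kernel. It only does a whole-word match on a command-line string; the example string is hypothetical, and on a running system the text would come from /proc/cmdline. The helper name is my own, not part of any Red Hat tool.

```python
def has_boot_option(cmdline, option):
    """True if `option` appears as a whole word on the kernel command line."""
    return option in cmdline.split()


# Hypothetical command line, for illustration only.
example = "auto BOOT_IMAGE=linux ro root=/dev/sda1 apic"
print(has_boot_option(example, "apic"))
```

Matching whole words matters here because a substring test would also accept an unrelated option such as "noapic".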

Comment 71 HSA_NOC_Engineer 2001-05-07 16:17:05 UTC

I assume that by saying L440GX you are speaking of most of the 440GX-based 
Intel boards. I have the C440GX+, which is the Cabrillo system board. The 
L440GX+ was predominantly used in GW2000 servers. 

At any rate, my question is this. You mentioned that the IO-APIC related issue 
doesn't come into play in an SMP-enabled system. Why is this an issue when 
installing to an SMP system? Does the kernel initialize in SMP mode for install, 
or is that a "first boot" type of detection? The reason I ask is that I have 
been unable to install RH 7.1 or Mandrake 8.0 on this system whether using 1 or 
both CPUs.

Comment 72 Need Real Name 2001-05-07 20:17:30 UTC

Sublime-1::

It is my understanding that the RH install kernel is not SMP capable (nor does
it have IO-APIC enabled), so it doesn't matter how many CPUs we have.
scrood!

Comment 73 Michael Duncan 2001-05-07 22:41:45 UTC

I run all Intel Server Platform LB440GX+ mainboards at our ISP and would like 
to know more about how to turn on APIC when doing a fresh install with 7.1.

I have been trying to compile a new kernel on an existing 7.0 install and I am 
not very good at this.  So any info here would be helpful on how to install 
from the Seawolf CD-ROM with SMP support and APIC turned on to get around this 
bug.

Thanks for your help.

Comment 74 Johnray Fuller 2001-05-07 23:34:36 UTC

We have reports here in installation support that the Adaptec 29160 card appears
to break under 7.1...

Comment 75 HSA_NOC_Engineer 2001-05-08 15:03:53 UTC

Agreed. Since we are still a way out from having install floppies with the fix 
applied, is there a step-by-step workaround that we could get posted to follow?

Comment 76 michaeltodd 2001-05-08 16:45:24 UTC

First off, Red Hat, PLEASE give us a timeframe for a more official 
work-around. Hours, days, weeks, eons?

Everyone else, I've found two work-arounds that work for my situation, a 
VA Linux 2230 FullOn server with the L440GX server board with dual 
CPUs and dual identical SCSI HDs.

The first work-around is to use the SGI XFS boot.img disk, but put the 
original 7.1 CD1 in the drive. The system will start up, find the Adaptec 
7xxx drives, and then start anaconda off the CD. From there you can do a 
normal 7.1 install, UNLESS you want to create or upgrade software 
RAIDed drives. In these cases anaconda dies as soon as it tries to format 
or mount the drives. Otherwise, the install goes fine.

The same may or may not be true for hardware RAIDs.

If, like me, you have or want to use software RAID, you have to use the 
work-around described much above for manually upgrading from 7.0 to 
7.1. I used Google to get a cached list of 7.0 packages, and then 
compared it to the list of 7.1 packages to look for new packages I would 
need to install specifically (instead of just a mass freshen), and to find 
deprecated packages I didn't want on my system. Done carefully, this can be 
a pretty clean install.
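
The package-list comparison can be sketched in a few lines of Python. This is only an illustration of the idea, assuming you have already saved each release's package names into a list; the names below are placeholders, not real entries from the 7.0 or 7.1 lists.

```python
def compare_package_lists(old, new):
    """Return (added, removed): names only in `new`, and names only in `old`."""
    old_set, new_set = set(old), set(new)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    return added, removed


# Placeholder names standing in for the two release lists.
packages_old = ["pkg-a", "pkg-b", "pkg-c"]
packages_new = ["pkg-a", "pkg-c", "pkg-d"]

added, removed = compare_package_lists(packages_old, packages_new)
print("install explicitly:", added)
print("consider removing:", removed)
```

The "added" list is what a plain freshen would miss, and the "removed" list is the set of candidates for deprecated packages.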

XFS Boot Image that will boot the vanilla Red Hat 7.1 CD1 installer

ftp://oss.sgi.com/projects/xfs/download/Release-1.0/RH7.1-SGI-XFS-1.0/images/boot.img


7.0 Package List

http://www.google.com/search?q=cache:www.redhat.com/products/software/linux/pl_rhl7.html+Red+hat+7.0+package+List&hl=en

7.1 Package List

http://www.redhat.com/products/software/linux/pl_rhl.html

Comment 77 Need Real Name 2001-05-08 18:12:41 UTC

OK, that's the best info I have heard so far.
I just want to be clear.
If I make a boot floppy with the XFS image above, I can put the Seawolf CDs in 
the CD drive and do a normal install that way.
And as long as I am not trying to run some type of RAID I will be OK.
Will I be able to format my old SCSI drives as individual drives?

Comment 78 Phil Oester 2001-05-08 19:17:32 UTC

This didn't work for me on a VALinux 1100 (Symbios SCSI).  Came up and said it 
couldn't detect any drives.

Comment 79 michaeltodd 2001-05-08 19:25:13 UTC

Like I said, this worked for my set-up. As Red Hat said, SGI enabled some 
kernel settings that may fix things for a few boxes, and everyone else be 
damned.

YMMV. In fact, it may cause more harm than good.

Comment 80 benb 2001-05-08 19:58:36 UTC

I'm having the same problem here, using a Dell Dimension XPS H266 which has an
Intel 440 FX (not GX) based motherboard and Adaptec AHA 2940 UW controller.

Could anyone (Doug?) from RedHat comment on the ramifications of turning on
UP-IOAPIC in the kernel? I'm interested in compiling my own 2.4.4 kernel with
UP-IOAPIC support enabled as a work-around, but since I don't know what this is,
I don't know what the effect of enabling it would be, other than hopefully
allowing me to boot!

Ben

Comment 81 Need Real Name 2001-05-08 21:19:43 UTC

ok everything worked except the following
It couldn't create a boot floppy (could be user error)
It didn't find the on board ether card. (gonna check that out when I get back to
the machine this evening)

I have a 440 dual proc board with a 7968 scsi onboard.

Comment 82 michaeltodd 2001-05-08 21:29:19 UTC

I told the installer not to make a boot disk, and just used /sbin/mkbootdisk  
after booting up.

You might be chasing your tail since it's just a hack if it works at all, but 
you might need to use the drivers disk and maybe the noprobe option, 
'linux dd' or 'linux dd noprobe'.

Someone who actually understands what's going on may be able to tell 
how to recompile your kernel to add the driver for your card manually.

Comment 83 John Messina 2001-05-09 11:47:42 UTC

FWIW, I was able to install SuSE 7.1 and boot into the system using the 2.4 kernel on my system.  This bug has me stopped cold with a RedHat 
installation.

Comment 84 Need Real Name 2001-05-09 18:47:18 UTC

I have an SGI 1200 with Adaptec 7856 and a Dell GX1P with Adaptec 2940UW.
The SGI hangs during the aic7xxx load and the Dell dies during the software load.
Did anybody find a solution to fix these problems?

Comment 85 John P Rogate 2001-05-09 18:54:35 UTC

These bug related comments are getting lengthy, and to no avail.  Any of the 
workarounds mentioned will most likely not solve the problem.  If the goal here 
is to get started with 7.1, then it loads fine on a spare desktop.  If it is to 
get a production server up and running, then any complicated workaround should 
not be acceptable.

Red Hat needs to provide a fix/boot image.  The product needs to work correctly 
and install correctly (out of the box).

Red Hat... what is the status of getting this resolved?

Comment 86 Need Real Name 2001-05-10 05:34:50 UTC

We too have several L440GX Lancewood-based motherboards with Adaptec AIC7896 
adapters with SCSI disks connected. I have tried the SGI XFS bootnet.img disk 
to no avail: the aic7xxx driver loads but no disks are found; in fact it 
doesn't even seem to do any kind of SCSI probe. The next thing I'm going to try 
is to build a new bootnet.img with a new 2.4 kernel on it that is SMP enabled, 
since there are posts above that say if a 2.4 SMP kernel is used the 
Adaptec problem is masked (not fixed, but masked). Worth a shot in my opinion. 
I too agree with the last poster: RedHat, please give us some info on your 
attempts to fix this. I also can't believe RH 7.1 was released with this 
problem; so many servers use the aic7xxx driver.

- Mark

Comment 87 Need Real Name 2001-05-10 07:32:29 UTC

OK, so that system is dead in the water.
My options are to buy a new motherboard or revert to 7.0,
which has its own problems. The first time since 5.2 that I don't buy 
the most feature-rich version of Red Hat, and this pops up.
One of the most common server motherboards doesn't even make it to the install 
splash screen. I am glad I didn't waste the money this time. But I did waste 
the four to six hours of time downloading the ISOs. In the past I have paid for 
but never used the 30 to 90 day support from RedHat. 

So I have a few questions for RedHat.

1. Had I gone out and spent money buying Redhat 7.1, what would your answer to 
me be about this problem?
2. Can we please have a progress update?

At this point most of us are just plain frustrated. I apologize for my mood. 
You (RedHat) aren't making my case to get rid of some of the other OSes in the 
shop any easier with this problem. 7.1 and the 2.4 kernel solved a lot of weird 
stuff for me on my other dual-proc systems. I had hoped to put it on all my 
systems as I don't like to support multiple levels of OSes.

Comment 88 Need Real Name 2001-05-10 16:51:10 UTC

Just to update from my comment above, I took the same 2 machines, disabled on-
board AIC-7880 and added Mylex DAC960P's, single boot drive on one channel and 
other 2 drives raid 0 on the other.  installs, boots, and runs like a champ on 
both machines.  Been running stable for 5 days now, and both installs went 
without a hitch.  I repeated the install twice on both machines just to make 
sure it wasn't a fluke.

All signs point to AIC-7XXX module on 7.1 boot image, and NOT IRQ routing issue 
or bios problem.

Comment 89 Doug Ledford 2001-05-10 19:28:34 UTC

I'm attempting to add two people to the Cc: list that would be more likely able
to give a status update and expected time of fix.

Comment 90 Need Real Name 2001-05-11 18:41:39 UTC

Thanx for the update info.

Comment 91 Need Real Name 2001-05-11 20:15:49 UTC

FWIW;


THIS IS A WORK AROUND for those who DO NOT rely on the embedded scsi controller.

I have the fortune to have multiple machines for testing... 

I have gotten rh7.1 to work, but only with a newer motherboard, and 
off board MegaRaid card and on board EIDE.

We do not rely on the embedded SCSI, so this fix is applicable to my situations.

Compile-time options were NO support for aic7xxx, and low-level SCSI support 
for just the MegaRaid, network block devices and EIDE block support.



I noticed a version difference in the controller causing the problems.  Both machines are identical hardware,
aside from the embedded controller's BIOS level.

Machine 1: (WORKS) kernel 2.4.3smp installed over 2.2.16smp
==========

scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.5      <----------- diff
        <Adaptec aic7896/97 Ultra2 SCSI adapter>
        aic7896/97: Wide Channel A, SCSI Id=7, 32/255 SCBs


Machine 2: (Failed) kernel 2.4.3smp (worked) 2.2.16smp
==========


scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.30/3.2.4 <------- diff
       <Adaptec AIC-7896/7 Ultra2 SCSI host adapter>


Work-Around:
	Installed rh 7.0 out of box
	installed kernel source for 2.4.3
	Compiled with NO aic-7xxx support
	Compiled with Lowlevel AMI MegaRaid SCSI support, NFS block device, EIDE block device.
	installed 2.4.3 kernel. 

hickory dfox] uname -a
Linux hickory.networkmcs.com 2.4.3 #13 SMP Fri May 11 14:28:13 EDT 2001 i686 unknown


I hope this helps.....

Comment 92 Doug Ledford 2001-05-11 23:11:30 UTC

Status update: we are going to try and have a new boot disk image set available
for download (at least as a test version) by Monday.

Comment 93 Doug Ledford 2001-05-13 20:37:47 UTC

I am currently uploading a workaround boot disk set.  They should be available
in a few hours (3 1.44 MByte images over a 56K modem link, so it will take a
while).  However, they will appear as:

http://people.redhat.com/dledford/440gx/boot.img
http://people.redhat.com/dledford/440gx/bootnet.img
http://people.redhat.com/dledford/440gx/pcmcia.img

This should allow people to install in 440gx based motherboards.  The
installation instructions are pretty simple.

1)  Using the appropriate boot image diskette for your system, when you boot the
kernel, use the apic command line option, aka:

boot: linux apic

This will force the UP-IOAPIC code on in the boot kernel and allow the interrupt
routing to get properly fixed up.  This allows you to install to these
machines.  However, there is a second issue.

2)  If your machine only has 1 processor installed, then the SMP kernel will not
automatically get installed.  You will need to enable individual package
selection, then go into the kernel packages and select the SMP kernel.  You will
then need to make sure that when you reboot the machine, you tell it to boot the
SMP kernel.  You can make the SMP kernel the default by changing the line:

default = linux

to

default = linux-smp

in the /etc/lilo.conf file and then running the lilo command as root to
re-initialize the master boot record on your hard disks.
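
For anyone scripting this across several machines, the lilo.conf edit described above could be sketched in Python as follows. This is only a rough illustration: the sample configuration and its image paths are invented, the helper name is my own, and it only rewrites the text, so the lilo command still has to be run as root afterwards.

```python
import re

# Invented sample configuration; paths and labels are illustrative only.
SAMPLE_CONF = """\
boot=/dev/sda
prompt
timeout=50
default = linux

image=/boot/vmlinuz-up
\tlabel=linux
image=/boot/vmlinuz-smp
\tlabel=linux-smp
"""

DEFAULT_LINE = re.compile(r"^([ \t]*default[ \t]*=[ \t]*)\S+", re.MULTILINE)


def set_lilo_default(conf_text, label):
    """Return conf_text with the first 'default = ...' line pointing at `label`."""
    new_text, count = DEFAULT_LINE.subn(lambda m: m.group(1) + label, conf_text, count=1)
    if count == 0:
        raise ValueError("no 'default' line found in configuration")
    return new_text


print(set_lilo_default(SAMPLE_CONF, "linux-smp"))
```

The sketch raises an error rather than appending a new line when no default entry exists, since silently adding one could change which image boots.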

That should allow these systems to be operated as any other system.  We will
continue to look for the true cause of the PCI IRQ routing problem (we suspect
it might be a bug in the BIOS PIRQ routing table, so we will be working with
Intel to either verify or deny that suspicion).

Report any problems with the boot disks back here please.

Comment 94 Need Real Name 2001-05-14 06:03:43 UTC

boot.img and instructions work great.
Booted from that image and CDs without a hitch.
Intel L440GX+ running 14.3 BIOS.

Thank you very much.
I have a new problem with the X server and the Cirrus chips, but I think that's 
a different thing. It was there when I tried the XFS boot.

I'll let you know if anything changes.

Comment 95 Need Real Name 2001-05-14 06:47:43 UTC

The X server was acting like it couldn't make up its mind on how to place 
things or refresh the screen.
Fixed the video problem by adding 
Option "noaccel"
to the file /usr/X11R6/lib/X11/XF86Config in section "Device", after the 
commented-out videoram option.
I added the notes to bug #33095.

Comment 96 Need Real Name 2001-05-15 05:24:09 UTC

Created attachment 18350 [details]
Tried special install image--but still hangs.  Work around by not using ZIP drive

Comment 97 Matthias Haase 2001-05-15 13:44:48 UTC

On our PIII 750 server with PCI Adaptec 29169N (SCSI only, ASUS CUBX, chipset
Intel 440 I think) the upgrade from RH 7.0 to 7.1 works fine with -noprobe and
selecting the 'new experimental aic7xxx'.

With the default 'old' aic7xxx, the installer hangs too, as described. APIC
isn't used as a workaround for 're-initing' the IRQ. Single CPU. I have now
compiled the new aic7xxx module in (from the RH kernel 2.4.2 rpm source); no
problems, stable for one week.

BTW, for a clean re-compile of the kernel, I have to patch the sources first
(see my bug report ID 40123).

Comment 98 benb 2001-05-15 22:21:42 UTC

I agree that it appears to be an aic7xxx issue, not a kernel/BIOS one. I rebuilt
2.4.4-ac using the new aic7xxx (which the kernel configure help indicates is
the right one to use, not the old one), and was able to boot, compile kernels
etc. However, I've decided to stay away from 2.4 for the time being since there's
still a VM/swap bug that I ran into (processes can get killed under heavy swap
load) that has not yet been fixed.

Comment 99 Matthew Saltzman 2001-05-16 22:11:03 UTC

I have experienced the problem with a 440BX chipset and Mylex Flashpoint
controller.

Should this workaround work in my case?

Thanks.

Comment 100 Sanjiv Patel 2001-05-17 03:27:32 UTC

Doug,

What about for the rest of us who do not have 440GX chip set? When are you 
going to post a universal fix?

Sanjiv Patel

Comment 101 Doug Ledford 2001-05-17 04:21:48 UTC

If you don't have a 440GX chipset based motherboard then you have a different
problem than the problem in this bug report.  Please open a different bug report
with the specifics of your particular problem.  I have confirmation that the
disk images I posted do indeed work for people suffering from the particular
problem in this bug report, so I'm going to close this bug out.  I'm using the
resolution DEFERRED because I'm still working on the real cause of the problem,
the PCI code.  Since we do have a workaround in place now that people can use,
that shouldn't be a problem.

NOTE: Not all aic7xxx bugs are the same!!!! Please do not assume that if you see
the word aic7xxx in the bug report and you are having some sort of aic7xxx
problem that you automatically have the same problem as the one listed in the
bug report!!!!  You have to read the bug report CLOSELY to see exactly *HOW* the
problem in the bug report is detailed and then see if that actually matches your
problem.  In many cases, it does not!!  For instance, it is a totally different
issue when a machine has non-stop SCSI bus resets that start as soon as the
aic7xxx driver is loaded which never stop and never let the system do anything
(like is the case with the 440GX motherboards in this bug report) as compared to
when a machine is able to load the aic7xxx driver and seems to be OK until you
put it under some sort of load (such as copying the install image to the hard
disk) and only under load does it give resets and timeouts.  Seemingly minor
differences such as this make a huge difference in diagnosing the problem and
determining exactly *where* the problem is actually coming from.

Comment 102 Nam Nam 2001-05-17 10:46:31 UTC

I used the boot disks dledford provided. The installation process went smoothly.
But after reboot, the system still hangs. I suspect the kernel that comes with
the Redhat CD-ROM has been used, not the one on the floppy. What should I do? I
have never upgraded a kernel, so I need more detailed help. 

Thanks
Nam

Comment 103 Joshua Douglas 2001-09-06 03:09:28 UTC

Has there been any comment from Intel, or have you determined that this is indeed
an IRQ routing issue?  If not, what plans have been made to resolve the issue
in the future?

Please Advise,


Joshua Douglas
Test Engineer
Enterasys Networks

Comment 104 Doug Ledford 2001-09-19 14:15:57 UTC

We have had contact with Intel, the problem has been acknowledged, and Intel is
working on a possible workaround in the BIOS.  It is, as we said, an interrupt
routing issue.  It is caused by the fact that instead of using the PIIX4 chipset
as the interrupt router on this particular mainboard, Intel has a piece of
custom hardware mounted on the board to do the interrupt routing (for which they
don't want to release programming information, so we can't properly support it).
The bug is that the PCI PIRQ routing table *acts* as if the PIIX4 is doing the
interrupt routing.  So, as a result of that table, the Linux kernel tries to set
the interrupt routing via the PIIX4 and then the machine breaks.  Once the table
is modified to not indicate that the PIIX4 can be used for routing, then things
should start to work properly.
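
As a rough illustration of the kind of table check described above, here is a
minimal sketch that parses the 32-byte header of a PCI IRQ routing ("$PIR")
table and reports whether it names the PIIX4 as the interrupt router. This is
not the kernel's code. The assumptions: the header layout follows the commonly
documented $PIR format, the PIIX4 is identified by the Intel 82371AB ISA bridge
IDs (vendor 0x8086, device 0x7110), and the "compatible router" field is the
one that matters on this board. The function names are hypothetical.

```python
import struct

# Assumed layout of the 32-byte $PIR header (little-endian, packed):
# signature, version, table size, router bus, router devfn, exclusive IRQs,
# compatible router vendor ID, compatible router device ID, miniport data,
# 11 reserved bytes, checksum.
HEADER = struct.Struct("<4sHHBBHHHI11sB")

INTEL_VENDOR_ID = 0x8086
PIIX4_DEVICE_ID = 0x7110  # Intel 82371AB (PIIX4) ISA bridge, function 0


def parse_pir_header(table: bytes) -> dict:
    """Validate and decode the header of a raw $PIR table (hypothetical helper)."""
    if len(table) < HEADER.size:
        raise ValueError("table is shorter than the 32-byte header")
    (signature, version, size, rtr_bus, rtr_devfn, exclusive_irqs,
     rtr_vendor, rtr_device, _miniport, _reserved, _checksum) = HEADER.unpack_from(table)
    if signature != b"$PIR":
        raise ValueError("missing $PIR signature")
    if size < HEADER.size or size > len(table):
        raise ValueError("table size field is inconsistent")
    # All bytes of the table, checksum included, must sum to 0 modulo 256.
    if sum(table[:size]) % 256 != 0:
        raise ValueError("bad checksum")
    return {
        "version": version,
        "size": size,
        "rtr_bus": rtr_bus,
        "rtr_devfn": rtr_devfn,
        "exclusive_irqs": exclusive_irqs,
        "rtr_vendor": rtr_vendor,
        "rtr_device": rtr_device,
    }


def claims_piix4_router(header: dict) -> bool:
    """True if the table advertises a PIIX4-compatible interrupt router."""
    return (header["rtr_vendor"] == INTEL_VENDOR_ID
            and header["rtr_device"] == PIIX4_DEVICE_ID)
```

On a board like the one in this report, a table for which
claims_piix4_router() returns True would be the misleading case: the table
says PIIX4, but different hardware is actually doing the routing.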

Comment 105 Need Real Name 2002-02-03 18:38:01 UTC

Created attachment 44379 [details]
still don't know how to deal with Adaptex 7896 bug

Comment 106 Jeremy Katz 2002-11-08 17:41:20 UTC

*** Bug 77426 has been marked as a duplicate of this bug. ***

Comment 107 Benjamin Scott 2003-05-07 02:12:07 UTC

See also Bug 78234 (against RHL 8.0), which describes a fix for when the
APIC/SMP workaround does not work.  (In short: Fixed in pristine kernel 2.4.20.)

Comment 108 Kandy Kay Danner 2003-11-04 14:40:44 UTC

I know it's late in coming, but I finally had time to find the 
workaround for installing Red Hat on this board.  I installed Red Hat 
Advanced Server 2.1 with the 2.4.18 based kernel.
 
I went into the BIOS, reset the defaults, rebooted, then went back 
into the BIOS and DISABLED the IO APIC to IRQ mapping, then ENABLED 
the Bridge. 
 
I used this boot switch option:
 
linux noapic noprobe smp
 
I have a Megaraid card, so when the installer said it didn't have any 
drivers to load and asked whether I wanted to load one, I said yes and 
loaded the Megaraid driver.  The install completed.
 
You might try manually loading the Adaptec driver this way, or add a 
RAID card.

Comment 109 Kandy Kay Danner 2003-11-04 14:43:08 UTC

Created attachment 95705 [details]
Workaround for installing on L440GX+
