131703 – U3 kernel panics when loading aacraid module

Summary: U3 kernel panics when loading aacraid module

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Assignee:	Tom Coughlan
QA Contact:
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	135729
Depends On:
Blocks:	123574

Reported:	2004-09-03 12:56 UTC by Trond H. Amundsen
Modified:	2007-11-30 22:07 UTC
CC List:	25 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-12-20 20:56:09 UTC
Target Upstream Version:
Embargoed:

Attachments
Example panic from console (1.66 KB, text/plain) 2004-09-03 12:58 UTC, Trond H. Amundsen
Hand-copied kernel panic messages (731 bytes, text/plain) 2004-09-15 13:44 UTC, John Schmidt

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2004:550	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4	2004-12-20 05:00:00 UTC

Description Trond H. Amundsen 2004-09-03 12:56:55 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7)
Gecko/20040803 Firefox/0.9.3

Description of problem:
The new kernel from Update 3 panics during boot. The attached panic is
from a Dell PE2650 using the latest BIOS (A18) and RAID firmware (2.8-0).
A similar PE2650 with the same RAID firmware, and another with the
previous firmware (2.7-1), don't have this problem, but we did
experience it on a PE2550 using 2.7-1 firmware.

This is somewhat strange, as the problem doesn't show up on every box.
On some boxes we don't see it at all, but on the boxes that have it,
it is always reproducible.

The latest kernel from Update 2 (2.4.21-15.0.4.ELsmp) works fine on
either firmware release.


Version-Release number of selected component (if applicable):
kernel-2.4.21-20.ELsmp

How reproducible:
Sometimes

Steps to Reproduce:
1. Upgrade to latest firmware on a Dell PE2650 running RHEL3
2. Install latest kernel from Red Hat
3. Reboot the new kernel
    

Actual Results:  The system panics.

Expected Results:  Normal bootup.

Additional info:

Comment 1 Trond H. Amundsen 2004-09-03 12:58:35 UTC

Created attachment 103432
Example panic from console

Comment 2 Per Steinar Iversen 2004-09-03 15:04:02 UTC

We see this too, with both Dell 2550 and 2650 servers that have the
PERC3/Di RAID controller, with firmware versions from 2.6 to 2.8
(current).

Comment 3 Cory Ranschau 2004-09-03 16:45:09 UTC

I receive a very similar error.  A PowerEdge 2650 was updated with
up2date and rebooted, and the system worked fine.  Another reboot of
the system brought up the error message as above, and the system no
longer boots with the kernel-2.4.21-20.ELsmp kernel.

Comment 4 Bob Plankers 2004-09-07 15:51:58 UTC

Similar problems seen on a Dell PowerEdge 2650 (BIOS A17) with an
Adaptec PERC 3/Di running firmware 2.80 (Build 6089)

Comment 5 Christopher Gilbert 2004-09-07 20:24:06 UTC

Similar issue as above.

PowerEdge 2650 (Bios A18)
Adaptec PERC 3/Di firmware 2.80 (Build 6092)

kernel-2.4.21-20.ELsmp from up2date panics.

kernel-2.4.21-4.ELsmp runs perfectly fine.

Comment 6 Günther Höpfner 2004-09-08 15:42:31 UTC

Ditto.

PowerEdge 2650, two years old:
kernel-2.4.21-20.ELsmp from up2date -> kernel panic

Error message from aacraid:
EIP aac_info[aacraid 0x12](2.4.21-20ELSMP/i686)

PowerEdge 2650 NEW

The single-processor 2.4.21-20 kernel runs.

Firmware version not known.

Comment 7 Tom Coughlan 2004-09-08 17:18:27 UTC

We are investigating.  

In the meantime, the previous version of the aacraid driver is
preserved in U3 as aacraid_00909.o.  I expect that the older driver
will not have this problem.  If you are able to get the system up, you
can change aacraid to aacraid_00909 in modules.conf, re-make the
initrd, then boot the U3 kernel.  We will develop a more complete
solution once we determine the cause of the problem.

Comment 8 Tom Coughlan 2004-09-08 19:17:12 UTC

Sorry, aacraid_00909 is the alternate driver in RHEL 2.1 (aacraid v0.9.9)

The alternate driver available in RHEL 3 U3 is aacraid_10102 (v1.1.2).
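
The two module names quoted so far (aacraid_00909 for v0.9.9 and aacraid_10102 for v1.1.2) suggest that the suffix packs the version as one digit for the major number followed by two digits each for minor and patch. That pattern is inferred from these two examples only and is an assumption, not a documented rule. A minimal sketch of decoding such a suffix, with the hypothetical names `drv_version` and `parse_suffix`:

```c
/* Decoded driver version; field names are illustrative only. */
struct drv_version {
    int major;
    int minor;
    int patch;
};

/*
 * Parse a five-digit module suffix such as "10102" as M mm pp.
 * Assumption: the layout is inferred from aacraid_00909 (v0.9.9)
 * and aacraid_10102 (v1.1.2); it is not a documented convention.
 * Returns 0 on success, -1 if the input is not exactly five digits.
 */
int parse_suffix(const char *suffix, struct drv_version *out)
{
    int i;

    /* A short string hits its terminating NUL here and is rejected. */
    for (i = 0; i < 5; i++) {
        if (suffix[i] < '0' || suffix[i] > '9')
            return -1;
    }
    if (suffix[5] != '\0')
        return -1;

    out->major = suffix[0] - '0';
    out->minor = (suffix[1] - '0') * 10 + (suffix[2] - '0');
    out->patch = (suffix[3] - '0') * 10 + (suffix[4] - '0');
    return 0;
}
```

Under that assumption, "10102" decodes to 1.1.2 and "00909" to 0.9.9, matching the versions given above.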

Comment 9 Stefan Hudson 2004-09-09 02:33:33 UTC

Additional information:

Seems to be SMP related.  
2.4.21-20.EL-smp on Dell PowerEdge 2550 (BIOS A09) with PERC3/Di
(firmware 2.7-0 Build 3546) panics, but 2.4.21-20.EL works. 

Confirmed - aacraid_10102 does work in 2.4.21-20.EL-smp on this machine.

Also note that a Dell PowerEdge 2650 (BIOS A18) with PERC/Di (firmware
V2.8-0 Build 6089) does NOT have this issue.

Comment 10 Mark Salyzyn 2004-09-09 13:07:53 UTC

Apply the following patch to drivers/scsi/aacraid/linit.c (only 
affects some 2.4.*, issue not present in 2.6.* trees with this 
driver):

--- linit.c.orig     Thu Sep  9 05:15:41 2004
+++ linit.c     Thu Sep  9 05:18:18 2004
@@ -413,7 +413,9 @@
 const char *aac_info(struct Scsi_Host *shost)
 {
        struct aac_dev *dev = (struct aac_dev *)shost->hostdata;
-       return aac_drivers[dev->cardtype].name;
+       if (dev)
+               return aac_drivers[dev->cardtype].name;
+       return AAC_DRIVERNAME;
 }

 /**

Comment 11 Petter Reinholdtsen 2004-09-09 14:58:09 UTC

What is triggering this problem?  Which of our PowerEdge 2650s are
safe to upgrade, and which ones should we leave running the old
kernel?  How can we check if it is safe to run the latest RH kernel?

Comment 12 Mark Salyzyn 2004-09-09 15:01:45 UTC

This is a bug in the Linux aacraid driver, Petter. No aacraid-based 
card is `safe'. I have discovered that the problem is a layered onion, 
and the patch I just submitted needs further refinement. The refined 
version, for testing, is here:

--- linit.c.badinfo     Thu Sep  9 05:15:41 2004
+++ linit.c     Thu Sep  9 07:28:07 2004
@@ -412,7 +412,17 @@

 const char *aac_info(struct Scsi_Host *shost)
 {
+#if ((LINUX_VERSION_CODE <= KERNEL_VERSION(2,5,0)) && defined(MODULE))
+       struct aac_dev *dev;
+       if (shost == aac_dummy)
+               return AAC_DRIVERNAME;
+       dev = (struct aac_dev *)shost->hostdata;
+       if (!dev
+        || (dev->cardtype >= (sizeof(aac_drivers)/sizeof(aac_drivers[0]))))
+               return AAC_DRIVERNAME;
+#else
        struct aac_dev *dev = (struct aac_dev *)shost->hostdata;
+#endif
        return aac_drivers[dev->cardtype].name;
 }
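
The two checks in the patch above (missing device data, and a card type beyond the end of the table) can be tried outside the kernel. A minimal user-space sketch follows; `struct dev_sketch`, `card_table`, `fallback_name` and `card_name` are hypothetical stand-ins rather than the driver's real definitions, and the table contents are made up for illustration:

```c
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-adapter data. */
struct dev_sketch {
    unsigned int cardtype;
};

/* Hypothetical stand-in for one entry of the card table. */
struct card_entry {
    const char *name;
};

/* Illustrative table contents only; not the driver's real list. */
static const struct card_entry card_table[] = {
    { "percraid" },
    { "aacraid" },
};

/* Generic name returned when no table entry can be trusted. */
static const char fallback_name[] = "aacraid";

/*
 * Return the card name for dev, or the generic name when dev is
 * NULL or its cardtype would index past the end of the table.
 */
const char *card_name(const struct dev_sketch *dev)
{
    size_t count = sizeof(card_table) / sizeof(card_table[0]);

    if (dev == NULL || dev->cardtype >= count)
        return fallback_name;
    return card_table[dev->cardtype].name;
}
```

With this shape, a NULL pointer or a garbage `cardtype` yields the generic name instead of a read from an arbitrary address.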

Comment 13 John Schmidt 2004-09-15 13:44:40 UTC

Created attachment 103861
Hand-copied kernel panic messages

Partial oops messages, some rolled off the console.

Comment 14 John Schmidt 2004-09-15 13:47:27 UTC

Also get this on a 2650, Phoenix BIOS Revision A15, Dell PowerEdge 
Expandable RAID controller 3/Di BIOS v2.7-1, Dell PowerEdge 
Expandable RAID controller BIOS v3.31.  Occurs 50% of the time using 
the 2.4.21-20.ELsmp kernel, not on the 2.4.21-20.EL kernel or the
2.4.21-15.ELsmp kernel.  See attachment #103861.

Comment 15 Valdis Kletnieks 2004-09-16 17:04:26 UTC

Count me as another Dell 2650 user.. Thanks for the heads-up regarding
aacraid_10102.

I was able to boot the -20 non-SMP kernel, and use that to copy the
aacraid_10102.o from /lib/modules into the initrd, and the system
was able to boot the SMP kernel.

Quick summary of workaround:

zcat /boot/initrd-2.4.21-20.ELsmp.img > /tmp/smp.initrd
mount -o loop /tmp/smp.initrd /mnt/loop
cp /lib/modules/2.4.21-20.ELsmp/kernel/drivers/addon/aacraid_10102/aacraid_10102.o \
   /mnt/loop/lib/aacraid.o
umount /mnt/loop
gzip /tmp/smp.initrd
cp /tmp/smp.initrd.gz /boot/initrd-2.4.21-20.ELsmp-test.img

and then point grub at that initrd.

Comment 16 Stefan Hudson 2004-09-16 19:31:44 UTC

That will work, but an easier way to do the workaround is:

edit /etc/modules.conf and replace "aacraid" with "aacraid_10102"

/sbin/mkinitrd /boot/initrd-2.4.21-20.ELsmp.img 2.4.21-20.ELsmp

Just remember to fix /etc/modules.conf when Red Hat releases a fixed kernel.

Comment 17 WhidbeyNet 2004-09-21 17:32:58 UTC

We also experience this exact issue after going from 2.4.21-15.0.4 to
2.4.21-20.  System won't boot;  kernel panic relating to aacraid
module as described in comment #1. PE2650 with 3-disk RAID5 on
PERC3/di (older firmware, don't have version handy).

Comment 19 Tom Coughlan 2004-09-22 11:09:13 UTC

We are planning to include Mark's fix (comment 12) in RHEL 3 U4. We
are also going to put the fix in AS 2.1 U6, even though the bug does
not exhibit itself there.

Comment 20 Gunther Schlegel 2004-09-22 11:38:35 UTC

Tom,

that is not a good idea, unless you preserve both aacraid_10102.o and 
the buggy version in U4. Otherwise there might be no stable aacraid
driver in the U4 kernel. 

Personally I would prefer a patched kernel for U3.

Comment 21 nathan r. hruby 2004-09-22 11:47:40 UTC

Whoa.  Not acceptable.  This is a bug that prevents booting of a machine, not a FE.  

Many people *will* boot this config and get burned.  Save yourself and us the headache of 
3-4 months of dealing with workarounds and having to build unsupported driver modules 
and just fix it.  Please?

FWIW, the Dell PERC3/Di, an aacraid based card, has some serious issues, so Dell is 
saying: "Please upgrade to the most recent driver to try to fix your random lockups and 
freezes!"  which people do because RHEL3-UPD3 has a newer version of the driver, plus 
support, and lo and behold it breaks your machine in a total and complete way.

I know that respinning the kernel is a PITA, but it needs to be done.

Comment 22 Tom Coughlan 2004-09-22 12:02:11 UTC

I will propose to include the fix in the next feasible RHEL 3 U3
erratum release.  

In U4 I am inclined to keep aacraid_10102.o and the patched 1.1.5
version.  No need to keep the buggy version.

Comment 23 Lance A. Brown 2004-09-22 13:22:27 UTC

Getting this fixed sooner rather than later is the right answer.  My group
has taken a bit of a black eye after updating our server to Update 3
and running into this problem head on.  I'm not pleased to have to
integrate a workaround into production systems.

I would very much like to see this bug fixed asap.

Comment 24 Ernie Petrides 2004-09-29 10:38:32 UTC

A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.13.EL).

Comment 25 Trond H. Amundsen 2004-09-29 10:56:13 UTC

Ernie, does this mean that a fix won't be around until U4 is released?
I wouldn't think this an acceptable solution, the PE2650 being one of
the most mainstream server boxes out there...

BTW, has anyone else experienced that kudzu hangs indefinitely when
booting boxes that have the 2.4.21-20.ELsmp with workaround applied?

Comment 26 Ernie Petrides 2004-09-29 11:04:14 UTC

Trond, there are no planned releases before U4.  Feel free to
contact customer support to lobby for a hot-fix for this.

Comment 27 Trond H. Amundsen 2004-09-29 11:19:51 UTC

Hmm.. I think Tom's suggestion in comment #22 is a good one, why not
include the fix in the next security update? Besides, as an academic
institution we don't have support, but we'll cope anyway :)

Comment 28 Matt Domsch 2004-09-29 12:33:59 UTC

It is slated to be included in any security errata that comes out
before Update 4, but there is none pending right now and with luck
there won't be.

If you can't use the older included driver for some reason, you can
use the 1.1.4.2302.1 driver in DKMS format on linux.dell.com.

Comment 29 James Boyle 2004-10-03 01:18:35 UTC

I have two identical Dell 2650's with Perc 3/Di (their worst ever)
cards.  I get a kernel panic with :

AAC0: kernel 2.8.4 build 6089  <--- BIOS build no.
AAC0: monitor 2.8.4 build 6089
AAC0: bios 2.8.0 build 6089
AAC0: serial f5d881d3fafaf001
&& kernel 2.4.21-20.ELsmp

but not with:
AAC0: kernel 2.8-0 build 6082 <--- BIOS build no.
AAC0: monitor 2.8-0 build 6082
AAC0: bios 2.8-0 build 6082
AAC0: serial f5d881d3
AAC0: 64 Bit DAC enabled
&& kernel 2.4.21-20.ELsmp

my temporary solution will be to use kernel 2.4.21-15.0.4.ELsmp on the
2650 with the newer RAID BIOS.

Both systems have A18 system BIOS.

Comment 30 zwlu 2004-10-06 21:30:08 UTC

I have two PE2650s (identical firmware setup). One of them panics
with 2.4.21-20smp but is okay with the 2.4.21-20 non-smp kernel.  It appears
to me that the system will panic if it has an IMPERFECT RAID status;
otherwise, it will boot into 2.4.21-20smp just fine.
Here are the situations that I encountered (I have mirror 0 for the
boot disk, disk0 and disk1):

1. Disk1 was flashing an orange-green light; booting 2.4.21-20smp failed.
2. Re-inserted disk1, no flashing orange-green light; booting
2.4.21-20smp succeeded.
3. Disk1 flashed the orange-green light again after some time; rebooting
2.4.21-20 failed again.
4. Pulled out disk1; booting 2.4.21-20smp failed.
5. Put in the new disk1 from Dell (RAID container rebuilding); booting
2.4.21-20smp worked.
6. Shut down the system while the RAID was still rebuilding (RAID
imperfect); rebooting into 2.4.21-20smp failed.
7. Booted into the working kernel and let the RAID rebuild finish;
rebooting into 2.4.21-20smp worked again.

When will we have the kernel patch?  I hated to do those work-around.

Comment 31 Tom Coughlan 2004-10-07 20:30:37 UTC

> When will we have the kernel patch?

As stated earlier, the patch in comment 12 will be in U4, and it is
also proposed to be in the first U3 errata, if there is one.

> I hated to do those work-around.

There are some better workarounds in comment 7/8, and 28.

Comment 32 Tom Coughlan 2004-10-07 20:33:54 UTC

*** Bug 134936 has been marked as a duplicate of this bug. ***

Comment 33 nathan r. hruby 2004-10-14 09:53:37 UTC

verify ident. behavior on 2550 + perc3/Di

scsi3 : percraid
AAC0: kernel 2.8-0 build 6092
AAC0: monitor 2.8-0 build 6092
AAC0: bios 2.8-0 build 6092
AAC0: serial c9ec01d2
AAC0: ROMB RAID/SCSI mode enabled
AAC0: Non-DASD support enabled
Unable to handle kernel paging request at virtual address 38385e84
 printing eip:
f88730d2
*pde = 8c554972
Oops: 0000
aacraid megaraid aic7xxx diskdumplib sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f88730d2>]    Not tainted
EFLAGS: 00010282

EIP is at aac_info [aacraid] 0x12 (2.4.21-20.ELsmp/i686)
eax: e7f60e80   ebx: f7fa6e80   ecx: f88730c0   edx: f887eec0
esi: c4e46c80   edi: f7fa6e47   ebp: f887c55b   esp: c3757e88
ds: 0068   es: 0068   ss: 0068
Process insmod (pid: 23, stackpage=c3757000)
Stack: f880f336 c4e46c80 f7fa7900 ffffffff 00000000 c03f2324 00000246 f7fa7800
       f887e670 f887ef40 f7fa6e80 00000001 f7fa6e80 f887c562 f8810c19 c4e46c80
       00000020 00000001 00000007 00000001 c4e46c80 00000000 00000001 00000001
Call Trace:   [<f880f336>] scsi_setup_host [scsi_mod] 0xb6 (0xc3757e88)
[<f887e670>] aac_pci_tbl [aacraid] 0x70 (0xc3757ea8)
[<f887ef40>] aac_pci_driver [aacraid] 0x0 (0xc3757eac)
[<f887c562>] .rodata.str1.1 [aacraid] 0x2a (0xc3757ebc)
[<f8810c19>] scsi_register_Rsmp_4853a9b7 [scsi_mod] 0x299 (0xc3757ec0)
[<f8873eb4>] init_module [aacraid] 0xc4 (0xc3757eec)
[<f887eec0>] aac_driver_template [aacraid] 0x0 (0xc3757ef0)
[<f887ee60>] aac_cfg_fops [aacraid] 0x0 (0xc3757ef8)
[<c012ab26>] sys_init_module [kernel] 0x5b6 (0xc3757f0c)
[<f887d4a8>] .kmodtab [aacraid] 0x0 (0xc3757f20)
[<f8873060>] aac_detect [aacraid] 0x0 (0xc3757f2c)
[<f887d2f0>] __ksymtab [aacraid] 0x0 (0xc3757f30)
[<f8873060>] aac_detect [aacraid] 0x0 (0xc3757f58)

Code: 8b 04 c5 84 ea 87 f8 c3 8d b6 00 00 00 00 8b 44 24 04 8d 04

Kernel panic: Fatal exception
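
The numbers in this oops can be cross-checked against each other. A minimal sketch, assuming the first Code bytes (8b 04 c5 84 ea 87 f8) decode as the x86 instruction `mov eax, [eax*8 + 0xf887ea84]`; that decoding is a reading of the bytes and is not stated anywhere in the report. The helper name `scaled_index_address` is hypothetical:

```c
#include <stdint.h>

/*
 * Effective address of a 32-bit x86 load of the form
 *     mov eax, [index*8 + disp32]
 * computed with the same wraparound as the CPU (modulo 2^32).
 * Assumption: this is how the oops Code bytes are decoded.
 */
uint32_t scaled_index_address(uint32_t index, uint32_t disp)
{
    /* Unsigned arithmetic wraps, so high bits fall off as on i386. */
    return (uint32_t)(index * 8u + disp);
}
```

Calling `scaled_index_address(0xe7f60e80, 0xf887ea84)` with eax from the register dump and the displacement from the Code bytes gives 0x38385e84, which is the faulting virtual address the kernel printed. An index register holding a value that large is consistent with a table lookup using garbage, and so with the range check added in comment 12.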

Comment 34 nathan r. hruby 2004-10-14 16:26:57 UTC

verify crash still happens on 2550 + 3/Di with Build 6092 + aacraid_10102

Comment 35 Tom Coughlan 2004-10-14 18:57:02 UTC

You should not get this crash with aacraid_10102. Were you using
aacraid_10102, or the default aacraid (1.1.5-xxxx)?

Comment 36 nathan r. hruby 2004-10-14 22:48:20 UTC

Sorry I'm being unclear.  

#33 comes from a system using 1.1.5 (eg: RHEL 3 UPD 3 default driver)

The crash mentioned in #34 is the same system using aacraid_10102.  The crash is a hard 
deadlock of the machine (no panic or OOPS).  1.1.5 + 6092 is supposed to fix the 
deadlock.  Maybe.

Both were same hardware, so why was it running 2 drivers today you ask?   It was 
upgraded from UPD2 to UPD3 finally this morning and is unhappy with all the drivers.  I 
tried on a fluke to see if 1.1.5 + 6092 would boot (it does for another identical machine) 
and it did not so I backed down to aacraid_10102 to get the machine to boot but it's still 
crashing.

Comment 37 Tom Coughlan 2004-10-15 12:07:01 UTC

*** Bug 135729 has been marked as a duplicate of this bug. ***

Comment 38 Tom Coughlan 2004-10-15 14:19:22 UTC

Nathan,

Try this pre-release kernel:

http://people.redhat.com/coughlan/RHEL3-perf-test/

Warning: this is a pre-beta U4 test kernel.  It has not been through
QA.  It must not be used in production.  It is only to be used for
early testing and feedback.

This kernel has the 1.1.5 driver with the patch in comment 12. If you
still have the deadlock with latest firmware, then please open a new
BZ. You have a different problem. 

Tom

Comment 39 Alan Madill 2004-10-31 17:50:54 UTC

I had the same problem, and it was fixed by re-creating the initial RAM 
disk.  Not sure why.  I booted the previous kernel version, then ran:
cd /boot
mv initrd-2.4.21-20.ELsmp.img initrd-2.4.21-20.ELsmp.img.old
mv initrd-2.4.21-20.EL.img initrd-2.4.21-20.EL.img.old
mkinitrd initrd-2.4.21-20.ELsmp.img 2.4.21-20.ELsmp
mkinitrd initrd-2.4.21-20.EL.img 2.4.21-20.EL

Comment 40 Edmond Baroud 2004-11-04 04:25:05 UTC

Two Dell 2450 with Perc 3/Si doing the same thing.
Will apply the patch tomorrow and see.

Comment 41 Andrew Robinson 2004-11-04 18:40:07 UTC

Based on Stefan Hudson's Additional Comment #16, I put together this
script. I've run it on my existing 2650's and added it to my kickstart
 for the servers I'm building. HTH...

#!/bin/sh
# The aacraid driver released with Red Hat Enterprise
# Linux 3, Update 3 has problems that can prevent a Dell
# PowerEdge 2650 server from booting. The workaround is to
# use the older aacraid_10102 driver. Two changes are
# needed to implement this. The /etc/modules.conf file
# should specify the aacraid_10102 module. An initrd file
# containing the other driver needs to be in place to
# make the correct driver available at bootup.

# Modify the modules.conf

timestamp=$( date "+%y%m%d%H%M%S" )

cp /etc/modules.conf /etc/modules.conf.${timestamp}
patch /etc/modules.conf <<EOPATCH
3c3,6
< alias scsi_hostadapter aacraid
---
> # For RHEL 3 EL U3, there is a bug with the aacraid driver.
> # The workaround is to use the aacraid_10102 driver. Be sure
> # to change this back with future RHEL version upgrades.
> alias scsi_hostadapter aacraid_10102
EOPATCH

# Rebuild the initrd file

mv /boot/initrd-2.4.21-20.ELsmp.img \
   /boot/initrd-2.4.21-20.ELsmp.img.${timestamp}
mkinitrd /boot/initrd-2.4.21-20.ELsmp.img 2.4.21-20.ELsmp

Comment 42 Trond H. Amundsen 2004-12-02 14:53:29 UTC

I see that the fix hasn't been included in 2.4.21-20.0.1.ELsmp, as
suggested in comment #28 and comment #31. Whether this is a slip-up or
a thought-through decision is unknown, but anyway it's a disappointment
that Red Hat doesn't take this problem more seriously.

Comment 43 Richard Lloyd 2004-12-06 17:53:07 UTC

Just a note that I recently installed RHEL 3 on a Fujitsu-Siemens
Primergy RX600 server (2 CPUs, Adaptec AIC-7902 U320 hardware RAID)
and downloaded 222 (!) RPM updates, including the 2.4.21-20.0.1.ELsmp
kernel and am seeing the same reboot crash that people are here, so
it's not just restricted to Dell Poweredges.

The crash seems to be intermittent and the latest one I got was during
an "insmod" of the aacraid driver according to the console output
(i.e. pretty well identical to Nathan's crash output in comment #33).

I must say that releasing a new kernel on 2nd December without this
problem fixed is very poor when the fix has been on this thread for
over 2 months. Priority "high" and Severity "high" apparently aren't
good enough to get this crucial fix in the kernel :-(

Comment 44 Jacob Kaplan-Moss 2004-12-06 18:02:49 UTC

Just want to add my name to the list of people increasingly
disappointed that this has yet to be fixed.  This really is a big deal
for us; the bug affects 3 production servers.

Comment 45 Lance A. Brown 2004-12-06 18:33:00 UTC

And me also.  I have 20+ servers affected by this bug.  It was very
bad form for RedHat to release a security update kernel and not
include a fix for this bug.

Comment 46 Ernie Petrides 2004-12-07 00:12:31 UTC

I apologize that Update 4 wasn't released last Wednesday as originally
scheduled.  It is now scheduled for release next week, and it will
contain the fix you've been waiting for.

Comment 47 Trond H. Amundsen 2004-12-07 09:53:08 UTC

Ah... so the reason that the fix wasn't included in the latest errata
is that U4 is just around the corner? Then I withdraw my criticism and
look forward to the arrival of U4. Good work guys :)

(Since we're counting.. I have 80+ pe2650s. I think I'm in the lead ;)

Comment 50 Frank Swasey 2004-12-08 18:30:53 UTC

Just to add a "me too"

I am having the same problem with an IBM x306 that has an Adaptec raid
controller and is using the aacraid module.  I am now able to boot
using the workaround in comment #16 above.

I am also looking forward to the release of U4 next week.

Comment 51 Thomas Petersen 2004-12-17 01:16:10 UTC

Hi, it's 'next week' and we are still waiting....

Comment 52 Justin McNutt 2004-12-17 13:26:56 UTC

Calm down, calm down.  U4 for RHEL 2.1 *did* release this week (I got
all the e-mails).  One can only assume that either U4 for 3.0 is
imminent, or some information came to light during the 2.1 release
that is delaying 3.0.

Sux, since I could use the kernel patch, too, and also sux since we
don't know why it's not out yet, but hey.  That's life.  Programming's
hard.  If U4 comes out next week, I'll be happy.  If all software (and
construction) projects were only a week late, I'd be f***ing ecstatic...

--J

P.S.  But I *want* that kernel update... :-/

Comment 53 Tom Coughlan 2004-12-17 18:59:56 UTC

U4 is scheduled to hit RHN on Monday.

Comment 54 John Flanagan 2004-12-20 20:56:09 UTC

An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html

Comment 55 Vlady 2004-12-21 16:48:23 UTC

I don't think the problem is in the aacraid driver!
I have a PE2250 with BIOS version A09 and PERC 3/Di firmware version
2.8.0 build 6092.
After the kernel update to 2.4.21-27ELsmp the system won't boot.
I tried both the aacraid and aacraid_10102 drivers, but the result was a
series of "Segmentation fault" messages on every attempted filesystem
operation.
In the end the system froze.
aacraid_10102 is a preserved version of the old aacraid driver, and if
the problem were in the driver the system should be able to boot with
it, but it won't. I think the problem is in the interaction between
the kernel and the driver.

We had identical problems with PE1650 server which has just SCSI
controller (not RAID).

The only possible way to bring the machines back was to use the old
kernel -- 2.4.21-15.0.4

Comment 57 Tom Coughlan 2004-12-21 18:24:45 UTC

Vlady,

The problem you are describing is not the same as the problem reported
in this bugzilla.  Please open a new bugzilla. When you do, provide
the console output showing the driver being loaded and the device
configuration messages, and the subsequent error messages. 

Also, which driver is being used in the non-RAID PE1650 system?

Tom
