Description of problem:
I have problems with a Serial ATA RAID controller (aacraid module) on the latest kernel.
The system hangs with a kernel panic (see attached log).
The Adaptec card has the latest BIOS from Adaptec's web page.
Can somebody tell me whether it is a hardware problem or a bug in Fedora kernels?
Does anybody with a similar hardware configuration run it without problems?
Version-Release number of selected component (if applicable):
kernel-smp-2.6.16-1.2133_FC5, but also older ones (before that, Fedora Core 4
kernels had the same problem).
It hangs periodically, but not on any particular action. Currently it hangs approx. 2
times daily. Before, on Fedora Core 4, it was once weekly.
If you need more information, I can send it.
Created attachment 131345
kernel panic message
I am also seeing this on an IBM eServer xSeries 260 with an IBM ServeRAID 8i
running the latest firmware from IBM.
I am running Fedora Core 5 x86_64 with kernel 2.6.17-1.2157_FC5. I see kernel
panics at intervals of 15 minutes to 6 hours, depending on when disk I/O increases.
I have since moved the server to a different machine with exactly the
same hardware configuration, to rule out a hardware issue on the original
machine.
The server has 4 dual-core Intel(R) Xeon(TM) MP CPU 3.66GHz processors with 8 GB of RAM. I
am running 6 ~146 GB SAS drives in a hardware RAID 10 configuration.
01:02.0 RAID bus controller: Adaptec AAC-RAID (rev 02)
Subsystem: IBM ServeRAID 8i
Flags: bus master, stepping, 66MHz, medium devsel, latency 240, IRQ 169
Memory at eb000000 (64-bit, non-prefetchable) [size=2M]
Memory at eb200000 (32-bit, non-prefetchable) [size=2M]
Memory at d0000000 (32-bit, prefetchable) [size=256M]
[virtual] Expansion ROM at e8020000 [disabled] [size=32K]
Capabilities: [c0] Power Management version 2
Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/1 Enable-
Capabilities: [e0] PCI-X non-bridge device
With FC4, was this the same panic or a different one?
I can't tell you exactly. This RAID worked fine on Fedora Core 4 and also with
some Fedora Core 5 kernels. But maybe the traffic on this server was lower
then, or there may be other reasons (for example, a hardware problem).
This RAID device has been removed from my server and we are filing a warranty
claim for it. Without the RAID, the server works without problems.
Based on the log, I would say this is a bug in the kernel, or rather a
problem with the driver. The latest kernel from mainline (or the FC6 rawhide
kernel) might have this fixed.
I would recommend using the latest kernel, but not yet: there are still some
instability issues with the rawhide kernels that could bite you.
In the meantime, please give the mainline kernel (http://www.kernel.org) a spin to see
if it works there.
I am not able to test it now, because this card is no longer in the server.
The server is in a hosting facility and I have no direct access.
After each hang I had to go and restart it immediately. :(
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check that you only have one version of device-mapper & lvm2
installed. See bug 207474 for further details.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.
Thank you for your help. I installed this kernel 6 days ago after an unexpected
reboot (I am now using the panic=30 kernel parameter). It has been working for 6 days
without problems, but this bug was already happening less often, so that is not yet
conclusive. If you can, please leave this bug open for approx. 2 weeks from now. If
there are no problems within 2 weeks, I think this bug is solved.
My system has been up for 22 days now.
[ondrejj@ns ~]$ uptime
13:58:32 up 22 days, 3:40, 3 users, load average: 4.29, 4.73, 4.16
I have not seen a similar problem since. You can close this bug.
Thank you again. :)