Red Hat Bugzilla – Bug 453472
[aacraid] aac_srb: aac_fib_send failed with status 8195
Last modified: 2010-10-22 22:27:06 EDT
Description of problem:
Upgrade to 5.2 from 5.1
Version-Release number of selected component (if applicable):
kernel-PAE - 2.6.18-92.1.6.el5.i686
How reproducible:
Every boot into the PAE kernel
Steps to Reproduce:
1. reboot system
2. start with PAE kernel
Actual results:
1. system hangs with error message constantly scrolling
Expected results:
1. normal boot process
Additional info:
Only exhibits symptoms in the PAE kernel.
Adaptec 2210 RAID
Hi "M Smith" -- could you attach a sysreport from 5.1, and a boot log of the 5.2
kernel to this issue?
Created attachment 311301 [details]
Boot log 5.2 PAE kernel
Created attachment 311302 [details]
SOS report for 5.1
My customer reports that kernel-PAE-2.6.18-53.1.13.el5 boots fine, but
kernel-PAE-2.6.18-92.el5 fails this way.
This bugzilla has Keywords: Regression.
Since no regressions are allowed between releases,
it is also being proposed as a blocker for this release.
Please resolve ASAP.
Updating to kernel-PAE-2.6.18-92.1.10.el5 still exhibits the same error condition.
The non-PAE kernel-2.6.18-92.1.10.el5 does not.
HP uses this driver in some of its servers and will be affected.
I wasn't able to reproduce the behaviour on my test box.
Please test the patched kernel here -
the patch applied here updates the Adaptec aacraid driver from version 1.1-5
to version 1.1-5.
The system halts with a kernel panic with each of these kernels.
Attaching the boot log for each.
Created attachment 316322 [details]
Boot log 2.6.18-109.el5
Created attachment 316323 [details]
Boot log 2.6.18-109.el5PAE
Created attachment 316437 [details]
patch to 1.1-5[2453]
Hi "M Smith",
I'm sorry, but I tested it on two different boxes before posting without any issues. The test kernel includes many other patches that were untested until now, so a panic like this could happen; it looks to me as if it is unrelated to this bz.
I still could not reproduce the problem, so I need further help from you.
RHEL5.1 has driver version 22.214.171.1247, RHEL5.2 has 1.1-5[2453], and version 1.1-5[2455] is planned for RHEL5.3. I'll post both patches here so you can test the patch planned for RHEL5.3 first; if that does not help, then remove the 1.1-5[2453] patch.
Created attachment 316438 [details]
patch to 1.1-5[2455]
I'm sorry but I can't quite figure out what you want me to do.
What should I test now?
Are you asking that I compile a kernel with each version of the driver and test each one?
Do I take the patch to 1.1-5[2455] in Comment #13 and use it with the kernel that I downloaded in Comment #8?
(In reply to comment #14)
> Hi Tomas,
> I'm sorry but I can't quite figure out what you want me to do.
> What should I test now?
> As you asking that I compile a kernel with each version of the driver and test
> each one?
Yes, if you could; I thought it would be more flexible this way.
Please start with the kernel sources for 5.1 and patch them with 1.1-5[2455].
If this does not help, then please remove the 1.1-5[2453] patch and test again.
If you have difficulties compiling the kernel, tell me and I'll do it.
Created two test kernels as follows:
- Used source from pub/redhat/linux/enterprise/5Server/en/os/SRPMS
- rpm -ivh kernel*.src.rpm
- Copied off the two patch files from the bugzilla ticket:
- Edit /usr/src/redhat/SPECS/kernel-2.6.spec:
  change the buildid macro:
  %define buildid .2455
  Add in first just the 2453 patch, then both that and the 2455 patch.
- Build kernel rpms:
  rpmbuild -bb --target=i686 --with pae kernel-2.6.spec
Both kernels fail with the same aacraid error as before.
Bootup logs to be attached.
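For anyone repeating this, the build steps above can be sketched as a small shell script. This is only a rough outline under assumptions: the SRPM file name passed in is a placeholder, the `run` and `build_test_kernel` helpers and the `DRY_RUN` switch are made up for illustration, and the spec edits (the buildid macro and the patch entries) are still done by hand. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of the test-kernel build described above.
# Assumptions (not from the report): the helper names, the DRY_RUN switch,
# and the SRPM file name given as the argument.
DRY_RUN="${DRY_RUN:-1}"
SPEC_DIR="${SPEC_DIR:-/usr/src/redhat/SPECS}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_test_kernel SRPM
# The spec file must already carry the "%define buildid" line and the
# aacraid patches to be tested; those edits are manual.
build_test_kernel() {
    srpm="$1"
    run rpm -ivh "$srpm" &&
    run cd "$SPEC_DIR" &&
    run rpmbuild -bb --target=i686 --with pae kernel-2.6.spec
}
```

Calling `build_test_kernel` with the path to the kernel SRPM lists the three commands; setting `DRY_RUN=0` first would actually run them, which needs the usual rpmbuild setup.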
Created attachment 317232 [details]
boot log 2453 patch
Created attachment 317233 [details]
boot log 2455 patch
(In reply to comment #16)
> Created two test kernels as follows:
> Bootup logs to be attached.
Thanks, I've divided the 2453 patch into four parts, but before we begin with this, could you please update the firmware in the raid controller?
We have a report from a customer with a similar problem where this fixed the issue.
I have flashed the BIOS in the 2200S RAID Controller to version 4.2-0.
The same symptoms present themselves when restarting the system with this update applied.
Created attachment 318091 [details]
patch 2453 divided into 5 parts
I divided the 2453 patch into a few parts. Please start by compiling the kernel without any of them and without the 2453 and 2455 patches. This should bring us back to the 5.1 kernel, and it should work fine.
Then add the patches p(x).patch one at a time, in order, to see which patch causes the problem.
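The one-patch-at-a-time search described above can be sketched in shell. This is only an illustration of the bookkeeping: `first_bad_patch` is a hypothetical helper, the file names p1.patch through p5.patch are assumed from the p(x).patch naming, and the real check (build a kernel with the listed patches, boot it, watch for the aac_fib_send error) is manual and has to be supplied by the caller.

```shell
# first_bad_patch CHECK PATCH...
# Adds the patches one at a time, in order, and calls CHECK with the
# cumulative list after each addition. Prints the first patch for which
# CHECK fails, or "none" if every cumulative set passes.
# CHECK is a caller-supplied function or command (hypothetical here) that
# returns 0 when a kernel built with the given patches boots cleanly.
first_bad_patch() {
    check="$1"
    shift
    applied=""
    for p in "$@"; do
        applied="${applied:+$applied }$p"
        # $applied is left unquoted on purpose: one argument per patch.
        if ! "$check" $applied; then
            echo "$p"
            return 0
        fi
    done
    echo "none"
    return 1
}
```

With a check function that records the operator's answer after each test boot, a call such as `first_bad_patch boots_ok p1.patch p2.patch p3.patch p4.patch p5.patch` would stop at the first patch that breaks booting. A linear walk is used here rather than a bisection because there are only a handful of patches and each one depends on the earlier ones.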
Michael, I would expect that if the AAC_QUIRK_SCSI_32 quirk added to the latest driver were removed for your card, that would also fix the issue. In the patch series that Tomas has posted, that corresponds to patch 5, which I expect is the one that breaks your system.
See this post on LKML for another similar issue:
FWIW, I'm seeing this on one of two nearly identical machines running CentOS 4 and kernel-smp-2.6.9-78.0.1.EL.i686. The difference between the two machines is that one has 2GB and the other 4GB; the one with 4GB does not boot.
These machines both have an Adaptec 2120S (Crusader), which also has AAC_QUIRK_SCSI_32 turned on in the patch posted by Tomas in Comment #25.
The same bug is also filed for Fedora 9 under Bug #450444.
We have tested patches 1-4 so far and the system starts as expected.
Waiting for the patch 5 build to complete so we can test that.
Created attachment 319352 [details]
this patch removes the AAC_QUIRK_SCSI_32
(In reply to comment #27)
> Tomas, David
> We have done patches 1-4 so far and the system starts as expected.
> Awaiting for completion of patch 5 to test that.
If everything goes as expected, the kernel with the fifth patch will fail.
Then please try the attached patch on top of the 2453 patch (RHEL5.2). It removes AAC_QUIRK_SCSI_32 for the 2120S and 2200S controllers, as suggested by David.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
release.
(In reply to comment #28)
> Created an attachment (id=319352) [details]
> this patch removes the AAC_QUIRK_SCSI_32
> (In reply to comment #27)
> > Tomas, David
> > We have done patches 1-4 so far and the system starts as expected.
> > Awaiting for completion of patch 5 to test that.
> > Michael
> If everything is going as expected kernel with fifth patch will fail.
> Try then please the attached patch on top of the 2453 patch (RHEL5.2). It
> removes the AAC_QUIRK_SCSI_32 for the 2120S and 2200S controller as suggested
> by David.
> Thanks, Tomas
Patch five does indeed fail.
Applying the patch that removes AAC_QUIRK_SCSI_32 on top of the 2453 one produces the desired result.
We will leave the system running this patched kernel for the time being.
Posted today on rhkl.
in kernel-2.6.18-120.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
(In reply to comment #35)
> in kernel-2.6.18-120.el5
> You can download this test kernel from http://people.redhat.com/dzickus/el5
I have installed both of these test kernels:
The system starts normally with both and is currently running the latter; I plan to leave this as the active kernel for now.
The customer tried the test kernel and is no longer getting "aac_srb: aac_fib_send
failed with status: 8195" errors. However, dmesg still shows the
following messages:
Dec 4 10:27:32 web kernel: scsi 0:2:13:0: timing out command, waited 22s
Dec 4 10:27:32 web kernel: scsi 0:2:14:0: timing out command, waited 22s
Dec 4 16:21:21 web kernel: scsi 0:2:12:0: timing out command, waited 22s
Dec 4 16:21:21 web kernel: scsi 0:2:13:0: timing out command, waited 22s
Dec 4 16:21:21 web kernel: scsi 0:2:14:0: timing out command, waited 22s
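To see which devices the remaining timeouts cluster on, the log lines above can be tallied per SCSI address. A minimal sketch, assuming syslog-style lines in the format quoted here; `count_scsi_timeouts` is a made-up helper name, not an existing tool.

```shell
# count_scsi_timeouts: reads syslog-style lines on stdin and prints
# "<count> <host:channel:id:lun>" for every device that logged a
# "timing out command" message, sorted by device address.
count_scsi_timeouts() {
    awk '/timing out command/ {
             for (i = 1; i < NF; i++)
                 if ($i == "scsi") {
                     dev = $(i + 1)
                     sub(/:$/, "", dev)   # drop the trailing colon
                     n[dev]++
                 }
         }
         END { for (d in n) print n[d], d }' | sort -k2
}
```

Fed the five lines quoted above (for example with `count_scsi_timeouts < /var/log/messages` on a log containing only those), it reports one timeout for 0:2:12:0 and two each for 0:2:13:0 and 0:2:14:0.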
Internal Status set to 'Waiting on SEG'
This event sent from IssueTracker by streeter
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.