Bug 453472 - [aacraid] aac_srb: aac_fib_send failed with status 8195
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: i686 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Tomas Henzl
QA Contact: Martin Jenner
Keywords: OtherQA, Regression, ZStream
Depends On:
Blocks: 391501 429709 466885
Reported: 2008-06-30 16:36 EDT by M Smith
Modified: 2010-10-22 22:27 EDT
CC List: 17 users

Doc Type: Bug Fix
Last Closed: 2009-01-20 14:41:13 EST

Attachments
Boot log 5.2 PAE kernel (17.59 KB, text/plain), 2008-07-08 13:58 EDT, M Smith
SOS report for 5.1 (388.85 KB, application/octet-stream), 2008-07-08 13:58 EDT, M Smith
Boot log 2.6.18-109.el5 (8.19 KB, text/plain), 2008-09-10 11:35 EDT, M Smith
Boot log 2.6.18-109.el5PAE (8.63 KB, text/plain), 2008-09-10 11:36 EDT, M Smith
patch to 1.1.5.2453 (49.81 KB, patch), 2008-09-11 09:27 EDT, Tomas Henzl
patch to 1.1.5.2455 (117.89 KB, patch), 2008-09-11 09:28 EDT, Tomas Henzl
boot log 2453 patch (13.02 KB, text/plain), 2008-09-19 15:51 EDT, M Smith
boot log 2455 patch (12.97 KB, text/plain), 2008-09-19 15:51 EDT, M Smith
patch 2453 divided into 5 parts (12.16 KB, application/octet-stream), 2008-09-30 11:26 EDT, Tomas Henzl
this patch removes the AAC_QUIRK_SCSI_32 (1.76 KB, patch), 2008-10-03 08:02 EDT, Tomas Henzl

Description M Smith 2008-06-30 16:36:47 EDT
Description of problem:

   After upgrading from 5.1 to 5.2, the system no longer boots into the PAE kernel.

Version-Release number of selected component (if applicable):

 kernel-PAE.i686 2.6.18-92.1.1.el5
 kernel-PAE - 2.6.18-92.1.6.el5.i686

How reproducible:

Every boot into the PAE kernel


Steps to Reproduce:
1. reboot system
2. start with PAE kernel

  
Actual results:

1. system hangs with error message constantly scrolling


Expected results:

1. normal boot process

Additional info:

only exhibits symptoms in the PAE kernel.

Adaptec 2210 RAID
Comment 1 Prarit Bhargava 2008-07-02 08:51:52 EDT
Hi "M Smith" -- could you attach a sysreport from 5.1, and a boot log of the 5.2
kernel to this issue?

Thanks,

P.
Comment 2 M Smith 2008-07-08 13:58:06 EDT
Created attachment 311301 [details]
Boot log 5.2 PAE kernel
Comment 3 M Smith 2008-07-08 13:58:57 EDT
Created attachment 311302 [details]
SOS report for 5.1
Comment 4 Guy Streeter 2008-07-31 14:15:02 EDT
My customer reports that kernel-PAE-2.6.18-53.1.13.el5 boots fine, but
kernel-PAE-2.6.18-92.el5 fails this way.
Comment 5 RHEL Product and Program Management 2008-07-31 14:15:43 EDT
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 6 M Smith 2008-08-06 09:04:19 EDT
Updating to kernel-PAE-2.6.18-92.1.10.el5 still exhibits the same error condition.

The non-PAE kernel-2.6.18-92.1.10.el5 does not.
Comment 7 Sandy Garza 2008-09-05 15:24:00 EDT
HP uses this driver in some of its servers and will be affected.
Comment 8 Tomas Henzl 2008-09-10 10:27:28 EDT
Hi All,
I wasn't able to reproduce the behaviour on my test box.
Please test the patched kernel here:
http://people.redhat.com/thenzl/bz453472/2455/
The patch applied there updates the Adaptec aacraid driver from version 1.1-5[2453]
to version 1.1-5[2455].
Comment 9 M Smith 2008-09-10 11:34:27 EDT
Greetings all.

The system halts with a kernel panic with each of these kernels.

Attaching the boot log for each.
Comment 10 M Smith 2008-09-10 11:35:28 EDT
Created attachment 316322 [details]
Boot log 2.6.18-109.el5
Comment 11 M Smith 2008-09-10 11:36:25 EDT
Created attachment 316323 [details]
Boot log  2.6.18-109.el5PAE
Comment 12 Tomas Henzl 2008-09-11 09:27:53 EDT
Created attachment 316437 [details]
patch to 1.1.5.2453

Hi "M Smith",
I'm sorry, but I tested it on two different boxes before posting without any issues. The test kernel also includes many other patches that are so far untested, so a panic could happen; it looks to me as if it is unrelated to this bz.
I still could not reproduce the problem, so I need further help from you.
RHEL5.1 has driver version 1.1.5.2437, RHEL5.2 has 1.1.5.2453, and version 1.1.5.2455 is planned for RHEL5.3. I'll post both patches here so you can test the patch planned for RHEL5.3 and, if that does not help, remove the 1.1.5.2453 patch.
Thanks
Comment 13 Tomas Henzl 2008-09-11 09:28:39 EDT
Created attachment 316438 [details]
patch to 1.1.5.2455
Comment 14 M Smith 2008-09-11 14:30:34 EDT
Hi Tomas,

I'm sorry but I can't quite figure out what you want me to do.

What should I test now?

Are you asking that I compile a kernel with each version of the driver and test each one?

Do I take that patch to 1.1.5.2455 in Comment #13 and use that with the kernel that I downloaded in Comment #8?


Thanks
Comment 15 Tomas Henzl 2008-09-12 05:13:06 EDT
(In reply to comment #14)
> Hi Tomas,
> 
> I'm sorry but I can't quite figure out what you want me to do.
> 
> What should I test now?
> 
> Are you asking that I compile a kernel with each version of the driver and test
> each one?

Yes, if you could; I thought it would be more flexible this way.
Please start with the kernel sources for 5.1 and patch them with 1.1.5.2455.
If this does not help, please remove the 1.1.5.2453 patch and test again.
If you have difficulties compiling the kernel, tell me and I'll do it.
Thanks
Comment 16 M Smith 2008-09-19 15:50:11 EDT
I created two test kernels as follows:

- Used the source package from pub/redhat/linux/enterprise/5Server/en/os/SRPMS:

kernel-2.6.18-53.1.21.el5.src.rpm

- Installed it:

rpm -ivh kernel*.src.rpm

- Copied the two patch files from the bugzilla ticket:

aacraid-1.1.5.2453.patch
aacraid-1.1.5.2455.patch

- Edited /usr/src/redhat/SPECS/kernel-2.6.spec:

Changed the buildid macro:
%define buildid .2455

Added just the 2453 patch for the first kernel, then both the 2453 and 2455 patches for the second.

- Built the kernel rpms:

rpmbuild -bb --target=i686 --with pae kernel-2.6.spec


Both kernels fail with the same aacraid error as before.

Bootup logs to be attached.


Thanks
Comment 17 M Smith 2008-09-19 15:51:20 EDT
Created attachment 317232 [details]
boot log 2453 patch

boot log 2453 patch
Comment 18 M Smith 2008-09-19 15:51:53 EDT
Created attachment 317233 [details]
boot log 2455 patch

boot log 2455 patch
Comment 23 Tomas Henzl 2008-09-29 05:04:04 EDT
(In reply to comment #16)
> Created two test kernels as follows:
> Bootup logs to be attached.

Thanks, I've divided the 2453 patch into four parts, but before we begin with this, could you please update the firmware in the raid controller? 
We have a report from a customer with a similar problem where this fixed the issue.
Tomas
Comment 24 M Smith 2008-09-29 17:31:25 EDT
Hello Tomas.

I have flashed the BIOS in the 2200S RAID Controller to version 4.2-0[8205].

The same symptoms present themselves when restarting the system with this update applied.


Michael
Comment 25 Tomas Henzl 2008-09-30 11:26:51 EDT
Created attachment 318091 [details]
patch 2453 divided into 5 parts

Michael,
I divided the 2453 patch into a few parts. Please start by compiling the kernel without any of them and without the 2453 and 2455 patches. This should bring us back to the 5.1 kernel, which should work fine.
Then add the p(x).patch files in sequence to see which patch causes the problem.
Thanks,
Tomas
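
The test order described in this comment can be sketched as a small shell loop. This is only an illustration: the file names p1.patch through p5.patch are assumed from the p(x).patch naming, and boots_ok is a hypothetical placeholder for "rebuild the kernel with exactly these patches and boot it". It is stubbed here so the script runs without a build, and the stub's default of p5.patch is just a placeholder for the demo.

```shell
#!/bin/sh
# Sketch only. ASSUMPTIONS: the parts are named p1.patch .. p5.patch, and
# boots_ok is a hypothetical stand-in for "rebuild the kernel with exactly
# these patches applied and check that it boots".

# Stub: succeeds unless the patch named in BAD_PATCH is in the applied set.
# Replace the body with a real build-and-boot check.
BAD_PATCH="${BAD_PATCH:-p5.patch}"
boots_ok() {
    for p in "$@"; do
        if [ "$p" = "$BAD_PATCH" ]; then
            return 1
        fi
    done
    return 0
}

# Test p1, then p1+p2, and so on; print the first patch whose addition
# makes the boot fail, or "none" if every step boots.
first_bad_patch() {
    set --                      # start from the unpatched 5.1 baseline
    for n in 1 2 3 4 5; do
        set -- "$@" "p$n.patch"
        if ! boots_ok "$@"; then
            echo "p$n.patch"
            return 0
        fi
    done
    echo "none"
}

first_bad_patch
```

A linear walk is used rather than a binary search because there are only five parts and each one may depend on the parts before it.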
Comment 26 David Rees 2008-09-30 19:27:39 EDT
Michael, I would expect that if the AAC_QUIRK_SCSI_32 quirk added to the latest driver were removed for your card, that would also fix the issue. In the patch series that Tomas has posted, that corresponds to patch 5, so that is the one I would expect to break your system.

See this post on LKML for another similar issue:

http://marc.info/?l=linux-kernel&m=122166454808377&w=2

FWIW, I'm seeing this on one of two nearly identical machines running CentOS 4 and kernel-smp-2.6.9-78.0.1.EL.i686. The difference between the two machines is that one has 2GB of memory and the other 4GB; the one with 4GB does not boot.

Both machines have an Adaptec 2120S (Crusader), which also has AAC_QUIRK_SCSI_32 turned on in the patch posted by Tomas in Comment #25.

The same bug is also filed for Fedora 9 under Bug #450444.
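
One possible reading of the 2GB versus 4GB difference, assuming (from the flag name only, nothing in this bug confirms the mechanism) that AAC_QUIRK_SCSI_32 ties some driver path to 32-bit addresses: a buffer whose physical address lies at or above 4 GiB does not fit in 32 bits, and such addresses can only show up on a PAE or 64-bit kernel with enough memory installed. The sketch below shows just that arithmetic. The function names and sample addresses are made up for illustration, and it needs a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Sketch only: shows which physical addresses fit in a 32-bit field.
# ASSUMPTION: AAC_QUIRK_SCSI_32 implies a 32-bit address limit somewhere in
# the driver path; this thread does not confirm the exact mechanism.
# Requires 64-bit shell arithmetic.

# Succeeds if the address (decimal or 0x-prefixed hex) fits in 32 bits.
fits_in_32bit() {
    [ $(( $1 >> 32 )) -eq 0 ]
}

# Print a one-line verdict for an address.
describe_addr() {
    if fits_in_32bit "$1"; then
        echo "$1: fits in a 32-bit address"
    else
        echo "$1: needs more than 32 bits"
    fi
}

describe_addr 0x7FFFFFFF     # just under 2 GiB
describe_addr 0xFFFFFFFF     # last byte below 4 GiB
describe_addr 0x100000000    # first byte at 4 GiB
```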
Comment 27 M Smith 2008-10-02 16:27:21 EDT
Tomas, David

We have done patches 1-4 so far and the system starts as expected.

Waiting for completion of patch 5 to test that.

Michael
Comment 28 Tomas Henzl 2008-10-03 08:02:46 EDT
Created attachment 319352 [details]
this patch removes the AAC_QUIRK_SCSI_32

(In reply to comment #27)
> Tomas, David
> 
> We have done patches 1-4 so far and the system starts as expected.
> 
> Awaiting for completion of patch 5 to test that.
> 
> Michael
If everything goes as expected, the kernel with the fifth patch will fail.

Then please try the attached patch on top of the 2453 patch (RHEL5.2). It removes AAC_QUIRK_SCSI_32 for the 2120S and 2200S controllers, as suggested by David.
Thanks, Tomas
Comment 29 RHEL Product and Program Management 2008-10-03 08:15:32 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 30 M Smith 2008-10-03 17:15:43 EDT
(In reply to comment #28)
> Created an attachment (id=319352) [details]
> this patch removes the AAC_QUIRK_SCSI_32
> (In reply to comment #27)
> > Tomas, David
> > 
> > We have done patches 1-4 so far and the system starts as expected.
> > 
> > Awaiting for completion of patch 5 to test that.
> > 
> > Michael
> If everything is going as expected kernel with fifth patch will fail.
> Try then please the attached patch on top of the 2453 patch (RHEL5.2). It
> removes the AAC_QUIRK_SCSI_32 for the 2120S and 2200S controller as suggested
> by David.
> Thanks, Tomas

Patch five does indeed fail.

Applying the patch that removes AAC_QUIRK_SCSI_32 on top of the 2453 patch produces the desired result.

We will leave the system running this patched kernel for the time being.

Thanks,

Michael
Comment 33 Tomas Henzl 2008-10-13 10:49:54 EDT
Posted today on rhkl.
Comment 35 Don Zickus 2008-10-20 11:12:47 EDT
in kernel-2.6.18-120.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 38 M Smith 2008-10-20 16:21:09 EDT
(In reply to comment #35)
> in kernel-2.6.18-120.el5
> You can download this test kernel from http://people.redhat.com/dzickus/el5

I have installed both of these test kernels:

2.6.18-120.el5
2.6.18-120.el5PAE

The system starts normally with both and is currently running the latter; I plan to leave this as the active kernel for now.
Comment 41 Issue Tracker 2008-12-09 10:49:53 EST
Hello streeter,

The customer tried the test kernel and is no longer getting "aac_srb: aac_fib_send
failed with status: 8195" errors. However, dmesg still shows the
following messages:

Dec  4 10:27:32 web kernel: scsi 0:2:13:0: timing out command, waited 22s
Dec  4 10:27:32 web kernel: scsi 0:2:14:0: timing out command, waited 22s
Dec  4 16:21:21 web kernel: scsi 0:2:12:0: timing out command, waited 22s
Dec  4 16:21:21 web kernel: scsi 0:2:13:0: timing out command, waited 22s
Dec  4 16:21:21 web kernel: scsi 0:2:14:0: timing out command, waited 22s

Thanks.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by streeter 
 issue 227174
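
To see how often each device reports these timeouts, the messages can be tallied per SCSI address. A minimal sketch, assuming the syslog line layout shown above, where the device address is the seventh whitespace-separated field; count_timeouts is a name made up here, not an existing tool.

```shell
#!/bin/sh
# Sketch only: count "timing out command" messages per SCSI device.
# ASSUMPTION: lines look like the syslog excerpt above, i.e.
#   Mon DD HH:MM:SS host kernel: scsi H:C:T:L: timing out command, ...
# so the device address is field 7 with a trailing colon.

# Reads log lines on stdin, prints "address count" per device, sorted.
count_timeouts() {
    awk '/timing out command/ {
             dev = $7
             sub(/:$/, "", dev)      # drop the trailing colon
             n[dev]++
         }
         END { for (d in n) print d, n[d] }' | sort
}
```

Usage would be along the lines of `count_timeouts < /var/log/messages`. Raw dmesg output lacks the date and host prefix, so the field number would need adjusting there.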
Comment 44 errata-xmlrpc 2009-01-20 14:41:13 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html
