Bug 453472
Field | Value
---|---
Summary | [aacraid] aac_srb: aac_fib_send failed with status 8195
Product | Red Hat Enterprise Linux 5
Reporter | M Smith <msmith>
Component | kernel
Assignee | Tomas Henzl <thenzl>
Status | CLOSED ERRATA
QA Contact | Martin Jenner <mjenner>
Severity | urgent
Priority | urgent
Version | 5.2
CC | aacraid, achim_leubner, andriusb, bmr, cfairchild, coughlan, dhoward, dhuff, drees76, emcnabb, jpirko, jtluka, karen.skweres, kevin, pep, sandy.garza, tao
Target Milestone | rc
Keywords | OtherQA, Regression, ZStream
Hardware | i686
OS | Linux
Doc Type | Bug Fix
Last Closed | 2009-01-20 19:41:13 UTC
Bug Blocks | 391501, 429709, 466885
Description
M Smith
2008-06-30 20:36:47 UTC
Hi "M Smith" -- could you attach a sysreport from 5.1 and a boot log of the 5.2 kernel to this issue? Thanks, P.
Created attachment 311301 [details]
Boot log 5.2 PAE kernel
Created attachment 311302 [details]
SOS report for 5.1
My customer reports that kernel-PAE-2.6.18-53.1.13.el5 boots fine, but kernel-PAE-2.6.18-92.el5 fails this way.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
Updating to kernel-PAE-2.6.18-92.1.10.el5 still exhibits the same error condition. Kernel kernel-2.6.18-92.1.10.el5 does not.
HP uses this driver in some of its servers and will be affected.
Hi All, I wasn't able to reproduce the behaviour on my test box. Please test the patched kernel here: http://people.redhat.com/thenzl/bz453472/2455/ The patch applied there updates the Adaptec aacraid driver from version 1.1-5[2453] to version 1.1-5[2455].
Greetings all. The system halts with a kernel panic with each of these kernels. Attaching the boot log for each.
Created attachment 316322 [details]
Boot log 2.6.18-109.el5
Created attachment 316323 [details]
Boot log 2.6.18-109.el5PAE
Created attachment 316437 [details]
patch to 1.1.5.2453
Hi "M Smith",
I'm sorry, but I tested it on two different boxes before posting without any issues. That kernel also carries many other patches that have not been tested yet, so a failure is possible, but it looks to me to be unrelated to this bz.
I still could not reproduce the problem, so I need further help from you.
RHEL5.1 has driver version 1.1.5.2437, RHEL5.2 has 1.1.5.2453, and RHEL5.3 is planned to get version 1.1.5.2455. I'll post both patches here so you can test the patch planned for RHEL5.3 and, if that does not help, remove the 1.1.5.2453 patch.
Thanks
Created attachment 316438 [details]
patch to 1.1.5.2455
Hi Tomas, I'm sorry but I can't quite figure out what you want me to do. What should I test now? Are you asking that I compile a kernel with each version of the driver and test each one? Do I take the patch to 1.1.5.2455 in Comment #13 and use that with the kernel that I downloaded in Comment #8? Thanks
(In reply to comment #14)
> Are you asking that I compile a kernel with each version of the driver and test each one?
Yes, if you could; I thought that it would be more flexible this way. Please start with the kernel sources for 5.1 and patch them with 1.1.5.2455. If this does not help, then please remove the 1.1.5.2453 patch and test. If you have difficulties compiling the kernel, tell me and I'll do it. Thanks
Created two test kernels as follows:
- Used the source from pub/redhat/linux/enterprise/5Server/en/os/SRPMS: kernel-2.6.18-53.1.21.el5.src.rpm
- rpm -ivh kernel*.src.rpm
- Copied the two patch files from the bugzilla ticket: aacraid-1.1.5.2453.patch and aacraid-1.1.5.2455.patch
- Edited /usr/src/redhat/SPECS/kernel-2.6.spec: changed the buildid macro to "%define buildid .2455", then added first just the 2453 patch, then both that and the 2455 patch
- Built the kernel rpms: rpmbuild -bb --target=i686 --with pae kernel-2.6.spec
Both kernels fail with the aacraid error as previously. Bootup logs to be attached. Thanks
Created attachment 317232 [details]
boot log 2453 patch
Created attachment 317233 [details]
boot log 2455 patch
(In reply to comment #16)
> Created two test kernels as follows:
> Bootup logs to be attached.
Thanks, I've divided the 2453 patch into four parts, but before we begin with this, could you please update the firmware in the RAID controller? We have a report from a customer with a similar problem where this fixed the issue. Tomas
Hello Tomas. I have flashed the BIOS in the 2200S RAID Controller to version 4.2-0[8205]. The same symptoms present themselves when restarting the system with this update applied. Michael
Created attachment 318091 [details]
patch 2453 divided into 5 parts
Michael,
I divided the 2453 patch into a few parts. Please start the test by compiling the kernel without any of them and without the 2453 and 2455 patches. This should bring us back to the 5.1 kernel, and it should work fine.
Then add the patches p(x).patch one by one to see which patch causes the problem.
Thanks,
Tomas
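The procedure requested above (start from a known-good kernel, add the split patches in order, and stop at the first one that breaks the boot) can be sketched as a small shell helper. This is only an illustration, not something posted in the bug: the function names `first_bad_patch` and `boots_ok` and the file names `p1.patch` to `p5.patch` are hypothetical, and `boots_ok` is a stub standing in for the manual "rebuild the kernel with these patches and boot it" step.

```shell
#!/bin/sh
# first_bad_patch TEST_CMD PATCH...
# Adds patches cumulatively, in the order given, and prints the first patch
# after which TEST_CMD reports failure. Prints "none" if every step passes.
first_bad_patch() {
    test_cmd=$1
    shift
    applied=""
    for p in "$@"; do
        applied="${applied:+$applied }$p"
        # TEST_CMD receives the full list of patches applied so far.
        if ! $test_cmd $applied; then
            echo "$p"
            return 0
        fi
    done
    echo "none"
}

# Hypothetical stand-in for "build a kernel with these patches and boot it".
# Here it simply pretends that any kernel containing p5.patch fails to boot.
boots_ok() {
    case " $* " in
        *" p5.patch "*) return 1 ;;
    esac
    return 0
}

first_bad_patch boots_ok p1.patch p2.patch p3.patch p4.patch p5.patch
```

With only five patches, a linear walk like this keeps each step easy to interpret; a binary search would save builds on a longer series but assumes the patches can be applied in arbitrary prefixes, which is the same assumption made here.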
Michael, I would expect that if the AAC_QUIRK_SCSI_32 quirk added to the latest driver were removed for your card, that would also fix the issue. Of the patch series that Tomas has posted, that would correspond to patch 5, which would be the one that breaks your system. See this post on LKML for another similar issue: http://marc.info/?l=linux-kernel&m=122166454808377&w=2
FWIW, I'm seeing this on one of two nearly identical machines running CentOS 4 and kernel-smp-2.6.9-78.0.1.EL.i686. The difference between the two machines is that one has 2GB and the other 4GB; the one with 4GB does not boot. These machines both have an Adaptec 2120S (Crusader), which also has AAC_QUIRK_SCSI_32 turned on in the patch posted by Tomas in Comment #25. The same bug is also filed for Fedora 9 under Bug #450444.
Tomas, David
We have done patches 1-4 so far and the system starts as expected.
Awaiting completion of patch 5 to test that.
Michael
Created attachment 319352 [details]
this patch removes the AAC_QUIRK_SCSI_32
(In reply to comment #27)
> We have done patches 1-4 so far and the system starts as expected.
> Awaiting for completion of patch 5 to test that.
If everything goes as expected, the kernel with the fifth patch will fail. Then please try the attached patch on top of the 2453 patch (RHEL5.2). It removes AAC_QUIRK_SCSI_32 for the 2120S and 2200S controllers, as suggested by David. Thanks, Tomas
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
(In reply to comment #28)
> If everything is going as expected kernel with fifth patch will fail.
> Try then please the attached patch on top of the 2453 patch (RHEL5.2). It removes the AAC_QUIRK_SCSI_32 for the 2120S and 2200S controller as suggested by David.
Patch five does indeed fail. Applying the patch on top of the 2453 one to remove AAC_QUIRK_SCSI_32 produces the desired result. We will leave the system running this patched kernel for the time being. Thanks, Michael
Posted today on rhkl.
in kernel-2.6.18-120.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
(In reply to comment #35)
> in kernel-2.6.18-120.el5
> You can download this test kernel from http://people.redhat.com/dzickus/el5
I have installed both of these test kernels: 2.6.18-120.el5 and 2.6.18-120.el5PAE. The system starts normally with both and is currently running the latter; I plan to leave this as the active kernel for now.
Hello streeter, the customer tried the test kernel and is no longer getting "aac_srb: aac_fib_send failed with status: 8195" errors. However, dmesg still shows the following messages:
Dec 4 10:27:32 web kernel: scsi 0:2:13:0: timing out command, waited 22s
Dec 4 10:27:32 web kernel: scsi 0:2:14:0: timing out command, waited 22s
Dec 4 16:21:21 web kernel: scsi 0:2:12:0: timing out command, waited 22s
Dec 4 16:21:21 web kernel: scsi 0:2:13:0: timing out command, waited 22s
Dec 4 16:21:21 web kernel: scsi 0:2:14:0: timing out command, waited 22s
Thanks.
Internal Status set to 'Waiting on SEG'
This event sent from IssueTracker by streeter issue 227174
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-0225.html
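For anyone following up on the "timing out command" messages that the customer still saw with the test kernel, a per-device tally can make it easier to see whether the timeouts cluster on particular targets. The helper below is a minimal sketch, not part of the original report; the function name `count_timeouts` is hypothetical, and it assumes log lines in the same format as those quoted in the thread, where the device address follows the word "scsi".

```shell
#!/bin/sh
# count_timeouts: read log lines on stdin and print, for each SCSI device
# address, how many "timing out command" messages were seen.
count_timeouts() {
    awk '/timing out command/ {
             # The device address is the field right after the word "scsi".
             for (i = 1; i < NF; i++)
                 if ($i == "scsi") {
                     dev = $(i + 1)
                     sub(/:$/, "", dev)   # drop the trailing colon
                     n[dev]++
                 }
         }
         END { for (d in n) print d, n[d] }' | LC_ALL=C sort
}
```

It could be run as `count_timeouts < /var/log/messages` (the log path is an assumption about the system's syslog configuration). For the five lines quoted in the thread it reports one timeout for 0:2:12:0 and two each for 0:2:13:0 and 0:2:14:0.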