Bug 457552 - aac_fib_send failed with status 8195
Summary: aac_fib_send failed with status 8195
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.7
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Tomas Henzl
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 465861
Depends On:
Blocks: 391511 450901 RHEL4u8_relnotes 461297 468151
 
Reported: 2008-08-01 14:08 UTC by Cale Fairchild
Modified: 2018-11-14 18:01 UTC
CC List: 23 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when using the aacraid driver with an Adaptec 2120S or Adaptec 2200S controller, the system may have failed to boot, returning the error: "aac_srb:aac_fib_send failed with status 8195". This update corrects the aacraid driver, which resolves this issue.
Clone Of:
Environment:
Last Closed: 2009-05-18 19:11:32 UTC
Target Upstream Version:
Embargoed:


Attachments
removes the AAC_QUIRK_SCSI_32 (1.74 KB, patch)
2008-10-07 08:44 UTC, Tomas Henzl
no flags


Links
Red Hat Bugzilla 453472 (priority: urgent, status: CLOSED): [aacraid] aac_srb: aac_fib_send failed with status 8195. Last updated 2021-02-22 00:41:40 UTC.
Red Hat Product Errata RHSA-2009:1024 (priority: normal, status: SHIPPED_LIVE): Important: Red Hat Enterprise Linux 4.8 kernel security and bug fix update. Last updated 2009-05-18 14:57:26 UTC.

Description Cale Fairchild 2008-08-01 14:08:41 UTC
Description of problem:

The new kernel, kernel-smp-2.6.9-78.EL, hangs while trying to load the aacraid driver.
I also ran into this issue when trying to upgrade to RHEL 5.2 this morning.
When I boot back into kernel 2.6.9-67.0.15.ELsmp, the system boots fine.

Version-Release number of selected component (if applicable):

kernel-smp-2.6.9-78.EL
Adaptec 2200S RAID card

How reproducible:

Every time that kernel is booted.
  
Actual results:

System hangs with the message

aac_srb:aac_fib_send failed with status 8195

repeatedly output to the display

Expected results:

System boots into Linux.

Additional info:

This seems to be related to bug# 450444 and 452472

When there is a resolution, will there be a way of using the 5.2
distribution disk to upgrade to version 5, or will I have to
upgrade to 5.1 and then run yum to upgrade the rest of the way?

Comment 1 Cale Fairchild 2008-08-13 15:50:33 UTC
It has now been 12 days since I posted this bug report. I have not had any correspondence yet. I need to upgrade this server to RHEL 5 by the beginning of next week. Could someone please let me know what the status of the driver problem is? Thank you.

Comment 2 Bryn M. Reeves 2008-08-15 09:55:16 UTC
Hello Cale,

If you have an active Red Hat Enterprise Linux subscription please raise a case with Red Hat Global Support Services using your normal contact means. Your support contact will be able to attach the request to this bugzilla and can seek prioritisation for this bug within the Red Hat Enterprise Linux content update process.

Bugzilla is an engineering tool and is used internally by Red Hat to process changes to Red Hat products, as well as by the Fedora community to develop Fedora projects. The interface at bugzilla.redhat.com is open and anyone with an email address is able to create an account, file bugs, comment on bugs she or he has access to.

Unlike tickets raised with Red Hat GSS, issues filed directly in bugzilla do not have an associated SLA and Red Hat cannot guarantee response times or provide technical assistance while resolving problems.

Regards,

Comment 3 paul boin 2008-08-27 14:31:53 UTC
Yeah, and unfortunately many RHEL customers are *still* getting better response time here than on official tickets.

We're a customer and I'm here...

Comment 5 RHEL Program Management 2008-08-28 09:26:50 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 6 RHEL Program Management 2008-09-03 12:52:05 UTC
Updating PM score.

Comment 7 Achim Leubner 2008-09-04 11:50:34 UTC
I'm currently investigating it. It only occurs on 64-bit platforms and only with some controllers (Dell, Legend, Adaptec 2120S, 2200S). It's a problem with the AAC_QUIRK_SCSI_32 flag handling. I will provide a solution as soon as possible.

Comment 8 Bryn M. Reeves 2008-09-05 08:46:50 UTC
I don't think this is limited to 64-bit platforms. I have a couple of reports of this on 32-bit i686 systems equipped with >4GiB of RAM (i.e. PAE is in use and the physical address space is 36 bits).

Linux version 2.6.9-78.ELsmp (brewbuilder.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Jul 9 15:39:47 EDT 2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009e400 (usable)
 BIOS-e820: 000000000009e400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000cc000 - 00000000000d4000 (reserved)
 BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000e7ff0000 (usable)
 BIOS-e820: 00000000e7ff0000 - 00000000e7fffc00 (ACPI data)
 BIOS-e820: 00000000e7fffc00 - 00000000e8000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fed00000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 00000001fee00000 (usable)
 BIOS-e820: 00000001fee00000 - 0000000200000000 (reserved)
7278MB HIGHMEM available.
896MB LOWMEM available.
[...]
Loading scsi_mod.ko module
Loading sd_mod.ko module
ACPI: PCI Interrupt 0000:02:09.0[A] -> GSI 30 (level, low) -> IRQ 233
Loading aacraid.ko module
Adaptec aacraid driver 1.1-5[2455]
aacraid0: kernel 4.1-0[6253] 
aacraid0: monitor 4.1-0[6253]
aacraid0: bios 4.1-0[6253]
aacraid0: serial B7612F
aacraid0: Non-DASD support enabled.
aacraid0: 64 Bit DAC enabled
scsi0 : aacraid
  Vendor: Adaptec   Model: LogicalDrive_1    Rev: V1.0
  Type:   Direct-Access                      ANSI SCSI revision: 02
SCSI device sda: 106291200 512-byte hdwr sectors (54421 MB)
sda: Write Protect is off
sda: Mode Sense: 06 00 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 106291200 512-byte hdwr sectors (54421 MB)
sda: Write Protect is off
sda: Mode Sense: 06 00 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
aac_srb: aac_fib_send failed with status: 8195
aac_srb: aac_fib_send failed with status: 8195
aac_srb: aac_fib_send failed with status: 8195
[...]
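Bryn's point can be checked against the BIOS-e820 map in this log. The sketch below is plain userspace C, not kernel code, and the struct and function names are made up for illustration. It copies the map verbatim and finds the end of the highest usable range: 0x1fee00000, i.e. 8174 MiB, which matches the 896 MB LOWMEM plus 7278 MB HIGHMEM reported above and lies well past the 4 GiB boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One BIOS-e820 entry: [start, end) and whether the BIOS marked it usable. */
struct e820_range {
    uint64_t start;
    uint64_t end;
    bool usable;
};

/* The map printed in the boot log above, copied verbatim. */
static const struct e820_range boot_map[] = {
    { 0x0000000000000000ULL, 0x000000000009e400ULL, true  },
    { 0x000000000009e400ULL, 0x00000000000a0000ULL, false },
    { 0x00000000000cc000ULL, 0x00000000000d4000ULL, false },
    { 0x00000000000dc000ULL, 0x0000000000100000ULL, false },
    { 0x0000000000100000ULL, 0x00000000e7ff0000ULL, true  },
    { 0x00000000e7ff0000ULL, 0x00000000e7fffc00ULL, false }, /* ACPI data */
    { 0x00000000e7fffc00ULL, 0x00000000e8000000ULL, false }, /* ACPI NVS */
    { 0x00000000fec00000ULL, 0x00000000fed00000ULL, false },
    { 0x00000000fee00000ULL, 0x00000000fee01000ULL, false },
    { 0x00000000ffe00000ULL, 0x0000000100000000ULL, false },
    { 0x0000000100000000ULL, 0x00000001fee00000ULL, true  },
    { 0x00000001fee00000ULL, 0x0000000200000000ULL, false },
};
#define BOOT_MAP_LEN (sizeof boot_map / sizeof boot_map[0])

/* First physical address that no longer fits in 32 bits. */
#define FOUR_GIB 0x100000000ULL

/* End (exclusive) of the highest usable range; reserved ranges are ignored. */
static uint64_t top_of_usable_ram(const struct e820_range *map, size_t n)
{
    uint64_t top = 0;
    for (size_t i = 0; i < n; i++)
        if (map[i].usable && map[i].end > top)
            top = map[i].end;
    return top;
}

/* True if some usable RAM sits above 4 GiB, so a DMA buffer may be given
 * an address that does not fit in 32 bits. */
static bool has_ram_above_4gib(const struct e820_range *map, size_t n)
{
    return top_of_usable_ram(map, n) > FOUR_GIB;
}
```

Ignoring reserved ranges matters here: the last entry ends at 0x200000000, but only the range ending at 0x1fee00000 is usable RAM. Checking just the first five entries (everything below the PCI hole) reports no RAM above 4 GiB.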

Comment 9 Sandy Garza 2008-09-05 19:23:11 UTC
HP uses this driver in some of its servers and will be affected.

Comment 10 Bryn M. Reeves 2008-09-19 14:03:52 UTC
We now have a number of reports of positive results following firmware updates, so it seems worth checking with your hardware vendor for any available updates and applying them for testing.

Comment 15 Tom Coughlan 2008-10-06 20:55:43 UTC
(In reply to comment #0)

> This seems to be related to bug# 450444 and 452472

That second one should be 453472 not 452472.

Although the firmware version may have some impact on the problem, there has also been progress reported (with a patch) in 453472, and upstream as referenced in 450444.

Comment 16 Tomas Henzl 2008-10-07 08:44:54 UTC
Created attachment 319625
removes the AAC_QUIRK_SCSI_32

Hi All,
this patch removes the AAC_QUIRK_SCSI_32 flag for the 2120S and 2200S controllers as suggested by David (see also the same issue for RHEL5.2 - bz#453472).
If someone is able to test it, that would be helpful.
Thanks, Tomas
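The attached patch is not reproduced here. As a rough illustration of what "removing the quirk for two controllers" means, the sketch below uses a hypothetical per-controller table: the flag name QUIRK_SCSI_32, its value, the model strings and quirks_for() are all invented stand-ins for AAC_QUIRK_SCSI_32 and the driver's real table, which matches controllers by PCI IDs and carries other flags too. The point is only that the 2120S and 2200S entries lose the flag while the Dell PERC 3/Di entry, for which the quirk was originally added, keeps it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for AAC_QUIRK_SCSI_32; the value is illustrative. */
#define QUIRK_SCSI_32 0x0001u

struct controller_entry {
    const char *model;   /* hypothetical display name, not a PCI ID match */
    unsigned quirks;     /* bitmask of QUIRK_* flags */
};

/* Illustrative table after the proposed change: the quirk stays on the
 * PERC 3/Di and is dropped for the 2120S and 2200S. */
static const struct controller_entry controllers[] = {
    { "Dell PERC 3/Di", QUIRK_SCSI_32 },
    { "Adaptec 2120S",  0 },
    { "Adaptec 2200S",  0 },
};

/* Look up the quirk bits for a model; unknown models get none in this sketch. */
static unsigned quirks_for(const char *model)
{
    for (size_t i = 0; i < sizeof controllers / sizeof controllers[0]; i++)
        if (strcmp(controllers[i].model, model) == 0)
            return controllers[i].quirks;
    return 0;
}
```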

Comment 17 Doug Huff 2008-10-07 14:32:57 UTC
This does not occur only on 64-bit platforms. It seems to occur on any system with >4G of RAM, whether it's 32-bit or 64-bit. I did not see this bug and opened bug#465861, which is the same issue on a 32-bit system.

Comment 18 Bryn M. Reeves 2008-10-07 14:38:51 UTC
In reply to comment #17 - see comment #8 ;)

I've fixed the arch field now.

Comment 19 Bryn M. Reeves 2008-10-07 14:40:07 UTC
*** Bug 465861 has been marked as a duplicate of this bug. ***

Comment 20 Cale Fairchild 2008-10-07 15:09:27 UTC
I have moved to RHEL 5.2 (using an older kernel), so I would be happy to test a patch for that kernel, but I am afraid that I can no longer test the RHEL 4.7 kernel.

Comment 21 Doug Huff 2008-10-07 15:21:06 UTC
The only system I have that exhibits this issue is our Oracle standby. I may be able to schedule some time for testing later this afternoon or tomorrow.

Comment 23 Tomas Henzl 2008-10-08 14:37:21 UTC
(In reply to comment #21)
Doug,
I think you are using a controller other than the 2120S or 2200S, so the posted patch
will probably not help.
In some cases updating the firmware on your controller helps; if it doesn't,
please try extending the patch to cover your controller as well.

Comment 26 Tomas Henzl 2008-10-09 15:20:14 UTC
Achim,
even if removing the quirk as proposed in the patch might look
promising, the problem is that the quirk was introduced to prevent another
driver failure -
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94cf6ba11b
so we could end up with another regression.

Comment 27 Martin Wilck 2008-10-09 15:48:21 UTC
(In reply to comment #26)
> Achim,
> even if removing the quirk as proposed in the patch might look
> promising, the problem is that the quirk was introduced to prevent another
> driver failure -
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94cf6ba11b
> so we could end up with another regression.

That change was made because of http://bugzilla.kernel.org/show_bug.cgi?id=9133, which was a problem specific to the Dell PERC 3/Di. Attachment #319625 doesn't change anything for that controller, so there is a good chance that no regression will result, right? Perhaps one should ask Mark why he applied his quirk to the non-Dell controllers as well.

Comment 28 Martin Wilck 2008-10-09 16:00:23 UTC
Would it make sense to test any of the options suggested on http://bugzilla.kernel.org/show_bug.cgi?id=9133:

"Try loading the driver with aacraid.dacmode=0, aacraid.expose_physicals=0 or
aacraid.nondasd=0 (any of these should turn off calls to ScsiPortCommand64)."

If so, which ones, and with which option values?

Comment 30 Martin Wilck 2008-10-10 10:03:30 UTC
About the patch mentioned in comment #26:

Unless overridden by the module parameter dacmode=0, aac_scsi_32_64() is used for
aac_adapter_scsi() on 64-bit-DMA-capable systems for all controllers with
AAC_QUIRK_SCSI_32 and AAC_OPT_SGMAP_HOST64 set.

aac_scsi_32_64() always returns FAILED on 64-bit-DMA-capable systems if the
adapter has the AAC_OPT_SGMAP_HOST64 flag set and the physical memory is
>4GB.
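This also appears to explain the number in the error message. In the Linux SCSI midlayer header <scsi/scsi.h>, FAILED is defined as 0x2003, which is 8195 in decimal, so the repeated "aac_fib_send failed with status 8195" is most likely this FAILED value being printed by aac_srb. The sketch below mirrors only the two constants it needs and checks the arithmetic at compile time (C11 _Static_assert); scsi_status_name() is a hypothetical helper, not part of the driver.

```c
/* Values mirrored from the Linux SCSI midlayer (<scsi/scsi.h>); only the two
 * needed for this example are reproduced here. */
#define SUCCESS 0x2002
#define FAILED  0x2003

/* Compile-time check: the hex constant FAILED is the decimal 8195 in the log. */
_Static_assert(FAILED == 8195, "0x2003 == 8195");

/* Hypothetical helper: map a status printed by aac_srb back to a name. */
static const char *scsi_status_name(int status)
{
    switch (status) {
    case SUCCESS: return "SUCCESS";
    case FAILED:  return "FAILED";
    default:      return "unknown";
    }
}
```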

Thus, on every system with 64-bit DMA and >4GB memory, aac_adapter_scsi() will
always fail (!) for each controller with AAC_QUIRK_SCSI_32 and
AAC_OPT_SGMAP_HOST64. In practice, this means that AAC_QUIRK_SCSI_32
implies that you can't use >4GB of memory (which agrees with the findings in
comment #17).
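The chain of conditions in this comment can be written down as a small truth function. This is a minimal sketch that follows the comment's description rather than the driver source: struct aac_config, its fields and both functions are hypothetical, and "physical memory >4GB" is modeled as a simple byte count. On i386 with PAE, dma_addr_t is 64 bits wide, which would likely make such a system count as "64-bit DMA capable" and fits the 32-bit reports in comments #8 and #17.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs that, per the analysis above, decide whether every SRB fails. */
struct aac_config {
    bool dacmode_off;         /* driver loaded with aacraid dacmode=0 */
    bool dma_64bit_capable;   /* system can hand out 64-bit DMA addresses */
    bool quirk_scsi_32;       /* controller entry carries AAC_QUIRK_SCSI_32 */
    bool sgmap_host64;        /* adapter advertises AAC_OPT_SGMAP_HOST64 */
    uint64_t phys_mem_bytes;  /* installed physical memory */
};

#define FOUR_GB 0x100000000ULL

/* Is aac_scsi_32_64() the routine behind aac_adapter_scsi()? */
static bool uses_scsi_32_64(const struct aac_config *c)
{
    return !c->dacmode_off && c->dma_64bit_capable &&
           c->quirk_scsi_32 && c->sgmap_host64;
}

/* Does that routine refuse every command (return FAILED)? */
static bool every_srb_fails(const struct aac_config *c)
{
    return uses_scsi_32_64(c) && c->phys_mem_bytes > FOUR_GB;
}
```

Flipping any single input in this model (less than 4 GB of RAM, dacmode=0, or no quirk for the controller) makes the failure disappear. That matches the two remedies discussed in this thread: the module options from comment #28 and removing the quirk per controller.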

Perhaps the PERC 3/Di has this limitation, but the 2120S and 2200S certainly
don't. The patch from comment #16 is therefore correct, but most probably not
sufficient. I would suggest removing AAC_QUIRK_SCSI_32 for all controllers
except the PERC 3/Di.

Comment 31 Tomas Henzl 2008-10-10 11:57:10 UTC
(In reply to comment #30)
> Perhaps the PERC 3/Di has this limitation, but the 2120S and 2200S certainly
> don't. The patch from comment #16 is therefore correct, but most probably not
> sufficient. I would suggest removing AAC_QUIRK_SCSI_32 for all controllers
> except the PERC 3/Di.
We have a report from Doug Huff about this problem on a Dell PERC 3/DiB [Boxster]; see bz#465861.
So the question is: for exactly which controllers should we remove AAC_QUIRK_SCSI_32? The problem was reported on 3 controllers; for several others I have seen reports that they work without problems; and there is also the original PERC 3/Di (not the 3/DiB), where we should leave the quirk. Changing the logic in aac_scsi_32_64 without access to enough different hardware also seems dangerous.

Comment 32 Martin Wilck 2008-10-13 06:38:41 UTC
I agree. I currently only care about the 2120S and 2200S, which fail with the current EL4.7 driver. As for other controllers, bug reports will come in sooner or later if there are problems.

My main argument is that removing the quirk for 2120S and 2200S won't cause a regression.

Comment 33 Tomas Henzl 2008-10-13 14:50:36 UTC
Posted today on rhkl.

Comment 36 Vivek Goyal 2008-11-05 13:57:05 UTC
Committed in 78.17.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 50 Tomas Henzl 2009-01-27 11:48:51 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
It fixes a bug which prevented the system from booting with aacraid driver. System hangs with the message "aac_srb:aac_fib_send failed with status 8195".  Affected controllers are Adaptec 2120S and 2200S.

Comment 55 Ryan Lerch 2009-04-06 23:04:26 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-It fixes a bug which prevented the system from booting with aacraid driver. System hangs with the message "aac_srb:aac_fib_send failed with status 8195".  Affected controllers are Adaptec 2120S and 2200S.
+Previously, when using the accraid driver with an Adaptec 2120S or Adaptec 2200S controller, the system may have failed to bootup, returning the error: "aac_srb:aac_fib_send failed with status 8195". With this update, the accraid driver has been updated, which resolves this issue.

Comment 58 errata-xmlrpc 2009-05-18 19:11:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html

