Bug 450444

Summary: aac_srb: aac_fib_send failed with status: 8195
Product: [Fedora] Fedora
Reporter: Trevin Beattie <trevin>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 10
CC: amigo03, andriusb, davidt, dhuff, drees76, jlawson-redhat, mccomb, mpagano, paul.boin, ServeRAIDDriver, tbeattie, thenzl
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-12-18 06:12:26 UTC
Attachments:
  Complete 2.6.22.5-49.fc6 kernel log for a short session (flags: none)
  2.6.25-14.fc9 kernel log with duplicates filtered out (flags: none)

Description Trevin Beattie 2008-06-08 15:26:19 UTC
Description of problem:
I've just tried installing Fedora 9 on my computer for the first time.  Last
month I had run a test installation of the 32-bit version in a VMware instance,
which is working.  This time it's the 64-bit version on real hardware.

After choosing the install option from the boot menu, there seems to be a long
pause while detecting hardware, but eventually anaconda starts up.  When I get
to the partition selection page though, it tells me no drives were found.

On the system log terminal, I see a continual stream of messages scrolling past
in an infinite loop:

    aac_srb: aac_fib_send failed with status: 8195

and /tmp/syslog is rapidly growing (it was nearly 2MB when I looked).  Syslog
does show the two hard drives I have on the standard SATA controller, but no
/dev/sd* devices have been created for them.

Version-Release number of selected component (if applicable):
2.6.25-14.fc9

How reproducible:
Every time

Steps to Reproduce:
1. Boot the Fedora 9 x86_64 installation DVD
2. Select "Install or Upgrade an existing system"
3. Skip the media check
4. Hit Ctrl-Alt-F3 and observe the syslog output
  
Actual results:
    aac_srb: aac_fib_send failed with status: 8195
    aac_srb: aac_fib_send failed with status: 8195
    aac_srb: aac_fib_send failed with status: 8195
    aac_srb: aac_fib_send failed with status: 8195
    aac_srb: aac_fib_send failed with status: 8195
    ...

Expected results:
Detect sda and sdb on the sata_nv controller, and sdc on the aacraid controller

Additional info:
Motherboard: Tyan Thunder K8WE model S2895A2NRF, with an nVidia nForce I/O
controller
Two SATA hard disks attached to the motherboard
Adaptec 2120S SCSI RAID controller with 1 RAID array configured
Adaptec 3085 SATA RAID controller with no RAID array configured

Prior report of this bug found on LKML:
http://lkml.org/lkml/2008/5/12/365
(He also has an Adaptec 2120S)

Comment 1 Trevin Beattie 2008-06-08 17:59:48 UTC
Clarification: the long pause occurs after the lines:
"detecting hardware...
 waiting for hardware to initialize"

It is as early as this point that the "aac_srb: aac_fib_send failed with status:
8195" messages start appearing.

If I reboot the install DVD and add the "noprobe" kernel option, and then select
the proper drivers when prompted (libata, sata_nv, etc.), it will detect my
regular SATA drives and let me continue the installation.

After the installation completes and I reboot the system, the startup sequence
hangs for a minute or so at "Starting udev: _", and eventually complains: "Wait
timeout. Will continue in the backgroun[FAILED]".  This is followed by: "Setting
up Logical Volume Management:   No volume groups found".  The system detects my
regular SATA drives at least, but the kernel log is filling up with "aac_srb:
aac_fib_send failed with status: 8195" messages at the rate of over 3,400 lines
per minute!


Comment 2 Chuck Ebbert 2008-06-09 04:33:36 UTC
Can you compare the system logs from the older kernel with the new one? Does the
driver print any additional or different messages when it loads, other than
repeats of the above message?

Comment 3 Trevin Beattie 2008-06-11 02:53:08 UTC
Created attachment 308888
Complete 2.6.22.5-49.fc6 kernel log for a short session

My current installation is Fedora Core 6 with kernel 2.6.22.5-49.fc6 (preserved
on a different partition).  It's difficult to compare the message logs
directly; the syslog output looks vastly different between the two kernels, and
messages specifically from aacraid aren't explicitly labeled.  I think the
aacraid driver output can be distilled down to this:

kernel: Adaptec aacraid driver (1.1-5[2437]-mh4)
kernel: AAC0: kernel 4.2-0[7349] Dec 11 2004
kernel: AAC0: monitor 4.2-0[7349]
kernel: AAC0: bios 4.2-0[7349]
kernel: AAC0: Non-DASD support enabled.
kernel: AAC0: 64bit support enabled.
kernel: AAC0: 64 Bit DAC enabled
kernel: scsi4 : aacraid
kernel: scsi 4:0:0:0: Direct-Access     Adaptec  Linux            V1.0 PQ: 0 ANSI: 2
kernel: sd 4:0:0:0: [sdc] 430657536 512-byte hardware sectors (220497 MB)
kernel: sd 4:0:0:0: [sdc] Assuming Write Enabled
kernel: sd 4:0:0:0: [sdc] Assuming drive cache: write through
kernel: sd 4:0:0:0: [sdc] 430657536 512-byte hardware sectors (220497 MB)
kernel: sd 4:0:0:0: [sdc] Assuming Write Enabled
kernel: sd 4:0:0:0: [sdc] Assuming drive cache: write through
kernel:  sdc: sdc1
kernel: sd 4:0:0:0: [sdc] Attached SCSI removable disk
kernel: scsi 4:1:0:0: Direct-Access     MAXTOR   ATLAS10K5_73WLS  JNZ3 PQ: 0 ANSI: 3
kernel: scsi 4:1:1:0: Direct-Access     MAXTOR   ATLAS10K5_73WLS  JNZ3 PQ: 0 ANSI: 3
kernel: scsi 4:1:2:0: Direct-Access     MAXTOR   ATLAS10K5_73WLS  JNZ3 PQ: 0 ANSI: 3
kernel: scsi 4:1:3:0: Direct-Access     MAXTOR   ATLAS10K5_73WLS  JNZ3 PQ: 0 ANSI: 3
kernel: sd 4:0:0:0: Attached scsi generic sg2 type 0
kernel: scsi 4:1:0:0: Attached scsi generic sg3 type 0
kernel: scsi 4:1:1:0: Attached scsi generic sg4 type 0
kernel: scsi 4:1:2:0: Attached scsi generic sg5 type 0
kernel: scsi 4:1:3:0: Attached scsi generic sg6 type 0
kernel: AAC1: kernel 5.2-0[15323] Sep 21 2007
kernel: AAC1: monitor 5.2-0[15323]
kernel: AAC1: bios 5.2-0[15323]
kernel: AAC1: serial 1644d4
kernel: AAC1: Non-DASD support enabled.
kernel: AAC1: 64bit support enabled.
kernel: AAC1: 64 Bit DAC enabled
kernel: scsi5 : aacraid
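
For anyone who wants to repeat this kind of distillation on their own logs, here
is a minimal sketch, not the method actually used for the excerpt above. It
assumes the log has already been read into a list of lines and that aacraid
announces each host with a "scsiN : aacraid" line as shown; the function name
aacraid_lines is made up for illustration.

```python
import re


def aacraid_lines(log_lines):
    """Return the lines that appear to come from aacraid or its SCSI hosts."""
    lines = list(log_lines)

    # First pass: collect SCSI host numbers claimed by the driver,
    # e.g. "scsi4 : aacraid" -> host "4".
    hosts = set()
    for line in lines:
        match = re.search(r"\bscsi(\d+) : aacraid\b", line)
        if match:
            hosts.add(match.group(1))

    # Second pass: keep driver banners, "AACn:" adapter lines, aac_* errors,
    # and any "scsi H:C:T:L" / "sd H:C:T:L" line whose host H is an aacraid host.
    kept = []
    for line in lines:
        if "aacraid" in line or "aac_" in line or re.search(r"\bAAC\d+:", line):
            kept.append(line)
            continue
        match = re.search(r"\b(?:scsi|sd) (\d+):\d+:\d+:\d+", line)
        if match and match.group(1) in hosts:
            kept.append(line)
    return kept
```

Lines that carry no host number (for example the partition line " sdc: sdc1")
are not caught by this sketch and would still need to be picked out by hand.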

Comment 4 Trevin Beattie 2008-06-11 02:57:15 UTC
Created attachment 308889
2.6.25-14.fc9 kernel log with duplicates filtered out

The syslog output from the FC9 kernel has absolutely NO messages that can be
identified as being from aacraid other than the infinitely repeating error:

kernel: aac_srb: aac_fib_send failed with status: 8195
[message repeats 2287 times]

which shows up at line 8 of the boot session.
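
A rough sketch of how consecutive duplicates can be filtered out of a log like
this one, in case others want to attach similarly trimmed logs. It is not
necessarily how the attachment was produced. It assumes timestamp and hostname
prefixes have already been stripped (otherwise no two lines are identical), and
it assumes the repeat count means "additional occurrences after the first"; the
function name collapse_repeats is made up.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into one line plus a note."""
    out = []
    # groupby groups only adjacent equal items, which is what we want here.
    for text, run in groupby(line.rstrip("\n") for line in lines):
        count = sum(1 for _ in run)
        out.append(text)
        if count > 1:
            # Assumption: report the repeats beyond the first occurrence.
            out.append(f"[message repeats {count - 1} times]")
    return out
```

Calling collapse_repeats(open(path)) on a stripped log returns the shortened
list of lines, which can then be written back out with "\n".join(...).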

Comment 5 Trevin Beattie 2008-06-11 02:59:30 UTC
Full kernel logs from startup to shutdown for both FC6 and FC9 attached above.

Comment 6 Trevin Beattie 2008-06-12 02:02:16 UTC
Found another user with the same problem on Gentoo Forums:
http://forums.gentoo.org/viewtopic-p-5077382.html?sid=a51c3a0fba6aa854c0b49b8fae5cc15a
He also has a 64-bit system and Adaptec 2120S.


Comment 7 Trevin Beattie 2008-06-14 01:26:14 UTC
I think we can isolate this bug to 64-bit code.  Based on a comment about a
recent patch to the aacraid driver:

http://www.spinics.net/lists/linux-scsi/msg26480.html

I decided to try booting the 32-bit FC9 install DVD.  The 32-bit driver loaded
properly and detected both of my RAID cards.


Comment 8 David Tseng 2008-08-08 15:42:14 UTC
I just ran into this bug with the recent RHEL 4 update 7 kernel update.
kernel-smp-2.6.9-78.EL

So currently I'm forced to stay with the last working kernel:
kernel-smp-2.6.9-67.0.20.EL

RAID card is:
Adaptec 2120S SCSI RAID SGL ULTRA 320 with 7349 firmware version.

I'm running the 32-bit kernel.

Comment 9 Jeff Lawson 2008-08-28 06:01:03 UTC
I have reproduced this problem on two different Dell PowerEdge 2650 servers running the 32-bit version of CentOS 5.2 (kernel-PAE-2.6.18-92.el5.i686.rpm), which should be roughly comparable to RHEL 5 update 2.

The two servers had the same RAID controller with slightly different BIOS versions, both of which triggered the repeating "aac_fib_send failed" errors:
Adaptec Dell Perc 3/Di BIOS 2.7-1 build 3170
Adaptec Dell Perc 3/Di BIOS 2.8-0 build 6082

After updating both to the latest version from Dell's website, the problem no longer seems to occur in my limited testing so far:
Adaptec Dell Perc 3/Di BIOS 2.8-1 build 7692

Comment 10 Trevin Beattie 2008-08-28 13:33:22 UTC
Jeff, are the 2650's RAID controllers integrated into the motherboard or are they expansion cards?

Comment 11 Jeff Lawson 2008-08-29 01:00:02 UTC
Yes, the RAID controllers are integrated.

I have one more Dell PowerEdge 2650 that reproduces this problem and whose firmware I have not yet upgraded.  I can leave this last system on the older firmware for a few more days if anyone has any other data-collection steps they'd like to try; otherwise I will upgrade its RAID firmware too.

Comment 12 Tom 2008-08-29 06:46:18 UTC
Jeff, could you give me a hint how I can update the RAID controller?

Comment 13 Jeff Lawson 2008-08-29 07:14:45 UTC
Go to the website of your RAID controller's manufacturer and see if they have any updates for your model.  My Dell PowerEdge 2650 had a Windows utility that created two floppy disks containing the automatic updater.  If your RAID controller is integrated then go to the motherboard manufacturer's website.

Again, I'm not certain whether this particular bug occurs only because of an old firmware issue, but I haven't seen the problem after updating two of my systems.

Comment 14 mccomb 2008-08-30 10:34:41 UTC
I have the same problem. I upgraded my kernel with up2date to 2.6.9-78.0.1.ELsmp and cannot boot properly. I am getting the following error messages:

    aac_srb: aac_fib_send failed with status: 8195
    aac_srb: aac_fib_send failed with status: 8195
    aac_srb: aac_fib_send failed with status: 8195
    aac_srb: aac_fib_send failed with status: 8195
    aac_srb: aac_fib_send failed with status: 8195

I also have the Adaptec 2120S installed and the firmware was 8205. I upgraded to the latest firmware, 8208, but it did not make any difference.

Unfortunately, for some reason GRUB does not list any other kernels that I can boot into, so I am stuck.

Does anyone know how I can get around this issue without doing a full rebuild of the server?  Thanks.

Comment 15 David Rees 2008-09-30 23:30:49 UTC
See this post on LKML for another similar issue:

http://marc.info/?l=linux-kernel&m=122166454808377&w=2

The same bug is also filed for RHEL 5 under Bug #453472.

mccomb: You might try booting with one of the two options:

aacraid.dacmode=0 or mem=4G

I pulled the options from the kernel commit which introduced this change:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94cf6ba11b

Comment 16 Tomas Henzl 2008-10-07 08:18:15 UTC
(In reply to comment #8)
> I just ran into this bug with the recent RHEL 4 update 7 kernel update.
> kernel-smp-2.6.9-78.EL
David and others,
I think that you are probably using RHEL 4.7 and not Fedora (this is a Fedora bug), so I'm adding you to the cc: list on BZ#457552.

Comment 17 Trevin Beattie 2008-11-14 06:25:04 UTC
The bug still exists in the Fedora 10 pre-release: kernel-2.6.27.4-68.fc10.x86_64.

I've also verified that it only happens when I have my Adaptec 2120S controller installed.  If that card is removed, aacraid properly detects my remaining 3085 controller.

The new 32-bit kernel still boots normally with the 2120S controller.

Could the patch mentioned in bug #453472 for EL 5.2 be applied to Fedora?  (We're only 9 patch levels ahead...)

Comment 18 Bug Zapper 2009-11-18 09:35:31 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 19 Bug Zapper 2009-12-18 06:12:26 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.