Description of problem:
New kernel kernel-smp-2.6.9-78.EL hangs while trying to load the aacraid driver. I also ran into this issue when trying to upgrade to RHEL 5.2 this morning. Once I boot back into kernel 2.6.9-67.0.15.ELsmp the system boots fine.
Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-78.EL
Adaptec 2200S RAID card
How reproducible:
Every time that kernel is booted.
Actual results:
System hangs with the message "aac_srb:aac_fib_send failed with status 8195" repeatedly output to the display.
Expected results:
System boots into Linux.
Additional info:
This seems to be related to bug# 450444 and 452472.
When there is a resolution, will there be a way of using the 5.2 distribution disk to upgrade to version 5, or will I have to upgrade to 5.1 and then run yum to upgrade the rest of the way?
It has now been 12 days since I posted this bug report. I have not had any correspondence yet. I need to upgrade this server to RHEL 5 by the beginning of next week. Could someone please let me know what the status of the driver problem is? Thank you.
Hello Cale,
If you have an active Red Hat Enterprise Linux subscription, please raise a case with Red Hat Global Support Services using your normal contact means. Your support contact will be able to attach the request to this bugzilla and can seek prioritisation for this bug within the Red Hat Enterprise Linux content update process.
Bugzilla is an engineering tool and is used internally by Red Hat to process changes to Red Hat products, as well as by the Fedora community to develop Fedora projects. The interface at bugzilla.redhat.com is open, and anyone with an email address is able to create an account, file bugs, and comment on bugs she or he has access to.
Unlike tickets raised with Red Hat GSS, issues filed directly in bugzilla do not have an associated SLA, and Red Hat cannot guarantee response times or provide technical assistance while resolving problems.
Regards,
Yeah, and unfortunately many RHEL customers are *still* getting better response time here than on official tickets. We're a customer and I'm here...
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
Updating PM score.
I'm currently investigating it. It only occurs on 64-bit platforms and only with some controllers (Dell, Legend, Adaptec 2120S, 2200S). It's a problem with the AAC_QUIRK_SCSI_32 flag handling. I will provide a solution as soon as possible.
Don't think this is limited to 64-bit platforms. I have a couple of reports of this on 32-bit i686 systems equipped with >4GiB of RAM (i.e. PAE is in use and the physical address space is 36 bits).
Linux version 2.6.9-78.ELsmp (brewbuilder.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Jul 9 15:39:47 EDT 2008
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009e400 (usable)
 BIOS-e820: 000000000009e400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000cc000 - 00000000000d4000 (reserved)
 BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000e7ff0000 (usable)
 BIOS-e820: 00000000e7ff0000 - 00000000e7fffc00 (ACPI data)
 BIOS-e820: 00000000e7fffc00 - 00000000e8000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fed00000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 00000001fee00000 (usable)
 BIOS-e820: 00000001fee00000 - 0000000200000000 (reserved)
7278MB HIGHMEM available.
896MB LOWMEM available.
[...]
Loading scsi_mod.ko module
ACPI: PCI Interrupt 0000:02:09.0[A] -> ing sd_mod.ko moGSI 30 (level, low) -> IRQ 233
Loading aacraid.ko module
Adaptec aacraid driver 1.1-5[2455]
aacraid0: kernel 4.1-0[6253]
aacraid0: monitor 4.1-0[6253]
aacraid0: bios 4.1-0[6253]
aacraid0: serial B7612F
aacraid0: Non-DASD support enabled.
aacraid0: 64 Bit DAC enabled
scsi0 : aacraid
  Vendor: Adaptec   Model: LogicalDrive_1   Rev: V1.0
  Type: Direct-Access   ANSI SCSI revision: 02
SCSI device sda: 106291200 512-byte hdwr sectors (54421 MB)
sda: Write Protect is off
sda: Mode Sense: 06 00 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 106291200 512-byte hdwr sectors (54421 MB)
sda: Write Protect is off
sda: Mode Sense: 06 00 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2
Attached scsi removable disk sda at scsi0, channel 0, id 0, lun 0
aac_srb: aac_fib_send failed with status: 8195
aac_srb: aac_fib_send failed with status: 8195
aac_srb: aac_fib_send failed with status: 8195
[...]
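For anyone wanting to check quickly whether a machine falls into the ">4GiB of RAM" category from its boot log, here is a minimal sketch in Python. It assumes the log contains "BIOS-e820:" lines in the format shown in the log above. The function name usable_bytes_above_4gib and the SAMPLE string are made up for illustration and are not part of any Red Hat or aacraid tooling.

```python
import re

FOUR_GIB = 1 << 32  # 0x100000000, the upper limit of 32-bit addressing

# Matches lines such as:
#   BIOS-e820: 0000000100000000 - 00000001fee00000 (usable)
E820_LINE = re.compile(
    r"BIOS-e820:\s*([0-9a-fA-F]+)\s*-\s*([0-9a-fA-F]+)\s*\((.+?)\)"
)


def usable_bytes_above_4gib(boot_log: str) -> int:
    """Return how many bytes of usable RAM lie above the 4 GiB boundary."""
    total = 0
    for start_hex, end_hex, kind in E820_LINE.findall(boot_log):
        if kind != "usable":
            continue  # skip reserved, ACPI data, ACPI NVS, ...
        start, end = int(start_hex, 16), int(end_hex, 16)
        # Count only the part of the region that sits above 4 GiB.
        total += max(0, end - max(start, FOUR_GIB))
    return total


# A few lines copied from the RAM map in the report above.
SAMPLE = """\
BIOS-e820: 0000000000100000 - 00000000e7ff0000 (usable)
BIOS-e820: 00000000e7ff0000 - 00000000e7fffc00 (ACPI data)
BIOS-e820: 0000000100000000 - 00000001fee00000 (usable)
BIOS-e820: 00000001fee00000 - 0000000200000000 (reserved)
"""

if __name__ == "__main__":
    above = usable_bytes_above_4gib(SAMPLE)
    print("usable RAM above 4 GiB:", above > 0)
```

A non-zero result only says that usable memory exists above 4 GiB; whether the controller is actually affected still depends on the driver flags discussed in this bug.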
HP uses this driver in some of its servers and will be affected.
We now have a number of reports of positive results following firmware updates, so it seems worth checking with your hardware vendor for any available updates and applying them for testing.
(In reply to comment #0)
> This seems to be related to bug# 450444 and 452472
That second one should be 453472, not 452472.
Although the firmware version may have some impact on the problem, there has also been progress reported (with a patch) in 453472, and upstream as referenced in 450444.
Created attachment 319625
removes the AAC_QUIRK_SCSI_32
Hi All,
this patch removes the AAC_QUIRK_SCSI_32 quirk for the 2120S and 2200S controllers as suggested by David (see also the same issue for RHEL5.2 - bz#453472). If someone is able to test it, that would be helpful.
Thanks,
Tomas
This does not only occur on 64-bit platforms. It seems to occur on any system with >4G of RAM, whether it's 32-bit or 64-bit. I did not see this bug and opened bug#465861, which is the same issue on a 32-bit system.
In reply to comment #17 - see comment #8 ;) I've fixed the arch field now.
*** Bug 465861 has been marked as a duplicate of this bug. ***
I have moved to RHEL 5.2 (using an older kernel), so I would be happy to check a patch for that kernel, but I am afraid that I can no longer test the kernel for 4.7.
The only system I have that is exhibiting this issue is our Oracle standby. I may be able to schedule some time for testing later this afternoon or tomorrow.
(In reply to comment #21)
Doug, I think you are using a controller other than the 2120S or 2200S, so the posted patch will probably not help. In some cases it helps to update the firmware on your controller; if that doesn't help, please try extending the patch to cover your controller as well.
Achim, even if removing the quirk as proposed in the patch might look promising, the problem is that the quirk was introduced to prevent another driver failure - http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94cf6ba11b so it could happen that we will get another regression.
(In reply to comment #26)
> Achim,
> even if removing the quirk as proposed in the patch might look
> promising, the problem is that the quirk was introduced to prevent another
> driver failure -
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94cf6ba11b
> so it could happen that we will get another regression.
That was made because of http://bugzilla.kernel.org/show_bug.cgi?id=9133, which was a problem specific to the Dell Perc 3/Di. Attachment #319625 doesn't change anything for that controller. Thus there is a good chance that no regression will result, right?
Perhaps one should ask Mark why he applied his quirk to the non-Dell controllers as well.
Would it make sense to test any of the options suggested on http://bugzilla.kernel.org/show_bug.cgi?id=9133: "Try loading the driver with aacraid.dacmode=0, aacraid.expose_physicals=0 or aacraid.nondasd=0 (any of these should turn off calls to ScsiPortCommand64)." If yes, which ones, and with which option values?
About the patch mentioned in comment #26: Unless overridden by the module option dacmode=0, aac_scsi_32_64() is used for aac_adapter_scsi() on 64-bit-DMA-capable systems for all controllers with AAC_QUIRK_SCSI_32 and AAC_OPT_SGMAP_HOST64 set.
aac_scsi_32_64() always returns FAILED on 64-bit-DMA-capable systems if the adapter has the AAC_OPT_SGMAP_HOST64 flag set and the physical memory is >4GB. Thus, on every system with 64-bit DMA and >4GB of memory, aac_adapter_scsi() will always fail (!) for each controller with AAC_QUIRK_SCSI_32 and AAC_OPT_SGMAP_HOST64.
In practice, that means to me that AAC_QUIRK_SCSI_32 implies that you can't use >4GB of memory (that agrees with the findings in comment #17).
Perhaps the Perc 3/Di has this limitation, but the 2120S and 2200S certainly don't. The patch from comment #16 is therefore correct, but most probably not sufficient. I would suggest removing AAC_QUIRK_SCSI_32 for all controllers except the Perc 3/Di.
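The condition described in the comment above can be written down as a small truth-table model. This is only a sketch of the reasoning in that comment, written in Python for readability; it is not the driver source, and the names Adapter, srb_commands_fail, dac_enabled and highest_ram_address are hypothetical.

```python
from dataclasses import dataclass

FOUR_GIB = 1 << 32  # 4 GiB boundary


@dataclass
class Adapter:
    name: str
    quirk_scsi_32: bool   # stands in for AAC_QUIRK_SCSI_32 in the driver table
    sgmap_host64: bool    # stands in for AAC_OPT_SGMAP_HOST64 from the firmware


def srb_commands_fail(adapter: Adapter, dac_enabled: bool,
                      highest_ram_address: int) -> bool:
    """Model of the failure condition as described in the comment above.

    dac_enabled is False when the system is not 64-bit-DMA capable or when
    DAC was switched off with dacmode=0 (assumption: both cases behave alike).
    """
    # The aac_scsi_32_64() path is chosen only when all three hold.
    uses_32_64_path = (dac_enabled
                       and adapter.quirk_scsi_32
                       and adapter.sgmap_host64)
    # On that path every command fails once RAM extends beyond 4 GiB.
    return uses_32_64_path and highest_ram_address > FOUR_GIB


if __name__ == "__main__":
    top_of_ram = 0x1FEE00000  # end of the last usable e820 region in this bug
    with_quirk = Adapter("2200S", quirk_scsi_32=True, sgmap_host64=True)
    without_quirk = Adapter("2200S", quirk_scsi_32=False, sgmap_host64=True)
    print(srb_commands_fail(with_quirk, True, top_of_ram))
    print(srb_commands_fail(without_quirk, True, top_of_ram))
```

Under this model, removing the quirk flag, disabling DAC, or running with memory entirely below 4 GiB each avoids the failing path, which matches the workarounds and the patch discussed in this bug.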
(In reply to comment #30)
> Perhaps the Perc 3/Di has this limitation, but the 2120S and 2200S certainly
> don't. The patch from comment #16 is therefore correct, but most probably not
> sufficient. I would suggest removing AAC_QUIRK_SCSI_32 for all controllers
> except the Perc 3/Di.
We have a report from Doug Huff about this problem on a Dell PERC 3/DiB [Boxster], see bz#465861. So the question is: on exactly which controllers should we remove AAC_QUIRK_SCSI_32? The problem was reported on 3 controllers; for several others I have seen reports that they work without problems; and there is also one, the original Perc 3/Di (not 3/DiB), where we should leave the quirk. Changing the logic in aac_scsi_32_64 without access to enough different hardware also seems dangerous.
I agree. I currently only care about 2120S and 2200S, which fail with the current EL4.7 driver. Wrt other controllers, bug reports will come in sooner or later if there are problems. My main argument is that removing the quirk for 2120S and 2200S won't cause a regression.
Posted today on rhkl.
Committed in 78.17.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: It fixes a bug which prevented the system from booting with aacraid driver. System hangs with the message "aac_srb:aac_fib_send failed with status 8195". Affected controllers are Adaptec 2120S and 2200S.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1 +1 @@
-It fixes a bug which prevented the system from booting with aacraid driver. System hangs with the message "aac_srb:aac_fib_send failed with status 8195". Affected controllers are Adaptec 2120S and 2200S.
+Previously, when using the aacraid driver with an Adaptec 2120S or Adaptec 2200S controller, the system may have failed to bootup, returning the error: "aac_srb:aac_fib_send failed with status 8195". With this update, the aacraid driver has been updated, which resolves this issue.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html