Bug 162065 - aacraid driver hangs if Adaptec 2230SLP array not optimal
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: athlon Linux
Priority: medium
Severity: medium
Assigned To: Tom Coughlan
QA Contact: Brian Brock
Blocks: 168424
 
Reported: 2005-06-29 12:12 EDT by David Milburn
Modified: 2007-11-30 17:07 EST

Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Last Closed: 2006-03-15 11:09:59 EST


Attachments
This patch to remove the aac_handle_aif() code did not help. (9.38 KB, patch)
2005-06-29 12:15 EDT, David Milburn
Patch to turn on dprintk and add more debug printks, attaching console messages. (4.10 KB, patch)
2005-06-29 12:16 EDT, David Milburn
Console messages showing the driver stuck in aac_queue_get() (419.22 KB, text/plain)
2005-06-29 12:17 EDT, David Milburn
Test patch to use old comm interface, after syncing to latest, proving that it wasn't an old_comm problem. (951 bytes, patch)
2005-09-14 17:11 EDT, David Milburn
Patch RHEL3 U5 driver to not touch InboundMailbox7 register and reduce number of fibs (1.32 KB, patch)
2005-09-14 17:13 EDT, David Milburn

Description David Milburn 2005-06-29 12:12:21 EDT

Description of problem:
Using an Adaptec 2230SLP RAID controller with two 73GB disks in a RAID-1 setup. If the array is not "optimal", RHEL stops responding to keyboard, mouse, and network (system hung). The system is in a state where fib_adapter_complete() calls aac_queue_get(), which in turn calls aac_get_entry(). aac_get_entry() always returns 0, leaving the driver stuck in the following loop in aac_queue_get():

else if (qid == AdapHighRespQueue || qid == AdapNormRespQueue)
{
        while (!aac_get_entry(dev, qid, &entry, index, nonotify))
        {
                /* if no entries wait for some if caller wants to */
                DPRINTK("RespQueue: No entries, wait...\n");
        }
}
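
For illustration only, here is a minimal user-space sketch of why a loop of this shape hangs and how a bounded variant would return an error instead. It is not the aacraid code and not the change that resolved this bug. The names demo_queue, demo_get_entry(), and demo_queue_get_bounded(), and the max_polls parameter, are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a response queue; NOT the aacraid layout. */
struct demo_queue {
    unsigned polls;        /* how many times the host has polled */
    unsigned ready_after;  /* simulated adapter yields an entry on this poll; 0 = never */
    unsigned next_index;   /* index handed back once an entry is available */
};

/*
 * Stand-in for aac_get_entry(): returns true when an entry is usable.
 * With ready_after == 0 it models the reported condition, where the
 * call returns 0 on every attempt.
 */
static bool demo_get_entry(struct demo_queue *q, unsigned *index)
{
    assert(q != NULL && index != NULL);
    q->polls++;
    if (q->ready_after != 0 && q->polls >= q->ready_after) {
        *index = q->next_index;
        return true;
    }
    return false;
}

/*
 * Bounded form of the wait loop (an assumption for illustration):
 * returns 0 on success, or -1 after max_polls failed attempts
 * instead of spinning forever.
 */
static int demo_queue_get_bounded(struct demo_queue *q, unsigned *index,
                                  unsigned max_polls)
{
    unsigned tries;

    for (tries = 0; tries < max_polls; tries++) {
        if (demo_get_entry(q, index))
            return 0;
    }
    return -1;
}
```

With ready_after set to 0, demo_queue_get_bounded() gives up after max_polls attempts. The unbounded while loop quoted above would never exit under the same condition, which matches the hang seen here.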


Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Boot with RAID array not optimal.
  

Actual Results:  System will hang, no response from keyboard, mouse, or networking.

Expected Results:  System should boot up and function as normal.

Additional info:

Based upon Alan Cox's comments for 2.6 (http://lkml.org/lkml/2005/1/14/252), I tried
removing the aac_handle_aif() code from the 2.4 driver, but the system still hung
when booting with the RAID array not optimal. I also turned on dprintk and added
more debug statements; the console messages are attached.
Comment 1 David Milburn 2005-06-29 12:15:24 EDT
Created attachment 116133 [details]
This patch to remove the aac_handle_aif() code did not help.
Comment 2 David Milburn 2005-06-29 12:16:53 EDT
Created attachment 116134 [details]
Patch to turn on dprintk and add more debug printks, attaching console messages.
Comment 3 David Milburn 2005-06-29 12:17:55 EDT
Created attachment 116135 [details]
Console messages showing the driver stuck in aac_queue_get()
Comment 49 Tom Coughlan 2005-11-01 18:51:18 EST
Please test the kernel located at:

http://people.redhat.com/coughlan/.2.4.21-37.7.ELdrvrtest2/

to verify that it solves the problem. 

This contains version 1.1.5-2412 of the aacraid driver. This is the latest from
Adaptec, and is a candidate for U7. 
Comment 55 Ernie Petrides 2005-11-22 19:36:27 EST
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.10.EL).
Comment 62 Red Hat Bugzilla 2006-03-15 11:10:00 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html
