Bug 162065 - aacraid driver hangs if Adaptec 2230SLP array not optimal
Summary: aacraid driver hangs if Adaptec 2230SLP array not optimal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: athlon
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tom Coughlan
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 168424
Reported: 2005-06-29 16:12 UTC by David Milburn
Modified: 2007-11-30 22:07 UTC

Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-15 16:09:59 UTC
Target Upstream Version:
Embargoed:


Attachments
This patch to remove the aac_handle_aif() code did not help. (9.38 KB, patch)
2005-06-29 16:15 UTC, David Milburn
Patch to turn on dprintk and add more debug printks, attaching console messages. (4.10 KB, patch)
2005-06-29 16:16 UTC, David Milburn
Console messages showing the driver stuck in aac_queue_get() (419.22 KB, text/plain)
2005-06-29 16:17 UTC, David Milburn
Test patch to use old comm interface, after syncing to latest, proving that it wasn't an old_comm problem. (951 bytes, patch)
2005-09-14 21:11 UTC, David Milburn
Patch RHEL3 U5 driver to not touch InboundMailbox7 register and reduce number of fibs (1.32 KB, patch)
2005-09-14 21:13 UTC, David Milburn


Links
System ID: Red Hat Product Errata RHSA-2006:0144
Private: 0
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 7
Last Updated: 2006-03-15 05:00:00 UTC

Description David Milburn 2005-06-29 16:12:21 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
Using an Adaptec 2230SLP RAID controller with two 73GB disks in a RAID-1 setup. If the array is not "optimal", RHEL stops responding to keyboard, mouse, and network (system hung). The system is in a state where fib_adapter_complete() calls aac_queue_get(), which in turn calls aac_get_entry(). aac_get_entry() always returns 0, leaving the driver stuck in the following loop in aac_queue_get():

else if (qid == AdapHighRespQueue || qid == AdapNormRespQueue)
{
        while(!aac_get_entry(dev, qid, &entry, index, nonotify))
        {
                /* if no entries wait for some if caller wants to */
                DPRINTK("RespQueue: No entries, wait...\n");
        }
}
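
To illustrate why the loop above never exits when aac_get_entry() keeps returning 0, here is a minimal user-space sketch of the same polling pattern with a retry bound added. It is only an illustration and not the fix that shipped: this report was resolved by updating the aacraid driver. All names in it (sim_queue, sim_get_entry, sim_queue_get_bounded, SIM_MAX_POLLS) are hypothetical stand-ins and are not part of the aacraid driver.

```c
/* Hypothetical stand-in for an adapter response queue.
 * free_slots models how many entries the adapter will ever hand out;
 * polls counts how often the queue was checked. */
struct sim_queue {
    unsigned int free_slots;
    unsigned int polls;
};

/* Models the contract described for aac_get_entry():
 * returns nonzero when an entry was obtained, 0 when none is free. */
int sim_get_entry(struct sim_queue *q)
{
    q->polls++;
    if (q->free_slots == 0)
        return 0;
    q->free_slots--;
    return 1;
}

/* Arbitrary bound chosen for this sketch only. */
#define SIM_MAX_POLLS 1000u

/* Same polling loop as in the report, but it gives up after
 * SIM_MAX_POLLS attempts instead of spinning forever.
 * Returns 0 on success, -1 if no entry ever became free. */
int sim_queue_get_bounded(struct sim_queue *q)
{
    unsigned int tries;

    for (tries = 0; tries < SIM_MAX_POLLS; tries++) {
        if (sim_get_entry(q))
            return 0;
    }
    return -1;
}
```

With a queue whose free_slots stays at 0, as in the reported hang, sim_queue_get_bounded() returns -1 after SIM_MAX_POLLS polls, whereas the unbounded loop in the report would spin indefinitely in the same situation.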


Version-Release number of selected component (if applicable):
kernel-2.4.21-32.0.1.EL

How reproducible:
Always

Steps to Reproduce:
1. Boot with RAID array not optimal.

Actual Results:  System will hang, no response from keyboard, mouse, or networking.

Expected Results:  System should boot up and function as normal.

Additional info:

Based on Alan Cox's comments on the 2.6 driver (http://lkml.org/lkml/2005/1/14/252), I tried
removing the aac_handle_aif() code from the 2.4 driver, but the system still hung
when booting with the RAID array not optimal. I also turned on dprintk and added
more debug statements; the console messages are attached.

Comment 1 David Milburn 2005-06-29 16:15:24 UTC
Created attachment 116133
This patch to remove the aac_handle_aif() code did not help.

Comment 2 David Milburn 2005-06-29 16:16:53 UTC
Created attachment 116134
Patch to turn on dprintk and add more debug printks, attaching console messages.

Comment 3 David Milburn 2005-06-29 16:17:55 UTC
Created attachment 116135
Console messages showing the driver stuck in aac_queue_get()

Comment 49 Tom Coughlan 2005-11-01 23:51:18 UTC
Please test the kernel located at:

http://people.redhat.com/coughlan/.2.4.21-37.7.ELdrvrtest2/

to verify that it solves the problem. 

This contains version 1.1.5-2412 of the aacraid driver. This is the latest from
Adaptec, and is a candidate for U7. 

Comment 55 Ernie Petrides 2005-11-23 00:36:27 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.10.EL).


Comment 62 Red Hat Bugzilla 2006-03-15 16:10:00 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html


