Bug 89844 - (SCSI AACRAID)aacraid SCSI bus reset leads to reiserfs_panic
Summary: (SCSI AACRAID)aacraid SCSI bus reset leads to reiserfs_panic
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-04-28 20:58 UTC by Carl Litt
Modified: 2005-10-31 22:00 UTC
CC List: 0 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:40:51 UTC
Embargoed:


Attachments
Kernel panic (page 1) (47.86 KB, image/jpeg)
2003-04-28 21:15 UTC, Carl Litt
Kernel panic (page 2) (84.44 KB, image/jpeg)
2003-04-28 21:16 UTC, Carl Litt
Kernel panic (page 3) (58.58 KB, image/jpeg)
2003-04-28 21:16 UTC, Carl Litt
Kernel panic (page 4) (76.28 KB, image/jpeg)
2003-04-28 21:18 UTC, Carl Litt

Description Carl Litt 2003-04-28 20:58:30 UTC
Description of problem:
Systems are Dell PowerEdge 2650 with PERC 3/Di (aacraid) running RAID5 and dual
2.4 GHz Xeon CPUs with HyperThreading.  Three identical PE2650s all exhibit this
problem.  After loading each logical CPU with a process (e.g. setiathome), the
aacraid SCSI device eventually resets and causes a kernel panic.  Note that no
significant disk load is required to reproduce this.  Since it is reproducible
on 3 machines with RAID5, it is unlikely to be caused by defective disks, even
though it appears that way.

Version-Release number of selected component (if applicable):
kernel-2.4.18-27.7.xsmp

How reproducible:
From a fresh 7.3 install on reiserfs, run multiple setiathome processes
(one per logical CPU) and come back in a few hours (overnight?).

Steps to Reproduce:
1.  Basic install of Red Hat 7.3, then upgrade to and boot
kernel-2.4.18-27.7.xsmp (i686).  glibc was also upgraded to glibc-2.2.5-43
(i686); nothing else was installed or changed.  Filesystems are reiserfs
(/dev/sda1 = /boot, /dev/sda2 = /, /dev/sda3 = /usr, /dev/sda4 = not mounted
yet).
2.  Put load on each logical CPU (e.g. setiathome).
3.  Come back later (duration undetermined, but overnight seems to be long
enough).  The machine will be frozen.  I have been able to reproduce this
several times.
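
For anyone without setiathome at hand, step 2 could be approximated with a
generic CPU burner.  The following is only a sketch in present-day Python, not
what was run on the affected machines; the names burn and load_all_cpus are
made up for illustration, and it simply starts one busy-looping worker per
logical CPU reported by the OS.

```python
import multiprocessing
import time


def burn(seconds):
    """Busy-loop for roughly `seconds` seconds and return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def load_all_cpus(seconds, workers=None):
    """Run one busy-looping process per logical CPU (or `workers` if given)."""
    workers = workers or multiprocessing.cpu_count()
    with multiprocessing.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)


if __name__ == "__main__":
    # Hypothetical duration; the report only says "overnight seems long enough".
    counts = load_all_cpus(seconds=60.0)
    print(f"{len(counts)} workers finished")
```

Like setiathome, this generates CPU load with essentially no disk I/O, which
matches the observation that no significant disk load is needed.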
    
Actual results:
aacraid: Host adapter reset request. SCSI hang ?
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = 6000000
...
kernel BUG at prints.c:344!
invalid operand: 0000
...
EIP is at reiserfs_panic .....
[ See attachments ]
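
The "return code = 6000000" value above can be split into its component bytes.
The sketch below assumes the value is printed in hexadecimal and that the
result word follows the Linux 2.4 SCSI mid-layer layout (status in bits 0-7,
message in bits 8-15, host in bits 16-23, driver in bits 24-31); the function
name decode_scsi_result and the name table are illustrative and should be
checked against scsi.h for the kernel in question.

```python
# Driver-byte names, assumed to match the Linux 2.4 SCSI headers.
DRIVER_NAMES = {
    0x00: "DRIVER_OK",
    0x01: "DRIVER_BUSY",
    0x02: "DRIVER_SOFT",
    0x03: "DRIVER_MEDIA",
    0x04: "DRIVER_ERROR",
    0x05: "DRIVER_INVALID",
    0x06: "DRIVER_TIMEOUT",
    0x07: "DRIVER_HARD",
    0x08: "DRIVER_SENSE",
}


def decode_scsi_result(hex_text):
    """Split a SCSI result word (given as hex text) into its four bytes."""
    code = int(hex_text, 16)
    driver = (code >> 24) & 0xFF
    return {
        "status": code & 0xFF,          # raw low byte, not shifted
        "msg": (code >> 8) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "driver": driver,
        # The low nibble carries the driver status; the high nibble is a hint.
        "driver_name": DRIVER_NAMES.get(driver & 0x0F, "unknown"),
    }


if __name__ == "__main__":
    print(decode_scsi_result("6000000"))
```

Under these assumptions, 6000000 has a driver byte of 0x06 (DRIVER_TIMEOUT in
the table above) and zero host, message and status bytes, which is consistent
with a command timing out rather than a disk reporting a media error.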

Expected results:
No process should be able to reliably panic the kernel.  The file system
should also be intact, or at least repairable, on reboot.

Additional info:
Attached are screen captures of the kernel panic messages taken from the remote
console.  Two machines are represented here, and the JPGs are numbered in order.
dev 08:02 is the root filesystem, which the setiathome process runs from.
BIOS and firmware are updated to the newest versions (BIOS A10, PERC 3/Di
2.7-1).  It is also important to note that on one occasion the filesystems were
corrupted beyond repair.  That time I was using LVM on /dev/sda4, and that
volume became unusable (vgscan would segfault); in addition, /dev/sda2 was
unrecoverable by reiserfsck.
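
The mapping from "dev 08:02" to the root filesystem can be checked by hand.
The sketch below assumes the kernel prints the device as hexadecimal
major:minor, that major 8 is the first SCSI disk major, and that each disk owns
16 consecutive minors; sd_device_name is a made-up helper name.

```python
SD_MAJOR = 8          # first SCSI disk major number
MINORS_PER_DISK = 16  # minor 0 is the whole disk, 1-15 are partitions


def sd_device_name(dev_text):
    """Turn a 'major:minor' hex pair such as '08:02' into a /dev/sdXN name."""
    major_text, minor_text = dev_text.split(":")
    major, minor = int(major_text, 16), int(minor_text, 16)
    if major != SD_MAJOR:
        raise ValueError(f"major {major} is not handled by this sketch")
    disk, partition = divmod(minor, MINORS_PER_DISK)
    name = "/dev/sd" + chr(ord("a") + disk)
    return name + str(partition) if partition else name


if __name__ == "__main__":
    print(sd_device_name("08:02"))
```

With these assumptions 08:02 resolves to /dev/sda2, which agrees with the
partition layout given in the steps to reproduce (/dev/sda2 = /).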

Comment 1 Carl Litt 2003-04-28 21:15:44 UTC
Created attachment 91366 [details]
Kernel panic (page 1)

Comment 2 Carl Litt 2003-04-28 21:16:16 UTC
Created attachment 91367 [details]
Kernel panic (page 2)

Comment 3 Carl Litt 2003-04-28 21:16:47 UTC
Created attachment 91368 [details]
Kernel panic (page 3)

Comment 4 Carl Litt 2003-04-28 21:18:53 UTC
Created attachment 91369 [details]
Kernel panic (page 4)

This completes the first 4 pages from the first kernel panic.  I have captures
from the other machine, but they show nearly the same thing; they are available
on request.  Sorry for the multiple JPG attachments; this was the only way to
retrieve the kernel panic from the machine in this state.

Comment 5 Alan Cox 2003-06-27 21:25:37 UTC
JPEGs are fine. Currently testing newer aacraid drivers with Adaptec.


Comment 6 Bugzilla owner 2004-09-30 15:40:51 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


