Bug 13635

Summary:	Raid 5 config is causing kernel panic
Product:	[Retired] Red Hat Linux	Reporter:	eballweber
Component:	raidtools	Assignee:	Erik Troan <ewt>
Status:	CLOSED NOTABUG	QA Contact:
Severity:	medium	Docs Contact:
Priority:	high
Version:	6.2	CC:	mingo
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2000-07-31 00:04:48 UTC	Type:	---

Description eballweber 2000-07-09 22:49:24 UTC

System configuration:
N440BX motherboard, five IBM 9.1 GB (model DNES-309170) SCSI disk drives
connected to the on-board Fast-Wide SCSI interface, one Pentium III 550 MHz
processor, 256 MB RAM.

Red Hat Linux 6.2, updated to kernel version 2.2.14-12 per the Updates/Errata
on the Red Hat support page.

RAID 5 is configured as follows (from /etc/raidtab):
----------------------------------------
raiddev			/dev/md0
raid-level		5
nr-raid-disks		4	
chunk-size		64
parity-algorithm	left-symmetric
nr-spare-disks		1

device			/dev/sdb4
raid-disk		0

device			/dev/sdc1
raid-disk		1

device			/dev/sdd1
raid-disk		2

device			/dev/sde1
raid-disk		3

device			/dev/sda1
spare-disk		0

--------------------------------------
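For readers unfamiliar with `parity-algorithm left-symmetric`, the Python sketch below shows which of the four active raid-disks holds parity and which hold data for each stripe. It assumes the mapping used by the Linux md driver's left-symmetric layout (parity rotates from the last disk toward the first, and data starts on the disk after the parity disk); `left_symmetric_layout` is a hypothetical helper, not part of raidtools. The spare (/dev/sda1) holds no chunks until it replaces a failed member.

```python
# Sketch of the RAID 5 "left-symmetric" chunk layout (assumed to match the
# Linux md driver's mapping). Disk numbers are raidtab "raid-disk" indices.

def left_symmetric_layout(stripe: int, raid_disks: int) -> tuple[int, list[int]]:
    """Return (parity_disk, data_disks_in_order) for one stripe."""
    data_disks = raid_disks - 1
    # Parity starts on the last disk and moves one disk left per stripe.
    parity = data_disks - (stripe % raid_disks)
    # Data chunks begin on the disk after the parity disk and wrap around.
    data = [(parity + 1 + i) % raid_disks for i in range(data_disks)]
    return parity, data


if __name__ == "__main__":
    # nr-raid-disks = 4 in the raidtab above; the spare is not counted.
    for s in range(4):
        parity, data = left_symmetric_layout(s, 4)
        print(f"stripe {s}: parity on raid-disk {parity}, data on {data}")
```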

The RAID 5 array was created and mounted as follows:
mkraid /dev/md0
./mke2fs -b4096 -R stride=16 -N1000000 -m1 /dev/md0
mount /dev/md0 /rdb0
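The `-R stride=16` option matches the filesystem to the array geometry: with `chunk-size 64` (in KB) in the raidtab and `-b4096` (4 KB blocks), one RAID chunk spans 64 KB / 4 KB = 16 filesystem blocks. A minimal Python sketch of that arithmetic; `ext2_stride` is a hypothetical helper, not part of e2fsprogs or raidtools:

```python
# Sketch: derive the ext2 "stride" value for a RAID array.
# stride = RAID chunk size / filesystem block size, i.e. the number of
# filesystem blocks that fit in one chunk.

def ext2_stride(chunk_size_kb: int, block_size_bytes: int) -> int:
    """Return the number of filesystem blocks per RAID chunk."""
    chunk_bytes = chunk_size_kb * 1024
    if chunk_bytes % block_size_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    return chunk_bytes // block_size_bytes


if __name__ == "__main__":
    # chunk-size 64 from /etc/raidtab, -b4096 from the mke2fs command.
    print(ext2_stride(chunk_size_kb=64, block_size_bytes=4096))
```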

Boot partition is on /dev/sdb1  ~ 10M
Primary partition on /dev/sdb2  ~ 1000M
Swap partition is on /dev/sdb3  ~ 128M

The problem occurs while running the iozone benchmark tool (version 3.24), and
it seems to happen sooner since the upgrade to kernel version 2.2.14-12.

Error output follows.

[root@tisk01 bench]# ./iozone -s250m -r4 -f/rdb0/benchtest -i0 -i2 -a
	Iozone: Performance Test of File I/O
	        Version $Revision: 3.24 $
		Compiled for 32 bit mode.

	Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
	             Al Slater, Scott Rhine, Mike Wisner, Ken Goss
	             Steve Landherr, Brad Smith.

	Run began: Mon Jul 10 06:24:11 2000

	File size set to 256000 KB
	Record Size 4 KB
	Auto Mode
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 Kbytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
                                                            random  random    bkwd  record  stride
              KB  reclen   write rewrite    read    reread    read   write    read rewrite    read   fwrite frewrite   fread  freread
          256000       4
Message from syslogd@tisk01 at Mon Jul 10 06:24:31 2000 ...
tisk01 kernel: Kernel panic: VFS: LRU block list corrupted

Comment 1 eballweber 2000-07-10 18:37:32 UTC

I have tried a RAID 0 configuration on the same equipment and get the same
kernel panic. The error seems to occur under high load and/or with large files;
running the benchmark on smaller files (1-20 MB) does not trigger it
immediately.

Comment 2 Doug Ledford 2000-07-14 02:55:56 UTC

On my web page (http://people.redhat.com/dledford) there is a memory test script
that detects faulty RAM far better than any other test we've found to date. Can
you please try it on your machine? Taken together, the two different reports
more or less clear RAID5 (one machine is RAID5 and the other RAID0), and our
experience has been that these problems are almost always hardware related
rather than RAID-software related. The fact that both of you have N440BX-based
machines also points to hardware. You might want to check whether ECC error
correction is enabled in the BIOS on your motherboard (assuming you have ECC
RAM; you might not). If you do have ECC RAM and ECC is enabled in the BIOS, one
of the best ways to find RAM-related problems is to check the BIOS event log for
reported ECC errors.

Comment 3 eballweber 2000-07-31 00:04:46 UTC

I have looked more closely into this problem, and it looks like the SCSI chipset
on the N440BX board is not fully supported by Red Hat 6.2. The SCSI controller
is a "Symbios Logic 53C876 Dual Channel Ultra (one wide, one narrow)" and is
listed as Tier 3 supported. Could this level of support be related to this
problem?

-eballweber

Comment 4 Erik Troan 2000-08-05 13:52:08 UTC

This isn't a problem with RAID but with poorly supported hardware.