System config.
N440BX motherboard, 5 IBM 9.1 GB model DNES-309170 SCSI disk drives connected to on-board Fast-wide SCSI interface, 1 PIII 550 MHz processor, 256 MB RAM.
Red Hat 6.2 installed with kernel version 2.2.14-12 per Update/Errata on Red Hat support page.

Raid 5 configured as follows (from /etc/raidtab)
----------------------------------------
raiddev /dev/md0
    raid-level 5
    nr-raid-disks 4
    chunk-size 64
    parity-algorithm left-symmetric
    nr-spare-disks 1
    device /dev/sdb4
    raid-disk 0
    device /dev/sdc1
    raid-disk 1
    device /dev/sdd1
    raid-disk 2
    device /dev/sde1
    raid-disk 3
    device /dev/sda1
    spare-disk 0
--------------------------------------

Raid5 partition created as follows
    mkraid /dev/md0
    ./mke2fs -b4096 -R stride=16 -N1000000 -m1 /dev/md0
    mount /dev/md0 /rdb0

Boot partition is on /dev/sdb1 ~ 10M
Primary partition on /dev/sdb2 ~ 1000M
Swap partition is on /dev/sdb3 ~ 128M

Problem occurs while running iozone v3_24 benchmark tool and now seems to happen sooner with the upgrade to kernel version 2.2.14-12. Error output follows.

[root@tisk01 bench]# ./iozone -s250m -r4 -f/rdb0/benchtest -i0 -i2 -a
    Iozone: Performance Test of File I/O
            Version $Revision: 3.24 $
            Compiled for 32 bit mode.

    Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                 Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                 Steve Landherr, Brad Smith.

    Run began: Mon Jul 10 06:24:11 2000

    File size set to 256000 KB
    Record Size 4 KB
    Auto Mode
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 Kbytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
                                               random  random    bkwd  record  stride
        KB  reclen   write rewrite    read  reread    read   write    read rewrite    read  fwrite frewrite   fread freread
    256000       4
Message from syslogd@tisk01 at Mon Jul 10 06:24:31 2000 ...
tisk01 kernel: Kernel panic: VFS: LRU block list corrupted
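As a sanity check on the filesystem parameters above: the stride=16 passed to mke2fs is consistent with the raidtab, since the stride is the RAID chunk size divided by the filesystem block size. The small sketch below only redoes that arithmetic; the variable names are illustrative and not part of any tool.

```shell
#!/bin/sh
# Values taken from the report: chunk-size 64 (KB) in /etc/raidtab,
# -b4096 (bytes) on the mke2fs command line.
chunk_kb=64
block_bytes=4096

# stride = number of filesystem blocks per RAID chunk
stride=$(( chunk_kb * 1024 / block_bytes ))
echo "stride=$stride"
```

So the filesystem layout matches the array geometry, which makes a mismatched stride an unlikely cause of the panic.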
I have tried a raid 0 configuration with the same equipment and get the same kernel panic. This error seems to happen under high load and/or large files. Running benchmark tests on smaller files 1-20 MB does not produce this error immediately.
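To narrow down the file size at which the panic starts, the same iozone command can be stepped through increasing sizes. The following is only a rough sketch: the size ladder is a hypothetical choice (only the 1-20 MB range and the 250 MB case come from the report), the other flags are copied from the failing command, and by default the script just prints the commands. Set RUN=1 to actually execute them, on a machine where a kernel panic is acceptable.

```shell
#!/bin/sh
# Hypothetical size ladder in MB; adjust as needed.
SIZES="1 5 10 20 50 100 250"
TARGET=/rdb0/benchtest

# Build the iozone command line for a given file size in MB,
# keeping the remaining flags from the failing run in the report.
iozone_cmd() {
    echo "./iozone -s${1}m -r4 -f${TARGET} -i0 -i2 -a"
}

for size in $SIZES; do
    cmd=$(iozone_cmd "$size")
    if [ "${RUN:-0}" = "1" ]; then
        # Log the command and flush buffers first, so the last size
        # attempted is more likely to survive a panic.
        echo "=== $cmd ==="
        sync
        $cmd || break
    else
        # Dry run: print the command without executing it.
        echo "$cmd"
    fi
done
```

Redirecting the output to a file on a disk outside the array under test makes it easier to see which size was running when the machine went down.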
On my web page (http://people.redhat.com/dledford) there is a memory test script that will detect faulty RAM on a computer far better than any other test we've found to date. Can you please try this on your machine? The description given here, between the two different reports, more or less clears RAID5 (since one machine is RAID5 and the other RAID0), and our experience has been that these problems are almost always hardware related rather than RAID software related. The fact that both of you say you have N440BX based machines also makes it sound hardware related. You might want to check whether ECC error correction is enabled in the BIOS on your motherboard (assuming you have ECC RAM, which you might not). If you do have ECC RAM and ECC is enabled in the BIOS, then one of the best ways to see if you have RAM related problems is to check the event log in the BIOS and see if it reports any ECC errors.
I have looked more closely into this problem and it looks like the SCSI chipset on the N440BX board is not fully supported by Red Hat 6.2. The SCSI controller is a "Symbios Logic 53C876 Dual Channel Ultra (one wide, one narrow)" and is listed as Tier 3 supported. Could this level of support be related to this problem? -eballweber
This isn't a problem with raid, but with poorly supported hardware.