Bug 100669

Summary: Got ext3 errors after building up a fresh large filesystem
Product: [Retired] Red Hat Linux
Reporter: Klaus Steinberger <klaus.steinberger>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: high
Version: 9
CC: riel, sct
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2003-08-04 12:37:47 UTC
Attachments:
Output from e2fsck after that, the filesystem is severely corrupted (flags: none)

Description Klaus Steinberger 2003-07-24 06:10:15 UTC
Description of problem:

We built a server with a large RAID array and tried to transfer large
filesystems (~100 GByte) to it. We tried rsync, NFS, and restoring
from a Tivoli backup server. After populating the filesystem, we
reproducibly got the following ext3 errors starting at 4:06:


Jul 24 04:06:07 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #2768897: rec_len %% 4 != 0 - offset=900, inode=17956892,
rec_len=30583, name_len=49
Jul 24 04:07:00 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #19316737: rec_len %% 4 != 0 - offset=2448,
inode=17760280, rec_len=17489, name_len=114
Jul 24 04:08:10 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #7110658: rec_len %% 4 != 0 - offset=820,
inode=1835166060, rec_len=26478, name_len=95
Jul 24 04:08:11 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #6815746: rec_len %% 4 != 0 - offset=88,
inode=1886545774, rec_len=26670, name_len=0
Jul 24 04:08:11 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #6864898: directory entry across blocks - offset=2240,
inode=1668572005, rec_len=25964, name_len=97
Jul 24 04:09:20 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #9568259: rec_len is too small for name_len - offset=24,
inode=9568309, rec_len=36, name_len=59
Jul 24 04:10:08 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #6406148: rec_len %% 4 != 0 - offset=264,
inode=1248159828, rec_len=29797, name_len=95
Jul 24 04:10:11 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #16154628: rec_len is too small for name_len -
offset=2760, inode=16154711, rec_len=36, name_len=59
Jul 24 04:10:50 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #18825222: rec_len is too small for name_len - offset=64,
inode=18825232, rec_len=32, name_len=55
Jul 24 04:10:51 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #18939910: rec_len %% 4 != 0 - offset=112,
inode=1819244133, rec_len=29813, name_len=105
Jul 24 04:10:55 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
directory #2342919 contains a hole at offset 4096
Jul 24 04:11:16 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #12304392: rec_len is too small for name_len -
offset=368, inode=12304434, rec_len=36, name_len=58
Jul 24 04:11:26 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #2195465: rec_len %% 4 != 0 - offset=192, inode=29811,
rec_len=32783, name_len=33
Jul 24 04:11:30 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #11468809: rec_len %% 4 != 0 - offset=1304,
inode=1634102127, rec_len=29811, name_len=95
Jul 24 04:11:39 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir:
bad entry in directory #24969225: inode out of bounds - offset=1532,
inode=1836345390, rec_len=108, name_len=0
Jul 24 04:12:08 etprd01 kernel: EXT3-fs error (device lvm(58,1)): ext3_readdir:
bad entry in directory #2228225: rec_len is too small for name_len - offset=532,
inode=2228243, rec_len=40, name_len=63
Jul 24 04:12:09 etprd01 kernel: EXT3-fs error (device lvm(58,1)): ext3_readdir:
bad entry in directory #3620865: rec_len is too small for name_len - offset=120,
inode=3620870, rec_len=36, name_len=59
Jul 24 04:12:14 etprd01 kernel: EXT3-fs error (device lvm(58,1)): ext3_readdir:
bad entry in directory #7110657: rec_len is too small for name_len - offset=152,
inode=7110661, rec_len=32, name_len=53
Jul 24 04:12:22 etprd01 kernel: EXT3-fs error (device lvm(58,1)): ext3_readdir:
bad entry in directory #9977857: rec_len %% 4 != 0 - offset=2268,
inode=1918858100, rec_len=28277, name_len=95
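
For readers decoding the messages above: each one names the first sanity check that a directory entry failed. The Python sketch below mirrors the order of those checks as I understand them from the 2.4-era ext3 directory code. The function names, the 4096-byte block size, and the inode count are assumptions for illustration, not values taken from this system.

```python
def dir_rec_len(name_len: int) -> int:
    """Minimum record length: 8-byte header plus the name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset, inode, rec_len, name_len,
                    block_size=4096, inodes_count=26_214_400):
    """Return the first failed check as text, or None if the entry looks sane.

    block_size and inodes_count are assumed defaults, not values from the report.
    """
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    if inode > inodes_count:
        return "inode out of bounds"
    return None


def inode_as_bytes(inode: int) -> bytes:
    """Show the 32-bit inode field as the little-endian bytes stored on disk."""
    return inode.to_bytes(4, "little")


# Values copied from four of the log entries above.
samples = [
    dict(offset=900, inode=17956892, rec_len=30583, name_len=49),
    dict(offset=24, inode=9568309, rec_len=36, name_len=59),
    dict(offset=2240, inode=1668572005, rec_len=25964, name_len=97),
    dict(offset=1532, inode=1836345390, rec_len=108, name_len=0),
]
for s in samples:
    print(s, "->", check_dir_entry(**s), inode_as_bytes(s["inode"]))
```

The last sample's inode field decodes to the bytes `.htm`, which would be consistent with ordinary file data turning up where directory blocks were expected, though that is only an inference from the numbers.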

Version-Release number of selected component (if applicable):
2.4.20-18.9


How reproducible:
Every time we build this filesystem from scratch, including recreating the LVM
volume group.


Steps to Reproduce:
1. Create the volume group:
   pvcreate /dev/sdc1
   vgcreate -s 16M vg01 /dev/sdc1
2. Create the Logical Volumes:
   lvcreate -n etp -L 200G vg01
   lvcreate -n etp1 -L 200G vg01
3. Create the filesystems:
   mke2fs -j -L etp -R stride=16 /dev/vg01/etp
   tune2fs -c 0 -i 0 /dev/vg01/etp
   mke2fs -j -L etp1 -R stride=16 /dev/vg01/etp1
   tune2fs -c 0 -i 0 /dev/vg01/etp1
4. Mount them:
   mount /dev/vg01/etp /export/data/etp
   mount /dev/vg01/etp1 /export/data/etp1
5. Fill them with data (around 100 GBytes per FS)
   We tried rsync -e rsh from another server
   We also tried rsync onto an NFS-mounted filesystem from another server
   We also tried Tivoli's dsmc command to restore a filesystem
6. Wait until 4:06, when slocate or something else runs, and you will
   see the errors in the log.
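
A note on the `-R stride=16` option in step 3: stride is the RAID chunk size expressed in filesystem blocks. The report does not state the chunk size or the block size, so the 64 KiB chunk and 4 KiB block used below are assumptions that happen to yield 16; `ext2_stride` is a hypothetical helper name.

```python
def ext2_stride(chunk_size_kib: int, block_size_bytes: int) -> int:
    """Number of filesystem blocks per RAID chunk (the mke2fs stride value)."""
    chunk_bytes = chunk_size_kib * 1024
    if chunk_bytes % block_size_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    return chunk_bytes // block_size_bytes


# Assumed values: 64 KiB RAID chunk, 4 KiB ext3 blocks.
print(ext2_stride(64, 4096))
```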
    
Actual results:

Data Corruption!


Expected results:

The system should never corrupt data!


Additional info:

Dual PIII 1.4 GHz system with a Tyan 2518 motherboard
3ware 7500-8 RAID controller with 6 disks of 160 GByte each
(5 disks in RAID 5, one disk as hot spare).

We see no errors from the RAID controller, no disk read errors, no SCSI errors,
just the ext3 errors. Some searching on the net led me to suspect that this is a
problem in the 2.4.20 kernel (maybe already in 2.4.18) and/or in the backports
from 2.5 that are included in the 2.4.20-18 kernel.

Comment 1 Klaus Steinberger 2003-07-24 06:13:23 UTC
Created attachment 93096 [details]
Output from e2fsck after that, the filesystem is severely corrupted

Comment 2 Stephen Tweedie 2003-08-04 11:29:44 UTC
Can you reproduce this without using LVM?

Comment 3 Klaus Steinberger 2003-08-04 12:37:47 UTC
Yes, the error happens also without LVM.

We are currently investigating a possible hardware problem with the 3ware
controller. We have already replaced everything in this computer except the
3ware. We also tried connecting two of the 160 GB IDE disks directly to the IDE
ports on the motherboard. We then created a volume group spanning these two
disks and created a 200 GB logical volume with an ext3 filesystem. This works
without an error.

We have now exchanged the 3ware 7500 for an older 6000 and are trying again,
so please hold off on further action until we report back.

Sincerely,
Klaus Steinberger
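
Because the controller reported no errors, a write-then-verify pass is one way to look for silent corruption below the filesystem. The sketch below is only an illustration: the function name, sizes, and path are hypothetical, and unless the test file is much larger than RAM the read-back may be served from the page cache rather than from the disks.

```python
import hashlib
import os
import tempfile


def write_and_verify(path: str, blocks: int = 256, block_size: int = 1 << 20) -> bool:
    """Write random blocks to path, read them back, and compare SHA-256 digests."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(blocks):
            data = os.urandom(block_size)
            written.update(data)
            f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data to the device before reading back

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            read_back.update(chunk)
    return written.digest() == read_back.digest()


if __name__ == "__main__":
    # Hypothetical usage: point the path at the filesystem under test instead.
    with tempfile.TemporaryDirectory() as d:
        print(write_and_verify(os.path.join(d, "verify.bin"), blocks=4))
```

A mismatch would point at the storage path rather than at ext3, but a clean pass on a small file proves little for the reason given above.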

Comment 4 Klaus Steinberger 2003-08-05 05:53:49 UTC
We have now replaced the 3ware 7500-8 controller with an old 3ware 6000
controller until we get a replacement for the 7500, and the problem has
disappeared. So I think it really was a faulty controller. Please excuse me for
reporting a bug, but it looked to me like a software problem, as we got no error
messages from the controller.

Sincerely,
Klaus Steinberger

Comment 5 Stephen Tweedie 2003-08-05 07:59:42 UTC
OK, thanks for following up on this.