Description of problem:
We built a server with a large RAID array and tried to transfer large filesystems (~100 GB) onto it. We tried rsync, NFS, and restoring from a Tivoli backup server. After populating the filesystem, we reproducibly got the following ext3 errors starting at 04:06:

Jul 24 04:06:07 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #2768897: rec_len % 4 != 0 - offset=900, inode=17956892, rec_len=30583, name_len=49
Jul 24 04:07:00 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #19316737: rec_len % 4 != 0 - offset=2448, inode=17760280, rec_len=17489, name_len=114
Jul 24 04:08:10 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #7110658: rec_len % 4 != 0 - offset=820, inode=1835166060, rec_len=26478, name_len=95
Jul 24 04:08:11 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #6815746: rec_len % 4 != 0 - offset=88, inode=1886545774, rec_len=26670, name_len=0
Jul 24 04:08:11 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #6864898: directory entry across blocks - offset=2240, inode=1668572005, rec_len=25964, name_len=97
Jul 24 04:09:20 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #9568259: rec_len is too small for name_len - offset=24, inode=9568309, rec_len=36, name_len=59
Jul 24 04:10:08 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #6406148: rec_len % 4 != 0 - offset=264, inode=1248159828, rec_len=29797, name_len=95
Jul 24 04:10:11 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #16154628: rec_len is too small for name_len - offset=2760, inode=16154711, rec_len=36, name_len=59
Jul 24 04:10:50 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #18825222: rec_len is too small for name_len - offset=64, inode=18825232, rec_len=32, name_len=55
Jul 24 04:10:51 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #18939910: rec_len % 4 != 0 - offset=112, inode=1819244133, rec_len=29813, name_len=105
Jul 24 04:10:55 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: directory #2342919 contains a hole at offset 4096
Jul 24 04:11:16 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #12304392: rec_len is too small for name_len - offset=368, inode=12304434, rec_len=36, name_len=58
Jul 24 04:11:26 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #2195465: rec_len % 4 != 0 - offset=192, inode=29811, rec_len=32783, name_len=33
Jul 24 04:11:30 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #11468809: rec_len % 4 != 0 - offset=1304, inode=1634102127, rec_len=29811, name_len=95
Jul 24 04:11:39 etprd01 kernel: EXT3-fs error (device lvm(58,0)): ext3_readdir: bad entry in directory #24969225: inode out of bounds - offset=1532, inode=1836345390, rec_len=108, name_len=0
Jul 24 04:12:08 etprd01 kernel: EXT3-fs error (device lvm(58,1)): ext3_readdir: bad entry in directory #2228225: rec_len is too small for name_len - offset=532, inode=2228243, rec_len=40, name_len=63
Jul 24 04:12:09 etprd01 kernel: EXT3-fs error (device lvm(58,1)): ext3_readdir: bad entry in directory #3620865: rec_len is too small for name_len - offset=120, inode=3620870, rec_len=36, name_len=59
Jul 24 04:12:14 etprd01 kernel: EXT3-fs error (device lvm(58,1)): ext3_readdir: bad entry in directory #7110657: rec_len is too small for name_len - offset=152, inode=7110661, rec_len=32, name_len=53
Jul 24 04:12:22 etprd01 kernel: EXT3-fs error (device lvm(58,1)): ext3_readdir: bad entry in directory #9977857: rec_len % 4 != 0 - offset=2268, inode=1918858100, rec_len=28277, name_len=95

Version-Release number of selected component (if applicable):
2.4.20-18.9

How reproducible:
Every time we build up this filesystem from scratch, including recreating the LVM volume group.

Steps to Reproduce:
1. Create the volume group:
   pvcreate /dev/sdc1
   vgcreate -s 16M vg01 /dev/sdc1
2. Create the logical volumes:
   lvcreate -n etp -L 200G vg01
   lvcreate -n etp1 -L 200G vg01
3. Create the filesystems:
   mke2fs -j -L etp -R stride=16 /dev/vg01/etp
   tune2fs -c 0 -i 0 /dev/vg01/etp
   mke2fs -j -L etp1 -R stride=16 /dev/vg01/etp1
   tune2fs -c 0 -i 0 /dev/vg01/etp1
4. Mount them:
   mount /dev/vg01/etp /export/data/etp
   mount /dev/vg01/etp1 /export/data/etp1
5. Fill them with data (around 100 GB per filesystem). We tried rsync -e rsh from another server, rsync onto an NFS-mounted filesystem from another server, and Tivoli's dsmc command to restore a filesystem.
6. Wait until 04:06, when slocate (or something else that scans the filesystem) runs, and the errors appear in the log.

Actual results:
Data corruption!

Expected results:
The system should never corrupt data!

Additional info:
Dual PIII 1.4 GHz system with a Tyan 2518 motherboard and a 3ware 7500-8 RAID controller with 6 disks of 160 GB each (5 disks in RAID 5, one disk as hot spare). We see no errors from the RAID controller, no disk read errors, no SCSI errors, just the ext3 errors. Some googling led me to suspect a problem in the 2.4.20 kernel (possibly already present in 2.4.18) and/or in the 2.5 backports included in the 2.4.20-18 kernel.
Created attachment 93096 [details]
Output from e2fsck; after running it, the filesystem is severely corrupted.
Can you reproduce this without using LVM?
Yes, the error also happens without LVM. We are currently investigating a possible hardware problem with the 3ware controller; we have already replaced everything in this computer except the 3ware. We also tried connecting two of the 160 GB IDE disks directly to the IDE ports on the motherboard, created a volume group spanning these two disks, and created a 200 GB logical volume with an ext3 filesystem on it. That setup works without any errors. We have now swapped the 3ware 7500 for an older 6000 and are trying again, so please hold off on further action until we report back. Sincerely, Klaus Steinberger
We have now replaced the 3ware 7500-8 controller with an old 3ware 6000 controller until we receive a replacement for the 7500, and the problem has disappeared. So I think it really was a faulty controller. Please excuse my filing this as a bug; it looked like a software problem to me, since we got no error messages from the controller. Sincerely, Klaus Steinberger
OK, thanks for following up on this.