Bug 439749 - Directory corruption (duplicate entries) with dir_index
Status: CLOSED DUPLICATE of bug 465626
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: high
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Eric Sandeen
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 483701
Reported: 2008-03-31 02:04 EDT by Simon Matter
Modified: 2009-02-16 14:11 EST
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-02-16 13:17:21 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---


Attachments: None
Description Simon Matter 2008-03-31 02:04:07 EDT
Description of problem:

We are using rsnapshot in our backup environment. The target filesystem is 1T in
size and holds millions of files, directories and hardlinks. Everything worked
fine until a few days ago, when things started to fail. Eventually I found that
several directories contained duplicate entries with the same name, like
this:
[root@delta ~]# stat /home/snapshots/daily.0/ns1.invoca.ch/_var_www/var/www/html/images/prod/img44798.jpg*
  File: `/home/snapshots/daily.0/ns1.invoca.ch/_var_www/var/www/html/images/prod/img44798.jpg'
  Size: 1145            Blocks: 8          IO Block: 4096   regular file
Device: fd08h/64776d    Inode: 6801915     Links: 12
Access: (0664/-rw-rw-r--)  Uid: (  210/ UNKNOWN)   Gid: (  603/ UNKNOWN)
Access: 2008-03-30 06:14:44.000000000 +0200
Modify: 2006-05-03 22:27:10.000000000 +0200
Change: 2008-03-30 05:31:06.000000000 +0200
  File: `/home/snapshots/daily.0/ns1.invoca.ch/_var_www/var/www/html/images/prod/img44798.jpg'
  Size: 1145            Blocks: 8          IO Block: 4096   regular file
Device: fd08h/64776d    Inode: 6801915     Links: 12
Access: (0664/-rw-rw-r--)  Uid: (  210/ UNKNOWN)   Gid: (  603/ UNKNOWN)
Access: 2008-03-30 06:14:44.000000000 +0200
Modify: 2006-05-03 22:27:10.000000000 +0200
Change: 2008-03-30 05:31:06.000000000 +0200
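The symptom above (one name listed twice in the same directory, both entries pointing at the same inode) can be searched for across a whole snapshot tree. A minimal sketch, assuming the duplicates are visible through ordinary readdir, which `os.scandir` uses; `duplicate_names` and `scan_tree` are hypothetical helpers, not part of rsnapshot or e2fsprogs:

```python
import os
import sys
from collections import Counter


def duplicate_names(names):
    """Return, sorted, every name that appears more than once in a listing."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def scan_tree(root):
    """Yield (directory, duplicated names) for each directory that has any.

    Each directory is listed with os.scandir, i.e. through the same
    readdir path on which the duplicate entries were observed.
    """
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                entries = list(it)
        except OSError as err:
            print(f"skipping {path}: {err}", file=sys.stderr)
            continue
        dups = duplicate_names(entry.name for entry in entries)
        if dups:
            yield path, dups
        # Queue real subdirectories (no symlinks) once each, even if a
        # subdirectory name was itself reported twice.
        subdirs = {e.path for e in entries if e.is_dir(follow_symlinks=False)}
        stack.extend(sorted(subdirs))


if __name__ == "__main__" and len(sys.argv) > 1:
    for directory, names in scan_tree(sys.argv[1]):
        print(f"{directory}: duplicate entries {names}")
```

Run as, for example, `python3 find_dup_entries.py /home/snapshots/daily.0` (the script name is arbitrary). Since e2fsck did not flag these directories, a clean scan only says that readdir returned no duplicates at that moment.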

e2fsck didn't find anything wrong, and the only way to get things working again
was to remove the affected directories. However, two days later the same problem
appeared again.
What finally appears to have solved the problem was removing the dir_index
feature and running e2fsck -fD on the filesystem (I'll report back if the
problem reappears).
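Before and after changing dir_index (the feature is normally cleared with `tune2fs -O ^dir_index` on the unmounted filesystem, then `e2fsck -fD`), it can help to confirm what the superblock records. `dumpe2fs -h` prints a "Filesystem features:" line and a "Default directory hash:" line; the sketch below parses that text. It assumes that "Key: value" layout, which may vary between e2fsprogs versions, and the helper names are hypothetical:

```python
import subprocess


def parse_superblock(text):
    """Turn 'Key:   value' lines from `dumpe2fs -h` into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


def dir_index_status(text):
    """Return (is dir_index enabled, default directory hash or None)."""
    info = parse_superblock(text)
    features = info.get("Filesystem features", "").split()
    return "dir_index" in features, info.get("Default directory hash")


def read_superblock(device):
    """Run `dumpe2fs -h DEVICE` (usually needs root) and return its output."""
    result = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage, as root: `enabled, hash_name = dir_index_status(read_superblock("/dev/sdXXX"))`, where the device name is a placeholder.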

I know that's not a very useful bug report, but I still wanted to post it here so
people can share their experiences.
Comment 1 Eric Sandeen 2008-05-10 01:01:23 EDT
Which kernel was this on?  There is one fix in 5.1 related to dir_index
corruption:

* Mon Sep 17 2007 Don Zickus <dzickus@redhat.com> [2.6.18-48.el5]
- [fs] ext3: ensure do_split leaves enough free space in both blocks (Eric
Sandeen ) [286501]

although I believe it's unlikely that this problem would lead to duplicate entries.

If this were to happen again, an e2image of the filesystem might be very helpful.

Thanks,
-Eric
Comment 2 Simon Matter 2008-05-10 08:32:34 EDT
I don't think it has anything to do with #286501, because the kernel versions I
was running when the issue showed up were newer (kernel-xen-2.6.18-53.1.13.el5
and kernel-xen-2.6.18-53.1.14.el5). We are running it as Dom0, and it also hosts
some PVM DomUs, but I don't think that matters; the filesystem in question has
nothing to do with the DomUs. What I can say for sure is that disabling
dir_index on the filesystem fixed the problem immediately, and no problems have
shown up since then. Unfortunately I don't know how to reproduce the problem.
Comment 3 Eric Sandeen 2008-08-23 00:54:58 EDT
I apologize for the lack of activity on this bug...

There is a recent upstream fix related to duplicate file entries in dir_index directories; this may well be the problem.

"duplicate entries on ext3 when using readdir/readdir64" is the upstream thread on the linux-ext4 list.  It also contains a workaround:

----

Anyway, the workaround is as follows:

debugfs -w /dev/sdXXX
debugfs: set_super_value def_hash_version half_md4
debugfs: quit

Then completely delete any directories where you were having problems,
and recreate them.  (You can do the "mkdir foo.new; mv foo/* foo.new;
rmdir foo; mv foo.new foo" trick if you want to preserve the files in
that directory.)

----
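The "mkdir foo.new; mv foo/* foo.new; rmdir foo; mv foo.new foo" trick quoted above can also be scripted. One caveat of the shell version: `foo/*` does not match dot-files, so `rmdir foo` fails if hidden entries remain. A minimal sketch, assuming nothing writes to the directory while it is rebuilt, `foo.new` does not already exist, and the debugfs step has already been done as in the quoted order; `recreate_directory` is a hypothetical helper:

```python
import os
import stat


def recreate_directory(path):
    """Rebuild PATH as a fresh directory holding the same entries.

    Mirrors: mkdir foo.new; mv foo/* foo.new; rmdir foo; mv foo.new foo
    but also moves dot-files. Entries are renamed, not copied, so inode
    numbers and hard links are unchanged.
    """
    path = os.path.abspath(path)  # also drops any trailing slash
    tmp = path + ".new"
    st = os.stat(path)
    os.mkdir(tmp)  # raises FileExistsError if foo.new already exists
    # Keep the original ownership and permission bits (chown first,
    # since a chown can clear set-id bits).
    os.chown(tmp, st.st_uid, st.st_gid)
    os.chmod(tmp, stat.S_IMODE(st.st_mode))
    # set() collapses a name that readdir reported twice into one move.
    for name in sorted(set(os.listdir(path))):
        os.rename(os.path.join(path, name), os.path.join(tmp, name))
    os.rmdir(path)  # fails if anything is still left in the old directory
    os.rename(tmp, path)
```

Renaming rather than copying matters for rsnapshot targets, which rely on hardlinks; a copy would turn each link into a separate file. Timestamps of the directory itself are not preserved, just as with the shell version.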

The fix is not yet in Linus' tree but probably will be soon.  

If you run into this problem again, I can provide an updated ext3 module to test.  I would guess that running without dir_index may be hurting performance for a workload with that many files.

Thanks,
-Eric
Comment 5 Eric Sandeen 2008-12-02 15:42:19 EST
Duplicate directory entries are somewhat akin to corruption, IMHO; I think this should get fixed in RHEL, and there are now upstream fixes which we can backport.

Thanks,
-Eric
Comment 6 RHEL Product and Program Management 2009-01-27 15:42:20 EST
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 RHEL Product and Program Management 2009-02-16 10:25:13 EST
Updating PM score.
Comment 8 Eric Sandeen 2009-02-16 13:17:21 EST
This is a dup of another bug; I will mark it as a dup and transfer the PM score if I can.

*** This bug has been marked as a duplicate of bug 465626 ***
Comment 9 Simon Matter 2009-02-16 13:40:28 EST
Unfortunately I get an access denied on 465626 and this bug is closed...
Comment 10 Eric Sandeen 2009-02-16 14:11:44 EST
I put you on cc:, sorry about that.
