Bug 562044

Summary: e2fsck does not correct all errors in filesystem
Product: Red Hat Enterprise Linux 5
Component: e2fsprogs
Version: 5.3
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Target Milestone: rc
Reporter: Lachlan McIlroy <lmcilroy>
Assignee: Eric Sandeen <esandeen>
QA Contact: BaseOS QE - Apps <qe-baseos-apps>
CC: bnater, sct, vgaikwad
Fixed In Version: e2fsprogs-1.39-31.el5
Doc Type: Bug Fix
Last Closed: 2011-07-21 09:07:36 UTC
Bug Blocks: 676465, 683149
Attachments:
  Output from e2fsck -y from first pass at repairing filesystem image (flags: none)
  Small testcase filesystem (flags: none)

Description Lachlan McIlroy 2010-02-05 05:20:56 UTC
Description of problem:

I received an e2image from a customer for an ext3 filesystem that was reporting errors.  The image is available from dropbox.redhat.com as appl_lv.e2i_2.bz2.

Running e2fsck -y appl_lv.e2i_2 produced a lot of output and appeared to fix all the errors.  Running the same command again returned immediately and reported a clean filesystem.

But forcing a full check with e2fsck -y -f appl_lv.e2i_2 found the following errors, which the first run had left unfixed:

e2fsck 1.39 (29-May-2006)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -36020
Fix? yes

Free blocks count wrong for group #1 (3203, counted=3204).
Fix? yes

Free blocks count wrong (52812212, counted=52812213).
Fix? yes


appl_lv.e2i_2: ***** FILE SYSTEM WAS MODIFIED *****
appl_lv.e2i_2: 130296/69697536 files (2.1% non-contiguous), 86579787/139392000 blocks

Running e2fsck -y -f again showed no more errors.

Version-Release number of selected component (if applicable):
I've been able to reproduce this with e2fsprogs-1.39-20.el5 and e2fsprogs-1.41.4-12.fc11.

How reproducible:
Every time using the same e2image file.
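
The reproduction steps above can be scripted. The following is only a sketch: the function names are hypothetical, and it assumes e2fsck is on the PATH and that the image has already been unpacked. It relies on the exit status bits documented in the e2fsck(8) man page (0 = no errors, 1 = errors corrected, 4 = errors left uncorrected) and counts how many forced passes are needed before the filesystem is reported clean.

```python
import subprocess

# Exit status bits as documented in the e2fsck(8) man page.
FSCK_ERRORS_CORRECTED = 1
FSCK_ERRORS_UNCORRECTED = 4


def run_e2fsck(image):
    """Run a forced check that answers 'yes' to every question; return the exit status."""
    return subprocess.run(["e2fsck", "-f", "-y", image]).returncode


def passes_until_clean(image, run=run_e2fsck, max_passes=5):
    """Return how many forced passes ran before e2fsck exited with status 0.

    `run` is injectable (hypothetical hook) so the loop can be exercised
    without a real filesystem image.
    """
    for n in range(1, max_passes + 1):
        status = run(image)
        if status == 0:
            return n
        # Anything other than "errors corrected" is treated as fatal here.
        if status & ~FSCK_ERRORS_CORRECTED:
            raise RuntimeError(f"e2fsck pass {n} exited with status {status}")
    raise RuntimeError(f"still not clean after {max_passes} passes")
```

With the behaviour described in this report, a damaged image would be expected to need three forced passes (two that fix something, then one clean pass) where two should be enough.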

Comment 1 Lachlan McIlroy 2010-02-05 05:23:14 UTC
Created attachment 388974 [details]
Output from e2fsck -y from first pass at repairing filesystem image

Comment 2 Lachlan McIlroy 2010-02-05 05:25:37 UTC
Note that the appl_lv.e2i_2.bz2 e2image needs at least 532GB of disk space to unpack.

Comment 3 Eric Sandeen 2010-02-05 18:44:24 UTC
There are tools that turn runs of zeros into sparse regions; that can help.

-Eric

Comment 6 Lachlan McIlroy 2010-02-08 00:29:39 UTC
Eric,

What are these tools you mention?

Unpacking a raw e2image needs to put blocks back into their original locations - is there a way I can unpack this image to a sparse file backed storage?

Lachlan

Comment 7 Eric Sandeen 2010-02-08 01:00:03 UTC
Ok, thanks Lachlan.

make-sparse.c is available in a few places, for example:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=236383#24

Testing newer e2fsprogs might be worthwhile.

-Eric

Comment 8 Eric Sandeen 2010-02-08 23:11:22 UTC
Out of curiosity, what happened to this fs?  Looks like a mess ...

Comment 9 Lachlan McIlroy 2010-02-08 23:40:22 UTC
I'm not sure how this filesystem got into this state.  It's from a customer who reported that they had filesystems that could not be repaired: fsck was aborting with HTREE errors.  They gave us two e2images.  The first had HTREE errors, but we had no problem repairing it.  This is the second one, which had no HTREE errors, but I did find this problem with it.

Comment 10 Eric Sandeen 2010-02-08 23:45:27 UTC
With the provided image I do see errors left unrepaired by the first pass; the 2nd pass finds just a few things.

I'll try it out again w/ newer e2fsprogs to see if this is something that's been fixed already.

If they are hitting errors that we don't see, which arch were they on?

-Eric

Comment 11 Lachlan McIlroy 2010-02-09 03:03:02 UTC
They are using x86_64 systems.  The filesystem is on LVM if that helps at all.

Comment 12 Eric Sandeen 2010-02-09 05:39:07 UTC
e2fsck 1.41.9 behaves the same way, so I'll have to go looking for why it's not cleaning this up on the first pass.

(The problems on the 2nd pass are pretty minimal: one block group is off by one block, which is marked as used but is not claimed by any file.)

-Eric
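
To make the pass 5 messages concrete, here is a rough sketch of the comparison they describe. It is not e2fsck's actual code: the function names are hypothetical, the bitmaps are modelled as plain Python sets, and the 32768 blocks per group is an assumption (the usual value for 4 KiB blocks). The toy data is built so that it matches the numbers in the output above: block 36020 marked used on disk but claimed by nothing, and group #1 going from 3203 to 3204 free blocks.

```python
def bitmap_differences(on_disk_used, computed_used):
    """List differences in the style of e2fsck's 'Block bitmap differences'.

    '-N': block N is marked used on disk but no file claims it.
    '+N': block N is in use but marked free on disk.
    """
    diffs = []
    for block in sorted(on_disk_used ^ computed_used):
        sign = "-" if block in on_disk_used else "+"
        diffs.append(f"{sign}{block}")
    return diffs


def free_blocks_in_group(group, used, blocks_per_group=32768):
    """Count blocks of one group that are absent from `used`.

    Assumes the first data block is 0, as on a 4 KiB-block filesystem.
    """
    start = group * blocks_per_group
    return sum(1 for b in range(start, start + blocks_per_group) if b not in used)


# Toy data for group #1 (blocks 32768..65535), not taken from the real image.
computed = set(range(32768, 32768 + 29565)) - {36020}  # what passes 1-4 found in use
on_disk = computed | {36020}                           # stale bit left in the bitmap

print(bitmap_differences(on_disk, computed))   # ['-36020']
print(free_blocks_in_group(1, on_disk))        # 3203
print(free_blocks_in_group(1, computed))       # 3204
```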

Comment 13 Stephen Tweedie 2010-02-09 14:14:39 UTC
(In reply to comment #3)
> There are tools to turn 0s into sparseness, that can help.

To uncompress to a sparse file, "cp" can help:

$ bzcat file.bz2 | cp --sparse=always /dev/stdin file.output
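
For anyone without a cp that supports --sparse, the same idea can be sketched in a few lines of Python. This is an illustration only, not make-sparse.c: the function name and the 4096-byte block size are assumptions. It seeks over chunks that are entirely zero instead of writing them, and whether the resulting holes actually save disk space depends on the target filesystem.

```python
import os


def copy_sparse(src, dst_path, block_size=4096):
    """Copy a binary stream to dst_path, skipping over all-zero chunks.

    Returns the number of bytes in the logical output file.
    """
    zeros = bytes(block_size)
    total = 0
    with open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            if chunk == zeros[:len(chunk)]:
                # Leave a hole instead of writing zeros.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
            total += len(chunk)
        # Fix the file length in case the stream ended with a hole.
        dst.truncate(total)
    return total
```

A possible use with the compressed image from this report, via the standard bz2 module:

```python
import bz2

with bz2.open("appl_lv.e2i_2.bz2", "rb") as src:
    copy_sparse(src, "appl_lv.e2i_2")
```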

Comment 14 Eric Sandeen 2011-02-09 18:55:56 UTC
Sent a patch, and an explanation of what's going wrong, upstream:
http://marc.info/?l=linux-ext4&m=129727690603767&w=2

We'll see what the maintainer says about it.  The fix is simple enough but it feels a little like a bandaid.  Should suffice though, and fixes this testcase.

Comment 15 Eric Sandeen 2011-02-09 19:06:37 UTC
We've got a reproducer and a candidate patch for this, so I'll go ahead & devel_ack+ it, and clone it to RHEL6 (since the bug persists upstream).

Comment 16 Eric Sandeen 2011-02-10 16:22:34 UTC
Created attachment 478076 [details]
Small testcase filesystem

This fs image should replicate the problem; 2nd fsck finds block bitmap problems as with the larger customer image.

Comment 17 Eric Sandeen 2011-02-10 17:14:51 UTC
Tagged & built in e2fsprogs-1.39-31.el5

Comment 22 errata-xmlrpc 2011-07-21 09:07:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1080.html

Comment 23 errata-xmlrpc 2011-07-21 12:38:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1080.html