Bug 621313

Summary: GFS2: fsck.gfs2 seems to process large files twice
Product: Red Hat Enterprise Linux 6 Reporter: Nate Straz <nstraz>
Component: clusterAssignee: Robert Peterson <rpeterso>
Status: CLOSED ERRATA QA Contact: Cluster QE <mspqa-list>
Severity: medium Docs Contact:
Priority: low    
Version: 6.0CC: adas, bmarzins, cluster-maint, fdinitto, lhh, rpeterso, rwheeler, ssaha, swhiteho
Target Milestone: rcKeywords: RHELNAK
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: cluster-3.0.12-24.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 12:53:27 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
First Idea Patch none

Description Nate Straz 2010-08-04 18:34:11 UTC
Description of problem:

While running fsck.gfs2 on a file system with large files, it appears that fsck.gfs2 processes the files twice.  In the following output the same file, looking at the "Large file at X" message is processed twice, the "X percent complete" differs between the two outputs.

example:
[root@dash-01 ~]# /usr/bin/time fsck -t gfs2 -y /dev/rescue/promise
fsck from util-linux-ng 2.17.2
Initializing fsck
Validating Resource Group index.
Level 1 RG check.
(level 1 passed)
Starting pass1
Checking 1G of 34G of file at 170286669 (0xa265e4d)- 3 percent complete.
Checking 5G of 34G of file at 170286669 (0xa265e4d)- 17 percent complete.
Checking 10G of 34G of file at 170286669 (0xa265e4d)- 30 percent complete.
Checking 15G of 34G of file at 170286669 (0xa265e4d)- 44 percent complete.
Checking 19G of 34G of file at 170286669 (0xa265e4d)- 57 percent complete.
Checking 24G of 34G of file at 170286669 (0xa265e4d)- 71 percent complete.
Checking 29G of 34G of file at 170286669 (0xa265e4d)- 84 percent complete.
Checking 33G of 34G of file at 170286669 (0xa265e4d)- 98 percent complete.
Large file at 170286669 (0xa265e4d) - 100 percent complete.
Checking 2G of 34G of file at 170286669 (0xa265e4d)- 7 percent complete.
Checking 6G of 34G of file at 170286669 (0xa265e4d)- 19 percent complete.
Checking 11G of 34G of file at 170286669 (0xa265e4d)- 32 percent complete.
Checking 15G of 34G of file at 170286669 (0xa265e4d)- 45 percent complete.
Checking 19G of 34G of file at 170286669 (0xa265e4d)- 57 percent complete.
Checking 24G of 34G of file at 170286669 (0xa265e4d)- 70 percent complete.
Checking 28G of 34G of file at 170286669 (0xa265e4d)- 83 percent complete.
Checking 32G of 34G of file at 170286669 (0xa265e4d)- 95 percent complete.
Large file at 170286669 (0xa265e4d) - 100 percent complete.
Checking 1M of 183G of file at 181802531 (0xad61623)- 0 percent complete.
Checking 1G of 183G of file at 181802531 (0xad61623)- 1 percent complete.
Checking 6G of 183G of file at 181802531 (0xad61623)- 3 percent complete.
Checking 11G of 183G of file at 181802531 (0xad61623)- 6 percent complete.
Checking 16G of 183G of file at 181802531 (0xad61623)- 9 percent complete.
Checking 21G of 183G of file at 181802531 (0xad61623)- 11 percent complete.
Checking 26G of 183G of file at 181802531 (0xad61623)- 14 percent complete.
Checking 31G of 183G of file at 181802531 (0xad61623)- 17 percent complete.
Checking 36G of 183G of file at 181802531 (0xad61623)- 19 percent complete.
Checking 41G of 183G of file at 181802531 (0xad61623)- 22 percent complete.
Checking 46G of 183G of file at 181802531 (0xad61623)- 25 percent complete.
Checking 51G of 183G of file at 181802531 (0xad61623)- 27 percent complete.
Checking 56G of 183G of file at 181802531 (0xad61623)- 30 percent complete.
Checking 61G of 183G of file at 181802531 (0xad61623)- 33 percent complete.
Checking 66G of 183G of file at 181802531 (0xad61623)- 35 percent complete.
Checking 70G of 183G of file at 181802531 (0xad61623)- 38 percent complete.
Checking 75G of 183G of file at 181802531 (0xad61623)- 41 percent complete.
Checking 80G of 183G of file at 181802531 (0xad61623)- 44 percent complete.
Checking 85G of 183G of file at 181802531 (0xad61623)- 46 percent complete.
Checking 90G of 183G of file at 181802531 (0xad61623)- 49 percent complete.
Checking 95G of 183G of file at 181802531 (0xad61623)- 52 percent complete.
Checking 100G of 183G of file at 181802531 (0xad61623)- 54 percent complete.
Checking 105G of 183G of file at 181802531 (0xad61623)- 57 percent complete.
Checking 110G of 183G of file at 181802531 (0xad61623)- 60 percent complete.
Checking 115G of 183G of file at 181802531 (0xad61623)- 62 percent complete.
Checking 120G of 183G of file at 181802531 (0xad61623)- 65 percent complete.
Checking 125G of 183G of file at 181802531 (0xad61623)- 68 percent complete.
Checking 130G of 183G of file at 181802531 (0xad61623)- 70 percent complete.
Checking 135G of 183G of file at 181802531 (0xad61623)- 73 percent complete.
Checking 140G of 183G of file at 181802531 (0xad61623)- 76 percent complete.
Checking 144G of 183G of file at 181802531 (0xad61623)- 78 percent complete.
Checking 149G of 183G of file at 181802531 (0xad61623)- 81 percent complete.
Checking 154G of 183G of file at 181802531 (0xad61623)- 84 percent complete.
Checking 159G of 183G of file at 181802531 (0xad61623)- 87 percent complete.
Checking 164G of 183G of file at 181802531 (0xad61623)- 89 percent complete.
Checking 169G of 183G of file at 181802531 (0xad61623)- 92 percent complete.
Checking 174G of 183G of file at 181802531 (0xad61623)- 95 percent complete.
Checking 179G of 183G of file at 181802531 (0xad61623)- 97 percent complete.
Large file at 181802531 (0xad61623) - 100 percent complete.
Checking 1M of 183G of file at 181802531 (0xad61623)- 0 percent complete.
Checking 1G of 183G of file at 181802531 (0xad61623)- 0 percent complete.
Checking 5G of 183G of file at 181802531 (0xad61623)- 3 percent complete.
Checking 10G of 183G of file at 181802531 (0xad61623)- 5 percent complete.
Checking 14G of 183G of file at 181802531 (0xad61623)- 7 percent complete.
Checking 18G of 183G of file at 181802531 (0xad61623)- 10 percent complete.
Checking 23G of 183G of file at 181802531 (0xad61623)- 12 percent complete.
Checking 27G of 183G of file at 181802531 (0xad61623)- 14 percent complete.
Checking 31G of 183G of file at 181802531 (0xad61623)- 17 percent complete.
Checking 36G of 183G of file at 181802531 (0xad61623)- 19 percent complete.
Checking 40G of 183G of file at 181802531 (0xad61623)- 21 percent complete.
Checking 44G of 183G of file at 181802531 (0xad61623)- 24 percent complete.
Checking 48G of 183G of file at 181802531 (0xad61623)- 26 percent complete.
Checking 53G of 183G of file at 181802531 (0xad61623)- 28 percent complete.
Checking 57G of 183G of file at 181802531 (0xad61623)- 31 percent complete.
Checking 61G of 183G of file at 181802531 (0xad61623)- 33 percent complete.
Checking 66G of 183G of file at 181802531 (0xad61623)- 36 percent complete.
Checking 70G of 183G of file at 181802531 (0xad61623)- 38 percent complete.
Checking 74G of 183G of file at 181802531 (0xad61623)- 40 percent complete.
Checking 79G of 183G of file at 181802531 (0xad61623)- 43 percent complete.
Checking 83G of 183G of file at 181802531 (0xad61623)- 45 percent complete.
Checking 87G of 183G of file at 181802531 (0xad61623)- 47 percent complete.
Checking 91G of 183G of file at 181802531 (0xad61623)- 50 percent complete.
Checking 96G of 183G of file at 181802531 (0xad61623)- 52 percent complete.
Checking 100G of 183G of file at 181802531 (0xad61623)- 54 percent complete.
Checking 104G of 183G of file at 181802531 (0xad61623)- 57 percent complete.
Checking 109G of 183G of file at 181802531 (0xad61623)- 59 percent complete.
Checking 113G of 183G of file at 181802531 (0xad61623)- 61 percent complete.
Checking 117G of 183G of file at 181802531 (0xad61623)- 64 percent complete.
Checking 121G of 183G of file at 181802531 (0xad61623)- 66 percent complete.
Checking 126G of 183G of file at 181802531 (0xad61623)- 68 percent complete.
Checking 130G of 183G of file at 181802531 (0xad61623)- 71 percent complete.
Checking 134G of 183G of file at 181802531 (0xad61623)- 73 percent complete.
Checking 139G of 183G of file at 181802531 (0xad61623)- 75 percent complete.
Checking 143G of 183G of file at 181802531 (0xad61623)- 78 percent complete.
Checking 147G of 183G of file at 181802531 (0xad61623)- 80 percent complete.
Checking 152G of 183G of file at 181802531 (0xad61623)- 82 percent complete.
Checking 156G of 183G of file at 181802531 (0xad61623)- 85 percent complete.
Checking 160G of 183G of file at 181802531 (0xad61623)- 87 percent complete.
Checking 164G of 183G of file at 181802531 (0xad61623)- 89 percent complete.
Checking 169G of 183G of file at 181802531 (0xad61623)- 92 percent complete.
Checking 173G of 183G of file at 181802531 (0xad61623)- 94 percent complete.
Checking 177G of 183G of file at 181802531 (0xad61623)- 96 percent complete.
Checking 182G of 183G of file at 181802531 (0xad61623)- 99 percent complete.
Large file at 181802531 (0xad61623) - 100 percent complete.


Version-Release number of selected component (if applicable):
gfs2-utils-3.0.12-13.el6.x86_64

How reproducible:
Easily

Steps to Reproduce:
** guessing **
1. Create a file >10G
2. umount and fsck the file system
  
Actual results:
See above

Expected results:
Only process files once.

Additional info:

Comment 2 RHEL Program Management 2010-08-04 19:07:44 UTC
This issue has been proposed when we are only considering blocker
issues in the current Red Hat Enterprise Linux release.

** If you would still like this issue considered for the current
release, ask your support representative to file as a blocker on
your behalf. Otherwise ask that it be considered for the next
Red Hat Enterprise Linux release. **

Comment 3 Robert Peterson 2010-08-09 20:17:31 UTC
I think I know what's going on here.  In fact, pass1 is making
two traversals through each file's metadata.  The first traversal
should be relatively fast: all it's doing is reading the metadata
pointers and checking whether the blocks are all within the
boundaries of the device.  Basically, it's checking for a
significant number of bad block pointers.  It needs to do this
because we don't want to mark a block as duplicated (for example)
until we know if the pointers in general can be trusted. If the
dinode contains a number of corrupt pointers, we need to avoid
trying to back out previous block markings.  Thus it needs to be
in a separate loop.

Here's an example of why this was done:

In one set of customer metadata, a metadata block was overwritten
with a bunch of random data.  The overwritten block happened to be
on a large file, so there were several good metadata blocks prior
to the bad one, and several good metadata blocks after the bad one.
In this particular case, the bad metadata block had several block
references that were within file system boundaries, and several
outside those boundaries.  One of the bad pointers happened by
coincidence to be the block address of the root dinode.  Another
just happened to be a journal.  Still another just happened to be
the block address of the rindex file.  When fsck.gfs2 encountered
those blocks, it had already processed the root dinode, the journal
and the rindex dinode, so those bad references caused fsck.gfs2 to
mark those dinodes as duplicate references.  Later, it found
some really bad pointers in that dinode and decided to delete
the dinode that had the bad metadata.  The problem is: now it had
to delete the dinode, but some of the previously processed
metadata was valid and some was not.  

If it deleted all the previously processed blocks, it would
improperly delete the system files.  If it deletes none of the
previously processed pointers, depending on the order in which
blocks are processed, some subsequent valid references were
improperly determined to be duplicates and valid files were
improperly destroyed.

It was basically impossible for it to determine the good from the
bad from the middle of a file, and impossible for it to reverse
the block status from "duplicate reference" to what those blocks
originally represented: in this example, the three system files.  

I tried several methods of trying to keep track of blocks
and reverse decisions after the problem is discovered, but each
method led to its own set of problems.  The only foolproof way to
do it is to pre-process the metadata pointers and either trust
them all or not trust them all.

However, what I should do is not print the large file message
during pass1 preprocessing.  That's easy: I can add a new entry to
the metawalk_fxns structure for large file messages and only
set it for the traversal that takes significant time. That way,
the message will only come out once for any given dinode.

I also want to investigate how long this preprocessing takes
and whether I can do it more efficiently.  There's a potential
performance enhancement here.

Comment 4 Robert Peterson 2010-08-09 20:50:23 UTC
Created attachment 437707 [details]
First Idea Patch

This patch implements the option discussed in the previous update.
It compiles but is untested.

Comment 5 Robert Peterson 2010-08-13 14:15:56 UTC
Requesting ack flags for 6.1.

Comment 6 Robert Peterson 2010-09-23 15:46:12 UTC
The patch was tested on system roth-08 using a 100GB file.
The patch was pushed to the master branch of the gfs2-utils
git tree and the STABLE3 and RHEL6 branches of the cluster
git tree for inclusion into RHEL6.1. Changing status to
POST until we start doing builds for 6.1.

Comment 8 Nate Straz 2011-03-30 17:33:47 UTC
Verified with gfs2-utils-3.0.12-40.el6.

[root@dash-01 ~]# fsck.gfs2 -y /dev/fsck/bigfile
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Checking 1G of 66G of file at 33133 (0x816d)- 2 percent complete.
Checking 6G of 66G of file at 33133 (0x816d)- 9 percent complete.               Checking 10G of 66G of file at 33133 (0x816d)- 16 percent complete.             Checking 15G of 66G of file at 33133 (0x816d)- 22 percent complete.             Checking 19G of 66G of file at 33133 (0x816d)- 29 percent complete.             Checking 24G of 66G of file at 33133 (0x816d)- 36 percent complete.             Checking 28G of 66G of file at 33133 (0x816d)- 43 percent complete.             Checking 33G of 66G of file at 33133 (0x816d)- 49 percent complete.             Checking 37G of 66G of file at 33133 (0x816d)- 56 percent complete.             Checking 42G of 66G of file at 33133 (0x816d)- 63 percent complete.             Checking 46G of 66G of file at 33133 (0x816d)- 70 percent complete.             Checking 51G of 66G of file at 33133 (0x816d)- 76 percent complete.             Checking 55G of 66G of file at 33133 (0x816d)- 83 percent complete.             Checking 60G of 66G of file at 33133 (0x816d)- 90 percent complete.             Checking 64G of 66G of file at 33133 (0x816d)- 96 percent complete.             Large file at 33133 (0x816d) - 100 percent complete.
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete

Comment 9 errata-xmlrpc 2011-05-19 12:53:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0537.html