Bug 621313
Summary: GFS2: fsck.gfs2 seems to process large files twice

Product: Red Hat Enterprise Linux 6
Component: cluster
Version: 6.0
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: low
Reporter: Nate Straz <nstraz>
Assignee: Robert Peterson <rpeterso>
QA Contact: Cluster QE <mspqa-list>
CC: adas, bmarzins, cluster-maint, fdinitto, lhh, rpeterso, rwheeler, ssaha, swhiteho
Target Milestone: rc
Target Release: ---
Keywords: RHELNAK
Fixed In Version: cluster-3.0.12-24.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 12:53:27 UTC
Description
Nate Straz
2010-08-04 18:34:11 UTC
This issue has been proposed when we are only considering blocker issues in the current Red Hat Enterprise Linux release. ** If you would still like this issue considered for the current release, ask your support representative to file as a blocker on your behalf. Otherwise ask that it be considered for the next Red Hat Enterprise Linux release. **

I think I know what's going on here. In fact, pass1 makes two traversals through each file's metadata. The first traversal should be relatively fast: all it does is read the metadata pointers and check whether the blocks are all within the boundaries of the device. Basically, it's checking for a significant number of bad block pointers. It needs to do this because we don't want to mark a block as duplicated (for example) until we know whether the pointers in general can be trusted. If the dinode contains a number of corrupt pointers, we need to avoid trying to back out previous block markings. Thus it needs to be in a separate loop.

Here's an example of why this was done: in one set of customer metadata, a metadata block was overwritten with a bunch of random data. The overwritten block happened to be on a large file, so there were several good metadata blocks before the bad one and several good metadata blocks after it. In this particular case, the bad metadata block had several block references that were within file system boundaries and several outside those boundaries. One of the bad pointers happened, by coincidence, to be the block address of the root dinode. Another happened to be a journal. Still another happened to be the block address of the rindex file. When fsck.gfs2 encountered those blocks, it had already processed the root dinode, the journal, and the rindex dinode, so those bad references caused fsck.gfs2 to mark those dinodes as duplicate references. Later, it found some really bad pointers in that dinode and decided to delete the dinode that had the bad metadata.
The problem is: now it had to delete the dinode, but some of the previously processed metadata was valid and some was not. If it deleted all the previously processed blocks, it would improperly delete the system files. If it deleted none of the previously processed pointers, then, depending on the order in which blocks were processed, some subsequent valid references would be improperly determined to be duplicates and valid files would be improperly destroyed. It was basically impossible to tell the good from the bad from the middle of a file, and impossible to reverse the block status from "duplicate reference" back to what those blocks originally represented: in this example, the three system files. I tried several methods of keeping track of blocks and reversing decisions after the problem is discovered, but each method led to its own set of problems. The only foolproof way to do it is to pre-process the metadata pointers and either trust them all or trust none of them.

However, what I should do is not print the large file message during pass1 preprocessing. That's easy: I can add a new entry to the metawalk_fxns structure for large file messages and only set it for the traversal that takes significant time. That way, the message will only come out once for any given dinode. I also want to investigate how long this preprocessing takes and whether I can do it more efficiently. There's a potential performance enhancement here.

Created attachment 437707 [details]
First Idea Patch
This patch implements the option discussed in the previous update.
It compiles but is untested.
Requesting ack flags for 6.1.

The patch was tested on system roth-08 using a 100GB file. The patch was pushed to the master branch of the gfs2-utils git tree and the STABLE3 and RHEL6 branches of the cluster git tree for inclusion into RHEL6.1. Changing status to POST until we start doing builds for 6.1.

Verified with gfs2-utils-3.0.12-40.el6.

[root@dash-01 ~]# fsck.gfs2 -y /dev/fsck/bigfile
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Checking 1G of 66G of file at 33133 (0x816d)- 2 percent complete.
Checking 6G of 66G of file at 33133 (0x816d)- 9 percent complete.
Checking 10G of 66G of file at 33133 (0x816d)- 16 percent complete.
Checking 15G of 66G of file at 33133 (0x816d)- 22 percent complete.
Checking 19G of 66G of file at 33133 (0x816d)- 29 percent complete.
Checking 24G of 66G of file at 33133 (0x816d)- 36 percent complete.
Checking 28G of 66G of file at 33133 (0x816d)- 43 percent complete.
Checking 33G of 66G of file at 33133 (0x816d)- 49 percent complete.
Checking 37G of 66G of file at 33133 (0x816d)- 56 percent complete.
Checking 42G of 66G of file at 33133 (0x816d)- 63 percent complete.
Checking 46G of 66G of file at 33133 (0x816d)- 70 percent complete.
Checking 51G of 66G of file at 33133 (0x816d)- 76 percent complete.
Checking 55G of 66G of file at 33133 (0x816d)- 83 percent complete.
Checking 60G of 66G of file at 33133 (0x816d)- 90 percent complete.
Checking 64G of 66G of file at 33133 (0x816d)- 96 percent complete.
Large file at 33133 (0x816d) - 100 percent complete.
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA.
For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2011-0537.html