Bug 683104
Summary: fsck.gfs2 only rebuilds one missing journal at a time

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Nate Straz <nstraz> |
| Component: | cluster | Assignee: | Robert Peterson <rpeterso> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | low | Docs Contact: | |
| Priority: | low | | |
| Version: | 6.1 | CC: | ccaulfie, cluster-maint, fdinitto, lhh, rpeterso, swhiteho, teigland |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | cluster-3.0.12.1-4.el6 | Doc Type: | Bug Fix |
| Doc Text: | Prior to this patch, the fsck.gfs2 program used the number of entries in the journal index to look for missing journals. As a result, if more than one journal was missing, they were not all rebuilt and subsequent runs of fsck.gfs2 were needed to recover all the journals. Since each node needs its own journal, code was added to fsck.gfs2 to use the "per_node" system directory to determine the correct number of journals to repair. As a result, fsck.gfs2 now repairs all the journals in one run. | | |
| Story Points: | --- | | |
| Clone Of: | 622576 | Environment: | |
| Last Closed: | 2011-12-06 14:51:03 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Nate Straz 2011-03-08 15:12:51 UTC
Created attachment 502895 [details]
Untested patch
I think this patch should do the trick, but I haven't taken
the time to test it yet.
Created attachment 505093 [details]
Patch that works properly
The previous patch did not work for several reasons. This one
works and is tested, and will most likely be shipped as-is.
The previously attached patch was tested on system gfs-i24c-01. The test is as follows:

1. I restore a metadata set I created that has journals 2-6 missing.
2. I run the new fsck with -n to verify it doesn't crash or make changes. Due to the large amount of output, I redirect the output elsewhere.
3. I run the new fsck with -y to verify it rebuilds all the journals and gives the proper return code of 1.
4. I run the new fsck again to verify a second run finds no errors and gives a return code of 0.

Here are the testing results:

```
[root@gfs-i24c-01 ../gfs2/fsck]# gfs2_edit restoremeta /home/bob/metadata/gfs2/severaldeadjournals.meta /dev/sasdrives/bob
File system size: 104792069 (0x63f0005) blocks, aka 399.768GB
There are 104857600 blocks of 4096 bytes in the destination device.
104857600 metadata blocks (100%) processed,
File /home/bob/metadata/gfs2/severaldeadjournals.meta restore successful.
[root@gfs-i24c-01 ../gfs2/fsck]# ./fsck.gfs2 -n /dev/sasdrives/bob &> /tmp/gronk
[root@gfs-i24c-01 ../gfs2/fsck]# echo $?
4
[root@gfs-i24c-01 ../gfs2/fsck]# ./fsck.gfs2 -y /dev/sasdrives/bob
Initializing fsck
File system journal "journal2" is missing: pass1 will try to recreate it.
File system journal "journal3" is missing: pass1 will try to recreate it.
File system journal "journal4" is missing: pass1 will try to recreate it.
File system journal "journal5" is missing: pass1 will try to recreate it.
File system journal "journal6" is missing: pass1 will try to recreate it.
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Invalid or missing journal2 system inode (should be 4, is 0).
Rebuilding system file "journal2"
Invalid or missing journal3 system inode (should be 4, is 0).
Rebuilding system file "journal3"
Invalid or missing journal4 system inode (should be 4, is 0).
Rebuilding system file "journal4"
Invalid or missing journal5 system inode (should be 4, is 0).
Rebuilding system file "journal5"
Invalid or missing journal6 system inode (should be 4, is 0).
Rebuilding system file "journal6"
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
The statfs file is wrong:

Current statfs values:
blocks:  104846384 (0x63fd430)
free:    104745764 (0x63e4b24)
dinodes: 35 (0x23)

Calculated statfs values:
blocks:  104846384 (0x63fd430)
free:    104581594 (0x63bc9da)
dinodes: 40 (0x28)

The statfs file was fixed.
Writing changes to disk
gfs2_fsck complete
[root@gfs-i24c-01 ../gfs2/fsck]# echo $?
1
[root@gfs-i24c-01 ../gfs2/fsck]# ./fsck.gfs2 /dev/sasdrives/bob
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete
[root@gfs-i24c-01 ../gfs2/fsck]# echo $?
0
[root@gfs-i24c-01 ../gfs2/fsck]#
```

Created attachment 505130 [details]
Better patch
While doing additional testing I discovered a shortcoming of the
previous patch: if the per_node directory was missing and needed
to be rebuilt, fsck.gfs2 would crash because it tried to rebuild
it too early, at a point where the rgrps had not yet been read in.
This patch handles that situation properly. Note that if the
per_node directory is missing and is rebuilt, fsck.gfs2 may
only build one journal during that run.
This patch was pushed to the master branch of the gfs2-utils git repository and the RHEL6 branch of the cluster.git repository. It was tested on system gfs-i24c-01 as described in comment #3, plus another test where the per_node directory was manually removed with gfs2_edit. Changing status to POST until we get this into a build.

Verified that multiple journals are recovered at the same time with gfs2-utils-3.0.12.1-7.el6.x86_64

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Prior to this patch, the fsck.gfs2 program used the number of entries in the journal index to look for missing journals. As a result, if more than one journal was missing, they were not all rebuilt and subsequent runs of fsck.gfs2 were needed to recover all the journals. Since each node needs its own journal, code was added to fsck.gfs2 to use the "per_node" system directory to determine the correct number of journals to repair. As a result, fsck.gfs2 now repairs all the journals in one run.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1516.html