+++ This bug was initially created as a clone of Bug #622576 +++

* removing multiple journals in the middle

If I have a file system with five journals and I remove the middle three, fsck.gfs2 will only recreate one journal at a time. I have to run fsck.gfs2 three times to get the journals all back. This seems like a bug that should be fixed.

--- Additional comment from rpeterso@redhat.com on 2011-03-07 08:50:43 EST ---

In answer to comment #11: fsck.gfs2 should probably recover multiple journals. Do you have output I can look at from this scenario where it didn't? I'd just like to double-check that it didn't act up.

--- Additional comment from nstraz@redhat.com on 2011-03-07 10:51:57 EST ---

Created attachment 482716 [details]
fsck.gfs2 log while rebuilding journals

Attached is the complete output of fsck.gfs2 while I run it until all of the journals are rebuilt. The interesting parts are probably these lines:

Initializing fsck
File system journal "journal1" is missing: pass1 will try to recreate it.
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Error: resource group 17 (0x11): free space (0) does not match bitmap (3)
(3 blocks were reclaimed)
The rgrp was fixed.
RGs: Consistent: 799   Inconsistent: 1   Fixed: 1   Total: 800
Starting pass1
Invalid or missing journal1 system inode (should be 4, is 0).
Rebuilding system file "journal1"
Pass1 complete
...
Initializing fsck
File system journal "journal2" is missing: pass1 will try to recreate it.
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Invalid or missing journal2 system inode (should be 4, is 0).
Rebuilding system file "journal2"
Pass1 complete
...
Initializing fsck
File system journal "journal3" is missing: pass1 will try to recreate it.
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Invalid or missing journal3 system inode (should be 4, is 0).
Rebuilding system file "journal3"
Pass1 complete
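As background on why only one journal comes back per run: per the technical note later in this report, the old code sized its search by the number of journal index entries. Here is a standalone simulation of that failure mode (not the gfs2-utils source; the scenario and names are made up for illustration):

#include <stdio.h>

/* Standalone simulation, not gfs2-utils source: five journals with
 * the middle three (journal1-journal3) deleted, as in the report.
 * If the journal count is taken from the jindex entries, the count
 * shrinks along with the journals, so only the first gap inside the
 * shortened range is noticed on each run. */
int main(void)
{
    int present[5] = {1, 0, 0, 0, 1};   /* journal0..journal4 */
    int jindex_entries = 0;

    for (int j = 0; j < 5; j++)
        jindex_entries += present[j];   /* only 2 entries remain */

    /* Pre-patch style search: limited to jindex_entries slots. */
    for (int j = 0; j < jindex_entries; j++)
        if (!present[j])
            printf("this run rebuilds journal%d only\n", j);

    return 0;
}

Each run rebuilds one journal and puts one jindex entry back, which is consistent with the three runs needed in the log above.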
Created attachment 502895 [details]
Untested patch

I think this patch should do the trick, but I haven't taken the time to test it yet.
Created attachment 505093 [details]
Patch that works properly

The previous patch did not work for several reasons. This one works and is tested, and will most likely be shipped as is.
The previously attached patch was tested on system gfs-i24c-01. The test is as follows:

(1) I restore a metadata set I created that has journals 2-6 missing.
(2) I run the new fsck with -n to verify it doesn't crash or make changes. Due to the large amount of output, I redirect the output elsewhere.
(3) I run the new fsck with -y to verify it rebuilds all the journals and gives the proper return code of 1.
(4) I run the new fsck again to verify a second run finds no errors and gives a return code of 0.

Here are the testing results:

[root@gfs-i24c-01 ../gfs2/fsck]# gfs2_edit restoremeta /home/bob/metadata/gfs2/severaldeadjournals.meta /dev/sasdrives/bob
File system size: 104792069 (0x63f0005) blocks, aka 399.768GB
There are 104857600 blocks of 4096 bytes in the destination device.
104857600 metadata blocks (100%) processed,
File /home/bob/metadata/gfs2/severaldeadjournals.meta restore successful.
[root@gfs-i24c-01 ../gfs2/fsck]# ./fsck.gfs2 -n /dev/sasdrives/bob &> /tmp/gronk
[root@gfs-i24c-01 ../gfs2/fsck]# echo $?
4
[root@gfs-i24c-01 ../gfs2/fsck]# ./fsck.gfs2 -y /dev/sasdrives/bob
Initializing fsck
File system journal "journal2" is missing: pass1 will try to recreate it.
File system journal "journal3" is missing: pass1 will try to recreate it.
File system journal "journal4" is missing: pass1 will try to recreate it.
File system journal "journal5" is missing: pass1 will try to recreate it.
File system journal "journal6" is missing: pass1 will try to recreate it.
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Invalid or missing journal2 system inode (should be 4, is 0).
Rebuilding system file "journal2"
Invalid or missing journal3 system inode (should be 4, is 0).
Rebuilding system file "journal3"
Invalid or missing journal4 system inode (should be 4, is 0).
Rebuilding system file "journal4"
Invalid or missing journal5 system inode (should be 4, is 0).
Rebuilding system file "journal5"
Invalid or missing journal6 system inode (should be 4, is 0).
Rebuilding system file "journal6"
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
The statfs file is wrong:

Current statfs values:
blocks:  104846384 (0x63fd430)
free:    104745764 (0x63e4b24)
dinodes: 35 (0x23)

Calculated statfs values:
blocks:  104846384 (0x63fd430)
free:    104581594 (0x63bc9da)
dinodes: 40 (0x28)

The statfs file was fixed.
Writing changes to disk
gfs2_fsck complete
[root@gfs-i24c-01 ../gfs2/fsck]# echo $?
1
[root@gfs-i24c-01 ../gfs2/fsck]# ./fsck.gfs2 /dev/sasdrives/bob
Initializing fsck
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Pass5 complete
gfs2_fsck complete
[root@gfs-i24c-01 ../gfs2/fsck]# echo $?
0
[root@gfs-i24c-01 ../gfs2/fsck]#
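The return codes checked above follow the usual fsck convention: 0 means clean, 1 means errors were found and corrected, and 4 means errors were left uncorrected (hence the -n run returning 4). Steps (3) and (4) could be automated with a small harness along these lines; this is just a sketch, not part of gfs2-utils, and the device path is the one from the test above:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Harness sketch for steps (3) and (4): not part of gfs2-utils.
 * Return codes follow the usual fsck convention: 0 = clean,
 * 1 = errors corrected, 4 = errors left uncorrected. */
static int run_fsck(const char *args, const char *dev)
{
    char cmd[256];
    snprintf(cmd, sizeof(cmd), "fsck.gfs2 %s %s", args, dev);
    return WEXITSTATUS(system(cmd));
}

int main(void)
{
    const char *dev = "/dev/sasdrives/bob"; /* device from the test above */

    if (run_fsck("-y", dev) != 1) {         /* first run must fix the journals */
        fprintf(stderr, "expected return code 1 (errors corrected)\n");
        return 1;
    }
    if (run_fsck("", dev) != 0) {           /* second run must be clean */
        fprintf(stderr, "expected return code 0 (no errors)\n");
        return 1;
    }
    printf("all journals rebuilt in a single run\n");
    return 0;
}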
Created attachment 505130 [details]
Better patch

While doing additional testing I discovered a shortcoming of the previous patch: if the per_node directory was missing and needed to be rebuilt, fsck.gfs2 would crash because it tried to rebuild it too early, at a point where the rgrps had not yet been read in. This patch handles that situation properly. Note that if the per_node directory is missing and is rebuilt, fsck.gfs2 may only build one journal during that run.
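The crash was an ordering problem: building per_node allocates filesystem blocks, and block allocation needs the rgrps in memory. A self-contained toy sketch of that constraint (every name here is invented for illustration; this is not the actual patch):

#include <stdio.h>
#include <stdbool.h>

/* Toy sketch of the ordering constraint; all names are invented.
 * Building per_node allocates blocks, and allocation walks the
 * in-memory rgrp list, so rebuilding per_node before the rgrps are
 * read in has nothing to allocate from. */
static bool rgrps_loaded = false;

static int alloc_block(void)
{
    if (!rgrps_loaded)
        return -1;          /* the pre-patch code crashed at this point */
    return 0;               /* pick a free block from an rgrp */
}

static int build_per_node(void)
{
    return alloc_block();   /* the new directory needs fresh blocks */
}

int main(void)
{
    if (build_per_node() != 0)
        fprintf(stderr, "too early: rgrps not read in yet\n");

    rgrps_loaded = true;    /* the fix: read rindex/rgrps first */
    if (build_per_node() == 0)
        printf("per_node rebuilt safely after the rgrps are loaded\n");
    return 0;
}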
This patch was pushed to the master branch of the gfs2-utils git repository and the RHEL6 branch of the cluster.git repository. It was tested on system gfs-i24c-01 as described in comment #3, plus another test where the per_node directory was manually removed with gfs2_edit. Changing status to POST until we get this into a build.
Verified that multiple journals are recovered at the same time with gfs2-utils-3.0.12.1-7.el6.x86_64.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Prior to this patch, the fsck.gfs2 program used the number of entries in the journal index to look for missing journals. As a result, if more than one journal was missing, they were not all rebuilt, and subsequent runs of fsck.gfs2 were needed to recover the remaining journals. Since each node needs its own journal, code was added to fsck.gfs2 to use the "per_node" system directory to determine the correct number of journals to repair. Consequently, fsck.gfs2 now repairs all of the journals in one run.
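A rough standalone sketch of the idea (not the gfs2-utils source; it assumes per_node holds three files per journal, e.g. inum_range0, statfs_change0 and quota_change0, which survive when the journal inodes themselves are deleted):

#include <stdio.h>

/* Standalone sketch, not the gfs2-utils source. Assumption: per_node
 * holds three files per journal (inum_range%u, statfs_change%u,
 * quota_change%u), and those entries survive when the journal inodes
 * themselves are deleted, so they preserve the original count. */
int main(void)
{
    int journal_present[5] = {1, 0, 0, 0, 1}; /* journals 1-3 missing */
    int per_node_entries = 5 * 3;             /* unaffected by the deletions */
    int njournals = per_node_entries / 3;     /* true journal count: 5 */

    for (int j = 0; j < njournals; j++)
        if (!journal_present[j]) {
            printf("Rebuilding system file \"journal%d\"\n", j);
            journal_present[j] = 1;           /* every gap fixed in one pass */
        }
    return 0;
}

Because the per_node-derived count is unaffected by the missing journals, every gap in the journal range is found and rebuilt in a single pass.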
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1516.html