Description of problem:

gfs2_fsck does not fix gfs2 inconsistencies of the following type:

May 1 07:27:29 hostname kernel: GFS2: fsid=: Trying to join cluster "lock_nolock", "clustername:gfs2"
May 1 07:27:29 hostname kernel: Lock_Nolock (built Dec 17 2008 11:44:35) installed
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: Joined cluster. Now mounting FS...
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: jid=0, already locked for use
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: jid=0: Looking at journal...
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: fatal: filesystem consistency error
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: inode = 4 25
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: function = jhead_scan, file = fs/gfs2/recovery.c, line = 239
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: about to withdraw this file system
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: telling LM to withdraw
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: withdrawn
May 1 07:27:29 hostname kernel:
May 1 07:27:29 hostname kernel: Call Trace:
May 1 07:27:29 hostname kernel: [<ffffffff88517526>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0
May 1 07:27:29 hostname kernel: [<ffffffff885257e4>] :gfs2:gfs2_replay_read_block+0x78/0x89
May 1 07:27:29 hostname kernel: [<ffffffff88525890>] :gfs2:get_log_header+0x9b/0xe5
May 1 07:27:29 hostname kernel: [<ffffffff8852a71f>] :gfs2:gfs2_consist_inode_i+0x43/0x48
May 1 07:27:29 hostname kernel: [<ffffffff88525a38>] :gfs2:gfs2_find_jhead+0xf5/0x119
May 1 07:27:29 hostname kernel: [<ffffffff88525b9d>] :gfs2:gfs2_recover_journal+0x141/0x837
May 1 07:27:29 hostname kernel: [<ffffffff8851a99c>] :gfs2:gfs2_meta_read+0x17/0x65
May 1 07:27:29 hostname kernel: [<ffffffff8851ad9a>] :gfs2:gfs2_meta_indirect_buffer+0xba/0x160
May 1 07:27:29 hostname kernel: [<ffffffff80021afc>] __up_read+0x19/0x7f
May 1 07:27:29 hostname kernel: [<ffffffff8850c112>] :gfs2:gfs2_block_map+0x32b/0x33e
May 1 07:27:29 hostname kernel: [<ffffffff8851f9a9>] :gfs2:map_journal_extents+0x6f/0x13b
May 1 07:27:29 hostname kernel: [<ffffffff8850c2ce>] :gfs2:gfs2_write_alloc_required+0xfd/0x122
May 1 07:27:29 hostname kernel: [<ffffffff8851fd26>] :gfs2:init_journal+0x2b1/0x410
May 1 07:27:29 hostname kernel: [<ffffffff88528992>] :gfs2:gfs2_jindex_hold+0x54/0x19c
May 1 07:27:29 hostname kernel: [<ffffffff80022de5>] d_alloc_root+0x43/0x4b
May 1 07:27:29 hostname kernel: [<ffffffff8851fea4>] :gfs2:init_inodes+0x1f/0x178
May 1 07:27:29 hostname kernel: [<ffffffff885208b4>] :gfs2:fill_super+0x8b7/0xa63
May 1 07:27:29 hostname kernel: [<ffffffff88514a7e>] :gfs2:gfs2_glock_nq_num+0x3b/0x68
May 1 07:27:29 hostname kernel: [<ffffffff800de3c1>] set_bdev_super+0x0/0xf
May 1 07:27:29 hostname kernel: [<ffffffff800de3d0>] test_bdev_super+0x0/0xd
May 1 07:27:29 hostname kernel: [<ffffffff8851fffd>] :gfs2:fill_super+0x0/0xa63
May 1 07:27:29 hostname kernel: [<ffffffff800df384>] get_sb_bdev+0x10a/0x164
May 1 07:27:29 hostname kernel: [<ffffffff800ded21>] vfs_kern_mount+0x93/0x11a
May 1 07:27:29 hostname kernel: [<ffffffff800dedea>] do_kern_mount+0x36/0x4d
May 1 07:27:29 hostname kernel: [<ffffffff800e8cab>] do_mount+0x6a7/0x717
May 1 07:27:29 hostname kernel: [<ffffffff8014a4a3>] radix_tree_delete+0x150/0x187
May 1 07:27:29 hostname kernel: [<ffffffff8002310e>] __pagevec_free+0x21/0x2e
May 1 07:27:29 hostname kernel: [<ffffffff800076ad>] find_get_page+0x21/0x50
May 1 07:27:29 hostname kernel: [<ffffffff80013441>] filemap_nopage+0x188/0x322
May 1 07:27:29 hostname kernel: [<ffffffff80008b88>] __handle_mm_fault+0x51d/0xe5c
May 1 07:27:29 hostname kernel: [<ffffffff800c873c>] zone_statistics+0x3e/0x6d
May 1 07:27:29 hostname kernel: [<ffffffff8000f10b>] __alloc_pages+0x65/0x2ce
May 1 07:27:29 hostname kernel: [<ffffffff8002a8a0>] iput+0x4b/0x84
May 1 07:27:29 hostname kernel: [<ffffffff8004bee9>] sys_mount+0x8a/0xcd
May 1 07:27:29 hostname kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
May 1 07:27:29 hostname kernel:
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: jid=0: Failed
May 1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: error recovering journal 0: -5

The gfs2_fsck from BUG 496330 has been used to try to fix the problem, but it does not fix it after 2 runs.

Version-Release number of selected component (if applicable):
gfs2-utils-0.1.53-1.el5_3.3.x86_64.rpm

How reproducible:
Every time

Steps to Reproduce:
1. Run gfs2_fsck -yv on the block device.
2. Try to mount the file system.
3. Observe the error above; the mount command hangs.

Actual results:
gfs2_fsck does not fix the file system with the error above.

Expected results:
gfs2_fsck fixes the error.
This bug is related to the bug below:
https://bugzilla.redhat.com/show_bug.cgi?id=457557

ATTENTION:
BUG 457557 is for gfs2-kmod - gfs2.ko
BUG 498646 is for gfs2-utils - gfs2_fsck

BUG 457557 is specifically for the kernel module bug that can introduce the problem; the present bug is for gfs2_fsck to repair it.
Adding Theophanis to the cc list to keep him informed.
Running `gfs2_fsck -y' gives a "jid=0: Failed":

Initializing fsck
Initializing lists...
Recovering journals (this may take a while)
jid=0: Looking at journal...
jid=0: Failed
jid=1: Looking at journal...
jid=1: Journal is clean.
jid=2: Looking at journal...
jid=2: Journal is clean.
Journal recovery complete.
...
...
Created attachment 342138
Proposed patch

This patch fixes the journal resequencing problem. It probably still needs more testing, but it seems to work properly.
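To illustrate what "resequencing" means here, the following is a toy model only. It is not the code in the attached patch and not the kernel's jhead_scan. It assumes that the consistency error in the mount log is raised when the forward scan meets a log header whose sequence number equals the current head's. The names scan_for_head, resequence and JournalConsistencyError are hypothetical, and the repair shown (renumbering every header consecutively) is the simplest one that makes the scan succeed, not necessarily what gfs2_fsck does.

```python
from typing import List, Optional


class JournalConsistencyError(Exception):
    """Raised when the scan meets a repeated sequence number."""


def scan_for_head(seqs: List[Optional[int]]) -> int:
    """Return the block index of the journal head in a toy journal.

    seqs[i] is the sequence number of the log header in block i, or
    None if block i holds no log header (hypothetical model).
    """
    headers = [i for i, s in enumerate(seqs) if s is not None]
    if not headers:
        raise ValueError("journal contains no log headers")
    head = headers[0]
    for i in headers[1:]:
        if seqs[i] == seqs[head]:
            # Assumed error condition: same sequence number seen twice.
            raise JournalConsistencyError(f"duplicate sequence at block {i}")
        if seqs[i] < seqs[head]:
            break  # older entry: the log wrapped around here
        head = i
    return head


def resequence(seqs: List[Optional[int]]) -> List[Optional[int]]:
    """Naive repair: renumber headers consecutively from the first one."""
    out = list(seqs)
    next_seq = None
    for i, s in enumerate(out):
        if s is None:
            continue  # not a log header, leave untouched
        if next_seq is None:
            next_seq = s  # keep the first header's number as the base
        out[i] = next_seq
        next_seq += 1
    return out
```

With a damaged toy journal such as [10, 11, None, 11, 12], scan_for_head raises the error at block 3; after resequence the same scan runs to completion.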
Since I see a "RHEL 5.4 Development ends" date of Monday, I decided to push this fix to the master branch of the gfs2-utils git tree and the STABLE3, STABLE2 and RHEL5 branches of the cluster git tree for inclusion into 5.4. It was tested on roth-01 using the customer's failing metadata and fixed the sequence numbers. I'm changing the status to Modified as well. If the customer still has issues with the code, please feel free to change the status again.
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or directed to your customer or partner representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1337.html