Bug 498646 - gfs2_fsck does not fix filesystem when 'journal is already locked for use'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs2-utils
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Robert Peterson
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-05-01 16:12 UTC by Eduardo Damato
Modified: 2018-10-20 02:04 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 11:01:57 UTC
Target Upstream Version:
Embargoed:


Attachments
Proposed patch (2.94 KB, patch)
2009-05-01 18:18 UTC, Robert Peterson


Links
Red Hat Product Errata RHSA-2009:1337
  Priority: normal
  Status: SHIPPED_LIVE
  Summary: Low: gfs2-utils security and bug fix update
  Last Updated: 2009-09-01 10:41:56 UTC

Comment 1 Eduardo Damato 2009-05-01 16:16:02 UTC
Description of problem:

gfs2_fsck does not fix gfs2 inconsistencies of the following type:

May  1 07:27:29 hostname kernel: GFS2: fsid=: Trying to join cluster "lock_nolock", "clustername:gfs2" 
May  1 07:27:29 hostname kernel: Lock_Nolock (built Dec 17 2008 11:44:35) installed 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: Joined cluster. Now mounting FS... 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: jid=0, already locked for use 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: jid=0: Looking at journal... 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: fatal: filesystem consistency error 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0:   inode = 4 25 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0:   function = jhead_scan, file = fs/gfs2/recovery.c, line = 239 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: about to withdraw this file system 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: telling LM to withdraw 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: withdrawn 
May  1 07:27:29 hostname kernel: 
May  1 07:27:29 hostname kernel: Call Trace: 
May  1 07:27:29 hostname kernel:  [<ffffffff88517526>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0 
May  1 07:27:29 hostname kernel:  [<ffffffff885257e4>] :gfs2:gfs2_replay_read_block+0x78/0x89 
May  1 07:27:29 hostname kernel:  [<ffffffff88525890>] :gfs2:get_log_header+0x9b/0xe5 
May  1 07:27:29 hostname kernel:  [<ffffffff8852a71f>] :gfs2:gfs2_consist_inode_i+0x43/0x48 
May  1 07:27:29 hostname kernel:  [<ffffffff88525a38>] :gfs2:gfs2_find_jhead+0xf5/0x119 
May  1 07:27:29 hostname kernel:  [<ffffffff88525b9d>] :gfs2:gfs2_recover_journal+0x141/0x837 
May  1 07:27:29 hostname kernel:  [<ffffffff8851a99c>] :gfs2:gfs2_meta_read+0x17/0x65 
May  1 07:27:29 hostname kernel:  [<ffffffff8851ad9a>] :gfs2:gfs2_meta_indirect_buffer+0xba/0x160 
May  1 07:27:29 hostname kernel:  [<ffffffff80021afc>] __up_read+0x19/0x7f 
May  1 07:27:29 hostname kernel:  [<ffffffff8850c112>] :gfs2:gfs2_block_map+0x32b/0x33e 
May  1 07:27:29 hostname kernel:  [<ffffffff8851f9a9>] :gfs2:map_journal_extents+0x6f/0x13b 
May  1 07:27:29 hostname kernel:  [<ffffffff8850c2ce>] :gfs2:gfs2_write_alloc_required+0xfd/0x122 
May  1 07:27:29 hostname kernel:  [<ffffffff8851fd26>] :gfs2:init_journal+0x2b1/0x410 
May  1 07:27:29 hostname kernel:  [<ffffffff88528992>] :gfs2:gfs2_jindex_hold+0x54/0x19c 
May  1 07:27:29 hostname kernel:  [<ffffffff80022de5>] d_alloc_root+0x43/0x4b 
May  1 07:27:29 hostname kernel:  [<ffffffff8851fea4>] :gfs2:init_inodes+0x1f/0x178 
May  1 07:27:29 hostname kernel:  [<ffffffff885208b4>] :gfs2:fill_super+0x8b7/0xa63 
May  1 07:27:29 hostname kernel:  [<ffffffff88514a7e>] :gfs2:gfs2_glock_nq_num+0x3b/0x68 
May  1 07:27:29 hostname kernel:  [<ffffffff800de3c1>] set_bdev_super+0x0/0xf 
May  1 07:27:29 hostname kernel:  [<ffffffff800de3d0>] test_bdev_super+0x0/0xd 
May  1 07:27:29 hostname kernel:  [<ffffffff8851fffd>] :gfs2:fill_super+0x0/0xa63 
May  1 07:27:29 hostname kernel:  [<ffffffff800df384>] get_sb_bdev+0x10a/0x164 
May  1 07:27:29 hostname kernel:  [<ffffffff800ded21>] vfs_kern_mount+0x93/0x11a 
May  1 07:27:29 hostname kernel:  [<ffffffff800dedea>] do_kern_mount+0x36/0x4d 
May  1 07:27:29 hostname kernel:  [<ffffffff800e8cab>] do_mount+0x6a7/0x717 
May  1 07:27:29 hostname kernel:  [<ffffffff8014a4a3>] radix_tree_delete+0x150/0x187 
May  1 07:27:29 hostname kernel:  [<ffffffff8002310e>] __pagevec_free+0x21/0x2e 
May  1 07:27:29 hostname kernel:  [<ffffffff800076ad>] find_get_page+0x21/0x50 
May  1 07:27:29 hostname kernel:  [<ffffffff80013441>] filemap_nopage+0x188/0x322 
May  1 07:27:29 hostname kernel:  [<ffffffff80008b88>] __handle_mm_fault+0x51d/0xe5c 
May  1 07:27:29 hostname kernel:  [<ffffffff800c873c>] zone_statistics+0x3e/0x6d 
May  1 07:27:29 hostname kernel:  [<ffffffff8000f10b>] __alloc_pages+0x65/0x2ce 
May  1 07:27:29 hostname kernel:  [<ffffffff8002a8a0>] iput+0x4b/0x84 
May  1 07:27:29 hostname kernel:  [<ffffffff8004bee9>] sys_mount+0x8a/0xcd 
May  1 07:27:29 hostname kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0 
May  1 07:27:29 hostname kernel: 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: jid=0: Failed 
May  1 07:27:29 hostname kernel: GFS2: fsid=clustername:gfs2.0: error recovering journal 0: -5
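
The consistency error is raised in jhead_scan() (fs/gfs2/recovery.c), which locates the journal head during recovery. The sketch below is a simplified C model, assuming the RHEL 5 kernel follows the upstream jhead_scan() logic of that era: it walks forward around the circular journal from a known-good log header, and two headers carrying the same sequence number make it fail with -EIO, which is the -5 in the last log line. The array representation, NO_HEADER, and scan_for_head() are hypothetical names for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: seq[i] is the sequence number of the log header stored
 * in journal block i, or NO_HEADER if block i holds no log header. The kernel
 * reads real on-disk log headers instead of an array. */
#define NO_HEADER 0ULL

/* Walk forward around the circular journal from block `start` (assumed to
 * hold a valid log header), following strictly increasing sequence numbers.
 * Returns the index of the newest header (the journal head), or -1 if two
 * headers carry the same sequence number, the case the kernel reports as a
 * filesystem consistency error before failing with -EIO (-5). */
static long scan_for_head(const uint64_t *seq, unsigned int nblocks,
                          unsigned int start)
{
    unsigned int head = start;
    unsigned int blk = start;

    assert(nblocks > 0 && start < nblocks);
    assert(seq[start] != NO_HEADER);

    for (;;) {
        if (++blk == nblocks)
            blk = 0;           /* the journal is circular */
        if (blk == start)
            break;             /* full lap: stop (guard added for this model) */
        if (seq[blk] == NO_HEADER)
            continue;          /* not a log header, skip it */
        if (seq[blk] == seq[head])
            return -1;         /* duplicate sequence number */
        if (seq[blk] < seq[head])
            break;             /* sequence dropped: current head is the newest */
        head = blk;
    }
    return (long)head;
}
```

For example, with sequence numbers {5, 6, 7, 2, 3, 4} and start 0 the walk stops at block 2 (sequence 7); with {5, 6, 6, 2, 3, 4} it reaches the duplicate 6 and returns -1, matching the mount failure above.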




The gfs2_fsck build from BUG 496330 was used to try to fix the problem, but the problem remains after two runs.

Version-Release number of selected component (if applicable):

gfs2-utils-0.1.53-1.el5_3.3.x86_64.rpm

How reproducible:

Every time

Steps to Reproduce:
1. Run gfs2_fsck -yv on the block device.
2. Try to mount the file system.
3. The error above is logged and the mount command hangs.
  
Actual results:

gfs2_fsck does not fix the file system; mounting it still produces the error above.

Expected results:

gfs2_fsck fixes the error.

Comment 2 Eduardo Damato 2009-05-01 16:24:35 UTC
This bug is related to the bug below:

https://bugzilla.redhat.com/show_bug.cgi?id=457557

ATTENTION:

BUG 457557 is for gfs2-kmod - gfs2.ko
BUG 498646 is for gfs2-utils - gfs2_fsck

BUG 457557 covers the kernel module bug that can introduce the problem; the present bug covers making gfs2_fsck repair it.

Comment 3 Robert Peterson 2009-05-01 16:29:36 UTC
Adding Theophanis to the cc list to keep him informed.

Comment 5 Eduardo Damato 2009-05-01 17:22:17 UTC
With `gfs2_fsck -y', journal recovery for jid=0 fails:

Initializing fsck
Initializing lists...
Recovering journals (this may take a while)jid=0: Looking at journal...
jid=0: Failed
jid=1: Looking at journal...
jid=1: Journal is clean.
jid=2: Looking at journal...
jid=2: Journal is clean.

Journal recovery complete.
...
...

Comment 7 Robert Peterson 2009-05-01 18:18:20 UTC
Created attachment 342138 [details]
Proposed patch

This patch fixes the journal resequencing problem.  It probably
still needs more testing, but it seems to work properly.
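
The patch description only says that it fixes "the journal resequencing problem". As a hypothetical illustration of what resequencing could mean, and explicitly not the attached patch, the C sketch below models the journal as an array of log-header sequence numbers and makes them strictly increasing again, starting from the oldest header. The array model, NO_HEADER, and resequence() are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: seq[i] is the sequence number of the log header in
 * journal block i, or NO_HEADER (0) if the block holds no log header. */
#define NO_HEADER 0ULL

/* Hypothetical resequencing pass (NOT the attached patch): starting from the
 * oldest header, walk the circular journal once and raise any sequence number
 * that is not strictly greater than its predecessor. Returns the number of
 * headers changed. */
static unsigned int resequence(uint64_t *seq, unsigned int nblocks)
{
    unsigned int oldest = nblocks;   /* nblocks means "none found yet" */
    unsigned int i, changed = 0;
    uint64_t prev = 0;

    assert(seq != NULL || nblocks == 0);

    /* Find the header with the lowest sequence number. */
    for (i = 0; i < nblocks; i++)
        if (seq[i] != NO_HEADER &&
            (oldest == nblocks || seq[i] < seq[oldest]))
            oldest = i;
    if (oldest == nblocks)
        return 0;                    /* journal holds no log headers */

    /* Walk once around the journal, oldest header first. */
    for (i = 0; i < nblocks; i++) {
        unsigned int blk = (oldest + i) % nblocks;

        if (seq[blk] == NO_HEADER)
            continue;
        if (seq[blk] <= prev) {      /* duplicate or out of order */
            seq[blk] = prev + 1;
            changed++;
        }
        prev = seq[blk];
    }
    return changed;
}
```

With {5, 6, 6, 2, 3, 4} the pass rewrites the second 6 to 7 and leaves the other headers alone. Trusting the oldest header keeps the sketch short; a stale header in the middle of the journal would cause every later header to be renumbered, which a real repair tool would need to handle more carefully.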

Comment 8 Robert Peterson 2009-05-01 21:45:32 UTC
Since I see a "RHEL 5.4 Development ends" date of Monday, I decided
to push this fix to the master branch of the gfs2-utils git tree
and the STABLE3, STABLE2 and RHEL5 branches of the cluster git tree
for inclusion into 5.4.  It was tested on roth-01 using the
customer's failing metadata and fixed the sequence numbers.
I'm changing the status to Modified as well.  If the customer still
has issues with the code, please feel free to change the status again.

Comment 13 Chris Ward 2009-07-03 18:43:23 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 15 errata-xmlrpc 2009-09-02 11:01:57 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1337.html

