Red Hat Bugzilla – Bug 169087
still a small window where another node can mount during a gfs_fsck
Last modified: 2010-01-11 22:07:37 EST
Description of problem:
For the most part, the "mount during an in-progress gfs_fsck" issue was fixed
by the checkin for bz 160108. The lock_proto gets set to "fsck_gulm" (or
"fsck_dlm") just before pass1 starts, well after gfs_fsck initialization and
after the journals have been cleared. Even when this is set on the node doing
the fsck, and a read of the superblock on that machine verifies that the
lock_proto has indeed changed, there is still a 5-15 second window during
which the other nodes still see the old lock_proto, which allows mounts to
occur. This may be a cache issue on the fsck node.
Steps to Reproduce:
1. Start gfs_fsck on a node.
2. Verify on that node that the lock_proto in the SB has changed.
3. Check the lock_proto on another node; it will not have changed yet.
4. Wait 5-15 seconds, then verify on that other node that the lock_proto in
the SB has changed.
There are 2 parts to the problem; both are fixed and the code is checked in to
RHEL4 and STABLE.
1 - When gfs_fsck starts, it sets lock_proto to fsck_gulm (or fsck_dlm), but
only after the initialization phase, which itself takes a good 10 seconds (on
my setup) and leaves that whole window open to mounts.
FIX: Split fill_super_block() into two functions, read_super_block() and
fill_super_block(). block_mounters() is called between the two, so the ~10
second delay disappears. (See the sketch below.)
2 - When gfs_fsck modifies the lock_proto to fsck_gulm (or fsck_dlm) in
block_mounters(), it doesn't fsync() the change to the superblock on disk.
This was causing the other nodes to keep using the old lock_proto value
(lock_gulm, lock_dlm), thereby allowing gfs mounts.
FIX: Added an fsync() so that all nodes see the change immediately.
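A minimal sketch of what the fsync() fix amounts to, assuming hypothetical
names (fsck_sb, write_sb()) standing in for the real gfs_fsck internals:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define GFS_LOCKNAME_LEN 64          /* size of the on-disk proto field */

    /* Minimal stand-ins for the real gfs_fsck context; names are assumed. */
    struct fsck_sb {
        int  fd;                          /* open fd on the shared device   */
        char lockproto[GFS_LOCKNAME_LEN]; /* in-memory copy of sb_lockproto */
    };

    extern int write_sb(struct fsck_sb *sbp);  /* rewrites the SB on disk */

    int block_mounters(struct fsck_sb *sbp)
    {
        char new_proto[GFS_LOCKNAME_LEN];

        /* lock_gulm -> fsck_gulm, lock_dlm -> fsck_dlm */
        snprintf(new_proto, sizeof(new_proto), "fsck_%s",
                 sbp->lockproto + strlen("lock_"));
        memcpy(sbp->lockproto, new_proto, sizeof(sbp->lockproto));

        if (write_sb(sbp))
            return -1;

        /* The fix itself: force the rewritten superblock out of the page
         * cache and onto the shared disk, so other nodes read the new
         * lock_proto right away instead of 5-15 seconds later. */
        if (fsync(sbp->fd)) {
            perror("fsync");
            return -1;
        }
        return 0;
    }

Without the fsync(), the write only updates the fsck node's page cache, and
nothing guarantees when it reaches the shared disk that the other nodes read.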
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.