Bug 169087 - still a small window where another node can mount during a gfs_fsck
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Abhijith Das
QA Contact: GFS Bugs
Reported: 2005-09-22 17:10 EDT by Corey Marthaler
Modified: 2010-01-11 22:07 EST

Fixed In Version: RHBA-2006-0233
Doc Type: Bug Fix
Last Closed: 2006-03-09 14:43:11 EST

Attachments: None
Description Corey Marthaler 2005-09-22 17:10:32 EDT
Description of problem:
For the most part, the "mount during an in-progress gfs_fsck" issue was fixed
by the checkin for bz 160108. The lock_proto gets set to "fsck_gulm" (or
"fsck_dlm") just before pass1 starts, well after the gfs_fsck initialization
and after the journals have been cleared. But even once this is set on the
node doing the fsck, and a read of the superblock on that machine verifies
that the lock_proto has indeed changed, there is still a 5-15 second window
during which the other nodes still see the old lock_proto, which allows
mounts to occur. This may be a cache issue on the fsck node.
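
To make the race concrete, here is a minimal user-space sketch of the kind of
superblock check a mounting node performs. The function name, superblock
offset, and field offset are illustrative assumptions, not the actual
gfs-kernel code; the point is that a read satisfied from a stale cached copy
of the superblock still returns the old proto and lets the mount proceed.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define GFS_SB_OFFSET (128 * 512)  /* assumption: GFS1 superblock at 64 KiB */
#define LOCKPROTO_OFF 96           /* assumption: offset of sb_lockproto */
#define LOCKNAME_LEN  64

/* Return 0 if the on-disk lock_proto still matches what mount expects. */
static int mount_proto_check(const char *dev, const char *expected)
{
    char proto[LOCKNAME_LEN] = "";
    int fd = open(dev, O_RDONLY);

    if (fd < 0)
        return -1;
    /* If this read is satisfied from a stale cached copy, it can still
     * return "lock_dlm" for several seconds after the fsck node wrote
     * "fsck_dlm", so the mount is wrongly allowed. */
    pread(fd, proto, sizeof(proto), GFS_SB_OFFSET + LOCKPROTO_OFF);
    close(fd);
    return strncmp(proto, expected, LOCKNAME_LEN) == 0 ? 0 : -1;
}

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;
    if (mount_proto_check(argv[1], "lock_dlm") == 0)
        printf("proto matches: mount would proceed\n");
    else
        printf("proto changed (or read failed): mount blocked\n");
    return 0;
}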

How reproducible:

Steps to Reproduce:
1. start gfs_fsck on a node
2. verify on that node that the lock_proto in the SB has changed
3. check the lock_proto on another node; it will not have changed yet
4. wait 5-15 seconds, then verify on the other node that the lock_proto in
the SB has changed
Comment 1 Abhijith Das 2005-12-20 16:27:23 EST
Two parts to the problem; fixed both parts and checked the code in to RHEL4,
STABLE, and HEAD.
1 - When gfs_fsck starts, it sets lock_proto to fsck_gulm (or fsck_dlm). This
happens after the initialization phase, which itself takes a good 10 seconds
(on my setup).
FIX: Split the function fill_super_block() into two, read_super_block() and
fill_super_block(). block_mounters() is called between these two functions, so
the ~10 second delay disappears.
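
A sketch of the reordered startup: read_super_block(), block_mounters(), and
fill_super_block() are the names from this comment, but the bodies here are
illustrative stubs rather than the actual gfs_fsck sources.

#include <stdio.h>

struct fsck_sb { int dummy; };  /* stand-in for the real superblock struct */

/* Fast part: just read the superblock off the disk. */
static int read_super_block(struct fsck_sb *sbp) { (void)sbp; return 0; }

/* Rewrite sb_lockproto to fsck_gulm/fsck_dlm so other nodes cannot mount. */
static int block_mounters(struct fsck_sb *sbp) { (void)sbp; return 0; }

/* Slow part (~10 s): journals and the rest of initialization. */
static int fill_super_block(struct fsck_sb *sbp) { (void)sbp; return 0; }

int main(void)
{
    struct fsck_sb sb;

    /* Old order: one big fill_super_block() ran the slow initialization
     * first and only rewrote lock_proto afterwards, leaving a ~10 s window
     * in which other nodes could still mount.
     * New order: block mounters as soon as the superblock is readable. */
    if (read_super_block(&sb) || block_mounters(&sb) || fill_super_block(&sb)) {
        fprintf(stderr, "gfs_fsck startup failed\n");
        return 1;
    }
    return 0;
}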

2 - When gfs_fsck modifies the lock_proto to fsck_gulm (or fsck_dlm) in
block_mounters(), it doesn't fsync() the change to the superblock on disk.
This was causing the other nodes to keep using the old value of lock_proto
(lock_gulm or lock_dlm), thereby allowing gfs mounts.
FIX: added an fsync() so that all nodes see the change immediately.
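
A minimal sketch of this second fix, reusing the same assumed offsets for the
GFS1 superblock and its sb_lockproto field as above (hypothetical, for
illustration only); the essential line is the fsync() after the write.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define GFS_SB_OFFSET (128 * 512)  /* assumption: GFS1 superblock at 64 KiB */
#define LOCKPROTO_OFF 96           /* assumption: offset of sb_lockproto */
#define LOCKNAME_LEN  64

static int set_lock_proto(int fd, const char *proto)
{
    char buf[LOCKNAME_LEN] = "";

    strncpy(buf, proto, LOCKNAME_LEN - 1);
    if (pwrite(fd, buf, sizeof(buf), GFS_SB_OFFSET + LOCKPROTO_OFF)
        != (ssize_t)sizeof(buf))
        return -1;
    /* The fix: without this fsync(), the new proto could sit in the fsck
     * node's cache for 5-15 seconds while other nodes still read the old
     * value and were allowed to mount. */
    return fsync(fd);
}

int main(int argc, char **argv)
{
    int fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <device>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDWR);
    if (fd < 0 || set_lock_proto(fd, "fsck_dlm") < 0) {
        perror("set_lock_proto");
        return 1;
    }
    close(fd);
    return 0;
}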
Comment 4 Red Hat Bugzilla 2006-03-09 14:43:11 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

