Description of problem:
There is a corner case not handled by SM that can allow new gfs mounts to proceed before fencing has completed for a failed node that had the fs mounted. This can lead to fs corruption.

- Nodes A, B, C are cluster members and all have joined the fence domain.
- A has gfs mounted; B and C do not.
- A fails.
- B and C begin fencing A. This can take some time, which is especially noticeable when using fence_manual.
- B and/or C mount gfs.
- The new mount by B/C is allowed to go ahead before the fencing of A has completed.
- If A is still writing to gfs and fencing has not completed before B/C do initial gfs recovery, the fs can be corrupted.
Fixed in STABLE and RHEL4 branches: SM should wait for all recoveries to complete before it processes any group joins/leaves. Fixes bz 162014.
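The ordering rule described in the fix can be illustrated with a small toy model. This is only a sketch of the idea (defer group joins/leaves while any recovery is outstanding), not the actual SM code; the class `ServiceManager` and all of its method and attribute names are hypothetical and chosen for illustration.

```python
from collections import deque


class ServiceManager:
    """Toy model (hypothetical): joins/leaves are deferred while any recovery is pending."""

    def __init__(self):
        self.pending_recoveries = set()  # failed nodes whose fencing/recovery is not finished
        self.queued_events = deque()     # (kind, node, group) requests waiting to be processed
        self.members = {}                # group name -> set of member nodes

    def node_failed(self, node):
        # Recovery (which includes fencing) for this node starts now.
        self.pending_recoveries.add(node)

    def recovery_done(self, node):
        # Fencing and recovery for the failed node have completed:
        # drop it from every group, then release any deferred requests.
        self.pending_recoveries.discard(node)
        for nodes in self.members.values():
            nodes.discard(node)
        self._process_queue()

    def request(self, kind, node, group):
        # kind is "join" or "leave"; the request is queued and only
        # processed once no recovery is outstanding.
        if kind not in ("join", "leave"):
            raise ValueError("kind must be 'join' or 'leave'")
        self.queued_events.append((kind, node, group))
        self._process_queue()

    def _process_queue(self):
        # Hold every request back until all recoveries have completed.
        while self.queued_events and not self.pending_recoveries:
            kind, node, group = self.queued_events.popleft()
            nodes = self.members.setdefault(group, set())
            if kind == "join":
                nodes.add(node)
            else:
                nodes.discard(node)
```

In terms of the scenario in the description: after `node_failed("A")`, a `request("join", "B", "gfs")` stays queued, and B only becomes a member of the group once `recovery_done("A")` is called, so B's mount cannot overlap with writes from an unfenced A.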
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-734.html