Bug 162422 - Recovery problem when the gulm master node is fenced
Status: CLOSED ERRATA
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gulm
Version: 3
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: michael conrad tadpol tilstra
QA Contact: Cluster QE
URL: https://www.redhat.com/archives/linux...
Reported: 2005-07-04 08:28 EDT by Alban Crequy
Modified: 2009-04-16 16:25 EDT

Fixed In Version: RHBA-2005-723
Doc Type: Bug Fix
Last Closed: 2005-10-10 11:26:07 EDT


Attachments: None
Description Alban Crequy 2005-07-04 08:28:45 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.7.6) Gecko/20050322

Description of problem:
When the gulm master node also mounts a GFS filesystem, the fencing process does not run properly if the gulm master node has to be fenced.

I may lose data because the recovery process begins too early (before fencing has finished).
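
To make the expected ordering concrete, here is a minimal C sketch of the idea. All functions here are illustrative stubs, not the actual gulm API:

    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Stub placeholders for illustration only; not real gulm functions. */
    static bool fence_completed(const char *node)
    {
        (void)node;
        return true;    /* pretend the fence (or fence_ack_manual) finished */
    }

    static void release_locks_of(const char *node)
    {
        printf("releasing locks held by %s\n", node);
    }

    static void replay_journal_of(const char *node)
    {
        printf("replaying journal of %s\n", node);
    }

    /* The ordering the report asks for: recovery of a failed node must not
     * begin until fencing is confirmed, because a node that is merely
     * unreachable may still be writing to the filesystem while another
     * node replays its journal. */
    static void recover_failed_node(const char *node)
    {
        while (!fence_completed(node))
            sleep(1);               /* block until the fence is acknowledged */
        release_locks_of(node);     /* only now is it safe to release locks */
        replay_journal_of(node);
    }

    int main(void)
    {
        recover_failed_node("node4");
        return 0;
    }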


Version-Release number of selected component (if applicable):
GFS-6.0.2.20-2

How reproducible:
Always

Steps to Reproduce:
1. Get an 8-node cluster.
2. Choose 5 nodes as gulm servers.
3. Mount a GFS filesystem on your 8 nodes.
4. Unplug the network of the current gulm master.
5. Wait until another gulm server becomes the master.
6. Do NOT run fence_ack_manual and check whether the locks of the unplugged node are released.

Actual Results:  1. The locks are released immediately when another gulm server becomes the master.
2. The journal is also recovered by another node immediately.

Expected Results:  The recovery process should wait until the user runs fence_ack_manual.

Additional info:

More explanations here:
https://www.redhat.com/archives/linux-cluster/2005-July/msg00000.html

I filed this bugzilla as requested here:
https://www.redhat.com/archives/linux-cluster/2005-July/msg00006.html
Comment 1 michael conrad tadpol tilstra 2005-07-05 09:40:33 EDT
Have you tried this with only three nodes as gulm servers?  How does it behave
then?
Comment 2 michael conrad tadpol tilstra 2005-07-05 10:01:38 EDT
Just checked with three nodes; the bug is there too.
Comment 3 michael conrad tadpol tilstra 2005-07-05 11:08:36 EDT
check_for_stale_expires() is tripping on everyone.  It only runs if a jid
mapping is marked 1 (live mappings are marked 2).  The only time a jid mapping
is marked 1 is when a node other than the owner is replaying the journal.  Why
are the live mappings getting switched from 2 to 1?  I don't know, but I bet
that's the bug right there.  I'll look deeper.
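
A minimal sketch of the state logic this comment describes; the struct layout and names are assumptions, only the state values 1 and 2 come from the comment:

    #include <stdio.h>

    /* Assumed encoding of the jid-mapping states described above. */
    #define JIDMAP_REPLAY 1   /* journal being replayed by a non-owner node */
    #define JIDMAP_LIVE   2   /* live mapping: journal in use by its owner */

    struct jid_map {
        int  state;           /* JIDMAP_REPLAY or JIDMAP_LIVE */
        char owner[64];       /* node that owns this journal */
    };

    /* check_for_stale_expires() is only supposed to act on mappings in the
     * replay state.  The bug: live mappings (2) were getting flipped to 1,
     * so the check tripped on every node. */
    static int needs_stale_check(const struct jid_map *m)
    {
        return m->state == JIDMAP_REPLAY;
    }

    int main(void)
    {
        struct jid_map live = { JIDMAP_LIVE, "node1" };
        printf("stale check needed: %d\n", needs_stale_check(&live));
        return 0;
    }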
Comment 4 michael conrad tadpol tilstra 2005-07-05 14:00:22 EDT
Fixing the issue described in comment #3 didn't fix the bug.  Digging more.
Comment 5 michael conrad tadpol tilstra 2005-07-05 14:05:17 EDT
The bug only appears when the master lock server is also mounting GFS.
So a workaround is to put the lock servers onto dedicated nodes.
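
For illustration, in the GFS 6.0-era CCS configuration the gulm lock servers are listed in cluster.ccs, so a dedicated layout would name only nodes that do not mount GFS there.  A sketch, with the cluster and node names as placeholders:

    cluster {
        name = "alpha"
        lock_gulm {
            servers = ["ls01", "ls02", "ls03"]
        }
    }

Here ls01 through ls03 would be dedicated lock-server nodes, separate from the eight GFS-mounting nodes in the reproduction steps above.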
Comment 7 michael conrad tadpol tilstra 2005-07-19 11:21:13 EDT
There was a kludge that tried to fix something, but I cannot find or figure out
what it was supposed to fix.  That kludge was causing this.  Betting on this
being a bigger problem than whatever it was trying to fix, and removing the
kludge.


I think what it tried to fix was some weird edge case where multiple clients
and lock servers failed in some way.
Comment 9 Red Hat Bugzilla 2005-09-30 10:56:29 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-723.html
Comment 10 Red Hat Bugzilla 2005-10-07 12:43:08 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-733.html
Comment 11 Red Hat Bugzilla 2005-10-10 11:26:07 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-723.html
