Bug 129879 - gulm_tool on client reports Master node when there is no master
Summary: gulm_tool on client reports Master node when there is no master
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: gfs
Version: 3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: michael conrad tadpol tilstra
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks: 137219
 
Reported: 2004-08-13 18:58 UTC by Adam "mantis" Manthei
Modified: 2010-01-12 02:56 UTC (History)
CC List: 0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-05-25 16:41:09 UTC
Embargoed:


Attachments: none


Links:
Red Hat Product Errata RHBA-2005:466 (normal, SHIPPED_LIVE): GFS bug fix update, last updated 2005-05-25 04:00:00 UTC

Description Adam "mantis" Manthei 2004-08-13 18:58:45 UTC
Description of problem:
gulm_tool getstats on a client will report the existence of a Master
if the client was logged into the Master when that node dropped into
Arbitrating.  Newly joining clients will not have this bit of
information.

Version-Release number of selected component (if applicable):
GFS-6.0.0-1.2

How reproducible:
Every time

Steps to Reproduce:
1. make lock_gulmd quorate on the nodes in the servers list
2. start lock_gulmd on a client
3. shut down lock_gulmd on the slave nodes until the Master moves to
Arbitrating
4. run "gulm_tool getstats client1" and you will see an entry for the
Master
5. start lock_gulmd on a new client and run "gulm_tool getstats
client2"; there will be no entry for Master (a shell sketch of these
steps follows)
  
Actual results:
[root@trin-01 root]# gulm_tool getstats trin-04
I_am = Client
Master = trin-01.lab.msp.redhat.com
rank = -1
GenerationID = 1092267400182171
run time = 14710
pid = 2505
verbosity = Default,Network2,Locking,Subscribers,LoginLoops,ServerState
failover = enabled

[root@trin-01 root]# gulm_tool getstats trin-05
I_am = Client
quorum_has = 1
quorum_needs = 2
rank = -1
GenerationID = 0
run time = 712
pid = 2500
verbosity = Default,Network2,Locking,Subscribers,LoginLoops,ServerState
failover = enabled


Expected results:
I would expect the fields quorum_has, quorum_needs, and Master to be
the same on all the clients.
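A quick side-by-side check of the fields in question (the node names
are the ones from the output above; substitute your own):

  for n in trin-04 trin-05; do
      echo "== $n =="
      gulm_tool getstats $n | egrep '^(Master|quorum_has|quorum_needs)'
  done

On an affected cluster the long-logged-in client prints a stale Master
line and no quorum counts, while the fresh client prints quorum counts
and no Master line.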


Additional info:
Perhaps this is another set of fields worth munging for the getstats
output (rawstats could stay as it is)?
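Purely to illustrate the munging suggestion (this is a hypothetical
wrapper script, not gulm_tool behavior), a client-side filter could
hide the possibly-stale Master line until the fields are made
consistent, leaving the rawstats output untouched:

  #!/bin/sh
  # hypothetical filter: drop the locally-cached Master entry from a
  # client's getstats output; pass the node name as $1
  gulm_tool getstats "$1" | grep -v '^Master = '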

Comment 1 michael conrad tadpol tilstra 2004-09-23 15:08:38 UTC
Currently gulm_tool getstats reports what a specific server on a
specific node thinks right now, which isn't what you were expecting.
It is perfectly possible for a server on a node to be thinking the
wrong things when you ask it.  This will get corrected in time (or
the node is really messed up and will get fenced).

I'm not convinced there is anything here to fix.

Comment 2 Adam "mantis" Manthei 2004-09-23 15:45:34 UTC
The node is heartbeating the Arbitrating (old Master) node.  It is not
really messed up and will not get fenced (unless you mean to imply
that all nodes logged into the cluster will eventually get fenced if
quorum is lost for whatever reason).

This will not get corrected by time.  The client will remain logged
into the old master (now arbitrating) node.  The only way this will be
corrected is for the Arbitrating node to become quorate again, making
it membership dependent, not time dependent.

If gulm_tool can't be wrong in this case, then the client itself must
be.  It thinks that there is quorum when in fact there is not.  I
thought that this issue was addressed by the arbitrating node dropping
all of its connections and forcing all clients and servers to
re-login when quorum was lost.  In such a case, the clients would see
that the master has lost quorum.

Am I missing something?

Comment 3 michael conrad tadpol tilstra 2004-09-23 16:14:55 UTC
Is the client really logged into the node it thinks is master (but
which is really arbitrating)?  That would be a bug.

The getstats output says who it thinks is the master.  That does not
mean the client is connected to that node; it is just who it thinks
is the master.  If need really be, I can munge the output of getstats.

Comment 4 michael conrad tadpol tilstra 2004-09-23 17:53:30 UTC
I am the louse.
I was thinking of something else entirely.  Pretty much just delete
everything I said in comment #3.


Comment 5 michael conrad tadpol tilstra 2004-09-23 19:33:00 UTC
Right, so I think I'm getting somewhere on this.
Somewhere back when, clients were changed to not get kicked when the
master dropped to arbit, because this created bogus fences.  That is
why the client is still logged in.  Of course, with all the shuffling
of cvs trees and bugzilla dbs, I cannot find the references to this.

Comment 6 michael conrad tadpol tilstra 2004-09-24 16:29:52 UTC
So, out on the end points, the clients don't know whether the main
servers are quorate or not.  That was not a problem for gfs, because
of the way the lock paths were designed: the lock tables, which live
on the main servers and thus do know the quorate state, would stop
lock traffic when quorum was lost.  So the end clients running gfs
worked as one would expect.  Now that we're trying to move things to
be less gfs specific, we cannot rely on this anymore.

There is a fix in head which, while minor, changes protocols and the
library interface.
There are two places in stable where this can be noticed.  First is as
in the initial bug report (which mostly just appears wrong, even
though everything is still working correctly).  Second, apps using
libgulm that are running out on client nodes won't get the correct
quorate state in this condition (is there anything on 6.0 using
libgulm?).

Comment 7 michael conrad tadpol tilstra 2004-12-01 22:08:51 UTC
The fix is in 6.0.* now too, except that the userspace libgulm
interface has not changed.

Comment 8 Jay Turner 2005-05-25 16:41:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-466.html


