Bug 129879
| Summary: | gulm_tool on client reports Master node when there is no master | | |
|---|---|---|---|
| Product: | [Retired] Red Hat Cluster Suite | Reporter: | Adam "mantis" Manthei <amanthei> |
| Component: | gfs | Assignee: | michael conrad tadpol tilstra <mtilstra> |
| Status: | CLOSED ERRATA | QA Contact: | GFS Bugs <gfs-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2005-05-25 16:41:09 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 137219 | | |
Description
Adam "mantis" Manthei
2004-08-13 18:58:45 UTC
Currently gulm_tool getstats reports what a specific server on a specific node thinks right now, which isn't what you were expecting, since it is perfectly possible for a server on a node to be thinking the wrong things when you ask it. This will get corrected in time (or the node is really messed up and will get fenced). I'm not convinced there is anything here to fix.

The node is heartbeating the Arbitrating (old Master) node. It is not really messed up and will not get fenced (unless you mean to imply that all nodes logged into the cluster will get fenced eventually if quorum is lost for whatever reason). This will not get corrected by time. The client will remain logged into the old master (now arbitrating) node. The only way this will be corrected is for the Arbitrating node to become quorate again, making it membership dependent, not time dependent. If gulm_tool can't be wrong in this case, then the client itself must be: it thinks that there is quorum when in fact there is not. I thought that this issue was addressed by the arbitrating node dropping all of its connections and forcing all clients and servers to re-login when quorum was lost. In such a case, the clients would see that the master has lost quorum. Am I missing something?

Is the client really logged into the node it thinks is master (but is really arbitrating)? That would be a bug. The getstats output says who it thinks is the master. That does not mean the client is connected to that node; it is just who it thinks is the master. If need really be, I can munge the output of getstats.

I am the louse. Was thinking of something else entirely. Pretty much just delete everything I said in comment #3.

Right. So I think I'm getting somewhere on this. Somewhere back when, clients were changed to not get kicked when the master dropped to arbit, because this created bogus fences. That is why the client is still logged in.
Of course, with all the shuffling of cvs trees and bugzilla dbs, I cannot find the references to this.

So, out on the end points of the clients: they don't know if the main servers are quorate or not. Not a problem for gfs, given the way the lock paths were designed. The lock tables, which are on the main servers and thus do know the quorate state, would stop lock traffic when quorum was lost. So the end clients with gfs worked as one would expect. Now that we're trying to move things to be less gfs specific, we cannot rely on this anymore.

There is a fix in head which, while minor, changes protocols and the library interface. There are two places in stable where this can be noticed. First is as in the initial bug report (which mostly just appears wrong, even though everything is still working correctly). Second, apps using libgulm that are running out on client nodes won't get the correct quorate state in this condition (is there anything on 6.0 using libgulm?).

Fix is in 6.0.* now too, except the userspace libgulm interface has not changed.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-466.html