Red Hat Bugzilla – Bug 130827
gulm_tool forceexpire can expire already expired nodes
Last modified: 2014-09-08 20:40:43 EDT
Description of problem:
gulm_tool forceexpire can expire already expired nodes. As a result, the
master lock_gulmd core will attempt a fence every time forceexpire is
called. This can bog the master server down with fencing actions.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. start lock_gulmd on a slave or client node
2. modify ccsd so that the fencing action never returns success
(agent = "false")
3. run `gulm_tool forceexpire $master $node` multiple times
4. run `ps` on the master node; there should now be multiple
fence_node processes all trying to fence node $node.
Created attachment 103043
adds node state check to Force_Node_Expire(char *name)
This patch verifies that the node being marked expired is in the logged-in
state. If the node is already expired, there isn't much point in calling
fence_node again, since fence_node is either still running or has already
failed. If the node is not in the logged-in state, 1008 (Bad state change)
is returned. The error code should probably differ from the not-logged-in
case (1007).
This looks ok. I could go either way on the error codes. If you think it
would be clearer with two different ones, let's do that.
This bugzilla is reported to have been fixed years ago.