Description of problem: This is related to bz 168698. There is still a case where the fenced init script hangs when attempting to stop the fence service.
I could not recreate this. In a 3 node cluster, I stopped cman on 2 of the nodes to get the cluster non-quorate. I was still able to stop fenced on the remaining node without any problem. Also note 'service cman stop' will fail if fenced is running. Marking this NEEDINFO until we figure out how to recreate this problem.
I have also been unable to recreate this, though I know it was quite reproducible at one time. You can either close this as WORKSFORME and I'll reopen it if/when I see it again, or you can add the -t option and mark it closed that way.
Finally reproduced it.

1. On a four node cluster, start ccsd on all four nodes.
2. Start cman on only 3 nodes (enough to get quorum).
3. Start fenced on only 1 node.
4. Stop cman on the two nodes that don't have fenced started.
5. Attempt to stop fenced... HANG

[root@link-08 ~]# fence_tool join
[root@link-08 ~]# cman_tool services
Service          Name          GID LID State     Code
Fence Domain:    "default"       2   4 join      S-6,20,1
[1]

[root@link-08 ~]# fence_tool leave
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
[this will be stuck until quorum returns]

If there was a working "-t" flag, then the fence_tool leave would eventually time out.
I have also reproduced it. Note that using 'service cman stop' will do a leave/remove and adjust quorum accordingly, thus the bug was not exposed. My bad.
Fixed. Added a timeout option to the fence_tool leave command (previously it was only available for fence_tool join) and made the appropriate changes to the init script. The default stop timeout is 120 seconds (set in the init script), the same as the default timeout for startup. Note that the fenced init script contains two separate, configurable variables: FENCED_START_TIMEOUT and FENCED_STOP_TIMEOUT.
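For anyone reading along, here is a rough sketch of how the stop path could use the new timeout. This is not the shipped init script. It assumes that "fence_tool leave" accepts "-t <seconds>" the same way "fence_tool join" does. The function name stop_fenced and the FENCE_TOOL override are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch only: NOT the actual fenced init script.
# Assumption: "fence_tool leave" takes "-t <seconds>" like "fence_tool join".

# Stop timeout in seconds; 120 is the default mentioned in this comment.
FENCED_STOP_TIMEOUT=${FENCED_STOP_TIMEOUT:-120}

# stop_fenced is a hypothetical helper name.
# FENCE_TOOL is a hypothetical override so the function can be
# exercised on a machine without a cluster installed.
stop_fenced() {
    # Leave the fence domain, giving up after the timeout instead of
    # waiting indefinitely for cluster quorum. The exit status of
    # fence_tool is passed back to the caller.
    "${FENCE_TOOL:-fence_tool}" leave -t "$FENCED_STOP_TIMEOUT"
}
```

With something like this in place, stopping fenced on a non-quorate node should give up after FENCED_STOP_TIMEOUT seconds rather than printing "waiting for cluster quorum" forever.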
Marking verified.
This has been fixed in the current (4.7) release.