Description of problem:
Similar to bug #132989, but for fence_tool. I'm in a state where a fence operation is failing, so the fence domain service is in the "recover" state. Attempting a "fence_tool leave" in this state is not going to work (and doesn't, which is fine). However, the tool returns zero and reports no error.

[root@tng3-1 cluster]# cat /proc/cluster/services
Service          Name          GID LID State     Code
Fence Domain:    "default"       1   2 recover   2 -
[1 4 2 5]

[root@tng3-1 cluster]# fence_tool leave
[root@tng3-1 cluster]# echo $?
0
[root@tng3-1 cluster]# cat /proc/cluster/services
Service          Name          GID LID State     Code
Fence Domain:    "default"       1   2 recover   2 -
[1 4 2 5]

[root@tng3-1 cluster]#

Version-Release number of selected component (if applicable):
[root@tng3-1 cluster]# fence_tool -V
fence_tool DEVEL.1095793252 (built Sep 21 2004 14:03:23)
Copyright (C) Red Hat, Inc. 2004 All rights reserved.

How reproducible:
Yes.

Steps to Reproduce:
1. Put the fence domain service in 'recover' state
2. Run 'fence_tool leave'

Actual results:
Return code 0, no error reported.

Expected results:
Non-0 return code. Error reported.

Additional info:
fence_tool leave just does kill(pid, SIGTERM). I can't think of a nice way offhand to check whether fenced exits or not. I believe it /will/ exit in response to the delivered signal once it's done with recovery. In that sense, the leave is successful even if fenced doesn't exit immediately. Will think about this a bit more; any ideas are welcome.
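One possible way to check whether the daemon exits, sketched in Python rather than in fence_tool's own code: send SIGTERM, then poll with signal 0 until the process is gone or a deadline passes, and return non-zero on timeout. The function name leave_and_wait, its parameters, and the timeout values are all hypothetical; the send parameter exists only so the sketch can be exercised without a real daemon. This is not how fence_tool is implemented.

```python
import os
import signal
import time


def leave_and_wait(pid, timeout=10.0, interval=0.5, send=os.kill):
    """Send SIGTERM to pid, then poll until it is gone or the timeout expires.

    Returns 0 if the process disappeared, 1 if it was still present at the
    deadline or could not be signalled. (Hypothetical helper, not fence_tool.)
    """
    try:
        send(pid, signal.SIGTERM)
    except ProcessLookupError:
        return 0  # no such process: nothing left to stop
    except PermissionError:
        return 1  # not allowed to signal it, so report failure

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            send(pid, 0)  # signal 0 delivers nothing, it only checks existence
        except ProcessLookupError:
            return 0  # the process has exited
        except PermissionError:
            pass  # the process exists but belongs to another user
        time.sleep(interval)
    return 1  # still running, e.g. a daemon stuck in recovery
```

With something like this, a daemon that is still in recovery when the deadline passes would produce a non-zero exit status, which is what the report asks for. The usual caveats of polling by pid apply: the pid could be reused by another process, and a child of the caller that has exited but not been reaped still counts as existing.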
Same issue as ccs_update... Is there some sort of ping you could give the fence daemon, to which it responds by saying "alive" or "shutting down"? Perhaps this is over-engineering the problem...
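To make the ping idea concrete, here is a minimal sketch of the client side, assuming a purely hypothetical text protocol in which the daemon answers a "status" request with "alive" or "shutting down" over an already-connected socket. Nothing in this report says fenced offers such an interface; the function name, the request string, and the reply strings are invented for illustration.

```python
import socket


def query_daemon_state(conn, timeout=1.0):
    """Ask the daemon for its state over a connected socket.

    Returns "alive", "shutting down", "unknown" for any other reply, or
    "unreachable" if nothing comes back in time. (Hypothetical protocol.)
    """
    conn.settimeout(timeout)
    try:
        conn.sendall(b"status\n")
        reply = conn.recv(64)
    except OSError:  # covers timeouts and broken connections
        return "unreachable"
    if not reply:
        return "unreachable"  # the peer closed the connection without answering
    text = reply.decode("ascii", "replace").strip()
    if text in ("alive", "shutting down"):
        return text
    return "unknown"
```

A caller could then treat "shutting down" as a leave in progress and anything else as a failure, at the cost of adding a control channel to the daemon.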
Updating version to the right level in the defects. Sorry for the storm.