Red Hat Bugzilla – Bug 1459251
pcs should not guess expected status of a resource when --wait is used
Last modified: 2017-07-21 07:09:33 EDT
Description of problem:
When the --wait flag is used in pcs commands, pcs guesses what state a resource managed by the command should be in when the command finishes. At the end of the command, pcs checks the actual state of the resource. It returns 0 if the actual and expected states match, or 1 if they do not.
The issue is that the expected state of the resource is very hard to get right, which may lead to pcs exiting with a wrong return code.
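To make the reported behavior concrete, here is a minimal Python model of it. It is a sketch under stated assumptions, not pcs's actual code. The `Resource` class, `guess_expected_running` and `wait_exit_code_current` are hypothetical names. The guess is deliberately the simplest one ("a created resource should be running"). The model only shows how the exit code is derived from comparing a guessed state with the real one.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical, simplified view of a cluster resource."""
    name: str
    meta: dict = field(default_factory=dict)
    # Nodes the resource is actually running on once the cluster settles.
    running_on: list = field(default_factory=list)


def guess_expected_running(resource: Resource) -> bool:
    # Hypothetical naive guess: after "resource create ... --wait" the
    # resource is expected to be running somewhere.
    return True


def wait_exit_code_current(resource: Resource) -> int:
    """Model of the reported behavior: compare guessed and real state."""
    expected = guess_expected_running(resource)
    actual = bool(resource.running_on)
    if expected == actual:
        return 0
    if not actual:
        print(f"Error: resource '{resource.name}' is not running on any node")
    return 1


# The reproducer: pacemaker never starts an unmanaged resource, so the
# guess is wrong and the command is reported as failed.
test1 = Resource("test1", meta={"is-managed": "false"})
exit_code = wait_exit_code_current(test1)
```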
Version-Release number of selected component (if applicable):
How reproducible:
Always, easily (depending on the complexity of the cluster settings)
Steps to Reproduce:
# pcs resource create test1 ocf:pacemaker:Dummy meta is-managed=false --wait
Error: resource 'test1' is not running on any node
# echo $?
1
Actual results:
pcs exits with 1 because the resource did not start.
Expected results:
pcs exits with 0. The resource was not able to start (pacemaker does not start unmanaged resources), so the command succeeded.
With this particular reproducer, the issue may seem easy to fix in pcs: if the resource is not managed, we expect it not to be started. However, more complex setups are possible. The resource may not start due to constraints, utilization, cluster properties, and so on. Also, the problem is not limited to resource create: most of the commands supporting --wait are affected.
One way to deal with this is to make no assumptions about the expected resource state. --wait would cause pcs to run crm_resource --wait and print the resource status, but the pcs exit code would not depend on that status. Users would be able to use separate commands to find out the resources' status, as described in bz1290830 comment 3.
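A hypothetical Python sketch can show why special-casing the reproducer does not solve the general problem. The `Resource` class, `guess_expected_running` and `guess_is_wrong` are illustrative names and do not come from pcs. The guess handles `is-managed=false`. It still has no knowledge of constraints, utilization or cluster properties. A managed resource that stays stopped for one of those reasons is therefore still misjudged.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical, simplified view of a cluster resource."""
    name: str
    meta: dict = field(default_factory=dict)
    # Nodes the resource is actually running on once the cluster settles.
    running_on: list = field(default_factory=list)


def guess_expected_running(resource: Resource) -> bool:
    # Special case taken from the reproducer: pacemaker does not start
    # unmanaged resources.
    if resource.meta.get("is-managed") == "false":
        return False
    # Everything else is assumed to start. Constraints, utilization and
    # cluster properties are not modeled; covering all of them would mean
    # re-implementing pacemaker's own placement decisions.
    return True


def guess_is_wrong(resource: Resource) -> bool:
    """True if the guessed state differs from the real one."""
    return guess_expected_running(resource) != bool(resource.running_on)


# Handled by the special case: unmanaged and not running, as guessed.
unmanaged = Resource("test1", meta={"is-managed": "false"})
# Still misjudged: a managed resource kept stopped by, for example, a
# location constraint. This model cannot see the constraint.
constrained = Resource("test2")
```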
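The proposed semantics can be modeled in a few lines of Python. This is a hypothetical sketch, not a pcs implementation. `wait_exit_code_proposed` is an illustrative name. The `wait_for_cluster` callable stands in for running `crm_resource --wait`, and `print_status` stands in for printing the resource status. One point is an assumption beyond the proposal: a failed or timed-out wait is still reported with exit code 1. The proposal itself only says that the exit code would not depend on the resource status.

```python
from typing import Callable


def wait_exit_code_proposed(
    wait_for_cluster: Callable[[], bool],
    print_status: Callable[[], None],
) -> int:
    """Hypothetical model of the proposed --wait semantics.

    wait_for_cluster stands in for running `crm_resource --wait`; it is
    assumed to return True when the cluster settled in time.
    print_status stands in for printing the resource status.
    """
    settled = wait_for_cluster()
    # The status is shown to the user but never compared with a guess.
    print_status()
    # Assumption: only a failed or timed-out wait is reported as an error.
    return 0 if settled else 1


# The reproducer under the proposal: the cluster settles with test1
# stopped because it is unmanaged. The status is printed (placeholder
# text, not real pcs output) and the exit code is 0.
exit_code = wait_exit_code_proposed(
    wait_for_cluster=lambda: True,
    print_status=lambda: print("test1: stopped (unmanaged)"),
)
```

In this design, deciding whether the resource ended up where the user wanted is left to separate status commands. The exit code only reflects whether the command and the wait themselves succeeded.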