Bug 1156311

Summary: Need ability to start resource and wait until it finishes starting before returning (and show error information if it fails)
Product: Red Hat Enterprise Linux 7
Reporter: Chris Feist <cfeist>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: high
Docs Contact:
Priority: high
Version: 7.1
CC: cluster-maint, fdinitto, rsteiger, tojeline
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pcs-0.9.137-2.el7
Doc Type: Enhancement
Doc Text:
Feature: Add an optional '--wait' parameter to 'pcs resource' commands that makes pcs wait for resources to start or move and report the nodes on which each resource started or failed. Reason: Users need to know whether a resource started or failed without running 'pcs status' after 'pcs resource create'. Result: When '--wait' is used with a supported 'pcs resource' command, the user is told whether the resource started, moved, or failed, and on which nodes, and gets a failure explanation message if it failed.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-03-05 09:20:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description       Flags
proposed fix      none
proposed fix 2    none
proposed fix 3    none

Description Chris Feist 2014-10-24 07:23:16 UTC
Description of problem:
When you create a resource with pcs it returns immediately, so if there is some error in the resource configuration you don't know without checking status (and don't know why it fails to start).

The new option should be --wait=[n] to be consistent with resource enable/disable

If a wait time is not specified we use the default wait timeout from pacemaker.

If a resource fails we don't wait the full timeout, just immediately return.

We should also return the node that the resource failed to start on (or succeeded starting on).

Also, if a resource fails we want to get the error output from the resource agent.  There is some work with resource agents to provide this information to pacemaker, but we may need to parse /var/log/messages to try and find that output.  We may also need to use pcsd to get this information.
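
The waiting behaviour requested above (wait up to a timeout, but return immediately on failure and report the node) could be sketched roughly as follows. This is only an illustration under stated assumptions, not how pcs implements it: wait_for_resource, the probe callback, and the STARTED/FAILED/PENDING labels are hypothetical names invented for this example.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical state labels; pcs and pacemaker do not necessarily use these names.
STARTED, FAILED, PENDING = "started", "failed", "pending"


def wait_for_resource(
    probe: Callable[[], Tuple[str, Optional[str]]],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Optional[str]]:
    """Poll probe() until the resource starts, fails, or the timeout expires.

    probe() is a caller-supplied function returning (state, node).
    A FAILED state ends the wait at once, so the caller does not sit
    through the full timeout.
    """
    deadline = clock() + timeout
    while True:
        state, node = probe()
        if state in (STARTED, FAILED):
            return state, node
        if clock() >= deadline:
            # Timed out without a definite result.
            return PENDING, None
        sleep(interval)
```

The clock and sleep parameters are injectable only so the loop can be exercised without real delays.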

Comment 1 Tomas Jelinek 2014-11-06 14:02:44 UTC
Created attachment 954471 [details]
proposed fix

Test:

[root@rh70-node1:~]# pcs resource create apa1 apache --wait
Resource 'apa1' is running on node rh70-node1.
[root@rh70-node1:~]# echo $?
0
[root@rh70-node1:~]# time pcs resource create apa2 apache configfile=/root/missing --wait
Error: unable to start: 'apa2', please check logs for failure information
rh70-node2: Port number  is invalid!
rh70-node1: Port number  is invalid!

real    0m1.312s
user    0m0.190s
sys     0m0.071s
[root@rh70-node1:~]# echo $?
1
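
A script consuming the failure output above might split it into per-node messages. The sketch below assumes the "node: message" layout seen in this one test run; parse_node_failures is a hypothetical helper, not part of pcs, and the output format is not guaranteed to stay the same across versions.

```python
from typing import Dict


def parse_node_failures(output: str) -> Dict[str, str]:
    """Map node name to failure message for each 'node: message' line.

    Assumption: the summary line starts with 'Error:' and every other
    line containing ': ' is a per-node message, as in the test above.
    """
    failures: Dict[str, str] = {}
    for line in output.splitlines():
        if line.startswith("Error:") or ": " not in line:
            continue
        node, message = line.split(": ", 1)
        failures[node.strip()] = message.strip()
    return failures
```

Checking the exit status (0 on success, 1 on failure in the test above) remains the simpler way to detect that a start failed; parsing is only useful when the per-node text is needed.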

Comment 5 Tomas Jelinek 2014-11-25 16:29:08 UTC
Created attachment 961279 [details]
proposed fix 2

Comment 7 Tomas Jelinek 2014-11-26 15:51:02 UTC
Affected commands (guide for testing):

pcs resource create
- waits for the resource to be started
- if --clone is specified waits for all instances to be running (works with globally-unique, clone-max and clone-node-max meta attributes to get the number of instances)
- if --master is specified waits for the resource to be promoted (works with master-max and master-node-max meta attributes to get the number of promoted instances)
- reports failures and nodes on which the resource is running
- does not wait if -f, --disabled or meta target-role=Stopped is specified
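
As an illustration of how the number of clone instances to wait for might be derived from the meta attributes listed above, here is a minimal sketch. It is an assumption about the calculation, not the code from the attached patches: expected_clone_instances is a hypothetical name, and the rule that a non-globally-unique clone runs at most one instance per node is this example's reading of pacemaker's behaviour.

```python
from typing import Optional


def expected_clone_instances(
    node_count: int,
    clone_max: Optional[int] = None,
    clone_node_max: int = 1,
    globally_unique: bool = False,
) -> int:
    """Estimate how many clone instances should end up running."""
    # pacemaker's clone-max defaults to the number of nodes in the cluster.
    if clone_max is None:
        clone_max = node_count
    # Assumption: without globally-unique=true only one instance per node
    # can be active, so clone-node-max has no effect in that case.
    per_node = clone_node_max if globally_unique else 1
    return min(clone_max, node_count * per_node)
```

For example, on a two-node cluster with clone-max=4, clone-node-max=2 and globally-unique=true this yields 4 instances, while the same settings without globally-unique yield 2.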

pcs resource enable, pcs resource disable
- already supported the --wait option
- added: reports failures and nodes on which the resource is running

pcs resource move
- waits for the resource to be started on a target node (or a node different to the current one if the target node is not specified)
- reports failures and nodes on which the resource is running
- does not wait if -f is specified or the resource is not running

pcs resource ban
- waits for the resource to be started on a node different to a target node (or current node if the target node is not specified)
- reports failures and nodes on which the resource is running
- does not wait if -f is specified or the resource is not running
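
The completion conditions for move and ban described above can be expressed as two small predicates. This is a sketch of the stated rules only; move_done and ban_done are hypothetical names, and the set of nodes the resource currently runs on is assumed to be obtained elsewhere.

```python
from typing import Optional, Set


def move_done(running_on: Set[str], previous_node: str,
              target: Optional[str] = None) -> bool:
    """True once a moved resource runs where the move intended."""
    if target is not None:
        # A target node was given: the resource must be running there.
        return target in running_on
    # No target: any node other than the original one counts.
    return any(node != previous_node for node in running_on)


def ban_done(running_on: Set[str], banned_node: str) -> bool:
    """True once the resource runs on some node other than the banned one.

    banned_node is the target node, or the resource's current node when
    no target was specified.
    """
    return any(node != banned_node for node in running_on)
```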

pcs resource clear
- gets operations related to the specified resource using crm_simulate and waits for them to finish
- reports failures and nodes on which the resource is running
- does not wait if -f is specified

pcs resource meta, pcs resource clone, pcs resource master
- waits for the resource to be started/stopped when changing target-role
- waits for master/clone instances to be started/stopped/promoted when changing globally-unique, clone-max, clone-node-max, master-max, master-node-max options
- reports failures and nodes on which the resource is running
- does not wait if -f is specified

pcs resource unclone
- waits for the resource to be running as one instance
- does not wait if -f is specified or the resource is not running

pcs resource group add, pcs resource group remove, pcs resource ungroup
- gets operations related to the specified resource(s) using crm_simulate and waits for them to finish
- reports failures and nodes on which the resource is running
- does not wait if -f is specified

Comment 8 Tomas Jelinek 2014-12-08 15:53:07 UTC
pcs resource update
- waits for the resource to be started/stopped when changing target-role
- waits for master/clone instances to be started/stopped/promoted when changing globally-unique, clone-max, clone-node-max, master-max, master-node-max options
- reports failures and nodes on which the resource is running
- does not wait if -f is specified
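
The rule for when an update (or meta change) leads to waiting can be summarised as a simple check. The sketch below is one reading of the bullets above, under the assumption that changes to other attributes do not trigger a wait; should_wait and WAIT_TRIGGERS are hypothetical names, not identifiers from pcs.

```python
from typing import Iterable

# Meta attributes named in this comment as affecting what pcs waits for.
WAIT_TRIGGERS = frozenset({
    "target-role",
    "globally-unique",
    "clone-max",
    "clone-node-max",
    "master-max",
    "master-node-max",
})


def should_wait(changed_meta: Iterable[str], wait_requested: bool,
                uses_file: bool) -> bool:
    """Decide whether a change to meta attributes warrants waiting."""
    # Never wait without --wait, or when -f redirects the change to a file.
    if not wait_requested or uses_file:
        return False
    return any(name in WAIT_TRIGGERS for name in changed_meta)
```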

Comment 9 Tomas Jelinek 2014-12-08 16:37:41 UTC
Created attachment 965916 [details]
proposed fix 3

Comment 18 errata-xmlrpc 2015-03-05 09:20:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0415.html