Bug 1156311 - Need ability to start resource and wait until it finishes starting before returning (and show error information if it fails)
Summary: Need ability to start resource and wait until it finishes starting before returning (and show error information if it fails)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcs
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: Tomas Jelinek
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-10-24 07:23 UTC by Chris Feist
Modified: 2015-03-05 09:20 UTC
CC: 4 users

Fixed In Version: pcs-0.9.137-2.el7
Doc Type: Enhancement
Doc Text:
Feature: Add an optional '--wait' parameter to 'pcs resource' commands, making pcs wait for resources to start or move and report the nodes on which the resource started or failed.
Reason: Users need to know whether a resource started or failed without having to run 'pcs status' after 'pcs resource create'.
Result: When the '--wait' parameter is used with a supported 'pcs resource' command, the user is informed whether the resource started, moved, or failed, and on which nodes, and receives a failure explanation message in case of failure.
Clone Of:
Environment:
Last Closed: 2015-03-05 09:20:42 UTC
Target Upstream Version:
Embargoed:


Attachments
proposed fix (40.82 KB, patch)
2014-11-06 14:02 UTC, Tomas Jelinek
no flags Details | Diff
proposed fix 2 (79.30 KB, patch)
2014-11-25 16:29 UTC, Tomas Jelinek
no flags Details | Diff
proposed fix 3 (16.88 KB, patch)
2014-12-08 16:37 UTC, Tomas Jelinek
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1187571 0 medium CLOSED ungrouping a resource from a cloned group produces invalid CIB when other resources exist in that group 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1188571 0 high CLOSED The --wait functionality implementation needs an overhaul 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2015:0415 0 normal SHIPPED_LIVE pcs bug fix and enhancement update 2015-03-05 14:16:41 UTC

Internal Links: 1187571 1188571

Description Chris Feist 2014-10-24 07:23:16 UTC
Description of problem:
When you create a resource with pcs, the command returns immediately, so if there is an error in the resource configuration you don't find out without checking the status (and don't know why the resource fails to start).

The new option should be --wait=[n] to be consistent with resource enable/disable.

If a wait time is not specified, we use the default wait timeout from pacemaker.

If a resource fails, we don't wait for the full timeout; we return immediately.

We should also return the node that the resource failed to start on (or succeeded in starting on).

Also, if a resource fails, we want to get the error output from the resource agent. There is ongoing work in resource agents to provide this information to pacemaker, but we may need to parse /var/log/messages to find that output. We may also need to use pcsd to get this information.
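
The requested usage might look like the following sketch; the resource name 'webserver' and the timeout value are illustrative, and the commands assume a running Pacemaker cluster:

```shell
# Create a resource and wait up to 60 seconds for it to start.
# With no value (--wait), pacemaker's default timeout would apply.
pcs resource create webserver apache --wait=60

# A non-zero exit status would indicate the resource failed to start.
echo $?
```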

Comment 1 Tomas Jelinek 2014-11-06 14:02:44 UTC
Created attachment 954471 [details]
proposed fix

Test:

[root@rh70-node1:~]# pcs resource create apa1 apache --wait
Resource 'apa1' is running on node rh70-node1.
[root@rh70-node1:~]# echo $?
0
[root@rh70-node1:~]# time pcs resource create apa2 apache configfile=/root/missing --wait
Error: unable to start: 'apa2', please check logs for failure information
rh70-node2: Port number  is invalid!
rh70-node1: Port number  is invalid!

real    0m1.312s
user    0m0.190s
sys     0m0.071s
[root@rh70-node1:~]# echo $?
1

Comment 5 Tomas Jelinek 2014-11-25 16:29:08 UTC
Created attachment 961279 [details]
proposed fix 2

Comment 7 Tomas Jelinek 2014-11-26 15:51:02 UTC
Affected commands (guide for testing):

pcs resource create
- waits for the resource to be started
- if --clone is specified waits for all instances to be running (works with globally-unique, clone-max and clone-node-max meta attributes to get the number of instances)
- if --master is specified waits for the resource to be promoted (works with master-max and master-node-max meta attributes to get the number of promoted instances)
- reports failures and nodes on which the resource is running
- does not wait if -f, --disabled, or meta target-role=Stopped is specified
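
A sketch of the clone case described above (resource name, agent, and clone-max value are illustrative, and a running cluster is assumed):

```shell
# Create a cloned resource and wait for all instances to start;
# clone-max determines how many instances pcs waits for.
pcs resource create dummy ocf:pacemaker:Dummy --clone clone-max=2 --wait
```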

pcs resource enable, pcs resource disable
- already supported the --wait option
- added: reports failures and nodes on which the resource is running

pcs resource move
- waits for the resource to be started on the target node (or on a node different from the current one if the target node is not specified)
- reports failures and nodes on which the resource is running
- does not wait if -f is specified or the resource is not running
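
A sketch of the move case (resource and node names are illustrative, and a running cluster is assumed):

```shell
# Move a resource to a specific node and wait for it to start there.
pcs resource move webserver node2 --wait
```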

pcs resource ban
- waits for the resource to be started on a node different from the target node (or from the current node if the target node is not specified)
- reports failures and nodes on which the resource is running
- does not wait if -f is specified or the resource is not running

pcs resource clear
- gets operations related to the specified resource using crm_simulate and waits for them to finish
- reports failures and nodes on which the resource is running
- does not wait if -f is specified

pcs resource meta, pcs resource clone, pcs resource master
- waits for the resource to be started/stopped when changing target-role
- waits for master/clone instances to be started/stopped/promoted when changing globally-unique, clone-max, clone-node-max, master-max, master-node-max options
- reports failures and nodes on which the resource is running
- does not wait if -f is specified
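
A sketch of the target-role case (resource name is illustrative, and a running cluster is assumed):

```shell
# Stop a resource by setting target-role and wait until it has stopped.
pcs resource meta webserver target-role=Stopped --wait
```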

pcs resource unclone
- waits for the resource to be running as one instance
- does not wait if -f is specified or the resource is not running

pcs resource group add, pcs resource group remove, pcs resource ungroup
- gets operations related to the specified resource(s) using crm_simulate and waits for them to finish
- reports failures and nodes on which the resource is running
- does not wait if -f is specified

Comment 8 Tomas Jelinek 2014-12-08 15:53:07 UTC
pcs resource update
- waits for the resource to be started/stopped when changing target-role
- waits for master/clone instances to be started/stopped/promoted when changing globally-unique, clone-max, clone-node-max, master-max, master-node-max options
- reports failures and nodes on which the resource is running
- does not wait if -f is specified

Comment 9 Tomas Jelinek 2014-12-08 16:37:41 UTC
Created attachment 965916 [details]
proposed fix 3

Comment 18 errata-xmlrpc 2015-03-05 09:20:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0415.html

