Bug 1501274

Summary:

pcs resource restart <bundle> <node> is not working

Product:

Red Hat Enterprise Linux 7

Reporter:

Tomas Jelinek <tojeline>

Component:

pcs

Assignee:

Tomas Jelinek <tojeline>

Status:

CLOSED ERRATA

QA Contact:

Ofer Blaut <oblaut>

Severity:

high

Docs Contact:

Priority:

high

Version:

7.4

CC:

abeekhof, cfeist, cluster-maint, cluster-qe, idevat, kgaillot, michele, mkrcmari, omular, rsteiger, tojeline

Target Milestone:

Keywords:

EasyFix

Target Release:

7.5

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

pcs-0.9.161-1.el7

Doc Type:

Bug Fix

Doc Text:

Cause: The user tries to restart a bundle resource on one node only.
Consequence: Pcs exits with an error saying only clone and master/slave resources support restarting on a specified node.
Fix: Allow bundles to be restarted on a specified node.
Result: It is now possible to specify a node on which a bundle resource should be restarted.

Story Points:

---

Clone Of:

1489735

Environment:

Last Closed:

2018-04-10 15:40:54 UTC

Type:

Bug

Bug Depends On:

1489728, 1489735

Bug Blocks:

Attachments:

Description	Flags
proposed fix	none

Description Tomas Jelinek 2017-10-12 10:14:31 UTC

+++ This bug was initially created as a clone of Bug #1489735 +++

Description of problem:
This will not work until https://bugzilla.redhat.com/show_bug.cgi?id=1489728 is fixed. Once that is fixed (i.e. when crm_resource --restart <bundle> --node=<node> works), we need a small fix in pcs to allow restarting a bundle on a single node.

--- Additional comment from Tomas Jelinek on 2017-09-11 09:34:13 EDT ---

There is a chance this will actually work without any changes in pcs, as we do not care whether the specified resource is a bundle. We will see once the pacemaker part is ready.

--- Additional comment from Tomas Jelinek on 2017-10-04 11:44:51 EDT ---

I tried restarting bundles with pacemaker-1.1.16-12.el7_4.4 installed both on the cluster machines and in the containers. "crm_resource --restart <bundle>" seems to work fine. However, "crm_resource --restart <bundle> --node <node>" seems to restart the bundle on all nodes.

# crm_resource --restart --resource httpd-bundle

Set 'httpd-bundle' option: id=httpd-bundle-meta_attributes-target-role set=httpd-bundle-meta_attributes name=target-role=stopped
Waiting for 1 resources to stop:
 * httpd-bundle
Deleted 'httpd-bundle' option: id=httpd-bundle-meta_attributes-target-role name=target-role
Waiting for 1 resources to start again:
 * httpd-bundle

# crm_resource --restart --resource httpd-bundle --node rh74-node1

Set 'httpd-bundle' option: id=httpd-bundle-meta_attributes-target-role set=httpd-bundle-meta_attributes name=target-role=stopped
Waiting for 1 resources to stop:
 * httpd-bundle
Deleted 'httpd-bundle' option: id=httpd-bundle-meta_attributes-target-role name=target-role
Waiting for 1 resources to start again:
 * httpd-bundle

While this is happening, I can see the bundle resource stopping and starting on all nodes in both cases.


Ken, is the bundle restart supposed to work with the --node option? If the option is not going to be supported, nothing has to be done in pcs, as 'pcs resource restart httpd-bundle' already works. Otherwise, a small fix in pcs is needed to allow specifying a node when restarting a bundle.

--- Additional comment from Ken Gaillot on 2017-10-04 12:09:22 EDT ---

Grabbing this as a pacemaker bug because nothing is needed from pcs.

I think crm_resource --restart should treat bundles the same way as clones, i.e. ban the resource from the specified node rather than set target-role to Stopped. This should be a fairly simple change we can get into 7.5.

--- Additional comment from Tomas Jelinek on 2017-10-05 03:03:59 EDT ---

(In reply to Ken Gaillot from comment #4)
> Grabbing this as a pacemaker bug because nothing is needed from pcs.

That depends. Pcs currently does not allow specifying a node when restarting bundles:

# pcs resource restart httpd-bundle rh74-node1
Error: can only restart on a specific node for a clone or master/slave resource

As I said, a small fix in pcs is needed to allow specifying a node when restarting a bundle.


If a node is not specified, everything works with current pcs:

# pcs resource restart httpd-bundle
httpd-bundle successfully restarted
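
The check behind the error above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual pcs source: the function name validate_restart_on_node, the resource-type strings, and the allow_bundle switch are all assumptions made for the example. Only the error message text is taken from the output above, and the wording pcs uses after the fix is not shown in this report.

```python
# Hypothetical sketch; names and type strings are illustrative, not pcs internals.
NODE_RESTARTABLE_BEFORE_FIX = {"clone", "master"}
NODE_RESTARTABLE_AFTER_FIX = NODE_RESTARTABLE_BEFORE_FIX | {"bundle"}

# Message text copied from the pcs output quoted in this comment.
NODE_RESTART_ERROR = (
    "Error: can only restart on a specific node "
    "for a clone or master/slave resource"
)


def validate_restart_on_node(resource_type, node=None, allow_bundle=False):
    """Return an error message, or None if the restart request is acceptable."""
    if node is None:
        # Restarting without a node is accepted for any resource type.
        return None
    if allow_bundle:
        allowed = NODE_RESTARTABLE_AFTER_FIX
    else:
        allowed = NODE_RESTARTABLE_BEFORE_FIX
    if resource_type not in allowed:
        return NODE_RESTART_ERROR
    return None
```

With allow_bundle=False the sketch rejects a bundle plus a node, as current pcs does; with allow_bundle=True it accepts the same request, which is the behavior the small fix is meant to provide.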

--- Additional comment from Ken Gaillot on 2017-10-05 09:59:21 EDT ---

Ah, I missed that. We can clone this bz then, but let me make sure my idea works first.

--- Additional comment from Ken Gaillot on 2017-10-06 19:17:02 EDT ---

Fixed upstream as of commit d6eb1cb4

When restarting a bundle with multiple replicas on a single node, crm_resource will now do so by banning the bundle from that node (same as it would with a clone). Note that if there are any free nodes available to run the instance, the instance may begin to start there, but once crm_resource notices it stopped on the original node, it will remove the ban and the instance will likely move back to the original node (same as would happen with a clone with clone-max less than the number of available nodes).
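
The choice between the two restart strategies described above might be sketched like this. It is a simplified Python illustration, not crm_resource code: the function restart_plan, the kind strings, and the step descriptions are hypothetical, and the fallback to target-role for every other case is an assumption based on the crm_resource output quoted earlier in this report.

```python
# Hypothetical sketch of the strategy choice; not crm_resource source code.
BAN_CAPABLE_KINDS = {"clone", "bundle"}  # "bundle" reflects the change described above


def restart_plan(resource_id, kind, node=None):
    """Return the ordered restart steps as plain, human-readable strings."""
    if node is not None and kind in BAN_CAPABLE_KINDS:
        # Single-node restart: ban the resource from the node, then lift the ban.
        return [
            f"ban {resource_id} from {node}",
            f"wait for {resource_id} to stop on {node}",
            f"remove ban of {resource_id} from {node}",
            f"wait for {resource_id} to start again",
        ]
    # Assumed fallback: stop and start the resource everywhere via target-role.
    return [
        f"set target-role=Stopped on {resource_id}",
        f"wait for {resource_id} to stop",
        f"delete target-role from {resource_id}",
        f"wait for {resource_id} to start again",
    ]
```

For example, restart_plan("httpd-bundle", "bundle", "rh74-node1") yields the ban-based steps, while restart_plan("httpd-bundle", "bundle") yields the target-role steps seen in the earlier output.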

I noticed "crm_resource --wait" does not return after restarting a bundle, but that appears to be a separate issue, not affecting this bz.

--- Additional comment from Ken Gaillot on 2017-10-06 19:18:36 EDT ---

Also, for completeness, it is likely there are other places where tools should treat bundles like clones. That will also be investigated separately from this bz.

Comment 1 Tomas Jelinek 2017-10-13 12:18:55 UTC

Created attachment 1338225
proposed fix

Comment 2 Tomas Jelinek 2017-10-13 12:29:55 UTC

Test:
1. Set up an httpd-bundle bundle resource with an apache resource in it:
# pcs resource show httpd-bundle
 Bundle: httpd-bundle
  Docker: image=pcmktest:http options=--log-driver=journald replicas=3
  Network: host-netmask=24 ip-range-start=192.168.122.145
  Port Mapping:
   port=80 (httpd-port)
  Storage Mapping:
   options=rw source-dir-root=/root/docker/httpd-root target-dir=/var/www/html (httpd-root)
   options=rw source-dir-root=/root/docker/httpd-logs target-dir=/etc/httpd/logs (httpd-logs)
  Resource: apa (class=ocf provider=heartbeat type=apache)
   Operations: monitor interval=10 timeout=20s (apa-monitor-interval-10)
               start interval=0s timeout=40s (apa-start-interval-0s)
               stop interval=0s timeout=60s (apa-stop-interval-0s)

2. Restart the bundle on one node:
# pcs resource restart httpd-bundle rh74-node2
Confirm that the bundle was restarted on the specified node only (check the logs, watch pcs status, etc.)

3. Restarting the inner resource causes the whole bundle to be restarted:
# pcs resource restart apa
Warning: using httpd-bundle... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
httpd-bundle successfully restarted

# pcs resource restart apa rh74-node2
Warning: using httpd-bundle... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
httpd-bundle successfully restarted
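
When scripting the test steps above, the command lines and the success check could be expressed along these lines. This is a minimal Python sketch under stated assumptions: the helper names pcs_restart_argv and restart_succeeded are hypothetical, and the success check relies only on the "<id> successfully restarted" line shown in the output above. It does not confirm on which node the restart happened; that still requires the logs or pcs status.

```python
# Hypothetical helpers for scripting the test steps; not part of pcs.

def pcs_restart_argv(resource_id, node=None):
    """Build the argument list for 'pcs resource restart <resource> [node]'."""
    argv = ["pcs", "resource", "restart", resource_id]
    if node is not None:
        argv.append(node)
    return argv


def restart_succeeded(output, expected_id):
    """Check whether pcs reported a successful restart of expected_id."""
    wanted = f"{expected_id} successfully restarted"
    return any(line.strip() == wanted for line in output.splitlines())
```

Note that when the inner resource apa is given, the expected_id to look for is httpd-bundle, since pcs restarts the enclosing bundle instead.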

Comment 7 errata-xmlrpc 2018-04-10 15:40:54 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0866