+++ This bug was initially created as a clone of Bug #1489735 +++

Description of problem:
This currently does not work until https://bugzilla.redhat.com/show_bug.cgi?id=1489728 is fixed. But once that is fixed (i.e. when crm_resource --restart <bundle> --node=<node> will work), we need a small fix in pcs to allow the bundle restart on a single node.

--- Additional comment from Tomas Jelinek on 2017-09-11 09:34:13 EDT ---

There is a chance this will actually work without any changes in pcs as we do not care if the specified resource is a bundle. We will see once pacemaker part is ready.

--- Additional comment from Tomas Jelinek on 2017-10-04 11:44:51 EDT ---

I tried restarting bundles with pacemaker-1.1.16-12.el7_4.4 installed both in the cluster machines and in containers. "crm_resource --restart <bundle>" seems to be working fine. However, "crm_resource --restart <bundle> --node <node>" seems to be restarting the bundle on all nodes.

# crm_resource --restart --resource httpd-bundle
Set 'httpd-bundle' option: id=httpd-bundle-meta_attributes-target-role set=httpd-bundle-meta_attributes name=target-role=stopped
Waiting for 1 resources to stop:
 * httpd-bundle
Deleted 'httpd-bundle' option: id=httpd-bundle-meta_attributes-target-role name=target-role
Waiting for 1 resources to start again:
 * httpd-bundle

# crm_resource --restart --resource httpd-bundle --node rh74-node1
Set 'httpd-bundle' option: id=httpd-bundle-meta_attributes-target-role set=httpd-bundle-meta_attributes name=target-role=stopped
Waiting for 1 resources to stop:
 * httpd-bundle
Deleted 'httpd-bundle' option: id=httpd-bundle-meta_attributes-target-role name=target-role
Waiting for 1 resources to start again:
 * httpd-bundle

While this is happening, I can see the bundle resource stopping and starting on all nodes in both cases.

Ken, is the bundle restart supposed to work with the --node option?
If the option is not going to be supported, nothing has to be done in pcs, as 'pcs resource restart httpd-bundle' already works. In the other case, a small fix in pcs is needed to allow specifying a node when restarting a bundle.

--- Additional comment from Ken Gaillot on 2017-10-04 12:09:22 EDT ---

Grabbing this as a pacemaker bug because nothing is needed from pcs.

I think crm_resource --restart should treat bundles the same way as clones, i.e. ban the resource from the specified node rather than set target-role to Stopped. This should be a fairly simple change we can get into 7.5.

--- Additional comment from Tomas Jelinek on 2017-10-05 03:03:59 EDT ---

(In reply to Ken Gaillot from comment #4)
> Grabbing this as a pacemaker bug because nothing is needed from pcs.

That depends. Pcs currently does not allow specifying a node when restarting bundles:

# pcs resource restart httpd-bundle rh74-node1
Error: can only restart on a specific node for a clone or master/slave resource

As I said, a small fix in pcs is needed to allow specifying a node when restarting a bundle. If a node is not specified, everything works with current pcs:

# pcs resource restart httpd-bundle
httpd-bundle successfully restarted

--- Additional comment from Ken Gaillot on 2017-10-05 09:59:21 EDT ---

Ah, I missed that. We can clone this bz then, but let me make sure my idea works first.

--- Additional comment from Ken Gaillot on 2017-10-06 19:17:02 EDT ---

Fixed upstream as of commit d6eb1cb4

When restarting a bundle with multiple replicas on a single node, crm_resource will now do so by banning the bundle from that node (same as it would with a clone).

Note that if there are any free nodes available to run the instance, the instance may begin to start there, but once crm_resource notices it stopped on the original node, it will remove the ban and the instance will likely move back to the original node (same as would happen with a clone with clone-max less than the number of available nodes).

I noticed "crm_resource --wait" does not return after restarting a bundle, but that appears to be a separate issue, not affecting this bz.

--- Additional comment from Ken Gaillot on 2017-10-06 19:18:36 EDT ---

Also, for completeness, it is likely there are other places where tools should treat bundles like clones. That also will be investigated separately from this bz.
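For readers following the thread, the restart behavior discussed in the comments above can be modeled roughly as below. This is only an illustrative sketch in Python, not the pacemaker code (crm_resource is not written in Python, and commit d6eb1cb4 is not reproduced here). The names `RestartPlan`, `plan_restart` and `MULTI_INSTANCE_KINDS` are made up for the example. It assumes that any resource kind outside the multi-instance set falls through to the whole-resource path, which is what the transcript above shows for bundles before the fix.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels for resource kinds that run several instances and are
# therefore restarted per node by banning. Per the comments above, bundles
# are meant to be handled the same way as clones.
MULTI_INSTANCE_KINDS = {"clone", "master", "bundle"}


@dataclass(frozen=True)
class RestartPlan:
    action: str            # "ban-node" or "target-role-stopped"
    resource: str
    node: Optional[str]    # only set when the restart is limited to one node


def plan_restart(resource: str, kind: str, node: Optional[str] = None) -> RestartPlan:
    """Pick a restart strategy for a resource (illustrative model only)."""
    if node is not None and kind in MULTI_INSTANCE_KINDS:
        # Ban the resource from the node, wait for the instance to stop
        # there, then remove the ban so the instance can come back.
        return RestartPlan("ban-node", resource, node)
    # Otherwise stop the whole resource via target-role and start it again.
    # Simplification: a node given for any other kind is ignored here.
    return RestartPlan("target-role-stopped", resource, None)
```

With this model, `plan_restart("httpd-bundle", "bundle", "rh74-node1")` selects the ban path, while leaving `"bundle"` out of `MULTI_INSTANCE_KINDS` would reproduce the all-nodes restart reported above.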
Created attachment 1338225
proposed fix
Test:

1. Set up an httpd-bundle bundle resource with an apache resource in it:

# pcs resource show httpd-bundle
 Bundle: httpd-bundle
  Docker: image=pcmktest:http options=--log-driver=journald replicas=3
  Network: host-netmask=24 ip-range-start=192.168.122.145
  Port Mapping:
   port=80 (httpd-port)
  Storage Mapping:
   options=rw source-dir-root=/root/docker/httpd-root target-dir=/var/www/html (httpd-root)
   options=rw source-dir-root=/root/docker/httpd-logs target-dir=/etc/httpd/logs (httpd-logs)
  Resource: apa (class=ocf provider=heartbeat type=apache)
   Operations: monitor interval=10 timeout=20s (apa-monitor-interval-10)
               start interval=0s timeout=40s (apa-start-interval-0s)
               stop interval=0s timeout=60s (apa-stop-interval-0s)

2. Restart the bundle on one node:

# pcs resource restart httpd-bundle rh74-node2

Confirm the resource was restarted on the specified node only (logs, watch pcs status, etc.)

3. Restarting the inner resource causes a restart of the whole bundle:

# pcs resource restart apa
Warning: using httpd-bundle... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
httpd-bundle successfully restarted

# pcs resource restart apa rh74-node2
Warning: using httpd-bundle... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
httpd-bundle successfully restarted
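The behavior exercised by the test steps above can be summarized with a small Python model. This is a sketch under stated assumptions, not the attached patch and not the pcs code: `resolve_restart_target`, `NODE_RESTART_KINDS`, and the `kinds` / `parents` dictionaries are hypothetical, and the extended error text is a guess based on the pre-fix message quoted in this report.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical labels for resource kinds that may be restarted on one node.
# Before the fix, "bundle" was effectively missing from this list.
NODE_RESTART_KINDS = ("clone", "master", "bundle")


def resolve_restart_target(
    resource_id: str,
    kinds: Dict[str, str],      # resource id -> kind, e.g. "bundle", "primitive"
    parents: Dict[str, str],    # inner resource id -> wrapping resource id
    node: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Return (id to restart, warnings); raise ValueError on a bad node request."""
    warnings: List[str] = []
    target = resource_id
    # Walk up to the outermost wrapper (clone, master/slave or bundle).
    while target in parents:
        target = parents[target]
    if target != resource_id:
        warnings.append(f"using {target}...")
    if node is not None and kinds[target] not in NODE_RESTART_KINDS:
        # Assumed wording, extending the original error to mention bundles.
        raise ValueError(
            "can only restart on a specific node for a clone, "
            "master/slave or bundle resource"
        )
    return target, warnings
```

In this model, asking to restart `apa` on `rh74-node2` resolves to `httpd-bundle` with a warning, matching step 3, while a plain primitive with a node argument is still rejected.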
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0866