Bug 1182244

Summary: crm_resource --restart broken
Product: Red Hat Enterprise Linux 7
Reporter: Radek Steiger <rsteiger>
Component: pacemaker
Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA
QA Contact: cluster-qe <cluster-qe>
Severity: medium
Docs Contact:
Priority: high
Version: 7.1
CC: cluster-maint, fdinitto, jkortus, nicolas
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: pacemaker-1.1.13-3.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-19 12:12:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Radek Steiger 2015-01-14 17:14:19 UTC
> Description of problem:

We've recently implemented a restart command in pcs using the crm_resource --restart feature. However, the feature doesn't seem to work properly in crm_resource itself, which suffers from several issues:

[root@virt-041 ~]# pcs resource create dummy0 Dummy
[root@virt-041 ~]# pcs resource
 dummy0	(ocf::heartbeat:Dummy):	Started 

The resource is running, let's try to restart:

[root@virt-041 ~]# crm_resource --restart --resource dummy0
Could not set target-role for dummy0: No such device or address (-6)

Hm, looks like something is missing (issue #1)? What if I add a target-role meta attribute:

[root@virt-041 ~]# pcs resource update dummy0 meta target-role=Started
[root@virt-041 ~]# crm_resource --restart --resource dummy0
Waiting for 1 resources to stop:
 * dummy0
Segmentation fault

One step forward, then segfault (issue #2), but the resource has been stopped:

[root@virt-041 ~]# pcs resource
 dummy0	(ocf::heartbeat:Dummy):	Stopped 
[root@virt-041 ~]# pcs resource --full | grep Meta
  Meta Attrs: target-role=stopped 

Let's see what it does while stopped:

[root@virt-041 ~]# crm_resource --restart --resource dummy0
Error performing operation: No such device or address

Nope, this doesn't work properly either (issue #3).


> Version-Release number of selected component (if applicable):
pacemaker-1.1.12-19.el7

Comment 2 Radek Steiger 2015-01-14 17:18:21 UTC
In addition, the feature (probably backported from 1.1.13) doesn't seem to be mentioned in either the man pages or the built-in help.

Comment 3 Andrew Beekhof 2015-01-15 02:58:11 UTC
(In reply to Radek Steiger from comment #2)
> In addition, the feature (probably backported from 1.1.13) doesn't seem to
> be mentioned in neither man pages nor built-in help.

For some reason I appear to have copied the option line from a legacy entry (these are suppressed) AND forgotten to go back and add a description.

This patch:

-    {"restart",    0, 0,  0,  NULL, 1},
+    {"restart",    0, 0,  0,  "\t\t(Advanced) Tell the cluster to restart this resource and anything that depends on it"},

results in the following showing up in --help and man pages:

       --restart
              (Advanced) Tell the cluster to restart this resource and anything that depends on it
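
To illustrate why the old entry never showed up, here is a minimal sketch of a help formatter that skips entries flagged as hidden. The struct demo_option type, its fields and demo_format_help() are hypothetical names made up for this example; they are not Pacemaker's actual option structure or help code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified option entry; NOT Pacemaker's real struct. */
struct demo_option {
    const char *name;
    const char *desc;   /* NULL when no description was written */
    int hidden;         /* non-zero: legacy entry, suppressed in help output */
};

/*
 * Write help text for all visible options into buf.
 * Returns the number of options that were listed.
 */
int demo_format_help(const struct demo_option *opts, size_t n,
                     char *buf, size_t len)
{
    int listed = 0;
    size_t used = 0;

    if (len == 0) {
        return 0;
    }
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w;

        if (opts[i].hidden) {
            continue;   /* suppressed entries never reach --help or the man page */
        }
        w = snprintf(buf + used, len - used, "--%s\n\t%s\n",
                     opts[i].name, opts[i].desc ? opts[i].desc : "");
        if (w < 0 || (size_t) w >= len - used) {
            buf[used] = '\0';   /* drop the partially written entry */
            break;
        }
        used += (size_t) w;
        listed++;
    }
    return listed;
}
```

With an entry shaped like the old one (no description, hidden flag set) the formatter lists nothing; with a description and the flag cleared, --restart appears in the output.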

Comment 4 Andrew Beekhof 2015-01-15 03:04:02 UTC
The fix for the underlying issue is:

diff --git a/tools/crm_resource.c b/tools/crm_resource.c
index 968683a..c04c056 100644
--- a/tools/crm_resource.c
+++ b/tools/crm_resource.c
@@ -1415,6 +1415,9 @@ update_dataset(cib_t *cib, pe_working_set_t * data_set, bool simulate)
         goto cleanup;
     }
 
+    if(data_set->input) {
+        free_xml(data_set->input);
+    }
     set_working_set_defaults(data_set);
     data_set->input = cib_xml_copy;
     data_set->now = crm_time_new(NULL);
@@ -1453,7 +1456,7 @@ update_dataset(cib_t *cib, pe_working_set_t * data_set, bool simulate)
 
   cleanup:
     cib_delete(shadow_cib);
-    free_xml(cib_xml_copy);
+    /* free_xml(cib_xml_copy); */
     free(pid);
 
     if(shadow_file) {


Without this patch we were freeing memory that was subsequently needed to build the update sent to the cib. As a result, the update looked like:

   debug: set_resource_attr: 	Update   <?O6 id="swift-fs">
   debug: set_resource_attr: 	Update     <meta_attributes id="swift-fs-meta_attributes">
   debug: set_resource_attr: 	Update       <nvpair id="swift-fs-meta_attributes-target-role" name="target-role" value="stopped"/>
   debug: set_resource_attr: 	Update     </meta_attributes>
   debug: set_resource_attr: 	Update   </?O6>


Note the "?O6" instead of "primitive".
Patch is currently in testing (to make sure it's the only issue).

Bug was introduced in f1141a32e - "Fix: crm_resource: Clean up memory in --restart error paths" back in November :-(
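
The ownership rule the patch restores can be sketched in isolation. Everything below (struct demo_doc, struct demo_working_set, demo_update_dataset() and the helpers) is a hypothetical stand-in for the CIB XML copy and the working set, not the real Pacemaker types or functions: once the copy is stored in the data set, the data set owns it, so the previous input is released before the assignment and the cleanup path must not free the new one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the XML copy of the CIB. */
struct demo_doc {
    char *root_name;
};

/* Hypothetical stand-in for the working set. */
struct demo_working_set {
    struct demo_doc *input;   /* owned by the working set once assigned */
};

struct demo_doc *demo_doc_new(const char *root_name)
{
    size_t n = strlen(root_name) + 1;
    struct demo_doc *doc = malloc(sizeof(*doc));

    if (doc == NULL) {
        return NULL;
    }
    doc->root_name = malloc(n);
    if (doc->root_name == NULL) {
        free(doc);
        return NULL;
    }
    memcpy(doc->root_name, root_name, n);
    return doc;
}

void demo_doc_free(struct demo_doc *doc)
{
    if (doc != NULL) {
        free(doc->root_name);
        free(doc);
    }
}

/*
 * Replace the working set's input, taking ownership of `fresh`.
 * The previous input is released first so repeated updates do not leak.
 */
void demo_update_dataset(struct demo_working_set *ws, struct demo_doc *fresh)
{
    demo_doc_free(ws->input);
    ws->input = fresh;
    /* Deliberately no demo_doc_free(fresh) here: ws->input still points at
     * it, and freeing it would leave later readers with garbage. */
}
```

A caller hands each new copy to demo_update_dataset() and frees only ws.input once, at the very end.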

Comment 6 Nicolas R. 2015-07-08 08:46:15 UTC
I've got the same problem when I want to restart a resource

pcs resource restart VIP
Error: Error performing operation: No such device or address


Using pcs-0.9.137-13 and pacemaker 1.1.12-22 on Red Hat 7.

Will it be fixed?

Comment 9 errata-xmlrpc 2015-11-19 12:12:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-2383.html