Bug 1031141

Summary: pcs has strange/inconsistent behaviour and operation namings
Product: Red Hat Enterprise Linux 6
Reporter: Robert Scheck <redhat-bugzilla>
Component: pcs
Assignee: Tomas Jelinek <tojeline>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Version: 6.5
CC: cluster-maint, fdinitto, robert.scheck, rsteiger, tojeline
Target Milestone: rc
Hardware: All
OS: Linux
Fixed In Version: pcs-0.9.138-1.el6
Doc Type: Bug Fix
Doc Text:
* After the user added a duplicate resource operation, Pacemaker configuration became invalid. With this update, pcs does not add the operation and instead informs the user that the same operation already exists. (BZ#1031141)
Last Closed: 2015-07-22 06:15:31 UTC
Type: Bug
Attachments:
  proposed fix (flags: none)
  fix for the original proposed fix (flags: none)
  proposed fix - make default resource operations unique (flags: none)

Description Robert Scheck 2013-11-15 17:18:59 UTC
Description of problem:
pcs shows strange and inconsistent behaviour and operation naming. First
configuration example:

$ pcs cluster cib vm_cfg
$ pcs -f vm_cfg resource create vm ocf:heartbeat:VirtualDomain config=/etc/libvirt/qemu/vm.xml snapshot=/var/lib/libvirt/qemu/pacemaker
$ pcs -f vm_cfg resource op add vm monitor interval=60s timeout=30s
$ pcs -f vm_cfg resource op add vm start interval=0 timeout=120s
$ pcs -f vm_cfg resource op add vm stop interval=0 timeout=120s
$ pcs -f vm_cfg constraint colocation add vm libvirtd INFINITY
$ pcs -f vm_cfg constraint order libvirtd then vm
$ pcs cluster cib-push vm_cfg

This results in:

$ pcs config
[...]
 Resource: vm (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/vm.xml snapshot=/var/lib/libvirt/qemu/pacemaker 
  Operations: monitor interval=60s (vm-monitor-interval-60s)
              monitor interval=60s timeout=30s (vm-name-monitor-interval-60s-timeout-30s)
              start interval=0 timeout=120s (vm-name-start-interval-0-timeout-120s)
              stop interval=0 timeout=120s (vm-name-stop-interval-0-timeout-120s)
$

Ouch, how did that happen? It is a new feature of pcs 0.9.90 (RHEL 6.5):
"resource create" adds a monitor operation with a default interval if none
has been specified. However, pcs does not seem to be really transaction
safe, so it does not know that a proper monitor operation is added later
(in the same CIB!). Thus the configuration is not valid:

$ crm_verify -L -V
   error: is_op_dup: 	Operation vm-name-monitor-interval-60s-timeout-30s is a duplicate of vm-monitor-interval-60s
   error: is_op_dup: 	Do not use the same (name, interval) combination more than once per resource
   error: is_op_dup: 	Operation vm-name-monitor-interval-60s-timeout-30s is a duplicate of vm-monitor-interval-60s
   error: is_op_dup: 	Do not use the same (name, interval) combination more than once per resource
Errors found during check: config not valid
$ 
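
The rule quoted by crm_verify above (one operation per (name, interval)
combination and resource) can be sketched in a few lines of Python. This is
only an illustration of the check, not Pacemaker's or pcs's actual code; the
function name find_duplicate_ops and the tuple layout are assumptions made
for this example.

```python
def find_duplicate_ops(ops):
    """Return (duplicate_id, first_id) pairs for ops sharing (name, interval).

    `ops` is a list of (op_id, name, interval) tuples (hypothetical layout).
    """
    seen = {}        # (name, interval) -> id of the first op with that key
    duplicates = []
    for op_id, name, interval in ops:
        key = (name, interval)
        if key in seen:
            duplicates.append((op_id, seen[key]))
        else:
            seen[key] = op_id
    return duplicates


# Operations of resource "vm" from the first configuration example.
vm_ops = [
    ("vm-monitor-interval-60s", "monitor", "60s"),
    ("vm-name-monitor-interval-60s-timeout-30s", "monitor", "60s"),
    ("vm-name-start-interval-0-timeout-120s", "start", "0"),
    ("vm-name-stop-interval-0-timeout-120s", "stop", "0"),
]

for dup, first in find_duplicate_ops(vm_ops):
    print(f"Operation {dup} is a duplicate of {first}")
```

Only the two monitor operations collide here; start and stop have different
names and therefore different keys.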

That brought me to the second configuration example, where I tried to add
the monitor operation with "resource create" rather than afterwards:

$ pcs cluster cib vm_cfg
$ pcs -f vm_cfg resource create vm ocf:heartbeat:VirtualDomain config=/etc/libvirt/qemu/vm.xml snapshot=/var/lib/libvirt/qemu/pacemaker op monitor interval=60s timeout=30s
$ pcs -f vm_cfg resource op add vm start interval=0 timeout=120s
$ pcs -f vm_cfg resource op add vm stop interval=0 timeout=120s
$ pcs -f vm_cfg constraint colocation add vm libvirtd INFINITY
$ pcs -f vm_cfg constraint order libvirtd then vm
$ pcs cluster cib-push vm_cfg

This results in:

$ pcs config
[...]
 Resource: vm (class=ocf provider=heartbeat type=VirtualDomain)
  Attributes: config=/etc/libvirt/qemu/vm.xml snapshot=/var/lib/libvirt/qemu/pacemaker 
  Operations: monitor interval=60s timeout=30s (vm-monitor-interval-60s)
              start interval=0 timeout=120s (vm-name-start-interval-0-timeout-120s)
              stop interval=0 timeout=120s (vm-name-stop-interval-0-timeout-120s)
$ 

Ehm? Why is it named "vm-monitor-interval-60s" rather than
"vm-name-monitor-interval-60s-timeout-30s", even though I added
"monitor interval=60s timeout=30s" to the create command?

I would expect "vm-name-monitor-interval-60s-timeout-30s" both times...

Version-Release number of selected component (if applicable):
pcs-0.9.90-1.el6_4.noarch

How reproducible:
Every time, see above and below.

Actual results:
pcs shows strange and inconsistent behaviour and operation naming.

Expected results:
At least consistent naming as expected above. It is a pity that pcs is not
really transaction safe, but I guess that is not easy to change?

Comment 4 Tomas Jelinek 2014-12-15 13:58:03 UTC
Created attachment 969012: proposed fix

Comment 5 Tomas Jelinek 2014-12-17 10:36:40 UTC
Created attachment 970019: fix for the original proposed fix

Comment 6 Tomas Jelinek 2014-12-17 10:37:32 UTC
Created attachment 970020: proposed fix - make default resource operations unique

Comment 7 Tomas Jelinek 2015-01-27 13:57:36 UTC
Before Fix:
[root@rh66-node1 ~]# rpm -q pcs
pcs-0.9.123-9.el6.x86_64

[root@rh66-node1:~]# pcs resource create dummy Dummy
[root@rh66-node1:~]# pcs resource show dummy
 Resource: dummy (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (dummy-start-timeout-20)
              stop interval=0s timeout=20 (dummy-stop-timeout-20)
              monitor interval=10 timeout=20 (dummy-monitor-interval-10)
[root@rh66-node1:~]# pcs resource op add dummy monitor interval=10s timeout=20s
[root@rh66-node1:~]# pcs resource op add dummy start interval=0s timeout=20s
[root@rh66-node1:~]# pcs resource op add dummy stop interval=0s timeout=20s
[root@rh66-node1:~]# pcs resource show dummy
 Resource: dummy (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (dummy-start-timeout-20)
              stop interval=0s timeout=20 (dummy-stop-timeout-20)
              monitor interval=10 timeout=20 (dummy-monitor-interval-10)
              monitor interval=10s timeout=20s (dummy-name-monitor-interval-10s-timeout-20s)
              start interval=0s timeout=20s (dummy-name-start-interval-0s-timeout-20s)
              stop interval=0s timeout=20s (dummy-name-stop-interval-0s-timeout-20s)

[root@rh66-node1:~]# pcs resource create dummy1 Dummy op monitor interval=10s timeout=20s
[root@rh66-node1:~]# pcs resource show dummy1
 Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (dummy1-start-timeout-20)
              stop interval=0s timeout=20 (dummy1-stop-timeout-20)
              monitor interval=10s timeout=20s (dummy1-monitor-interval-10s)



After Fix:
[root@rh66-node1:~]# rpm -q pcs
pcs-0.9.138-1.el6.x86_64

[root@rh66-node1:~]# pcs resource create dummy Dummy
[root@rh66-node1:~]# pcs resource show dummy
 Resource: dummy (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (dummy-start-interval-0s)
              stop interval=0s timeout=20 (dummy-stop-interval-0s)
              monitor interval=10 timeout=20 (dummy-monitor-interval-10)
[root@rh66-node1:~]# pcs resource op add dummy monitor interval=10s timeout=20s
Error: operation monitor with interval 10s already specified for dummy:
monitor interval=10 timeout=20 (dummy-monitor-interval-10)
[root@rh66-node1:~]# echo $?
1
[root@rh66-node1:~]# pcs resource op add dummy start interval=0s timeout=20s
Error: operation start with interval 0s already specified for dummy:
start interval=0s timeout=20 (dummy-start-interval-0s)
[root@rh66-node1:~]# echo $?
1
[root@rh66-node1:~]# pcs resource op add dummy stop interval=0s timeout=20s
Error: operation stop with interval 0s already specified for dummy:
stop interval=0s timeout=20 (dummy-stop-interval-0s)
[root@rh66-node1:~]# echo $?
1
[root@rh66-node1:~]# pcs resource show dummy
 Resource: dummy (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (dummy-start-interval-0s)
              stop interval=0s timeout=20 (dummy-stop-interval-0s)
              monitor interval=10 timeout=20 (dummy-monitor-interval-10)
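
The rejections above show that "interval=10s" collides with the existing
"interval=10", so the comparison apparently happens on the normalized
interval rather than on the raw string. A minimal Python sketch of such a
guard follows. It is not taken from the pcs source: add_op, interval_seconds
and DuplicateOperation are hypothetical names, and only a small subset of
time suffixes is handled.

```python
import re

# Assumed subset of time suffixes; anything else is rejected in this sketch.
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600}


class DuplicateOperation(Exception):
    """Raised when an operation with the same name and interval exists."""


def interval_seconds(value):
    """Convert strings such as '10', '10s' or '2m' to whole seconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]*)", value.strip().lower())
    if not match or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unsupported interval: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def add_op(resource_id, ops, name, interval, **options):
    """Append an operation to `ops` unless (name, interval) is already taken."""
    new_key = (name, interval_seconds(interval))
    for existing in ops:
        key = (existing["name"], interval_seconds(existing["interval"]))
        if key == new_key:
            raise DuplicateOperation(
                f"operation {name} with interval {interval} "
                f"already specified for {resource_id}"
            )
    ops.append({"name": name, "interval": interval, **options})


# Default operations of resource "dummy" as listed above.
dummy_ops = [
    {"name": "start", "interval": "0s", "timeout": "20"},
    {"name": "stop", "interval": "0s", "timeout": "20"},
    {"name": "monitor", "interval": "10", "timeout": "20"},
]

try:
    add_op("dummy", dummy_ops, "monitor", "10s", timeout="20s")
except DuplicateOperation as err:
    print(f"Error: {err}")
```

Because the list is left untouched when the check fails, the resource keeps
exactly its three original operations, as in the "pcs resource show" output.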

[root@rh66-node1:~]# pcs resource create dummy1 Dummy op monitor interval=10s timeout=20s
[root@rh66-node1:~]# pcs resource show dummy1
 Resource: dummy1 (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (dummy1-start-interval-0s)
              stop interval=0s timeout=20 (dummy1-stop-interval-0s)
              monitor interval=10s timeout=20s (dummy1-monitor-interval-10s)
[root@rh66-node1:~]# pcs resource create dummy2 Dummy op monitor interval=20s timeout=30s
[root@rh66-node1:~]# pcs resource show dummy2
 Resource: dummy2 (class=ocf provider=heartbeat type=Dummy)
  Operations: start interval=0s timeout=20 (dummy2-start-interval-0s)
              stop interval=0s timeout=20 (dummy2-stop-interval-0s)
              monitor interval=20s timeout=30s (dummy2-monitor-interval-20s)
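
All operation IDs in the "After Fix" output follow one pattern: resource ID,
operation name, the literal "interval", and the interval as it was given. A
minimal Python sketch of that pattern, inferred only from the output shown in
this comment and not from the pcs source (operation_id is a hypothetical
helper):

```python
def operation_id(resource_id, op_name, interval):
    """Build an ID like 'dummy-monitor-interval-10' (pattern inferred from output)."""
    return f"{resource_id}-{op_name}-interval-{interval}"


# (resource, operation, interval) triples taken from the output above.
examples = [
    ("dummy", "start", "0s"),
    ("dummy", "monitor", "10"),
    ("dummy1", "monitor", "10s"),
    ("dummy2", "monitor", "20s"),
]

for resource_id, op_name, interval in examples:
    print(operation_id(resource_id, op_name, interval))
```

With this scheme the timeout no longer appears in the ID, so operations added
by "resource create" and by "resource op add" would be named the same way.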

Comment 11 errata-xmlrpc 2015-07-22 06:15:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-1446.html