Bug 1386941

Summary: After update, rabbitmq has new pacemaker node attributes that seem to cause puppet runs to fail.
Product: Red Hat OpenStack
Component: puppet-pacemaker
Version: 6.0 (Juno)
Target Milestone: ---
Target Release: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Status: CLOSED WONTFIX
Severity: high
Priority: high
Reporter: Jeremy <jmelvin>
Assignee: RHOS Maint <rhos-maint>
QA Contact: nlevinki <nlevinki>
CC: aschultz, chjones, jjoyce, jmelvin, jschluet, slinaber, tvignaud
Keywords: Triaged, ZStream
Flags: chjones: needinfo? (jmelvin)
Type: Bug
Last Closed: 2018-02-22 20:35:50 UTC

Description Jeremy 2016-10-19 21:58:03 UTC
Description of problem:

### Updated the controller; now puppet runs a script that does not recognize the new rabbitmq-server node attribute and fails.

/etc/puppet/environments/production/modules/quickstack/manifests/pacemaker/memcached.pp

$ cat memcached.pp
class quickstack::pacemaker::memcached {

  include ::memcached
  include quickstack::pacemaker::common
  class {'::quickstack::firewall::memcached':}

  Exec['wait-for-settle'] -> Exec['pcs-memcached-server-set-up-on-this-node']

  Service['memcached'] ->
  exec {"pcs-memcached-server-set-up-on-this-node":
    command => "/tmp/ha-all-in-one-util.bash update_my_node_property memcached",
  } ->
  exec {"all-memcached-nodes-are-up":
    timeout   => 3600,
    tries     => 360,
    try_sleep => 10,
    command   => "/tmp/ha-all-in-one-util.bash all_members_include memcached",
  } ->
  quickstack::pacemaker::resource::generic { 'memcached':
    clone_opts => "interleave=true",
  } ->
  Anchor['pacemaker ordering constraints begin']
}
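
The ha-all-in-one-util.bash script itself is not attached to this BZ, but judging from the pcs property show output further down, update_my_node_property presumably appends the service name to a cluster property named after the local node. A minimal sketch of that assumed logic (the function body and its use of crm_node/pcs are my guess, not the actual script):

update_my_node_property() {
    # assumed behavior: append $1 to the comma-separated service list
    # stored in the cluster property named after this node
    local service="$1"
    local node current
    node="$(crm_node -n)"
    current="$(pcs property show "$node" | awk -v n="$node:" '$1 == n {print $2}')"
    if [ -n "$current" ]; then
        # --force lets pcs set a property it does not know about
        pcs property set --force "${node}=${current},${service}"
    else
        pcs property set --force "${node}=${service}"
    fi
}

If the append really works like this, it would also explain the duplicated haproxy and memcached entries visible on pcmk-controller2 and pcmk-controller3 below: nothing deduplicates the list.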

### The script fails with errors such as:
Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached
Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached
Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached
Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached
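
For what it's worth, the offending values are pacemaker node attributes rather than cluster properties; they can be queried directly with crm_attribute from pacemaker-cli (node and attribute names are taken from the output below):

$ crm_attribute --node pcmk-controller2 \
      --name rmq-node-attr-last-known-rabbitmq-server --query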


15:42:13 $ pcs property show
Cluster Properties:
 ceilometer: running
 cinder: running
 cluster-infrastructure: corosync
 cluster-name: openstack
 dc-version: 1.1.13-10.el7_2.4-44eb2dd
 glance: running
 have-watchdog: false
 heat: running
 horizon: running
 keystone: running
 mysqlinit: running
 neutron: running
 nosql: running
 nova: running
 pcmk-controller1: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance
 pcmk-controller2: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance,haproxy
 pcmk-controller3: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance,memcached
 rabbitmq: running
 stonith-enabled: false
Node Attributes:
 pcmk-controller1: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller1
 pcmk-controller2: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller2
 pcmk-controller3: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3
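
The Invalid Property errors above are consistent with all_members_include parsing the whole pcs property show output, so the lines under the new Node Attributes section get swept up as if they were cluster properties. A hedged reconstruction of the failure and of an obvious fix (the real parsing code is not quoted in this BZ):

# assumed failure mode: every indented "node: value" line is collected,
# so the Node Attributes lines leak in as bogus properties such as
#   pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=...
pcs property show | awk -F': ' '/^ pcmk-/ {print $1 "=" $2}'

# possible fix: truncate the output at the Node Attributes header
# before parsing, so only cluster properties are considered
pcs property show | sed '/^Node Attributes:/,$d' \
    | awk -F': ' '/^ pcmk-/ {print $1 "=" $2}'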


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Update OSP6.
2. On the Foreman puppet run, the script fails with the errors above.

Actual results:
puppet run fails

Expected results:
no failure

Additional info:

may be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1346164

Comment 2 Jeremy 2016-10-20 21:05:11 UTC
15:53:22 $ rpm -qa | grep pacemaker
pacemaker-libs-1.1.13-10.el7_2.4.x86_64
pacemaker-cluster-libs-1.1.13-10.el7_2.4.x86_64
pacemaker-1.1.13-10.el7_2.4.x86_64
pacemaker-cli-1.1.13-10.el7_2.4.x86_64

15:53:28 $ rpm -qa | grep puppet
puppet-3.6.2-2.el7.noarch

16:01:02 $ rpm -qa | grep rabbitmq
rabbitmq-server-3.3.5-22.el7ost.noarch

Comment 3 Jeremy 2016-10-21 13:13:17 UTC
Looking back at:

https://bugzilla.redhat.com/show_bug.cgi?id=1346164

I don't think that BZ was triaged appropriately; I think this is actually a bug in the ha-all-in-one-util.bash script.

Comment #2 from Jason Guiditta suggests that the version of OFI has already been tested per:

https://bugzilla.redhat.com/show_bug.cgi?id=1290684

However, that BZ was opened in 2015. Based on:

https://rhn.redhat.com/errata/RHBA-2016-0556.html

It looks like the following code was added to the resource-agents package in March of 2016:

https://git.centos.org/blob/rpms!resource-agents/f784e8cb080c453bc9c1cafa447fb125da652761/SOURCES!bz1311180-rabbitmq-cluster-forget-stopped-cluster-nodes.patch;jsessionid=osbxg57k6dho155jb94ooijjw#L15

which adds the attribute:

+# this attr represents the current active local rmq node name.
+# when rmq stops or the node is fenced, this attr disappears
 RMQ_CRM_ATTR_COOKIE="rmq-node-attr-${OCF_RESOURCE_INSTANCE}"
+# this attr represents the last known active local rmq node name
+# when rmp stops or the node is fenced, the attr stays forever so
+# we can continue to map an offline pcmk node to it's rmq node name
+# equivalent. 
+RMQ_CRM_ATTR_COOKIE_LAST_KNOWN="rmq-node-attr-last-known-${OCF_RESOURCE_INSTANCE}"

As I read it, this attribute is added to the cluster so that if the node goes offline, pacemaker still knows its rmq node name. Having the attribute in the cluster is therefore not an indication of a problem, just steady state, which makes the suggested fix appropriate.
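
For illustration, the only difference between the two cookies is the attribute lifetime. In crm_attribute terms (these invocations are illustrative; the resource agent's exact calls are not quoted here):

# transient cookie: a reboot-lifetime attribute disappears when rmq
# stops or the node is fenced
$ crm_attribute --node pcmk-controller1 --lifetime reboot \
      --name rmq-node-attr-rabbitmq-server --update rabbit@lb-backend-controller1

# last-known cookie: a forever-lifetime attribute stays, so an offline
# pcmk node can still be mapped to its rmq node name
$ crm_attribute --node pcmk-controller1 --lifetime forever \
      --name rmq-node-attr-last-known-rabbitmq-server --update rabbit@lb-backend-controller1

That matches the Node Attributes section in the description, where each pcmk-controllerN carries a rmq-node-attr-last-known-rabbitmq-server value.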

Comment 4 Jeremy 2016-10-21 21:36:37 UTC
It seems the fix is in this attachment: https://bugzilla.redhat.com/attachment.cgi?id=1167789&action=diff

Comment 5 Red Hat Bugzilla Rules Engine 2017-02-07 20:45:57 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 6 Chris Jones 2017-10-24 14:05:25 UTC
Has this been an issue beyond RHOS 6?

Comment 7 Scott Lewis 2018-02-22 20:35:50 UTC
OSP6 has been retired, and will not receive further updates. See https://access.redhat.com/support/policy/updates/openstack/platform/ for details.