Bug 1386941 - After update, rabbitmq has new pacemaker node attributes that seem to cause puppet runs to fail. [NEEDINFO]
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-pacemaker
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 6.0 (Juno)
Assignee: RHOS Maint
QA Contact: nlevinki
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-10-19 21:58 UTC by Jeremy
Modified: 2019-12-16 07:11 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-22 20:35:50 UTC
Target Upstream Version:
chjones: needinfo? (jmelvin)



Description Jeremy 2016-10-19 21:58:03 UTC
Description of problem:

###Updated the controller; now puppet runs a script that does not recognize the new rabbitmq-server node attribute and fails.

/etc/puppet/environments/production/modules/quickstack/manifests/pacemaker/memcached.pp

$ cat memcached.pp
class quickstack::pacemaker::memcached {

  include ::memcached
  include quickstack::pacemaker::common
  class {'::quickstack::firewall::memcached':}

  Exec['wait-for-settle'] -> Exec['pcs-memcached-server-set-up-on-this-node']

  Service['memcached'] ->
  exec {"pcs-memcached-server-set-up-on-this-node":
    command => "/tmp/ha-all-in-one-util.bash update_my_node_property memcached",
  } ->
  exec {"all-memcached-nodes-are-up":
    timeout   => 3600,
    tries     => 360,
    try_sleep => 10,
    command   => "/tmp/ha-all-in-one-util.bash all_members_include memcached",
  } ->
  quickstack::pacemaker::resource::generic { 'memcached':
    clone_opts => "interleave=true",
  } ->
  Anchor['pacemaker ordering constraints begin']
}
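
A minimal sketch of what update_my_node_property presumably does (assumption: the real ha-all-in-one-util.bash is not attached to this bug, so only the function name comes from the manifest above; everything else here is illustrative):

#!/bin/bash
# Illustrative only: record that a service is set up on the local node by
# appending its name to a cluster property keyed by the pacemaker node name,
# producing entries like the pcmk-controllerN lines in the output further down.
update_my_node_property() {
    local service="$1"
    local node current
    node=$(crm_node -n)    # local pacemaker node name
    current=$(pcs property show "$node" | awk -v n="$node" -F': ' '$1 ~ n {print $2}')
    pcs property set --force "${node}=${current:+${current},}${service}"
}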

###script fails with errors such as:
Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached
Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached
Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached
Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached
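
The shape of these strings suggests (an assumption, since the helper script itself is not attached here) that the helper rewrites the per-node "name: value" lines of pcs property show into "name=value,<service>" pairs, and the new Node Attributes lines break that because their value already contains an '='. One way a string of exactly this shape can be produced:

# Illustrative only: rewriting the per-node lines of "pcs property show"
# (full output below) into "name=value,memcached" pairs. The Node Attributes
# lines added by the updated rabbitmq resource agent carry an '=' inside the
# value, so the rebuilt string is not a valid cluster property assignment.
pcs property show | sed -n 's/^ *\(pcmk-[^:]*\): \(.*\)$/\1=\2,memcached/p'
# ...emits, among others:
# pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller2,memcached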


15:42:13 $ pcs property show
Cluster Properties:
 ceilometer: running
 cinder: running
 cluster-infrastructure: corosync
 cluster-name: openstack
 dc-version: 1.1.13-10.el7_2.4-44eb2dd
 glance: running
 have-watchdog: false
 heat: running
 horizon: running
 keystone: running
 mysqlinit: running
 neutron: running
 nosql: running
 nova: running
 pcmk-controller1: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance
 pcmk-controller2: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance,haproxy
 pcmk-controller3: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance,memcached
 rabbitmq: running
 stonith-enabled: false
Node Attributes:
 pcmk-controller1: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller1
 pcmk-controller2: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller2
 pcmk-controller3: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3
root.com:~ ( controller1 )


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Update OSP6.
2. On the Foreman puppet run, the script fails with the errors above.

Actual results:
puppet run fails

Expected results:
no failure

Additional info:

may be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1346164

Comment 2 Jeremy 2016-10-20 21:05:11 UTC
15:53:22 $ rpm -qa | grep pacemaker
pacemaker-libs-1.1.13-10.el7_2.4.x86_64
pacemaker-cluster-libs-1.1.13-10.el7_2.4.x86_64
pacemaker-1.1.13-10.el7_2.4.x86_64
pacemaker-cli-1.1.13-10.el7_2.4.x86_64

15:53:28 $ rpm -qa | grep puppet
puppet-3.6.2-2.el7.noarch

16:01:02 $ rpm -qa | grep rabbitmq
rabbitmq-server-3.3.5-22.el7ost.noarch

Comment 3 Jeremy 2016-10-21 13:13:17 UTC
Looking back at:

https://bugzilla.redhat.com/show_bug.cgi?id=1346164

I don't think that BZ was triaged appropriately, and I think this is actually a bug in the ha-all-in-one-util.bash script.

Comment #2 from Jason Guiditta suggests that the version of OFI has already been tested per:

https://bugzilla.redhat.com/show_bug.cgi?id=1290684

However, that BZ was opened in 2015. Based on:

https://rhn.redhat.com/errata/RHBA-2016-0556.html

It looks like the following code was added to the resource-agents package in March of 2016:

https://git.centos.org/blob/rpms!resource-agents/f784e8cb080c453bc9c1cafa447fb125da652761/SOURCES!bz1311180-rabbitmq-cluster-forget-stopped-cluster-nodes.patch;jsessionid=osbxg57k6dho155jb94ooijjw#L15

Which adds the attribute:

+# this attr represents the current active local rmq node name.
+# when rmq stops or the node is fenced, this attr disappears
 RMQ_CRM_ATTR_COOKIE="rmq-node-attr-${OCF_RESOURCE_INSTANCE}"
+# this attr represents the last known active local rmq node name
+# when rmp stops or the node is fenced, the attr stays forever so
+# we can continue to map an offline pcmk node to it's rmq node name
+# equivalent. 
+RMQ_CRM_ATTR_COOKIE_LAST_KNOWN="rmq-node-attr-last-known-${OCF_RESOURCE_INSTANCE}"

To me, when I read that, it means this attribute will be added to the cluster so that, if the node goes offline, pacemaker still knows its rmq node name. So having this attribute on the cluster is not an indication of a problem, but simply steady state. To me, that makes the suggested fix appropriate.
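
The attribute itself can be inspected (and, if ever needed, removed) with standard pacemaker tooling; the node and attribute names below are taken from the pcs output in the description:

# Query the persistent "last known" attribute for one node; it is expected to
# survive an rmq stop or a fence, which is exactly what the patch comment says.
crm_attribute --node pcmk-controller1 \
    --name rmq-node-attr-last-known-rabbitmq-server --query
# Deleting it by hand would be:
#   crm_attribute --node pcmk-controller1 \
#       --name rmq-node-attr-last-known-rabbitmq-server --delete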

Comment 4 Jeremy 2016-10-21 21:36:37 UTC
Seems the fix is in this: https://bugzilla.redhat.com/attachment.cgi?id=1167789&action=diff

Comment 5 Red Hat Bugzilla Rules Engine 2017-02-07 20:45:57 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 6 Chris Jones 2017-10-24 14:05:25 UTC
Has this been an issue beyond RHOS 6?

Comment 7 Scott Lewis 2018-02-22 20:35:50 UTC
OSP6 has been retired, and will not receive further updates. See https://access.redhat.com/support/policy/updates/openstack/platform/ for details.

