Description of problem:

After updating the controller, puppet runs a script that does not recognize the new rabbitmq-server node attribute and fails.

/etc/puppet/environments/production/modules/quickstack/manifests/pacemaker/memcached.pp:

$ cat memcached.pp
class quickstack::pacemaker::memcached {
  include ::memcached
  include quickstack::pacemaker::common

  class {'::quickstack::firewall::memcached':}

  Exec['wait-for-settle'] -> Exec['pcs-memcached-server-set-up-on-this-node']

  Service['memcached']
  ->
  exec {"pcs-memcached-server-set-up-on-this-node":
    command => "/tmp/ha-all-in-one-util.bash update_my_node_property memcached",
  }
  ->
  exec {"all-memcached-nodes-are-up":
    timeout   => 3600,
    tries     => 360,
    try_sleep => 10,
    command   => "/tmp/ha-all-in-one-util.bash all_members_include memcached",
  }
  ->
  quickstack::pacemaker::resource::generic { 'memcached':
    clone_opts => "interleave=true",
  }
  ->
  Anchor['pacemaker ordering constraints begin']
}

The script fails with errors such as (repeated four times):

Invalid Property: pcmk-controller2=rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached

15:42:13 $ pcs property show
Cluster Properties:
 ceilometer: running
 cinder: running
 cluster-infrastructure: corosync
 cluster-name: openstack
 dc-version: 1.1.13-10.el7_2.4-44eb2dd
 glance: running
 have-watchdog: false
 heat: running
 horizon: running
 keystone: running
 mysqlinit: running
 neutron: running
 nosql: running
 nova: running
 pcmk-controller1: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance
 pcmk-controller2: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance,haproxy
 pcmk-controller3: memcached,haproxy,mysqlinit,rabbitmq,keystone,nova,cinder,neutron,heat,horizon,nosql,ceilometer,glance,memcached
 rabbitmq: running
 stonith-enabled: false
Node Attributes:
 pcmk-controller1: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller1
 pcmk-controller2: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller2
 pcmk-controller3: rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3

root.com:~ ( controller1 )

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Update OSP6.
2. On the Foreman puppet run, the script fails with the errors above.

Actual results:
puppet run fails

Expected results:
no failure

Additional info:
May be related to: https://bugzilla.redhat.com/show_bug.cgi?id=1346164
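The failure mode can be sketched in bash. This is a hypothetical illustration, not the actual ha-all-in-one-util.bash code: any per-node value read back from "pcs property show" that contains an "=" is a rabbitmq resource-agent node attribute rather than a quickstack service list, and re-submitting it as a property produces exactly the "Invalid Property" errors quoted above.

```shell
#!/bin/bash
# Hypothetical sketch of the failure mode -- NOT the real
# ha-all-in-one-util.bash logic. A node value containing "=" is a
# rabbitmq node attribute, not a comma-separated quickstack service
# list, so treating it as a property is what fails.
classify_node_value() {
  local node=$1 value=$2
  if [[ "$value" == *=* ]]; then
    echo "invalid property: ${node}=${value}"
  else
    echo "valid property: ${node}=${value}"
  fi
}

classify_node_value pcmk-controller2 \
  'rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached'
classify_node_value pcmk-controller1 'memcached,haproxy,mysqlinit'
```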
15:53:22 $ rpm -qa | grep pacemaker
pacemaker-libs-1.1.13-10.el7_2.4.x86_64
pacemaker-cluster-libs-1.1.13-10.el7_2.4.x86_64
pacemaker-1.1.13-10.el7_2.4.x86_64
pacemaker-cli-1.1.13-10.el7_2.4.x86_64

15:53:28 $ rpm -qa | grep puppet
puppet-3.6.2-2.el7.noarch

16:01:02 $ rpm -qa | grep rabbitmq
rabbitmq-server-3.3.5-22.el7ost.noarch
Looking back at https://bugzilla.redhat.com/show_bug.cgi?id=1346164, I don't think that BZ was triaged appropriately; I think this actually is a bug in the ha-all-in-one.bash script.

Comment #2 from Jason Guiditta suggests that this version of OFI has already been tested, per https://bugzilla.redhat.com/show_bug.cgi?id=1290684. However, that BZ was opened in 2015. Based on https://rhn.redhat.com/errata/RHBA-2016-0556.html, it looks like the following code was added to the resource-agents package in March of 2016:

https://git.centos.org/blob/rpms!resource-agents/f784e8cb080c453bc9c1cafa447fb125da652761/SOURCES!bz1311180-rabbitmq-cluster-forget-stopped-cluster-nodes.patch;jsessionid=osbxg57k6dho155jb94ooijjw#L15

which adds the attribute:

+# this attr represents the current active local rmq node name.
+# when rmq stops or the node is fenced, this attr disappears
 RMQ_CRM_ATTR_COOKIE="rmq-node-attr-${OCF_RESOURCE_INSTANCE}"
+# this attr represents the last known active local rmq node name
+# when rmp stops or the node is fenced, the attr stays forever so
+# we can continue to map an offline pcmk node to it's rmq node name
+# equivalent.
+RMQ_CRM_ATTR_COOKIE_LAST_KNOWN="rmq-node-attr-last-known-${OCF_RESOURCE_INSTANCE}"

As I read that, this attribute is added to the cluster so that if the node goes offline, pacemaker still knows the node name. So the presence of this attribute is not an indication of a problem, just steady state, which makes the suggested fix appropriate.
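If the last-known attribute really is benign steady state, a fix along these lines would work: strip rmq-node-attr-* items out of a node's comma-separated value before any membership check. This is only a sketch under that assumption; the function name is illustrative and not taken from the actual patch.

```shell
#!/bin/bash
# Sketch of a possible fix (illustrative, not the actual attachment):
# drop rabbitmq resource-agent bookkeeping attributes from a node's
# comma-separated property value, keeping only the service list.
filter_node_services() {
  local raw=$1 item
  local -a items kept=()
  IFS=',' read -ra items <<< "$raw"
  for item in "${items[@]}"; do
    [[ "$item" == rmq-node-attr-* ]] && continue  # skip bookkeeping attrs
    kept+=("$item")
  done
  (IFS=','; echo "${kept[*]}")
}

filter_node_services \
  'rmq-node-attr-last-known-rabbitmq-server=rabbit@lb-backend-controller3,memcached'
# prints: memcached
```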
It seems the fix is in this diff: https://bugzilla.redhat.com/attachment.cgi?id=1167789&action=diff
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
has this been an issue beyond RHOS 6?
OSP6 has been retired, and will not receive further updates. See https://access.redhat.com/support/policy/updates/openstack/platform/ for details.