Description of problem:

- Rabbitmq is not starting on one of the nodes in the cluster.
- This is a freshly deployed RHOSP10 environment.

Below is the error log from the failed controller node (c1f-ops-ctlc22):

~~~
=ERROR REPORT==== 9-May-2018::11:10:25 ===
Error on AMQP connection <0.405.0> (10.20.x.3x:47814 -> 10.20.x.3x:5672, vhost: '/', user: 'guest', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"

=ERROR REPORT==== 9-May-2018::11:10:25 ===
Error on AMQP connection <0.429.0> (10.20.x.3x:47942 -> 10.20.x.3x:5672, vhost: '/', user: 'guest', state: running), channel 0:
operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
~~~

~~~
[root@c1f-ops-ctlc21 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: c1f-ops-ctlc20 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
Last updated: Wed May 9 11:36:27 2018
Last change: Wed May 9 11:23:15 2018 by hacluster via crmd on c1f-ops-ctlc20

3 nodes configured
19 resources configured

Online: [ c1f-ops-ctlc20 c1f-ops-ctlc21 c1f-ops-ctlc22 ]

Full list of resources:

 ip-10.20.184.250	(ocf::heartbeat:IPaddr2):	Started c1f-ops-ctlc20
 ip-10.20.185.250	(ocf::heartbeat:IPaddr2):	Started c1f-ops-ctlc21
 ip-10.20.186.250	(ocf::heartbeat:IPaddr2):	Started c1f-ops-ctlc22
 Clone Set: haproxy-clone [haproxy]
     Started: [ c1f-ops-ctlc20 c1f-ops-ctlc21 c1f-ops-ctlc22 ]
 ip-10.20.182.33	(ocf::heartbeat:IPaddr2):	Started c1f-ops-ctlc20
 Master/Slave Set: galera-master [galera]
     Masters: [ c1f-ops-ctlc20 c1f-ops-ctlc21 c1f-ops-ctlc22 ]
 ip-10.20.176.250	(ocf::heartbeat:IPaddr2):	Started c1f-ops-ctlc21
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ c1f-ops-ctlc20 c1f-ops-ctlc21 ]
     Stopped: [ c1f-ops-ctlc22 ]
 ip-10.20.186.12	(ocf::heartbeat:IPaddr2):	Started c1f-ops-ctlc22
 Master/Slave Set: redis-master [redis]
     Masters: [ c1f-ops-ctlc20 ]
     Slaves: [ c1f-ops-ctlc21 c1f-ops-ctlc22 ]
 openstack-cinder-volume	(systemd:openstack-cinder-volume):	Started c1f-ops-ctlc20

Failed Actions:
* rabbitmq_start_0 on c1f-ops-ctlc22 'unknown error' (1): call=108, status=complete, exitreason='none',
    last-rc-change='Wed May 9 11:23:22 2018', queued=0ms, exec=10393ms

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

[root@c1f-ops-ctlc21 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@c1f-ops-ctlc21' ...
[{nodes,[{disc,['rabbit@c1f-ops-ctlc20','rabbit@c1f-ops-ctlc21']}]},
 {running_nodes,['rabbit@c1f-ops-ctlc20','rabbit@c1f-ops-ctlc21']},
 {cluster_name,<<"rabbit.tesoro.it">>},
 {partitions,[]},
 {alarms,[{'rabbit@c1f-ops-ctlc20',[]},{'rabbit@c1f-ops-ctlc21',[]}]}]
~~~

Version-Release number of selected component (if applicable):

[gkadam@collab-shell c1f-ops-ctlc22.coll.tesoro.it]$ grep -ri rabbit installed-rpms
puppet-rabbitmq-5.6.0-2.el7ost.noarch    Thu Feb 22 18:26:45 2018
rabbitmq-server-3.6.3-7.el7ost.noarch    Thu Feb 22 18:21:40 2018

Actual results:
Rabbitmq fails on one of the controller nodes.

Expected results:
Rabbitmq should start on all the controller nodes.

Additional info/Steps performed:

We tried to restart rabbitmq-clone as follows (a command-level sketch of these steps is included below):
- Unmanaged rabbitmq-clone on one of the controller nodes.
- Killed all rabbitmq processes on all the controller nodes.
- Managed rabbitmq-clone again and ran a pcs resource cleanup.
- We found a stale epmd process on the failed controller node, so we killed all epmd processes and repeated the same procedure to start rabbitmq-clone. It started successfully on all the nodes, but later failed again on the same node (c1f-ops-ctlc22) once we restarted the OpenStack nova services on it.
- We then tried again: unmanage rabbitmq-clone, stop all rabbitmq processes, rm -rf /var/lib/rabbitmq/mnesia/*, then pcs resource manage rabbitmq-clone. This did not work either, so we unmanaged rabbitmq-clone once more and tried to start the rabbitmq application on the faulty node (c1f-ops-ctlc22) manually; this also failed - the node was unable to start the rabbitmq application and join the master node.

We have asked for the Erlang logs to be uploaded and request that the Engineering team analyze them to narrow down the issue, because we have tried all the usual ways to recover the failing rabbitmq and it keeps failing on the same node.
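For reference, here is a minimal command-level sketch of the recovery steps above, assuming they are run as root on the controllers. The target node name (rabbit@c1f-ops-ctlc20) is taken from the cluster status output; the process patterns used for killing are assumptions, not the exact commands run on this environment:

~~~
# 1. Take the resource out of Pacemaker's control and stop all RabbitMQ/epmd processes.
pcs resource unmanage rabbitmq-clone
pkill -u rabbitmq -f beam.smp        # kill the Erlang VM running RabbitMQ (assumed process pattern)
epmd -kill                           # remove the stale epmd daemon found on c1f-ops-ctlc22

# 2. On the faulty node only, wipe the Mnesia database so the node can re-join cleanly.
rm -rf /var/lib/rabbitmq/mnesia/*

# 3. Hand the resource back to Pacemaker and clear the failed-action history.
pcs resource manage rabbitmq-clone
pcs resource cleanup rabbitmq-clone

# 4. Manual join attempt on c1f-ops-ctlc22 (the step that kept failing in this case).
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@c1f-ops-ctlc20
rabbitmqctl start_app
rabbitmqctl cluster_status
~~~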
Ganesh, please try this build - rabbitmq-server-3.6.3-10.el7ost. It should fix your issue.
A yum repository for the build of rabbitmq-server-3.6.3-10.el7ost (task 16274921) is available at:

http://brew-task-repos.usersys.redhat.com/repos/official/rabbitmq-server/3.6.3/10.el7ost/

You can install the rpms locally by putting this .repo file in your /etc/yum.repos.d/ directory:

http://brew-task-repos.usersys.redhat.com/repos/official/rabbitmq-server/3.6.3/10.el7ost/rabbitmq-server-3.6.3-10.el7ost.repo

RPMs and build logs can be found in the following locations:
http://brew-task-repos.usersys.redhat.com/repos/official/rabbitmq-server/3.6.3/10.el7ost/noarch/

The full list of available rpms is:
http://brew-task-repos.usersys.redhat.com/repos/official/rabbitmq-server/3.6.3/10.el7ost/noarch/rabbitmq-server-3.6.3-10.el7ost.src.rpm
http://brew-task-repos.usersys.redhat.com/repos/official/rabbitmq-server/3.6.3/10.el7ost/noarch/rabbitmq-server-3.6.3-10.el7ost.noarch.rpm

Build output will be available for the next 21 days.

If you wish to stop receiving these emails, please email: Mike Bonnet <mikeb>

Thank you,
The Brew Task Repos System
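For completeness, a hedged sketch of how the test build could be pulled in on a controller using the repo file linked above (standard curl/yum usage; run as root, and apply to each controller in turn):

~~~
# Drop the task repo into yum's config and update rabbitmq-server to the test build.
curl -o /etc/yum.repos.d/rabbitmq-server-3.6.3-10.el7ost.repo \
  http://brew-task-repos.usersys.redhat.com/repos/official/rabbitmq-server/3.6.3/10.el7ost/rabbitmq-server-3.6.3-10.el7ost.repo
yum update -y rabbitmq-server

# Verify the installed version.
rpm -q rabbitmq-server    # expected: rabbitmq-server-3.6.3-10.el7ost.noarch
~~~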
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex
I don't think this particular ticket requires any documentation, so I'm going to set requires_doc_text to '-'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2671