Bug 1384571

Summary:	OSP9 - rabbitmq cluster_status shows nodedown alerts, list_queues / list_connections hang
Product:	Red Hat OpenStack	Reporter:	Pratik Pravin Bandarkar <pbandark>
Component:	openstack-tripleo-heat-templates	Assignee:	Michele Baldessari <michele>
Status:	CLOSED ERRATA	QA Contact:	Asaf Hirshberg <ahirshbe>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	9.0 (Mitaka)	CC:	apevec, bschmaus, cpaquin, fdinitto, jeckersb, jjoyce, jschluet, jthomas, lhh, mburns, michele, pbandark, plemenko, rhel-osp-director-maint, slinaber, srevivo, tvignaud, vcojot
Target Milestone:	async	Keywords:	Triaged, ZStream
Target Release:	9.0 (Mitaka)
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-2.0.0-37.el7ost	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1386611 1391547 1391550		Environment:
Last Closed:	2016-12-21 16:51:08 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1391547, 1391550

Description Pratik Pravin Bandarkar 2016-10-13 14:52:34 UTC

Description of problem:

- `rabbitmqctl cluster_status` shows nodedown alerts.
- `rabbitmqctl list_queues` and `rabbitmqctl list_connections` hang.
- `rabbitmqctl node_health_check` fails with an error.

* There is no issue when performing operations on the RHOS setup (from Horizon or the CLI), i.e. the RHOS environment itself is functioning as expected.

<snip>
sudo rabbitmqctl node_health_check -n rabbit@node1
Checking health of node 'rabbit@node1' ...
Heath check failed:
health check of node 'rabbit@node1' fails: nodedown
</snip>

- The API network is on IPv6.
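Because list_queues / list_connections hang rather than fail, a minimal sketch like the following can bound each rabbitmqctl call so that a hang is reported instead of blocking the shell. It assumes GNU coreutils `timeout`, which exits with status 124 when the limit expires; `run_bounded` is a hypothetical helper name, not part of rabbitmqctl.

```shell
#!/bin/sh
# Sketch: run a command with a time limit so that a hanging rabbitmqctl call
# (e.g. list_queues / list_connections) is reported instead of blocking.
# Assumes GNU coreutils `timeout`, which exits with status 124 on expiry.
# run_bounded is a hypothetical helper name, not part of RabbitMQ.

run_bounded() {
    limit="$1"
    shift
    # Run the remaining arguments as the command, killed after $limit seconds.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "TIMEOUT after ${limit}s: $*" >&2
    fi
    # Propagate the original exit status (124 on timeout).
    return "$status"
}

# Example usage on a controller (node name taken from this report):
#   run_bounded 30 rabbitmqctl node_health_check -n rabbit@node1
#   run_bounded 30 rabbitmqctl list_queues
#   run_bounded 30 rabbitmqctl list_connections
```

The 30-second limit in the usage comments is an arbitrary value chosen for illustration.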


Version-Release number of selected component (if applicable):
RHOS9
rabbitmq-server-3.6.3-5.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
rabbitmqctl cluster_status shows nodedown alerts; rabbitmqctl list_queues and list_connections hang.

Expected results:
rabbitmqctl commands should complete without errors or hangs.

Additional info:

Comment 26 Asaf Hirshberg 2016-12-07 05:57:48 UTC

Pratik,

Are the nodedown alerts shown right after the deployment finishes, or only after some time (or after operations/workload)?

Comment 29 Asaf Hirshberg 2016-12-08 06:45:20 UTC

Verified using openstack-tripleo-heat-templates-liberty-2.0.0-41.el7ost.noarch

Tested the status of RabbitMQ in the following scenarios:
- after the deployment finished
- after cluster operations
- after/during OpenStack operations using Rally (creating and deleting instances 20 times)
Result: all passed.

[root@overcloud-controller-1 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller-1' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0',
                'rabbit@overcloud-controller-1',
                'rabbit@overcloud-controller-2']}]},
 {running_nodes,['rabbit@overcloud-controller-2',
                 'rabbit@overcloud-controller-0',
                 'rabbit@overcloud-controller-1']},
 {cluster_name,<<"rabbit">>},
 {partitions,[]},
 {alarms,[{'rabbit@overcloud-controller-2',[]},
          {'rabbit@overcloud-controller-0',[]},
          {'rabbit@overcloud-controller-1',[]}]}]
[root@overcloud-controller-1 ~]# rabbitmqctl node_health_check
Checking health of node 'rabbit@overcloud-controller-1' ...
Health check passed
[root@overcloud-controller-1 ~]#
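To compare cluster_status output like the above against a failing node, a rough sketch such as the following could be used. It assumes the RabbitMQ 3.6.x Erlang-term layout shown above, and that a healthy node appears in the alarms list as {'node',[]}; any other value (such as the nodedown alerts from the original report) is flagged. `cluster_problems` is a hypothetical helper, and the parsing is regex-based rather than a real Erlang term parser.

```shell
#!/bin/sh
# Sketch: read `rabbitmqctl cluster_status` output (3.6.x layout) on stdin and
# print one line per problem found:
#   NOT_RUNNING <node>  member listed under nodes but absent from running_nodes
#   ALARM <node>        node's alarms entry is something other than []
# Regex-based, not a real Erlang parser; assumes `grep -o` (GNU/BSD grep).
# cluster_problems is a hypothetical helper name.

cluster_problems() {
    # Flatten the multi-line Erlang term into one line without spaces.
    flat=$(tr -d ' \n')

    # Cluster members: everything between "{nodes," and "},{running_nodes,".
    members=$(printf '%s' "$flat" | sed -n 's/.*{nodes,\(.*\)},{running_nodes,.*/\1/p')
    # Running nodes: the contents of the running_nodes list.
    running=$(printf '%s' "$flat" | sed -n 's/.*{running_nodes,\[\([^]]*\)].*/\1/p')
    # Alarms: the list after "{alarms,", which ends the whole term.
    alarms=$(printf '%s' "$flat" | sed -n 's/.*{alarms,\[\(.*\)]}]$/\1/p')

    if [ -z "$members" ]; then
        echo "cluster_problems: could not parse cluster_status output" >&2
        return 2
    fi

    # Names keep their quotes so 'x-1' cannot match inside 'x-10'.
    printf '%s\n' "$members" | grep -o "'[^']*'" | while read -r n; do
        case "$running" in
            *"$n"*) ;;
            *) echo "NOT_RUNNING $n" | tr -d "'" ;;
        esac
    done

    # Each alarms entry looks like {'node',VALUE}; report VALUE other than [].
    printf '%s\n' "$alarms" | grep -o "{'[^']*',[^}]*}" | while read -r e; do
        case "$e" in
            *",[]}") ;;
            *) echo "ALARM $e" | sed "s/^ALARM {'\([^']*\)'.*/ALARM \1/" ;;
        esac
    done
}
```

Usage: `rabbitmqctl cluster_status | cluster_problems`. For the verified output in this comment it prints nothing, since all three controllers are running and have empty alarm lists.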

Comment 31 errata-xmlrpc 2016-12-21 16:51:08 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2983.html