Bug 1384571

Summary: OSP9 - rabbitmq cluster_status shows nodedown alerts, list_queues / list_connections hang
Product: Red Hat OpenStack Reporter: Pratik Pravin Bandarkar <pbandark>
Component: openstack-tripleo-heat-templates Assignee: Michele Baldessari <michele>
Status: CLOSED ERRATA QA Contact: Asaf Hirshberg <ahirshbe>
Severity: urgent Docs Contact:
Priority: high    
Version: 9.0 (Mitaka) CC: apevec, bschmaus, cpaquin, fdinitto, jeckersb, jjoyce, jschluet, jthomas, lhh, mburns, michele, pbandark, plemenko, rhel-osp-director-maint, slinaber, srevivo, tvignaud, vcojot
Target Milestone: async Keywords: Triaged, ZStream
Target Release: 9.0 (Mitaka)   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-2.0.0-37.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1386611 1391547 1391550 Environment:
Last Closed: 2016-12-21 16:51:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1391547, 1391550    

Description Pratik Pravin Bandarkar 2016-10-13 14:52:34 UTC
Description of problem:

- `rabbitmqctl cluster_status` shows nodedown alerts
- `rabbitmqctl list_queues` and `rabbitmqctl list_connections` hang
- `rabbitmqctl node_health_check` fails with an error

* No issues are seen when working with the RHOS setup (from Horizon or the CLI); i.e., the RHOS environment is functioning as expected.

<snip>
sudo rabbitmqctl node_health_check -n rabbit@node1
Checking health of node 'rabbit@node1' ...
Heath check failed:
health check of node 'rabbit@node1' fails: nodedown
</snip>

- The API network is on IPv6.


Version-Release number of selected component (if applicable):
RHOS9
rabbitmq-server-3.6.3-5.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
`rabbitmqctl cluster_status` shows nodedown alerts; `list_queues` and `list_connections` hang.

Expected results:
rabbitmqctl commands should complete without errors or hangs.

Additional info:

Comment 26 Asaf Hirshberg 2016-12-07 05:57:48 UTC
Pratik,

Are the nodedown alerts shown right after the deployment finishes, or only after some time (or after operations/workload)?

Comment 29 Asaf Hirshberg 2016-12-08 06:45:20 UTC
Verified using openstack-tripleo-heat-templates-liberty-2.0.0-41.el7ost.noarch

Tested the status of RabbitMQ in the following scenarios:
- after the deployment finished
- after cluster operations
- during and after OpenStack operations using Rally (create and delete instances, 20 times)
Result: all passed

[root@overcloud-controller-1 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@overcloud-controller-1' ...
[{nodes,[{disc,['rabbit@overcloud-controller-0',
                'rabbit@overcloud-controller-1',
                'rabbit@overcloud-controller-2']}]},
 {running_nodes,['rabbit@overcloud-controller-2',
                 'rabbit@overcloud-controller-0',
                 'rabbit@overcloud-controller-1']},
 {cluster_name,<<"rabbit">>},
 {partitions,[]},
 {alarms,[{'rabbit@overcloud-controller-2',[]},
          {'rabbit@overcloud-controller-0',[]},
          {'rabbit@overcloud-controller-1',[]}]}]
[root@overcloud-controller-1 ~]# rabbitmqctl node_health_check
Checking health of node 'rabbit@overcloud-controller-1' ...
Health check passed
[root@overcloud-controller-1 ~]#
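
The checks above could also be scripted so that a hanging command is reported instead of blocking the terminal. The following is only a sketch, not part of the fix: the function names `classify_check` and `check_rabbit` and the `CHECK_TIMEOUT` variable are hypothetical, and the 30-second default is an arbitrary assumption. It relies on coreutils `timeout`, which exits with status 124 when the time limit is reached.

```shell
#!/bin/bash
# Hypothetical helper: run a command under a time limit and classify the result.
# Prints "ok" (exit 0), "hung" (time limit reached), or "failed" (any other exit status).
classify_check() {
    local limit="$1"
    shift
    local rc=0
    # coreutils timeout exits with 124 when it has to stop the command.
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)   echo "ok" ;;
        124) echo "hung" ;;
        *)   echo "failed" ;;
    esac
}

# Run the rabbitmqctl subcommands named in this report against the local node.
# CHECK_TIMEOUT is an assumed variable name; 30 seconds is an arbitrary default.
check_rabbit() {
    local limit="${CHECK_TIMEOUT:-30}"
    local sub
    for sub in cluster_status node_health_check list_queues list_connections; do
        printf '%-18s %s\n' "$sub" "$(classify_check "$limit" rabbitmqctl "$sub")"
    done
}
```

Running `check_rabbit` as root on a controller would print one line per subcommand. In the state described in this bug, `list_queues` and `list_connections` would be expected to show up as "hung" and `node_health_check` as "failed".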

Comment 31 errata-xmlrpc 2016-12-21 16:51:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2983.html