Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1618439

Summary: RabbitMQ resources fail to start in IPv6 Underlay network deployments
Product: Red Hat OpenStack
Reporter: Anup P <anup7cisco>
Component: rabbitmq-server
Assignee: Peter Lemenkov <plemenko>
Status: CLOSED CURRENTRELEASE
QA Contact: Udi Shkalim <ushkalim>
Severity: medium
Priority: medium
Version: 10.0 (Newton)
CC: anup7cisco, apevec, chjones, jeckersb, lhh, michele, srevivo
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-11-21 17:36:38 UTC
Type: Bug
Regression: ---
Attachments: var_log_rabbitmq&keystone

Description Anup P 2018-08-16 16:27:03 UTC
Created attachment 1476452 [details]
var_log_rabbitmq&keystone

Description of problem:

RabbitMQ resources fail to start in IPv6 Underlay network deployments.

Overcloud deployment completes successfully even though rabbitmq is in a failed state.

Version-Release number of selected component (if applicable):

rhosp-director-images-10.0-20180329.1.el7ost.noarch
rhosp-director-images-ipa-10.0-20180329.1.el7ost.noarch

puppet-rabbitmq-5.6.0-2.el7ost.noarch
rabbitmq-server-3.6.3-7.el7ost.noarch


How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with IPv6 isolated networks
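For reference, an IPv6 isolated-networks deployment is typically driven by the network-isolation-v6 environment file; a sketch of the deploy invocation (exact template paths and the custom network-environment file are deployment-specific and hypothetical here):

```shell
# Sketch only -- template paths and extra environment files vary per deployment.
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
  -e ~/templates/network-environment.yaml  # hypothetical custom env file
```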

Actual results:
Deployment completes, rabbitmq fails

+--------------------------------------+------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+------------+-----------------+----------------------+--------------+
| 26db8c53-980f-44a7-b758-3cb203366c2e | overcloud  | CREATE_COMPLETE | 2018-08-16T10:17:02Z | None         |
+--------------------------------------+------------+-----------------+----------------------+--------------+


[heat-admin@overcloud-controller-0 ~]$ sudo pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: overcloud-controller-0 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum
Last updated: Thu Aug 16 15:14:59 2018
Last change: Thu Aug 16 11:36:42 2018 by root via cibadmin on overcloud-controller-0

1 node configured
11 resources configured

Online: [ overcloud-controller-0 ]

Full list of resources:

 ip-2405.200.1413.3a..1009      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 ip-2405.201.fffb.18c..1008     (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 ip-2405.200.1413.3d..1005      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 ]
 ip-172.18.36.110       (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 Clone Set: rabbitmq-clone [rabbitmq]
     rabbitmq   (ocf::heartbeat:rabbitmq-cluster):      FAILED overcloud-controller-0 (Monitoring)
 ip-2405.200.1413.3a..1002      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-0 ]
 ip-2405.200.1413.3c..1007      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started overcloud-controller-0

Failed Actions:
* rabbitmq_monitor_10000 on overcloud-controller-0 'unknown error' (1): call=280, status=Timed Out, exitreason='',
    last-rc-change='Thu Aug 16 15:12:52 2018', queued=0ms, exec=40006ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


Expected results:
RabbitMQ works fine after overcloud deployment.

Additional info:

[root@overcloud-controller-0 ~]# netstat -ltupn | grep 5672
tcp6       0      0 :::25672                :::*                    LISTEN      4598/beam.smp
[root@overcloud-controller-0 ~]#
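Note that netstat only shows the inter-node/CLI port 25672 listening; the AMQP listener on 5672 is absent, which matches keystone's "Connection refused" errors. A quick way to confirm this from any client is a plain TCP probe; a minimal sketch (the IPv6 address in the comment is the one from the keystone log and is specific to this deployment):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Handles both IPv4 and IPv6 literals via getaddrinfo.
    """
    try:
        for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, proto=socket.IPPROTO_TCP
        ):
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                if s.connect_ex(addr) == 0:
                    return True
        return False
    except socket.gaierror:
        return False

# e.g. port_open("2405:200:1413:3a::100b", 5672) -> expected True on a healthy controller
```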


Attaching a zip file with logs from /var/log/rabbitmq and /var/log/keystone.

===========================================================================

/var/log/keystone/keystone.log

2018-08-16 15:48:59.596 165094 ERROR oslo.messaging._drivers.impl_rabbit [-] [8234ec23-f124-4632-94ef-163e2a3d05bb] AMQP server on 2405:200:1413:3a::100b:5672 is unreachable: [Errno 111] Connection refused. Trying again in 20 seconds. Client port: None
2018-08-16 15:48:59.596 165231 ERROR oslo.messaging._drivers.impl_rabbit [-] [c785eb2b-ecff-4a6c-87e2-e6e9721ef2bb] AMQP server on 2405:200:1413:3a::100b:5672 is unreachable: [Errno 111] Connection refused. Trying again in 20 seconds. Client port: None
2018-08-16 15:48:59.605 165296 ERROR oslo.messaging._drivers.impl_rabbit [-] [a4eaf580-e51f-41aa-8b17-5d5af34b6107] AMQP server on 2405:200:1413:3a::100b:5672 is unreachable: [Errno 111] Connection refused. Trying again in 20 seconds. Client port: None
2018-08-16 15:48:59.613 165107 ERROR oslo.messaging._drivers.impl_rabbit [-] [52ba9d91-a97b-4703-9edf-1daa74e623f6] AMQP server on 2405:200:1413:3a::100b:5672 is unreachable: [Errno 111] Connection refused. Trying again in 20 seconds. Client port: None
2018-08-16 15:49:17.134 165119 INFO oslo.messaging._drivers.impl_rabbit [req-a1c08460-c9b9-4bc3-ad96-c550e10bf6be - - - - -] [9b53c2a4-7e2d-48b4-9900-f741a3c190a1] Reconnected to AMQP server on 2405:200:1413:3a::100b:5672 via [amqp] client with port 53272.
2018-08-16 15:49:17.136 165352 INFO oslo.messaging._drivers.impl_rabbit [-] [a6baa6c3-f0e6-408d-abb9-a83f06080d52] Reconnected to AMQP server on 2405:200:1413:3a::100b:5672 via [amqp] client with port 53218.
2018-08-16 15:49:17.316 165103 INFO keystone.common.wsgi [req-b5d0688d-169c-4ab2-bf6b-c3c5402934b9 35cee3311f3c4d3aa78a5e289a922131 3c53f3f61b1149b4bf481e63ffc5d6b7 - default default] GET http://172.18.36.110:35357/v3/auth/tokens
2018-08-16 15:49:26.702 165335 INFO oslo.messaging._drivers.impl_rabbit [-] [d73255e6-250f-42f1-998b-1c6d7441331f] Reconnected to AMQP server on 2405:200:1413:3a::100b:5672 via [amqp] client with port 53580.
2018-08-16 15:49:27.244 165086 INFO oslo.messaging._drivers.impl_rabbit [-] [443d473f-92b9-4913-b332-52dfbf175c49] Reconnected to AMQP server on 2405:200:1413:3a::100b:5672 via [amqp] client with port 53786.
2018-08-16 15:49:27.245 165106 INFO oslo.messaging._drivers.impl_rabbit [-] [2aef062e-7ab9-4ce9-ad6e-19dc20f3accd] Reconnected to AMQP server on 2405:200:1413:3a::100b:5672 via [amqp] client with port 53756.
2018-08-16 15:49:27.245 165098 INFO oslo.messaging._drivers.impl_rabbit [-] [68065b1b-1c4d-4ed8-a3d6-4076627237f6] Reconnected to AMQP server on 2405:200:1413:3a::100b:5672 via [amqp] client with port 54034.
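The refused-then-reconnected pattern above can be triaged at a glance by counting the two message types per endpoint; a throwaway sketch (the regexes assume the oslo.messaging log format shown above):

```python
import re
from collections import Counter

# Patterns keyed to the oslo.messaging lines shown in the excerpt above.
PATTERNS = {
    "unreachable": re.compile(r"AMQP server on (\S+) is unreachable"),
    "reconnected": re.compile(r"Reconnected to AMQP server on (\S+)"),
}

def summarize(lines):
    """Count unreachable vs. reconnected events per AMQP endpoint."""
    counts = Counter()
    for line in lines:
        for kind, pat in PATTERNS.items():
            m = pat.search(line)
            if m:
                counts[(kind, m.group(1))] += 1
    return counts
```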

==========================================================================

/var/log/rabbitmq/rabbit\@overcloud-controller-0-sasl.log

=CRASH REPORT==== 16-Aug-2018::15:18:54 ===
  crasher:
    initial call: gen:init_it/6
    pid: <0.4509.0>
    registered_name: []
    exception exit: {{badmatch,{error,not_found}},
                     [{rabbit_mirror_queue_master,terminate,2,
                          [{file,"src/rabbit_mirror_queue_master.erl"},
                           {line,194}]},
                      {rabbit_amqqueue_process,terminate_shutdown,2,
                          [{file,"src/rabbit_amqqueue_process.erl"},
                           {line,308}]},
                      {gen_server2,terminate,3,
                          [{file,"src/gen_server2.erl"},{line,1129}]},
                      {proc_lib,wake_up,3,
                          [{file,"proc_lib.erl"},{line,250}]}]}
      in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1132)
    ancestors: [<0.4506.0>,rabbit_amqqueue_sup_sup,rabbit_sup,<0.78.0>]
    messages: []
    links: [<0.11821.0>]
    dictionary: [{{xtype_to_module,direct},rabbit_exchange_type_direct},
                  {rand_seed,
                      {#{max => 288230376151711743,
                         next => #Fun<rand.8.51808955>,
                         type => exsplus,
                         uniform => #Fun<rand.9.51808955>,
                         uniform_n => #Fun<rand.10.51808955>},
                       [15638301245953242|56416654922598978]}},
                  {process_name,
                      {rabbit_amqqueue_process,
                          {resource,<<"/">>,queue,
                              <<"neutron-vo-Trunk-1.0.overcloud-compute-1.localdomain">>}}},
                  {guid,{{909981763,2224745146,3461475620,3305787263},0}}]
    trap_exit: true
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 2703
  neighbours:
    neighbour: [{pid,<0.12396.0>},
                  {registered_name,[]},
                  {initial_call,
                      {gen,init_it,
                          ['Argument__1','Argument__2','Argument__3',
                           'Argument__4','Argument__5','Argument__6']}},
                  {current_function,{gen_server2,process_next_msg,1}},
                  {ancestors,
                      [<0.11821.0>,<0.4509.0>,<0.4506.0>,
                       rabbit_amqqueue_sup_sup,rabbit_sup,<0.78.0>]},
                  {messages,[]},
                  {links,[<0.11821.0>]},
                  {dictionary,
                      [{rand_seed,
                           {#{max => 288230376151711743,
                              next => #Fun<rand.8.51808955>,
                              type => exsplus,
                              uniform => #Fun<rand.9.51808955>,
                              uniform_n => #Fun<rand.10.51808955>},
                            [232621642269437626|271767594708699019]}},
                       {process_name,
                           {gm,{resource,<<"/">>,queue,
                                   <<"neutron-vo-Trunk-1.0.overcloud-compute-1.localdomain">>}}}]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,376},
                  {stack_size,7},
                  {reductions,350}]
    neighbour: [{pid,<0.11821.0>},
                  {registered_name,[]},
                  {initial_call,
                      {gen,init_it,
                          ['Argument__1','Argument__2','Argument__3',
                           'Argument__4','Argument__5','Argument__6']}},
                  {current_function,{erlang,hibernate,3}},
                  {ancestors,
                      [<0.4509.0>,<0.4506.0>,rabbit_amqqueue_sup_sup,
                       rabbit_sup,<0.78.0>]},
                  {messages,[]},
                  {links,[<0.4509.0>,<0.12396.0>]},
                  {dictionary,
                      [{rand_seed,
                           {#{max => 288230376151711743,
                              next => #Fun<rand.8.51808955>,
                              type => exsplus,
                              uniform => #Fun<rand.9.51808955>,
                              uniform_n => #Fun<rand.10.51808955>},
                            [42343817096104206|173907689134262040]}},
                       {process_name,
                           {rabbit_mirror_queue_coordinator,
                               {resource,<<"/">>,queue,
                                   <<"neutron-vo-Trunk-1.0.overcloud-compute-1.localdomain">>}}}]},
                  {trap_exit,false},
                  {status,waiting},
                  {heap_size,376},
                  {stack_size,0},
                  {reductions,166}]

=CRASH REPORT==== 16-Aug-2018::15:18:54 ===

=======================================================

Maybe similar to
https://bugzilla.redhat.com/show_bug.cgi?id=1358311
https://bugzilla.redhat.com/show_bug.cgi?id=1347802

Comment 1 Michele Baldessari 2018-08-17 08:03:15 UTC
Can we please get a full sosreport of controller-0 and also the full overcloud deploy command line (+ custom templates used)?

Comment 3 Peter Lemenkov 2018-08-17 11:03:48 UTC
(In reply to Anup P from comment #0)

> =CRASH REPORT==== 16-Aug-2018::15:18:54 ===
>   crasher:
>     initial call: gen:init_it/6
>     pid: <0.4509.0>
>     registered_name: []
>     exception exit: {{badmatch,{error,not_found}},
>                      [{rabbit_mirror_queue_master,terminate,2,
>                           [{file,"src/rabbit_mirror_queue_master.erl"},
>                            {line,194}]},
>                       {rabbit_amqqueue_process,terminate_shutdown,2,
>                           [{file,"src/rabbit_amqqueue_process.erl"},
>                            {line,308}]},
>                       {gen_server2,terminate,3,
>                           [{file,"src/gen_server2.erl"},{line,1129}]},
>                       {proc_lib,wake_up,3,
>                           [{file,"proc_lib.erl"},{line,250}]}]}
>       in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1132)

This particular issue was addressed in bug 1597245 with rabbitmq-server-3.6.3-10.el7ost, so please consider upgrading. Unfortunately it's not related to IPv6, so there might be something else going on.

SOS-reports would be helpful.

Comment 4 Anup P 2018-08-17 18:13:26 UTC
Added the sosreport and templates, along with the deploy command, at the link below:

https://www.dropbox.com/s/39xmx30f5yzvq24/1618439_rabbitmq_ipv6_.zip?dl=0

Comment 5 Michele Baldessari 2018-11-21 17:36:38 UTC
The rabbitmq-server version in use was rabbitmq-server-3.6.3-7.el7ost.noarch.

This should be fixed (as per c#3) by rabbitmq-server-3.6.3-10.el7ost.noarch