Bug 1780399 - Minor update fails due to RabbitMQ loopback_users configuration in latest RabbitMQ puppet module
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
low
medium
Target Milestone: z11
Target Release: 13.0 (Queens)
Assignee: Luca Miccini
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Duplicates: 1789147
Depends On:
Blocks:
Reported: 2019-12-05 21:16 UTC by chrisbro@redhat.com
Modified: 2023-10-06 18:51 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-8.4.1-26.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-10 11:23:24 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-3213 0 None None None 2022-08-23 18:36:43 UTC
Red Hat Product Errata RHBA-2020:0760 0 None None None 2020-03-10 11:24:00 UTC

Description chrisbro@redhat.com 2019-12-05 21:16:05 UTC
Description of problem:

Minor update fails due to the RabbitMQ loopback_users configuration in the latest RabbitMQ puppet module.

Version-Release number of selected component (if applicable):



How reproducible:

Repeatedly on overcloud update and subsequent stack update, unless the mitigating hieradata has been set.

Steps to Reproduce:

1. During an update from OSP13z6 to OSP13z9 in our QA environment, we noticed that any API operation that involved communication over the AMQP bus would fail.

Checking logs, we found the following error:

"ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN"

We eventually traced this back to an extra line in rabbitmq.config:

{loopback_users, [<<"guest">>]},

This appears to be similar to this bug:

https://bugs.launchpad.net/tripleo/+bug/1587961

Which was fixed by:

https://review.opendev.org/#/c/324016/

It appears that this change was abandoned as the puppet-rabbitmq change that added the extra line had not been merged.

This pull request https://github.com/voxpupuli/puppet-rabbitmq/pull/699 has now been merged:

 https://github.com/voxpupuli/puppet-rabbitmq/commit/0ada399b330fbc84a1a1179ad0e827e0735e1912

It appears to have arrived in the OSP13 version of openstack-puppet with the z9 release. When the stack is updated, the new Puppet manifests write the above line to the RabbitMQ config, which blocks all clients connecting as 'guest' over any interface other than localhost.

As all OSP services use 'guest' to connect to AMQP, this causes an outage.
 
To work around this issue we have added:

 ControllerExtraConfig:
   rabbitmq::loopback_users: []

which sets the line back to:
 
 {loopback_users, []},
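
A quick way to confirm which form of the line a node ended up with is to scan the rendered config. The following is a minimal sketch, not part of the original report: the helper name `loopback_user_entries` and the sample text are hypothetical, and the regular expressions only handle the simple single-line form shown above (they do not skip commented-out lines).

```python
import re

# Matches one {loopback_users, [...]} tuple in rabbitmq.config text.
LOOPBACK_RE = re.compile(r'\{loopback_users,\s*\[(.*?)\]\}', re.DOTALL)
# Matches one Erlang binary string such as <<"guest">>.
USER_RE = re.compile(r'<<"(.*?)">>')


def loopback_user_entries(config_text):
    """Return one list of user names per loopback_users entry found."""
    return [USER_RE.findall(body) for body in LOOPBACK_RE.findall(config_text)]


# Hypothetical sample resembling the problematic config.
sample = '''
[
  {rabbit, [
    {loopback_users, [<<"guest">>]},
    {tcp_listeners, [{"192.168.24.14", 5672}]}
  ]}
].
'''

entries = loopback_user_entries(sample)
print(entries)
```

An entry that lists "guest" means that user is limited to loopback connections; an empty list means no user is restricted.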


Actual results:


Expected results:

To resolve this for everyone we probably need to bring back https://review.opendev.org/#/c/324016/

Additional info:

Quality Assurance environment, on z6 to z9 update.

Comment 1 Luca Miccini 2019-12-06 08:34:18 UTC
I am not sure the duplicate loopback_users list is the root cause behind "ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN".

I have a rabbit cluster with:


% This file managed by Puppet
% Template Path: rabbitmq/templates/rabbitmq.config
[
  {rabbit, [
    {loopback_users, [<<"guest">>]},
    {tcp_listen_options, [
         {keepalive,     true},
         {backlog,       128},
         {nodelay,       true},
         {linger,        {true, 0}},
         {exit_on_close, false}
    ]},
    {collect_statistics_interval, 30000},
    {tcp_listeners, [{"192.168.24.14", 5672}]},
    {cluster_partition_handling, ignore},
    {loopback_users, []},
    {queue_master_locator, <<"min-masters">>},
    {default_user, <<"guest">>},
    {default_pass, <<"7WlemN6GGJxrDbGKNMyrCVXfV">>}
  ]},
  {kernel, [
    {inet_dist_listen_max, 25672},
    {inet_dist_listen_min, 25672},
    {inet_dist_use_interface, {192,168,24,14}},
    {net_ticktime, 15}
  ]}
,
  {rabbitmq_management, [
    {rates_mode, none}
,    {listener, [
      {ip, "127.0.0.1"},
      {port, 15672}
    ]}
  ]}
].
% EOF


and if I run the following:

#!/usr/bin/env python

import sys
import socket
from kombu import Connection

host = sys.argv[1]
port = 5672
user = "guest"
password = sys.argv[2]
vhost = "/"
url = 'amqp://{0}:{1}@{2}:{3}/{4}'.format(user, password, host, port, vhost)
with Connection(url) as c:
    try:
        c.connect()
    except socket.error:
        raise ValueError("Received socket.error, "
                         "rabbitmq server probably isn't running")
    except IOError:
        raise ValueError("Received IOError, probably bad credentials")
    else:
        print("Credentials are valid")



(undercloud) [stack@undercloud ~]$ python py 192.168.24.14 7WlemN6GGJxrDbGKNMyrCVXfV
Credentials are valid

(undercloud) [stack@undercloud ~]$ python py 192.168.24.14 bogus
Traceback (most recent call last):
  File "py", line 18, in <module>
    c.connect()
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 261, in connect
    return self.connection
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 802, in connection
    self._connection = self._establish_connection()
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 757, in _establish_connection
    conn = self.transport.establish_connection()
  File "/usr/lib/python3.6/site-packages/kombu/transport/pyamqp.py", line 130, in establish_connection
    conn.connect()
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 313, in connect
    self.drain_events(timeout=self.connect_timeout)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 500, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 506, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/lib/python3.6/site-packages/amqp/method_framing.py", line 55, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 510, in on_inbound_method
    method_sig, payload, content,
  File "/usr/lib/python3.6/site-packages/amqp/abstract_channel.py", line 126, in dispatch_method
    listener(*args)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 639, in _on_close
    (class_id, method_id), ConnectionError)
amqp.exceptions.AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.


so imho there must be something else at play.

Comment 7 Luca Miccini 2019-12-12 06:27:28 UTC
The issue here is the puppet-rabbitmq change coupled with this snippet present in the templates:

parameter_defaults:
  ControllerExtraConfig:
    rabbitmq_config_variables:
      hipe_compile: true


This entirely overrides the default rabbitmq_config_variables content:

...
            rabbitmq_config_variables:
              cluster_partition_handling: 'ignore'
              queue_master_locator: '<<"min-masters">>'
              loopback_users: '[]'
...

because puppet-tripleo reads it from hiera:

class tripleo::profile::base::rabbitmq (
  $certificate_specs             = {},
  $config_variables              = hiera('rabbitmq_config_variables'),


By the way, in addition to loopback_users, these options are also gone from the rabbitmq.config file in the sosreports:

    {cluster_partition_handling, ignore},
    {queue_master_locator, <<"min-masters">>},
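
The replacement behaviour can be illustrated with a simplified model. This is only a sketch of a first-match lookup versus a merging lookup, not Hiera itself: the function names and the two-layer list are assumptions made for illustration, and the values are copied from the snippets above.

```python
# Defaults shipped by the templates (values taken from the snippet above).
defaults = {
    "rabbitmq_config_variables": {
        "cluster_partition_handling": "ignore",
        "queue_master_locator": '<<"min-masters">>',
        "loopback_users": "[]",
    }
}

# Operator override passed through ControllerExtraConfig.
extra_config = {"rabbitmq_config_variables": {"hipe_compile": True}}


def first_match_lookup(key, layers):
    """Return the value from the highest-priority layer that defines key."""
    for layer in layers:
        if key in layer:
            return layer[key]
    raise KeyError(key)


def merged_lookup(key, layers):
    """Merge the hash from every layer; higher-priority layers win per key."""
    merged = {}
    for layer in reversed(layers):
        merged.update(layer.get(key, {}))
    return merged


layers = [extra_config, defaults]  # highest priority first

effective = first_match_lookup("rabbitmq_config_variables", layers)
print(sorted(effective))              # only the override's keys survive
print("loopback_users" in effective)  # the default is lost

merged = merged_lookup("rabbitmq_config_variables", layers)
print(sorted(merged))                 # defaults plus the override
```

With a first-match lookup the whole hash comes from the override, so the three default options disappear unless they are repeated alongside hipe_compile.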


I would suggest adding the entire content of rabbitmq_config_variables to the templates and then adding or changing what is needed, instead of passing only the single option/value, like the following:


~~~
parameter_defaults:
  ControllerExtraConfig:

    rabbitmq_config_variables:
      cluster_partition_handling: 'ignore'
      queue_master_locator: '<<"min-masters">>'
      loopback_users: '[]'
      hipe_compile: true
~~~


I'll backport https://review.opendev.org/#/c/698073/ anyway since there are no side effects.

Comment 10 Luca Miccini 2020-01-09 15:00:05 UTC
*** Bug 1789147 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2020-03-10 11:23:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760

