Description of problem:
Minor update fails due to the RabbitMQ loopback_users configuration in the latest RabbitMQ puppet module.

Version-Release number of selected component (if applicable):

How reproducible:
Repeatedly on overcloud update and subsequent stack update, unless the mitigating hieradata has been set.

Steps to Reproduce:
1. During an update from OSP13z6 to OSP13z9 in our QA environment, we noticed that any API operation that involved communication over the AMQP bus would fail. Checking the logs, we found the following error:

   "ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN"

   We eventually traced this back to an extra line in rabbitmq.config:

   {loopback_users, [<<"guest">>]},

   This appears to be similar to this bug:
   https://bugs.launchpad.net/tripleo/+bug/1587961

   which was fixed by:
   https://review.opendev.org/#/c/324016/

   It appears that this change was abandoned because the puppet-rabbitmq change that added the extra line had not been merged at the time.

   That pull request, https://github.com/voxpupuli/puppet-rabbitmq/pull/699, has now been merged:
   https://github.com/voxpupuli/puppet-rabbitmq/commit/0ada399b330fbc84a1a1179ad0e827e0735e1912

   It appears to have arrived in the OSP13 version of openstack-puppet with the z9 release. When the stack is updated, the new Puppet manifests write the above line to the RabbitMQ config, which blocks all clients connecting as 'guest' over any interface other than localhost. As all OSP services use 'guest' to connect to AMQP, this causes an outage.

   To work around this issue we have added:

   ControllerExtraConfig:
     rabbitmq::loopback_users: []

   which sets the line back to:

   {loopback_users, []},
2.
3.

Actual results:

Expected results:
To resolve this for everyone we probably need to bring back https://review.opendev.org/#/c/324016/

Additional info:
Quality Assurance environment, on z6 to z9 update.
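For anyone triaging the same symptom, the effect of loopback_users described above can be modelled in a few lines of Python. This is only a simplified sketch of the documented behaviour (users in the list may connect via loopback only), not RabbitMQ's own code. The function name is_login_allowed and its arguments are made up for illustration.

```python
import ipaddress


def is_login_allowed(user, peer_ip, loopback_users):
    """Simplified model: a user listed in loopback_users may only
    connect from a loopback address; everyone else is unrestricted."""
    if user not in loopback_users:
        return True
    return ipaddress.ip_address(peer_ip).is_loopback


# With the line written by the updated module, remote 'guest' logins are refused.
print(is_login_allowed("guest", "192.168.24.14", ["guest"]))  # False
print(is_login_allowed("guest", "127.0.0.1", ["guest"]))      # True

# With the workaround (empty list), 'guest' may connect from any address.
print(is_login_allowed("guest", "192.168.24.14", []))         # True
```

This matches the report: services that reach the broker over the controller's network address as 'guest' are refused once 'guest' is in the list, and the empty list from the workaround lifts the restriction.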
I am not sure the duplicate loopback_users list is the root cause behind "ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN".

I have a rabbit cluster with:

% This file managed by Puppet
% Template Path: rabbitmq/templates/rabbitmq.config
[
  {rabbit, [
    {loopback_users, [<<"guest">>]},
    {tcp_listen_options, [
      {keepalive, true},
      {backlog, 128},
      {nodelay, true},
      {linger, {true, 0}},
      {exit_on_close, false}
    ]},
    {collect_statistics_interval, 30000},
    {tcp_listeners, [{"192.168.24.14", 5672}]},
    {cluster_partition_handling, ignore},
    {loopback_users, []},
    {queue_master_locator, <<"min-masters">>},
    {default_user, <<"guest">>},
    {default_pass, <<"7WlemN6GGJxrDbGKNMyrCVXfV">>}
  ]},
  {kernel, [
    {inet_dist_listen_max, 25672},
    {inet_dist_listen_min, 25672},
    {inet_dist_use_interface, {192,168,24,14}},
    {net_ticktime, 15}
  ]}
,
  {rabbitmq_management, [
    {rates_mode, none}
,   {listener, [
      {ip, "127.0.0.1"},
      {port, 15672}
    ]}
  ]}
].
% EOF

and if I run the following:

#!/usr/bin/env python
import sys
import socket
from kombu import Connection

host = sys.argv[1]
port = 5672
user = "guest"
password = sys.argv[2]
vhost = "/"
url = 'amqp://{0}:{1}@{2}:{3}/{4}'.format(user, password, host, port, vhost)
with Connection(url) as c:
    try:
        c.connect()
    except socket.error:
        raise ValueError("Received socket.error, "
                         "rabbitmq server probably isn't running")
    except IOError:
        raise ValueError("Received IOError, probably bad credentials")
    else:
        print("Credentials are valid")

(undercloud) [stack@undercloud ~]$ python py 192.168.24.14 7WlemN6GGJxrDbGKNMyrCVXfV
Credentials are valid
(undercloud) [stack@undercloud ~]$ python py 192.168.24.14 bogus
Traceback (most recent call last):
  File "py", line 18, in <module>
    c.connect()
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 261, in connect
    return self.connection
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 802, in connection
    self._connection = self._establish_connection()
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 757, in _establish_connection
    conn = self.transport.establish_connection()
  File "/usr/lib/python3.6/site-packages/kombu/transport/pyamqp.py", line 130, in establish_connection
    conn.connect()
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 313, in connect
    self.drain_events(timeout=self.connect_timeout)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 500, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 506, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/lib/python3.6/site-packages/amqp/method_framing.py", line 55, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 510, in on_inbound_method
    method_sig, payload, content,
  File "/usr/lib/python3.6/site-packages/amqp/abstract_channel.py", line 126, in dispatch_method
    listener(*args)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 639, in _on_close
    (class_id, method_id), ConnectionError)
amqp.exceptions.AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.

So imho there must be something else at play.
The issue here is the puppet-rabbitmq change coupled with this snippet present in the templates:

parameter_defaults:
  ControllerExtraConfig:
    rabbitmq_config_variables:
      hipe_compile: true

This entirely overrides the default rabbitmq_config_variables content:

...
    rabbitmq_config_variables:
      cluster_partition_handling: 'ignore'
      queue_master_locator: '<<"min-masters">>'
      loopback_users: '[]'
...

because puppet-tripleo reads it from hiera:

class tripleo::profile::base::rabbitmq (
  $certificate_specs = {},
  $config_variables  = hiera('rabbitmq_config_variables'),
  ...

In addition to loopback_users, the sosreports show that these options are also gone from the rabbitmq.config file:

{cluster_partition_handling, ignore},
{queue_master_locator, <<"min-masters">>},

I would suggest adding the entire content of rabbitmq_config_variables to the templates and then adding or changing what is needed, instead of passing only the single option/value, like the following:

~~~
parameter_defaults:
  ControllerExtraConfig:
    rabbitmq_config_variables:
      cluster_partition_handling: 'ignore'
      queue_master_locator: '<<"min-masters">>'
      loopback_users: '[]'
      hipe_compile: true
~~~

I'll backport https://review.opendev.org/#/c/698073/ anyway, since there are no side effects.
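To make the override behaviour above concrete, here is a minimal Python sketch of a first-match ("priority") lookup over hieradata layers, next to a merging lookup shown only for contrast. It is an illustration under assumptions, not puppet-tripleo or Hiera code: the helper names priority_lookup and hash_merge_lookup are hypothetical, and the layer order (ControllerExtraConfig ahead of the defaults) is assumed from the behaviour described in this comment.

```python
def priority_lookup(key, layers):
    """Return the value from the first layer that defines the key.
    Lower-priority layers are ignored entirely, even for hashes."""
    for layer in layers:
        if key in layer:
            return layer[key]
    raise KeyError(key)


def hash_merge_lookup(key, layers):
    """For contrast only: merge the hash from every layer,
    letting higher-priority layers win on conflicting keys."""
    merged = {}
    for layer in reversed(layers):
        merged.update(layer.get(key, {}))
    return merged


# Default hieradata, as quoted in the comment above.
defaults = {
    "rabbitmq_config_variables": {
        "cluster_partition_handling": "ignore",
        "queue_master_locator": '<<"min-masters">>',
        "loopback_users": "[]",
    }
}
# Operator override that only meant to add one option.
extra_config = {"rabbitmq_config_variables": {"hipe_compile": True}}

layers = [extra_config, defaults]  # assumed order: highest priority first

print(priority_lookup("rabbitmq_config_variables", layers))
# {'hipe_compile': True}
print(sorted(hash_merge_lookup("rabbitmq_config_variables", layers)))
# ['cluster_partition_handling', 'hipe_compile', 'loopback_users', 'queue_master_locator']
```

With the first-match lookup the three default keys disappear, which is why the override in the templates has to repeat the full hash.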
*** Bug 1789147 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0760