Bug 1780399

Summary: Minor update fails due to RabbitMQ loopback_nodes configuration in latest RabbitMQ puppet module
Product: Red Hat OpenStack
Reporter: chrisbro <chrisbro>
Component: openstack-tripleo-heat-templates
Assignee: Luca Miccini <lmiccini>
Status: CLOSED ERRATA
QA Contact: Sasha Smolyak <ssmolyak>
Severity: medium
Docs Contact:
Priority: low
Version: 13.0 (Queens)
CC: dabarzil, dhill, ebarrera, jjoyce, jschluet, lmiccini, mburns, slinaber, tvignaud
Target Milestone: z11
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.4.1-26.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-03-10 11:23:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description chrisbro@redhat.com 2019-12-05 21:16:05 UTC
Description of problem:

Minor update fails due to RabbitMQ loopback_nodes configuration in latest RabbitMQ puppet module

Version-Release number of selected component (if applicable):



How reproducible:

Repeatedly on overcloud update and subsequent stack update, unless the mitigating hieradata has been set.

Steps to Reproduce:

1. During an update from OSP13z6 to OSP13z9 in our QA environment, we noticed that every API operation that involved communication over the AMQP bus failed.

Checking logs, we found the following error:

"ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN"

We eventually traced this back to an extra line in rabbitmq.config:

{loopback_users, [<<"guest">>]},

This appears to be similar to this bug:

https://bugs.launchpad.net/tripleo/+bug/1587961

Which was fixed by:

https://review.opendev.org/#/c/324016/

It appears that this change was abandoned as the puppet-rabbitmq change that added the extra line had not been merged.

This pull request https://github.com/voxpupuli/puppet-rabbitmq/pull/699 has now been merged:

 https://github.com/voxpupuli/puppet-rabbitmq/commit/0ada399b330fbc84a1a1179ad0e827e0735e1912

It appears to have arrived in the OSP13 version of openstack-puppet for the z9 release. When the stack is updated, the new Puppet manifests write the above line to the RabbitMQ config, which blocks all clients connecting as 'guest' over any interface other than localhost.

As all OSP services use 'guest' to connect to AMQP, this causes an outage.
 
To work around this issue, we have added the following hieradata:

 ControllerExtraConfig:
   rabbitmq::loopback_users: []

which sets the line back to:

 {loopback_users, []},
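
A quick way to see which loopback_users entries a rendered rabbitmq.config actually contains is sketched below. This is only an illustration: the helper name `loopback_user_entries` and the inline sample are assumptions made for the example, and the regular expressions do a plain text scan rather than parse Erlang terms, so commented-out entries would also be reported.

```python
import re

# Matches entries such as {loopback_users, [<<"guest">>]} or {loopback_users, []}.
LOOPBACK_RE = re.compile(r'\{\s*loopback_users\s*,\s*\[(.*?)\]\s*\}')
# Matches Erlang binaries such as <<"guest">> and captures the user name.
USER_RE = re.compile(r'<<"([^"]*)">>')


def loopback_user_entries(config_text):
    """Return one list of user names per loopback_users entry, in file order."""
    return [USER_RE.findall(m.group(1)) for m in LOOPBACK_RE.finditer(config_text)]


# Illustrative sample only, not a complete rabbitmq.config.
sample = '''
[
  {rabbit, [
    {loopback_users, [<<"guest">>]},
    {tcp_listeners, [{"192.168.24.14", 5672}]},
    {loopback_users, []}
  ]}
].
'''

entries = loopback_user_entries(sample)
for index, users in enumerate(entries, start=1):
    print("loopback_users entry %d: %s" % (index, users if users else "(empty list)"))
```

An entry that lists 'guest' means that user is restricted to loopback connections. The sketch makes no claim about which entry the broker honours when the key appears more than once.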


Actual results:


Expected results:

To resolve this for everyone we probably need to bring back https://review.opendev.org/#/c/324016/

Additional info:

Quality Assurance environment, on z6 to z9 update.

Comment 1 Luca Miccini 2019-12-06 08:34:18 UTC
I am not sure the duplicate loopback_users list is the root cause behind "ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN".

I have a rabbit cluster with:


% This file managed by Puppet
% Template Path: rabbitmq/templates/rabbitmq.config
[
  {rabbit, [
    {loopback_users, [<<"guest">>]},
    {tcp_listen_options, [
         {keepalive,     true},
         {backlog,       128},
         {nodelay,       true},
         {linger,        {true, 0}},
         {exit_on_close, false}
    ]},
    {collect_statistics_interval, 30000},
    {tcp_listeners, [{"192.168.24.14", 5672}]},
    {cluster_partition_handling, ignore},
    {loopback_users, []},
    {queue_master_locator, <<"min-masters">>},
    {default_user, <<"guest">>},
    {default_pass, <<"7WlemN6GGJxrDbGKNMyrCVXfV">>}
  ]},
  {kernel, [
    {inet_dist_listen_max, 25672},
    {inet_dist_listen_min, 25672},
    {inet_dist_use_interface, {192,168,24,14}},
    {net_ticktime, 15}
  ]}
,
  {rabbitmq_management, [
    {rates_mode, none}
,    {listener, [
      {ip, "127.0.0.1"},
      {port, 15672}
    ]}
  ]}
].
% EOF


and if I run the following:

#!/usr/bin/env python

import sys
import socket
from kombu import Connection

host = sys.argv[1]
port = 5672
user = "guest"
password = sys.argv[2]
vhost = "/"
url = 'amqp://{0}:{1}@{2}:{3}/{4}'.format(user, password, host, port, vhost)
with Connection(url) as c:
    try:
        c.connect()
    except socket.error:
        raise ValueError("Received socket.error, "
                         "rabbitmq server probably isn't running")
    except IOError:
        raise ValueError("Received IOError, probably bad credentials")
    else:
        print("Credentials are valid")



(undercloud) [stack@undercloud ~]$ python py 192.168.24.14 7WlemN6GGJxrDbGKNMyrCVXfV
Credentials are valid

(undercloud) [stack@undercloud ~]$ python py 192.168.24.14 bogus
Traceback (most recent call last):
  File "py", line 18, in <module>
    c.connect()
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 261, in connect
    return self.connection
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 802, in connection
    self._connection = self._establish_connection()
  File "/usr/lib/python3.6/site-packages/kombu/connection.py", line 757, in _establish_connection
    conn = self.transport.establish_connection()
  File "/usr/lib/python3.6/site-packages/kombu/transport/pyamqp.py", line 130, in establish_connection
    conn.connect()
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 313, in connect
    self.drain_events(timeout=self.connect_timeout)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 500, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 506, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/lib/python3.6/site-packages/amqp/method_framing.py", line 55, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 510, in on_inbound_method
    method_sig, payload, content,
  File "/usr/lib/python3.6/site-packages/amqp/abstract_channel.py", line 126, in dispatch_method
    listener(*args)
  File "/usr/lib/python3.6/site-packages/amqp/connection.py", line 639, in _on_close
    (class_id, method_id), ConnectionError)
amqp.exceptions.AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.


so imho there must be something else at play.

Comment 7 Luca Miccini 2019-12-12 06:27:28 UTC
The issue here is the puppet-rabbitmq change coupled with this snippet present in the templates:

parameter_defaults:
  ControllerExtraConfig:
    rabbitmq_config_variables:
      hipe_compile: true


This entirely overrides the default rabbitmq_config_variables content:

...
            rabbitmq_config_variables:
              cluster_partition_handling: 'ignore'
              queue_master_locator: '<<"min-masters">>'
              loopback_users: '[]'
...

because puppet-tripleo reads it from hiera:

class tripleo::profile::base::rabbitmq (
  $certificate_specs             = {},
  $config_variables              = hiera('rabbitmq_config_variables'),


By the way, in addition to loopback_users, these options are also gone from the rabbitmq.config file in the sosreports:

    {cluster_partition_handling, ignore},
    {queue_master_locator, <<"min-masters">>},
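
The override behaviour described above can be modelled roughly as follows. This is a sketch under assumptions, not puppet-tripleo code: the hiera lookup is reduced to a "first layer that defines the key wins" dict lookup, the names `first_match_lookup` and `merged_lookup` are hypothetical, and `merged_lookup` is shown only for contrast with a hash-merging lookup.

```python
# Defaults shipped in the templates (values copied from this comment).
TEMPLATE_DEFAULTS = {
    "rabbitmq_config_variables": {
        "cluster_partition_handling": "ignore",
        "queue_master_locator": '<<"min-masters">>',
        "loopback_users": "[]",
    }
}

# Operator override passed through ControllerExtraConfig.
EXTRA_CONFIG = {
    "rabbitmq_config_variables": {
        "hipe_compile": True,
    }
}


def first_match_lookup(key, *layers):
    """Return the value from the highest-priority layer that defines key."""
    for layer in layers:
        if key in layer:
            return layer[key]
    raise KeyError(key)


def merged_lookup(key, *layers):
    """Hypothetical contrast: merge hashes, higher-priority layers win per key."""
    merged = {}
    for layer in reversed(layers):
        merged.update(layer.get(key, {}))
    return merged


# Layers are listed from highest to lowest priority.
effective = first_match_lookup("rabbitmq_config_variables", EXTRA_CONFIG, TEMPLATE_DEFAULTS)
lost = sorted(set(TEMPLATE_DEFAULTS["rabbitmq_config_variables"]) - set(effective))
merged = merged_lookup("rabbitmq_config_variables", EXTRA_CONFIG, TEMPLATE_DEFAULTS)

print("effective:", effective)
print("defaults lost:", lost)
print("merged (for contrast):", merged)
```

With the first-match lookup, the single-key override replaces the whole hash, so all three defaults, including loopback_users, drop out of the effective configuration.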


I would suggest adding the entire content of rabbitmq_config_variables to the templates and then adding or changing what is needed, instead of passing only the single option/value, like the following:


~~~
parameter_defaults:
  ControllerExtraConfig:
    rabbitmq_config_variables:
      cluster_partition_handling: 'ignore'
      queue_master_locator: '<<"min-masters">>'
      loopback_users: '[]'
      hipe_compile: true
~~~


I'll backport https://review.opendev.org/#/c/698073/ anyway since there are no side effects.

Comment 10 Luca Miccini 2020-01-09 15:00:05 UTC
*** Bug 1789147 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2020-03-10 11:23:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0760