Bug 1397897 - rabbit options are overridden during upgrade
Summary: rabbit options are overridden during upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Pradeep Kilambi
QA Contact: Yurii Prokulevych
URL:
Whiteboard:
Depends On: 1399182
Blocks: 1399252
Reported: 2016-11-23 14:13 UTC by Yurii Prokulevych
Modified: 2016-12-29 16:58 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-5.1.0-6.el7ost
Doc Text:
Clone Of:
Clones: 1399252
Environment:
Last Closed: 2016-12-14 16:34:41 UTC




Links
System ID                              Priority  Status        Summary                                            Last Updated
Red Hat Product Errata RHEA-2016:2948  normal    SHIPPED_LIVE  Red Hat OpenStack Platform 10 enhancement update   2016-12-14 19:55:27 UTC
OpenStack gerrit 381545                None      None          None                                               2016-11-25 16:27:02 UTC
OpenStack gerrit 401404                None      None          None                                               2016-11-24 13:15:41 UTC
Launchpad 1644278                      None      None          None                                               2016-11-24 13:16:33 UTC

Description Yurii Prokulevych 2016-11-23 14:13:53 UTC
Description of problem:
-----------------------
During the upgrade from RHOS-9 to RHOS-10, the rabbit* options in ceilometer.conf are overridden after the 'controller and block storage' stage.

Excerpt from ceilometer.conf

[oslo_messaging_rabbit]
rabbit_host=127.0.0.1
rabbit_port=5672
rabbit_hosts=127.0.0.1:5672
rabbit_use_ssl=False
rabbit_userid=guest
rabbit_password=naAt9M4dZW9vd7UnXGJVKB2xs
rabbit_virtual_host=/
rabbit_ha_queues=False
heartbeat_timeout_threshold=60
heartbeat_rate=2

This causes the following messages in collector.log and agent-notification.log:
-------------------------------------------------------------------------------
2016-11-23 14:07:03.907 26451 ERROR oslo.messaging._drivers.impl_rabbit [-] [c275de08-1f4d-4b6f-8d55-df5fc56cbb3d] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] Connection refused. Trying again in 32 seconds. Client port: None
2016-11-23 14:07:35.945 26451 ERROR oslo.messaging._drivers.impl_rabbit [-] [c275de08-1f4d-4b6f-8d55-df5fc56cbb3d] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] Connection refused. Trying again in 32 seconds. Client port: None
2016-11-23 14:08:07.982 26451 ERROR oslo.messaging._drivers.impl_rabbit [-] [c275de08-1f4d-4b6f-8d55-df5fc56cbb3d] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] Connection refused. Trying again in 32 seconds. Client port: None
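
One way to spot this state on a controller is to parse the [oslo_messaging_rabbit] section and flag any endpoint that points at loopback. The following is only a rough sketch: the helper name loopback_rabbit_settings is hypothetical, and it assumes the option names shown in the excerpt above.

```python
import configparser
import ipaddress


def _host_part(entry):
    """Extract the host from 'host', 'host:port' or '[ipv6]:port'."""
    if entry.startswith("[") and "]" in entry:
        return entry[1:entry.index("]")]
    if entry.count(":") == 1:
        return entry.split(":")[0]
    return entry  # bare hostname, IPv4 or IPv6 literal without a port


def loopback_rabbit_settings(conf_text):
    """Return (option, entry) pairs whose host is a loopback address."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    section = "oslo_messaging_rabbit"
    if not parser.has_section(section):
        return []
    flagged = []
    for option in ("rabbit_host", "rabbit_hosts"):
        value = parser.get(section, option, fallback="")
        for entry in (e.strip() for e in value.split(",")):
            if not entry:
                continue
            host = _host_part(entry)
            try:
                is_loopback = ipaddress.ip_address(host).is_loopback
            except ValueError:
                # Not an IP literal; only treat the well-known name as loopback.
                is_loopback = host == "localhost"
            if is_loopback:
                flagged.append((option, entry))
    return flagged
```

Fed the text of /etc/ceilometer/ceilometer.conf from the excerpt above, this would flag both rabbit_host and rabbit_hosts; an empty result means neither option points at loopback.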


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-compat-2.0.0-34.4.el7ost.noarch
openstack-tripleo-heat-templates-5.1.0-4.el7ost.noarch

openstack-ceilometer-polling-7.0.0-3.el7ost.noarch
python-ceilometermiddleware-0.5.1-1.el7ost.noarch
openstack-ceilometer-central-7.0.0-3.el7ost.noarch
python-ceilometer-7.0.0-3.el7ost.noarch
openstack-ceilometer-compute-7.0.0-3.el7ost.noarch
openstack-ceilometer-common-7.0.0-3.el7ost.noarch
openstack-ceilometer-api-7.0.0-3.el7ost.noarch
puppet-ceilometer-9.4.0-2.el7ost.noarch
openstack-ceilometer-collector-7.0.0-3.el7ost.noarch
python-ceilometerclient-2.6.2-1.el7ost.noarch
openstack-ceilometer-notification-7.0.0-3.el7ost.noarch


Steps to Reproduce:
1. Upgrade from RHOS-9 to RHOS-10
2. After the 'controller and block storage' step, check the aforementioned logs


Additional info:
----------------
Virtual setup: 3controllers + 2computes + 1swift + 3ceph
After the convergence step the options are re-configured by puppet.
But before convergence all computes have to be upgraded,
which might take a while, effectively meaning that no new metrics/measures are processed by telemetry in the meantime.
Also, the corresponding queues (notification.*) aren't drained, which might cause resource starvation.

Comment 2 Jaromir Coufal 2016-11-23 18:22:43 UTC
Fixing the DFG tag.

Comment 3 Marios Andreou 2016-11-24 13:15:41 UTC
So, FTR I did ask about this bug before leaving last night and was told it wasn't a blocker. In any case the fix is waiting on CI to merge to newton @ https://review.openstack.org/#/c/401404/

updating trackers to point to newton

Comment 4 Marios Andreou 2016-11-24 17:34:42 UTC
https://review.openstack.org/#/c/401404/ just merged to stable/newton

Comment 5 Yurii Prokulevych 2016-11-25 13:06:53 UTC
The IPv6 addresses are malformed with the ':port' option:

rabbit_hosts=fd00:fd00:fd00:2000::16:5672,fd00:fd00:fd00:2000::12:5672,fd00:fd00:fd00:2000::15:5672

This causes the ceilometer* services to fail to connect:
2016-11-25 12:15:02.690 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::16:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 1 seconds. Client port: None
2016-11-25 12:15:06.704 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::15:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 1 seconds. Client port: None
2016-11-25 12:15:10.718 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::12:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 32 seconds. Client port: None

Adding square brackets around the host part of each entry and restarting the services eliminates the issue:
rabbit_hosts=[fd00:fd00:fd00:2000::16]:5672,[fd00:fd00:fd00:2000::12]:5672,[fd00:fd00:fd00:2000::15]:5672
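
For checking a rabbit_hosts value by hand, a small validator along these lines may help. It is a sketch only: the function name malformed_entries is hypothetical, and it deliberately checks the host:port form used above, so entries without an explicit port are reported as well.

```python
import ipaddress


def _is_ipv6(text):
    """True if text is a valid IPv6 literal."""
    try:
        return ipaddress.ip_address(text).version == 6
    except ValueError:
        return False


def malformed_entries(rabbit_hosts):
    """Return the entries that are not 'host:port' or '[ipv6]:port'."""
    bad = []
    for entry in (e.strip() for e in rabbit_hosts.split(",")):
        if entry.startswith("["):
            # Bracketed form: the part inside the brackets must be IPv6.
            host, sep, port = entry[1:].partition("]:")
            ok = bool(sep) and _is_ipv6(host) and port.isdigit()
        else:
            # Unbracketed form: the host itself must not contain a colon,
            # otherwise the port cannot be told apart from the address.
            host, sep, port = entry.rpartition(":")
            ok = bool(sep) and bool(host) and ":" not in host and port.isdigit()
        if not ok:
            bad.append(entry)
    return bad
```

With the malformed value above every entry is returned, while the bracketed value yields an empty list.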

Comment 6 Marios Andreou 2016-11-25 13:32:19 UTC
(In reply to Yurii Prokulevych from comment #5)





So, from IRC just now: chem checked that the normalize_ip_for_uri function used at https://review.openstack.org/#/c/401404/1/extraconfig/tasks/mitaka_to_newton_ceilometer_wsgi_upgrade.pp
should add the brackets correctly.


@Yurii can we please keep this environment around for a while?

Comment 7 Sofer Athlan-Guyot 2016-11-25 15:49:33 UTC
Hi,

so the issue here is that we apply this code:

  $rabbit_endpoints = suffix(any2array(normalize_ip_for_uri($rabbit_hosts)), ":${rabbit_port}")

while the puppet module hasn't been updated yet.  normalize_ip_for_uri gained support for arrays in https://review.openstack.org/#/q/change:I8d361ce9cfcfe6a3f8592b2b7991971a3c748c75

It has been ported to stable/mitaka but somehow it's not in the downstream osp-9 OPM package.

So the array mechanism doesn't work.
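
To make the failure mode concrete, here is a rough Python model of the Puppet expression above. It is not the real Ruby implementation: all function names are hypothetical, and the pass-through behaviour of the scalar-only variant is an assumption chosen because it reproduces the rabbit_hosts value seen in comment 5.

```python
import ipaddress


def _bracket_if_ipv6(value):
    """Wrap an IPv6 literal in square brackets; leave anything else alone."""
    try:
        if ipaddress.ip_address(value).version == 6:
            return f"[{value}]"
    except ValueError:
        pass
    return value


def normalize_scalar_only(value):
    """Assumed model of the function without array support."""
    if isinstance(value, str):
        return _bracket_if_ipv6(value)
    return value  # a list passes through unchanged, so nothing is bracketed


def normalize_with_arrays(value):
    """Model of the function once array support is available."""
    if isinstance(value, (list, tuple)):
        return [_bracket_if_ipv6(v) for v in value]
    return _bracket_if_ipv6(value)


def rabbit_endpoints(normalize, hosts, port=5672):
    """Mimic suffix(any2array(normalize(hosts)), ":${port}")."""
    normalized = normalize(hosts)
    as_list = normalized if isinstance(normalized, list) else [normalized]
    return [f"{host}:{port}" for host in as_list]
```

In this model, rabbit_endpoints(normalize_scalar_only, [...]) appends the port to bare IPv6 addresses, which matches the broken value, while rabbit_endpoints(normalize_with_arrays, [...]) produces the bracketed form.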

There are two ways to fix this.  Either we merge the backport into OPM or we modify:

    modified   extraconfig/tasks/mitaka_to_newton_ceilometer_wsgi_upgrade.pp
    @@ -52,9 +52,7 @@ $rabbit_hosts = hiera('rabbitmq_node_ips', undef)
     $rabbit_port  = hiera('ceilometer::rabbit_port', 5672)
     $rabbit_endpoints = suffix(any2array(normalize_ip_for_uri($rabbit_hosts)), ":${rabbit_port}")
     
    -class { '::ceilometer' :
    -  rabbit_hosts => $rabbit_endpoints,
    -}
    +include ::ceilometer
     
     class {'::ceilometer::db':
       database_connection => $database_connection,


That will keep the current IPv6 definition in the ceilometer configuration working:

rabbit_hosts=fd00:fd00:fd00:2000::16,fd00:fd00:fd00:2000::12,fd00:fd00:fd00:2000::15

(yes, this *is* working ...)

It has to be noted that this is downstream specific, as the OSP-9 OPM
doesn't have the backported patch.

Comment 8 Sofer Athlan-Guyot 2016-11-25 16:27:02 UTC
After reviewing the problem with Emilien, integrating the patch into the OSP-9 OPM is the way to go.  Waiting on a new OPM build with this patch.

Adding the needed review for OPM in OSP-9 (381545)

Comment 13 Yurii Prokulevych 2016-12-01 16:41:52 UTC
Verified with openstack-tripleo-heat-templates-5.1.0-7.el7ost.noarch

grep ^rabbit /etc/ceilometer/ceilometer.conf
rabbit_hosts=[fd00:fd00:fd00:2000::16]:5672,[fd00:fd00:fd00:2000::14]:5672,[fd00:fd00:fd00:2000::15]:5672
rabbit_use_ssl=False
rabbit_userid=guest
rabbit_password=rsxWD38CCbEFdyvYUY6tKY7uc
rabbit_virtual_host=/
rabbit_ha_queues=True

Comment 15 errata-xmlrpc 2016-12-14 16:34:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

