Description of problem:
-----------------------
During the upgrade from RHOS-9 to RHOS-10, the rabbit* options in ceilometer.conf are overridden after the 'controller and block storage' stage.

Excerpt from ceilometer.conf:

[oslo_messaging_rabbit]
rabbit_host=127.0.0.1
rabbit_port=5672
rabbit_hosts=127.0.0.1:5672
rabbit_use_ssl=False
rabbit_userid=guest
rabbit_password=naAt9M4dZW9vd7UnXGJVKB2xs
rabbit_virtual_host=/
rabbit_ha_queues=False
heartbeat_timeout_threshold=60
heartbeat_rate=2

This causes the following messages in the collector and agent-notification logs:
--------------------------------------------------------------------------------
2016-11-23 14:07:03.907 26451 ERROR oslo.messaging._drivers.impl_rabbit [-] [c275de08-1f4d-4b6f-8d55-df5fc56cbb3d] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] Connection refused. Trying again in 32 seconds. Client port: None
2016-11-23 14:07:35.945 26451 ERROR oslo.messaging._drivers.impl_rabbit [-] [c275de08-1f4d-4b6f-8d55-df5fc56cbb3d] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] Connection refused. Trying again in 32 seconds. Client port: None
2016-11-23 14:08:07.982 26451 ERROR oslo.messaging._drivers.impl_rabbit [-] [c275de08-1f4d-4b6f-8d55-df5fc56cbb3d] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] Connection refused. Trying again in 32 seconds. Client port: None

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-compat-2.0.0-34.4.el7ost.noarch
openstack-tripleo-heat-templates-5.1.0-4.el7ost.noarch
openstack-ceilometer-polling-7.0.0-3.el7ost.noarch
python-ceilometermiddleware-0.5.1-1.el7ost.noarch
openstack-ceilometer-central-7.0.0-3.el7ost.noarch
python-ceilometer-7.0.0-3.el7ost.noarch
openstack-ceilometer-compute-7.0.0-3.el7ost.noarch
openstack-ceilometer-common-7.0.0-3.el7ost.noarch
openstack-ceilometer-api-7.0.0-3.el7ost.noarch
puppet-ceilometer-9.4.0-2.el7ost.noarch
openstack-ceilometer-collector-7.0.0-3.el7ost.noarch
python-ceilometerclient-2.6.2-1.el7ost.noarch
openstack-ceilometer-notification-7.0.0-3.el7ost.noarch

Steps to Reproduce:
1. Upgrade from RHOS-9 to RHOS-10
2. After the 'controller and block storage' step, check the aforementioned logs

Additional info:
----------------
Virtual setup: 3 controllers + 2 computes + 1 swift + 3 ceph

After the convergence step the options are re-configured by puppet. But before convergence all computes have to be upgraded, which might take a while. Effectively, this means no new metrics/measures are processed by telemetry in the meantime. Also, the corresponding queues (notification.*) aren't drained, which might cause resource starvation.
Fixing the DFG tag.
So, FTR, I did ask about this bug before leaving last night and was told it wasn't a blocker. In any case, the fix (https://review.openstack.org/#/c/401404/) is waiting on CI to merge to newton. Updating trackers to point to newton.
https://review.openstack.org/#/c/401404/ just merged to stable/newton
The IPv6 addresses are malformed once the ':port' suffix is appended:

rabbit_hosts=fd00:fd00:fd00:2000::16:5672,fd00:fd00:fd00:2000::12:5672,fd00:fd00:fd00:2000::15:5672

This causes the ceilometer* services to fail to connect:
2016-11-25 12:15:02.690 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::16:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 1 seconds. Client port: None
2016-11-25 12:15:06.704 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::15:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 1 seconds. Client port: None
2016-11-25 12:15:10.718 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::12:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 32 seconds. Client port: None

Adding square brackets around the host part and restarting the services eliminates the issue:
rabbit_hosts=[fd00:fd00:fd00:2000::16]:5672,[fd00:fd00:fd00:2000::12]:5672,[fd00:fd00:fd00:2000::15]:5672
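For illustration only, here is a minimal Python sketch of the bracketing rule described above. It is not the puppet code used by the templates; the function names format_endpoint and format_rabbit_hosts are made up for this example, and the default port 5672 is taken from the configuration shown in this report.

```python
import ipaddress


def format_endpoint(host, port=5672):
    """Return 'host:port', wrapping IPv6 literals in square brackets."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal at all (for example a hostname): leave it as-is.
        is_v6 = False
    return f"[{host}]:{port}" if is_v6 else f"{host}:{port}"


def format_rabbit_hosts(hosts, port=5672):
    """Build a comma-separated rabbit_hosts value from a list of hosts."""
    return ",".join(format_endpoint(h, port) for h in hosts)


hosts = [
    "fd00:fd00:fd00:2000::16",
    "fd00:fd00:fd00:2000::12",
    "fd00:fd00:fd00:2000::15",
]
print(format_rabbit_hosts(hosts))
```

With the three controller addresses from this environment, the sketch yields the same bracketed value that was set by hand above; IPv4 addresses and hostnames pass through unbracketed.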
(In reply to Yurii Prokulevych from comment #5)
> The IPv6 addresses are malformed with the ':port' option:
>
> rabbit_hosts=fd00:fd00:fd00:2000::16:5672,fd00:fd00:fd00:2000::12:5672,fd00:fd00:fd00:2000::15:5672
>
> Causing ceilometer* services fail to connect:
> 2016-11-25 12:15:02.690 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::16:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 1 seconds. Client port: None
> 2016-11-25 12:15:06.704 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::15:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 1 seconds. Client port: None
> 2016-11-25 12:15:10.718 13343 ERROR oslo.messaging._drivers.impl_rabbit [-] [88c3c502-362c-44cc-8d74-031ca720f3dc] AMQP server on fd00:fd00:fd00:2000::12:5672:5672 is unreachable: [Errno 113] No route to host. Trying again in 32 seconds. Client port: None
>
> Adding square brackets around the host's part and restarting the services
> eliminates the issue:
> rabbit_hosts=[fd00:fd00:fd00:2000::16]:5672,[fd00:fd00:fd00:2000::12]:5672,[fd00:fd00:fd00:2000::15]:5672

From IRC just now: chem checked that the normalize_ip_for_uri function at https://review.openstack.org/#/c/401404/1/extraconfig/tasks/mitaka_to_newton_ceilometer_wsgi_upgrade.pp should add the brackets correctly.

@Yurii, can we please keep this environment around for a while?
Hi,

so the issue here is that we apply this code:

$rabbit_endpoints = suffix(any2array(normalize_ip_for_uri($rabbit_hosts)), ":${rabbit_port}")

while the puppet module hasn't been updated yet. normalize_ip_for_uri received array support in https://review.openstack.org/#/q/change:I8d361ce9cfcfe6a3f8592b2b7991971a3c748c75

That change has been ported to stable/mitaka, but somehow it's not in the downstream osp-9 OPM package, so the array mechanism doesn't work.

There are two ways to fix this. Either we merge the backport into OPM, or we modify:

modified   extraconfig/tasks/mitaka_to_newton_ceilometer_wsgi_upgrade.pp
@@ -52,9 +52,7 @@
 $rabbit_hosts = hiera('rabbitmq_node_ips', undef)
 $rabbit_port = hiera('ceilometer::rabbit_port', 5672)
 $rabbit_endpoints = suffix(any2array(normalize_ip_for_uri($rabbit_hosts)), ":${rabbit_port}")
-class { '::ceilometer' :
-  rabbit_hosts => $rabbit_endpoints,
-}
+include ::ceilometer

 class {'::ceilometer::db':
   database_connection => $database_connection,

That would leave the current IPv6 definition in ceilometer.conf in place and working:

rabbit_hosts=fd00:fd00:fd00:2000::16,fd00:fd00:fd00:2000::12,fd00:fd00:fd00:2000::15

(yes, this *is* working ...)

Note that this is downstream-specific, as the osp9 OPM doesn't have the backported patch.
After reviewing the problem with Emilien, integrating the patch into the OSP-9 OPM is the way to go. Waiting on a new OPM build with this patch. Adding the needed review for OPM in OSP-9 (381545).
Verified with openstack-tripleo-heat-templates-5.1.0-7.el7ost.noarch

grep ^rabbit /etc/ceilometer/ceilometer.conf
rabbit_hosts=[fd00:fd00:fd00:2000::16]:5672,[fd00:fd00:fd00:2000::14]:5672,[fd00:fd00:fd00:2000::15]:5672
rabbit_use_ssl=False
rabbit_userid=guest
rabbit_password=rsxWD38CCbEFdyvYUY6tKY7uc
rabbit_virtual_host=/
rabbit_ha_queues=True
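As a hedged aside, a check like the one above could be scripted. The Python sketch below parses a rabbit_hosts value in the host:port form and rejects entries where an IPv6 literal and its port cannot be told apart. The helper names split_endpoint and check_rabbit_hosts are hypothetical, and the rule is stricter than what the services themselves accept: any unbracketed entry with more than one colon is flagged, because a string such as fd00:fd00:fd00:2000::16:5672 is itself a syntactically valid IPv6 address.

```python
import ipaddress
import re

# Matches '[<ipv6>]:<port>'.
_BRACKETED = re.compile(r"^\[(?P<host>[^\]]+)\]:(?P<port>\d+)$")


def split_endpoint(entry):
    """Split 'host:port' or '[ipv6]:port' into (host, port).

    Raises ValueError for entries that are malformed or ambiguous.
    """
    match = _BRACKETED.match(entry)
    if match:
        # AddressValueError is a ValueError subclass, so bad literals propagate.
        ipaddress.IPv6Address(match.group("host"))
        return match.group("host"), int(match.group("port"))
    if entry.count(":") > 1:
        raise ValueError(f"unbracketed IPv6 literal: {entry!r}")
    host, sep, port = entry.partition(":")
    if not host or not sep or not port.isdigit():
        raise ValueError(f"expected host:port, got {entry!r}")
    return host, int(port)


def check_rabbit_hosts(value):
    """Parse a comma-separated rabbit_hosts value into (host, port) pairs."""
    return [split_endpoint(item.strip()) for item in value.split(",")]


verified = (
    "[fd00:fd00:fd00:2000::16]:5672,"
    "[fd00:fd00:fd00:2000::14]:5672,"
    "[fd00:fd00:fd00:2000::15]:5672"
)
for host, port in check_rabbit_hosts(verified):
    print(host, port)
```

Run against the verified value, every entry splits cleanly into an address and port 5672, whereas the unbracketed value from the earlier comment raises ValueError.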
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html