Bug 1475128 - OSP11 -> OSP12 upgrade: openstack-ceilometer-collector service remains running on the baremetal host during upgrade
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assigned To: Emilien Macchi
QA Contact: Marius Cornea
Keywords: Triaged
Depends On:
Blocks: 1399762
Reported: 2017-07-26 02:41 EDT by Marius Cornea
Modified: 2018-02-05 14:10 EST

See Also:
Fixed In Version: openstack-tripleo-heat-templates-7.0.1-0.20170927205937.el7ost
Last Closed: 2017-12-13 16:44:48 EST
Type: Bug


Attachments
messages (1010.01 KB, application/x-gzip)
2017-07-27 07:56 EDT, Marius Cornea
ansible upgrade_tasks step1 and step2 (24.82 KB, text/plain)
2017-07-28 05:52 EDT, Marios Andreou


External Trackers
Tracker ID Priority Status Summary Last Updated
OpenStack gerrit 494589 None None None 2017-08-18 05:47 EDT

Description Marius Cornea 2017-07-26 02:41:29 EDT
Description of problem:
OSP11 -> OSP12 upgrade: openstack-ceilometer-collector remains running on the baremetal host during upgrade. I would expect the service to be stopped and disabled on the baremetal host during upgrade so it can run inside a container. If it's not needed any longer then it should also be stopped and disabled and have the rpm removed. 

[root@controller-0 heat-admin]# systemctl status openstack-ceilometer-collector
● openstack-ceilometer-collector.service - OpenStack ceilometer collection service
   Loaded: loaded (/usr/lib/systemd/system/openstack-ceilometer-collector.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-07-25 23:43:37 UTC; 6h ago
 Main PID: 414639 (ceilometer-coll)
   Memory: 16.4M
   CGroup: /system.slice/openstack-ceilometer-collector.service
           ├─414639 ceilometer-collector: master process [/usr/bin/ceilometer-collector --logfile /var/log/ceilometer/collector.log]
           └─414853 ceilometer-collector: CollectorService worker(0)

Jul 25 23:43:37 controller-0 systemd[1]: Started OpenStack ceilometer collection service.
Jul 25 23:43:37 controller-0 systemd[1]: Starting OpenStack ceilometer collection service...

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170718190543.el7ost.noarch

How reproducible:
100%
Comment 1 Pradeep Kilambi 2017-07-26 10:55:55 EDT
The collector should be disabled by default. It is deprecated in OSP12 and we did not containerize it. The upgrade should load this YAML:

https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/disabled/ceilometer-collector-disabled.yaml
Comment 2 Marios Andreou 2017-07-27 06:27:09 EDT
The openstack-ceilometer-collector service was deprecated in https://review.openstack.org/#/c/450885/. Since that merged, the resource registry [1] disables the service by default by pointing to puppet/services/disabled/ceilometer-collector-disabled.yaml, as per pradk's comment #1 (and the service has not been containerized).

So we must be missing something. Per comment #0 the service should be stopped and disabled, and it *is* already being stopped and disabled at [2], unless we are still enabling it somehow. (For removing the package we have https://review.openstack.org/479886 still in review, but it shouldn't affect us here.)

Mcornea, can you sanity check the templates you used, in particular that you aren't pointing to the non-disabled ceilometer-collector? I already checked OSP12 and can't see anything missing from upstream, i.e. everything from the review that deprecated this seems to be there. In theory you'd have to include [3] nowadays to get ceilometer-collector.

Alternatively, if you have /var/log/messages from the controller, I can check the upgrade_tasks to see whether the 'stop and disable' ceilometer-collector tasks are there as they should be.

thanks

[1] https://github.com/openstack/tripleo-heat-templates/blob/c2b2cc555a7d6d447e2e33b7d9f29801eb740b03/overcloud-resource-registry-puppet.j2.yaml#L202

[2] https://github.com/openstack/tripleo-heat-templates/blob/c2b2cc555a7d6d447e2e33b7d9f29801eb740b03/puppet/services/disabled/ceilometer-collector-disabled.yaml#L39

[3] https://github.com/openstack/tripleo-heat-templates/blob/c2b2cc555a7d6d447e2e33b7d9f29801eb740b03/environments/services/ceilometer-collector.yaml
Comment 3 Marius Cornea 2017-07-27 07:54:21 EDT
This is the deploy command that I used for the docker composable upgrade:

#!/bin/bash

timeout 100m openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/virt/network/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/docker-osp12.yaml \

Attaching /var/log/messages from the controller node
Comment 4 Marius Cornea 2017-07-27 07:56 EDT
Created attachment 1305304
messages
Comment 5 Marios Andreou 2017-07-28 05:52 EDT
Created attachment 1305877
ansible upgrade_tasks step1 and step2

Picked out steps 1 and 2 from the attached /var/log/messages for easier debugging.
Comment 6 Marios Andreou 2017-07-28 06:12:47 EDT
@mcornea I checked the log you attached and picked out step1 and 2 of the upgrade_tasks into a new attachment.

Indeed I do not see the expected 'stop and disable ceilometer-collector' so something else must be going on.

I guess the next step is to sanity check the templates, unless I'm still missing something.

Are you by any chance using environments/disable-telemetry in your deployment? I don't see it in the templates above, but it's worth checking. In that file I see "OS::TripleO::Services::CeilometerCollector: OS::Heat::None", which, if used, would explain why the (now) default OS::TripleO::Services::CeilometerCollector: puppet/services/disabled/ceilometer-collector-disabled.yaml (from the resource registry) is being overruled.
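
The override suspected here can be illustrated with a minimal Python sketch. It assumes that mappings from environment files passed with -e take precedence over the defaults in the resource registry, with later files winning over earlier ones. resolve_mapping and the dictionaries are hypothetical stand-ins for illustration, not TripleO or Heat code.

```python
# Hypothetical illustration: how an environment file can override the
# default resource_registry mapping for a service. Not TripleO/Heat code.

SERVICE = "OS::TripleO::Services::CeilometerCollector"

# Default mapping from the resource registry, as quoted in this comment.
default_registry = {
    SERVICE: "puppet/services/disabled/ceilometer-collector-disabled.yaml",
}

# Mapping seen in environments/disable-telemetry.
disable_telemetry = {SERVICE: "OS::Heat::None"}


def resolve_mapping(key, default, environments):
    """Return the effective mapping for key; later environments win."""
    effective = dict(default)
    for env in environments:  # assumed to follow the order of the -e arguments
        effective.update(env)
    return effective.get(key)


# Without disable-telemetry, the 'disabled' template is the effective mapping.
print(resolve_mapping(SERVICE, default_registry, []))
# With it, the service maps to OS::Heat::None instead.
print(resolve_mapping(SERVICE, default_registry, [disable_telemetry]))
```

If the second case applied, the template carrying the 'stop and disable' upgrade tasks would never be loaded, which matches the symptom.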

Otherwise can you check/grep against your templates for 
"OS::TripleO::Services::CeilometerCollector" to see if there is some other mapping there.
Comment 7 Marius Cornea 2017-08-03 11:23:22 EDT
(In reply to marios from comment #6)
> @mcornea I checked the log you attached and picked out step1 and 2 of the
> upgrade_tasks into a new attachment.
> 
> Indeed I do not see the expected 'stop and disable ceilometer-collector' so
> something else must be going on.
> 
> I guess next step is to sanity check the templates, or I missed something
> still. 
> 
> Are you by any chance using environments/disable-telemetry in your
> deployment (I don't see it in the templates above but worth checking)... in
> that file I see "OS::TripleO::Services::CeilometerCollector: OS::Heat::None"
> which if used would explain why the (now) default 
> OS::TripleO::Services::CeilometerCollector:
> puppet/services/disabled/ceilometer-collector-disabled.yaml (from the
> resource registry) is being overruled.

Nope, these are the environments used for the deploy command:

-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/docker-osp12.yaml \

> Otherwise can you check/grep against your templates for 
> "OS::TripleO::Services::CeilometerCollector" to see if there is some other
> mapping there.

(undercloud) [stack@undercloud-0 openstack-tripleo-heat-templates]$ grep -Ri OS::TripleO::Services::CeilometerCollector
deployed-server/deployed-server-roles-data.yaml:    - OS::TripleO::Services::CeilometerCollector
environments/contrail/roles_data_contrail.yaml:    - OS::TripleO::Services::CeilometerCollector
environments/disable-telemetry.yaml:  OS::TripleO::Services::CeilometerCollector: OS::Heat::None
environments/services/ceilometer-collector.yaml:  OS::TripleO::Services::CeilometerCollector: ../../puppet/services/ceilometer-collector.yaml
overcloud-resource-registry-puppet.j2.yaml:  OS::TripleO::Services::CeilometerCollector: puppet/services/disabled/ceilometer-collector-disabled.yaml

[stack@undercloud-0 ~]$ grep -Ri OS::TripleO::Services::CeilometerCollector /home/stack/virt/
[stack@undercloud-0 ~]$
Comment 8 Marios Andreou 2017-08-17 09:44:39 EDT
Current theory: ceilometer-collector is disabled by default now (the resource registry points to services/disabled), AND it is removed entirely from roles_data.yaml ( https://github.com/openstack/tripleo-heat-templates/blob/master/roles_data.yaml ). So even though the registry points to services/disabled, the service is not included in any role, and therefore the tasks in that file are not executed.

If you have ceilometer-collector running, it means the roles_data used when you deployed included that service. So you would have to include it again in the roles_data you use for the upgrade. But this also seems counterintuitive. Will discuss on scrum today.

We can also confirm this by checking the stack, like:

openstack stack output show overcloud EnabledServices > EnabledServices

grep ceilometer ./EnabledServices

You shouldn't have ceilometercollector there.
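
The theory in this comment can be sketched in Python. This is a simplified model under stated assumptions, not how tripleo-heat-templates is implemented: it assumes upgrade tasks are gathered only for services that are listed under some role in roles_data. The names services_running_upgrade_tasks, roles_without, roles_with and registry are hypothetical.

```python
# Simplified, hypothetical model of the theory in this comment: a service's
# upgrade tasks only run if the service is listed in some role, regardless
# of what the resource registry maps it to.

COLLECTOR = "OS::TripleO::Services::CeilometerCollector"
DISABLED = "puppet/services/disabled/ceilometer-collector-disabled.yaml"

registry = {COLLECTOR: DISABLED}


def services_running_upgrade_tasks(roles, registry):
    """Return the services whose templates would contribute upgrade tasks."""
    selected = set()
    for services in roles.values():
        for service in services:
            template = registry.get(service)
            # OS::Heat::None (or no mapping at all) contributes no tasks.
            if template and template != "OS::Heat::None":
                selected.add(service)
    return selected


# Roles data where the collector is no longer listed for any role.
roles_without = {"Controller": []}
# Roles data where the collector is listed again for the upgrade.
roles_with = {"Controller": [COLLECTOR]}

print(COLLECTOR in services_running_upgrade_tasks(roles_without, registry))
print(COLLECTOR in services_running_upgrade_tasks(roles_with, registry))
```

In this model the registry mapping alone is not enough: the 'stop and disable' tasks are only selected in the second case, where the service is present in a role.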
Comment 9 Pradeep Kilambi 2017-08-17 10:52:59 EDT
We do something similar with CeilometerExpirer, where the service is disabled but it's still in roles_data, so it gets picked up, I guess. Marios and I had a quick chat on this and this could be a potential solution. If there are better ways, we can discuss. I pushed a patch upstream so we can discuss further and merge if we all agree:

https://review.openstack.org/494589
Comment 10 Marios Andreou 2017-08-18 05:47:46 EDT
Adding pradk's review to the trackers. Pasting from my comment there: "it seems counterintuitive to require the service to be in roles_data even though it is disabled by default, but I can't think of another solution. It might somehow be rationalised as a deprecation period of one cycle (!) since we need it there in order to run the remaining service specific decommission tasks before it completely disappears on the Q->R upgrade"
Comment 13 Marius Cornea 2017-11-22 08:55:11 EST
[root@controller-0 heat-admin]# systemctl status openstack-ceilometer-collector
● openstack-ceilometer-collector.service - OpenStack ceilometer collection service
   Loaded: loaded (/usr/lib/systemd/system/openstack-ceilometer-collector.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Nov 21 22:01:39 controller-0 systemd[1]: openstack-ceilometer-collector.service stop-sigterm timed out. Killing.
Nov 21 22:01:39 controller-0 systemd[1]: openstack-ceilometer-collector.service: main process exited, code=killed, status=9/KILL
Nov 21 22:01:39 controller-0 systemd[1]: Unit openstack-ceilometer-collector.service entered failed state.
Nov 21 22:01:39 controller-0 systemd[1]: openstack-ceilometer-collector.service failed.
Nov 21 22:01:39 controller-0 systemd[1]: Started OpenStack ceilometer collection service.
Nov 21 22:01:39 controller-0 systemd[1]: Starting OpenStack ceilometer collection service...
Nov 21 22:03:11 controller-0 ceilometer-collector[71272]: /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:246: NotSupportedWarning: Configuration option(s) ['api', 'api_paste_config', 'config_dir', 'c...'] not supported
Nov 21 22:03:11 controller-0 ceilometer-collector[71272]: exception.NotSupportedWarning
Nov 22 11:07:17 controller-0 systemd[1]: Stopping OpenStack ceilometer collection service...
Nov 22 11:07:35 controller-0 systemd[1]: Stopped OpenStack ceilometer collection service.
Hint: Some lines were ellipsized, use -l to show in full.
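
For repeat verification, the two fields that matter in the output above are the enablement state on the Loaded: line and the state on the Active: line. A minimal Python sketch that extracts them from systemctl status text follows. parse_status is a hypothetical helper, and the regular expressions assume the output layout shown in this bug.

```python
import re


def parse_status(text):
    """Extract (enablement, active_state) from `systemctl status` output.

    Assumes the layout shown in this bug, e.g.
      Loaded: loaded (/path/unit.service; disabled; vendor preset: disabled)
      Active: inactive (dead)
    Either element is None if its line is not found.
    """
    loaded = re.search(r"^\s*Loaded:\s+\S+\s+\([^;]+;\s*([^;)]+)", text, re.M)
    active = re.search(r"^\s*Active:\s+(\S+)", text, re.M)
    return (
        loaded.group(1).strip() if loaded else None,
        active.group(1) if active else None,
    )


sample = """\
   Loaded: loaded (/usr/lib/systemd/system/openstack-ceilometer-collector.service; disabled; vendor preset: disabled)
   Active: inactive (dead)
"""

enablement, active = parse_status(sample)
# The expected post-upgrade state: unit neither enabled nor running.
print(enablement == "disabled" and active == "inactive")
```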
Comment 16 errata-xmlrpc 2017-12-13 16:44:48 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
