Description of problem:
OSP11 -> OSP12 upgrade: openstack-ceilometer-collector remains running on the baremetal host during upgrade. I would expect the service to be stopped and disabled on the baremetal host during upgrade so it can run inside a container. If it's not needed any longer then it should also be stopped and disabled and have the rpm removed.

[root@controller-0 heat-admin]# systemctl status openstack-ceilometer-collector
● openstack-ceilometer-collector.service - OpenStack ceilometer collection service
   Loaded: loaded (/usr/lib/systemd/system/openstack-ceilometer-collector.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-07-25 23:43:37 UTC; 6h ago
 Main PID: 414639 (ceilometer-coll)
   Memory: 16.4M
   CGroup: /system.slice/openstack-ceilometer-collector.service
           ├─414639 ceilometer-collector: master process [/usr/bin/ceilometer-collector --logfile /var/log/ceilometer/collector.log]
           └─414853 ceilometer-collector: CollectorService worker(0)

Jul 25 23:43:37 controller-0 systemd[1]: Started OpenStack ceilometer collection service.
Jul 25 23:43:37 controller-0 systemd[1]: Starting OpenStack ceilometer collection service...

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170718190543.el7ost.noarch

How reproducible:
100%
The collector should be disabled by default. The collector is deprecated in OSP12 and we did not containerize it. It should load this upgrade yaml: https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/disabled/ceilometer-collector-disabled.yaml
The openstack-ceilometer-collector service was deprecated in https://review.openstack.org/#/c/450885/. Since that merged, the resource registry disables the service by default [1] by pointing to puppet/services/disabled/ceilometer-collector-disabled.yaml, as per pradk's comment #1 (and the service has not been containerized).

So we must be missing something... per comment #0, the service *is* already being stopped and disabled at [2], unless we are still enabling it somehow (for removing the package we have https://review.openstack.org/479886 still in review, but it shouldn't affect us here).

Mcornea, can you sanity check the templates you used (I already checked OSP12 and can't see anything missing from upstream, i.e. it all seems to be there wrt the review that deprecated this), in particular that you aren't pointing to the 'non' disabled ceilometer-collector? In theory you'd have to include [3] nowadays to get ceilometer-collector.

Alternatively, if you have /var/log/messages from the controller, I can check the upgrade_tasks and see whether the 'stop and disable' ceilo-collector tasks are there as they should be.

thanks

[1] https://github.com/openstack/tripleo-heat-templates/blob/c2b2cc555a7d6d447e2e33b7d9f29801eb740b03/overcloud-resource-registry-puppet.j2.yaml#L202
[2] https://github.com/openstack/tripleo-heat-templates/blob/c2b2cc555a7d6d447e2e33b7d9f29801eb740b03/puppet/services/disabled/ceilometer-collector-disabled.yaml#L39
[3] https://github.com/openstack/tripleo-heat-templates/blob/c2b2cc555a7d6d447e2e33b7d9f29801eb740b03/environments/services/ceilometer-collector.yaml
This is the deploy command that I used for the docker composable upgrade:

#!/bin/bash
timeout 100m openstack overcloud deploy \
--templates /usr/share/openstack-tripleo-heat-templates \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /home/stack/virt/network/network-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/docker-osp12.yaml \

Attaching /var/log/messages from the controller node
Created attachment 1305304 [details] messages
Created attachment 1305877 [details] ansible upgrade_tasks step1 and step2 picked out step1 and 2 from the attached /var/log/messages for easier debug.
@mcornea I checked the log you attached and picked out step1 and 2 of the upgrade_tasks into a new attachment.

Indeed I do not see the expected 'stop and disable ceilometer-collector' so something else must be going on.

I guess next step is to sanity check the templates, or I missed something still.

Are you by any chance using environments/disable-telemetry in your deployment (I don't see it in the templates above but worth checking)... in that file I see "OS::TripleO::Services::CeilometerCollector: OS::Heat::None" which if used would explain why the (now) default OS::TripleO::Services::CeilometerCollector: puppet/services/disabled/ceilometer-collector-disabled.yaml (from the resource registry) is being overruled.

Otherwise can you check/grep against your templates for "OS::TripleO::Services::CeilometerCollector" to see if there is some other mapping there.
(In reply to marios from comment #6)
> @mcornea I checked the log you attached and picked out step1 and 2 of the
> upgrade_tasks into a new attachment.
>
> Indeed I do not see the expected 'stop and disable ceilometer-collector' so
> something else must be going on.
>
> I guess next step is to sanity check the templates, or I missed something
> still.
>
> Are you by any chance using environments/disable-telemetry in your
> deployment (I don't see it in the templates above but worth checking)... in
> that file I see "OS::TripleO::Services::CeilometerCollector: OS::Heat::None"
> which if used would explain why the (now) default
> OS::TripleO::Services::CeilometerCollector:
> puppet/services/disabled/ceilometer-collector-disabled.yaml (from the
> resource registry) is being overruled.

Nope, these are the environments used for the deploy command:

-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
-e /home/stack/docker-osp12.yaml \

> Otherwise can you check/grep against your templates for
> "OS::TripleO::Services::CeilometerCollector" to see if there is some other
> mapping there.
(undercloud) [stack@undercloud-0 openstack-tripleo-heat-templates]$ grep -Ri OS::TripleO::Services::CeilometerCollector
deployed-server/deployed-server-roles-data.yaml: - OS::TripleO::Services::CeilometerCollector
environments/contrail/roles_data_contrail.yaml: - OS::TripleO::Services::CeilometerCollector
environments/disable-telemetry.yaml: OS::TripleO::Services::CeilometerCollector: OS::Heat::None
environments/services/ceilometer-collector.yaml: OS::TripleO::Services::CeilometerCollector: ../../puppet/services/ceilometer-collector.yaml
overcloud-resource-registry-puppet.j2.yaml: OS::TripleO::Services::CeilometerCollector: puppet/services/disabled/ceilometer-collector-disabled.yaml

[stack@undercloud-0 ~]$ grep -Ri OS::TripleO::Services::CeilometerCollector /home/stack/virt/
[stack@undercloud-0 ~]$
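The grep above covers the templates tree and /home/stack/virt/, but not every file passed with -e (for example /home/stack/docker-osp12.yaml). As a rough sketch only, the same check could be run over exactly the environment files from the deploy command. The helper name find_registry_overrides is hypothetical, and it assumes a mapping appears as a key of the form "OS::TripleO::Services::CeilometerCollector:" at the start of a line (after indentation).

```shell
#!/bin/bash
# Hypothetical helper: print every file (from the given list) that maps the
# given resource type, i.e. contains "<type>:" as a key on some line.
find_registry_overrides() {
    local resource="$1"
    shift
    local f
    for f in "$@"; do
        # Skip files that do not exist on this host.
        [ -f "$f" ] || continue
        # -l prints the file name once if any line matches.
        grep -lE "^[[:space:]]*${resource}:([[:space:]]|$)" "$f"
    done
    return 0
}

# Environment files taken from the deploy command in this bug.
find_registry_overrides OS::TripleO::Services::CeilometerCollector \
    /home/stack/virt/network/network-environment.yaml \
    /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    /home/stack/virt/hostnames.yml \
    /home/stack/virt/debug.yaml \
    /home/stack/virt/nodes_data.yaml \
    /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml \
    /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
    /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
    /home/stack/docker-osp12.yaml
```

Any file name printed would be a candidate for overriding the default mapping from the resource registry; no output would suggest the default is still in effect.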
Current theory: the ceilometer collector is disabled by default now that the registry points to services/disabled, AND it is removed entirely from roles_data.yaml ( https://github.com/openstack/tripleo-heat-templates/blob/master/roles_data.yaml ). So even though the registry is pointing to services/disabled, since the service is not included on any roles, the tasks in that file are not executed.

If you have ceilometer-collector, it means the roles_data used when you deployed has that service. So you should include it again in the roles data you use on upgrade. But this also seems counterintuitive. Will discuss on scrum today.

We can confirm this by checking the stack too, like:

openstack stack output show overcloud EnabledServices > EnabledServices
grep ceilometer ./EnabledServices

You shouldn't have the ceilometercollector there.
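The theory above can also be checked against a roles file directly. A minimal sketch, assuming the roles file lists services as "- OS::TripleO::Services::Name" entries (as in the grep output for the roles data files in this bug); the function name service_in_roles and the sample file are hypothetical and only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the service appears as a list entry
# ("- OS::TripleO::Services::X") in a roles_data-style file.
service_in_roles() {
    local service="$1" roles_file="$2"
    # Match the exact list item, allowing indentation and trailing whitespace.
    grep -qE "^[[:space:]]*-[[:space:]]+${service}[[:space:]]*$" "$roles_file"
}

# Illustrative sample only, not the real roles_data.yaml.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
- name: Controller
  ServicesDefault:
    - OS::TripleO::Services::CeilometerExpirer
    - OS::TripleO::Services::Keystone
EOF

for svc in OS::TripleO::Services::CeilometerExpirer \
           OS::TripleO::Services::CeilometerCollector; do
    if service_in_roles "$svc" "$sample"; then
        echo "$svc: listed on a role"
    else
        echo "$svc: not listed on any role (per the theory, its upgrade_tasks are skipped)"
    fi
done
rm -f "$sample"
```

Run against the roles data actually used for the upgrade, a "not listed" result for CeilometerCollector would be consistent with the stop and disable tasks never being executed.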
We do something similar with CeilometerExpirer, where the service is disabled but it's still in roles_data, so it gets picked up, I guess. Marios and I had a quick chat on this and this could be a potential solution. If there are better ways, we can discuss. I pushed a patch upstream so we can discuss further and merge if we all agree: https://review.openstack.org/494589
Adding pradk's review to the trackers - pasting from my comment there: "it seems counterintuitive to require the service to be in roles_data even though it is disabled by default, but I can't think of another solution. It might somehow be rationalised as a deprecation period of one cycle (!) since we need it there in order to run the remaining service specific decommission tasks before it completely disappears on the Q->R upgrade"
[root@controller-0 heat-admin]# systemctl status openstack-ceilometer-collector
● openstack-ceilometer-collector.service - OpenStack ceilometer collection service
   Loaded: loaded (/usr/lib/systemd/system/openstack-ceilometer-collector.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

Nov 21 22:01:39 controller-0 systemd[1]: openstack-ceilometer-collector.service stop-sigterm timed out. Killing.
Nov 21 22:01:39 controller-0 systemd[1]: openstack-ceilometer-collector.service: main process exited, code=killed, status=9/KILL
Nov 21 22:01:39 controller-0 systemd[1]: Unit openstack-ceilometer-collector.service entered failed state.
Nov 21 22:01:39 controller-0 systemd[1]: openstack-ceilometer-collector.service failed.
Nov 21 22:01:39 controller-0 systemd[1]: Started OpenStack ceilometer collection service.
Nov 21 22:01:39 controller-0 systemd[1]: Starting OpenStack ceilometer collection service...
Nov 21 22:03:11 controller-0 ceilometer-collector[71272]: /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:246: NotSupportedWarning: Configuration option(s) ['api', 'api_paste_config', 'config_dir', 'c...'] not supported
Nov 21 22:03:11 controller-0 ceilometer-collector[71272]: exception.NotSupportedWarning
Nov 22 11:07:17 controller-0 systemd[1]: Stopping OpenStack ceilometer collection service...
Nov 22 11:07:35 controller-0 systemd[1]: Stopped OpenStack ceilometer collection service.
Hint: Some lines were ellipsized, use -l to show in full.
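The status output above shows the unit as disabled and inactive after the upgrade. As a sketch only, the same verification could be scripted; this assumes that `systemctl is-active` prints "inactive" and `systemctl is-enabled` prints "disabled" for a stopped, disabled unit, and the function name unit_is_decommissioned is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: succeed only if the unit is reported as both
# inactive and disabled by systemctl.
unit_is_decommissioned() {
    local unit="$1" active enabled
    # Both subcommands exit non-zero for stopped/disabled units, so the
    # exit status is ignored and only the printed state is compared.
    active="$(systemctl is-active "$unit" 2>/dev/null)" || true
    enabled="$(systemctl is-enabled "$unit" 2>/dev/null)" || true
    [ "$active" = "inactive" ] && [ "$enabled" = "disabled" ]
}

if unit_is_decommissioned openstack-ceilometer-collector; then
    echo "openstack-ceilometer-collector is stopped and disabled"
else
    echo "openstack-ceilometer-collector is still active or enabled (or its state could not be read)"
fi
```

The check is deliberately strict: any state other than inactive plus disabled, including a host where the state cannot be read, is reported as not decommissioned.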
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462