Description of problem:
As described in https://bugs.launchpad.net/tripleo/+bug/1713531, when upgrading from Ocata to Pike (containerized), the heat-api-cloudwatch process ends up running under httpd on the upgraded Pike overcloud instead of being containerized.

Version-Release number of selected component (if applicable):
(undercloud) [stack@undercloud ~]$ rpm -qa | grep heat-templates
openstack-tripleo-heat-templates-7.0.1-0.20170927010252.a58332e.el7.centos.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy an overcloud:
openstack overcloud deploy --libvirt-type qemu --ntp-server pool.ntp.org --templates /home/stack/tht-ocata/ -e /home/stack/tht-ocata/overcloud-resource-registry-puppet.yaml

2. Check that the service is not started under httpd:
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl list-units | grep heat
session-5.scope loaded active running Session 5 of user heat-admin
openstack-heat-api-cfn.service loaded active running Openstack Heat CFN-compatible API Service
openstack-heat-api-cloudwatch.service loaded active running OpenStack Heat CloudWatch API Service
openstack-heat-api.service loaded active running OpenStack Heat API Service
openstack-heat-engine.service loaded active running Openstack Heat Engine Service
user-1000.slice loaded active active User Slice of heat-admin

HTTP Heat Services: None
[heat-admin@overcloud-controller-0 ~]$ sudo httpd -t -D DUMP_VHOSTS | grep heat
[heat-admin@overcloud-controller-0 ~]$

3. Upgrade to Pike (containerized):
3.1. Download master tht to tht-master
3.2. Specify docker registry in docker_registry.yaml file:
parameter_defaults:
  DockerNamespace: 192.168.24.1:8787/tripleoupstream
  DockerNamespaceIsRegistry: true
EOF
3.3. Download container images:
openstack overcloud container image upload --config-file /usr/share/openstack-tripleo-common/container-images/overcloud_containers.yaml
3.4. Prepare container image definition yaml file:
openstack overcloud container image prepare \
  --namespace tripleoupstream \
  --tag latest \
  --env-file docker-centos-tripleoupstream.yaml
3.5. Upgrade via command:
export THT=/home/stack/tht-master
(undercloud) [stack@undercloud ~]$ openstack overcloud deploy --templates $THT \
  --libvirt-type qemu \
  --ntp-server pool.ntp.org \
  -e $THT/environments/docker.yaml \
  -e $THT/environments/major-upgrade-composable-steps-docker.yaml \
  -e docker-centos-tripleoupstream.yaml \
  -e docker_registry.yaml \
  -e upgrade_repos.yaml

4. Service heat_api_cloudwatch is running under apache:
[heat-admin@overcloud-controller-0 ~]$ sudo httpd -t -D DUMP_VHOSTS | grep heat
192.168.24.7:8003 overcloud-controller-0.internalapi.localdomain (/etc/httpd/conf.d/10-heat_api_cloudwatch_wsgi.conf:6)
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/httpd.service.d
           └─openstack-dashboard.conf
   Active: active (running) since Tue 2017-09-26 13:46:07 UTC; 1h 53min ago
     Docs: man:httpd(8)
           man:apachectl(8)
 Main PID: 115195 (httpd)
   Status: "Total requests: 0; Current requests/sec: 0; Current traffic: 0 B/sec"
   Memory: 355.0M
   CGroup: /system.slice/httpd.service
           ├─115195 /usr/sbin/httpd -DFOREGROUND
           ├─115243 cinder_wsgi -DFOREGROUND
           ├─115244 cinder_wsgi -DFOREGROUND
           ├─115245 heat_api_cloudw -DFOREGROUND <<<<<<<<<<
           ├─115247 /usr/sbin/httpd -DFOREGROUND
           ├─115248 /usr/sbin/httpd -DFOREGROUND
           ├─115249 /usr/sbin/httpd -DFOREGROUND
           ├─115250 /usr/sbin/httpd -DFOREGROUND
           ├─115251 /usr/sbin/httpd -DFOREGROUND
           ├─115253 /usr/sbin/httpd -DFOREGROUND
           ├─115263 /usr/sbin/httpd -DFOREGROUND
           └─115264 /usr/sbin/httpd -DFOREGROUND

5. But no heat service is running under systemd:
[heat-admin@overcloud-controller-0 ~]$ sudo systemctl list-units | grep heat
session-27.scope loaded active running Session 27 of user heat-admin
user-1000.slice loaded active active User Slice of heat-admin

6. All heat services are running under containers except heat-api-cloudwatch:
[heat-admin@overcloud-controller-0 ~]$ sudo docker ps | grep heat
085051b5dfbb tripleoupstream/centos-binary-heat-api:latest "kolla_start" About an hour ago Up About an hour (healthy) heat_api_cron
39e2b83a8bfc tripleoupstream/centos-binary-heat-api-cfn:latest "kolla_start" About an hour ago Up About an hour (healthy) heat_api_cfn
b80786761582 tripleoupstream/centos-binary-heat-engine:latest "kolla_start" About an hour ago Up About an hour (healthy) heat_engine
e9b2ddafcd7b tripleoupstream/centos-binary-heat-api:latest "kolla_start" About an hour ago Up About an hour (healthy) heat_api

Actual results:

Expected results:

Additional info:
These are the puppet logs from the moment heat-api-cloudwatch is re-configured as an httpd service during the upgrade: http://pastebin.test.redhat.com/519887

Sep 26 13:53:23 localhost os-collect-config: "Notice: /Stage[main]/Heat::Api_cloudwatch/Service[heat-api-cloudwatch]: Triggered 'refresh' from 1 events",
Sep 26 13:53:23 localhost os-collect-config: "Notice: /Stage[main]/Heat::Deps/Anchor[heat::service::end]: Triggered 'refresh' from 1 events",
Sep 26 13:53:23 localhost os-collect-config: "Notice: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns: executed successfully",
Sep 26 13:53:23 localhost os-collect-config: "Notice: /Stage[main]/Apache/Apache::Vhost[default]/Concat[15-default.conf]/File[/etc/httpd/conf.d/15-default.conf]/ensure: removed",
Sep 26 13:53:23 localhost os-collect-config: "Notice: /Stage[main]/Heat::Wsgi::Apache_api_cloudwatch/Heat::Wsgi::Apache[api_cloudwatch]/Openstacklib::Wsgi::Apache[heat_api_cloudwatch_wsgi]/File[/var/www/cgi-bin/heat]/ensure: created",
Sep 26 13:53:23 localhost os-collect-config: "Notice: /Stage[main]/Heat::Wsgi::Apache_api_cloudwatch/Heat::Wsgi::Apache[api_cloudwatch]/Openstacklib::Wsgi::Apache[heat_api_cloudwatch_wsgi]/File[heat_api_cloudwatch_wsgi]/ensure: defined content as '{md5}2eb19266988f424046d53acfbcf01c2c'",
Sep 26 13:53:23 localhost os-collect-config: "Notice: /Stage[main]/Heat::Wsgi::Apache_api_cloudwatch/Heat::Wsgi::Apache[api_cloudwatch]/Openstacklib::Wsgi::Apache[heat_api_cloudwatch_wsgi]/Apache::Vhost[heat_api_cloudwatch_wsgi]/Concat[10-heat_api_cloudwatch_wsgi.conf]/File[/etc/httpd/conf.d/10-heat_api_cloudwatch_wsgi.conf]/ensure: defined content as '{md5}bd90943ed4380f5332bf94387bd4fe06'",
Sep 26 13:53:23 localhost os-collect-config: "Notice: /Stage[main]/Apache::Service/Service[httpd]/ensure: ensure changed 'stopped' to 'running'",

Also, it is worth noting that there is no heat template for the heat-api-cloudwatch service under /tripleo-heat-templates/docker/services: https://github.com/openstack/tripleo-heat-templates/tree/master/docker/services
Hi Jose, the behaviour you're describing in comment #0 sounds right, and it is what tripleo-heat-templates currently specifies: the resource registry points to puppet/services/heat-api-cloudwatch.yaml (https://github.com/openstack/tripleo-heat-templates/blob/e1a9638732290c247e5dac10392bc8702b531981/overcloud-resource-registry-puppet.j2.yaml#L136), and HeatApiCloudwatch is included in the default controller role services.

As you point out in comment #1, there is no /docker/services/heat-api-cloudwatch.yaml, i.e. the heat-api-cloudwatch service is not containerized. I see that this service was converted to run under httpd with https://review.openstack.org/#/c/440977/4/puppet/services/heat-api-cloudwatch.yaml@78 on 10 March. That change is not in Ocata, so, as per your comment #0, you have the systemd unit openstack-heat-api-cloudwatch.service loaded, active and running before the upgrade. On Pike that service is stopped and disabled here: https://github.com/openstack/tripleo-heat-templates/blob/e1a9638732290c247e5dac10392bc8702b531981/puppet/services/heat-api-cloudwatch.yaml#L138-L141. Instead, heat-api-cloudwatch is served by httpd.

So to be clear, this BZ is for "why don't we have a containerized heat-api-cloudwatch? Let's do it", right? If so, we should reach out to the deployment/containers team and see what they think, whether it is feasible, how much work it would be, etc. Holding off on marking this triaged in case we go to secondary rather than primary in tracking.
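The three places the service can show up (systemd unit, httpd vhost, container) can be checked in one pass. The following is only a rough sketch: the function name `locate_service` and its argument order are made up for illustration and are not part of TripleO. It works on captured command output, so it can be tried without a live controller.

```shell
# Sketch: report where a service appears to run, based on captured output.
# locate_service is a hypothetical helper, not part of any TripleO tooling.
#   $1: service keyword, e.g. heat-api-cloudwatch
#   $2: output of `systemctl list-units`
#   $3: output of `httpd -t -D DUMP_VHOSTS`
#   $4: output of `docker ps`
locate_service() {
    keyword="$1"; units="$2"; vhosts="$3"; containers="$4"
    # vhost config files and container names use underscores instead of dashes.
    underscored=$(printf '%s' "$keyword" | tr '-' '_')
    found=""

    # systemd: look for the openstack-<keyword>.service unit name.
    if printf '%s\n' "$units" | grep -qF -- "openstack-${keyword}.service"; then
        found="$found systemd"
    fi
    # httpd: substring match on the vhost dump (so a short keyword may also
    # match longer vhost names).
    if printf '%s\n' "$vhosts" | grep -qF -- "$underscored"; then
        found="$found httpd"
    fi
    # docker: exact match on the container name, the last field of `docker ps`.
    if printf '%s\n' "$containers" |
        awk -v name="$underscored" '$NF == name { hit = 1 } END { exit !hit }'; then
        found="$found docker"
    fi

    if [ -z "$found" ]; then
        echo "none"
    else
        echo "${found# }"
    fi
}
```

On a controller this could be called as `locate_service heat-api-cloudwatch "$(sudo systemctl list-units)" "$(sudo httpd -t -D DUMP_VHOSTS 2>/dev/null)" "$(sudo docker ps)"`. Given the outputs in comment #0, it would be expected to print `systemd` before the upgrade and `httpd` after it.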
We discussed this on the upgrades scrum yesterday. Jose is going to find out more about why this service was not containerized, as per comment #2. If it was 'just forgotten', then we can use this BZ to track the effort. If there is some legitimate reason, we can either close this BZ or track whatever the longer-term effort is to overcome the problem, if that is possible. We can mark this triaged once we get the answer.
As discussed in today's scrum meeting, there is no clear idea of how to proceed with the service upon upgrade. The service seems to be deprecated: "This feature will be deprecated or removed during the Havana cycle as we move to using Ceilometer as a metric/alarm service instead [1]", however it is deployed by default in the roles data. Also, if it is deprecated, what is the right way to handle it during the upgrade? Should the configuration be migrated to the corresponding service (Ceilometer) and heat-api-cloudwatch stopped? Or should it just be stopped? (Currently the service goes from running as a systemd service to an apache service when upgrading from Ocata to Pike.) Or is the current behavior the right one? Can someone from the CloudApps or Telemetry DFGs clarify this?

[1] https://wiki.openstack.org/wiki/Heat/Using-CloudWatch
Cleaning up the needinfos a bit.
It looks like aschultz has to remove the services from being deployed, but there should be a corresponding upgrade task to clean up as well.
The patch to remove the services from the roles is https://review.openstack.org/#/c/508964/. This does not clean up the existing running services, which would need to be done via the upgrade process. For that I'd have to defer to the Upgrades & CloudApp DFGs on those bits.
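For illustration only, a manual cleanup of the leftover httpd-hosted service might look like the sketch below. This is not the upgrade task from the linked reviews. The unit name and vhost path are taken from comment #0 and comment #1; the `run` wrapper, the `DRY_RUN` switch, and the function name are assumptions, and the reload step assumes httpd is still running on the host.

```shell
# Sketch of a manual cleanup for the httpd-hosted heat-api-cloudwatch.
# NOT the actual upgrade task; names below other than the unit and the
# vhost path are illustrative assumptions.
UNIT="openstack-heat-api-cloudwatch.service"
VHOST_CONF="/etc/httpd/conf.d/10-heat_api_cloudwatch_wsgi.conf"

run() {
    # Dry-run is the default: print the command instead of executing it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_cloudwatch() {
    # Make sure the old systemd unit stays stopped and disabled.
    run systemctl stop "$UNIT"
    run systemctl disable "$UNIT"
    # Drop the vhost that puppet created during the upgrade.
    run rm -f "$VHOST_CONF"
    # Let httpd pick up the removed vhost (assumes httpd is running).
    run systemctl reload httpd
}

cleanup_cloudwatch
```

As written, this only prints the four commands. Setting `DRY_RUN=0` and running it as root would execute them, which should only be done after confirming that nothing else depends on the vhost.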
Thanks Alex. I'm going to post something to remove the cloudwatch API by default (but allow the operator to keep it if wanted). We can track both of those things here, so I added the other Launchpad bug to the trackers too.
Moving this to POST. There are two reviews in the trackers. The one at https://review.openstack.org/#/c/508964/ removes the service from the roles files on master. We didn't backport that one to Pike, and I don't think we need to or should. Adding needinfo for Alex: what do you think? If you agree, we can remove it from the trackers so it's clearer for the release delivery folks what needs to go into the package build.
Yes, I think that makes sense. For Pike we want to clean it up, so we need to leave it in the roles so it gets properly disabled.
After upgrade:

[root@controller-0 heat-admin]# docker ps | grep heat
171049f0028a rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-heat-api-docker:20171103.1 "kolla_start" 37 minutes ago Up 37 minutes (healthy) heat_api_cron
54d6481ca142 rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-heat-api-cfn-docker:20171103.1 "kolla_start" 37 minutes ago Up 37 minutes (healthy) heat_api_cfn
529c66916005 rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-heat-engine-docker:20171103.1 "kolla_start" 37 minutes ago Up 37 minutes (healthy) heat_engine
344195defd4f rhos-qe-mirror-tlv.usersys.redhat.com:5000/rhosp12/openstack-heat-api-docker:20171103.1 "kolla_start" 37 minutes ago Up 37 minutes (healthy) heat_api
[root@controller-0 heat-admin]# sudo httpd -t -D DUMP_VHOSTS | grep heat
[root@controller-0 heat-admin]#
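The container half of this verification could be scripted roughly as follows. This is a sketch, not part of the QE procedure: the function name is hypothetical, and it assumes input lines of the form `name|status`, which `docker ps --format '{{.Names}}|{{.Status}}'` produces.

```shell
# Sketch: list heat containers whose status is not reported as "(healthy)".
# unhealthy_heat_containers is a hypothetical helper; it reads "name|status"
# lines on stdin and prints the names of heat_* containers that fail the check.
unhealthy_heat_containers() {
    awk -F '|' '$1 ~ /^heat_/ && $2 !~ /\(healthy\)/ { print $1 }'
}
```

A possible invocation on a controller is `sudo docker ps --format '{{.Names}}|{{.Status}}' | unhealthy_heat_containers`. Empty output would mean every running heat container reports healthy, matching the listing above.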
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462