Description of problem:

Whether doing an upgrade from OSP9 to OSP10 or a fresh install of OSP10, it appears that the services that are removed from pacemaker and started during the deployment or upgrade are never set to start on a controller node reboot. systemctl output shows them disabled:

openstack-cinder-api.service            disabled
openstack-cinder-backup.service         disabled
openstack-cinder-scheduler.service      disabled
openstack-cinder-volume.service         disabled
openstack-glance-api.service            disabled
openstack-glance-glare.service          disabled
openstack-glance-registry.service       disabled
openstack-glance-scrubber.service       disabled
openstack-gnocchi-api.service           disabled
openstack-gnocchi-metricd.service       disabled
openstack-gnocchi-statsd.service        disabled
openstack-heat-api-cfn.service          disabled
openstack-heat-api-cloudwatch.service   disabled
openstack-heat-api.service              disabled
openstack-heat-engine.service           disabled
openstack-manila-api.service            disabled
openstack-manila-data.service           disabled
openstack-manila-scheduler.service      disabled
openstack-manila-share.service          disabled
openstack-nova-api.service              disabled
openstack-nova-cert.service             disabled

Version-Release number of selected component (if applicable):
OSP10

How reproducible:
100%

Steps to Reproduce:
1. Deploy a simple OSP10 environment

Actual results:
The services in the list above are not enabled on reboot.

Expected results:
The services in the list above should be enabled on reboot.

Additional info:
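A quick way to spot affected units on a controller is to filter the unit-file listing down to anything that will not start at boot. A sketch of that filter follows; the sample data is a captured fragment so the awk expression can be checked offline, while on a live node you would pipe `systemctl list-unit-files 'openstack-*'` into the same filter:

```shell
# On a live controller you would run:
#   systemctl list-unit-files 'openstack-*' | awk '$2 == "disabled" {print $1}'
# Here the identical awk filter runs against a captured sample of that output.
sample='openstack-cinder-api.service disabled
openstack-swift-proxy.service enabled
openstack-glance-api.service disabled'

# Keep only the unit names whose STATE column reads "disabled".
disabled=$(printf '%s\n' "$sample" | awk '$2 == "disabled" {print $1}')
echo "$disabled"
```

On an affected node this prints the full list of services from the description above.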
I am working on testing the fresh install again to see if I can reproduce. However, the field has reported that at one customer the services were disabled after the upgrade.
*** Bug 1416083 has been marked as a duplicate of this bug. ***
I checked with another customer who has already walked through a couple of OSP9 to OSP10 test upgrades and they are not seeing the behaviour.
Thanks - I plan to run through the upgrade again so I will see if I see the same issue and will update this BZ with my findings.
I was unable to reproduce this on either an upgrade or a fresh install:

[root@overcloud-controller-0 heat-admin]# systemctl list-unit-files | grep enabled | grep openstack
openstack-aodh-evaluator.service              enabled
openstack-aodh-listener.service               enabled
openstack-aodh-notifier.service               enabled
openstack-ceilometer-central.service          enabled
openstack-ceilometer-collector.service        enabled
openstack-ceilometer-notification.service     enabled
openstack-cinder-api.service                  enabled
openstack-cinder-scheduler.service            enabled
openstack-glance-api.service                  enabled
openstack-glance-registry.service             enabled
openstack-gnocchi-metricd.service             enabled
openstack-gnocchi-statsd.service              enabled
openstack-heat-api-cfn.service                enabled
openstack-heat-api-cloudwatch.service         enabled
openstack-heat-api.service                    enabled
openstack-heat-engine.service                 enabled
openstack-nova-api.service                    enabled
openstack-nova-conductor.service              enabled
openstack-nova-consoleauth.service            enabled
openstack-nova-novncproxy.service             enabled
openstack-nova-scheduler.service              enabled
openstack-swift-account-auditor.service       enabled
openstack-swift-account-reaper.service        enabled
openstack-swift-account-replicator.service    enabled
openstack-swift-account.service               enabled
openstack-swift-container-auditor.service     enabled
openstack-swift-container-replicator.service  enabled
openstack-swift-container-updater.service     enabled
openstack-swift-container.service             enabled
openstack-swift-object-auditor.service        enabled
openstack-swift-object-replicator.service     enabled
openstack-swift-object-updater.service        enabled
openstack-swift-object.service                enabled
openstack-swift-proxy.service                 enabled
Customer has been unable to reproduce either at this time.
Ack, let me know if there are any logs for me to look at. Basically, the idea is that during the convergence step, in which puppet runs on all nodes, the systemd services should be enabled.
Customer did not save off any logs when they experienced the issue. They are already aware that without logs from the issue, and without the ability to reproduce it, we really cannot troubleshoot further.
Reopening, as this happened to me as well during an upgrade from OSP9 to 10. After carrying out the step at https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller, httpd (which now runs keystone) is disabled. I'm not sure whether it will get enabled at a later stage, but since the node is supposed to be rebooted during this stage, enabling it later will not be sufficient.
Actually, the above holds true for most (all?) of the migrated services...

[root@overcloud-controller-0 keystone]# systemctl list-unit-files '*openstack*'
UNIT FILE                                      STATE
openstack-aodh-api.service                     disabled
openstack-aodh-evaluator.service               disabled
openstack-aodh-listener.service                disabled
openstack-aodh-notifier.service                disabled
openstack-ceilometer-api.service               disabled
openstack-ceilometer-central.service           disabled
openstack-ceilometer-collector.service         disabled
openstack-ceilometer-compute.service           disabled
openstack-ceilometer-notification.service      disabled
openstack-ceilometer-polling.service           disabled
openstack-cinder-api.service                   disabled
openstack-cinder-backup.service                disabled
openstack-cinder-scheduler.service             disabled
openstack-cinder-volume.service                disabled
openstack-glance-api.service                   disabled
openstack-glance-glare.service                 disabled
openstack-glance-registry.service              disabled
openstack-glance-scrubber.service              disabled
openstack-gnocchi-api.service                  disabled
openstack-gnocchi-metricd.service              disabled
openstack-gnocchi-statsd.service               disabled
openstack-heat-api-cfn.service                 disabled
openstack-heat-api-cloudwatch.service          disabled
openstack-heat-api.service                     disabled
openstack-heat-engine.service                  disabled
openstack-manila-api.service                   disabled
openstack-manila-data.service                  disabled
openstack-manila-scheduler.service             disabled
openstack-manila-share.service                 disabled
openstack-nova-api.service                     disabled
openstack-nova-cert.service                    disabled
openstack-nova-compute.service                 disabled
openstack-nova-conductor.service               disabled
openstack-nova-console.service                 disabled
openstack-nova-consoleauth.service             disabled
openstack-nova-metadata-api.service            disabled
openstack-nova-novncproxy.service              disabled
openstack-nova-os-compute-api.service          disabled
openstack-nova-scheduler.service               disabled
openstack-nova-xvpvncproxy.service             disabled
openstack-sahara-all.service                   disabled
openstack-sahara-api.service                   disabled
openstack-sahara-engine.service                disabled
openstack-swift-account-auditor.service        enabled
openstack-swift-account-auditor@.service       disabled
openstack-swift-account-reaper.service         enabled
openstack-swift-account-reaper@.service        disabled
openstack-swift-account-replicator.service     enabled
openstack-swift-account-replicator@.service    disabled
openstack-swift-account.service                enabled
openstack-swift-account@.service               disabled
openstack-swift-container-auditor.service      enabled
openstack-swift-container-auditor@.service     disabled
openstack-swift-container-reconciler.service   disabled
openstack-swift-container-replicator.service   enabled
openstack-swift-container-replicator@.service  disabled
openstack-swift-container-updater.service      enabled
openstack-swift-container-updater@.service     disabled
openstack-swift-container.service              enabled
openstack-swift-container@.service             disabled
openstack-swift-object-auditor.service         enabled
openstack-swift-object-auditor@.service        disabled
openstack-swift-object-expirer.service         disabled
openstack-swift-object-reconstructor.service   disabled
openstack-swift-object-reconstructor@.service  disabled
openstack-swift-object-replicator.service      enabled
openstack-swift-object-replicator@.service     disabled
openstack-swift-object-updater.service         enabled
openstack-swift-object-updater@.service        disabled
openstack-swift-object.service                 enabled
openstack-swift-object@.service                disabled
openstack-swift-proxy.service                  enabled
72 unit files listed.

And actually, it seems httpd is not enabled either in an environment running OSP9 that was upgraded from OSP8...
Puppet should enable all these services once the converge step runs. If that is not the case we want to know.
That is indeed not the case, at least not in the https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller step
That is not the converge step. This is the converge step: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Finalization
Sorry, my misunderstanding. But in any case, enabling them in the convergence step is too late, since we already recommend that the user reboot the controllers in the "Upgrading Controller Nodes" step, i.e. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller. So since the services aren't enabled, they won't come up after the reboot, resulting in an extended outage.
Hi David,

I see, thanks for the feedback. I wasn't aware we recommended a reboot in the major-upgrade step. I went through the code and I will propose a fix upstream (newton only, as for ocata we switched to a different upgrade architecture).

Since I believe this is urgent, I will try to explain what happens during the 9->10 upgrade, and then we can discuss how we best address this until a fix lands.

During the 9->10 upgrade we move to the HA NG architecture for the control plane, which in short means we move from almost all services being managed by pacemaker on the controllers to only a few. In terms of pacemaker resources, we basically move from http://acksyn.org/files/tripleo/mitaka-new-install.pdf to http://acksyn.org/files/tripleo/light-cib-nomongo.pdf. So during the "major-upgrade-pacemaker" step we basically do the following:

1) For all the OSP9/mitaka pacemaker services, we first delete any constraints, then disable and delete the pacemaker resource from the cluster CIB: https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh#L67-L99 [A]
2) We then update all packages via yum.
3) Then we start all the services.

Steps 1-3 all happen during the major-upgrade-pacemaker step, so after it completes the services are running but not enabled by default. If you want to do that by hand in the meantime, until convergence runs, you can take the list at [A] and simply enable each service via:

systemctl enable "${service%%-clone}"
The enabling by hand would happen after the major-upgrade-pacemaker step has completed and it would be run on all controllers before the convergence step.
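A minimal sketch of that manual workaround, to be run as root on each controller after the major-upgrade-pacemaker step. The service names below are a small sample, not the full list from [A], and the script defaults to a dry run that only prints the commands, so it can be reviewed before anything is enabled:

```shell
# Enable the systemd units for services that were removed from pacemaker.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# to actually run "systemctl enable" (requires root on the controller).
DRY_RUN=${DRY_RUN:-1}

# Sample subset of the resource names deleted at [A]; pacemaker clone
# resources carry a "-clone" suffix that the systemd unit name lacks.
services="openstack-cinder-api-clone openstack-glance-api-clone
openstack-heat-engine-clone openstack-nova-api"

planned=""
for service in $services; do
    unit="${service%%-clone}"      # strip pacemaker's -clone suffix, if any
    planned="$planned$unit "
    if [ "$DRY_RUN" = 1 ]; then
        echo "systemctl enable $unit"
    else
        systemctl enable "$unit"
    fi
done
```

The `${service%%-clone}` expansion is the same one quoted in the previous comment: it removes the trailing "-clone" that pacemaker appends to clone resource names, leaving names without the suffix untouched.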
Code verified on: openstack-tripleo-heat-templates-5.2.0-18.el7ost
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1585