Bug 1416073
| Summary: | OpenStack Services Removed from Pacemaker Not Set To Enabled in Systemd for Reboot | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Benjamin Schmaus <bschmaus> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Michele Baldessari <michele> |
| Status: | CLOSED ERRATA | QA Contact: | Amit Ugol <augol> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 10.0 (Newton) | CC: | apevec, augol, bschmaus, cpaquin, dbecker, djuran, jjoyce, jwang, lhh, mburns, michele, morazi, rhel-osp-director-maint, srevivo, ushkalim |
| Target Milestone: | z3 | Keywords: | Reopened, Triaged, ZStream |
| Target Release: | 10.0 (Newton) | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | openstack-tripleo-heat-templates-5.2.0-18.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-06-28 14:44:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Benjamin Schmaus
2017-01-24 14:14:26 UTC
I am testing the fresh install again to see if I can reproduce this. However, the field has reported that at a customer site the services were disabled after an upgrade.

*** Bug 1416083 has been marked as a duplicate of this bug. ***

I checked with another customer who has already walked through a couple of OSP9 to OSP10 test upgrades and they are not seeing the behaviour.

Thanks - I plan to run through the upgrade again, so I will see whether I hit the same issue and will update this BZ with my findings.

I was unable to reproduce this on an upgrade and/or a fresh install:

[root@overcloud-controller-0 heat-admin]# systemctl list-unit-files|grep enabled|grep openstack
openstack-aodh-evaluator.service enabled
openstack-aodh-listener.service enabled
openstack-aodh-notifier.service enabled
openstack-ceilometer-central.service enabled
openstack-ceilometer-collector.service enabled
openstack-ceilometer-notification.service enabled
openstack-cinder-api.service enabled
openstack-cinder-scheduler.service enabled
openstack-glance-api.service enabled
openstack-glance-registry.service enabled
openstack-gnocchi-metricd.service enabled
openstack-gnocchi-statsd.service enabled
openstack-heat-api-cfn.service enabled
openstack-heat-api-cloudwatch.service enabled
openstack-heat-api.service enabled
openstack-heat-engine.service enabled
openstack-nova-api.service enabled
openstack-nova-conductor.service enabled
openstack-nova-consoleauth.service enabled
openstack-nova-novncproxy.service enabled
openstack-nova-scheduler.service enabled
openstack-swift-account-auditor.service enabled
openstack-swift-account-reaper.service enabled
openstack-swift-account-replicator.service enabled
openstack-swift-account.service enabled
openstack-swift-container-auditor.service enabled
openstack-swift-container-replicator.service enabled
openstack-swift-container-updater.service enabled
openstack-swift-container.service enabled
openstack-swift-object-auditor.service enabled
openstack-swift-object-replicator.service enabled
openstack-swift-object-updater.service enabled
openstack-swift-object.service enabled
openstack-swift-proxy.service enabled

Customer has been unable to reproduce either at this time.

Ack, let me know if there are any logs for me to look at. Basically the idea is that after the convergence step, in which puppet runs on all nodes, puppet should enable the systemd services.

Customer did not save any logs when they experienced the issue. They are already aware that without logs from the issue and without the ability to reproduce it, we really cannot troubleshoot further.

Reopening as this happened to me as well, during an upgrade from OSP9 to 10. After carrying out the step https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller httpd (which is now running keystone) is disabled. I'm not sure whether it will get enabled at a later stage, but since the node is supposed to be rebooted during this stage, enabling it later will not be sufficient.

Actually, the above holds true for most (all?) migrated services...
[root@overcloud-controller-0 keystone]# systemctl list-unit-files '*openstack*'
UNIT FILE STATE
openstack-aodh-api.service disabled
openstack-aodh-evaluator.service disabled
openstack-aodh-listener.service disabled
openstack-aodh-notifier.service disabled
openstack-ceilometer-api.service disabled
openstack-ceilometer-central.service disabled
openstack-ceilometer-collector.service disabled
openstack-ceilometer-compute.service disabled
openstack-ceilometer-notification.service disabled
openstack-ceilometer-polling.service disabled
openstack-cinder-api.service disabled
openstack-cinder-backup.service disabled
openstack-cinder-scheduler.service disabled
openstack-cinder-volume.service disabled
openstack-glance-api.service disabled
openstack-glance-glare.service disabled
openstack-glance-registry.service disabled
openstack-glance-scrubber.service disabled
openstack-gnocchi-api.service disabled
openstack-gnocchi-metricd.service disabled
openstack-gnocchi-statsd.service disabled
openstack-heat-api-cfn.service disabled
openstack-heat-api-cloudwatch.service disabled
openstack-heat-api.service disabled
openstack-heat-engine.service disabled
openstack-manila-api.service disabled
openstack-manila-data.service disabled
openstack-manila-scheduler.service disabled
openstack-manila-share.service disabled
openstack-nova-api.service disabled
openstack-nova-cert.service disabled
openstack-nova-compute.service disabled
openstack-nova-conductor.service disabled
openstack-nova-console.service disabled
openstack-nova-consoleauth.service disabled
openstack-nova-metadata-api.service disabled
openstack-nova-novncproxy.service disabled
openstack-nova-os-compute-api.service disabled
openstack-nova-scheduler.service disabled
openstack-nova-xvpvncproxy.service disabled
openstack-sahara-all.service disabled
openstack-sahara-api.service disabled
openstack-sahara-engine.service disabled
openstack-swift-account-auditor.service enabled
openstack-swift-account-auditor@.service disabled
openstack-swift-account-reaper.service enabled
openstack-swift-account-reaper@.service disabled
openstack-swift-account-replicator.service enabled
openstack-swift-account-replicator@.service disabled
openstack-swift-account.service enabled
openstack-swift-account@.service disabled
openstack-swift-container-auditor.service enabled
openstack-swift-container-auditor@.service disabled
openstack-swift-container-reconciler.service disabled
openstack-swift-container-replicator.service enabled
openstack-swift-container-replicator@.service disabled
openstack-swift-container-updater.service enabled
openstack-swift-container-updater@.service disabled
openstack-swift-container.service enabled
openstack-swift-container@.service disabled
openstack-swift-object-auditor.service enabled
openstack-swift-object-auditor@.service disabled
openstack-swift-object-expirer.service disabled
openstack-swift-object-reconstructor.service disabled
openstack-swift-object-reconstructor@.service disabled
openstack-swift-object-replicator.service enabled
openstack-swift-object-replicator@.service disabled
openstack-swift-object-updater.service enabled
openstack-swift-object-updater@.service disabled
openstack-swift-object.service enabled
openstack-swift-object@.service disabled
openstack-swift-proxy.service enabled

72 unit files listed.

And actually, it seems httpd is also not enabled in an environment running OSP9 that was upgraded from OSP8...

Puppet should enable all these services once the converge step runs. If that is not the case we want to know.

That is indeed not the case, at least not in the https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller step.

That is not the converge step.
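To spot this state quickly on a controller, the disabled units in a listing like the one above could be filtered with a small helper. This is only a sketch: `disabled_openstack_units` is a hypothetical name, and skipping template units (`name@.service`) is an assumption based on the listing above, where they show as disabled even when the concrete unit is enabled. Note that httpd would not match the `openstack-` prefix and has to be checked separately.

```shell
#!/usr/bin/env bash
# Sketch only: disabled_openstack_units is a hypothetical helper, not part of any tool.

# Reads "UNIT STATE" lines (systemctl list-unit-files style) on stdin and
# prints the openstack-* units whose state is "disabled".
# Template units (name@.service) are skipped; that choice is an assumption.
disabled_openstack_units() {
    awk '$1 ~ /^openstack-/ && $1 !~ /@\.service$/ && $2 == "disabled" { print $1 }'
}

# Possible use on a controller (not executed here):
#   systemctl list-unit-files --no-legend '*openstack*' | disabled_openstack_units
```

An empty result would suggest the services will come back after a reboot; any unit printed is one that systemd will not start at boot.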
This is the converge step: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Finalization

Sorry, my misunderstanding. But anyway, enabling them in the convergence step is too late, as we recommend that the user reboot the controllers already in the "Upgrading Controller Nodes" step, i.e. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller So since the services aren't enabled, they won't come up after the reboot, with an extended outage as a result.

Hi David, I see, thanks for the feedback. I wasn't aware we recommended a reboot in the major-upgrade step. I went through the code and I will propose a fix upstream (newton only, as for ocata we switched to a different upgrade architecture). Since I believe it is urgent, I will try to explain what happens during the 9->10 upgrade and then we can discuss how best to address this until a fix lands.

During the 9->10 upgrade we move to the HA NG architecture for the control plane. In short, we move from almost all services on the controller being managed by pacemaker to only a few. In terms of pacemaker resources, we basically move from http://acksyn.org/files/tripleo/mitaka-new-install.pdf to http://acksyn.org/files/tripleo/light-cib-nomongo.pdf.

So during the "major-upgrade-pacemaker" step we basically do the following:

1) For all the OSP9-mitaka pacemaker services we first delete any constraints, then disable the pacemaker resource, and then delete it from the cluster CIB https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh#L67-L99 [A]
2) We then update all packages via yum
3) Then we start all the services.
Now, 1-3 all happen during the major-upgrade-pacemaker step, so once that step has completed the services are not enabled by default. If you want to enable them by hand in the meantime, until convergence runs, you can take the list above at [A] and simply enable each service via systemctl enable "${service%%-clone}". The enabling by hand would happen after the major-upgrade-pacemaker step has completed, and it would be run on all controllers before the convergence step.

Code verified on: openstack-tripleo-heat-templates-5.2.0-18.el7ost

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1585 |
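For reference, the interim workaround described in the comments above (enabling the migrated services by hand on every controller after the major-upgrade-pacemaker step and before convergence) could be scripted roughly as follows. This is a sketch under assumptions: the function names and the `DRY_RUN` switch are hypothetical, and the resource names in the usage comment are placeholders; the real list should be taken from [A].

```shell
#!/usr/bin/env bash
# Sketch only: function names and the DRY_RUN switch are illustrative assumptions.

# Map a pacemaker resource name to its systemd unit name by stripping a
# trailing "-clone", as in the "${service%%-clone}" expansion quoted above.
unit_for_resource() {
    local resource=$1
    printf '%s\n' "${resource%%-clone}"
}

# Enable the systemd unit behind each pacemaker resource name passed as an argument.
# With DRY_RUN=1 (the default in this sketch) the commands are printed, not run.
enable_migrated_services() {
    local resource unit
    for resource in "$@"; do
        unit=$(unit_for_resource "$resource")
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "systemctl enable $unit"
        else
            systemctl enable "$unit"
        fi
    done
}

# Possible use as root on each controller (placeholder names; take the real list from [A]):
#   DRY_RUN=0 enable_migrated_services openstack-nova-api-clone httpd-clone
```

Defaulting to a dry run is a deliberate choice here so the list can be reviewed before anything changes on the node.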