Bug 1416073 - OpenStack Services Removed from Pacemaker Not Set To Enabled in Systemd for Reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: z3
Target Release: 10.0 (Newton)
Assignee: Michele Baldessari
QA Contact: Amit Ugol
URL:
Whiteboard:
Duplicates: 1416083
Depends On:
Blocks:
 
Reported: 2017-01-24 14:14 UTC by Benjamin Schmaus
Modified: 2020-12-14 08:02 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-5.2.0-18.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-28 14:44:12 UTC
Target Upstream Version:




Links
System                            ID              Private  Priority  Status        Summary                                                            Last Updated
Launchpad                         1677346         0        None      None          None                                                               2017-03-29 18:49:46 UTC
OpenStack gerrit                  451512          0        None      MERGED        [Newton-only] Enable services in the major-upgrade-pacemaker step  2020-06-23 11:45:54 UTC
Red Hat Knowledge Base (Article)  2986341         0        None      None          None                                                               2017-03-29 19:07:43 UTC
Red Hat Product Errata            RHBA-2017:1585  0        normal    SHIPPED_LIVE  Red Hat OpenStack Platform 10 director Bug Fix Advisory            2017-06-28 18:42:51 UTC

Description Benjamin Schmaus 2017-01-24 14:14:26 UTC
Description of problem:

After either an upgrade from OSP9 to OSP10 or a fresh install of OSP10, it appears that the services removed from pacemaker control are started during the deployment or upgrade but are never set to start when a controller node reboots.

The systemctl output shows these services as disabled:

openstack-cinder-api.service                  disabled
openstack-cinder-backup.service               disabled
openstack-cinder-scheduler.service            disabled
openstack-cinder-volume.service               disabled
openstack-glance-api.service                  disabled
openstack-glance-glare.service                disabled
openstack-glance-registry.service             disabled
openstack-glance-scrubber.service             disabled
openstack-gnocchi-api.service                 disabled
openstack-gnocchi-metricd.service             disabled
openstack-gnocchi-statsd.service              disabled
openstack-heat-api-cfn.service                disabled
openstack-heat-api-cloudwatch.service         disabled
openstack-heat-api.service                    disabled
openstack-heat-engine.service                 disabled
openstack-manila-api.service                  disabled
openstack-manila-data.service                 disabled
openstack-manila-scheduler.service            disabled
openstack-manila-share.service                disabled
openstack-nova-api.service                    disabled
openstack-nova-cert.service                   disabled


Version-Release number of selected component (if applicable):
OSP10

How reproducible:
100%

Steps to Reproduce:
1. Deploy a simple OSP10 environment

Actual results:
The services in the list above are not enabled to start on reboot.

Expected results:
The services in the list above should be enabled to start on reboot.

Additional info:

Comment 1 Benjamin Schmaus 2017-01-24 14:28:06 UTC
I am testing the fresh install again to see if I can reproduce this. However, the field has reported that at a customer site the services were disabled after an upgrade.

Comment 2 Chris Paquin 2017-01-24 14:47:38 UTC
*** Bug 1416083 has been marked as a duplicate of this bug. ***

Comment 3 Benjamin Schmaus 2017-01-24 15:09:00 UTC
I checked with another customer who has already walked through a couple of OSP9 to OSP10 test upgrades and they are not seeing the behaviour.

Comment 4 Chris Paquin 2017-01-24 16:44:13 UTC
Thanks. I plan to run through the upgrade again to check whether I hit the same issue, and I will update this BZ with my findings.

Comment 5 Benjamin Schmaus 2017-01-27 19:12:16 UTC
I was unable to reproduce this on either an upgrade or a fresh install:

[root@overcloud-controller-0 heat-admin]# systemctl list-unit-files|grep enabled|grep openstack
openstack-aodh-evaluator.service              enabled 
openstack-aodh-listener.service               enabled 
openstack-aodh-notifier.service               enabled 
openstack-ceilometer-central.service          enabled 
openstack-ceilometer-collector.service        enabled 
openstack-ceilometer-notification.service     enabled 
openstack-cinder-api.service                  enabled 
openstack-cinder-scheduler.service            enabled 
openstack-glance-api.service                  enabled 
openstack-glance-registry.service             enabled 
openstack-gnocchi-metricd.service             enabled 
openstack-gnocchi-statsd.service              enabled 
openstack-heat-api-cfn.service                enabled 
openstack-heat-api-cloudwatch.service         enabled 
openstack-heat-api.service                    enabled 
openstack-heat-engine.service                 enabled 
openstack-nova-api.service                    enabled 
openstack-nova-conductor.service              enabled 
openstack-nova-consoleauth.service            enabled 
openstack-nova-novncproxy.service             enabled 
openstack-nova-scheduler.service              enabled 
openstack-swift-account-auditor.service       enabled 
openstack-swift-account-reaper.service        enabled 
openstack-swift-account-replicator.service    enabled 
openstack-swift-account.service               enabled 
openstack-swift-container-auditor.service     enabled 
openstack-swift-container-replicator.service  enabled 
openstack-swift-container-updater.service     enabled 
openstack-swift-container.service             enabled 
openstack-swift-object-auditor.service        enabled 
openstack-swift-object-replicator.service     enabled 
openstack-swift-object-updater.service        enabled 
openstack-swift-object.service                enabled 
openstack-swift-proxy.service                 enabled

Comment 6 Benjamin Schmaus 2017-02-05 17:28:17 UTC
Customer has been unable to reproduce either at this time.

Comment 7 Michele Baldessari 2017-02-06 11:20:15 UTC
Ack, let me know if there are any logs for me to look at. The idea is that the convergence step, in which puppet runs on all nodes, should enable the systemd services.

Comment 8 Benjamin Schmaus 2017-02-06 12:25:51 UTC
The customer did not save any logs when they experienced the issue. They are already aware that without logs from the issue and without the ability to reproduce it, we cannot troubleshoot further.

Comment 10 David Juran 2017-03-15 09:51:23 UTC
Reopening as this happened to me as well, during an upgrade from OSP9 to 10.

After carrying out the step https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller

httpd (which now is running keystone) is now disabled. I'm not sure whether it will get enabled at a later stage, but since the node is supposed to be rebooted during this stage, enabling it later will not be sufficient.

Comment 11 David Juran 2017-03-15 10:12:11 UTC
Actually, the above holds true for most (all?) migrated services...

 [root@overcloud-controller-0 keystone]# systemctl list-unit-files '*openstack*'
UNIT FILE STATE
openstack-aodh-api.service disabled
openstack-aodh-evaluator.service disabled
openstack-aodh-listener.service disabled
openstack-aodh-notifier.service disabled
openstack-ceilometer-api.service disabled
openstack-ceilometer-central.service disabled
openstack-ceilometer-collector.service disabled
openstack-ceilometer-compute.service disabled
openstack-ceilometer-notification.service disabled
openstack-ceilometer-polling.service disabled
openstack-cinder-api.service disabled
openstack-cinder-backup.service disabled
openstack-cinder-scheduler.service disabled
openstack-cinder-volume.service disabled
openstack-glance-api.service disabled
openstack-glance-glare.service disabled
openstack-glance-registry.service disabled
openstack-glance-scrubber.service disabled
openstack-gnocchi-api.service disabled
openstack-gnocchi-metricd.service disabled
openstack-gnocchi-statsd.service disabled
openstack-heat-api-cfn.service disabled
openstack-heat-api-cloudwatch.service disabled
openstack-heat-api.service disabled
openstack-heat-engine.service disabled
openstack-manila-api.service disabled
openstack-manila-data.service disabled
openstack-manila-scheduler.service disabled
openstack-manila-share.service disabled
openstack-nova-api.service disabled
openstack-nova-cert.service disabled
openstack-nova-compute.service disabled
openstack-nova-conductor.service disabled
openstack-nova-console.service disabled
openstack-nova-consoleauth.service disabled
openstack-nova-metadata-api.service disabled
openstack-nova-novncproxy.service disabled
openstack-nova-os-compute-api.service disabled
openstack-nova-scheduler.service disabled
openstack-nova-xvpvncproxy.service disabled
openstack-sahara-all.service disabled
openstack-sahara-api.service disabled
openstack-sahara-engine.service disabled
openstack-swift-account-auditor.service enabled
openstack-swift-account-auditor@.service disabled
openstack-swift-account-reaper.service enabled
openstack-swift-account-reaper@.service disabled
openstack-swift-account-replicator.service enabled
openstack-swift-account-replicator@.service disabled
openstack-swift-account.service enabled
openstack-swift-account@.service disabled
openstack-swift-container-auditor.service enabled
openstack-swift-container-auditor@.service disabled
openstack-swift-container-reconciler.service disabled
openstack-swift-container-replicator.service enabled
openstack-swift-container-replicator@.service disabled
openstack-swift-container-updater.service enabled
openstack-swift-container-updater@.service disabled
openstack-swift-container.service enabled
openstack-swift-container@.service disabled
openstack-swift-object-auditor.service enabled
openstack-swift-object-auditor@.service disabled
openstack-swift-object-expirer.service disabled
openstack-swift-object-reconstructor.service disabled
openstack-swift-object-reconstructor@.service disabled
openstack-swift-object-replicator.service enabled
openstack-swift-object-replicator@.service disabled
openstack-swift-object-updater.service enabled
openstack-swift-object-updater@.service disabled
openstack-swift-object.service enabled
openstack-swift-object@.service disabled
openstack-swift-proxy.service enabled

72 unit files listed.

And actually, it seems httpd is also not enabled in an environment running OSP9 that was upgraded from OSP8...
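For anyone repeating this check, the listing above can be reduced to just the units that would not start at boot. The following is a minimal sketch and not part of the original report: the function name `disabled_units` is hypothetical, and it assumes the two-column `UNIT FILE  STATE` layout shown in this comment. Template units (names ending in `@.service`) are skipped, on the assumption that a template is normally enabled per instance rather than on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `systemctl list-unit-files` output on stdin and
# prints the names of non-template units whose state column is "disabled".
# The header and the "N unit files listed." footer never match, because their
# second field is not the word "disabled".
disabled_units() {
    awk '$2 == "disabled" && $1 !~ /@\.service$/ { print $1 }'
}

# Usage on a controller (assumes systemctl is available there):
#   systemctl list-unit-files '*openstack*' | disabled_units
```

An empty result would suggest that every listed non-template unit is in some state other than disabled; it does not by itself prove the unit is enabled, since other states exist.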

Comment 12 Michele Baldessari 2017-03-16 07:31:29 UTC
Puppet should enable all these services once the converge step runs. If that is not the case we want to know.

Comment 15 David Juran 2017-03-20 16:38:42 UTC
Sorry, my misunderstanding.

But anyway, enabling them in the convergence step is too late, because we recommend that the user reboot the controllers already in the "Upgrading Controller Nodes" step, i.e. https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/upgrading_red_hat_openstack_platform/chap-upgrading_the_environment#sect-Major-Upgrading_the_Overcloud-Controller

Since the services aren't enabled, they won't come up after the reboot, resulting in an extended outage.

Comment 16 Michele Baldessari 2017-03-29 18:37:15 UTC
Hi David,

I see, thanks for the feedback. I wasn't aware we recommended a reboot in the major-upgrade step. I went through the code and I will propose a fix upstream (Newton only, since for Ocata we switched to a different upgrade architecture).

Since I believe it is urgent, I will try to explain what happens during the 9->10 upgrade and then we can discuss how we best address this until a fix lands.

During the 9->10 upgrade we move to the HA NG architecture for the control plane, which in short means we move from almost all services on the controller being managed by pacemaker to only a few. In terms of pacemaker resources, we basically move from
http://acksyn.org/files/tripleo/mitaka-new-install.pdf to
http://acksyn.org/files/tripleo/light-cib-nomongo.pdf.

So during the "major-upgrade-pacemaker" step we basically do the following:
1) For all the OSP9 (Mitaka) pacemaker services, we first delete any constraints, then disable the pacemaker resource, and then delete it from the cluster CIB:
https://github.com/openstack/tripleo-heat-templates/blob/stable/newton/extraconfig/tasks/major_upgrade_pacemaker_migrations.sh#L67-L99 [A]
2) We then update all packages via yum.
3) We then start all the services.

Now, steps 1-3 all happen during the major-upgrade-pacemaker step, and none of them enables the services in systemd, so after that step has completed the services are running but not enabled. If you want to enable them by hand in the meantime, until convergence runs, you can take the list above at [A] and simply enable each service via systemctl enable "${service%%-clone}"

Comment 17 Michele Baldessari 2017-03-29 18:42:43 UTC
The manual enabling would be done after the major-upgrade-pacemaker step has completed, and it would be run on all controllers before the convergence step.
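The manual workaround described in comments 16 and 17 could be scripted roughly as follows. This is a sketch under stated assumptions, not the fix that eventually shipped: the function names and the `DRY_RUN` variable are hypothetical, and the resource names must be taken from the script linked at [A], since the list is not reproduced in this bug. The `${resource%%-clone}` expansion is the one quoted in comment 16; it strips a trailing `-clone` from the pacemaker resource name to obtain the systemd unit name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a pacemaker clone resource name to a systemd unit
# name by stripping a trailing "-clone" (the expansion quoted in comment 16).
unit_for_resource() {
    local resource="$1"
    printf '%s\n' "${resource%%-clone}"
}

# Hypothetical helper: enable the unit behind each resource name passed as an
# argument. With DRY_RUN=1 the systemctl commands are printed, not executed.
enable_migrated_services() {
    local resource unit
    for resource in "$@"; do
        unit="$(unit_for_resource "$resource")"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'systemctl enable %s\n' "$unit"
        else
            systemctl enable "$unit"
        fi
    done
}
```

A preview run might look like `DRY_RUN=1 enable_migrated_services openstack-nova-api-clone openstack-cinder-api-clone`, where the two resource names are only examples. Per comment 17, the real run would be done as root on every controller, after the major-upgrade-pacemaker step and before convergence.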

Comment 26 Udi Shkalim 2017-06-05 08:59:42 UTC
Code verified on: openstack-tripleo-heat-templates-5.2.0-18.el7ost

Comment 28 errata-xmlrpc 2017-06-28 14:44:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1585

