Description of problem:

During the stack update, puppet tries to restart services via systemctl, and this fails because pacemaker is controlling the services on the overcloud controllers.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch

How reproducible:

Steps to Reproduce:
1. Install the undercloud/overcloud on 7.0
2. Update the undercloud to 7.1
3. Update openstack-puppet-modules on all nodes, cf. https://bugzilla.redhat.com/show_bug.cgi?id=1267318
4. To update the stack, run: openstack overcloud deploy --templates /home/stack/templates-7.1/ [...]

Actual results:

Stack failed with status:
resources.ControllerNodesPostDeployment: resources.ControllerOvercloudServicesDeployment_Step4: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
ERROR: openstack Heat Stack update failed.

And in puppet (heat deployment-output-show d9e1fd93-9385-44a7-8f1b-7726ff4a4cee deploy_stderr):

Warning: Scope(Class[Keystone]): Execution of db_sync does not depend on $enabled anymore. Please use sync_db instead.
Warning: Scope(Class[Glance::Registry]): Execution of db_sync does not depend on $manage_service or $enabled anymore. Please use sync_db instead.
Warning: Scope(Class[Nova::Api]): The conductor_workers parameter is deprecated and has no effect. Use workers parameter of nova::conductor class instead.
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_host'; class ::nova::compute has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_protocol'; class ::nova::compute has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_port'; class ::nova::compute has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_path'; class ::nova::compute has not been evaluated
Warning: Scope(Class[Concat::Setup]): concat::setup is deprecated as a public API of the concat module and should no longer be directly included in the manifest.
Error: /Stage[main]/Cinder::Api/Exec[cinder-manage db_sync]: Failed to call refresh: cinder-manage db sync returned 1 instead of one of [0]
Error: /Stage[main]/Cinder::Api/Exec[cinder-manage db_sync]: cinder-manage db sync returned 1 instead of one of [0]
Error: /Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]: Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]: keystone-manage db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Glance::Registry/Exec[glance-manage db_sync]: Failed to call refresh: glance-manage --config-file=/etc/glance/glance-registry.conf db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Glance::Registry/Exec[glance-manage db_sync]: glance-manage --config-file=/etc/glance/glance-registry.conf db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Nova::Api/Exec[nova-db-sync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Nova::Api/Exec[nova-db-sync]: Command exceeded timeout
Wrapped exception: execution expired
Error: /Stage[main]/Nova::Scheduler/Nova::Generic_service[scheduler]/Service[nova-scheduler]: Failed to call refresh: Could not restart Service[nova-scheduler]: Execution of '/usr/bin/systemctl restart openstack-nova-scheduler' returned 1: Job for openstack-nova-scheduler.service canceled.
Error: /Stage[main]/Nova::Scheduler/Nova::Generic_service[scheduler]/Service[nova-scheduler]: Could not restart Service[nova-scheduler]: Execution of '/usr/bin/systemctl restart openstack-nova-scheduler' returned 1: Job for openstack-nova-scheduler.service canceled.
Wrapped exception: Execution of '/usr/bin/systemctl restart openstack-nova-scheduler' returned 1: Job for openstack-nova-scheduler.service canceled.
Error: /Stage[main]/Heat/Exec[heat-dbsync]: Failed to call refresh: heat-manage --config-file /etc/heat/heat.conf db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Heat/Exec[heat-dbsync]: heat-manage --config-file /etc/heat/heat.conf db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]: Failed to call refresh: neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head returned 1 instead of one of [0]
Error: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]: neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head returned 1 instead of one of [0]
Error: /Stage[main]/Neutron::Server/Service[neutron-server]: Failed to call refresh: Could not restart Service[neutron-server]: Execution of '/usr/bin/systemctl restart neutron-server' returned 1: Job for neutron-server.service canceled.
Error: /Stage[main]/Neutron::Server/Service[neutron-server]: Could not restart Service[neutron-server]: Execution of '/usr/bin/systemctl restart neutron-server' returned 1: Job for neutron-server.service canceled.
Wrapped exception: Execution of '/usr/bin/systemctl restart neutron-server' returned 1: Job for neutron-server.service canceled.
Expected results:

Services should be restarted by pacemaker, not via systemctl.

Additional info:
As per the conversation on IRC, db_sync seems to have failed because the client was pointed at a new VIP which HAProxy isn't hosting. In theory we could make HAProxy restart, but changing a VIP has too many implications, so I think it's better to avoid changing it. To do so, we should force the VIPs to remain unchanged; we can probably do this via deployment params. Investigating.
The VIP change is going to be tracked by BZ#1272357
I've hit this with nova-conductor too:

Execution of '/usr/bin/systemctl restart openstack-nova-conductor' returned 1: Job for openstack-nova-conductor.service canceled
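The expectation above is that restarts of cluster-managed services go through pacemaker rather than systemctl. A minimal sketch of that idea (a hypothetical helper, not TripleO code; `pcs resource show` is the RHEL 7-era pcs syntax for querying a resource):

```shell
# Hypothetical helper, not part of TripleO: restart through pacemaker when the
# service is a cluster resource, and fall back to systemctl otherwise.
restart_service() {
    svc=$1
    if pcs resource show "$svc" >/dev/null 2>&1; then
        # pacemaker owns this service; let the cluster restart it
        pcs resource restart "$svc"
    else
        # not cluster-managed; a plain systemd restart is safe
        systemctl restart "$svc"
    fi
}
```

With something like this, a refresh of e.g. neutron-server would become `pcs resource restart neutron-server` instead of the `systemctl restart neutron-server` that systemd cancels.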
It looks like the templates change the identity_uri from 'http://172.16.20.11:35357/' to 'http://192.0.2.15:35357/', causing cascading restart attempts. Will try to upload occ.log and continue investigation.
In 7.1 the keystone admin API runs on the ctlplane network, while in 7.0 it was running on the internal_api network.

7.1 ServiceNetMap in overcloud-without-mergepy.yaml:
  KeystoneAdminApiNetwork: ctlplane # allows undercloud to config endpoints

7.0 ServiceNetMap in overcloud-without-mergepy.yaml:
  KeystoneAdminApiNetwork: internal_api
Marius, thanks! We can test whether the upgrade works without restarts by changing the keystone admin network back. If it does, the BZ remains valid since service restarts should be orchestrated, but we'll at least be able to complete an upgrade without changing the overcloud config.
As per comment #7, the workaround to avoid services being restarted is to configure the KeystoneAdminApiNetwork in the ServiceNetMap of the 7.1 templates so that it continues to stay on internal_api, as it was in the 7.0 templates. To do so, add the following into a custom upgrade.yaml:

parameters:
  ServiceNetMap:
    NeutronTenantNetwork: tenant
    CeilometerApiNetwork: internal_api
    MongoDbNetwork: internal_api
    CinderApiNetwork: internal_api
    CinderIscsiNetwork: storage
    GlanceApiNetwork: storage
    GlanceRegistryNetwork: internal_api
    KeystoneAdminApiNetwork: internal_api
    KeystonePublicApiNetwork: internal_api
    NeutronApiNetwork: internal_api
    HeatApiNetwork: internal_api
    NovaApiNetwork: internal_api
    NovaMetadataNetwork: internal_api
    NovaVncProxyNetwork: internal_api
    SwiftMgmtNetwork: storage_mgmt
    SwiftProxyNetwork: storage
    HorizonNetwork: internal_api
    MemcachedNetwork: internal_api
    RabbitMqNetwork: internal_api
    RedisNetwork: internal_api
    MysqlNetwork: internal_api
    CephClusterNetwork: storage_mgmt
    CephPublicNetwork: storage
    ControllerHostnameResolveNetwork: internal_api
    ComputeHostnameResolveNetwork: internal_api
    BlockStorageHostnameResolveNetwork: internal_api
    ObjectStorageHostnameResolveNetwork: internal_api
    CephStorageHostnameResolveNetwork: storage
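Mechanically, the workaround above could be applied like this (a sketch; the template path is taken from the reproduction steps, and the full map is written out rather than only the keystone entry, since ServiceNetMap is a single json parameter and the comment supplies the whole map):

```shell
# Write the ServiceNetMap override to a custom environment file.
cat > upgrade.yaml <<'EOF'
parameters:
  ServiceNetMap:
    NeutronTenantNetwork: tenant
    CeilometerApiNetwork: internal_api
    MongoDbNetwork: internal_api
    CinderApiNetwork: internal_api
    CinderIscsiNetwork: storage
    GlanceApiNetwork: storage
    GlanceRegistryNetwork: internal_api
    KeystoneAdminApiNetwork: internal_api
    KeystonePublicApiNetwork: internal_api
    NeutronApiNetwork: internal_api
    HeatApiNetwork: internal_api
    NovaApiNetwork: internal_api
    NovaMetadataNetwork: internal_api
    NovaVncProxyNetwork: internal_api
    SwiftMgmtNetwork: storage_mgmt
    SwiftProxyNetwork: storage
    HorizonNetwork: internal_api
    MemcachedNetwork: internal_api
    RabbitMqNetwork: internal_api
    RedisNetwork: internal_api
    MysqlNetwork: internal_api
    CephClusterNetwork: storage_mgmt
    CephPublicNetwork: storage
    ControllerHostnameResolveNetwork: internal_api
    ComputeHostnameResolveNetwork: internal_api
    BlockStorageHostnameResolveNetwork: internal_api
    ObjectStorageHostnameResolveNetwork: internal_api
    CephStorageHostnameResolveNetwork: storage
EOF

# Then include it in the stack update (command and path as in the report):
#   openstack overcloud deploy --templates /home/stack/templates-7.1/ -e upgrade.yaml
```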
We should probably quiesce the cluster/node to fix this long term.
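One way to quiesce, sketched with pcs (a hypothetical wrapper, not the eventual fix): put the node in standby so pacemaker stops its resources before the puppet run, then bring it back afterwards.

```shell
# Hypothetical wrapper: move a controller into standby around an update step,
# so pacemaker-managed services are not fighting systemctl restarts.
quiesce_and_run() {
    node=$1; shift
    pcs cluster standby "$node"     # pacemaker stops resources on the node
    "$@"                            # run the update step (e.g. the puppet run)
    rc=$?
    pcs cluster unstandby "$node"   # resources are started again by the cluster
    return $rc
}
```

Usage might look like `quiesce_and_run overcloud-controller-0 <update command>` (invocation hypothetical; per-node sequencing across the cluster would still need orchestration).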
This either happens or it does not. Upgrades from 7.0 and from 7.1 to 7.2 passed automation, and there is not going to be support for 7.0 -> 7.1. Is it safe to close this because it passed CI?
I was able to pass an update from 7.0 to 7.1 by passing the update-from-keystone-admin-internal-api.yaml environment file.
(In reply to Marius Cornea from comment #16)
> I was able to pass an update from 7.0 to 7.1 by passing the
> update-from-keystone-admin-internal-api.yaml environment file.

Correction: the update was 7.0 to 7.2.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650