Bug 1272347

Summary:	director stack update 7.0 to 7.1 KeystoneAdminApiNetwork change causes unwanted services restart
Product:	Red Hat OpenStack	Reporter:	Cyril Lopez <cylopez>
Component:	rhosp-director	Assignee:	Giulio Fidente <gfidente>
Status:	CLOSED ERRATA	QA Contact:	Marius Cornea <mcornea>
Severity:	medium	Docs Contact:
Priority:	urgent
Version:	7.0 (Kilo)	CC:	calfonso, dnavale, dsavinea, gfidente, glambert, hrosnet, jcoufal, jprovazn, jstransk, mburns, mcornea, rhel-osp-director-maint, sasha, yguenane
Target Milestone:	y2	Keywords:	Triaged
Target Release:	7.0 (Kilo)
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-0.8.6-82.el7ost	Doc Type:	Known Issue
Doc Text:	With this update, the default network where the 'KeystoneAdminVip' is placed was changed from 'InternalApi' to 'ctlplane' so that the post-deployment Identity service initialization step could be carried out by the Undercloud over the 'ctlplane' network. Relocating the 'KeystoneAdminVip' causes a cascading restart of the services pointing to the old 'KeystoneAdminVip'.

As a workaround to make sure the 'KeystoneAdminVip' remains on the 'InternalApi' network, a customized 'ServiceNetMap' must be provided as a deployment parameter when launching an update from the 7.0 release. A sample Orchestration environment file passing a customized 'ServiceNetMap' is as follows:

parameters:
  ServiceNetMap:
    NeutronTenantNetwork: tenant
    CeilometerApiNetwork: internal_api
    MongoDbNetwork: internal_api
    CinderApiNetwork: internal_api
    CinderIscsiNetwork: storage
    GlanceApiNetwork: storage
    GlanceRegistryNetwork: internal_api
    KeystoneAdminApiNetwork: internal_api
    KeystonePublicApiNetwork: internal_api
    NeutronApiNetwork: internal_api
    HeatApiNetwork: internal_api
    NovaApiNetwork: internal_api
    NovaMetadataNetwork: internal_api
    NovaVncProxyNetwork: internal_api
    SwiftMgmtNetwork: storage_mgmt
    SwiftProxyNetwork: storage
    HorizonNetwork: internal_api
    MemcachedNetwork: internal_api
    RabbitMqNetwork: internal_api
    RedisNetwork: internal_api
    MysqlNetwork: internal_api
    CephClusterNetwork: storage_mgmt
    CephPublicNetwork: storage
    ControllerHostnameResolveNetwork: internal_api
    ComputeHostnameResolveNetwork: internal_api
    BlockStorageHostnameResolveNetwork: internal_api
    ObjectStorageHostnameResolveNetwork: internal_api
    CephStorageHostnameResolveNetwork: storage

If any additional network binding from the above has been customized, that setting has to be preserved as well. As a result of the workaround, the 'KeystoneAdminVip' is not relocated to the 'ctlplane' network, so no service restart is triggered.
Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-12-21 16:52:11 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Cyril Lopez 2015-10-16 07:47:09 UTC

Description of problem:
During the stack update, Puppet tries to restart services via systemctl, and this fails because Pacemaker is controlling the services on the overcloud controllers.


Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch

How reproducible:


Steps to Reproduce:
1. Install the undercloud and overcloud at 7.0
2. Update the undercloud to 7.1
3. Update openstack-puppet-modules on all nodes (cf. https://bugzilla.redhat.com/show_bug.cgi?id=1267318)
4. Update the stack by running openstack overcloud deploy --templates /home/stack/templates-7.1/ [...]


Actual results:
Stack failed with status: resources.ControllerNodesPostDeployment: resources.ControllerOvercloudServicesDeployment_Step4: Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
ERROR: openstack Heat Stack update failed.

And in the Puppet output:
heat deployment-output-show d9e1fd93-9385-44a7-8f1b-7726ff4a4cee deploy_stderr
Warning: Scope(Class[Keystone]): Execution of db_sync does not depend on $enabled anymore. Please use sync_db instead.
Warning: Scope(Class[Glance::Registry]): Execution of db_sync does not depend on $manage_service or $enabled anymore. Please use sync_db instead.
Warning: Scope(Class[Nova::Api]): The conductor_workers parameter is deprecated and has no effect. Use workers parameter of nova::conductor class instead.
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_host'; class ::nova::compute has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_protocol'; class ::nova::compute has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_port'; class ::nova::compute has not been evaluated
Warning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_path'; class ::nova::compute has not been evaluated
Warning: Scope(Class[Concat::Setup]): concat::setup is deprecated as a public API of the concat module and should no longer be directly included in the manifest.
Error: /Stage[main]/Cinder::Api/Exec[cinder-manage db_sync]: Failed to call refresh: cinder-manage db sync returned 1 instead of one of [0]
Error: /Stage[main]/Cinder::Api/Exec[cinder-manage db_sync]: cinder-manage db sync returned 1 instead of one of [0]
Error: /Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]: Failed to call refresh: keystone-manage db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Keystone::Db::Sync/Exec[keystone-manage db_sync]: keystone-manage db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Glance::Registry/Exec[glance-manage db_sync]: Failed to call refresh: glance-manage --config-file=/etc/glance/glance-registry.conf db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Glance::Registry/Exec[glance-manage db_sync]: glance-manage --config-file=/etc/glance/glance-registry.conf db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Nova::Api/Exec[nova-db-sync]: Failed to call refresh: Command exceeded timeout
Error: /Stage[main]/Nova::Api/Exec[nova-db-sync]: Command exceeded timeout
Wrapped exception:
execution expired
Error: /Stage[main]/Nova::Scheduler/Nova::Generic_service[scheduler]/Service[nova-scheduler]: Failed to call refresh: Could not restart Service[nova-scheduler]: Execution of '/usr/bin/systemctl restart openstack-nova-scheduler' returned 1: Job for openstack-nova-scheduler.service canceled.
Error: /Stage[main]/Nova::Scheduler/Nova::Generic_service[scheduler]/Service[nova-scheduler]: Could not restart Service[nova-scheduler]: Execution of '/usr/bin/systemctl restart openstack-nova-scheduler' returned 1: Job for openstack-nova-scheduler.service canceled.
Wrapped exception:
Execution of '/usr/bin/systemctl restart openstack-nova-scheduler' returned 1: Job for openstack-nova-scheduler.service canceled.
Error: /Stage[main]/Heat/Exec[heat-dbsync]: Failed to call refresh: heat-manage --config-file /etc/heat/heat.conf db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Heat/Exec[heat-dbsync]: heat-manage --config-file /etc/heat/heat.conf db_sync returned 1 instead of one of [0]
Error: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]: Failed to call refresh: neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head returned 1 instead of one of [0]
Error: /Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]: neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head returned 1 instead of one of [0]
Error: /Stage[main]/Neutron::Server/Service[neutron-server]: Failed to call refresh: Could not restart Service[neutron-server]: Execution of '/usr/bin/systemctl restart neutron-server' returned 1: Job for neutron-server.service canceled.
Error: /Stage[main]/Neutron::Server/Service[neutron-server]: Could not restart Service[neutron-server]: Execution of '/usr/bin/systemctl restart neutron-server' returned 1: Job for neutron-server.service canceled.
Wrapped exception:
Execution of '/usr/bin/systemctl restart neutron-server' returned 1: Job for neutron-server.service canceled.


Expected results:
The services should be restarted by Pacemaker.

Additional info:

Comment 2 Giulio Fidente 2015-10-16 08:22:56 UTC

As per conversation on IRC, db_sync seems to have failed because the client was pointed to a new VIP which HAProxy isn't hosting.

In theory we could make HAProxy restart, but changing a VIP has too many implications, so I think it's better to keep it from changing.

To do so, we should force the VIPs to remain unchanged; we can probably do this via deployment params. Investigating.

Comment 3 Giulio Fidente 2015-10-16 09:04:09 UTC

The VIP change is going to be tracked by BZ#1272357

Comment 4 Giulio Fidente 2015-10-20 18:00:29 UTC

I've hit this with nova-conductor too:

Execution of '/usr/bin/systemctl restart openstack-nova-conductor' returned 1: Job for openstack-nova-conductor.service canceled

Comment 5 Giulio Fidente 2015-10-20 18:06:46 UTC

It looks like in the templates we change the identity_uri causing cascading attempts to restart:

  'http://172.16.20.11:35357/' to 'http://192.0.2.15:35357/'

Will try to upload occ.log and continue investigation.
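
To check the diagnosis from comment 2 (clients pointed at a VIP that HAProxy is not serving), one can test whether each identity_uri accepts TCP connections from a controller. The following is only a minimal sketch: the function name endpoint_accepts_connections is hypothetical, and a successful connection shows that something is listening on that address and port, not that Keystone itself is healthy.

```python
import socket
from urllib.parse import urlsplit


def endpoint_accepts_connections(uri, timeout=3.0):
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parts = urlsplit(uri)
    host = parts.hostname
    if host is None:
        return False
    # Fall back to the scheme's default port when the URI does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Calling it with 'http://172.16.20.11:35357/' and then 'http://192.0.2.15:35357/' from a controller node would show which of the two addresses is actually being served.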

Comment 6 Marius Cornea 2015-10-20 18:13:26 UTC

In 7.1 the keystone admin api runs on the ctlplane network while in 7.0 it was running on the internal_api network.
       
7.1 ServiceNetMap in overcloud-without-mergepy.yaml:
KeystoneAdminApiNetwork: ctlplane # allows undercloud to config endpoints

7.0 ServiceNetMap in overcloud-without-mergepy.yaml:
KeystoneAdminApiNetwork: internal_api
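
The comparison above can be expressed as a small diff over two ServiceNetMap dictionaries. This is an illustrative sketch: changed_bindings is a hypothetical helper, and the two maps contain only the one binding quoted in this comment, while the real maps in overcloud-without-mergepy.yaml have many more keys.

```python
def changed_bindings(old_map, new_map):
    """Return {key: (old, new)} for every key whose network differs."""
    changed = {}
    for key in sorted(set(old_map) | set(new_map)):
        old, new = old_map.get(key), new_map.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed


# Only the binding quoted in this comment; not the full default maps.
service_net_map_70 = {"KeystoneAdminApiNetwork": "internal_api"}
service_net_map_71 = {"KeystoneAdminApiNetwork": "ctlplane"}

print(changed_bindings(service_net_map_70, service_net_map_71))
# prints {'KeystoneAdminApiNetwork': ('internal_api', 'ctlplane')}
```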

Comment 7 Giulio Fidente 2015-10-20 18:16:03 UTC

Marius, thanks! We can test whether the upgrade works without restarts by changing the keystone admin network back to internal_api. If it does, the BZ remains valid, as service restarts should be orchestrated, but we'll at least be able to complete an upgrade without changing the overcloud config.

Comment 8 Giulio Fidente 2015-10-21 09:23:34 UTC

As per comment #7, the workaround to avoid the service restarts is to set KeystoneAdminApiNetwork in the ServiceNetMap of the 7.1 templates so that the KeystoneAdminVip stays on the internal_api network, as it was with the 7.0 templates.

To do so, add the following into a custom upgrade.yaml:

parameters:
  ServiceNetMap:
    NeutronTenantNetwork: tenant
    CeilometerApiNetwork: internal_api
    MongoDbNetwork: internal_api
    CinderApiNetwork: internal_api
    CinderIscsiNetwork: storage
    GlanceApiNetwork: storage
    GlanceRegistryNetwork: internal_api
    KeystoneAdminApiNetwork: internal_api
    KeystonePublicApiNetwork: internal_api
    NeutronApiNetwork: internal_api
    HeatApiNetwork: internal_api
    NovaApiNetwork: internal_api
    NovaMetadataNetwork: internal_api
    NovaVncProxyNetwork: internal_api
    SwiftMgmtNetwork: storage_mgmt
    SwiftProxyNetwork: storage
    HorizonNetwork: internal_api
    MemcachedNetwork: internal_api
    RabbitMqNetwork: internal_api
    RedisNetwork: internal_api
    MysqlNetwork: internal_api
    CephClusterNetwork: storage_mgmt
    CephPublicNetwork: storage
    ControllerHostnameResolveNetwork: internal_api
    ComputeHostnameResolveNetwork: internal_api
    BlockStorageHostnameResolveNetwork: internal_api
    ObjectStorageHostnameResolveNetwork: internal_api
    CephStorageHostnameResolveNetwork: storage
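
Before launching the update, it may help to confirm that the custom environment file really keeps the Keystone admin API on internal_api. The sketch below is an assumption-laden illustration: read_service_net_map and keystone_admin_stays_internal are hypothetical helpers, and the parser is not a general YAML parser; it only understands the flat "Key: value" layout shown above. The sample text is a shortened stand-in for the full file.

```python
def read_service_net_map(text):
    """Collect the 'Key: value' pairs nested under 'ServiceNetMap:'."""
    bindings = {}
    inside = False
    map_indent = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        if stripped == "ServiceNetMap:":
            inside = True
            map_indent = indent
            continue
        if inside:
            if indent <= map_indent:
                inside = False  # left the ServiceNetMap block
                continue
            key, _, value = stripped.partition(":")
            # Drop any trailing '# ...' comment from the value.
            bindings[key.strip()] = value.split("#", 1)[0].strip()
    return bindings


def keystone_admin_stays_internal(bindings):
    """True if the admin API binding matches the 7.0 placement."""
    return bindings.get("KeystoneAdminApiNetwork") == "internal_api"


# Shortened stand-in for the contents of the custom upgrade.yaml.
sample = """\
parameters:
  ServiceNetMap:
    KeystoneAdminApiNetwork: internal_api
    KeystonePublicApiNetwork: internal_api
"""
print(keystone_admin_stays_internal(read_service_net_map(sample)))
# prints True
```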

Comment 9 Giulio Fidente 2015-10-22 12:28:08 UTC

We should probably quiesce the cluster/node to fix this long term.

Comment 15 Amit Ugol 2015-12-09 13:27:38 UTC

This either happens or it does not.
Upgrades from 7.0 and from 7.1 to 7.2 passed automation, and 7.0 -> 7.1 is not going to be supported. Is it safe to close this because it passed CI?

Comment 16 Marius Cornea 2015-12-13 00:35:14 UTC

I was able to pass an update from 7.0 to 7.1 by passing the update-from-keystone-admin-internal-api.yaml environment file.

Comment 17 Marius Cornea 2015-12-13 00:38:30 UTC

(In reply to Marius Cornea from comment #16)
> I was able to pass an update from 7.0 to 7.1 by passing the
> update-from-keystone-admin-internal-api.yaml environment file.

Correction: the update was 7.0 to 7.2.

Comment 19 errata-xmlrpc 2015-12-21 16:52:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:2650