Description of problem:
OSP11 -> OSP12 upgrade: when migrating services to containers, we should remove the RPM packages associated with the started containers from the host. During the upgrade, all packages on the host get updated, including those for services which are migrated to containers. Since these packages are no longer required on the host, we should remove them, or expose an option for the user to remove or keep them. yum update can take quite a while, and we can save precious time by not updating packages that are not needed.
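For reference, the opt-in described above is a standard Heat parameter, so it would be wired up through an environment file passed to the deploy/upgrade command. A minimal sketch (the file name is illustrative):

```yaml
# remove-packages.yaml (hypothetical file name)
# Opt in to erasing host RPMs for services that now run in containers.
parameter_defaults:
  UpgradeRemoveUnusedPackages: true
```

This would then be included with `-e remove-packages.yaml` alongside the other upgrade environment files.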
Setting the following parameter:

  parameter_defaults:
    UpgradeRemoveUnusedPackages: true

major-upgrade-composable-steps-docker.yaml fails with:

2017-11-21 14:04:49Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2.0]: SIGNAL_IN_PROGRESS  Signal: deployment 5cbd13ee-fce0-4547-af5b-6e06f509414f failed (2)
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2.0]: CREATE_FAILED  Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2]: CREATE_FAILED  Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2]: CREATE_FAILED  Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y]: UPDATE_FAILED  Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:51Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:52Z [overcloud]: UPDATE_FAILED  resources.AllNodesDeploySteps: Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.ControllerUpgrade_Step2.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 5cbd13ee-fce0-4547-af5b-6e06f509414f
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    changed: [localhost] => (item=openstack-swift-object-auditor)
    failed: [localhost] (item=openstack-swift-object-expirer) => {"changed": false, "failed": true, "item": "openstack-swift-object-expirer", "msg": "Could not find the requested service openstack-swift-object-expirer: host"}
    changed: [localhost] => (item=openstack-swift-object-replicator)
    changed: [localhost] => (item=openstack-swift-object-updater)
    changed: [localhost] => (item=openstack-swift-object)
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/8020c133-78f4-4b77-b5f4-1c2251c21317_playbook.retry

    PLAY RECAP *********************************************************************
    localhost                  : ok=75   changed=59   unreachable=0    failed=1

    (truncated, view all with --long)
  deploy_stderr: |
    [WARNING]: Consider using yum, dnf or zypper module rather than running rpm

Heat Stack update failed.
Heat Stack update failed.
(undercloud) [stack@undercloud-0 ~]$
Thanks mcornea. Do we have the logs from the failed controller (/var/log/messages especially)? I'd like to see more of the upgrade_tasks execution so I can see why this is failing. It seems to be the service stop, not the removal, that is failing, but I wonder if there is some sequencing issue we are missing.
Created attachment 1356762 [details]
the failing upgrade_tasks from controller

Thanks for access to the env mcornea... it looks like the problem is that swift-object-expirer is provided by the swift proxy package **, so we will need to rework https://review.openstack.org/#/c/510545/2/docker/services/swift-storage.yaml and https://review.openstack.org/#/c/510545/2/docker/services/swift-proxy.yaml a bit. There are two cases: stand-alone swift, and all-in-one swift on the controller. If this were stand-alone swift, I believe we wouldn't hit the problem. So the upgrade_tasks are doing stop and remove on swift-proxy (attached here) and then failing to stop the expirer, since it was removed along with the swift-proxy package.

** [root@overcloud-controller-0 heat-admin]# yum provides /usr/bin/swift-object-expirer
...
openstack-swift-proxy-2.15.2-0.20170927035729.0344d6e.el7.centos.noarch : A proxy server for
                                                                        : Swift
Repo        : delorean
Matched from:
Filename    : /usr/bin/swift-object-expirer

openstack-swift-proxy-2.15.2-0.20170927035729.0344d6e.el7.centos.noarch : A proxy server for
                                                                        : Swift
Repo        : @delorean
Matched from:
Filename    : /usr/bin/swift-object-expirer
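One general way to make an upgrade_tasks service stop tolerant of a unit that disappeared when its owning package was erased is to probe for the unit before stopping it. This is only a sketch of the pattern, not the merged fix in the reviews above; the task wording and names are illustrative:

```yaml
# Sketch only -- illustrative, not the actual tripleo-heat-templates tasks.
# Probe whether the systemd unit still exists before trying to stop it,
# so a unit erased with an earlier package removal doesn't fail the play.
- name: Check whether openstack-swift-object-expirer unit is present
  command: systemctl cat openstack-swift-object-expirer
  register: expirer_unit
  failed_when: false
  changed_when: false

- name: Stop openstack-swift-object-expirer only if its unit exists
  service:
    name: openstack-swift-object-expirer
    state: stopped
  when: expirer_unit.rc == 0
```

The alternative taken here was to rework which service's upgrade_tasks own the expirer stop/removal, since the binary ships in the swift-proxy package.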
I posted a fix to Pike @ https://review.openstack.org/#/c/521917/ (master /#/c/521914/ ) but you won't be able to apply it cleanly without its parent review @ https://review.openstack.org/#/c/517327/ so we will need both for this one:

curl https://review.openstack.org/changes/517327/revisions/current/patch?download | base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1
curl https://review.openstack.org/changes/521917/revisions/current/patch?download | base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

Adding them to trackers too, thanks.
openstack-tripleo-heat-templates-7.0.3-12.el7ost
*** Bug 1515822 has been marked as a duplicate of this bug. ***
last patch merged pike @ https://review.openstack.org/#/c/521917/
The upgrade completes successfully, but not all the RPM packages of the services migrated to containers are removed. Checking a controller node we can see the running containers:

[root@controller-0 heat-admin]# docker ps --format "table {{.Names}}"
NAMES
gnocchi_api
gnocchi_statsd
gnocchi_metricd
panko_api
nova_metadata
nova_api
glance_api
swift_account_server
aodh_listener
swift_container_auditor
heat_api_cron
swift_object_expirer
swift_object_updater
swift_container_replicator
swift_account_auditor
logrotate_crond
heat_api_cfn
nova_conductor
swift_object_replicator
swift_container_server
heat_engine
aodh_api
swift_rsync
nova_vnc_proxy
ceilometer_agent_notification
swift_account_reaper
nova_consoleauth
nova_api_cron
aodh_notifier
ceilometer_agent_central
swift_account_replicator
swift_object_auditor
heat_api
swift_proxy
swift_object_server
nova_scheduler
swift_container_updater
aodh_evaluator
keystone_cron
keystone
nova_placement
horizon
haproxy-bundle-docker-0
redis-bundle-docker-0
galera-bundle-docker-0
rabbitmq-bundle-docker-0
clustercheck
memcached

and the list of packages removed during upgrade:

Nov 30 09:12:21 Erased: openstack-aodh-api-4.0.2-2.el7ost.noarch
Nov 30 09:12:25 Erased: openstack-aodh-evaluator-4.0.2-2.el7ost.noarch
Nov 30 09:12:31 Erased: openstack-aodh-listener-4.0.2-2.el7ost.noarch
Nov 30 09:13:00 Erased: openstack-aodh-notifier-4.0.2-2.el7ost.noarch
Nov 30 09:13:04 Erased: 1:openstack-ceilometer-central-8.1.1-2.el7ost.noarch
Nov 30 09:13:14 Erased: 1:openstack-ceilometer-collector-8.1.1-2.el7ost.noarch
Nov 30 09:13:14 Erased: 1:openstack-ceilometer-notification-8.1.1-2.el7ost.noarch
Nov 30 09:13:19 Erased: 1:openstack-glance-14.0.0-3.el7ost.noarch
Nov 30 09:14:01 Erased: 1:openstack-nova-api-15.0.7-3.el7ost.noarch
Nov 30 09:14:05 Erased: 1:openstack-nova-conductor-15.0.7-3.el7ost.noarch
Nov 30 09:14:15 Erased: 1:openstack-nova-console-15.0.7-3.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-nova-placement-api-15.0.7-3.el7ost.noarch
Nov 30 09:14:18 Erased: openstack-sahara-ui-6.0.2-1.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-dashboard-theme-11.0.4-1.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-dashboard-11.0.4-1.el7ost.noarch
Nov 30 09:14:18 Erased: mod_wsgi-3.4-12.el7_0.x86_64
Nov 30 09:14:18 Erased: 1:mod_ssl-2.4.6-67.el7_4.6.x86_64
Nov 30 09:14:18 Erased: mod_auth_mellon-0.11.0-4.el7.x86_64
Nov 30 09:14:18 Erased: httpd-2.4.6-67.el7_4.6.x86_64
Nov 30 09:14:41 Erased: 1:openstack-nova-scheduler-15.0.7-3.el7ost.noarch
Nov 30 09:14:44 Erased: 1:openstack-nova-novncproxy-15.0.7-3.el7ost.noarch
Nov 30 09:15:20 Erased: openstack-swift-proxy-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-object-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-account-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-container-2.13.1-2.el7ost.noarch

So, from a quick eye check, there are remaining packages for the following services which were moved to containers:

openstack-gnocchi
openstack-panko-api
openstack-heat-api
openstack-ceilometer
haproxy
galera
memcached
rabbitmq-server

Given that this doesn't prevent the upgrade from completing, do we want to consider this bug as verified for now and implement the complete removal of packages of containerized services in OSP13, where we expect all OpenStack related services to be containerized?
Thanks for the info mcornea, this is very useful. Yes, I think we should mark this verified for now. We are running the removal for all the services that are managed by director, which are the OpenStack things. Even then, those removal tasks are marked with ignore_errors, so in case there are issues/dependencies that prevent removal, the upgrade won't fail. IMO we should file a new BZ with this info and mark it for 13.
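To illustrate the guard described above, a removal task of this shape would be gated on the user-facing flag and marked non-fatal. This is a sketch only; the variable and package names are illustrative and may not match the exact tripleo-heat-templates code:

```yaml
# Sketch only -- illustrative names, not the actual merged upgrade_tasks.
- name: Remove openstack-swift-proxy package if the operator opted in
  yum:
    name: openstack-swift-proxy
    state: removed
  when: upgrade_remove_unused_packages|bool
  ignore_errors: true   # dependency problems must not fail the upgrade
```

With ignore_errors set, an unremovable package (for example, one another host package still depends on) is logged and skipped rather than aborting the stack update.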
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462