Bug 1470041
| Summary: | OSP11 -> OSP12 upgrade: when migrating to containers we should remove the rpm packages associated to the started containers from host | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Marius Cornea <mcornea> | ||||
| Component: | openstack-tripleo-heat-templates | Assignee: | Marios Andreou <mandreou> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Marius Cornea <mcornea> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 12.0 (Pike) | CC: | dbecker, jschluet, mandreou, mbracho, mbultel, mburns, morazi, rhel-osp-director-maint, sathlang, sclewis, tvignaud, ukalifon | ||||
| Target Milestone: | rc | Keywords: | Triaged | ||||
| Target Release: | 12.0 (Pike) | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | openstack-tripleo-heat-templates-7.0.3-12.el7ost | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-12-13 21:40:04 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1399762 | ||||||
Description
Marius Cornea
2017-07-12 10:30:58 UTC
After setting the following parameter:
parameter_defaults:
  UpgradeRemoveUnusedPackages: true
the major-upgrade-composable-steps-docker.yaml step fails with:
2017-11-21 14:04:49Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2.0]: SIGNAL_IN_PROGRESS Signal: deployment 5cbd13ee-fce0-4547-af5b-6e06f509414f failed (2)
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2.0]: CREATE_FAILED Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2]: CREATE_FAILED Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2]: CREATE_FAILED Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y]: UPDATE_FAILED Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:51Z [AllNodesDeploySteps]: UPDATE_FAILED resources.AllNodesDeploySteps: Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:52Z [overcloud]: UPDATE_FAILED resources.AllNodesDeploySteps: Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
Stack overcloud UPDATE_FAILED
overcloud.AllNodesDeploySteps.ControllerUpgrade_Step2.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 5cbd13ee-fce0-4547-af5b-6e06f509414f
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    changed: [localhost] => (item=openstack-swift-object-auditor)
    failed: [localhost] (item=openstack-swift-object-expirer) => {"changed": false, "failed": true, "item": "openstack-swift-object-expirer", "msg": "Could not find the requested service openstack-swift-object-expirer: host"}
    changed: [localhost] => (item=openstack-swift-object-replicator)
    changed: [localhost] => (item=openstack-swift-object-updater)
    changed: [localhost] => (item=openstack-swift-object)
    to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/8020c133-78f4-4b77-b5f4-1c2251c21317_playbook.retry
    PLAY RECAP *********************************************************************
    localhost : ok=75 changed=59 unreachable=0 failed=1
    (truncated, view all with --long)
  deploy_stderr: |
    [WARNING]: Consider using yum, dnf or zypper module rather than running rpm
Heat Stack update failed.
Heat Stack update failed.
(undercloud) [stack@undercloud-0 ~]$
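When deploy_stdout is long, the failing loop item can be picked out mechanically. The following is a minimal sketch, assuming the Ansible line format shown in the output above; the function name `failed_items` and the regular expression are illustrative and not part of any TripleO or Ansible tooling.

```python
import json
import re

# Matches Ansible loop-result lines of the form seen in deploy_stdout:
#   failed: [localhost] (item=foo) => {"msg": "..."}
FAILED_LINE = re.compile(
    r"^failed: \[(?P<host>[^\]]+)\] \(item=(?P<item>[^)]*)\) => (?P<payload>\{.*\})\s*$"
)


def failed_items(deploy_stdout):
    """Return (host, item, msg) for every failed loop item in the output."""
    results = []
    for line in deploy_stdout.splitlines():
        match = FAILED_LINE.match(line.strip())
        if not match:
            continue
        # The part after "=>" is a JSON object describing the task result.
        payload = json.loads(match.group("payload"))
        results.append(
            (match.group("host"), match.group("item"), payload.get("msg", ""))
        )
    return results


# Sample lines copied from the failure output in this report.
sample = """\
changed: [localhost] => (item=openstack-swift-object-auditor)
failed: [localhost] (item=openstack-swift-object-expirer) => {"changed": false, "failed": true, "item": "openstack-swift-object-expirer", "msg": "Could not find the requested service openstack-swift-object-expirer: host"}
changed: [localhost] => (item=openstack-swift-object-replicator)
"""

for host, item, msg in failed_items(sample):
    print(f"{host}: {item}: {msg}")
```

Run against the sample, this reports only the openstack-swift-object-expirer item, which is the one the rest of this report investigates.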
Thanks mcornea. Do we have the logs from the failed controller (/var/log/messages especially)? I'd like to see more of the upgrade_tasks execution so I can see why this is failing. It seems to be the service stop, not the removal, that is failing, but I wonder if there is some sequencing issue we are missing.

Created attachment 1356762 [details]
the failing upgrade_tasks from controller

Thanks for access to the env, mcornea. It looks like the problem is that swift-object-expirer is provided by the swift proxy package (**), so we will need to rework https://review.openstack.org/#/c/510545/2/docker/services/swift-storage.yaml and https://review.openstack.org/#/c/510545/2/docker/services/swift-proxy.yaml a bit. There are two cases: stand-alone swift and all-in-one swift on the controller. If this were stand-alone swift we wouldn't hit the problem, I believe. So the upgrade_tasks are doing stop and remove on swift-proxy (attached here) and then failing to stop the expirer, since it was removed with the swift-proxy package.

** [root@overcloud-controller-0 heat-admin]# yum provides /usr/bin/swift-object-expirer
...
openstack-swift-proxy-2.15.2-0.20170927035729.0344d6e.el7.centos.noarch : A proxy server for
                                                                        : Swift
Repo        : delorean
Matched from:
Filename    : /usr/bin/swift-object-expirer
openstack-swift-proxy-2.15.2-0.20170927035729.0344d6e.el7.centos.noarch : A proxy server for
                                                                        : Swift
Repo        : @delorean
Matched from:
Filename    : /usr/bin/swift-object-expirer

I posted a fix to Pike @ https://review.openstack.org/#/c/521917/ (master /#/c/521914/), but you won't be able to apply it cleanly without its parent review @ https://review.openstack.org/#/c/517327/, so we will need both for this one:
curl https://review.openstack.org/changes/517327/revisions/current/patch?download | base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1
curl https://review.openstack.org/changes/521917/revisions/current/patch?download | base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1
Adding them to trackers too, thanks.

openstack-tripleo-heat-templates-7.0.3-12.el7ost

*** Bug 1515822 has been marked as a duplicate of this bug. ***

Last patch merged to Pike @ https://review.openstack.org/#/c/521917/

The upgrade completes successfully, but not all the rpm packages of the services migrated to containers are removed.
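The swift-object-expirer failure analyzed earlier in this report is an ordering problem: a unit was stopped after the package that ships it had already been removed. The sketch below models that constraint only; it is not the content of the reviews linked above. The mapping `UNIT_TO_PACKAGE` and both function names are hypothetical, and only the expirer-to-proxy pairing is taken from the `yum provides` output in this report.

```python
# Hypothetical mapping of systemd units to the RPM that ships them on an
# all-in-one controller. Only the expirer -> swift-proxy pairing comes from
# the `yum provides` output in this report; the rest is illustrative.
UNIT_TO_PACKAGE = {
    "openstack-swift-proxy": "openstack-swift-proxy",
    "openstack-swift-object-expirer": "openstack-swift-proxy",
    "openstack-swift-object": "openstack-swift-object",
}


def plan_stop_then_remove(unit_to_package):
    """Stop every unit first, then remove each package exactly once."""
    steps = [("stop", unit) for unit in unit_to_package]
    seen = set()
    for package in unit_to_package.values():
        if package not in seen:
            seen.add(package)
            steps.append(("remove", package))
    return steps


def first_violation(steps, unit_to_package):
    """Return the first unit stopped after its package was removed, if any."""
    removed = set()
    for action, target in steps:
        if action == "remove":
            removed.add(target)
        elif unit_to_package[target] in removed:
            return target
    return None


# The sequence that failed in this report: proxy stopped and removed,
# then an attempt to stop the expirer, whose unit file is already gone.
failing_order = [
    ("stop", "openstack-swift-proxy"),
    ("remove", "openstack-swift-proxy"),
    ("stop", "openstack-swift-object-expirer"),
]

print(first_violation(failing_order, UNIT_TO_PACKAGE))
print(first_violation(plan_stop_then_remove(UNIT_TO_PACKAGE), UNIT_TO_PACKAGE))
```

The first call flags the expirer, matching the "Could not find the requested service" error; the second finds no violation because all stops precede all removals.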
Checking a controller node we can see the running containers:
[root@controller-0 heat-admin]# docker ps --format "table {{.Names}}"
NAMES
gnocchi_api
gnocchi_statsd
gnocchi_metricd
panko_api
nova_metadata
nova_api
glance_api
swift_account_server
aodh_listener
swift_container_auditor
heat_api_cron
swift_object_expirer
swift_object_updater
swift_container_replicator
swift_account_auditor
logrotate_crond
heat_api_cfn
nova_conductor
swift_object_replicator
swift_container_server
heat_engine
aodh_api
swift_rsync
nova_vnc_proxy
ceilometer_agent_notification
swift_account_reaper
nova_consoleauth
nova_api_cron
aodh_notifier
ceilometer_agent_central
swift_account_replicator
swift_object_auditor
heat_api
swift_proxy
swift_object_server
nova_scheduler
swift_container_updater
aodh_evaluator
keystone_cron
keystone
nova_placement
horizon
haproxy-bundle-docker-0
redis-bundle-docker-0
galera-bundle-docker-0
rabbitmq-bundle-docker-0
clustercheck
memcached
and the list of packages removed during the upgrade:
Nov 30 09:12:21 Erased: openstack-aodh-api-4.0.2-2.el7ost.noarch
Nov 30 09:12:25 Erased: openstack-aodh-evaluator-4.0.2-2.el7ost.noarch
Nov 30 09:12:31 Erased: openstack-aodh-listener-4.0.2-2.el7ost.noarch
Nov 30 09:13:00 Erased: openstack-aodh-notifier-4.0.2-2.el7ost.noarch
Nov 30 09:13:04 Erased: 1:openstack-ceilometer-central-8.1.1-2.el7ost.noarch
Nov 30 09:13:14 Erased: 1:openstack-ceilometer-collector-8.1.1-2.el7ost.noarch
Nov 30 09:13:14 Erased: 1:openstack-ceilometer-notification-8.1.1-2.el7ost.noarch
Nov 30 09:13:19 Erased: 1:openstack-glance-14.0.0-3.el7ost.noarch
Nov 30 09:14:01 Erased: 1:openstack-nova-api-15.0.7-3.el7ost.noarch
Nov 30 09:14:05 Erased: 1:openstack-nova-conductor-15.0.7-3.el7ost.noarch
Nov 30 09:14:15 Erased: 1:openstack-nova-console-15.0.7-3.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-nova-placement-api-15.0.7-3.el7ost.noarch
Nov 30 09:14:18 Erased: openstack-sahara-ui-6.0.2-1.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-dashboard-theme-11.0.4-1.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-dashboard-11.0.4-1.el7ost.noarch
Nov 30 09:14:18 Erased: mod_wsgi-3.4-12.el7_0.x86_64
Nov 30 09:14:18 Erased: 1:mod_ssl-2.4.6-67.el7_4.6.x86_64
Nov 30 09:14:18 Erased: mod_auth_mellon-0.11.0-4.el7.x86_64
Nov 30 09:14:18 Erased: httpd-2.4.6-67.el7_4.6.x86_64
Nov 30 09:14:41 Erased: 1:openstack-nova-scheduler-15.0.7-3.el7ost.noarch
Nov 30 09:14:44 Erased: 1:openstack-nova-novncproxy-15.0.7-3.el7ost.noarch
Nov 30 09:15:20 Erased: openstack-swift-proxy-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-object-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-account-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-container-2.13.1-2.el7ost.noarch
From a quick visual check, packages remain for the following services that were moved to containers:
openstack-gnocchi
openstack-panko-api
openstack-heat-api
openstack-ceilometer
haproxy
galera
memcached
rabbitmq-server
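The visual check can be made repeatable by reducing the "Erased:" entries to bare package names and testing candidates against them. This is a minimal sketch, assuming the yum log line format shown above (optional epoch, then name-version-release.arch); the function name `erased_package_names` and the candidate list are illustrative.

```python
def erased_package_names(yum_log_text):
    """Extract bare package names from yum 'Erased:' log lines."""
    names = []
    for line in yum_log_text.splitlines():
        _, sep, nevra = line.partition("Erased: ")
        if not sep:
            continue  # not an erase entry
        nevra = nevra.strip()
        # Drop an optional numeric epoch prefix such as "1:".
        head, colon, tail = nevra.partition(":")
        if colon and head.isdigit():
            nevra = tail
        # Strip the architecture, then the version and release fields.
        name_version_release, _, _arch = nevra.rpartition(".")
        parts = name_version_release.rsplit("-", 2)
        if len(parts) != 3:
            continue  # does not look like name-version-release
        names.append(parts[0])
    return names


# Sample entries copied from the erase log in this report.
sample_log = """\
Nov 30 09:12:21 Erased: openstack-aodh-api-4.0.2-2.el7ost.noarch
Nov 30 09:13:19 Erased: 1:openstack-glance-14.0.0-3.el7ost.noarch
Nov 30 09:14:18 Erased: 1:mod_ssl-2.4.6-67.el7_4.6.x86_64
Nov 30 09:15:20 Erased: openstack-swift-proxy-2.13.1-2.el7ost.noarch
"""

erased = set(erased_package_names(sample_log))

# Candidates to check; haproxy and memcached are from the list above.
candidates = ["openstack-swift-proxy", "haproxy", "memcached"]
not_erased = [name for name in candidates if name not in erased]
print(not_erased)
```

On a real node the log text would come from the yum log, and the result would still need confirming with `rpm -q`, since a package absent from the erase log may never have been installed.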
Given that this doesn't prevent the upgrade from completing, do we want to consider this bug verified for now and implement the complete removal of packages for containerized services in OSP13, where we expect all OpenStack-related services to be containerized?
Thanks for the info, mcornea, this is very useful. Yes, I think we should mark it verified for now. We are running the removal for all the services that are managed by director, which are the OpenStack ones. Even then, those removal tasks are marked with ignore_errors, so if there are issues or dependencies that prevent removal, the upgrade won't fail. IMO we should file a new BZ with this info and mark it for 13.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:3462