Bug 1470041 - OSP11 -> OSP12 upgrade: when migrating to containers we should remove the rpm packages associated to the started containers from host
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assigned To: Marios Andreou
QA Contact: Marius Cornea
Keywords: Triaged
Duplicates: 1515822
Depends On:
Blocks: 1399762
Reported: 2017-07-12 06:30 EDT by Marius Cornea
Modified: 2017-12-13 16:40 EST (History)
CC: 13 users

See Also:
Fixed In Version: openstack-tripleo-heat-templates-7.0.3-12.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-13 16:40:04 EST
Type: Bug
Regression: ---


Attachments
the failing upgrade_tasks from controller (2.49 KB, text/plain)
2017-11-21 09:53 EST, Marios Andreou


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1701501 None None None 2017-07-12 06:31 EDT
OpenStack gerrit 510545 None stable/pike: MERGED tripleo-heat-templates: Remove package if service stopped and disabled (Ie4e4a2d41f7752c5a13507a7c15c6f68e203cfca) 2017-11-28 11:44 EST
OpenStack gerrit 517327 None stable/pike: MERGED tripleo-heat-templates: Add validation task in docker services [Swift] (I16f38d9e1042c5d83455a28882b4a024aac27699) 2017-11-28 11:44 EST
OpenStack gerrit 521917 None stable/pike: MERGED tripleo-heat-templates: Stop the object-expirer service before removing swift-proxy (I01518f82cef494682b4359ba7849ba7e37... 2017-11-28 11:44 EST

Description Marius Cornea 2017-07-12 06:30:58 EDT
Description of problem:
OSP11 -> OSP12 upgrade: when migrating to containers we should remove the rpm packages associated to the started containers from host

During the upgrade, all packages on the host get updated, including the ones for services that are migrated to containers. Since these packages are no longer required on the host, we should remove them, or expose an option letting the user choose whether to remove or keep them. yum update can take quite a while, and we can save precious time by not updating packages that are not needed.
Comment 4 Marius Cornea 2017-11-21 09:07:24 EST
After setting the following parameter:

parameter_defaults:
  UpgradeRemoveUnusedPackages: true
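For reference, a parameter_defaults block like the one above is normally supplied to the upgrade command via an environment file passed with -e. A minimal sketch (the file name and the deploy invocation shown in the comment are illustrative, not taken from this report):

```shell
# Write an environment file enabling removal of unused packages
# (file path is illustrative)
cat > /tmp/remove-unused-packages.yaml <<'EOF'
parameter_defaults:
  UpgradeRemoveUnusedPackages: true
EOF

# It would then be passed alongside the other upgrade environment files, e.g.:
#   openstack overcloud deploy --templates ... -e /tmp/remove-unused-packages.yaml
grep 'UpgradeRemoveUnusedPackages' /tmp/remove-unused-packages.yaml
```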

the major-upgrade-composable-steps-docker.yaml step fails with:

2017-11-21 14:04:49Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2.0]: SIGNAL_IN_PROGRESS  Signal: deployment 5cbd13ee-fce0-4547-af5b-6e06f509414f failed (2)
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2.0]: CREATE_FAILED  Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2]: CREATE_FAILED  Resource CREATE failed: Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y.ControllerUpgrade_Step2]: CREATE_FAILED  Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:50Z [overcloud-AllNodesDeploySteps-huxg6dgqhg6y]: UPDATE_FAILED  Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:51Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-11-21 14:04:52Z [overcloud]: UPDATE_FAILED  resources.AllNodesDeploySteps: Error: resources.ControllerUpgrade_Step2.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

 Stack overcloud UPDATE_FAILED 

overcloud.AllNodesDeploySteps.ControllerUpgrade_Step2.0:
  resource_type: OS::Heat::SoftwareDeployment
  physical_resource_id: 5cbd13ee-fce0-4547-af5b-6e06f509414f
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    changed: [localhost] => (item=openstack-swift-object-auditor)
    failed: [localhost] (item=openstack-swift-object-expirer) => {"changed": false, "failed": true, "item": "openstack-swift-object-expirer", "msg": "Could not find the requested service openstack-swift-object-expirer: host"}
    changed: [localhost] => (item=openstack-swift-object-replicator)
    changed: [localhost] => (item=openstack-swift-object-updater)
    changed: [localhost] => (item=openstack-swift-object)
        to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/8020c133-78f4-4b77-b5f4-1c2251c21317_playbook.retry
    
    PLAY RECAP *********************************************************************
    localhost                  : ok=75   changed=59   unreachable=0    failed=1   
    
    (truncated, view all with --long)
  deploy_stderr: |
     [WARNING]: Consider using yum, dnf or zypper module rather than running rpm
Heat Stack update failed.
Heat Stack update failed.
(undercloud) [stack@undercloud-0 ~]$
Comment 5 Marios Andreou 2017-11-21 09:17:02 EST
thanks mcornea. Do we have the logs from the failed controller (/var/log/messages especially)? I'd like to see more of the upgrade_tasks execution so I can see why this is failing. It seems to be the service stop rather than the removal that is failing, but I wonder if there is some sequencing issue we are missing.
Comment 6 Marios Andreou 2017-11-21 09:53 EST
Created attachment 1356762 [details]
the failing upgrade_tasks from controller

thanks for access to the env mcornea ... it looks like the problem is that swift-object-expirer is provided by the swift-proxy package ** so we will need to rework https://review.openstack.org/#/c/510545/2/docker/services/swift-storage.yaml and https://review.openstack.org/#/c/510545/2/docker/services/swift-proxy.yaml a bit...

There are two cases: stand-alone swift, and all-in-one swift on the controller. If this were stand-alone swift we wouldn't hit the problem, I believe.

So the upgrade_tasks are doing stop and remove on swift-proxy (attached here) and then failing to stop the expirer, since it was already removed along with the swift-proxy package.


** 

        [root@overcloud-controller-0 heat-admin]# yum provides /usr/bin/swift-object-expirer
 ...
        openstack-swift-proxy-2.15.2-0.20170927035729.0344d6e.el7.centos.noarch : A proxy server for
                                                                                : Swift
        Repo        : delorean
        Matched from:
        Filename    : /usr/bin/swift-object-expirer

        openstack-swift-proxy-2.15.2-0.20170927035729.0344d6e.el7.centos.noarch : A proxy server for
                                                                                : Swift
        Repo        : @delorean
        Matched from:
        Filename    : /usr/bin/swift-object-expirer
Comment 7 Marios Andreou 2017-11-21 11:17:54 EST
I posted a fix to Pike @ https://review.openstack.org/#/c/521917/ (master /#/c/521914/ ) but you won't be able to apply it cleanly without its parent review @ https://review.openstack.org/#/c/517327/ so we will need both for this one:

curl https://review.openstack.org/changes/517327/revisions/current/patch?download | base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1

curl https://review.openstack.org/changes/521917/revisions/current/patch?download | base64 -d | sudo patch -d /usr/share/openstack-tripleo-heat-templates/ -p1
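The pipeline in those two commands works because Gerrit serves change patches base64-encoded. A self-contained toy demonstration of the decode-and-apply pattern (toy file and diff, not the actual reviews):

```shell
# Create a file to patch
printf 'hello\n' > greeting.txt

# A unified diff, base64-encoded the way Gerrit delivers patches
patch_b64=$(printf -- '--- a/greeting.txt\n+++ b/greeting.txt\n@@ -1 +1 @@\n-hello\n+goodbye\n' | base64)

# Decode and apply, mirroring the curl | base64 -d | patch -p1 commands above
# (-p1 strips the leading a/ and b/ path components)
printf '%s\n' "$patch_b64" | base64 -d | patch -p1

cat greeting.txt   # now contains "goodbye"
```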

adding them to the external trackers too, thanks
Comment 9 Jon Schlueter 2017-11-22 12:38:44 EST
openstack-tripleo-heat-templates-7.0.3-12.el7ost
Comment 10 Marios Andreou 2017-11-23 03:02:01 EST
*** Bug 1515822 has been marked as a duplicate of this bug. ***
Comment 11 Marios Andreou 2017-11-28 07:27:58 EST
last patch merged pike @ https://review.openstack.org/#/c/521917/
Comment 13 Marius Cornea 2017-11-30 06:09:24 EST
The upgrade completes successfully, but not all of the rpm packages for the services migrated to containers are removed.

Checking a controller node we can see the running containers:

[root@controller-0 heat-admin]# docker ps --format "table {{.Names}}"
NAMES
gnocchi_api
gnocchi_statsd
gnocchi_metricd
panko_api
nova_metadata
nova_api
glance_api
swift_account_server
aodh_listener
swift_container_auditor
heat_api_cron
swift_object_expirer
swift_object_updater
swift_container_replicator
swift_account_auditor
logrotate_crond
heat_api_cfn
nova_conductor
swift_object_replicator
swift_container_server
heat_engine
aodh_api
swift_rsync
nova_vnc_proxy
ceilometer_agent_notification
swift_account_reaper
nova_consoleauth
nova_api_cron
aodh_notifier
ceilometer_agent_central
swift_account_replicator
swift_object_auditor
heat_api
swift_proxy
swift_object_server
nova_scheduler
swift_container_updater
aodh_evaluator
keystone_cron
keystone
nova_placement
horizon
haproxy-bundle-docker-0
redis-bundle-docker-0
galera-bundle-docker-0
rabbitmq-bundle-docker-0
clustercheck
memcached

and the list of packages removed during the upgrade:

Nov 30 09:12:21 Erased: openstack-aodh-api-4.0.2-2.el7ost.noarch
Nov 30 09:12:25 Erased: openstack-aodh-evaluator-4.0.2-2.el7ost.noarch
Nov 30 09:12:31 Erased: openstack-aodh-listener-4.0.2-2.el7ost.noarch
Nov 30 09:13:00 Erased: openstack-aodh-notifier-4.0.2-2.el7ost.noarch
Nov 30 09:13:04 Erased: 1:openstack-ceilometer-central-8.1.1-2.el7ost.noarch
Nov 30 09:13:14 Erased: 1:openstack-ceilometer-collector-8.1.1-2.el7ost.noarch
Nov 30 09:13:14 Erased: 1:openstack-ceilometer-notification-8.1.1-2.el7ost.noarch
Nov 30 09:13:19 Erased: 1:openstack-glance-14.0.0-3.el7ost.noarch
Nov 30 09:14:01 Erased: 1:openstack-nova-api-15.0.7-3.el7ost.noarch
Nov 30 09:14:05 Erased: 1:openstack-nova-conductor-15.0.7-3.el7ost.noarch
Nov 30 09:14:15 Erased: 1:openstack-nova-console-15.0.7-3.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-nova-placement-api-15.0.7-3.el7ost.noarch
Nov 30 09:14:18 Erased: openstack-sahara-ui-6.0.2-1.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-dashboard-theme-11.0.4-1.el7ost.noarch
Nov 30 09:14:18 Erased: 1:openstack-dashboard-11.0.4-1.el7ost.noarch
Nov 30 09:14:18 Erased: mod_wsgi-3.4-12.el7_0.x86_64
Nov 30 09:14:18 Erased: 1:mod_ssl-2.4.6-67.el7_4.6.x86_64
Nov 30 09:14:18 Erased: mod_auth_mellon-0.11.0-4.el7.x86_64
Nov 30 09:14:18 Erased: httpd-2.4.6-67.el7_4.6.x86_64
Nov 30 09:14:41 Erased: 1:openstack-nova-scheduler-15.0.7-3.el7ost.noarch
Nov 30 09:14:44 Erased: 1:openstack-nova-novncproxy-15.0.7-3.el7ost.noarch
Nov 30 09:15:20 Erased: openstack-swift-proxy-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-object-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-account-2.13.1-2.el7ost.noarch
Nov 30 09:15:26 Erased: openstack-swift-container-2.13.1-2.el7ost.noarch

From a quick eye check, there are remaining packages for the following services which got moved to containers:

openstack-gnocchi
openstack-panko-api
openstack-heat-api
openstack-ceilometer
haproxy
galera
memcached
rabbitmq-server

Given that this doesn't prevent the upgrade from completing, do we want to consider this bug verified for now and implement the complete removal of packages for containerized services in OSP13, where we expect all OpenStack related services to be containerized?
Comment 14 Marios Andreou 2017-11-30 06:19:11 EST
thanks for the info mcornea, this is very useful. Yes, I think we should mark this verified for now. We are running the removal for all the services that are managed by the director, which are the OpenStack components. Even then, those removal tasks are marked with ignore_errors, so in case there are issues/dependencies that prevent removal, the upgrade won't fail.
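The ignore_errors behaviour described here has a direct shell analogue. A minimal sketch (the failing command is a stand-in, not the actual yum invocation used by the upgrade_tasks):

```shell
# Stand-in for a package removal that fails (e.g. an unmet dependency)
failing_removal() { return 1; }

# '|| true' mirrors Ansible's ignore_errors: the failure is swallowed
# and the surrounding run carries on instead of aborting
failing_removal || true

echo "upgrade continues"
```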

IMO we should file a new BZ with this info and mark it for OSP13.
Comment 17 errata-xmlrpc 2017-12-13 16:40:04 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462
