Description of problem:
CREATE_FAILED Error: resources.ControllerDeployment_Step3.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-06-30 11:06:20Z [overcloud-AllNodesDeploySteps-4fk2ekopz4jx-AllNodesPostUpgradeSteps-g373amyq2odc]: UPDATE_FAILED Error: resources.ControllerDeployment_Step3.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-06-30 11:06:20Z [overcloud-AllNodesDeploySteps-4fk2ekopz4jx.AllNodesPostUpgradeSteps]: UPDATE_FAILED resources.AllNodesPostUpgradeSteps: Error: resources.ControllerDeployment_Step3.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-06-30 11:06:20Z [overcloud-AllNodesDeploySteps-4fk2ekopz4jx]: UPDATE_FAILED resources.AllNodesPostUpgradeSteps: Error: resources.ControllerDeployment_Step3.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-06-30 11:06:21Z [AllNodesDeploySteps]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: Error: resources.ControllerDeployment_Step3.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2
2017-06-30 11:06:21Z [overcloud]: UPDATE_FAILED resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: Error: resources.ControllerDeployment_Step3.resources[0]: Deployment to server failed: deploy_status_code: Deployment exited with non-zero status code: 2

Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.ControllerDeployment_Step3.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 9fe7c066-a657-4de4-9476-a6984affbb92
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
  deploy_stdout: |
    ...
    TASK [Write the config_step hieradata] *****************************************
    changed: [localhost]

    TASK [Run puppet host configuration for step 3] ********************************

Version-Release number of selected component (if applicable):
OSP12
openstack-nova-api-16.0.0-0.20170624031428.3863eca.el7ost.noarch
openstack-keystone-12.0.0-0.20170623233743.9070172.el7ost.noarch
openstack-ironic-inspector-5.1.1-0.20170622234613.05896b5.el7ost.noarch
puppet-openstack_extras-11.2.0-0.20170613085321.0b0ea62.el7ost.noarch
openstack-tripleo-common-containers-7.1.1-0.20170623115707.4ba7d56.el7ost.noarch
openstack-neutron-common-11.0.0-0.20170624003801.11acb1d.el7ost.noarch
openstack-neutron-11.0.0-0.20170624003801.11acb1d.el7ost.noarch
openstack-nova-compute-16.0.0-0.20170624031428.3863eca.el7ost.noarch
openstack-mistral-executor-5.0.0-0.20170623025911.6a1ca0f.el7ost.noarch
openstack-tempest-16.0.1-0.20170623203531.4386df8.el7ost.noarch
openstack-neutron-openvswitch-11.0.0-0.20170624003801.11acb1d.el7ost.noarch
openstack-nova-placement-api-16.0.0-0.20170624031428.3863eca.el7ost.noarch
openstack-heat-api-cfn-9.0.0-0.20170623110018.173f03a.el7ost.noarch
openstack-mistral-api-5.0.0-0.20170623025911.6a1ca0f.el7ost.noarch
openstack-ironic-api-8.0.1-0.20170621044433.f0e6a07.el7ost.noarch
python-openstack-mistral-5.0.0-0.20170623025911.6a1ca0f.el7ost.noarch
openstack-heat-common-9.0.0-0.20170623110018.173f03a.el7ost.noarch
openstack-tripleo-image-elements-7.0.0-0.20170607161959.401d861.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.0-0.20170614005502.9285877.el7ost.noarch
openstack-swift-object-2.14.1-0.20170622024006.2d18ecd.el7ost.noarch
openstack-neutron-ml2-11.0.0-0.20170624003801.11acb1d.el7ost.noarch
openstack-glance-15.0.0-0.20170623215940.8188eca.el7ost.noarch
openstack-nova-scheduler-16.0.0-0.20170624031428.3863eca.el7ost.noarch
openstack-heat-engine-9.0.0-0.20170623110018.173f03a.el7ost.noarch
openstack-mistral-engine-5.0.0-0.20170623025911.6a1ca0f.el7ost.noarch
openstack-ironic-conductor-8.0.1-0.20170621044433.f0e6a07.el7ost.noarch
openstack-tripleo-common-7.1.1-0.20170623115707.4ba7d56.el7ost.noarch
python-openstackclient-3.11.0-0.20170613232431.c69304e.el7ost.noarch
openstack-ironic-common-8.0.1-0.20170621044433.f0e6a07.el7ost.noarch
openstack-swift-account-2.14.1-0.20170622024006.2d18ecd.el7ost.noarch
openstack-selinux-0.8.8-0.20170622195307.74ddc0e.el7ost.noarch
openstack-swift-container-2.14.1-0.20170622024006.2d18ecd.el7ost.noarch
openstack-nova-conductor-16.0.0-0.20170624031428.3863eca.el7ost.noarch
openstack-tripleo-validations-7.1.1-0.20170621111847.fb7346f.el7ost.noarch
openstack-tripleo-heat-templates-7.0.0-0.20170624014919.el7ost.noarch
puppet-openstacklib-11.2.0-0.20170613150439.93b8e7d.el7ost.noarch
python-openstacksdk-0.9.17-0.20170621195806.7946243.el7ost.noarch
openstack-swift-proxy-2.14.1-0.20170622024006.2d18ecd.el7ost.noarch
openstack-heat-api-9.0.0-0.20170623110018.173f03a.el7ost.noarch
openstack-zaqar-5.0.0-0.20170623003642.68eac5a.el7ost.noarch
openstack-tripleo-ui-7.1.1-0.20170624213423.cb896fd.el7ost.noarch
openstack-puppet-modules-10.0.0-1.el7ost.noarch
openstack-mistral-common-5.0.0-0.20170623025911.6a1ca0f.el7ost.noarch
openstack-nova-common-16.0.0-0.20170624031428.3863eca.el7ost.noarch

Steps to Reproduce:
1. Install OSP11
infrared virsh -v --host-address 10.9.76.22 --host-key ~/.ssh/id_rsa --cleanup yes &&
infrared virsh -v --host-address 10.9.76.22 --host-key ~/.ssh/id_rsa --topology-nodes undercloud:1,controller:1,compute:1 -e override.controller.cpu=8 -e override.controller.memory=16384 -e override.undercloud.disks.disk1.size=100G &&
infrared tripleo-undercloud --version 11 --images-task=rpm &&
infrared tripleo-overcloud -v --introspect yes --tagging yes --post no --deployment-files virt --version 11 --deploy yes
2. Upgrade undercloud and overcloud to OSP11 latest + RHEL 7.4
ir tripleo-undercloud -v --update-undercloud yes --mirror qeos --build 7.4-testing --osrelease 7.4
ir tripleo-overcloud -v --updateto 7.4-testing --deployment-files virt --mirror qeos --osrelease 7.4
# Please note: during the yum update stage on the overcloud controller, run "sudo pcs cluster start and sudo pcs cleanup resources" as a workaround for https://bugzilla.redhat.com/show_bug.cgi?id=1464588
3. Upgrade undercloud and overcloud to OSP12
http://etherpad.corp.redhat.com/osp12-upgrade
# Please note: before updating the overcloud, apply the workarounds for
https://bugzilla.redhat.com/show_bug.cgi?id=1460421
https://bugzilla.redhat.com/show_bug.cgi?id=1463227
https://bugzilla.redhat.com/show_bug.cgi?id=1466219
https://bugzilla.redhat.com/show_bug.cgi?id=1466308
https://bugzilla.redhat.com/show_bug.cgi?id=1466345

Actual results:
overcloud upgrade failed

Expected results:

Additional info:
http://pastebin.test.redhat.com/499310
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
Also, two containers are stuck in the Restarting state: rabbitmq and redis.

sudo docker ps -a
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
7ecc57a830c6  docker-registry.engineering.redhat.com/rhosp12/openstack-mariadb-docker:2017-06-27.2  "kolla_start"  18 minutes ago  Up 18 minutes  mysql
f3de139eae7d  docker-registry.engineering.redhat.com/rhosp12/openstack-horizon-docker:2017-06-27.2  "/bin/bash -c 'touch "  18 minutes ago  Exited (0) 18 minutes ago  horizon_fix_perms
b344ad8ba82a  docker-registry.engineering.redhat.com/rhosp12/openstack-mariadb-docker:2017-06-27.2  "bash -c 'test -e /va"  18 minutes ago  Exited (0) 18 minutes ago  mysql_bootstrap
ddc222ebce22  docker-registry.engineering.redhat.com/rhosp12/openstack-mongodb-docker:2017-06-27.2  "kolla_start"  18 minutes ago  Up 18 minutes  mongodb
4235a28b4ae2  docker-registry.engineering.redhat.com/rhosp12/openstack-rabbitmq-docker:2017-06-27.2  "kolla_start"  19 minutes ago  Restarting (1) 6 minutes ago  rabbitmq
a36f9cfa9bcb  docker-registry.engineering.redhat.com/rhosp12/openstack-memcached-docker:2017-06-27.2  "/bin/bash -c 'source"  19 minutes ago  Up 19 minutes  memcached
54ee40b2278b  docker-registry.engineering.redhat.com/rhosp12/openstack-rabbitmq-docker:2017-06-27.2  "kolla_start"  19 minutes ago  Exited (0) 19 minutes ago  rabbitmq_bootstrap
ca70a56eb427  docker-registry.engineering.redhat.com/rhosp12/openstack-redis-docker:2017-06-27.2  "kolla_start"  19 minutes ago  Restarting (1) 6 minutes ago  redis
fa7296ad5026  docker-registry.engineering.redhat.com/rhosp12/openstack-memcached-docker:2017-06-27.2  "/bin/bash -c 'source"  19 minutes ago  Exited (0) 19 minutes ago  memcached_init_logs
6357880edc4a  docker-registry.engineering.redhat.com/rhosp12/openstack-aodh-api-docker:2017-06-27.2  "/bin/bash -c 'mkdir "  2 hours ago  Exited (0) 2 hours ago  aodh_init_log
3813cfc48873  docker-registry.engineering.redhat.com/rhosp12/openstack-panko-api-docker:2017-06-27.2  "/bin/bash -c 'mkdir "  2 hours ago  Exited (0) 2 hours ago  panko_init_log
807d552ced76  docker-registry.engineering.redhat.com/rhosp12/openstack-keystone-docker:2017-06-27.2  "/bin/bash -c 'mkdir "  2 hours ago  Exited (0) 2 hours ago  keystone_init_log
d1a9d7c30166  docker-registry.engineering.redhat.com/rhosp12/openstack-cinder-api-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  cinder_api_init_logs
74d6648d154e  docker-registry.engineering.redhat.com/rhosp12/openstack-glance-api-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  glance_init_logs
a89b4c48d1d8  docker-registry.engineering.redhat.com/rhosp12/openstack-neutron-server-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  neutron_init_logs
4ed749b87814  docker-registry.engineering.redhat.com/rhosp12/openstack-heat-engine-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  heat_init_log
fbdbf0cf1c6c  docker-registry.engineering.redhat.com/rhosp12/openstack-gnocchi-api-docker:2017-06-27.2  "/bin/bash -c 'mkdir "  2 hours ago  Exited (0) 2 hours ago  gnocchi_init_log
17ef2bc56eaa  docker-registry.engineering.redhat.com/rhosp12/openstack-cinder-scheduler-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  cinder_scheduler_init_logs
d2d37ecedb03  docker-registry.engineering.redhat.com/rhosp12/openstack-nova-api-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  nova_init_logs
0184d10fcf02  docker-registry.engineering.redhat.com/rhosp12/openstack-redis-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  redis_init_logs
ace44b59e747  docker-registry.engineering.redhat.com/rhosp12/openstack-rabbitmq-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  rabbitmq_init_logs
b2ee367a8d90  docker-registry.engineering.redhat.com/rhosp12/openstack-mariadb-docker:2017-06-27.2  "/bin/bash -c 'chown "  2 hours ago  Exited (0) 2 hours ago  mysql_init_logs
Logs from the rabbitmq container:
Running command: '/usr/lib/rabbitmq/bin/rabbitmq-server'
ERROR: node with name "rabbit" already running on "controller-0"
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/msg_store_persistent
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/cluster_nodes.config
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/nodes_running_at_shutdown
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/schema.DAT
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_user.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_user_permission.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_vhost.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_durable_route.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_durable_exchange.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_runtime_parameters.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_durable_queue.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/schema_version
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_serial
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/recovery.dets
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/LATEST.LOG
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_vhost.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_durable_exchange.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_user.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_user_permission.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_runtime_parameters.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/DECISION_TAB.LOG
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/msg_store_transient/0.rdq
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/msg_store_persistent/0.rdq
INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Kolla config strategy set to: COPY_ALWAYS
INFO:__main__:Writing out command to execute
INFO:__main__:Setting permission for /var/lib/rabbitmq
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia
INFO:__main__:Setting permission for /var/lib/rabbitmq/.erlang.cookie
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0-plugins-expand
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/msg_store_transient
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/msg_store_persistent
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/cluster_nodes.config
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/nodes_running_at_shutdown
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/schema.DAT
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_user.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_user_permission.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_vhost.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_durable_route.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_durable_exchange.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_runtime_parameters.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_durable_queue.DCD
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/schema_version
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_serial
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/recovery.dets
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/LATEST.LOG
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_vhost.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_durable_exchange.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_user.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_user_permission.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/rabbit_runtime_parameters.DCL
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/DECISION_TAB.LOG
INFO:__main__:Setting permission for /var/lib/rabbitmq/mnesia/rabbit@controller-0/msg_store_transient/0.rdq
Running command: '/usr/lib/rabbitmq/bin/rabbitmq-server'
ERROR: node with name "rabbit" already running on "controller-0"
The logs indicate that RabbitMQ is probably already running on the host, so the containerized version fails to start. I think we might be missing a part of the composable upgrades for RabbitMQ here: we need to cleanly shut down the baremetal RabbitMQ (using pcs commands), whereas the existing composable upgrades code still appears to rely on systemd: http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/pacemaker/rabbitmq.yaml#n166 Probably worth getting Damien and/or Michele to have a look at this one. I will also ping jistr and shardy from Containers to have a look with regard to the composable upgrades support.
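To make the diagnosis above easier to repeat on other nodes, here is a minimal Python sketch that scans a container log for the startup error quoted in the rabbitmq logs. The function name `find_node_conflicts` and the sample text are illustrative only; the pattern is taken from the error line in this report and may not match other RabbitMQ versions.

```python
import re

# Pattern for the startup error quoted in the rabbitmq container log.
# Assumption: the message format is exactly the one seen in this report.
ALREADY_RUNNING = re.compile(
    r'ERROR: node with name "(?P<node>[^"]+)" already running on "(?P<host>[^"]+)"'
)


def find_node_conflicts(log_text):
    """Return the sorted, de-duplicated (node, host) pairs reported as already running."""
    pairs = {(m.group("node"), m.group("host")) for m in ALREADY_RUNNING.finditer(log_text)}
    return sorted(pairs)


# Sample built from lines that appear in the log in this report.
sample = (
    "Running command: '/usr/lib/rabbitmq/bin/rabbitmq-server'\n"
    'ERROR: node with name "rabbit" already running on "controller-0"\n'
    "INFO:__main__:Validating config file\n"
    "Running command: '/usr/lib/rabbitmq/bin/rabbitmq-server'\n"
    'ERROR: node with name "rabbit" already running on "controller-0"\n'
)

for node, host in find_node_conflicts(sample):
    print(f"node {node!r} is already running on host {host!r}")
```

A non-empty result suggests that something outside the container still owns the node name, which is consistent with the baremetal service not having been stopped.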
Yes, I think the analysis from Dan is correct: the upgrade_tasks need to stop and disable the service on the host, and it seems we've copied the non-pacemaker tasks, so these need adjusting to stop rabbit via pcs. I think this works in the non-container case because we stop the pcs cluster, so there are no explicit tasks in the rabbitmq template (that may also be the solution here, but it seems the existing pacemaker logic may not be sufficient in the new container architecture): http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/puppet/services/pacemaker/rabbitmq.yaml http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/puppet/services/pacemaker.yaml#n148
I hit a similar issue when testing upstream with pacemaker enabled: existing pacemaker resources on the host conflicted with the containers created during the upgrade. See https://bugs.launchpad.net/tripleo/+bug/1701485 As a side note, I think it's important when testing downstream to use both the docker.yaml and docker-ha.yaml environment files, since downstream has pacemaker enabled by default.
Long story short, we duplicated the upgrade logic from the non-HA case, and that does not work for HA. For the HA part we need to implement the following:
1. Stop the HA resource that existed in OSP11.
2. Delete the stopped resources (e.g. galera-master).
3. Create the new container-specific resources (e.g. a galera bundle and a galera resource that runs in the galera bundle).
4. Restart the new service.
We need to implement that logic for every HA service (rabbit, galera, haproxy, redis, cinder*). We'll probably rely on ansible-pacemaker for doing so. So I think we have a plan; we just need to implement it and test it upstream first.
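As a rough illustration of the four steps above, the following Python sketch builds the corresponding pcs command lines for one service without executing anything. It is not the ansible-pacemaker implementation mentioned in the plan; it uses plain pcs subcommands instead. The helper name `ha_migration_plan`, the resource and bundle names, the agent, and the image are assumptions for the example, and a real bundle would also need network and storage options that are omitted here.

```python
def ha_migration_plan(old_resource, bundle, new_resource, agent, image):
    """Build (but do not run) pcs commands for moving one HA service into a bundle.

    Hypothetical helper: every argument is supplied by the caller, and the
    container options are reduced to the image for brevity.
    """
    return [
        # Step 1: stop the HA resource that existed before the upgrade.
        ["pcs", "resource", "disable", old_resource],
        # Step 2: delete the stopped resource.
        ["pcs", "resource", "delete", old_resource],
        # Step 3a: create the container bundle.
        ["pcs", "resource", "bundle", "create", bundle,
         "container", "docker", "image=" + image],
        # Step 3b: create the new resource so that it runs inside the bundle.
        ["pcs", "resource", "create", new_resource, agent, "bundle", bundle],
        # Step 4: restart the new service (assumed here to act on the bundle).
        ["pcs", "resource", "restart", bundle],
    ]


# Example values: galera-master is named in the plan; the rest are placeholders.
plan = ha_migration_plan(
    old_resource="galera-master",
    bundle="galera-bundle",
    new_resource="galera",
    agent="ocf:heartbeat:galera",
    image="registry.example.com/openstack-mariadb-docker:tag",
)

for command in plan:
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review the order of operations per service before deciding how to run it, whether through an Ansible module or directly on a controller.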
We'll track work needed in https://review.openstack.org/#/c/480202/
Since the review attached is merged and landed, I guess there is nothing more needed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462