Description of problem:
The "openstack overcloud deploy" command may return a false positive. If the stack is not successfully created, the command reports the failure, but its return code is zero.

Version-Release number of selected component (if applicable):
python-openstackclient-2.2.0-1.el7ost.noarch

How reproducible:
Easy

Steps to Reproduce:
1. Start with no overcloud stack
2. Attempt to deploy an overcloud that you know will fail
3.

Actual results:
Here are the last few lines of the deploy command's output:

Stack overcloud CREATE_FAILED
Deployment failed: Heat Stack create failed.
clean_up DeployOvercloud:
END return value: 0
[stack@director ~]$ echo $?
0
[stack@director ~]$ openstack stack list
+--------------------------------------+------------+---------------+---------------------+--------------+
| ID                                   | Stack Name | Stack Status  | Creation Time       | Updated Time |
+--------------------------------------+------------+---------------+---------------------+--------------+
| 64a7198d-4693-42ca-9c66-38b8a9c4b6e5 | overcloud  | CREATE_FAILED | 2016-08-25T20:55:07 | None         |
+--------------------------------------+------------+---------------+---------------------+--------------+

Expected results:
Any non-zero return value.

Additional info:
The same thing happens with updates. A possible workaround is to add a few extra lines to your deploy.sh script:

    # We don't always get a useful error code from the openstack deploy command,
    # so check `heat stack-list` for a FAILED status.
    if heat stack-list | grep -q 'FAILED'; then
        # Save the details of each failed deployment to its own log file.
        for failed in $(heat resource-list \
            --nested-depth 5 overcloud | grep FAILED |
            grep 'StructuredDeployment ' | cut -d '|' -f3)
        do
            heat deployment-show "$failed" > "failed_deployment_$failed.log"
        done
    fi

We are already using this for TripleO Quickstart.
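The workaround as quoted saves the failed deployment logs but does not itself change the script's exit status. A minimal sketch of one way to turn the stack status into an exit code follows. The helper name stack_has_failed is hypothetical, and the column positions are an assumption based on the "openstack stack list" table in the description (ID, Stack Name, Stack Status); other client versions may lay the table out differently.

```shell
#!/bin/bash
# Sketch only: stack_has_failed is a hypothetical helper, not part of any
# OpenStack client. It reads a stack listing table on stdin and returns 0
# when the row for the named stack has a status ending in _FAILED.
stack_has_failed() {
    local stack_name="$1"
    # Rows are split on '|'. Assumed layout: field 3 is the stack name and
    # field 4 is the stack status, as in the table from the description.
    awk -F'|' -v name="$stack_name" '
        {
            # Trim the padding around the two fields we compare.
            gsub(/^[ \t]+|[ \t]+$/, "", $3)
            gsub(/^[ \t]+|[ \t]+$/, "", $4)
        }
        $3 == name && $4 ~ /_FAILED$/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

In a deploy script this could be called after the deploy command, for example by piping "openstack stack list" into "stack_has_failed overcloud" and exiting with 1 when it succeeds. Matching on the stack name avoids reacting to an unrelated failed stack, which a plain grep for FAILED would also catch.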
Since there is a workaround and time is short I'm moving to z
The workaround only works if the deploy gets as far as submitting the stack update to heat. In this example we had an issue with our undercloud build that caused some additional drivers to not be loaded correctly into ironic. In this case the command still gave the incorrect exit status:

[stack@ucl00002i2 osp9-upgrade]$ openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e $runDir/network-environment.yaml \
  -e $runDir/storage-environment.yaml \
  -e $runDir/timezone.yaml \
  -e $runDir/firstboot.yaml \
  -e $runDir/enable-tls.yaml \
  -e $runDir/cloudname.yaml \
  -e $runDir/placement/scheduler_hints_env.yaml \
  -e $runDir/post-configuration.yaml \
  -e $runDir/rhel-registration/environment-rhel-registration.yaml \
  -e $runDir/rhel-registration/rhel-registration-resource-registry.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-aodh.yaml \
  --control-scale 3 \
  --compute-scale 2 \
  --ceph-storage-scale 3 \
  --control-flavor baremetal \
  --compute-flavor baremetal \
  --ceph-storage-flavor baremetal \
  --ntp-server time1.il2management.local \
  --neutron-network-type vxlan \
  --neutron-tunnel-types vxlan \
  --timeout 120 /home/stack/beta-deployment-heat-templates
1 nodes with profile None won't be used for deployment now
No valid host was found. Reason: No conductor service registered which supports driver pxe_iscsi_cimc. (HTTP 400)
No swift endpoint found, no need to delete.
End of deployment
[stack@ucl00002i2 osp9-upgrade]$ echo $?
0
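For failures like this one, where no stack is ever submitted to heat, a stopgap could be to scan the captured deploy output for known error text. The sketch below is only an illustration: deploy_output_has_error is a hypothetical helper, and its patterns are limited to messages quoted in this report, so it will miss any failure that prints something else.

```shell
#!/bin/bash
# Sketch only: deploy_output_has_error is a hypothetical helper. It reads
# captured deploy output on stdin and returns 0 when one of the failure
# messages quoted in this report is present. The list is not exhaustive
# and is not an official set of error strings.
deploy_output_has_error() {
    grep -q \
        -e 'No valid host was found' \
        -e 'Deployment failed' \
        -e 'CREATE_FAILED'
}
```

One way to use it would be to save the deploy command's output to a file (for example with tee), feed that file to deploy_output_has_error on stdin, and exit with 1 when it succeeds. String matching of this kind is brittle, so it is at best a temporary measure until the client returns a correct exit code.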
Will this be backported to OSP8/OSP9?
Hi Jacob, is there an upstream review for this that we can use for additional tracking, and will this apply to OSP 8, 9 and 10?
Jason/Jakub, any plans to fix this?
*** Bug 1295569 has been marked as a duplicate of this bug. ***
Unfortunately, it's difficult to discuss backports until we understand what the fix is. I suspect the problem is related to the logic in python-tripleoclient rather than the openstack client. There seem to be different results for similar cases. For instance, a couple of quick tests on a Newton environment show me:

$ openstack overcloud deploy --templates --compute-scale 20
Not enough nodes - available: 5, requested: 21
Configuration has 1 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.
$ echo $?
0

However, a failed stack create (no valid host) does return 1 as expected:

[...]
2017-02-17 12:26:08Z [overcloud.Controller]: CREATE_FAILED Resource CREATE failed: Operation cancelled
2017-02-17 12:26:10Z [overcloud.Compute.0.UpdateDeployment]: SIGNAL_IN_PROGRESS Signal: deployment c8708964-6c96-4e57-82e1-a0dcc1750e48 succeeded
2017-02-17 12:26:10Z [overcloud.Compute.0.UpdateDeployment]: CREATE_COMPLETE state changed
2017-02-17 12:26:11Z [overcloud.Controller.0.UpdateDeployment]: SIGNAL_IN_PROGRESS Signal: deployment 9de4dc14-318d-4971-bd96-44b9a170f19b succeeded
Stack overcloud CREATE_FAILED
Heat Stack create failed.
[stack@instack ~]$ echo $?
1

I'll see if I can find a way to reproduce the missing drivers issue locally and get the wrong exit code that way. I still need to stand up a Mitaka environment as well.
The issues are indeed in the TripleO client, so I am moving the bug there.

1. About the return code being wrong on stack failure, as mentioned in the description: I believe this is already fixed in OSP 10, thanks to https://review.openstack.org/#/c/299494/

2. About the return code being wrong when failing before launching a stack: that one is still an issue. I opened https://bugs.launchpad.net/tripleo/+bug/1672790 to track it upstream.
Summary Update
==============

This BZ is about the two issues listed below, with their statuses:

Return code being wrong on stack failure
- Fixed in OSP 10 - https://review.openstack.org/#/c/299494/

Return code being wrong when failing before launching a stack
- Fixed in upstream Pike (master) - https://review.openstack.org/#/c/446470/
- Backported in upstream Ocata - https://review.openstack.org/#/c/452634/
- Will need to also be backported to OSP 10

LL
Thank you for the summary update Sean. The last backport mentioned in comment 12 has now merged in stable/newton (OSP10) upstream.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2654