1370527 – Overcloud create fails but "openstack deploy overcloud" returns 0

Summary: Overcloud create fails but "openstack deploy overcloud" returns 0

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	python-tripleoclient
Sub Component:
Version:	9.0 (Mitaka)
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	z4
Target Release:	10.0 (Newton)
Assignee:	Julie Pichon
QA Contact:	Ola Pavlenko
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1295569
Depends On:
Blocks:	1305654

Reported:	2016-08-26 14:49 UTC by Alan Bishop
Modified:	2020-12-14 07:42 UTC
CC List:	26 users
Fixed In Version:	python-tripleoclient-5.4.2-2.el7ost
Doc Type:	Bug Fix
Doc Text:	Cause: The "openstack overcloud deploy" command did not raise exceptions when encountering errors. Consequence: The command failed but still returned an exit code of 0, making it difficult for scripts to detect the failures. Fix: Raise the exception for stack failures and pre-deployment verification failures. Result: The "openstack overcloud deploy" command exits with a code of 1 on failure, as expected.
Clone Of:
Environment:
Last Closed:	2017-09-06 17:09:30 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1672790	None	None	None	2017-03-14 16:29:49 UTC
OpenStack gerrit	299494	None	MERGED	Raise instead of returning False when take_action fails	2021-01-07 08:14:31 UTC
OpenStack gerrit	473302	None	MERGED	Fix return code when failing before launching the stack	2021-01-07 08:14:31 UTC
Red Hat Product Errata	RHBA-2017:2654	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 10 director Bug Fix Advisory	2017-09-06 20:55:36 UTC

Description Alan Bishop 2016-08-26 14:49:51 UTC

Description of problem:

The "openstack overcloud deploy" command may return a false positive. If the stack is not created successfully, the command's output says that it failed, but its exit code is zero.

Version-Release number of selected component (if applicable):

python-openstackclient-2.2.0-1.el7ost.noarch

How reproducible:

Easy

Steps to Reproduce:
1. Start with no overcloud stack
2. Attempt to deploy an overcloud that you know will fail
3. Check the exit status of the deploy command (echo $?)

Actual results:

Here are the last few lines of the deploy command's output:

Stack overcloud CREATE_FAILED
Deployment failed:  Heat Stack create failed.
clean_up DeployOvercloud: 
END return value: 0

[stack@director ~]$ echo $?
0
[stack@director ~]$ openstack stack list
+--------------------------------------+------------+---------------+---------------------+--------------+
| ID                                   | Stack Name | Stack Status  | Creation Time       | Updated Time |
+--------------------------------------+------------+---------------+---------------------+--------------+
| 64a7198d-4693-42ca-9c66-38b8a9c4b6e5 | overcloud  | CREATE_FAILED | 2016-08-25T20:55:07 | None         |
+--------------------------------------+------------+---------------+---------------------+--------------+

Expected results:

Any non-zero return value.

Additional info:

Comment 2 Robin Cernin 2016-10-03 14:01:45 UTC

The same thing happens with the update. A possible workaround is to add a few extra lines to your deploy.sh script:


# We don't always get a useful error code from the openstack deploy command,
# so check `heat stack-list` for a FAILED status.

if heat stack-list | grep -q 'FAILED'; then
    # Save the details of every failed StructuredDeployment resource.
    # The third '|'-separated field is the ID passed to deployment-show.
    for failed in $(heat resource-list --nested-depth 5 overcloud |
        grep FAILED | grep 'StructuredDeployment ' | cut -d '|' -f3)
    do
        heat deployment-show "$failed" > "failed_deployment_$failed.log"
    done
fi

We are already using this for TripleO Quickstart.
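
The check above only collects logs; it does not change what the script itself returns. A minimal sketch of turning the same check into an exit status follows. The function name `deploy_and_verify` is hypothetical, and the sketch assumes that `heat stack-list` prints a FAILED status for the failed stack, as in the output quoted in the description.

```shell
# Hypothetical wrapper: run the deploy command given as arguments, then
# double-check the stack status, because the deploy command may exit 0
# even though the stack failed.
deploy_and_verify() {
    dv_rc=0
    "$@" || dv_rc=$?

    # A non-zero exit code from the deploy command is passed through.
    if [ "$dv_rc" -ne 0 ]; then
        return "$dv_rc"
    fi

    # Exit code was 0: only trust it if no stack is in a FAILED state.
    if heat stack-list | grep -q 'FAILED'; then
        return 1
    fi
    return 0
}

# Possible use in deploy.sh (the arguments are whatever you normally pass):
#   deploy_and_verify openstack overcloud deploy --templates || exit 1
```

Like the original workaround, this only helps when the deployment gets far enough for a stack to exist.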

Comment 3 Jason E. Rist 2016-10-14 22:17:38 UTC

Since there is a workaround and time is short, I'm moving this to a z-stream release.

Comment 4 Charlie Llewellyn 2016-11-10 13:08:07 UTC

The workaround only works if the deploy gets as far as submitting the stack update to heat. In this example, an issue with our undercloud build caused some additional drivers not to be loaded correctly into ironic. In this case the command still gave the incorrect exit status:

[stack@ucl00002i2 osp9-upgrade]$ openstack overcloud deploy --templates \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e $runDir/network-environment.yaml \
-e $runDir/storage-environment.yaml \
-e $runDir/timezone.yaml \
-e $runDir/firstboot.yaml \
-e $runDir/enable-tls.yaml \
-e $runDir/cloudname.yaml \
-e $runDir/placement/scheduler_hints_env.yaml \
-e $runDir/post-configuration.yaml \
-e $runDir/rhel-registration/environment-rhel-registration.yaml \
-e $runDir/rhel-registration/rhel-registration-resource-registry.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-aodh.yaml \
--control-scale 3 \
--compute-scale 2 \
--ceph-storage-scale 3 \
--control-flavor baremetal \
--compute-flavor baremetal \
--ceph-storage-flavor baremetal \
--ntp-server time1.il2management.local \
--neutron-network-type vxlan \
--neutron-tunnel-types vxlan \
--timeout 120

/home/stack/beta-deployment-heat-templates
1 nodes with profile None won't be used for deployment now
No valid host was found. Reason: No conductor service registered which supports driver pxe_iscsi_cimc. (HTTP 400)
No swift endpoint found, no need to delete.
End of deployment

[stack@ucl00002i2 osp9-upgrade]$ echo $?
0
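
For failures like this one, which happen before any stack is submitted, one option is to scan the captured output of the deploy command instead of the stack list. The sketch below is only an illustration: `run_deploy_checked` and `deploy_log_has_error` are hypothetical names, and the patterns are the failure messages quoted in this report (plus Heat's UPDATE_FAILED status), not an exhaustive list. Note that the output is written to a log file and printed only once the command has finished.

```shell
# Hypothetical helper: succeeds (returns 0) if the log file contains one of
# the failure messages quoted in this bug report.
deploy_log_has_error() {
    grep -Eq 'No valid host was found|Deployment failed|(CREATE|UPDATE)_FAILED|Configuration has [0-9]+ errors' "$1"
}

# Hypothetical wrapper: the first argument is a log file, the remaining
# arguments are the deploy command line.
run_deploy_checked() {
    rdc_log=$1
    shift
    rdc_rc=0
    "$@" > "$rdc_log" 2>&1 || rdc_rc=$?
    cat "$rdc_log"

    # A non-zero exit code from the command is passed through unchanged.
    if [ "$rdc_rc" -ne 0 ]; then
        return "$rdc_rc"
    fi

    # Exit code was 0: treat a known failure message as a failure anyway.
    if deploy_log_has_error "$rdc_log"; then
        return 1
    fi
    return 0
}

# Possible use:
#   run_deploy_checked deploy.log openstack overcloud deploy --templates || exit 1
```

Matching on message text is fragile, since the wording may differ between releases, so this is a stopgap rather than a substitute for a correct exit code.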

Comment 5 Randy Perryman 2017-01-16 16:50:10 UTC

Will this be backported to OSP8/OSP9?

Comment 6 Sean Merrow 2017-01-26 18:27:18 UTC

Hi Jacob, is there an upstream review for this that we can use for additional tracking, and will this apply to OSP 8, 9 and 10?

Comment 7 Mike Burns 2017-02-16 18:53:00 UTC

Jason/Jakub, any plans to fix this?

Comment 9 Julie Pichon 2017-02-17 12:54:49 UTC

*** Bug 1295569 has been marked as a duplicate of this bug. ***

Comment 10 Julie Pichon 2017-02-17 12:57:43 UTC

Unfortunately, it's difficult to discuss backports until we understand what the fix is. I suspect the problem is related to the logic in python-tripleoclient rather than the openstack client. There seem to be different results for similar cases. For instance, a couple of quick tests on a Newton environment show me:

$ openstack overcloud deploy --templates --compute-scale 20
Not enough nodes - available: 5, requested: 21
Configuration has 1 errors, fix them before proceeding. Ignoring these errors is likely to lead to a failed deploy.
$ echo $?
0

However a failed stack create (no valid host) does return 1 as expected:

[...]
2017-02-17 12:26:08Z [overcloud.Controller]: CREATE_FAILED  Resource CREATE failed: Operation cancelled
2017-02-17 12:26:10Z [overcloud.Compute.0.UpdateDeployment]: SIGNAL_IN_PROGRESS  Signal: deployment c8708964-6c96-4e57-82e1-a0dcc1750e48 succeeded
2017-02-17 12:26:10Z [overcloud.Compute.0.UpdateDeployment]: CREATE_COMPLETE  state changed
2017-02-17 12:26:11Z [overcloud.Controller.0.UpdateDeployment]: SIGNAL_IN_PROGRESS  Signal: deployment 9de4dc14-318d-4971-bd96-44b9a170f19b succeeded

 Stack overcloud CREATE_FAILED 

Heat Stack create failed.
[stack@instack ~]$ echo $?
1

I'll see if I can find a way to reproduce the missing drivers issue locally and get the wrong exit code that way. I still need to stand up a Mitaka environment as well.

Comment 11 Julie Pichon 2017-03-14 16:29:50 UTC

The issues are indeed in the TripleO client, moving the bug there.

1. About the return code being wrong on stack failure, as mentioned in the description:

This is already fixed in OSP 10, thanks to https://review.openstack.org/#/c/299494/ I believe.

2. About the return code being wrong when failing before launching a stack, that one is still an issue. I opened https://bugs.launchpad.net/tripleo/+bug/1672790 to track it upstream.
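
Both cases share the same failure mode: the error is reported to the user but not propagated to the caller. The following is a shell analogy of that pattern, not the actual Python change in python-tripleoclient; the function names `report_only` and `report_and_fail` are made up for illustration.

```shell
# Analogy of the buggy behaviour: the failure is printed, but the function's
# status is that of its last command (echo), which is 0.
report_only() {
    if ! "$@"; then
        echo "Deployment failed"
    fi
}

# Analogy of the fixed behaviour: the failure is printed and also
# propagated to the caller as a non-zero status.
report_and_fail() {
    if ! "$@"; then
        echo "Deployment failed" >&2
        return 1
    fi
}

# With a command that always fails, such as `false`:
#   report_only false;     echo $?   prints the message, then 0
#   report_and_fail false; echo $?   prints the message, then 1
```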

Comment 12 Sean Merrow 2017-05-04 14:28:43 UTC

Summary Update
==============
This BZ is about the two issues listed below, with their statuses:

1. Return code being wrong on stack failure
- Fixed in OSP 10 - https://review.openstack.org/#/c/299494/

2. Return code being wrong when failing before launching a stack
- Fixed in upstream Pike (master) - https://review.openstack.org/#/c/446470/
- Backported in upstream Ocata - https://review.openstack.org/#/c/452634/
- Will also need to be backported to OSP 10 LL

Comment 13 Julie Pichon 2017-06-19 15:49:03 UTC

Thank you for the summary update Sean. The last backport mentioned in comment 12 has now merged in stable/newton (OSP10) upstream.

Comment 16 errata-xmlrpc 2017-09-06 17:09:30 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2654
