Bug 1743871

Summary: [RHHI.next] baremetal: handle 409 conflict from Ironic in installer
Product: OpenShift Container Platform
Reporter: Stephen Benjamin <stbenjam>
Component: Installer
Assignee: Stephen Benjamin <stbenjam>
Installer sub component: openshift-installer
QA Contact: Arik Chernetsky <achernet>
Status: CLOSED ERRATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: gklein, mcornea, ncredi, sasha
Version: 4.2.0
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-16 06:36:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Stephen Benjamin 2019-08-20 21:12:37 UTC
When Ironic returns a 409 (Conflict) response, the node may be locked while another operation, such as a power status check, is in progress. This error shouldn't be fatal, and we should retry a few times to see if the lock gets released. baremetal-operator already does this, and v0.1.7 of terraform-provider-ironic will as well.

When the installer encounters this, it currently fails the installation with an error like the one below:


level=debug msg="2019/08/20 14:12:52 [DEBUG] module.masters.ironic_node_v1.openshift-master-host[1]: apply errored, but we're indicating that via the Error pointer rather than returning it: could not make node available: Expected HTTP response code [202] when accessing [PUT http://172.22.0.2:6385/v1/nodes/9820eba5-5ae0-44d3-b536-25f46bf3e8c9/states/provision], but got 409 instead"
level=debug msg="{\"error_message\": \"{\\\"debuginfo\\\": null, \\\"faultcode\\\": \\\"Client\\\", \\\"faultstring\\\": \\\"Node 9820eba5-5ae0-44d3-b536-25f46bf3e8c9 is locked by host localhost.localdomain, please retry after the current operation is completed.\\\"}\"}"
level=debug msg="2019/08/20 14:12:52 [ERROR] module.masters: eval: *terraform.EvalApplyPost, err: could not make node available: Expected HTTP response code [202] when accessing [PUT http://172.22.0.2:6385/v1/nodes/9820eba5-5ae0-44d3-b536-25f46bf3e8c9/states/provision], but got 409 instead"
level=debug msg="{\"error_message\": \"{\\\"debuginfo\\\": null, \\\"faultcode\\\": \\\"Client\\\", \\\"faultstring\\\": \\\"Node 9820eba5-5ae0-44d3-b536-25f46bf3e8c9 is locked by host localhost.localdomain, please retry after the current operation is completed.\\\"}\"}"

Comment 2 Stephen Benjamin 2019-08-24 18:01:45 UTC
*** Bug 1741344 has been marked as a duplicate of this bug. ***

Comment 5 errata-xmlrpc 2019-10-16 06:36:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922