During installation, terraform exits reporting Ironic returned 500:

level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [5m0s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[1]: Still creating... [5m0s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[1]: Still creating... [5m10s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [5m10s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[2]: Still creating... [5m10s elapsed]
level=error
level=error msg=Error: Internal Server Error
level=error
level=error msg= with ironic_node_v1.openshift-master-host[1],
level=error msg= on main.tf line 13, in resource "ironic_node_v1" "openshift-master-host":
level=error msg= 13: resource "ironic_node_v1" "openshift-master-host" {
level=error
level=error
level=error msg=Error: Internal Server Error
level=error
level=error msg= with ironic_node_v1.openshift-master-host[0],
level=error msg= on main.tf line 13, in resource "ironic_node_v1" "openshift-master-host":
level=error msg= 13: resource "ironic_node_v1" "openshift-master-host" {

Digging into the installer log bundle's Ironic logs, I do see errors like this:

ironic.common.exception.ServiceUnavailable: Cannot use 'none' RPC to connect to remote conductor 172.22.0.2 : ironic.common.exception.ServiceUnavailable: Cannot use 'none' RPC to connect to remote conductor 172.22.0.2
2022-04-01 15:53:47.486 1 INFO eventlet.wsgi.server [req-5f326ae0-e490-4751-a6e3-93aa088d8ac3 - - - - -] ::ffff:192.168.111.1 "POST /v1/nodes HTTP/1.1" status: 500 len: 476 time: 0.0718205
2022-04-01 15:53:47.490 1 ERROR ironic.api.method [req-fa9859dd-4f5c-4e18-8ce8-afe73948699a - - - - -] Server-side error: "Cannot use 'none' RPC to connect to remote conductor 172.22.0.2". Detail:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ironic/api/method.py", line 42, in callfunction
    result = f(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ironic/api/method.py", line 109, in inner_body
    return function(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ironic/common/args.py", line 379, in inner_check_args
    return function(*args, **kwargs_next)
  File "/usr/lib/python3.6/site-packages/ironic/api/controllers/v1/node.py", line 2493, in post
    new_node, topic)
  File "/usr/lib/python3.6/site-packages/ironic/conductor/rpcapi.py", line 314, in create_node
    cctxt = self._prepare_call(topic=topic, version='1.36')
  File "/usr/lib/python3.6/site-packages/ironic/conductor/rpcapi.py", line 213, in _prepare_call
    % host)

Here's an example run:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-serial-ipv4/1509917381101621248

Looks like it's common enough to be worth investigating:
https://search.ci.openshift.org/?search=msg%3DError%3A+Internal+Server+Error&maxAge=48h&context=1&type=build-log&name=metal-ipi&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
This looks like a timing issue at startup. The API comes up before set_global_manager() is called, and if something hits the API in the meantime then we see this failure. The RPC service is launched before the WSGI one, but they both start in separate greenthreads so it's a race. Happy Monday @dtantsur
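For anyone following along, here is a rough sketch of the shape of that race. It is not Ironic code: `FakeApi`, `create_node_racy` and `create_node_guarded` are hypothetical names, plain threads stand in for greenthreads, and only `set_global_manager` borrows its name from the call mentioned above. The guarded variant shows one possible way to close the window (wait on an event that startup sets); it is not necessarily what the actual fix does.

```python
import threading


class FakeApi:
    """Toy API process; illustrates the startup race only, not Ironic itself."""

    def __init__(self):
        self._manager = None               # stays None until startup finishes
        self._ready = threading.Event()    # set once the manager is registered

    def set_global_manager(self, manager):
        # Stand-in for the startup step that registers the conductor manager.
        self._manager = manager
        self._ready.set()

    def create_node_racy(self):
        # Answers immediately; a request arriving too early gets a 500.
        if self._manager is None:
            return 500
        return 201

    def create_node_guarded(self, timeout=5.0):
        # Blocks until startup has registered the manager, or gives up.
        if not self._ready.wait(timeout):
            return 503
        return 201


def demo():
    api = FakeApi()
    early = api.create_node_racy()          # request beats startup
    # Startup completes shortly afterwards, on another thread.
    threading.Timer(0.1, api.set_global_manager, args=("conductor",)).start()
    guarded = api.create_node_guarded()     # waits for startup instead of failing
    late = api.create_node_racy()           # manager is registered by now
    return early, guarded, late


if __name__ == "__main__":
    print(demo())
```

The early request corresponds to the `POST /v1/nodes` that returned 500 in the log above; once the manager is registered, the same request succeeds.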
*** This bug has been marked as a duplicate of bug 2068246 ***