Bug 2071046
| Summary: | Frequent install failures with Ironic 500: "Cannot use 'none' RPC to connect to remote conductor" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Stephen Benjamin <stbenjam> |
| Component: | Bare Metal Hardware Provisioning | Assignee: | Riccardo Pittau <rpittau> |
| Bare Metal Hardware Provisioning sub component: | ironic | QA Contact: | Amit Ugol <augol> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | rpittau, zbitter |
| Version: | 4.11 | Keywords: | Triaged |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-04-04 08:04:02 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
This looks like a timing issue at startup. The API comes up before set_global_manager() is called, and if something hits the API in the meantime then we see this failure. The RPC service is launched before the WSGI one, but they both start in separate greenthreads, so it's a race.

Happy Monday @dtantsur

*** This bug has been marked as a duplicate of bug 2068246 ***
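
The startup race described in the comment above can be illustrated with a small, self-contained sketch. This is not Ironic's code and not the fix tracked in bug 2068246: it uses OS threads from the standard library instead of greenthreads, and `FakeManager`, `handle_create_node`, `reset`, `demo` and the module-level `_manager_ready` event are hypothetical names invented for the illustration. Only the name `set_global_manager` and the error text come from this report. The sketch shows a request failing when it arrives before registration, and succeeding when the handler is gated on a readiness event.

```python
import threading


class ServiceUnavailable(Exception):
    """Stand-in for the error the API raises when no manager is registered."""


class FakeManager:
    """Hypothetical conductor manager used only for this illustration."""

    def create_node(self):
        return "created"


_global_manager = None              # stays None until the RPC side registers
_manager_ready = threading.Event()  # set once registration has happened


def reset():
    """Return to the 'just started' state: no manager registered."""
    global _global_manager
    _global_manager = None
    _manager_ready.clear()


def set_global_manager(manager):
    """Simulates the RPC service registering its manager."""
    global _global_manager
    _global_manager = manager
    _manager_ready.set()


def handle_create_node(wait_timeout=0.0):
    """Simulates an API request that needs the manager.

    With wait_timeout=0.0 the handler does not wait at all, which mirrors
    the racy behaviour; a positive timeout gates the request on registration.
    """
    if not _manager_ready.wait(timeout=wait_timeout):
        raise ServiceUnavailable(
            "Cannot use 'none' RPC to connect to remote conductor")
    return _global_manager.create_node()


def demo():
    reset()
    results = {}
    # 1. A request arrives before the RPC side has registered: it fails.
    try:
        results["early"] = handle_create_node()
    except ServiceUnavailable:
        results["early"] = "500"
    # 2. Registration happens shortly afterwards on another thread, and a
    #    handler that waits for it succeeds.
    rpc_thread = threading.Timer(0.05, set_global_manager, args=(FakeManager(),))
    rpc_thread.start()
    results["gated"] = handle_create_node(wait_timeout=2.0)
    rpc_thread.join()
    return results


if __name__ == "__main__":
    print(demo())
```

Gating on an event is only one way to close such a window; starting the API listener strictly after registration completes would be another. Which approach Ironic took is not stated in this report.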
During installation, terraform exits reporting Ironic returned 500:

level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [5m0s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[1]: Still creating... [5m0s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[1]: Still creating... [5m10s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[0]: Still creating... [5m10s elapsed]
level=debug msg=ironic_node_v1.openshift-master-host[2]: Still creating... [5m10s elapsed]
level=error
level=error msg=Error: Internal Server Error
level=error
level=error msg= with ironic_node_v1.openshift-master-host[1],
level=error msg= on main.tf line 13, in resource "ironic_node_v1" "openshift-master-host":
level=error msg= 13: resource "ironic_node_v1" "openshift-master-host" {
level=error
level=error
level=error msg=Error: Internal Server Error
level=error
level=error msg= with ironic_node_v1.openshift-master-host[0],
level=error msg= on main.tf line 13, in resource "ironic_node_v1" "openshift-master-host":
level=error msg= 13: resource "ironic_node_v1" "openshift-master-host" {

Digging into the installer log bundle's Ironic logs, I do see errors like this:

ironic.common.exception.ServiceUnavailable: Cannot use 'none' RPC to connect to remote conductor 172.22.0.2 : ironic.common.exception.ServiceUnavailable: Cannot use 'none' RPC to connect to remote conductor 172.22.0.2
2022-04-01 15:53:47.486 1 INFO eventlet.wsgi.server [req-5f326ae0-e490-4751-a6e3-93aa088d8ac3 - - - - -] ::ffff:192.168.111.1 "POST /v1/nodes HTTP/1.1" status: 500 len: 476 time: 0.0718205
2022-04-01 15:53:47.490 1 ERROR ironic.api.method [req-fa9859dd-4f5c-4e18-8ce8-afe73948699a - - - - -] Server-side error: "Cannot use 'none' RPC to connect to remote conductor 172.22.0.2". Detail:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ironic/api/method.py", line 42, in callfunction
    result = f(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ironic/api/method.py", line 109, in inner_body
    return function(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ironic/common/args.py", line 379, in inner_check_args
    return function(*args, **kwargs_next)
  File "/usr/lib/python3.6/site-packages/ironic/api/controllers/v1/node.py", line 2493, in post
    new_node, topic)
  File "/usr/lib/python3.6/site-packages/ironic/conductor/rpcapi.py", line 314, in create_node
    cctxt = self._prepare_call(topic=topic, version='1.36')
  File "/usr/lib/python3.6/site-packages/ironic/conductor/rpcapi.py", line 213, in _prepare_call
    % host)

Here's an example run:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-metal-ipi-serial-ipv4/1509917381101621248

Looks like it's common enough to be worth investigating:
https://search.ci.openshift.org/?search=msg%3DError%3A+Internal+Server+Error&maxAge=48h&context=1&type=build-log&name=metal-ipi&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
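
For anyone triaging other runs, the Ironic error quoted above can be picked out of a log bundle mechanically. The following is a rough sketch, not part of any installer or Ironic tooling: the function name `find_rpc_none_errors` and the regular expression are assumptions written against the two log lines quoted in this report (timestamp, PID, level, logger, bracketed request ID, message), and the pattern only handles an IPv4 conductor address.

```python
import re

# Pattern written against the ERROR line quoted in this report; the field
# layout is assumed from that single sample and may not cover every variant.
RPC_NONE_ERROR = re.compile(
    r"^(?P<ts>\S+ \S+) \d+ ERROR (?P<logger>\S+) "
    r"\[(?P<req>req-[0-9a-f-]+)[^\]]*\] "
    r".*Cannot use 'none' RPC to connect to remote conductor "
    r"(?P<host>\d+(?:\.\d+){3})"
)


def find_rpc_none_errors(lines):
    """Return one dict per ERROR line that reports the 'none' RPC failure."""
    hits = []
    for line in lines:
        match = RPC_NONE_ERROR.match(line)
        if match:
            hits.append({
                "timestamp": match.group("ts"),
                "request_id": match.group("req"),
                "conductor": match.group("host"),
            })
    return hits


# The two log lines below are copied from this report.
SAMPLE_LINES = [
    '2022-04-01 15:53:47.486 1 INFO eventlet.wsgi.server '
    '[req-5f326ae0-e490-4751-a6e3-93aa088d8ac3 - - - - -] '
    '::ffff:192.168.111.1 "POST /v1/nodes HTTP/1.1" status: 500 '
    'len: 476 time: 0.0718205',
    '2022-04-01 15:53:47.490 1 ERROR ironic.api.method '
    '[req-fa9859dd-4f5c-4e18-8ce8-afe73948699a - - - - -] '
    'Server-side error: "Cannot use \'none\' RPC to connect to remote '
    'conductor 172.22.0.2". Detail:',
]

if __name__ == "__main__":
    for hit in find_rpc_none_errors(SAMPLE_LINES):
        print(hit)
```

The INFO line from the WSGI server is deliberately not matched; only the ERROR line that carries the exception text is reported.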