Created attachment 1718069
openshift_install.log

Version:

$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.6.0-0.nightly-2020-10-01-024558
built from commit 7a772518015fc14b48426344e8b3800b16b50d15
release image registry.svc.ci.openshift.org/ocp/release@sha256:e162d478bde8b33a40b2484cbf79233b9f571e59c025479c0b222724bc995c35

Platform:

baremetal IPI

What happened?

Deployment fails, all BMH in state registering

$ oc get bmh -A
NAMESPACE               NAME                   STATUS   PROVISIONING STATUS   CONSUMER                            BMC                                                                                    HARDWARE PROFILE   ONLINE   ERROR
openshift-machine-api   openshift-master-0-0            registering           ocp-edge-cluster-0-pgrbv-master-0   redfish://192.168.123.1:8000/redfish/v1/Systems/59ab267f-ebb0-4310-b65d-3dd7196ad390                      true
openshift-machine-api   openshift-master-0-1            registering           ocp-edge-cluster-0-pgrbv-master-1   redfish://192.168.123.1:8000/redfish/v1/Systems/c29ab232-2a80-4bc6-a279-266b9b2406db                      true
openshift-machine-api   openshift-master-0-2            registering           ocp-edge-cluster-0-pgrbv-master-2   redfish://192.168.123.1:8000/redfish/v1/Systems/d07b27f7-6b6a-4784-aa16-937273cc0747                      true
openshift-machine-api   openshift-worker-0-0            registering                                               redfish://192.168.123.1:8000/redfish/v1/Systems/38611443-55e0-466d-adfe-607ce039add8                      true
openshift-machine-api   openshift-worker-0-1            registering                                               redfish://192.168.123.1:8000/redfish/v1/Systems/791df6e1-9494-445f-8ba7-5ded73d57be9

$ oc get nodes
NAME         STATUS   ROLES    AGE   VERSION
master-0-0   Ready    master   60m   v1.19.0+beb741b
master-0-1   Ready    master   60m   v1.19.0+beb741b
master-0-2   Ready    master   60m   v1.19.0+beb741b

$ oc get machine -A -o wide
NAMESPACE               NAME                                      PHASE          TYPE   REGION   ZONE   AGE   NODE         PROVIDERID                                                    STATE
openshift-machine-api   ocp-edge-cluster-0-pgrbv-master-0         Running                               84m   master-0-0   baremetalhost:///openshift-machine-api/openshift-master-0-0
openshift-machine-api   ocp-edge-cluster-0-pgrbv-master-1         Running                               84m   master-0-1   baremetalhost:///openshift-machine-api/openshift-master-0-1
openshift-machine-api   ocp-edge-cluster-0-pgrbv-master-2         Running                               84m   master-0-2   baremetalhost:///openshift-machine-api/openshift-master-0-2
openshift-machine-api   ocp-edge-cluster-0-pgrbv-worker-0-l6wz6   Provisioning                          69m
openshift-machine-api   ocp-edge-cluster-0-pgrbv-worker-0-mvmv2   Provisioning                          69m

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 False       False         True       80m
cloud-credential                           4.6.0-0.nightly-2020-10-01-024558   True        False         False      90m
cluster-autoscaler                         4.6.0-0.nightly-2020-10-01-024558   True        False         False      77m
config-operator                            4.6.0-0.nightly-2020-10-01-024558   True        False         False      80m
console                                    4.6.0-0.nightly-2020-10-01-024558   Unknown     True          False      58m
csi-snapshot-controller                    4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
dns                                        4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
etcd                                       4.6.0-0.nightly-2020-10-01-024558   True        False         False      78m
image-registry                             4.6.0-0.nightly-2020-10-01-024558   True        False         False      58m
ingress                                                                        False       True          True       77m
insights                                   4.6.0-0.nightly-2020-10-01-024558   True        False         False      77m
kube-apiserver                             4.6.0-0.nightly-2020-10-01-024558   True        False         False      77m
kube-controller-manager                    4.6.0-0.nightly-2020-10-01-024558   True        False         False      78m
kube-scheduler                             4.6.0-0.nightly-2020-10-01-024558   True        False         False      76m
kube-storage-version-migrator              4.6.0-0.nightly-2020-10-01-024558   False       False         False      79m
machine-api                                4.6.0-0.nightly-2020-10-01-024558   True        False         False      64m
machine-approver                           4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
machine-config                             4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
marketplace                                4.6.0-0.nightly-2020-10-01-024558   True        False         False      76m
monitoring                                                                     False       True          True       69m
network                                    4.6.0-0.nightly-2020-10-01-024558   True        False         False      80m
node-tuning                                4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
openshift-apiserver                        4.6.0-0.nightly-2020-10-01-024558   True        False         False      58m
openshift-controller-manager               4.6.0-0.nightly-2020-10-01-024558   True        False         False      77m
openshift-samples                          4.6.0-0.nightly-2020-10-01-024558   True        False         False      58m
operator-lifecycle-manager                 4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-10-01-024558   True        False         False      58m
service-ca                                 4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
storage                                    4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m

$ oc get pods -A|grep -vE "Run|Comp"
NAMESPACE                                 NAME                                       READY   STATUS    RESTARTS   AGE
openshift-ingress                         router-default-fb68bb68f-2j2sz             0/1     Pending   0          79m
openshift-ingress                         router-default-fb68bb68f-f6hjj             0/1     Pending   0          79m
openshift-kube-storage-version-migrator   migrator-5d4969c44c-kt8bl                  0/1     Pending   0          81m
openshift-monitoring                      kube-state-metrics-685bc9c746-sbp9j        0/3     Pending   0          79m
openshift-monitoring                      openshift-state-metrics-5fdfdcd554-c4pvd   0/3     Pending   0          79m
openshift-monitoring                      prometheus-adapter-74fd9b685c-cwhrm        0/1     Pending   0          71m
openshift-monitoring                      prometheus-adapter-74fd9b685c-r7pqg        0/1     Pending   0          71m

What did you expect to happen?

Deploy success

How to reproduce it (as minimally and precisely as possible)?

Run deploy for OCP 4.6
must-gather http://rhos-compute-node-10.lab.eng.rdu2.redhat.com/logs/BZ1884155_must-gather.tar.gz
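
For anyone re-checking a cluster for this symptom, the `oc get bmh -A` check in the report can be scripted. The following is only a rough sketch: it assumes the JSON from `oc get bmh -A -o json` exposes the provisioning state at `status.provisioning.state`, and the helper names `stuck_hosts` and `fetch_bmh` are made up for illustration.

```python
import json
import subprocess


def stuck_hosts(bmh_list, state="registering"):
    """Return (namespace, name) for each host whose provisioning state equals `state`.

    Assumption: the state lives at status.provisioning.state in each item.
    """
    stuck = []
    for item in bmh_list.get("items", []):
        provisioning = item.get("status", {}).get("provisioning", {})
        if provisioning.get("state") == state:
            meta = item.get("metadata", {})
            stuck.append((meta.get("namespace", ""), meta.get("name", "")))
    return stuck


def fetch_bmh():
    """Fetch all BareMetalHost objects as a dict.

    Assumption: `oc` is on PATH and already logged in to the cluster.
    """
    result = subprocess.run(
        ["oc", "get", "bmh", "-A", "-o", "json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return json.loads(result.stdout)
```

On a live cluster, `stuck_hosts(fetch_bmh())` would list the hosts still in registering; in the state reported here it would be expected to return all five.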
The problem is the communication between ironic-api and ironic-conductor. We upgraded ironic yesterday (for another issue) and now this has come up. I'll attach a PR.

2020-10-01T07:56:30.107047928Z 2020-10-01 07:56:30.106 39 ERROR ironic.api.expose [req-a3747a4d-9aaf-4356-8ec7-621c1170d842 ironic-user - - - -] Server-side error: "No valid authentication is available". Detail:
2020-10-01T07:56:30.107047928Z Traceback (most recent call last):
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/api/expose.py", line 78, in callfunction
2020-10-01T07:56:30.107047928Z     result = f(self, *args, **kwargs)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/api/controllers/v1/node.py", line 2304, in post
2020-10-01T07:56:30.107047928Z     new_node, topic)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/conductor/rpcapi.py", line 232, in create_node
2020-10-01T07:56:30.107047928Z     return cctxt.call(context, 'create_node', node_obj=node_obj)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/common/json_rpc/client.py", line 123, in call
2020-10-01T07:56:30.107047928Z     **kwargs)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/common/json_rpc/client.py", line 174, in _request
2020-10-01T07:56:30.107047928Z     result = _get_session().post(url, json=body)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 401, in post
2020-10-01T07:56:30.107047928Z     return self.request(url, 'POST', **kwargs)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 257, in request
2020-10-01T07:56:30.107047928Z     return self.session.request(url, method, **kwargs)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 784, in request
2020-10-01T07:56:30.107047928Z     raise exceptions.AuthorizationFailure(msg)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z keystoneauth1.exceptions.auth.AuthorizationFailure: No valid authentication is available
2020-10-01T07:56:30.107047928Z : keystoneauth1.exceptions.auth.AuthorizationFailure: No valid authentication is available
2020-10-01T07:56:30.107797581Z 2020-10-01 07:56:30.107 39 INFO eventlet.wsgi.server [req-a3747a4d-9aaf-4356-8ec7-621c1170d842 ironic-user - - - -] ::ffff:172.22.0.3 "POST /v1/nodes HTTP/1.1" status: 500 len: 449 time: 0.0172441
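
To confirm the same failure in another ironic-api log (for example one pulled from the must-gather), something like the sketch below could be used. It is only an illustration built from the lines quoted above: the regular expressions assume the eventlet access-log format and the `req-...` request id format seen in this excerpt, and `summarize` is a hypothetical helper name.

```python
import re

# Access-log fragment as it appears in the excerpt: "POST /v1/nodes HTTP/1.1" status: 500
ACCESS_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" status: (?P<status>\d{3})'
)
# Request id inside the bracketed context, e.g. [req-a3747a4d-... ironic-user - - - -]
REQ_ID_RE = re.compile(r"\[(req-[0-9a-f-]+) ")
# Terminal colour codes, either as a real ESC byte or rendered as the text "^["
ANSI_RE = re.compile(r"(?:\x1b|\^\[)\[[0-9;]*m")

AUTH_FAILURE = "keystoneauth1.exceptions.auth.AuthorizationFailure"


def summarize(lines):
    """Count AuthorizationFailure lines and collect API requests answered with 5xx."""
    auth_failures = 0
    failed_requests = []
    for raw in lines:
        line = ANSI_RE.sub("", raw)
        if AUTH_FAILURE in line:
            auth_failures += 1
        access = ACCESS_RE.search(line)
        if access and access.group("status").startswith("5"):
            req = REQ_ID_RE.search(line)
            failed_requests.append(
                (
                    req.group(1) if req else None,
                    access.group("method"),
                    access.group("path"),
                    int(access.group("status")),
                )
            )
    return auth_failures, failed_requests
```

A non-zero failure count together with `POST /v1/nodes` entries returning 500 would match what is shown in this comment: node creation fails because the JSON RPC call from ironic-api to ironic-conductor has no valid authentication.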
I've tested the latest nightly: the ironic image has the fix and ironic is now working.

[derekh@r640-u07 dev-scripts]$ oc get bmh
NAME              STATUS   PROVISIONING STATUS      CONSUMER                      BMC                                                                                                    HARDWARE PROFILE   ONLINE   ERROR
ostest-master-0   OK       externally provisioned   ostest-24sck-master-0         redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/4a2ef36e-3297-45d5-bfa5-ef68204087de                      true
ostest-master-1   OK       externally provisioned   ostest-24sck-master-1         redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/35b45909-f0cb-47a2-ad45-54e571a6e886                      true
ostest-master-2   OK       externally provisioned   ostest-24sck-master-2         redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/d47c8f72-73b3-407c-80bb-d730aef79dd5                      true
ostest-worker-0   OK       provisioned              ostest-24sck-worker-0-dddvf   redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/35c07b36-d11e-4fdb-b313-4c7b9de9c4a5   unknown            true
ostest-worker-1   OK       provisioned              ostest-24sck-worker-0-jznkf   redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/b1fd246c-81af-45e8-989a-c9ac9e11909f   unknown            true
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196