Bug 1884155 - OCP4.6 deploy fails, all bmh in registering state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Bare Metal Hardware Provisioning
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Derek Higgins
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-10-01 08:15 UTC by Lubov
Modified: 2020-10-27 16:47 UTC
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:47:15 UTC
Target Upstream Version:
Embargoed:


Attachments
openshift_install.log (131.21 KB, text/plain)
2020-10-01 08:15 UTC, Lubov


Links
System                  ID                               Private  Priority  Status  Summary                                                       Last Updated
Github                  openshift ironic-image pull 107  0        None      closed  Bug 1884155: Explicitly set json_rpc.auth_strategy to noauth  2021-01-07 16:04:06 UTC
Red Hat Product Errata  RHBA-2020:4196                   0        None      None    None                                                          2020-10-27 16:47:34 UTC

Description Lubov 2020-10-01 08:15:54 UTC
Created attachment 1718069
openshift_install.log

Version:

$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.6.0-0.nightly-2020-10-01-024558
built from commit 7a772518015fc14b48426344e8b3800b16b50d15
release image registry.svc.ci.openshift.org/ocp/release@sha256:e162d478bde8b33a40b2484cbf79233b9f571e59c025479c0b222724bc995c35

Platform:

baremetal IPI

What happened?
The deployment fails; all BMHs remain in the registering state.


$ oc get bmh -A
NAMESPACE               NAME                   STATUS   PROVISIONING STATUS   CONSUMER                            BMC                                                                                    HARDWARE PROFILE   ONLINE   ERROR
openshift-machine-api   openshift-master-0-0            registering           ocp-edge-cluster-0-pgrbv-master-0   redfish://192.168.123.1:8000/redfish/v1/Systems/59ab267f-ebb0-4310-b65d-3dd7196ad390                      true     
openshift-machine-api   openshift-master-0-1            registering           ocp-edge-cluster-0-pgrbv-master-1   redfish://192.168.123.1:8000/redfish/v1/Systems/c29ab232-2a80-4bc6-a279-266b9b2406db                      true     
openshift-machine-api   openshift-master-0-2            registering           ocp-edge-cluster-0-pgrbv-master-2   redfish://192.168.123.1:8000/redfish/v1/Systems/d07b27f7-6b6a-4784-aa16-937273cc0747                      true     
openshift-machine-api   openshift-worker-0-0            registering                                               redfish://192.168.123.1:8000/redfish/v1/Systems/38611443-55e0-466d-adfe-607ce039add8                      true     
openshift-machine-api   openshift-worker-0-1            registering                                               redfish://192.168.123.1:8000/redfish/v1/Systems/791df6e1-9494-445f-8ba7-5ded73d57be9 
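The `oc get bmh -A` table above cannot be split on whitespace, because the STATUS column is blank in every row and the last row is missing its trailing cells. A minimal sketch for pulling out the stuck hosts, assuming the plain table text has already been captured; `parse_oc_table` and `hosts_in_state` are hypothetical helper names, not part of `oc` or any OpenShift tooling. It relies only on the header offsets visible in the output itself.

```python
import re


def parse_oc_table(text):
    """Parse the fixed-width table printed by `oc get` into a list of dicts.

    Column start offsets come from the header line. A header name is a run of
    words separated by single spaces, so "PROVISIONING STATUS" and
    "HARDWARE PROFILE" each stay one column. Rows are sliced at the same
    offsets, so a blank cell (like the empty STATUS column) yields "" instead
    of shifting later values left, and short rows yield "" for missing cells.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+(?: \S+)*", header)]
    records = []
    for row in rows:
        record = {}
        for i, (name, start) in enumerate(columns):
            # Each cell runs up to the start of the next header (or end of line).
            end = columns[i + 1][1] if i + 1 < len(columns) else None
            record[name] = row[start:end].strip()
        records.append(record)
    return records


def hosts_in_state(text, state="registering"):
    """Return NAME of every row whose PROVISIONING STATUS equals `state`."""
    return [r["NAME"] for r in parse_oc_table(text) if r.get("PROVISIONING STATUS") == state]
```

For example, `hosts_in_state(subprocess.run(["oc", "get", "bmh", "-A"], capture_output=True, text=True, check=True).stdout)` would be expected to return all five hosts listed above. Slicing by header offset is deliberately simple; for anything beyond a quick check, requesting structured output with `-o json` is more robust.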

$ oc get nodes
NAME         STATUS   ROLES    AGE   VERSION
master-0-0   Ready    master   60m   v1.19.0+beb741b
master-0-1   Ready    master   60m   v1.19.0+beb741b
master-0-2   Ready    master   60m   v1.19.0+beb741b

$ oc get machine -A -o wide
NAMESPACE               NAME                                      PHASE          TYPE   REGION   ZONE   AGE   NODE         PROVIDERID                                                    STATE
openshift-machine-api   ocp-edge-cluster-0-pgrbv-master-0         Running                               84m   master-0-0   baremetalhost:///openshift-machine-api/openshift-master-0-0   
openshift-machine-api   ocp-edge-cluster-0-pgrbv-master-1         Running                               84m   master-0-1   baremetalhost:///openshift-machine-api/openshift-master-0-1   
openshift-machine-api   ocp-edge-cluster-0-pgrbv-master-2         Running                               84m   master-0-2   baremetalhost:///openshift-machine-api/openshift-master-0-2   
openshift-machine-api   ocp-edge-cluster-0-pgrbv-worker-0-l6wz6   Provisioning                          69m                                                                              
openshift-machine-api   ocp-edge-cluster-0-pgrbv-worker-0-mvmv2   Provisioning                          69m

$ oc get co
NAME                                       VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                                 False       False         True       80m
cloud-credential                           4.6.0-0.nightly-2020-10-01-024558   True        False         False      90m
cluster-autoscaler                         4.6.0-0.nightly-2020-10-01-024558   True        False         False      77m
config-operator                            4.6.0-0.nightly-2020-10-01-024558   True        False         False      80m
console                                    4.6.0-0.nightly-2020-10-01-024558   Unknown     True          False      58m
csi-snapshot-controller                    4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
dns                                        4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
etcd                                       4.6.0-0.nightly-2020-10-01-024558   True        False         False      78m
image-registry                             4.6.0-0.nightly-2020-10-01-024558   True        False         False      58m
ingress                                                                        False       True          True       77m
insights                                   4.6.0-0.nightly-2020-10-01-024558   True        False         False      77m
kube-apiserver                             4.6.0-0.nightly-2020-10-01-024558   True        False         False      77m
kube-controller-manager                    4.6.0-0.nightly-2020-10-01-024558   True        False         False      78m
kube-scheduler                             4.6.0-0.nightly-2020-10-01-024558   True        False         False      76m
kube-storage-version-migrator              4.6.0-0.nightly-2020-10-01-024558   False       False         False      79m
machine-api                                4.6.0-0.nightly-2020-10-01-024558   True        False         False      64m
machine-approver                           4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
machine-config                             4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
marketplace                                4.6.0-0.nightly-2020-10-01-024558   True        False         False      76m
monitoring                                                                     False       True          True       69m
network                                    4.6.0-0.nightly-2020-10-01-024558   True        False         False      80m
node-tuning                                4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
openshift-apiserver                        4.6.0-0.nightly-2020-10-01-024558   True        False         False      58m
openshift-controller-manager               4.6.0-0.nightly-2020-10-01-024558   True        False         False      77m
openshift-samples                          4.6.0-0.nightly-2020-10-01-024558   True        False         False      58m
operator-lifecycle-manager                 4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
operator-lifecycle-manager-catalog         4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
operator-lifecycle-manager-packageserver   4.6.0-0.nightly-2020-10-01-024558   True        False         False      58m
service-ca                                 4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
storage                                    4.6.0-0.nightly-2020-10-01-024558   True        False         False      79m
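The `oc get co` table is long, and splitting it on whitespace misaligns the rows whose VERSION column is blank. A minimal sketch that reads the JSON form instead, assuming the text of `oc get co -o json` (a List whose items carry `status.conditions` entries with string `type` and `status` fields) has been captured; `unavailable_operators` is a hypothetical helper name.

```python
import json


def unavailable_operators(co_list_json):
    """Return names of ClusterOperators whose Available condition is not "True".

    Expects the text printed by `oc get co -o json`: a List whose items carry
    status.conditions entries such as {"type": "Available", "status": "False"}.
    An operator with no Available condition at all is also reported.
    """
    data = json.loads(co_list_json)
    names = []
    for item in data.get("items", []):
        conditions = (item.get("status") or {}).get("conditions") or []
        # Pick the status string of the first condition of type "Available".
        available = next(
            (c.get("status") for c in conditions if c.get("type") == "Available"),
            None,
        )
        if available != "True":
            names.append(item["metadata"]["name"])
    return names
```

Against the cluster shown here it would be expected to report the five rows above whose AVAILABLE column is False or Unknown: authentication, console, ingress, kube-storage-version-migrator and monitoring.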


$ oc get pods -A|grep -vE "Run|Comp"
NAMESPACE                                          NAME                                                     READY   STATUS      RESTARTS   AGE
openshift-ingress                                  router-default-fb68bb68f-2j2sz                           0/1     Pending     0          79m
openshift-ingress                                  router-default-fb68bb68f-f6hjj                           0/1     Pending     0          79m
openshift-kube-storage-version-migrator            migrator-5d4969c44c-kt8bl                                0/1     Pending     0          81m
openshift-monitoring                               kube-state-metrics-685bc9c746-sbp9j                      0/3     Pending     0          79m
openshift-monitoring                               openshift-state-metrics-5fdfdcd554-c4pvd                 0/3     Pending     0          79m
openshift-monitoring                               prometheus-adapter-74fd9b685c-cwhrm                      0/1     Pending     0          71m
openshift-monitoring                               prometheus-adapter-74fd9b685c-r7pqg                      0/1     Pending     0          71m


What did you expect to happen?
The deployment succeeds.

How to reproduce it (as minimally and precisely as possible)?
Run a baremetal IPI deployment of OCP 4.6.

Comment 2 Derek Higgins 2020-10-01 08:46:15 UTC
The problem is in the communication between ironic-api and ironic-conductor. We upgraded ironic yesterday (for another issue), and this has come up since. I'll attach a PR.

2020-10-01T07:56:30.107047928Z 2020-10-01 07:56:30.106 39 ERROR ironic.api.expose [req-a3747a4d-9aaf-4356-8ec7-621c1170d842 ironic-user - - - -] Server-side error: "No valid authentication is available". Detail: 
2020-10-01T07:56:30.107047928Z Traceback (most recent call last):
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/api/expose.py", line 78, in callfunction
2020-10-01T07:56:30.107047928Z     result = f(self, *args, **kwargs)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/api/controllers/v1/node.py", line 2304, in post
2020-10-01T07:56:30.107047928Z     new_node, topic)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/conductor/rpcapi.py", line 232, in create_node
2020-10-01T07:56:30.107047928Z     return cctxt.call(context, 'create_node', node_obj=node_obj)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/common/json_rpc/client.py", line 123, in call
2020-10-01T07:56:30.107047928Z     **kwargs)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/ironic/common/json_rpc/client.py", line 174, in _request
2020-10-01T07:56:30.107047928Z     result = _get_session().post(url, json=body)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 401, in post
2020-10-01T07:56:30.107047928Z     return self.request(url, 'POST', **kwargs)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/keystoneauth1/adapter.py", line 257, in request
2020-10-01T07:56:30.107047928Z     return self.session.request(url, method, **kwargs)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z   File "/usr/lib/python3.6/site-packages/keystoneauth1/session.py", line 784, in request
2020-10-01T07:56:30.107047928Z     raise exceptions.AuthorizationFailure(msg)
2020-10-01T07:56:30.107047928Z
2020-10-01T07:56:30.107047928Z keystoneauth1.exceptions.auth.AuthorizationFailure: No valid authentication is available
2020-10-01T07:56:30.107047928Z : keystoneauth1.exceptions.auth.AuthorizationFailure: No valid authentication is available
2020-10-01T07:56:30.107797581Z 2020-10-01 07:56:30.107 39 INFO eventlet.wsgi.server [req-a3747a4d-9aaf-4356-8ec7-621c1170d842 ironic-user - - - -] ::ffff:172.22.0.3 "POST /v1/nodes HTTP/1.1" status: 500  len: 449 time: 0.0172441
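The fix tracked in the linked pull request (openshift ironic-image pull 107) is titled "Explicitly set json_rpc.auth_strategy to noauth"; its diff is not reproduced in this report. A minimal sketch of an equivalent edit, assuming ironic reads an INI-style ironic.conf and that adding `auth_strategy = noauth` under `[json_rpc]` is the whole change. The helper name is hypothetical, and the real image may apply the setting differently (for example through a config template).

```python
import configparser
import io


def set_json_rpc_noauth(conf_text: str) -> str:
    """Return ironic.conf text with `auth_strategy = noauth` under [json_rpc].

    Hypothetical helper: mirrors only the PR title, not the actual PR diff.
    """
    # interpolation=None so values containing "%" are passed through untouched.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names exactly as written instead of lower-casing them.
    parser.optionxform = str
    parser.read_string(conf_text)

    # Create the section if the file does not have one yet.
    if not parser.has_section("json_rpc"):
        parser.add_section("json_rpc")
    # Set the option explicitly rather than relying on any inherited default.
    parser.set("json_rpc", "auth_strategy", "noauth")

    # Note: configparser does not preserve comments when writing.
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Setting the option explicitly, rather than leaving it unset, matches the intent of the PR title: the JSON-RPC client used between ironic-api and ironic-conductor should not look for credentials, which is where the `AuthorizationFailure` above is raised. Because `configparser` drops comments on rewrite, this approach suits generated config files better than hand-maintained ones.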

Comment 5 Derek Higgins 2020-10-02 08:49:02 UTC
I've tested the latest nightly: the ironic image has the fix, and ironic is now working.

[derekh@r640-u07 dev-scripts]$ oc get bmh 
NAME              STATUS   PROVISIONING STATUS      CONSUMER                      BMC                                                                                                    HARDWARE PROFILE   ONLINE   ERROR
ostest-master-0   OK       externally provisioned   ostest-24sck-master-0         redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/4a2ef36e-3297-45d5-bfa5-ef68204087de                      true     
ostest-master-1   OK       externally provisioned   ostest-24sck-master-1         redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/35b45909-f0cb-47a2-ad45-54e571a6e886                      true     
ostest-master-2   OK       externally provisioned   ostest-24sck-master-2         redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/d47c8f72-73b3-407c-80bb-d730aef79dd5                      true     
ostest-worker-0   OK       provisioned              ostest-24sck-worker-0-dddvf   redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/35c07b36-d11e-4fdb-b313-4c7b9de9c4a5   unknown            true     
ostest-worker-1   OK       provisioned              ostest-24sck-worker-0-jznkf   redfish+http://[fd2e:6f44:5dd8:c956::1]:8000/redfish/v1/Systems/b1fd246c-81af-45e8-989a-c9ac9e11909f   unknown            true

Comment 8 errata-xmlrpc 2020-10-27 16:47:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

