Description of problem:
When using iRMC servers for an OCP bare-metal IPI deployment, the deployment fails because inspection of the worker node fails.
----------------------------------------
$ oc get bmh -A
NAMESPACE               NAME       STATE                    CONSUMER                   ONLINE   ERROR
openshift-machine-api   master-0   externally provisioned   openshift-bwd8n-master-0   true
openshift-machine-api   master-1   externally provisioned   openshift-bwd8n-master-1   true
openshift-machine-api   master-2   externally provisioned   openshift-bwd8n-master-2   true
openshift-machine-api   worker-0   inspecting                                          true     inspection error
----------------------------------------

The following ironic-inspector log shows what is likely the cause of the inspection failure:
----------------------------------------
2021-06-04 04:01:05.449 1 INFO eventlet.wsgi.server [req-8cf4f186-e733-4af1-ae76-e1aa617d36f6 - - - - -] ::1 "GET /v1 HTTP/1.1" status: 200 len: 507 time: 0.0038972
2021-06-04 04:01:05.539 1 INFO eventlet.wsgi.server [req-7b1a73dc-c372-4457-a5f0-23c28c46f163 - - - - -] ::1 "GET /v1/introspection/23931413-6909-4808-9324-f544653a8580 HTTP/1.1" status: 200 len: 488 time: 0.0061669
2021-06-04 04:01:07.442 1 DEBUG eventlet.wsgi.server [-] (1) accepted ('::ffff:192.168.20.157', 43920, 0, 0) server /usr/lib/python3.6/site-packages/eventlet/wsgi.py:985
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 461, in fire_timers
    timer()
  File "/usr/lib/python3.6/site-packages/eventlet/hubs/timer.py", line 59, in __call__
    cb(*args, **kw)
  File "/usr/lib/python3.6/site-packages/eventlet/greenthread.py", line 221, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 818, in process_request
    proto.__init__(conn_state, self)
  File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 357, in __init__
    self.handle()
  File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 390, in handle
    self.handle_one_request()
  File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 419, in handle_one_request
    self.raw_requestline = self._read_request_line()
  File "/usr/lib/python3.6/site-packages/eventlet/wsgi.py", line 402, in _read_request_line
    return self.rfile.readline(self.server.url_length_limit)
  File "/usr/lib64/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.6/site-packages/eventlet/green/ssl.py", line 241, in recv_into
    return self._base_recv(nbytes, flags, into=True, buffer_=buffer)
  File "/usr/lib/python3.6/site-packages/eventlet/green/ssl.py", line 256, in _base_recv
    read = self.read(nbytes, buffer_)
  File "/usr/lib/python3.6/site-packages/eventlet/green/ssl.py", line 176, in read
    super(GreenSSLSocket, self).read, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/green/ssl.py", line 150, in _call_trampolining
    return func(*a, **kw)
  File "/usr/lib64/python3.6/ssl.py", line 833, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 590, in read
    v = self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: HTTP_REQUEST] http request (_ssl.c:2354)
----------------------------------------

The error first appears in 4.8.0-0.nightly-2021-03-19-184028 and was introduced by the following commit, which enables TLS for Ironic and Inspector:
https://github.com/openshift/cluster-baremetal-operator/commit/671e334d95ed2a17d0e8eef5c6d8357431512a45

The problem appears to be on the OCP side, as follows.

Whether Ironic uses TLS is determined by whether the cert files exist. When the Ironic-related containers start, they first check for the certs and then generate ironic.conf: if no cert files are present, ironic.conf is written without TLS; if they are present, TLS is enabled.

When the masters are bootstrapped, these certs do not exist, and the bootstrap VM does not create them either.
The bootstrap VM simply starts Ironic (https://github.com/openshift/installer/blob/master/data/data/bootstrap/baremetal/files/usr/local/bin/startironic.sh.template), so Ironic does not use TLS during master deployment.

During worker deployment, however, these cert files are created by the Cluster Baremetal Operator (CBO). CBO runs on the master nodes once master deployment has completed and is responsible for creating BMO, Ironic, and the certs they need. It creates the certs as a Kubernetes secret named `metal3-ironic-tls`, then creates the metal3 deployment, mounting this secret into each BMO and Ironic container via a VolumeMount (https://github.com/openshift/cluster-baremetal-operator/blob/master/provisioning/baremetal_pod.go). As a result, Ironic on the masters uses TLS during worker deployment.

However, masters and workers use the same IPA, which is configured to send plain HTTP requests, so worker deployment fails. The worker's IPA needs to send HTTPS requests instead.

Version-Release number of selected component (if applicable):
openshift-baremetal-install 4.8.0-0.nightly-2021-04-15-152737
built from commit d0462d8b5074448e1917da7f0a5d7a904bd60359
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:70fe4f1a828dcbe493dce6d199eb5d9e76300d053c477f0f4b4577ef7b7d2934

How reproducible:
Always.

Steps to Reproduce:
1. openshift-install --dir ~/clusterconfigs create manifests
2. cp ~/ipi/99_router-replicas.yaml ~/clusterconfigs/openshift/
3. openshift-install --dir ~/clusterconfigs --log-level debug create cluster

Actual results:
Inspection of the worker nodes fails, and the bare-metal IPI deployment fails.

Expected results:
Inspection of the worker nodes succeeds, and the bare-metal IPI deployment completes.

Additional info:
Upstream issue in the OCP community: https://github.com/openshift/cluster-baremetal-operator/issues/152
Setting blocker- as this is a continuation of https://bugzilla.redhat.com/show_bug.cgi?id=1965168 which was triaged as blocker-
Hi Jacob,

I think the following may be the root cause:
https://github.com/openshift/cluster-baremetal-operator/blob/75e3ab4524c200f0c57befd03883afcca13bbd98/provisioning/baremetal_pod.go#L503

The certs are not mounted into the httpd container. I confirmed that this SSL error does not occur with the following modification. (I am currently retesting, including other fixes that have not yet been incorporated into OCP.)
------------------------------------------
diff --git a/provisioning/baremetal_pod.go b/provisioning/baremetal_pod.go
index 366af31..daed36e 100644
--- a/provisioning/baremetal_pod.go
+++ b/provisioning/baremetal_pod.go
@@ -503,6 +503,10 @@ func createContainerMetal3Httpd(images *Images, config *metal3iov1alpha1.Provisi
 		VolumeMounts: []corev1.VolumeMount{
 			sharedVolumeMount,
 			imageVolumeMount,
+			inspectorCredentialsMount,
+			rpcCredentialsMount,
+			ironicTlsMount,
+			inspectorTlsMount,
 		},
 		Env: []corev1.EnvVar{
 			buildEnvVar(httpPort, config),
------------------------------------------
Regarding this modification, I am not sure all of these mounts are needed. Could Red Hat clarify which ones are required and check whether this modification is reasonable?

Best regards,
Yasuhiro Futakawa
Good catch. I think ironicTlsMount and inspectorTlsMount should be added there (the credentials mounts are probably not needed).
Cannot verify due to the lack of an iRMC setup. The problem does not occur on the HPE and Dell setups. Closing as OtherQA. If the problem reproduces on iRMC, please open a new bug or reopen this one.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438