Version:

$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.10.0-0.nightly-2021-12-20-231053
built from commit 9f37ece3620d14e48507f2afc5cf6a667ca2cef0
release image registry.ci.openshift.org/ocp/release@sha256:26f811bd37c593564093aa8e323cf81637c49a9527dbe6772c885fa9f55ab684
release architecture amd64

Platform: IPI

What happened?

Deploy on real BM failed twice in a row:

time="2021-12-21T08:18:15+02:00" level=error msg="Error: could not inspect: could not inspect node, node is currently 'inspect failed' , last error was 'timeout reached while inspecting the node'"
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error msg="  on ../../tmp/openshift-install-masters-2606240572/main.tf line 13, in resource \"ironic_node_v1\" \"openshift-master-host\":"
time="2021-12-21T08:18:15+02:00" level=error msg="  13: resource \"ironic_node_v1\" \"openshift-master-host\" {"
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error msg="Error: could not inspect: could not inspect node, node is currently 'inspect failed' , last error was 'timeout reached while inspecting the node'"
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error msg="  on ../../tmp/openshift-install-masters-2606240572/main.tf line 13, in resource \"ironic_node_v1\" \"openshift-master-host\":"
time="2021-12-21T08:18:15+02:00" level=error msg="  13: resource \"ironic_node_v1\" \"openshift-master-host\" {"
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error msg="Error: could not inspect: could not inspect node, node is currently 'inspect failed' , last error was 'timeout reached while inspecting the node'"
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error msg="  on ../../tmp/openshift-install-masters-2606240572/main.tf line 13, in resource \"ironic_node_v1\" \"openshift-master-host\":"
time="2021-12-21T08:18:15+02:00" level=error msg="  13: resource \"ironic_node_v1\" \"openshift-master-host\" {"
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=error
time="2021-12-21T08:18:15+02:00" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply Terraform: failed to complete the change"
time="2021-12-21T08:18:16+02:00" level=debug msg="OpenShift Installer 4.10.0-0.nightly-2021-12-20-231053"
time="2021-12-21T08:18:16+02:00" level=debug msg="Built from commit 9f37ece3620d14e48507f2afc5cf6a667ca2cef0"
time="2021-12-21T08:18:16+02:00" level=info msg="Waiting up to 20m0s (until 8:38AM) for the Kubernetes API at https://api.ocp-edge.lab.eng.tlv2.redhat.com:6443..."
time="2021-12-21T08:18:16+02:00" level=info msg="API v1.22.1+6859754 up"
time="2021-12-21T08:18:16+02:00" level=info msg="Waiting up to 30m0s (until 8:48AM) for bootstrapping to complete..."
time="2021-12-21T08:48:16+02:00" level=info msg="Use the following commands to gather logs from the cluster"
time="2021-12-21T08:48:16+02:00" level=info msg="openshift-install gather bootstrap --help"
time="2021-12-21T08:48:16+02:00" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
time="2021-12-21T08:48:16+02:00" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
The status of the cluster:

[kni@ocp-edge06 ~]$ oc get nodes
No resources found

[kni@ocp-edge06 ~]$ oc get bmh -A
NAMESPACE               NAME                 STATE   CONSUMER                  ONLINE   ERROR   AGE
openshift-machine-api   openshift-master-0           ocp-edge-6xs6f-master-0   true             120m
openshift-machine-api   openshift-master-1           ocp-edge-6xs6f-master-1   true             120m
openshift-machine-api   openshift-master-2           ocp-edge-6xs6f-master-2   true             120m
openshift-machine-api   openshift-worker-0                                     true             120m
openshift-machine-api   openshift-worker-1                                     true             120m
openshift-machine-api   openshift-worker-2                                     true             120m

[kni@ocp-edge06 ~]$ oc get machineset -A
NAMESPACE               NAME                      DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   ocp-edge-6xs6f-worker-0   3         0                             121m

[kni@ocp-edge06 ~]$ oc get machines -A
NAMESPACE               NAME                      PHASE   TYPE   REGION   ZONE   AGE
openshift-machine-api   ocp-edge-6xs6f-master-0                                  121m
openshift-machine-api   ocp-edge-6xs6f-master-1                                  121m
openshift-machine-api   ocp-edge-6xs6f-master-2                                  121m

[kni@ocp-edge06 ~]$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication
baremetal
cloud-controller-manager
cloud-credential                                     True        False         False      122m
cluster-autoscaler
config-operator
console
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage

must-gathers:
https://s3.upshift.redhat.com/DH-PROD-OCP-EDGE-QE-CI/Infra/must-gather/1176/index.html
https://s3.upshift.redhat.com/DH-PROD-OCP-EDGE-QE-CI/Infra/must-gather/1185/index.html

What did you expect to happen?
Deploy should pass.

How to reproduce it (as minimally and precisely as possible)?
Usual deployment.
Could you log into the machine's virtual console to see what is happening there?
Okay, got it. So we're seeing a failure to connect to the rootfs location. Now I wonder if it's a DHCP issue or an issue with routing. Is the address it is trying to use expected to be reachable from the machine?
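One way to answer that question concretely is a plain TCP probe against the rootfs server's host and port from the affected machine, which separates "no route/no address" failures from HTTP-level ones. A minimal sketch (the helper name and the probed address in the comment are illustrative, not taken from this cluster):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    create_connection() handles both IPv4 and IPv6 literals, so an
    address like fd00:1101::2 can be passed directly (no brackets).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe a hypothetical IPv6 rootfs server on port 80.
# print(tcp_reachable("fd00:1101::2", 80))
```

If this returns False while DHCP leases look fine, the problem is more likely routing (or the node never got an address of the right family at all).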
@vvoronko, can you help with the DHCP question, please?
Reproduced on 4.10.0-0.nightly-2021-12-21-130047 (BM, IPv4 + IPv6 provisioning): https://s3.upshift.redhat.com/DH-PROD-OCP-EDGE-QE-CI/Infra/must-gather/1191/index.html
*** Bug 2035219 has been marked as a duplicate of this bug. ***
*** Bug 2037419 has been marked as a duplicate of this bug. ***
As far as I can see, ip=dhcp is being included in the kernel params, which is what we had in dual-stack pre-METAL-1; but the provisioning network is IPv6, so RHCOS doesn't boot and can't inspect. I think pre-METAL-1 this was fine because ip=dhcp wasn't being set on the IPA image. Now it is (as it's now using RHCOS), so the node attempts to boot with ip=dhcp and then can't download the rootfs at http://[fd00:1101::2]:80/images/ironic-python-agent.rootfs
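The root cause described above amounts to hardcoding `ip=dhcp` (IPv4 DHCP) instead of deriving the dracut `ip=` argument from the provisioning network's IP family. A minimal sketch of that selection logic, assuming the provisioning CIDR is known (the helper name is hypothetical; the real installer assembles the full kernel command line):

```python
import ipaddress

def ipa_ip_kernel_arg(provisioning_cidr: str) -> str:
    """Pick the dracut ip= kernel argument for the IPA/RHCOS boot.

    An IPv6 provisioning network needs DHCPv6 (ip=dhcp6); a hardcoded
    ip=dhcp would leave the node without an address on that network
    and unable to fetch the IPA rootfs over HTTP.
    """
    net = ipaddress.ip_network(provisioning_cidr, strict=False)
    return "ip=dhcp6" if net.version == 6 else "ip=dhcp"
```

For example, a provisioning network of fd00:1101::/64 (matching the rootfs URL above) would select `ip=dhcp6`, while 172.22.0.0/24 would keep `ip=dhcp`.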
This is breaking dual-stack, even in CI, so it is definitely a release blocker.
The cluster is now provisioning in CI, but some of the tests are failing; a new BZ has been opened: https://bugzilla.redhat.com/show_bug.cgi?id=2040671
Verified on 4.10.0-0.nightly-2022-01-15-092722.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056