Created attachment 1667591 [details]
A screenshot showing CA error

Description of problem:

IPv6 disconnected installation. After the cluster is up and running (the masters are in Ready state), the worker nodes never show up and remain in an error/inspecting state.

```
oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
master-0.kni7.cloud.lab.eng.bos.redhat.com   Ready    master   24h   v1.16.2
master-1.kni7.cloud.lab.eng.bos.redhat.com   Ready    master   24h   v1.16.2
master-2.kni7.cloud.lab.eng.bos.redhat.com   Ready    master   24h   v1.16.2
```

Version:

```
oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          24h     Unable to apply 4.3.0-0.nightly-2020-03-02-070732: some cluster operators have not yet rolled out
```

How reproducible:

Bring up a BM IPI cluster using the above build. After the cluster is up and running, the workers never show up, and there is no sign of a CSR approval request either:

```
oc get csr
NAME        AGE   REQUESTOR                                                CONDITION
csr-2vqvz   84m   system:node:master-1.kni7.cloud.lab.eng.bos.redhat.com   Approved,Issued
```

At this point, jump into iDRAC and observe that the worker nodes are stuck in a continuous loop stating "X509 Certificate signed by unknown authority".

Steps to Reproduce:
1. Bring up the cluster
2. Wait for the master nodes to come up
3. Observe via iDRAC that the worker nodes are stuck in a loop (a screenshot is attached)

Actual results:
After the masters are up, the worker nodes fail on certificate approval instead of coming up in Ready state (screenshot attached).

Expected results:
Both master and worker nodes should be in Ready state.

Additional info:

1.
```
oc get bmh -n openshift-machine-api
NAME       STATUS   PROVISIONING STATUS      CONSUMER              BMC                                            HARDWARE PROFILE   ONLINE   ERROR
master-0   OK       externally provisioned   kni7-master-0         ipmi://[fd35:919d:4042:2:c7ed:9a9f:a9ec:100]                      true
master-1   OK       externally provisioned   kni7-master-1         ipmi://[fd35:919d:4042:2:c7ed:9a9f:a9ec:101]                      true
master-2   error    registering              kni7-master-2         ipmi://[fd35:919d:4042:2:c7ed:9a9f:a9ec:102]                      true     Failed to get power state for node c1989dcd-3e04-41ca-93a1-9311050b6f38. Error: IPMI call failed: power status.
worker-0   error    inspecting               kni7-worker-0-87smw   ipmi://[fd35:919d:4042:2:c7ed:9a9f:a9ec:104]                      true     Introspection timeout
worker-1   error    inspecting                                     ipmi://[fd35:919d:4042:2:c7ed:9a9f:a9ec:105]                      true     Introspection timeout
worker-2   error    inspecting               kni7-worker-0-fqvst   ipmi://[fd35:919d:4042:2:c7ed:9a9f:a9ec:106]                      true     Introspection timeout
```

2. A strange scenario: even after the control plane is up, the bootstrap VM is still running.

```
sudo virsh list
 Id    Name                   State
----------------------------------------------------
 1     kni7-nfkp5-bootstrap   running
```

3. Inspection of the logs clearly shows the temporary bootstrap control plane was torn down.

```
Mar 03 22:28:58 localhost bootkube.sh[8730]: Tearing down temporary bootstrap control plane...
Mar 03 22:28:59 localhost podman[10575]: 2020-03-03 22:28:59.007795759 +0000 UTC m=+979.882708987 container died d798e3b444c911d55eb73aaa18063b50f8e7afbdda48dbe7fef8ad0b977736cd (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:72e8f7e1d55891a5643f9c50dd34d9352b87cb95788af1ae631b866b9ea5955e, name=youthful_clarke)
Mar 03 22:28:59 localhost podman[10575]: 2020-03-03 22:28:59.573262095 +0000 UTC m=+980.448175325 container remove d798e3b444c911d55eb73aaa18063b50f8e7afbdda48dbe7fef8ad0b977736cd (image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:72e8f7e1d55891a5643f9c50dd34d9352b87cb95788af1ae631b866b9ea5955e, name=youthful_clarke)
Mar 03 22:28:59 localhost bootkube.sh[8730]: bootkube.service complete
```

4. All the pods are still up in the bootstrap VM:

```
[core@localhost ~]$ sudo podman ps
CONTAINER ID  IMAGE                                                                                                                  COMMAND  CREATED       STATUS           PORTS  NAMES
60bafcc2bdf7  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0232366f9b47f5f75c4bd502a60c41a5b0dc5bac8ca50e3676ebc6036d0f539b           25 hours ago  Up 25 hours ago         ironic-api
0efee46dac66  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad4b2438b1a8336c640d42ee7d5be21cff5242a00ec47326d9d12dbcb8250f54           25 hours ago  Up 25 hours ago         ironic-inspector
2c8a8cc0d791  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0232366f9b47f5f75c4bd502a60c41a5b0dc5bac8ca50e3676ebc6036d0f539b           25 hours ago  Up 25 hours ago         ironic-conductor
a959cd89efea  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0232366f9b47f5f75c4bd502a60c41a5b0dc5bac8ca50e3676ebc6036d0f539b           25 hours ago  Up 25 hours ago         httpd
b5c37b42b729  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0232366f9b47f5f75c4bd502a60c41a5b0dc5bac8ca50e3676ebc6036d0f539b           25 hours ago  Up 25 hours ago         dnsmasq
69588ce43687  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0232366f9b47f5f75c4bd502a60c41a5b0dc5bac8ca50e3676ebc6036d0f539b           25 hours ago  Up 25 hours ago         mariadb
```
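Items 2 and 4 above show the bootstrap VM still running even though bootkube reported completion. The installer normally tears this VM down after bootstrapping, so a lingering VM is itself a symptom worth noting. As a workaround sketch (not from this report; the VM name is taken from the `virsh list` output above), the leftover VM can be removed manually on the provisioning host:

```shell
# Hedged workaround sketch: remove the leftover bootstrap VM by hand.
# The VM name comes from the `virsh list` output above; adjust as needed.
BOOTSTRAP_VM="kni7-nfkp5-bootstrap"

if command -v virsh >/dev/null 2>&1; then
  # Force power-off, then drop the domain definition and its storage.
  sudo virsh destroy "$BOOTSTRAP_VM" || true
  sudo virsh undefine "$BOOTSTRAP_VM" --remove-all-storage || true
else
  echo "virsh not available; run this on the provisioning host"
fi
```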
It looks like the control plane is up. Could you get a must-gather and upload it somewhere?

Also, the workers all show an inspection timeout, so screenshots of the consoles while this is happening would be helpful. Is IPA reporting a problem talking back to Ironic? Were the workers perhaps already powered on from a previous install attempt?

Not strictly related, but your master also seems to have a problem with its BMC when we try to fetch the power state:

Failed to get power state for node c1989dcd-3e04-41ca-93a1-9311050b6f38. Error: IPMI call failed: power status.
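For reference, the requested must-gather can be collected with `oc adm must-gather`. A minimal sketch follows; the destination directory name is arbitrary (chosen for this example, not from the bug):

```shell
# Sketch: gather cluster diagnostics with `oc adm must-gather`.
# DEST_DIR is an arbitrary name chosen for this example.
DEST_DIR="must-gather.local"

if command -v oc >/dev/null 2>&1; then
  oc adm must-gather --dest-dir="$DEST_DIR"
  # Compress the result for uploading to the bug:
  tar czf "${DEST_DIR}.tar.gz" "$DEST_DIR"
else
  echo "oc CLI not found; run from a host with access to the cluster"
fi
```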
No response to the needinfo request for further details on the issue in over a month. Closing due to insufficient data. Please reopen with further details if you are still experiencing this issue.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days