Version:

$ openshift-install version
openshift-install 4.11.0-0.nightly-2022-04-12-000004
built from commit 496c1de207b3fd8ef838e832388df20781bd2efe
release image registry.ci.openshift.org/ocp/release@sha256:4c4ae72860c578c0080e83dd2369ec85f0ffc91ab4a6cb6f1ba0e6d44111ec5d
release architecture amd64

Platform: openstack

Please specify:
* IPI with Proxy+BYON with external CCM TP enabled.

What happened?

The installation fails because the bootstrap stage times out:

time="2022-04-12T05:47:38-04:00" level=info msg="API v1.23.3+d36f9d8 up"
time="2022-04-12T05:47:38-04:00" level=info msg="Waiting up to 30m0s (until 6:17AM) for bootstrapping to complete..."
time="2022-04-12T06:17:38-04:00" level=debug msg="Fetching Bootstrap SSH Key Pair..."
[...]
time="2022-04-12T06:17:39-04:00" level=error msg="Cluster operator cluster-autoscaler Degraded is True with MissingDependency: machine-api not ready"

The installation keeps progressing despite the bootstrap stage timing out, and in the end the cluster is set up correctly. This is observed persistently only with the Kuryr and OVNKubernetes network types; with OpenShiftSDN we observed one successful installation.

What did you expect to happen?

The installation succeeds. On OSP D/S CI it is 100% successful in all releases < 4.11.

How to reproduce it (as minimally and precisely as possible)?

IPI installation from an installer-vm with two interfaces, one connected to the external network and one connected to a restricted network which is used as BYON in the install-config.yaml (attached). The installer-vm runs a proxy that provides external connectivity to the cluster while deploying.

Anything else we need to know?

must-gather and post-installation bundle linked in a private comment.
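To make the reproduction setup easier to follow, here is a minimal sketch of the proxy- and BYON-related fields of such an install-config.yaml. This is a hypothetical illustration, not the attached file: every name, UUID, CIDR and proxy endpoint below is a placeholder, and the real values must match the pre-created restricted subnet.

# Hypothetical sketch of the proxy + BYON portions of install-config.yaml (placeholder values only)
apiVersion: v1
baseDomain: shiftstack.example.com           # placeholder
metadata:
  name: ostest
proxy:
  httpProxy: http://<installer-vm-restricted-ip>:<port>    # proxy running on the installer-vm
  httpsProxy: http://<installer-vm-restricted-ip>:<port>
  noProxy: <restricted-network-cidr>
networking:
  networkType: Kuryr                          # the issue is also seen with OVNKubernetes
  machineNetwork:
  - cidr: <restricted-network-cidr>           # must match the BYON subnet's CIDR
platform:
  openstack:
    cloud: <clouds.yaml entry>
    machinesSubnet: <uuid-of-restricted-subnet>   # BYON: pre-existing OpenStack subnet
    apiVIP: <free IP on the restricted subnet>
    ingressVIP: <free IP on the restricted subnet>
pullSecret: '<pull-secret>'
sshKey: '<ssh-public-key>'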
Latest CI runs show better results:

type | number | timestamp | status | Failing stage | Auditor | Category | Rerun | networkType | osp_puddle | ocp_puddle
DFG-osasinfra-shiftstack_ci-ocp_install-4.11-kuryr-ipi_proxy | 16 | 2022/06/04 1:50:20 | SUCCESS | N/A | N/A | N/A | N/A | kuryr | RHOS-16.2-RHEL-8-20220311.n.1 | 4.11.0-0.nightly-2022-06-01-200905
DFG-osasinfra-shiftstack_ci-ocp_install-4.11-ovnkubernetes-ipi_proxy | 14 | 2022/06/03 16:41:44 | SUCCESS | N/A | N/A | N/A | N/A | ovnkubernetes | RHOS-16.2-RHEL-8-20220311.n.1 | 4.11.0-0.nightly-2022-06-01-200905
DFG-osasinfra-shiftstack_ci-ocp_install-4.11-kuryr-ipi_proxy | 14 | 2022/05/27 22:09:45 | SUCCESS | N/A | N/A | N/A | N/A | kuryr | RHOS-16.1-RHEL-8-20220329.n.1 | 4.11.0-0.nightly-2022-05-25-193227
DFG-osasinfra-shiftstack_ci-ocp_install-4.11-ovnkubernetes-ipi_proxy | 11 | 2022/05/20 6:04:05 | SUCCESS | N/A | N/A | N/A | N/A | ovnkubernetes | RHOS-16.2-RHEL-8-20220311.n.1 | 4.11.0-0.nightly-2022-05-18-171831
DFG-osasinfra-shiftstack_ci-ocp_install-4.11-kuryr-ipi_proxy | 12 | 2022/05/15 16:27:04 | SUCCESS | N/A | N/A | N/A | N/A | kuryr | RHOS-16.1-RHEL-8-20220329.n.1 | 4.11.0-0.nightly-2022-05-11-054135
DFG-osasinfra-shiftstack_ci-ocp_install-4.11-ovnkubernetes-ipi_proxy | 10 | 2022/05/13 11:06:15 | SUCCESS | N/A | N/A | N/A | N/A | ovnkubernetes | RHOS-16.1-RHEL-8-20220329.n.1 | 4.11.0-0.nightly-2022-05-11-054135
DFG-osasinfra-shiftstack_ci-ocp_install-4.11-kuryr-ipi_proxy | 10 | 2022/05/07 1:23:38 | SUCCESS | N/A | N/A | N/A | N/A | kuryr | RHOS-16.2-RHEL-8-20220311.n.1 | 4.11.0-0.nightly-2022-05-06-180112
DFG-osasinfra-shiftstack_ci-ocp_install-4.11-ovnkubernetes-ipi_proxy | 9 | 2022/05/06 13:32:11 | SUCCESS | N/A | N/A | N/A | N/A | ovnkubernetes | RHOS-16.2-RHEL-8-20220311.n.1 | 4.11.0-0.nightly-2022-05-06-060226
Reopening as this is observed more frequently on the latest OCP 4.11 nightlies. In this case the TP flag is not set, so the internal CCM (the default in OCP 4.11) is used.

Version:

$ openshift-install version
openshift-install 4.11.0-0.nightly-2022-07-19-043523
built from commit 37684309bcb598757c99d3ea9fbc0758343d64a5
release image registry.ci.openshift.org/ocp/release@sha256:17817866b4f106771ff3098769b866e541082de32d6669a6d20e65c9be5902b6
release architecture amd64

Platform: openstack
NetworkType: OVNKubernetes and Kuryr

Please specify:
* IPI with Proxy+BYON (internal CCM, TP flag not set).

What happened?

The installation times out at the bootstrap stage, but it keeps progressing behind the scenes and in the end the cluster is healthy, with the bootstrap server still there.

[cloud-user@installer-host ~]$ tail -f ostest/.openshift_install.log
INFO API v1.24.0+9546431 up
DEBUG Loading Install Config...
DEBUG Loading SSH Key...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Cluster Name...
DEBUG Loading Base Domain...
DEBUG Loading Platform...
DEBUG Loading Networking...
DEBUG Loading Platform...
DEBUG Loading Pull Secret...
DEBUG Loading Platform...
DEBUG Using Install Config loaded from state file
INFO Waiting up to 30m0s (until 10:42AM) for bootstrapping to complete...
[... stuck here for 30 minutes]
ERROR Bootstrap failed to complete: timed out waiting for the condition
ERROR Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.

In the bootstrap logs, the following is seen:

[cloud-user@installer-host journals]$ tail -60 bootkube.log
Jul 19 14:43:06 ostest-p2mk7-bootstrap bootkube.sh[12411]: Skipped "secret-localhost-serving-signer.yaml" secrets.v1./localhost-serving-signer -n openshift-kube-apiserver-operator as it already exists
Jul 19 14:43:07 ostest-p2mk7-bootstrap cluster-bootstrap[12426]: Skipped "secret-service-network-serving-signer.yaml" secrets.v1./service-network-serving-signer -n openshift-kube-apiserver-operator as it already exists
Jul 19 14:43:07 ostest-p2mk7-bootstrap bootkube.sh[12411]: Skipped "secret-service-network-serving-signer.yaml" secrets.v1./service-network-serving-signer -n openshift-kube-apiserver-operator as it already exists
Jul 19 14:43:07 ostest-p2mk7-bootstrap cluster-bootstrap[12426]: Tearing down temporary bootstrap control plane...
Jul 19 14:43:07 ostest-p2mk7-bootstrap cluster-bootstrap[12426]: Skipped "user-ca-bundle-config.yaml" configmaps.v1./user-ca-bundle -n openshift-config as it already exists
Jul 19 14:43:07 ostest-p2mk7-bootstrap bootkube.sh[12411]: Tearing down temporary bootstrap control plane...
Jul 19 14:43:07 ostest-p2mk7-bootstrap bootkube.sh[12411]: Sending bootstrap-finished event.Skipped "user-ca-bundle-config.yaml" configmaps.v1./user-ca-bundle -n openshift-config as it already exists
Jul 19 14:43:07 ostest-p2mk7-bootstrap cluster-bootstrap[12426]: Sending bootstrap-finished event.
Jul 19 14:43:07 ostest-p2mk7-bootstrap bootkube.sh[11105]: Restoring CVO overrides
Jul 19 14:43:28 ostest-p2mk7-bootstrap bootkube.sh[17211]: clusterversion.config.openshift.io/version patched
Jul 19 14:43:28 ostest-p2mk7-bootstrap bootkube.sh[11105]: Waiting for CEO to finish...
Jul 19 14:43:28 ostest-p2mk7-bootstrap podman[17407]:
Jul 19 14:43:28 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:43:28.938226 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:28 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:43:28.938226 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:36 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:43:36.053782 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:36 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:43:36.053782 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:38 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:43:38.374880 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:38 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:43:38.374880 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:40 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:43:40.955174 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:40 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:43:40.955174 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:47 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:43:47.844623 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:43:47 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:43:47.844623 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:42 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:45:42.809061 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:42 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:45:42.809061 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:44 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:45:44.195166 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:44 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:45:44.195166 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:53 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:45:53.508446 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:53 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:45:53.508446 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:53 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:45:53.798827 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:53 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:45:53.798827 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:53 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:45:53.988137 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:53 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:45:53.988137 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:57 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:45:57.016734 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:45:57 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:45:57.016734 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:47:12 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:47:12.185879 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:47:12 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:47:12.185879 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:48:00 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:48:00.395236 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:48:00 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:48:00.395236 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:48:01 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:48:01.645413 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:48:01 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:48:01.645413 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:48:07 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:48:07.653805 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:48:07 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:48:07.653805 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:51:02 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:51:02.257249 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:51:02 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:51:02.257249 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:51:09 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:51:09.073935 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:51:09 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:51:09.073935 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:53 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:52:53.515760 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:53 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:52:53.515760 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:53 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:52:53.548649 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:53 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:52:53.548649 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:53 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:52:53.662238 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:53 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:52:53.662238 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:53 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:52:53.818046 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:53 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:52:53.818046 1 waitforceo.go:67] waiting on condition EtcdRunningInCluster in etcd CR /cluster to be True.
Jul 19 14:52:56 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:52:56.030248 1 waitforceo.go:64] Cluster etcd operator bootstrapped successfully
Jul 19 14:52:56 ostest-p2mk7-bootstrap wait-for-ceo[17444]: I0719 14:52:56.030450 1 waitforceo.go:58] cluster-etcd-operator bootstrap etcd
Jul 19 14:52:56 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:52:56.030248 1 waitforceo.go:64] Cluster etcd operator bootstrapped successfully
Jul 19 14:52:56 ostest-p2mk7-bootstrap bootkube.sh[17407]: I0719 14:52:56.030450 1 waitforceo.go:58] cluster-etcd-operator bootstrap etcd
Jul 19 14:52:56 ostest-p2mk7-bootstrap bootkube.sh[11105]: bootkube.service complete
Jul 19 14:52:56 ostest-p2mk7-bootstrap systemd[1]: bootkube.service: Succeeded.

For some reason, waiting for the condition "EtcdRunningInCluster" on the etcd CR /cluster to become True delays the bootstrap long enough that the installation script times out and reports a failure.

How to reproduce it (as minimally and precisely as possible)?

IPI installation from an installer-vm with two interfaces, one connected to the external network and one connected to a restricted network which is used as BYON in the install-config.yaml. The installer-vm runs a proxy that provides external connectivity to the cluster while deploying.

The issue is observed with NetworkType Kuryr and OVNKubernetes, and it is not persistent: the ipi-proxy installation was successful 8/20 times with OVNKubernetes, 9/21 with Kuryr and 15/23 with OpenShiftSDN. On the other hand, in versions < 4.11 this installation works fine for all network types.

Anything else we need to know?

must-gather and post-installation bundle linked in a private comment.
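For completeness, a rough sketch of how the bootstrap journal above and the post-timeout cluster state can be collected. The asset directory "ostest" and the default kubeconfig location are assumptions based on the log excerpts, not commands captured from this exact run.

# Assumed asset directory: ./ostest (as suggested by the .openshift_install.log path above)
# Pull the bootstrap journals (bootkube.log, etc.) while the bootstrap server is still up:
openshift-install gather bootstrap --dir ostest

# Check whether the installation kept progressing behind the scenes:
export KUBECONFIG=ostest/auth/kubeconfig
oc get clusteroperators
oc get nodes

# Once the cluster settles, the leftover bootstrap server and its resources can be removed:
openshift-install destroy bootstrap --dir ostest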
For sure we need to investigate this bug, but as there is a workaround (like running openshift-install wait-for install-complete after the failure) to make the installation successful despite the issue, we considered it a non-blocker.
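As a minimal illustration of that workaround (assuming the asset directory is ./ostest, as in the logs above), the installer's wait stage can simply be re-run after the timeout:

# Re-run the wait stage after "Bootstrap failed to complete: timed out waiting for the condition";
# optionally run "openshift-install wait-for bootstrap-complete --dir ostest" first if the bootstrap
# had not yet finished tearing down.
openshift-install wait-for install-complete --dir ostest --log-level debug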
(In reply to rdobosz from comment #9)
> For sure we need to investigate this bug, but as there is a workaround (like
> running openshift-install wait-for install-complete after the failure) to make
> the installation successful despite the issue, we considered it a non-blocker.

Is it documented that, when using a proxy, there is a need to run wait-for install-complete?
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9221