NOTE: this is a public version of linked BZ1965168. This BZ was created to meet valid-bug automation requirements for a downstream PR that will include an already merged upstream PR. Description as per the original private BZ:

Version:

$ openshift-install version
openshift-baremetal-install 4.8.0-0.nightly-2021-04-15-152737
built from commit d0462d8b5074448e1917da7f0a5d7a904bd60359
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:70fe4f1a828dcbe493dce6d199eb5d9e76300d053c477f0f4b4577ef7b7d2934

Platform: baremetal
Please specify: IPI

What happened?

We are using a Fujitsu iRMC server to test OCP baremetal IPI deployment. The deployment failed because the master nodes rebooted repeatedly, every few minutes, while the worker nodes were being deployed. The master nodes had been deployed successfully, the bootstrap VM had been deleted, and its services had been moved onto the masters. Then, when the installer started to deploy the worker nodes, all of the master nodes began rebooting repeatedly. As a result the ironic service became unreachable and the deployment eventually failed. More specifically, from our observation the master nodes reboot shortly after the ironic-related pods start; the interval between the two events is less than one minute (see the sketch of diagnostic commands at the end of this description).

```
E0526 14:41:26.719722 1584040 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.openshift.zz.local:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&resourceVersion=39250": dial tcp 192.168.30.201:6443: connect: no route to host
E0526 14:42:21.183589 1584040 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.openshift.zz.local:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&resourceVersion=39250": dial tcp 192.168.30.201:6443: connect: no route to host
E0526 14:43:15.455763 1584040 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.openshift.zz.local:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&resourceVersion=39250": dial tcp 192.168.30.201:6443: connect: no route to host
E0526 14:44:08.447631 1584040 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.openshift.zz.local:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&resourceVersion=39250": dial tcp 192.168.30.201:6443: connect: no route to host
E0526 14:44:49.983687 1584040 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.openshift.zz.local:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&resourceVersion=39250": dial tcp 192.168.30.201:6443: connect: no route to host
E0526 14:45:32.095693 1584040 reflector.go:138] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: failed to list *v1.ClusterVersion: Get "https://api.openshift.zz.local:6443/apis/config.openshift.io/v1/clusterversions?fieldSelector=metadata.name%3Dversion&resourceVersion=39250": dial tcp 192.168.30.201:6443: connect: no route to host
ERROR Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.openshift.zz.local:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 192.168.30.201:6443: connect: no route to host
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
FATAL failed to initialize the cluster: Working towards 4.8.0-0.nightly-2021-04-15-152737: 655 of 677 done (96% complete)
```

What did you expect to happen?

The master nodes do not repeatedly reboot, and the baremetal IPI deployment completes successfully.

How to reproduce it (as minimally and precisely as possible)?

$ openshift-install --dir ~/clusterconfigs create manifests
$ cp ~/ipi/99_router-replicas.yaml ~/clusterconfigs/openshift/
$ openshift-install --dir ~/clusterconfigs --log-level debug create cluster

(A sketch of what a 99_router-replicas.yaml manifest of this kind typically contains is included at the end of this description.)

Anything else we need to know?

* We manually merged the related PRs during testing to work around a known [issue](https://github.com/openshift/installer/issues/4857).
* Because the [IPMI credentials](https://github.com/metal3-io/baremetal-operator/issues/879) related patch is not merged into OCP, we cannot use the latest nightly version for testing. We hope that [PR 880](https://github.com/metal3-io/baremetal-operator/pull/880) can be merged into OpenShift as soon as possible so that the latest version can be used for testing.
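For reference, a rough sketch of how the timing correlation described above can be observed. These commands were not captured from the failing cluster; they are standard oc invocations, and the namespace assumes the usual baremetal IPI layout where the ironic containers run in the metal3 pod under openshift-machine-api. They only work while the API server is still reachable between reboots.

```
# Watch the ironic-related (metal3) pods come up in the machine-api namespace
oc -n openshift-machine-api get pods -o wide --watch

# On each master, check how long the node has been up and its recent boot
# history, to correlate pod start with the unexpected reboots
# (<master-name> is a placeholder for the actual node name)
oc debug node/<master-name> -- chroot /host uptime
oc debug node/<master-name> -- chroot /host journalctl --list-boots
```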
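The exact contents of ~/ipi/99_router-replicas.yaml used in the reproduce steps were not included in this report. A minimal sketch of what such a manifest typically contains, assuming it only pins the replica count of the default IngressController, might look like this:

```
# Hypothetical reconstruction of the router-replicas manifest; the real file may differ.
cat > ~/clusterconfigs/openshift/99_router-replicas.yaml <<'EOF'
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  replicas: 2
EOF
```

Adjust the replica count to match the environment; the manifest is copied into ~/clusterconfigs/openshift/ before `create cluster` so the installer picks it up with the other generated manifests.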
*** Bug 1965168 has been marked as a duplicate of this bug. ***
We don't have a Fujitsu iRMC setup, so closing as OtherQA. The problem is not reproduced on HP or Dell setups. If the problem is seen again on iRMC, please reopen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438