I have a customer who is facing exactly the behaviour described in https://bugzilla.redhat.com/show_bug.cgi?id=1973424 on the current latest OCP version, 4.8.10, on RHEV using IPI.

- Here is the networking section of the customer's install-config.yaml:

~~~
networking:
  clusterNetwork:
  - cidr: 172.20.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.130.192/26
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.24.0.0/16
~~~

- And the VIPs are:

~~~
api_vip: 192.168.130.213
ingress_vip: 192.168.130.214
~~~

- The VIPs are initially attached to the bootstrap node; as soon as the master nodes come up, the VIPs move to one of the masters, which results in connection refused on API port 6443. However, the apiserver runs fine on the bootstrap node and I can curl it via localhost.

- In the bootstrap node's keepalived container logs, we can see the API VIP being removed:

~~~
Wed Sep 8 15:58:44 2021: Stopping
Wed Sep 8 15:58:44 2021: (API) sent 0 priority
Wed Sep 8 15:58:44 2021: (API) removing VIPs.
Wed Sep 8 15:58:45 2021: Stopped - used 0.053487 user time, 0.088436 system time
Wed Sep 8 15:58:45 2021: CPU usage (self/children) user: 0.003939/0.056966 system: 0.005913/0.090383
Wed Sep 8 15:58:45 2021: Stopped Keepalived v2.1.5 (07/13,2020)
~~~

Version:

~~~
$ openshift-install version
4.8.10
~~~

Platform:

RHEV (oVirt), IPI (automated install with `openshift-install`)

Anything else we need to know?

As requested in https://bugzilla.redhat.com/show_bug.cgi?id=1973424#c16 (comment #16), I am opening this BZ.

It appears that one of the master nodes (master-0) didn't ignite correctly, so the etcd-operator never reached 3 healthy nodes. From the etcd-operator log:

~~~
2021-09-09T16:28:39.784734070+00:00 stderr F E0909 16:28:39.784702 1 envvarcontroller.go:205] key failed with : can't update etcd pod configurations because scaling is currently unsafe: 3 nodes are required, but only 2 are available
~~~
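For completeness, here is a minimal sketch of the verification steps described above. It assumes SSH access to the bootstrap node as the usual `core` RHCOS user, that keepalived runs there as a podman container (the exact container name is environment-specific), and that the cluster kubeconfig from the install directory is at hand; the API VIP is taken from the config above, and `/readyz` is the standard kube-apiserver health endpoint:

~~~
# On the bootstrap node: the apiserver answers on localhost...
curl -k https://localhost:6443/readyz

# ...while the same request against the API VIP gets connection refused
# once the VIP moves to a master
curl -k https://192.168.130.213:6443/readyz

# Keepalived logs on the bootstrap node (container name may differ per release)
sudo podman ps -a | grep -i keepalived
sudo podman logs <container-id>

# From a host with the cluster kubeconfig: check whether all three masters
# registered, and look at the etcd-operator's view of the cluster
export KUBECONFIG=./auth/kubeconfig
oc get nodes
oc get clusteroperator etcd
oc logs -n openshift-etcd-operator deployment/etcd-operator | grep -i unsafe
~~~

If `oc get nodes` only lists two masters, the mis-ignited master-0 would explain the etcd-operator's "scaling is currently unsafe" error above.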
Ben, can you please take a look at this BZ? It seems right in your field of expertise.