I have a customer who is facing exactly the behaviour described in https://bugzilla.redhat.com/show_bug.cgi?id=1973424 on the current latest OCP version, 4.8.10, on RHEV using IPI.

- Here is the networking section of the customer's install-config.yaml:

~~~
networking:
  clusterNetwork:
  - cidr: 172.20.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.130.192/26
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.24.0.0/16
~~~

- And the VIPs are:

~~~
api_vip: 192.168.130.213
ingress_vip: 192.168.130.214
~~~

- The VIPs are initially attached to the bootstrap node; as soon as the master nodes come up, the VIPs move to one of the masters, which results in connection refused on API port 6443. However, the apiserver runs fine on the bootstrap node and I can curl it via localhost.

- In the bootstrap node's keepalived container logs, we can see the API VIP being removed:

~~~
Wed Sep 8 15:58:44 2021: Stopping
Wed Sep 8 15:58:44 2021: (API) sent 0 priority
Wed Sep 8 15:58:44 2021: (API) removing VIPs.
Wed Sep 8 15:58:45 2021: Stopped - used 0.053487 user time, 0.088436 system time
Wed Sep 8 15:58:45 2021: CPU usage (self/children) user: 0.003939/0.056966 system: 0.005913/0.090383
Wed Sep 8 15:58:45 2021: Stopped Keepalived v2.1.5 (07/13,2020)
~~~

Version:

~~~
$ openshift-install version
4.8.10
~~~

Platform:

RHEV (oVirt), IPI (automated install with `openshift-install`)

Anything else we need to know?

As requested in https://bugzilla.redhat.com/show_bug.cgi?id=1973424#c16 (comment #16), I am opening this BZ.

It appears that one of the master nodes (master-0) didn't ignite correctly, so the etcd-operator never reached 3 healthy nodes. From the etcd-operator log:

~~~
2021-09-09T16:28:39.784734070+00:00 stderr F E0909 16:28:39.784702 1 envvarcontroller.go:205] key failed with : can't update etcd pod configurations because scaling is currently unsafe: 3 nodes are required, but only 2 are available
~~~
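For completeness, here is a minimal sketch of the verification steps described above. It assumes SSH access to the bootstrap node as the usual `core` RHCOS user, that keepalived runs there as a podman container (the exact container name is environment-specific), and that the cluster kubeconfig from the install directory is at hand; the API VIP is taken from the config above, and `/readyz` is the standard kube-apiserver health endpoint:

~~~
# On the bootstrap node: the apiserver answers on localhost...
curl -k https://localhost:6443/readyz

# ...while the same request against the API VIP gets connection refused
# once the VIP moves to a master
curl -k https://192.168.130.213:6443/readyz

# Keepalived logs on the bootstrap node (container name may differ per release)
sudo podman ps -a | grep -i keepalived
sudo podman logs <container-id>

# From a host with the cluster kubeconfig: check whether all three masters
# registered, and look at the etcd-operator's view of the cluster
export KUBECONFIG=./auth/kubeconfig
oc get nodes
oc get clusteroperator etcd
oc logs -n openshift-etcd-operator deployment/etcd-operator | grep -i unsafe
~~~

If `oc get nodes` only lists two masters, the mis-ignited master-0 would explain the etcd-operator's "scaling is currently unsafe" error above.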
Ben, can you please take a look at this BZ? It seems right in your field of expertise.