We have hit this bug twice on vSphere, where haproxy binds port 9443 and conflicts with the kube-controller-manager (KCM), leaving KCM in a CrashLoopBackOff state.

$ oc get pods -A | awk '$5 > 10'
NAMESPACE                           NAME                                                READY   STATUS             RESTARTS   AGE
openshift-kube-controller-manager   kube-controller-manager-scheng-45-tqgsd-master-0    4/4     Running            18         121m
openshift-kube-controller-manager   kube-controller-manager-scheng-45-tqgsd-master-1    3/4     CrashLoopBackOff   21         120m
openshift-kube-controller-manager   kube-controller-manager-scheng-45-tqgsd-master-2    3/4     CrashLoopBackOff   18         120m

$ oc exec -n openshift-vsphere-infra haproxy-scheng-45-tqgsd-master-0 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy. Use 'oc describe pod/haproxy-scheng-45-tqgsd-master-0 -n openshift-vsphere-infra' to see all of the containers in this pod.
  bind :::9443 v4v6

The payload where this bug was hit is 4.5.0-0.nightly-2020-07-20-152128 and the profile name is "ipi-on-vsphere/versioned-installer-6_7-disconnected-vsphere_slave-ci".
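For anyone triaging a similar cluster, a minimal check (a sketch only, assuming 'oc debug' access to the node; the node name below is one of the affected masters from the output above) to see which process currently owns the listener on port 9443:

# Run ss in the host namespace of an affected master to see who holds :9443;
# the grep runs on the client side after the debug command returns.
oc debug node/scheng-45-tqgsd-master-1 -- chroot /host ss -tlnp | grep ':9443'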
I reproduced this 2/2 times in a vSphere disconnected cluster on 4.5.3 stable.
Verified the bug on OCP IPI on vSphere with nightly build 4.5.0-0.nightly-2020-07-21-232150, and it passed. The port used by the haproxy pod has been changed to 9445, and the KCM pods are no longer in CrashLoopBackOff state.

$ oc get pod -A | grep kube-controller | grep -v Completed
openshift-kube-controller-manager-operator   kube-controller-manager-operator-5c9d8bd7d4-cms6b     1/1   Running   1   111m
openshift-kube-controller-manager            kube-controller-manager-jima-072203-6mfqb-master-0    4/4   Running   5   96m
openshift-kube-controller-manager            kube-controller-manager-jima-072203-6mfqb-master-1    4/4   Running   0   97m
openshift-kube-controller-manager            kube-controller-manager-jima-072203-6mfqb-master-2    4/4   Running   6   97m
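As a quick cross-check after the fix (a sketch only; the namespace and container name are taken from the vSphere output earlier in this bug, and grep runs on the client side), the bind port can be confirmed on every haproxy pod at once:

# List all haproxy pods in the infra namespace and print their bind lines.
for p in $(oc -n openshift-vsphere-infra get pods -o name | grep haproxy); do
  echo "== $p"
  oc -n openshift-vsphere-infra exec "$p" -c haproxy -- cat /etc/haproxy/haproxy.cfg | grep bind
done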
*** Bug 1860190 has been marked as a duplicate of this bug. ***
*** Bug 1861275 has been marked as a duplicate of this bug. ***
Please also verify that, if you previously had a broken cluster, upgrading to a payload that has this fix actually works.
I upgraded from 4.4.13 -> 4.5.4, and that seems to have resulted in a stable environment.
(In reply to Lars Kellogg-Stedman from comment #11)
> I upgraded from 4.4.13 -> 4.5.4, and that seems to have resulted in a stable
> environment.

I think upgrading from 4.4.13 -> 4.5.4 will work, but upgrading from a broken cluster to the payload which has the fix still needs to be checked.
Hey folks, I want to note that I hit this in four of my OpenShift on OpenStack 4.5.3 clusters. I followed this article [1] to fix them.

[1] https://access.redhat.com/solutions/5266321
Verified on: openshift-4.5.4, upgrading from 4.5.3

Steps:
1. Had a broken 4.5.3 cluster deployed:

# oc -n openshift-kube-controller-manager get pods | grep kube-controller
kube-controller-manager-secondary-42spd-master-0   4/4   Running            15   56m
kube-controller-manager-secondary-42spd-master-1   3/4   CrashLoopBackOff   12   55m
kube-controller-manager-secondary-42spd-master-2   3/4   CrashLoopBackOff   16   54m

# oc -n openshift-ovirt-infra exec haproxy-secondary-42spd-master-0 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy. Use 'oc describe pod/haproxy-secondary-42spd-master-0 -n openshift-ovirt-infra' to see all of the containers in this pod.
  bind :::9443 v4v6
  bind :::50936 v4v6
  bind 127.0.0.1:50000

2. Upgraded the cluster:

# oc adm upgrade --to=4.5.4 --force=true

Results: the broken cluster was fixed on upgrade and is running as expected.

# oc -n openshift-kube-controller-manager get pods | grep kube-controller
kube-controller-manager-secondary-42spd-master-0   4/4   Running   4   156m
kube-controller-manager-secondary-42spd-master-1   4/4   Running   9   161m
kube-controller-manager-secondary-42spd-master-2   4/4   Running   0   158m

# oc -n openshift-ovirt-infra exec haproxy-secondary-42spd-master-0 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy. Use 'oc describe pod/haproxy-secondary-42spd-master-0 -n openshift-ovirt-infra' to see all of the containers in this pod.
  bind :::9445 v4v6
  bind :::50936 v4v6
  bind 127.0.0.1:50000

Additional info: the upgrade took a while and failed a few times; however, even with the failures, it continued and in the end managed to finish everything by itself.
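For reference, since the upgrade failed a few times before completing on its own, the standard commands below (nothing specific to this bug) are enough to keep an eye on its progress and on which operators are still settling:

# Overall upgrade state and history
oc get clusterversion
oc adm upgrade

# Operators that are not yet Available=True / Progressing=False / Degraded=False
oc get clusteroperators | grep -v 'True.*False.*False'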
*** Bug 1865944 has been marked as a duplicate of this bug. ***
As mentioned in the comments on the previously linked article [1], the workaround gets reverted by the haproxy-monitor container, so unfortunately it is not a valid workaround.

[1] https://access.redhat.com/solutions/5266321
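If it helps anyone confirm that behaviour, the haproxy-monitor container logs can show whether the monitor is re-rendering the config (a sketch; the pod name is one of the vSphere pods from earlier in this bug, and the container name comes from the comment above):

# Tail the monitor container on an affected master's haproxy pod.
oc -n openshift-vsphere-infra logs haproxy-scheng-45-tqgsd-master-0 -c haproxy-monitor --tail=50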
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.5.6 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3330
This bug is closed as fixed in 4.5. If you need a fix for 4.4 please open a bug with 4.4.z as the target release. You can clone this bug to do that - upper right hand corner.
(In reply to Mike Fiedler from comment #32)
> This bug is closed as fixed in 4.5. If you need a fix for 4.4 please open a
> bug with 4.4.z as the target release. You can clone this bug to do that -
> upper right hand corner.

@mike, the fix went in for 4.4.z as well; here is the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1862898