Bug 1858498 - Haproxy 9443 port conflicts with KCM causing KCM in crashloopbackoff state (vSphere, RHV)
Summary: Haproxy 9443 port conflicts with KCM causing KCM in crashloopbackoff state...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.5.z
Assignee: Gal Zaidman
QA Contact: Guilherme Santos
URL:
Whiteboard:
Duplicates: 1860190 1861275 1865944
Depends On: 1853889
Blocks: 1862898
 
Reported: 2020-07-18 12:32 UTC by OpenShift BugZilla Robot
Modified: 2023-12-15 18:30 UTC
CC List: 28 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-17 20:05:57 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/baremetal-runtimecfg pull 73 (closed): [release-4.5] BUG 1858498: Move haproxy port to 9445 due to conflict with KCM (last updated 2021-02-06 12:04:13 UTC)
Red Hat Knowledge Base (Solution) 5266321 (last updated 2020-07-29 10:24:46 UTC)
Red Hat Product Errata RHBA-2020:3330 (last updated 2020-08-17 20:06:19 UTC)

Comment 1 RamaKasturi 2020-07-21 13:32:19 UTC
We have hit this bug twice on vSphere, where haproxy's use of port 9443 conflicts with KCM, leaving KCM in a CrashLoopBackOff state.

oc get pods -A |awk '$5 >10'
NAMESPACE                                          NAME                                                      READY   STATUS              RESTARTS   AGE
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-0          4/4     Running             18         121m
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-1          3/4     CrashLoopBackOff    21         120m
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-2          3/4     CrashLoopBackOff    18         120m

$ oc exec -n openshift-vsphere-infra haproxy-scheng-45-tqgsd-master-0 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy.
Use 'oc describe pod/haproxy-scheng-45-tqgsd-master-0 -n openshift-vsphere-infra' to see all of the containers in this pod.
  bind :::9443 v4v6

Payload where this bug was hit is 4.5.0-0.nightly-2020-07-20-152128 and the profile name is "ipi-on-vsphere/versioned-installer-6_7-disconnected-vsphere_slave-ci"
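
A quick way to confirm which process is holding 9443 on an affected master (a sketch, assuming cluster-admin access; <master-node-name> is a placeholder) is to check the listening sockets from a debug shell on the node:

$ oc debug node/<master-node-name>
sh-4.4# chroot /host
sh-4.4# ss -tlnp | grep 9443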

Comment 2 Mike Fiedler 2020-07-21 18:22:24 UTC
I reproduced this 2/2 times in a vSphere disconnected cluster on 4.5.3 stable.

Comment 5 jima 2020-07-23 01:41:52 UTC
Verified the bug on OCP IPI on vSphere with nightly build 4.5.0-0.nightly-2020-07-21-232150 and it passed.

The port that the haproxy pod uses has been changed to 9445, and the KCM pods are no longer in CrashLoopBackOff state.
$ oc get pod -A | grep kube-controller | grep -v Completed
openshift-kube-controller-manager-operator         kube-controller-manager-operator-5c9d8bd7d4-cms6b            1/1     Running     1          111m
openshift-kube-controller-manager                  kube-controller-manager-jima-072203-6mfqb-master-0           4/4     Running     5          96m
openshift-kube-controller-manager                  kube-controller-manager-jima-072203-6mfqb-master-1           4/4     Running     0          97m
openshift-kube-controller-manager                  kube-controller-manager-jima-072203-6mfqb-master-2           4/4     Running     6          97m
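
The bind check from comment 1 can be used to confirm the new port; a sketch, with <haproxy-pod> standing in for the actual haproxy pod name on a master:

$ oc -n openshift-vsphere-infra exec <haproxy-pod> -- cat /etc/haproxy/haproxy.cfg | grep bind

With the fixed payload this should show bind :::9445 v4v6 instead of the :::9443 binding from comment 1.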

Comment 6 Tomáš Nožička 2020-07-27 15:29:02 UTC
*** Bug 1860190 has been marked as a duplicate of this bug. ***

Comment 9 Tomáš Nožička 2020-07-28 14:06:06 UTC
*** Bug 1861275 has been marked as a duplicate of this bug. ***

Comment 10 Tomáš Nožička 2020-07-28 15:02:53 UTC
Please also verify that, if you previously had a broken cluster, upgrading to a payload that has this fix actually works.

Comment 11 Lars Kellogg-Stedman 2020-07-28 15:41:31 UTC
I upgraded from 4.4.13 -> 4.5.4, and that seems to have resulted in a stable environment.

Comment 12 RamaKasturi 2020-07-28 15:47:02 UTC
(In reply to Lars Kellogg-Stedman from comment #11)
> I upgraded from 4.4.13 -> 4.5.4, and that seems to have resulted in a stable
> environment.

I think upgrading from 4.4.13 -> 4.5.4 will work, but upgrading from a broken cluster to the payload which has the fix needs to be checked.

Comment 20 Keith Fryklund 2020-07-30 19:36:46 UTC
Hey folks, 

I want to note that I hit this in four of my OpenShift on OpenStack 4.5.3 clusters. I followed this article [1] to fix them.


[1] https://access.redhat.com/solutions/5266321

Comment 25 Guilherme Santos 2020-08-07 14:53:35 UTC
Verified on:
openshift-4.5.4 upgrading from 4.5.3

Steps:
1. had a broken 4.5.3 cluster deployed:
# oc -n openshift-kube-controller-manager get pods | grep kube-controller
kube-controller-manager-secondary-42spd-master-0   4/4     Running            15         56m
kube-controller-manager-secondary-42spd-master-1   3/4     CrashLoopBackOff   12         55m
kube-controller-manager-secondary-42spd-master-2   3/4     CrashLoopBackOff   16         54m
# oc -n openshift-ovirt-infra exec haproxy-secondary-42spd-master-0 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy.
Use 'oc describe pod/haproxy-secondary-42spd-master-0 -n openshift-ovirt-infra' to see all of the containers in this pod.
  bind :::9443 v4v6
  bind :::50936 v4v6
  bind 127.0.0.1:50000

2. upgraded the cluster
# oc adm upgrade --to=4.5.4 --force=true

Results:
broken cluster fixed on upgrade and running as expected
# oc -n openshift-kube-controller-manager get pods | grep kube-controller
kube-controller-manager-secondary-42spd-master-0   4/4     Running     4          156m
kube-controller-manager-secondary-42spd-master-1   4/4     Running     9          161m
kube-controller-manager-secondary-42spd-master-2   4/4     Running     0          158m
# oc -n openshift-ovirt-infra exec haproxy-secondary-42spd-master-0 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy.
Use 'oc describe pod/haproxy-secondary-42spd-master-0 -n openshift-ovirt-infra' to see all of the containers in this pod.
  bind :::9445 v4v6
  bind :::50936 v4v6
  bind 127.0.0.1:50000

Additional info:
The upgrade took a while and failed a few times; however, even with the failures, it kept going and in the end managed to finish everything by itself.
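
Once the upgrade reports finished, the overall state can be double-checked with the standard status commands (a sketch; output omitted):

# oc get clusterversion
# oc get clusteroperators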

Comment 27 Mike Fedosin 2020-08-11 17:56:14 UTC
*** Bug 1865944 has been marked as a duplicate of this bug. ***

Comment 28 David Dreeggors 2020-08-12 14:29:13 UTC
As mentioned in the comments on the previously referenced article [1], the workaround gets reverted by the haproxy-monitor container, so sadly it is not a valid workaround.

[1] https://access.redhat.com/solutions/5266321
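
To see the revert happen, the bind check from comment 25 can simply be re-run a few minutes after editing the config; a sketch, with the namespace and pod name as placeholders:

# oc -n <infra-namespace> exec <haproxy-pod> -- cat /etc/haproxy/haproxy.cfg | grep bind

Once haproxy-monitor reconciles the config, the bind lines return to their original values.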

Comment 31 errata-xmlrpc 2020-08-17 20:05:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.6 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3330

Comment 32 Mike Fiedler 2020-08-25 17:46:20 UTC
This bug is closed as fixed in 4.5. If you need a fix for 4.4, please open a bug with 4.4.z as the target release. You can clone this bug to do that (upper right hand corner).

Comment 33 RamaKasturi 2020-08-26 05:23:44 UTC
(In reply to Mike Fiedler from comment #32)
> This bug is closed as fixed in 4.5.  If you need a fix for 4.4 please open a
> bug with 4.4.z as the target release.  You can clone this bug to do that -
> upper right hand corner.

@mike, the fix went in for 4.4.z as well; here is the bug: https://bugzilla.redhat.com/show_bug.cgi?id=1862898

