Bug 1851390 - kcm pod crashloops because port is already in use
Summary: kcm pod crashloops because port is already in use
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Tomáš Nožička
QA Contact: zhou ying
URL:
Whiteboard:
Depends On: 1851389
Blocks: 1851397 1851404
Reported: 2020-06-26 12:12 UTC by Tomáš Nožička
Modified: 2020-07-22 12:21 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1851389
Clones: 1851397 1851404
Environment:
Last Closed: 2020-07-22 12:20:41 UTC
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
Github | openshift cluster-kube-controller-manager-operator pull 422 | None | closed | [release-4.5] Bug 1851390: Fix port check | 2020-09-11 07:08:55 UTC
Github | openshift images pull 20 | None | closed | [release-4.5] Bug 1851390: Add iproute package to get `ss` tool for port wait | 2020-09-11 07:08:56 UTC
Red Hat Product Errata | RHBA-2020:2956 | None | None | None | 2020-07-22 12:21:08 UTC

Description Tomáš Nožička 2020-06-26 12:12:04 UTC
+++ This bug was initially created as a clone of Bug #1851389 +++

The kcm pod crashloops because its port is already in use. I saw a case with the cluster-policy-manager container, but the problem is not limited to it.

Crashlooping triggers alerts and adds backoff for the pod, so it starts more slowly.

A container can be restarted while the pod stays. For that reason, we need to check port availability in the same process that listens on the port, not in an init container, which is not re-run.
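
As an illustration only (not the code from the linked pull requests), the idea of waiting for the port in the container's own start command could be sketched roughly as follows. It assumes the `ss` tool from the iproute package, which the images pull request title mentions. The function names `port_in_use` and `wait_for_port_free`, the timeout handling, and the `PORT` variable in the usage comment are hypothetical.

```shell
# Hypothetical sketch: wait until nothing holds the port, then start the
# server from the same command, so the check re-runs on every container restart.

# port_in_use PORT: succeeds if any TCP socket has PORT as its local port.
# Assumes `ss` (iproute package) is present in the image.
port_in_use() {
  [ -n "$(ss -Htan sport = ":$1")" ]
}

# wait_for_port_free PORT TIMEOUT_SECONDS: polls once per second and
# returns 1 if the port is still busy after roughly TIMEOUT_SECONDS.
# Variables are not function-local, to stay within POSIX sh.
wait_for_port_free() {
  port=$1
  remaining=$2
  while port_in_use "$port"; do
    if [ "$remaining" -le 0 ]; then
      echo "port $port still in use" >&2
      return 1
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}

# Possible use as a container command (PORT and the server command are placeholders):
#   wait_for_port_free "$PORT" 180 && exec "$@"
```

Because the wait and the `exec` are part of one command, a restarted container repeats the wait, which an init container would not do.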

Comment 5 Ke Wang 2020-07-21 07:02:21 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-20-152128   True        False         108m    Error while reconciling 4.5.0-0.nightly-2020-07-20-152128: the cluster operator kube-controller-manager is degraded

$ oc exec -n openshift-vsphere-infra haproxy-scheng-45-tqgsd-master-0 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy.
Use 'oc describe pod/haproxy-scheng-45-tqgsd-master-0 -n openshift-vsphere-infra' to see all of the containers in this pod.
  bind :::9443 v4v6


haproxy occupied port 9443, so kcm kept restarting:

$ oc get pods -A |awk '$5 >10'
NAMESPACE                                          NAME                                                      READY   STATUS              RESTARTS   AGE
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-0          4/4     Running             18         121m
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-1          3/4     CrashLoopBackOff    21         120m
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-2          3/4     CrashLoopBackOff    18         120m

Comment 6 Ke Wang 2020-07-21 07:50:32 UTC
The above problem is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1858498

Comment 8 errata-xmlrpc 2020-07-22 12:20:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2956

