Bug 1851390

Summary: kcm pod crashloops because port is already in use
Product: OpenShift Container Platform
Component: kube-controller-manager
Version: 4.6
Target Release: 4.5.z
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Tomáš Nožička <tnozicka>
Assignee: Tomáš Nožička <tnozicka>
QA Contact: zhou ying <yinzhou>
CC: aos-bugs, kewang, mfojtik, yinzhou
Hardware: Unspecified
OS: Unspecified
Clone Of: 1851389
Clones: 1851397, 1851404
Last Closed: 2020-07-22 12:20:41 UTC
Bug Depends On: 1851389    
Bug Blocks: 1851397, 1851404    

Description Tomáš Nožička 2020-06-26 12:12:04 UTC
+++ This bug was initially created as a clone of Bug #1851389 +++

The kube-controller-manager (kcm) pod crash-loops because a port it needs is already in use. I saw this with the cluster-policy-controller container, but the problem is not limited to that container.

Crash-looping triggers alerts and adds restart backoff to the pod, so it takes longer to start.

A container can be restarted while its pod stays in place, and init containers are not re-run when that happens. For that reason, we need to check port availability in the same process that listens on the port, not in an init container.
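A minimal sketch of checking the port inside the container's own process, assuming the container entrypoint is a bash script and bash is built with `/dev/tcp` support. The function names, the 180-second default timeout, and the command to `exec` are hypothetical; a real manifest might probe with a tool such as `ss` instead of a connect attempt. Because `exec` replaces the shell, the server ends up running as the same process that performed the check, and the check repeats on every container restart.

```shell
#!/usr/bin/env bash
# Sketch only: wait until a TCP port is free, then exec the real server in the
# same container process, so the check re-runs on every container restart.
# Assumptions: bash has /dev/tcp support; names and timeout are hypothetical.

# Return 0 if something accepts TCP connections on 127.0.0.1:PORT.
# A connect probe only sees listeners reachable via loopback (for example a
# 0.0.0.0 or dual-stack :: bind); it misses listeners bound to other addresses.
port_in_use() {
  local port="$1"
  (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
}

# Poll once per second until PORT is free or TIMEOUT seconds have passed.
wait_for_port_free() {
  local port="$1" timeout_s="${2:-180}" waited=0
  while port_in_use "$port"; do
    if (( waited >= timeout_s )); then
      echo "port ${port} still in use after ${timeout_s}s" >&2
      return 1
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
}

# Hypothetical entrypoint usage (PORT and the binary path are placeholders):
#   wait_for_port_free "$PORT" 180 && exec /path/to/server --port="$PORT"
```

If the port stays occupied past the timeout, the wrapper fails and the container restarts as before, but a briefly held port no longer causes an immediate crash.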

Comment 5 Ke Wang 2020-07-21 07:02:21 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-07-20-152128   True        False         108m    Error while reconciling 4.5.0-0.nightly-2020-07-20-152128: the cluster operator kube-controller-manager is degraded

$ oc exec -n openshift-vsphere-infra haproxy-scheng-45-tqgsd-master-0 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy.
Use 'oc describe pod/haproxy-scheng-45-tqgsd-master-0 -n openshift-vsphere-infra' to see all of the containers in this pod.
  bind :::9443 v4v6


The haproxy container occupies port 9443, so kcm keeps restarting:

$ oc get pods -A |awk '$5 >10'
NAMESPACE                                          NAME                                                      READY   STATUS              RESTARTS   AGE
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-0          4/4     Running             18         121m
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-1          3/4     CrashLoopBackOff    21         120m
openshift-kube-controller-manager                  kube-controller-manager-scheng-45-tqgsd-master-2          3/4     CrashLoopBackOff    18         120m
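The `awk '$5 >10'` filter above keeps pods whose RESTARTS column (the fifth field of `oc get pods -A` output) exceeds 10. The header row also survives, but only because awk compares the non-numeric string `RESTARTS` with `10` as strings. A minimal sketch that makes both cases explicit, assuming the same column layout; the function name `high_restarts` is hypothetical:

```shell
#!/usr/bin/env bash
# Print the header plus every pod whose RESTARTS column (field 5 of
# `oc get pods -A`) is greater than a threshold (default 10).
# Assumption: the column layout matches the output shown above.
high_restarts() {
  local threshold="${1:-10}"
  # NR == 1 keeps the header on purpose; "+ 0" forces numeric comparisons.
  awk -v t="$threshold" 'NR == 1 || $5 + 0 > t + 0'
}

# Usage: oc get pods -A | high_restarts 10
```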

Comment 6 Ke Wang 2020-07-21 07:50:32 UTC
The problem above is tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1858498

Comment 8 errata-xmlrpc 2020-07-22 12:20:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2956