+++ This bug was initially created as a clone of Bug #1851397 +++
+++ This bug was initially created as a clone of Bug #1851390 +++
+++ This bug was initially created as a clone of Bug #1851389 +++

The kube-controller-manager (KCM) pod crashloops because its port is already in use. I saw a case with the cluster-policy-controller container, but the problem is not limited to it. Crashlooping triggers alerts and adds backoff for the pod, so it starts more slowly. A container can be restarted while the pod stays running, so we need to check port availability in the same process that listens, not in an init container, which is not re-run on a container restart.
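The fix that shipped (visible in the container args in the verification output further down) is a shell wait loop that runs as the first step of the main container command. A minimal sketch of that pattern, assuming bash; `port_free` and `wait_for_port` are illustrative names, not the product's, and the `/dev/tcp` fallback is an assumption for hosts without `ss`:

```shell
#!/usr/bin/env bash
# Sketch: wait for the port in the SAME container command that will listen,
# so the check re-runs on every container restart (unlike an init container).

port_free() {
  local port="$1"
  if command -v ss >/dev/null 2>&1; then
    # Same idea as the shipped check: no socket with this source port.
    [ -z "$(ss -Htan "( sport = ${port} )")" ]
  else
    # Fallback: a failed connect to localhost suggests nothing is listening.
    ! (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
  fi
}

wait_for_port() {
  local port="$1" deadline="${2:-180}"   # ~3 minutes, like the shipped `timeout 3m`
  while ! port_free "${port}"; do
    sleep 1
    deadline=$((deadline - 1))
    [ "${deadline}" -gt 0 ] || return 1  # give up instead of waiting forever
  done
}

# In the static pod, the listener would then be exec'd in the same process tree:
#   wait_for_port 10257 && exec hyperkube kube-controller-manager ...
```

Waiting (rather than failing fast) preserves the old init-container semantics while surviving container-only restarts, since the check and the listener now share one command.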
This bug has already been verified via cluster-bot using the pre-merge flow. Since there is an issue with automatically moving the bug to the verified state, I am moving it manually.

[ramakasturinarra@dhcp35-60 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-09-10-171754   True        False         12m     Cluster version is 4.3.0-0.nightly-2020-09-10-171754

No init containers are present, and kube-controller-manager checks for port 10257:

[ramakasturinarra@dhcp35-60 ~]$ oc describe pod kube-controller-manager-ip-10-0-128-187.us-east-2.compute.internal -n openshift-kube-controller-manager
Name:                 kube-controller-manager-ip-10-0-128-187.us-east-2.compute.internal
Namespace:            openshift-kube-controller-manager
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 ip-10-0-128-187.us-east-2.compute.internal/10.0.128.187
Start Time:           Fri, 11 Sep 2020 12:13:54 +0530
Labels:               app=kube-controller-manager
                      kube-controller-manager=true
                      revision=5
Annotations:          kubernetes.io/config.hash: 53ec52e5521ee7e3f1a57b4ee8e4e4b3
                      kubernetes.io/config.mirror: 53ec52e5521ee7e3f1a57b4ee8e4e4b3
                      kubernetes.io/config.seen: 2020-09-11T06:52:27.649769277Z
                      kubernetes.io/config.source: file
Status:               Running
IP:                   10.0.128.187
IPs:
  IP:  10.0.128.187
Containers:
  kube-controller-manager-5:
    Container ID:  cri-o://7a7e1bc5f0cdda9d76080564dd371453b98dc20b684401366fe871ef8cbc5309
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f587ae067cd421fcaef2e77e87342c7fbf11d97053c2fa63e7e12695792772c9
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f587ae067cd421fcaef2e77e87342c7fbf11d97053c2fa63e7e12695792772c9
    Port:          10257/TCP
    Host Port:     10257/TCP
    Command:
      /bin/bash
      -euxo
      pipefail
      -c
    Args:
      timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 10257 \))" ]; do sleep 1; done'

      if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
        echo "Copying system trust bundle"
        cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
      fi

      exec hyperkube kube-controller-manager --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml \
        --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --authentication-kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --authorization-kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --client-ca-file=/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt \
        --requestheader-client-ca-file=/etc/kubernetes/static-pod-certs/configmaps/aggregator-client-ca/ca-bundle.crt \
        -v=2 \
        --tls-cert-file=/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt \
        --tls-private-key-file=/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key
    State:          Running
      Started:      Fri, 11 Sep 2020 12:22:28 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      80m
      memory:   200Mi
    Liveness:   http-get https://:10257/healthz delay=45s timeout=10s period=10s #success=1 #failure=3
    Readiness:  http-get https://:10257/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  cluster-policy-controller-5:
    Container ID:  cri-o://2f2660944616ae23a06033f581b00b80c0f6c358f752d6a2f46b977c155a1e9c
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2711360fb07612f1c64311d4e341229798ff058656c5e5305db30a0733cbca75
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2711360fb07612f1c64311d4e341229798ff058656c5e5305db30a0733cbca75
    Port:          10357/TCP
    Host Port:     10357/TCP
    Command:
      /bin/bash
      -euxo
      pipefail
      -c
    Args:
      timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 10357 \))" ]; do sleep 1; done'

      exec cluster-policy-controller start --config=/etc/kubernetes/static-pod-resources/configmaps/cluster-policy-controller-config/config.yaml
    State:          Running
      Started:      Fri, 11 Sep 2020 12:22:28 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  200Mi
    Liveness:   http-get https://:10357/healthz delay=45s timeout=10s period=10s #success=1 #failure=3
    Readiness:  http-get https://:10357/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  kube-controller-manager-cert-syncer-5:
    Container ID:  cri-o://b3c02d52dab6d957497fe387a89d8eba7f8e082454cbbf8b78efa7180c9c36bd
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:48fa039f359c83323f73c3e0e137388dd73c042c51bef9ed71e42e152c2034c0
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:48fa039f359c83323f73c3e0e137388dd73c042c51bef9ed71e42e152c2034c0
    Port:          <none>
    Host Port:     <none>
    Command:
      cluster-kube-controller-manager-operator
      cert-syncer
    Args:
      --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-controller-cert-syncer-kubeconfig/kubeconfig
      --namespace=$(POD_NAMESPACE)
      --destination-dir=/etc/kubernetes/static-pod-certs
    State:          Running
      Started:      Fri, 11 Sep 2020 12:22:29 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     5m
      memory:  50Mi
    Environment:
      POD_NAME:       kube-controller-manager-ip-10-0-128-187.us-east-2.compute.internal (v1:metadata.name)
      POD_NAMESPACE:  openshift-kube-controller-manager (v1:metadata.namespace)
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  resource-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-5
    HostPathType:
  cert-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/kube-controller-manager-certs
    HostPathType:
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       op=Exists
Events:
  Type    Reason   Age  From                                                 Message
  ----    ------   ---  ----                                                 -------
  Normal  Pulled   17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f587ae067cd421fcaef2e77e87342c7fbf11d97053c2fa63e7e12695792772c9" already present on machine
  Normal  Created  17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Created container kube-controller-manager-5
  Normal  Started  17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Started container kube-controller-manager-5
  Normal  Pulled   17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2711360fb07612f1c64311d4e341229798ff058656c5e5305db30a0733cbca75" already present on machine
  Normal  Created  17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Created container cluster-policy-controller-5
  Normal  Started  17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Started container cluster-policy-controller-5
  Normal  Pulled   17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:48fa039f359c83323f73c3e0e137388dd73c042c51bef9ed71e42e152c2034c0" already present on machine
  Normal  Created  17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Created container kube-controller-manager-cert-syncer-5
  Normal  Started  17m  kubelet, ip-10-0-128-187.us-east-2.compute.internal  Started container kube-controller-manager-cert-syncer-5

[ramakasturinarra@dhcp35-60 ~]$ oc describe pod kube-controller-manager-ip-10-0-128-187.us-east-2.compute.internal -n openshift-kube-controller-manager | grep -i init
  Initialized       True
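The manual verification above (no init containers, port-wait in the main container args) could also be scripted against a saved manifest. A rough sketch, assuming the pod was exported locally (e.g. `oc get pod ... -o json > pod.json`); `check_fix` is a hypothetical helper name, and the grep patterns are naive text matches, not a real JSON parser:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a saved pod manifest for the two conditions
# verified above: no init containers, and the port-wait loop present in
# the main container's args.
check_fix() {
  local manifest="$1"
  if grep -q '"initContainers"' "${manifest}"; then
    echo "FAIL: init containers still present"
    return 1
  fi
  if grep -q 'sport = 10257' "${manifest}"; then
    echo "PASS: port wait runs in the main container command"
  else
    echo "FAIL: no port-wait loop found"
    return 1
  fi
}
```

Usage would be `check_fix pod.json` after exporting the running pod; the same pattern applies to the cluster-policy-controller port (10357).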
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.3.38 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3609