Bug 1862898
| Summary: | [4.4] Haproxy 9443 port conflicts with KCM causing KCM in crashloopbackoff state openstack | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | RamaKasturi <knarra> |
| Component: | Installer | Assignee: | Gal Zaidman <gzaidman> |
| Installer sub component: | OpenShift on RHV | QA Contact: | Guilherme Santos <gdeolive> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | urgent | CC: | aabhishe, adahiya, adduarte, bjarolim, bperkins, dmoessne, jrosenta, lmartinh, ocprhvteam, openshift-bugs-escalate, pelauter, ppitonak, scuppett |
| Version: | 4.4 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.4.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-01 19:41:34 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1858498 | | |
| Bug Blocks: | | | |
Comment 1
RamaKasturi
2020-08-03 07:46:10 UTC
(In reply to RamaKasturi from comment #1)
> cloned this bug to 4.4.z because i see that doing a fresh install using the
> latest 4.4 nightly 4.4.0-0.nightly-2020-08-01-220435 causes KCM to be in
> crashloopbackoff state and did not see a corresponding bug for 4.4.z !!

```
[ramakasturinarra@dhcp35-60 ~]$ oc exec -n openshift-openstack-infra haproxy-knarra08031-82ttr-master-1 -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy.
Use 'oc describe pod/haproxy-knarra08031-82ttr-master-1 -n openshift-openstack-infra' to see all of the containers in this pod.
  bind :::9443 v4v6
  bind :::50936 v4v6
  bind 127.0.0.1:50000
```
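The bind lines above show HAProxy holding port 9443 on the master, which is the same port the KCM recovery-controller tries to listen on (see the pod description below). A quick node-side check would confirm the conflicting listener; the commands below are a hypothetical diagnostic, not captured in this report:

```
# Hypothetical check, assuming cluster-admin access: from a node debug
# shell, list whatever is listening on TCP 9443 on the affected master.
oc debug node/knarra08031-82ttr-master-1 -- chroot /host ss -tlnp 'sport = :9443'
# While the bug is present this should show haproxy bound to :::9443.
```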
```
[ramakasturinarra@dhcp35-60 ~]$ oc describe pod kube-controller-manager-knarra08031-82ttr-master-1 -n openshift-kube-controller-manager
Name:                 kube-controller-manager-knarra08031-82ttr-master-1
Namespace:            openshift-kube-controller-manager
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 knarra08031-82ttr-master-1/192.168.1.23
Start Time:           Mon, 03 Aug 2020 11:24:23 +0530
Labels:               app=kube-controller-manager
                      kube-controller-manager=true
                      revision=10
Annotations:          kubectl.kubernetes.io/default-logs-container: kube-controller-manager
                      kubernetes.io/config.hash: f80658ae0b53d9d9b95eba77b9cd3a85
                      kubernetes.io/config.mirror: f80658ae0b53d9d9b95eba77b9cd3a85
                      kubernetes.io/config.seen: 2020-08-03T06:05:54.150115172Z
                      kubernetes.io/config.source: file
Status:               Running
IP:                   192.168.1.23
IPs:
  IP:  192.168.1.23
Controlled By:  Node/knarra08031-82ttr-master-1
Containers:
  kube-controller-manager:
    Container ID:  cri-o://7b60bc62260339ef21dbe9b842a2dabb880e26a296d9f6bb4f3eb003d4900fa9
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fed963f4a3d4fa81891976fcda8e08d970e1ddfb4076ee4e048b70c581c2c49b
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fed963f4a3d4fa81891976fcda8e08d970e1ddfb4076ee4e048b70c581c2c49b
    Port:          10257/TCP
    Host Port:     10257/TCP
    Command:
      /bin/bash
      -euxo
      pipefail
      -c
    Args:
      timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 10257 \))" ]; do sleep 1; done'
      if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
        echo "Copying system trust bundle"
        cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
      fi
      exec hyperkube kube-controller-manager --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml \
        --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --authentication-kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --authorization-kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/controller-manager-kubeconfig/kubeconfig \
        --client-ca-file=/etc/kubernetes/static-pod-certs/configmaps/client-ca/ca-bundle.crt \
        --requestheader-client-ca-file=/etc/kubernetes/static-pod-certs/configmaps/aggregator-client-ca/ca-bundle.crt -v=2 \
        --tls-cert-file=/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.crt \
        --tls-private-key-file=/etc/kubernetes/static-pod-resources/secrets/serving-cert/tls.key
    State:          Running
      Started:      Mon, 03 Aug 2020 11:35:55 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      80m
      memory:   200Mi
    Liveness:   http-get https://:10257/healthz delay=45s timeout=10s period=10s #success=1 #failure=3
    Readiness:  http-get https://:10257/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:
      HTTPS_PROXY:  https://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3130
      HTTP_PROXY:   http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3128
      NO_PROXY:     .cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.0.0/18,api-int.knarra08031.0803-pbn.qe.rhcloud.com,etcd-0.knarra08031.0803-pbn.qe.rhcloud.com,etcd-1.knarra08031.0803-pbn.qe.rhcloud.com,etcd-2.knarra08031.0803-pbn.qe.rhcloud.com,localhost,oauth-openshift.apps.knarra08031.0803-pbn.qe.rhcloud.com,rhos-d.infra.prod.upshift.rdu2.redhat.com
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  cluster-policy-controller:
    Container ID:  cri-o://e164d64ad9c21b9f5f6ffc20bbac84e958331d5a5f7d410f99849454982e956f
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bc367e8cb993f0194ad8288a29bb00e9362f9f9d123fb94c7c85f8349cd3599c
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bc367e8cb993f0194ad8288a29bb00e9362f9f9d123fb94c7c85f8349cd3599c
    Port:          10357/TCP
    Host Port:     10357/TCP
    Command:
      /bin/bash
      -euxo
      pipefail
      -c
    Args:
      timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 10357 \))" ]; do sleep 1; done'
      exec cluster-policy-controller start --config=/etc/kubernetes/static-pod-resources/configmaps/cluster-policy-controller-config/config.yaml
    State:          Running
      Started:      Mon, 03 Aug 2020 11:50:45 +0530
    Last State:     Terminated
      Reason:       Error
      Message:      WatchBookmarks=true&resourceVersion=23031&timeout=5m7s&timeoutSeconds=307&watch=true: dial tcp [::1]:6443: connect: connection refused
        E0803 06:20:43.856729  1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.PersistentVolumeClaim: Get https://localhost:6443/api/v1/persistentvolumeclaims?allowWatchBookmarks=true&resourceVersion=19306&timeout=8m2s&timeoutSeconds=482&watch=true: dial tcp [::1]:6443: connect: connection refused
        E0803 06:20:43.860184  1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1beta1.PodDisruptionBudget: Get https://localhost:6443/apis/policy/v1beta1/poddisruptionbudgets?allowWatchBookmarks=true&resourceVersion=19308&timeout=8m6s&timeoutSeconds=486&watch=true: dial tcp [::1]:6443: connect: connection refused
        E0803 06:20:43.861113  1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.NetworkPolicy: Get https://localhost:6443/apis/networking.k8s.io/v1/networkpolicies?allowWatchBookmarks=true&resourceVersion=19308&timeout=7m12s&timeoutSeconds=432&watch=true: dial tcp [::1]:6443: connect: connection refused
        E0803 06:20:43.862184  1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.LimitRange: Get https://localhost:6443/api/v1/limitranges?allowWatchBookmarks=true&resourceVersion=19306&timeout=7m36s&timeoutSeconds=456&watch=true: dial tcp [::1]:6443: connect: connection refused
        E0803 06:20:43.863294  1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1beta1.EndpointSlice: Get https://localhost:6443/apis/discovery.k8s.io/v1beta1/endpointslices?allowWatchBookmarks=true&resourceVersion=19308&timeout=9m17s&timeoutSeconds=557&watch=true: dial tcp [::1]:6443: connect: connection refused
        I0803 06:20:44.538980  1 leaderelection.go:288] failed to renew lease openshift-kube-controller-manager/cluster-policy-controller: timed out waiting for the condition
        F0803 06:20:44.539108  1 policy_controller.go:94] leaderelection lost
      Exit Code:    255
      Started:      Mon, 03 Aug 2020 11:35:55 +0530
      Finished:     Mon, 03 Aug 2020 11:50:44 +0530
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:      10m
      memory:   200Mi
    Liveness:   http-get https://:10357/healthz delay=45s timeout=10s period=10s #success=1 #failure=3
    Readiness:  http-get https://:10357/healthz delay=10s timeout=10s period=10s #success=1 #failure=3
    Environment:
      HTTPS_PROXY:  https://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3130
      HTTP_PROXY:   http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3128
      NO_PROXY:     .cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.0.0/18,api-int.knarra08031.0803-pbn.qe.rhcloud.com,etcd-0.knarra08031.0803-pbn.qe.rhcloud.com,etcd-1.knarra08031.0803-pbn.qe.rhcloud.com,etcd-2.knarra08031.0803-pbn.qe.rhcloud.com,localhost,oauth-openshift.apps.knarra08031.0803-pbn.qe.rhcloud.com,rhos-d.infra.prod.upshift.rdu2.redhat.com
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  kube-controller-manager-cert-syncer:
    Container ID:  cri-o://16ad9bd82ab65a04a9f7a7c95f525d019fc6244705cc8a11ac93c9dda42320fd
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b761bfa81fdb68866028109fb0092fc30147fa315ca17748e7d9b8c55ef5762d
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b761bfa81fdb68866028109fb0092fc30147fa315ca17748e7d9b8c55ef5762d
    Port:          <none>
    Host Port:     <none>
    Command:
      cluster-kube-controller-manager-operator
      cert-syncer
    Args:
      --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-controller-cert-syncer-kubeconfig/kubeconfig
      --namespace=$(POD_NAMESPACE)
      --destination-dir=/etc/kubernetes/static-pod-certs
    State:          Running
      Started:      Mon, 03 Aug 2020 11:35:56 +0530
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     5m
      memory:  50Mi
    Environment:
      POD_NAME:       kube-controller-manager-knarra08031-82ttr-master-1 (v1:metadata.name)
      POD_NAMESPACE:  openshift-kube-controller-manager (v1:metadata.namespace)
      HTTPS_PROXY:    https://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3130
      HTTP_PROXY:     http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3128
      NO_PROXY:       .cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.0.0/18,api-int.knarra08031.0803-pbn.qe.rhcloud.com,etcd-0.knarra08031.0803-pbn.qe.rhcloud.com,etcd-1.knarra08031.0803-pbn.qe.rhcloud.com,etcd-2.knarra08031.0803-pbn.qe.rhcloud.com,localhost,oauth-openshift.apps.knarra08031.0803-pbn.qe.rhcloud.com,rhos-d.infra.prod.upshift.rdu2.redhat.com
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
  kube-controller-manager-recovery-controller:
    Container ID:  cri-o://81f483c2b0fe14c54a7c17d53b6836d442f1e0aa8f67119906db716d005f7375
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b761bfa81fdb68866028109fb0092fc30147fa315ca17748e7d9b8c55ef5762d
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b761bfa81fdb68866028109fb0092fc30147fa315ca17748e7d9b8c55ef5762d
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -euxo
      pipefail
      -c
    Args:
      timeout 3m /bin/bash -exuo pipefail -c 'while [ -n "$(ss -Htanop \( sport = 9443 \))" ]; do sleep 1; done'
      exec cluster-kube-controller-manager-operator cert-recovery-controller --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-controller-cert-syncer-kubeconfig/kubeconfig --namespace=${POD_NAMESPACE} --listen=0.0.0.0:9443 -v=2
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      43 [::ffff:127.0.0.1]:42884
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:49494
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:38450
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:41566
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:42610
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:46728
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:40492
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:60220
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:47944
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:60366
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:58452
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:46046
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:59756
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:54786
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:55012
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:38142
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:37188
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:51604
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:41648
        ESTAB 0 0 [::ffff:127.0.0.1]:9443 [::ffff:127.0.0.1]:56502
        ' ']'
        + sleep 1
      Exit Code:    124
      Started:      Mon, 03 Aug 2020 13:12:50 +0530
      Finished:     Mon, 03 Aug 2020 13:15:50 +0530
    Ready:          False
    Restart Count:  15
    Requests:
      cpu:     5m
      memory:  50Mi
    Environment:
      POD_NAMESPACE:  openshift-kube-controller-manager (v1:metadata.namespace)
      HTTPS_PROXY:    https://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3130
      HTTP_PROXY:     http://proxy-user1:JYgU8qRZV4DY4PXJbxJK@10.0.77.163:3128
      NO_PROXY:       .cluster.local,.svc,10.128.0.0/14,127.0.0.1,169.254.169.254,172.30.0.0/16,192.168.0.0/18,api-int.knarra08031.0803-pbn.qe.rhcloud.com,etcd-0.knarra08031.0803-pbn.qe.rhcloud.com,etcd-1.knarra08031.0803-pbn.qe.rhcloud.com,etcd-2.knarra08031.0803-pbn.qe.rhcloud.com,localhost,oauth-openshift.apps.knarra08031.0803-pbn.qe.rhcloud.com,rhos-d.infra.prod.upshift.rdu2.redhat.com
    Mounts:
      /etc/kubernetes/static-pod-certs from cert-dir (rw)
      /etc/kubernetes/static-pod-resources from resource-dir (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  resource-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-10
    HostPathType:
  cert-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/kube-controller-manager-certs
    HostPathType:
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:
Events:
  Type     Reason   Age                    From                                 Message
  ----     ------   ----                   ----                                 -------
  Normal   Pulled   103m                   kubelet, knarra08031-82ttr-master-1  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b761bfa81fdb68866028109fb0092fc30147fa315ca17748e7d9b8c55ef5762d" already present on machine
  Normal   Created  103m                   kubelet, knarra08031-82ttr-master-1  Created container kube-controller-manager
  Normal   Started  103m                   kubelet, knarra08031-82ttr-master-1  Started container kube-controller-manager
  Normal   Pulled   103m                   kubelet, knarra08031-82ttr-master-1  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bc367e8cb993f0194ad8288a29bb00e9362f9f9d123fb94c7c85f8349cd3599c" already present on machine
  Normal   Created  103m                   kubelet, knarra08031-82ttr-master-1  Created container cluster-policy-controller
  Normal   Started  103m                   kubelet, knarra08031-82ttr-master-1  Started container cluster-policy-controller
  Normal   Pulled   103m                   kubelet, knarra08031-82ttr-master-1  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:fed963f4a3d4fa81891976fcda8e08d970e1ddfb4076ee4e048b70c581c2c49b" already present on machine
  Normal   Created  103m                   kubelet, knarra08031-82ttr-master-1  Created container kube-controller-manager-cert-syncer
  Normal   Started  103m                   kubelet, knarra08031-82ttr-master-1  Started container kube-controller-manager-cert-syncer
  Normal   Pulled   93m (x4 over 103m)     kubelet, knarra08031-82ttr-master-1  Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b761bfa81fdb68866028109fb0092fc30147fa315ca17748e7d9b8c55ef5762d" already present on machine
  Normal   Created  93m (x4 over 103m)     kubelet, knarra08031-82ttr-master-1  Created container kube-controller-manager-recovery-controller
  Normal   Started  93m (x4 over 103m)     kubelet, knarra08031-82ttr-master-1  Started container kube-controller-manager-recovery-controller
  Warning  BackOff  3m13s (x238 over 97m)  kubelet, knarra08031-82ttr-master-1  Back-off restarting failed container
```
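The recovery-controller's Args above show the failure mechanism: before exec'ing the controller, the container waits for port 9443 to be free, and since HAProxy (bound to :::9443 per the haproxy.cfg above) never releases it, the 3-minute `timeout` kills the wait loop with exit code 124 (the Exit Code in the describe output) and kubelet backs the container off. A minimal sketch of that pre-start gate, taken from the Args and reduced to the conflicting port:

```
# Sketch of the recovery-controller's pre-start gate. The loop only exits
# once nothing matches 'sport = 9443'; while haproxy owns the port, `timeout`
# kills the shell after 3 minutes, it exits 124, and kubelet restarts the
# container until it lands in CrashLoopBackOff.
timeout 3m /bin/bash -exuo pipefail -c \
  'while [ -n "$(ss -Htanop \( sport = 9443 \))" ]; do sleep 1; done'
echo "gate exited with: $?"   # 124 while 9443 is still in use
```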
Hit this issue on profile ipi-on-osp.

Verified on:
openshift-4.4.0-0.nightly-2020-08-20-051550
ovirt-engine-4.3.11.2-0.1.el7.noarch

Steps:
1. Had a broken 4.4.16 cluster deployed:

```
# oc -n openshift-kube-controller-manager get pods | grep manager
kube-controller-manager-secondary-t6g55-master-0   3/4   CrashLoopBackOff   16   10m
kube-controller-manager-secondary-t6g55-master-1   3/4   CrashLoopBackOff   15   94m
kube-controller-manager-secondary-t6g55-master-2   3/4   CrashLoopBackOff   18   93m
# oc -n `oc get projects | grep ovirt | awk '{print $1}'` exec $(oc -n `oc get projects | grep ovirt | awk '{print $1}'` get pods | grep haproxy | awk 'NR==1{print $1}') -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy.
Use 'oc describe pod/haproxy-secondary-t6g55-master-0 -n openshift-ovirt-infra' to see all of the containers in this pod.
  bind :::9443 v4v6
  bind :::50936 v4v6
  bind 127.0.0.1:50000
```

2. Upgraded the cluster:

```
# oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-20-051550 --force=true
```

Results:
Broken cluster fixed on upgrade and running as expected; HAProxy now binds 9445 instead of 9443:

```
# oc -n openshift-kube-controller-manager get pods | grep manager
kube-controller-manager-secondary-t6g55-master-0   4/4   Running   0   75m
kube-controller-manager-secondary-t6g55-master-1   4/4   Running   1   74m
kube-controller-manager-secondary-t6g55-master-2   4/4   Running   2   75m
# oc -n `oc get projects | grep ovirt | awk '{print $1}'` exec $(oc -n `oc get projects | grep ovirt | awk '{print $1}'` get pods | grep haproxy | awk 'NR==1{print $1}') -- cat /etc/haproxy/haproxy.cfg | grep bind
Defaulting container name to haproxy.
Use 'oc describe pod/haproxy-secondary-t6g55-master-0 -n openshift-ovirt-infra' to see all of the containers in this pod.
  bind :::9445 v4v6
  bind :::50936 v4v6
  bind 127.0.0.1:50000
```

Correction: in comment #8, the step 2 command is missing the flag --allow-explicit-upgrade.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.4.19 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3514
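For reference, step 2 of the verification above with the missing flag added would read as follows (a sketch using the same image, not a command captured from the cluster):

```
# oc adm upgrade --allow-explicit-upgrade --force=true \
    --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-08-20-051550
```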