+++ This bug was initially created as a clone of Bug #1843752 +++

Description of problem:
Installation of an OCP cluster failed, with kube-apiserver in a crashloop: "timed out waiting for port :6443 and :6080 to be released". After manually removing the check for port 6080 from the kube-apiserver container in the static pod yaml files on the master, the crash is gone. There seems to be a race between the start order of the kube-apiserver and kube-apiserver-insecure-readyz containers.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-06-03-045340

Here is the actual env in action:

[root@wj45ios603e-xnj76-master-0 core]# crictl logs 7b4aa4234361c
Copying system trust bundle
Waiting for port :6443 and :6080 to be released.............................................timed out waiting for port :6443 and :6080 to be released

[root@wj45ios603e-xnj76-master-0 core]# fuser -v 6080/tcp
                     USER        PID ACCESS COMMAND
6080/tcp:            root      39516 F....  cluster-kube-ap

[root@wj45ios603e-xnj76-master-0 core]# ps aux | grep -i cluster-kube-ap
root       39401  0.2  0.3 927832 60740 ?     Ssl  08:44   0:06 cluster-kube-apiserver-operator cert-syncer --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig --namespace=openshift-kube-apiserver --destination-dir=/etc/kubernetes/static-pod-certs
root       39516  0.0  0.2 452688 48720 ?     Ssl  08:44   0:00 cluster-kube-apiserver-operator insecure-readyz --insecure-port=6080 --delegate-url=https://localhost:6443/readyz
root       43119  0.0  0.3 862296 55616 ?     Ssl  08:45   0:00 cluster-kube-apiserver-operator cert-regeneration-controller --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig --namespace=openshift-kube-apiserver -v=2
root      180734  0.0  0.0  12920  2528 pts/0 S+   09:34   0:00 grep --color=auto -i cluster-kube-ap

[root@wj45ios603e-xnj76-master-0 core]# crictl ps | grep -i insecure-readyz
3593dbe3245fd  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  50 minutes ago  Running  kube-apiserver-insecure-readyz  0   2dae574048620

Check the 6080 container:

[root@wj45ios603e-xnj76-master-0 core]# crictl ps -a | grep insecure-readyz
3593dbe3245fd  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  3 hours ago     Running  kube-apiserver-insecure-readyz  0   2dae574048620

[root@wj45ios603e-xnj76-master-0 core]# crictl ps -a | grep 2dae
c24d9d214d40a  cb6f865a8becaf5e71d9837bebfacd186f0f013e4f22c12347712a118d020657  2 minutes ago   Exited   kube-apiserver                                41  2dae574048620
73640aaa27d1d  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  8 minutes ago   Running  kube-apiserver-insecure-readyz                1   2dae574048620
1d6055f5b90ea  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  3 hours ago     Running  kube-apiserver-cert-regeneration-controller  1   2dae574048620
3593dbe3245fd  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  3 hours ago     Exited   kube-apiserver-insecure-readyz                0   2dae574048620
b941a78bd05f5  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  3 hours ago     Exited   kube-apiserver-cert-regeneration-controller  0   2dae574048620
87b8f77defb3c  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  3 hours ago     Running  kube-apiserver-cert-syncer                    0   2dae574048620
45cc67221f493  cb6f865a8becaf5e71d9837bebfacd186f0f013e4f22c12347712a118d020657  3 hours ago     Exited   setup                                         0   2dae574048620

Bug 1837992 is involved in causing this problem; that bug's related PR https://github.com/openshift/cluster-kube-apiserver-operator/pull/864/files#diff-79f70c8858100d23aa0da941b6136509R47-R56 should not include 'or sport = 6080'. After removing the port 6080 detection for the kube-apiserver container in /etc/kubernetes/manifests/kube-apiserver-pod.yaml and /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml (7 is the latest revision), the container can be Running:

[root@wj45ios603e-xnj76-master-0 core]# crictl ps -a | grep kube-apiserver
8d472d6325667  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  20 minutes ago  Running  kube-apiserver-insecure-readyz                0   0008cc9b90a44
0c3c70f8dd1ad  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  20 minutes ago  Running  kube-apiserver-cert-regeneration-controller  0   0008cc9b90a44
d0f9f3d9cf1e9  81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab  20 minutes ago  Running  kube-apiserver-cert-syncer                    0   0008cc9b90a44
8bee9ff6b36dc  cb6f865a8becaf5e71d9837bebfacd186f0f013e4f22c12347712a118d020657  20 minutes ago  Running  kube-apiserver                                0   0008cc9b90a44

Expected Results:
A restarted kube-apiserver container should not need to wait for port 6080 to be released.

Additional info:
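For context, the dots in the crictl log above come from a startup loop that polls until the listed ports are free and gives up after a timeout. A rough, hedged sketch of that behavior (this is not the operator's actual code, which uses an `ss`-based check per the PR above; the function names, the `/dev/tcp` probe, and the timeout handling here are all assumptions for illustration):

```shell
#!/bin/bash
# Sketch only: probe whether something on localhost accepts a TCP connection
# on the given port, using bash's /dev/tcp pseudo-device (requires bash).
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Poll all given ports until none is in use, printing one dot per second,
# and fail after $1 seconds -- mirroring the "timed out waiting for port
# :6443 and :6080 to be released" message in the log above.
wait_for_ports_released() {
  local timeout=$1; shift
  local waited=0 busy p
  while [ "$waited" -lt "$timeout" ]; do
    busy=0
    for p in "$@"; do
      port_in_use "$p" && busy=1
    done
    [ "$busy" -eq 0 ] && return 0
    printf '.'
    sleep 1
    waited=$((waited + 1))
  done
  printf 'timed out waiting for port(s) %s to be released\n' "$*" >&2
  return 1
}
```

In this sketch the race in the bug is easy to see: if kube-apiserver-insecure-readyz is already bound to 6080 when the kube-apiserver container starts its wait, the loop can never succeed and the container crashloops.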
Our OCP 4.5 tests on OSP 16 are blocked by this bug; cloning it to 4.5.
Not only OSP 16 but GCP as well: https://bugzilla.redhat.com/show_bug.cgi?id=1838421.
https://bugzilla.redhat.com/show_bug.cgi?id=1838421#c19 encountered the same error:

    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   ...............................................................................timed out waiting for port :6443 and :6080 to be released
*** This bug has been marked as a duplicate of bug 1843752 ***
Reopening; pointing this back to 1843752 as its parent.
So far, we've found this problem on several platforms, including Red Hat OpenStack Platform 16.0, Google Cloud Platform, and vSphere.
Verified in 4.5.0-0.nightly-2020-06-09-223121: in the kube-apiserver pod yaml, the kube-apiserver container no longer checks port 6080, per the PR https://github.com/openshift/cluster-kube-apiserver-operator/pull/878/files .

Repeated rollouts:

$ scripts/rollout.sh | tee logs/rollout.log | grep -i -e "checking" -e crash -e "timed out"

Saw no crash and no "timed out".

$ cat scripts/rollout.sh
#!/bin/bash
i=0
while true; do
  DATE="$(date)"
  let i+=1
  echo "$i time rollout $DATE"
  oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "xxia forced test'"$i time rollout $DATE"'" } ]'
  sleep 60
  while true; do
    echo "checking status $(date)"
    oc get po -n openshift-kube-apiserver --show-labels -l apiserver
    oc get po -n openshift-kube-apiserver -l apiserver -o json | jq '.items[].status'
    if oc get co kube-apiserver | grep "True.*False.*False"; then
      break
    fi
    sleep 10
  done
done
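The inner loop of rollout.sh decides the rollout has settled by grepping the `oc get co kube-apiserver` line for Available/Progressing/Degraded reading True/False/False. That predicate can be pulled out and checked without a cluster (the sample lines in the usage below are assumptions about the column layout of `oc get co` output):

```shell
# Succeeds when a single line of `oc get co kube-apiserver` output shows the
# Available/Progressing/Degraded columns as True/False/False, using the same
# grep pattern as rollout.sh above.
rollout_settled() {
  echo "$1" | grep -q "True.*False.*False"
}
```

For example, `rollout_settled "kube-apiserver   4.5.0   True   False   False   3h"` succeeds, while a line with Progressing=True ("True True False") does not match the pattern. Note the pattern is positional and loose; a stricter check would parse the columns explicitly.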
We have a successful installation with the latest OCP 4.5 on Red Hat OpenStack Platform 16.0; this bug is not blocking related tests.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409