Bug 1843752 - A restarted kube-apiserver container hits crashloop due to 6080 port of kube-apiserver-insecure-readyz
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.5
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Stefan Schimanski
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks: 1844288
Reported: 2020-06-04 03:10 UTC by Ke Wang
Modified: 2020-10-27 16:05 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1844288
Environment:
Last Closed: 2020-10-27 16:04:47 UTC
Target Upstream Version:



Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-apiserver-operator pull 875 0 None closed Bug 1843752: static pod: don't wait for 6080 in apiserver container 2021-01-28 10:07:29 UTC
Red Hat Knowledge Base (Solution) 5191141 0 None None None 2020-06-30 06:40:13 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:05:14 UTC

Description Ke Wang 2020-06-04 03:10:52 UTC
Description of problem: 
Installation of an OCP cluster failed with the kube-apiserver container in a crash loop: "timed out waiting for port :6443 and :6080 to be released". After manually removing the port 6080 check for the kube-apiserver container from the static pod YAML files on the master, the crash loop stopped. There appears to be a race between the start order of the kube-apiserver and kube-apiserver-insecure-readyz containers.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-06-03-045340

Here is the state of the affected environment:
[root@wj45ios603e-xnj76-master-0 core]# crictl logs  7b4aa4234361c
Copying system trust bundle
Waiting for port :6443 and :6080 to be released.............................................timed out waiting for port :6443 and :6080 to be released

[root@wj45ios603e-xnj76-master-0 core]# fuser -v 6080/tcp
                     USER        PID ACCESS COMMAND
6080/tcp:            root      39516 F.... cluster-kube-ap

[root@wj45ios603e-xnj76-master-0 core]# ps aux|grep -i cluster-kube-ap
root       39401  0.2  0.3 927832 60740 ?        Ssl  08:44   0:06 cluster-kube-apiserver-operator cert-syncer --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig --namespace=openshift-kube-apiserver --destination-dir=/etc/kubernetes/static-pod-certs
root       39516  0.0  0.2 452688 48720 ?        Ssl  08:44   0:00 cluster-kube-apiserver-operator insecure-readyz --insecure-port=6080 --delegate-url=https://localhost:6443/readyz
root       43119  0.0  0.3 862296 55616 ?        Ssl  08:45   0:00 cluster-kube-apiserver-operator cert-regeneration-controller --kubeconfig=/etc/kubernetes/static-pod-resources/configmaps/kube-apiserver-cert-syncer-kubeconfig/kubeconfig --namespace=openshift-kube-apiserver -v=2
root      180734  0.0  0.0  12920  2528 pts/0    S+   09:34   0:00 grep --color=auto -i cluster-kube-ap

[root@wj45ios603e-xnj76-master-0 core]# crictl ps |grep -i insecure-readyz
3593dbe3245fd       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab    50 minutes ago      Running        kube-apiserver-insecure-readyz      0     2dae574048620


Check the 6080 container:
[root@wj45ios603e-xnj76-master-0 core]# crictl ps -a | grep insecure-readyz
3593dbe3245fd       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab 3 hours ago         Running    kube-apiserver-insecure-readyz                0    2dae574048620

[root@wj45ios603e-xnj76-master-0 core]# crictl ps -a | grep 2dae
c24d9d214d40a       cb6f865a8becaf5e71d9837bebfacd186f0f013e4f22c12347712a118d020657 2 minutes ago       Exited     kube-apiserver                                41   2dae574048620
73640aaa27d1d       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab 8 minutes ago       Running    kube-apiserver-insecure-readyz                1    2dae574048620
1d6055f5b90ea       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab 3 hours ago         Running    kube-apiserver-cert-regeneration-controller   1    2dae574048620
3593dbe3245fd       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab 3 hours ago         Exited     kube-apiserver-insecure-readyz                0    2dae574048620
b941a78bd05f5       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab 3 hours ago         Exited     kube-apiserver-cert-regeneration-controller   0    2dae574048620
87b8f77defb3c       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab 3 hours ago         Running    kube-apiserver-cert-syncer                    0    2dae574048620
45cc67221f493       cb6f865a8becaf5e71d9837bebfacd186f0f013e4f22c12347712a118d020657 3 hours ago         Exited     setup                                         0    2dae574048620

Bug 1837992 is involved in causing this problem: the PR for that bug, https://github.com/openshift/cluster-kube-apiserver-operator/pull/864/files#diff-79f70c8858100d23aa0da941b6136509R47-R56, should not include 'or sport = 6080'.
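
The intended behavior can be sketched as a wait loop that polls only port 6443. This is a minimal illustration, not the script from the operator's manifest: the function names `probe_ports` and `wait_for_ports_released`, the retry limit, and the timeout message are assumptions made for the example. It relies on `ss -Htan` with a socket filter, in the same form as the 'sport = 6080' expression quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: the real startup script in the static pod manifest may differ.

# probe_ports prints one line per TCP socket matching the given ss filter.
# It is a separate function so that it can be stubbed out.
probe_ports() {
  ss -Htan "$1"
}

# wait_for_ports_released FILTER MAX_TRIES [SLEEP_SECONDS]
# Returns 0 as soon as no socket matches FILTER.
# Returns 1 if sockets still match after MAX_TRIES polls.
wait_for_ports_released() {
  local filter="$1" max_tries="$2" delay="${3:-1}" tries=0
  while [ -n "$(probe_ports "$filter")" ]; do
    if [ "$tries" -ge "$max_tries" ]; then
      echo "timed out waiting for ports to be released" >&2
      return 1
    fi
    tries=$((tries + 1))
    sleep "$delay"
  done
  return 0
}

# Check only the kube-apiserver's own port. Port 6080 is held by the
# kube-apiserver-insecure-readyz container, which keeps running while the
# kube-apiserver container restarts, so waiting on it can never succeed.
# (The retry count of 105 is an arbitrary value for this example.)
# wait_for_ports_released '( sport = 6443 )' 105
```

With the filter '( sport = 6443 or sport = 6080 )' the loop would time out for as long as the insecure-readyz process holds port 6080, which matches the `fuser -v 6080/tcp` output shown earlier.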

After removing the port 6080 check for the kube-apiserver container from /etc/kubernetes/manifests/kube-apiserver-pod.yaml and /etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml (7 is the latest revision), the container reaches Running:
[root@wj45ios603e-xnj76-master-0 core]# crictl ps -a | grep kube-apiserver
8d472d6325667       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab    20 minutes ago      Running             kube-apiserver-insecure-readyz                0     0008cc9b90a44
0c3c70f8dd1ad       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab    20 minutes ago      Running             kube-apiserver-cert-regeneration-controller   0     0008cc9b90a44
d0f9f3d9cf1e9       81b500f67ec4ee958cbb7760aae0b25f077a779f491641ee91e631a50f8393ab    20 minutes ago      Running             kube-apiserver-cert-syncer                    0     0008cc9b90a44
8bee9ff6b36dc       cb6f865a8becaf5e71d9837bebfacd186f0f013e4f22c12347712a118d020657    20 minutes ago      Running             kube-apiserver                                0     0008cc9b90a44

Expected Results:
A restarted kube-apiserver container does not need to wait for port 6080 to become available.


Additional info:

Comment 1 Stefan Schimanski 2020-06-05 07:40:55 UTC
*** Bug 1844288 has been marked as a duplicate of this bug. ***

Comment 2 Ke Wang 2020-06-05 08:43:03 UTC
OCP 4.5 on Red Hat OpenStack Platform 16.0 runs into this bug, which blocks the related tests.

OCP 4.5 on Google Cloud Platform also hits this; https://bugzilla.redhat.com/show_bug.cgi?id=1838421#c19 shows the same error:
 State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   ...............................................................................timed out waiting for port :6443 and :6080 to be released

A fix for this bug is required in OCP 4.5.

Comment 3 Stefan Schimanski 2020-06-05 14:55:11 UTC
Blocked by https://github.com/openshift/cluster-kube-apiserver-operator/pull/870.

Comment 8 errata-xmlrpc 2020-10-27 16:04:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

