Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1837992

Summary: A restarted kube-apiserver doesn't wait for the port to be available; crashloops
Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.5
Target Milestone: ---
Target Release: 4.5.0
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Hardware: Unspecified
OS: Unspecified
Reporter: Casey Callendrello <cdc>
Assignee: Stefan Schimanski <sttts>
QA Contact: Ke Wang <kewang>
CC: aos-bugs, dblack, mfojtik, smalleni, wking, xxia
Clones: 1851066, 1851071 (view as bug list)
Last Closed: 2020-07-13 17:40:28 UTC
Type: Bug
Bug Blocks: 1851066, 1851071, 1851915

Description Casey Callendrello 2020-05-20 10:43:23 UTC
Description of problem:
If the kube-apiserver container is restarted, the port (:6443) isn't yet available, so it crash-loops a few times until it is finally able to start.

This causes pod logs to be lost, obscuring the real reason for the restart.

An init container already handles this case when the whole pod is restarted; we should handle the same case for plain container restarts. A minimal sketch of the wait-loop approach follows.
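
For illustration only, a minimal sketch of the kind of wait loop involved, assuming ss(8) is available in the container image (the script actually shipped is quoted in the pod YAML in comment 6):

  # Sketch only: block until nothing is still bound to :6443 or :6080.
  echo -n "Waiting for port :6443 and :6080 to be released."
  while [ -n "$(ss -Htan '( sport = 6443 or sport = 6080 )')" ]; do
    echo -n "."
    sleep 1
  done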

Version-Release number of selected component (if applicable):


How reproducible: easy


Steps to Reproduce:
1. Kill the kube-apiserver container.
2. Watch the pod logs (one way to do this is sketched below).
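
One way to carry out these steps, sketched with placeholder names (crictl usage assumes an RHCOS node running CRI-O; <master-node> and <container-id> are placeholders):

$ oc debug node/<master-node>
sh-4.4# chroot /host
sh-4.4# crictl ps --name kube-apiserver    # find the running container ID
sh-4.4# crictl stop <container-id>         # kill the kube-apiserver container

Then, from another terminal, watch the logs:

$ oc logs -n openshift-kube-apiserver -f kube-apiserver-<master-node>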


I should have a fix for this shortly.

Comment 1 Jacob Tanenbaum 2020-05-20 16:32:54 UTC
*** Bug 1834908 has been marked as a duplicate of this bug. ***

Comment 2 Stefan Schimanski 2020-05-28 13:58:52 UTC
This has been in the merge queue all day, with infra issues blocking the merge.

Comment 6 Ke Wang 2020-06-01 07:33:25 UTC
Verified with OCP 4.5.0-0.nightly-2020-05-30-025738; checked the PR changes in the kube-apiserver pod:

$ kubeapiserver_pod=$(oc get pod -n openshift-kube-apiserver | grep kube-apiserver | head -1 | awk '{print $1}')

$ oc get pods -n openshift-kube-apiserver $kubeapiserver_pod -o yaml | grep -n -C8 'Waiting for port :6443 and :6080 to be released'
344-spec:
345-  containers:
346-  - args:
347-    - |-
348-      if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
349-        echo "Copying system trust bundle"
350-        cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
351-      fi
352:      echo -n "Waiting for port :6443 and :6080 to be released."
353-      tries=0
354-      while [ -n "$(ss -Htan '( sport = 6443 or sport = 6080 )')" ]; do
355-        echo -n "."
356-        sleep 1
357-        (( tries += 1 ))
358-        if [[ "${tries}" -gt 105 ]]; then
359-          echo "timed out waiting for port :6443 and :6080 to be released"
360-          exit 1
--
504-  dnsPolicy: ClusterFirst
505-  enableServiceLinks: true
506-  hostNetwork: true
507-  initContainers:
508-  - args:
509-    - |
510-      echo -n "Fixing audit permissions."
511-      chmod 0700 /var/log/kube-apiserver
512:      echo -n "Waiting for port :6443 and :6080 to be released."
513-      while [ -n "$(ss -Htan '( sport = 6443 or sport = 6080 )')" ]; do
514-        echo -n "."
515-        sleep 1
516-      done
517-    command:
518-    - /usr/bin/timeout
519-    - "105"
520-    - /bin/bash
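
Note the two variants in the YAML above: the main container's script bounds its own wait (the tries counter makes it exit 1 after 105 attempts), while the init container's wait is bounded externally by the /usr/bin/timeout 105 wrapper around /bin/bash.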


Redeployed the kube-apiserver pod and checked that it works as expected.

$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "forced test 1" } ]'

$ oc logs -n openshift-kube-apiserver $kubeapiserver_pod | grep 'Waiting for port :6443 and :6080 to be released'
Waiting for port :6443 and :6080 to be released.

From the above, we can see the kube-apiserver waited for the ports to be released without crash-looping, so moving the bug to VERIFIED.

Comment 7 W. Trevor King 2020-06-25 16:13:13 UTC
This bug was verified with a 4.5 nightly when the target was 4.5.0.  It's also attached to a 4.5 errata.  Moving the target back to 4.5.0, reversing Stefan's change from the 4th.

Comment 8 W. Trevor King 2020-06-25 16:13:46 UTC
*** Bug 1851071 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2020-07-13 17:40:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409