Bug 1837992 - A restarted kube-apiserver doesn't wait for the port to be available; crashloops
Summary: A restarted kube-apiserver doesn't wait for the port to be available; crashloops
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.5.0
Assignee: Stefan Schimanski
QA Contact: Ke Wang
URL:
Whiteboard:
Duplicates: 1834908 1851071
Depends On:
Blocks: 1851066 1851071 1851915
Reported: 2020-05-20 10:43 UTC by Casey Callendrello
Modified: 2020-08-04 14:04 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1851066 1851071
Environment:
Last Closed: 2020-07-13 17:40:28 UTC
Target Upstream Version:
Embargoed:


Links
Github openshift/cluster-kube-apiserver-operator pull 864 (closed): "Bug 1837992: wait for port 6443 to be open in the kube-apiserver container; use ss isntead of lsof" (last updated 2021-02-03 01:58:13 UTC)
Github openshift/origin pull 25002 (closed): "Bug 1837992: images/hyperkube: install iproute" (last updated 2021-02-03 01:58:14 UTC)
Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:40:56 UTC)

Description Casey Callendrello 2020-05-20 10:43:23 UTC
Description of problem:
If the kube-apiserver container is restarted, the port (:6443) has not been released yet, so the container crashloops a few times before it is finally able to start.

This causes pod logs to be lost, obscuring the real reason for the restart.

We already handle this case with an init container when the whole pod is restarted. We should handle the same case for container restarts.
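A minimal sketch of the kind of bounded wait that a container restart needs, written as a reusable bash function. It assumes `ss` from iproute is installed in the image and uses the same `ss -Htan '( sport = ... )'` filter style that appears in the verified pod spec; the function name `wait_for_ports_released`, its argument order, and the configurable polling interval are hypothetical and not taken from the operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the operator's code): poll until no TCP socket,
# in any state, uses one of the given local ports. Returns 0 once the ports
# are free, or 1 after max_tries unsuccessful polls.
# Usage: wait_for_ports_released MAX_TRIES INTERVAL_SECONDS PORT...
wait_for_ports_released() {
  local max_tries=$1 interval=$2
  shift 2

  # Build an ss filter such as "( sport = 6443 or sport = 6080 )".
  local filter="" port
  for port in "$@"; do
    if [[ -n "$filter" ]]; then
      filter+=" or "
    fi
    filter+="sport = ${port}"
  done
  filter="( ${filter} )"

  local tries=0
  # -H: no header, -t: TCP, -a: all socket states (including listening),
  # -n: numeric ports. Any output means a matching socket still exists.
  while [[ -n "$(ss -Htan "$filter")" ]]; do
    (( tries += 1 ))
    if (( tries > max_tries )); then
      echo "timed out waiting for ports $* to be released" >&2
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Assumed use at the top of a container entrypoint:
#   wait_for_ports_released 105 1 6443 6080 || exit 1
#   exec <kube-apiserver command line>
```

Because `-a` includes listening sockets as well as connections in other states, the loop is meant to keep waiting while anything still holds the port, and the retry bound turns an indefinite hang into a clear, logged failure.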

Version-Release number of selected component (if applicable):


How reproducible: easy


Steps to Reproduce:
1. Kill the kube-apiserver process.
2. Watch the pod logs.


I should have a fix for this shortly.

Comment 1 Jacob Tanenbaum 2020-05-20 16:32:54 UTC
*** Bug 1834908 has been marked as a duplicate of this bug. ***

Comment 2 Stefan Schimanski 2020-05-28 13:58:52 UTC
This has been in the merge queue all day; infra issues are blocking the merge.

Comment 6 Ke Wang 2020-06-01 07:33:25 UTC
Verified with OCP 4.5.0-0.nightly-2020-05-30-025738 by checking the PR changes in the kube-apiserver pod:

$ kubeapiserver_pod=$(oc get pod -n openshift-kube-apiserver | grep kube-apiserver | head -1 | awk '{print $1}')

$ oc get pods -n openshift-kube-apiserver $kubeapiserver_pod -o yaml | grep -n -C8 'Waiting for port :6443 and :6080 to be released'
344-spec:
345-  containers:
346-  - args:
347-    - |-
348-      if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
349-        echo "Copying system trust bundle"
350-        cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
351-      fi
352:      echo -n "Waiting for port :6443 and :6080 to be released."
353-      tries=0
354-      while [ -n "$(ss -Htan '( sport = 6443 or sport = 6080 )')" ]; do
355-        echo -n "."
356-        sleep 1
357-        (( tries += 1 ))
358-        if [[ "${tries}" -gt 105 ]]; then
359-          echo "timed out waiting for port :6443 and :6080 to be released"
360-          exit 1
--
504-  dnsPolicy: ClusterFirst
505-  enableServiceLinks: true
506-  hostNetwork: true
507-  initContainers:
508-  - args:
509-    - |
510-      echo -n "Fixing audit permissions."
511-      chmod 0700 /var/log/kube-apiserver
512:      echo -n "Waiting for port :6443 and :6080 to be released."
513-      while [ -n "$(ss -Htan '( sport = 6443 or sport = 6080 )')" ]; do
514-        echo -n "."
515-        sleep 1
516-      done
517-    command:
518-    - /usr/bin/timeout
519-    - "105"
520-    - /bin/bash
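The init container in the output above bounds its wait differently from the main container: instead of counting tries inside the loop, it runs the whole loop under coreutils `timeout` with a 105-second limit. When the limit is reached, `timeout` terminates the loop and exits with status 124, which fails the init container. The sketch below reproduces that pattern; the helper name `run_bounded_wait` and passing the probe command as a string are hypothetical conveniences so the behavior can be shown without real sockets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a "wait until the probe prints nothing" loop
# under coreutils timeout, as the init container does with
# `/usr/bin/timeout 105 /bin/bash -c ...`.
# Returns 0 if the probe output became empty in time, 124 on timeout.
run_bounded_wait() {
  local limit=$1 probe=$2
  # \$(...) is escaped so the probe runs inside the inner bash on each
  # iteration, not once when the command string is built.
  timeout "$limit" bash -c "while [ -n \"\$($probe)\" ]; do sleep 1; done"
}

# Assumed real-world probe, matching the pod spec:
#   run_bounded_wait 105 "ss -Htan '( sport = 6443 or sport = 6080 )'"
```

With a probe that keeps printing output, `run_bounded_wait 1 'echo busy'` returns 124 after about one second; with a probe that prints nothing, such as `true`, it returns 0 immediately.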


Redeployed the kube-apiserver pod and checked that it works as expected:

$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "forced test 1" } ]'

$ oc logs -n openshift-kube-apiserver $kubeapiserver_pod | grep 'Waiting for port :6443 and :6080 to be released'
Waiting for port :6443 and :6080 to be released.

From the above, we can see that the kube-apiserver waited for the port to become available without crashlooping, so moving the bug to VERIFIED.

Comment 7 W. Trevor King 2020-06-25 16:13:13 UTC
This bug was verified with a 4.5 nightly when the target was 4.5.0.  It's also attached to a 4.5 errata.  Moving the target back to 4.5.0, reversing Stefan's change from the 4th.

Comment 8 W. Trevor King 2020-06-25 16:13:46 UTC
*** Bug 1851071 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2020-07-13 17:40:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

