Description of problem:
If the kube-apiserver container is restarted, the port (:6443) isn't yet available, so it crashloops a few times until it's finally able to start. This causes pod logs to be lost, obscuring the real reason for the restart. We handle this case when the whole pod is restarted with an InitContainer. We should handle the same case for container restarts.

Version-Release number of selected component (if applicable):

How reproducible:
easy

Steps to Reproduce:
1. kill kube-apiserver
2. watch pod logs

I should have a fix for this shortly.
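For reference, a minimal sketch of the wait loop this applies to the container-restart path (the same pattern the init container already uses; the shipped manifest additionally adds a bounded retry counter, see the verification output below):

  echo -n "Waiting for port :6443 and :6080 to be released."
  # keep polling until nothing is listening on :6443 or :6080 any more
  while [ -n "$(ss -Htan '( sport = 6443 or sport = 6080 )')" ]; do
    echo -n "."
    sleep 1
  done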
*** Bug 1834908 has been marked as a duplicate of this bug. ***
This has been in the merge queue all day, with infra issues blocking the merge.
Verified with OCP 4.5.0-0.nightly-2020-05-30-025738. Checked the PR changes in the kube-apiserver pod:

$ kubeapiserver_pod=$(oc get pod -n openshift-kube-apiserver | grep kube-apiserver | head -1 | awk '{print $1}')
$ oc get pods -n openshift-kube-apiserver $kubeapiserver_pod -o yaml | grep -n -C8 'Waiting for port :6443 and :6080 to be released'
344-spec:
345-  containers:
346-  - args:
347-    - |-
348-      if [ -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt ]; then
349-        echo "Copying system trust bundle"
350-        cp -f /etc/kubernetes/static-pod-certs/configmaps/trusted-ca-bundle/ca-bundle.crt /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
351-      fi
352:      echo -n "Waiting for port :6443 and :6080 to be released."
353-      tries=0
354-      while [ -n "$(ss -Htan '( sport = 6443 or sport = 6080 )')" ]; do
355-        echo -n "."
356-        sleep 1
357-        (( tries += 1 ))
358-        if [[ "${tries}" -gt 105 ]]; then
359-          echo "timed out waiting for port :6443 and :6080 to be released"
360-          exit 1
--
504-  dnsPolicy: ClusterFirst
505-  enableServiceLinks: true
506-  hostNetwork: true
507-  initContainers:
508-  - args:
509-    - |
510-      echo -n "Fixing audit permissions."
511-      chmod 0700 /var/log/kube-apiserver
512:      echo -n "Waiting for port :6443 and :6080 to be released."
513-      while [ -n "$(ss -Htan '( sport = 6443 or sport = 6080 )')" ]; do
514-        echo -n "."
515-        sleep 1
516-      done
517-    command:
518-    - /usr/bin/timeout
519-    - "105"
520-    - /bin/bash

Redeployed the kube-apiserver pod and checked that it works as expected:

$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "forced test 1" } ]'
$ oc logs -n openshift-kube-apiserver $kubeapiserver_pod | grep 'Waiting for port :6443 and :6080 to be released'
Waiting for port :6443 and :6080 to be released.

From the output above, the kube-apiserver waited for the ports to be released without crashlooping, so moving the bug to VERIFIED.
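For completeness, the container-restart path itself can also be exercised directly. A rough sketch (the node and container IDs below are placeholders, and it assumes crictl is available on the host via oc debug):

$ oc debug node/<master-node> -- chroot /host crictl ps --name kube-apiserver
$ oc debug node/<master-node> -- chroot /host crictl stop <container-id>
$ oc logs -n openshift-kube-apiserver $kubeapiserver_pod -c kube-apiserver | head

After the kill, the restarted container's log should begin with the "Waiting for port :6443 and :6080 to be released." message instead of a bind error, and the container should not crashloop.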
This bug was verified with a 4.5 nightly when the target was 4.5.0. It's also attached to a 4.5 errata. Moving the target back to 4.5.0, reversing Stefan's change from the 4th.
*** Bug 1851071 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409