Bug 1915363

Summary: kube-scheduler not scheduling pods for certificates not renewed automatically after nodes restoration
Product: OpenShift Container Platform
Reporter: Tomáš Nožička <tnozicka>
Component: kube-scheduler
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED ERRATA
QA Contact: RamaKasturi <knarra>
Severity: high
Docs Contact:
Priority: high
Version: 4.5
CC: aarapov, aos-bugs, cpassare, dgautam, knarra, malonso, maszulik, mfojtik, mvardhan, ngirard, openshift-bugzilla-robot, tnozicka, tomek
Target Milestone: ---
Keywords: UpcomingSprint
Target Release: 4.5.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1903586
Clones: 1915366
Environment:
Last Closed: 2021-02-02 15:10:11 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1903586
Bug Blocks: 1915366

Comment 1 Tomáš Nožička 2021-01-12 14:22:19 UTC
Bugzilla is having an outage for PR links, so I'll at least put the link in the comment: https://github.com/openshift/cluster-kube-scheduler-operator/pull/317

Comment 4 RamaKasturi 2021-01-25 17:26:13 UTC
Verified with the payload below; the fix works without any issues. The steps performed to verify it are listed below.
[core@knarra45afix-8w629-control-plane-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2021-01-22-191735   True        False         365d    Error while reconciling 4.5.0-0.nightly-2021-01-22-191735: some cluster operators have not yet rolled out


Steps performed:
=================
1) Install a 4.5 cluster with a payload that contains the fix.
2) Run the following script from a master node:
https://gist.github.com/tnozicka/b1df897a905be8b6e22ab04ce5b9b90a#file-ocp-shifttime-sh
3) Once the script finishes, check for pending CSRs, approve the MCO CSRs, and then approve the node and master CSRs.
4) Wait for the cluster operators.
No pods were in the Pending state. Some pods had issues pulling images; that is because we would need to set up our own registry and mirror the payload there first, and the remaining validity of that registry's certificate has to be longer than the time skew.
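Steps 3 and 4 above can be scripted roughly as follows. This is a minimal sketch, not the procedure used in the original verification: it assumes a logged-in `oc` with cluster-admin rights, and the function names `approve_pending_csrs` and `list_pending_pods` are hypothetical. A CSR whose `status` is empty has been neither approved nor denied, which is what the go-template filters on.

```shell
# Sketch only (hypothetical helpers): assumes a logged-in `oc` with
# cluster-admin rights.

# Approve every CSR whose .status is empty, i.e. neither approved nor denied.
approve_pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' |
    while read -r name; do
      if [ -n "$name" ]; then
        oc adm certificate approve "$name"
      fi
    done
}

# Print pods in the Pending phase across all namespaces (no rows means none).
list_pending_pods() {
  oc get pods --all-namespaces --field-selector=status.phase=Pending --no-headers
}
```

`approve_pending_csrs` will typically need to run more than once: after a node's client CSR is approved, the kubelet usually submits a separate serving CSR that also needs approval. Once no new CSRs appear, an empty result from `list_pending_pods` corresponds to the "no pods in the Pending state" observation above.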

[core@knarra45afix-8w629-control-plane-0 ~]$ oc logs openshift-kube-scheduler-knarra45afix-8w629-control-plane-2 -n openshift-kube-scheduler | grep cert_rotation
I0125 19:07:59.185748       1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials
I0125 19:10:30.886833       1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials
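The `grep` above checks a single scheduler pod. A sketch that repeats the check for every scheduler static pod might look like the following. It assumes a logged-in `oc`, that the scheduler pods are the ones named `openshift-kube-scheduler-<node>` (which excludes installer pods in the same namespace), and that the container is `kube-scheduler`, as in the log URL in the reproduction below; the function name is hypothetical. The message itself appears to come from client-go's transport certificate rotation logic (`cert_rotation.go`).

```shell
# Sketch only (hypothetical helper): count "certificate rotation detected"
# log lines in each kube-scheduler static pod. The pod-name prefix filter
# and container name are assumptions taken from the pod names in this report.
check_scheduler_cert_rotation() {
  local ns=openshift-kube-scheduler
  oc get pods -n "$ns" -o name | grep '^pod/openshift-kube-scheduler-' |
    while read -r pod; do
      # grep -c prints 0 when there is no match, so count is always set.
      count=$(oc logs "$pod" -n "$ns" -c kube-scheduler | grep -c 'certificate rotation detected')
      echo "${pod#pod/}: ${count} rotation event(s)"
    done
}
```

A pod reporting zero events after the time shift would suggest that its client did not pick up the renewed credentials.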


Steps performed to reproduce the issue:
======================================
1) Install a 4.5 cluster with a payload that does not contain the fix.
[core@knarra45bfix-bqgnm-control-plane-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2021-01-21-233035   True        False         365d    Cluster version is 4.5.0-0.nightly-2021-01-21-233035
2) Run the following script from one of the master nodes:
https://gist.github.com/tnozicka/b1df897a905be8b6e22ab04ce5b9b90a#file-ocp-shifttime-sh
3) Once the script finishes, check for pending CSRs, approve the MCO CSRs, and then approve the node and master CSRs.
4) Wait for the cluster operators.
All the pods were in the Pending state, and checking any of the logs gives the error below:
[core@knarra45bfix-bqgnm-control-plane-0 ~]$ oc logs -f openshift-kube-scheduler-knarra45bfix-bqgnm-control-plane-0 -n openshift-kube-scheduler
Error from server: Get https://10.0.96.214:10250/containerLogs/openshift-kube-scheduler/openshift-kube-scheduler-knarra45bfix-bqgnm-control-plane-0/kube-scheduler?follow=true: x509: certificate signed by unknown authority


Based on the above, moving the bug to the VERIFIED state.

Comment 6 errata-xmlrpc 2021-02-02 15:10:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.30 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0231