Bug 1915363 - kube-scheduler not scheduling pods for certificates not renewed automatically after nodes restoration
Summary: kube-scheduler not scheduling pods for certificates not renewed automatically...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Tomáš Nožička
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On: 1903586
Blocks: 1915366
Reported: 2021-01-12 14:12 UTC by Tomáš Nožička
Modified: 2021-02-02 15:10 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1903586
Clones: 1915366
Environment:
Last Closed: 2021-02-02 15:10:11 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift/cluster-kube-scheduler-operator/pull/317 | 0 | None | closed | Bug 1915363: Sync new kube-scheduler-client-cert-key on recovery | 2021-02-08 18:05:07 UTC
Red Hat Product Errata | RHBA-2021:0231 | 0 | None | None | None | 2021-02-02 15:10:38 UTC

Comment 1 Tomáš Nožička 2021-01-12 14:22:19 UTC
Bugzilla is having an outage that affects PR links, so I am at least putting the link in a comment: https://github.com/openshift/cluster-kube-scheduler-operator/pull/317

Comment 4 RamaKasturi 2021-01-25 17:26:13 UTC
Verified with the payload below; the fix works without any issues. The steps performed to verify it are listed below.
[core@knarra45afix-8w629-control-plane-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2021-01-22-191735   True        False         365d    Error while reconciling 4.5.0-0.nightly-2021-01-22-191735: some cluster operators have not yet rolled out


Steps performed:
=================
1) Install a 4.5 cluster with the payload that contains the fix.
2) Run the script below from a master node:
https://gist.github.com/tnozicka/b1df897a905be8b6e22ab04ce5b9b90a#file-ocp-shifttime-sh
3) Once the script finishes, check for pending CSRs, then approve the MCO CSRs and the node and master CSRs.
4) Wait for the cluster operators.
No pods are stuck in the Pending state. Some pods have trouble pulling their images; that is because we would need to set up our own registry and mirror the payload there first, and the certificate on that registry would have to stay valid for longer than the time skew.
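
Step 3 can be scripted. What follows is only a minimal sketch, not the procedure that was used for this verification: the helper names `pending_csrs` and `approve_pending_csrs` are hypothetical, and the filter assumes that the tabular `oc get csr --no-headers` output ends with the CONDITION column.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get csr --no-headers` output on stdin and
# prints the names of CSRs whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk '$NF == "Pending" { print $1 }'
}

# Hypothetical helper: approves every currently pending CSR in one pass.
# `xargs -r` (GNU xargs) skips the approve call when nothing is pending.
approve_pending_csrs() {
  oc get csr --no-headers | pending_csrs | xargs -r oc adm certificate approve
}
```

More than one pass of `approve_pending_csrs` may be needed, because new CSRs can show up after earlier ones are approved.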

[core@knarra45afix-8w629-control-plane-0 ~]$ oc logs openshift-kube-scheduler-knarra45afix-8w629-control-plane-2 -n openshift-kube-scheduler | grep cert_rotation
I0125 19:07:59.185748       1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials
I0125 19:10:30.886833       1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials
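
The grep above can be wrapped into a small check that counts rotation events instead of printing them. This is a sketch under assumptions: `count_cert_rotations` is a hypothetical helper name, and it matches only the message text shown in the log lines above.

```shell
#!/bin/sh
# Hypothetical helper: reads kube-scheduler log lines on stdin and prints how
# many of them report a detected client certificate rotation.
# `grep -c` exits non-zero when the count is 0, so `|| true` keeps the
# function usable under `set -e`.
count_cert_rotations() {
  grep -c 'certificate rotation detected' || true
}
```

It could be fed from `oc logs <scheduler-pod> -n openshift-kube-scheduler | count_cert_rotations`; a count of 0 after the recovery would suggest the scheduler never picked up new credentials.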


Steps performed to reproduce the issue:
======================================
1) Install a 4.5 cluster with a payload that does not contain the fix.
[core@knarra45bfix-bqgnm-control-plane-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2021-01-21-233035   True        False         365d    Cluster version is 4.5.0-0.nightly-2021-01-21-233035
2) Run the script below from one of the master nodes:
https://gist.github.com/tnozicka/b1df897a905be8b6e22ab04ce5b9b90a#file-ocp-shifttime-sh
3) Once the script finishes, check for pending CSRs, then approve the MCO CSRs and the node and master CSRs.
4) Wait for the cluster operators.
All the pods are in the Pending state, and fetching the logs of any pod returns the error below.
[core@knarra45bfix-bqgnm-control-plane-0 ~]$ oc logs -f openshift-kube-scheduler-knarra45bfix-bqgnm-control-plane-0 -n openshift-kube-scheduler
Error from server: Get https://10.0.96.214:10250/containerLogs/openshift-kube-scheduler/openshift-kube-scheduler-knarra45bfix-bqgnm-control-plane-0/kube-scheduler?follow=true: x509: certificate signed by unknown authority
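
The pending-pod observation from step 4 can be checked from the command line. This is a minimal sketch, not part of the original reproduction: `pending_pods` is a hypothetical helper name, and it assumes the column order NAMESPACE, NAME, READY, STATUS printed by `oc get pods --all-namespaces --no-headers`.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get pods --all-namespaces --no-headers`
# output on stdin and prints "namespace/name" for every pod whose STATUS
# column (4th field) is exactly "Pending".
pending_pods() {
  awk '$4 == "Pending" { print $1 "/" $2 }'
}
```

It could be used as `oc get pods --all-namespaces --no-headers | pending_pods`. Matching on the STATUS column keeps pods that merely fail to pull an image, which are reported with a different status, out of the result.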


Based on the above, moving the bug to the verified state.

Comment 6 errata-xmlrpc 2021-02-02 15:10:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.30 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0231

