Bug 1881322
| Summary: | kube-scheduler not scheduling pods for certificates not renewed automatically after nodes restoration | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Christian Passarelli <cpassare> | |
| Component: | kube-scheduler | Assignee: | Maciej Szulik <maszulik> | |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.4 | CC: | aarapov, aos-bugs, dgautam, malonso, maszulik, mfojtik, mvardhan, ngirard, tomek | |
| Target Milestone: | --- | |||
| Target Release: | 4.7.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | No Doc Update | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1903586 (view as bug list) | Environment: | ||
| Last Closed: | 2021-02-24 15:18:56 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1903523, 1903586 | |||
|
Comment 10
Tomáš Nožička
2020-10-02 11:39:30 UTC
*** Bug 1876868 has been marked as a duplicate of this bug. *** *** Bug 1882083 has been marked as a duplicate of this bug. *** Hi Tomas,
I have tried reproducing the bug by moving the time ahead to 61 days but i could not reproduce the issue what is said above.
steps followed to reproduce in 4.4 cluster:
==========================================
kube-scheduler-client-cert-key was supposed to expire on Jan23, 2021 from the below
2020-12-24T12:20:56Z 2021-01-23T12:20:57Z openshift-kube-scheduler kube-scheduler-client-cert-key
1) move time a head to 61 days
current Time:
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2020-12-21-131655]$ date -u
Thu Dec 24 14:48:42 UTC 2020
Time change: [core@knarra461224-4n4cq-compute-0 ~]$ date
Tue Feb 23 14:33:40 UTC 2021
2) Approved the CSR, all co operators came back, scheduled a pod and it ran fine.
[core@knarra449-v8bmk-compute-0 ~]$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-bcp2m 1/1 Running 0 6m19s 10.131.0.22 knarra449-v8bmk-compute-2 <none> <none>
openshift-kube-scheduler-knarra449-v8bmk-control-plane-0 2/2 Running 2 61d
openshift-kube-scheduler-knarra449-v8bmk-control-plane-1 2/2 Running 0 61d
openshift-kube-scheduler-knarra449-v8bmk-control-plane-2 2/2 Running 2 61d
But do see a error message as below when run "oc get <kube-scheduler-pod> -o yaml"
lastState:
terminated:
containerID: cri-o://e450c3519f0c774cdca56a1a067365d7f513532c09c9f2a6ca64c4d5872dd3f7
exitCode: 255
finishedAt: "2021-02-23T13:19:56Z"
message: |
Get https://localhost:6443/apis/apps/v1/statefulsets?allowWatchBookmarks=true&resourceVersion=35408&timeout=6m21s&timeoutSeconds=381&watch=true: dial tcp [::1]:6443: connect: connection refused
E0223 13:19:56.150096 1 reflector.go:307] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:246: Failed to watch *v1.Pod: Get https://localhost:6443/api/v1/pods?allowWatchBookmarks=true&fieldSelector=status.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=37548&timeoutSeconds=307&watch=true: dial tcp [::1]:6443: connect: connection refused
E0223 13:19:56.151158 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.PersistentVolume: Get https://localhost:6443/api/v1/persistentvolumes?allowWatchBookmarks=true&resourceVersion=18128&timeout=9m35s&timeoutSeconds=575&watch=true: dial tcp [::1]:6443: connect: connection refused
E0223 13:19:56.152186 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.StorageClass: Get https://localhost:6443/apis/storage.k8s.io/v1/storageclasses?allowWatchBookmarks=true&resourceVersion=18133&timeout=5m29s&timeoutSeconds=329&watch=true: dial tcp [::1]:6443: connect: connection refused
E0223 13:19:56.153451 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.ReplicaSet: Get https://localhost:6443/apis/apps/v1/replicasets?allowWatchBookmarks=true&resourceVersion=36851&timeout=7m46s&timeoutSeconds=466&watch=true: dial tcp [::1]:6443: connect: connection refused
E0223 13:19:56.154451 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.Service: Get https://localhost:6443/api/v1/services?allowWatchBookmarks=true&resourceVersion=36030&timeout=6m34s&timeoutSeconds=394&watch=true: dial tcp [::1]:6443: connect: connection refused
I0223 13:19:56.844338 1 leaderelection.go:288] failed to renew lease openshift-kube-scheduler/kube-scheduler: timed out waiting for the condition
F0223 13:19:56.844475 1 server.go:257] leaderelection lost
reason: Error
startedAt: "2021-02-23T13:11:53Z"
Also seen in the logs as below:
====================================
E0223 13:17:42.849631 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.Node: unknown (get nodes)
E0223 13:17:42.849740 1 leaderelection.go:331] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: configmaps "kube-scheduler" is forbidden: User "system:kube-scheduler" cannot get resource "configmaps" in API group "" in the namespace "openshift-kube-scheduler"
E0223 13:17:42.849786 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.ReplicaSet: unknown (get replicasets.apps)
E0223 13:17:42.849812 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.PersistentVolume: unknown (get persistentvolumes)
E0223 13:17:42.849839 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.PersistentVolumeClaim: unknown (get persistentvolumeclaims)
E0223 13:17:42.849869 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.Service: unknown (get services)
E0223 13:17:42.849897 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.CSINode: unknown (get csinodes.storage.k8s.io)
E0223 13:17:42.849924 1 reflector.go:307] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:246: Failed to watch *v1.Pod: unknown (get pods)
E0223 13:17:42.849949 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.StorageClass: unknown (get storageclasses.storage.k8s.io)
E0223 13:17:42.849973 1 reflector.go:307] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.ReplicationController: unknown (get replicationcontrollers)
I0223 13:19:50.009153 1 leaderelection.go:252] successfully acquired lease openshift-kube-scheduler/kube-scheduler
I0223 14:33:13.829148 1 scheduler.go:751] pod default/pod-bcp2m is bound successfully on node "knarra449-v8bmk-compute-2", 6 nodes evaluated, 3 nodes were found feasible.
steps followed to reproduce in 4.6 cluster:
==============================================
kube-scheduler-client-cert-key was supposed to expire on Jan23, 2021 from below
2020-12-24T06:59:52Z 2021-01-23T06:59:53Z openshift-kube-scheduler kube-scheduler-client-cert-key
1) move time a head to 61 days
current Time:
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2020-12-21-131655]$ date -u
Thu Dec 24 14:53:54 UTC 2020
Time moved ahead:
[core@knarra461224-4n4cq-compute-0 ~]$ date
Tue Feb 23 14:39:06 UTC 2021
2) Approved CSR, all co operators came back, scheduled a pod and it ran fine.
[core@knarra461224-4n4cq-compute-0 ~]$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-qz8x8 1/1 Running 0 78m 10.129.2.132 knarra461224-4n4cq-compute-2 <none> <none>
[core@knarra461224-4n4cq-compute-0 ~]$ oc get pods -n openshift-kube-scheduler
NAME READY STATUS RESTARTS AGE
openshift-kube-scheduler-knarra461224-4n4cq-control-plane-0 2/2 Running 1 61d
openshift-kube-scheduler-knarra461224-4n4cq-control-plane-1 2/2 Running 0 61d
openshift-kube-scheduler-knarra461224-4n4cq-control-plane-2 2/2 Running 0 61d
But do see a error message as below when run "oc get <kube-scheduler-pod> -o yaml" command
exitCode: 1
finishedAt: "2021-02-23T11:41:09Z"
message: |
rra461224-4n4cq-compute-2" evaluatedNodes=6 feasibleNodes=1
I1224 11:40:36.411828 1 scheduler.go:597] "Successfully bound pod to node" pod="openshift-marketplace/community-operators-2k755" node="knarra461224-4n4cq-compute-2" evaluatedNodes=6 feasibleNodes=1
E0223 11:40:59.662510 1 leaderelection.go:325] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: Unauthorized
E0223 11:41:01.664787 1 leaderelection.go:325] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: Unauthorized
E0223 11:41:03.664855 1 leaderelection.go:325] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: Unauthorized
E0223 11:41:05.665244 1 leaderelection.go:325] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: Unauthorized
E0223 11:41:07.664552 1 leaderelection.go:325] error retrieving resource lock openshift-kube-scheduler/kube-scheduler: Unauthorized
E0223 11:41:08.508757 1 webhook.go:111] Failed to make webhook authenticator request: Unauthorized
E0223 11:41:08.508810 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Unauthorized]
E0223 11:41:08.508956 1 writers.go:107] apiserver was unable to write a JSON response: no kind is registered for the type v1.Status in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30"
E0223 11:41:08.508993 1 status.go:71] apiserver received an error that is not an metav1.Status: &runtime.notRegisteredErr{schemeName:"k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30", gvk:schema.GroupVersionKind{Group:"", Version:"", Kind:""}, target:runtime.GroupVersioner(nil), t:(*reflect.rtype)(0x1bb25a0)}
I0223 11:41:09.659870 1 leaderelection.go:278] failed to renew lease openshift-kube-scheduler/kube-scheduler: timed out waiting for the condition
E0223 11:41:09.659962 1 leaderelection.go:301] Failed to release lock: resource name may not be empty
F0223 11:41:09.659986 1 server.go:211] leaderelection lost
reason: Error
startedAt: "2020-12-24T07:21:45Z"
see below in the logs:
===========================
E0223 11:52:44.444938 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
E0223 11:52:47.255029 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.PodDisruptionBudget: failed to list *v1beta1.PodDisruptionBudget: Unauthorized
E0223 11:52:49.530676 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.ReplicationController: failed to list *v1.ReplicationController: Unauthorized
E0223 11:52:51.708276 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.PersistentVolumeClaim: failed to list *v1.PersistentVolumeClaim: Unauthorized
E0223 11:52:52.464690 1 webhook.go:111] Failed to make webhook authenticator request: Unauthorized
E0223 11:52:52.464868 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Unauthorized]
E0223 11:52:52.464983 1 writers.go:107] apiserver was unable to write a JSON response: no kind is registered for the type v1.Status in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30"
E0223 11:52:52.465087 1 status.go:71] apiserver received an error that is not an metav1.Status: &runtime.notRegisteredErr{schemeName:"k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30", gvk:schema.GroupVersionKind{Group:"", Version:"", Kind:""}, target:runtime.GroupVersioner(nil), t:(*reflect.rtype)(0x1bb25a0)}
E0223 11:53:05.446270 1 webhook.go:111] Failed to make webhook authenticator request: Unauthorized
E0223 11:53:05.446357 1 authentication.go:53] Unable to authenticate the request due to an error: [invalid bearer token, Unauthorized]
E0223 11:53:05.446520 1 writers.go:107] apiserver was unable to write a JSON response: no kind is registered for the type v1.Status in scheme "k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30"
E0223 11:53:05.446552 1 status.go:71] apiserver received an error that is not an metav1.Status: &runtime.notRegisteredErr{schemeName:"k8s.io/kubernetes/pkg/api/legacyscheme/scheme.go:30", gvk:schema.GroupVersionKind{Group:"", Version:"", Kind:""}, target:runtime.GroupVersioner(nil), t:(*reflect.rtype)(0x1bb25a0)}
E0223 11:53:07.776379 1 reflector.go:127] k8s.io/kubernetes/cmd/kube-scheduler/app/server.go:193: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
E0223 11:53:08.482780 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Pod: failed to list *v1.Pod: Unauthorized
E0223 11:53:09.409374 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: Unauthorized
E0223 11:53:09.674412 1 reflector.go:127] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.PersistentVolume: failed to list *v1.PersistentVolume: Unauthorized
I0223 11:53:11.409175 1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials
I0223 11:53:11.437030 1 cert_rotation.go:88] certificate rotation detected, shutting down client connections to start using new credentials
Could you please let me know how else i can reproduce the issue here ? Also if i cannot reproduce the issue, is there a way for us to validate it in 4.7 cluster?
I have also tried moving to 365 days but when checked with xxia he recommended me to use 61 days due to this https://source.redhat.com/personal_blogs/ocp4_why_ca_cert_will_be_refreshed_at_292_day_when_the_cert_have_365days_validity
discussed on slack, Rama is trying again with updated script Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633 |