Bug 1834908
| Summary: | Frequent restarts of kube-scheduler pods on baremetal deployments | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Sai Sindhur Malleni <smalleni> |
| Component: | Networking | Assignee: | Jacob Tanenbaum <jtanenba> |
| Networking sub component: | ovn-kubernetes | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | aconstan, aos-bugs, dblack, jtaleric, mfojtik, mkarg |
| Version: | 4.5 | Keywords: | Performance |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-20 16:32:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Also, we checked etcd fsync latency and it was about 2 ms, which should be acceptable. This looks like a networking issue where kube-scheduler can't reach kube-apiserver, which should be available at https://localhost:6443/. I'm sending this to Networking since that looks like a misconfiguration.

*** This bug has been marked as a duplicate of bug 1837992 ***
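For reference, a minimal way to double-check both observations from a master node might look like the sketch below. This is not taken from the report; the node name, the `oc debug` entry point, and the PromQL query are assumptions.

    # Get a shell on a master node (node name is an example), then use the
    # host binaries.
    oc debug node/master-0
    chroot /host

    # Is kube-apiserver answering on the loopback address the scheduler
    # is trying to reach?
    curl -k https://localhost:6443/healthz

    # Is the static kube-apiserver container actually running on this node?
    crictl ps | grep kube-apiserver

    # Roughly how an etcd WAL fsync latency figure ("about 2 ms") can be read
    # from Prometheus (99th percentile over a 5-minute window):
    #   histogram_quantile(0.99,
    #     rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))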
Description of problem:

This is an OCP 4.5 deployment on baremetal with OVNKubernetes as the SDN. We are seeing frequent restarts of the kube-scheduler pods.

    [kni@e19-h24-b01-fc640 etcd-perf]$ oc get pods
    NAME                                READY   STATUS      RESTARTS   AGE
    installer-2-master-0                0/1     Completed   0          3d18h
    installer-3-master-0                0/1     Completed   0          3d18h
    installer-3-master-1                0/1     Completed   0          3d18h
    installer-3-master-2                0/1     Completed   0          3d18h
    installer-4-master-0                0/1     Completed   0          3d18h
    installer-4-master-1                0/1     Completed   0          3d18h
    installer-4-master-2                0/1     Completed   0          3d18h
    installer-5-master-0                0/1     Completed   0          3d18h
    installer-5-master-1                0/1     Completed   0          3d18h
    installer-5-master-2                0/1     Completed   0          3d18h
    openshift-kube-scheduler-master-0   2/2     Running     12         3d18h
    openshift-kube-scheduler-master-1   2/2     Running     14         3d18h
    openshift-kube-scheduler-master-2   2/2     Running     9          3d18h
    revision-pruner-2-master-0          0/1     Completed   0          3d18h
    revision-pruner-3-master-0          0/1     Completed   0          3d18h
    revision-pruner-3-master-1          0/1     Completed   0          3d18h
    revision-pruner-3-master-2          0/1     Completed   0          3d18h
    revision-pruner-4-master-0          0/1     Completed   0          3d18h
    revision-pruner-4-master-1          0/1     Completed   0          3d18h
    revision-pruner-4-master-2          0/1     Completed   0          3d18h
    revision-pruner-5-master-0          0/1     Completed   0          3d18h
    revision-pruner-5-master-1          0/1     Completed   0          3d18h
    revision-pruner-5-master-2          0/1     Completed   0          3d18h

Looking at one of the pods we see:

    State:          Running
      Started:      Tue, 12 May 2020 15:40:33 +0000
    Last State:     Terminated
      Reason:       Error
      Message:      e: Get https://localhost:6443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=2912586&timeout=9m59s&timeoutSeconds=599&watch=true: dial tcp [::1]:6443: connect: connection refused
    E0512 15:40:31.811267       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1beta1.PodDisruptionBudget: Get https://localhost:6443/apis/policy/v1beta1/poddisruptionbudgets?allowWatchBookmarks=true&resourceVersion=2721641&timeout=9m43s&timeoutSeconds=583&watch=true: dial tcp [::1]:6443: connect: connection refused
    E0512 15:40:31.812242       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.PersistentVolume: Get https://localhost:6443/api/v1/persistentvolumes?allowWatchBookmarks=true&resourceVersion=2721639&timeout=8m15s&timeoutSeconds=495&watch=true: dial tcp [::1]:6443: connect: connection refused
    E0512 15:40:31.813289       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.PersistentVolumeClaim: Get https://localhost:6443/api/v1/persistentvolumeclaims?allowWatchBookmarks=true&resourceVersion=2721639&timeout=8m42s&timeoutSeconds=522&watch=true: dial tcp [::1]:6443: connect: connection refused
    E0512 15:40:31.814445       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.Service: Get https://localhost:6443/api/v1/services?allowWatchBookmarks=true&resourceVersion=2869975&timeout=9m34s&timeoutSeconds=574&watch=true: dial tcp [::1]:6443: connect: connection refused
    E0512 15:40:31.815408       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.StorageClass: Get https://localhost:6443/apis/storage.k8s.io/v1/storageclasses?allowWatchBookmarks=true&resourceVersion=2721642&timeout=5m41s&timeoutSeconds=341&watch=true: dial tcp [::1]:6443: connect: connection refused
    I0512 15:40:32.638778       1 leaderelection.go:277] failed to renew lease openshift-kube-scheduler/kube-scheduler: timed out waiting for the condition

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-05-04-113741

How reproducible:
100%

Steps to Reproduce:
1. Deploy cluster on baremetal with OVN Kubernetes
2. Run some workloads like running pods
3.

Actual results:
Kube-scheduler restarts frequently

Expected results:
Should not see restarts/connect timeouts

Additional info:
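The restart counts and the termination message quoted in the description come from standard pod inspection. A minimal sketch of how the same data can be pulled is below; the namespace is the openshift-kube-scheduler namespace named in the lease error above, while the container name passed to `-c` is an assumption.

    # Restart counts for the scheduler pods (matches the listing above).
    oc get pods -n openshift-kube-scheduler

    # Last state, termination reason, and truncated message for one pod.
    oc describe pod openshift-kube-scheduler-master-0 -n openshift-kube-scheduler

    # Logs from the previously crashed container, where the "connection
    # refused" and "failed to renew lease" errors appear (container name
    # assumed to be kube-scheduler).
    oc logs --previous -c kube-scheduler \
        openshift-kube-scheduler-master-0 -n openshift-kube-scheduler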