Description of problem:

This is an OCP 4.5 deployment on bare metal with OVNKubernetes as the SDN. We are seeing frequent restarts of the kube-scheduler pods:

[kni@e19-h24-b01-fc640 etcd-perf]$ oc get pods
NAME                                READY   STATUS      RESTARTS   AGE
installer-2-master-0                0/1     Completed   0          3d18h
installer-3-master-0                0/1     Completed   0          3d18h
installer-3-master-1                0/1     Completed   0          3d18h
installer-3-master-2                0/1     Completed   0          3d18h
installer-4-master-0                0/1     Completed   0          3d18h
installer-4-master-1                0/1     Completed   0          3d18h
installer-4-master-2                0/1     Completed   0          3d18h
installer-5-master-0                0/1     Completed   0          3d18h
installer-5-master-1                0/1     Completed   0          3d18h
installer-5-master-2                0/1     Completed   0          3d18h
openshift-kube-scheduler-master-0   2/2     Running     12         3d18h
openshift-kube-scheduler-master-1   2/2     Running     14         3d18h
openshift-kube-scheduler-master-2   2/2     Running     9          3d18h
revision-pruner-2-master-0          0/1     Completed   0          3d18h
revision-pruner-3-master-0          0/1     Completed   0          3d18h
revision-pruner-3-master-1          0/1     Completed   0          3d18h
revision-pruner-3-master-2          0/1     Completed   0          3d18h
revision-pruner-4-master-0          0/1     Completed   0          3d18h
revision-pruner-4-master-1          0/1     Completed   0          3d18h
revision-pruner-4-master-2          0/1     Completed   0          3d18h
revision-pruner-5-master-0          0/1     Completed   0          3d18h
revision-pruner-5-master-1          0/1     Completed   0          3d18h
revision-pruner-5-master-2          0/1     Completed   0          3d18h

Looking at one of the pods, we see:

    State:          Running
      Started:      Tue, 12 May 2020 15:40:33 +0000
    Last State:     Terminated
      Reason:       Error
      Message:      e: Get https://localhost:6443/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=2912586&timeout=9m59s&timeoutSeconds=599&watch=true: dial tcp [::1]:6443: connect: connection refused
E0512 15:40:31.811267       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1beta1.PodDisruptionBudget: Get https://localhost:6443/apis/policy/v1beta1/poddisruptionbudgets?allowWatchBookmarks=true&resourceVersion=2721641&timeout=9m43s&timeoutSeconds=583&watch=true: dial tcp [::1]:6443: connect: connection refused
E0512 15:40:31.812242       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.PersistentVolume: Get https://localhost:6443/api/v1/persistentvolumes?allowWatchBookmarks=true&resourceVersion=2721639&timeout=8m15s&timeoutSeconds=495&watch=true: dial tcp [::1]:6443: connect: connection refused
E0512 15:40:31.813289       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.PersistentVolumeClaim: Get https://localhost:6443/api/v1/persistentvolumeclaims?allowWatchBookmarks=true&resourceVersion=2721639&timeout=8m42s&timeoutSeconds=522&watch=true: dial tcp [::1]:6443: connect: connection refused
E0512 15:40:31.814445       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.Service: Get https://localhost:6443/api/v1/services?allowWatchBookmarks=true&resourceVersion=2869975&timeout=9m34s&timeoutSeconds=574&watch=true: dial tcp [::1]:6443: connect: connection refused
E0512 15:40:31.815408       1 reflector.go:380] k8s.io/client-go/informers/factory.go:135: Failed to watch *v1.StorageClass: Get https://localhost:6443/apis/storage.k8s.io/v1/storageclasses?allowWatchBookmarks=true&resourceVersion=2721642&timeout=5m41s&timeoutSeconds=341&watch=true: dial tcp [::1]:6443: connect: connection refused
I0512 15:40:32.638778       1 leaderelection.go:277] failed to renew lease openshift-kube-scheduler/kube-scheduler: timed out waiting for the condition

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-05-04-113741

How reproducible:
100%

Steps to Reproduce:
1. Deploy a cluster on bare metal with OVN Kubernetes
2. Run some workloads, e.g. create pods
3.

Actual results:
kube-scheduler restarts frequently

Expected results:
Should not see restarts/connection timeouts

Additional info:
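For reference, a sketch of the kind of commands used to pull the restart counts and the previous container's termination state shown above (pod and container names are from this cluster and assumed, adjust as needed):

# Restart counts for the scheduler static pods
oc -n openshift-kube-scheduler get pods

# Termination state ("Last State") of the most recently crashed instance
oc -n openshift-kube-scheduler describe pod openshift-kube-scheduler-master-0

# Full log of the previous (crashed) kube-scheduler container
oc -n openshift-kube-scheduler logs openshift-kube-scheduler-master-0 -c kube-scheduler --previous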
Also, we checked the etcd fsync latency and it was about 2 ms, which should be acceptable.
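A minimal fio sketch for measuring fdatasync latency on the disk backing /var/lib/etcd; the parameters and test directory are assumptions, not necessarily the exact etcd-perf check that was run here:

# The test directory is an assumption: it must live on the same disk/filesystem as /var/lib/etcd.
# With --fdatasync=1, fio reports fsync latency percentiles; the p99 value is the one etcd cares about.
TESTDIR=/var/lib/etcd/fio-test
mkdir -p "$TESTDIR"
fio --rw=write --ioengine=sync --fdatasync=1 --directory="$TESTDIR" --size=22m --bs=2300 --name=etcd-fsync-test
rm -rf "$TESTDIR"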
This looks like a networking issue where the kube-scheduler can't reach the kube-apiserver, which should be available at https://localhost:6443/. I'm sending this to networking since that looks like a misconfiguration.
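A quick sanity check along these lines (a sketch; the node name is assumed) would show whether anything is actually listening on localhost:6443 on the affected master while the scheduler is seeing "connection refused":

# Open a debug shell on the master and inspect the local apiserver endpoint
oc debug node/master-0 -- chroot /host /bin/bash -c '
  ss -ltnp | grep 6443                 # is anything listening on 6443?
  curl -ks https://localhost:6443/healthz; echo
  crictl ps --name kube-apiserver      # is the kube-apiserver container running at all?
'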
*** This bug has been marked as a duplicate of bug 1837992 ***