Description of problem:

While monitoring the 4.7 CI signal, and noticing that the openstack platform had degraded a bit, we noticed that the following test case is failing periodically (NOTE: it is not isolated to the openstack platform):

[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]

I cannot find any root cause in the logs yet, but here is a failed job:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.7/1349053534636609536

I can also reproduce this on a cluster-bot cluster, just by launching 4.7 on aws and running the following simple steps:

❯ sudo podman run -it -v $LOCAL_KUBECONFIG_FROM_CLUSTER_BOT:/root/.kube/config:z -i quay.io/openshift/origin-tests:4.7 /bin/bash
[root@55f251f22bc9 /]# export KUBECONFIG=/root/.kube/config
[root@55f251f22bc9 /]# time while true; do
  openshift-tests run-test "[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]"
  if [ $? -ne 0 ]; then
    break
  fi
done

That loops the single test case and stops once it fails. I was able to reproduce the failure a few times in the few hours my cluster was available. Unfortunately, the cluster aged out before I had time to collect more debug data than we get from the logs in CI.

Version-Release number of selected component (if applicable):
4.7

How reproducible:
Infrequent, but this search shows it happening a handful of times:
https://search.ci.openshift.org/?search=validates+basic+preemption+works&maxAge=168h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job

Steps to Reproduce:
See the description above.
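If someone hits this again on a cluster-bot cluster, here is a minimal sketch for snapshotting the scheduler logs as soon as the loop above breaks, before the cluster ages out. It assumes the standard openshift-kube-scheduler namespace and container name; the output paths are just examples:

```
for pod in $(oc get pods -n openshift-kube-scheduler -o name); do
  # save each scheduler pod's log next to the test output for later inspection
  oc logs -n openshift-kube-scheduler -c kube-scheduler "$pod" > "/tmp/$(basename "$pod").log"
done
# optionally grab a full must-gather as well (takes a few minutes)
oc adm must-gather --dest-dir=/tmp/must-gather
```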
I was able to poke around the master nodes after I caught a failure. Again, I lost the cluster before figuring out too much, but I did see the snippet below (which I don't find in the logs from CI, so hoping it's a little bit helpful) in this file:

pods/openshift-kube-scheduler_openshift-kube-scheduler-ip-10-0-132-85.us-west-2.compute.internal_59db6637-ae92-4357-b607-9312ce4dab44/kube-scheduler/0.log

2021-01-13T21:35:07.403973257+00:00 stderr F I0113 21:35:07.403873 1 scheduler.go:592] "Successfully bound pod to node" pod="e2e-sched-preemption-3378/pod0-sched-preemption-low-priority" node="ip-10-0-158-73.us-west-2.compute.internal" evaluatedNodes=6 feasibleNodes=1
2021-01-13T21:35:07.672966782+00:00 stderr F I0113 21:35:07.672915 1 scheduler.go:592] "Successfully bound pod to node" pod="e2e-sched-preemption-3378/pod1-sched-preemption-medium-priority" node="ip-10-0-175-169.us-west-2.compute.internal" evaluatedNodes=6 feasibleNodes=1
2021-01-13T21:35:07.884861457+00:00 stderr F I0113 21:35:07.884261 1 scheduler.go:592] "Successfully bound pod to node" pod="e2e-sched-preemption-3378/pod2-sched-preemption-medium-priority" node="ip-10-0-202-167.us-west-2.compute.internal" evaluatedNodes=6 feasibleNodes=1
2021-01-13T21:35:14.378479856+00:00 stderr F I0113 21:35:14.378416 1 factory.go:321] "Unable to schedule pod; no fit; waiting" pod="e2e-sched-preemption-3378/preemptor-pod" err="0/6 nodes are available: 3 Insufficient scheduling.k8s.io/foo, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate."
2021-01-13T21:35:14.387474415+00:00 stderr F I0113 21:35:14.387419 1 factory.go:321] "Unable to schedule pod; no fit; waiting" pod="e2e-sched-preemption-3378/preemptor-pod" err="0/6 nodes are available: 3 Insufficient scheduling.k8s.io/foo, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate."
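To pull the interesting lines out of a log like that one (directly on the node or from a saved copy), a grep on the test namespace and pod name is enough. The path below is the one quoted above; the e2e namespace suffix changes on every run:

```
LOG=pods/openshift-kube-scheduler_openshift-kube-scheduler-ip-10-0-132-85.us-west-2.compute.internal_59db6637-ae92-4357-b607-9312ce4dab44/kube-scheduler/0.log
# every scheduler decision for the preemption test namespace, in timestamp order
grep 'e2e-sched-preemption-' "$LOG"
# just the preemptor pod's no-fit retries and the eventual bind failure
grep 'preemptor-pod' "$LOG" | grep -E 'Unable to schedule|Failed running Bind|Successfully bound'
```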
2021-01-13T21:36:17.579211762+00:00 stderr F E0113 21:36:17.579164 1 framework.go:777] "Failed running Bind plugin" err="pods \"preemptor-pod\" is forbidden: unable to create new content in namespace e2e-sched-preemption-3378 because it is being terminated" plugin="DefaultBinder" pod="e2e-sched-preemption-3378/preemptor-pod"
2021-01-13T21:36:17.579351098+00:00 stderr F I0113 21:36:17.579310 1 scheduler.go:420] Failed to bind pod: e2e-sched-preemption-3378/preemptor-pod
2021-01-13T21:36:17.579393569+00:00 stderr F E0113 21:36:17.579377 1 factory.go:337] "Error scheduling pod; retrying" err="binding rejected: running Bind plugin \"DefaultBinder\": pods \"preemptor-pod\" is forbidden: unable to create new content in namespace e2e-sched-preemption-3378 because it is being terminated" pod="e2e-sched-preemption-3378/preemptor-pod"
2021-01-13T21:36:20.361326967+00:00 stderr F E0113 21:36:20.361207 1 event_broadcaster.go:253] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"preemptor-pod.1659e851eed18ccf", GenerateName:"", Namespace:"e2e-sched-preemption-3378", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, EventTime:v1.MicroTime{Time:time.Time{wall:0xbff7f7d5155c2ae8, ext:16359491835897, loc:(*time.Location)(0x2ceed00)}}, Series:(*v1.EventSeries)(nil), ReportingController:"default-scheduler", ReportingInstance:"default-scheduler-ip-10-0-132-85", Action:"Scheduling", Reason:"FailedScheduling", Regarding:v1.ObjectReference{Kind:"Pod", Namespace:"e2e-sched-preemption-3378", Name:"preemptor-pod", UID:"fad0c735-b398-4dfc-a2d5-8774b1db4474", APIVersion:"v1", ResourceVersion:"127277", FieldPath:""}, Related:(*v1.ObjectReference)(nil), Note:"skip schedule deleting pod: e2e-sched-preemption-3378/preemptor-pod", Type:"Warning", DeprecatedSource:v1.EventSource{Component:"", Host:""}, DeprecatedFirstTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeprecatedLastTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeprecatedCount:0}': 'events.events.k8s.io "preemptor-pod.1659e851eed18ccf" is forbidden: unable to create new content in namespace e2e-sched-preemption-3378 because it is being terminated' (will not retry!)

These are logs from the test case iteration that failed. From the timestamps you can see roughly a minute of messages about 0 nodes being available, then finally the 'Error scheduling pod' message, which I believe appears because the test case has given up and is tearing things down: it deletes the namespace, the scheduler notices that on its next retry to find a node to schedule on, and gives up.
Checking the recent failures, every test times out after 1 minute of waiting for the preemptor pod to be scheduled, with similar kube-scheduler logs like these:

```
I0115 04:44:43.912012 1 factory.go:321] "Unable to schedule pod; no fit; waiting" pod="e2e-sched-preemption-930/preemptor-pod" err="0/6 nodes are available: 3 Insufficient scheduling.k8s.io/foo, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate."
I0115 04:44:43.920822 1 factory.go:321] "Unable to schedule pod; no fit; waiting" pod="e2e-sched-preemption-930/preemptor-pod" err="0/6 nodes are available: 3 Insufficient scheduling.k8s.io/foo, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate."
...
E0115 04:45:49.610424 1 framework.go:777] "Failed running Bind plugin" err="pods \"preemptor-pod\" is forbidden: unable to create new content in namespace e2e-sched-preemption-930 because it is being terminated" plugin="DefaultBinder" pod="e2e-sched-preemption-930/preemptor-pod"
I0115 04:45:49.610463 1 scheduler.go:420] Failed to bind pod: e2e-sched-preemption-930/preemptor-pod
```

- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.7/1350446580682985472
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.7/1350355743152279552
- https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-serial-4.7/1349924268133060608

It seems it takes slightly over a minute to bind the preemptor-pod, and then some more time to actually run it. In some rare cases the preemptor pod does get scheduled (https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.7/1349872296105349120):

```
I0115 02:23:50.826264 1 factory.go:321] "Unable to schedule pod; no fit; waiting" pod="e2e-sched-preemption-3163/preemptor-pod" err="0/6 nodes are available: 3 Insufficient scheduling.k8s.io/foo, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate."
I0115 02:23:50.841743 1 factory.go:321] "Unable to schedule pod; no fit; waiting" pod="e2e-sched-preemption-3163/preemptor-pod" err="0/6 nodes are available: 3 Insufficient scheduling.k8s.io/foo, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate."
I0115 02:24:49.538649 1 scheduler.go:592] "Successfully bound pod to node" pod="e2e-sched-preemption-3163/preemptor-pod" node="ip-10-0-146-177.us-west-1.compute.internal" evaluatedNodes=6 feasibleNodes=1
```

Though even then it's pretty tight, so the test still does not wait long enough for the pod to get running.
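To put a number on "slightly over a minute", here is a quick sketch against a scheduler log saved from one of those jobs. The file name is just an example, and it assumes the klog format shown above, where the wall-clock time is the second whitespace-separated field:

```
LOG=kube-scheduler.log   # example name for a scheduler log saved from the job artifacts
first=$(grep 'preemptor-pod' "$LOG" | grep -m1 'Unable to schedule' | awk '{print $2}' | cut -d. -f1)
bound=$(grep 'preemptor-pod' "$LOG" | grep -m1 -E 'Successfully bound|Failed running Bind' | awk '{print $2}' | cut -d. -f1)
# GNU date turns HH:MM:SS into seconds-since-epoch for today; good enough for a same-day delta
echo "preemptor-pod waited $(( $(date -d "$bound" +%s) - $(date -d "$first" +%s) ))s from first no-fit to bind attempt"
```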
I don't really see this happening in 4.6, but it is there in 4.7 and 4.8, which makes me think it's potentially a real regression in how long it takes to bring up a pod. Also, there is this bz (https://bugzilla.redhat.com/show_bug.cgi?id=1915085) that calls out a similar thing and could explain this problem.

Not that increasing the timeout from 1m to 2m would be wrong, because if a pod was really taking 5m to come up we'd still be failing these test cases.

Just FYI.
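If the pod bring-up regression theory is worth checking directly, a rough sanity check is to time how long a trivial pod takes to go Ready on an otherwise quiet 4.6 vs. 4.7 cluster. The namespace, pod name, and image here are arbitrary examples, not taken from this bug:

```
# arbitrary example names/image; compare the timing between a 4.6 and a 4.7 cluster
oc create namespace startup-check
time (
  oc run startup-probe -n startup-check --image=registry.access.redhat.com/ubi8/ubi-minimal --restart=Never -- sleep 300
  oc wait -n startup-check --for=condition=Ready pod/startup-probe --timeout=5m
)
oc delete namespace startup-check
```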
(In reply to jamo luhrsen from comment #3)
> I don't see this really happening in 4.6, but it is there in 4.7 and 4.8
> which makes me think it's
> potentially a real regression in performance to bring up a pod. Also, there
> is this bz (https://bugzilla.redhat.com/show_bug.cgi?id=1915085)
> sort of calling out a similar thing that could explain this problem.

btw, the upstream issue (https://github.com/kubernetes/kubernetes/issues/98142) noted in that bz does have some chatter and a proposed hack/fix that could help. Maybe we can wait for that issue to be resolved before increasing the test timeout?

> Not that increasing the timeout from 1m to 2m would be wrong, because if a
> pod was really taking
> 5m to come up we'd still be failing these test cases.
>
> just fyi
The upstream PR has already merged in o/k, so the change would land in the next rebase anyway. Thanks for pointing out the other bug. It seems like it would be better to have a separate test in the kubelet test suite to measure kubelet/runc performance.
From the link below, we can see the test has not failed for about 7 days, so I am moving the bug to the verified state.

https://search.ci.openshift.org/?search=validates+basic+preemption+works&maxAge=168h&context=1&type=junit&name=&maxMatches=5&maxBytes=20971520&groupBy=job
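For a manual spot check on top of the CI search, the same invocation from the description can be run once (or looped) against a 4.7 cluster from the origin-tests container:

```
openshift-tests run-test "[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]"
echo "exit code: $?"
```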
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633