Bunch of sig-scheduling conformance tests failing in a single node cluster installation: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1364955400901758976

```
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s] 2m5s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/predicates.go:942]: Unexpected error:
    <*errors.errorString | 0xc0002d49b0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[sig-scheduling] SchedulerPredicates [Serial] validates that there is no conflict between pods with same hostPort but different hostIP and protocol [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s] 1m0s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/predicates.go:942]: Unexpected error:
    <*errors.errorString | 0xc0002949a0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s] 1m1s
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Feb 25 16:39:56.245: We need at least two pods to be created but all nodes are already heavily utilized, so preemption tests cannot be run

[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s] 1m5s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/predicates.go:942]: Unexpected error:
    <*errors.errorString | 0xc0002969a0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s] 2m5s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/predicates.go:942]: Unexpected error:
    <*errors.errorString | 0xc0002d49b0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s] 1m1s
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Feb 25 16:52:20.968: We need at least two pods to be created but all nodes are already heavily utilized, so preemption tests cannot be run
```

We need to decide which tests can be skipped and which can be redesigned to test the basic scheduling capabilities of a single node, such as insufficient resources, priorities & preemption, and filters (e.g. taints).
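To illustrate why filter tests (unlike topology-spread tests) can remain meaningful on a single node, here is a toy sketch; the `taint` type and `tolerates` function are made up for this example and are not the real scheduler API:

```go
package main

import "fmt"

// taint is a simplified taint/toleration pair (key and value only).
type taint struct{ key, value string }

// tolerates reports whether a pod tolerating `tolerations` can land on a
// node carrying `nodeTaints`: every node taint must be matched by some
// toleration. This check needs only one node to be exercised.
func tolerates(tolerations, nodeTaints []taint) bool {
	for _, nt := range nodeTaints {
		matched := false
		for _, t := range tolerations {
			if t.key == nt.key && t.value == nt.value {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

func main() {
	node := []taint{{"dedicated", "infra"}}
	fmt.Println(tolerates(nil, node))                             // false: no toleration
	fmt.Println(tolerates([]taint{{"dedicated", "infra"}}, node)) // true: matching toleration
}
```

The same single-node reasoning applies to other filters such as insufficient resources.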
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s]
- requires 2 nodes by default. PodTopologySpread makes sense in scenarios with two or more nodes: the feature places pods with the aim of minimizing skew between topology domains, which is not applicable for a single domain.

[sig-scheduling] SchedulerPredicates [Serial] validates that there is no conflict between pods with same hostPort but different hostIP and protocol [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
```
Feb 25 16:22:30.656: INFO: At 2021-02-25 16:21:36 +0000 UTC - event for without-label: {kubelet ip-10-0-183-130.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_without-label_e2e-sched-pred-8365_bbec2192-36d7-44ac-abd8-069723f8c565_0(35c716f7bae318cb47da4ac2fb917dca29cbe5750d5a78da76c69e1c7df0cfd2): [e2e-sched-pred-8365/without-label:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'failed to find netid for namespace: e2e-sched-pred-8365, netnamespaces.network.openshift.io "e2e-sched-pred-8365" not found
```
Possibly a flake.

[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- the test does not require two nodes, though it creates two pods, each consuming 2/3 of the "scheduling.k8s.io/foo" extended resource, which in total consume 4/3
- the idea is to have two pods (one low priority, the other high priority); once a third pod gets scheduled, the test makes sure only the low priority pod is preempted and the high priority pod is never preempted. We might update the test to have the same number of priority pods as there are nodes. Though, we need at least two pods so we can check that only the low priority pod is always preempted. We might have each node run two pods: the first node will run a low and a high priority pod (each eating 2/5 of the extended resource), and all other nodes will run just high priority pods
- 2/5 + 2/5 will consume 4/5, leaving no resources for the third (preemptor) pod
- the third pod will then always have to preempt the low priority pod, while still keeping the original intention of the test

[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s]
- requires 2 nodes by default. PodTopologySpread makes sense in scenarios with two or more nodes: the feature places pods with the aim of minimizing skew between topology domains, which is not applicable for a single domain.

[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
- priorities require at least two nodes to get evaluated

[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- the same as in the case of "[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works", i.e.
creating two pods instead of one, each consuming 2/5 of the extended resource.

Summarized:

- skip:
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]

- redesign:
[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]

- flaking:
[sig-scheduling] SchedulerPredicates [Serial] validates that there is no conflict between pods with same hostPort but different hostIP and protocol [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]

Checking other jobs:
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1366431373765644288
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1366397921024544768
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1366370466431766528
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1366339159647588352

Only 5 sig-scheduling tests (to skip and to redesign) are failing:
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
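The extended-resource arithmetic behind the proposed redesign can be sanity-checked with a short sketch (integer units of fifths are an assumption of this example, not part of the actual test code):

```go
package main

import "fmt"

func main() {
	// Model the "scheduling.k8s.io/foo" extended resource in fifths:
	// a node's capacity is 5/5, and each pod requests 2/5.
	const capacity = 5
	low, high := 2, 2 // low- and high-priority pods on the first node

	free := capacity - low - high
	fmt.Println(free) // 1: only 1/5 of the resource is left

	preemptor := 2 // the third (preemptor) pod also requests 2/5

	// The preemptor cannot fit in the free capacity alone...
	fmt.Println(preemptor > free) // true: preemption is required

	// ...but evicting only the low-priority pod frees enough room,
	// so the test can still assert the high-priority pod survives.
	fmt.Println(preemptor <= free+low) // true
}
```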
Upstream PR: https://github.com/kubernetes/kubernetes/pull/100128
Waiting for upstream review
Waiting for https://github.com/openshift/origin/pull/26054 to land
https://github.com/openshift/origin/pull/26054 merged. Re-running the tests again, the following sig-scheduling tests are failing now:
- [sig-scheduling] SchedulerPredicates [Serial] validates that NodeSelector is respected if not matching [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- [sig-scheduling] SchedulerPredicates [Serial] validates that NodeAffinity is respected if not matching [Suite:openshift/conformance/serial] [Suite:k8s]
- [sig-scheduling] SchedulerPredicates [Serial] validates resource limits of pods that are allowed to run [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- [sig-scheduling] SchedulerPredicates [Serial] validates pod overhead is considered along with resource limits of pods that are allowed to run verify pod overhead is accounted for [Suite:openshift/conformance/serial] [Suite:k8s]

All due to (the image-registry pod name suffix may differ):
```
May 10 13:35:36.601: INFO: Timed out waiting for the following pods to schedule
May 10 13:35:36.601: INFO: openshift-image-registry/image-registry-746897d64f-stgls
May 10 13:35:36.601: FAIL: Timed out after 10m0s waiting for stable cluster.
```
The kube-scheduler logs say:
```
I0510 14:50:03.461339       1 factory.go:338] "Unable to schedule pod; no fit; waiting" pod="openshift-image-registry/image-registry-746897d64f-stgls" err="0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't match pod anti-affinity rules."
```

From image-registry-746897d64f-stgls's manifest:
```
"spec": {
    "affinity": {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {
                        "matchLabels": {
                            "docker-registry": "default"
                        }
                    },
                    "namespaces": [
                        "openshift-image-registry"
                    ],
                    "topologyKey": "kubernetes.io/hostname"
                }
            ]
        }
    },
```
Checking https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/17822/rehearse-17822-pull-ci-openshift-origin-master-e2e-aws-single-node-serial/1391727856265990144/artifacts/e2e-aws-single-node-serial/gather-extra/artifacts/pods.json, there are two instances of the image-registry-746897d64f pod, which is why the second instance of the pod can't be scheduled.
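A toy reconstruction (assumed names, not the real scheduler code) of why the manifest above blocks the second replica: required pod anti-affinity on topologyKey kubernetes.io/hostname forbids two pods matching docker-registry=default on the same hostname, and a single-node cluster has exactly one hostname domain.

```go
package main

import "fmt"

// fitsOnNode reports whether a pod carrying a required anti-affinity term
// with the given label selector can be placed on a node (one topology
// domain), given the labels of the pods already running there.
func fitsOnNode(existing []map[string]string, selector map[string]string) bool {
	for _, labels := range existing {
		match := true
		for k, v := range selector {
			if labels[k] != v {
				match = false
				break
			}
		}
		if match {
			return false // a matching pod already occupies this domain
		}
	}
	return true
}

func main() {
	selector := map[string]string{"docker-registry": "default"}
	var node []map[string]string // the only node, initially empty

	fmt.Println(fitsOnNode(node, selector)) // true: first replica schedules

	node = append(node, selector)           // first replica now runs there
	fmt.Println(fitsOnNode(node, selector)) // false: second replica cannot fit
}
```

With two or more nodes the second replica would simply land in another hostname domain, which is why this only surfaces in the single-node jobs.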
Based on the previous comment, none of the original tests:
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
are failing. Moving to MODIFIED.
Verified all cases mentioned in comment 9 via link [1] and saw that they have had no failures in the past 48 hours, so moving the bug to the verified state.

[1] https://search.ci.openshift.org/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438