Bug 1934085 - Scheduling conformance tests failing in a single node cluster
Summary: Scheduling conformance tests failing in a single node cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-02 13:38 UTC by Jan Chaloupka
Modified: 2021-07-27 22:51 UTC
CC List: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:49:00 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github kubernetes kubernetes pull 100128 0 None closed [sig-scheduling] SchedulerPreemption|SchedulerPredicates|SchedulerPriorities: adjust some e2e tests to run in a single n... 2021-04-14 06:48:14 UTC
Github openshift kubernetes pull 665 0 None closed bug 1934085: UPSTREAM: 100128: [sig-scheduling] SchedulerPreemption|SchedulerPredicates|SchedulerPriorities: adjust some... 2021-04-19 08:07:37 UTC
Github openshift origin pull 26054 0 None open Bug 1949050: bump(k8s.io/*): 1.21 2021-04-19 12:49:22 UTC
Red Hat Product Errata RHSA-2021:2438 0 None None None 2021-07-27 22:51:26 UTC

Description Jan Chaloupka 2021-03-02 13:38:59 UTC
A bunch of sig-scheduling conformance tests are failing in a single node cluster installation:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1364955400901758976

```
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s] expand_less	2m5s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/predicates.go:942]: Unexpected error:
    <*errors.errorString | 0xc0002d49b0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[sig-scheduling] SchedulerPredicates [Serial] validates that there is no conflict between pods with same hostPort but different hostIP and protocol [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s] expand_less	1m0s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/predicates.go:942]: Unexpected error:
    <*errors.errorString | 0xc0002949a0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s] expand_less	1m1s
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Feb 25 16:39:56.245: We need at least two pods to be created butall nodes are already heavily utilized, so preemption tests cannot be run

[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s] expand_less	1m5s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/predicates.go:942]: Unexpected error:
    <*errors.errorString | 0xc0002969a0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s] expand_less	2m5s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/predicates.go:942]: Unexpected error:
    <*errors.errorString | 0xc0002d49b0>: {
        s: "timed out waiting for the condition",
    }
    timed out waiting for the condition
occurred

[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s] expand_less	1m1s
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Feb 25 16:52:20.968: We need at least two pods to be created butall nodes are already heavily utilized, so preemption tests cannot be run
```

We need to decide which tests can be skipped and which can be redesigned to test the basic scheduling capabilities of a single node, such as insufficient resources, priorities & preemption, and filters (e.g. taints).
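
As an illustration of the "skip" option, the following is a minimal sketch (not the actual e2e framework code) that counts schedulable nodes with client-go; a test that needs multiple topology domains would bail out when the count is below two. The kubeconfig loading and the printed message are assumptions for the example:

```
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// schedulableNodeCount returns the number of nodes not marked unschedulable.
// A test that only makes sense with multiple topology domains (e.g. the
// PodTopologySpread ones) could skip itself when this returns less than 2.
func schedulableNodeCount(ctx context.Context, cs kubernetes.Interface) (int, error) {
	nodes, err := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return 0, err
	}
	count := 0
	for _, n := range nodes.Items {
		if !n.Spec.Unschedulable {
			count++
		}
	}
	return count, nil
}

func main() {
	// Build a clientset from the default kubeconfig location (assumption for
	// the example; the e2e framework wires up its own client).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	if n, err := schedulableNodeCount(context.Background(), cs); err == nil && n < 2 {
		fmt.Println("single-node cluster: multi-node scheduling tests would be skipped")
	}
}
```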

Comment 3 Jan Chaloupka 2021-03-11 11:06:39 UTC
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s]
- requires 2 nodes by default
PodTopologySpread only makes sense in scenarios with two or more nodes. The feature places pods with the aim of minimizing skew between topology domains, which is not applicable with a single domain.
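
For context, the constraint these tests exercise looks roughly like the sketch below, built with the k8s.io/api types; the pod name, labels, and image are illustrative rather than taken from the test. With a single node there is only one kubernetes.io/hostname domain, so there is no skew to minimize:

```
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A pod spread across nodes by hostname. With MaxSkew=1 and
	// WhenUnsatisfiable=DoNotSchedule the scheduler only has something to
	// balance when there are at least two distinct kubernetes.io/hostname
	// topology domains, i.e. at least two nodes.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "spread-example", // illustrative name, not from the test
			Labels: map[string]string{"app": "spread-example"},
		},
		Spec: corev1.PodSpec{
			TopologySpreadConstraints: []corev1.TopologySpreadConstraint{{
				MaxSkew:           1,
				TopologyKey:       "kubernetes.io/hostname",
				WhenUnsatisfiable: corev1.DoNotSchedule,
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"app": "spread-example"},
				},
			}},
			Containers: []corev1.Container{
				{Name: "pause", Image: "k8s.gcr.io/pause:3.2"},
			},
		},
	}
	fmt.Printf("%+v\n", pod.Spec.TopologySpreadConstraints)
}
```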


[sig-scheduling] SchedulerPredicates [Serial] validates that there is no conflict between pods with same hostPort but different hostIP and protocol [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
```
Feb 25 16:22:30.656: INFO: At 2021-02-25 16:21:36 +0000 UTC - event for without-label: {kubelet ip-10-0-183-130.ec2.internal} FailedCreatePodSandBox: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_without-label_e2e-sched-pred-8365_bbec2192-36d7-44ac-abd8-069723f8c565_0(35c716f7bae318cb47da4ac2fb917dca29cbe5750d5a78da76c69e1c7df0cfd2): [e2e-sched-pred-8365/without-label:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'failed to find netid for namespace: e2e-sched-pred-8365, netnamespaces.network.openshift.io "e2e-sched-pred-8365" not found
```
Possibly a flake


[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- the test does not require two nodes, though it creates two pods, each consuming 2/3 of the "scheduling.k8s.io/foo" extended resource, which in total consumes 4/3
- the idea is to have two pods (one low priority, the other high priority); once a third pod gets scheduled, the test makes sure only the low priority pod is preempted and the high priority pod is never preempted.
We might update the condition to have the same number of priority pods as there are nodes. Though, we still need at least two pods so we can check that only the low priority pod is always preempted. We might have each node run two pods: the first node will run a low and a high priority pod (each consuming 2/5 of the extended resource) and all other nodes will run just high priority pods (see the sketch after this list):
- 2/5 + 2/5 will consume 4/5, leaving no resources for the third (preemptor) pod
- the third pod will then always have to preempt the low priority pod while still keeping the original intention of the test
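
A minimal sketch of that per-node pod pair, using the k8s.io/api types; the priority class names, the pause image, and the assumption that the node advertises 5 units of scheduling.k8s.io/foo are illustrative, not taken from the actual test:

```
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// extendedPod builds a pod that requests `units` of the extended resource used
// by the preemption tests and runs under the given priority class. Extended
// resources require limits == requests.
func extendedPod(name, priorityClass string, units int64) corev1.Pod {
	qty := *resource.NewQuantity(units, resource.DecimalSI)
	return corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			PriorityClassName: priorityClass, // e.g. "low-priority" / "high-priority" (assumed names)
			Containers: []corev1.Container{{
				Name:  "pause",
				Image: "k8s.gcr.io/pause:3.2",
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{"scheduling.k8s.io/foo": qty},
					Limits:   corev1.ResourceList{"scheduling.k8s.io/foo": qty},
				},
			}},
		},
	}
}

func main() {
	// Assuming the node advertises 5 units of scheduling.k8s.io/foo, the victim
	// and the survivor together consume 4/5 of it, so the preemptor below cannot
	// fit without evicting one of them, and only the low priority victim should
	// ever be chosen.
	victim := extendedPod("low-prio-victim", "low-priority", 2)
	survivor := extendedPod("high-prio-survivor", "high-priority", 2)
	preemptor := extendedPod("preemptor", "high-priority", 2)
	fmt.Println(victim.Name, survivor.Name, preemptor.Name)
}
```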


[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s]
- requires 2 nodes by default
PodTopologySpread only makes sense in scenarios with two or more nodes. The feature places pods with the aim of minimizing skew between topology domains, which is not applicable with a single domain.


[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
- priorities require at least two nodes to get evaluated


[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- the same as in the case of "[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works", i.e. create two pods instead of one, each consuming 2/5 of the extended resource


Summarized:
- skip: 
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
- redesign:
[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- flaking:
[sig-scheduling] SchedulerPredicates [Serial] validates that there is no conflict between pods with same hostPort but different hostIP and protocol [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]

Checking other jobs:
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1366431373765644288
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1366397921024544768
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1366370466431766528
- https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/16290/rehearse-16290-pull-ci-openshift-machine-config-operator-master-e2e-aws-single-node-serial/1366339159647588352

Only 5 sig-scheduling tests (to skip and to redesign) are failing:
[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]

Comment 4 Jan Chaloupka 2021-03-11 11:46:13 UTC
Upstream PR: https://github.com/kubernetes/kubernetes/pull/100128

Comment 5 Jan Chaloupka 2021-03-18 14:08:59 UTC
Waiting for upstream review

Comment 7 Jan Chaloupka 2021-04-30 12:04:37 UTC
Waiting for https://github.com/openshift/origin/pull/26054 to land

Comment 8 Jan Chaloupka 2021-05-12 08:54:06 UTC
https://github.com/openshift/origin/pull/26054 merged.

After re-running the tests, the following sig-scheduling tests are now failing:
- [sig-scheduling] SchedulerPredicates [Serial] validates that NodeSelector is respected if not matching [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- [sig-scheduling] SchedulerPredicates [Serial] validates that NodeAffinity is respected if not matching [Suite:openshift/conformance/serial] [Suite:k8s]
- [sig-scheduling] SchedulerPredicates [Serial] validates resource limits of pods that are allowed to run [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
- [sig-scheduling] SchedulerPredicates [Serial] validates pod overhead is considered along with resource limits of pods that are allowed to run verify pod overhead is accounted for [Suite:openshift/conformance/serial] [Suite:k8s]

All due to the following (the image-registry pod name suffix may be different):

```
May 10 13:35:36.601: INFO: Timed out waiting for the following pods to schedule
May 10 13:35:36.601: INFO: openshift-image-registry/image-registry-746897d64f-stgls
May 10 13:35:36.601: FAIL: Timed out after 10m0s waiting for stable cluster.
```

The kube-scheduler log says:

```
I0510 14:50:03.461339       1 factory.go:338] "Unable to schedule pod; no fit; waiting" pod="openshift-image-registry/image-registry-746897d64f-stgls" err="0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't match pod anti-affinity rules."
```

From image-registry-746897d64f-stgls's manifest:

```
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {
                                    "matchLabels": {
                                        "docker-registry": "default"
                                    }
                                },
                                "namespaces": [
                                    "openshift-image-registry"
                                ],
                                "topologyKey": "kubernetes.io/hostname"
                            }
                        ]
                    }
                },
```

Checking https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/17822/rehearse-17822-pull-ci-openshift-origin-master-e2e-aws-single-node-serial/1391727856265990144/artifacts/e2e-aws-single-node-serial/gather-extra/artifacts/pods.json, there are two instances of the image-registry-746897d64f pod. Given the required pod anti-affinity on kubernetes.io/hostname above, that is why the second instance of the pod can't be scheduled on a single node.
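
For debugging this by hand, a small client-go sketch like the following (not part of the test suite; the default kubeconfig location is assumed) lists the registry pods that are still Pending and prints the scheduler's PodScheduled condition message, which should match the anti-affinity error above:

```
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	// Print the scheduler's reason for every registry pod that is still Pending.
	pods, err := cs.CoreV1().Pods("openshift-image-registry").List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		if p.Status.Phase != corev1.PodPending {
			continue
		}
		for _, c := range p.Status.Conditions {
			if c.Type == corev1.PodScheduled && c.Status == corev1.ConditionFalse {
				fmt.Printf("%s: %s\n", p.Name, c.Message)
			}
		}
	}
}
```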

Comment 9 Jan Chaloupka 2021-05-14 12:59:39 UTC
Based on the previous comment, none of the original tests from

[sig-scheduling] SchedulerPreemption [Serial] PodTopologySpread Preemption validates proper pods are preempted [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPredicates [Serial] PodTopologySpread Filtering validates 4 pods with MaxSkew=1 are evenly distributed into 2 nodes [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates basic preemption works [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]
[sig-scheduling] SchedulerPreemption [Serial] validates lower priority pod preemption by critical pod [Conformance] [Suite:openshift/conformance/serial/minimal] [Suite:k8s]

are failing. Moving to MODIFIED.

Comment 11 RamaKasturi 2021-05-18 07:24:29 UTC
Verified all the cases mentioned in comment 9 via link [1] and see that they do not have any failures in the past 48 hours, so moving the bug to the verified state.

[1] https://search.ci.openshift.org/

Comment 14 errata-xmlrpc 2021-07-27 22:49:00 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

