Bug 2080471 - podAntiAffinity not respected during races (as seen in test: [sig-scheduling][Early] The openshift-etcd pods should be scheduled on different nodes [Suite:openshift/conformance/parallel])
Summary: podAntiAffinity not respected during races (as seen in test: [sig-scheduling][Early] The openshift-etcd pods should be scheduled on different nodes [Suite:openshift/conformance/parallel])
Keywords:
Status: CLOSED DUPLICATE of bug 2062459
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Maciej Szulik
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-29 18:04 UTC by Andreas Karis
Modified: 2022-05-04 12:43 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-04 12:43:04 UTC
Target Upstream Version:
Embargoed:



Description Andreas Karis 2022-04-29 18:04:47 UTC
Description of problem:

We hit this issue here in CI:
~~~
{  fail [github.com/openshift/origin/test/extended/scheduling/pods.go:151]: Apr 11 06:38:49.338: ns/openshift-etcd pod etcd-quorum-guard-7cbbc8db97-d9bv8 and pod etcd-quorum-guard-7cbbc8db97-5pfvx are running on the same node: ip-10-0-183-222.us-east-2.compute.internal}
~~~

Here's the situation from the must-gather:
~~~
[akaris@linux sdn2958]$ omg get pods -A -o wide | grep etcd
openshift-etcd                                    etcd-ip-10-0-140-126.us-east-2.compute.internal                            4/4    Running    0         55m    10.0.140.126  ip-10-0-140-126.us-east-2.compute.internal
openshift-etcd                                    etcd-ip-10-0-183-222.us-east-2.compute.internal                            4/4    Running    0         57m    10.0.183.222  ip-10-0-183-222.us-east-2.compute.internal
openshift-etcd                                    etcd-ip-10-0-247-37.us-east-2.compute.internal                             4/4    Running    0         54m    10.0.247.37   ip-10-0-247-37.us-east-2.compute.internal
openshift-etcd                                    etcd-quorum-guard-7cbbc8db97-5pfvx                                         1/1    Running    0         1h14m  10.0.183.222  ip-10-0-183-222.us-east-2.compute.internal
openshift-etcd                                    etcd-quorum-guard-7cbbc8db97-d9bv8                                         1/1    Running    0         1h14m  10.0.183.222  ip-10-0-183-222.us-east-2.compute.internal
openshift-etcd                                    etcd-quorum-guard-7cbbc8db97-vbjqr                                         1/1    Running    0         1h14m  10.0.140.126  ip-10-0-140-126.us-east-2.compute.internal
~~~
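A quick way to confirm the co-location from the must-gather output (a sketch, assuming the column layout shown above, where the node name is the last field):
~~~
# Count quorum-guard replicas per node; any count > 1 means co-located replicas.
# Hypothetical one-liner; assumes omg prints the node name in the last column as above.
omg get pods -n openshift-etcd -o wide | grep etcd-quorum-guard | awk '{print $NF}' | sort | uniq -c
~~~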

Looks like a scheduler issue to me.

The quorum-guard pods have the following anti-affinity:
~~~
[akaris@linux sdn2958]$ omg get pod -n openshift-etcd etcd-quorum-guard-7cbbc8db97-5pfvx -o yaml | grep -i affinity -A10
        f:affinity:
          .: {}
          f:podAffinity:
            .: {}
            f:requiredDuringSchedulingIgnoredDuringExecution: {}
          f:podAntiAffinity:
            .: {}
            f:requiredDuringSchedulingIgnoredDuringExecution: {}
        f:containers:
          k:{"name":"guard"}:
            .: {}
            f:args: {}
            f:command: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
--
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - etcd
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: k8s-app
            operator: In
            values:
            - etcd-quorum-guard
        topologyKey: kubernetes.io/hostname
  containers:
  - args:
~~~
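On a live cluster, the same placement check can be made against the label used in the anti-affinity selector (a sketch, assuming `oc` access and the `k8s-app=etcd-quorum-guard` label from the selector above):
~~~
# List all quorum-guard replicas together with the node each one landed on;
# with the required anti-affinity above, no two rows should share a NODE value.
oc get pods -n openshift-etcd -l k8s-app=etcd-quorum-guard -o wide
~~~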

~~~
[akaris@linux sdn2958]$ omg get co kube-apiserver -o yaml | tail
    name: ''
    resource: apirequestcounts
  versions:
  - name: raw-internal
    version: 4.11.0-0.nightly-2022-04-11-055105
  - name: kube-apiserver
    version: 1.23.3
  - name: operator
    version: 4.11.0-0.nightly-2022-04-11-055105
~~~

Both of these pods were scheduled at exactly the same time:
~~~
gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn/1513396005347790848/build-log.txt:Apr 11 06:38:45.719 - 2887s I ns/openshift-etcd pod/etcd-quorum-guard-7cbbc8db97-d9bv8 uid/3cdd1c21-d150-4a08-af4a-2d10ee0bd93f constructed/true reason/Scheduled node/ip-10-0-183-222.us-east-2.compute.internal
gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn/1513396005347790848/build-log.txt:Apr 11 06:38:45.719 - 2887s I ns/openshift-etcd pod/etcd-quorum-guard-7cbbc8db97-5pfvx uid/9248a8ca-8493-423f-80ae-48ec9a66ca9d constructed/true reason/Scheduled node/ip-10-0-183-222.us-east-2.compute.internal
gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn/1513396005347790848/build-log.txt:Apr 11 06:38:46.169 I ns/openshift-etcd pod/etcd-quorum-guard-7cbbc8db97-5pfvx node/ip-10-0-183-222.us-east-2.compute.internal uid/9248a8ca-8493-423f-80ae-48ec9a66ca9d reason/Scheduled node/ip-10-0-183-222.us-east-2.compute.internal
gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.11-e2e-aws-ovn/1513396005347790848/build-log.txt:Apr 11 06:38:46.170 I ns/openshift-etcd pod/etcd-quorum-guard-7cbbc8db97-d9bv8 node/ip-10-0-183-222.us-east-2.compute.internal uid/3cdd1c21-d150-4a08-af4a-2d10ee0bd93f reason/Scheduled node/ip-10-0-183-222.us-east-2.compute.internal
~~~

So I suspect that the scheduler may not be honoring podAntiAffinity when two pods are scheduled concurrently (i.e. during a race).
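If that is the case, a minimal reproducer would be a deployment that mirrors the quorum guard's hard anti-affinity and creates all replicas in one burst (a sketch with illustrative names, not the actual etcd-quorum-guard manifest):
~~~
# Hypothetical reproducer: 3 replicas with required anti-affinity on kubernetes.io/hostname.
# If the scheduler races, repeatedly scaling 0 -> 3 on a 3-node control plane
# may occasionally co-locate two replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anti-affinity-race-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: anti-affinity-race-test
  template:
    metadata:
      labels:
        app: anti-affinity-race-test
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - anti-affinity-race-test
            topologyKey: kubernetes.io/hostname
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
~~~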


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 ravig 2022-05-04 12:43:04 UTC
Dup of https://bugzilla.redhat.com/show_bug.cgi?id=2062459

*** This bug has been marked as a duplicate of bug 2062459 ***

Comment 4 ravig 2022-05-04 12:43:56 UTC
Feel free to reopen if you think otherwise.

