Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1929389

Summary: [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones
Product: OpenShift Container Platform
Reporter: Surya Seetharaman <surya>
Component: kube-scheduler
Assignee: Maciej Szulik <maszulik>
Status: CLOSED EOL
QA Contact: RamaKasturi <knarra>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.7
CC: aos-bugs, dgoodwin, fpaoline, mfojtik
Target Milestone: ---
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard: tag-ci
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1929684
Environment: [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones
Last Closed: 2022-05-25 11:00:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1896558
Bug Blocks: 1929684

Description Surya Seetharaman 2021-02-16 18:53:58 UTC
The test

[sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones

is failing frequently in CI; see the search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-scheduling%5C%5D+Multi-AZ+Clusters+should+spread+the+pods+of+a+replication+controller+across+zones


Examples:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1361634319889600512

[sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
Run #0: Failed (48s)
fail [k8s.io/kubernetes.0/test/e2e/scheduling/ubernetes_lite.go:174]: Pods were not evenly spread across zones.  3 in one zone and 6 in another zone
Expected
    <int>: 3
to be within 2 of ~
    <int>: 0

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.6-stable-to-4.7-ci/1361548482682294272

[sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones [Suite:openshift/conformance/parallel] [Suite:k8s] (14s)
fail [k8s.io/kubernetes.0/test/e2e/scheduling/ubernetes_lite.go:174]: Pods were not evenly spread across zones.  0 in one zone and 10 in another zone
Expected
    <int>: 10
to be within 2 of ~
    <int>: 0
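Both failure messages are consistent with an assertion on the gap between the most and least populated zones: 6 - 3 = 3 and 10 - 0 = 10, each compared against 0 with a tolerance of 2. The Go sketch below restates that check outside the e2e framework, assuming "within 2" is inclusive. It is an illustration under that assumption, not the code at ubernetes_lite.go:174; the function name spreadWithinTolerance and the zone names are hypothetical, and only the fullest and emptiest zone from each failure is modeled.

```go
package main

import "fmt"

// spreadWithinTolerance returns the smallest and largest per-zone pod counts
// and whether their difference is within tol of 0 (inclusive).
// Hypothetical helper, mirroring the shape of the failure message only.
func spreadWithinTolerance(podsPerZone map[string]int, tol int) (minPods, maxPods int, ok bool) {
	first := true
	for _, n := range podsPerZone {
		if first || n < minPods {
			minPods = n
		}
		if first || n > maxPods {
			maxPods = n
		}
		first = false
	}
	// maxPods >= minPods, so the difference is never negative.
	return minPods, maxPods, maxPods-minPods <= tol
}

func main() {
	// Zone names are made up; counts come from the two CI failures above.
	failures := []map[string]int{
		{"zone-a": 3, "zone-b": 6},
		{"zone-a": 0, "zone-b": 10},
	}
	for _, zones := range failures {
		minPods, maxPods, ok := spreadWithinTolerance(zones, 2)
		fmt.Printf("%d in one zone and %d in another zone, diff %d, ok=%v\n",
			minPods, maxPods, maxPods-minPods, ok)
	}
}
```

With a tolerance of 2, a 3/3/4 split passes, while both CI splits above fail.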

Comment 1 Maciej Szulik 2021-02-18 10:49:47 UTC
*** Bug 1929684 has been marked as a duplicate of this bug. ***

Comment 2 Mike Dame 2021-02-23 13:45:49 UTC
This should be addressed by the fixes in https://github.com/openshift/kubernetes/pull/547 and https://github.com/openshift/kubernetes/pull/526.

Comment 3 Mike Dame 2021-02-23 18:28:08 UTC
I am wondering whether the fix from https://github.com/openshift/kubernetes/pull/547 (which seems to have fixed the Service spreading test) introduced this failure, or whether the failure existed before that.

One thing I notice is that in these failures (example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.8/1364212636652146688) the set of pods reported on each node changes during the test. This makes it impossible for the above fix to actually balance the nodes, so differences in resource usage still influence the scheduling decision.

Take the output from the above test:

(before balancing)
> Feb 23 14:46:12.203: INFO: Waiting up to 1m0s for all nodes to be ready
> Feb 23 14:47:12.744: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj
> Feb 23 14:47:12.873: INFO: Pod for on the node: pod-handle-http-request, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: csi-mockplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:12.873: INFO: Pod for on the node: csi-mockplugin-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: test-recreate-deployment-5888b58954-2nwzf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-5kngs, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-8pxdv, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-hdh6s, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-7nbt7, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: netserver-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: pod-submit-status-0-2, Cpu: 5, Mem: 10485760
> Feb 23 14:47:12.873: INFO: Pod for on the node: explicit-nonroot-uid, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-5pft2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: gcp-pd-csi-driver-node-ck46r, Cpu: 30, Mem: 157286400
> Feb 23 14:47:12.873: INFO: Pod for on the node: tuned-zz6c9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: downloads-846fcb6857-xxs7w, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: dns-default-9fszd, Cpu: 65, Mem: 137363456
> Feb 23 14:47:12.873: INFO: Pod for on the node: image-registry-5d7cbc6796-5phf5, Cpu: 100, Mem: 268435456
> Feb 23 14:47:12.873: INFO: Pod for on the node: node-ca-hp5w8, Cpu: 10, Mem: 10485760
> Feb 23 14:47:12.873: INFO: Pod for on the node: ingress-canary-hv7t8, Cpu: 10, Mem: 20971520
> Feb 23 14:47:12.873: INFO: Pod for on the node: router-default-58bb79bdb8-4q4wj, Cpu: 100, Mem: 268435456
> Feb 23 14:47:12.873: INFO: Pod for on the node: migrator-7bc78664fd-fwvcj, Cpu: 10, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: machine-config-daemon-98694, Cpu: 40, Mem: 104857600
> Feb 23 14:47:12.873: INFO: Pod for on the node: ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2qfbfm, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: certified-operators-v2lm5, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: community-operators-mkzjl, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: community-operators-rhvb9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: redhat-marketplace-k7t7v, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: redhat-operators-l8586, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: alertmanager-main-1, Cpu: 8, Mem: 283115520
> Feb 23 14:47:12.873: INFO: Pod for on the node: kube-state-metrics-54b6ff9dc-wfm7f, Cpu: 4, Mem: 125829120
> Feb 23 14:47:12.873: INFO: Pod for on the node: node-exporter-jx2mr, Cpu: 9, Mem: 220200960
> Feb 23 14:47:12.873: INFO: Pod for on the node: openshift-state-metrics-6757ffd766-mmrxq, Cpu: 3, Mem: 199229440
> Feb 23 14:47:12.873: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-htmsl, Cpu: 1, Mem: 26214400
> Feb 23 14:47:12.873: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:12.873: INFO: Pod for on the node: telemeter-client-649ff75866-dfxb7, Cpu: 3, Mem: 73400320
> Feb 23 14:47:12.873: INFO: Pod for on the node: thanos-querier-57564f89f7-hzjnz, Cpu: 9, Mem: 96468992
> Feb 23 14:47:12.873: INFO: Pod for on the node: multus-dclw7, Cpu: 10, Mem: 157286400
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-metrics-daemon-998jw, Cpu: 20, Mem: 125829120
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-check-source-5584f5cfcc-2dcdt, Cpu: 10, Mem: 41943040
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-check-target-c4tqd, Cpu: 10, Mem: 15728640
> Feb 23 14:47:12.873: INFO: Pod for on the node: ovs-9zwnr, Cpu: 15, Mem: 419430400
> Feb 23 14:47:12.873: INFO: Pod for on the node: sdn-287mj, Cpu: 110, Mem: 230686720
> Feb 23 14:47:12.873: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedCPUResource: 828, cpuAllocatableMil: 3500, cpuFraction: 0.23657142857142857
> Feb 23 14:47:12.873: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedMemResource: 4937744384, memAllocatableVal: 14568333312, memFraction: 0.33893680754357525
> Feb 23 14:47:12.873: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428
> Feb 23 14:47:13.028: INFO: Pod for on the node: startup-b78f504b-237f-4758-9d3e-a89ce75ff8ea, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-4fcgf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-6zwpd, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-ddjnt, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-kzt2n, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: gluster-server, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: busybox-readonly-fs8c25040f-a95a-4c95-ab00-1c4b8a16bf67, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: server-7fx9g, Cpu: 200, Mem: 419430400
> Feb 23 14:47:13.028: INFO: Pod for on the node: netserver-1, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: example-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: deployment-simple-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: deployment-simple-1-hook-pre, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: custom-builder-image-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: sample-custom-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: pod-6b81707d-e327-4646-86e7-4018c3794134, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: gcp-pd-csi-driver-node-49jdt, Cpu: 30, Mem: 157286400
> Feb 23 14:47:13.028: INFO: Pod for on the node: tuned-c2c5t, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.028: INFO: Pod for on the node: dns-default-mfxcx, Cpu: 65, Mem: 137363456
> Feb 23 14:47:13.028: INFO: Pod for on the node: node-ca-6z27x, Cpu: 10, Mem: 10485760
> Feb 23 14:47:13.028: INFO: Pod for on the node: ingress-canary-6cmx4, Cpu: 10, Mem: 20971520
> Feb 23 14:47:13.028: INFO: Pod for on the node: machine-config-daemon-6xd7h, Cpu: 40, Mem: 104857600
> Feb 23 14:47:13.028: INFO: Pod for on the node: node-exporter-q5pbd, Cpu: 9, Mem: 220200960
> Feb 23 14:47:13.028: INFO: Pod for on the node: multus-hzw4g, Cpu: 10, Mem: 157286400
> Feb 23 14:47:13.028: INFO: Pod for on the node: network-metrics-daemon-drp8f, Cpu: 20, Mem: 125829120
> Feb 23 14:47:13.028: INFO: Pod for on the node: network-check-target-zwx95, Cpu: 10, Mem: 15728640
> Feb 23 14:47:13.028: INFO: Pod for on the node: ovs-2lsjd, Cpu: 15, Mem: 419430400
> Feb 23 14:47:13.028: INFO: Pod for on the node: sdn-tgkxh, Cpu: 110, Mem: 230686720
> Feb 23 14:47:13.028: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedCPUResource: 439, cpuAllocatableMil: 3500, cpuFraction: 0.12542857142857142
> Feb 23 14:47:13.028: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedMemResource: 1757413376, memAllocatableVal: 14568333312, memFraction: 0.12063242502506522
> Feb 23 14:47:13.028: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-2zcbt, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-7tbcx, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-qsj72, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: gluster-client, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: agnhost-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: netserver-2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: readiness-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: example-1-g58rb, Cpu: 200, Mem: 419430400
> Feb 23 14:47:13.234: INFO: Pod for on the node: append-test, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: test-oauth-server, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.234: INFO: Pod for on the node: sample-webhook-deployment-7fdfd97c84-bqscf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: gcp-pd-csi-driver-node-gdmg2, Cpu: 30, Mem: 157286400
> Feb 23 14:47:13.234: INFO: Pod for on the node: tuned-l5b4b, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.234: INFO: Pod for on the node: dns-default-2djwv, Cpu: 65, Mem: 137363456
> Feb 23 14:47:13.234: INFO: Pod for on the node: image-registry-5d7cbc6796-47p55, Cpu: 100, Mem: 268435456
> Feb 23 14:47:13.234: INFO: Pod for on the node: node-ca-swk58, Cpu: 10, Mem: 10485760
> Feb 23 14:47:13.234: INFO: Pod for on the node: ingress-canary-qhzf6, Cpu: 10, Mem: 20971520
> Feb 23 14:47:13.234: INFO: Pod for on the node: router-default-58bb79bdb8-zs7s6, Cpu: 100, Mem: 268435456
> Feb 23 14:47:13.234: INFO: Pod for on the node: machine-config-daemon-dplfm, Cpu: 40, Mem: 104857600
> Feb 23 14:47:13.234: INFO: Pod for on the node: alertmanager-main-0, Cpu: 8, Mem: 283115520
> Feb 23 14:47:13.234: INFO: Pod for on the node: alertmanager-main-2, Cpu: 8, Mem: 283115520
> Feb 23 14:47:13.234: INFO: Pod for on the node: grafana-5b8f5b6d96-gwb98, Cpu: 5, Mem: 125829120
> Feb 23 14:47:13.234: INFO: Pod for on the node: node-exporter-6pw82, Cpu: 9, Mem: 220200960
> Feb 23 14:47:13.234: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-xj5sq, Cpu: 1, Mem: 26214400
> Feb 23 14:47:13.234: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:13.234: INFO: Pod for on the node: thanos-querier-57564f89f7-xvh4z, Cpu: 9, Mem: 96468992
> Feb 23 14:47:13.234: INFO: Pod for on the node: multus-d76x9, Cpu: 10, Mem: 157286400
> Feb 23 14:47:13.234: INFO: Pod for on the node: network-metrics-daemon-nv4nm, Cpu: 20, Mem: 125829120
> Feb 23 14:47:13.234: INFO: Pod for on the node: network-check-target-rz8wn, Cpu: 10, Mem: 15728640
> Feb 23 14:47:13.234: INFO: Pod for on the node: ovs-rxjl5, Cpu: 15, Mem: 419430400
> Feb 23 14:47:13.234: INFO: Pod for on the node: sdn-8q7d2, Cpu: 110, Mem: 230686720
> Feb 23 14:47:13.234: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedCPUResource: 756, cpuAllocatableMil: 3500, cpuFraction: 0.216
> Feb 23 14:47:13.234: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedMemResource: 4423942144, memAllocatableVal: 14568333312, memFraction: 0.30366837779281036
> Feb 23 14:47:13.327: INFO: Waiting for running...
> Feb 23 14:47:23.416: INFO: Waiting for running...
> Feb 23 14:47:33.686: INFO: Waiting for running...

(after balancing)
> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:38.737: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-mockplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-mockplugin-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-provisioner-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-resizer-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-snapshotter-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpathplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:39.794: INFO: Pod for on the node: inline-volume-tester-kr5cw, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: deployment-1e9e1d60-efb8-4d8a-a3a1-7443062287c6-675fd6b69bdwhct, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: f5c0a6bf-206a-485e-94a6-32762d3a07bc-0, Cpu: 358, Mem: 0
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-n6xdb, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: pod-e96ffac9-93ed-470f-a36e-9899cedaa49b, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-7nbt7, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-cd9w4, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: host-test-container-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: netserver-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: pod-submit-status-0-2, Cpu: 5, Mem: 10485760
> Feb 23 14:47:39.794: INFO: Pod for on the node: explicit-nonroot-uid, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: history-limit-1-5bxvx, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: bc-custom-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: exec-volume-test-preprovisionedpv-jc8b, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: gcp-pd-csi-driver-node-ck46r, Cpu: 30, Mem: 157286400
> Feb 23 14:47:39.795: INFO: Pod for on the node: tuned-zz6c9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: downloads-846fcb6857-xxs7w, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: dns-default-9fszd, Cpu: 65, Mem: 137363456
> Feb 23 14:47:39.795: INFO: Pod for on the node: image-registry-5d7cbc6796-5phf5, Cpu: 100, Mem: 268435456
> Feb 23 14:47:39.795: INFO: Pod for on the node: node-ca-hp5w8, Cpu: 10, Mem: 10485760
> Feb 23 14:47:39.795: INFO: Pod for on the node: ingress-canary-hv7t8, Cpu: 10, Mem: 20971520
> Feb 23 14:47:39.795: INFO: Pod for on the node: router-default-58bb79bdb8-4q4wj, Cpu: 100, Mem: 268435456
> Feb 23 14:47:39.795: INFO: Pod for on the node: migrator-7bc78664fd-fwvcj, Cpu: 10, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: machine-config-daemon-98694, Cpu: 40, Mem: 104857600
> Feb 23 14:47:39.795: INFO: Pod for on the node: ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2qfbfm, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: certified-operators-v2lm5, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: community-operators-mkzjl, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: redhat-marketplace-k7t7v, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: redhat-operators-l8586, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: alertmanager-main-1, Cpu: 8, Mem: 283115520
> Feb 23 14:47:39.795: INFO: Pod for on the node: kube-state-metrics-54b6ff9dc-wfm7f, Cpu: 4, Mem: 125829120
> Feb 23 14:47:39.795: INFO: Pod for on the node: node-exporter-jx2mr, Cpu: 9, Mem: 220200960
> Feb 23 14:47:39.795: INFO: Pod for on the node: openshift-state-metrics-6757ffd766-mmrxq, Cpu: 3, Mem: 199229440
> Feb 23 14:47:39.795: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-htmsl, Cpu: 1, Mem: 26214400
> Feb 23 14:47:39.795: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:39.795: INFO: Pod for on the node: telemeter-client-649ff75866-dfxb7, Cpu: 3, Mem: 73400320
> Feb 23 14:47:39.795: INFO: Pod for on the node: thanos-querier-57564f89f7-hzjnz, Cpu: 9, Mem: 96468992
> Feb 23 14:47:39.795: INFO: Pod for on the node: multus-dclw7, Cpu: 10, Mem: 157286400
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-metrics-daemon-998jw, Cpu: 20, Mem: 125829120
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-check-source-5584f5cfcc-2dcdt, Cpu: 10, Mem: 41943040
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-check-target-c4tqd, Cpu: 10, Mem: 15728640
> Feb 23 14:47:39.795: INFO: Pod for on the node: ovs-9zwnr, Cpu: 15, Mem: 419430400
> Feb 23 14:47:39.795: INFO: Pod for on the node: sdn-287mj, Cpu: 110, Mem: 230686720
> Feb 23 14:47:39.795: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedCPUResource: 1176, cpuAllocatableMil: 3500, cpuFraction: 0.336
> Feb 23 14:47:39.795: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedMemResource: 4885315584, memAllocatableVal: 14568333312, memFraction: 0.33533798818125227
> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:39.795: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428
> Feb 23 14:47:40.148: INFO: Pod for on the node: startup-b78f504b-237f-4758-9d3e-a89ce75ff8ea, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: pod-init-991109f2-3e8d-45f4-93d0-b1d59d834c23, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: agnhost-primary-vknxs, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: busybox-readonly-fs8c25040f-a95a-4c95-ab00-1c4b8a16bf67, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: c884b330-bf23-4f22-8086-835d75e71028-0, Cpu: 747, Mem: 3180331008
> Feb 23 14:47:40.148: INFO: Pod for on the node: client-can-connect-81-fd6ph, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: server-7fx9g, Cpu: 200, Mem: 419430400
> Feb 23 14:47:40.148: INFO: Pod for on the node: netserver-1, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: test-container-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: alpine-nnp-nil-bef53aa2-5554-4f0e-9de5-fae83135f91f, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: example-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: history-limit-2-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: custom-builder-image-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: sample-custom-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: gcp-pd-csi-driver-node-49jdt, Cpu: 30, Mem: 157286400
> Feb 23 14:47:40.148: INFO: Pod for on the node: tuned-c2c5t, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.148: INFO: Pod for on the node: dns-default-mfxcx, Cpu: 65, Mem: 137363456
> Feb 23 14:47:40.148: INFO: Pod for on the node: node-ca-6z27x, Cpu: 10, Mem: 10485760
> Feb 23 14:47:40.148: INFO: Pod for on the node: ingress-canary-6cmx4, Cpu: 10, Mem: 20971520
> Feb 23 14:47:40.148: INFO: Pod for on the node: machine-config-daemon-6xd7h, Cpu: 40, Mem: 104857600
> Feb 23 14:47:40.148: INFO: Pod for on the node: node-exporter-q5pbd, Cpu: 9, Mem: 220200960
> Feb 23 14:47:40.148: INFO: Pod for on the node: multus-hzw4g, Cpu: 10, Mem: 157286400
> Feb 23 14:47:40.148: INFO: Pod for on the node: network-metrics-daemon-drp8f, Cpu: 20, Mem: 125829120
> Feb 23 14:47:40.148: INFO: Pod for on the node: network-check-target-zwx95, Cpu: 10, Mem: 15728640
> Feb 23 14:47:40.148: INFO: Pod for on the node: ovs-2lsjd, Cpu: 15, Mem: 419430400
> Feb 23 14:47:40.148: INFO: Pod for on the node: sdn-tgkxh, Cpu: 110, Mem: 230686720
> Feb 23 14:47:40.148: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedCPUResource: 1286, cpuAllocatableMil: 3500, cpuFraction: 0.36742857142857144
> Feb 23 14:47:40.148: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedMemResource: 5147459584, memAllocatableVal: 14568333312, memFraction: 0.353332084992867
> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:40.148: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t
> Feb 23 14:47:40.376: INFO: Pod for on the node: dns-test-588d38bf-13a5-4f7d-b8b7-6fd0a8b65494, Cpu: 300, Mem: 629145600
> Feb 23 14:47:40.376: INFO: Pod for on the node: labelsupdate1563eafb-212d-4d40-aaaa-c7068b7ccf62, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: acabda1b-0083-4a80-a1cd-0a4c10cc5949-0, Cpu: 430, Mem: 513802239
> Feb 23 14:47:40.376: INFO: Pod for on the node: netserver-2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: nosrc-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: readiness-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: readiness-1-ns69h, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: example-1-g58rb, Cpu: 200, Mem: 419430400
> Feb 23 14:47:40.376: INFO: Pod for on the node: history-limit-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: append-test, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: bc-docker-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: bc-source-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: test-oauth-server, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.376: INFO: Pod for on the node: execpod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t-hmrth, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: local-injector, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: gcp-pd-csi-driver-node-gdmg2, Cpu: 30, Mem: 157286400
> Feb 23 14:47:40.376: INFO: Pod for on the node: tuned-l5b4b, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.376: INFO: Pod for on the node: dns-default-2djwv, Cpu: 65, Mem: 137363456
> Feb 23 14:47:40.376: INFO: Pod for on the node: image-registry-5d7cbc6796-47p55, Cpu: 100, Mem: 268435456
> Feb 23 14:47:40.376: INFO: Pod for on the node: node-ca-swk58, Cpu: 10, Mem: 10485760
> Feb 23 14:47:40.376: INFO: Pod for on the node: ingress-canary-qhzf6, Cpu: 10, Mem: 20971520
> Feb 23 14:47:40.376: INFO: Pod for on the node: router-default-58bb79bdb8-zs7s6, Cpu: 100, Mem: 268435456
> Feb 23 14:47:40.376: INFO: Pod for on the node: machine-config-daemon-dplfm, Cpu: 40, Mem: 104857600
> Feb 23 14:47:40.376: INFO: Pod for on the node: alertmanager-main-0, Cpu: 8, Mem: 283115520
> Feb 23 14:47:40.376: INFO: Pod for on the node: alertmanager-main-2, Cpu: 8, Mem: 283115520
> Feb 23 14:47:40.376: INFO: Pod for on the node: grafana-5b8f5b6d96-gwb98, Cpu: 5, Mem: 125829120
> Feb 23 14:47:40.376: INFO: Pod for on the node: node-exporter-6pw82, Cpu: 9, Mem: 220200960
> Feb 23 14:47:40.376: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-xj5sq, Cpu: 1, Mem: 26214400
> Feb 23 14:47:40.376: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:40.376: INFO: Pod for on the node: thanos-querier-57564f89f7-xvh4z, Cpu: 9, Mem: 96468992
> Feb 23 14:47:40.376: INFO: Pod for on the node: multus-d76x9, Cpu: 10, Mem: 157286400
> Feb 23 14:47:40.376: INFO: Pod for on the node: network-metrics-daemon-nv4nm, Cpu: 20, Mem: 125829120
> Feb 23 14:47:40.376: INFO: Pod for on the node: network-check-target-rz8wn, Cpu: 10, Mem: 15728640
> Feb 23 14:47:40.376: INFO: Pod for on the node: ovs-rxjl5, Cpu: 15, Mem: 419430400
> Feb 23 14:47:40.376: INFO: Pod for on the node: sdn-8q7d2, Cpu: 110, Mem: 230686720
> Feb 23 14:47:40.376: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedCPUResource: 1186, cpuAllocatableMil: 3500, cpuFraction: 0.33885714285714286
> Feb 23 14:47:40.376: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedMemResource: 4937744383, memAllocatableVal: 14568333312, memFraction: 0.3389368074749332

You can see that the pods on each node differ between the two snapshots. I think these tests would benefit from being run serially (or removed), because they are unpredictable in heavily loaded clusters like ours.

Comment 4 Mike Dame 2021-05-17 15:06:23 UTC
Moving to MODIFIED, as all the linked PRs have merged or been closed in favor of other PRs that have since merged.

Comment 6 RamaKasturi 2021-05-20 07:33:03 UTC
Hello Mike,

I tried verifying the bug and do not see any failures or flakes on 4.8 clusters, but in the search results [1] I see that on 4.7 it has always been flaking. Is this expected? Thanks!

[1] https://search.ci.openshift.org/?search=Multi-AZ+Clusters+should+spread+the+pods+of+a+replication+controller+across+zones&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 7 Mike Dame 2021-05-20 13:49:00 UTC
Okay, thanks for checking. It looks like the changes we merged need to be backported to 4.7 then (I wasn't sure whether they already had been). I'll open those PRs and link them to this bug.

Comment 8 RamaKasturi 2021-05-20 16:16:45 UTC
I do not see any failures in 4.8 runs, but the test still fails on 4.7, so I am moving the bug back to ASSIGNED.

Comment 11 Devan Goodwin 2022-03-07 18:15:45 UTC
This only appears in 4.7 tests, and at a very low rate. I propose closing it; from the TRT perspective it is not a priority.

Comment 12 Maciej Szulik 2022-05-25 11:00:59 UTC
Given the priority and the current time frame I don't think we'll be able to address this issue in 4.7.