Bug 1929389 - [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones
Summary: [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication co...
Keywords:
Status: CLOSED EOL
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.z
Assignee: Maciej Szulik
QA Contact: RamaKasturi
URL:
Whiteboard: tag-ci
Duplicates: 1929684
Depends On: 1896558
Blocks: 1929684
 
Reported: 2021-02-16 18:53 UTC by Surya Seetharaman
Modified: 2022-05-25 11:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1929684 (view as bug list)
Environment:
[sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones
Last Closed: 2022-05-25 11:00:59 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift kubernetes pull 526 0 None closed Bug 1916489: (e2e/scheduler) Ensure minimum memory limit in createBalancedPodForNodes 2021-02-23 13:45:49 UTC
Github openshift kubernetes pull 547 0 None closed Bug 1896558: Balance nodes in scheduling e2e 2021-02-23 13:45:49 UTC
Github openshift origin pull 25915 0 None open Bug 1896558: bump(openshift/kubernetes): multi-az spreading e2e flakes 2021-02-23 14:54:44 UTC

Description Surya Seetharaman 2021-02-16 18:53:58 UTC
The test
[sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones
is failing frequently in CI; see the search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-scheduling%5C%5D+Multi-AZ+Clusters+should+spread+the+pods+of+a+replication+controller+across+zones


Examples:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1361634319889600512

: [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
Run #0: Failed 48s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/ubernetes_lite.go:174]: Pods were not evenly spread across zones.  3 in one zone and 6 in another zone
Expected
    <int>: 3
to be within 2 of ~
    <int>: 0

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.6-stable-to-4.7-ci/1361548482682294272

: [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones [Suite:openshift/conformance/parallel] [Suite:k8s] 14s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/ubernetes_lite.go:174]: Pods were not evenly spread across zones.  0 in one zone and 10 in another zone
Expected
    <int>: 10
to be within 2 of ~
    <int>: 0
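
For readers unfamiliar with the assertion, the two failure messages above read as a check that the gap between the most and least populated zone stays within 2 of 0. The following is a minimal sketch of that check, not the test's actual code: the function name `zone_spread_ok` and the zone labels are hypothetical, and the rule "max minus min must be at most the tolerance" is an assumption inferred from the messages (6 - 3 = 3 and 10 - 0 = 10 are the values reported as "Expected").

```python
def zone_spread_ok(pods_per_zone, tolerance=2):
    """Return (ok, diff), where diff is the max-min pod count across zones.

    Assumption: the spread is acceptable when diff is within `tolerance` of 0,
    which is how the failure messages above appear to be phrased.
    """
    counts = list(pods_per_zone.values())
    diff = max(counts) - min(counts)
    return diff <= tolerance, diff


# Pod counts taken from the two failure messages; zone names are placeholders.
gcp_run = {"zone-a": 3, "zone-b": 6}
aws_run = {"zone-a": 0, "zone-b": 10}

for label, run in (("gcp", gcp_run), ("aws", aws_run)):
    ok, diff = zone_spread_ok(run)
    print(f"{label}: diff={diff} ok={ok}")
```

Under this reading both runs fail: the GCP run misses the tolerance by one pod, while the AWS upgrade run placed every pod in a single zone.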

Comment 1 Maciej Szulik 2021-02-18 10:49:47 UTC
*** Bug 1929684 has been marked as a duplicate of this bug. ***

Comment 2 Mike Dame 2021-02-23 13:45:49 UTC
This should be addressed by fixes added in https://github.com/openshift/kubernetes/pull/547 and https://github.com/openshift/kubernetes/pull/526

Comment 3 Mike Dame 2021-02-23 18:28:08 UTC
I am wondering if the fix from https://github.com/openshift/kubernetes/pull/547 (which seems to have fixed the Service spreading test) created this failure, or if this failure existed before that.

One thing I notice is that in these failures (example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.8/1364212636652146688) the pods being reported on each node are changing throughout the test. This makes it impossible for the above fix to actually balance the nodes, meaning that resource usage will interfere with the scheduling decision.

Take the output from the above test:

(before balancing)
> Feb 23 14:46:12.203: INFO: Waiting up to 1m0s for all nodes to be ready
> Feb 23 14:47:12.744: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj
> Feb 23 14:47:12.873: INFO: Pod for on the node: pod-handle-http-request, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: csi-mockplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:12.873: INFO: Pod for on the node: csi-mockplugin-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: test-recreate-deployment-5888b58954-2nwzf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-5kngs, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-8pxdv, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-hdh6s, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-7nbt7, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: netserver-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: pod-submit-status-0-2, Cpu: 5, Mem: 10485760
> Feb 23 14:47:12.873: INFO: Pod for on the node: explicit-nonroot-uid, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-5pft2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: gcp-pd-csi-driver-node-ck46r, Cpu: 30, Mem: 157286400
> Feb 23 14:47:12.873: INFO: Pod for on the node: tuned-zz6c9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: downloads-846fcb6857-xxs7w, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: dns-default-9fszd, Cpu: 65, Mem: 137363456
> Feb 23 14:47:12.873: INFO: Pod for on the node: image-registry-5d7cbc6796-5phf5, Cpu: 100, Mem: 268435456
> Feb 23 14:47:12.873: INFO: Pod for on the node: node-ca-hp5w8, Cpu: 10, Mem: 10485760
> Feb 23 14:47:12.873: INFO: Pod for on the node: ingress-canary-hv7t8, Cpu: 10, Mem: 20971520
> Feb 23 14:47:12.873: INFO: Pod for on the node: router-default-58bb79bdb8-4q4wj, Cpu: 100, Mem: 268435456
> Feb 23 14:47:12.873: INFO: Pod for on the node: migrator-7bc78664fd-fwvcj, Cpu: 10, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: machine-config-daemon-98694, Cpu: 40, Mem: 104857600
> Feb 23 14:47:12.873: INFO: Pod for on the node: ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2qfbfm, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: certified-operators-v2lm5, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: community-operators-mkzjl, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: community-operators-rhvb9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: redhat-marketplace-k7t7v, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: redhat-operators-l8586, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: alertmanager-main-1, Cpu: 8, Mem: 283115520
> Feb 23 14:47:12.873: INFO: Pod for on the node: kube-state-metrics-54b6ff9dc-wfm7f, Cpu: 4, Mem: 125829120
> Feb 23 14:47:12.873: INFO: Pod for on the node: node-exporter-jx2mr, Cpu: 9, Mem: 220200960
> Feb 23 14:47:12.873: INFO: Pod for on the node: openshift-state-metrics-6757ffd766-mmrxq, Cpu: 3, Mem: 199229440
> Feb 23 14:47:12.873: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-htmsl, Cpu: 1, Mem: 26214400
> Feb 23 14:47:12.873: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:12.873: INFO: Pod for on the node: telemeter-client-649ff75866-dfxb7, Cpu: 3, Mem: 73400320
> Feb 23 14:47:12.873: INFO: Pod for on the node: thanos-querier-57564f89f7-hzjnz, Cpu: 9, Mem: 96468992
> Feb 23 14:47:12.873: INFO: Pod for on the node: multus-dclw7, Cpu: 10, Mem: 157286400
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-metrics-daemon-998jw, Cpu: 20, Mem: 125829120
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-check-source-5584f5cfcc-2dcdt, Cpu: 10, Mem: 41943040
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-check-target-c4tqd, Cpu: 10, Mem: 15728640
> Feb 23 14:47:12.873: INFO: Pod for on the node: ovs-9zwnr, Cpu: 15, Mem: 419430400
> Feb 23 14:47:12.873: INFO: Pod for on the node: sdn-287mj, Cpu: 110, Mem: 230686720
> Feb 23 14:47:12.873: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedCPUResource: 828, cpuAllocatableMil: 3500, cpuFraction: 0.23657142857142857
> Feb 23 14:47:12.873: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedMemResource: 4937744384, memAllocatableVal: 14568333312, memFraction: 0.33893680754357525
> Feb 23 14:47:12.873: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428
> Feb 23 14:47:13.028: INFO: Pod for on the node: startup-b78f504b-237f-4758-9d3e-a89ce75ff8ea, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-4fcgf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-6zwpd, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-ddjnt, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-kzt2n, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: gluster-server, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: busybox-readonly-fs8c25040f-a95a-4c95-ab00-1c4b8a16bf67, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: server-7fx9g, Cpu: 200, Mem: 419430400
> Feb 23 14:47:13.028: INFO: Pod for on the node: netserver-1, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: example-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: deployment-simple-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: deployment-simple-1-hook-pre, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: custom-builder-image-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: sample-custom-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: pod-6b81707d-e327-4646-86e7-4018c3794134, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: gcp-pd-csi-driver-node-49jdt, Cpu: 30, Mem: 157286400
> Feb 23 14:47:13.028: INFO: Pod for on the node: tuned-c2c5t, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.028: INFO: Pod for on the node: dns-default-mfxcx, Cpu: 65, Mem: 137363456
> Feb 23 14:47:13.028: INFO: Pod for on the node: node-ca-6z27x, Cpu: 10, Mem: 10485760
> Feb 23 14:47:13.028: INFO: Pod for on the node: ingress-canary-6cmx4, Cpu: 10, Mem: 20971520
> Feb 23 14:47:13.028: INFO: Pod for on the node: machine-config-daemon-6xd7h, Cpu: 40, Mem: 104857600
> Feb 23 14:47:13.028: INFO: Pod for on the node: node-exporter-q5pbd, Cpu: 9, Mem: 220200960
> Feb 23 14:47:13.028: INFO: Pod for on the node: multus-hzw4g, Cpu: 10, Mem: 157286400
> Feb 23 14:47:13.028: INFO: Pod for on the node: network-metrics-daemon-drp8f, Cpu: 20, Mem: 125829120
> Feb 23 14:47:13.028: INFO: Pod for on the node: network-check-target-zwx95, Cpu: 10, Mem: 15728640
> Feb 23 14:47:13.028: INFO: Pod for on the node: ovs-2lsjd, Cpu: 15, Mem: 419430400
> Feb 23 14:47:13.028: INFO: Pod for on the node: sdn-tgkxh, Cpu: 110, Mem: 230686720
> Feb 23 14:47:13.028: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedCPUResource: 439, cpuAllocatableMil: 3500, cpuFraction: 0.12542857142857142
> Feb 23 14:47:13.028: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedMemResource: 1757413376, memAllocatableVal: 14568333312, memFraction: 0.12063242502506522
> Feb 23 14:47:13.028: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-2zcbt, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-7tbcx, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-qsj72, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: gluster-client, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: agnhost-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: netserver-2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: readiness-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: example-1-g58rb, Cpu: 200, Mem: 419430400
> Feb 23 14:47:13.234: INFO: Pod for on the node: append-test, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: test-oauth-server, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.234: INFO: Pod for on the node: sample-webhook-deployment-7fdfd97c84-bqscf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: gcp-pd-csi-driver-node-gdmg2, Cpu: 30, Mem: 157286400
> Feb 23 14:47:13.234: INFO: Pod for on the node: tuned-l5b4b, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.234: INFO: Pod for on the node: dns-default-2djwv, Cpu: 65, Mem: 137363456
> Feb 23 14:47:13.234: INFO: Pod for on the node: image-registry-5d7cbc6796-47p55, Cpu: 100, Mem: 268435456
> Feb 23 14:47:13.234: INFO: Pod for on the node: node-ca-swk58, Cpu: 10, Mem: 10485760
> Feb 23 14:47:13.234: INFO: Pod for on the node: ingress-canary-qhzf6, Cpu: 10, Mem: 20971520
> Feb 23 14:47:13.234: INFO: Pod for on the node: router-default-58bb79bdb8-zs7s6, Cpu: 100, Mem: 268435456
> Feb 23 14:47:13.234: INFO: Pod for on the node: machine-config-daemon-dplfm, Cpu: 40, Mem: 104857600
> Feb 23 14:47:13.234: INFO: Pod for on the node: alertmanager-main-0, Cpu: 8, Mem: 283115520
> Feb 23 14:47:13.234: INFO: Pod for on the node: alertmanager-main-2, Cpu: 8, Mem: 283115520
> Feb 23 14:47:13.234: INFO: Pod for on the node: grafana-5b8f5b6d96-gwb98, Cpu: 5, Mem: 125829120
> Feb 23 14:47:13.234: INFO: Pod for on the node: node-exporter-6pw82, Cpu: 9, Mem: 220200960
> Feb 23 14:47:13.234: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-xj5sq, Cpu: 1, Mem: 26214400
> Feb 23 14:47:13.234: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:13.234: INFO: Pod for on the node: thanos-querier-57564f89f7-xvh4z, Cpu: 9, Mem: 96468992
> Feb 23 14:47:13.234: INFO: Pod for on the node: multus-d76x9, Cpu: 10, Mem: 157286400
> Feb 23 14:47:13.234: INFO: Pod for on the node: network-metrics-daemon-nv4nm, Cpu: 20, Mem: 125829120
> Feb 23 14:47:13.234: INFO: Pod for on the node: network-check-target-rz8wn, Cpu: 10, Mem: 15728640
> Feb 23 14:47:13.234: INFO: Pod for on the node: ovs-rxjl5, Cpu: 15, Mem: 419430400
> Feb 23 14:47:13.234: INFO: Pod for on the node: sdn-8q7d2, Cpu: 110, Mem: 230686720
> Feb 23 14:47:13.234: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedCPUResource: 756, cpuAllocatableMil: 3500, cpuFraction: 0.216
> Feb 23 14:47:13.234: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedMemResource: 4423942144, memAllocatableVal: 14568333312, memFraction: 0.30366837779281036
> Feb 23 14:47:13.327: INFO: Waiting for running...
> Feb 23 14:47:23.416: INFO: Waiting for running...
> Feb 23 14:47:33.686: INFO: Waiting for running...

(after balancing)
> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:38.737: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-mockplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-mockplugin-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-provisioner-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-resizer-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-snapshotter-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpathplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:39.794: INFO: Pod for on the node: inline-volume-tester-kr5cw, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: deployment-1e9e1d60-efb8-4d8a-a3a1-7443062287c6-675fd6b69bdwhct, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: f5c0a6bf-206a-485e-94a6-32762d3a07bc-0, Cpu: 358, Mem: 0
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-n6xdb, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: pod-e96ffac9-93ed-470f-a36e-9899cedaa49b, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-7nbt7, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-cd9w4, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: host-test-container-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: netserver-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: pod-submit-status-0-2, Cpu: 5, Mem: 10485760
> Feb 23 14:47:39.794: INFO: Pod for on the node: explicit-nonroot-uid, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: history-limit-1-5bxvx, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: bc-custom-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: exec-volume-test-preprovisionedpv-jc8b, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: gcp-pd-csi-driver-node-ck46r, Cpu: 30, Mem: 157286400
> Feb 23 14:47:39.795: INFO: Pod for on the node: tuned-zz6c9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: downloads-846fcb6857-xxs7w, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: dns-default-9fszd, Cpu: 65, Mem: 137363456
> Feb 23 14:47:39.795: INFO: Pod for on the node: image-registry-5d7cbc6796-5phf5, Cpu: 100, Mem: 268435456
> Feb 23 14:47:39.795: INFO: Pod for on the node: node-ca-hp5w8, Cpu: 10, Mem: 10485760
> Feb 23 14:47:39.795: INFO: Pod for on the node: ingress-canary-hv7t8, Cpu: 10, Mem: 20971520
> Feb 23 14:47:39.795: INFO: Pod for on the node: router-default-58bb79bdb8-4q4wj, Cpu: 100, Mem: 268435456
> Feb 23 14:47:39.795: INFO: Pod for on the node: migrator-7bc78664fd-fwvcj, Cpu: 10, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: machine-config-daemon-98694, Cpu: 40, Mem: 104857600
> Feb 23 14:47:39.795: INFO: Pod for on the node: ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2qfbfm, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: certified-operators-v2lm5, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: community-operators-mkzjl, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: redhat-marketplace-k7t7v, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: redhat-operators-l8586, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: alertmanager-main-1, Cpu: 8, Mem: 283115520
> Feb 23 14:47:39.795: INFO: Pod for on the node: kube-state-metrics-54b6ff9dc-wfm7f, Cpu: 4, Mem: 125829120
> Feb 23 14:47:39.795: INFO: Pod for on the node: node-exporter-jx2mr, Cpu: 9, Mem: 220200960
> Feb 23 14:47:39.795: INFO: Pod for on the node: openshift-state-metrics-6757ffd766-mmrxq, Cpu: 3, Mem: 199229440
> Feb 23 14:47:39.795: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-htmsl, Cpu: 1, Mem: 26214400
> Feb 23 14:47:39.795: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:39.795: INFO: Pod for on the node: telemeter-client-649ff75866-dfxb7, Cpu: 3, Mem: 73400320
> Feb 23 14:47:39.795: INFO: Pod for on the node: thanos-querier-57564f89f7-hzjnz, Cpu: 9, Mem: 96468992
> Feb 23 14:47:39.795: INFO: Pod for on the node: multus-dclw7, Cpu: 10, Mem: 157286400
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-metrics-daemon-998jw, Cpu: 20, Mem: 125829120
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-check-source-5584f5cfcc-2dcdt, Cpu: 10, Mem: 41943040
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-check-target-c4tqd, Cpu: 10, Mem: 15728640
> Feb 23 14:47:39.795: INFO: Pod for on the node: ovs-9zwnr, Cpu: 15, Mem: 419430400
> Feb 23 14:47:39.795: INFO: Pod for on the node: sdn-287mj, Cpu: 110, Mem: 230686720
> Feb 23 14:47:39.795: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedCPUResource: 1176, cpuAllocatableMil: 3500, cpuFraction: 0.336
> Feb 23 14:47:39.795: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedMemResource: 4885315584, memAllocatableVal: 14568333312, memFraction: 0.33533798818125227
> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:39.795: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428
> Feb 23 14:47:40.148: INFO: Pod for on the node: startup-b78f504b-237f-4758-9d3e-a89ce75ff8ea, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: pod-init-991109f2-3e8d-45f4-93d0-b1d59d834c23, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: agnhost-primary-vknxs, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: busybox-readonly-fs8c25040f-a95a-4c95-ab00-1c4b8a16bf67, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: c884b330-bf23-4f22-8086-835d75e71028-0, Cpu: 747, Mem: 3180331008
> Feb 23 14:47:40.148: INFO: Pod for on the node: client-can-connect-81-fd6ph, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: server-7fx9g, Cpu: 200, Mem: 419430400
> Feb 23 14:47:40.148: INFO: Pod for on the node: netserver-1, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: test-container-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: alpine-nnp-nil-bef53aa2-5554-4f0e-9de5-fae83135f91f, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: example-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: history-limit-2-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: custom-builder-image-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: sample-custom-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: gcp-pd-csi-driver-node-49jdt, Cpu: 30, Mem: 157286400
> Feb 23 14:47:40.148: INFO: Pod for on the node: tuned-c2c5t, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.148: INFO: Pod for on the node: dns-default-mfxcx, Cpu: 65, Mem: 137363456
> Feb 23 14:47:40.148: INFO: Pod for on the node: node-ca-6z27x, Cpu: 10, Mem: 10485760
> Feb 23 14:47:40.148: INFO: Pod for on the node: ingress-canary-6cmx4, Cpu: 10, Mem: 20971520
> Feb 23 14:47:40.148: INFO: Pod for on the node: machine-config-daemon-6xd7h, Cpu: 40, Mem: 104857600
> Feb 23 14:47:40.148: INFO: Pod for on the node: node-exporter-q5pbd, Cpu: 9, Mem: 220200960
> Feb 23 14:47:40.148: INFO: Pod for on the node: multus-hzw4g, Cpu: 10, Mem: 157286400
> Feb 23 14:47:40.148: INFO: Pod for on the node: network-metrics-daemon-drp8f, Cpu: 20, Mem: 125829120
> Feb 23 14:47:40.148: INFO: Pod for on the node: network-check-target-zwx95, Cpu: 10, Mem: 15728640
> Feb 23 14:47:40.148: INFO: Pod for on the node: ovs-2lsjd, Cpu: 15, Mem: 419430400
> Feb 23 14:47:40.148: INFO: Pod for on the node: sdn-tgkxh, Cpu: 110, Mem: 230686720
> Feb 23 14:47:40.148: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedCPUResource: 1286, cpuAllocatableMil: 3500, cpuFraction: 0.36742857142857144
> Feb 23 14:47:40.148: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedMemResource: 5147459584, memAllocatableVal: 14568333312, memFraction: 0.353332084992867
> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:40.148: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t
> Feb 23 14:47:40.376: INFO: Pod for on the node: dns-test-588d38bf-13a5-4f7d-b8b7-6fd0a8b65494, Cpu: 300, Mem: 629145600
> Feb 23 14:47:40.376: INFO: Pod for on the node: labelsupdate1563eafb-212d-4d40-aaaa-c7068b7ccf62, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: acabda1b-0083-4a80-a1cd-0a4c10cc5949-0, Cpu: 430, Mem: 513802239
> Feb 23 14:47:40.376: INFO: Pod for on the node: netserver-2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: nosrc-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: readiness-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: readiness-1-ns69h, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: example-1-g58rb, Cpu: 200, Mem: 419430400
> Feb 23 14:47:40.376: INFO: Pod for on the node: history-limit-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: append-test, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: bc-docker-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: bc-source-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: test-oauth-server, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.376: INFO: Pod for on the node: execpod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t-hmrth, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: local-injector, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: gcp-pd-csi-driver-node-gdmg2, Cpu: 30, Mem: 157286400
> Feb 23 14:47:40.376: INFO: Pod for on the node: tuned-l5b4b, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.376: INFO: Pod for on the node: dns-default-2djwv, Cpu: 65, Mem: 137363456
> Feb 23 14:47:40.376: INFO: Pod for on the node: image-registry-5d7cbc6796-47p55, Cpu: 100, Mem: 268435456
> Feb 23 14:47:40.376: INFO: Pod for on the node: node-ca-swk58, Cpu: 10, Mem: 10485760
> Feb 23 14:47:40.376: INFO: Pod for on the node: ingress-canary-qhzf6, Cpu: 10, Mem: 20971520
> Feb 23 14:47:40.376: INFO: Pod for on the node: router-default-58bb79bdb8-zs7s6, Cpu: 100, Mem: 268435456
> Feb 23 14:47:40.376: INFO: Pod for on the node: machine-config-daemon-dplfm, Cpu: 40, Mem: 104857600
> Feb 23 14:47:40.376: INFO: Pod for on the node: alertmanager-main-0, Cpu: 8, Mem: 283115520
> Feb 23 14:47:40.376: INFO: Pod for on the node: alertmanager-main-2, Cpu: 8, Mem: 283115520
> Feb 23 14:47:40.376: INFO: Pod for on the node: grafana-5b8f5b6d96-gwb98, Cpu: 5, Mem: 125829120
> Feb 23 14:47:40.376: INFO: Pod for on the node: node-exporter-6pw82, Cpu: 9, Mem: 220200960
> Feb 23 14:47:40.376: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-xj5sq, Cpu: 1, Mem: 26214400
> Feb 23 14:47:40.376: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:40.376: INFO: Pod for on the node: thanos-querier-57564f89f7-xvh4z, Cpu: 9, Mem: 96468992
> Feb 23 14:47:40.376: INFO: Pod for on the node: multus-d76x9, Cpu: 10, Mem: 157286400
> Feb 23 14:47:40.376: INFO: Pod for on the node: network-metrics-daemon-nv4nm, Cpu: 20, Mem: 125829120
> Feb 23 14:47:40.376: INFO: Pod for on the node: network-check-target-rz8wn, Cpu: 10, Mem: 15728640
> Feb 23 14:47:40.376: INFO: Pod for on the node: ovs-rxjl5, Cpu: 15, Mem: 419430400
> Feb 23 14:47:40.376: INFO: Pod for on the node: sdn-8q7d2, Cpu: 110, Mem: 230686720
> Feb 23 14:47:40.376: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedCPUResource: 1186, cpuAllocatableMil: 3500, cpuFraction: 0.33885714285714286
> Feb 23 14:47:40.376: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedMemResource: 4937744383, memAllocatableVal: 14568333312, memFraction: 0.3389368074749332

You can see that the set of pods on each node differs between the two snapshots, which were taken less than 30 seconds apart. I am thinking these tests would benefit from being serial (or removed) due to their unpredictability in high-usage clusters like ours.
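
The arithmetic behind the two snapshots can be checked with a short sketch. It assumes, based only on the logged numbers, that the balancing step adds one filler pod per node sized to raise both the CPU and the memory fraction to the highest fraction seen on any node. This rule is an inference, not something the thread confirms; the function names and shortened node names are hypothetical. The inputs are the totalRequested and allocatable values from the "before balancing" log lines.

```python
# Values copied from the "before balancing" log lines above.
CPU_ALLOCATABLE = 3500          # cpuAllocatableMil, identical on all three nodes
MEM_ALLOCATABLE = 14568333312   # memAllocatableVal, identical on all three nodes

before = {
    "worker-b": {"cpu": 828, "mem": 4937744384},
    "worker-c": {"cpu": 439, "mem": 1757413376},
    "worker-d": {"cpu": 756, "mem": 4423942144},
}


def fractions(node):
    """Requested/allocatable ratios, as in the cpuFraction/memFraction log lines."""
    return node["cpu"] / CPU_ALLOCATABLE, node["mem"] / MEM_ALLOCATABLE


def filler_requests(nodes):
    """Assumed rule: lift every node's CPU and memory fraction to the overall maximum."""
    target = max(f for n in nodes.values() for f in fractions(n))
    fillers = {}
    for name, n in nodes.items():
        cpu_f, mem_f = fractions(n)
        fillers[name] = {
            "cpu": int((target - cpu_f) * CPU_ALLOCATABLE),
            "mem": int((target - mem_f) * MEM_ALLOCATABLE),
        }
    return target, fillers


target, fillers = filler_requests(before)

# cpuFraction values logged after balancing, for comparison with the target.
after_cpu_fraction = {
    "worker-b": 0.336,
    "worker-c": 0.36742857142857144,
    "worker-d": 0.33885714285714286,
}
drift = {name: f - target for name, f in after_cpu_fraction.items()}

for name in before:
    print(name, fillers[name], f"cpu drift after balancing: {drift[name]:+.4f}")
```

The computed CPU requests (358, 747 and 430 millicores) match the three UUID-named pods in the "after balancing" output, and the memory requests agree to within a byte, which suggests those are the filler pods. If nothing else had changed, every node would then sit at roughly 0.339 for both resources. Instead worker-c is logged at about 0.367 CPU and 0.353 memory, so other tests' pods coming and going during the interval undid the balance.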

Comment 4 Mike Dame 2021-05-17 15:06:23 UTC
Moving to MODIFIED, as all the linked PRs have merged or been closed in favor of other PRs which then merged

Comment 6 RamaKasturi 2021-05-20 07:33:03 UTC
Hello Mike,

I tried verifying the bug and do not see any failures or flakes on 4.8 clusters, but the link [1] shows that the test has always been flaking on 4.7. Is this expected? Thanks!

[1] https://search.ci.openshift.org/?search=Multi-AZ+Clusters+should+spread+the+pods+of+a+replication+controller+across+zones&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 7 Mike Dame 2021-05-20 13:49:00 UTC
Okay, thanks for checking. It looks like the changes we merged need to be backported to 4.7 then (I wasn't sure if they already had been). I'll open those PRs and link them to this bug

Comment 8 RamaKasturi 2021-05-20 16:16:45 UTC
I do not see any failures in the 4.8 runs, but the test still fails on 4.7, so I am moving the bug back to the ASSIGNED state.

Comment 11 Devan Goodwin 2022-03-07 18:15:45 UTC
This only appears in 4.7 tests, and at a very low rate. I propose closing it; from the TRT perspective this is not a priority.

Comment 12 Maciej Szulik 2022-05-25 11:00:59 UTC
Given the priority and the current time frame I don't think we'll be able to address this issue in 4.7.

