test: [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones
is failing frequently in CI, see search results:
https://search.ci.openshift.org/?maxAge=168h&context=1&type=bug%2Bjunit&name=&maxMatches=5&maxBytes=20971520&groupBy=job&search=%5C%5Bsig-scheduling%5C%5D+Multi-AZ+Clusters+should+spread+the+pods+of+a+replication+controller+across+zones
Examples:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.7/1361634319889600512
: [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
Run #0: Failed, 48s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/ubernetes_lite.go:174]: Pods were not evenly spread across zones. 3 in one zone and 6 in another zone Expected <int>: 3 to be within 2 of ~ <int>: 0
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-ovn-upgrade-4.6-stable-to-4.7-ci/1361548482682294272
: [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones [Suite:openshift/conformance/parallel] [Suite:k8s]
14s
fail [k8s.io/kubernetes.0/test/e2e/scheduling/ubernetes_lite.go:174]: Pods were not evenly spread across zones. 0 in one zone and 10 in another zone Expected <int>: 10 to be within 2 of ~ <int>: 0
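For readers unfamiliar with the assertion, here is a minimal sketch of the check that the failure messages appear to describe. It assumes the test compares the difference between the largest and smallest per-zone pod counts against a tolerance of 2, which matches the reported numbers (6 - 3 = 3 and 10 - 0 = 10). The function names and zone names are hypothetical and are not taken from ubernetes_lite.go.

```go
package main

import "fmt"

// zoneCountRange returns the smallest and largest per-zone pod counts.
// Hypothetical helper; not the name used in the real e2e test.
func zoneCountRange(podsPerZone map[string]int) (lo, hi int) {
	first := true
	for _, n := range podsPerZone {
		if first || n < lo {
			lo = n
		}
		if first || n > hi {
			hi = n
		}
		first = false
	}
	return lo, hi
}

// evenlySpread reports whether the gap between the fullest and emptiest
// zone is within the given tolerance (assumed to be 2 in the failing test).
func evenlySpread(podsPerZone map[string]int, tolerance int) bool {
	lo, hi := zoneCountRange(podsPerZone)
	return hi-lo <= tolerance
}

func main() {
	// Counts taken from the two failure messages; zone names are made up.
	gcpRun := map[string]int{"zone-a": 3, "zone-b": 6}
	awsRun := map[string]int{"zone-a": 0, "zone-b": 10}
	fmt.Println(evenlySpread(gcpRun, 2)) // false: 6 - 3 = 3 exceeds 2
	fmt.Println(evenlySpread(awsRun, 2)) // false: 10 - 0 = 10 exceeds 2
}
```

Under this reading, a 3/5 split would pass and a 3/6 split would fail, so a single extra pod landing in the wrong zone is enough to flip the result.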
*** Bug 1929684 has been marked as a duplicate of this bug. ***
This should be addressed by fixes added in https://github.com/openshift/kubernetes/pull/547 and https://github.com/openshift/kubernetes/pull/526
I am wondering if the fix from https://github.com/openshift/kubernetes/pull/547 (which seems to have fixed the Service spreading test) created this failure, or if this failure existed before that. One thing I notice is that in these failures (example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.8/1364212636652146688) the pods being reported on each node are changing throughout the test. This makes it impossible for the above fix to actually balance the nodes, meaning that resource usage will interfere with the scheduling decision. Take the output from the above test: (before balancing) > Feb 23 14:46:12.203: INFO: Waiting up to 1m0s for all nodes to be ready > Feb 23 14:47:12.744: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj > Feb 23 14:47:12.873: INFO: Pod for on the node: pod-handle-http-request, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: csi-mockplugin-0, Cpu: 300, Mem: 629145600 > Feb 23 14:47:12.873: INFO: Pod for on the node: csi-mockplugin-attacher-0, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: test-recreate-deployment-5888b58954-2nwzf, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-5kngs, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-8pxdv, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-hdh6s, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-7nbt7, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: netserver-0, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: pod-submit-status-0-2, Cpu: 5, Mem: 10485760 > Feb 23 14:47:12.873: INFO: Pod for on the node: explicit-nonroot-uid, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: 
hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-5pft2, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: gcp-pd-csi-driver-node-ck46r, Cpu: 30, Mem: 157286400 > Feb 23 14:47:12.873: INFO: Pod for on the node: tuned-zz6c9, Cpu: 10, Mem: 52428800 > Feb 23 14:47:12.873: INFO: Pod for on the node: downloads-846fcb6857-xxs7w, Cpu: 10, Mem: 52428800 > Feb 23 14:47:12.873: INFO: Pod for on the node: dns-default-9fszd, Cpu: 65, Mem: 137363456 > Feb 23 14:47:12.873: INFO: Pod for on the node: image-registry-5d7cbc6796-5phf5, Cpu: 100, Mem: 268435456 > Feb 23 14:47:12.873: INFO: Pod for on the node: node-ca-hp5w8, Cpu: 10, Mem: 10485760 > Feb 23 14:47:12.873: INFO: Pod for on the node: ingress-canary-hv7t8, Cpu: 10, Mem: 20971520 > Feb 23 14:47:12.873: INFO: Pod for on the node: router-default-58bb79bdb8-4q4wj, Cpu: 100, Mem: 268435456 > Feb 23 14:47:12.873: INFO: Pod for on the node: migrator-7bc78664fd-fwvcj, Cpu: 10, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: machine-config-daemon-98694, Cpu: 40, Mem: 104857600 > Feb 23 14:47:12.873: INFO: Pod for on the node: ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2qfbfm, Cpu: 100, Mem: 209715200 > Feb 23 14:47:12.873: INFO: Pod for on the node: certified-operators-v2lm5, Cpu: 10, Mem: 52428800 > Feb 23 14:47:12.873: INFO: Pod for on the node: community-operators-mkzjl, Cpu: 10, Mem: 52428800 > Feb 23 14:47:12.873: INFO: Pod for on the node: community-operators-rhvb9, Cpu: 10, Mem: 52428800 > Feb 23 14:47:12.873: INFO: Pod for on the node: redhat-marketplace-k7t7v, Cpu: 10, Mem: 52428800 > Feb 23 14:47:12.873: INFO: Pod for on the node: redhat-operators-l8586, Cpu: 10, Mem: 52428800 > Feb 23 14:47:12.873: INFO: Pod for on the node: alertmanager-main-1, Cpu: 8, Mem: 283115520 > Feb 23 14:47:12.873: INFO: Pod for on the node: kube-state-metrics-54b6ff9dc-wfm7f, Cpu: 4, Mem: 125829120 > Feb 23 14:47:12.873: INFO: Pod for on the node: node-exporter-jx2mr, Cpu: 9, Mem: 
220200960 > Feb 23 14:47:12.873: INFO: Pod for on the node: openshift-state-metrics-6757ffd766-mmrxq, Cpu: 3, Mem: 199229440 > Feb 23 14:47:12.873: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-htmsl, Cpu: 1, Mem: 26214400 > Feb 23 14:47:12.873: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 76, Mem: 1262485504 > Feb 23 14:47:12.873: INFO: Pod for on the node: telemeter-client-649ff75866-dfxb7, Cpu: 3, Mem: 73400320 > Feb 23 14:47:12.873: INFO: Pod for on the node: thanos-querier-57564f89f7-hzjnz, Cpu: 9, Mem: 96468992 > Feb 23 14:47:12.873: INFO: Pod for on the node: multus-dclw7, Cpu: 10, Mem: 157286400 > Feb 23 14:47:12.873: INFO: Pod for on the node: network-metrics-daemon-998jw, Cpu: 20, Mem: 125829120 > Feb 23 14:47:12.873: INFO: Pod for on the node: network-check-source-5584f5cfcc-2dcdt, Cpu: 10, Mem: 41943040 > Feb 23 14:47:12.873: INFO: Pod for on the node: network-check-target-c4tqd, Cpu: 10, Mem: 15728640 > Feb 23 14:47:12.873: INFO: Pod for on the node: ovs-9zwnr, Cpu: 15, Mem: 419430400 > Feb 23 14:47:12.873: INFO: Pod for on the node: sdn-287mj, Cpu: 110, Mem: 230686720 > Feb 23 14:47:12.873: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedCPUResource: 828, cpuAllocatableMil: 3500, cpuFraction: 0.23657142857142857 > Feb 23 14:47:12.873: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedMemResource: 4937744384, memAllocatableVal: 14568333312, memFraction: 0.33893680754357525 > Feb 23 14:47:12.873: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428 > Feb 23 14:47:13.028: INFO: Pod for on the node: startup-b78f504b-237f-4758-9d3e-a89ce75ff8ea, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-4fcgf, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-6zwpd, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-ddjnt, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: 
INFO: Pod for on the node: simpletest.rc-kzt2n, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: gluster-server, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: busybox-readonly-fs8c25040f-a95a-4c95-ab00-1c4b8a16bf67, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: server-7fx9g, Cpu: 200, Mem: 419430400 > Feb 23 14:47:13.028: INFO: Pod for on the node: netserver-1, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: example-1-deploy, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: deployment-simple-1-deploy, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: deployment-simple-1-hook-pre, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: custom-builder-image-1-build, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: sample-custom-build-1-build, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: pod-6b81707d-e327-4646-86e7-4018c3794134, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.028: INFO: Pod for on the node: gcp-pd-csi-driver-node-49jdt, Cpu: 30, Mem: 157286400 > Feb 23 14:47:13.028: INFO: Pod for on the node: tuned-c2c5t, Cpu: 10, Mem: 52428800 > Feb 23 14:47:13.028: INFO: Pod for on the node: dns-default-mfxcx, Cpu: 65, Mem: 137363456 > Feb 23 14:47:13.028: INFO: Pod for on the node: node-ca-6z27x, Cpu: 10, Mem: 10485760 > Feb 23 14:47:13.028: INFO: Pod for on the node: ingress-canary-6cmx4, Cpu: 10, Mem: 20971520 > Feb 23 14:47:13.028: INFO: Pod for on the node: machine-config-daemon-6xd7h, Cpu: 40, Mem: 104857600 > Feb 23 14:47:13.028: INFO: Pod for on the node: node-exporter-q5pbd, Cpu: 9, Mem: 220200960 > Feb 23 14:47:13.028: INFO: Pod for on the node: multus-hzw4g, Cpu: 10, Mem: 157286400 > Feb 23 14:47:13.028: INFO: Pod for on the node: network-metrics-daemon-drp8f, Cpu: 20, Mem: 125829120 > Feb 23 14:47:13.028: INFO: Pod for on the 
node: network-check-target-zwx95, Cpu: 10, Mem: 15728640 > Feb 23 14:47:13.028: INFO: Pod for on the node: ovs-2lsjd, Cpu: 15, Mem: 419430400 > Feb 23 14:47:13.028: INFO: Pod for on the node: sdn-tgkxh, Cpu: 110, Mem: 230686720 > Feb 23 14:47:13.028: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedCPUResource: 439, cpuAllocatableMil: 3500, cpuFraction: 0.12542857142857142 > Feb 23 14:47:13.028: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedMemResource: 1757413376, memAllocatableVal: 14568333312, memFraction: 0.12063242502506522 > Feb 23 14:47:13.028: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t > Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-2zcbt, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-7tbcx, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-qsj72, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: gluster-client, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: agnhost-pod, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: netserver-2, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: readiness-1-deploy, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: example-1-g58rb, Cpu: 200, Mem: 419430400 > Feb 23 14:47:13.234: INFO: Pod for on the node: append-test, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: test-oauth-server, Cpu: 10, Mem: 52428800 > Feb 23 14:47:13.234: INFO: Pod for on the node: sample-webhook-deployment-7fdfd97c84-bqscf, Cpu: 100, Mem: 209715200 > Feb 23 14:47:13.234: INFO: Pod for on the node: gcp-pd-csi-driver-node-gdmg2, Cpu: 30, Mem: 157286400 > Feb 23 14:47:13.234: INFO: Pod for on the node: tuned-l5b4b, Cpu: 10, Mem: 52428800 > Feb 23 14:47:13.234: INFO: Pod for on the node: dns-default-2djwv, Cpu: 65, Mem: 
137363456 > Feb 23 14:47:13.234: INFO: Pod for on the node: image-registry-5d7cbc6796-47p55, Cpu: 100, Mem: 268435456 > Feb 23 14:47:13.234: INFO: Pod for on the node: node-ca-swk58, Cpu: 10, Mem: 10485760 > Feb 23 14:47:13.234: INFO: Pod for on the node: ingress-canary-qhzf6, Cpu: 10, Mem: 20971520 > Feb 23 14:47:13.234: INFO: Pod for on the node: router-default-58bb79bdb8-zs7s6, Cpu: 100, Mem: 268435456 > Feb 23 14:47:13.234: INFO: Pod for on the node: machine-config-daemon-dplfm, Cpu: 40, Mem: 104857600 > Feb 23 14:47:13.234: INFO: Pod for on the node: alertmanager-main-0, Cpu: 8, Mem: 283115520 > Feb 23 14:47:13.234: INFO: Pod for on the node: alertmanager-main-2, Cpu: 8, Mem: 283115520 > Feb 23 14:47:13.234: INFO: Pod for on the node: grafana-5b8f5b6d96-gwb98, Cpu: 5, Mem: 125829120 > Feb 23 14:47:13.234: INFO: Pod for on the node: node-exporter-6pw82, Cpu: 9, Mem: 220200960 > Feb 23 14:47:13.234: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-xj5sq, Cpu: 1, Mem: 26214400 > Feb 23 14:47:13.234: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 76, Mem: 1262485504 > Feb 23 14:47:13.234: INFO: Pod for on the node: thanos-querier-57564f89f7-xvh4z, Cpu: 9, Mem: 96468992 > Feb 23 14:47:13.234: INFO: Pod for on the node: multus-d76x9, Cpu: 10, Mem: 157286400 > Feb 23 14:47:13.234: INFO: Pod for on the node: network-metrics-daemon-nv4nm, Cpu: 20, Mem: 125829120 > Feb 23 14:47:13.234: INFO: Pod for on the node: network-check-target-rz8wn, Cpu: 10, Mem: 15728640 > Feb 23 14:47:13.234: INFO: Pod for on the node: ovs-rxjl5, Cpu: 15, Mem: 419430400 > Feb 23 14:47:13.234: INFO: Pod for on the node: sdn-8q7d2, Cpu: 110, Mem: 230686720 > Feb 23 14:47:13.234: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedCPUResource: 756, cpuAllocatableMil: 3500, cpuFraction: 0.216 > Feb 23 14:47:13.234: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedMemResource: 4423942144, memAllocatableVal: 14568333312, memFraction: 0.30366837779281036 
> Feb 23 14:47:13.327: INFO: Waiting for running... > Feb 23 14:47:23.416: INFO: Waiting for running... > Feb 23 14:47:33.686: INFO: Waiting for running... (after balancing) > STEP: Compute Cpu, Mem Fraction after create balanced pods. > Feb 23 14:47:38.737: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj > Feb 23 14:47:39.794: INFO: Pod for on the node: csi-mockplugin-0, Cpu: 300, Mem: 629145600 > Feb 23 14:47:39.794: INFO: Pod for on the node: csi-mockplugin-attacher-0, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-attacher-0, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-provisioner-0, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-resizer-0, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-snapshotter-0, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpathplugin-0, Cpu: 300, Mem: 629145600 > Feb 23 14:47:39.794: INFO: Pod for on the node: inline-volume-tester-kr5cw, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: deployment-1e9e1d60-efb8-4d8a-a3a1-7443062287c6-675fd6b69bdwhct, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: f5c0a6bf-206a-485e-94a6-32762d3a07bc-0, Cpu: 358, Mem: 0 > Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-n6xdb, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: pod-e96ffac9-93ed-470f-a36e-9899cedaa49b, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-7nbt7, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-cd9w4, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: host-test-container-pod, Cpu: 100, Mem: 
209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: netserver-0, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: pod-submit-status-0-2, Cpu: 5, Mem: 10485760 > Feb 23 14:47:39.794: INFO: Pod for on the node: explicit-nonroot-uid, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.794: INFO: Pod for on the node: history-limit-1-5bxvx, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.795: INFO: Pod for on the node: bc-custom-1-build, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.795: INFO: Pod for on the node: exec-volume-test-preprovisionedpv-jc8b, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.795: INFO: Pod for on the node: gcp-pd-csi-driver-node-ck46r, Cpu: 30, Mem: 157286400 > Feb 23 14:47:39.795: INFO: Pod for on the node: tuned-zz6c9, Cpu: 10, Mem: 52428800 > Feb 23 14:47:39.795: INFO: Pod for on the node: downloads-846fcb6857-xxs7w, Cpu: 10, Mem: 52428800 > Feb 23 14:47:39.795: INFO: Pod for on the node: dns-default-9fszd, Cpu: 65, Mem: 137363456 > Feb 23 14:47:39.795: INFO: Pod for on the node: image-registry-5d7cbc6796-5phf5, Cpu: 100, Mem: 268435456 > Feb 23 14:47:39.795: INFO: Pod for on the node: node-ca-hp5w8, Cpu: 10, Mem: 10485760 > Feb 23 14:47:39.795: INFO: Pod for on the node: ingress-canary-hv7t8, Cpu: 10, Mem: 20971520 > Feb 23 14:47:39.795: INFO: Pod for on the node: router-default-58bb79bdb8-4q4wj, Cpu: 100, Mem: 268435456 > Feb 23 14:47:39.795: INFO: Pod for on the node: migrator-7bc78664fd-fwvcj, Cpu: 10, Mem: 209715200 > Feb 23 14:47:39.795: INFO: Pod for on the node: machine-config-daemon-98694, Cpu: 40, Mem: 104857600 > Feb 23 14:47:39.795: INFO: Pod for on the node: ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2qfbfm, Cpu: 100, Mem: 209715200 > Feb 23 14:47:39.795: INFO: Pod for on the node: certified-operators-v2lm5, Cpu: 10, Mem: 52428800 > Feb 23 14:47:39.795: INFO: Pod for on the node: community-operators-mkzjl, Cpu: 10, Mem: 52428800 > Feb 23 14:47:39.795: INFO: Pod for on the node: redhat-marketplace-k7t7v, 
Cpu: 10, Mem: 52428800 > Feb 23 14:47:39.795: INFO: Pod for on the node: redhat-operators-l8586, Cpu: 10, Mem: 52428800 > Feb 23 14:47:39.795: INFO: Pod for on the node: alertmanager-main-1, Cpu: 8, Mem: 283115520 > Feb 23 14:47:39.795: INFO: Pod for on the node: kube-state-metrics-54b6ff9dc-wfm7f, Cpu: 4, Mem: 125829120 > Feb 23 14:47:39.795: INFO: Pod for on the node: node-exporter-jx2mr, Cpu: 9, Mem: 220200960 > Feb 23 14:47:39.795: INFO: Pod for on the node: openshift-state-metrics-6757ffd766-mmrxq, Cpu: 3, Mem: 199229440 > Feb 23 14:47:39.795: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-htmsl, Cpu: 1, Mem: 26214400 > Feb 23 14:47:39.795: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 76, Mem: 1262485504 > Feb 23 14:47:39.795: INFO: Pod for on the node: telemeter-client-649ff75866-dfxb7, Cpu: 3, Mem: 73400320 > Feb 23 14:47:39.795: INFO: Pod for on the node: thanos-querier-57564f89f7-hzjnz, Cpu: 9, Mem: 96468992 > Feb 23 14:47:39.795: INFO: Pod for on the node: multus-dclw7, Cpu: 10, Mem: 157286400 > Feb 23 14:47:39.795: INFO: Pod for on the node: network-metrics-daemon-998jw, Cpu: 20, Mem: 125829120 > Feb 23 14:47:39.795: INFO: Pod for on the node: network-check-source-5584f5cfcc-2dcdt, Cpu: 10, Mem: 41943040 > Feb 23 14:47:39.795: INFO: Pod for on the node: network-check-target-c4tqd, Cpu: 10, Mem: 15728640 > Feb 23 14:47:39.795: INFO: Pod for on the node: ovs-9zwnr, Cpu: 15, Mem: 419430400 > Feb 23 14:47:39.795: INFO: Pod for on the node: sdn-287mj, Cpu: 110, Mem: 230686720 > Feb 23 14:47:39.795: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedCPUResource: 1176, cpuAllocatableMil: 3500, cpuFraction: 0.336 > Feb 23 14:47:39.795: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedMemResource: 4885315584, memAllocatableVal: 14568333312, memFraction: 0.33533798818125227 > STEP: Compute Cpu, Mem Fraction after create balanced pods. 
> Feb 23 14:47:39.795: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428 > Feb 23 14:47:40.148: INFO: Pod for on the node: startup-b78f504b-237f-4758-9d3e-a89ce75ff8ea, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: pod-init-991109f2-3e8d-45f4-93d0-b1d59d834c23, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: agnhost-primary-vknxs, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: busybox-readonly-fs8c25040f-a95a-4c95-ab00-1c4b8a16bf67, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: c884b330-bf23-4f22-8086-835d75e71028-0, Cpu: 747, Mem: 3180331008 > Feb 23 14:47:40.148: INFO: Pod for on the node: client-can-connect-81-fd6ph, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: server-7fx9g, Cpu: 200, Mem: 419430400 > Feb 23 14:47:40.148: INFO: Pod for on the node: netserver-1, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: test-container-pod, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: alpine-nnp-nil-bef53aa2-5554-4f0e-9de5-fae83135f91f, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: example-1-deploy, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: history-limit-2-deploy, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: custom-builder-image-1-build, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: sample-custom-build-1-build, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.148: INFO: Pod for on the node: gcp-pd-csi-driver-node-49jdt, Cpu: 30, Mem: 157286400 > Feb 23 14:47:40.148: INFO: Pod for on the node: tuned-c2c5t, Cpu: 10, Mem: 52428800 > Feb 23 14:47:40.148: INFO: Pod for on the node: dns-default-mfxcx, Cpu: 65, Mem: 137363456 > Feb 23 14:47:40.148: INFO: Pod for on the node: node-ca-6z27x, Cpu: 10, Mem: 10485760 > Feb 23 14:47:40.148: INFO: 
Pod for on the node: ingress-canary-6cmx4, Cpu: 10, Mem: 20971520 > Feb 23 14:47:40.148: INFO: Pod for on the node: machine-config-daemon-6xd7h, Cpu: 40, Mem: 104857600 > Feb 23 14:47:40.148: INFO: Pod for on the node: node-exporter-q5pbd, Cpu: 9, Mem: 220200960 > Feb 23 14:47:40.148: INFO: Pod for on the node: multus-hzw4g, Cpu: 10, Mem: 157286400 > Feb 23 14:47:40.148: INFO: Pod for on the node: network-metrics-daemon-drp8f, Cpu: 20, Mem: 125829120 > Feb 23 14:47:40.148: INFO: Pod for on the node: network-check-target-zwx95, Cpu: 10, Mem: 15728640 > Feb 23 14:47:40.148: INFO: Pod for on the node: ovs-2lsjd, Cpu: 15, Mem: 419430400 > Feb 23 14:47:40.148: INFO: Pod for on the node: sdn-tgkxh, Cpu: 110, Mem: 230686720 > Feb 23 14:47:40.148: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedCPUResource: 1286, cpuAllocatableMil: 3500, cpuFraction: 0.36742857142857144 > Feb 23 14:47:40.148: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedMemResource: 5147459584, memAllocatableVal: 14568333312, memFraction: 0.353332084992867 > STEP: Compute Cpu, Mem Fraction after create balanced pods. 
> Feb 23 14:47:40.148: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t > Feb 23 14:47:40.376: INFO: Pod for on the node: dns-test-588d38bf-13a5-4f7d-b8b7-6fd0a8b65494, Cpu: 300, Mem: 629145600 > Feb 23 14:47:40.376: INFO: Pod for on the node: labelsupdate1563eafb-212d-4d40-aaaa-c7068b7ccf62, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: acabda1b-0083-4a80-a1cd-0a4c10cc5949-0, Cpu: 430, Mem: 513802239 > Feb 23 14:47:40.376: INFO: Pod for on the node: netserver-2, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: nosrc-build-1-build, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: readiness-1-deploy, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: readiness-1-ns69h, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: example-1-g58rb, Cpu: 200, Mem: 419430400 > Feb 23 14:47:40.376: INFO: Pod for on the node: history-limit-1-deploy, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: append-test, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: bc-docker-1-build, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: bc-source-1-build, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: test-oauth-server, Cpu: 10, Mem: 52428800 > Feb 23 14:47:40.376: INFO: Pod for on the node: execpod, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t-hmrth, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: local-injector, Cpu: 100, Mem: 209715200 > Feb 23 14:47:40.376: INFO: Pod for on the node: gcp-pd-csi-driver-node-gdmg2, Cpu: 30, Mem: 157286400 > Feb 23 14:47:40.376: INFO: Pod for on the node: tuned-l5b4b, Cpu: 10, Mem: 52428800 > Feb 23 14:47:40.376: INFO: Pod for on the node: dns-default-2djwv, Cpu: 65, Mem: 137363456 > Feb 23 
14:47:40.376: INFO: Pod for on the node: image-registry-5d7cbc6796-47p55, Cpu: 100, Mem: 268435456 > Feb 23 14:47:40.376: INFO: Pod for on the node: node-ca-swk58, Cpu: 10, Mem: 10485760 > Feb 23 14:47:40.376: INFO: Pod for on the node: ingress-canary-qhzf6, Cpu: 10, Mem: 20971520 > Feb 23 14:47:40.376: INFO: Pod for on the node: router-default-58bb79bdb8-zs7s6, Cpu: 100, Mem: 268435456 > Feb 23 14:47:40.376: INFO: Pod for on the node: machine-config-daemon-dplfm, Cpu: 40, Mem: 104857600 > Feb 23 14:47:40.376: INFO: Pod for on the node: alertmanager-main-0, Cpu: 8, Mem: 283115520 > Feb 23 14:47:40.376: INFO: Pod for on the node: alertmanager-main-2, Cpu: 8, Mem: 283115520 > Feb 23 14:47:40.376: INFO: Pod for on the node: grafana-5b8f5b6d96-gwb98, Cpu: 5, Mem: 125829120 > Feb 23 14:47:40.376: INFO: Pod for on the node: node-exporter-6pw82, Cpu: 9, Mem: 220200960 > Feb 23 14:47:40.376: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-xj5sq, Cpu: 1, Mem: 26214400 > Feb 23 14:47:40.376: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 76, Mem: 1262485504 > Feb 23 14:47:40.376: INFO: Pod for on the node: thanos-querier-57564f89f7-xvh4z, Cpu: 9, Mem: 96468992 > Feb 23 14:47:40.376: INFO: Pod for on the node: multus-d76x9, Cpu: 10, Mem: 157286400 > Feb 23 14:47:40.376: INFO: Pod for on the node: network-metrics-daemon-nv4nm, Cpu: 20, Mem: 125829120 > Feb 23 14:47:40.376: INFO: Pod for on the node: network-check-target-rz8wn, Cpu: 10, Mem: 15728640 > Feb 23 14:47:40.376: INFO: Pod for on the node: ovs-rxjl5, Cpu: 15, Mem: 419430400 > Feb 23 14:47:40.376: INFO: Pod for on the node: sdn-8q7d2, Cpu: 110, Mem: 230686720 > Feb 23 14:47:40.376: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedCPUResource: 1186, cpuAllocatableMil: 3500, cpuFraction: 0.33885714285714286 > Feb 23 14:47:40.376: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedMemResource: 4937744383, memAllocatableVal: 14568333312, memFraction: 0.3389368074749332 You 
can see that the set of pods on each node differs between the two snapshots. I think these tests should either run serially or be removed, because they are unpredictable in high-usage clusters like ours.
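To make the comparison above easier to reproduce, here is a rough sketch that recomputes a node's requested fraction the way the log lines report it (requested divided by allocatable) and diffs the pod names seen in two snapshots. The helper names are hypothetical, and the pod lists are only a small subset of the worker-b entries quoted above, not the full log.

```go
package main

import (
	"fmt"
	"sort"
)

// requestedFraction mirrors the arithmetic visible in the log lines,
// e.g. totalRequestedCPUResource / cpuAllocatableMil.
func requestedFraction(requested, allocatable int64) float64 {
	return float64(requested) / float64(allocatable)
}

// podChurn returns the pods that disappeared and the pods that appeared
// between two snapshots of the same node. Hypothetical helper.
func podChurn(before, after []string) (removed, added []string) {
	inBefore := make(map[string]bool, len(before))
	for _, p := range before {
		inBefore[p] = true
	}
	inAfter := make(map[string]bool, len(after))
	for _, p := range after {
		inAfter[p] = true
		if !inBefore[p] {
			added = append(added, p)
		}
	}
	for _, p := range before {
		if !inAfter[p] {
			removed = append(removed, p)
		}
	}
	sort.Strings(removed)
	sort.Strings(added)
	return removed, added
}

func main() {
	// worker-b before balancing: 828 millicores requested of 3500 allocatable.
	fmt.Printf("cpuFraction before: %.4f\n", requestedFraction(828, 3500))

	// Small subset of the worker-b pod names from the two snapshots.
	before := []string{"pod-handle-http-request", "csi-mockplugin-0", "simpletest.rc-5kngs", "netserver-0"}
	after := []string{"csi-mockplugin-0", "csi-hostpath-attacher-0", "netserver-0", "f5c0a6bf-206a-485e-94a6-32762d3a07bc-0"}

	removed, added := podChurn(before, after)
	fmt.Println("removed:", removed)
	fmt.Println("added:", added)
}
```

Any non-empty "removed" or "added" list for pods other than the balancing pod itself means other tests changed the node's load while the balancing step was running.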
Moving to MODIFIED, as all the linked PRs have either merged or been closed in favor of other PRs that have since merged.
Hello Mike, I tried verifying the bug and do not see any failures or flakes on 4.8 clusters, but the search results in [1] show that the test has been flaking consistently on 4.7. Is this expected? Thanks! [1] https://search.ci.openshift.org/?search=Multi-AZ+Clusters+should+spread+the+pods+of+a+replication+controller+across+zones&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Okay, thanks for checking. It looks like the changes we merged need to be backported to 4.7 then (I wasn't sure if they already had been). I'll open those PRs and link them to this bug
I do not see any failures in 4.8 runs, but the test still fails on 4.7, so I am moving the bug back to the ASSIGNED state.
Hello Mike, I checked the bug again in the following test runs and the test is still listed as flaky in all 4.7 runs. In the details for [1] through [6] the log says 'passed' with the details being nil, and in one run it failed with the error listed at [7]. [1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-aws-ovn-upgrade/1434744704989138944 [2] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.7-upgrade-from-stable-4.6-e2e-aws-upgrade/1434735807087775744 [3] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp/1434735817128939520 [4] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.7/1434735807347822592 [5] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-e2e-gcp/1434625145909022720 [6] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-e2e-gcp/1434267140524871680 [7] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/719/pull-ci-openshift-ovn-kubernetes-release-4.7-e2e-gcp-ovn/1434605526221590528 Thanks, kasturi
This only appears in 4.7 tests, and at a very low rate. I propose that this be closed; from the TRT perspective it is not a priority.
Given the priority and the current time frame I don't think we'll be able to address this issue in 4.7.