Bug 1929389
| Summary: | [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Surya Seetharaman <surya> |
| Component: | kube-scheduler | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED EOL | QA Contact: | RamaKasturi <knarra> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.7 | CC: | aos-bugs, dgoodwin, fpaoline, mfojtik |
| Target Milestone: | --- | | |
| Target Release: | 4.7.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | tag-ci | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1929684 (view as bug list) | Environment: | [sig-scheduling] Multi-AZ Clusters should spread the pods of a replication controller across zones |
| Last Closed: | 2022-05-25 11:00:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1896558 | | |
| Bug Blocks: | 1929684 | | |
Description
Surya Seetharaman
2021-02-16 18:53:58 UTC
*** Bug 1929684 has been marked as a duplicate of this bug. ***

This should be addressed by fixes added in https://github.com/openshift/kubernetes/pull/547 and https://github.com/openshift/kubernetes/pull/526

I am wondering whether the fix from https://github.com/openshift/kubernetes/pull/547 (which seems to have fixed the Service spreading test) caused this failure, or whether this failure existed before that.

One thing I notice in these failures (example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-4.8/1364212636652146688) is that the pods reported on each node change throughout the test. This makes it impossible for the above fix to actually balance the nodes, so resource usage still interferes with the scheduling decision. Take the output from the above test:

(before balancing)

> Feb 23 14:46:12.203: INFO: Waiting up to 1m0s for all nodes to be ready
> Feb 23 14:47:12.744: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj
> Feb 23 14:47:12.873: INFO: Pod for on the node: pod-handle-http-request, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: csi-mockplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:12.873: INFO: Pod for on the node: csi-mockplugin-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: test-recreate-deployment-5888b58954-2nwzf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-5kngs, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-8pxdv, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: simpletest.rc-hdh6s, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-7nbt7, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: netserver-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: pod-submit-status-0-2, Cpu: 5, Mem: 10485760
> Feb 23 14:47:12.873: INFO: Pod for on the node: explicit-nonroot-uid, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-5pft2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: gcp-pd-csi-driver-node-ck46r, Cpu: 30, Mem: 157286400
> Feb 23 14:47:12.873: INFO: Pod for on the node: tuned-zz6c9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: downloads-846fcb6857-xxs7w, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: dns-default-9fszd, Cpu: 65, Mem: 137363456
> Feb 23 14:47:12.873: INFO: Pod for on the node: image-registry-5d7cbc6796-5phf5, Cpu: 100, Mem: 268435456
> Feb 23 14:47:12.873: INFO: Pod for on the node: node-ca-hp5w8, Cpu: 10, Mem: 10485760
> Feb 23 14:47:12.873: INFO: Pod for on the node: ingress-canary-hv7t8, Cpu: 10, Mem: 20971520
> Feb 23 14:47:12.873: INFO: Pod for on the node: router-default-58bb79bdb8-4q4wj, Cpu: 100, Mem: 268435456
> Feb 23 14:47:12.873: INFO: Pod for on the node: migrator-7bc78664fd-fwvcj, Cpu: 10, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: machine-config-daemon-98694, Cpu: 40, Mem: 104857600
> Feb 23 14:47:12.873: INFO: Pod for on the node: ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2qfbfm, Cpu: 100, Mem: 209715200
> Feb 23 14:47:12.873: INFO: Pod for on the node: certified-operators-v2lm5, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: community-operators-mkzjl, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: community-operators-rhvb9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: redhat-marketplace-k7t7v, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: redhat-operators-l8586, Cpu: 10, Mem: 52428800
> Feb 23 14:47:12.873: INFO: Pod for on the node: alertmanager-main-1, Cpu: 8, Mem: 283115520
> Feb 23 14:47:12.873: INFO: Pod for on the node: kube-state-metrics-54b6ff9dc-wfm7f, Cpu: 4, Mem: 125829120
> Feb 23 14:47:12.873: INFO: Pod for on the node: node-exporter-jx2mr, Cpu: 9, Mem: 220200960
> Feb 23 14:47:12.873: INFO: Pod for on the node: openshift-state-metrics-6757ffd766-mmrxq, Cpu: 3, Mem: 199229440
> Feb 23 14:47:12.873: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-htmsl, Cpu: 1, Mem: 26214400
> Feb 23 14:47:12.873: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:12.873: INFO: Pod for on the node: telemeter-client-649ff75866-dfxb7, Cpu: 3, Mem: 73400320
> Feb 23 14:47:12.873: INFO: Pod for on the node: thanos-querier-57564f89f7-hzjnz, Cpu: 9, Mem: 96468992
> Feb 23 14:47:12.873: INFO: Pod for on the node: multus-dclw7, Cpu: 10, Mem: 157286400
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-metrics-daemon-998jw, Cpu: 20, Mem: 125829120
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-check-source-5584f5cfcc-2dcdt, Cpu: 10, Mem: 41943040
> Feb 23 14:47:12.873: INFO: Pod for on the node: network-check-target-c4tqd, Cpu: 10, Mem: 15728640
> Feb 23 14:47:12.873: INFO: Pod for on the node: ovs-9zwnr, Cpu: 15, Mem: 419430400
> Feb 23 14:47:12.873: INFO: Pod for on the node: sdn-287mj, Cpu: 110, Mem: 230686720
> Feb 23 14:47:12.873: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedCPUResource: 828, cpuAllocatableMil: 3500, cpuFraction: 0.23657142857142857
> Feb 23 14:47:12.873: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedMemResource: 4937744384, memAllocatableVal: 14568333312, memFraction: 0.33893680754357525
> Feb 23 14:47:12.873: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428
> Feb 23 14:47:13.028: INFO: Pod for on the node: startup-b78f504b-237f-4758-9d3e-a89ce75ff8ea, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-4fcgf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-6zwpd, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-ddjnt, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: simpletest.rc-kzt2n, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: gluster-server, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: busybox-readonly-fs8c25040f-a95a-4c95-ab00-1c4b8a16bf67, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: server-7fx9g, Cpu: 200, Mem: 419430400
> Feb 23 14:47:13.028: INFO: Pod for on the node: netserver-1, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: example-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: deployment-simple-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: deployment-simple-1-hook-pre, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: custom-builder-image-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: sample-custom-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: pod-6b81707d-e327-4646-86e7-4018c3794134, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.028: INFO: Pod for on the node: gcp-pd-csi-driver-node-49jdt, Cpu: 30, Mem: 157286400
> Feb 23 14:47:13.028: INFO: Pod for on the node: tuned-c2c5t, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.028: INFO: Pod for on the node: dns-default-mfxcx, Cpu: 65, Mem: 137363456
> Feb 23 14:47:13.028: INFO: Pod for on the node: node-ca-6z27x, Cpu: 10, Mem: 10485760
> Feb 23 14:47:13.028: INFO: Pod for on the node: ingress-canary-6cmx4, Cpu: 10, Mem: 20971520
> Feb 23 14:47:13.028: INFO: Pod for on the node: machine-config-daemon-6xd7h, Cpu: 40, Mem: 104857600
> Feb 23 14:47:13.028: INFO: Pod for on the node: node-exporter-q5pbd, Cpu: 9, Mem: 220200960
> Feb 23 14:47:13.028: INFO: Pod for on the node: multus-hzw4g, Cpu: 10, Mem: 157286400
> Feb 23 14:47:13.028: INFO: Pod for on the node: network-metrics-daemon-drp8f, Cpu: 20, Mem: 125829120
> Feb 23 14:47:13.028: INFO: Pod for on the node: network-check-target-zwx95, Cpu: 10, Mem: 15728640
> Feb 23 14:47:13.028: INFO: Pod for on the node: ovs-2lsjd, Cpu: 15, Mem: 419430400
> Feb 23 14:47:13.028: INFO: Pod for on the node: sdn-tgkxh, Cpu: 110, Mem: 230686720
> Feb 23 14:47:13.028: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedCPUResource: 439, cpuAllocatableMil: 3500, cpuFraction: 0.12542857142857142
> Feb 23 14:47:13.028: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedMemResource: 1757413376, memAllocatableVal: 14568333312, memFraction: 0.12063242502506522
> Feb 23 14:47:13.028: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-2zcbt, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-7tbcx, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: simpletest.rc-qsj72, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: gluster-client, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: agnhost-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: netserver-2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: readiness-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: example-1-g58rb, Cpu: 200, Mem: 419430400
> Feb 23 14:47:13.234: INFO: Pod for on the node: append-test, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: test-oauth-server, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.234: INFO: Pod for on the node: sample-webhook-deployment-7fdfd97c84-bqscf, Cpu: 100, Mem: 209715200
> Feb 23 14:47:13.234: INFO: Pod for on the node: gcp-pd-csi-driver-node-gdmg2, Cpu: 30, Mem: 157286400
> Feb 23 14:47:13.234: INFO: Pod for on the node: tuned-l5b4b, Cpu: 10, Mem: 52428800
> Feb 23 14:47:13.234: INFO: Pod for on the node: dns-default-2djwv, Cpu: 65, Mem: 137363456
> Feb 23 14:47:13.234: INFO: Pod for on the node: image-registry-5d7cbc6796-47p55, Cpu: 100, Mem: 268435456
> Feb 23 14:47:13.234: INFO: Pod for on the node: node-ca-swk58, Cpu: 10, Mem: 10485760
> Feb 23 14:47:13.234: INFO: Pod for on the node: ingress-canary-qhzf6, Cpu: 10, Mem: 20971520
> Feb 23 14:47:13.234: INFO: Pod for on the node: router-default-58bb79bdb8-zs7s6, Cpu: 100, Mem: 268435456
> Feb 23 14:47:13.234: INFO: Pod for on the node: machine-config-daemon-dplfm, Cpu: 40, Mem: 104857600
> Feb 23 14:47:13.234: INFO: Pod for on the node: alertmanager-main-0, Cpu: 8, Mem: 283115520
> Feb 23 14:47:13.234: INFO: Pod for on the node: alertmanager-main-2, Cpu: 8, Mem: 283115520
> Feb 23 14:47:13.234: INFO: Pod for on the node: grafana-5b8f5b6d96-gwb98, Cpu: 5, Mem: 125829120
> Feb 23 14:47:13.234: INFO: Pod for on the node: node-exporter-6pw82, Cpu: 9, Mem: 220200960
> Feb 23 14:47:13.234: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-xj5sq, Cpu: 1, Mem: 26214400
> Feb 23 14:47:13.234: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:13.234: INFO: Pod for on the node: thanos-querier-57564f89f7-xvh4z, Cpu: 9, Mem: 96468992
> Feb 23 14:47:13.234: INFO: Pod for on the node: multus-d76x9, Cpu: 10, Mem: 157286400
> Feb 23 14:47:13.234: INFO: Pod for on the node: network-metrics-daemon-nv4nm, Cpu: 20, Mem: 125829120
> Feb 23 14:47:13.234: INFO: Pod for on the node: network-check-target-rz8wn, Cpu: 10, Mem: 15728640
> Feb 23 14:47:13.234: INFO: Pod for on the node: ovs-rxjl5, Cpu: 15, Mem: 419430400
> Feb 23 14:47:13.234: INFO: Pod for on the node: sdn-8q7d2, Cpu: 110, Mem: 230686720
> Feb 23 14:47:13.234: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedCPUResource: 756, cpuAllocatableMil: 3500, cpuFraction: 0.216
> Feb 23 14:47:13.234: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedMemResource: 4423942144, memAllocatableVal: 14568333312, memFraction: 0.30366837779281036
> Feb 23 14:47:13.327: INFO: Waiting for running...
> Feb 23 14:47:23.416: INFO: Waiting for running...
> Feb 23 14:47:33.686: INFO: Waiting for running...

(after balancing)

> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:38.737: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-mockplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-mockplugin-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-attacher-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-provisioner-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-resizer-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpath-snapshotter-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: csi-hostpathplugin-0, Cpu: 300, Mem: 629145600
> Feb 23 14:47:39.794: INFO: Pod for on the node: inline-volume-tester-kr5cw, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: deployment-1e9e1d60-efb8-4d8a-a3a1-7443062287c6-675fd6b69bdwhct, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: f5c0a6bf-206a-485e-94a6-32762d3a07bc-0, Cpu: 358, Mem: 0
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-n6xdb, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: pod-e96ffac9-93ed-470f-a36e-9899cedaa49b, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-7nbt7, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj-cd9w4, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: host-test-container-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: netserver-0, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: pod-submit-status-0-2, Cpu: 5, Mem: 10485760
> Feb 23 14:47:39.794: INFO: Pod for on the node: explicit-nonroot-uid, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.794: INFO: Pod for on the node: history-limit-1-5bxvx, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: bc-custom-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: exec-volume-test-preprovisionedpv-jc8b, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: gcp-pd-csi-driver-node-ck46r, Cpu: 30, Mem: 157286400
> Feb 23 14:47:39.795: INFO: Pod for on the node: tuned-zz6c9, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: downloads-846fcb6857-xxs7w, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: dns-default-9fszd, Cpu: 65, Mem: 137363456
> Feb 23 14:47:39.795: INFO: Pod for on the node: image-registry-5d7cbc6796-5phf5, Cpu: 100, Mem: 268435456
> Feb 23 14:47:39.795: INFO: Pod for on the node: node-ca-hp5w8, Cpu: 10, Mem: 10485760
> Feb 23 14:47:39.795: INFO: Pod for on the node: ingress-canary-hv7t8, Cpu: 10, Mem: 20971520
> Feb 23 14:47:39.795: INFO: Pod for on the node: router-default-58bb79bdb8-4q4wj, Cpu: 100, Mem: 268435456
> Feb 23 14:47:39.795: INFO: Pod for on the node: migrator-7bc78664fd-fwvcj, Cpu: 10, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: machine-config-daemon-98694, Cpu: 40, Mem: 104857600
> Feb 23 14:47:39.795: INFO: Pod for on the node: ab0ec41ac51719de72554e09c32400b13c6d15dcf7d38302d5ed14fcb2qfbfm, Cpu: 100, Mem: 209715200
> Feb 23 14:47:39.795: INFO: Pod for on the node: certified-operators-v2lm5, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: community-operators-mkzjl, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: redhat-marketplace-k7t7v, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: redhat-operators-l8586, Cpu: 10, Mem: 52428800
> Feb 23 14:47:39.795: INFO: Pod for on the node: alertmanager-main-1, Cpu: 8, Mem: 283115520
> Feb 23 14:47:39.795: INFO: Pod for on the node: kube-state-metrics-54b6ff9dc-wfm7f, Cpu: 4, Mem: 125829120
> Feb 23 14:47:39.795: INFO: Pod for on the node: node-exporter-jx2mr, Cpu: 9, Mem: 220200960
> Feb 23 14:47:39.795: INFO: Pod for on the node: openshift-state-metrics-6757ffd766-mmrxq, Cpu: 3, Mem: 199229440
> Feb 23 14:47:39.795: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-htmsl, Cpu: 1, Mem: 26214400
> Feb 23 14:47:39.795: INFO: Pod for on the node: prometheus-k8s-1, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:39.795: INFO: Pod for on the node: telemeter-client-649ff75866-dfxb7, Cpu: 3, Mem: 73400320
> Feb 23 14:47:39.795: INFO: Pod for on the node: thanos-querier-57564f89f7-hzjnz, Cpu: 9, Mem: 96468992
> Feb 23 14:47:39.795: INFO: Pod for on the node: multus-dclw7, Cpu: 10, Mem: 157286400
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-metrics-daemon-998jw, Cpu: 20, Mem: 125829120
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-check-source-5584f5cfcc-2dcdt, Cpu: 10, Mem: 41943040
> Feb 23 14:47:39.795: INFO: Pod for on the node: network-check-target-c4tqd, Cpu: 10, Mem: 15728640
> Feb 23 14:47:39.795: INFO: Pod for on the node: ovs-9zwnr, Cpu: 15, Mem: 419430400
> Feb 23 14:47:39.795: INFO: Pod for on the node: sdn-287mj, Cpu: 110, Mem: 230686720
> Feb 23 14:47:39.795: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedCPUResource: 1176, cpuAllocatableMil: 3500, cpuFraction: 0.336
> Feb 23 14:47:39.795: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-b-vcsxj, totalRequestedMemResource: 4885315584, memAllocatableVal: 14568333312, memFraction: 0.33533798818125227
> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:39.795: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428
> Feb 23 14:47:40.148: INFO: Pod for on the node: startup-b78f504b-237f-4758-9d3e-a89ce75ff8ea, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: pod-init-991109f2-3e8d-45f4-93d0-b1d59d834c23, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: agnhost-primary-vknxs, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: busybox-readonly-fs8c25040f-a95a-4c95-ab00-1c4b8a16bf67, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: c884b330-bf23-4f22-8086-835d75e71028-0, Cpu: 747, Mem: 3180331008
> Feb 23 14:47:40.148: INFO: Pod for on the node: client-can-connect-81-fd6ph, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: server-7fx9g, Cpu: 200, Mem: 419430400
> Feb 23 14:47:40.148: INFO: Pod for on the node: netserver-1, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: test-container-pod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: alpine-nnp-nil-bef53aa2-5554-4f0e-9de5-fae83135f91f, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: example-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: history-limit-2-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: custom-builder-image-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: sample-custom-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.148: INFO: Pod for on the node: gcp-pd-csi-driver-node-49jdt, Cpu: 30, Mem: 157286400
> Feb 23 14:47:40.148: INFO: Pod for on the node: tuned-c2c5t, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.148: INFO: Pod for on the node: dns-default-mfxcx, Cpu: 65, Mem: 137363456
> Feb 23 14:47:40.148: INFO: Pod for on the node: node-ca-6z27x, Cpu: 10, Mem: 10485760
> Feb 23 14:47:40.148: INFO: Pod for on the node: ingress-canary-6cmx4, Cpu: 10, Mem: 20971520
> Feb 23 14:47:40.148: INFO: Pod for on the node: machine-config-daemon-6xd7h, Cpu: 40, Mem: 104857600
> Feb 23 14:47:40.148: INFO: Pod for on the node: node-exporter-q5pbd, Cpu: 9, Mem: 220200960
> Feb 23 14:47:40.148: INFO: Pod for on the node: multus-hzw4g, Cpu: 10, Mem: 157286400
> Feb 23 14:47:40.148: INFO: Pod for on the node: network-metrics-daemon-drp8f, Cpu: 20, Mem: 125829120
> Feb 23 14:47:40.148: INFO: Pod for on the node: network-check-target-zwx95, Cpu: 10, Mem: 15728640
> Feb 23 14:47:40.148: INFO: Pod for on the node: ovs-2lsjd, Cpu: 15, Mem: 419430400
> Feb 23 14:47:40.148: INFO: Pod for on the node: sdn-tgkxh, Cpu: 110, Mem: 230686720
> Feb 23 14:47:40.148: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedCPUResource: 1286, cpuAllocatableMil: 3500, cpuFraction: 0.36742857142857144
> Feb 23 14:47:40.148: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-c-sw428, totalRequestedMemResource: 5147459584, memAllocatableVal: 14568333312, memFraction: 0.353332084992867
> STEP: Compute Cpu, Mem Fraction after create balanced pods.
> Feb 23 14:47:40.148: INFO: ComputeCPUMemFraction for node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t
> Feb 23 14:47:40.376: INFO: Pod for on the node: dns-test-588d38bf-13a5-4f7d-b8b7-6fd0a8b65494, Cpu: 300, Mem: 629145600
> Feb 23 14:47:40.376: INFO: Pod for on the node: labelsupdate1563eafb-212d-4d40-aaaa-c7068b7ccf62, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: acabda1b-0083-4a80-a1cd-0a4c10cc5949-0, Cpu: 430, Mem: 513802239
> Feb 23 14:47:40.376: INFO: Pod for on the node: netserver-2, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: nosrc-build-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: readiness-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: readiness-1-ns69h, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: example-1-g58rb, Cpu: 200, Mem: 419430400
> Feb 23 14:47:40.376: INFO: Pod for on the node: history-limit-1-deploy, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: append-test, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: bc-docker-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: bc-source-1-build, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: test-oauth-server, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.376: INFO: Pod for on the node: execpod, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: hostexec-ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t-hmrth, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: local-injector, Cpu: 100, Mem: 209715200
> Feb 23 14:47:40.376: INFO: Pod for on the node: gcp-pd-csi-driver-node-gdmg2, Cpu: 30, Mem: 157286400
> Feb 23 14:47:40.376: INFO: Pod for on the node: tuned-l5b4b, Cpu: 10, Mem: 52428800
> Feb 23 14:47:40.376: INFO: Pod for on the node: dns-default-2djwv, Cpu: 65, Mem: 137363456
> Feb 23 14:47:40.376: INFO: Pod for on the node: image-registry-5d7cbc6796-47p55, Cpu: 100, Mem: 268435456
> Feb 23 14:47:40.376: INFO: Pod for on the node: node-ca-swk58, Cpu: 10, Mem: 10485760
> Feb 23 14:47:40.376: INFO: Pod for on the node: ingress-canary-qhzf6, Cpu: 10, Mem: 20971520
> Feb 23 14:47:40.376: INFO: Pod for on the node: router-default-58bb79bdb8-zs7s6, Cpu: 100, Mem: 268435456
> Feb 23 14:47:40.376: INFO: Pod for on the node: machine-config-daemon-dplfm, Cpu: 40, Mem: 104857600
> Feb 23 14:47:40.376: INFO: Pod for on the node: alertmanager-main-0, Cpu: 8, Mem: 283115520
> Feb 23 14:47:40.376: INFO: Pod for on the node: alertmanager-main-2, Cpu: 8, Mem: 283115520
> Feb 23 14:47:40.376: INFO: Pod for on the node: grafana-5b8f5b6d96-gwb98, Cpu: 5, Mem: 125829120
> Feb 23 14:47:40.376: INFO: Pod for on the node: node-exporter-6pw82, Cpu: 9, Mem: 220200960
> Feb 23 14:47:40.376: INFO: Pod for on the node: prometheus-adapter-5557d74fdf-xj5sq, Cpu: 1, Mem: 26214400
> Feb 23 14:47:40.376: INFO: Pod for on the node: prometheus-k8s-0, Cpu: 76, Mem: 1262485504
> Feb 23 14:47:40.376: INFO: Pod for on the node: thanos-querier-57564f89f7-xvh4z, Cpu: 9, Mem: 96468992
> Feb 23 14:47:40.376: INFO: Pod for on the node: multus-d76x9, Cpu: 10, Mem: 157286400
> Feb 23 14:47:40.376: INFO: Pod for on the node: network-metrics-daemon-nv4nm, Cpu: 20, Mem: 125829120
> Feb 23 14:47:40.376: INFO: Pod for on the node: network-check-target-rz8wn, Cpu: 10, Mem: 15728640
> Feb 23 14:47:40.376: INFO: Pod for on the node: ovs-rxjl5, Cpu: 15, Mem: 419430400
> Feb 23 14:47:40.376: INFO: Pod for on the node: sdn-8q7d2, Cpu: 110, Mem: 230686720
> Feb 23 14:47:40.376: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedCPUResource: 1186, cpuAllocatableMil: 3500, cpuFraction: 0.33885714285714286
> Feb 23 14:47:40.376: INFO: Node: ci-op-cvr5bfr2-df208-g28mm-worker-d-qp78t, totalRequestedMemResource: 4937744383, memAllocatableVal: 14568333312, memFraction: 0.3389368074749332

You can see that the pods on each node are different. I think these tests would benefit from being serial (or removed), because they are unpredictable in high-usage clusters like ours.

Moving to MODIFIED, as all the linked PRs have merged or been closed in favor of other PRs which then merged.

Hello Mike,

I tried verifying the bug and do not see any failures or flakes on 4.8 clusters. However, the search in [1] shows that on 4.7 the test has always been flaking. Is this expected?

Thanks!!

[1] https://search.ci.openshift.org/?search=Multi-AZ+Clusters+should+spread+the+pods+of+a+replication+controller+across+zones&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Okay, thanks for checking. It looks like the changes we merged need to be backported to 4.7 then (I wasn't sure if they already had been). I'll open those PRs and link them to this bug.

I do not see any failures in the 4.8 runs, but the test still fails on 4.7, so I am moving the bug back to ASSIGNED.

Hello Mike,

I checked the bug again in the following test runs and still see the test listed as flaky in all of the 4.7 runs. Looking into the details, the log says 'passed' with nil details in [1] to [6], and in one run it failed with the error shown in [7].
[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-upgrade-from-stable-4.6-e2e-aws-ovn-upgrade/1434744704989138944
[2] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.7-upgrade-from-stable-4.6-e2e-aws-upgrade/1434735807087775744
[3] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.7-e2e-gcp/1434735817128939520
[4] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-upi-4.7/1434735807347822592
[5] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-e2e-gcp/1434625145909022720
[6] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.7-e2e-gcp/1434267140524871680
[7] https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/719/pull-ci-openshift-ovn-kubernetes-release-4.7-e2e-gcp-ovn/1434605526221590528

Thanks, Kasturi.

This only appears in 4.7 tests, and at a very low rate. I propose closing it; from the TRT perspective this is not a priority.

Given the priority and the current time frame, I don't think we'll be able to address this issue in 4.7.
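The pod-churn argument in the quoted ComputeCPUMemFraction log can be checked against its node totals. The Python sketch below rests on one assumption: the balancing step raises every node to the largest CPU or memory request fraction seen on any node, adding one filler pod per node. That rule is inferred from the log, not taken from the e2e test source, and `NodeUsage` and `filler_requests` are hypothetical names. Under this assumption the sketch reproduces the filler pods that appear after balancing: 358m CPU and 0 memory on worker-b, 747m on worker-c, and 430m on worker-d, with memory requests matching to within one byte.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeUsage:
    """Requested vs. allocatable resources for one node (hypothetical type)."""
    name: str
    requested_cpu_milli: int    # totalRequestedCPUResource in the log
    allocatable_cpu_milli: int  # cpuAllocatableMil
    requested_mem_bytes: int    # totalRequestedMemResource
    allocatable_mem_bytes: int  # memAllocatableVal

    def cpu_fraction(self) -> float:
        return self.requested_cpu_milli / self.allocatable_cpu_milli

    def mem_fraction(self) -> float:
        return self.requested_mem_bytes / self.allocatable_mem_bytes


def filler_requests(nodes: list[NodeUsage]) -> dict[str, tuple[int, int]]:
    """Assumed balancing rule: size one filler pod per node so that every node
    reaches the highest CPU or memory fraction observed on any node."""
    ratio = max(max(n.cpu_fraction(), n.mem_fraction()) for n in nodes)
    fillers = {}
    for n in nodes:
        # int() truncates toward zero, so a request can come out one unit short.
        cpu = int((ratio - n.cpu_fraction()) * n.allocatable_cpu_milli)
        mem = int((ratio - n.mem_fraction()) * n.allocatable_mem_bytes)
        fillers[n.name] = (cpu, mem)
    return fillers


ALLOC_CPU = 3500         # cpuAllocatableMil, identical on all three workers
ALLOC_MEM = 14568333312  # memAllocatableVal, identical on all three workers

# Totals from the "before balancing" part of the log.
BEFORE = [
    NodeUsage("worker-b-vcsxj", 828, ALLOC_CPU, 4937744384, ALLOC_MEM),
    NodeUsage("worker-c-sw428", 439, ALLOC_CPU, 1757413376, ALLOC_MEM),
    NodeUsage("worker-d-qp78t", 756, ALLOC_CPU, 4423942144, ALLOC_MEM),
]

# totalRequestedCPUResource from the "after balancing" part of the log.
AFTER_CPU = {"worker-b-vcsxj": 1176, "worker-c-sw428": 1286, "worker-d-qp78t": 1186}

if __name__ == "__main__":
    fillers = filler_requests(BEFORE)
    for node in BEFORE:
        cpu, mem = fillers[node.name]
        target = (node.requested_cpu_milli + cpu) / ALLOC_CPU
        observed = AFTER_CPU[node.name] / ALLOC_CPU
        print(f"{node.name}: filler {cpu}m CPU, {mem} B memory; "
              f"cpuFraction target {target:.3f}, observed {observed:.3f}")
```

Because the filler sizes come from a single snapshot, any test pod that starts or finishes on a node before the spreading check shifts that node's fraction again. With the totals logged after balancing, worker-c sits at 1286/3500 ≈ 0.367 of its CPU instead of the ≈ 0.339 target.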