+++ This bug was initially created as a clone of Bug #1999288 +++

Description of problem:

The following sig-scheduling OCP tests are failing due to incorrect CPU and memory calculations by the script:

[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]

Version-Release number of selected component (if applicable):
OCP 4.8

How reproducible:
Always

Steps to Reproduce:
1. Create an OpenShift 4.8 cluster.
2. Run the sig-scheduling tests mentioned above.
3. Observe the CPU and memory logs of the pods in the cluster in another window - the actual CPU and memory consumption of the pods is much lower than what the script calculates.

Actual results:
The test fails because the script only checks the CPU and memory values of the tigera-operator pod, and also calculates these values incorrectly.
The following logs show the incorrect CPU and memory calculations of the script (the "Pod for on the node" line is emitted once per pod on the node, but always names the same tigera-operator pod):

```
Aug 25 14:55:20.299: INFO: ComputeCPUMemFraction for node: 10.5.149.223
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
[... the identical line repeats once per pod on the node ...]
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Node: 10.5.149.223, totalRequestedCPUResource: 4300, cpuAllocatableMil: 3910, cpuFraction: 1
Aug 25 14:55:20.299: INFO: Node: 10.5.149.223, totalRequestedMemResource: 1866465280, memAllocatableVal: 13808427008, memFraction: 0.13516856618923007
```

Expected results:
The test must calculate the CPU and memory of every pod in the cluster correctly.

Additional info:
If we create a namespace "zzz" and the following pod in it, then the test passes:

```
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: zzz
  namespace: zzz
spec:
  containers:
  - name: zzz
    image: us.icr.io/armada-master/pause:3.2
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
EOF

1 pass, 0 skip (1m47s)
+ [[ 0 -eq 0 ]]
+ echo 'SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.'
SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.
vagrant@verify-cluster:~/kubernetes-e2e-test-cases/tests$
```

--- Additional comment from Jan Chaloupka on 2021-09-01 11:01:27 UTC ---

> 3. Observe the CPU and memory logs of the pods in the cluster in another window - the actual CPU and memory consumption values of the pods are much less than what is being calculated by the script.

What are the expected actual values?
Which command do you use to see the actual values?

--- Additional comment from on 2021-09-01 15:36:47 UTC ---

(In reply to Jan Chaloupka from comment #1)

> What are the expected actual values? Which command do you use to see the actual values?

You can use the following command to view the actual CPU and memory values in another window:

watch -n 3 oc adm top pods --namespace="namespace_of_the_pod_that_is_being_verified"

For example:

watch -n 3 oc adm top pods --namespace=tigera-operator

During the test, these were the actual values of the tigera-operator pod, not the ones that were being displayed/calculated by the test:

```
Every 3.0s: oc adm top pods --namespace=tigera-...

NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   3m           77Mi

NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi

➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   4m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   3m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   4m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   9m           90Mi
```

The test displayed a higher CPU value than what was actually consumed by the pod.

--- Additional comment from Jan Chaloupka on 2021-09-02 16:26:40 UTC ---

Are you referring to createBalancedPodForNodes?

oc adm top pods displays current usage of resources based on what cadvisor provides, whereas createBalancedPodForNodes relies only on the resource requests provided by pods. So the difference you reported is expected.

Can you share links of the failed tests?

--- Additional comment from on 2021-09-09 04:18:08 UTC ---

(In reply to Jan Chaloupka from comment #3)
> Are you referring to createBalancedPodForNodes?
>
> oc adm top pods displays current usage of resources based on what cadvisor
> provides. Whereas createBalancedPodForNodes relies only on the resource
> requests provided by pods. So the difference you reported is expected.
>
> Can you share links of the failed tests?

Here are the tests that are failing:

[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]

Link to the test: https://github.com/openshift/origin/blob/release-4.8/vendor/k8s.io/kubernetes/test/e2e/scheduling/priorities.go

--- Additional comment from Jan Chaloupka on 2021-09-09 07:01:53 UTC ---

Apologies, I meant CI runs of the failed tests, from https://prow.ci.openshift.org/.
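The distinction drawn above - live usage versus declared requests - explains why `oc adm top pods` can never match the test's numbers: the e2e helper sums the CPU/memory *requests* from each pod spec and divides by the node's allocatable capacity. A minimal Go sketch of that requests-based fraction, using simplified stand-in types rather than the actual upstream helper's signatures:

```go
package main

import "fmt"

// pod holds only the scheduling-relevant request values, mirroring
// what a requests-based helper reads from pod specs (not cadvisor).
type pod struct {
	name     string
	cpuMilli int64 // requested CPU in millicores
	memBytes int64 // requested memory in bytes
}

// cpuMemFraction sums pod requests and divides by the node's
// allocatable capacity - declared intent, not measured usage.
func cpuMemFraction(pods []pod, cpuAllocMilli, memAllocBytes int64) (float64, float64) {
	var cpu, mem int64
	for _, p := range pods {
		cpu += p.cpuMilli
		mem += p.memBytes
	}
	return float64(cpu) / float64(cpuAllocMilli), float64(mem) / float64(memAllocBytes)
}

func main() {
	// Values echo the log above: a 100m/40Mi request against a node
	// with 3910m CPU and ~13.8 GB memory allocatable.
	pods := []pod{{"tigera-operator", 100, 41943040}}
	cpuFrac, memFrac := cpuMemFraction(pods, 3910, 13808427008)
	fmt.Printf("cpuFraction: %v memFraction: %v\n", cpuFrac, memFrac)
}
```

This is why a pod requesting 100m can show up as "Cpu: 100" in the test log while `oc adm top` reports only 2m-9m of actual consumption.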
--- Additional comment from Richard Theis on 2021-09-13 21:22:06 UTC ---

We are running these tests on Red Hat OpenShift on IBM Cloud clusters via IBM Cloud CI. There are no failed test runs in https://prow.ci.openshift.org/ related to this bugzilla, but the lack of test failures in OpenShift CI does not mean that this is not a valid test problem. I suspect that the last pod found on OpenShift clusters run in CI allows the test to pass. I believe the previous comments show how to reproduce the problem. If not, please let us know. Thanks.

--- Additional comment from Jan Chaloupka on 2021-09-14 08:46:45 UTC ---

I am asking for the test failures from https://prow.ci.openshift.org/ so I can see the entire failure logs and also have proof so we can alter the test upstream if needed. It's hard to convince upstream to merge any change without the failure logs in this case.

Checking https://search.ci.openshift.org/ for the last 14 days:

- [sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
  No results found
- [sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
  A few tests failed due to overall cluster reasons (NS not created, error creating a pod, ...)
- [sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
  A few tests failed due to overall cluster reasons (NS not created, no node available for scheduling, ...)

smitha.subbarao, can you share the entire test run including the failures?

> We are running these tests on Red Hat OpenShift on IBM Cloud clusters via IBM Cloud CI.

Is it a part of a CI system? Assuming the 4.8 version of OpenShift (as reported), are there other versions where the test fails as well?

--- Additional comment from Richard Theis on 2021-09-14 11:10:57 UTC ---

I think that we have provided enough details for a fix to be provided, but we can provide the full logs from our test run if that would help. And there is no failure in https://prow.ci.openshift.org/; this failure is seen in IBM's CI system, only on OpenShift version 4.8. Smitha, can you please provide the full test failure logs?

--- Additional comment from on 2021-09-14 13:17:37 UTC ---

This file contains the full test failure log of the following OCP 4.8 tests:

"[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]"
"[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]"
"[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]"

--- Additional comment from Jan Chaloupka on 2021-09-15 11:26:45 UTC ---

--- Additional comment from Jan Chaloupka on 2021-09-15 12:17:36 UTC ---

Can you share more insight about how you run the tests?

Checking the logs, all the "Pod for on the node: " lines report exactly the same pod, "tigera-operator-7d896c66cd-klhq5" (quite strange). Occurrences:

- for 10.5.149.223: 32 occurrences
- for 10.5.149.234: 39 occurrences
- for 10.5.149.237: 43 occurrences

Checking cpu fractions:

- for 10.5.149.223: 0.8439897698209718
- for 10.5.149.234: 1
- for 10.5.149.237: 1

Meaning both 10.5.149.234 and 10.5.149.237 are saturated, so the filler pods will fail to be scheduled (at least on 10.5.149.234 and 10.5.149.237) since there is no CPU resource left. Thus the test must fail.

Questions:

- How saturated is your cluster before running the test suite (i.e. the resource consumption of each node)?
- How do you create the tigera-operator pod(s)?
- Does every tigera-operator have its own NS? Or is there only a single replica of the operator? Or does each node have its own replica of the operator? In which NS does the operator live?
- Do you run the test over a real cluster or over a mock/fake cluster (i.e. with a fake clientset)?
- Can you run `oc get pods -A` every second during the test run (to see how many tigera pods are in Terminated/Running state) while running only those 3 tests?
- Can you provide all kube-scheduler logs (3 files, assuming there are 3 master nodes)?

--- Additional comment from Richard Theis on 2021-09-15 17:17:48 UTC ---

Exactly... the "Pod for on the node: " lines report exactly the same pod "tigera-operator-7d896c66cd-klhq5" (quite strange). This is the test bug in my opinion. The test is incorrectly calculating CPU and memory because it is only using the last pod found in the cluster. This bugzilla's description shows how we can manipulate the cluster to yield either a test failure or a success.

--- Additional comment from on 2021-09-20 20:00:29 UTC ---

Resource consumption of each node before the test is shown below (the test is conducted using an actual ROKS cluster). The `oc get pods -A` logs will be added in a following comment.

```
➜ amd64 git:(release-4.8) kubectl describe nodes | grep 'Name:\| cpu\| memory'
Name:               10.5.149.170
  cpu:      4
  memory:   16260860Ki
  cpu:      3910m
  memory:   13484796Ki
  cpu       1246m (31%)      1800m (46%)
  memory    3751443Ki (27%)  2036000Ki (15%)
Name:               10.5.149.191
  cpu:      4
  memory:   16260856Ki
  cpu:      3910m
  memory:   13484792Ki
  cpu       1218m (31%)      600m (15%)
  memory    2928147Ki (21%)  3952928Ki (29%)
Name:               10.5.149.196
  cpu:      4
  memory:   16260852Ki
  cpu:      3910m
  memory:   13484788Ki
  cpu       1354m (34%)      600m (15%)
  memory    3567123Ki (26%)  826572800 (5%)
```

To reiterate Richard's response, the test keeps referring to the tigera-operator pod because it seems to check the last pod found in the cluster.
The steps to manipulate the cluster to successfully pass the test are below (same as the ones in the description):

1. Create a namespace "zzz".
2. Create the following pod in the "zzz" namespace and re-run the test - the test will pass.

```
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: zzz
  namespace: zzz
spec:
  containers:
  - name: zzz
    image: us.icr.io/armada-master/pause:3.2
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
EOF

1 pass, 0 skip (1m47s)
+ [[ 0 -eq 0 ]]
+ echo 'SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.'
SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.
vagrant@verify-cluster:~/kubernetes-e2e-test-cases/tests$
```

--- Additional comment from on 2021-09-20 22:46:19 UTC ---

There is only 1 tigera-operator pod running throughout the test, tigera-operator-7d896c66cd-qlbwt:

```
➜ amd64 git:(release-4.8) oc get pods -A
NAMESPACE                                          NAME                                                      READY   STATUS      RESTARTS   AGE
calico-system                                      calico-kube-controllers-d78c469ff-jjvpj                   1/1     Running     0          7d7h
calico-system                                      calico-node-92xqf                                         1/1     Running     0          7d7h
calico-system                                      calico-node-9zxlr                                         1/1     Running     0          7d7h
calico-system                                      calico-node-lrttq                                         1/1     Running     0          7d7h
calico-system                                      calico-typha-75bbbcf6df-9wgd6                             1/1     Running     0          7d7h
calico-system                                      calico-typha-75bbbcf6df-v98vn                             1/1     Running     0          7d7h
calico-system                                      calico-typha-75bbbcf6df-vlqz8                             1/1     Running     0          7d7h
e2e-sched-priority-2483                            aa311e35-cedc-4326-b6cd-3ac2d809626b-0                    0/1     Pending     0          5m14s
ibm-system                                         ibm-cloud-provider-ip-169-60-45-162-5dc8b94d6d-hcftw      1/1     Running     0          8h
ibm-system                                         ibm-cloud-provider-ip-169-60-45-162-5dc8b94d6d-xpw7f      1/1     Running     0          8h
kube-system                                        ibm-file-plugin-699bf5596-dwc4r                           1/1     Running     0          8h
kube-system                                        ibm-keepalived-watcher-64mgg                              1/1     Running     0          8h
kube-system                                        ibm-keepalived-watcher-7vl2x                              1/1     Running     0          8h
kube-system                                        ibm-keepalived-watcher-9z4mh                              1/1     Running     0          8h
kube-system                                        ibm-master-proxy-static-10.5.149.170                      2/2     Running     0          7d7h
kube-system                                        ibm-master-proxy-static-10.5.149.191                      2/2     Running     0          7d7h
kube-system                                        ibm-master-proxy-static-10.5.149.196                      2/2     Running     0          7d7h
kube-system                                        ibm-storage-metrics-agent-5dc6c457c7-spspn                1/1     Running     0          4h12m
kube-system                                        ibm-storage-watcher-856bcd698b-j8wzx                      1/1     Running     0          8h
kube-system                                        ibmcloud-block-storage-driver-4hwj5                       1/1     Running     0          8h
kube-system                                        ibmcloud-block-storage-driver-mptq6                       1/1     Running     0          8h
kube-system                                        ibmcloud-block-storage-driver-nds5x                       1/1     Running     0          8h
kube-system                                        ibmcloud-block-storage-plugin-649688f859-6pzcc            1/1     Running     0          8h
kube-system                                        vpn-56c795f968-92n5f                                      1/1     Running     0          7d7h
openshift-cluster-node-tuning-operator             cluster-node-tuning-operator-7b764df77c-qh9k2             1/1     Running     0          8h
openshift-cluster-node-tuning-operator             tuned-jjmfq                                               1/1     Running     0          8h
openshift-cluster-node-tuning-operator             tuned-msbxb                                               1/1     Running     0          8h
openshift-cluster-node-tuning-operator             tuned-rlwsk                                               1/1     Running     0          8h
openshift-cluster-samples-operator                 cluster-samples-operator-59f699dcbf-sz76r                 2/2     Running     0          8h
openshift-cluster-storage-operator                 cluster-storage-operator-78c6bfb7b4-d5qrp                 1/1     Running     1          8h
openshift-cluster-storage-operator                 csi-snapshot-controller-cb6558866-4x2lp                   1/1     Running     1          8h
openshift-cluster-storage-operator                 csi-snapshot-controller-cb6558866-zgc4j                   1/1     Running     1          8h
openshift-cluster-storage-operator                 csi-snapshot-controller-operator-7b4c9b4ffc-w96lf         1/1     Running     1          8h
openshift-cluster-storage-operator                 csi-snapshot-webhook-687d7ddb94-6thcn                     1/1     Running     0          8h
openshift-cluster-storage-operator                 csi-snapshot-webhook-687d7ddb94-d4bg2                     1/1     Running     0          8h
openshift-console-operator                         console-operator-5588c56b5b-ql56x                         1/1     Running     1          8h
openshift-console                                  console-5c5b64c998-br9rq                                  1/1     Running     0          8h
openshift-console                                  console-5c5b64c998-jwctb                                  1/1     Running     0          8h
openshift-console                                  downloads-8b49bb4c5-dj7d9                                 1/1     Running     0          8h
openshift-console                                  downloads-8b49bb4c5-k9wcd                                 1/1     Running     0          8h
openshift-dns-operator                             dns-operator-74cd5949f5-lxhwt                             2/2     Running     0          8h
openshift-dns                                      dns-default-d99dg                                         2/2     Running     0          8h
openshift-dns                                      dns-default-x85rk                                         2/2     Running     0          8h
openshift-dns                                      dns-default-z4pjg                                         2/2     Running     0          8h
openshift-dns                                      node-resolver-m9mlz                                       1/1     Running     0          8h
openshift-dns                                      node-resolver-md5v2                                       1/1     Running     0          8h
openshift-dns                                      node-resolver-nj2mc                                       1/1     Running     0          8h
openshift-image-registry                           cluster-image-registry-operator-75d5684d7c-8nf47          1/1     Running     1          8h
openshift-image-registry                           image-pruner-27198720-4zt76                               0/1     Completed   0          2d22h
openshift-image-registry                           image-pruner-27200160-82m5m                               0/1     Completed   0          46h
openshift-image-registry                           image-pruner-27201600-r6clj                               0/1     Completed   0          22h
openshift-image-registry                           image-registry-868f5d4b5c-pft2z                           1/1     Running     0          8h
openshift-image-registry                           node-ca-cxggw                                             1/1     Running     0          8h
openshift-image-registry                           node-ca-nqldr                                             1/1     Running     0          8h
openshift-image-registry                           node-ca-w4qll                                             1/1     Running     0          8h
openshift-image-registry                           registry-pvc-permissions-gsg9b                            0/1     Completed   0          8h
openshift-ingress-canary                           ingress-canary-2dlp9                                      1/1     Running     0          8h
openshift-ingress-canary                           ingress-canary-75krd                                      1/1     Running     0          8h
openshift-ingress-canary                           ingress-canary-wk8tx                                      1/1     Running     0          8h
openshift-ingress-operator                         ingress-operator-76f5b96d7c-dh9fn                         2/2     Running     0          8h
openshift-ingress                                  router-default-77c7f8cb7d-2px27                           1/1     Running     0          8h
openshift-ingress                                  router-default-77c7f8cb7d-cwr96                           1/1     Running     0          8h
openshift-kube-proxy                               openshift-kube-proxy-dzz98                                2/2     Running     0          8h
openshift-kube-proxy                               openshift-kube-proxy-gg6gs                                2/2     Running     0          8h
openshift-kube-proxy                               openshift-kube-proxy-swttg                                2/2     Running     0          8h
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-6879c94bfc-rmmz8   1/1     Running     1          8h
openshift-kube-storage-version-migrator            migrator-7d5cdcd9cc-klwf6                                 1/1     Running     0          8h
openshift-marketplace                              certified-operators-jnps6                                 1/1     Running     0          12h
openshift-marketplace                              community-operators-zptdk                                 1/1     Running     0          3h50m
openshift-marketplace                              marketplace-operator-7c69549b9f-dg6t6                     1/1     Running     0          8h
openshift-marketplace                              redhat-marketplace-jk66g                                  1/1     Running     0          12h
openshift-marketplace                              redhat-operators-7vndn                                    1/1     Running     0          5h43m
openshift-monitoring                               alertmanager-main-0                                       5/5     Running     0          8h
openshift-monitoring                               alertmanager-main-1                                       5/5     Running     0          8h
openshift-monitoring                               alertmanager-main-2                                       5/5     Running     0          8h
openshift-monitoring                               cluster-monitoring-operator-7b5f987df8-j2vpk              2/2     Running     0          8h
openshift-monitoring                               grafana-5c98cd844-tcnwt                                   2/2     Running     0          8h
openshift-monitoring                               kube-state-metrics-7485cb5695-zf848                       3/3     Running     0          8h
openshift-monitoring                               node-exporter-fs554                                       2/2     Running     0          8h
openshift-monitoring                               node-exporter-lq957                                       2/2     Running     0          8h
openshift-monitoring                               node-exporter-ww6sh                                       2/2     Running     0          8h
openshift-monitoring                               openshift-state-metrics-65c6597c7-zcfvp                   3/3     Running     0          8h
openshift-monitoring                               prometheus-adapter-7586b977cb-cv44c                       1/1     Running     0          8h
openshift-monitoring                               prometheus-adapter-7586b977cb-vpjfv                       1/1     Running     0          8h
openshift-monitoring                               prometheus-k8s-0                                          7/7     Running     1          8h
openshift-monitoring                               prometheus-k8s-1                                          7/7     Running     1          8h
openshift-monitoring                               prometheus-operator-599d68ffbf-wvg5w                      2/2     Running     0          8h
openshift-monitoring                               telemeter-client-767f4f8d6b-7649d                         3/3     Running     0          8h
openshift-monitoring                               thanos-querier-84bcffdd-h7dj6                             5/5     Running     0          8h
openshift-monitoring                               thanos-querier-84bcffdd-ndznd                             5/5     Running     0          8h
openshift-multus                                   multus-57tn4                                              1/1     Running     0          8h
openshift-multus                                   multus-additional-cni-plugins-dbvq2                       1/1     Running     0          8h
openshift-multus                                   multus-additional-cni-plugins-fkxg5                       1/1     Running     0          8h
openshift-multus                                   multus-additional-cni-plugins-wlzq8                       1/1     Running     0          8h
openshift-multus                                   multus-admission-controller-n7qcx                         2/2     Running     0          8h
openshift-multus                                   multus-admission-controller-v9vx6                         2/2     Running     0          8h
openshift-multus                                   multus-admission-controller-vlfn7                         2/2     Running     0          8h
openshift-multus                                   multus-n6dsg                                              1/1     Running     0          8h
openshift-multus                                   multus-p8bpq                                              1/1     Running     0          8h
openshift-multus                                   network-metrics-daemon-25jh8                              2/2     Running     0          8h
openshift-multus                                   network-metrics-daemon-jjpgw                              2/2     Running     0          8h
openshift-multus                                   network-metrics-daemon-tv555                              2/2     Running     0          8h
openshift-network-diagnostics                      network-check-source-6ccd7c5589-glnkg                     1/1     Running     0          8h
openshift-network-diagnostics                      network-check-target-5qf9j                                1/1     Running     0          8h
openshift-network-diagnostics                      network-check-target-sjmvb                                1/1     Running     0          8h
openshift-network-diagnostics                      network-check-target-thbzf                                1/1     Running     0          8h
openshift-network-operator                         network-operator-85544fbdbc-4nb5h                         1/1     Running     1          8h
openshift-operator-lifecycle-manager               catalog-operator-7bbb999f99-492vz                         1/1     Running     0          8h
openshift-operator-lifecycle-manager               olm-operator-7bfd55d5c7-swmzn                             1/1     Running     0          8h
openshift-operator-lifecycle-manager               packageserver-c8d74b46d-6j6sn                             1/1     Running     0          8h
openshift-operator-lifecycle-manager               packageserver-c8d74b46d-9j4gz                             1/1     Running     0          8h
openshift-roks-metrics                             metrics-5fb9d747f7-6mjh5                                  1/1     Running     0          8h
openshift-roks-metrics                             push-gateway-57868bfdb9-d5lq2                             1/1     Running     0          8h
openshift-service-ca-operator                      service-ca-operator-7f994cb49b-shkgm                      1/1     Running     1          8h
openshift-service-ca                               service-ca-847c7856dc-7tmwz                               1/1     Running     1          8h
tigera-operator                                    tigera-operator-7d896c66cd-qlbwt                          1/1     Running     4          7d7h
```

--- Additional comment from Jan Chaloupka on 2021-09-23 11:16:47 UTC ---

Thank you for all the provided data. Refactoring done in https://github.com/kubernetes/kubernetes/pull/100762 incorrectly constructs the list of pods. Opened a fix upstream in https://github.com/kubernetes/kubernetes/pull/105205.
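The failure signature - a per-pod log line that always names the same pod - is characteristic of a well-known Go pitfall: storing the address of the `range` loop variable, so every stored pointer aliases one variable that holds the last element once the loop ends. Whether the refactor in PR 100762 hit exactly this pattern is best confirmed in the linked PRs; the sketch below (with hypothetical `Pod` type and collect helpers) only illustrates the pattern and the conventional fix:

```go
package main

import "fmt"

// Pod is a stand-in for the pod objects the e2e helper iterates over.
type Pod struct{ Name string }

// buggyCollect appends the address of the range variable p. Before
// Go 1.22 a single p was reused across iterations, so every stored
// pointer ended up referring to the last pod in the slice - which
// would log the same pod name once per pod, as seen in this report.
func buggyCollect(pods []Pod) []*Pod {
	var out []*Pod
	for _, p := range pods {
		out = append(out, &p)
	}
	return out
}

// fixedCollect takes the address of the slice element itself, which
// is distinct for every index regardless of Go version.
func fixedCollect(pods []Pod) []*Pod {
	var out []*Pod
	for i := range pods {
		out = append(out, &pods[i])
	}
	return out
}

func main() {
	pods := []Pod{{"dns-default-d99dg"}, {"router-default-2px27"}, {"tigera-operator-7d896c66cd-qlbwt"}}
	for _, p := range fixedCollect(pods) {
		fmt.Println("Pod on the node:", p.Name)
	}
}
```

Go 1.22 changed the loop-variable semantics so each iteration gets a fresh variable, but indexing the slice (or copying the element explicitly) is the version-independent fix.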
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Whiteboard if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
Still waiting for the rebase
The fix got pulled into 4.9 through https://github.com/openshift/kubernetes/pull/1048
Resolution of this ticket depends on resolution of the same issue in higher versions. I am waiting for higher version fixes to merge so the PR in https://github.com/openshift/origin/pull/26697 gets all the required permissions.
The LifecycleStale keyword was removed because the bug moved to QE. The bug assignee was notified.
After discussing with dev on how to validate this bug, I have learned that "Given this is impossible to see in the junit.xml (since it shows logs of the failed tests only by default) and given the bug was not about failing the test, you can move it to VERIFIED directly." Based on the above, moving the bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.9.18 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0279