Description of problem:

The following sig-scheduling OCP tests are failing due to incorrect CPU and memory calculations by the script:

[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]

Version-Release number of selected component (if applicable):
OCP 4.8

How reproducible:
Always

Steps to Reproduce:
1. Create an OpenShift 4.8 cluster.
2. Run the sig-scheduling tests mentioned above.
3. Observe the CPU and memory logs of the pods in the cluster in another window - the actual CPU and memory consumption values of the pods are much less than what is being calculated by the script.

Actual results:
The test fails because the script only checks the CPU and memory values of the tigera-operator pod, and it also calculates these values incorrectly. The following log excerpt shows the incorrect CPU and memory calculations; note that the same tigera-operator pod is reported over and over:

```
Aug 25 14:55:20.299: INFO: ComputeCPUMemFraction for node: 10.5.149.223
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
[... the same line is repeated many more times, always naming the same tigera-operator pod ...]
Aug 25 14:55:20.299: INFO: Node: 10.5.149.223, totalRequestedCPUResource: 4300, cpuAllocatableMil: 3910, cpuFraction: 1
Aug 25 14:55:20.299: INFO: Node: 10.5.149.223, totalRequestedMemResource: 1866465280, memAllocatableVal: 13808427008, memFraction: 0.13516856618923007
```

Expected results:
The test must calculate the CPU and memory of every pod in the cluster correctly.

Additional info:
If we create a namespace "zzz" and the following pod in it, then the test passes:

```
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: zzz
  namespace: zzz
spec:
  containers:
  - name: zzz
    image: us.icr.io/armada-master/pause:3.2
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
EOF

1 pass, 0 skip (1m47s)
+ [[ 0 -eq 0 ]]
+ echo 'SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.'
SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.
vagrant@verify-cluster:~/kubernetes-e2e-test-cases/tests$
```
> 3. Observe the CPU and memory logs of the pods in the cluster in another window - the actual CPU and memory consumption values of the pods are much less than what is being calculated by the script.

What are the expected actual values? Which command do you use to see the actual values?
(In reply to Jan Chaloupka from comment #1)
> > 3. Observe the CPU and memory logs of the pods in the cluster in another window - the actual CPU and memory consumption values of the pods are much less than what is being calculated by the script.
>
> What are the expected actual values? Which command do you use to see the actual values?

You can use the following command to view the actual CPU and memory values in another window:

watch -n 3 oc adm top pods --namespace="namespace_of_the_pod_that_is_being_verified"

For example:

watch -n 3 oc adm top pods --namespace=tigera-operator

During the test, these were the actual values of the tigera-operator pod, not the ones that were being displayed/calculated by the test:

```
Every 3.0s: oc adm top pods --namespace=tigera-...

NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   3m           77Mi

NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi

➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   4m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   2m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   3m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   4m           97Mi
➜ ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   9m           90Mi
```

The test displayed a much higher CPU value than what was actually consumed by the pod.
Are you referring to createBalancedPodForNodes?

oc adm top pods displays the current usage of resources based on what cadvisor provides, whereas createBalancedPodForNodes relies only on the resource requests provided by the pods. So the difference you reported is expected.

Can you share links of the failed tests?
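[Editorial illustration] A minimal sketch of the request-based accounting described above, not the actual e2e helper: it sums spec.containers[].resources.requests for the pods on a node and divides by the node's allocatable capacity, so live usage (what `oc adm top pods` reports) never enters the calculation. The function name `computeRequestedFraction` and the example values are hypothetical; the real logic lives in the linked priorities.go.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// computeRequestedFraction is a hypothetical helper mirroring the idea behind
// the test's fraction computation: sum the *requested* CPU and memory of the
// given pods and divide by the node's allocatable capacity. Actual consumption
// reported by cadvisor/metrics plays no role here.
func computeRequestedFraction(node v1.Node, pods []v1.Pod) (cpuFraction, memFraction float64) {
	var requestedCPUMilli, requestedMemBytes int64
	for _, pod := range pods {
		for _, c := range pod.Spec.Containers {
			requestedCPUMilli += c.Resources.Requests.Cpu().MilliValue()
			requestedMemBytes += c.Resources.Requests.Memory().Value()
		}
	}
	cpuAllocatable := node.Status.Allocatable.Cpu().MilliValue()
	memAllocatable := node.Status.Allocatable.Memory().Value()
	return float64(requestedCPUMilli) / float64(cpuAllocatable),
		float64(requestedMemBytes) / float64(memAllocatable)
}

func main() {
	// Example node with the 3910m / ~13.5Gi allocatable values seen in the logs.
	node := v1.Node{Status: v1.NodeStatus{Allocatable: v1.ResourceList{
		v1.ResourceCPU:    resource.MustParse("3910m"),
		v1.ResourceMemory: resource.MustParse("13484796Ki"),
	}}}
	// Example pod requesting 100m CPU and 40Mi memory (41943040 bytes).
	pod := v1.Pod{Spec: v1.PodSpec{Containers: []v1.Container{{
		Resources: v1.ResourceRequirements{Requests: v1.ResourceList{
			v1.ResourceCPU:    resource.MustParse("100m"),
			v1.ResourceMemory: resource.MustParse("40Mi"),
		}},
	}}}}
	cpu, mem := computeRequestedFraction(node, []v1.Pod{pod})
	fmt.Printf("cpuFraction=%v memFraction=%v\n", cpu, mem)
}
```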
(In reply to Jan Chaloupka from comment #3)
> Are you referring to createBalancedPodForNodes?
>
> oc adm top pods displays the current usage of resources based on what cadvisor provides, whereas createBalancedPodForNodes relies only on the resource requests provided by the pods. So the difference you reported is expected.
>
> Can you share links of the failed tests?

Here are the tests that are failing:

[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]

Link to the test code: https://github.com/openshift/origin/blob/release-4.8/vendor/k8s.io/kubernetes/test/e2e/scheduling/priorities.go
Apologies, I meant CI runs of the failed tests. From https://prow.ci.openshift.org/.
We are running these tests on Red Hat OpenShift on IBM Cloud clusters via IBM Cloud CI. There are no failed test runs in https://prow.ci.openshift.org/ related to this bugzilla. But the lack of test failures in OpenShift CI does not mean that this is not a valid test problem. I suspect that the last pod found on OpenShift clusters run in CI allows the test to pass. I believe the previous comments show how to reproduce the problem. If not, please let us know. Thanks.
I am asking for test failures from https://prow.ci.openshift.org/ so I can see the entire failure logs and also have proof, so we can alter the test upstream if needed. It is hard to convince upstream to merge any change without the failure logs in this case.

Checking https://search.ci.openshift.org/ for the last 14 days:
- [sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
  No results found
- [sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
  A few tests failed due to overall cluster reasons (NS not created, error creating a pod, ...)
- [sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
  A few tests failed due to overall cluster reasons (NS not created, no node available for scheduling, ...)

smitha.subbarao, can you share the entire test run including the failures?

> We are running these tests on Red Hat OpenShift on IBM Cloud clusters via IBM Cloud CI.

Is it part of a CI system? Assuming the 4.8 version of OpenShift (as reported), are there other versions where the test fails as well?
I think that we have provided enough details for a fix to be provided, but we can provide the full logs from our test run if that would help. There are no failures in https://prow.ci.openshift.org/; this failure is seen only in IBM's CI system, on OpenShift version 4.8. Smitha, can you please provide the full test failure logs?
Created attachment 1822990 [details]
Complete test failure log of the failing OCP 4.8 tests

This file contains the full test failure log of the following OCP 4.8 tests:

"[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]"
"[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]"
"[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]"
*** Bug 1999285 has been marked as a duplicate of this bug. ***
Can you share more insight about how you run the tests?

Checking the logs, all the "Pod for on the node: " lines report exactly the same pod, "tigera-operator-7d896c66cd-klhq5" (quite strange). Occurrences:
- for 10.5.149.223: 32 occurrences
- for 10.5.149.234: 39 occurrences
- for 10.5.149.237: 43 occurrences

Checking CPU fractions:
- for 10.5.149.223: 0.8439897698209718
- for 10.5.149.234: 1
- for 10.5.149.237: 1

Meaning both 10.5.149.234 and 10.5.149.237 are saturated, so the filler pods will fail to be scheduled (at least on 10.5.149.234 and 10.5.149.237) since there is no CPU resource left. Thus the test must fail.

Questions:
- How saturated are your nodes before running the test suite (i.e. the resource consumption of each node)?
- How do you create the tigera-operator pod(s)?
- Does every tigera-operator have its own NS? Or is there only a single replica of the operator? Or does each node have its own replica of the operator? In which NS does the operator live?
- Do you run the test on a real cluster or on a mock/fake cluster (i.e. with a fake clientset)?
- Can you run `oc get pods -A` every second during the test run (to see how many tigera pods are in Terminated/Running state) while running only those 3 tests?
- Can you provide all kube-scheduler logs (3 files, assuming there are 3 master nodes)?
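[Editorial illustration] A back-of-the-envelope check of what those fractions imply in absolute terms, assuming all three nodes have the same 3910m allocatable CPU that the description's log reports for 10.5.149.223 (an assumption; the snippet is not part of the test code). A node whose fraction is already 1 leaves no CPU headroom for the balancing filler pods the test creates before scoring, which is the saturation point made above.

```go
package main

import "fmt"

func main() {
	// cpuFraction values quoted from the attached log in the comment above.
	fractions := []struct {
		node     string
		fraction float64
	}{
		{"10.5.149.223", 0.8439897698209718},
		{"10.5.149.234", 1.0},
		{"10.5.149.237", 1.0},
	}

	// Allocatable CPU taken from the "cpuAllocatableMil: 3910" log line in the
	// description; assumed here to apply to all three nodes.
	const cpuAllocatableMilli = 3910.0

	for _, f := range fractions {
		requested := f.fraction * cpuAllocatableMilli
		headroom := (1 - f.fraction) * cpuAllocatableMilli
		fmt.Printf("%s: ~%.0fm of %.0fm requested, ~%.0fm left for the test's filler pods\n",
			f.node, requested, cpuAllocatableMilli, headroom)
	}
}
```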
Exactly: the "Pod for on the node: " lines report exactly the same pod, "tigera-operator-7d896c66cd-klhq5" (quite strange). This is the test bug, in my opinion. The test calculates CPU and memory incorrectly because it only uses the last pod found in the cluster. This bugzilla's description shows how we can manipulate the cluster to yield either a test failure or a test success.
Resource consumption of each node before the test is shown below (the test is run against an actual ROKS cluster). The `oc get pods -A` logs will be added in a following comment.

```
➜ amd64 git:(release-4.8) kubectl describe nodes | grep 'Name:\| cpu\| memory'
Name:     10.5.149.170
  cpu:     4
  memory:  16260860Ki
  cpu:     3910m
  memory:  13484796Ki
  cpu      1246m (31%)      1800m (46%)
  memory   3751443Ki (27%)  2036000Ki (15%)
Name:     10.5.149.191
  cpu:     4
  memory:  16260856Ki
  cpu:     3910m
  memory:  13484792Ki
  cpu      1218m (31%)      600m (15%)
  memory   2928147Ki (21%)  3952928Ki (29%)
Name:     10.5.149.196
  cpu:     4
  memory:  16260852Ki
  cpu:     3910m
  memory:  13484788Ki
  cpu      1354m (34%)      600m (15%)
  memory   3567123Ki (26%)  826572800 (5%)
```

To reiterate Richard's response, the test keeps referring to the tigera-operator pod because it seems to check only the last pod found in the cluster. The steps to manipulate the cluster so that the test passes are below (same as the ones in the description):

1. Create a namespace "zzz".
2. Create the following pod in the "zzz" namespace and re-run the test - the test will pass.

```
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: zzz
  namespace: zzz
spec:
  containers:
  - name: zzz
    image: us.icr.io/armada-master/pause:3.2
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
EOF

1 pass, 0 skip (1m47s)
+ [[ 0 -eq 0 ]]
+ echo 'SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.'
SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.
vagrant@verify-cluster:~/kubernetes-e2e-test-cases/tests$
```
There is only one tigera-operator pod running throughout the test, tigera-operator-7d896c66cd-qlbwt:

```
➜ amd64 git:(release-4.8) oc get pods -A
NAMESPACE  NAME  READY  STATUS  RESTARTS  AGE
calico-system  calico-kube-controllers-d78c469ff-jjvpj  1/1  Running  0  7d7h
calico-system  calico-node-92xqf  1/1  Running  0  7d7h
calico-system  calico-node-9zxlr  1/1  Running  0  7d7h
calico-system  calico-node-lrttq  1/1  Running  0  7d7h
calico-system  calico-typha-75bbbcf6df-9wgd6  1/1  Running  0  7d7h
calico-system  calico-typha-75bbbcf6df-v98vn  1/1  Running  0  7d7h
calico-system  calico-typha-75bbbcf6df-vlqz8  1/1  Running  0  7d7h
e2e-sched-priority-2483  aa311e35-cedc-4326-b6cd-3ac2d809626b-0  0/1  Pending  0  5m14s
ibm-system  ibm-cloud-provider-ip-169-60-45-162-5dc8b94d6d-hcftw  1/1  Running  0  8h
ibm-system  ibm-cloud-provider-ip-169-60-45-162-5dc8b94d6d-xpw7f  1/1  Running  0  8h
kube-system  ibm-file-plugin-699bf5596-dwc4r  1/1  Running  0  8h
kube-system  ibm-keepalived-watcher-64mgg  1/1  Running  0  8h
kube-system  ibm-keepalived-watcher-7vl2x  1/1  Running  0  8h
kube-system  ibm-keepalived-watcher-9z4mh  1/1  Running  0  8h
kube-system  ibm-master-proxy-static-10.5.149.170  2/2  Running  0  7d7h
kube-system  ibm-master-proxy-static-10.5.149.191  2/2  Running  0  7d7h
kube-system  ibm-master-proxy-static-10.5.149.196  2/2  Running  0  7d7h
kube-system  ibm-storage-metrics-agent-5dc6c457c7-spspn  1/1  Running  0  4h12m
kube-system  ibm-storage-watcher-856bcd698b-j8wzx  1/1  Running  0  8h
kube-system  ibmcloud-block-storage-driver-4hwj5  1/1  Running  0  8h
kube-system  ibmcloud-block-storage-driver-mptq6  1/1  Running  0  8h
kube-system  ibmcloud-block-storage-driver-nds5x  1/1  Running  0  8h
kube-system  ibmcloud-block-storage-plugin-649688f859-6pzcc  1/1  Running  0  8h
kube-system  vpn-56c795f968-92n5f  1/1  Running  0  7d7h
openshift-cluster-node-tuning-operator  cluster-node-tuning-operator-7b764df77c-qh9k2  1/1  Running  0  8h
openshift-cluster-node-tuning-operator  tuned-jjmfq  1/1  Running  0  8h
openshift-cluster-node-tuning-operator  tuned-msbxb  1/1  Running  0  8h
openshift-cluster-node-tuning-operator  tuned-rlwsk  1/1  Running  0  8h
openshift-cluster-samples-operator  cluster-samples-operator-59f699dcbf-sz76r  2/2  Running  0  8h
openshift-cluster-storage-operator  cluster-storage-operator-78c6bfb7b4-d5qrp  1/1  Running  1  8h
openshift-cluster-storage-operator  csi-snapshot-controller-cb6558866-4x2lp  1/1  Running  1  8h
openshift-cluster-storage-operator  csi-snapshot-controller-cb6558866-zgc4j  1/1  Running  1  8h
openshift-cluster-storage-operator  csi-snapshot-controller-operator-7b4c9b4ffc-w96lf  1/1  Running  1  8h
openshift-cluster-storage-operator  csi-snapshot-webhook-687d7ddb94-6thcn  1/1  Running  0  8h
openshift-cluster-storage-operator  csi-snapshot-webhook-687d7ddb94-d4bg2  1/1  Running  0  8h
openshift-console-operator  console-operator-5588c56b5b-ql56x  1/1  Running  1  8h
openshift-console  console-5c5b64c998-br9rq  1/1  Running  0  8h
openshift-console  console-5c5b64c998-jwctb  1/1  Running  0  8h
openshift-console  downloads-8b49bb4c5-dj7d9  1/1  Running  0  8h
openshift-console  downloads-8b49bb4c5-k9wcd  1/1  Running  0  8h
openshift-dns-operator  dns-operator-74cd5949f5-lxhwt  2/2  Running  0  8h
openshift-dns  dns-default-d99dg  2/2  Running  0  8h
openshift-dns  dns-default-x85rk  2/2  Running  0  8h
openshift-dns  dns-default-z4pjg  2/2  Running  0  8h
openshift-dns  node-resolver-m9mlz  1/1  Running  0  8h
openshift-dns  node-resolver-md5v2  1/1  Running  0  8h
openshift-dns  node-resolver-nj2mc  1/1  Running  0  8h
openshift-image-registry  cluster-image-registry-operator-75d5684d7c-8nf47  1/1  Running  1  8h
openshift-image-registry  image-pruner-27198720-4zt76  0/1  Completed  0  2d22h
openshift-image-registry  image-pruner-27200160-82m5m  0/1  Completed  0  46h
openshift-image-registry  image-pruner-27201600-r6clj  0/1  Completed  0  22h
openshift-image-registry  image-registry-868f5d4b5c-pft2z  1/1  Running  0  8h
openshift-image-registry  node-ca-cxggw  1/1  Running  0  8h
openshift-image-registry  node-ca-nqldr  1/1  Running  0  8h
openshift-image-registry  node-ca-w4qll  1/1  Running  0  8h
openshift-image-registry  registry-pvc-permissions-gsg9b  0/1  Completed  0  8h
openshift-ingress-canary  ingress-canary-2dlp9  1/1  Running  0  8h
openshift-ingress-canary  ingress-canary-75krd  1/1  Running  0  8h
openshift-ingress-canary  ingress-canary-wk8tx  1/1  Running  0  8h
openshift-ingress-operator  ingress-operator-76f5b96d7c-dh9fn  2/2  Running  0  8h
openshift-ingress  router-default-77c7f8cb7d-2px27  1/1  Running  0  8h
openshift-ingress  router-default-77c7f8cb7d-cwr96  1/1  Running  0  8h
openshift-kube-proxy  openshift-kube-proxy-dzz98  2/2  Running  0  8h
openshift-kube-proxy  openshift-kube-proxy-gg6gs  2/2  Running  0  8h
openshift-kube-proxy  openshift-kube-proxy-swttg  2/2  Running  0  8h
openshift-kube-storage-version-migrator-operator  kube-storage-version-migrator-operator-6879c94bfc-rmmz8  1/1  Running  1  8h
openshift-kube-storage-version-migrator  migrator-7d5cdcd9cc-klwf6  1/1  Running  0  8h
openshift-marketplace  certified-operators-jnps6  1/1  Running  0  12h
openshift-marketplace  community-operators-zptdk  1/1  Running  0  3h50m
openshift-marketplace  marketplace-operator-7c69549b9f-dg6t6  1/1  Running  0  8h
openshift-marketplace  redhat-marketplace-jk66g  1/1  Running  0  12h
openshift-marketplace  redhat-operators-7vndn  1/1  Running  0  5h43m
openshift-monitoring  alertmanager-main-0  5/5  Running  0  8h
openshift-monitoring  alertmanager-main-1  5/5  Running  0  8h
openshift-monitoring  alertmanager-main-2  5/5  Running  0  8h
openshift-monitoring  cluster-monitoring-operator-7b5f987df8-j2vpk  2/2  Running  0  8h
openshift-monitoring  grafana-5c98cd844-tcnwt  2/2  Running  0  8h
openshift-monitoring  kube-state-metrics-7485cb5695-zf848  3/3  Running  0  8h
openshift-monitoring  node-exporter-fs554  2/2  Running  0  8h
openshift-monitoring  node-exporter-lq957  2/2  Running  0  8h
openshift-monitoring  node-exporter-ww6sh  2/2  Running  0  8h
openshift-monitoring  openshift-state-metrics-65c6597c7-zcfvp  3/3  Running  0  8h
openshift-monitoring  prometheus-adapter-7586b977cb-cv44c  1/1  Running  0  8h
openshift-monitoring  prometheus-adapter-7586b977cb-vpjfv  1/1  Running  0  8h
openshift-monitoring  prometheus-k8s-0  7/7  Running  1  8h
openshift-monitoring  prometheus-k8s-1  7/7  Running  1  8h
openshift-monitoring  prometheus-operator-599d68ffbf-wvg5w  2/2  Running  0  8h
openshift-monitoring  telemeter-client-767f4f8d6b-7649d  3/3  Running  0  8h
openshift-monitoring  thanos-querier-84bcffdd-h7dj6  5/5  Running  0  8h
openshift-monitoring  thanos-querier-84bcffdd-ndznd  5/5  Running  0  8h
openshift-multus  multus-57tn4  1/1  Running  0  8h
openshift-multus  multus-additional-cni-plugins-dbvq2  1/1  Running  0  8h
openshift-multus  multus-additional-cni-plugins-fkxg5  1/1  Running  0  8h
openshift-multus  multus-additional-cni-plugins-wlzq8  1/1  Running  0  8h
openshift-multus  multus-admission-controller-n7qcx  2/2  Running  0  8h
openshift-multus  multus-admission-controller-v9vx6  2/2  Running  0  8h
openshift-multus  multus-admission-controller-vlfn7  2/2  Running  0  8h
openshift-multus  multus-n6dsg  1/1  Running  0  8h
openshift-multus  multus-p8bpq  1/1  Running  0  8h
openshift-multus  network-metrics-daemon-25jh8  2/2  Running  0  8h
openshift-multus  network-metrics-daemon-jjpgw  2/2  Running  0  8h
openshift-multus  network-metrics-daemon-tv555  2/2  Running  0  8h
openshift-network-diagnostics  network-check-source-6ccd7c5589-glnkg  1/1  Running  0  8h
openshift-network-diagnostics  network-check-target-5qf9j  1/1  Running  0  8h
openshift-network-diagnostics  network-check-target-sjmvb  1/1  Running  0  8h
openshift-network-diagnostics  network-check-target-thbzf  1/1  Running  0  8h
openshift-network-operator  network-operator-85544fbdbc-4nb5h  1/1  Running  1  8h
openshift-operator-lifecycle-manager  catalog-operator-7bbb999f99-492vz  1/1  Running  0  8h
openshift-operator-lifecycle-manager  olm-operator-7bfd55d5c7-swmzn  1/1  Running  0  8h
openshift-operator-lifecycle-manager  packageserver-c8d74b46d-6j6sn  1/1  Running  0  8h
openshift-operator-lifecycle-manager  packageserver-c8d74b46d-9j4gz  1/1  Running  0  8h
openshift-roks-metrics  metrics-5fb9d747f7-6mjh5  1/1  Running  0  8h
openshift-roks-metrics  push-gateway-57868bfdb9-d5lq2  1/1  Running  0  8h
openshift-service-ca-operator  service-ca-operator-7f994cb49b-shkgm  1/1  Running  1  8h
openshift-service-ca  service-ca-847c7856dc-7tmwz  1/1  Running  1  8h
tigera-operator  tigera-operator-7d896c66cd-qlbwt  1/1  Running  4  7d7h
```
Thank you for all the provided data. Refactoring done in https://github.com/kubernetes/kubernetes/pull/100762 incorrectly constructs the list of pods. Opened a fix upstream in https://github.com/kubernetes/kubernetes/pull/105205.
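[Editorial illustration] For readers following along, below is a minimal, hypothetical sketch of the kind of Go mistake that produces exactly this symptom: a freshly built list of pod pointers in which every entry aliases the last pod iterated over. The actual defect and its correction live in the two PRs linked above; this snippet only illustrates the general pitfall, using a made-up pod type and names.

```go
package main

import "fmt"

type pod struct{ name string }

func main() {
	items := []pod{{"pod-a"}, {"pod-b"}, {"tigera-operator-7d896c66cd-klhq5"}}

	// Buggy construction: &item takes the address of the range variable. With
	// the Go versions Kubernetes 1.21/1.22 were built with, that variable is
	// reused on every iteration (Go 1.22 later changed this), so every entry
	// of the new list ends up pointing at the last pod iterated over.
	var buggy []*pod
	for _, item := range items {
		buggy = append(buggy, &item)
	}

	// One correct construction: take the address of the slice element itself
	// (or store values instead of pointers).
	var fixed []*pod
	for i := range items {
		fixed = append(fixed, &items[i])
	}

	for i := range items {
		fmt.Printf("buggy[%d]=%s  fixed[%d]=%s\n", i, buggy[i].name, i, fixed[i].name)
	}
}
```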
The 4.8 release corresponds to the 1.21 Kubernetes version. The current process for backporting new changes/fixes from upstream is based on a periodic sync with the released Kubernetes minor versions. The latest 1.21 patch release is 1.21.5, which still does not carry the fix, so we are waiting for the next 1.21.6 release. If the process is too slow and the issue needs to be resolved sooner, please justify and increase the severity of the issue.
Still waiting for the rebase
(In reply to Jan Chaloupka from comment #17)
> Still waiting for the rebase

Hello Jan - we would like to know when the rebase can be completed, as we are still waiting for the fix to be applied. Thank you.
We are waiting until https://github.com/openshift/kubernetes/pull/1087 merges and the changes get propagated into openshift/origin's test suite.
Correction: the upstream fix was already merged into 4.8 through https://github.com/openshift/kubernetes/pull/1060.
Resolution of this ticket depends on resolution of the same issue in higher versions. I am waiting for higher version fixes to merge so the PR in https://github.com/openshift/origin/pull/26696 gets all the required permissions.
After discussing with dev how to validate this bug, I have learnt that "Given this is impossible to see in the junit.xml (since it shows logs of the failed tests only by default), and given the bug was not about failing the test, you can move it to VERIFIED directly." Based on the above, I am moving the bug to the VERIFIED state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.34 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0795