Bug 1999288 - Scheduling conformance tests are failing due to incorrect CPU and memory calculations in the test script
Summary: Scheduling conformance tests are failing due to incorrect CPU and memory calculations in the test script
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.z
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Duplicates: 1999285 (view as bug list)
Depends On: 2008181
Blocks:
 
Reported: 2021-08-30 19:32 UTC by smitha.subbarao
Modified: 2022-03-16 11:30 UTC (History)
CC: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2008181 (view as bug list)
Environment:
Last Closed: 2022-03-16 11:30:09 UTC
Target Upstream Version:
Embargoed:


Attachments
Complete test failure log of the failing OCP 4.8 tests (119.33 KB, text/plain)
2021-09-14 13:17 UTC, smitha.subbarao
no flags


Links
System ID Private Priority Status Summary Last Updated
Github kubernetes kubernetes pull 105205 0 None Merged e2e scheduling priorities: do not reference control loop variable 2021-10-14 08:26:30 UTC
Github openshift kubernetes pull 1060 0 None Waiting on Red Hat [Sat6/Tools/Question] katello-host-tools yum plugins in rhel7 but not in rhel8 2022-06-16 13:24:43 UTC
Github openshift origin pull 26886 0 None Merged bug 1999288: [release-4.8]: bump openshift/kubernetes to the latest 2022-03-09 09:45:54 UTC
Red Hat Bugzilla 1916489 1 None None None 2021-08-30 19:32:27 UTC
Red Hat Product Errata RHBA-2022:0795 0 None None None 2022-03-16 11:30:33 UTC

Description smitha.subbarao 2021-08-30 19:32:28 UTC
Description of problem:
The following sig-scheduling OCP tests are failing due to incorrect CPU and memory calculations by the script:

[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]


Version-Release number of selected component (if applicable):
OCP 4.8

How reproducible:
Always

Steps to Reproduce:
1. Create an OpenShift 4.8 cluster 
2. Run the sig-scheduling tests mentioned above.
3. Observe the CPU and memory logs of the pods in the cluster in another window - the actual CPU and memory consumption values of the pods are much less than what is being calculated by the script.

Actual results:
The test fails because the script only checks the CPU and memory values of the tigera-operator pod, and also calculates these values incorrectly.

The following logs show the incorrect CPU and memory calculations of the script:
```
Aug 25 14:55:20.299: INFO: ComputeCPUMemFraction for node: 10.5.149.223
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Pod for on the node: tigera-operator-7d896c66cd-klhq5, Cpu: 100, Mem: 41943040
Aug 25 14:55:20.299: INFO: Node: 10.5.149.223, totalRequestedCPUResource: 4300, cpuAllocatableMil: 3910, cpuFraction: 1
Aug 25 14:55:20.299: INFO: Node: 10.5.149.223, totalRequestedMemResource: 1866465280, memAllocatableVal: 13808427008, memFraction: 0.13516856618923007
```

Expected results:
The test must correctly calculate the CPU and memory values of every pod in the cluster.


Additional info:

If we create a namespace "zzz" and the following pod in it, then the test passes:

```
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: zzz
  namespace: zzz
spec:
  containers:
  - name: zzz
    image: us.icr.io/armada-master/pause:3.2
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
EOF
1 pass, 0 skip (1m47s)
+ [[ 0 -eq 0 ]]
+ echo 'SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.'
SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.
vagrant@verify-cluster:~/kubernetes-e2e-test-cases/tests$ 
```

Comment 1 Jan Chaloupka 2021-09-01 11:01:27 UTC
> 3. Observe the CPU and memory logs of the pods in the cluster in another window - the actual CPU and memory consumption values of the pods are much less than what is being calculated by the script.

What are the expected actual values? Which command do you use to see the actual values?

Comment 2 smitha.subbarao 2021-09-01 15:36:47 UTC
(In reply to Jan Chaloupka from comment #1)
> > 3. Observe the CPU and memory logs of the pods in the cluster in another window - the actual CPU and memory consumption values of the pods are much less than what is being calculated by the script.
> 
> What are the expected actual values? Which command do you use to see the
> actual values?

You can use the command `watch -n 3 oc adm top pods --namespace="namespace_of_the_pod_that_is_being_verified"` to view the actual CPU and memory values in another window. For example: `watch -n 3 oc adm top pods --namespace=tigera-operator`

During the test, these were the actual values of the tigera-operator pod, not the ones that were being displayed/calculated by the test:

```
Every 3.0s: oc adm top pods --namespace=tigera-... 

NAME                               CPU(cores)   MEMORY(bytes)
tigera-operator-667cd558f7-szmrj   3m           77Mi
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   2m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   4m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   2m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   2m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   2m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   2m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   2m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   3m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   4m           97Mi            
➜  ~ oc adm top pods --namespace=tigera-operator
NAME                               CPU(cores)   MEMORY(bytes)   
tigera-operator-667cd558f7-szmrj   9m           90Mi   
```


The test displayed a higher CPU value than what was actually consumed by the pod.

Comment 3 Jan Chaloupka 2021-09-02 16:26:40 UTC
Are you referring to createBalancedPodForNodes?

oc adm top pods displays the current usage of resources based on what cAdvisor provides, whereas createBalancedPodForNodes relies only on the resource requests declared by the pods. So the difference you reported is expected.
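
To make the distinction concrete, here is a minimal, self-contained Go sketch of that calculation (illustrative only; the real helpers are createBalancedPodForNodes and computeCPUMemFraction in test/e2e/scheduling/priorities.go, and the names podRequest/computeFractions below are made up). The totals come from the requests declared in the pod specs, never from live usage, and the fraction is clamped at 1, which matches the `cpuFraction: 1` reported in the log above:

```
// Simplified illustration: per-node fractions are computed from pod
// resource *requests*, not from live usage (what `oc adm top pods` shows).
package main

import "fmt"

// podRequest is a stand-in for a pod's declared resource requests.
type podRequest struct {
	name     string
	cpuMilli int64 // requested CPU in millicores
	memBytes int64 // requested memory in bytes
}

func computeFractions(pods []podRequest, cpuAllocatableMilli, memAllocatable int64) (float64, float64) {
	var totalCPU, totalMem int64
	for _, p := range pods {
		fmt.Printf("Pod on the node: %s, Cpu: %d, Mem: %d\n", p.name, p.cpuMilli, p.memBytes)
		totalCPU += p.cpuMilli
		totalMem += p.memBytes
	}
	cpuFraction := float64(totalCPU) / float64(cpuAllocatableMilli)
	memFraction := float64(totalMem) / float64(memAllocatable)
	// Clamp at 1 so a node whose requests exceed its allocatable
	// resources simply reports as fully requested (fraction = 1).
	if cpuFraction > 1 {
		cpuFraction = 1
	}
	if memFraction > 1 {
		memFraction = 1
	}
	return cpuFraction, memFraction
}

func main() {
	pods := []podRequest{
		{"tigera-operator-7d896c66cd-klhq5", 100, 41943040},
		{"some-other-pod", 250, 268435456},
	}
	cpu, mem := computeFractions(pods, 3910, 13808427008)
	fmt.Printf("cpuFraction=%v memFraction=%v\n", cpu, mem)
}
```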

Can you share links of the failed tests?

Comment 4 smitha.subbarao 2021-09-09 04:18:08 UTC
(In reply to Jan Chaloupka from comment #3)
> Are you referring to createBalancedPodForNodes?
> 
> oc adm top pods displays current usage of resources based on what cadvisor
> provides. Whereas createBalancedPodForNodes relies only on the resource
> requests provided by pods. So the difference you reported is expected.
> 
> Can you share links of the failed tests?

Here are the tests that are failing:

[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]

Link to the test: https://github.com/openshift/origin/blob/release-4.8/vendor/k8s.io/kubernetes/test/e2e/scheduling/priorities.go

Comment 5 Jan Chaloupka 2021-09-09 07:01:53 UTC
Apologies, I meant CI runs of the failed tests. From https://prow.ci.openshift.org/.

Comment 6 Richard Theis 2021-09-13 21:22:06 UTC
We are running these tests on Red Hat OpenShift on IBM Cloud clusters via IBM Cloud CI.  There are no failed test runs in https://prow.ci.openshift.org/ related to this bugzilla.  But the lack of test failures in OpenShift CI does not mean that this is not a valid test problem.  I suspect that the last pod found on OpenShift clusters run in CI allows the test to pass.  I believe the previous comments show how to reproduce the problem.  If not, please let us know.  Thanks.

Comment 7 Jan Chaloupka 2021-09-14 08:46:45 UTC
I am asking for the test failures from https://prow.ci.openshift.org/ so I can see the entire failure logs and also have proof in case we need to alter the test upstream. It's hard to convince upstream to merge any change without the failure logs in this case. Checking https://search.ci.openshift.org/ for the last 14 days:
- [sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]
  No results found
- [sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]
  A few tests failed due to general cluster issues (NS not created, error creating a pod, ...)
- [sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]
  A few tests failed due to general cluster issues (NS not created, no node available for scheduling, ...)

smitha.subbarao, can you share the entire test run including the failures?

> We are running these tests on Red Hat OpenShift on IBM Cloud clusters via IBM Cloud CI.

Is it a part of a CI system? Assuming 4.8 version of OpenShift (as reported). Are there other versions where the test fails as well?

Comment 8 Richard Theis 2021-09-14 11:10:57 UTC
I think that we have provided enough details for a fix to be provided. But we can provide the full logs from our test run if that would help. There is no failure in https://prow.ci.openshift.org/; this failure is seen only in IBM's CI system, and only on OpenShift version 4.8.

Smitha, can you please provide the full test failure logs?

Comment 9 smitha.subbarao 2021-09-14 13:17:37 UTC
Created attachment 1822990 [details]
Complete test failure log of the failing OCP 4.8 tests

This file contains the full test failure log of the following OCP 4.8 tests:

"[sig-scheduling] SchedulerPriorities [Serial] Pod should avoid nodes that have avoidPod annotation [Suite:openshift/conformance/serial] [Suite:k8s]"
"[sig-scheduling] SchedulerPriorities [Serial] Pod should be preferably scheduled to nodes pod can tolerate [Suite:openshift/conformance/serial] [Suite:k8s]"
"[sig-scheduling] SchedulerPriorities [Serial] PodTopologySpread Scoring validates pod should be preferably scheduled to node which makes the matching pods more evenly distributed [Suite:openshift/conformance/serial] [Suite:k8s]"

Comment 10 Jan Chaloupka 2021-09-15 11:26:45 UTC
*** Bug 1999285 has been marked as a duplicate of this bug. ***

Comment 11 Jan Chaloupka 2021-09-15 12:17:36 UTC
Can you share more insight into how you run the tests? Checking the logs, all the "Pod for on the node: " lines report exactly the same pod, "tigera-operator-7d896c66cd-klhq5" (quite strange).

occurrences:
- for 10.5.149.223 32 occurrences
- for 10.5.149.234 39 occurrences
- for 10.5.149.237 43 occurrences

Checking cpu fractions:
- for 10.5.149.223 0.8439897698209718
- for 10.5.149.234 1
- for 10.5.149.237 1

Meaning both 10.5.149.234 and 10.5.149.237 are saturated, so the filler pods will fail to be scheduled (at least on 10.5.149.234 and 10.5.149.237) since there is no CPU resource left.
Thus the test must fail.

Questions:
- how saturated are your nodes before running the test suite (i.e. what is the resource consumption of each node)?
- how do you create the tigera-operator pod(s)?
- does every tigera-operator have its own NS? Or is there only a single replica of the operator? Or does each node have its own replica of the operator? In which NS does the operator live?
- do you run the test over a real cluster or over a mock/fake cluster (i.e. with fake clientset?)
- can you run `oc get pods -A` every second during the test run (to see how many tigera pods are in Terminated/Running state) while running only those 3 tests?
- can you provide all kube-scheduler logs (3 files assuming there are 3 master nodes)?

Comment 12 Richard Theis 2021-09-15 17:17:48 UTC
Exactly...   

"Pod for on the node: " lines report exactly the same pod "tigera-operator-7d896c66cd-klhq5" (quite strange).

This is the test bug, in my opinion. The test calculates CPU and memory incorrectly because it only uses the last pod found in the cluster. This bugzilla's description shows how we can manipulate the cluster to yield either a test failure or a success.
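
For illustration, a self-contained Go snippet (not the actual test code) showing a common way a pod list ends up reporting only the last pod: storing the address of the range loop variable, which is what the upstream PR title "do not reference control loop variable" linked above points at. The pod names are taken from the outputs in this bug; the aliasing behaviour shown applies to the Go versions in use at the time (before Go 1.22 changed loop-variable scoping):

```
// Under pre-Go 1.22 semantics, the range variable `p` is a single
// variable reused on every iteration, so every stored &p points to
// the same memory, which after the loop holds the last pod listed.
package main

import "fmt"

type pod struct{ name string }

func main() {
	pods := []pod{
		{"calico-node-92xqf"},
		{"router-default-77c7f8cb7d-2px27"},
		{"tigera-operator-7d896c66cd-klhq5"},
	}

	var byRef []*pod
	for _, p := range pods {
		byRef = append(byRef, &p) // BUG: &p aliases the loop variable
	}

	for _, p := range byRef {
		// With Go < 1.22 this prints the tigera-operator pod three times,
		// mirroring the repeated log lines in the description.
		fmt.Println("Pod on the node:", p.name)
	}
}
```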

Comment 13 smitha.subbarao 2021-09-20 20:00:29 UTC
Resource consumption of each node before the test is shown below (the test is conducted on an actual ROKS cluster). The `oc get pods -A` logs will be added in a following comment.

```
➜  amd64 git:(release-4.8) kubectl describe nodes | grep 'Name:\|  cpu\|  memory'                                                git:(release-4.8|) 
Name:               10.5.149.170
  cpu:                4
  memory:             16260860Ki
  cpu:                3910m
  memory:             13484796Ki
  cpu                1246m (31%)      1800m (46%)
  memory             3751443Ki (27%)  2036000Ki (15%)
Name:               10.5.149.191
  cpu:                4
  memory:             16260856Ki
  cpu:                3910m
  memory:             13484792Ki
  cpu                1218m (31%)      600m (15%)
  memory             2928147Ki (21%)  3952928Ki (29%)
Name:               10.5.149.196
  cpu:                4
  memory:             16260852Ki
  cpu:                3910m
  memory:             13484788Ki
  cpu                1354m (34%)      600m (15%)
  memory             3567123Ki (26%)  826572800 (5%)
```

To reiterate Richard's response, the test keeps referring to the tigera-operator pod because it seems to check only the last pod found in the cluster.

The steps to manipulate the cluster to successfully pass the test are below (same as the ones in the description):

1. Create a namespace "zzz" 

2. Create the following pod in the "zzz" namespace and re-run the test - the test will pass.

```
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: zzz
  namespace: zzz
spec:
  containers:
  - name: zzz
    image: us.icr.io/armada-master/pause:3.2
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
EOF
1 pass, 0 skip (1m47s)
+ [[ 0 -eq 0 ]]
+ echo 'SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.'
SUCCESS: PVG ocp_conformance.sh was successful. Test results are available in directory /tmp/ocp-conformance-z8w.
vagrant@verify-cluster:~/kubernetes-e2e-test-cases/tests$ 
```

Comment 14 smitha.subbarao 2021-09-20 22:46:19 UTC
There is only one tigera-operator pod running throughout the test, tigera-operator-7d896c66cd-qlbwt:
```
➜  amd64 git:(release-4.8) oc get pods -A                                                                  git:(release-4.8|) 
NAMESPACE                                          NAME                                                      READY   STATUS      RESTARTS   AGE
calico-system                                      calico-kube-controllers-d78c469ff-jjvpj                   1/1     Running     0          7d7h
calico-system                                      calico-node-92xqf                                         1/1     Running     0          7d7h
calico-system                                      calico-node-9zxlr                                         1/1     Running     0          7d7h
calico-system                                      calico-node-lrttq                                         1/1     Running     0          7d7h
calico-system                                      calico-typha-75bbbcf6df-9wgd6                             1/1     Running     0          7d7h
calico-system                                      calico-typha-75bbbcf6df-v98vn                             1/1     Running     0          7d7h
calico-system                                      calico-typha-75bbbcf6df-vlqz8                             1/1     Running     0          7d7h
e2e-sched-priority-2483                            aa311e35-cedc-4326-b6cd-3ac2d809626b-0                    0/1     Pending     0          5m14s
ibm-system                                         ibm-cloud-provider-ip-169-60-45-162-5dc8b94d6d-hcftw      1/1     Running     0          8h
ibm-system                                         ibm-cloud-provider-ip-169-60-45-162-5dc8b94d6d-xpw7f      1/1     Running     0          8h
kube-system                                        ibm-file-plugin-699bf5596-dwc4r                           1/1     Running     0          8h
kube-system                                        ibm-keepalived-watcher-64mgg                              1/1     Running     0          8h
kube-system                                        ibm-keepalived-watcher-7vl2x                              1/1     Running     0          8h
kube-system                                        ibm-keepalived-watcher-9z4mh                              1/1     Running     0          8h
kube-system                                        ibm-master-proxy-static-10.5.149.170                      2/2     Running     0          7d7h
kube-system                                        ibm-master-proxy-static-10.5.149.191                      2/2     Running     0          7d7h
kube-system                                        ibm-master-proxy-static-10.5.149.196                      2/2     Running     0          7d7h
kube-system                                        ibm-storage-metrics-agent-5dc6c457c7-spspn                1/1     Running     0          4h12m
kube-system                                        ibm-storage-watcher-856bcd698b-j8wzx                      1/1     Running     0          8h
kube-system                                        ibmcloud-block-storage-driver-4hwj5                       1/1     Running     0          8h
kube-system                                        ibmcloud-block-storage-driver-mptq6                       1/1     Running     0          8h
kube-system                                        ibmcloud-block-storage-driver-nds5x                       1/1     Running     0          8h
kube-system                                        ibmcloud-block-storage-plugin-649688f859-6pzcc            1/1     Running     0          8h
kube-system                                        vpn-56c795f968-92n5f                                      1/1     Running     0          7d7h
openshift-cluster-node-tuning-operator             cluster-node-tuning-operator-7b764df77c-qh9k2             1/1     Running     0          8h
openshift-cluster-node-tuning-operator             tuned-jjmfq                                               1/1     Running     0          8h
openshift-cluster-node-tuning-operator             tuned-msbxb                                               1/1     Running     0          8h
openshift-cluster-node-tuning-operator             tuned-rlwsk                                               1/1     Running     0          8h
openshift-cluster-samples-operator                 cluster-samples-operator-59f699dcbf-sz76r                 2/2     Running     0          8h
openshift-cluster-storage-operator                 cluster-storage-operator-78c6bfb7b4-d5qrp                 1/1     Running     1          8h
openshift-cluster-storage-operator                 csi-snapshot-controller-cb6558866-4x2lp                   1/1     Running     1          8h
openshift-cluster-storage-operator                 csi-snapshot-controller-cb6558866-zgc4j                   1/1     Running     1          8h
openshift-cluster-storage-operator                 csi-snapshot-controller-operator-7b4c9b4ffc-w96lf         1/1     Running     1          8h
openshift-cluster-storage-operator                 csi-snapshot-webhook-687d7ddb94-6thcn                     1/1     Running     0          8h
openshift-cluster-storage-operator                 csi-snapshot-webhook-687d7ddb94-d4bg2                     1/1     Running     0          8h
openshift-console-operator                         console-operator-5588c56b5b-ql56x                         1/1     Running     1          8h
openshift-console                                  console-5c5b64c998-br9rq                                  1/1     Running     0          8h
openshift-console                                  console-5c5b64c998-jwctb                                  1/1     Running     0          8h
openshift-console                                  downloads-8b49bb4c5-dj7d9                                 1/1     Running     0          8h
openshift-console                                  downloads-8b49bb4c5-k9wcd                                 1/1     Running     0          8h
openshift-dns-operator                             dns-operator-74cd5949f5-lxhwt                             2/2     Running     0          8h
openshift-dns                                      dns-default-d99dg                                         2/2     Running     0          8h
openshift-dns                                      dns-default-x85rk                                         2/2     Running     0          8h
openshift-dns                                      dns-default-z4pjg                                         2/2     Running     0          8h
openshift-dns                                      node-resolver-m9mlz                                       1/1     Running     0          8h
openshift-dns                                      node-resolver-md5v2                                       1/1     Running     0          8h
openshift-dns                                      node-resolver-nj2mc                                       1/1     Running     0          8h
openshift-image-registry                           cluster-image-registry-operator-75d5684d7c-8nf47          1/1     Running     1          8h
openshift-image-registry                           image-pruner-27198720-4zt76                               0/1     Completed   0          2d22h
openshift-image-registry                           image-pruner-27200160-82m5m                               0/1     Completed   0          46h
openshift-image-registry                           image-pruner-27201600-r6clj                               0/1     Completed   0          22h
openshift-image-registry                           image-registry-868f5d4b5c-pft2z                           1/1     Running     0          8h
openshift-image-registry                           node-ca-cxggw                                             1/1     Running     0          8h
openshift-image-registry                           node-ca-nqldr                                             1/1     Running     0          8h
openshift-image-registry                           node-ca-w4qll                                             1/1     Running     0          8h
openshift-image-registry                           registry-pvc-permissions-gsg9b                            0/1     Completed   0          8h
openshift-ingress-canary                           ingress-canary-2dlp9                                      1/1     Running     0          8h
openshift-ingress-canary                           ingress-canary-75krd                                      1/1     Running     0          8h
openshift-ingress-canary                           ingress-canary-wk8tx                                      1/1     Running     0          8h
openshift-ingress-operator                         ingress-operator-76f5b96d7c-dh9fn                         2/2     Running     0          8h
openshift-ingress                                  router-default-77c7f8cb7d-2px27                           1/1     Running     0          8h
openshift-ingress                                  router-default-77c7f8cb7d-cwr96                           1/1     Running     0          8h
openshift-kube-proxy                               openshift-kube-proxy-dzz98                                2/2     Running     0          8h
openshift-kube-proxy                               openshift-kube-proxy-gg6gs                                2/2     Running     0          8h
openshift-kube-proxy                               openshift-kube-proxy-swttg                                2/2     Running     0          8h
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-6879c94bfc-rmmz8   1/1     Running     1          8h
openshift-kube-storage-version-migrator            migrator-7d5cdcd9cc-klwf6                                 1/1     Running     0          8h
openshift-marketplace                              certified-operators-jnps6                                 1/1     Running     0          12h
openshift-marketplace                              community-operators-zptdk                                 1/1     Running     0          3h50m
openshift-marketplace                              marketplace-operator-7c69549b9f-dg6t6                     1/1     Running     0          8h
openshift-marketplace                              redhat-marketplace-jk66g                                  1/1     Running     0          12h
openshift-marketplace                              redhat-operators-7vndn                                    1/1     Running     0          5h43m
openshift-monitoring                               alertmanager-main-0                                       5/5     Running     0          8h
openshift-monitoring                               alertmanager-main-1                                       5/5     Running     0          8h
openshift-monitoring                               alertmanager-main-2                                       5/5     Running     0          8h
openshift-monitoring                               cluster-monitoring-operator-7b5f987df8-j2vpk              2/2     Running     0          8h
openshift-monitoring                               grafana-5c98cd844-tcnwt                                   2/2     Running     0          8h
openshift-monitoring                               kube-state-metrics-7485cb5695-zf848                       3/3     Running     0          8h
openshift-monitoring                               node-exporter-fs554                                       2/2     Running     0          8h
openshift-monitoring                               node-exporter-lq957                                       2/2     Running     0          8h
openshift-monitoring                               node-exporter-ww6sh                                       2/2     Running     0          8h
openshift-monitoring                               openshift-state-metrics-65c6597c7-zcfvp                   3/3     Running     0          8h
openshift-monitoring                               prometheus-adapter-7586b977cb-cv44c                       1/1     Running     0          8h
openshift-monitoring                               prometheus-adapter-7586b977cb-vpjfv                       1/1     Running     0          8h
openshift-monitoring                               prometheus-k8s-0                                          7/7     Running     1          8h
openshift-monitoring                               prometheus-k8s-1                                          7/7     Running     1          8h
openshift-monitoring                               prometheus-operator-599d68ffbf-wvg5w                      2/2     Running     0          8h
openshift-monitoring                               telemeter-client-767f4f8d6b-7649d                         3/3     Running     0          8h
openshift-monitoring                               thanos-querier-84bcffdd-h7dj6                             5/5     Running     0          8h
openshift-monitoring                               thanos-querier-84bcffdd-ndznd                             5/5     Running     0          8h
openshift-multus                                   multus-57tn4                                              1/1     Running     0          8h
openshift-multus                                   multus-additional-cni-plugins-dbvq2                       1/1     Running     0          8h
openshift-multus                                   multus-additional-cni-plugins-fkxg5                       1/1     Running     0          8h
openshift-multus                                   multus-additional-cni-plugins-wlzq8                       1/1     Running     0          8h
openshift-multus                                   multus-admission-controller-n7qcx                         2/2     Running     0          8h
openshift-multus                                   multus-admission-controller-v9vx6                         2/2     Running     0          8h
openshift-multus                                   multus-admission-controller-vlfn7                         2/2     Running     0          8h
openshift-multus                                   multus-n6dsg                                              1/1     Running     0          8h
openshift-multus                                   multus-p8bpq                                              1/1     Running     0          8h
openshift-multus                                   network-metrics-daemon-25jh8                              2/2     Running     0          8h
openshift-multus                                   network-metrics-daemon-jjpgw                              2/2     Running     0          8h
openshift-multus                                   network-metrics-daemon-tv555                              2/2     Running     0          8h
openshift-network-diagnostics                      network-check-source-6ccd7c5589-glnkg                     1/1     Running     0          8h
openshift-network-diagnostics                      network-check-target-5qf9j                                1/1     Running     0          8h
openshift-network-diagnostics                      network-check-target-sjmvb                                1/1     Running     0          8h
openshift-network-diagnostics                      network-check-target-thbzf                                1/1     Running     0          8h
openshift-network-operator                         network-operator-85544fbdbc-4nb5h                         1/1     Running     1          8h
openshift-operator-lifecycle-manager               catalog-operator-7bbb999f99-492vz                         1/1     Running     0          8h
openshift-operator-lifecycle-manager               olm-operator-7bfd55d5c7-swmzn                             1/1     Running     0          8h
openshift-operator-lifecycle-manager               packageserver-c8d74b46d-6j6sn                             1/1     Running     0          8h
openshift-operator-lifecycle-manager               packageserver-c8d74b46d-9j4gz                             1/1     Running     0          8h
openshift-roks-metrics                             metrics-5fb9d747f7-6mjh5                                  1/1     Running     0          8h
openshift-roks-metrics                             push-gateway-57868bfdb9-d5lq2                             1/1     Running     0          8h
openshift-service-ca-operator                      service-ca-operator-7f994cb49b-shkgm                      1/1     Running     1          8h
openshift-service-ca                               service-ca-847c7856dc-7tmwz                               1/1     Running     1          8h
tigera-operator                                    tigera-operator-7d896c66cd-qlbwt                          1/1     Running     4          7d7h
```

Comment 15 Jan Chaloupka 2021-09-23 11:16:47 UTC
Thank you for all the provided data. Refactoring done in https://github.com/kubernetes/kubernetes/pull/100762 incorrectly constructs the list of pods. Opened a fix upstream in https://github.com/kubernetes/kubernetes/pull/105205.
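
For context, the general shape of such a fix (an illustrative sketch only; the actual change is in the linked PR): take the address of the slice element, or copy the loop variable into a fresh per-iteration variable, so each stored pointer refers to a distinct pod:

```
// Fixed pattern: index into the backing slice instead of taking the
// address of the range loop variable, so each entry in the per-node
// list points at a distinct pod.
package main

import "fmt"

type pod struct {
	name     string
	nodeName string
}

func main() {
	pods := []pod{
		{"calico-node-92xqf", "10.5.149.223"},
		{"tigera-operator-7d896c66cd-klhq5", "10.5.149.234"},
	}

	byNode := map[string][]*pod{}
	for i := range pods {
		p := &pods[i] // points at the slice element, not the loop variable
		byNode[p.nodeName] = append(byNode[p.nodeName], p)
	}

	for node, list := range byNode {
		for _, p := range list {
			fmt.Printf("node %s: %s\n", node, p.name)
		}
	}
}
```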

Comment 16 Jan Chaloupka 2021-10-14 08:37:20 UTC
The 4.8 release corresponds to the 1.21 Kubernetes version. The current process of backporting new changes/fixes from upstream is based on periodic syncs with the released Kubernetes patch versions. The latest 1.21 release is 1.21.5, which still does not carry the fix, so we are waiting for the next release, 1.21.6. If the process is too slow and the issue needs to be resolved sooner, please justify and increase the severity of the issue.

Comment 17 Jan Chaloupka 2021-11-25 15:28:34 UTC
Still waiting for the rebase

Comment 18 smitha.subbarao 2021-12-13 16:20:15 UTC
(In reply to Jan Chaloupka from comment #17)
> Still waiting for the rebase

Hello Jan - We would like to know when the rebase can be completed, as we are still waiting for the fix to be applied. Thank you.

Comment 19 Jan Chaloupka 2021-12-16 11:48:24 UTC
We are waiting until https://github.com/openshift/kubernetes/pull/1087 merges and the changes get propagated into openshift/origin's test suite.

Comment 20 Jan Chaloupka 2021-12-16 11:53:15 UTC
Correction: The upstream fix got already merged into 4.8 through https://github.com/openshift/kubernetes/pull/1060.

Comment 21 Jan Chaloupka 2022-01-07 11:42:51 UTC
Resolution of this ticket depends on the resolution of the same issue in higher versions. I am waiting for the higher-version fixes to merge so that the PR in https://github.com/openshift/origin/pull/26696 gets all the required permissions.

Comment 25 RamaKasturi 2022-03-09 12:02:07 UTC
After discussing with the developer how to validate this bug, I have learned that "Given this is impossible to see in the junit.xml (since it shows logs of the failed tests only by default) and given the bug was not about failing the test, you can move it to VERIFIED directly."

Based on the above, I am moving the bug to the VERIFIED state.

Comment 27 errata-xmlrpc 2022-03-16 11:30:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.34 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0795

