Bug 1858292

Summary: [Descheduler] Descheduler logs get truncated
Product: OpenShift Container Platform Reporter: RamaKasturi <knarra>
Component: kube-scheduler    Assignee: Mike Dame <mdame>
Status: CLOSED ERRATA QA Contact: zhou ying <yinzhou>
Severity: high Docs Contact:
Priority: medium    
Version: 4.6    CC: aos-bugs, maszulik, mdame, mfojtik, yinzhou
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:15:30 UTC
Type: Bug

Description RamaKasturi 2020-07-17 12:40:25 UTC
Description of problem:
I see that descheduler logs get truncated. 
Previous version of log: http://pastebin.test.redhat.com/885328
current version of log: http://pastebin.test.redhat.com/885329
configmap for the cluster: http://pastebin.test.redhat.com/885333

Version-Release number of selected component (if applicable):
4.6
"quay.io/openshift/origin-cluster-kube-descheduler-operator:latest"
"quay.io/openshift/origin-descheduler:latest"

How reproducible:
always 

Steps to Reproduce:
1. Install the latest descheduler from master
2. Enable all strategies
3. Run: oc logs -f <cluster-XXX>

Actual results:
The descheduler logs are truncated; the output below is what is shown (note the first line is cut off mid-URL):
pute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:39:56.264997       1 request.go:557] Throttling request took 170.200367ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-166-203.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:39:56.465021       1 request.go:557] Throttling request took 101.944454ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-185-126.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:39:56.664981       1 request.go:557] Throttling request took 102.624764ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-209-116.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:39:56.864972       1 request.go:557] Throttling request took 174.351059ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-210-161.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:40:59.565144       1 request.go:557] Throttling request took 99.540026ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-185-126.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:40:59.765149       1 request.go:557] Throttling request took 101.625155ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-209-116.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:40:59.965160       1 request.go:557] Throttling request took 104.098369ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-210-161.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.165155       1 request.go:557] Throttling request took 101.089016ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-136-162.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.365158       1 request.go:557] Throttling request took 102.427223ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-137-48.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.565159       1 request.go:557] Throttling request took 138.368711ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-166-203.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.765144       1 request.go:557] Throttling request took 102.584083ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-185-126.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.965157       1 request.go:557] Throttling request took 163.471416ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-209-116.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.165162       1 request.go:557] Throttling request took 167.637197ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-210-161.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.365147       1 request.go:557] Throttling request took 100.932733ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-136-162.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.565153       1 request.go:557] Throttling request took 102.073953ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-137-48.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.765158       1 request.go:557] Throttling request took 172.430921ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-166-203.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.965172       1 request.go:557] Throttling request took 102.31456ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-185-126.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded

Expected results:
Logs should not get truncated

Additional info:
After checking with the developers, I learned that openshift/descheduler was recently synced with the upstream base, which switched from klog v1 to v2. This might be related.

Comment 1 Mike Dame 2020-07-17 14:02:32 UTC
Confirmed that something broke with the klog/v2 switch upstream. Opened https://github.com/kubernetes-sigs/descheduler/issues/353 to track down a fix; if none is found, that change will be reverted.
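The truncation pattern is consistent with buffered logging: klog buffers writes internally and relies on a flush before process exit, so a short-lived process can lose the tail (or, on restart, the head) of its output. A minimal stdlib sketch of that failure mode, using bufio as a stand-in for klog's buffer (the klog tie-in is an assumption based on the comment above, not a confirmed root cause):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// writeLogs simulates buffered logging: a message goes into an in-memory
// buffer and only reaches the destination when the buffer is flushed.
func writeLogs(flush bool) string {
	var dst bytes.Buffer
	w := bufio.NewWriter(&dst) // stand-in for klog's internal buffer
	fmt.Fprintln(w, "I0717 11:39:56 example log line")
	if flush {
		w.Flush() // analogous to flushing klog before exit
	}
	return dst.String()
}

func main() {
	// Without a flush the line never leaves the buffer, so the
	// destination sees nothing -- i.e. the log output is truncated.
	fmt.Printf("without flush: %q\n", writeLogs(false))
	fmt.Printf("with flush:    %q\n", writeLogs(true))
}
```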

Comment 2 Mike Dame 2020-07-20 18:36:47 UTC
Update: this will be fixed with the upstream 1.19 rebase

Comment 3 Mike Dame 2020-07-28 19:55:00 UTC
1.19-rc.2 rebase containing this fix: https://github.com/openshift/descheduler/pull/35

Comment 6 zhou ying 2020-08-04 07:08:49 UTC
The issue has been fixed with payload 4.6.0-0.nightly-2020-08-04-002217:
[root@dhcp-140-138 ~]# oc get csv
NAME                                                   DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202008031851.p0   Kube Descheduler Operator   4.6.0-202008031851.p0              Succeeded


[root@dhcp-140-138 ~]# oc logs -f po/cluster-798996655c-kts9c
I0804 06:12:31.132654       1 node.go:45] node lister returned empty list, now fetch directly
I0804 06:12:31.139665       1 duplicates.go:48] Processing node: "ip-10-0-135-127.us-east-2.compute.internal"
I0804 06:12:31.333707       1 evictions.go:105] Pod aws-ebs-csi-driver-node-9r5t9 in namespace openshift-cluster-csi-drivers is not evictable: Pod lacks an eviction annotation and fails the following checks: [pod is critical, pod is a DaemonSet pod, pod has local storage and descheduler is not configured with --evict-local-storage-pods]
....

Comment 8 errata-xmlrpc 2020-10-27 16:15:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196