Description of problem:
Descheduler logs get truncated.

Previous version of log: http://pastebin.test.redhat.com/885328
Current version of log: http://pastebin.test.redhat.com/885329
Configmap for the cluster: http://pastebin.test.redhat.com/885333

Version-Release number of selected component (if applicable):
4.6
"quay.io/openshift/origin-cluster-kube-descheduler-operator:latest"
"quay.io/openshift/origin-descheduler:latest"

How reproducible:
Always

Steps to Reproduce:
1. Install the latest descheduler from master.
2. Enable all strategies.
3. Run: oc logs -f <cluster-XXX> (a client-go equivalent for capturing the same stream is sketched after this report).

Actual results:
The descheduler logs get truncated; the captured output begins mid-line, as shown below:

pute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:39:56.264997 1 request.go:557] Throttling request took 170.200367ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-166-203.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:39:56.465021 1 request.go:557] Throttling request took 101.944454ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-185-126.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:39:56.664981 1 request.go:557] Throttling request took 102.624764ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-209-116.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:39:56.864972 1 request.go:557] Throttling request took 174.351059ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-210-161.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:40:59.565144 1 request.go:557] Throttling request took 99.540026ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-185-126.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:40:59.765149 1 request.go:557] Throttling request took 101.625155ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-209-116.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:40:59.965160 1 request.go:557] Throttling request took 104.098369ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-210-161.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.165155 1 request.go:557] Throttling request took 101.089016ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-136-162.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.365158 1 request.go:557] Throttling request took 102.427223ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-137-48.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.565159 1 request.go:557] Throttling request took 138.368711ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-166-203.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.765144 1 request.go:557] Throttling request took 102.584083ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-185-126.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:00.965157 1 request.go:557] Throttling request took 163.471416ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-209-116.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.165162 1 request.go:557] Throttling request took 167.637197ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-210-161.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.365147 1 request.go:557] Throttling request took 100.932733ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-136-162.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.565153 1 request.go:557] Throttling request took 102.073953ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-137-48.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.765158 1 request.go:557] Throttling request took 172.430921ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-166-203.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded
I0717 11:41:01.965172 1 request.go:557] Throttling request took 102.31456ms, request: GET:https://172.30.0.1:443/api/v1/pods?fieldSelector=spec.nodeName%3Dip-10-0-185-126.us-east-2.compute.internal%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded

Expected results:
Logs should not get truncated.

Additional info:
Upon checking with dev, learned that openshift/descheduler was recently synced with the upstream base, where klog was switched from v1 to v2. The truncation might be related to that switch.
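For reference, step 3 can also be done programmatically. The following is a minimal client-go sketch, not part of the descheduler itself; the namespace and pod name are placeholders (the namespace assumes the default operator namespace openshift-kube-descheduler-operator), and it assumes KUBECONFIG points at the same kubeconfig oc uses. It streams the pod logs the same way oc logs -f does, which makes it easy to capture the raw output and see exactly where it gets cut off:

package main

import (
	"context"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Reuse the kubeconfig that `oc` uses (KUBECONFIG must point at it).
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Follow the descheduler pod's log stream, like `oc logs -f <cluster-XXX>`.
	// Namespace and pod name are placeholders; substitute the real values.
	req := client.CoreV1().Pods("openshift-kube-descheduler-operator").
		GetLogs("cluster-XXX", &corev1.PodLogOptions{Follow: true})
	stream, err := req.Stream(context.Background())
	if err != nil {
		panic(err)
	}
	defer stream.Close()

	// Copy everything the API server returns to stdout so any truncation
	// in the stream itself is directly visible.
	if _, err := io.Copy(os.Stdout, stream); err != nil {
		panic(err)
	}
}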
Confirmed that something broke with the klog/v2 switch upstream. Opened https://github.com/kubernetes-sigs/descheduler/issues/353 to try to find a fix; if one is not found, that change will be reverted.
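For context, the general pattern for mixing klog v1 and klog v2 in one binary (based on klog's own coexistence example; this is only a hedged sketch, not necessarily the exact change made in the descheduler) is to keep klog v1 off stderr and forward its output into klog v2, so a single logger owns writing and flushing the stream:

package main

import (
	"flag"

	klogv1 "k8s.io/klog"
	klogv2 "k8s.io/klog/v2"
)

// klogWriter forwards whatever klog v1 formats into klog v2, so only one
// logger writes (and flushes) stderr. klog's full example also strips the
// klog v1 header prefix before re-logging; omitted here for brevity.
type klogWriter struct{}

func (klogWriter) Write(p []byte) (int, error) {
	klogv2.InfoDepth(2, string(p))
	return len(p), nil
}

func main() {
	// Keep klog v1 from writing to stderr itself and route its INFO output
	// through klog v2 instead.
	var fsV1 flag.FlagSet
	klogv1.InitFlags(&fsV1)
	_ = fsV1.Set("logtostderr", "false")
	_ = fsV1.Set("stderrthreshold", "FATAL")
	klogv1.SetOutputBySeverity("INFO", klogWriter{})

	// klog v2 is the only writer to stderr; flush it before exiting so the
	// tail of the log is not lost.
	klogv2.InitFlags(nil)
	defer klogv2.Flush()

	klogv1.Info("emitted via the klog v1 API, written by klog v2")
	klogv2.Info("emitted via the klog v2 API")
}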
Update: this will be fixed with the upstream 1.19 rebase
1.19-rc.2 rebase containing this fix: https://github.com/openshift/descheduler/pull/35
The issue is fixed with payload 4.6.0-0.nightly-2020-08-04-002217:

[root@dhcp-140-138 ~]# oc get csv
NAME                                                    DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202008031851.p0    Kube Descheduler Operator   4.6.0-202008031851.p0              Succeeded

[root@dhcp-140-138 ~]# oc logs -f po/cluster-798996655c-kts9c
I0804 06:12:31.132654 1 node.go:45] node lister returned empty list, now fetch directly
I0804 06:12:31.139665 1 duplicates.go:48] Processing node: "ip-10-0-135-127.us-east-2.compute.internal"
I0804 06:12:31.333707 1 evictions.go:105] Pod aws-ebs-csi-driver-node-9r5t9 in namespace openshift-cluster-csi-drivers is not evictable: Pod lacks an eviction annotation and fails the following checks: [pod is critical, pod is a DaemonSet pod, pod has local storage and descheduler is not configured with --evict-local-storage-pods]
....
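The "is not evictable" line above comes from the descheduler's eviction pre-checks. As a rough illustration only of the checks named in that message (a simplified sketch, not the descheduler's actual code; the annotation key and the priority threshold are assumptions, and evictLocalStoragePods stands in for the --evict-local-storage-pods flag):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isEvictable is an illustrative, simplified version of the checks named in
// the log line above; it is not the descheduler's real implementation.
func isEvictable(pod *corev1.Pod, evictLocalStoragePods bool) (bool, []string) {
	// An explicit eviction annotation (assumed key) overrides the checks below.
	if _, ok := pod.Annotations["descheduler.alpha.kubernetes.io/evict"]; ok {
		return true, nil
	}

	var failed []string

	// "pod is critical": system-critical priority (assumed threshold).
	if pod.Spec.Priority != nil && *pod.Spec.Priority >= 2000000000 {
		failed = append(failed, "pod is critical")
	}

	// "pod is a DaemonSet pod": owned by a DaemonSet.
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "DaemonSet" {
			failed = append(failed, "pod is a DaemonSet pod")
			break
		}
	}

	// "pod has local storage ...": emptyDir/hostPath volumes while the flag is unset.
	for _, vol := range pod.Spec.Volumes {
		if (vol.EmptyDir != nil || vol.HostPath != nil) && !evictLocalStoragePods {
			failed = append(failed, "pod has local storage and descheduler is not configured with --evict-local-storage-pods")
			break
		}
	}

	return len(failed) == 0, failed
}

func main() {
	// A pod shaped like the one in the log output: DaemonSet-owned, critical
	// priority, with local (emptyDir) storage.
	prio := int32(2000001000)
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:            "aws-ebs-csi-driver-node-9r5t9",
			OwnerReferences: []metav1.OwnerReference{{Kind: "DaemonSet", Name: "aws-ebs-csi-driver-node"}},
		},
		Spec: corev1.PodSpec{
			Priority: &prio,
			Volumes:  []corev1.Volume{{Name: "scratch", VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}}}},
		},
	}
	evictable, reasons := isEvictable(pod, false)
	fmt.Println(evictable, reasons)
}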
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196