Description of problem:

Kubelet log level was increased to 4 to aid in CI debugging via:

  https://github.com/openshift/machine-config-operator/pull/1672
  https://bugzilla.redhat.com/show_bug.cgi?id=1828622

This is putting an enormous burden on cluster logging and customer storage requirements. Looking at some of our larger clusters, daily operations logs exceed hundreds of GB per day. An OpenShift 4.5 cluster with 68 nodes sees about 500 GB per day. Log level 4 is not sustainable for production clusters.

Longer term, we need a more dynamic mechanism to tune the kubelet log level globally and on a per-node basis for debugging.

Version-Release number of selected component (if applicable):
4.6.x Master

How reproducible:
Always

Steps to Reproduce:
1. Look at the default KUBELET_LOG_LEVEL

Actual results:
A massive volume of kubelet logs.

Expected results:
A manageable log volume.

Additional info:
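For reference, the level can already be overridden per pool with a systemd dropin delivered by a MachineConfig. The sketch below is illustrative only (the object name and dropin name are made up for this example); it raises the level to 4 on workers for debugging, and the same pattern works for lowering it:

  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    name: 99-worker-kubelet-loglevel    # illustrative name
    labels:
      machineconfiguration.openshift.io/role: worker
  spec:
    config:
      ignition:
        version: 3.1.0
      systemd:
        units:
          - name: kubelet.service
            dropins:
              - name: 30-kubelet-loglevel.conf    # illustrative name
                contents: |
                  [Service]
                  Environment="KUBELET_LOG_LEVEL=4"

systemd applies dropins after the main unit file, so this Environment= line overrides the default baked into kubelet.service, and the MCO rolls the change out node by node. It is still a static, per-pool knob rather than the dynamic per-node mechanism asked for above.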
Verified on 4.7.0-0.nightly-2020-11-10-093436

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-10-093436   True        False         4h6m    Cluster version is 4.7.0-0.nightly-2020-11-10-093436

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-143-20.us-west-2.compute.internal    Ready    master   4h35m   v1.19.2+9c2f84c
ip-10-0-154-71.us-west-2.compute.internal    Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-171-153.us-west-2.compute.internal   Ready    master   4h31m   v1.19.2+9c2f84c
ip-10-0-189-196.us-west-2.compute.internal   Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-194-240.us-west-2.compute.internal   Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-209-84.us-west-2.compute.internal    Ready    master   4h31m   v1.19.2+9c2f84c

$ oc debug node/ip-10-0-154-71.us-west-2.compute.internal
Starting pod/ip-10-0-154-71us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --cloud-provider=aws \
       \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:294d83df14138faee411ef07d6ce2d19d62d636cf313817f1093f9ca9b93f3bc \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-mco-default-env.conf
[Unit]
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

$ oc debug node/ip-10-0-143-20.us-west-2.compute.internal -- chroot /host systemctl cat kubelet.service
Starting pod/ip-10-0-143-20us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider=aws \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:294d83df14138faee411ef07d6ce2d19d62d636cf313817f1093f9ca9b93f3bc \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-mco-default-env.conf
[Unit]

Removing debug pod ...
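Both pools now start the kubelet with --v=${KUBELET_LOG_LEVEL} and KUBELET_LOG_LEVEL=3. As a quick spot check across every node (a sketch, not part of the verification above; it assumes the Environment line lives in /etc/systemd/system/kubelet.service on each node, as shown), the same check can be looped with oc debug:

  for node in $(oc get nodes -o name); do
    # print the log-level setting from each node's kubelet unit
    oc debug "$node" -- chroot /host grep KUBELET_LOG_LEVEL /etc/systemd/system/kubelet.service
  done

Each node should report Environment="KUBELET_LOG_LEVEL=3" plus the --v=${KUBELET_LOG_LEVEL} flag.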
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633