Bug 1895385 - Revert KUBELET_LOG_LEVEL back to level 3
Summary: Revert KUBELET_LOG_LEVEL back to level 3
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.0
Assignee: Kirsten Garrison
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks: 1896329
 
Reported: 2020-11-06 14:38 UTC by Matthew Robson
Modified: 2023-12-15 20:02 UTC
CC List: 20 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1896329 1896332
Environment:
Last Closed: 2021-02-24 15:31:26 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 2211 0 None closed Bug 1895385: Drop kubelet logging back down to level 3 2021-02-17 11:08:36 UTC
Red Hat Knowledge Base (Solution) 4619431 0 None None None 2021-01-19 16:12:36 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:32:00 UTC

Description Matthew Robson 2020-11-06 14:38:48 UTC
Description of problem:

Kubelet log level was increased to 4 to aid CI debugging via https://github.com/openshift/machine-config-operator/pull/1672 and https://bugzilla.redhat.com/show_bug.cgi?id=1828622

This is putting an enormous burden on cluster logging and customer storage requirements.

Looking at some of our larger clusters, daily operations logs are exceeding hundreds of GB per day.

An OpenShift 4.5 cluster with 68 nodes sees about 500 GB of logs per day.

Log level 4 is not sustainable for production clusters.

Longer term, we need a more dynamic mechanism to be able to tune the kubelet log level globally and on a per node basis for debugging.
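In the meantime, a per-pool override is possible by layering a systemd drop-in over the kubelet unit with a MachineConfig. A minimal sketch follows; the object name, drop-in name, target pool, and chosen level are illustrative, not part of this bug's fix:

```shell
# Sketch: lower KUBELET_LOG_LEVEL on the worker pool via a MachineConfig
# systemd drop-in. Names and the level value here are illustrative.
cat > kubelet-loglevel.yaml <<'EOF'
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-kubelet-loglevel
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.1.0
    systemd:
      units:
        - name: kubelet.service
          dropins:
            - name: 30-kubelet-loglevel.conf
              contents: |
                [Service]
                Environment="KUBELET_LOG_LEVEL=2"
EOF
# On a live cluster this would be applied with:
#   oc apply -f kubelet-loglevel.yaml
# and the MCO would roll the change out node by node.
```

Because a drop-in's Environment= setting wins over the one in the main unit file, this changes the --v=${KUBELET_LOG_LEVEL} value without editing the MCO-managed unit itself.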


Version-Release number of selected component (if applicable):
4.6.x
Master


How reproducible:
Always

Steps to Reproduce:
1. Look at the default KUBELET_LOG_LEVEL
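The level can be read straight off the rendered unit file. A one-liner sketch, run here against a sample Environment= line in the format the transcripts below show (on a node the input would come from `oc debug node/<node> -- chroot /host systemctl cat kubelet.service`):

```shell
# Extract the kubelet verbosity from the Environment= line printed by
# `systemctl cat kubelet.service`. The sample input mirrors the unit
# file contents shown in the verification transcripts.
sample='Environment="KUBELET_LOG_LEVEL=3"'
level=$(printf '%s\n' "$sample" | sed -n 's/.*KUBELET_LOG_LEVEL=\([0-9]*\).*/\1/p')
echo "$level"
```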

Actual results:
Massive amount of logs


Expected results:
Logs need to be manageable 

Additional info:

Comment 6 Michael Nguyen 2020-11-10 21:37:37 UTC
Verified on 4.7.0-0.nightly-2020-11-10-093436


$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-10-093436   True        False         4h6m    Cluster version is 4.7.0-0.nightly-2020-11-10-093436
$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-143-20.us-west-2.compute.internal    Ready    master   4h35m   v1.19.2+9c2f84c
ip-10-0-154-71.us-west-2.compute.internal    Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-171-153.us-west-2.compute.internal   Ready    master   4h31m   v1.19.2+9c2f84c
ip-10-0-189-196.us-west-2.compute.internal   Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-194-240.us-west-2.compute.internal   Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-209-84.us-west-2.compute.internal    Ready    master   4h31m   v1.19.2+9c2f84c
$ oc debug node/ip-10-0-154-71.us-west-2.compute.internal
Starting pod/ip-10-0-154-71us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --cloud-provider=aws \
       \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:294d83df14138faee411ef07d6ce2d19d62d636cf313817f1093f9ca9b93f3bc \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-mco-default-env.conf
[Unit]
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
$ oc debug node/ip-10-0-143-20.us-west-2.compute.internal -- chroot /host systemctl cat kubelet.service
Starting pod/ip-10-0-143-20us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider=aws \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:294d83df14138faee411ef07d6ce2d19d62d636cf313817f1093f9ca9b93f3bc \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-mco-default-env.conf
[Unit]

Removing debug pod ...

Comment 12 errata-xmlrpc 2021-02-24 15:31:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

