Description of problem:

Kubelet log level was increased to 4 to aid in CI debugging via:

  https://github.com/openshift/machine-config-operator/pull/1672
  https://bugzilla.redhat.com/show_bug.cgi?id=1828622

This is putting an enormous burden on cluster logging and customer storage requirements. Looking at some of our larger clusters, daily operations logs exceed hundreds of GB per day. An OpenShift 4.5 cluster with 68 nodes sees about 500 GB per day. Log level 4 is not sustainable for production clusters.

Longer term, we need a more dynamic mechanism to tune the kubelet log level globally and on a per-node basis for debugging.

Version-Release number of selected component (if applicable):
4.6.x Master

How reproducible:
Always

Steps to Reproduce:
1. Look at the default KUBELET_LOG_LEVEL

Actual results:
A massive volume of kubelet logs.

Expected results:
A manageable log volume.

Additional info:
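For reference, the level can already be overridden per pool with a systemd dropin delivered by a MachineConfig. The sketch below is illustrative only (the object name and dropin name are made up for this example); it raises the level to 4 on workers for debugging, and the same pattern works for lowering it:

  apiVersion: machineconfiguration.openshift.io/v1
  kind: MachineConfig
  metadata:
    name: 99-worker-kubelet-loglevel    # illustrative name
    labels:
      machineconfiguration.openshift.io/role: worker
  spec:
    config:
      ignition:
        version: 3.1.0
      systemd:
        units:
          - name: kubelet.service
            dropins:
              - name: 30-kubelet-loglevel.conf    # illustrative name
                contents: |
                  [Service]
                  Environment="KUBELET_LOG_LEVEL=4"

systemd applies dropins after the main unit file, so this Environment= line overrides the default baked into kubelet.service, and the MCO rolls the change out node by node. It is still a static, per-pool knob rather than the dynamic per-node mechanism asked for above.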
Verified on 4.7.0-0.nightly-2020-11-10-093436

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-11-10-093436   True        False         4h6m    Cluster version is 4.7.0-0.nightly-2020-11-10-093436

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-143-20.us-west-2.compute.internal    Ready    master   4h35m   v1.19.2+9c2f84c
ip-10-0-154-71.us-west-2.compute.internal    Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-171-153.us-west-2.compute.internal   Ready    master   4h31m   v1.19.2+9c2f84c
ip-10-0-189-196.us-west-2.compute.internal   Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-194-240.us-west-2.compute.internal   Ready    worker   4h22m   v1.19.2+9c2f84c
ip-10-0-209-84.us-west-2.compute.internal    Ready    master   4h31m   v1.19.2+9c2f84c

$ oc debug node/ip-10-0-154-71.us-west-2.compute.internal
Starting pod/ip-10-0-154-71us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/worker,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
      --cloud-provider=aws \
       \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:294d83df14138faee411ef07d6ce2d19d62d636cf313817f1093f9ca9b93f3bc \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-mco-default-env.conf
[Unit]
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...

$ oc debug node/ip-10-0-143-20.us-west-2.compute.internal -- chroot /host systemctl cat kubelet.service
Starting pod/ip-10-0-143-20us-west-2computeinternal-debug ...
To use host binaries, run `chroot /host`
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target crio.service
After=network-online.target crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
Environment="KUBELET_LOG_LEVEL=3"
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider=aws \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:294d83df14138faee411ef07d6ce2d19d62d636cf313817f1093f9ca9b93f3bc \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-mco-default-env.conf
[Unit]

Removing debug pod ...
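Both pools now start the kubelet with --v=${KUBELET_LOG_LEVEL} and KUBELET_LOG_LEVEL=3. As a quick spot check across every node (a sketch, not part of the verification above; it assumes the Environment line lives in /etc/systemd/system/kubelet.service on each node, as shown), the same check can be looped with oc debug:

  for node in $(oc get nodes -o name); do
    # print the log-level setting from each node's kubelet unit
    oc debug "$node" -- chroot /host grep KUBELET_LOG_LEVEL /etc/systemd/system/kubelet.service
  done

Each node should report Environment="KUBELET_LOG_LEVEL=3" plus the --v=${KUBELET_LOG_LEVEL} flag.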
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633