I am seeing the kubelet repeatedly re-log the same lines from near the end of an exited container's log stream. Example (current time is around 18:40 UTC):

[root@aro-v4-shared-gxqb4-master-1 ~]# journalctl -S '1 hour ago' SYSLOG_IDENTIFIER=hyperkube | cut -d: -f4-|sort |uniq -c |sort -nr|head
109 I1130 14:03:35.562313 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "AggregatorProxyClientCert"
109 I1130 14:03:35.561000 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "InternalLoadBalancerServing"
109 I1130 14:03:35.560982 1 certrotationcontroller.go:556] Shutting down CertRotation
109 I1130 14:03:35.560971 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "LocalhostRecoveryServing"
109 I1130 14:03:35.560967 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeSchedulerClient"
109 I1130 14:03:35.560965 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "ExternalLoadBalancerServing"
109 I1130 14:03:35.560961 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeControllerManagerClient"
109 I1130 14:03:35.560932 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "LocalhostServing"
109 I1130 14:03:35.560928 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "ServiceNetworkServing"
109 I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"

These are piles of stale log lines from something that exited around 14:03:35.
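For reference, the triage pipeline above works by stripping the journald prefix ("MMM DD HH:MM:SS host proc[pid]", i.e. everything up to the third ':') and then counting identical remainders. A self-contained demo on three synthetic journal lines:

```shell
# Three journal lines with different receive times but an identical
# kubelet payload (same klog timestamp), as seen in the bug.
lines='Dec 01 17:42:31 host hyperkube[1493]: I1130 14:03:35.560906 stale line
Dec 01 17:43:31 host hyperkube[1493]: I1130 14:03:35.560906 stale line
Dec 01 17:44:31 host hyperkube[1493]: I1130 14:03:35.560906 stale line'

# cut -d: -f4- drops the journald prefix (the first three ':'-separated
# fields cover the timestamp and "host proc[pid]"); uniq -c then counts
# how often the same payload repeats.
echo "$lines" | cut -d: -f4- | sort | uniq -c | sort -nr
# prints a single line: count 3 followed by the shared payload
```

A count far above 1 for a payload whose klog timestamp is hours old is the signature of this bug.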
[root@aro-v4-shared-gxqb4-master-1 ~]# journalctl -S '1 hour ago' |grep 'client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"' | head
Dec 01 17:42:31 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:42:32 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:43:31 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:43:56 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:44:31 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:45:12 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:45:31 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:46:23 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:46:31 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"
Dec 01 17:47:31 aro-v4-shared-gxqb4-master-1 hyperkube[1493]: I1130 14:03:35.560906 1 client_cert_rotation_controller.go:180] Shutting down CertRotationController - "KubeAPIServerToKubeletClientCert"

We are seeing this commonly in ARO production on 4.4 (e.g. 4.4.27) and 4.5 (e.g. 4.5.22 prerelease) clusters. I think we need to get this fixed; logs need to be reliable.
https://github.com/kubernetes/kubernetes/issues/91765 https://github.com/kubernetes/kubernetes/pull/93333
Updating the Version of this BZ to 4.4.27. The patch linked above is in 4.7. A reproducer pod would be super helpful.
The following pod manifest reproduces this issue for me; it's the terminationMessagePolicy: FallbackToLogsOnError that makes the difference between this happening and not:

apiVersion: v1
kind: Pod
metadata:
  name: logtrouble
spec:
  containers:
  - name: logtrouble
    image: busybox
    args:
    - sh
    - -c
    - "while true; do echo -n 'LOGTROUBLE: '; date; sleep 1; done"
    terminationMessagePolicy: FallbackToLogsOnError
    livenessProbe:
      tcpSocket:
        port: 80
  restartPolicy: Always
  nodeName: aro-v4-shared-gxqb4-worker-eastus1-xh9ml
  terminationGracePeriodSeconds: 0
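To show why the whole tail of the log ends up in the pod status: with FallbackToLogsOnError, when the container exits nonzero without writing a termination-message file, the kubelet reads the tail of the container log and stores it as status.lastState.terminated.message. From memory the caps are roughly 80 lines / 4096 bytes; treat those numbers as illustrative, not authoritative. A rough shell analogue of that tail-grab:

```shell
# Simulate a container that logged one line per second for 30s and was
# then killed by the failing liveness probe.
for i in $(seq 1 30); do echo "LOGTROUBLE: line $i"; done > /tmp/container.log

# Grab the tail the way the fallback does: line-capped, then byte-capped.
# (80 lines / 4096 bytes mirror the kubelet defaults as I remember them.)
message=$(tail -n 80 /tmp/container.log | tail -c 4096)

echo "$message" | tail -n 1   # last line of the captured termination message
```

With the reproducer's one-line-per-second loop, that message ends up being essentially the whole recent log of the container, which is what later gets replayed into the journal.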
The log tail gets injected into the pod status field and is then printed, at log level 3, at this line: https://github.com/openshift/origin/blob/release-4.4/vendor/k8s.io/kubernetes/pkg/kubelet/status/status_manager.go#L434

We have a separate PR to refactor the log levels, and I will be proposing to set the kubelet's log level to 2: https://github.com/openshift/machine-config-operator/pull/2262
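To illustrate why lowering the kubelet verbosity helps, here is a toy sketch of klog-style verbosity gating (not kubelet code; log_v and VERBOSITY are made up for this demo): a line emitted at V(3), like the status_manager patch line above, is printed at -v=3 but dropped at -v=2.

```shell
# Toy model of klog's V(level) gate: print only if the line's level is
# at or below the configured verbosity.
log_v() {  # usage: log_v <level> <message>
  if [ "$1" -le "$VERBOSITY" ]; then
    echo "$2"
  fi
}

VERBOSITY=3
log_v 3 'status_manager: Patch status for pod ...'   # printed at -v=3

VERBOSITY=2
log_v 3 'status_manager: Patch status for pod ...'   # suppressed at -v=2
log_v 2 'kubelet: ordinary V(2) line'                # still printed
```

So dropping the kubelet to -v=2 silences the giant patch-status line (and with it the replayed container log tail) without touching normal-verbosity output.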
Why solve the problem once when you can solve it twice: https://github.com/kubernetes/kubernetes/pull/97056

Commenting here just for completeness. https://github.com/openshift/machine-config-operator/pull/2262 will still be the official fix for this.
Verified. Ran the pod from Comment 4 and checked the logs.

On 4.7.0-0.nightly-2020-12-17-201522:

sh-4.4# journalctl -S '1 hour ago' SYSLOG_IDENTIFIER=hyperkube | cut -d: -f4-|sort |uniq -c |sort -nr|head
2 W1218 07:26:48.916985 1434 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease
1 08:44 UTC. --
1 W1218 08:08:39.682254 1434 kubelet_getters.go:300] Path "/var/lib/kubelet/pods/f8971161-785a-4f70-9192-753da4bf6cc6/volumes" does not exist
1 W1218 08:08:38.777575 1434 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease
1 W1218 08:08:38.777405 1434 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease
1 W1218 08:08:37.102257 1434 pod_container_deletor.go:79] Container "9fde818aa1863bffb5ca524f357c98e47080af94726bf39391f271b96ff43289" not found in pod's containers
1 W1218 08:05:09.681799 1434 kubelet_getters.go:300] Path "/var/lib/kubelet/pods/8c06a856-b749-4708-b5cd-ce6ce7397877/volumes" does not exist
1 W1218 08:05:07.749987 1434 pod_container_deletor.go:79] Container "25301ae88902e09b906839544448e23a545605b6009ab7d8963fba1892262328" not found in pod's containers
1 W1218 08:03:18.916752 1434 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease
1 W1218 08:03:18.916746 1434 conversion.go:111] Could not get instant cpu stats: cumulative stats decrease

For comparison, on 4.4.27:

sh-4.4# journalctl -S '1 hour ago' SYSLOG_IDENTIFIER=hyperkube | cut -d: -f4-|sort |uniq -c |sort -nr|head
26 LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:16 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:15 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:14 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:13 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:12 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:11 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:10 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:09 UTC 2020
26 LOGTROUBLE: Fri Dec 18 07:54:08 UTC 2020

sh-4.4# journalctl -S '1 hour ago' SYSLOG_IDENTIFIER=hyperkube | grep 'LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020'
Dec 18 07:54:18 ip-10-0-150-213 hyperkube[1379]: I1218 07:54:18.835233 1379 status_manager.go:569] Patch status for pod "logtrouble_default(3716a3b0-55c0-444f-a1fd-75e45d11d3d1)" with "{\"metadata\":{\"uid\":\"3716a3b0-55c0-444f-a1fd-75e45d11d3d1\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-12-18T07:54:18Z\",\"message\":\"containers with unready status: [logtrouble]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-12-18T07:54:18Z\",\"message\":\"containers with unready status: [logtrouble]\",\"reason\":\"ContainersNotReady\",\"status\":\"False\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://1fe885942bd0ea37d892fd0ee118144f9f62f64c4b9318e0109e4bd12ed57c84\",\"image\":\"docker.io/library/busybox:latest\",\"imageID\":\"docker.io/library/busybox@sha256:31a54a0cf86d7354788a8265f60ae6acb4b348a67efbcf7c1007dd3cf7af05ab\",\"lastState\":{\"terminated\":{\"containerID\":\"cri-o://1fe885942bd0ea37d892fd0ee118144f9f62f64c4b9318e0109e4bd12ed57c84\",\"exitCode\":137,\"finishedAt\":\"2020-12-18T07:54:18Z\",\"message\":\"LOGTROUBLE: Fri Dec 18 07:53:49 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:50 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:51 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:52 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:53 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:54 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:55 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:56 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:57 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:58 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:59 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:00 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:01 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:02 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:03 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:04 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:05 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:06 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:07 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:08 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:09 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:10 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:11 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:12 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:13 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:14 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:15 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:16 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020\\n\",\"reason\":\"Error\",\"startedAt\":\"2020-12-18T07:53:49Z\"}},\"name\":\"logtrouble\",\"ready\":false,\"restartCount\":13,\"started\":false,\"state\":{\"waiting\":{\"message\":\"back-off 5m0s restarting failed container=logtrouble pod=logtrouble_default(3716a3b0-55c0-444f-a1fd-75e45d11d3d1)\",\"reason\":\"CrashLoopBackOff\"}}}]}}"
Dec 18 07:54:18 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:54:33 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:54:46 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:54:57 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:55:09 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:55:23 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:55:37 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:55:49 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:56:01 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:56:15 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:56:30 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:56:44 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:56:57 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:57:11 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:57:26 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:57:40 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:57:51 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:58:06 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:58:18 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:58:31 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:58:42 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:58:55 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:59:07 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:59:18 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:59:20 ip-10-0-150-213 hyperkube[1379]: I1218 07:59:20.630716 1379 status_manager.go:569] Patch status for pod "logtrouble_default(3716a3b0-55c0-444f-a1fd-75e45d11d3d1)" with "{\"metadata\":{\"uid\":\"3716a3b0-55c0-444f-a1fd-75e45d11d3d1\"},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Initialized\"},{\"type\":\"Ready\"},{\"type\":\"ContainersReady\"},{\"type\":\"PodScheduled\"}],\"conditions\":[{\"lastTransitionTime\":\"2020-12-18T07:59:20Z\",\"message\":null,\"reason\":null,\"status\":\"True\",\"type\":\"Ready\"},{\"lastTransitionTime\":\"2020-12-18T07:59:20Z\",\"message\":null,\"reason\":null,\"status\":\"True\",\"type\":\"ContainersReady\"}],\"containerStatuses\":[{\"containerID\":\"cri-o://88148e7367464eeb81cd853522b7533b5c6f16eb1ed0602f62436483a2d27ff7\",\"image\":\"docker.io/library/busybox:latest\",\"imageID\":\"docker.io/library/busybox@sha256:31a54a0cf86d7354788a8265f60ae6acb4b348a67efbcf7c1007dd3cf7af05ab\",\"lastState\":{\"terminated\":{\"containerID\":\"cri-o://1fe885942bd0ea37d892fd0ee118144f9f62f64c4b9318e0109e4bd12ed57c84\",\"exitCode\":137,\"finishedAt\":\"2020-12-18T07:54:18Z\",\"message\":\"LOGTROUBLE: Fri Dec 18 07:53:49 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:50 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:51 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:52 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:53 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:54 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:55 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:56 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:57 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:58 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:53:59 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:00 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:01 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:02 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:03 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:04 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:05 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:06 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:07 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:08 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:09 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:10 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:11 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:12 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:13 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:14 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:15 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:16 UTC 2020\\nLOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020\\n\",\"reason\":\"Error\",\"startedAt\":\"2020-12-18T07:53:49Z\"}},\"name\":\"logtrouble\",\"ready\":true,\"restartCount\":14,\"started\":true,\"state\":{\"running\":{\"startedAt\":\"2020-12-18T07:59:19Z\"}}}]}}"
Dec 18 07:59:20 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
Dec 18 07:59:46 ip-10-0-150-213 hyperkube[1379]: LOGTROUBLE: Fri Dec 18 07:54:17 UTC 2020
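The verification above boils down to "the top `uniq -c` count should stay small once the fix is in". A scripted form of that check, demoed on synthetic input (the `top_count` helper and the threshold of 10 are my own, arbitrary choices):

```shell
# Largest repeat count among identical input lines.
top_count() { sort | uniq -c | sort -nr | head -1 | awk '{print $1}'; }

# Broken-build shape: one stale line replayed 11 times plus a fresh line.
bad=$( { yes stale | head -11; echo fresh; } | top_count )
# Fixed-build shape: every line unique.
good=$(printf 'a\nb\nc\n' | top_count)

[ "$bad" -gt 10 ] && echo "stale log replay detected ($bad repeats)"
# -> stale log replay detected (11 repeats)
[ "$good" -le 10 ] && echo "ok ($good repeats)"
# -> ok (1 repeats)
```

On a real node you would feed it `journalctl -S '1 hour ago' SYSLOG_IDENTIFIER=hyperkube | cut -d: -f4-` instead of the synthetic input.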
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633