Bug 1705657
There is nothing the kubelet can do here. It will re-attempt GC later, but retrying individual calls to the CRI is not something that is done, or likely to fix anything. Sending to Containers.

Why not? For example, interactions with the GitHub API that are both:
1. known to be faulty with some regularity
2. highly important
are re-tried in Prow (think merges). Not doing a re-try (maybe for specific errors?) causes evictions. You can never be certain the runtime underneath is not having a hiccup, but nuking pods on the cluster seems like a high-consequence result of a hiccup when all that must happen is a re-try.

Created attachment 1566281 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566282 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566283 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566284 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566285 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566286 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566287 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566288 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566289 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566290 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566291 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566292 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566294 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566295 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566296 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
Created attachment 1566297 [details]
logs from kubelet and docker during a GC interval that resulted in eviction
|
When the kubelet begins to meet an eviction threshold on ephemeral storage, it kicks off container and image garbage collection. However, a number of calls must succeed before garbage collection can even begin, such as listing containers or images. When these calls fail, the entire garbage collection is aborted and the kubelet begins to evict pods to reclaim storage. This is undesirable; it would be much better for the kubelet to retry these calls, since their failure has severe consequences. When the calls do not fail, GC runs correctly and no evictions need to occur.

Example logs:

remote_runtime.go:262] ListContainers with filter &ContainerFilter{Id:,State:nil,PodSandboxId:,LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
kuberuntime_container.go:329] getKubeletContainers failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
eviction_manager.go:414] eviction manager: unexpected error when attempting to reduce ephemeral-storage pressure: rpc error: code = DeadlineExceeded desc = context deadline exceeded
eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage
image_gc_manager.go:181] [imageGCManager] Failed to monitor images: rpc error: code = DeadlineExceeded desc = context deadline exceeded
remote_runtime.go:262] ListContainers with filter &ContainerFilter{Id:,State:nil,PodSandboxId:,LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
kuberuntime_container.go:329] getKubeletContainers failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
kubelet.go:1216] Container garbage collection failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
eviction_manager.go:414] eviction manager: unexpected error when attempting to reduce ephemeral-storage pressure: rpc error: code = DeadlineExceeded desc = context deadline exceeded
eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage
remote_runtime.go:169] ListPodSandbox with filter nil from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
kuberuntime_sandbox.go:198] ListPodSandbox failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
eviction_manager.go:414] eviction manager: unexpected error when attempting to reduce ephemeral-storage pressure: rpc error: code = DeadlineExceeded desc = context deadline exceeded
eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage