Description of problem:

It "seems" that deleted pods are not properly reaped on nodes. The observed behavior is that after deletion, the pods are added to a list of recurring events that keeps spamming the logs, which look like this:

Apr 29 08:46:43 worker-01.okdtest.example.com hyperkube[1823]: E0429 08:46:43.005055 1823 cadvisor_stats_provider.go:146] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-multus_ip-reconciler-27519450-mwwrd_f25de9a9-c6a7-423d-93e9-cdd3c7910868: no such file or directory" pod="openshift-multus/ip-reconciler-27519450-mwwrd"
Apr 29 08:46:43 worker-01.okdtest.example.com hyperkube[1823]: E0429 08:46:43.005072 1823 cadvisor_stats_provider.go:146] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-operator-lifecycle-manager_collect-profiles-27519390-lqz2x_73db1aae-8cd8-47f9-b1e0-84d190298903: no such file or directory" pod="openshift-operator-lifecycle-manager/collect-profiles-27519390-lqz2x"
Apr 29 08:46:43 worker-01.okdtest.example.com hyperkube[1823]: E0429 08:46:43.005088 1823 cadvisor_stats_provider.go:146] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-multus_ip-reconciler-27520170-2cn2p_1aff54f0-4ae4-4de4-960d-2e65bcd6f755: no such file or directory" pod="openshift-multus/ip-reconciler-27520170-2cn2p"
Apr 29 08:46:43 worker-01.okdtest.example.com hyperkube[1823]: E0429 08:46:43.005102 1823 cadvisor_stats_provider.go:146] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-multus_ip-reconciler-27519855-q7hpx_ffb55c6b-39da-4b0a-be81-dc7980b98265: no such file or directory" pod="openshift-multus/ip-reconciler-27519855-q7hpx"

As pods come and go, the list continues to grow. These events are reported about 14-15 times per minute, causing excessive log spam.

Version-Release number of selected component (if applicable):
---
4.10.0-0.okd-2022-04-23-131357
---
crio version 1.23.2
Version:          1.23.2
GitCommit:        c0b2474b80fd0844b883729bda88961bed7b472b
GitTreeState:     clean
BuildDate:        2022-04-14T15:21:44Z
GoVersion:        go1.17.5
Compiler:         gc
Platform:         linux/amd64
Linkmode:         dynamic
BuildTags:        seccomp, selinux
SeccompEnabled:   true
AppArmorEnabled:  false
---

How reproducible:
Always

Steps to Reproduce:
1. Delete a running pod
2. Observe the logs on the node that was running the pod (a command sketch is included at the end of this report)

Actual results:
The kubelet (hyperkube) on that node begins logging recurring "Unable to fetch pod log stats" errors for the deleted pod, roughly 14-15 times per minute, and the set of affected pods keeps growing.

Expected results:
Deleted pods are fully cleaned up on the node and no recurring errors are logged for them.

Additional info:
must-gather available at https://drive.google.com/file/d/1_uzVOyTJp50VZXzRVa2c-413771QJfN0/view?usp=sharing
A node reboot clears the current list of events, but as pods are created and destroyed on the node, the list builds up again.
I've also found an upstream ticket about what we're seeing on this version: https://github.com/kubernetes/kubernetes/issues/106957
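For reference, here is a minimal sketch of how the behavior can be reproduced and observed. It assumes cluster access via oc plus debug access to the node; the pod, namespace, and node names are placeholders and not taken from the cluster above.

# Pick a running pod and note the node it is scheduled on
# (<pod-name>, <namespace>, and <node-name> below are placeholders).
oc get pod <pod-name> -n <namespace> -o wide

# Delete the pod.
oc delete pod <pod-name> -n <namespace>

# Watch the kubelet journal on the node that was running the pod;
# the recurring cadvisor_stats_provider errors should start to appear.
oc debug node/<node-name> -- chroot /host journalctl -u kubelet -f | grep "Unable to fetch pod log stats"

# Optionally confirm that the pod's directory under /var/log/pods is indeed gone,
# which is what the "no such file or directory" error refers to.
oc debug node/<node-name> -- chroot /host ls /var/log/pods/ | grep <pod-name>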