Bug 2080253

Summary: deleted pods generate log spam on nodes (cadvisor_stats_provider.go:146] "Unable to fetch pod log stats")
Product: OpenShift Container Platform Reporter: Andrei Neagoe <andrei>
Component: NodeAssignee: Peter Hunt <pehunt>
Node sub component: CRI-O QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED NEXTRELEASE Docs Contact:
Severity: low    
Priority: low CC: aos-bugs, harpatil, jhonce, pehunt
Version: 4.10   
Target Milestone: ---   
Target Release: 4.13.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-05 20:17:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andrei Neagoe 2022-04-29 11:14:31 UTC
Description of problem:
It "seems" that deleted pods are not properly reaped on nodes. The observed behavior is that after deletion, we see the pods added to a list of recurring events that keeps spamming the logs, which looks like this:

Apr 29 08:46:43 worker-01.okdtest.example.com hyperkube[1823]: E0429 08:46:43.005055    1823 cadvisor_stats_provider.go:146] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-multus_ip-reconciler-27519450-mwwrd_f25de9a9-c6a7-423d-93e9-cdd3c7910868: no such file or directory" pod="openshift-multus/ip-reconciler-27519450-mwwrd"
Apr 29 08:46:43 worker-01.okdtest.example.com hyperkube[1823]: E0429 08:46:43.005072    1823 cadvisor_stats_provider.go:146] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-operator-lifecycle-manager_collect-profiles-27519390-lqz2x_73db1aae-8cd8-47f9-b1e0-84d190298903: no such file or directory" pod="openshift-operator-lifecycle-manager/collect-profiles-27519390-lqz2x"
Apr 29 08:46:43 worker-01.okdtest.example.com hyperkube[1823]: E0429 08:46:43.005088    1823 cadvisor_stats_provider.go:146] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-multus_ip-reconciler-27520170-2cn2p_1aff54f0-4ae4-4de4-960d-2e65bcd6f755: no such file or directory" pod="openshift-multus/ip-reconciler-27520170-2cn2p"
Apr 29 08:46:43 worker-01.okdtest.example.com hyperkube[1823]: E0429 08:46:43.005102    1823 cadvisor_stats_provider.go:146] "Unable to fetch pod log stats" err="open /var/log/pods/openshift-multus_ip-reconciler-27519855-q7hpx_ffb55c6b-39da-4b0a-be81-dc7980b98265: no such file or directory" pod="openshift-multus/ip-reconciler-27519855-q7hpx"

As pods come and go, the list continues to grow. These events are reported about 14-15 times per minute, causing excessive log spam. 


Version-Release number of selected component (if applicable):
---
4.10.0-0.okd-2022-04-23-131357
---
crio version 1.23.2
Version:          1.23.2
GitCommit:        c0b2474b80fd0844b883729bda88961bed7b472b
GitTreeState:     clean
BuildDate:        2022-04-14T15:21:44Z
GoVersion:        go1.17.5
Compiler:         gc
Platform:         linux/amd64
Linkmode:         dynamic
BuildTags:        seccomp, selinux
SeccompEnabled:   true
AppArmorEnabled:  false
---

How reproducible:
Always

Steps to Reproduce:
1. Delete a running pod
2. Observe logs on node that was running pod

Actual results:


Expected results:


Additional info:
must-gather available at https://drive.google.com/file/d/1_uzVOyTJp50VZXzRVa2c-413771QJfN0/view?usp=sharing

Comment 1 Andrei Neagoe 2022-04-29 11:17:51 UTC
A node reboot clears the current list of events but as pods get created/destroyed on a node, the list builds up again.

Comment 2 Andrei Neagoe 2022-04-29 11:27:38 UTC
I've also found an upstream ticket about what we're seeing on this version: https://github.com/kubernetes/kubernetes/issues/106957