Bug 1866045

Summary: Event logs are flooded with "failed to get pod stats: failed to get imageFs info: non-existent label "crio-images"" messages
Product: OpenShift Container Platform
Reporter: Devon <dshumake>
Component: Node
Assignee: Peter Hunt <pehunt>
Status: CLOSED DUPLICATE
QA Contact: MinLi <minmli>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.3.z
CC: aos-bugs, cmarches, gekulkar, jokerman, mharri, pehunt
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-08-20 13:43:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Devon 2020-08-04 17:49:52 UTC
Description of problem:

This is essentially a re-report of bug 1741608 for 4.3. The previous bug was closed, and we were told to open a new bug if we saw the issue in supported versions.

apiVersion: v1
items:
- apiVersion: v1
  count: 1914
  eventTime: null
  firstTimestamp: "2020-07-28T23:11:25Z"
  involvedObject:
    kind: Node
    name: ip-10-10-0-156
    uid: ip-10-10-0-156
  kind: Event
  lastTimestamp: "2020-08-04T14:36:25Z"
  message: 'failed to get imageFs info: non-existent label "crio-images"'
  metadata:
    creationTimestamp: "2020-07-28T23:11:25Z"
    name: ip-10-10-0-156.16260d6e422596ca
    namespace: default
    resourceVersion: "108382042"
    selfLink: /api/v1/namespaces/default/events/ip-10-10-0-156.16260d6e422596ca
    uid: f78588f6-903a-408e-b2ae-1b11b7bb9c89
  reason: ImageGCFailed
  reportingComponent: ""
  reportingInstance: ""
  source:
    component: kubelet
    host: ip-10-10-0-156
  type: Warning
- apiVersion: v1
  count: 1994
  eventTime: null
  firstTimestamp: "2020-07-28T16:31:34Z"
  involvedObject:
    kind: Node
    name: ip-10-10-0-196
    uid: ip-10-10-0-196
  kind: Event
  lastTimestamp: "2020-08-04T14:36:34Z"
  message: 'failed to get imageFs info: non-existent label "crio-images"'
  metadata:
    creationTimestamp: "2020-07-28T16:31:34Z"
    name: ip-10-10-0-196.1625f79c691b4323
    namespace: default
    resourceVersion: "108382197"
    selfLink: /api/v1/namespaces/default/events/ip-10-10-0-196.1625f79c691b4323
    uid: 7182bf45-14a4-4f46-81cc-f671d69814c9
  reason: ImageGCFailed
  reportingComponent: ""
  reportingInstance: ""
  source:
    component: kubelet
    host: ip-10-10-0-196
  type: Warning
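
The `count`, `firstTimestamp`, and `lastTimestamp` fields of the two events above are enough to estimate how often the warning fires. The following is a minimal sketch, not part of the original report; the helper names `parse_ts` and `mean_interval_seconds` are made up for illustration, and the timestamps and counts are copied from the events above.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    # Event timestamps in the report are UTC, e.g. "2020-07-28T23:11:25Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def mean_interval_seconds(first: str, last: str, count: int) -> float:
    """Average spacing between occurrences: `count` occurrences span count - 1 gaps."""
    if count < 2:
        raise ValueError("need at least two occurrences to compute an interval")
    elapsed = (parse_ts(last) - parse_ts(first)).total_seconds()
    return elapsed / (count - 1)


# (node, firstTimestamp, lastTimestamp, count), copied from the events above.
events = [
    ("ip-10-10-0-156", "2020-07-28T23:11:25Z", "2020-08-04T14:36:25Z", 1914),
    ("ip-10-10-0-196", "2020-07-28T16:31:34Z", "2020-08-04T14:36:34Z", 1994),
]

for node, first, last, count in events:
    print(node, mean_interval_seconds(first, last, count))
# ip-10-10-0-156 300.0
# ip-10-10-0-196 300.0
```

Both nodes come out at exactly 300 seconds, so the warning recurs on a fixed five-minute cycle rather than in bursts. That is consistent with a periodic kubelet image garbage collection loop failing on every pass, although the report itself does not state the period.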


Version-Release number of selected component (if applicable):

4.3.9

Steps to Reproduce:

This appears to have been triggered by introducing a new ContainerRuntimeConfig, in this case one changing the maximum PID limit of containers. Once this config was added, the event log and the journal were flooded with these messages.

Comment 1 Peter Hunt 2020-08-06 16:28:14 UTC
What does `systemctl status crio` show? We have seen this problem before when the kubelet comes up before crio does. The fact that you applied a ctrcfg implies to me that crio may have had trouble with it and failed to start.
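
One possible way to check the startup-order hypothesis above is to compare when each unit last entered the active state. The sketch below assumes the systemd property `ActiveEnterTimestampMonotonic` (microseconds since boot, printed by `systemctl show -p`); the helper names and the sample values are hypothetical and do not come from the affected cluster.

```python
import subprocess

PROP = "ActiveEnterTimestampMonotonic"


def query(unit: str) -> str:
    # Runs `systemctl show -p <PROP> <unit>`; call this on the node itself.
    result = subprocess.run(
        ["systemctl", "show", "-p", PROP, unit],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def active_enter_us(show_output: str) -> int:
    """Extract the monotonic activation time from KEY=VALUE output."""
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == PROP:
            micros = int(value.strip())
            if micros == 0:
                # Assumption: systemd reports 0 for a unit that never became active.
                raise ValueError("unit has not entered the active state")
            return micros
    raise ValueError(f"{PROP} not found in output")


def kubelet_started_first(kubelet_show: str, crio_show: str) -> bool:
    return active_enter_us(kubelet_show) < active_enter_us(crio_show)


# Hypothetical values for illustration only, not taken from the report.
kubelet_out = "ActiveEnterTimestampMonotonic=41000000\n"
crio_out = "ActiveEnterTimestampMonotonic=55000000\n"
print(kubelet_started_first(kubelet_out, crio_out))  # True for these made-up values
```

On a real node the two strings would come from `query("kubelet")` and `query("crio")`. The property reflects only the most recent activation, so a later restart of either service changes the answer.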

Comment 2 Devon 2020-08-06 16:55:16 UTC
Output from a randomly selected node logging the error is:


[daniel.s.tarleton@ip-10-10-7-110 ~]$ sudo systemctl status crio
● crio.service - Open Container Initiative Daemon
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-default-env.conf
   Active: active (running) since Tue 2020-07-28 16:52:44 UTC; 1 weeks 1 days ago
     Docs: https://github.com/cri-o/cri-o
 Main PID: 1392 (crio)
    Tasks: 44
   Memory: 6.8G
      CPU: 6h 4min 43.254s
   CGroup: /system.slice/crio.service
           └─1392 /usr/bin/crio --enable-metrics=true --metrics-port=9537

Aug 05 00:49:12 ip-10-10-7-110 crio[1392]: container not running
Aug 05 00:49:12 ip-10-10-7-110 crio[1392]: 2020-08-05T00:49:12Z [error] SetNetworkStatus: failed to query the pod logging-filebeat-kl9ld in out of cluster comm: pods "logging-filebeat-kl9ld" not found
Aug 05 00:49:12 ip-10-10-7-110 crio[1392]: 2020-08-05T00:49:12Z [error] Multus: error unsetting the networks status: SetNetworkStatus: failed to query the pod logging-filebeat-kl9ld in out of cluster comm: pods "logging-filebeat-kl9ld" not found
Aug 05 00:49:12 ip-10-10-7-110 crio[1392]: 2020-08-05T00:49:12Z [verbose] Del: co-monitoring-plat-01:logging-filebeat-kl9ld:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}
Aug 06 00:49:15 ip-10-10-7-110 crio[1392]: 2020-08-06T00:49:15Z [verbose] Add: co-monitoring-plat-01:logging-filebeat-dt8lg:openshift-sdn:eth0 {"cniVersion":"0.3.1","interfaces":[{"name":"eth0","sandbox":"/proc/1011768/ns/net"}],"ips":[{"version":"4","interface":0,"address":"10.130.52.17/23"}],"routes":[{"dst":"0.0.0.0/0","gw":"10.130.52.1"},{"dst":"224.0.0.0/4"},{"dst":"10.128.0.0/14"}],"dns":{}}
Aug 06 00:49:15 ip-10-10-7-110 crio[1392]: time="2020-08-06T00:49:15Z" level=error msg="container not running"
Aug 06 00:49:15 ip-10-10-7-110 crio[1392]: container not running
Aug 06 00:49:15 ip-10-10-7-110 crio[1392]: 2020-08-06T00:49:15Z [error] SetNetworkStatus: failed to query the pod logging-filebeat-469bp in out of cluster comm: pods "logging-filebeat-469bp" not found
Aug 06 00:49:15 ip-10-10-7-110 crio[1392]: 2020-08-06T00:49:15Z [error] Multus: error unsetting the networks status: SetNetworkStatus: failed to query the pod logging-filebeat-469bp in out of cluster comm: pods "logging-filebeat-469bp" not found
Aug 06 00:49:15 ip-10-10-7-110 crio[1392]: 2020-08-06T00:49:15Z [verbose] Del: co-monitoring-plat-01:logging-filebeat-469bp:openshift-sdn:eth0 {"cniVersion":"0.3.1","name":"openshift-sdn","type":"openshift-sdn"}

Comment 3 Peter Hunt 2020-08-06 17:09:12 UTC
Well, cri-o is running, that's good.
Though I really should have asked for as much of the kubelet and crio log as you can get me. Could you attach the output of
`journalctl -u crio -u kubelet`?

Comment 4 Devon 2020-08-07 13:48:46 UTC
The output of `journalctl -u crio -u kubelet` was attached to the case. It is almost 1 GB, so I cannot attach it to the bug. If you need me to put it somewhere outside the case for you to access, let me know.

The case number is 02714905.

Thanks!
Devon
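
A combined journal of nearly 1 GB is hard to share, so one option is to cut it down to the lines that mention the failing lookup before attaching it. The sketch below is only an illustration: the function name `relevant_lines` and the choice of search strings are assumptions, and the sample inputs are message fragments quoted from this report rather than complete journal lines.

```python
from typing import Iterable, Iterator, Sequence

# Substrings taken from the error message in the event list.
PATTERNS: Sequence[str] = ("crio-images", "imageFs")


def relevant_lines(lines: Iterable[str],
                   patterns: Sequence[str] = PATTERNS) -> Iterator[str]:
    """Yield only the lines containing at least one of the patterns."""
    for line in lines:
        if any(p in line for p in patterns):
            yield line


# Message fragments quoted from this report, used as stand-ins for journal lines.
sample = [
    'failed to get imageFs info: non-existent label "crio-images"',
    "container not running",
]
for line in relevant_lines(sample):
    print(line)
# failed to get imageFs info: non-existent label "crio-images"
```

Because the function is a generator over any iterable of strings, it can be fed a file object or `sys.stdin` line by line without loading the whole log into memory. A plain substring filter drops surrounding context, such as crio startup messages, so the full log may still be needed for diagnosis.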

Comment 5 Peter Hunt 2020-08-10 15:18:39 UTC
*** Bug 1867710 has been marked as a duplicate of this bug. ***

Comment 6 Peter Hunt 2020-08-20 13:43:46 UTC
For bookkeeping purposes, I'm going to mark this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1866702 (despite the two being reported on different versions), because I have little reason to believe it's not the same issue.

*** This bug has been marked as a duplicate of bug 1866702 ***