Description of problem:
Deployments in our production clusters get stuck in a `CreateContainerError` loop. The loop creates hundreds to thousands of empty log files on the node's filesystem, which drives up memory/CPU load and the number of open file handles in the log ingester process. Unless the rogue deployment is quickly scaled down by hand, this ultimately causes severe performance and stability issues on the cluster node itself.

Version-Release number of selected component (if applicable):
OCP 4.8.34

How reproducible:

# cat troubleshoot_CreateContainerError.yml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: demo-container
    image: quay.io/bitnami/nginx
    command: ["/bin/bla"]
    resources:
      limits:
        memory: "100Mi"
        cpu: "1"
      requests:
        memory: "100Mi"
        cpu: "1"

# oc apply -f troubleshoot_CreateContainerError.yml
...

# oc get pods
NAME       READY   STATUS                 RESTARTS   AGE
demo-pod   0/1     CreateContainerError   0          24m

[core@xxxx ~]$ find /var/log/pods -type f | grep demo-pod
/var/log/pods/rainer_demo-pod_e964ff25-07f9-40ec-89da-1b378ea2fb89/1ebb9bcde6fcdc87d049eeacfb8448bfe165499a486001b97b4b0a043081ea5f.log
/var/log/pods/rainer_demo-pod_e964ff25-07f9-40ec-89da-1b378ea2fb89/demo-container/0.log
...
/var/log/pods/rainer_demo-pod_e964ff25-07f9-40ec-89da-1b378ea2fb89/demo-container/10.log
...
/var/log/pods/rainer_demo-pod_e964ff25-07f9-40ec-89da-1b378ea2fb89/demo-container/40.log
...

Actual results:
- The `CreateContainerError` loop creates hundreds to thousands of empty log files in the filesystem
- This leads to high memory/CPU load and many open file handles in the log ingester process
- This ultimately leads to severe performance and stability issues on the cluster node itself
- The only mitigation is to quickly scale down the rogue deployment by hand

Expected results:
- No empty log files are created
- No severe performance and stability issues

Additional info:
Creating this new bug because the erratum from bug 2060494 does not solve the issue.
Reference: Bug 2052450 was created initially and was CLOSED DUPLICATE of bug 2042175, which in turn was CLOSED DUPLICATE of bug 2060494. Bug 2060494 resulted in an erratum, but that erratum does not fix the issue.
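To spot a rogue pod on a node before the log ingester is overwhelmed, the `find` check above can be extended into a per-pod count of empty log files. The following is only a minimal sketch: the function name `count_empty_pod_logs` is hypothetical, and it assumes the `/var/log/pods/<pod-dir>/...` layout shown in the output above and that the affected files are zero bytes in size.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OCP): print "<pod-dir> <count>" for each
# pod directory, where <count> is the number of zero-byte *.log files in it.
# Assumes the /var/log/pods/<pod-dir>/... layout shown in this report.
count_empty_pod_logs() {
  local log_root="${1:-/var/log/pods}"
  local pod_dir count
  for pod_dir in "$log_root"/*/; do
    # Skip the unexpanded glob when the root has no subdirectories.
    [ -d "$pod_dir" ] || continue
    # -size 0 matches only empty files; -type f restricts to regular files.
    count=$(find "$pod_dir" -type f -name '*.log' -size 0 | wc -l)
    # $((count)) strips any padding that wc adds on some platforms.
    printf '%s %s\n' "$(basename "$pod_dir")" "$((count))"
  done
}
```

Run on the node, for example as `count_empty_pod_logs /var/log/pods | sort -k2 -n -r | head`, to list the pods with the most empty log files first.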
*** Bug 2052450 has been marked as a duplicate of this bug. ***
Verified.

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-05-10-182617   True        False         85m     Cluster version is 4.10.0-0.nightly-2022-05-10-182617

% oc get pod
NAME       READY   STATUS                 RESTARTS   AGE
demo-pod   0/1     CreateContainerError   0          6m49s

sh-4.4# find /var/log/pods/ -type f | grep demo
sh-4.4#

While the pod is in the `CreateContainerError` loop, no empty log files are generated.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069