Bug 2072957
| Summary: | ContainerCreateError loop leads to several thousand empty logfiles in the file system | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Rainer Beyel <rbeyel> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Node sub component: | Kubelet | QA Contact: | MinLi <minmli> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | unspecified | | |
| Priority: | unspecified | CC: | gferrazs, jdee, minmli, pehunt, rphillips, sambekar |
| Version: | 4.8 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2088478 | Environment: | |
| Last Closed: | 2022-08-10 11:04:38 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2088478, 2088480, 2088482 | | |
*** Bug 2052450 has been marked as a duplicate of this bug. ***

Verified.

    % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.10.0-0.nightly-2022-05-10-182617   True        False         85m     Cluster version is 4.10.0-0.nightly-2022-05-10-182617

    % oc get pod
    NAME       READY   STATUS                 RESTARTS   AGE
    demo-pod   0/1     CreateContainerError   0          6m49s

    sh-4.4# find /var/log/pods/ -type f | grep demo
    sh-4.4#

While the pod is in the ContainerCreateError loop, no empty logfiles are generated.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
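The verification above amounts to checking that `find /var/log/pods/ -type f | grep demo` prints nothing. As a minimal sketch, that check could be wrapped in a function whose exit status is usable in a script. The function name `no_pod_logs` and its optional second argument are hypothetical and not part of any OpenShift tooling; only the `/var/log/pods` path comes from the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or the kubelet): succeeds only when
# no regular file under the log root has a path containing the given string.
# $1: string to look for in the path, e.g. the pod name
# $2: log root, defaults to /var/log/pods as used in the report
no_pod_logs() {
    pattern=$1
    root=${2:-/var/log/pods}
    # find lists every regular file; grep -qF exits 0 on the first literal match
    if find "$root" -type f 2>/dev/null | grep -qF -- "$pattern"; then
        return 1   # at least one matching log file exists
    fi
    return 0       # no matching log files
}
```

On a node, it might be used as `no_pod_logs demo-pod && echo "no logfiles for demo-pod"`.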
Description of problem:

We see deployments in our production clusters stuck in a `ContainerCreateError` loop, which creates hundreds or thousands of empty logfiles in the filesystem. This in turn causes high memory/CPU load and many open file handles in the log ingester process, and ultimately severe performance and stability issues on the cluster node itself, unless it is mitigated quickly by manually scaling down the rogue deployment.

Version-Release number of selected component (if applicable):

OCP 4.8.34

How reproducible:

    # cat troubleshoot_CreateContainerError.yml
    apiVersion: v1
    kind: Pod
    metadata:
      name: demo-pod
    spec:
      containers:
      - name: demo-container
        image: quay.io/bitnami/nginx
        command: ["/bin/bla"]
        resources:
          limits:
            memory: "100Mi"
            cpu: "1"
          requests:
            memory: "100Mi"
            cpu: "1"

    # oc apply -f troubleshoot_CreateContainerError.yml
    ...

    # oc get pods
    NAME       READY   STATUS                 RESTARTS   AGE
    demo-pod   0/1     CreateContainerError   0          24m

    [core@xxxx ~]$ find /var/log/pods -type f | grep demo-pod
    /var/log/pods/rainer_demo-pod_e964ff25-07f9-40ec-89da-1b378ea2fb89/1ebb9bcde6fcdc87d049eeacfb8448bfe165499a486001b97b4b0a043081ea5f.log
    /var/log/pods/rainer_demo-pod_e964ff25-07f9-40ec-89da-1b378ea2fb89/demo-container/0.log
    ...
    /var/log/pods/rainer_demo-pod_e964ff25-07f9-40ec-89da-1b378ea2fb89/demo-container/10.log
    ...
    /var/log/pods/rainer_demo-pod_e964ff25-07f9-40ec-89da-1b378ea2fb89/demo-container/40.log
    ...

Actual results:

- "ContainerCreateError" loop which leads to hundreds/thousands of empty logfiles in the filesystem
- Which in turn leads to high memory/CPU load and open file handles on the log ingester process
- Which ultimately leads to severe performance and stability issues on the cluster node itself
- If not mitigated quickly by manually scaling down the rogue deployment

Expected results:

- No empty logfiles are created
- No severe performance and stability issues

Additional info:

Creating this new bug because the erratum from bug 2060494 doesn't solve the issue.
Reference: Initially, bug 2052450 was created, which was CLOSED DUPLICATE of bug 2042175, which was in turn CLOSED DUPLICATE of bug 2060494. That bug resulted in an erratum, but the erratum doesn't fix the issue.
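To spot an affected node before the log ingester degrades, it may help to count zero-byte logfiles per pod directory. The following is a minimal sketch rather than a supported diagnostic. The function name `count_empty_logs` is hypothetical, and the sketch assumes the `/var/log/pods/<pod directory>/...` layout shown in the `find` output of this report.

```shell
#!/bin/sh
# Hypothetical helper: print "<count> <pod directory>" for every pod log
# directory that contains empty *.log files, highest count first.
# $1: log root, defaults to /var/log/pods as used in the report
count_empty_logs() {
    root=${1:-/var/log/pods}
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # -size 0 matches only zero-byte files; tr strips padding some wc versions add
        n=$(find "$dir" -type f -name '*.log' -size 0 | wc -l | tr -d ' ')
        if [ "$n" -gt 0 ]; then
            printf '%s %s\n' "$n" "$(basename "$dir")"
        fi
    done | sort -rn
}
```

Run as `count_empty_logs | head` on a node. A pod directory with a count in the hundreds or thousands would be a candidate for the rogue deployment described in this report.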