Bug 1965900 - Pods are getting stuck in ContainerCreating/ContainerCreateError/Terminating status
Summary: Pods are getting stuck in ContainerCreating/ContainerCreateError/Terminating status
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Peter Hunt
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-31 05:24 UTC by Rutvik
Modified: 2024-10-01 18:23 UTC (History)

Fixed In Version: cri-o-1.11.16-0.15.rhaos3.11.gitd7a399f.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-25 15:16:51 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | cri-o cri-o pull 4958 | 0 | None | open | [1.11] set inactive-or-failed collectmode if appropriate | 2021-08-09 18:04:05 UTC
Red Hat Product Errata | RHSA-2021:3193 | 0 | None | None | None | 2021-08-25 15:17:05 UTC

Description Rutvik 2021-05-31 05:24:28 UTC
Description of problem:

We have identified symptoms similar to those reported in https://bugzilla.redhat.com/show_bug.cgi?id=1787148

In this case, the customer is using cri-o as the runtime, so it's not clear whether the fix that was given for docker in the above BZ would help here.


Version-Release number of selected component (if applicable):
# rpm -qa | grep systemd
systemd-libs-219-62.el7_6.5.x86_64
systemd-sysv-219-62.el7_6.5.x86_64
oci-systemd-hook-0.1.18-3.git8787307.el7_6.x86_64
systemd-219-62.el7_6.5.x86_64
---
cri-tools-1.11.1-2.rhaos3.11.gitedabfb5.el7.x86_64
criu-3.12-2.el7.x86_64
cri-o-1.11.16-0.8.dev.rhaos3.11.git6d43aae.el7.x86_64
---
atomic-openshift-docker-excluder-3.11.188-1.git.0.db0eaa8.el7.noarch
docker-1.13.1-94.gitb2f74b2.el7.x86_64 
docker-client-1.13.1-94.gitb2f74b2.el7.x86_64
---


How reproducible:
This is happening randomly

Actual results:

---
May 06 14:26:39 e2n1-1-worker atomic-openshift-node[16190]: E0506 02:26:39.026250   16190 pod_workers.go:186] Error syncing pod a19eaedc-ae2e-11eb-a2e3-009bfd250fac ("f8331805-7754-4893-ac73-540c1adb4fbf-65025-1620280080-brffp_zen(a19eaedc-ae2e-11eb-a2e3-009bfd250fac)"), skipping: failed to ensure that the pod: a19eaedc-ae2e-11eb-a2e3-009bfd250fac cgroups exist and are correctly applied: failed to create container for [kubepods besteffort poda19eaedc-ae2e-11eb-a2e3-009bfd250fac] : Argument list too long
---

------------ >>>
$ cat e1n4-1-worker/e1n4-1-worker.atomic.log | grep -i "Argument list too long" | wc -l
280244

$ cat e2n1-1-worker/e2n1-1-worker.atomic.log |  grep -i "Argument list too long" | wc -l
489776

$ cat e2n2-1-worker/e2n2-1-worker.atomic.log |  grep -i "Argument list too long" | wc -l
362871
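
The per-node counts above can be gathered in one pass. This is only a sketch: the function name count_arg_list_errors is made up for illustration, and it assumes the node logs are plain-text files laid out as in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of any OpenShift tooling).
# Prints "<count> <file>" for each log file given, counting the lines that
# contain the error text case-insensitively, like `grep -i ... | wc -l`.
count_arg_list_errors() {
    for log in "$@"; do
        # grep -c prints the number of matching lines; it exits non-zero
        # when nothing matches, so the exit status is ignored here.
        n=$(grep -ci "Argument list too long" "$log" || true)
        printf '%s %s\n' "$n" "$log"
    done
}

# Example call, using the log paths from this report:
# count_arg_list_errors e1n4-1-worker/e1n4-1-worker.atomic.log \
#     e2n1-1-worker/e2n1-1-worker.atomic.log \
#     e2n2-1-worker/e2n2-1-worker.atomic.log
```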


Expected results:
Pods should not get stuck in ContainerCreating, ContainerCreateError, or Terminating status.

Additional info:
When the problem occurred, the customer force-deleted the affected pods (in Terminating status) with the "oc delete pod xxx --force --grace-period=0" command; the pods were then recreated with a healthy status.
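
A rough sketch of how that workaround might be applied to several stuck pods at once. The function name print_force_delete_cmds is hypothetical, and the sketch assumes the usual column layout of "oc get pods --all-namespaces --no-headers" (NAMESPACE NAME READY STATUS RESTARTS AGE). It only prints the delete commands so they can be reviewed first; it does not address the underlying cri-o issue.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc).
# Reads `oc get pods --all-namespaces --no-headers` output on stdin and
# prints one force-delete command per pod whose STATUS column is Terminating.
# Nothing is deleted; the commands are only printed.
print_force_delete_cmds() {
    awk '$4 == "Terminating" {
        printf "oc delete pod %s -n %s --force --grace-period=0\n", $2, $1
    }'
}

# Example (review the output before running any of it):
# oc get pods --all-namespaces --no-headers | print_force_delete_cmds
```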

Comment 1 Peter Hunt 2021-06-01 18:58:27 UTC
fix for cri-o is attached

Comment 2 Peter Hunt 2021-06-11 18:57:08 UTC
oops forgot about this, I'll try to push the PR through

Comment 3 Peter Hunt 2021-07-02 20:36:45 UTC

ci is still wonky, hopefully I'll have cycles to fix it next sprint

Comment 4 Peter Hunt 2021-07-23 19:59:18 UTC
alas I did not

Comment 5 Peter Hunt 2021-08-09 18:04:15 UTC
pr merged

Comment 10 errata-xmlrpc 2021-08-25 15:16:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform
3.11.z security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3193

