Description of problem:
We have identified symptoms similar to those in https://bugzilla.redhat.com/show_bug.cgi?id=1787148. In this case the customer is using cri-o as the runtime, so it is not clear whether the fix that was given for docker in the above BZ would help here.

Version-Release number of selected component (if applicable):
# rpm -qa | grep systemd
systemd-libs-219-62.el7_6.5.x86_64
systemd-sysv-219-62.el7_6.5.x86_64
oci-systemd-hook-0.1.18-3.git8787307.el7_6.x86_64
systemd-219-62.el7_6.5.x86_64
---
cri-tools-1.11.1-2.rhaos3.11.gitedabfb5.el7.x86_64
criu-3.12-2.el7.x86_64
cri-o-1.11.16-0.8.dev.rhaos3.11.git6d43aae.el7.x86_64
---
atomic-openshift-docker-excluder-3.11.188-1.git.0.db0eaa8.el7.noarch
docker-1.13.1-94.gitb2f74b2.el7.x86_64
docker-client-1.13.1-94.gitb2f74b2.el7.x86_64
---

How reproducible:
This is happening randomly.

Actual results:
---
May 06 14:26:39 e2n1-1-worker atomic-openshift-node[16190]: E0506 02:26:39.026250 16190 pod_workers.go:186] Error syncing pod a19eaedc-ae2e-11eb-a2e3-009bfd250fac ("f8331805-7754-4893-ac73-540c1adb4fbf-65025-1620280080-brffp_zen(a19eaedc-ae2e-11eb-a2e3-009bfd250fac)"), skipping: failed to ensure that the pod: a19eaedc-ae2e-11eb-a2e3-009bfd250fac cgroups exist and are correctly applied: failed to create container for [kubepods besteffort poda19eaedc-ae2e-11eb-a2e3-009bfd250fac] : Argument list too long
---

Number of occurrences in each node log:
$ cat e1n4-1-worker/e1n4-1-worker.atomic.log | grep -i "Argument list too long" | wc -l
280244
$ cat e2n1-1-worker/e2n1-1-worker.atomic.log | grep -i "Argument list too long" | wc -l
489776
$ cat e2n2-1-worker/e2n2-1-worker.atomic.log | grep -i "Argument list too long" | wc -l
362871

Expected results:
The pod should not get stuck in the ContainerCreating / Error phases.

Additional info:
When the problem was occurring, the customer deleted the affected pods (Terminating status) using the "oc delete pod xxx --force --grace-period=0" command, which recreated the pods with healthy status.
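For anyone triaging the same symptom on other nodes, the per-node counts above can be collected in one pass. This is only a minimal sketch: the function name count_cgroup_errors is made up for illustration, and the log paths in the usage comment are the ones from this report, so adjust them to wherever the node logs were collected.

```shell
#!/bin/sh
# count_cgroup_errors (hypothetical helper): print "<file> <count>" for each
# log file given. grep -c counts matching lines, which gives the same number
# as the `grep -i ... | wc -l` pipelines used in the description.
count_cgroup_errors() {
    for log in "$@"; do
        # grep exits non-zero when nothing matches but still prints 0,
        # so ignore the exit status and keep the printed count.
        count=$(grep -c -i "Argument list too long" "$log") || true
        printf '%s %s\n' "$log" "$count"
    done
}

# Usage, with the paths from this report:
# count_cgroup_errors e1n4-1-worker/e1n4-1-worker.atomic.log \
#                     e2n1-1-worker/e2n1-1-worker.atomic.log \
#                     e2n2-1-worker/e2n2-1-worker.atomic.log
```

A count that keeps growing between two runs on the same node suggests that pods there are still hitting the cgroup creation failure.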
fix for cri-o is attached
oops forgot about this, I'll try to push the PR through
ci is still wonky, hopefully I'll have cycles to fix it next sprint
alas I did not
pr merged
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 3.11.z security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3193