Bug 1999273 - [4.8] Kubelet regularly freeze control groups causing issues further down
Summary: [4.8] Kubelet regularly freeze control groups causing issues further down
Keywords:
Status: CLOSED DUPLICATE of bug 1998391
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.z
Assignee: Kir Kolyshkin
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On: 1993980
Blocks:
 
Reported: 2021-08-30 18:57 UTC by Kir Kolyshkin
Modified: 2021-09-02 02:19 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1993980
Environment:
Last Closed: 2021-08-30 19:34:49 UTC
Target Upstream Version:
Embargoed:



Description Kir Kolyshkin 2021-08-30 18:57:07 UTC
+++ This bug was initially created as a clone of Bug #1993980 +++

[sig-arch] events should not repeat pathologically


Since the 1.22 rebase, we've seen these events repeating:

event happened 47 times, something is wrong: ns/openshift-cluster-node-tuning-operator pod/tuned-9vncf node/ip-10-0-203-226.ec2.internal - reason/FailedCreatePodContainer unable to ensure pod container exists: failed to create container for [kubepods burstable pod4724bbf2-ae65-4296-87a2-f36f56b7cc03] : Unit kubepods-burstable-pod4724bbf2_ae65_4296_87a2_f36f56b7cc03.slice already exists.
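For readers matching the event text to a pod: the systemd unit name in the message appears to be derived from the cgroup name components shown in brackets. The sketch below reproduces that mapping for the example above. `sliceName` is a hypothetical helper written for illustration, not the kubelet's actual function, and the escaping rule (dashes inside a component become underscores, components are joined with dashes) is inferred from this one message.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceName is a hypothetical helper (not the kubelet's real code).
// It replaces "-" with "_" inside each cgroup name component, joins
// the components with "-", and appends the ".slice" suffix.
func sliceName(components []string) string {
	escaped := make([]string, len(components))
	for i, c := range components {
		escaped[i] = strings.ReplaceAll(c, "-", "_")
	}
	return strings.Join(escaped, "-") + ".slice"
}

func main() {
	// Components taken from the event message quoted above.
	name := sliceName([]string{
		"kubepods",
		"burstable",
		"pod4724bbf2-ae65-4296-87a2-f36f56b7cc03",
	})
	fmt.Println(name)
	// prints: kubepods-burstable-pod4724bbf2_ae65_4296_87a2_f36f56b7cc03.slice
}
```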


see:
https://search.ci.openshift.org/?search=Unit+.*.slice+already+exists&maxAge=48h&context=1&type=bug%2Bjunit&name=4.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job


This is tracked upstream in Kubernetes; see https://github.com/kubernetes/kubernetes/issues/104280 for details.

--- Additional comment from Eric Paris on 2021-08-19 16:00:18 UTC ---

This bug sets blocker+ without setting a Target Release. This is an invalid state as it is impossible to determine what is being blocked. Please be sure to set Priority, Severity, and Target Release before you attempt to set blocker+

--- Additional comment from Kir Kolyshkin on 2021-08-23 19:53:13 UTC ---

Fixed in https://github.com/opencontainers/runc/releases/tag/v1.0.2 by https://github.com/opencontainers/runc/pull/3167

--- Additional comment from Kir Kolyshkin on 2021-08-24 01:17:27 UTC ---

Now we need to vendor it into all Kubernetes releases:

master: https://github.com/kubernetes/kubernetes/pull/104528
  1.22: https://github.com/kubernetes/kubernetes/pull/104529
  1.21: https://github.com/kubernetes/kubernetes/pull/104530

I think that we should not try to bring it to 1.20 (the runc in there does the freeze, so it is affected, but the number of changes to backport is too large).

--- Additional comment from Ryan Phillips on 2021-08-25 14:40:11 UTC ---



--- Additional comment from Kir Kolyshkin on 2021-08-26 21:25:36 UTC ---

The bug is valid for 4.8 and 4.9.

Fixed by
4.9: https://github.com/openshift/kubernetes/pull/910
4.8: https://github.com/openshift/kubernetes/pull/912

--- Additional comment from Kir Kolyshkin on 2021-08-26 21:37:08 UTC ---

Strictly speaking, we have two bugs here.

One is "unit already exists", introduced in runc rc94 and fixed in runc 1.0.0.
Another is "cgroup freeze", introduced in runc rc92, mostly fixed in runc 1.0.1, fully fixed in runc 1.0.2 (or so we hope).

--- Additional comment from Kir Kolyshkin on 2021-08-26 21:45:53 UTC ---

> Another is "cgroup freeze", introduced in runc rc92, mostly fixed in runc 1.0.1, fully fixed in runc 1.0.2 (or so we hope).

Pardon me, this was introduced in rc91 (runc commit b810da149), not rc92.
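
To check a given runc version against the two ranges described in these comments, here is a rough sketch. The ordered `releases` list and the `affected` helper are assumptions made for illustration, and 1.0.1 is counted as still affected by the freeze bug because it is described as only "mostly fixed" there.

```go
package main

import "fmt"

// releases is an assumed ordering of the runc releases around these bugs.
var releases = []string{
	"rc90", "rc91", "rc92", "rc93", "rc94", "rc95",
	"1.0.0", "1.0.1", "1.0.2",
}

// index returns the position of v in releases, or -1 if it is unknown.
func index(v string) int {
	for i, r := range releases {
		if r == v {
			return i
		}
	}
	return -1
}

// affected reports whether v lies in the half-open range [introduced, fixed).
// Unknown versions are reported as not affected.
func affected(v, introduced, fixed string) bool {
	i, lo, hi := index(v), index(introduced), index(fixed)
	if i < 0 || lo < 0 || hi < 0 {
		return false
	}
	return i >= lo && i < hi
}

func main() {
	for _, v := range releases {
		fmt.Printf("%-6s unitExists=%-5v cgroupFreeze=%v\n",
			v,
			affected(v, "rc94", "1.0.0"), // "unit already exists"
			affected(v, "rc91", "1.0.2"), // "cgroup freeze", fully fixed in 1.0.2
		)
	}
}
```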

--- Additional comment from Kir Kolyshkin on 2021-08-27 01:07:21 UTC ---



--- Additional comment from OpenShift Automated Release Tooling on 2021-08-27 21:41:27 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.

--- Additional comment from OpenShift Automated Release Tooling on 2021-08-27 21:41:29 UTC ---

This bug will be shipped at next planned release date of 4.9 if this is not a GA bug.

Comment 1 Kir Kolyshkin 2021-08-30 19:34:49 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1998391

*** This bug has been marked as a duplicate of bug 1998391 ***

