Bug 2003199 - CRI-O leaks some children PIDs
Summary: CRI-O leaks some children PIDs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.z
Assignee: Peter Hunt
QA Contact: MinLi
URL:
Whiteboard:
Duplicates: 2016459
Depends On: 2003197
Blocks: 2009752
Reported: 2021-09-10 15:03 UTC by OpenShift BugZilla Robot
Modified: 2021-11-08 17:07 UTC (History)
CC List: 5 users

Fixed In Version: 4.8.16
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2009752
Environment:
Last Closed: 2021-10-12 06:01:20 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | cri-o cri-o pull 5297 | 0 | None | Merged | [release-1.21] oci: call wait on conmon if cgroup move fails | 2021-10-01 13:41:54 UTC
Red Hat Knowledge Base (Solution) | 6446151 | 0 | None | None | None | 2021-10-21 15:47:39 UTC
Red Hat Product Errata | RHBA-2021:3682 | 0 | None | None | None | 2021-10-12 06:01:45 UTC

Description OpenShift BugZilla Robot 2021-09-10 15:03:40 UTC
+++ This bug was initially created as a clone of Bug #2003197 +++

+++ This bug was initially created as a clone of Bug #2002434 +++

Description of problem:
Occasionally, CRI-O leaks the PID of a child process it creates. These cases are unusual and hard to reproduce. The most common one occurs when systemd fails to move conmon into the conmon cgroup.

I don't have a great reproducer, but this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1994444 (though branched away to allow for other bugs to be investigated there).

Version-Release number of selected component (if applicable):
All released CRI-O versions

--- Additional comment from Peter Hunt on 2021-09-10 13:29:31 UTC ---

PR merged

--- Additional comment from OpenShift Automated Release Tooling on 2021-09-10 14:39:39 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.
This bug is expected to ship in the next 4.10 release created.

--- Additional comment from pehunt on 2021-09-10 14:59:48 UTC ---

fixed in attached PR

Comment 1 Peter Hunt 2021-09-10 15:04:22 UTC
fixed by PR

Comment 4 Peter Hunt 2021-10-01 13:40:48 UTC
PR merged!

Comment 7 MinLi 2021-10-11 07:29:00 UTC
It's tough to reproduce, so I am verifying it based on some sanity checks on 4.8.14.

$ oc get clusterversion 
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.14    True        False         62m     Cluster version is 4.8.14

Comment 9 errata-xmlrpc 2021-10-12 06:01:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.14 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3682

Comment 10 Peter Hunt 2021-10-21 15:31:21 UTC
*** Bug 2016459 has been marked as a duplicate of this bug. ***

Comment 11 Peter Hunt 2021-11-02 17:15:47 UTC
*** Bug 2019346 has been marked as a duplicate of this bug. ***

Comment 12 Peter Hunt 2021-11-08 17:07:06 UTC
FYI for folks ending up here: this was actually fixed in 4.8.16, because another required patch slipped in after this bug was verified with a sanity check. If you run into this, please upgrade to 4.8.16 and report any issues that remain.

