Bug 2003199

Summary: CRI-O leaks some children PIDs
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node Assignee: Peter Hunt <pehunt>
Node sub component: CPU manager QA Contact: MinLi <minmli>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aos-bugs, miminar, minmli, schoudha, ssonigra
Version: 4.10   
Target Milestone: ---   
Target Release: 4.8.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 4.8.16 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2009752 (view as bug list) Environment:
Last Closed: 2021-10-12 06:01:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2003197    
Bug Blocks: 2009752    

Description OpenShift BugZilla Robot 2021-09-10 15:03:40 UTC
+++ This bug was initially created as a clone of Bug #2003197 +++

+++ This bug was initially created as a clone of Bug #2002434 +++

Description of problem:
Occasionally, CRI-O may leak a child PID of a process it creates. These situations are unusual and tough to reproduce. The most common one is when systemd fails to move conmon to the conmon cgroup for some reason.
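For anyone trying to check a node for this, here is a rough diagnostic sketch, not part of the fix. It lists the direct children of a given PID together with their state and cgroup membership, using only the standard /proc/<pid>/stat and /proc/<pid>/cgroup files. The function names (parse_stat, parse_cgroup, children_of) are made up for illustration. The bug does not say what a leaked PID looks like or which cgroup conmon is expected to be in, so interpreting the output is left to the reader.

```python
import os


def parse_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    rest = line[rparen + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def parse_cgroup(text):
    """Parse /proc/<pid>/cgroup content into a list of (controllers, path)."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        # Each line has the form hierarchy-ID:controller-list:cgroup-path.
        _, controllers, path = line.split(":", 2)
        entries.append((controllers, path))
    return entries


def children_of(parent_pid, proc_root="/proc"):
    """Return (pid, comm, state, cgroups) for each direct child of parent_pid."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        base = os.path.join(proc_root, name)
        try:
            with open(os.path.join(base, "stat"), errors="replace") as f:
                pid, comm, state, ppid = parse_stat(f.read())
            with open(os.path.join(base, "cgroup"), errors="replace") as f:
                cgroups = parse_cgroup(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if ppid == parent_pid:
            found.append((pid, comm, state, cgroups))
    return found
```

Calling children_of() with the PID of the crio process on the node returns one tuple per child. A state of "Z" marks a zombie (defunct) process, and the cgroup paths show where each child currently lives.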

I don't have a great reproducer, but this is related to https://bugzilla.redhat.com/show_bug.cgi?id=1994444 (though branched away to allow for other bugs to be investigated there).

Version-Release number of selected component (if applicable):
All released CRI-O versions

--- Additional comment from Peter Hunt on 2021-09-10 13:29:31 UTC ---

PR merged

--- Additional comment from OpenShift Automated Release Tooling on 2021-09-10 14:39:39 UTC ---

Elliott changed bug status from MODIFIED to ON_QA.
This bug is expected to ship in the next 4.10 release created.

--- Additional comment from pehunt on 2021-09-10 14:59:48 UTC ---

fixed in attached PR

Comment 1 Peter Hunt 2021-09-10 15:04:22 UTC
fixed by PR

Comment 4 Peter Hunt 2021-10-01 13:40:48 UTC
PR merged!

Comment 7 MinLi 2021-10-11 07:29:00 UTC
It's tough to reproduce, so I'm verifying it based on some sanity checks on 4.8.14.

$ oc get clusterversion 
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.14    True        False         62m     Cluster version is 4.8.14

Comment 9 errata-xmlrpc 2021-10-12 06:01:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.14 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3682

Comment 10 Peter Hunt 2021-10-21 15:31:21 UTC
*** Bug 2016459 has been marked as a duplicate of this bug. ***

Comment 11 Peter Hunt 2021-11-02 17:15:47 UTC
*** Bug 2019346 has been marked as a duplicate of this bug. ***

Comment 12 Peter Hunt 2021-11-08 17:07:06 UTC
FYI for folks ending up here: this was actually fixed in 4.8.16, as there was another required patch that slipped in after this was verified with a sanity check. If you run into this, please upgrade to 4.8.16, and report any issues that still remain.