Bug 2003197
| Summary: | CRI-O leaks some children PIDs | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Peter Hunt <pehunt> |
| Component: | Node | Assignee: | Peter Hunt <pehunt> |
| Node sub component: | CRI-O | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | high | CC: | aos-bugs, mburke, schoudha |
| Version: | 4.10 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause:
When systemd was under load, CRI-O could fail to move conmon to the systemd cgroup.
Consequence:
On that failure, a bug in CRI-O leaked the conmon process, leaving a zombie.
Fix:
CRI-O no longer leaks the conmon process.
Result:
No zombies are left under CRI-O, even if systemd is overloaded.
|
Story Points: | --- |
| Clone Of: | 2002434 | Environment: | |
| Last Closed: | 2021-10-18 17:51:28 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2002434 | ||
| Bug Blocks: | 2003199, 2009752 | ||
|
Description
Peter Hunt
2021-09-10 14:58:52 UTC
fixed in attached PR

oops, we need a 4.9 variant of https://github.com/cri-o/cri-o/pull/5306 as well

PR merged

I see this is tough to reproduce. Verifying it based on some sanity checks on 4.9.0-0.nightly-2021-09-20-203004

    $ oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.9.0-0.nightly-2021-09-20-203004   True        False         175m    Cluster version is 4.9.0-0.nightly-2021-09-20-203004

Peter, can you review my proposed release note for this issue? I see the doc text you added. Thank you. However, I think it needs to be cleaned up a bit.

* Previously, a bug in CRI-O caused CRI-O to leak a child PID of a process it created. As a result, under load systemd could create a significant number of zombie processes. CRI-O was fixed to prevent the leakage. As a result, these zombie processes are no longer being created.

Is there a consequence of the zombie processes? Node failure or such? Thank you for your help.

the blurb content LGTM! The consequence of zombies *could* be node failure if the node runs out of PIDs, which is quite unlikely. More likely than not, they'll just hold entries in the kernel process table, look bad, and be wasteful.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
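
For readers unfamiliar with the zombie state discussed in the comments above, here is a minimal Linux-only Python sketch. It is not CRI-O code (CRI-O is written in Go) and it does not reproduce the cgroup failure from this bug. The names `proc_state` and `leak_then_reap` are hypothetical. It only illustrates that an exited child keeps its PID and process-table entry until its parent waits on it.

```python
import subprocess
import sys
import time
from pathlib import Path


def proc_state(pid):
    """Return the one-letter Linux state of pid, or None if /proc has no entry for it."""
    try:
        stat = Path(f"/proc/{pid}/stat").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field follows the parenthesised command name, which may
    # itself contain spaces or parentheses, so split on the last ")".
    return stat.rsplit(")", 1)[1].split()[0]


def leak_then_reap():
    """Start a short-lived child, observe it before reaping, then reap it."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])

    # Poll (for up to 10 seconds) until the child has exited. Because the
    # parent has not waited on it yet, it should show up in state "Z".
    deadline = time.monotonic() + 10
    state_before = proc_state(child.pid)
    while state_before != "Z" and time.monotonic() < deadline:
        time.sleep(0.05)
        state_before = proc_state(child.pid)

    # Waiting on the child collects its exit status and lets the kernel
    # release the process-table entry.
    child.wait()
    return state_before, proc_state(child.pid)


if __name__ == "__main__":
    print(leak_then_reap())
```

A long-running parent that never makes the equivalent of the `wait()` call accumulates one such entry per exited child, which is the slow PID exhaustion risk mentioned above.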