Bug 2019346
Summary: | zombie processes accumulation and Argument list too long | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Michal Minar <miminar>
Component: | Node | Assignee: | Peter Hunt <pehunt>
Node sub component: | CRI-O | QA Contact: | MinLi <minmli>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | high | |
Priority: | high | CC: | aos-bugs, minmli, nagrawal, openshift-bugs-escalate, palshure, pehunt, srengan
Version: | 4.8 | Keywords: | Reopened
Target Milestone: | --- | |
Target Release: | 4.10.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 2032466 (view as bug list) | Environment: |
Last Closed: | 2022-03-10 16:24:30 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 2032466 | |
Attachments: | | |
Description
Michal Minar
2021-11-02 10:18:53 UTC
Based on the note about the zombies, this looks like it may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2003199. Can you try 4.8.16 to see whether it has the same problem?
*** This bug has been marked as a duplicate of bug 2003199 ***
Created attachment 1844029 [details]
zombie producers in grafana
Time window shows the start of zombie accumulation on ocs-beworker1
Sorry if this has already been answered, but do you have more information about what the zombie processes are?
The latest stats* show only two significant contributors on ocs-beworker1:
- conmon (parent cri-o) 99k
- conmon (parent multus) 71k
* taken just before the nodes were restarted
Attachment 1844029 [details] shows only the former (conmon with parent cri-o), 30 minutes after the issue started.
Wait, just to verify: conmon is a child of *multus*? That is unexpected.
Created attachment 1844219 [details]
grafana multus zombie spikes
Yes, that's what I see. Also interesting are the bursts of those multus child zombies, which occur roughly every 12 hours, as this attachment shows.
Is there something else I can collect that would reveal more (e.g. full cmdline)?
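
One way to collect more detail is to read the process table directly on the affected node. The following is a minimal sketch, not the tooling that produced the stats above: it assumes a Linux /proc filesystem, and the function names `parse_stat`, `read_cmdline`, and `zombie_summary` are hypothetical. It counts zombies grouped by process name and parent name, which is the same breakdown as the conmon figures quoted earlier. On Linux a zombie's own /proc/<pid>/cmdline is empty, so the full command line of the parent is the useful thing to record.

```python
import os
from collections import Counter


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid).

    The comm field is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the first "(" and the last ")".
    """
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    rest = text[right + 1:].split()
    return pid, comm, rest[0], int(rest[1])


def read_cmdline(proc_root, pid):
    """Return the NUL-separated command line of pid as one readable string."""
    try:
        with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as f:
            raw = f.read()
    except OSError:
        return ""
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def zombie_summary(proc_root="/proc"):
    """Count zombie processes grouped by (process name, parent name)."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited while scanning, or stat was unreadable
        procs[pid] = (comm, state, ppid)

    counts = Counter()
    for comm, state, ppid in procs.values():
        if state == "Z":
            # "?" marks a parent that was not visible during the scan
            parent_comm = procs.get(ppid, ("?", "", 0))[0]
            counts[(comm, parent_comm)] += 1
    return counts
```

Calling `zombie_summary()` on the node returns a `Counter` whose `most_common()` lists the largest contributors first, and `read_cmdline("/proc", ppid)` can then be called for a parent PID of interest to capture its full command line.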
Deployed OpenShift Data Foundation on an AWS cluster with template private-templates/functionality-testing/aos-4_10/ipi-on-aws/versioned-installer and vm_type: 'm5.4xlarge'. After 48 hours there was no zombie process accumulation; set to verified!
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-22-102609   True        False         26h     Cluster version is 4.10.0-0.nightly-2022-01-22-102609
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0056
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days