Bug 2016339
| Summary: | slow pod deletion during load: NamespaceContentRemaining + NamespaceDeletionContentFailure: unexpected items still remain in namespace | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Rahul Gangwar <rgangwar> |
| Component: | Node | Assignee: | Sai Ramesh Vanka <svanka> |
| Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
| Status: | CLOSED DUPLICATE | Severity: | medium |
| Priority: | unspecified | CC: | aos-bugs, harpatil, pehunt, tsweeney |
| Version: | 4.9 | Keywords: | Reopened |
| Target Milestone: | --- | Target Release: | --- |
| Hardware: | Unspecified | OS: | Unspecified |
| Doc Type: | If docs needed, set a value | Story Points: | --- |
| Last Closed: | 2022-02-09 15:16:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | Category: | --- |
| oVirt Team: | --- | Cloudforms Team: | --- |
Description
Rahul Gangwar
2021-10-21 10:33:46 UTC
Is this reproducible with the released 4.9? This looks like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=2003206. Can we get the crio goroutines to confirm? (see https://github.com/cri-o/cri-o/blob/main/tutorials/debugging.md)

@Peter Hunt, shared the must-gather logs. Are the must-gather logs not useful?

Peter, yes, it is reproducible with the released 4.9.

must-gather doesn't yet contain the information requested in https://bugzilla.redhat.com/show_bug.cgi?id=2016339#c2

Not completed this sprint.

*** This bug has been marked as a duplicate of bug 2021432 ***

Harsha, this is still an issue. I can reproduce it on 4.10 and it is not a duplicate.
oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.nightly-2022-01-17-182202 True False 136m Cluster version is 4.10.0-0.nightly-2022-01-17-182202
Tue Jan 18 12:08:51 IST 2022
NAME    STATUS        AGE
3pndd   Terminating   2m24s
status:
  conditions:
  - lastTransitionTime: "2022-01-18T06:36:40Z"
    message: All resources successfully discovered
    reason: ResourcesDiscovered
    status: "False"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2022-01-18T06:36:40Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2022-01-18T06:37:20Z"
    message: 'Failed to delete all resource types, 1 remaining: unexpected items still remain in namespace: 3pndd for gvr: /v1, Resource=pods'
    reason: ContentDeletionFailed
    status: "True"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2022-01-18T06:36:40Z"
    message: 'Some resources are remaining: pods. has 1 resource instances'
    reason: SomeResourcesRemain
    status: "True"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2022-01-18T06:36:40Z"
    message: All content-preserving finalizers finished
    reason: ContentHasNoFinalizers
    status: "False"
    type: NamespaceFinalizersRemaining
  phase: Terminating
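For reference, a minimal sketch of how a namespace stuck in Terminating like the one above can be inspected, and how the node hosting the remaining pod can be identified so the right per-node logs are collected. The namespace name 3pndd comes from the reproduction above, and <pod-name> is a placeholder for whichever pod the conditions report as remaining:

# Show the deletion conditions reported by the namespace controller
oc get namespace 3pndd -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'

# List the pods still remaining in the namespace together with the node each one runs on
oc get pods -n 3pndd -o wide

# Check whether the remaining pod has a deletionTimestamp set and any finalizers left
oc get pod <pod-name> -n 3pndd -o yaml | grep -E -A 3 'deletionTimestamp|finalizers'

The node shown by the second command is the worker whose kubelet and crio logs are being requested in the comments below.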
Hi Rahul, I couldn't find any kubelet logs related to the namespace "3pndd" in the attached must-gather output - https://bugzilla.redhat.com/show_bug.cgi?id=2016339#c10

[svanka@svanka masters]$ ls -lrt
total 10696
-rw-r--r--. 1 svanka svanka  435691 Jan 18 12:34 ovs-vswitchd_service.log
-rw-r--r--. 1 svanka svanka   10332 Jan 18 12:34 ovsdb-server_service.log
-rw-r--r--. 1 svanka svanka   50834 Jan 18 12:34 ovs-configuration_service.log
-rw-r--r--. 1 svanka svanka    2348 Jan 18 12:34 openvswitch_service.log
-rw-r--r--. 1 svanka svanka  157923 Jan 18 12:34 NetworkManager_service.log
-rw-r--r--. 1 svanka svanka     612 Jan 18 12:34 machine-config-daemon-host_service.log
-rw-r--r--. 1 svanka svanka   30215 Jan 18 12:34 machine-config-daemon-firstboot_service.log
-rw-r--r--. 1 svanka svanka 6221625 Jan 18 12:34 kubelet_service.log
-rw-r--r--. 1 svanka svanka 4026143 Jan 18 12:34 crio_service.log
[svanka@svanka masters]$ grep -nri "3pndd" *
[svanka@svanka masters]$

Could you attach the kubelet log from the node on which the pods in the above namespace are running? This would really help in debugging the root cause of the issue. Moreover, I didn't find any abnormal behavior in the kubelet and crio service logs attached in the must-gather output. Thanks, Ramesh

must-gather only gathers from a random master by default. We'll need the must-gather or sos_report from the node with the slowly deleted pod.

must-gather of all masters: https://drive.google.com/file/d/1ZqyNX2kBs-Dga_AMPYmSNE8wiVnz_vEN/view?usp=sharing

Sorry if it was vague. I presume this workload is landing on a worker, so we'll need the must-gather of the worker this workload is stuck on.

Hey Rahul, I still couldn't find any logs relevant to the "3pndd" namespace in the must-gather logs attached in https://bugzilla.redhat.com/show_bug.cgi?id=2016339#c17. As Peter mentioned, could you attach the kubelet logs of the worker node on which the workload is present? Thanks, Ramesh

Hi Rahul, the attached must-gather logs aren't really helpful in debugging the actual root cause of the issue. Could you attach the kubelet logs of the worker node on which the workload is present, as requested in the above comments? The following command gathers the kubelet logs from a node: `oc adm node-logs -u kubelet <node-name>`. Thanks, Ramesh

kubelet logs of all nodes: https://drive.google.com/file/d/1rikI8nfDNKOJZ53y7S9ZOgr9GzdA9siN/view?usp=sharing. If this does not help, I can guide you through reproducing the issue.

*** This bug has been marked as a duplicate of bug 2038780 ***
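For reference, a sketch of the per-node data gathering requested in this thread, assuming the worker node hosting the stuck pod has already been identified (see the namespace inspection sketch above). <worker-node-name> is a placeholder; the goroutine dump follows the CRI-O debugging tutorial linked in the description, and the exact dump file location under /tmp is an assumption that should be checked against that tutorial:

# Kubelet logs from the specific worker node hosting the stuck pod
oc adm node-logs -u kubelet <worker-node-name> > kubelet-worker.log

# CRI-O service logs from the same node
oc adm node-logs -u crio <worker-node-name> > crio-worker.log

# CRI-O goroutine stacks: send SIGUSR1 to the crio process on the node;
# CRI-O is expected to write the stacks to a file under /tmp on the host
# (file name pattern assumed here, verify against the debugging tutorial).
oc debug node/<worker-node-name> -- chroot /host sh -c 'kill -USR1 $(pidof crio); sleep 2; ls -l /tmp/crio-goroutine-stacks-*'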