Bug 1997476
| Summary: | pod in Error due to KillPodSandbox | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhaozhanqi <zzhao> |
| Component: | Networking | Assignee: | Riccardo Ravaioli <rravaiol> |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> |
| Status: | CLOSED WONTFIX | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | astoycos, rravaiol |
| Version: | 4.9 | | |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-13 13:06:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
zhaozhanqi
2021-08-25 10:23:29 UTC
Even though this issue does not happen every time and can be worked around by recreating the pod, I think it would affect the customer experience when it occurs.

******

From the logs, what most probably happened is that the cluster was running out of resources and pod "network-check-target-7prn7" had to be killed as a preemption measure. The deletion of the pod was then not successful, since it exceeded the specified grace period (was it set to 0?):

    $ omg get events -o wide | grep 7prn7
    2h4m Normal Scheduled pod/network-check-target-7prn7 Successfully assigned openshift-network-diagnostics/network-check-target-7prn7 to control-plane-0
    2h4m Warning NetworkNotReady pod/network-check-target-7prn7 network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
    2h4m Normal AddedInterface pod/network-check-target-7prn7 Add eth0 [10.128.0.2/23] from openshift-sdn
    2h4m Normal Pulling pod/network-check-target-7prn7 Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:73b7930bc2ce99c36902a8d7ee524c68432247b55489000a1d66ce8030078952"
    2h3m Normal Pulled pod/network-check-target-7prn7 Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:73b7930bc2ce99c36902a8d7ee524c68432247b55489000a1d66ce8030078952" in 8.386299402s
    2h3m Normal Created pod/network-check-target-7prn7 Created container network-check-target-container
    2h3m Normal Started pod/network-check-target-7prn7 Started container network-check-target-container
    1h23m Warning NodeNotReady pod/network-check-target-7prn7 Node is not ready
    1h22m Normal AddedInterface pod/network-check-target-7prn7 Add eth0 [10.128.0.5/23] from openshift-sdn
    1h22m Normal Pulled pod/network-check-target-7prn7 Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:73b7930bc2ce99c36902a8d7ee524c68432247b55489000a1d66ce8030078952" already present on machine
    1h22m Normal Created pod/network-check-target-7prn7 Created container network-check-target-container
    1h22m Normal Started pod/network-check-target-7prn7 Started container network-check-target-container
    1h21m Warning Preempting pod/network-check-target-7prn7 Preempted in order to admit critical pod
    1h21m Normal Killing pod/network-check-target-7prn7 Stopping container network-check-target-container
    1h20m Warning FailedKillPod pod/network-check-target-7prn7 error killing pod: failed to "KillPodSandbox" for "077ff051-9c1f-4a48-8df9-1007bc104aa3" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_network-check-target-7prn7_openshift-network-diagnostics_077ff051-9c1f-4a48-8df9-1007bc104aa3_0(a73ed428742d847858f290bdf081ac7f10d1cd70dd97d4697e981ddcbfe5e95c): error removing pod openshift-network-diagnostics_network-check-target-7prn7 from CNI network \"multus-cni-network\": Multus: [openshift-network-diagnostics/network-check-target-7prn7]: error getting pod: an error on the server (\"\") has prevented the request from succeeding (get pods network-check-target-7prn7)"
    1h21m Warning ExceededGracePeriod pod/network-check-target-7prn7 Container runtime did not kill the pod within specified grace period.

******

More specifically, the event happened at 2021-08-25T08:53:00Z, and SDN received the CNI_DEL about 4 seconds later (08:53:04.088948214Z).
It seems that Multus had a glitch in which it got no response when fetching the pod (the equivalent of "oc get pod"), but SDN eventually received the request:

    namespaces/openshift-sdn/pods/sdn-sgmlk/sdn/sdn/logs/current.log:221:2021-08-25T08:53:04.088948214Z I0825 08:53:04.088905 2514 pod.go:542] CNI_DEL openshift-network-diagnostics/network-check-target-7prn7

Given that the condition above is hard to reproduce and there is a workaround (recreating the affected pod), I'd close this BZ for now. If we run into the bug again, we can reopen the BZ, and ideally I can be given access to a live cluster to debug further.