Bug 1881678
Summary: [vsphere] Error while adding pod to CNI network "multus-cni-network": netplugin failed with no error message: signal: killed

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Joseph Callen <jcallen> |
| Component: | Networking | Assignee: | zenghui.shi <zshi> |
| Networking sub component: | multus | QA Contact: | Weibin Liang <weliang> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | akashem, aos-bugs, bbennett, dosmith, mfojtik, sttts, xxia |
| Version: | 4.6 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-13 19:49:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description: Joseph Callen, 2020-09-22 21:07:03 UTC
What this looks like is that CRI-O's context for the CNI executable is ending, and it kills the process. Therefore, Multus doesn't have time to return an error. Without debug logging turned on for Multus, it is difficult to diagnose exactly what can cause this. (I will return to my team with a note about improving the visibility of the logs in must-gather.) However, there are two likely reasons for Multus to wait long enough to get context-killed:

1. The default network provider (openshift-sdn or ovn-k) isn't ready.
2. The API server isn't responding.

In this case, I believe it's #2. I've taken a look at the must-gather, and I can see that there is a time period where the default network provider isn't ready; however, this occurs at approximately 18:23 and lasts for 41 seconds according to `./host_service_logs/masters/kubelet_service.log`.

Taking a look at the API server logs, we can see there's been a throttling event @ 18:47, which aligns with the time period where CRI-O is context-killing Multus per the above logs.

```
cat ./namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-5f9df6bb56-62g7f/kube-apiserver-operator/kube-apiserver-operator/logs/current.log | grep "18:47"
2020-09-22T18:47:00.090030431Z I0922 18:47:00.089963 1 request.go:645] Throttling request took 1.391262181s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue
2020-09-22T18:47:01.289957476Z I0922 18:47:01.289802 1 request.go:645] Throttling request took 1.389825161s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/configmaps/config
2020-09-22T18:47:02.489992922Z I0922 18:47:02.489885 1 request.go:645] Throttling request took 1.392346474s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue
2020-09-22T18:47:03.489978342Z I0922 18:47:03.489894 1 request.go:645] Throttling request took 1.591991131s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue
2020-09-22T18:47:04.689902998Z I0922 18:47:04.689856 1 request.go:645] Throttling request took 1.191412227s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue
2020-09-22T18:47:05.889962580Z I0922 18:47:05.889790 1 request.go:645] Throttling request took 1.214219132s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver
```

I would advise that we have the folks from the kube-apiserver component take a look. Thank you!

> cat ./namespaces/openshift-kube-apiserver-operator/pods/kube-apiserver-operator-5f9df6bb56-62g7f/kube-apiserver-operator/kube-apiserver-operator/logs/current.log | grep "18:47"
> 2020-09-22T18:47:00.090030431Z I0922 18:47:00.089963 1 request.go:645] Throttling request took 1.391262181s, request: GET:https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/pods?labelSelector=apiserver%3Dtrue

This is client-side throttling. If you check the operator's log from the CI job, you will see that this message is prevalent throughout the log:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_release/12106/rehearse-12106-pull-ci-openshift-machine-api-operator-master-e2e-vsphere-serial/1308465647310606336/artifacts/e2e-vsphere-serial/gather-extra/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-5f9df6bb56-62g7f_kube-apiserver-operator.log
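For context on the "Throttling request took ..." lines above: they are logged by client-go (request.go) when a client's own rate limiter delays an outgoing request, so they measure time spent waiting inside the operator process rather than inside the API server. The sketch below shows where that budget lives; the QPS/Burst values, the namespace, and the in-cluster client are illustrative assumptions, not the operator's actual configuration.

```go
// Minimal sketch of client-side throttling in client-go (assumed values, not
// the kube-apiserver-operator's real configuration). Every rest.Config carries
// a QPS/Burst budget; when a controller issues requests faster than that
// budget, client-go delays them locally and logs the wait time as
// "Throttling request took ...".
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes this runs inside a pod
	if err != nil {
		panic(err)
	}

	// Raising these is how a busy controller avoids waiting on its own limiter;
	// the defaults (roughly QPS=5, Burst=10) are easy to exhaust.
	cfg.QPS = 50
	cfg.Burst = 100

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The same kind of list request that shows up throttled in the log above.
	pods, err := client.CoreV1().Pods("openshift-kube-apiserver").List(context.TODO(), metav1.ListOptions{
		LabelSelector: "apiserver=true",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("pods:", len(pods.Items))
}
```

Because the wait happens entirely in the client, these log lines by themselves do not show the API server being overloaded, which is consistent with the analysis that follows.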
I have downloaded the prometheus dump from the CI job:

A: I noticed that the pods "CONNECT" call was taking longer (on the order of 50s to 60s) for a brief window of 5 minutes (from 15:55 to 16:00, as shown in the metrics).
B: Two requests failed with 429, pods (subresource=eviction, verb=POST). This happened around the same window.

The API server was not under load and was not rejecting any requests. Other Pod requests were normal. Do you think either A or B can be attributed to the issue you are observing? It's hard to conclude so; A lasted for about 5 minutes, and for B it's just two requests.

> I have noticed the last few days a significant slow down in installation time where some pods take an extended amount of time to start.
> Within `oc get events` there is an error:
> Failed to create pod sandbox: rpc error: code = Unknown desc = Kubelet may be retrying requests that are timing out in CRI-O due to system load

This does not seem related to the apiserver, and based on my investigation so far it looks like this is not an issue with the apiserver. But if you can pinpoint the timeline and the pod resources, I would be happy to look at it again. For now, I am going to send it in your direction.

Moving back to Multus to reevaluate with Abu's analysis. Note that slow CONNECT calls look like networking issues, don't they Abu?

Thanks for the investigation Abu, and for the information about the throttling error messages. Grepping through the must-gather looking for failures in this time period with `grep -Prin "18:47:" *`, I'm seeing a lot of errors that look like:

```
mustgather/host_service_logs/masters/kubelet_service.log:1237839:Sep 22 20:08:50.798654 ci-op-9mdkn32w-b51ab-8n2lh-master-2 hyperkube[1864]: 2020-09-22T18:47:28Z auth: error contacting auth provider (retrying in 10s): discovery through endpoint https://kubernetes.default.svc/.well-known/oauth-authorization-server failed: 404 Not Found
```

My investigation shows that there are 410 of these error messages within this minute:

```
grep -Prin "18:47:" * | grep -i error | grep -i "error contacting auth provider" | wc -l
410
```

Whereas there are only a few references to Multus errors within this time period:

```
grep -Prin "18:47:" * | grep -i error | grep -i "multus" | wc -l
15
```

Additionally, I believe that the CRI-O error related to Multus is a symptom, indicative of a deeper systemic problem with a slowdown somewhere. Looking at all the Multus errors on the system, there are three events that occur:

1. There is a time period where the default network provider isn't ready; however, this occurs at approximately 18:23 and lasts for 41 seconds according to `./host_service_logs/masters/kubelet_service.log`. This appears nominal to me, as it can take some time for the default network provider (openshift-sdn in this case) to become ready.
2. Between 18:47:26.2 and 18:47:27.2, a 1-second period, there are 14 events related to the CRI-O context deadline in `./host_service_logs/masters/crio_service.log`.
3. Between 19:13:06.3 and 19:13:06.9, a 0.6-second period, there are 16 events related to the CRI-O context deadline in `./host_service_logs/masters/crio_service.log`.
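For anyone unfamiliar with what the CRI-O context-deadline kills in items 2 and 3 look like from the caller's side, here is a minimal Go sketch of the mechanism; it is illustrative only. The `sleep` child stands in for a Multus invocation stuck waiting on the API server, and the 2-second deadline stands in for the runtime's CNI operation timeout. This is not CRI-O's or Multus's actual code.

```go
// A self-contained sketch: a child process run via exec.CommandContext is
// SIGKILLed when the context deadline expires, so it never gets a chance to
// write an error message. The caller only sees "signal: killed" with empty
// output, which is what surfaces as
// "netplugin failed with no error message: signal: killed".
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// 2-second deadline stands in for the runtime's CNI timeout; "sleep 300"
	// stands in for a CNI plugin blocked on a slow or unreachable API server.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	out, err := exec.CommandContext(ctx, "sleep", "300").CombinedOutput()
	if err != nil {
		// Typically prints: netplugin failed: signal: killed, output: ""
		fmt.Printf("netplugin failed: %v, output: %q\n", err, out)
	}
}
```

Because the child is killed before it can write anything, there is no error text for the runtime or the kubelet to capture, which matches the bare "netplugin failed with no error message: signal: killed" in the summary.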
Typically, if there were errors related to Multus, they would appear in the kubelet logs, as CNI errors go to STDERR, which is captured by the kubelet. The only error that Multus is reporting is the default network not being ready @ 18:23, as mentioned in event #1 above. In consulting with my team, one observation was made regarding this error:

```
pods/sdn-844k6/sdn/sdn/logs/current.log:407:2020-09-22T18:47:26.204841994Z W0922 18:47:26.204767 4408 pod.go:276] CNI_ADD openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0 failed: Get "https://api-int.ci-op-9mdkn32w-b51ab.origin-ci-int-aws.dev.rhcloud.com:6443/api/v1/namespaces/openshift-etcd/pods/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0": read tcp 172.31.252.213:36158->172.31.252.213:6443: read: connection timed out
```

This is from the SDN, which is over host networking. This could be related to infrastructure problems, or the apiserver not being up on this node, or it could be that it's a load balancer problem.

> This is from the SDN, which is over host networking. This could be related to infrastructure problems, or the apiserver not being up on this node,
> or it could be that it's a load balancer problem.

- Unfortunately, the prometheus metric data does not go back as far as "18:47:26.204841994Z".
- The apiserver logs do not go back that far either, due to this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1882446.

I checked the events and saw this:

> "2020-09-22T18:26:07Z StaticPodInstallerCompleted Successfully installed revision 2"
> "2020-09-22T18:26:10Z Started Started container kube-apiserver-check-endpoints"
> "2020-09-22T18:26:44Z Started Started container pruner"
> "2020-09-22T18:26:58Z Started Started container kube-apiserver"
> "2020-09-22T18:28:14Z StaticPodInstallerCompleted Successfully installed revision 2"
> "2020-09-22T18:29:12Z StaticPodInstallerCompleted Successfully installed revision 3"
> "2020-09-22T18:38:48Z FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
> "2020-09-22T18:43:55Z FailedCreatePodSandBox (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = Kubelet may be retrying requests that are timing out in CRI-O due to system load: error reserving pod name k8s_revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0_openshift-kube-apiserver_c094de13-e3c8-4f4e-b49e-5076bfe25504_0 for id 966299ad7d5e5982922ebb380c673693847ec7576c031b26fd7cdd3e95e21898: name is reserved"

The "FailedCreatePodSandBox" error applies to the following Pods (not limited to the apiserver):

$ cat events.json | jq '.items | sort_by(.lastTimestamp) | .[] | select(.reason=="FailedCreatePodSandBox") | .lastTimestamp + " " + .reason + " " + .involvedObject.namespace + "/" + .involvedObject.name'
"2020-09-22T18:33:15Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:33:27Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:33:38Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:33:38Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:33:40Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:33:49Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:33:53Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:33:54Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:01Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:04Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:07Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:12Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:18Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:21Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:24Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:29Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:35Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:35Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:40Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:47Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:47Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:54Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:34:58Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:35:01Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:35:07Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:35:13Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:35:20Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:37:47Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:37:56Z FailedCreatePodSandBox openshift-authentication/oauth-openshift-5c4dd97b94-hq9ss"
"2020-09-22T18:38:00Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:38:11Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:38:23Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:38:34Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:38:40Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:38:48Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:38:49Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:38:51Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:01Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:39:01Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:03Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:14Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:15Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:39:16Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:26Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:39:29Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:29Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:40Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:43Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:54Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:39:55Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:40:09Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:40:11Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:40:23Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:40:24Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:40:35Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:43:08Z FailedCreatePodSandBox openshift-console/console-84bd7b6b74-5cx59"
"2020-09-22T18:43:22Z FailedCreatePodSandBox openshift-etcd/installer-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:43:40Z FailedCreatePodSandBox openshift-kube-controller-manager/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:43:45Z FailedCreatePodSandBox openshift-kube-scheduler/installer-5-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:43:55Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-3-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T18:44:13Z FailedCreatePodSandBox openshift-kube-apiserver/installer-4-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:01:08Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:01:21Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:01:27Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:01:35Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:01:36Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:01:40Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:01:46Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:01:47Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:01:55Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:01:58Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:01:59Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:02:08Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:02:09Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:02:11Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:02:22Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:02:23Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:02:26Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:02:33Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:02:38Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:02:41Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:02:45Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:02:49Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:02:55Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:02:57Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:03:02Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:03:10Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:03:15Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:03:38Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:03:53Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:04:07Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:04:22Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:04:34Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:04:47Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:05:01Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:05:14Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:05:28Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:09:12Z FailedCreatePodSandBox openshift-controller-manager/controller-manager-dp5q4"
"2020-09-22T19:11:12Z FailedCreatePodSandBox openshift-apiserver/apiserver-7bf97f8874-4mxp9"
"2020-09-22T19:11:36Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-6-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:11:38Z FailedCreatePodSandBox openshift-kube-apiserver/installer-7-ci-op-9mdkn32w-b51ab-8n2lh-master-0"
"2020-09-22T19:25:29Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:25:44Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:25:56Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:26:07Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:26:19Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:26:31Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:26:44Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:26:58Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:27:10Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:29:08Z FailedCreatePodSandBox openshift-kube-apiserver/revision-pruner-7-ci-op-9mdkn32w-b51ab-8n2lh-master-2"
"2020-09-22T19:39:58Z FailedCreatePodSandBox kube-system/critical-pod"

I also see evidence of network blips in the events, so this could be a network-related issue. My recommendation is to get it to reproduce with https://github.com/openshift/kubernetes/pull/374 included in the build; this way we can look at the apiserver log in depth. In the meantime, I would also recommend investigating the "FailedCreatePodSandBox" error further: where is it originating from?

Moving to 4.7 to investigate further. This seems to occur rarely (based on https://search.ci.openshift.org/) and doesn't break the cluster when it happens; it just causes a slowdown in pod creation on the affected node.

It may be a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1785399

After skimming through 1785399 I think it is a dup.

(In reply to Ben Bennett from comment #8)
> Moving to 4.7 to investigate further. This seems to occur rarely (based on

This isn't quite correct. I am seeing it on every install I have been running manually, including the one I am presently running.

> https://search.ci.openshift.org/) and doesn't break the cluster when it
> happens, it just causes slowdown in pod creation on the affected node.
>
> It may be a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=1785399

After skimming the above BZ, yeah, it could be a dup.

*** This bug has been marked as a duplicate of bug 1785399 ***