Bug 1781297
Summary: | pod fails to create after deleting the ovnkube-master OVN_Southbound leader - status 400: 'failed to get pod annotation: timed out waiting for the condition' | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ross Brattain <rbrattai> |
Component: | Networking | Assignee: | Alexander Constantinescu <aconstan> |
Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | aconstan, anusaxen, ddelcian, fsimonce, mcornea, ncredi, nkononov, william.caban |
Version: | 4.3.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.4.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-05-13 21:54:24 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Ross Brattain
2019-12-09 17:51:56 UTC
Hello, I just noticed the same error in my environment. Once I apply a MachineConfig resource and it restarts some worker nodes, I run into the same error:

    Normal   Scheduled               51s  default-scheduler  Successfully assigned sctptest/checksctppbsls to worker-1
    Warning  FailedCreatePodSandBox  23s  kubelet, worker-1  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_checksctppbsls_sctptest_31321e45-0ccb-4de5-bcc6-dcac0f4a73b9_0(25c0a5d3429f8ede19aa65166c7767d84c5028069e3ad9d00c5fc7be7f437d9d): Multus: error adding pod to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: 'failed to get pod annotation: timed out waiting for the condition

It happens sporadically but is quite common: roughly every second pod is stuck in ContainerCreating. My environment is libvirt.

    [root@cnv-qe-01 ~]# oc version
    Client Version: 4.4.0-0.ci-2020-01-15-100606
    Server Version: 4.4.0-0.ci-2020-01-15-100606
    Kubernetes Version: v1.17.0

I also noticed that the ovn pods are restarting:

    openshift-ovn-kubernetes   ovnkube-master-fx8gg   4/4   Running   4    20h
    openshift-ovn-kubernetes   ovnkube-master-xczjm   4/4   Running   9    71m
    openshift-ovn-kubernetes   ovnkube-master-z7g44   4/4   Running   20   20h

I rebooted the entire cluster, and after that I noticed the following panics in the OVN master.

The first one is:

    I0116 10:19:01.629834       1 leaderelection.go:235] attempting to acquire leader lease  openshift-ovn-kubernetes/ovn-kubernetes-master...
    I0116 10:19:01.640741       1 leaderelection.go:245] successfully acquired lease openshift-ovn-kubernetes/ovn-kubernetes-master
    time="2020-01-16T10:19:01Z" level=info msg=" I (master-1) won the election. In active mode"
    panic: assignment to entry in nil map

    goroutine 29 [running]:
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).getLbEndpoints(0xc000341440, 0xc0008ad3a8, 0xc0007aaa50, 0xc0007aaa80)
        /go-controller/pkg/ovn/endpoints.go:33 +0x421
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).handleNodePortLB(0xc000341440, 0xc00014b000, 0x1, 0xc00003e3f0)
        /go-controller/pkg/ovn/endpoints.go:162 +0x783
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).syncGatewayLogicalNetwork(0xc000341440, 0xc00014b000, 0xc000822db0, 0xc00081eea0, 0xd, 0x1463860, 0x1634fac)
        /go-controller/pkg/ovn/master.go:409 +0x567
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).syncNodeGateway(0xc000341440, 0xc00014b000, 0xc000822750, 0x0, 0x0)
        /go-controller/pkg/ovn/ovn.go:498 +0x11d
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchNodes.func1(0x161e500, 0xc00014b000)
        /go-controller/pkg/ovn/ovn.go:524 +0x246
    k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
        /go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:195
    k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnAdd(0xc0004e5b80, 0x18b2ce0, 0xc0007c3640, 0x161e500, 0xc00014b000)
        /go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:227 +0x69
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).addHandler(0xc000347a10, 0x18f97c0, 0x161e500, 0x0, 0x0, 0x0, 0x18b2ce0, 0xc0007c3640, 0xc0008ade08, 0x0, ...)
        /go-controller/pkg/factory/factory.go:268 +0x575
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).AddFilteredNodeHandler(...)
        /go-controller/pkg/factory/factory.go:374
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchNodes(0xc000341440, 0x0, 0xc0000e7590, 0x7)
        /go-controller/pkg/ovn/ovn.go:509 +0x1d5
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).Run(0xc000341440, 0x0, 0xc00009a780, 0x0, 0x0)
        /go-controller/pkg/ovn/ovn.go:166 +0x52
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*HAMasterController).ConfigureAsActive(0xc000717720, 0x7fffa9418bef, 0x8, 0x28, 0xc00042f7b0)
        /go-controller/pkg/ovn/ha_master.go:209 +0x96
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*HAMasterController).StartHAMasterController.func1(0x18b6da0, 0xc0002bc400)
        /go-controller/pkg/ovn/ha_master.go:79 +0xd7
    created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
        /go-controller/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec

The second one is:

    time="2020-01-16T10:24:20Z" level=info msg="Setting annotations map[k8s.ovn.org/pod-networks:{\"default\":{\"ip_address\":\"10.128.2.8/23\",\"mac_address\":\"fa:e8:5b:80:02:09\",\"gateway_ip\":\"10.128.2.1\"}} ovn:{\"ip_address\":\"10.128.2.8/23\",\"mac_address\":\"fa:e8:5b:80:02:09\",\"gateway_ip\":\"10.128.2.1\"}] on pod sctptest/sctpclient2gphn"
    time="2020-01-16T10:24:21Z" level=info msg="[sctptest/sctpclient2gphn] addLogicalPort took 517.143088ms"
    E0116 10:24:30.357812       1 runtime.go:73] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
    goroutine 87 [running]:
    k8s.io/apimachinery/pkg/util/runtime.logPanic(0x148db40, 0x186a0d0)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:69 +0x7b
    k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51 +0x82
    panic(0x148db40, 0x186a0d0)
        /usr/local/go/src/runtime/panic.go:522 +0x1b5
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).getLbEndpoints(0xc000aa86c0, 0xc0001857c0, 0xc0007d5290, 0xc0007d52c0)
        /go-controller/pkg/ovn/endpoints.go:33 +0x421
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).AddEndpoints(0xc000aa86c0, 0xc0001857c0, 0x13ea7c0, 0x2510280)
        /go-controller/pkg/ovn/endpoints.go:59 +0x109
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchEndpoints.func2(0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
        /go-controller/pkg/ovn/ovn.go:424 +0x19c
    k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
        /go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:202
    k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate(0xc00057ed00, 0x18b2ce0, 0xc00038fe20, 0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
        /go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:236 +0x120
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).newFederatedHandler.func2.1(0xc000bdc1e0)
        /go-controller/pkg/factory/factory.go:151 +0x64
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).forEachHandler(0xc000412090, 0x161d700, 0xc0001857c0, 0xc00086fcd8)
        /go-controller/pkg/factory/factory.go:54 +0x16d
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).newFederatedHandler.func2(0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
        /go-controller/pkg/factory/factory.go:150 +0x8c
    k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
        /go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:202
    k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc000466c00, 0xc000480a00, 0x0)
        /go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:605 +0x1dd
    k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc00086fe38, 0x42c47f, 0xc000480a30)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:284 +0x51
    k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        /go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:601 +0x79
    k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00032ef68)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
    k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00086ff68, 0xdf8475800, 0x0, 0x42cd01, 0xc0003298c0)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
    k8s.io/apimachinery/pkg/util/wait.Until(...)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
    k8s.io/client-go/tools/cache.(*processorListener).run(0xc0003eac80)
        /go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:599 +0x9c
    k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000393170, 0xc0004a6080)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
    created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
    panic: assignment to entry in nil map [recovered]
        panic: assignment to entry in nil map

    goroutine 87 [running]:
    k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
    panic(0x148db40, 0x186a0d0)
        /usr/local/go/src/runtime/panic.go:522 +0x1b5
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).getLbEndpoints(0xc000aa86c0, 0xc0001857c0, 0xc0007d5290, 0xc0007d52c0)
        /go-controller/pkg/ovn/endpoints.go:33 +0x421
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).AddEndpoints(0xc000aa86c0, 0xc0001857c0, 0x13ea7c0, 0x2510280)
        /go-controller/pkg/ovn/endpoints.go:59 +0x109
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchEndpoints.func2(0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
        /go-controller/pkg/ovn/ovn.go:424 +0x19c
    k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
        /go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:202
    k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate(0xc00057ed00, 0x18b2ce0, 0xc00038fe20, 0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
        /go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:236 +0x120
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).newFederatedHandler.func2.1(0xc000bdc1e0)
        /go-controller/pkg/factory/factory.go:151 +0x64
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).forEachHandler(0xc000412090, 0x161d700, 0xc0001857c0, 0xc00086fcd8)
        /go-controller/pkg/factory/factory.go:54 +0x16d
    github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).newFederatedHandler.func2(0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
        /go-controller/pkg/factory/factory.go:150 +0x8c
    k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
        /go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:202
    k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc000466c00, 0xc000480a00, 0x0)
        /go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:605 +0x1dd
    k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc00086fe38, 0x42c47f, 0xc000480a30)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:284 +0x51
    k8s.io/client-go/tools/cache.(*processorListener).run.func1()
        /go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:601 +0x79
    k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00032ef68)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
    k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00086ff68, 0xdf8475800, 0x0, 0x42cd01, 0xc0003298c0)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
    k8s.io/apimachinery/pkg/util/wait.Until(...)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
    k8s.io/client-go/tools/cache.(*processorListener).run(0xc0003eac80)
        /go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:599 +0x9c
    k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000393170, 0xc0004a6080)
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
    created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
        /go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62

Please update regarding my comment above.

Hi,

This has been fixed in upstream ovn-kubernetes, and should already be present in downstream openshift/ovn-kubernetes in 4.3/4.4/4.5. Go ahead and re-test.

-Alex

Unable to reproduce in 4.4.0-0.nightly-2020-03-05-084644.

@Alexander, how did you figure that this is fixed on 4.3? We see it failing in 4.3, and apparently there is no 4.3 clone. I can create one, but I wanted to make sure first that we were expecting it to be fixed and it somehow got missed.

Hi,

Yes, this issue together with a bunch of others should be fixed, because branches 4.4 and 4.3 should be completely aligned (we have basically rebased one on top of the other). What are you experiencing in your tests on 4.3: the panic in comment #4, or the 400 in comment #1? And which tests are you actually performing (I guess the steps described in comment #1)?

-Alex

I'll let Nikita or Marius provide the technical details.
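Both panic traces point at the same frame (getLbEndpoints at endpoints.go:33) and the same failure mode: Go panics with exactly the message "assignment to entry in nil map" when code writes to a map that was never initialized. The sketch below is purely illustrative (the `lbEndpoints` type and helper names are hypothetical, not the actual ovn-kubernetes code); it shows the bug pattern and the usual fix of making the map before the first write:

```go
package main

import "fmt"

// lbEndpoints is a hypothetical stand-in for a struct whose map field
// starts out nil (the zero value for maps in Go).
type lbEndpoints struct {
	IPs   []string
	Ports map[string]int32 // nil until explicitly initialized
}

// buggyAdd reproduces the failure mode: writing to a nil map triggers
// "panic: assignment to entry in nil map". (Reading a nil map is safe;
// only writes panic.)
func buggyAdd(eps *lbEndpoints, name string, port int32) {
	eps.Ports[name] = port
}

// safeAdd shows the usual fix: initialize the map before the first write.
func safeAdd(eps *lbEndpoints, name string, port int32) {
	if eps.Ports == nil {
		eps.Ports = make(map[string]int32)
	}
	eps.Ports[name] = port
}

func main() {
	eps := &lbEndpoints{}
	safeAdd(eps, "http", 80)
	fmt.Println(eps.Ports["http"]) // 80
}
```

Calling buggyAdd on a fresh &lbEndpoints{} would crash the process just as the traces above show; because the panic happens inside an informer event handler, the whole ovnkube-master container dies and restarts, which matches the restart counts in the pod listings.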
We are facing the same issue. Please take a look at the logs below.

We noticed that the ovnkube pods crash all the time:
> oc get pods -n openshift-ovn-kubernetes
NAME READY STATUS RESTARTS AGE
ovnkube-master-lwggs 4/4 Running 14 3d18h
ovnkube-master-q2xht 3/4 Error 14 3d18h
ovnkube-master-s29zx 3/4 CrashLoopBackOff 13 3d18h
ovnkube-node-9nt7j 3/3 Running 0 3d17h
ovnkube-node-ffvjz 3/3 Running 0 3d18h
ovnkube-node-fvqbb 3/3 Running 0 3d18h
ovnkube-node-hchb2 3/3 Running 0 3d17h
ovnkube-node-jlzjd 3/3 Running 0 3d18h
This is the event log from the pod that I'm trying to start:
Warning FailedCreatePodSandBox 64s kubelet, openshift-worker-1.qe1.kni.lab.eng.bos.redhat.com Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_checksctph5z8j_sctptest_881d34ae-fb06-4b3e-98be-a58715b599a2_0(4db0832a72074eb3ffb389132ebcb2e41636e26b2c808d2d8d2c587e6a6113e5): Multus: [sctptest/checksctph5z8j]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[sctptest/checksctph5z8j] failed to get pod annotation: timed out waiting for the condition
This happened before the actual test; I cannot start the pod at all. The idea of the test is to validate that the SCTP protocol works correctly, but as noted it fails before the test itself begins.
Hi,

Wait a minute, you're testing SCTP? This is not supported by OVN right now and is a current work in progress; see https://github.com/ovn-org/ovn-kubernetes/issues/1120 and https://issues.redhat.com/secure/RapidBoard.jspa?rapidView=5254&view=planning&selectedIssue=SDN-138

-Alex

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581