Bug 1781297
| Summary: | pod fails to create after delete ovnkube-master OVN_Southbound leader - status 400: 'failed to get pod annotation: timed out waiting for the condition | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ross Brattain <rbrattai> |
| Component: | Networking | Assignee: | Alexander Constantinescu <aconstan> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | unspecified | CC: | aconstan, anusaxen, ddelcian, fsimonce, mcornea, ncredi, nkononov, william.caban |
| Version: | 4.3.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-05-13 21:54:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Ross Brattain
2019-12-09 17:51:56 UTC
Hello,
I just noticed the same error in my environment. Once I apply a MachineConfig resource and it restarts some of the worker nodes, I hit the same error:
Normal Scheduled 51s default-scheduler Successfully assigned sctptest/checksctppbsls to worker-1
Warning FailedCreatePodSandBox 23s kubelet, worker-1 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_checksctppbsls_sctptest_31321e45-0ccb-4de5-bcc6-dcac0f4a73b9_0(25c0a5d3429f8ede19aa65166c7767d84c5028069e3ad9d00c5fc7be7f437d9d): Multus: error adding pod to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: 'failed to get pod annotation: timed out waiting for the condition
It happens sporadically but is quite common: roughly every second pod gets stuck in the ContainerCreating state.
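For readers less familiar with that error text: "timed out waiting for the condition" is the stock message from the k8s.io/apimachinery wait helpers. The CNI side polls the pod object for the k8s.ovn.org/pod-networks annotation (which ovnkube-master is supposed to write, as seen in the logs further down) and returns the 400 when the annotation never shows up. Below is a minimal sketch of that pattern; it is not the actual ovn-k8s-cni-overlay code, and the helper names and timeouts are illustrative only.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// getPodAnnotation stands in for a lookup against the apiserver; in this
// sketch it always reports "not set yet", so the poll below times out.
func getPodAnnotation(namespace, name string) (string, bool) {
	return "", false
}

// waitForPodAnnotation mimics the wait-for-annotation step implied by the
// error message: poll until the annotation exists or the timeout expires.
func waitForPodAnnotation(namespace, name string) (string, error) {
	var annotation string
	err := wait.PollImmediate(200*time.Millisecond, 3*time.Second, func() (bool, error) {
		val, ok := getPodAnnotation(namespace, name)
		if !ok {
			return false, nil // not there yet, keep polling
		}
		annotation = val
		return true, nil
	})
	if err != nil {
		// wait.ErrWaitTimeout stringifies to "timed out waiting for the
		// condition", which is what surfaces in the CNI 400 response above.
		return "", fmt.Errorf("failed to get pod annotation: %v", err)
	}
	return annotation, nil
}

func main() {
	_, err := waitForPodAnnotation("sctptest", "checksctppbsls")
	fmt.Println(err)
}
```

In other words, the CNI failure in the event above is likely a symptom; the more interesting question is why ovnkube-master never annotated the pod, which the panics below suggest an answer to.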
My environment is libvirt.
[root@cnv-qe-01 ~]# oc version
Client Version: 4.4.0-0.ci-2020-01-15-100606
Server Version: 4.4.0-0.ci-2020-01-15-100606
Kubernetes Version: v1.17.0
I also notice that the OVN pods are restarting:
openshift-ovn-kubernetes ovnkube-master-fx8gg 4/4 Running 4 20h
openshift-ovn-kubernetes ovnkube-master-xczjm 4/4 Running 9 71m
openshift-ovn-kubernetes ovnkube-master-z7g44 4/4 Running 20 20h
I tried rebooting the entire cluster, and afterwards I noticed the following panics in the OVN master.
The first one is:
I0116 10:19:01.629834 1 leaderelection.go:235] attempting to acquire leader lease openshift-ovn-kubernetes/ovn-kubernetes-master...
I0116 10:19:01.640741 1 leaderelection.go:245] successfully acquired lease openshift-ovn-kubernetes/ovn-kubernetes-master
time="2020-01-16T10:19:01Z" level=info msg=" I (master-1) won the election. In active mode"
panic: assignment to entry in nil map
goroutine 29 [running]:
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).getLbEndpoints(0xc000341440, 0xc0008ad3a8, 0xc0007aaa50, 0xc0007aaa80)
/go-controller/pkg/ovn/endpoints.go:33 +0x421
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).handleNodePortLB(0xc000341440, 0xc00014b000, 0x1, 0xc00003e3f0)
/go-controller/pkg/ovn/endpoints.go:162 +0x783
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).syncGatewayLogicalNetwork(0xc000341440, 0xc00014b000, 0xc000822db0, 0xc00081eea0, 0xd, 0x1463860, 0x1634fac)
/go-controller/pkg/ovn/master.go:409 +0x567
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).syncNodeGateway(0xc000341440, 0xc00014b000, 0xc000822750, 0x0, 0x0)
/go-controller/pkg/ovn/ovn.go:498 +0x11d
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchNodes.func1(0x161e500, 0xc00014b000)
/go-controller/pkg/ovn/ovn.go:524 +0x246
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:195
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnAdd(0xc0004e5b80, 0x18b2ce0, 0xc0007c3640, 0x161e500, 0xc00014b000)
/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:227 +0x69
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).addHandler(0xc000347a10, 0x18f97c0, 0x161e500, 0x0, 0x0, 0x0, 0x18b2ce0, 0xc0007c3640, 0xc0008ade08, 0x0, ...)
/go-controller/pkg/factory/factory.go:268 +0x575
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).AddFilteredNodeHandler(...)
/go-controller/pkg/factory/factory.go:374
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchNodes(0xc000341440, 0x0, 0xc0000e7590, 0x7)
/go-controller/pkg/ovn/ovn.go:509 +0x1d5
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).Run(0xc000341440, 0x0, 0xc00009a780, 0x0, 0x0)
/go-controller/pkg/ovn/ovn.go:166 +0x52
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*HAMasterController).ConfigureAsActive(0xc000717720, 0x7fffa9418bef, 0x8, 0x28, 0xc00042f7b0)
/go-controller/pkg/ovn/ha_master.go:209 +0x96
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*HAMasterController).StartHAMasterController.func1(0x18b6da0, 0xc0002bc400)
/go-controller/pkg/ovn/ha_master.go:79 +0xd7
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/go-controller/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:200 +0xec
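Both traces in this comment bottom out in the same place: an "assignment to entry in nil map" inside (*Controller).getLbEndpoints at endpoints.go:33. In Go, reading from a nil map is legal, but writing to one panics with exactly this message, so the code at that line is filling in a map that was never allocated on this code path. Here is a tiny self-contained illustration of the failure mode and the usual fix; the type and field names are hypothetical and are not the ovn-kubernetes code:

```go
package main

import "fmt"

// lbEndpoints loosely mirrors the per-protocol bookkeeping implied by the
// stack trace; the names here are made up for illustration.
type lbEndpoints struct {
	IPs   []string
	Ports map[string]int32 // nil unless explicitly initialized
}

func main() {
	broken := lbEndpoints{IPs: []string{"10.128.2.8"}}
	// The next line would panic with "assignment to entry in nil map",
	// because broken.Ports was never allocated:
	// broken.Ports["sctp"] = 30102

	// The usual fix is to allocate the map before the first write.
	fixed := lbEndpoints{
		IPs:   []string{"10.128.2.8"},
		Ports: make(map[string]int32),
	}
	fixed.Ports["sctp"] = 30102
	fmt.Println(fixed.Ports)
	_ = broken
}
```

The trace alone does not show what the upstream change referenced later in the thread actually did; it only tells us that the map written at endpoints.go:33 was nil when a node or endpoint event arrived.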
The second one is:
time="2020-01-16T10:24:20Z" level=info msg="Setting annotations map[k8s.ovn.org/pod-networks:{\"default\":{\"ip_address\":\"10.128.2.8/23\",\"mac_address\":\"fa:e8:5b:80:02:09\",\"gateway_ip\":\"10.128.2.1\"}} ovn:{\"ip_address\":\"10.128.2.8/23\",\"mac_address\":\"fa:e8:5b:80:02:09\",\"gateway_ip\":\"10.128.2.1\"}] on pod sctptest/sctpclient2gphn"
time="2020-01-16T10:24:21Z" level=info msg="[sctptest/sctpclient2gphn] addLogicalPort took 517.143088ms"
E0116 10:24:30.357812 1 runtime.go:73] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
goroutine 87 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x148db40, 0x186a0d0)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:69 +0x7b
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51 +0x82
panic(0x148db40, 0x186a0d0)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).getLbEndpoints(0xc000aa86c0, 0xc0001857c0, 0xc0007d5290, 0xc0007d52c0)
/go-controller/pkg/ovn/endpoints.go:33 +0x421
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).AddEndpoints(0xc000aa86c0, 0xc0001857c0, 0x13ea7c0, 0x2510280)
/go-controller/pkg/ovn/endpoints.go:59 +0x109
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchEndpoints.func2(0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
/go-controller/pkg/ovn/ovn.go:424 +0x19c
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:202
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate(0xc00057ed00, 0x18b2ce0, 0xc00038fe20, 0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:236 +0x120
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).newFederatedHandler.func2.1(0xc000bdc1e0)
/go-controller/pkg/factory/factory.go:151 +0x64
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).forEachHandler(0xc000412090, 0x161d700, 0xc0001857c0, 0xc00086fcd8)
/go-controller/pkg/factory/factory.go:54 +0x16d
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).newFederatedHandler.func2(0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
/go-controller/pkg/factory/factory.go:150 +0x8c
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:202
k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc000466c00, 0xc000480a00, 0x0)
/go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:605 +0x1dd
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc00086fe38, 0x42c47f, 0xc000480a30)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:284 +0x51
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:601 +0x79
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00032ef68)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00086ff68, 0xdf8475800, 0x0, 0x42cd01, 0xc0003298c0)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
k8s.io/client-go/tools/cache.(*processorListener).run(0xc0003eac80)
/go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:599 +0x9c
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000393170, 0xc0004a6080)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
panic: assignment to entry in nil map [recovered]
panic: assignment to entry in nil map
goroutine 87 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x105
panic(0x148db40, 0x186a0d0)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).getLbEndpoints(0xc000aa86c0, 0xc0001857c0, 0xc0007d5290, 0xc0007d52c0)
/go-controller/pkg/ovn/endpoints.go:33 +0x421
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).AddEndpoints(0xc000aa86c0, 0xc0001857c0, 0x13ea7c0, 0x2510280)
/go-controller/pkg/ovn/endpoints.go:59 +0x109
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchEndpoints.func2(0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
/go-controller/pkg/ovn/ovn.go:424 +0x19c
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:202
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate(0xc00057ed00, 0x18b2ce0, 0xc00038fe20, 0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:236 +0x120
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).newFederatedHandler.func2.1(0xc000bdc1e0)
/go-controller/pkg/factory/factory.go:151 +0x64
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).forEachHandler(0xc000412090, 0x161d700, 0xc0001857c0, 0xc00086fcd8)
/go-controller/pkg/factory/factory.go:54 +0x16d
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).newFederatedHandler.func2(0x161d700, 0xc000ad6b40, 0x161d700, 0xc0001857c0)
/go-controller/pkg/factory/factory.go:150 +0x8c
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:202
k8s.io/client-go/tools/cache.(*processorListener).run.func1.1(0xc000466c00, 0xc000480a00, 0x0)
/go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:605 +0x1dd
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff(0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0, 0xc00086fe38, 0x42c47f, 0xc000480a30)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:284 +0x51
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:601 +0x79
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00032ef68)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152 +0x54
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00086ff68, 0xdf8475800, 0x0, 0x42cd01, 0xc0003298c0)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
k8s.io/client-go/tools/cache.(*processorListener).run(0xc0003eac80)
/go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:599 +0x9c
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc000393170, 0xc0004a6080)
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:69 +0x62
Please update regarding my comment above.

Hi,
This has been fixed in upstream ovn-kubernetes and should already be in the downstream openshift/ovn-kubernetes branches for 4.3/4.4/4.5. Go ahead and re-test.
-Alex

Unable to reproduce in 4.4.0-0.nightly-2020-03-05-084644.

@Alexander, how did you figure that this is fixed in 4.3? We see it failing in 4.3, and apparently there is no 4.3 clone. I can create one, but I wanted to make sure first that we expected it to be fixed and it somehow got missed.

Hi,
Yes, this issue, together with a bunch of others, should be fixed because the 4.4 and 4.3 branches should be completely aligned (we have basically rebased them one on top of the other). What are you experiencing in your tests on 4.3: the panic in #comment 4, or the 400 in #comment 1? Which tests are you actually performing, by the way (I guess the steps described in #comment 1)?
- Alex

I'll let Nikita or Marius provide the technical details.

Hi,
We are facing the same issue. Please take a look at the logs below.
We noticed that the ovnkube-master pods are crashing all the time:
> oc get pods -n openshift-ovn-kubernetes
NAME READY STATUS RESTARTS AGE
ovnkube-master-lwggs 4/4 Running 14 3d18h
ovnkube-master-q2xht 3/4 Error 14 3d18h
ovnkube-master-s29zx 3/4 CrashLoopBackOff 13 3d18h
ovnkube-node-9nt7j 3/3 Running 0 3d17h
ovnkube-node-ffvjz 3/3 Running 0 3d18h
ovnkube-node-fvqbb 3/3 Running 0 3d18h
ovnkube-node-hchb2 3/3 Running 0 3d17h
ovnkube-node-jlzjd 3/3 Running 0 3d18h
This is the event log from the pod that I'm trying to start:
Warning FailedCreatePodSandBox 64s kubelet, openshift-worker-1.qe1.kni.lab.eng.bos.redhat.com Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_checksctph5z8j_sctptest_881d34ae-fb06-4b3e-98be-a58715b599a2_0(4db0832a72074eb3ffb389132ebcb2e41636e26b2c808d2d8d2c587e6a6113e5): Multus: [sctptest/checksctph5z8j]: error adding container to network "ovn-kubernetes": delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[sctptest/checksctph5z8j] failed to get pod annotation: timed out waiting for the condition
This happened before the actual test; I cannot run the pod at all.
The idea of the test is to validate that the SCTP protocol works correctly, but as noted, it fails before the test even starts.
Hi,
Wait a minute, you're testing SCTP? That is not supported by OVN right now and is a current work in progress, see https://github.com/ovn-org/ovn-kubernetes/issues/1120 and https://issues.redhat.com/secure/RapidBoard.jspa?rapidView=5254&view=planning&selectedIssue=SDN-138
-Alex

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581