Description of problem:
The ovnkube-node pods are in CrashLoopBackOff after the SDN to OVN migration.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-14-165231

How reproducible:

Steps to Reproduce:
1. oc annotate Network.operator.openshift.io cluster networkoperator.openshift.io/network-migration=""

2. oc patch MachineConfigPool master --type=merge --patch '{"spec":{"paused":true}}'
   machineconfigpool.machineconfiguration.openshift.io/master patched
   oc patch MachineConfigPool worker --type=merge --patch '{"spec":{"paused":true}}'
   machineconfigpool.machineconfiguration.openshift.io/worker patched

3. Wait until the multus DaemonSet pods in the openshift-multus namespace are recreated.

4. Manually reboot all the nodes from the cloud portal.

5. oc patch MachineConfigPool master --type='merge' --patch "{\"spec\":{\"paused\":false}}"
   machineconfigpool.machineconfiguration.openshift.io/master patched
   oc patch MachineConfigPool worker --type='merge' --patch "{\"spec\":{\"paused\":false}}"
   machineconfigpool.machineconfiguration.openshift.io/worker patched

Actual results:
MCO did not update the node config. Then check the pods in openshift-ovn-kubernetes:

oc get pods -n openshift-ovn-kubernetes -o wide
NAME                   READY   STATUS             RESTARTS   AGE   IP          NODE                                           NOMINATED NODE   READINESS GATES
ovnkube-master-27x47   6/6     Running            0          53m   10.0.0.7    huirwang-azure-lmnbl-master-0                  <none>           <none>
ovnkube-master-hvgj5   6/6     Running            0          53m   10.0.0.8    huirwang-azure-lmnbl-master-1                  <none>           <none>
ovnkube-master-n4cxt   6/6     Running            0          53m   10.0.0.6    huirwang-azure-lmnbl-master-2                  <none>           <none>
ovnkube-node-5hjzg     2/3     CrashLoopBackOff   13         53m   10.0.32.5   huirwang-azure-lmnbl-worker-centralus2-jccbd   <none>           <none>
ovnkube-node-9fz5d     2/3     CrashLoopBackOff   13         53m   10.0.32.4   huirwang-azure-lmnbl-worker-centralus1-jzzbm   <none>           <none>
ovnkube-node-k5jdn     2/3     CrashLoopBackOff   13         53m   10.0.32.6   huirwang-azure-lmnbl-worker-centralus3-k6rgf   <none>           <none>
ovnkube-node-q2c5h     2/3     CrashLoopBackOff   13         53m   10.0.0.8    huirwang-azure-lmnbl-master-1                  <none>           <none>
ovnkube-node-vvdxs     2/3     CrashLoopBackOff   13         53m   10.0.0.7    huirwang-azure-lmnbl-master-0                  <none>           <none>
ovnkube-node-wsz5q     2/3     CrashLoopBackOff   13         53m   10.0.0.6    huirwang-azure-lmnbl-master-2                  <none>           <none>
ovs-node-6hsfc         1/1     Running            0          54m   10.0.0.8    huirwang-azure-lmnbl-master-1                  <none>           <none>
ovs-node-f8d22         1/1     Running            0          54m   10.0.32.5   huirwang-azure-lmnbl-worker-centralus2-jccbd   <none>           <none>
ovs-node-g6lp5         1/1     Running            0          54m   10.0.0.6    huirwang-azure-lmnbl-master-2                  <none>           <none>
ovs-node-k4jbj         1/1     Running            0          54m   10.0.32.6   huirwang-azure-lmnbl-worker-centralus3-k6rgf   <none>           <none>
ovs-node-ttslv         1/1     Running            0          54m   10.0.32.4   huirwang-azure-lmnbl-worker-centralus1-jzzbm   <none>           <none>
ovs-node-wf2jm         1/1     Running            0          54m   10.0.0.7    huirwang-azure-lmnbl-master-0                  <none>           <none>

oc describe pod ovnkube-node-5hjzg -n openshift-ovn-kubernetes
..........
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   oller/pkg/node/startup-waiter.go:44 +0x94
created by github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*startupWaiter).Wait
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:42 +0xd4
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1737744]

goroutine 182 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x10c
panic(0x1948160, 0x288e420)
    /usr/lib/golang/src/runtime/panic.go:969 +0x175
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*startupWaiter).Wait.func1.1(0x414f9b, 0xc000298460, 0xc0005667b0)
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:45 +0x24
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0xc000566798, 0x1339b00, 0x0, 0x0)
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:211 +0x69
k8s.io/apimachinery/pkg/util/wait.pollImmediateInternal(0xc00059c6a0, 0xc0005eff98, 0xc00059c6a0, 0xc0003b6540)
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:445 +0x2f
k8s.io/apimachinery/pkg/util/wait.PollImmediate(0x1dcd6500, 0x45d964b800, 0xc000566798, 0xc0005667b0, 0x1)
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:441 +0x4d
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*startupWaiter).Wait.func1(0xc0003a3280, 0xc0003e41e0, 0xc00038f150)
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:44 +0x94
created by github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*startupWaiter).Wait
    /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/startup-waiter.go:42 +0xd4

      Exit Code:  2
      Started:    Wed, 16 Dec 2020 14:54:28 +0800
      Finished:   Wed, 16 Dec 2020 14:54:31 +0800
    Ready:        False

Rebooting all the nodes again doesn't help.

Expected results:
SDN migrated to OVN successfully.

Additional info:
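From the backtrace, the panic is a nil pointer dereference inside the condition function that the startup waiter passes to wait.PollImmediate (startup-waiter.go:44-45); runtime.HandleCrash re-raises the panic, the ovnkube-node container exits, and kubelet restarts it. Below is a minimal Go sketch of that failure pattern, assuming a hypothetical probe() helper and backendStatus type for illustration only; it is not the actual ovn-kubernetes startup-waiter code.

// Illustrative sketch only: backendStatus and probe() are hypothetical
// stand-ins, not the real ovn-kubernetes startup-waiter implementation.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// backendStatus stands in for whatever the startup waiter polls for
// (for example, readiness of the local OVN components after a reboot).
type backendStatus struct {
	Ready bool
}

// probe simulates a check that fails right after the node reboot and
// therefore returns no status at all, only an error.
func probe() (*backendStatus, error) {
	return nil, fmt.Errorf("connection refused")
}

func main() {
	// Poll every 500ms for up to 30s, the same shape as the PollImmediate
	// call visible in the stack trace above.
	err := wait.PollImmediate(500*time.Millisecond, 30*time.Second, func() (bool, error) {
		status, _ := probe()
		// Bug pattern: the error is ignored and status is dereferenced even
		// when it is nil, producing "invalid memory address or nil pointer
		// dereference"; HandleCrash then re-panics and the process dies.
		return status.Ready, nil
	})
	if err != nil {
		fmt.Println("startup wait failed:", err)
	}
}

A program with this pattern dies with the same class of panic (SIGSEGV under runConditionWithCrashProtection/HandleCrash) and a non-zero exit, which is consistent with the Exit Code 2 and the CrashLoopBackOff shown above.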
Created attachment 1740116: ovnkube-node.log
*** Bug 1908076 has been marked as a duplicate of this bug. ***
*** Bug 1909187 has been marked as a duplicate of this bug. ***
On Power, we don't see the ovnkube pods crash-looping anymore, so we can close this bug. However, we noticed that one of the nodes remains in the SchedulingDisabled state because some pod evictions fail. It could be a new issue; we will check further.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633