+++ This bug was initially created as a clone of Bug #2015481 +++

Description of problem:

After a successful deployment of OCP on OSP (Shift-on-Stack), our Telco testing requires the sriov-network-operator to be installed and configured. Immediately after the initial operator installation, I found the sriov-network-config-daemon pods in "CrashLoopBackOff" status, and restarting them did not fix the underlying problem:

[cloud-user@installer-host ~]$ oc get -n openshift-sriov-network-operator all
NAME                                        READY   STATUS             RESTARTS   AGE
pod/network-resources-injector-7zvb6        1/1     Running            0          3m22s
pod/network-resources-injector-llx8q        1/1     Running            0          3m22s
pod/network-resources-injector-swzxk        1/1     Running            0          3m22s
pod/sriov-network-config-daemon-5jd4m       0/1     CrashLoopBackOff   4          3m22s
pod/sriov-network-config-daemon-mwzmz       0/1     CrashLoopBackOff   4          3m22s
pod/sriov-network-operator-6947d96c-lmcxn   1/1     Running            0          3m37s

NAME                                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/network-resources-injector-service   ClusterIP   172.30.179.150   <none>        443/TCP   3m22s

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                 AGE
daemonset.apps/network-resources-injector    3         3         3       3            3           beta.kubernetes.io/os=linux,node-role.kubernetes.io/master=   3m22s
daemonset.apps/sriov-network-config-daemon   2         2         0       2            0           beta.kubernetes.io/os=linux,node-role.kubernetes.io/worker=   3m22s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/sriov-network-operator   1/1     1            1           3m37s

Their logs show the same panic in both pods:

[cloud-user@installer-host ~]$ oc logs pod/sriov-network-config-daemon-5jd4m -n openshift-sriov-network-operator
I1018 18:15:11.524256 190307 start.go:107] overriding kubernetes api to https://api-int.ostest.shiftstack.com:6443
I1018 18:15:11.525581 190307 start.go:138] starting node writer
I1018 18:15:11.534127 190307 start.go:158] Running on platform: Virtual/Openstack
I1018 18:15:11.534142 190307 writer.go:44] Run(): start writer
I1018 18:15:11.534146 190307 writer.go:47] Run(): once
I1018 18:15:11.560971 190307 utils.go:598] getLinkType(): Device 0000:00:03.0
I1018 18:15:11.561041 190307 utils.go:598] getLinkType(): Device 0000:00:04.0
I1018 18:15:11.561098 190307 utils.go:598] getLinkType(): Device 0000:00:05.0
I1018 18:15:11.566328 190307 writer.go:132] setNodeStateStatus(): syncStatus: , lastSyncError:
I1018 18:15:11.571454 190307 writer.go:170] writeCheckpointFile(): try to decode the checkpoint file
I1018 18:15:11.571553 190307 start.go:164] Starting SriovNetworkConfigDaemon
I1018 18:15:11.571572 190307 writer.go:44] Run(): start writer
I1018 18:15:11.571579 190307 daemon.go:257] Run(): start daemon
E1018 18:15:11.581359 190307 daemon.go:951] tryEnableRdma(): fail to enable rdma exit status 1:
I1018 18:15:11.587662 190307 daemon.go:442] Set log verbose level to: 2
I1018 18:15:16.686993 190307 daemon.go:319] Starting workers
I1018 18:15:16.687012 190307 daemon.go:322] Started workers
I1018 18:15:16.687027 190307 daemon.go:362] worker queue size: 1
I1018 18:15:16.687032 190307 daemon.go:364] get item: 1
I1018 18:15:16.687037 190307 daemon.go:454] nodeStateSyncHandler(): new generation is 1
I1018 18:15:16.689510 190307 daemon.go:689] loadVendorPlugins(): try to load plugin virtual_plugin
I1018 18:15:16.689523 190307 plugin.go:39] loadPlugin(): load plugin from /plugins/virtual_plugin.so
I1018 18:15:16.689576 190307 writer.go:61] Run(): refresh trigger
I1018 18:15:16.689584 190307 writer.go:80] pollNicStatus()
I1018 18:15:16.689588 190307 utils_virtual.go:158] DiscoverSriovDevicesVirtual
I1018 18:15:16.708806 190307 virtual_plugin.go:52] virtual-plugin OnNodeStateAdd()
I1018 18:15:16.708855 190307 daemon.go:509] nodeStateSyncHandler(): plugin virtual_plugin: reqDrain false, reqReboot false
I1018 18:15:16.708868 190307 daemon.go:513] nodeStateSyncHandler(): reqDrain false, reqReboot false disableDrain false
I1018 18:15:16.708875 190307 virtual_plugin.go:84] virtual-plugin Apply(): desiredState={186996 []}
I1018 18:15:16.718493 190307 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:03.0
I1018 18:15:16.718606 190307 utils.go:598] getLinkType(): Device 0000:00:03.0
I1018 18:15:16.718720 190307 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:04.0
I1018 18:15:16.718797 190307 utils.go:598] getLinkType(): Device 0000:00:04.0
I1018 18:15:16.718916 190307 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:05.0
I1018 18:15:16.718991 190307 utils.go:598] getLinkType(): Device 0000:00:05.0
I1018 18:15:16.724615 190307 writer.go:132] setNodeStateStatus(): syncStatus: InProgress, lastSyncError:
E1018 18:15:16.730394 190307 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 102 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1eb0e00, 0x2f17450)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x1eb0e00, 0x2f17450)
	/usr/lib/golang/src/runtime/panic.go:965 +0x1b9
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).nodeStateSyncHandler(0xc001440270, 0x1, 0xc0005ea0d0, 0xc001484630)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:548 +0x101b
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem.func1(0xc001440270, 0x1e3c2a0, 0x2f5e708, 0x0, 0x0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:385 +0xdf
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem(0xc001440270, 0x203000)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:401 +0x169
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).runWorker(...)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:346
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00073e080)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00073e080, 0x22bc000, 0xc0001a69f0, 0x1, 0xc00010e360)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00073e080, 0x3b9aca00, 0x0, 0xc0004c8d01, 0xc00010e360)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc00073e080, 0x3b9aca00, 0xc00010e360)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).Run
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:321 +0xac5
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1d375bb]

goroutine 102 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x109
panic(0x1eb0e00, 0x2f17450)
	/usr/lib/golang/src/runtime/panic.go:965 +0x1b9
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).nodeStateSyncHandler(0xc001440270, 0x1, 0xc0005ea0d0, 0xc001484630)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:548 +0x101b
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem.func1(0xc001440270, 0x1e3c2a0, 0x2f5e708, 0x0, 0x0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:385 +0xdf
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem(0xc001440270, 0x203000)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:401 +0x169
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).runWorker(...)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:346
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00073e080)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00073e080, 0x22bc000, 0xc0001a69f0, 0x1, 0xc00010e360)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00073e080, 0x3b9aca00, 0x0, 0xc0004c8d01, 0xc00010e360)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc00073e080, 0x3b9aca00, 0xc00010e360)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).Run
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:321 +0xac5

[cloud-user@installer-host ~]$ oc logs pod/sriov-network-config-daemon-mwzmz -n openshift-sriov-network-operator
I1018 18:16:50.082468 424839 start.go:107] overriding kubernetes api to https://api-int.ostest.shiftstack.com:6443
I1018 18:16:50.084404 424839 start.go:138] starting node writer
I1018 18:16:50.092922 424839 start.go:158] Running on platform: Virtual/Openstack
I1018 18:16:50.093055 424839 writer.go:44] Run(): start writer
I1018 18:16:50.093125 424839 writer.go:47] Run(): once
I1018 18:16:50.122076 424839 utils.go:598] getLinkType(): Device 0000:00:03.0
I1018 18:16:50.122282 424839 utils.go:598] getLinkType(): Device 0000:00:05.0
I1018 18:16:50.122887 424839 utils.go:598] getLinkType(): Device 0000:00:06.0
I1018 18:16:50.125775 424839 writer.go:132] setNodeStateStatus(): syncStatus: , lastSyncError:
I1018 18:16:50.131089 424839 writer.go:170] writeCheckpointFile(): try to decode the checkpoint file
I1018 18:16:50.131332 424839 start.go:164] Starting SriovNetworkConfigDaemon
I1018 18:16:50.131349 424839 writer.go:44] Run(): start writer
I1018 18:16:50.131496 424839 daemon.go:257] Run(): start daemon
E1018 18:16:50.142113 424839 daemon.go:951] tryEnableRdma(): fail to enable rdma exit status 1:
I1018 18:16:50.147463 424839 daemon.go:442] Set log verbose level to: 2
I1018 18:16:55.247070 424839 daemon.go:319] Starting workers
I1018 18:16:55.247230 424839 daemon.go:322] Started workers
I1018 18:16:55.247254 424839 daemon.go:362] worker queue size: 1
I1018 18:16:55.247382 424839 daemon.go:364] get item: 1
I1018 18:16:55.247449 424839 daemon.go:454] nodeStateSyncHandler(): new generation is 1
I1018 18:16:55.250544 424839 daemon.go:689] loadVendorPlugins(): try to load plugin virtual_plugin
I1018 18:16:55.250556 424839 plugin.go:39] loadPlugin(): load plugin from /plugins/virtual_plugin.so
I1018 18:16:55.250558 424839 writer.go:61] Run(): refresh trigger
I1018 18:16:55.250566 424839 writer.go:80] pollNicStatus()
I1018 18:16:55.250579 424839 utils_virtual.go:158] DiscoverSriovDevicesVirtual
I1018 18:16:55.270524 424839 virtual_plugin.go:52] virtual-plugin OnNodeStateAdd()
I1018 18:16:55.270568 424839 daemon.go:509] nodeStateSyncHandler(): plugin virtual_plugin: reqDrain false, reqReboot false
I1018 18:16:55.270577 424839 daemon.go:513] nodeStateSyncHandler(): reqDrain false, reqReboot false disableDrain false
I1018 18:16:55.270583 424839 virtual_plugin.go:84] virtual-plugin Apply(): desiredState={186996 []}
I1018 18:16:55.279729 424839 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:03.0
I1018 18:16:55.279757 424839 utils.go:598] getLinkType(): Device 0000:00:03.0
I1018 18:16:55.279811 424839 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:05.0
I1018 18:16:55.279853 424839 utils.go:404] tryGetInterfaceName(): name is ens5
I1018 18:16:55.279899 424839 utils.go:404] tryGetInterfaceName(): name is ens5
I1018 18:16:55.279902 424839 utils.go:430] getNetDevMac(): get Mac for device ens5
I1018 18:16:55.279923 424839 utils.go:442] getNetDevLinkSpeed(): get LinkSpeed for device ens5
I1018 18:16:55.279939 424839 utils.go:598] getLinkType(): Device 0000:00:05.0
I1018 18:16:55.280070 424839 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:06.0
I1018 18:16:55.280112 424839 utils.go:404] tryGetInterfaceName(): name is ens6
I1018 18:16:55.280154 424839 utils.go:404] tryGetInterfaceName(): name is ens6
I1018 18:16:55.280157 424839 utils.go:430] getNetDevMac(): get Mac for device ens6
I1018 18:16:55.280179 424839 utils.go:442] getNetDevLinkSpeed(): get LinkSpeed for device ens6
I1018 18:16:55.280198 424839 utils.go:598] getLinkType(): Device 0000:00:06.0
I1018 18:16:55.282415 424839 writer.go:132] setNodeStateStatus(): syncStatus: InProgress, lastSyncError:
E1018 18:16:55.291837 424839 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 123 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1eb0e00, 0x2f17450)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa6
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x1eb0e00, 0x2f17450)
	/usr/lib/golang/src/runtime/panic.go:965 +0x1b9
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).nodeStateSyncHandler(0xc0014984e0, 0x1, 0xc000b84000, 0xc000314000)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:548 +0x101b
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem.func1(0xc0014984e0, 0x1e3c2a0, 0x2f5e708, 0x0, 0x0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:385 +0xdf
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem(0xc0014984e0, 0x203000)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:401 +0x169
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).runWorker(...)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:346
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc001000590)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001000590, 0x22bc000, 0xc000c17cb0, 0x1, 0xc00010e540)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001000590, 0x3b9aca00, 0x0, 0x217b801, 0xc00010e540)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc001000590, 0x3b9aca00, 0xc00010e540)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).Run
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:321 +0xac5
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1d375bb]

goroutine 123 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x109
panic(0x1eb0e00, 0x2f17450)
	/usr/lib/golang/src/runtime/panic.go:965 +0x1b9
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).nodeStateSyncHandler(0xc0014984e0, 0x1, 0xc000b84000, 0xc000314000)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:548 +0x101b
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem.func1(0xc0014984e0, 0x1e3c2a0, 0x2f5e708, 0x0, 0x0)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:385 +0xdf
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem(0xc0014984e0, 0x203000)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:401 +0x169
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).runWorker(...)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:346
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc001000590)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001000590, 0x22bc000, 0xc000c17cb0, 0x1, 0xc00010e540)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001000590, 0x3b9aca00, 0x0, 0x217b801, 0xc00010e540)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc001000590, 0x3b9aca00, 0xc00010e540)
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).Run
	/go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:321 +0xac5

Version-Release number of selected component (if applicable):
Cluster version is 4.8.0-0.nightly-2021-10-16-024756

Additional info:
The actual bug and the proposed fix can be tracked here: https://github.com/openshift/sriov-network-operator/commit/1d954a5304283f62808abbe13c55c6dd7b2b4083#diff-a53b7b593d3d778e62eaeeafa40088656f9212bfa2c2b7991df15fa78e60b0f0

--- Additional comment from Aaron Smith on 2021-10-19 19:18:59 UTC ---

The issue affects both the 4.8 and 4.9 releases. I have verified that an upstream patch by @pliu (https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/191/files) fixes the issue on the 4.9 release branch.
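Note for anyone triaging a similar crash: each daemon pod prints its stack trace twice because the worker runs under wait.Until from k8s.io/apimachinery (see the wait.BackoffUntil.func1 frame), and that loop wraps each call with a deferred runtime.HandleCrash(). HandleCrash recovers the panic, logs it ("Observed a panic"), and by default re-panics, which is why the second trace is marked "[recovered]". The addr=0x18 in the SIGSEGV line is consistent with reading a field at offset 0x18 through a nil struct pointer. Below is a minimal, standard-library-only Go sketch of that recover/log/re-panic pattern plus a nil guard that returns an error instead of panicking. It is NOT the patch from the linked commit or PR #191: pluginStatus, syncOnce, runItem, crashHandler and recoverMessage are hypothetical names, and this report does not say which value was nil at daemon.go:548.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// pluginStatus is a HYPOTHETICAL stand-in for the value that was nil at
// daemon.go:548; the real field is not identified in this report.
type pluginStatus struct {
	reqReboot bool
}

// errNilStatus is what the guarded path returns instead of panicking.
var errNilStatus = errors.New("plugin status is nil")

// crashHandler mimics the shape of apimachinery's runtime.HandleCrash with
// its default ReallyCrash=true: recover, log, then re-panic. If nothing else
// recovers, Go reports the re-raised panic as "panic: ... [recovered]".
func crashHandler() {
	if r := recover(); r != nil {
		log.Printf("Observed a panic: %v", r)
		panic(r)
	}
}

// syncOnce reads a field through a possibly-nil pointer. With guard=false
// and s == nil this is a nil pointer dereference, i.e. a runtime panic.
func syncOnce(s *pluginStatus, guard bool) (bool, error) {
	if guard && s == nil {
		return false, errNilStatus
	}
	return s.reqReboot, nil
}

// runItem handles one work item behind the deferred crash handler, similar
// to how wait.Until wraps each worker iteration.
func runItem(s *pluginStatus, guard bool) (bool, error) {
	defer crashHandler()
	return syncOnce(s, guard)
}

// recoverMessage runs f and returns the text of any panic that escapes it
// ("" if f returns normally). It exists only so this demo keeps running.
func recoverMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	// Unguarded nil: panics, is logged by crashHandler, then re-panics.
	fmt.Println(recoverMessage(func() { runItem(nil, false) }))

	// Guarded nil: returns an error and the worker can keep running.
	_, err := runItem(nil, true)
	fmt.Println(err)
}
```

Running it prints the recovered runtime error for the unguarded call and "plugin status is nil" for the guarded one; the "Observed a panic" line from crashHandler goes to stderr via the log package.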
Why hasn't this moved to ON_QA and been verified? It is blocking this backport from merging: https://github.com/openshift/sriov-network-operator/pull/598
*** This bug has been marked as a duplicate of bug 2028256 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days