Bug 2028246

Summary: sriov-network-config-daemon in crashloop: "invalid memory address or nil pointer dereference"
Product: OpenShift Container Platform
Component: Networking
Networking sub component: SR-IOV
Version: 4.10
Reporter: Emilien Macchi <emacchi>
Assignee: zenghui.shi <zshi>
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2021-12-01 19:48:03 UTC
Attachments: must-gather (flags: none)

Description Emilien Macchi 2021-12-01 19:21:17 UTC
Created attachment 1844390
must-gather

Description of problem:

After deploying OpenShift on OpenStack, I tried to schedule a Pod attached to an SR-IOV virtual function (VF).

Version-Release number of selected component (if applicable):
OCP 4.10.0-0.nightly-2021-11-29-142540
sriov-network-operator.4.9.0-202111151318

How reproducible:
Follow the OpenShift documentation for configuring SR-IOV up to the step of scheduling a Pod.


Steps to Reproduce:
1. Prepare the worker for SR-IOV: enable config-drive and NOIOMMU, apply the SR-IOV-capable node label, and deploy and configure the SR-IOV Network Operator.
2. Schedule a Pod that uses a VF.
3. Watch the sriov-network-config-daemon container.

Actual results:
1. The sriov-network-config-daemon container is in a crash loop:

I1201 18:56:51.489671  290651 utils.go:513] LoadKernelModule(): try to load kernel module vfio_pci
I1201 18:56:51.504231  290651 utils_virtual.go:235] SyncNodeStateVirtual(): no need update interface 0000:00:06.0
I1201 18:56:51.512122  290651 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:03.0
I1201 18:56:51.512187  290651 utils.go:598] getLinkType(): Device 0000:00:03.0
I1201 18:56:51.512364  290651 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:04.0
I1201 18:56:51.512451  290651 utils.go:598] getLinkType(): Device 0000:00:04.0
I1201 18:56:51.512737  290651 utils.go:409] getNetdevMTU(): get MTU for device 0000:00:06.0
I1201 18:56:51.512796  290651 utils.go:598] getLinkType(): Device 0000:00:06.0
I1201 18:56:51.543044  290651 writer.go:132] setNodeStateStatus(): syncStatus: InProgress, lastSyncError:
E1201 18:56:51.550941  290651 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 113 :
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x203d9c0, 0x3186470)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
panic(0x203d9c0, 0x3186470)
    /usr/lib/golang/src/runtime/panic.go:965 +0x1b9
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).nodeStateSyncHandler(0xc00059d040, 0x4, 0xc000617380, 0xc000270000)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:508 +0xef1
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem.func1(0xc00059d040, 0x1fc4380, 0x31d28e0, 0x0, 0x0)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:360 +0xdf
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).processNextWorkItem(0xc00059d040, 0x203000)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:376 +0x169
github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).runWorker(...)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:321
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0012e0f00)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0012e0f00, 0x2482da0, 0xc0012eeb10, 0x1, 0xc0005c4060)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0x9b
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0012e0f00, 0x3b9aca00, 0x0, 0x1, 0xc0005c4060)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(0xc0012e0f00, 0x3b9aca00, 0xc0005c4060)
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x4d
created by github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon.(*Daemon).Run
    /go/src/github.com/k8snetworkplumbingwg/sriov-network-operator/pkg/daemon/daemon.go:296 +0xb05
panic: runtime error: invalid memory address or nil pointer dereference
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1eb3811]

2. The Pod that should use the VF is never scheduled.


Expected results:
No errors; the Pod is scheduled and uses the VF.

Additional info:

See attachments for logs.
Note: I used https://access.redhat.com/solutions/5496071 to gather logs.

Comment 5 Emilien Macchi 2021-12-01 19:48:03 UTC

*** This bug has been marked as a duplicate of bug 2015481 ***