Bug 1840642 - [sriov][4.3.z] sriov config daemon pod restarted due to panic
Summary: [sriov][4.3.z] sriov config daemon pod restarted due to panic
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 1840637 1840639
Blocks:
 
Reported: 2020-05-27 11:30 UTC by zhaozhanqi
Modified: 2020-08-12 13:12 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1840639
Environment:
Last Closed: 2020-08-12 13:12:01 UTC
Target Upstream Version:
Embargoed:



Description zhaozhanqi 2020-05-27 11:30:13 UTC
+++ This bug was initially created as a clone of Bug #1840639 +++

+++ This bug was initially created as a clone of Bug #1840637 +++

Description of problem:
After the sriov pods had been running for some days, I found that the sriov config daemon pod had restarted. Checking the logs with `--previous` shows:

I0526 07:19:32.218230 1168941 utils.go:282] tryGetInterfaceName(): name is ens1f0
I0526 07:19:32.218299 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:3b:00.1
I0526 07:19:32.218329 1168941 utils.go:282] tryGetInterfaceName(): name is ens1f1
I0526 07:19:32.218372 1168941 utils.go:282] tryGetInterfaceName(): name is ens1f1
I0526 07:19:32.218449 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:5e:00.0
I0526 07:19:32.218478 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f0
I0526 07:19:32.218525 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f0
I0526 07:19:32.218852 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:5e:00.2
I0526 07:19:32.218885 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f0v0
I0526 07:19:32.219055 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:5e:00.3
I0526 07:19:32.219091 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f0v1
I0526 07:19:32.219138 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:5e:00.1
I0526 07:19:32.219172 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f1
I0526 07:19:32.219217 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f1
I0526 07:19:32.219306 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:60:00.0
I0526 07:19:32.219337 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f0
I0526 07:19:32.219387 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f0
I0526 07:19:32.219679 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:60:00.2
I0526 07:19:32.219710 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f0v0
I0526 07:19:32.219880 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:60:00.3
I0526 07:19:32.219909 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f0v1
I0526 07:19:32.219947 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:60:00.1
I0526 07:19:32.219975 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f1
I0526 07:19:32.220019 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f1
I0526 07:19:32.532207 1168941 daemon.go:245] nodeStateChangeHandler(): new generation is 5
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x160 pc=0x1643514]

goroutine 70 [running]:
github.com/openshift/sriov-network-operator/pkg/daemon.setNodeStateStatus(0x1abdd60, 0xc000358cf0, 0xc00004800a, 0x27, 0xc000e14c00, 0xa, 0x10, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/openshift/sriov-network-operator/pkg/daemon/writer.go:111 +0x154
github.com/openshift/sriov-network-operator/pkg/daemon.(*NodeStateStatusWriter).Run(0xc000116b40, 0xc0000ea3c0, 0xc0000ea600, 0xc0000ea5a0, 0x0)
	/go/src/github.com/openshift/sriov-network-operator/pkg/daemon/writer.go:61 +0x42f
created by main.runStartCmd
	/go/src/github.com/openshift/sriov-network-operator/cmd/sriov-network-config-daemon/start.go:98 +0x4a9
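Note on the trace above: the small fault address (addr=0x160) is consistent with a field being read through a nil pointer inside setNodeStateStatus. The actual code at writer.go:111 is not shown in this report, so the following is only a minimal sketch of the general pattern, assuming the failure comes from using a lookup result without checking it. The types NodeState and NodeStatus and the helpers getNodeState and setSyncStatus are hypothetical stand-ins, not the operator's real API.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeStatus and NodeState are hypothetical stand-ins for illustration,
// not the operator's real types.
type NodeStatus struct {
	SyncStatus string
}

type NodeState struct {
	Name   string
	Status NodeStatus
}

// getNodeState is a hypothetical lookup that can fail and return a nil
// pointer, for example when the object cannot be fetched.
func getNodeState(store map[string]*NodeState, name string) (*NodeState, error) {
	ns, ok := store[name]
	if !ok {
		return nil, errors.New("node state not found: " + name)
	}
	return ns, nil
}

// setSyncStatus returns an error instead of dereferencing a nil result,
// so a failed lookup cannot crash the whole process.
func setSyncStatus(store map[string]*NodeState, name, status string) error {
	ns, err := getNodeState(store, name)
	if err != nil {
		return fmt.Errorf("setSyncStatus(): %w", err)
	}
	if ns == nil {
		return errors.New("setSyncStatus(): nil node state for " + name)
	}
	ns.Status.SyncStatus = status
	return nil
}

func main() {
	store := map[string]*NodeState{"worker-0": {Name: "worker-0"}}
	fmt.Println(setSyncStatus(store, "worker-0", "Succeeded"))
	fmt.Println(setSyncStatus(store, "worker-1", "Succeeded"))
}
```

With a guard like this the caller can log the error and retry on the next sync, rather than the goroutine panicking and the pod being restarted.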

Version-Release number of selected component (if applicable):
4.4.0-202005221118

How reproducible:
not sure

Steps to Reproduce:
1. oc logs sriov-network-config-daemon-7mlhz --previous

Actual results:

oc get pod sriov-network-config-daemon-7mlhz
NAME                                READY   STATUS    RESTARTS   AGE
sriov-network-config-daemon-7mlhz   1/1     Running   6          2d1h



Expected results:


Additional info:

Comment 1 Peng Liu 2020-08-12 13:12:01 UTC
We don't think we have any SR-IOV users on 4.3.

