Bug 1840639 - [sriov][4.4.z] sriov config daemon pod restarted due to panic
Summary: [sriov][4.4.z] sriov config daemon pod restarted due to panic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.z
Assignee: Peng Liu
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 1840637
Blocks: 1840642
Reported: 2020-05-27 11:28 UTC by zhaozhanqi
Modified: 2020-06-17 22:27 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1840637
Clones: 1840642
Environment:
Last Closed: 2020-06-17 22:26:36 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2020:2445  0        None      None    None     2020-06-17 22:27:01 UTC

Description zhaozhanqi 2020-05-27 11:28:20 UTC
+++ This bug was initially created as a clone of Bug #1840637 +++

Description of problem:
After the sriov pods had been running for some days, the sriov config daemon pod was found to have restarted. The logs retrieved with `--previous` show:

I0526 07:19:32.218230 1168941 utils.go:282] tryGetInterfaceName(): name is ens1f0
I0526 07:19:32.218299 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:3b:00.1
I0526 07:19:32.218329 1168941 utils.go:282] tryGetInterfaceName(): name is ens1f1
I0526 07:19:32.218372 1168941 utils.go:282] tryGetInterfaceName(): name is ens1f1
I0526 07:19:32.218449 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:5e:00.0
I0526 07:19:32.218478 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f0
I0526 07:19:32.218525 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f0
I0526 07:19:32.218852 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:5e:00.2
I0526 07:19:32.218885 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f0v0
I0526 07:19:32.219055 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:5e:00.3
I0526 07:19:32.219091 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f0v1
I0526 07:19:32.219138 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:5e:00.1
I0526 07:19:32.219172 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f1
I0526 07:19:32.219217 1168941 utils.go:282] tryGetInterfaceName(): name is ens3f1
I0526 07:19:32.219306 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:60:00.0
I0526 07:19:32.219337 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f0
I0526 07:19:32.219387 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f0
I0526 07:19:32.219679 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:60:00.2
I0526 07:19:32.219710 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f0v0
I0526 07:19:32.219880 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:60:00.3
I0526 07:19:32.219909 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f0v1
I0526 07:19:32.219947 1168941 utils.go:287] getNetdevMTU(): get MTU for device 0000:60:00.1
I0526 07:19:32.219975 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f1
I0526 07:19:32.220019 1168941 utils.go:282] tryGetInterfaceName(): name is ens2f1
I0526 07:19:32.532207 1168941 daemon.go:245] nodeStateChangeHandler(): new generation is 5
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x160 pc=0x1643514]

goroutine 70 [running]:
github.com/openshift/sriov-network-operator/pkg/daemon.setNodeStateStatus(0x1abdd60, 0xc000358cf0, 0xc00004800a, 0x27, 0xc000e14c00, 0xa, 0x10, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/openshift/sriov-network-operator/pkg/daemon/writer.go:111 +0x154
github.com/openshift/sriov-network-operator/pkg/daemon.(*NodeStateStatusWriter).Run(0xc000116b40, 0xc0000ea3c0, 0xc0000ea600, 0xc0000ea5a0, 0x0)
	/go/src/github.com/openshift/sriov-network-operator/pkg/daemon/writer.go:61 +0x42f
created by main.runStartCmd
	/go/src/github.com/openshift/sriov-network-operator/cmd/sriov-network-config-daemon/start.go:98 +0x4a9
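For readers unfamiliar with this class of failure, here is a minimal Go sketch of one common way a "nil pointer dereference" panic like the one above can arise: a lookup fails, the error is ignored, and a field is then written through the nil result. This is an assumption about the general pattern only. The types and functions below (NodeState, getter, setStatusUnsafe, setStatusSafe, failingGet) are hypothetical and are not taken from writer.go, and the sketch does not claim to match the actual fix.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeState is a hypothetical stand-in for the object a daemon updates;
// it is not the operator's real type.
type NodeState struct {
	Name       string
	SyncStatus string
}

// getter is a hypothetical lookup that may fail, for example on a
// transient API error.
type getter func(name string) (*NodeState, error)

// setStatusUnsafe ignores the lookup error, so a failed lookup leaves st nil
// and the field write below panics with a nil pointer dereference.
func setStatusUnsafe(get getter, name, status string) *NodeState {
	st, _ := get(name)
	st.SyncStatus = status
	return st
}

// setStatusSafe returns an error instead of touching a nil result.
func setStatusSafe(get getter, name, status string) (*NodeState, error) {
	st, err := get(name)
	if err != nil {
		return nil, fmt.Errorf("get node state %q: %w", name, err)
	}
	if st == nil {
		return nil, fmt.Errorf("node state %q is nil", name)
	}
	st.SyncStatus = status
	return st, nil
}

// failingGet simulates a lookup that fails.
func failingGet(string) (*NodeState, error) {
	return nil, errors.New("temporarily unavailable")
}

func main() {
	// The guarded variant reports the failure instead of crashing the process.
	if _, err := setStatusSafe(failingGet, "worker-0", "Succeeded"); err != nil {
		fmt.Println("handled:", err)
	}
}
```

With a guard of this kind the caller can log the error and retry on the next cycle, rather than the process exiting and the pod being restarted.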

Version-Release number of selected component (if applicable):
4.4.0-202005221118

How reproducible:
not sure

Steps to Reproduce:
1. oc logs sriov-network-config-daemon-7mlhz --previous

Actual results:

oc get pod sriov-network-config-daemon-7mlhz
NAME                                READY   STATUS    RESTARTS   AGE
sriov-network-config-daemon-7mlhz   1/1     Running   6          2d1h



Expected results:


Additional info:

Comment 1 Peng Liu 2020-06-01 03:51:08 UTC
Also fixed by https://github.com/openshift/sriov-network-operator/pull/225

Comment 4 zhaozhanqi 2020-06-08 05:55:04 UTC
Verified this bug on 4.4.0-202006061254

Comment 6 errata-xmlrpc 2020-06-17 22:26:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2445

