Bug 1962634

Summary: fail to get interface name for ... GetNetName(): no net directory under pci device during upgrade from OCP 4.5.16 to 4.6.17
Product: OpenShift Container Platform Reporter: Andreas Karis <akaris>
Component: Networking Assignee: Federico Paolinelli <fpaoline>
Networking sub component: SR-IOV QA Contact: zhaozhanqi <zzhao>
Status: CLOSED INSUFFICIENT_DATA Docs Contact:
Severity: medium    
Priority: medium CC: bhershbe, dosmith, fpaoline, gdiotte, snalawad, yjoseph
Version: 4.6   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-08 17:43:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andreas Karis 2021-05-20 12:36:58 UTC
Description of problem:
The sriov-network-config-daemon repeatedly logs "fail to get interface name for ... GetNetName(): no net directory under pci device" during an upgrade from OCP 4.5.16 to 4.6.17.

The purpose of this BZ is to identify whether this is a known issue or one that warrants attention, and to track it through to resolution.

~~~
$ oc logs -n openshift-sriov-network-operator sriov-network-config-daemon-plkp6
W0511 17:52:18.398178   33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:19.398672   33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:20.398961   33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:21.399302   33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:22.399551   33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:23.399774   33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:24.989896   33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0f.3: GetNetName(): no net directory under pci device 0000:12:0f.3: "lstat /sys/bus/pci/devices/0000:12:0f.3/net: no such file or directory"
~~~

The above error messages appeared after the node was rebooted, and the `virt-launcher` pods on that node were found not to be running.
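To check which VFs are affected, a minimal sketch like the following may help. It walks the PCI devices in sysfs and prints every VF (a device with a `physfn` link) that has no `net` directory, together with the driver it is bound to. The function name `list_vfs_without_netdev` and the optional root argument are hypothetical, added only for illustration. VFs bound to a userspace driver such as vfio-pci are expected to have no `net` directory, so the driver column matters when reading the output.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): list SR-IOV VFs that have no net/ directory.
# Usage: list_vfs_without_netdev [devices_root]
# devices_root defaults to /sys/bus/pci/devices; it is overridable for testing.
list_vfs_without_netdev() {
    local root="${1:-/sys/bus/pci/devices}"
    local dev driver
    for dev in "$root"/*; do
        # Only VFs carry a physfn link back to their parent PF.
        [ -e "$dev/physfn" ] || continue
        # A VF with a kernel netdev exposes it under net/; skip those.
        [ -d "$dev/net" ] && continue
        if [ -L "$dev/driver" ]; then
            driver="$(basename "$(readlink "$dev/driver")")"
        else
            driver="unbound"
        fi
        echo "$(basename "$dev") $driver"
    done
}
```

Running `list_vfs_without_netdev` on the node prints one `<pci address> <driver>` pair per line for each VF that lacks a netdev.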

This issue was resolved by unbinding and re-binding the SR-IOV VFs on that node, and then deleting its sriov-network-config-daemon pod:

Unbind and re-bind the node's SR-IOV VFs:
~~~
ssh core@node-06
sudo -i
cd /sys/bus/pci/devices/
for x in $(ls -1 | grep 0000\:12\:0\*); do echo $x > /sys/bus/pci/drivers/iavf/unbind; done
for x in $(ls -1 | grep 0000\:12\:0\*); do echo $x > /sys/bus/pci/drivers/iavf/bind; done
~~~
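Note that the loops above touch every device whose address matches the grep pattern. A narrower variant, sketched below, rebinds only devices that are currently bound to the given driver and have no `net` directory. This is an untested illustration, not the procedure that was run on the affected node. It assumes the affected VFs are still bound to the driver (iavf here), and the function name `rebind_vfs_without_netdev` and the optional root argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): rebind only the devices of one driver
# that have no net/ directory. Must run as root on the node.
# Usage: rebind_vfs_without_netdev <driver> [pci_root]
# pci_root defaults to /sys/bus/pci; it is overridable for testing.
rebind_vfs_without_netdev() {
    local driver="$1"
    local root="${2:-/sys/bus/pci}"
    local link addr
    # Bound devices appear in the driver directory as links named by
    # PCI address (DDDD:BB:DD.F); this glob skips bind, unbind, module, etc.
    for link in "$root/drivers/$driver"/????:??:??.?; do
        [ -L "$link" ] || continue
        addr="$(basename "$link")"
        # Leave devices alone if they already have a netdev.
        [ -d "$root/devices/$addr/net" ] && continue
        echo "rebinding $addr"
        echo "$addr" > "$root/drivers/$driver/unbind"
        echo "$addr" > "$root/drivers/$driver/bind"
    done
}
```

On the node this would be invoked as `rebind_vfs_without_netdev iavf`, followed by deleting the sriov-network-config-daemon pod as described in this report.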

Delete the node's sriov-network-config-daemon pod:
~~~
oc delete pod -n openshift-sriov-network-operator sriov-network-config-daemon-plkp6
~~~


Version-Release number of selected component (if applicable):
OCP 4.5.16 -> 4.6.17


Comment 11 Federico Paolinelli 2021-11-08 17:43:37 UTC
Closing this, feel free to reopen if you are able to reproduce it again.

Comment 12 Red Hat Bugzilla 2023-09-15 01:06:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days