Description of problem:

Fail to get interface name for ... GetNetName(): no net directory under pci device during upgrade from OCP 4.5.16 to 4.6.17

The purpose of this BZ is to identify whether this is a known issue or one that warrants attention, and to track it through to resolution.

~~~
$ oc logs -n openshift-sriov-network-operator sriov-network-config-daemon-plkp6
W0511 17:52:18.398178 33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:19.398672 33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:20.398961 33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:21.399302 33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:22.399551 33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:23.399774 33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0a.0: GetNetName(): no net directory under pci device 0000:12:0a.0: "lstat /sys/bus/pci/devices/0000:12:0a.0/net: no such file or directory"
W0511 17:52:24.989896 33692 utils.go:275] setNetdevMTU(): fail to get interface name for 0000:12:0f.3: GetNetName(): no net directory under pci device 0000:12:0f.3: "lstat /sys/bus/pci/devices/0000:12:0f.3/net: no such file or directory"
~~~

The above error messages appeared after the node was rebooted, and the `virt-launcher` pods on that node were found not to be running.

This issue was resolved by unbinding and re-binding the SRIOV VFs on that node, and then deleting its sriov-network-config-daemon pod:

unbind and re-bind the node's SRIOV VFs:
~~~
ssh core@node-06
sudo -i
cd /sys/bus/pci/devices/
for x in $(ls -1 | grep 0000\:12\:0\*); do echo $x > /sys/bus/pci/drivers/iavf/unbind; done
for x in $(ls -1 | grep 0000\:12\:0\*); do echo $x > /sys/bus/pci/drivers/iavf/bind; done
~~~

delete the node's sriov-network-config-daemon pod:
~~~
oc delete pod -n openshift-sriov-network-operator sriov-network-config-daemon-plkp6
~~~

Version-Release number of selected component (if applicable):
ocp 4.5.16 -> 4.6.17

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
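The workaround in the description re-binds every device whose address matches the grep pattern. A narrower variant, sketched below, would touch only the devices that are actually missing their `net` directory, which is the condition `GetNetName()` complains about in the log. This is an untested sketch, not what was run on the node. The function names and the `SYSFS_PCI_DEVICES` / `SYSFS_PCI_DRIVERS` override variables are hypothetical and exist only so the logic can be tried against a fake directory tree. Two caveats: a VF deliberately bound to a non-netdev driver such as `vfio-pci` also has no `net` directory, so the list should be reviewed before re-binding, and on a real node the unbind write is expected to fail for a device that is not currently bound to the named driver.

```shell
#!/usr/bin/env bash
# Sketch only. SYSFS_PCI_DEVICES and SYSFS_PCI_DRIVERS are hypothetical
# override variables (not part of any tool) so the functions can be tried
# against a fake directory tree instead of the real /sys.

# Print PCI addresses starting with the given prefix that have no net/ directory.
list_devices_without_netdev() {
    local prefix="$1"
    local root="${SYSFS_PCI_DEVICES:-/sys/bus/pci/devices}"
    local dev
    for dev in "$root"/"$prefix"*; do
        [ -e "$dev" ] || continue        # glob matched nothing
        [ -d "$dev/net" ] && continue    # netdev present, nothing to do
        basename "$dev"
    done
}

# Unbind a device from a driver and bind it again. The driver defaults to
# iavf, the driver used in the report.
rebind_device() {
    local addr="$1" driver="${2:-iavf}"
    local drv="${SYSFS_PCI_DRIVERS:-/sys/bus/pci/drivers}/$driver"
    echo "$addr" > "$drv/unbind"
    echo "$addr" > "$drv/bind"
}

# Example, as root on the affected node (prefix taken from the log above):
#   for vf in $(list_devices_without_netdev 0000:12:0); do rebind_device "$vf"; done
```

As in the original workaround, the sriov-network-config-daemon pod on the node would still need to be deleted afterwards.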
Closing this. Feel free to reopen if you are able to reproduce it.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.