Bug 1955874 - Webscale: sriov vfs are not created and sriovnetworknodestate indicates sync succeeded - state is not correct
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: zenghui.shi
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1958467
Reported: 2021-05-01 05:46 UTC by Nabeel Cocker
Modified: 2024-10-01 18:04 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1958467
Environment:
Last Closed: 2021-07-27 23:05:33 UTC
Target Upstream Version:
Embargoed:


Attachments
dmesg (56.76 KB, application/zip)
2021-05-01 05:46 UTC, Nabeel Cocker


Links
GitHub openshift/sriov-network-operator pull 499 (open): "Bug 1955874: Sync upstream: 2021-05-08", last updated 2021-05-08 06:46:11 UTC
Red Hat Product Errata RHSA-2021:2438, last updated 2021-07-27 23:05:56 UTC

Description Nabeel Cocker 2021-05-01 05:46:52 UTC
Created attachment 1777999
dmesg

Description of problem:

The state reported through the SriovNetworkNodeState is inaccurate: it indicates that the sync succeeded even though the config daemon has not created the VFs.

The VFs are not created until the config-daemon pod is deleted and, in some cases, until the SriovNetworkNodeState is deleted as well.

This happens when a SriovNetworkNodePolicy (NNP) is first applied to the node.



Version-Release number of selected component (if applicable):
OCP 4.6.17



How reproducible:


Steps to Reproduce:

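Actual results:
The VFs are not created, yet the SriovNetworkNodeState indicates that the sync succeeded.

Expected results:
The VFs are created when the policy is applied, and the SriovNetworkNodeState reports the actual state of the node.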

Additional info:
[corona@stablcurco ~]$ 
[corona@stablcurco ~]$ cat sriov-nnp.yaml 
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ens3f0vf1115
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  mtu: 9100
  nicSelector:
    deviceID: "1017"
    pfNames:
    - ens3f0#11-14
    vendor: 15b3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 16
  priority: 99
  resourceName: ens3f0vf1115
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ens3f1vf1115
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  mtu: 9100
  nicSelector:
    deviceID: "1017"
    pfNames:
    - ens3f1#11-14
    vendor: 15b3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 16
  priority: 99
  resourceName: ens3f1vf1115
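The pfNames entries above use the form <PF name>#<first VF>-<last VF>, which restricts each policy to VF indices 11 through 14 (inclusive) of the named PF, out of the 16 VFs requested by numVfs. The Go sketch below parses that form and rejects ranges that do not fit within numVfs. parsePFSelector and vfRange are hypothetical names, and the operator's own parsing and validation may differ in detail.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// vfRange is a hypothetical type holding one parsed pfNames entry.
type vfRange struct {
	PF          string
	First, Last int // inclusive VF indices
}

// parsePFSelector splits an entry such as "ens3f0#11-14" into the PF name and
// an inclusive VF index range. Without a "#" suffix, all numVfs VFs are selected.
func parsePFSelector(sel string, numVfs int) (vfRange, error) {
	parts := strings.SplitN(sel, "#", 2)
	r := vfRange{PF: parts[0], First: 0, Last: numVfs - 1}
	if r.PF == "" {
		return vfRange{}, fmt.Errorf("empty PF name in %q", sel)
	}
	if len(parts) == 1 {
		return r, nil
	}
	bounds := strings.SplitN(parts[1], "-", 2)
	if len(bounds) != 2 {
		return vfRange{}, fmt.Errorf("malformed VF range in %q", sel)
	}
	first, err := strconv.Atoi(bounds[0])
	if err != nil {
		return vfRange{}, fmt.Errorf("bad first VF in %q: %w", sel, err)
	}
	last, err := strconv.Atoi(bounds[1])
	if err != nil {
		return vfRange{}, fmt.Errorf("bad last VF in %q: %w", sel, err)
	}
	// The range must be ordered and fit inside the VFs created on the PF.
	if first < 0 || first > last || last >= numVfs {
		return vfRange{}, fmt.Errorf("VF range %d-%d in %q is invalid for numVfs=%d", first, last, sel, numVfs)
	}
	r.First, r.Last = first, last
	return r, nil
}

func main() {
	for _, sel := range []string{"ens3f0#11-14", "ens3f1#11-14", "ens3f0#11-16"} {
		r, err := parsePFSelector(sel, 16)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: VFs %d-%d (%d VFs)\n", r.PF, r.First, r.Last, r.Last-r.First+1)
	}
}
```

With numVfs set to 16, both selectors from the policies above resolve to 4 VFs each (indices 11-14), while a range ending at 16 is rejected because VF indices only run from 0 to 15.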





[core@worker-149 ~]$ cat /sys/bus/pci
pci/         pci_express/ 
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d
0000:d7:00.0/ 0000:d7:0e.1/ 0000:d7:12.0/ 0000:d7:15.0/ 0000:d8:00.1/ 0000:d8:00.6/ 0000:d8:01.3/ 0000:d8:02.0/ 0000:d8:02.5/ 0000:d8:03.2/ 0000:d8:03.7/ 
0000:d7:05.0/ 0000:d7:0f.0/ 0000:d7:12.1/ 0000:d7:16.0/ 0000:d8:00.2/ 0000:d8:00.7/ 0000:d8:01.4/ 0000:d8:02.1/ 0000:d8:02.6/ 0000:d8:03.3/ 0000:d8:04.0/ 
0000:d7:05.2/ 0000:d7:0f.1/ 0000:d7:12.2/ 0000:d7:16.4/ 0000:d8:00.3/ 0000:d8:01.0/ 0000:d8:01.5/ 0000:d8:02.2/ 0000:d8:02.7/ 0000:d8:03.4/ 0000:d8:04.1/ 
0000:d7:05.4/ 0000:d7:10.0/ 0000:d7:12.4/ 0000:d7:17.0/ 0000:d8:00.4/ 0000:d8:01.1/ 0000:d8:01.6/ 0000:d8:02.3/ 0000:d8:03.0/ 0000:d8:03.5/ 
0000:d7:0e.0/ 0000:d7:10.1/ 0000:d7:12.5/ 0000:d8:00.0/ 0000:d8:00.5/ 0000:d8:01.2/ 0000:d8:01.7/ 0000:d8:02.4/ 0000:d8:03.1/ 0000:d8:03.6/ 
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:0
0000:d8:00.0/ 0000:d8:00.3/ 0000:d8:00.6/ 0000:d8:01.1/ 0000:d8:01.4/ 0000:d8:01.7/ 0000:d8:02.2/ 0000:d8:02.5/ 0000:d8:03.0/ 0000:d8:03.3/ 0000:d8:03.6/ 0000:d8:04.1/ 
0000:d8:00.1/ 0000:d8:00.4/ 0000:d8:00.7/ 0000:d8:01.2/ 0000:d8:01.5/ 0000:d8:02.0/ 0000:d8:02.3/ 0000:d8:02.6/ 0000:d8:03.1/ 0000:d8:03.4/ 0000:d8:03.7/ 
0000:d8:00.2/ 0000:d8:00.5/ 0000:d8:01.0/ 0000:d8:01.3/ 0000:d8:01.6/ 0000:d8:02.1/ 0000:d8:02.4/ 0000:d8:02.7/ 0000:d8:03.2/ 0000:d8:03.5/ 0000:d8:04.0/ 
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:00.6/
ari_enabled               d3cold_allowed            infiniband_mad/           local_cpus                numa_node                 resource0                 vendor
broken_parity_status      device                    infiniband_srp/           max_link_speed            physfn/                   resource0_wc              
class                     dma_mask_bits             infiniband_verbs/         max_link_width            pools                     revision                  
config                    driver/                   iommu/                    modalias                  power/                    subsystem/                
consistent_dma_mask_bits  driver_override           iommu_group/              msi_bus                   ptp/                      subsystem_device          
current_link_speed        enable                    irq                       msi_irqs/                 reset                     subsystem_vendor          
current_link_width        infiniband/               local_cpulist             net/                      resource                  uevent                    
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:00.6/n
net/       numa_node  
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:00.6/net/
cat: '/sys/bus/pci/devices/0000:d8:00.6/net/': Is a directory
[core@worker-149 ~]$ cd /sys/bus/pci/devices/0000\:d8\:00.6/net/ 
[core@worker-149 net]$

Comment 2 Nabeel Cocker 2021-05-05 20:05:39 UTC
Hello,

Could you please let me know what info is needed?

Thank you

Nabeel

Comment 3 zenghui.shi 2021-05-07 13:26:21 UTC
Upstream fix for the config daemon panic (index out of range when getting the VF interface name): https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/127

Comment 5 zhaozhanqi 2021-05-10 08:04:59 UTC
@ncocker Hi, I tried your policy above many times on an older version, but I did not hit this issue. Do you have steps to reproduce it? They would help verify this bug.

Comment 6 zhaozhanqi 2021-05-11 07:52:42 UTC
Tried many times; this issue cannot be reproduced on 4.8.0-202105100942.p0.

Moving this to VERIFIED.

Comment 11 errata-xmlrpc 2021-07-27 23:05:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Comment 12 Red Hat Bugzilla 2023-09-15 01:06:00 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

