Bug 2056340

Summary: [4.8] SRIOV exclusive pooling
Product: OpenShift Container Platform
Component: Networking
Networking sub component: SR-IOV
Reporter: zenghui.shi <zshi>
Assignee: Balazs Nemeth <bnemeth>
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: ddelcian, dosmith, zshi, zzhao
Version: 4.6
Target Release: 4.8.z
Clone Of: 2056339
Last Closed: 2022-08-09 12:52:44 UTC
Bug Depends On: 2056339
Bug Blocks: 2056342

Comment 1 Balazs Nemeth 2022-06-29 09:42:45 UTC
@zzhao

I am unable to reproduce this on my setup for 4.8.

[root@wsfd-advnetlab50 sriov-network-operator]# cat policy-mlx.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mlx
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    deviceID: "1019"
    rootDevices:
    - 0000:d8:00.0
    vendor: "15b3"
    pfNames:
    - ens8f0
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 6
  priority: 5
  resourceName: mlxnics
[root@wsfd-advnetlab50 sriov-network-operator]# cat policy-mlx-2.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mlx-2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  mtu: 1100
  nicSelector:
    deviceID: "1019"
    rootDevices:
    - 0000:d8:00.0
    vendor: "15b3"
    pfNames:
    - ens8f0
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 6
  priority: 5
  resourceName: mlxnics


I'm creating these two policies and the webhook already stops me from doing this:

[root@wsfd-advnetlab50 sriov-network-operator]# oc create -f policy-mlx-2.yaml
Error from server (VF index range in ens8f0 is overlapped with existing policy policy-mlx): error when creating "policy-mlx-2.yaml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: VF index range in ens8f0 is overlapped with existing policy policy-mlx


Can you please check how to reproduce this?
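(For reference: if the intent is to split one PF between two policies, the webhook accepts the policies when their VF index ranges do not overlap. A hedged sketch using the operator's `<pfName>#<first>-<last>` VF-range selector syntax; the ranges below are chosen for illustration only:)

```yaml
# policy-mlx: claim VFs 0-2 of ens8f0
pfNames:
- ens8f0#0-2
---
# policy-mlx-2: claim VFs 3-5 of ens8f0 (no overlap, so the webhook allows it)
pfNames:
- ens8f0#3-5
```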

Comment 2 zhaozhanqi 2022-07-11 03:04:31 UTC
Balazs Nemeth,
We need to disable the webhook first, as mentioned in the first reproduction step:
1. Disable the webhook by editing sriovoperatorconfigs.sriovnetwork.openshift.io and setting `enableOperatorWebhook: false`
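(The step above can be sketched as editing the operator config CR; a minimal sketch, assuming the default CR is named `default` in the operator namespace:)

```yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableOperatorWebhook: false
```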

Comment 4 zhaozhanqi 2022-07-19 07:32:50 UTC
Verified this bug on 4.8.0-202207180915  

# oc get csv -n openshift-sriov-network-operator
NAME                                        DISPLAY                   VERSION              REPLACES                                    PHASE
sriov-network-operator.4.8.0-202207180915   SR-IOV Network Operator   4.8.0-202207180915   sriov-network-operator.4.8.0-202207071636   Succeeded


Verified with the following steps:

1. Disable the webhook by editing sriovoperatorconfigs.sriovnetwork.openshift.io and setting `enableOperatorWebhook: false`
2. Create two policies selecting the same PF, e.g.:

# cat intel-dpdk.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  mtu: 1700
  nicSelector:
    deviceID: "158b"
    pfNames:
      - ens1f1
    rootDevices:
      - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk
# cat intel-dpdk.yaml_2
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk3
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  nicSelector:
    deviceID: "158b"
    pfNames:
      - ens1f1
    rootDevices:
      - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk3


3. Check the device-plugin logs:

# oc logs sriov-device-plugin-8w7sm -n openshift-sriov-network-operator

I0719 07:09:40.565241       1 manager.go:112] number of config: 2
I0719 07:09:40.565247       1 manager.go:116] 
I0719 07:09:40.565252       1 manager.go:117] Creating new ResourcePool: inteldpdk
I0719 07:09:40.565257       1 manager.go:118] DeviceType: netDevice
I0719 07:09:40.569687       1 factory.go:108] device added: [pciAddr: 0000:3b:0a.0, vendor: 8086, device: 154c, driver: vfio-pci]
I0719 07:09:40.569701       1 factory.go:108] device added: [pciAddr: 0000:3b:0a.1, vendor: 8086, device: 154c, driver: vfio-pci]
I0719 07:09:40.569720       1 manager.go:146] New resource server is created for inteldpdk ResourcePool
I0719 07:09:40.569725       1 manager.go:116] 
I0719 07:09:40.569728       1 manager.go:117] Creating new ResourcePool: inteldpdk3
I0719 07:09:40.569732       1 manager.go:118] DeviceType: netDevice
W0719 07:09:40.574739       1 manager.go:159] Cannot add PCI Address [0000:3b:0a.0]. Already allocated.
W0719 07:09:40.574752       1 manager.go:159] Cannot add PCI Address [0000:3b:0a.1]. Already allocated.
I0719 07:09:40.574757       1 manager.go:132] no devices in device pool, skipping creating resource server for inteldpdk3
I0719 07:09:40.574763       1 main.go:72] Starting all servers...
I0719 07:09:40.575039       1 server.go:196] starting inteldpdk device plugin endpoint at: openshift.io_inteldpdk.sock
I0719 07:09:40.576932       1 server.go:222] inteldpdk device plugin endpoint started serving
I0719 07:09:40.577070       1 main.go:77] All servers started.
I0719 07:09:40.577079       1 main.go:78] Listening for term signals
I0719 07:09:41.153694       1 server.go:106] Plugin: openshift.io_inteldpdk.sock gets registered successfully at Kubelet
I0719 07:09:41.153730       1 server.go:131] ListAndWatch(inteldpdk) invoked
I0719 07:09:41.153791       1 server.go:139] ListAndWatch(inteldpdk): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:3b:0a.0,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:0a.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},},}
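The "Already allocated" warnings above show the device plugin's first-policy-wins behavior: devices are handed out to resource pools in config order, and a PCI address already claimed by an earlier pool is skipped, so the later pool ends up empty and no resource server is started for it. A simplified, hypothetical Python sketch of that dedup logic (function and variable names are illustrative, not the actual manager.go code):

```python
# Illustrative sketch of the dedup behavior seen in the manager.go logs:
# pools are built in config order, and a PCI address claimed by an
# earlier pool is skipped, leaving the later pool empty.
# This is NOT the actual sriov-network-device-plugin code.

def build_pools(configs):
    """configs: list of (resource_name, [pci_addrs]) in config order.
    Returns {resource_name: [pci_addrs actually assigned]}."""
    allocated = set()
    pools = {}
    for resource_name, pci_addrs in configs:
        assigned = []
        for addr in pci_addrs:
            if addr in allocated:
                # Mirrors: "Cannot add PCI Address [...]. Already allocated."
                continue
            allocated.add(addr)
            assigned.append(addr)
        # An empty pool mirrors: "no devices in device pool, skipping
        # creating resource server for <resource_name>"
        pools[resource_name] = assigned
    return pools

pools = build_pools([
    ("inteldpdk",  ["0000:3b:0a.0", "0000:3b:0a.1"]),
    ("inteldpdk3", ["0000:3b:0a.0", "0000:3b:0a.1"]),
])
# inteldpdk claims both VFs; inteldpdk3 gets none, matching the logs.
```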




4. Confirm that no inteldpdk3 resource was created:


# oc describe node dell-per740-14.rhts.eng.pek2.redhat.com | grep "openshift.io/inteldpdk" 
  openshift.io/inteldpdk:  2
  openshift.io/inteldpdk:  2
  openshift.io/inteldpdk  0

Comment 7 errata-xmlrpc 2022-08-09 12:52:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.47 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5889