Bug 2056340 - [4.8] SRIOV exclusive pooling
Summary: [4.8] SRIOV exclusive pooling
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.z
Assignee: Balazs Nemeth
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 2056339
Blocks: 2056342
 
Reported: 2022-02-21 03:36 UTC by zenghui.shi
Modified: 2022-08-09 12:53 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2056339
Clones: 2056342
Environment:
Last Closed: 2022-08-09 12:52:44 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/sriov-network-device-plugin pull 54 (open): Bug 2056340: exclusive pool, last updated 2022-07-15 11:13:21 UTC
Red Hat Product Errata RHBA-2022:5889, last updated 2022-08-09 12:53:07 UTC

Comment 1 Balazs Nemeth 2022-06-29 09:42:45 UTC
@zzhao

I am unable to reproduce this on my 4.8 setup.

[root@wsfd-advnetlab50 sriov-network-operator]# cat policy-mlx.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mlx
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    deviceID: "1019"
    rootDevices:
    - 0000:d8:00.0
    vendor: "15b3"
    pfNames:
    - ens8f0
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 6
  priority: 5
  resourceName: mlxnics
[root@wsfd-advnetlab50 sriov-network-operator]# cat policy-mlx-2.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-mlx-2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  mtu: 1100
  nicSelector:
    deviceID: "1019"
    rootDevices:
    - 0000:d8:00.0
    vendor: "15b3"
    pfNames:
    - ens8f0
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 6
  priority: 5
  resourceName: mlxnics


When I create these two policies, the webhook already rejects the second one:

[root@wsfd-advnetlab50 sriov-network-operator]# oc create -f policy-mlx-2.yaml
Error from server (VF index range in ens8f0 is overlapped with existing policy policy-mlx): error when creating "policy-mlx-2.yaml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: VF index range in ens8f0 is overlapped with existing policy policy-mlx


Can you please check how to reproduce this?
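
As an aside, if splitting one PF across two policies is ever intended, the pfNames selector supports VF index ranges in the form <pfname>#<first>-<last>, so each policy claims a disjoint set of VFs and the webhook accepts the pair. A minimal sketch reusing the nicSelector above, with the range suffix as the only change:

  nicSelector:
    deviceID: "1019"
    rootDevices:
    - 0000:d8:00.0
    vendor: "15b3"
    pfNames:
    - ens8f0#0-2   # VFs 0..2 for this policy; a second policy could use ens8f0#3-5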

Comment 2 zhaozhanqi 2022-07-11 03:04:31 UTC
Balazs Nemeth,
we need to disable the webhook first, as mentioned in the first reproduction step:
1. Disable the webhook by editing sriovoperatorconfigs.sriovnetwork.openshift.io to set `enableOperatorWebhook: false`.
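
For reference, a one-line sketch of that edit, assuming the operator's config object has its default name `default`:

# oc patch sriovoperatorconfig default -n openshift-sriov-network-operator \
    --type=merge -p '{"spec":{"enableOperatorWebhook":false}}'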

Comment 4 zhaozhanqi 2022-07-19 07:32:50 UTC
Verified this bug on 4.8.0-202207180915.

# oc get csv -n openshift-sriov-network-operator
NAME                                        DISPLAY                   VERSION              REPLACES                                    PHASE
sriov-network-operator.4.8.0-202207180915   SR-IOV Network Operator   4.8.0-202207180915   sriov-network-operator.4.8.0-202207071636   Succeeded


with the following steps:

1. Disable the webhook by editing sriovoperatorconfigs.sriovnetwork.openshift.io to set `enableOperatorWebhook: false`.
2. Create two policies with the same PF, e.g.:

# cat intel-dpdk.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  mtu: 1700
  nicSelector:
    deviceID: "158b"
    pfNames:
      - ens1f1
    rootDevices:
      - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk
# cat intel-dpdk.yaml_2
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-dpdk3
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  nicSelector:
    deviceID: "158b"
    pfNames:
      - ens1f1
    rootDevices:
      - '0000:3b:00.1'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 2
  priority: 99
  resourceName: inteldpdk3


3. Check the device plugin logs:

# oc logs sriov-device-plugin-8w7sm -n openshift-sriov-network-operator

I0719 07:09:40.565241       1 manager.go:112] number of config: 2
I0719 07:09:40.565247       1 manager.go:116] 
I0719 07:09:40.565252       1 manager.go:117] Creating new ResourcePool: inteldpdk
I0719 07:09:40.565257       1 manager.go:118] DeviceType: netDevice
I0719 07:09:40.569687       1 factory.go:108] device added: [pciAddr: 0000:3b:0a.0, vendor: 8086, device: 154c, driver: vfio-pci]
I0719 07:09:40.569701       1 factory.go:108] device added: [pciAddr: 0000:3b:0a.1, vendor: 8086, device: 154c, driver: vfio-pci]
I0719 07:09:40.569720       1 manager.go:146] New resource server is created for inteldpdk ResourcePool
I0719 07:09:40.569725       1 manager.go:116] 
I0719 07:09:40.569728       1 manager.go:117] Creating new ResourcePool: inteldpdk3
I0719 07:09:40.569732       1 manager.go:118] DeviceType: netDevice
W0719 07:09:40.574739       1 manager.go:159] Cannot add PCI Address [0000:3b:0a.0]. Already allocated.
W0719 07:09:40.574752       1 manager.go:159] Cannot add PCI Address [0000:3b:0a.1]. Already allocated.
I0719 07:09:40.574757       1 manager.go:132] no devices in device pool, skipping creating resource server for inteldpdk3
I0719 07:09:40.574763       1 main.go:72] Starting all servers...
I0719 07:09:40.575039       1 server.go:196] starting inteldpdk device plugin endpoint at: openshift.io_inteldpdk.sock
I0719 07:09:40.576932       1 server.go:222] inteldpdk device plugin endpoint started serving
I0719 07:09:40.577070       1 main.go:77] All servers started.
I0719 07:09:40.577079       1 main.go:78] Listening for term signals
I0719 07:09:41.153694       1 server.go:106] Plugin: openshift.io_inteldpdk.sock gets registered successfully at Kubelet
I0719 07:09:41.153730       1 server.go:131] ListAndWatch(inteldpdk) invoked
I0719 07:09:41.153791       1 server.go:139] ListAndWatch(inteldpdk): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:3b:0a.0,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:0a.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},},}




4. Confirm that no inteldpdk3 resource was created on the node:


# oc describe node dell-per740-14.rhts.eng.pek2.redhat.com | grep "openshift.io/inteldpdk" 
  openshift.io/inteldpdk:  2
  openshift.io/inteldpdk:  2
  openshift.io/inteldpdk  0
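
The same check can be done without grepping the describe output; a minimal sketch, assuming oc's jsonpath dot-escaping for the resource key:

# oc get node dell-per740-14.rhts.eng.pek2.redhat.com \
    -o jsonpath='{.status.allocatable.openshift\.io/inteldpdk}'

This should print 2 here, and the equivalent query for openshift.io/inteldpdk3 should print nothing, since the device plugin skipped creating a resource server for the empty pool.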

Comment 7 errata-xmlrpc 2022-08-09 12:52:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.47 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5889

