Bug 2056340
| Field | Value |
|---|---|
| Summary | [4.8] SRIOV exclusive pooling |
| Product | OpenShift Container Platform |
| Component | Networking |
| Networking sub component | SR-IOV |
| Reporter | zenghui.shi <zshi> |
| Assignee | Balazs Nemeth <bnemeth> |
| QA Contact | zhaozhanqi <zzhao> |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| CC | ddelcian, dosmith, zshi, zzhao |
| Version | 4.6 |
| Target Release | 4.8.z |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | If docs needed, set a value |
| Clone Of | 2056339 |
| Last Closed | 2022-08-09 12:52:44 UTC |
| Bug Depends On | 2056339 |
| Bug Blocks | 2056342 |
Balazs Nemeth: as mentioned in the first step above, we need to disable the webhook first, by editing sriovoperatorconfigs.sriovnetwork.openshift.io to set `enableOperatorWebhook: false`.

Verified this bug on 4.8.0-202207180915:
# oc get csv -n openshift-sriov-network-operator
NAME DISPLAY VERSION REPLACES PHASE
sriov-network-operator.4.8.0-202207180915 SR-IOV Network Operator 4.8.0-202207180915 sriov-network-operator.4.8.0-202207071636 Succeeded
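For reference, step 1 below edits the SriovOperatorConfig CR; assuming the usual `default` instance in the operator namespace, the relevant spec fragment looks like:

```yaml
# Sketch of the SriovOperatorConfig edit; the CR name "default" is the
# operator's conventional instance name and is assumed here.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: openshift-sriov-network-operator
spec:
  enableOperatorWebhook: false
```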
with the following steps:
1. Disable the webhook by editing sriovoperatorconfigs.sriovnetwork.openshift.io to set `enableOperatorWebhook: false`.
2. Create two policies that select the same PF, e.g.:
# cat intel-dpdk.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: intel-dpdk
namespace: openshift-sriov-network-operator
spec:
deviceType: vfio-pci
mtu: 1700
nicSelector:
deviceID: "158b"
pfNames:
- ens1f1
rootDevices:
- '0000:3b:00.1'
vendor: '8086'
nodeSelector:
feature.node.kubernetes.io/sriov-capable: 'true'
numVfs: 2
priority: 99
resourceName: inteldpdk
# cat intel-dpdk.yaml_2
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: intel-dpdk3
namespace: openshift-sriov-network-operator
spec:
deviceType: vfio-pci
nicSelector:
deviceID: "158b"
pfNames:
- ens1f1
rootDevices:
- '0000:3b:00.1'
vendor: '8086'
nodeSelector:
feature.node.kubernetes.io/sriov-capable: 'true'
numVfs: 2
priority: 99
resourceName: inteldpdk3
3. Check the device plugin (dp) logs:
# oc logs sriov-device-plugin-8w7sm -n openshift-sriov-network-operator
I0719 07:09:40.565241 1 manager.go:112] number of config: 2
I0719 07:09:40.565247 1 manager.go:116]
I0719 07:09:40.565252 1 manager.go:117] Creating new ResourcePool: inteldpdk
I0719 07:09:40.565257 1 manager.go:118] DeviceType: netDevice
I0719 07:09:40.569687 1 factory.go:108] device added: [pciAddr: 0000:3b:0a.0, vendor: 8086, device: 154c, driver: vfio-pci]
I0719 07:09:40.569701 1 factory.go:108] device added: [pciAddr: 0000:3b:0a.1, vendor: 8086, device: 154c, driver: vfio-pci]
I0719 07:09:40.569720 1 manager.go:146] New resource server is created for inteldpdk ResourcePool
I0719 07:09:40.569725 1 manager.go:116]
I0719 07:09:40.569728 1 manager.go:117] Creating new ResourcePool: inteldpdk3
I0719 07:09:40.569732 1 manager.go:118] DeviceType: netDevice
W0719 07:09:40.574739 1 manager.go:159] Cannot add PCI Address [0000:3b:0a.0]. Already allocated.
W0719 07:09:40.574752 1 manager.go:159] Cannot add PCI Address [0000:3b:0a.1]. Already allocated.
I0719 07:09:40.574757 1 manager.go:132] no devices in device pool, skipping creating resource server for inteldpdk3
I0719 07:09:40.574763 1 main.go:72] Starting all servers...
I0719 07:09:40.575039 1 server.go:196] starting inteldpdk device plugin endpoint at: openshift.io_inteldpdk.sock
I0719 07:09:40.576932 1 server.go:222] inteldpdk device plugin endpoint started serving
I0719 07:09:40.577070 1 main.go:77] All servers started.
I0719 07:09:40.577079 1 main.go:78] Listening for term signals
I0719 07:09:41.153694 1 server.go:106] Plugin: openshift.io_inteldpdk.sock gets registered successfully at Kubelet
I0719 07:09:41.153730 1 server.go:131] ListAndWatch(inteldpdk) invoked
I0719 07:09:41.153791 1 server.go:139] ListAndWatch(inteldpdk): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:3b:0a.0,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:0a.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},},}
4. Confirm that no inteldpdk3 resource was created on the node:
# oc describe node dell-per740-14.rhts.eng.pek2.redhat.com | grep "openshift.io/inteldpdk"
openshift.io/inteldpdk: 2
openshift.io/inteldpdk: 2
openshift.io/inteldpdk 0
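The log messages above reflect the exclusive-pooling behavior being verified: each VF's PCI address is handed to at most one resource pool, and a pool that ends up with no devices gets no resource server. A minimal sketch of that allocation logic (hypothetical illustration, not the operator's actual code):

```python
def build_pools(configs):
    """Assign each PCI address to the first resource pool that claims it.

    configs: list of (resource_name, [pci_addresses]) in config order.
    Returns {resource_name: [addresses]} with empty pools dropped,
    mirroring the "Already allocated" / "skipping creating resource
    server" messages in the device plugin logs above.
    """
    allocated = set()
    pools = {}
    for name, addresses in configs:
        devices = []
        for addr in addresses:
            if addr in allocated:
                # Cannot add PCI Address: already allocated to an earlier pool
                continue
            allocated.add(addr)
            devices.append(addr)
        if devices:
            pools[name] = devices
        # else: no devices in device pool, skip creating a resource server
    return pools

configs = [
    ("inteldpdk",  ["0000:3b:0a.0", "0000:3b:0a.1"]),
    ("inteldpdk3", ["0000:3b:0a.0", "0000:3b:0a.1"]),
]
print(build_pools(configs))
# → {'inteldpdk': ['0000:3b:0a.0', '0000:3b:0a.1']}
```

With both policies selecting the same root device, only the higher-priority pool (inteldpdk) receives the VFs, which is why inteldpdk3 never appears on the node.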
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.47 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5889
@zzhao I am unable to reproduce this on my setup for 4.8.

[root@wsfd-advnetlab50 sriov-network-operator]# cat policy-mlx.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-mlx
namespace: openshift-sriov-network-operator
spec:
deviceType: netdevice
nicSelector:
deviceID: "1019"
rootDevices:
- 0000:d8:00.0
vendor: "15b3"
pfNames:
- ens8f0
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 6
priority: 5
resourceName: mlxnics

[root@wsfd-advnetlab50 sriov-network-operator]# cat policy-mlx-2.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-mlx-2
namespace: openshift-sriov-network-operator
spec:
deviceType: netdevice
mtu: 1100
nicSelector:
deviceID: "1019"
rootDevices:
- 0000:d8:00.0
vendor: "15b3"
pfNames:
- ens8f0
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 6
priority: 5
resourceName: mlxnics

I'm creating these two policies, and the webhook already stops me from doing this:

[root@wsfd-advnetlab50 sriov-network-operator]# oc create -f policy-mlx-2.yaml
Error from server (VF index range in ens8f0 is overlapped with existing policy policy-mlx): error when creating "policy-mlx-2.yaml": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: VF index range in ens8f0 is overlapped with existing policy policy-mlx

Can you please check how to reproduce this?