Description of problem:
Set up an OCP cluster with Multus enabled. Deploy the sriov-device-plugin and sriov-cni to the cluster. Enable SR-IOV for the NIC on the node:

  echo 6 > /sys/class/net/eno1/device/sriov_numvfs

Try to create a pod with an SR-IOV interface that requests more than 1 VF. The pod gets only one interface, while the requested number of VFs is consumed on the node.

Version-Release number of selected component (if applicable):
v4.0

How reproducible:
Always

Steps to Reproduce:
1. Create a pod which requests the max_vfs number of the node (sriovdp log will be attached below):

apiVersion: v1
kind: Pod
metadata:
  generateName: testpod-
  labels:
    env: test
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network
spec:
  containers:
  - name: test-pod
    image: bmeng/centos-network
    resources:
      requests:
        intel.com/sriov: 6
      limits:
        intel.com/sriov: 6

2. Check the interfaces in the first pod.
3. Create one more pod which requests 1 VF:

apiVersion: v1
kind: Pod
metadata:
  generateName: testpod-
  labels:
    env: test
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network
spec:
  containers:
  - name: test-pod
    image: bmeng/centos-network
    resources:
      requests:
        intel.com/sriov: 1
      limits:
        intel.com/sriov: 1

Actual results:
2. Only one SR-IOV VF is allocated in the pod:

sh-4.2# ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0
    addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
3: eth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether 0a:58:0a:80:00:08 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0
    veth addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
205: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 9e:b9:04:4c:9e:20 brd ff:ff:ff:ff:ff:ff promiscuity 0
    addrgenmode eui64 numtxqueues 16 numrxqueues 16 gso_max_size 65536 gso_max_segs 65535

sh-4.2# ethtool -i net1
driver: i40evf
version: 3.2.2-k
firmware-version: N/A
expansion-rom-version:
bus-info: 0000:3d:02.4
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes

3. The 2nd pod cannot be created due to "Insufficient intel.com/sriov".

Expected results:
A pod which requests more than 1 VF via the resource limit should be able to allocate the requested number of VFs.
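Note: the sriov-network net-attach-def referenced in the pod annotations above is not included in this report. A minimal sketch of what it might look like, assuming the standard SR-IOV CNI plugin with host-local IPAM (the subnet value is a placeholder, not taken from the test environment):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-network
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24"
    }
  }'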
Additional info:

sriov-device-plugin log for the first pod creation:

I1128 06:37:07.577333 11279 sriov-device-plugin.go:279] SRIOV Network Device Plugin server started serving
I1128 06:37:07.579007 11279 sriov-device-plugin.go:290] SRIOV Network Device Plugin registered with the Kubelet
I1128 06:37:07.580036 11279 sriov-device-plugin.go:391] ListAndWatch: send initial devices &ListAndWatchResponse{Devices:[&Device{ID:0000:3d:02.1,Health:Healthy,} &Device{ID:0000:3d:02.2,Health:Healthy,} &Device{ID:0000:3d:02.3,Health:Healthy,} &Device{ID:0000:3d:02.4,Health:Healthy,} &Device{ID:0000:3d:02.5,Health:Healthy,} &Device{ID:0000:3d:02.0,Health:Healthy,}],}
I1128 06:38:18.966389 11279 sriov-device-plugin.go:442] DeviceID in Allocate: 0000:3d:02.3
I1128 06:38:18.966460 11279 sriov-device-plugin.go:442] DeviceID in Allocate: 0000:3d:02.4
I1128 06:38:18.966475 11279 sriov-device-plugin.go:442] DeviceID in Allocate: 0000:3d:02.5
I1128 06:38:18.966487 11279 sriov-device-plugin.go:442] DeviceID in Allocate: 0000:3d:02.0
I1128 06:38:18.966499 11279 sriov-device-plugin.go:442] DeviceID in Allocate: 0000:3d:02.1
I1128 06:38:18.966510 11279 sriov-device-plugin.go:442] DeviceID in Allocate: 0000:3d:02.2
I1128 06:38:18.966522 11279 sriov-device-plugin.go:456] PCI Addrs allocated: 0000:3d:02.3,0000:3d:02.4,0000:3d:02.5,0000:3d:02.0,0000:3d:02.1,0000:3d:02.2,
Thanks for testing! Good coverage on multiple resource requests!

This might be the expected behavior when the pod has only one network custom resource specified in the pod spec annotation but multiple devices are requested. In other words, to configure networks on multiple devices, or to allocate multiple devices to one pod or container, we need to list as many network custom resources in the pod spec annotation field as there are requested devices (the number of network custom resources shall equal the number of requested devices), separated by commas. For example:

1) Pod spec requesting one device:

apiVersion: v1
kind: Pod
metadata:
  generateName: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network
spec:
  containers:
  - name: test-pod
    image: <image>
    resources:
      requests:
        intel.com/sriov: 1
      limits:
        intel.com/sriov: 1

2) Pod spec requesting multiple devices:

apiVersion: v1
kind: Pod
metadata:
  generateName: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network, sriov-network, sriov-network
spec:
  containers:
  - name: test-pod
    image: <image>
    resources:
      requests:
        intel.com/sriov: 3
      limits:
        intel.com/sriov: 3

The network names above can differ depending on which network each device is expected to be connected to. For example, one can write the annotation as shown below, and the 3 requested devices in 2) will be connected to sriov-network-a/b/c respectively.
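For instance, the annotation for 2) would then read (each name referring to a separately defined net-attach-def):

metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network-a, sriov-network-b, sriov-network-c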
Yeah, the condition you mentioned in comment#1 is how it should be.

But it is still a problem when the number of annotations does not match spec.resources.requests.

For example, if I give only one annotation but set spec.resources.requests to 3, all 3 requested VFs are consumed, but only one is attached to the pod.

That means some of the VFs are wasted.
(In reply to Meng Bo from comment #2)
> Yeah, the condition you mentioned in comment#1 is how it should be.
>
> But it is still a problem when the number of annotations does not match
> spec.resources.requests.
>
> For example, if I give only one annotation but set
> spec.resources.requests to 3, all 3 requested VFs are consumed, but only
> one is attached to the pod.
>
> That means some of the VFs are wasted.

Agreed that users may be confused about where the unconfigured devices went when they are not shown in the container namespace.

To record the thoughts we discussed in the nfvpe-container meeting on how we expect this to be solved:

1) Ideally, Multus could catch this potential configuration issue and emit a warning message, so that the user can find out what is wrong with their configuration, while still configuring the device the way it does now.

2) Add an admission webhook that blocks pod creation whenever there is a mismatch between the number of network custom resources (those carrying a device plugin resourceName annotation) and the number of requested devices.
(In reply to zenghui.shi from comment #3)
> 1) Ideally, Multus could catch this potential configuration issue and
> emit a warning message [...]
>
> 2) Add an admission webhook that blocks pod creation whenever there is a
> mismatch between the number of network custom resources (those carrying
> a device plugin resourceName annotation) and the number of requested
> devices.

Option 2) is the way to go: the admission controller mutates the resource limit/request fields in the pod spec based on the number of network custom resources (those carrying a device resourceName annotation) found in the pod annotation.
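As a sketch of how the webhook might identify device-plugin-backed networks, the net-attach-def from the description could carry a resourceName annotation (this assumes the k8s.v1.cni.cncf.io/resourceName annotation convention used with the SR-IOV device plugin; the CNI config here is the same placeholder as before):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-network
  annotations:
    # The webhook would read this annotation to learn which device plugin
    # resource pool the network maps to, then count one unit per reference
    # to this network in the pod's networks annotation.
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "sriov",
    "ipam": {
      "type": "host-local",
      "subnet": "10.56.217.0/24"
    }
  }'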
The SR-IOV admission controller is available for testing in 4.2 now. It fills in the resource limits and requests according to the resourceName specified in the net-attach-def, so there is no need for the user to specify resource limits and requests manually, which should resolve this mismatch problem. Moving to ON_QA.
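With the admission controller in place, a pod spec like the following sketch should suffice (hypothetical usage based on the description above; the controller is expected to inject the matching intel.com/sriov requests/limits):

apiVersion: v1
kind: Pod
metadata:
  generateName: testpod-
  annotations:
    # Two references -> two VFs; no manual resources section is needed,
    # since the admission controller fills in requests/limits.
    k8s.v1.cni.cncf.io/networks: sriov-network, sriov-network
spec:
  containers:
  - name: test-pod
    image: bmeng/centos-network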
Verification of this bug was blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1732598
Verified this bug on 4.2.0-0.nightly-2019-09-19-153821.

When using the below yaml, the pod will consume 1 VF:

apiVersion: v1
kind: Pod
metadata:
  generateName: testpod-
  labels:
    env: test
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network
spec:
  containers:
  - name: test-pod
    image: bmeng/centos-network
    resources:
      requests:
        intel.com/sriov: 6
      limits:
        intel.com/sriov: 6
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922