Bug 1877648 - [sriov]VF from allocatable and capacity of node is incorrect when the policy is only 'rootDevices'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: zenghui.shi
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1888828
Depends On:
Blocks:
Reported: 2020-09-10 05:18 UTC by zhaozhanqi
Modified: 2021-11-16 07:41 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:17:43 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift sriov-network-operator pull 370 | 0 | None | closed | Bug 1877648: Support generating device plugin configmap with rootDevice selector | 2021-02-09 03:51:02 UTC
Red Hat Knowledge Base (Solution) | 5499151 | 0 | None | None | None | 2020-10-19 12:58:57 UTC
Red Hat Product Errata | RHSA-2020:5633 | 0 | None | None | None | 2021-02-24 15:18:17 UTC

Description zhaozhanqi 2020-09-10 05:18:02 UTC
Description of problem:
When a policy is created with only 'rootDevices', as shown below:
******
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: intel-netdevice
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    rootDevices:
      - '0000:3b:00.0'
    vendor: '8086'
  nodeSelector:
    feature.node.kubernetes.io/sriov-capable: 'true'
  numVfs: 5
  priority: 99
  resourceName: intelnetdevice
*******

The policy above sets numVfs to 5, but the node capacity shows 9:
oc get node  dell-per740-14.rhts.eng.pek2.redhat.com -o yaml | grep "openshift.io/intelnetdevice"
          f:openshift.io/intelnetdevice: {}
          f:openshift.io/intelnetdevice: {}
    openshift.io/intelnetdevice: "9"
    openshift.io/intelnetdevice: "9"


Version-Release number of selected component (if applicable):
4.6.0-202009082256.p0

How reproducible:
always

Steps to Reproduce:
1. Create the above policy
2. Check the node info
3. Check the device plugin configmap
4. Check the logs of the device plugin

Actual results:

Step 2 shows 9, as in the description.
Step 3:

{"resourceName":"intelnetdevice","selectors":{"vendors":["8086"],"IsRdma":false},"SelectorObj":null}]}
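The configmap entry above carries only a vendor selector for intelnetdevice, so nothing ties the pool to the root device named in the policy. A minimal sketch of that check, using the raw ResourceList the device plugin prints in its log. The NARROWING_KEYS set and the vendor_only_pools helper are assumptions made for this illustration, not the plugin's own code or its complete selector list.

```python
import json

# Raw ResourceList as printed by the device plugin (manager.go:86) in this report.
RAW = (
    '{"resourceList":['
    '{"resourceName":"cx4ib","selectors":{"vendors":["15b3"],"devices":["1014"],"IsRdma":true},"SelectorObj":null},'
    '{"resourceName":"cx6ib60","selectors":{"vendors":["15b3"],"devices":["101c"],"IsRdma":true},"SelectorObj":null},'
    '{"resourceName":"intelnetdevice","selectors":{"vendors":["8086"],"IsRdma":false},"SelectorObj":null}'
    ']}'
)

# Assumption: selector keys that would narrow a pool beyond the vendor.
NARROWING_KEYS = {"devices", "drivers", "pfNames", "rootDevices"}


def vendor_only_pools(raw):
    """Return the names of pools whose selectors contain no narrowing key."""
    pools = json.loads(raw)["resourceList"]
    return [
        pool["resourceName"]
        for pool in pools
        if not any(pool["selectors"].get(key) for key in NARROWING_KEYS)
    ]


print(vendor_only_pools(RAW))
```

Only intelnetdevice is reported, because the two Mellanox pools are at least restricted by device ID.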

Step 4:
oc logs sriov-device-plugin-kzklc
I0910 03:58:38.144607      19 manager.go:52] Using Kubelet Plugin Registry Mode
I0910 03:58:38.144835      19 main.go:44] resource manager reading configs
I0910 03:58:38.145060      19 manager.go:86] raw ResourceList: {"resourceList":[{"resourceName":"cx4ib","selectors":{"vendors":["15b3"],"devices":["1014"],"IsRdma":true},"SelectorObj":null},{"resourceName":"cx6ib60","selectors":{"vendors":["15b3"],"devices":["101c"],"IsRdma":true},"SelectorObj":null},{"resourceName":"intelnetdevice","selectors":{"vendors":["8086"],"IsRdma":false},"SelectorObj":null}]}
I0910 03:58:38.145180      19 manager.go:106] unmarshalled ResourceList: [{ResourcePrefix: ResourceName:cx4ib DeviceType:netDevice Selectors:0xc00000cee0 SelectorObj:0xc0000e8fd0} {ResourcePrefix: ResourceName:cx6ib60 DeviceType:netDevice Selectors:0xc00000cf00 SelectorObj:0xc0000e9130} {ResourcePrefix: ResourceName:intelnetdevice DeviceType:netDevice Selectors:0xc00000cf20 SelectorObj:0xc0000e9290}]
I0910 03:58:38.145297      19 manager.go:193] validating resource name "openshift.io/cx4ib"
I0910 03:58:38.145335      19 manager.go:193] validating resource name "openshift.io/cx6ib60"
I0910 03:58:38.145363      19 manager.go:193] validating resource name "openshift.io/intelnetdevice"
I0910 03:58:38.145376      19 main.go:60] Discovering host devices
I0910 03:58:38.222243      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:18:00.0	02          	Intel Corporation   	I350 Gigabit Network Connection         
I0910 03:58:38.222729      19 netDeviceProvider.go:116] excluding interface eno1:  default route found: {Ifindex: 2 Dst: <nil> Src: <nil> Gw: 10.73.117.254 Flags: [] Table: 254}
I0910 03:58:38.222806      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:18:00.1	02          	Intel Corporation   	I350 Gigabit Network Connection         
I0910 03:58:38.223077      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:18:00.2	02          	Intel Corporation   	I350 Gigabit Network Connection         
I0910 03:58:38.223258      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:18:00.3	02          	Intel Corporation   	I350 Gigabit Network Connection         
I0910 03:58:38.223470      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:3b:00.0	02          	Intel Corporation   	Ethernet Controller XXV710 for 25GbE ...
I0910 03:58:38.223717      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:3b:00.1	02          	Intel Corporation   	Ethernet Controller XXV710 for 25GbE ...
I0910 03:58:38.223890      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:3b:02.0	02          	Intel Corporation   	Ethernet Virtual Function 700 Series    
I0910 03:58:38.224041      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:3b:02.1	02          	Intel Corporation   	Ethernet Virtual Function 700 Series    
I0910 03:58:38.224179      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:3b:02.2	02          	Intel Corporation   	Ethernet Virtual Function 700 Series    
I0910 03:58:38.224317      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:3b:02.3	02          	Intel Corporation   	Ethernet Virtual Function 700 Series    
I0910 03:58:38.224455      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:3b:02.4	02          	Intel Corporation   	Ethernet Virtual Function 700 Series    
I0910 03:58:38.225023      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:5e:00.0	02          	Mellanox Technolo...	MT27800 Family [ConnectX-5]             
I0910 03:58:38.225290      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:5e:00.1	02          	Mellanox Technolo...	MT27800 Family [ConnectX-5]             
I0910 03:58:38.225617      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:60:00.0	02          	Mellanox Technolo...	MT27710 Family [ConnectX-4 Lx]          
I0910 03:58:38.225906      19 netDeviceProvider.go:78] netdevice AddTargetDevices(): device found: 0000:60:00.1	02          	Mellanox Technolo...	MT27710 Family [ConnectX-4 Lx]          
I0910 03:58:38.226882      19 main.go:66] Initializing resource servers
I0910 03:58:38.227372      19 manager.go:112] number of config: 3
I0910 03:58:38.227417      19 manager.go:115] 
I0910 03:58:38.227443      19 manager.go:116] Creating new ResourcePool: cx4ib
I0910 03:58:38.227466      19 manager.go:117] DeviceType: netDevice
I0910 03:58:38.238116      19 manager.go:130] no devices in device pool, skipping creating resource server for cx4ib
I0910 03:58:38.238134      19 manager.go:115] 
I0910 03:58:38.238139      19 manager.go:116] Creating new ResourcePool: cx6ib60
I0910 03:58:38.238144      19 manager.go:117] DeviceType: netDevice
I0910 03:58:38.244138      19 manager.go:130] no devices in device pool, skipping creating resource server for cx6ib60
I0910 03:58:38.244155      19 manager.go:115] 
I0910 03:58:38.244160      19 manager.go:116] Creating new ResourcePool: intelnetdevice
I0910 03:58:38.244165      19 manager.go:117] DeviceType: netDevice
I0910 03:58:38.255461      19 factory.go:106] device added: [pciAddr: 0000:18:00.1, vendor: 8086, device: 1521, driver: igb]
I0910 03:58:38.255485      19 factory.go:106] device added: [pciAddr: 0000:18:00.2, vendor: 8086, device: 1521, driver: igb]
I0910 03:58:38.255493      19 factory.go:106] device added: [pciAddr: 0000:18:00.3, vendor: 8086, device: 1521, driver: igb]
I0910 03:58:38.255499      19 factory.go:106] device added: [pciAddr: 0000:3b:00.1, vendor: 8086, device: 158b, driver: i40e]
I0910 03:58:38.255505      19 factory.go:106] device added: [pciAddr: 0000:3b:02.0, vendor: 8086, device: 154c, driver: iavf]
I0910 03:58:38.255511      19 factory.go:106] device added: [pciAddr: 0000:3b:02.1, vendor: 8086, device: 154c, driver: iavf]
I0910 03:58:38.255517      19 factory.go:106] device added: [pciAddr: 0000:3b:02.2, vendor: 8086, device: 154c, driver: iavf]
I0910 03:58:38.255523      19 factory.go:106] device added: [pciAddr: 0000:3b:02.3, vendor: 8086, device: 154c, driver: iavf]
I0910 03:58:38.255534      19 factory.go:106] device added: [pciAddr: 0000:3b:02.4, vendor: 8086, device: 154c, driver: iavf]
I0910 03:58:38.255553      19 manager.go:145] New resource server is created for intelnetdevice ResourcePool
I0910 03:58:38.255562      19 main.go:72] Starting all servers...
I0910 03:58:38.255666      19 server.go:191] starting intelnetdevice device plugin endpoint at: openshift.io_intelnetdevice.sock
I0910 03:58:38.256618      19 server.go:217] intelnetdevice device plugin endpoint started serving
I0910 03:58:38.256720      19 main.go:77] All servers started.
I0910 03:58:38.256738      19 main.go:78] Listening for term signals
I0910 03:58:40.114299      19 server.go:106] Plugin: openshift.io_intelnetdevice.sock gets registered successfully at Kubelet
I0910 03:58:40.114436      19 server.go:131] ListAndWatch(intelnetdevice) invoked
I0910 03:58:40.114480      19 server.go:139] ListAndWatch(intelnetdevice): send devices &ListAndWatchResponse{Devices:[]*Device{&Device{ID:0000:3b:02.2,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:02.4,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:18:00.2,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:18:00.3,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:00.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:02.0,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:18:00.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:02.1,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},&Device{ID:0000:3b:02.3,Health:Healthy,Topology:&TopologyInfo{Nodes:[]*NUMANode{&NUMANode{ID:0,},},},},},}
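The count of 9 can be reproduced from the "device added" lines in the log above: a vendor-only selector accepts every remaining 8086 device, not just the VFs of the selected root device. The sketch below is illustrative only. ASSUMED_PARENT is a hypothetical VF-to-PF mapping (the log does not state which PF owns the iavf functions), and select() is a simplified stand-in, not the device plugin's matching code.

```python
# Devices logged as "device added" for the intelnetdevice pool:
# (pciAddr, vendor, device, driver)
ADDED = [
    ("0000:18:00.1", "8086", "1521", "igb"),
    ("0000:18:00.2", "8086", "1521", "igb"),
    ("0000:18:00.3", "8086", "1521", "igb"),
    ("0000:3b:00.1", "8086", "158b", "i40e"),
    ("0000:3b:02.0", "8086", "154c", "iavf"),
    ("0000:3b:02.1", "8086", "154c", "iavf"),
    ("0000:3b:02.2", "8086", "154c", "iavf"),
    ("0000:3b:02.3", "8086", "154c", "iavf"),
    ("0000:3b:02.4", "8086", "154c", "iavf"),
]

# Assumption: the five iavf functions are the VFs created on 0000:3b:00.0
# by "numVfs: 5" in the policy.
ASSUMED_PARENT = {f"0000:3b:02.{i}": "0000:3b:00.0" for i in range(5)}


def select(devices, vendors, root_devices=None):
    """Return PCI addresses matching the vendor and, if given, the root device."""
    matched = []
    for addr, vendor, _device, _driver in devices:
        if vendor not in vendors:
            continue
        if root_devices is not None and ASSUMED_PARENT.get(addr) not in root_devices:
            continue
        matched.append(addr)
    return matched


vendor_only = select(ADDED, {"8086"})
with_root = select(ADDED, {"8086"}, {"0000:3b:00.0"})
print(len(vendor_only), len(with_root))
```

Under these assumptions the vendor-only selection yields the 9 devices sent in ListAndWatch, while restricting to the root device leaves the 5 VFs the policy asked for.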


Expected results:

The node capacity should show:

openshift.io/intelnetdevice: "5"

Additional info:

Comment 1 zenghui.shi 2020-10-20 13:26:04 UTC
*** Bug 1888828 has been marked as a duplicate of this bug. ***

Comment 3 zhaozhanqi 2020-11-02 02:32:06 UTC
Verified this bug on 4.7.0-202010311421.p0

Comment 6 errata-xmlrpc 2021-02-24 15:17:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

