Bug 1955874
Summary: | Webscale: sriov vfs are not created and sriovnetworknodestate indicates sync succeeded - state is not correct | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Nabeel Cocker <ncocker>
Component: | Networking | Assignee: | zenghui.shi <zshi>
Networking sub component: | SR-IOV | QA Contact: | zhaozhanqi <zzhao>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | high | |
Priority: | unspecified | CC: | anbhat, bbennett, skanakal, zshi
Version: | 4.8 | |
Target Milestone: | --- | |
Target Release: | 4.8.0 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1958467 | Environment: |
Last Closed: | 2021-07-27 23:05:33 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1958467 | |
Attachments: | | |
Hello,
Could you please let me know what info is needed?
Thank you,
Nabeel

Upstream fix for the config daemon panic (index out of range when getting the VF interface name): https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/127

@ncocker Hi, I tried the policy above many times on an older version but did not hit this issue. Do you have steps to reproduce it that would help verify this bug?

Tried many times; this issue cannot be reproduced on 4.8.0-202105100942.p0. Moving this to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
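The comment above describes the upstream fix only as an index-out-of-range panic when getting the VF interface name. As a rough illustration of that class of problem (not the operator's actual Go code and not the content of the linked pull request), the sketch below looks up the interface name under a VF's `net/` directory in sysfs and treats an empty or missing directory as "no name yet" instead of assuming a first entry exists. The function name `vf_interface_name` and the `SYSFS_PCI` override are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# SYSFS_PCI is a hypothetical override so the function can be tried against a
# fake directory tree; on a node it defaults to the real sysfs PCI path.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Print the first interface name registered under a PCI address.
# Return 1 (and print nothing) when net/ is missing or has no entries,
# rather than assuming that at least one entry is present.
vf_interface_name() {
    local addr="$1" entry
    for entry in "$SYSFS_PCI/$addr/net"/*; do
        if [[ -e "$entry" ]]; then
            basename "$entry"
            return 0
        fi
    done
    return 1
}
```

On a node this could be called as `vf_interface_name 0000:d8:00.6`; a non-zero exit status would mean the VF has no network interface bound at that moment.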
Created attachment 1777999
dmeseg

Description of problem:
Inconsistent state between the config daemon and the sriovnetworknodestate. The VFs are not created until the config-daemon pod is deleted and, in some cases, until the sriovnetworknodestate is deleted as well. This happens when the node is first enabled with the nnp (SriovNetworkNodePolicy).

Version-Release number of selected component (if applicable):
OCP 4.6.17

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

[corona@stablcurco ~]$
[corona@stablcurco ~]$ cat sriov-nnp.yaml
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ens3f0vf1115
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  mtu: 9100
  nicSelector:
    deviceID: "1017"
    pfNames:
    - ens3f0#11-14
    vendor: 15b3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 16
  priority: 99
  resourceName: ens3f0vf1115
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ens3f1vf1115
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  mtu: 9100
  nicSelector:
    deviceID: "1017"
    pfNames:
    - ens3f1#11-14
    vendor: 15b3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 16
  priority: 99
  resourceName: ens3f1vf1115

[core@worker-149 ~]$ cat /sys/bus/pci
pci/ pci_express/
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d
0000:d7:00.0/ 0000:d7:0e.1/ 0000:d7:12.0/ 0000:d7:15.0/ 0000:d8:00.1/ 0000:d8:00.6/ 0000:d8:01.3/ 0000:d8:02.0/ 0000:d8:02.5/ 0000:d8:03.2/ 0000:d8:03.7/
0000:d7:05.0/ 0000:d7:0f.0/ 0000:d7:12.1/ 0000:d7:16.0/ 0000:d8:00.2/ 0000:d8:00.7/ 0000:d8:01.4/ 0000:d8:02.1/ 0000:d8:02.6/ 0000:d8:03.3/ 0000:d8:04.0/
0000:d7:05.2/ 0000:d7:0f.1/ 0000:d7:12.2/ 0000:d7:16.4/ 0000:d8:00.3/ 0000:d8:01.0/ 0000:d8:01.5/ 0000:d8:02.2/ 0000:d8:02.7/ 0000:d8:03.4/ 0000:d8:04.1/
0000:d7:05.4/ 0000:d7:10.0/ 0000:d7:12.4/ 0000:d7:17.0/ 0000:d8:00.4/ 0000:d8:01.1/ 0000:d8:01.6/ 0000:d8:02.3/ 0000:d8:03.0/ 0000:d8:03.5/
0000:d7:0e.0/ 0000:d7:10.1/ 0000:d7:12.5/ 0000:d8:00.0/ 0000:d8:00.5/ 0000:d8:01.2/ 0000:d8:01.7/ 0000:d8:02.4/ 0000:d8:03.1/ 0000:d8:03.6/
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:0
0000:d8:00.0/ 0000:d8:00.3/ 0000:d8:00.6/ 0000:d8:01.1/ 0000:d8:01.4/ 0000:d8:01.7/ 0000:d8:02.2/ 0000:d8:02.5/ 0000:d8:03.0/ 0000:d8:03.3/ 0000:d8:03.6/ 0000:d8:04.1/
0000:d8:00.1/ 0000:d8:00.4/ 0000:d8:00.7/ 0000:d8:01.2/ 0000:d8:01.5/ 0000:d8:02.0/ 0000:d8:02.3/ 0000:d8:02.6/ 0000:d8:03.1/ 0000:d8:03.4/ 0000:d8:03.7/
0000:d8:00.2/ 0000:d8:00.5/ 0000:d8:01.0/ 0000:d8:01.3/ 0000:d8:01.6/ 0000:d8:02.1/ 0000:d8:02.4/ 0000:d8:02.7/ 0000:d8:03.2/ 0000:d8:03.5/ 0000:d8:04.0/
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:0
0000:d8:00.0/ 0000:d8:00.3/ 0000:d8:00.6/ 0000:d8:01.1/ 0000:d8:01.4/ 0000:d8:01.7/ 0000:d8:02.2/ 0000:d8:02.5/ 0000:d8:03.0/ 0000:d8:03.3/ 0000:d8:03.6/ 0000:d8:04.1/
0000:d8:00.1/ 0000:d8:00.4/ 0000:d8:00.7/ 0000:d8:01.2/ 0000:d8:01.5/ 0000:d8:02.0/ 0000:d8:02.3/ 0000:d8:02.6/ 0000:d8:03.1/ 0000:d8:03.4/ 0000:d8:03.7/
0000:d8:00.2/ 0000:d8:00.5/ 0000:d8:01.0/ 0000:d8:01.3/ 0000:d8:01.6/ 0000:d8:02.1/ 0000:d8:02.4/ 0000:d8:02.7/ 0000:d8:03.2/ 0000:d8:03.5/ 0000:d8:04.0/
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:00.6/
ari_enabled d3cold_allowed infiniband_mad/ local_cpus numa_node resource0 vendor
broken_parity_status device infiniband_srp/ max_link_speed physfn/ resource0_wc
class dma_mask_bits infiniband_verbs/ max_link_width pools revision
config driver/ iommu/ modalias power/ subsystem/
consistent_dma_mask_bits driver_override iommu_group/ msi_bus ptp/ subsystem_device
current_link_speed enable irq msi_irqs/ reset subsystem_vendor
current_link_width infiniband/ local_cpulist net/ resource uevent
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:00.6/n
net/ numa_node
[core@worker-149 ~]$ cat /sys/bus/pci/devices/0000\:d8\:00.6/net/
cat: '/sys/bus/pci/devices/0000:d8:00.6/net/': Is a directory
[core@worker-149 ~]$ cd /sys/bus/pci/devices/0000\:d8\:00.6/net/
[core@worker-149 net]$
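
The report is about the node's real VF state disagreeing with what the operator reports, so a direct sysfs check can help when triaging. The sketch below is one possible way to do that from a shell on the worker: it splits a `pfNames` entry such as `ens3f0#11-14` into the PF name and the VF index range, then compares the policy's `numVfs` with the kernel's `sriov_numvfs` attribute and the number of `virtfn*` links under the PF. The function names and the `SYSFS_NET` override are hypothetical helpers for this example, not part of the operator or of any OpenShift tooling.

```shell
#!/usr/bin/env bash
# SYSFS_NET is a hypothetical override so the functions can be tried against a
# fake directory tree; on a node it defaults to the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Split a pfNames entry such as "ens3f0#11-14" into "pf first last".
# An entry without a "#first-last" suffix prints only the PF name.
parse_pf_selector() {
    local entry="$1"
    local pf="${entry%%#*}"
    if [[ "$entry" == *"#"* ]]; then
        local range="${entry#*#}"
        echo "$pf ${range%%-*} ${range##*-}"
    else
        echo "$pf"
    fi
}

# Print "configured=<sriov_numvfs> present=<number of virtfn* links>" for a PF.
vf_status() {
    local pf="$1"
    local dev="$SYSFS_NET/$pf/device"
    local configured=0 present=0 link
    if [[ -r "$dev/sriov_numvfs" ]]; then
        configured="$(cat "$dev/sriov_numvfs")"
    fi
    for link in "$dev"/virtfn*; do
        # An unmatched glob stays literal and fails both tests.
        if [[ -e "$link" || -L "$link" ]]; then
            present=$((present + 1))
        fi
    done
    echo "configured=$configured present=$present"
}

# Succeed only when the PF reports exactly the number of VFs the policy asks for.
# A missing PF or missing attribute is treated as 0 VFs.
vfs_match_policy() {
    local pf="$1" expected="$2" configured
    configured="$(cat "$SYSFS_NET/$pf/device/sriov_numvfs" 2>/dev/null || echo 0)"
    [[ "$configured" -eq "$expected" ]]
}
```

With the policies above, `vfs_match_policy ens3f0 16 || echo "ens3f0: VFs not created"` would flag a PF whose VFs are missing even if the sriovnetworknodestate reports that the sync succeeded.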