Bug 2008287

Summary: SNO: Updating SriovNetworkNodePolicy is rejected if node was provisioned incorrectly with wrong rdma initially
Product: OpenShift Container Platform Reporter: yliu1
Component: Networking Assignee: zenghui.shi <zshi>
Networking sub component: SR-IOV QA Contact: zhaozhanqi <zzhao>
Status: CLOSED NOTABUG Docs Contact:
Severity: unspecified    
Priority: unspecified    
Version: 4.9   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-09-30 00:40:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description yliu1 2021-09-27 19:23:34 UTC
Description of problem:
On a SNO, if an incorrect isRdma config is applied initially, the node is provisioned incorrectly with the error "numVfs(8) in CR sriov-nnp-du-fh exceed the maximum allowed value(0)". When the policy is then updated
on the hub cluster and synced to the spoke cluster, the update is rejected because the node was provisioned incorrectly.


Version-Release number of selected component (if applicable):
4.9 

How reproducible:
100%

Steps to Reproduce:
1. Install and configure SNO with the sriov operator, with isRdma set to true in one of the SriovNetworkNodePolicy objects for a FLV NIC.
2. Update the SriovNetworkNodePolicy in the hub cluster.

Actual results:
2. The updated policy is rejected with the previous node error.

      Event Name:      cnfde10-policies.cnfde10-sriov-nnp-fh-policy.16a8bdf7d4d3bc23
      Last Timestamp:  2021-09-27T17:28:57Z
      Message:         NonCompliant; violation - Error updating the object `sriov-nnp-du-fh`, the error is `admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: numVfs(8) in CR sriov-nnp-du-fh exceed the maximum allowed value(0)`


Expected results:
2. The updated SriovNetworkNodePolicy is synced to the spoke cluster and applied successfully.


Additional info:

Comment 1 zenghui.shi 2021-09-29 04:00:57 UTC
(In reply to yliu1 from comment #0)
> Description of problem:
> On a SNO, if an incorrect isRdma config got applied initially, which
> resulted in the node to be provisioned incorrectly with error "numVfs(8) in
> CR sriov-nnp-du-fh exceed the maximum allowed value(0)"; then after updating
> the policy
> from hub cluster and let it sync to spoke cluster, it will be rejected due
> to the node was provisioned incorrectly. 
> 
Does the issue happen if isRdma is not applied initially?
> 
> Version-Release number of selected component (if applicable):
> 4.9 
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. install and config SNO with sriov operator. isRdma in one of the
> SriovNetworkNodePolicy is set to true for a FLV nic.
> 2. update the SriovNetworkNodePolicy policy in hub cluster
> 
> Actual results:
> 2. updated policy got rejected with the previous node error.
> 
>       Event Name:     
> cnfde10-policies.cnfde10-sriov-nnp-fh-policy.16a8bdf7d4d3bc23
>       Last Timestamp:  2021-09-27T17:28:57Z
>       Message:         NonCompliant; violation - Error updating the object
> `sriov-nnp-du-fh`, the error is `admission webhook
> "operator-webhook.sriovnetwork.openshift.io" denied the request: numVfs(8)
> in CR sriov-nnp-du-fh exceed the maximum allowed value(0)`

This message indicates that the totalVfs for the configured device is zero.
I'm wondering what the actual value of `/sys/class/net/<pf-name>/device/sriov_totalvfs` is on the system.
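
For reference, a minimal sketch of reading that sysfs attribute on the node. The helper name `read_total_vfs` and the `sysfs_root` parameter are hypothetical and exist only for illustration; the path layout is the one quoted above.

```python
from pathlib import Path


def read_total_vfs(pf_name, sysfs_root="/sys/class/net"):
    """Return sriov_totalvfs for a physical function, or None if absent.

    Hypothetical helper: it reads <sysfs_root>/<pf_name>/device/sriov_totalvfs.
    sysfs_root is a parameter only so the function can be tried against a
    scratch directory instead of the real /sys tree.
    """
    path = Path(sysfs_root) / pf_name / "device" / "sriov_totalvfs"
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        # The interface does not exist or exposes no SR-IOV attribute.
        return None
```

Calling `read_total_vfs("<pf-name>")` on the affected node and getting 0 would match the "maximum allowed value(0)" in the webhook message.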

Can you get the sriovnetworknodestate and check the `/sys/class/net/<pf-name>/device/sriov_totalvfs` when the issue happens?
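
To illustrate why a totalVfs of zero produces this rejection, here is a rough sketch of the comparison the message implies. It is not the operator's webhook code. The function name `check_num_vfs` is hypothetical, and the node state layout (`status.interfaces[]` entries with `name` and `totalvfs` keys) is an assumption about what the sriovnetworknodestate reports; only the message format is taken from the event above.

```python
def check_num_vfs(policy_name, num_vfs, pf_names, node_state):
    """Return an error string if num_vfs exceeds totalvfs on a selected PF.

    node_state is a dict shaped like a SriovNetworkNodeState object
    (assumed layout: status.interfaces[] with "name" and "totalvfs").
    Returns None when every selected PF can hold num_vfs VFs.
    """
    interfaces = node_state.get("status", {}).get("interfaces", [])
    for iface in interfaces:
        if iface.get("name") not in pf_names:
            continue
        # A missing totalvfs is treated as 0, i.e. no VFs available.
        total = iface.get("totalvfs", 0)
        if num_vfs > total:
            return (f"numVfs({num_vfs}) in CR {policy_name} exceed the "
                    f"maximum allowed value({total})")
    return None
```

Under these assumptions, a node state that reports `totalvfs: 0` for the selected PF rejects any policy with a positive numVfs, regardless of what else the updated policy changes.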

Comment 2 yliu1 2021-09-29 14:13:54 UTC
Yes, you are right. Even after redeployment with the correct configs, my node still shows a totalvfs of 0. I will look into that. We can close this bz.

Comment 3 zenghui.shi 2021-09-30 00:40:04 UTC
Yang, thanks for confirming!