Description of problem:
The webhook rejects a new Policy, claiming it does not match any available NIC, even though matching NICs are available. The same Policy worked with the 4.4 SR-IOV operator.
Version-Release number of selected component (if applicable):
SR-IOV operator 4.5 rh-verified-operators
Steps to Reproduce:
1. Check available NICs
- deviceID: "1572"
2. Create a policy:
cat <<EOF | oc create -f -
It fails with:
Error from server (no matched NIC is selected by the nicSelector in CR policy-ens2f1): error when creating "STDIN": admission webhook "operator-webhook.sriovnetwork.openshift.io" denied the request: no matched NIC is selected by the nicSelector in CR policy-ens2f1
The Policy should be configured.
Created attachment 1698586
Could you share the output of 'oc get node -o yaml' and 'oc get sriovnetworknodestate -o yaml' for nodes cnvqe-10.lab.eng.tlv2.redhat.com and cnvqe-11.lab.eng.tlv2.redhat.com?
I cannot reproduce this issue in my environment. Does it always happen in yours?
From the node state manifests you attached, it looks like the policy has been applied. Did you turn off the operator webhook to make it happen?
Yes, we disabled the webhook to get around this issue. It always happened in our 4.5 environment.
The device you want to select is not in the supported NIC list; it has been blocked by https://github.com/openshift/sriov-network-operator/pull/204. So this is expected behavior. If you want to configure an unsupported NIC model, you need to disable the operator webhook.
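To illustrate why the message is misleading, here is a minimal Go sketch of a two-stage check: first find NICs that the nicSelector matches, then keep only models on an allow-list. Everything here is hypothetical and simplified: the `NIC` and `NicSelector` structs, the `validatePolicy` function, and the allow-list contents are not the operator's real code or its real supported list. The interface name `ens2f1` is also an assumption inferred from the CR name `policy-ens2f1`. Intel's vendor ID is 8086, and device 1572 is the X710 mentioned later in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// NIC is a hypothetical, simplified view of one interface reported for a node.
type NIC struct {
	Name     string
	Vendor   string
	DeviceID string
}

// NicSelector is a hypothetical subset of a policy's nicSelector fields.
type NicSelector struct {
	Vendor   string
	DeviceID string
	PfNames  []string
}

// supportedModels is an ILLUSTRATIVE allow-list keyed by "vendor:device".
// It is not the operator's actual supported NIC list.
var supportedModels = map[string]bool{
	"8086:158b": true, // Intel XXV710
}

// selects reports whether the selector matches the NIC; empty fields match anything.
func selects(sel NicSelector, nic NIC) bool {
	if sel.Vendor != "" && sel.Vendor != nic.Vendor {
		return false
	}
	if sel.DeviceID != "" && sel.DeviceID != nic.DeviceID {
		return false
	}
	if len(sel.PfNames) > 0 {
		for _, pf := range sel.PfNames {
			if pf == nic.Name {
				return true
			}
		}
		return false
	}
	return true
}

// validatePolicy accepts the policy only if a selected NIC is also a supported model.
// A NIC that matches the selector but is unsupported still yields the "no matched NIC"
// error, which is the confusing behavior reported in this bug.
func validatePolicy(name string, sel NicSelector, nics []NIC) error {
	for _, nic := range nics {
		if !selects(sel, nic) {
			continue
		}
		// Normalize hex IDs so "158B" and "158b" compare equal.
		if supportedModels[strings.ToLower(nic.Vendor+":"+nic.DeviceID)] {
			return nil
		}
	}
	return fmt.Errorf("no matched NIC is selected by the nicSelector in CR %s", name)
}

func main() {
	nics := []NIC{{Name: "ens2f1", Vendor: "8086", DeviceID: "1572"}}
	sel := NicSelector{PfNames: []string{"ens2f1"}}
	// Prints: no matched NIC is selected by the nicSelector in CR policy-ens2f1
	fmt.Println(validatePolicy("policy-ens2f1", sel, nics))
}
```

The selector does match the X710 here; the rejection comes entirely from the allow-list stage, which the single error message does not distinguish.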
Makes sense. So all that needs to be done here is to fix the error message from "no matched NIC is selected by the nicSelector" to something more appropriate?
How about changing it to "no supported NIC is selected by the nicSelector"?
That sounds ok.
I have some second thoughts. I understand that Red Hat can't officially support models it does not test. However, the operator is known to work with other similar models too, and it seems wasteful to hard-limit ourselves to a subset of them. Could we have a configuration option to disable the model check instead of disabling the whole webhook? It would have the same effect, it would be 100% explicit, and we would still be able to use the other features of the webhook.
It would help the upstream community, including me: I have an X710 (not the supported XXV710).
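A minimal Go sketch of the proposed option follows, assuming a hypothetical `SkipModelCheck` setting. This is not an existing operator field, and `WebhookConfig` and `validate` are illustrative names only. The sketch also separates the two failure cases: a selector that matches nothing keeps the original "no matched NIC" wording, while a selector that only matches unsupported models gets the "no supported NIC" wording proposed above. Other webhook checks would be unaffected because only the model check is skipped.

```go
package main

import "fmt"

// NIC is a hypothetical, simplified view of one interface on a node.
type NIC struct {
	Name, Vendor, DeviceID string
}

// WebhookConfig holds hypothetical validation options.
// SkipModelCheck is the proposed knob, NOT an existing operator setting.
type WebhookConfig struct {
	SkipModelCheck bool
	Supported      map[string]bool // keyed by "vendor:device"
}

// validate checks a policy that selects a single PF by name.
// Only the model allow-list is bypassed by SkipModelCheck; a selector
// that matches nothing is still rejected.
func validate(cfg WebhookConfig, policy, pfName string, nics []NIC) error {
	matched := false
	for _, nic := range nics {
		if nic.Name != pfName {
			continue
		}
		matched = true
		if cfg.SkipModelCheck || cfg.Supported[nic.Vendor+":"+nic.DeviceID] {
			return nil
		}
	}
	if !matched {
		return fmt.Errorf("no matched NIC is selected by the nicSelector in CR %s", policy)
	}
	return fmt.Errorf("no supported NIC is selected by the nicSelector in CR %s", policy)
}

func main() {
	nics := []NIC{{Name: "ens2f1", Vendor: "8086", DeviceID: "1572"}}
	supported := map[string]bool{"8086:158b": true} // illustrative allow-list
	strict := WebhookConfig{Supported: supported}
	relaxed := WebhookConfig{SkipModelCheck: true, Supported: supported}

	// Prints: no supported NIC is selected by the nicSelector in CR policy-ens2f1
	fmt.Println(validate(strict, "policy-ens2f1", "ens2f1", nics))
	// Prints: <nil>
	fmt.Println(validate(relaxed, "policy-ens2f1", "ens2f1", nics))
}
```

Keeping the match check active even when the model check is skipped means a typo in the selector is still caught, which is the main benefit over turning off the whole webhook.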
I agree that we should provide a way to allow users to try unsupported NIC models. We're planning a proper, systematic solution for that. It is on our to-do list.
For this BZ, can we close it with the error message change?
I can hardly complain about something which is unsupported, the message change would be great :)
Is there anywhere I could track the solution you mention? Jira, maybe?
It hasn't started yet. I'll keep you posted.
Verified this bug
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.