Bug 2214457
| Summary: | Mixing bridge and sr-iov networks with same name fails and is confusing for the user | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Germano Veit Michel <gveitmic> |
| Component: | Networking | Assignee: | Petr Horáček <phoracek> |
| Status: | NEW --- | QA Contact: | Nir Rozen <nrozen> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.13.0 | ||
| Target Milestone: | --- | ||
| Target Release: | future | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2224990 | ||
| Bug Blocks: | |||
Hey, thanks for reporting this. Very confusing indeed. This is what I plan to do with this BZ, let me know if it is sensible:

* Open a bug on UI asking them to figure out the binding type (bridge vs SR-IOV) from the selected network. Only if they cannot find a match for the selected network (RBAC does not allow them to read it, or it uses a third-party CNI we don't know) should it show the dropdown.
* Open a bug on SR-IOV operator asking them to not touch existing NADs if they are not of their own type.

About shared names, IIUIC you would like to be able to expose multiple different network attachment definitions as one. I could create network "blue" from the UI and it would act as an abstraction over NAD "blue-bridge" and SR-IOV network "blue-sriov". I don't think we can do this without a shared high-level OpenShift object for a "network" - that sounds like a pretty ambitious RFE for OpenShift Network. I would ask you to open an RFE, but I don't think it's realistic now. We would not only need a new abstraction for network definition, but also for network request, since on a Pod, you just reference the NAD name directly.

Let me know what you think.

Yes, I think those 2 things will fix the experience problem a user may have with this. IMHO nothing is really a *bug* here when seen from each component's perspective, but things are not fitting well together and those 2 changes should correct this. Thanks Petr!

I'm keeping this BZ open to have a central tracker. I will target it to "future" so it's not in the way while the bugs we depend on are getting targeted and solved. This depends on:

* https://bugzilla.redhat.com/show_bug.cgi?id=2224990 for internally assigning the binding method based on the requested NetworkAttachmentDefinition.
* https://issues.redhat.com/browse/OCPBUGS-16683 for not overwriting NetworkAttachmentDefinitions that are not owned by the SR-IOV operator.
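The UI-side proposal in the comments above (derive the binding from the selected network, and fall back to the dropdown only when no match is found) could look roughly like the following sketch. It is an illustration only, not console code: the function name `infer_binding`, the mapping table, and the plain-dict representation of a NAD are assumptions. The CNI type strings `cnv-bridge` and `sriov` are the ones seen in the NADs in this report.

```python
import json
from typing import Optional

# Hypothetical mapping from CNI plugin type to VM interface binding.
# "cnv-bridge" and "sriov" appear in this report; "bridge" is included on the
# assumption that the upstream bridge CNI would be handled the same way.
CNI_TYPE_TO_BINDING = {
    "cnv-bridge": "bridge",
    "bridge": "bridge",
    "sriov": "sriov",
}


def infer_binding(nad: Optional[dict]) -> Optional[str]:
    """Return the binding implied by a NAD, or None if the user must choose.

    None covers the fallback cases from the comment: the NAD could not be
    read (passed in as None), its config is unparsable, or it uses a CNI
    type that is not in the mapping.
    """
    if nad is None:
        return None
    try:
        config = json.loads(nad["spec"]["config"])
    except (KeyError, TypeError, ValueError):
        return None
    if not isinstance(config, dict):
        return None
    cni_type = config.get("type")
    if not isinstance(cni_type, str):
        return None
    return CNI_TYPE_TO_BINDING.get(cni_type)
```

With the two NAD configs from this report, the sketch would return `"bridge"` for the original `cnv-bridge` definition and `"sriov"` for the overwritten one, so the type dropdown would only be needed for unreadable or unknown networks.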
Description of problem:

I don't think this is a bug on a specific component, but more about how things work together: SR-IOV + CNV + Console. For example, a user does the following:

1. Configure some bridge network. My example is the virt-toca network, which is just a bridge using VLAN 2.

        apiVersion: k8s.cni.cncf.io/v1
        kind: NetworkAttachmentDefinition
        metadata:
          annotations:
            k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/virt.toca
          name: virt-toca
          namespace: homelab
        spec:
          config: >-
            {"name":"virt.toca","type":"cnv-bridge","cniVersion":"0.4.0","bridge":"virt.toca","macspoofchk":false,"ipam":{}}

2. Go to the CNV UI: Virtualization -> Virtual Machines -> example_vm -> Configuration -> Network Interfaces

3. Edit a NIC

4. See you can select a Network (item 2 in the menu), *plus* a type for that network (item 3 in the menu). For example, I have a network named virt-toca, for which I just configured a bridge NAD that is working fine. But now I want SR-IOV too, and the dialog above suggests I can have an SR-IOV network with the same name and just select a different type for the VM NIC (i.e. SR-IOV).

5. Now I configure an SR-IOV network with the same name:

        apiVersion: v1
        items:
        - apiVersion: sriovnetwork.openshift.io/v1
          kind: SriovNetwork
          metadata:
            annotations:
            name: virt-toca
            namespace: openshift-sriov-network-operator
          spec:
            networkNamespace: homelab
            resourceName: virt_toca_sriov_resource
            spoofChk: "off"
            trust: "on"
            vlan: 2
        kind: List
        metadata:
          resourceVersion: ""

6. Here it gets confusing. Now in the CNV dialog from step 4 above one would expect to use virt-toca in bridge or SR-IOV mode, just by changing the type of the network (item 3 in NIC edit menu), right? No, it does not work like that, and now both networks are broken.

7. The SR-IOV network defined at step 5 overwrote the NAD of the bridge. There is still a single NAD named virt-toca, but it's now pointing to the SR-IOV resource and not the bridge one.

        apiVersion: k8s.cni.cncf.io/v1
        kind: NetworkAttachmentDefinition
        metadata:
          annotations:
            k8s.v1.cni.cncf.io/resourceName: openshift.io/virt_toca_sriov_resource
          creationTimestamp: '2023-06-13T03:26:51Z'
          generation: 2
          managedFields:
            - apiVersion: k8s.cni.cncf.io/v1
              fieldsType: FieldsV1
              fieldsV1:
                'f:metadata':
                  'f:annotations': {}
                'f:spec': {}
              manager: kubectl-client-side-apply
              operation: Update
              time: '2023-06-13T03:26:51Z'
            - apiVersion: k8s.cni.cncf.io/v1
              fieldsType: FieldsV1
              fieldsV1:
                'f:metadata':
                  'f:annotations':
                    'f:k8s.v1.cni.cncf.io/resourceName': {}
                'f:spec':
                  'f:config': {}
              manager: sriov-network-operator
              operation: Update
              time: '2023-06-13T03:38:48Z'
          name: virt-toca
          namespace: homelab
          resourceVersion: '966960'
          uid: 9d63f136-10be-45fb-b303-f51949d17efa
        spec:
          config: >-
            { "cniVersion":"0.3.1", "name":"virt-toca","type":"sriov","vlan":2,"spoofchk":"off","trust":"on","vlanQoS":0,"ipam":{} }

So essentially the SR-IOV one overwrote the bridge one, and now it's all broken. VMs configured with the bridge network will fail to schedule. Look at this, a VM with bridge network interfaces:

            - bridge: {}
              macAddress: '02:52:59:00:00:02'
              model: virtio
              name: nic-virt-toca
    networks:
      - multus:
          networkName: virt-toca
        name: nic-virt-toca

It fails to schedule because of a missing SR-IOV resource on the node it's pinned to (that node does not have an SR-IOV card):

    message: >-
      0/9 nodes are available: 1 Insufficient openshift.io/virt_toca_sriov_resource,

Things get very confusing for the user, who only later notices that the NAD for the bridge was overwritten. The CNV UI dialog doesn't really make much sense either, because each network (NAD) can have only one type, so it's redundant to ask for both, as it implies it would work.

It would be nice to actually have this work. I don't want a network named vlan-2-bridge that I use with bridge type, and another vlan-2-sriov that I use with sriov type. Having just vlan-2 and then choosing the type would be the best user experience.

Version-Release number of selected component (if applicable):
4.13

How reproducible:
Always

Steps to Reproduce:
As above

Actual results:
- Configuring sr-iov breaks bridge
- weird things happen

Expected results:
- Don't overwrite the bridge NAD
- Allow both NADs to co-exist, or merge them instead of overwriting
- If each network can only have a single type, then the UI field for type selection is redundant
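
The first expected result ("Don't overwrite the bridge NAD") amounts to a check before the operator writes a NAD that already exists. A minimal sketch of such a guard follows, assuming NADs are handled as plain dicts. The names `nad_cni_type`, `may_overwrite` and `OPERATOR_CNI_TYPE` are hypothetical, and this is not the SR-IOV operator's actual code. It compares CNI types, as suggested in the comments; a real fix might instead rely on ownership metadata.

```python
import json
from typing import Optional

# Assumption: the operator only ever generates NADs of this CNI type,
# which matches the overwritten NAD shown in this report.
OPERATOR_CNI_TYPE = "sriov"


def nad_cni_type(nad: dict) -> Optional[str]:
    """Extract the CNI plugin type from a NAD's embedded JSON config."""
    try:
        config = json.loads(nad["spec"]["config"])
    except (KeyError, TypeError, ValueError):
        return None
    if not isinstance(config, dict):
        return None
    return config.get("type")


def may_overwrite(existing_nad: Optional[dict],
                  operator_type: str = OPERATOR_CNI_TYPE) -> bool:
    """Allow a write only if no NAD exists yet or it is of the operator's type.

    A NAD whose type cannot be determined is treated as foreign and left alone.
    """
    if existing_nad is None:
        return True
    return nad_cni_type(existing_nad) == operator_type
```

Applied to this report, the guard would refuse to replace the `cnv-bridge` NAD named virt-toca, so the SriovNetwork from step 5 would have to surface a conflict instead of silently breaking the bridge network.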