Bug 2203590
| Summary: | No connectivity between 2 VMs over SR-IOV connection with VLAN tag | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Yossi Segev <ysegev> |
| Component: | Networking | Assignee: | Petr Horáček <phoracek> |
| Status: | NEW --- | QA Contact: | Yossi Segev <ysegev> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.13.0 | CC: | edwardh, omergi |
| Target Milestone: | --- | | |
| Target Release: | 4.14.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Yossi Segev
2023-05-14 10:50:00 UTC
Hi Or,

Per your questions and requests:

> Could you please share the VM and virt-launcher state after they were created?

    ysegev@ysegev-fedora (bz-2203590) $ oc get vm
    NAME                           AGE     STATUS    READY
    sriov-vm3-1684176024-1295114   2m58s   Running   True
    sriov-vm4-1684176043-6111841   2m39s   Running   True

    ysegev@ysegev-fedora (bz-2203590) $ oc get vmi
    NAME                           AGE     PHASE     IP             NODENAME                                READY
    sriov-vm3-1684176024-1295114   3m2s    Running   10.129.1.110   master1.bm02-ibm.ibmc.cnv-qe.rhood.us   True
    sriov-vm4-1684176043-6111841   2m42s   Running   10.128.1.20    master2.bm02-ibm.ibmc.cnv-qe.rhood.us   True

    ysegev@ysegev-fedora (bz-2203590) $ oc get pods
    NAME                                               READY   STATUS    RESTARTS   AGE
    virt-launcher-sriov-vm3-1684176024-1295114-6zmqd   2/2     Running   0          3m5s
    virt-launcher-sriov-vm4-1684176043-6111841-nr4g7   2/2     Running   0          2m46s

> Are the VMs attached to VFs from the exact same PF on each node?

Yes.

> It sounds like a routing or packet filtering issue, could you verify with the infra team that the VLAN you are using is applicable?

It is, as I used it in another test - the one where the VM's secondary interface is connected to a VLAN-tagged network that is attached to a standard Linux bridge rather than to an SR-IOV VF. And just to clarify - that test passed even when the 2 VMs were scheduled on different nodes.

> 1. Double-check that the SriovNetworkNodePolicy doesn't specify any interface that belongs to ovn-k networks.

If you are referring to dedicated interfaces such as "br-ex", I verified again that the policy is not using any such interface. I also tried using another PF interface, and the result is still the same.

> 2. Verify connectivity between the nodes through the SR-IOV PF interface each VM is connected to.

The connection between the PFs is successful.

> 3. Run the test so that both VMs get scheduled on the same node using the same PF (traffic passes through the SR-IOV internal switch).

This scenario succeeded.
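For reference, this is roughly how such a setup is typically wired in CNV: the VLAN tag is not configured inside the guest but on the VF itself, through the `vlan` field of the SriovNetwork that backs the VM's secondary interface. The following is a minimal sketch only; the network name, namespace, resource name and IPAM range are assumptions, not values taken from the actual test.

```shell
# Hypothetical SriovNetwork carrying the VLAN tag used in the test (1000).
# Assumes a "sriov-test" namespace exists and a SriovNetworkNodePolicy exposes
# the resource "sriovnic" on the relevant PF.
oc apply -f - <<EOF
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-vlan1000
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriovnic           # must match the SriovNetworkNodePolicy resourceName
  networkNamespace: sriov-test     # where the NetworkAttachmentDefinition is generated
  vlan: 1000                       # the VLAN tag the test uses
  ipam: |
    {"type": "whereabouts", "range": "10.200.0.0/24"}
EOF

# Each VM then consumes it through a secondary interface of type "sriov", e.g.:
#   spec.template.spec.domain.devices.interfaces: [{name: sriov-net, sriov: {}}]
#   spec.template.spec.networks: [{name: sriov-net, multus: {networkName: sriov-test/sriov-vlan1000}}]
```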
Comment 9
Edward Haas

This is a quote from an offline message sent by @omergi on 2023-06-16:

> For further troubleshooting, I suggest the following:
> 1. test with pods:
> spin up two pods with SR-IOV interface with VLAN configuration similar to the test and check connectivity between them.
> 2. check connectivity between nodes through VFs directly:
> On each cluster node (source and target) create VF of netdevice kind (not using the vfio-pci driver) and set it with the same VLAN
> that is used in tests and check connectivity.
Please update on the results of the troubleshooting.
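A hedged sketch of suggestion 2 above (checking VF-to-VF connectivity over the VLAN directly, bypassing KubeVirt). The PF name, VF index and IP addresses are assumptions and need to be adjusted to the actual hosts; the commands can be run from a debug shell on each node (e.g. `oc debug node/<node>` followed by `chroot /host`):

```shell
# Hypothetical PF used by the SriovNetworkNodePolicy on this node
PF=ens2f0

# Tag VF 0 of that PF with the VLAN used by the test
ip link set "$PF" vf 0 vlan 1000

# Resolve the VF netdev name; the VF must stay bound to its netdevice driver (not vfio-pci)
VF=$(ls /sys/class/net/"$PF"/device/virtfn0/net/ | head -n1)

# Assign a test address (use 192.168.100.2/24 on the peer node) and bring the VF up
ip addr add 192.168.100.1/24 dev "$VF"
ip link set "$VF" up

# From this node, ping the VF address configured on the peer node
ping -c 3 192.168.100.2
```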
(In reply to Edward Haas from comment #9)
> This is a quote from an offline message sent by @omergi on 2023-06-16:
>
> > For further troubleshooting, I suggest the following:
> > 1. test with pods:
> > spin up two pods with SR-IOV interface with VLAN configuration similar to the test and check connectivity between them.
> > 2. check connectivity between nodes through VFs directly:
> > On each cluster node (source and target) create VF of netdevice kind (not using the vfio-pci driver) and set it with the same VLAN
> > that is used in tests and check connectivity.

2. This test failed - there is no connectivity between the VFs.

I will continue with the setup of test 1 (pod connectivity), which requires some more setup actions, as I discussed with Or. In addition, I will verify again that the VLAN tag (1000) is indeed supported, by isolating the SR-IOV setup.

After debugging, I filed https://issues.redhat.com/browse/CNV-31351 so that DevOps can verify whether there is any infrastructure issue on the clusters.

Thank you Or for the cooperation.
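For the remaining item - suggestion 1, pod-to-pod connectivity over the same SR-IOV/VLAN network - here is a hedged sketch of one way to set it up. It reuses the hypothetical `sriov-test` namespace and `sriov-vlan1000` network from the earlier sketch and pins the pods to the two nodes seen in this bug; the image, resource name and exact verification commands are assumptions:

```shell
# Create one test pod per node, each attached to the SR-IOV/VLAN network
for i in 1 2; do
oc apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: sriov-test-pod-$i
  namespace: sriov-test
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-test/sriov-vlan1000
spec:
  nodeName: master$i.bm02-ibm.ibmc.cnv-qe.rhood.us    # pin the pods to different nodes
  containers:
  - name: test
    image: registry.access.redhat.com/ubi9/ubi         # any image that ships ping/iproute (assumption)
    command: ["sleep", "infinity"]
    resources:
      requests:
        openshift.io/sriovnic: "1"    # usually injected automatically; shown here explicitly
      limits:
        openshift.io/sriovnic: "1"
EOF
done

# Once both pods are Running, find the SR-IOV interface address (typically net1) in pod 2
# and ping it from pod 1
oc -n sriov-test exec sriov-test-pod-2 -- ip -o -4 addr show dev net1
oc -n sriov-test exec sriov-test-pod-1 -- ping -c 3 <net1 address of sriov-test-pod-2>
```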