Description of problem:
Using VLAN provider networks and OVS 2.9.0, we are seeing inconsistent OpenFlow rules, which we believe are contributing to L3 connectivity issues between two VMs on two different hypervisors. The VMs are dual-homed, each connected to two separate provider networks. Only one of the interfaces experiences the connectivity issues (please see the attached artifacts).

Version-Release number of selected component (if applicable):
ovs_version: 2.9.0

How reproducible:
Set up a single VM per hypervisor using VLAN provider networks. After a random amount of time, L3 connectivity is lost between the two VMs, but only on one of the two links and only in one direction.

Steps to Reproduce:
1. Start with a clean environment with full VM-to-VM connectivity across two hypervisors.
2. After some random time passes, observe L3 connectivity loss on one of the two links, in one direction only.
3. Troubleshoot. While inspecting the OpenFlow rules, we see some strange rules that either should not be there or require further explanation/follow-up.

Actual results:
L3 connectivity is disrupted and remains broken until neutron-ovs-agent is restarted on the hypervisor of the source VM.

Expected results:
L3 connectivity should be maintained throughout.

Additional info:
L2 reachability seems to be OK (ARP resolution looks fine end to end). The tests were done with ping. ICMP Echo Requests are observed on the corresponding TAP interfaces but not on the physical interfaces linked to the provider network. In the opposite direction of the same flow (source and destination VMs reversed), ping works fine: the Echo Replies are received. Not easily reproducible.
There are flows for which packets leave the VM but do not reach the physical interfaces of the hypervisor. We need help figuring out why that is and how to address it in the long run.

In the given example, for the faulty flow, we notice an entry in 'table=73' matching on reg6 (br-int internal vlan) and the dl_dst MAC. All other entries of this type point to local TAP interfaces. This one eventually (through further lookups in other tables) points to a vxlan interface, and the traffic gets dropped (incrementing counters in 'table=92'). The mirrored flow (source and destination reversed) works fine. ARP traffic flows fine in all directions.

In addition, if the assumption that those flows should not be in OVS proves to be correct:
- Why weren't they cleaned up?
- Why were they defined in the first place?
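For anyone repeating this check, here is a minimal Python sketch of how the 'table=73' and 'table=92' observations above could be pulled out of saved flow dumps. It assumes the text layout usually printed by `ovs-ofctl dump-flows br-int` (comma-separated metadata, then the match, then ` actions=`). The sample lines, MAC address, register values and helper names are all made up for illustration and are not taken from the attached artifacts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Flow:
    table: int
    n_packets: int
    match_text: str
    match: Dict[str, str]
    actions: str


def parse_flow(line: str) -> Optional[Flow]:
    """Parse one dump-flows line; return None for headers and blank lines."""
    head, sep, actions = line.strip().partition(" actions=")
    if not sep:
        return None
    fields = head.split(", ")
    # Everything before the last field is metadata (cookie, table, n_packets...).
    meta = dict(f.split("=", 1) for f in fields[:-1] if "=" in f)
    # The last field is the match, e.g. "priority=100,reg6=0x2,dl_dst=...".
    match: Dict[str, str] = {}
    for item in fields[-1].split(","):
        key, _, value = item.partition("=")
        match[key] = value
    return Flow(
        table=int(meta.get("table", 0)),  # assumption: a missing table means 0
        n_packets=int(meta.get("n_packets", 0)),
        match_text=fields[-1],
        match=match,
        actions=actions,
    )


def parse_dump(text: str) -> List[Flow]:
    flows = (parse_flow(line) for line in text.splitlines())
    return [f for f in flows if f is not None]


def table73_mac_entries(flows: List[Flow]) -> List[Tuple[str, str, str]]:
    """List (reg6, dl_dst, actions) for table=73 entries matching on both."""
    return [
        (f.match["reg6"], f.match["dl_dst"], f.actions)
        for f in flows
        if f.table == 73 and "reg6" in f.match and "dl_dst" in f.match
    ]


def counter_deltas(before: List[Flow], after: List[Flow], table: int) -> Dict[str, int]:
    """Return match -> packet-count increase for flows in `table` that grew."""
    old = {f.match_text: f.n_packets for f in before if f.table == table}
    return {
        f.match_text: f.n_packets - old.get(f.match_text, 0)
        for f in after
        if f.table == table and f.n_packets > old.get(f.match_text, 0)
    }


# Hypothetical dumps taken a few seconds apart while the ping is running.
SAMPLE_BEFORE = """
NXST_FLOW reply (xid=0x4):
 cookie=0x1, duration=100.0s, table=73, n_packets=5, n_bytes=490, idle_age=3, priority=100,reg6=0x2,dl_dst=fa:16:3e:00:00:01 actions=load:0x7->NXM_NX_REG5[],resubmit(,81)
 cookie=0x1, duration=100.0s, table=92, n_packets=10, n_bytes=980, idle_age=3, priority=0 actions=drop
"""
SAMPLE_AFTER = SAMPLE_BEFORE.replace("table=92, n_packets=10", "table=92, n_packets=14")

if __name__ == "__main__":
    before, after = parse_dump(SAMPLE_BEFORE), parse_dump(SAMPLE_AFTER)
    for reg6, mac, actions in table73_mac_entries(after):
        print(f"table=73 reg6={reg6} dl_dst={mac} -> {actions}")
    print("table=92 counter increases:", counter_deltas(before, after, 92))
```

Taking two dumps a few seconds apart while the failing ping runs, and comparing them this way, should show which entries are actually absorbing the traffic. Each 'table=73' entry it lists still has to be traced by hand through its resubmit targets to see whether it ends at a local TAP port.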
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0770