Description of problem:
OVN currently uses one conntrack zone for every logical switch port connected to a VM/container. This seems wasteful and can probably be avoided by using a single conntrack zone to implement ACLs in the logical switch pipeline.

This BZ is opened to investigate the feasibility of moving to a single CT zone per switch, which has a few advantages:
- it may reduce the complexity of the ovn-controller code
- it halves the number of conntrack entries required in the switch pipeline (see bug 2025664)
- it would allow the CMS to support "mode 5 (balance-tlb)" bonding [0]

With "mode 5 (balance-tlb)" bonding [0], i.e., active load balancing of transmit traffic only, in an OSP deployment with OVN, each bond member on the VM is attached to its own logical switch port on the same logical switch. If stateful security groups are used (they translate to allow-related ACLs, which only allow explicitly matched traffic and replies on established sessions), this bonding mode does not work properly on the VM side. That's because part of the traffic originated by the VM is tracked in one zone while the rest is tracked in a different zone, so in at least one of them the connection is never properly established from conntrack's point of view. Moving to a single zone per switch would fix this.

[0] https://www.ibm.com/docs/en/linux-on-systems?topic=recommendations-bonding-modes
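The bonding failure can be illustrated with a minimal Python sketch. It assumes a toy conntrack that only records whether a connection has been committed in a zone (no TCP state, NAT or timeouts). The switch name ls0, the bond member ports p1/p2, the addresses and the zone-naming functions are all hypothetical; this is not how ovn-controller allocates zones. The VM transmits the request on one bond member and receives the reply on the other, so with one zone per port the reply is looked up in a zone that never saw the request:

```python
from dataclasses import dataclass, field


@dataclass
class ToyConntrack:
    """Toy conntrack: per zone, the set of committed connections (no TCP state)."""
    zones: dict = field(default_factory=dict)

    @staticmethod
    def key(a, b):
        # Direction-independent key, so a reply matches the entry of its request.
        return tuple(sorted((a, b)))

    def commit(self, zone, src, dst):
        self.zones.setdefault(zone, set()).add(self.key(src, dst))

    def is_established(self, zone, src, dst):
        return self.key(src, dst) in self.zones.get(zone, set())


def zone_per_port(switch, port):
    # Hypothetical: one zone per logical switch port (current behavior).
    return f"{switch}/{port}"


def zone_per_switch(switch, port):
    # Hypothetical: one zone shared by all ports of the logical switch.
    return switch


def reply_allowed(zone_of):
    """Would an allow-related ACL let the reply reach the bonded VM?"""
    ct = ToyConntrack()
    vm, server = ("10.0.0.5", 40000), ("10.0.0.9", 80)
    # balance-tlb spreads transmit traffic: the request leaves on bond member p2,
    # and the ACL commits the new connection in the zone used for p2.
    ct.commit(zone_of("ls0", "p2"), vm, server)
    # The reply is delivered to bond member p1 and is only allowed if it
    # matches an established entry in the zone used for p1.
    return ct.is_established(zone_of("ls0", "p1"), server, vm)


print(reply_allowed(zone_per_port))    # False
print(reply_allowed(zone_per_switch))  # True
```

With one zone per switch, both bond members share the same conntrack state, so the reply matches the entry committed for the request regardless of which member carried it.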
An initial PoC shows that we need more than just blindly switching to a single CT zone for a logical switch. Such an attempt can be found here:
https://github.com/dceara/ovn/commit/e3bed9f9f4eb4d34b7d4e8444816069d5abefc1b

It fails tests:
https://github.com/dceara/ovn/runs/4581680878?check_suite_focus=true#step:13:4224

The problem is that the current logical flow (and OpenFlow) pipeline causes every packet that's part of a single connection to go through connection tracking twice in the LS zone. This also means the connection is committed twice, and it breaks scenarios where stateful ACLs are used together with load balancers.

A real implementation will probably need to avoid this conntrack duplication (in the ingress and egress logical switch pipelines) and always commit just once, in the egress pipeline. However, this is quite an intrusive change.
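The double pass can be illustrated with another toy Python sketch. It counts, per zone, how many times one packet forwarded from port vm1 to port vm2 on the same switch goes through conntrack, assuming one pass in the ingress pipeline (in the inport's zone) and one in the egress pipeline (in the outport's zone). Port and zone names are hypothetical, the model ignores NAT and load balancing, and `ingress_ct=False` is only one possible reading of the "commit just once, in the egress pipeline" idea:

```python
from collections import Counter


def port_zone(port):
    # Hypothetical: each logical switch port has its own zone.
    return f"ls0/{port}"


def switch_zone(port):
    # Hypothetical: all ports of logical switch ls0 share one zone.
    return "ls0"


def ct_passes(inport, outport, zone_of, ingress_ct=True):
    """Count conntrack passes (lookup + commit) per zone for one packet
    forwarded from inport to outport on the same logical switch."""
    passes = Counter()
    if ingress_ct:
        passes[zone_of(inport)] += 1   # ingress pipeline: inport's zone
    passes[zone_of(outport)] += 1      # egress pipeline: outport's zone
    return dict(passes)


print(ct_passes("vm1", "vm2", port_zone))
# {'ls0/vm1': 1, 'ls0/vm2': 1}  -> one entry in each of two zones
print(ct_passes("vm1", "vm2", switch_zone))
# {'ls0': 2}  -> the same packet goes through the shared zone twice
print(ct_passes("vm1", "vm2", switch_zone, ingress_ct=False))
# {'ls0': 1}  -> a single pass and commit, in the egress pipeline
```

Per-port zones need two entries per connection, while the egress-only variant needs one, which is where the 50% reduction in conntrack entries comes from. Simply collapsing to a per-switch zone without removing the ingress pass keeps the two passes but puts them in the same zone.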