ofctrl_add_or_append_flow() looks for a flow with the same match and a conjunction() action in its action list, but it doesn't check whether the same conjunction is already in that list. At the same time, remove_flows_from_sb_to_flow(), which is supposed to clean up flows before new ones are added during incremental processing, only removes the flow reference from the list; it doesn't remove the conjunctions from the action list. This leads to a situation where the actions on a single flow may grow indefinitely, because the same conjunctions are re-added to the action list without ever being removed, unless the whole flow has to be removed. Also, broken outdated conjunctions may stay in the flow after the logical flows that triggered their addition are removed. We were able to reproduce this behavior with hairpin_snat_ip before BZ2171423 got fixed, but other types of flows might be affected as well.
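The growth described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not ovn-controller code: the `Flow` class, `add_or_append()`, `remove_sb_ref()`, `simulate()`, the match string and the lflow names are all hypothetical stand-ins. Two logical flows share one desired flow, so the flow is never removed as a whole while one of them is reprocessed repeatedly. The `dedupe` flag shows one possible guard (skip a conjunction that is already present); it is not necessarily what the posted fix does, and it does not address the stale conjunctions left after a logical flow is removed.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """Toy stand-in for a desired flow: one match, many conjunction actions."""
    match: str
    conjunctions: list = field(default_factory=list)  # (conj_id, clause, n_clauses)
    sb_refs: set = field(default_factory=set)         # logical flows referencing it


def add_or_append(table, match, sb_uuid, conj, dedupe):
    """Add a flow, or append a conjunction to an existing flow with the same match."""
    flow = table.setdefault(match, Flow(match))
    flow.sb_refs.add(sb_uuid)
    # Without the guard, the same conjunction is appended on every call.
    if not dedupe or conj not in flow.conjunctions:
        flow.conjunctions.append(conj)


def remove_sb_ref(table, match, sb_uuid):
    """Model of the cleanup step: drops only the reference, not the conjunctions."""
    flow = table.get(match)
    if flow is not None:
        flow.sb_refs.discard(sb_uuid)


def simulate(rounds, dedupe):
    """Reprocess 'lflow-a' several times while 'lflow-b' keeps the flow alive."""
    table = {}
    match = "ip4.src == 10.0.0.1"  # hypothetical shared match
    add_or_append(table, match, "lflow-a", (1, 0, 2), dedupe)
    add_or_append(table, match, "lflow-b", (2, 0, 2), dedupe)
    for _ in range(rounds):
        remove_sb_ref(table, match, "lflow-a")
        add_or_append(table, match, "lflow-a", (1, 0, 2), dedupe)
    return len(table[match].conjunctions)


if __name__ == "__main__":
    print("without guard:", simulate(5, dedupe=False))
    print("with guard:   ", simulate(5, dedupe=True))
```

In this model the unguarded action list gains one entry per reprocessing round, while the guarded one stays at the two conjunctions that were added initially.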
On a quick glance over the code, ACLs with address sets in them might be affected, if the address set is frequently modified without modifying the ACL itself. But I didn't test to confirm.
Is the implication that a full recompute erases the extra unnecessary conjunctions?
(In reply to Mark Michelson from comment #2)
> Is the implication that a full recompute erases the extra unnecessary
> conjunctions?

Probably, yes.
I actually tried to reproduce this using ACLs, as they are the only thing with complicated conjunctions that comes to my mind, but I could not trigger it. I have tried several approaches, like updating the ACLs themselves and updating the address sets in various ways, and the result was always fine. I would suggest closing this BZ; if we come across such a scenario in the future, we can reopen it. WDYT?

Thanks,
Ales
I'd keep it open. It's a logical bug in ovn-controller and we need to fix it before users run into it, even if it's not easy to reproduce with the current code. If you want a solid reproducer, you may revert the fix for BZ2171423. For the ACLs, is there some sort of recompute always involved?
I will probably need to try that revert, or try on 22.12, because the ACLs are not triggering any recompute, e.g.:

Node: logical_flow_output
- recompute: 0
- compute: 2
- abort: 0
Node: physical_flow_output
- recompute: 0
- compute: 2
- abort: 0
Node: controller_output
- recompute: 0
- compute: 2
- abort: 0
Patch posted for review: https://patchwork.ozlabs.org/project/ovn/patch/20230829084753.209210-2-amusil@redhat.com/
ovn23.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239060
ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239062
ovn22.12 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239065
ovn22.09 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239067
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239068
ovn22.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239071
ovn22.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239072
ovn22.03 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239075
ovn22.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239076
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn23.03 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:5822