Because of issue https://bugzilla.redhat.com/show_bug.cgi?id=1955161 generating extra conjunction actions, we saw an instance where OVN generated a flow that was larger than the maximum size of an OpenFlow message. The result was this:

1) ovn-controller generates a huge OF message and sends it to ovs-vswitchd. The OF message has a truncated length field 'len'.
2) ovs-vswitchd attempts to parse the OF message. It pulls 'len' bytes from the OF message and reports back to ovn-controller that it has a bad length.
3) ovn-controller closes the UNIX socket connection to ovs-vswitchd.
4) ovs-vswitchd attempts to read the rest of the OF message from the socket. Since it is reading malformed OF, it repeatedly generates "bad version" messages until the entire OF message has been read from the socket.
5) ovn-controller then re-establishes a connection with ovs-vswitchd.
6) Repeat from step (1).

There is likely some clever way to have ovn-controller break a large flow into smaller sub-flows. However, it's unlikely that we would ever need to legitimately generate such a large flow. In this case in particular, the large flow is generated due to a bug. Instead, we should check the size of the OpenFlow message ovn-controller is generating and not send it if it's too large.
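To illustrate the failure mode and the proposed check, here is a minimal Python sketch. It is not the ovn-controller code. It only relies on the OpenFlow header layout (1-byte version, 1-byte type, 2-byte length, 4-byte xid), which caps a message at 65535 bytes. The function names and the default version/type/xid values are hypothetical and chosen for illustration only.

```python
import struct

OFP_HEADER_LEN = 8        # version (1) + type (1) + length (2) + xid (4)
OFP_MAX_MSG_LEN = 0xFFFF  # largest value the 16-bit length field can hold


def pack_header(version, msg_type, total_len, xid):
    """Pack an OpenFlow header. The length is masked to 16 bits, which
    mimics what happens when an oversized length is stored unchecked."""
    return struct.pack("!BBHI", version, msg_type, total_len & 0xFFFF, xid)


def fits_in_one_message(body_len):
    """True if header plus body can be described by the 16-bit length."""
    return OFP_HEADER_LEN + body_len <= OFP_MAX_MSG_LEN


def try_encode_flow(body, version=0x04, msg_type=14, xid=1):
    """Hypothetical guard: return the encoded message, or None when the
    flow is too large to send and should be skipped instead."""
    if not fits_in_one_message(len(body)):
        return None
    total_len = OFP_HEADER_LEN + len(body)
    return pack_header(version, msg_type, total_len, xid) + body
```

Without the guard, a 70008-byte message would carry a length field of 70008 & 0xFFFF, so the receiver would parse only the first part and treat the remaining bytes as new, malformed messages, which matches steps (2) and (4) above.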
Additionally, for the purposes of alerting, it would be good to have a db table that tracks the number of un-programmable flows. It should be incremented whenever a "bad" flow is encountered, and decremented when that flow is deleted. That way we can write reliable alerts.
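As a rough sketch of the bookkeeping suggested here (not an actual OVN schema or implementation), the counter could be derived from a set of flow identifiers. The class and method names are hypothetical. Using a set is an assumption made so that seeing the same bad flow twice does not inflate the count.

```python
class UnprogrammableFlowTracker:
    """Tracks flows that could not be programmed (hypothetical helper)."""

    def __init__(self):
        self._bad_flows = set()

    def mark_bad(self, flow_id):
        # Called whenever a "bad" flow is encountered; idempotent per flow.
        self._bad_flows.add(flow_id)

    def flow_deleted(self, flow_id):
        # Called when a flow is deleted; unknown or healthy flows are ignored.
        self._bad_flows.discard(flow_id)

    @property
    def count(self):
        # Value that would be exported (for example to a db table) for alerting.
        return len(self._bad_flows)
```

An alert could then fire whenever the exported count is non-zero.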
> However, it's unlikely that we would ever need to legitimately generate such a large flow.

Unfortunately, there is a legitimate use case: Load Balancers with the 'hairpin_snat_ip' option set, as described in BZ 2171423.
Patch posted for review: https://patchwork.ozlabs.org/project/ovn/patch/20230829084753.209210-1-amusil@redhat.com/
ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239063
ovn23.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239064
ovn22.12 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239066
ovn22.09 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239069
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239070
ovn22.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239073
ovn22.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239074
ovn22.03 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239077
ovn22.03 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2239078
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn23.09 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0392