Bug 1955167

Summary: OVN does not check length of OpenFlow FLOW_MOD messages
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Mark Michelson <mmichels>
Component: OVN
Assignee: OVN Team <ovnteam>
Status: NEW ---
QA Contact: Jianlin Shi <jishi>
Severity: unspecified
Docs Contact:
Priority: high
Version: FDP 21.D
CC: ctrautma, i.maximets
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1953613    

Description Mark Michelson 2021-04-29 15:01:47 UTC
Because issue https://bugzilla.redhat.com/show_bug.cgi?id=1955161 generates extra conjunction actions, we saw an instance where OVN generated a flow that was larger than the maximum size of an OpenFlow message. The result was this:

1) ovn-controller generates a huge OF message and sends it to ovs-vswitchd. The length field 'len' in the OF header is only 16 bits wide, so it holds a truncated value.
2) ovs-vswitchd attempts to parse the OF message. It pulls 'len' bytes from the OF message and reports back to ovn-controller that the message has a bad length.
3) ovn-controller closes the UNIX socket connection to ovs-vswitchd.
4) ovs-vswitchd attempts to read the rest of the OF message from the socket. Since it is reading malformed OF, it repeatedly generates "bad version" messages until the entire OF message has been read from the socket.
5) ovn-controller then re-establishes a connection with ovs-vswitchd.
6) Repeat from step (1)
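
To illustrate steps (1) and (2), here is a minimal Python sketch of how a 16-bit header length wraps around for an oversized message. This is not OVN or OVS code: the helper names are hypothetical, the version byte is only illustrative, and the body is a placeholder of zero bytes rather than a real FLOW_MOD.

```python
import struct

# OpenFlow header layout: version (1 byte), type (1), length (2), xid (4),
# all in network byte order.
OFP_HEADER = struct.Struct("!BBHI")
OFP_VERSION = 0x04   # illustrative only: OpenFlow 1.3 wire version
OFPT_FLOW_MOD = 14   # FLOW_MOD message type in OpenFlow 1.1 and later


def encode_without_length_check(body: bytes, xid: int = 1) -> bytes:
    """Hypothetical encoder that never checks the total size.

    The real total is masked to 16 bits, which is what "truncated
    length field" means in step (1).
    """
    total = OFP_HEADER.size + len(body)
    header = OFP_HEADER.pack(OFP_VERSION, OFPT_FLOW_MOD, total & 0xFFFF, xid)
    return header + body


def declared_length(msg: bytes) -> int:
    """Return the length the receiver would read from the header."""
    _version, _type, length, _xid = OFP_HEADER.unpack_from(msg)
    return length


# A placeholder body that pushes the message past 65535 bytes.
msg = encode_without_length_check(b"\x00" * 70000)

# Bytes the receiver would still find on the socket after consuming
# 'len' bytes; it then tries to parse them as new OF headers.
leftover = len(msg) - declared_length(msg)
```

With a 70000-byte body the message is 70008 bytes on the wire, but the header claims far fewer, so everything past the declared length gets misread as the start of further messages, which matches the "bad version" noise in step (4).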

There is likely some clever way to have ovn-controller break a large flow into smaller sub-flows. However, it's unlikely that we would ever legitimately need to generate such a large flow; in this particular case, the large flow is the result of a bug. Instead, ovn-controller should check the size of each OpenFlow message it generates and not send the message if it is too large.
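
The proposed check could look roughly like the following Python sketch. It is only an illustration of the idea, not the actual ovn-controller change: `send_if_encodable` and the `send` callback are hypothetical names, and the only fact it relies on is that a 16-bit length field cannot describe more than 65535 bytes.

```python
import logging
from typing import Callable

# Largest total message size a 16-bit OpenFlow length field can express.
OFP_MAX_MSG_LEN = 0xFFFF

log = logging.getLogger("flow_guard")  # hypothetical logger name


def send_if_encodable(msg: bytes, send: Callable[[bytes], None]) -> bool:
    """Send 'msg' only if its size fits in the OpenFlow length field.

    'send' stands in for whatever actually writes to the connection.
    Returns True if the message was sent, False if it was skipped.
    """
    if len(msg) > OFP_MAX_MSG_LEN:
        log.warning(
            "dropping OpenFlow message of %d bytes (maximum is %d)",
            len(msg), OFP_MAX_MSG_LEN,
        )
        return False
    send(msg)
    return True
```

Skipping the message and logging a warning keeps the connection to ovs-vswitchd up, at the cost of one flow not being programmed.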

Comment 1 Casey Callendrello 2021-05-05 20:50:13 UTC
Additionally, for alerting purposes, it would be good to have a DB table that tracks the number of un-programmable flows. The count should be incremented whenever a "bad" flow is encountered and decremented when that flow is deleted.

That way we can write reliable alerts.
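
As a rough sketch of the bookkeeping described above (in-memory Python, not an actual DB schema): the class and method names are hypothetical, and flows are identified by an opaque string key chosen purely for illustration.

```python
class UnprogrammableFlowTracker:
    """Tracks flows that could not be programmed.

    A set is used instead of a bare integer so that seeing the same
    bad flow twice does not inflate the count, and deleting a flow
    that was never bad does not push the count below zero.
    """

    def __init__(self) -> None:
        self._bad_flows = set()

    def mark_bad(self, flow_id: str) -> None:
        """Record that 'flow_id' was encountered and could not be sent."""
        self._bad_flows.add(flow_id)

    def flow_deleted(self, flow_id: str) -> None:
        """Forget 'flow_id' once the flow itself has been deleted."""
        self._bad_flows.discard(flow_id)

    @property
    def count(self) -> int:
        """Number of currently un-programmable flows, for alerting."""
        return len(self._bad_flows)
```

An alert could then fire whenever `count` stays above zero for some period.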

Comment 3 Ilya Maximets 2023-02-20 14:26:59 UTC
>  However, it's unlikely that we would ever need to legitimately generate such a large flow.

Unfortunately, there is a legitimate use case: Load Balancers with the 'hairpin_snat_ip' option set, as described in BZ 2171423.