Bug 1960787

Summary: After upgrading to 4.7.9, seeing HAProxy OOM errors
Product: OpenShift Container Platform Reporter: Daniel Del Ciancio <ddelcian>
Component: Networking Assignee: Candace Holman <cholman>
Networking sub component: router QA Contact: Arvind iyengar <aiyengar>
Status: CLOSED INSUFFICIENT_DATA Docs Contact:
Severity: high    
Priority: medium CC: amcdermo, aos-bugs, bperkins, cholman, hongli, mfisher, mmasters, sgreene, wking
Version: 4.7   
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-12 23:15:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 6 Daniel Del Ciancio 2021-05-25 15:51:30 UTC
Even with the hard-stop-after option in place, the customer was still seeing pod OOM errors after the 4.7.9 upgrade. They have since removed the option to see whether stability would improve, but the behavior is still the same.

Comment 21 Daniel Del Ciancio 2021-06-04 16:15:48 UTC
Regarding the HAProxy OOM issue, the customer wants us to see if we can reproduce this at scale, simulating a number of backends similar to what they use in DEV. They still aren't convinced that tweaking timeouts on routes, etc. is the appropriate way to address this, especially since these routes were all configured this way in 4.6. The increase in HAProxy memory consumption only started once they upgraded to 4.7.