Bug 1960787

Summary:	After upgrading to 4.7.9, seeing HAProxy OOM errors
Product:	OpenShift Container Platform	Reporter:	Daniel Del Ciancio <ddelcian>
Component:	Networking	Assignee:	Candace Holman <cholman>
Networking sub component:	router	QA Contact:	Arvind Iyengar <aiyengar>
Status:	CLOSED INSUFFICIENT_DATA	Docs Contact:
Severity:	high
Priority:	medium	CC:	amcdermo, aos-bugs, bperkins, cholman, hongli, mfisher, mmasters, sgreene, wking
Version:	4.7
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-12 23:15:26 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Comment 6 Daniel Del Ciancio 2021-05-25 15:51:30 UTC

Even with the hard-stop-after option in place, the customer was still seeing pod OOM errors after the 4.7.9 upgrade. They have since removed the option to see whether stability would improve, but the behavior is unchanged.

Comment 21 Daniel Del Ciancio 2021-06-04 16:15:48 UTC

Regarding the HAProxy OOM issue, the customer wants us to see if we can reproduce this at scale, simulating a number of backends similar to what they use in DEV. They still aren't convinced that tweaking timeouts on routes, etc. is the appropriate way to address this, especially since these routes were all configured the same way in 4.6. The increase in HAProxy memory consumption only started once they upgraded to 4.7.