Bug 1420994

Summary: Django quickstart can trigger excessive disk IO when it hits memory limits
Product: Red Hat Software Collections
Component: rh-python35-container
Version: rh-python35
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Jaspreet Kaur <jkaur>
Assignee: Python Maintainers <python-maint>
QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Docs Contact:
CC: aos-bugs, cstratak, hhorak, jkaur, jokerman, jorton, kanderso, mchappel, mmccomas, torsava
Target Milestone: ---
Target Release: 3.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-07 15:57:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jaspreet Kaur 2017-02-10 05:12:36 UTC
Description of problem: Running the OpenShift recommended example:
https://github.com/openshift/django-ex

results in substantially more memory consumption than the assumed '10M'.

On a shared platform, memory quotas can make the memory/CPU ratio look unusual
(for example, 8 CPUs but only '512M' of RAM). In that situation we end up
spawning far more workers than we have memory for, so new workers are
constantly killed and respawned, and disk IO stays high as the workers keep
restarting.

This then happens ad nauseam, exhausting the burst IO allocated to the AWS volume and impacting all users of that node.

Note: While Docker can limit the number of CPUs a container sees, Kubernetes
and OpenShift do not currently support this.
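
For illustration, here is a minimal sketch of the kind of guard the image could
apply: derive the worker count from the cgroup memory limit as well as from the
CPU count, instead of from CPUs alone. The cgroup v1 path and the per-worker
memory estimate are assumptions for this example, not the actual logic of the
s2i run script:

    # Hypothetical sketch, not the image's real startup logic: cap the worker
    # count by the cgroup memory limit as well as by the CPU count.
    import multiprocessing

    # Assumed per-worker footprint for illustration; the report's point is that
    # real Django workers need far more than the assumed '10M'.
    PER_WORKER_BYTES = 100 * 1024 * 1024
    CGROUP_LIMIT = "/sys/fs/cgroup/memory/memory.limit_in_bytes"  # cgroup v1

    def suggested_workers():
        cpu_based = multiprocessing.cpu_count() * 2 + 1  # common CPU-only heuristic
        try:
            with open(CGROUP_LIMIT) as f:
                mem_based = max(1, int(f.read()) // PER_WORKER_BYTES)
        except OSError:
            mem_based = cpu_based  # no cgroup limit visible
        return min(cpu_based, mem_based)

    print("memory-aware worker count:", suggested_workers())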




Version-Release number of selected component (if applicable):


How reproducible: 

Preparation: 
Our nodes are m4.2xlarge: 32G RAM & 8 CPUs
Our docker storage is 100G GP2 (300 IOPS)
We apply a reasonably aggressive default memory limit.  We use 512Mi, which means the pods need to see moderate use before they start having problems.  To make the issue easier to trigger you can set the memory limit to around 256Mi; you can also expose it by increasing the number of cores on the host.  The CPU limits have no impact on the calculation, since they are not seen by the s2i startup script.
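
A quick way to confirm the mismatch from inside a running pod is to print what
the process actually sees; a minimal check (assuming the cgroup v1 layout of
RHEL 7 era nodes) could look like this:

    # Hypothetical diagnostic: CPU count visible to the process vs. the memory
    # limit actually enforced by the container's cgroup (cgroup v1 path).
    import multiprocessing

    print("CPUs visible:", multiprocessing.cpu_count())
    with open("/sys/fs/cgroup/memory/memory.limit_in_bytes") as f:
        print("memory limit (MiB):", int(f.read()) // (1024 * 1024))

On an m4.2xlarge with a 512Mi limit this reports 8 CPUs but only 512 MiB, which
is the combination that leads the startup script to over-provision workers.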

Trigger:
Create a project using the 'Python 3.5' example and the example django app: https://github.com/openshift/django-ex

Effect:
Once the pod starts up you'll see that some of the workers keep getting killed by the OOM Killer and respawned.  Depending on your storage backend you'll also see increased disk IO usage.



Steps to Reproduce:
1.
2.
3.

Actual results: Excessive disk IO with the Django quickstart


Expected results: The issue should have been prevented in the default image itself.


Additional info: Workaround tested: setting WEB_CONCURRENCY=1 in the dc
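
For reference, gunicorn honors the WEB_CONCURRENCY environment variable as the
default for its worker count, which is why the override takes effect without
changing the image. A minimal sketch of that relationship (the config file and
its use here are illustrative, not taken from the image):

    # Hypothetical gunicorn.conf.py: with WEB_CONCURRENCY=1 set on the dc,
    # only a single worker is started and the OOM-kill/respawn loop stops.
    import os

    workers = int(os.environ.get("WEB_CONCURRENCY", 1))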

Comment 7 Charalampos Stratakis 2017-12-07 15:57:31 UTC
The fix is in the latest image, so I am closing the issue.

Please feel free to reopen it if you experience the issue again.