Bug 1458238
| Field | Value |
|---|---|
| Summary | OCP Master APIs are using an excessive amount of memory in containerized env |
| Product | OpenShift Container Platform |
| Reporter | Jaspreet Kaur <jkaur> |
| Component | Master |
| Assignee | Michal Fojtik <mfojtik> |
| Status | CLOSED NOTABUG |
| QA Contact | Chuan Yu <chuyu> |
| Severity | low |
| Priority | low |
| Docs Contact | |
| Version | 3.5.0 |
| CC | aos-bugs, decarr, dmsimard, jokerman, mak, mmccomas, wmeng |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2017-09-07 14:13:44 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Attachments | |
Description (Jaspreet Kaur, 2017-06-02 11:22:33 UTC)
For reference, over what time period are you seeing memory grow? After a restart, how much memory is used, and what is the slope of memory growth over an operational period?

You can get a heap profile from a containerized OpenShift master:

```shell
# Edit /etc/sysconfig/origin-master (or maybe atomic-openshift-master)
# and add "-e OPENSHIFT_PROFILE=web" to OPTIONS, then restart the master:
systemctl restart atomic-openshift-master.service

# Because the containerized master runs in the host network namespace, you can:
curl -s http://127.0.0.1:6060/debug/pprof/heap > heap.profile
```

Please attach the heap profile along with the output of "oc version". Unfortunately, this does involve restarting the master, thus having to wait for a recreate.

Created attachment 1285079 [details]
Heap profile
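The slope question above can be answered by sampling the master's resident set size over time. A minimal sketch; the `sample_rss` helper name and the log format are my own, not from this bug:

```shell
# Hypothetical helper: print a UTC timestamp plus the resident set size (KB)
# of the given PID, suitable for appending to a log and plotting later.
sample_rss() {
  pid="$1"
  rss_kb=$(ps -o rss= -p "$pid" | tr -d ' ')
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$rss_kb"
}

# Usage against the master process might look like:
#   while sleep 60; do
#     sample_rss "$(pgrep -f 'openshift start master' | head -n1)"
#   done >> /tmp/master-rss.log
```

Plotting the resulting log gives both the post-restart baseline and the growth slope the question asks for.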
Output from 'oc version':

```
[root@ip-10-98-10-244 ~]# /var/usrlocal/bin/oc version
oc v3.4.1.7
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO
```

The total size of the process as reported in the heap profile is only 283MB. I would assume you are not observing the reported issue yet. As of now, the top memory user is the etcd storage code, with ~150MB between decodeNodeList() and decodeObject() results being added to the deserialization cache via addToCache(). Can you post a new heap profile when the issue occurs?

Sure, that profile was provided by the customer. I'll ask them to provide a new one once this happens again. I'm only onsite today and tomorrow; if it does not happen again by tomorrow, I will ask the customer to upload it (to the case) when the issue occurs.

We are also noticing significant RAM usage of the origin master process:

```
# oc version
oc v1.5.1
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.1.17:8443
openshift v1.5.1
kubernetes v1.5.2+43a9be4

# ps aux | grep -e "openshift start master" -e USER
USER  PID   %CPU %MEM VSZ      RSS     TTY STAT START TIME   COMMAND
root  14240 3.7  74.0 19072488 5935176 ?   Ssl  Aug15 935:21 /usr/bin/openshift start master --config=/etc/origin/master/master-config.yaml --loglevel=2
```

The master process is eating up all the available RAM (8GB) until the OOM killer starts killing containers. We only have two namespaces; this is a standalone registry deployment. We are not using any applications. We have 3 pods in total: router, registry, and registry-console.

The issue seems to be exacerbated when pushing a batch of new container images to the registry that is exposed through the router. I will enable the OPENSHIFT_PROFILE=web option and report back with a heap dump once we eventually run out of RAM again.

Created attachment 1320975 [details]
openshift master RAM usage screenshot
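For reference, the RSS column in the `ps aux` output above is in kilobytes; a quick sanity-check conversion of the reported figure (5935176 KB) to GiB:

```shell
# Convert the RSS value reported by ps (in KB, i.e. KiB) to GiB.
rss_kb=5935176
awk -v kb="$rss_kb" 'BEGIN { printf "%.1f GiB\n", kb / 1048576 }'
# prints "5.7 GiB"
```

That is roughly consistent with the 74.0 %MEM figure on an 8 GB host.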
Created attachment 1321012 [details]
Heap after restarting origin-master
I was recommended to post a heap profile after restarting origin-master and to take another dump once we notice the RAM issue. Here's the first dump; I will attach a new one once we hit critical RAM thresholds -- might take a few days.
Created attachment 1321049 [details]
Heap after origin-master is taking up RAM
It turns out it didn't take a few days for the master process to eat up all the RAM again, but we're also ramping up our usage, so that might be related... Here's the heap dump from the master eating almost all the available RAM, with the machine starting to swap.
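One way to catch the swap-inducing pressure described above before the OOM killer fires is to compute the master's share of total RAM directly from /proc. A sketch; Linux-only, and the `mem_percent` helper is hypothetical, not from this bug:

```shell
# Hypothetical helper: report a process's VmRSS as a percentage of MemTotal,
# mirroring the %MEM column of ps. Requires Linux /proc.
mem_percent() {
  pid="$1"
  rss_kb=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  awk -v r="$rss_kb" -v t="$total_kb" 'BEGIN { printf "%.1f\n", 100 * r / t }'
}
```

Alerting when this crosses some threshold (say 70%) would leave time to capture a heap profile while the process is still responsive.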
I just noticed the bug mentions a *containerized* master taking too much RAM. Our master implementation was deployed with openshift-ansible and is not containerized. Should I open a new bug?

David - please open a new bug for your issue. Let's keep this bug focused on the containerized use-case. Thanks!