Created attachment 1196444 [details]
Memory Utilization Graph

Description of problem:
Memory consumption on the master goes up to 3G and is still increasing after a few days of running reliability tests. Please look at the following bug: https://bugzilla.redhat.com/show_bug.cgi?id=1323733#c37

Memory consumption keeps increasing even after setting:

  deserialization-cache-size:
  - "1000"

Version-Release number of selected component (if applicable):
openshift v3.3.0.22
kubernetes v1.3.0+507d3a7
etcd 2.3.0+git

How reproducible:
After running reliability tests for a few days; in this case we ran them for 10 days.

Steps to Reproduce:
1. Create an OpenShift cluster in AWS
2. Start tests which create/build/redeploy/delete projects and apps
3. Let it run for a few days and watch memory consumption

Actual results:
Memory consumption goes up and keeps growing.

Expected results:
After some time it should stop growing.

Additional info:
Please find attached the graph from CloudWatch; the Y axis is in % of total RAM. Total RAM on the master was 16G. The sudden drop in memory shows when deserialization-cache-size was added to the master config and the master process was restarted. Also attached are pprof heap profiles taken last night and this morning.
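For reference, the attached profiles can be inspected offline with go tool pprof; this is a generic sketch (the binary and profile paths are examples, not the exact commands used here):

  # Summarize the top allocation sites in an attached heap profile
  # (paths are illustrative; point at the actual master binary and profile)
  go tool pprof /usr/bin/openshift heap.pprof
  (pprof) top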
Created attachment 1196445 [details]
pprof heap

Created attachment 1196446 [details]
pprof heap for another time
Vikas -- if you use stress to allocate a bunch of RAM and put the system under (gentle) memory pressure, does the RSS of the master go down? https://copr-be.cloud.fedoraproject.org/results/ndokos/pbench/epel-7-x86_64/00182790-pbench-stress/
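Something along these lines should do it (a sketch; worker count, sizes, and duration are illustrative, and the process name used to read RSS is an assumption):

  # Allocate ~8G across 4 workers for 5 minutes to create memory pressure
  stress --vm 4 --vm-bytes 2G --timeout 300s
  # Then check whether the master's RSS dropped (process name may differ)
  grep VmRSS /proc/$(pgrep -f 'openshift start master' | head -1)/status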
After further analysis, we are not seeing continued RSS growth. RSS usage has stabilized and in the future can be tuned further with etcd caching for Origin-specific objects when we revisit https://github.com/openshift/origin/pull/10719
Created attachment 1199116 [details]
mem growth

I am still seeing memory growth; the attached graph shows memory consumed by the openshift-master process only. Samples were taken every 15 minutes over the last few days.
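For anyone reproducing the measurement, a loop along these lines (the pgrep pattern is an assumption about the master process name) records the RSS every 15 minutes:

  # Append a timestamped RSS sample (in KB) for the master every 15 minutes
  while true; do
      pid=$(pgrep -f 'openshift start master' | head -1)
      echo "$(date -u +%FT%TZ) $(ps -o rss= -p "$pid")" >> master-rss.log
      sleep 900
  done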
This master is running with etcd embedded -- can we do this with an external etcd? It will be far, far easier to diagnose that way.
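For reference, a minimal external etcd can be stood up roughly like this (host name, data dir, and ports are placeholders), with the master then pointed at it via etcdClientInfo in master-config.yaml:

  # Standalone etcd on a separate host (values are examples)
  etcd --name external-etcd \
       --data-dir /var/lib/etcd \
       --listen-client-urls http://0.0.0.0:2379 \
       --advertise-client-urls http://etcd-host.example.com:2379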
Started another run with external etcd, will update the bug again with some data.
I had another cluster with the same issue. After stopping the tests and deleting all the projects, memory usage on the nodes came down, but the master node and master process were still taking up the same amount of memory (on this cluster it was at 2.3G). I then ran the pbench stress mentioned in comment #3, and that did not make any difference. Even forcing an OOM on the master using stress did not make the master's RSS go down.
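For the record, the sustained-pressure variant of stress (using --vm-hang so workers hold their allocations instead of cycling them) is roughly:

  # Hold ~12G resident for 10 minutes to force sustained memory pressure
  # (worker count, sizes, and duration are illustrative)
  stress --vm 6 --vm-bytes 2G --vm-hang 600 --timeout 600s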
Another note: after restarting the master process, memory consumption came down to 285MB.
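(The restart here was just the systemd unit; the unit name below assumes the OCP 3.3 packaging and would be origin-master on Origin.)

  # Restart the master process and re-check its RSS afterwards
  systemctl restart atomic-openshift-master
  grep VmRSS /proc/$(pgrep -f 'openshift start master' | head -1)/status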
Created attachment 1206900 [details]
OpenShift master memory

The attached graph shows the memory growth after running the tests for 22 days; the growth is still there, but it is slower now.